News-Extraction
Category: Front-end development
Development tool: Python
File size: 6446KB
Downloads: 0
Upload date: 2022-08-14 18:38:58
Uploader:
sh-1993
Description: This is my first repository where I have tried news extraction using Python. From the news webpage, I have extracted the headline, image URL, source URL, author, and summary of each news article on that page. After extracting the data, I created a front end to display it using HTML, with the help of Bootstrap.
File list:
Extraction.py (2165, 2022-06-15)
__pycache__ (0, 2022-06-15)
__pycache__\Extraction.cpython-310.pyc (1324, 2022-06-15)
__pycache__\Extraction.cpython-38.pyc (1335, 2022-06-15)
__pycache__\Extraction.cpython-39.pyc (1277, 2022-06-15)
screenshots (0, 2022-06-15)
screenshots\Screenshot (4).png (3166638, 2022-06-15)
screenshots\Screenshot (5).png (2546489, 2022-06-15)
screenshots\Screenshot (6).png (521952, 2022-06-15)
screenshots\Screenshot (7).png (308390, 2022-06-15)
static (0, 2022-06-15)
static\css (0, 2022-06-15)
static\css\main.css (1828, 2022-06-15)
static\images (0, 2022-06-15)
static\images\bg-photo.jpg (78320, 2022-06-15)
templates (0, 2022-06-15)
templates\homepage.html (1952, 2022-06-15)
test.py (296, 2022-06-15)
words.txt (22227, 2022-06-15)
# News-Extraction
- This is my first repository, in which I have tried news extraction using Python
- From the news webpage, I have extracted the headline, keywords, image URL, source URL, author, and summary of each news article on that page
- After extracting the data, I created a front end to display it using HTML and CSS
# Objective
- Extraction is the process of retrieving the required data from a source
- Here the source is a news website, Inshorts (https://www.inshorts.com/en/read)
- The extracted data initially arrives as a single string (the page's HTML), which I then parse
- Parsing converts that string into structured data, such as separate fields for each article
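
To make the idea of parsing concrete, here is a minimal sketch that turns one HTML string into a list of headlines. It uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without installing anything; the markup, the `headline` class name and the `HeadlineParser` class are all hypothetical and are not taken from the Inshorts page or from this repository.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of every <span class="headline"> element."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_headline = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "span" and ("class", "headline") in attrs:
            self._in_headline = True
            self.headlines.append("")

    def handle_data(self, data):
        # Accumulate text only while inside a headline span
        if self._in_headline:
            self.headlines[-1] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_headline = False


# Hypothetical markup; the real page's structure may differ.
raw_html = """
<div class="card"><span class="headline">First story</span></div>
<div class="card"><span class="headline">Second story</span></div>
"""

parser = HeadlineParser()
parser.feed(raw_html)
headlines = [h.strip() for h in parser.headlines]
print(headlines)
```

The input is one long string; the output is a Python list with one entry per article, which is the kind of conversion the last bullet describes.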
# Technologies used
- I have written the source code in Python with the help of some libraries and built-in functions
- The libraries I've used are 'requests', 'bs4 (BeautifulSoup4)' and 'flask'
- 'requests' to request the data from the news webpage URL
- 'bs4 (BeautifulSoup4)' to parse the data into the required form
- 'flask' to build the web application
- The source code 'test.py' can run in any IDE that supports Python 3 and has the above libraries installed
- You will also need a web browser to view the website
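
The way the three libraries fit together can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this repository: the CSS class names (`news-card`, `headline`, `author`), the function names and the inline template are hypothetical, and the real Inshorts markup is likely to differ.

```python
import requests
from bs4 import BeautifulSoup
from flask import Flask, render_template_string

app = Flask(__name__)

NEWS_URL = "https://www.inshorts.com/en/read"

# Inline template for brevity; the repository keeps its page in
# templates/homepage.html instead.
PAGE = """
<h1>Headlines</h1>
<ul>
{% for article in articles %}
  <li>{{ article.headline }} ({{ article.author }})</li>
{% endfor %}
</ul>
"""


def fetch_html(url):
    """Use requests to download the page and return its HTML as a string."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_articles(html):
    """Use BeautifulSoup to turn the HTML string into a list of dicts."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    # Class names below are assumed for illustration only.
    for card in soup.find_all("div", class_="news-card"):
        headline = card.find("span", class_="headline")
        author = card.find("span", class_="author")
        articles.append({
            "headline": headline.get_text(strip=True) if headline else "",
            "author": author.get_text(strip=True) if author else "",
        })
    return articles


@app.route("/")
def homepage():
    """Use Flask to serve the parsed articles as a web page."""
    articles = parse_articles(fetch_html(NEWS_URL))
    return render_template_string(PAGE, articles=articles)
```

To try a sketch like this, start Flask's development server (for example by calling `app.run()`) and open the address it prints in a web browser.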
# Code Output
- Running the code starts a local web server that serves a webpage showing the extracted news
- The webpage is written in HTML and CSS
![Screenshot 1](https://github.com/sehajdeep1814/News-Extraction/blob/master/screenshots/Screenshot%20(4).png)
![Screenshot 2](https://github.com/sehajdeep1814/News-Extraction/blob/master/screenshots/Screenshot%20(5).png)
# References
- To learn web scraping, I took a course on the "My Captain" platform (https://app.mycaptain.in/)
# Authors
- Sehajdeep Singh - https://github.com/sehajdeep1814/
- Qasim Shaikh - https://github.com/shaikhmq20/