web-scrapping

Category: Web development
Development tool: Jupyter Notebook
File size: 183KB
Downloads: 0
Upload date: 2021-01-03 18:02:47
Uploader: sh-1993
Description: Scrape and summarize news articles using requests and BeautifulSoup

File list:
codility (0, 2021-01-04)
codility\How to Scrap News Article.ipynb (2532, 2021-01-04)
codility\How to Scrap News Article.md (786, 2021-01-04)
codility\How to Scrap News Article.py (793, 2021-01-04)
images (0, 2021-01-04)
images\Capture.PNG (1116, 2021-01-04)
images\Capture1.PNG (2546, 2021-01-04)
images\Capture2.PNG (8592, 2021-01-04)
images\Capture3.PNG (8037, 2021-01-04)
images\Capture4.PNG (83265, 2021-01-04)
images\Capture5.PNG (11520, 2021-01-04)
images\Capture6.PNG (32691, 2021-01-04)
images\Capture7.PNG (21395, 2021-01-04)
images\Capturel.PNG (25504, 2021-01-04)

# How to Scrape News Articles Using Python

![](https://images.unsplash.com/photo-1504711434969-e33886168f5c?ixlib=rb-1.2.1&ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&auto=format&fit=crop&w=1350&q=80)
(Photo: AbsolutVision on Unsplash)

Newspaper is a Python module for extracting and parsing news articles. We are going to use it to pull information from the websites of newspapers and magazines. Besides extracting and curating articles, the Newspaper library can also return an article's text, a summary of that text, and its keywords.

## Getting Started:

**1. Download [Anaconda](https://docs.anaconda.com/anaconda/user-guide/tasks/install-packages/).**

* I downloaded the Individual Edition. Anaconda is a widely used Python distribution that bundles or makes available packages such as Jupyter Notebook, SciPy, NumPy, pandas, Matplotlib, and TensorFlow.

**2. Launch a Jupyter Notebook.** You need a Jupyter notebook to write and iterate on your Python code. See the [instructions](https://www.codecademy.com/articles/how-to-use-jupyter-notebooks/).

**3. Install the Newspaper module from your Jupyter notebook terminal.**

```
pip install newspaper3k
```

## Ready to Scrape News Articles!
* Import nltk and Newspaper. nltk (Natural Language Toolkit) is an open-source Python package; Newspaper's `nlp()` step relies on it.

```py
import nltk
from newspaper import Article
```

* Download the news article

```py
url = 'https://www.cnbc.com/2020/12/27/wonder-woman-1***4-opening-weekend-leads-to-fast-tracked-third-film.html'
article = Article(url)
article.download()      # fetch the page HTML
article.parse()         # extract title, authors, date, and body text
nltk.download('punkt')  # tokenizer data required by article.nlp()
article.nlp()           # compute keywords and summary
```

* Get the publication date of the news article

```py
article.publish_date
```

* Get the article text

```py
print(article.text)
```

* Load the article keywords

```py
article.keywords
```

* Summarize the news article

```py
print(article.summary)
```

* Get the authors' names

```py
article.authors
```

**The Newspaper library works in 30+ languages!**

```py
import newspaper
newspaper.languages()
```

:file_folder: [See my module](https://github.com/Conniekoh/Web-Scrapping/blob/master/codility/How%20to%20Scrap%20News%20Article.ipynb)

___

## Before Release

- [x] Finish my changes
- [x] this is a complete item
- [ ] this is an incomplete item
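
## Putting the Steps Together

The individual steps in the walkthrough can be collected into one helper that returns every field at once. This is only a minimal sketch: the function name `scrape_article` and the `article_factory` parameter are hypothetical and not part of the Newspaper library. The factory argument is there so the helper can be tried with a stand-in class, without network access.

```python
from typing import Any, Callable, Dict, Optional


def scrape_article(
    url: str,
    article_factory: Optional[Callable[[str], Any]] = None,
) -> Dict[str, Any]:
    """Download, parse, and summarize one article; return its fields as a dict."""
    if article_factory is None:
        # Imported lazily so the helper also works with a stand-in class
        # when newspaper3k is not installed.
        from newspaper import Article
        article_factory = Article

    article = article_factory(url)
    article.download()  # fetch the page HTML
    article.parse()     # extract title, authors, date, and body text
    article.nlp()       # compute keywords and summary (needs nltk's 'punkt' data)

    return {
        "title": article.title,
        "authors": article.authors,
        "publish_date": article.publish_date,
        "text": article.text,
        "keywords": article.keywords,
        "summary": article.summary,
    }
```

After running `nltk.download('punkt')` once, calling `scrape_article(url)` with the `url` defined earlier should return a dictionary whose `"summary"` and `"keywords"` entries match what `article.summary` and `article.keywords` show in the walkthrough.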
