NewsCollector

Category: Python libraries
Development tool: HTML
File size: 0KB
Downloads: 0
Upload date: 2023-02-10 23:54:11
Uploader: sh-1993
Description: NewsCollector - Python package for automated collection of the most relevant news articles of the day.

File list:
LICENSE (1070, 2023-02-10)
misc/ (0, 2023-02-10)
misc/collected_news.png (19880, 2023-02-10)
misc/newsletter_rendered_23Jan.png (3866274, 2023-02-10)
misc/newsletter_unrendered.png (391044, 2023-02-10)
newscollector/ (0, 2023-02-10)
newscollector/__init__.py (41, 2023-02-10)
newscollector/newscollector.py (25090, 2023-02-10)
newscollector/sources.json (3187, 2023-02-10)
newscollector/static/ (0, 2023-02-10)
newscollector/static/assets/ (0, 2023-02-10)
newscollector/static/assets/logo.png (87384, 2023-02-10)
newscollector/static/css/ (0, 2023-02-10)
newscollector/static/css/newsletter.css (1827, 2023-02-10)
newscollector/templates/ (0, 2023-02-10)
newscollector/templates/newsletter.html (5896, 2023-02-10)
requirements.txt (1658, 2023-02-10)
sample_newsletter.html (27003, 2023-02-10)
sample_newsletter.pdf (6899333, 2023-02-10)

# :newspaper: Python NewsCollector ![PyPIv](https://img.shields.io/pypi/v/py-newscollector) ![PyPI status](https://img.shields.io/pypi/status/py-newscollector) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/py-newscollector) ![PyPI - License](https://img.shields.io/pypi/l/py-AutoClean)

As the internet has grown, the **sources of information at our disposal have grown with it**. Nowadays, if you want to catch up on the most important news of the day, you have a **vast variety of news sources** to choose from.

With that many news sources available, instead of manually going through all of their content... **couldn't we let automation pick the top news stories from various newspapers for us and combine them into a newsletter?**

**:fire: This is what the Python NewsCollector can do for us!**

```shell
pip install py-newscollector
```

> :bulb: For a **detailed usage guide**, please refer to the official [NewsCollector Usage Documentation](https://github.com/elisemercury/News-Collector/wiki/NewsCollector-Usage-Documentation).

> :closed_book: Read more about how the algorithm of NewsCollector works in [my Medium article](https://medium.com/@eliselandman/automated-news-article-collection-with-python-9267968c9ea).

-------

## Description

The Python NewsCollector lets you define a variety of news sources from which it picks the **most relevant articles** and bundles them into a **clean HTML-based newsletter**. Below is an example of a newsletter auto-generated by NewsCollector on 23 January 2023:

View the full sample newsletter in PDF format in `sample_newsletter.pdf`.

The NewsCollector algorithm **scrapes** the provided source links and compares the articles it finds by their **similarity**. If multiple articles from different sources cover similar topics, it treats them as **relevant articles** and includes them in the output newsletter.
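The relevance rule described above (an article counts when a similar article appears at a *different* source) can be sketched as follows. This is a simplified stand-in, not NewsCollector's own algorithm: it assumes bag-of-words cosine similarity over headlines, a hand-picked stop-word list and an arbitrary threshold of 0.5, and `tokenize`, `cosine_similarity` and `relevant_articles` are hypothetical names. The Medium article linked in this README describes the method the package actually uses.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list: very common words carry little topical signal
STOP_WORDS = {"a", "an", "the", "by", "of", "in", "on", "to", "and", "for"}


def tokenize(text):
    # Lowercase, split into alphabetic words, drop stop words
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cosine_similarity(a, b):
    # Cosine similarity of the two texts' word-count vectors (0.0 to 1.0)
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def relevant_articles(articles, threshold=0.5):
    # articles: list of (source, title) pairs.
    # Keep an article if at least one article from a *different* source
    # reaches the similarity threshold with it.
    relevant = set()
    for (i, (src_a, title_a)), (j, (src_b, title_b)) in combinations(enumerate(articles), 2):
        if src_a != src_b and cosine_similarity(title_a, title_b) >= threshold:
            relevant.update((i, j))
    return [articles[k] for k in sorted(relevant)]


articles = [
    ("Source A", "Storm hits coastal towns"),
    ("Source B", "Coastal towns hit by storm"),
    ("Source C", "Local bakery wins award"),
]
sample = relevant_articles(articles)
# → [('Source A', 'Storm hits coastal towns'), ('Source B', 'Coastal towns hit by storm')]
```

Plain word counts treat "hits" and "hit" as different words; stemming or TF-IDF weighting would be natural refinements of this sketch.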

> :closed_book: Read more about how the algorithm of NewsCollector works in [my Medium article](https://medium.com/@eliselandman/automated-news-article-collection-with-python-9267968c9ea).

## Basic Usage

You can run the NewsCollector algorithm as follows:

```Python
from newscollector import NewsCollector

newsletter = NewsCollector()   # uses the package's default sources.json
output = newsletter.create()   # runs the full pipeline and writes the HTML newsletter
```

This runs the full NewsCollector pipeline: it scrapes the sources listed in the package's default `sources.json` file and outputs an HTML newsletter. The `output` object holds the path of the generated newsletter, so you can easily retrieve it programmatically:

```Python
>>> output
'C:\\Output\\Path\\newsletter.html'
```

> :bulb: For a **detailed usage guide**, please refer to the official [NewsCollector Usage Documentation](https://github.com/elisemercury/News-Collector/wiki/NewsCollector-Usage-Documentation).

## CLI Usage

The NewsCollector can also be run directly from the command line with the following parameters:

```shell
python newscollector.py [-h] [-s [SOURCES]] [-n [NEWS_NAME]] [-d [NEWS_DATE]] [-t [TEMPLATE]] [-o [OUTPUT_FILENAME]] [-a [AUTO_OPEN]] [-r [RETURN_DETAILS]]
```

## Output

The NewsCollector outputs an **HTML newsletter** containing the most **relevant articles** it found while scraping the provided sources.
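The package renders its newsletter through its own `templates/newsletter.html` and `static/css/newsletter.css` (see the file list). As a rough illustration of what that rendering step involves, here is a minimal sketch that turns a list of articles into an HTML page with plain string building and `html.escape`; it does not use the package's template, and `render_newsletter` and the article keys `title`, `url` and `source` are hypothetical.

```python
from datetime import date
from html import escape


def render_newsletter(news_name, news_date, articles):
    # articles: list of dicts with hypothetical keys "title", "url", "source".
    # escape() keeps markup in scraped titles or URLs from breaking the page.
    items = "\n".join(
        f'    <li><a href="{escape(a["url"])}">{escape(a["title"])}</a>'
        f' ({escape(a["source"])})</li>'
        for a in articles
    )
    return (
        "<html>\n<body>\n"
        f"  <h1>{escape(news_name)}: {news_date.isoformat()}</h1>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n</html>\n"
    )


sample_html = render_newsletter(
    "Daily News Update",
    date(2023, 1, 23),
    [{"title": "Storm hits coastal towns", "url": "https://example.com/storm", "source": "Source A"}],
)
```

Escaping matters here because titles and URLs come from scraped pages and may contain characters such as `&` or `<`.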

View the full sample newsletter in PDF format in `sample_newsletter.pdf`.

By default, the output newsletter is **created as an HTML file** in the installation directory of the package, saved in the folder `rendered` under the filename `newsletter_YYYY-MM-DD.html`, where the date is the date on which NewsCollector scraped its articles. To adjust the default settings, please refer to [Additional Parameters](https://github.com/elisemercury/NewsCollector#additional-parameters).

## Additional Parameters

You can customize the NewsCollector algorithm with the following optional parameters:

```Python
from datetime import date  # needed for the news_date argument

from newscollector import NewsCollector

newsletter = NewsCollector(sources="sources.json",
                           news_name="Daily News Update",
                           news_date=date.today(),
                           template='newsletter.html',
                           output_filename='default',
                           auto_open=False,
                           return_details=False)
```

> :bulb: For a **detailed usage guide**, please refer to the official [NewsCollector Usage Documentation](https://github.com/elisemercury/News-Collector/wiki/NewsCollector-Usage-Documentation).

-------
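Each flag in the CLI usage line corresponds to one of the constructor parameters above (`-s` to `sources`, `-n` to `news_name`, and so on). The sketch below is a hypothetical `argparse` parser that reproduces that usage line and returns a dict keyed like the constructor arguments; the real `newscollector.py` may define its arguments differently. `str_to_bool`, `build_parser` and `parse_cli` are illustration names, and the defaults are taken from the example call above.

```python
import argparse
from datetime import date


def str_to_bool(value):
    # Hypothetical converter so that "-a true" or "-r no" become booleans
    lowered = value.lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    # Hypothetical parser mirroring the documented usage line.
    # nargs="?" produces the "[-x [VALUE]]" form: const is used when the flag
    # is given without a value, default when the flag is omitted entirely.
    parser = argparse.ArgumentParser(prog="newscollector.py")
    parser.add_argument("-s", dest="sources", nargs="?",
                        const="sources.json", default="sources.json")
    parser.add_argument("-n", dest="news_name", nargs="?",
                        const="Daily News Update", default="Daily News Update")
    parser.add_argument("-d", dest="news_date", nargs="?", type=date.fromisoformat,
                        const=date.today(), default=date.today())
    parser.add_argument("-t", dest="template", nargs="?",
                        const="newsletter.html", default="newsletter.html")
    parser.add_argument("-o", dest="output_filename", nargs="?",
                        const="default", default="default")
    parser.add_argument("-a", dest="auto_open", nargs="?", type=str_to_bool,
                        const=True, default=False)
    parser.add_argument("-r", dest="return_details", nargs="?", type=str_to_bool,
                        const=True, default=False)
    return parser


def parse_cli(argv=None):
    # Returns a dict whose keys match the NewsCollector keyword arguments
    return vars(build_parser().parse_args(argv))
```

With the package installed, `NewsCollector(**parse_cli())` would forward the parsed values to the constructor, assuming the keyword names shown in the example call above.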

:heart: Open Source

