aristotle

Category: Virtual/Augmented Reality (VR/AR)
Development tool: Python
File size: 149KB
Downloads: 0
Upload date: 2021-03-08 19:01:28
Uploader: sh-1993
Description: Highly customizable news collector

File list:
LICENSE (35142, 2021-03-09)
aristotle.png (124681, 2021-03-09)
aristotle (0, 2021-03-09)
aristotle\config (0, 2021-03-09)
aristotle\config\logging.yaml (943, 2021-03-09)
aristotle\config\properties.yaml (610, 2021-03-09)
aristotle\config\sources-en_EN.yaml (963, 2021-03-09)
aristotle\config\sources-tr_TR.yaml (22497, 2021-03-09)
aristotle\connection.py (1044, 2021-03-09)
aristotle\crawler.py (3463, 2021-03-09)
aristotle\db.py (1232, 2021-03-09)
aristotle\entity.py (963, 2021-03-09)
aristotle\itest (0, 2021-03-09)
aristotle\itest\__init__.py (0, 2021-03-09)
aristotle\itest\itest_crawler.py (289, 2021-03-09)
aristotle\link_parser.py (1874, 2021-03-09)
aristotle\main.py (575, 2021-03-09)
aristotle\meta_parser.py (1695, 2021-03-09)
aristotle\news.py (5659, 2021-03-09)
aristotle\properties_parser.py (1112, 2021-03-09)
aristotle\settings.py (1226, 2021-03-09)
aristotle\tests (0, 2021-03-09)
aristotle\tests\__init__.py (0, 2021-03-09)
aristotle\tests\test_crawler.py (70, 2021-03-09)
aristotle\tests\test_db.py (65, 2021-03-09)
aristotle\tests\test_link_parser.py (63, 2021-03-09)
aristotle\tests\test_meta_parser.py (63, 2021-03-09)
aristotle\tests\test_news.py (67, 2021-03-09)
aristotle\tests\test_properties_parser.py (63, 2021-03-09)
aristotle\tests\test_util.py (63, 2021-03-09)
aristotle\util.py (866, 2021-03-09)
requirements.txt (71, 2021-03-09)

Aristotle is a highly customizable tool that collects links from sites.

![Aristotle](https://github.com/egcodes/aristotle/blob/master/aristotle.png)

Using the properties in the config files, it scans all the defined sites and saves their metadata (title, description, imageLink, publishDate) in the database.

## Usage

#### config/properties.yaml

The main settings are:

1. **database:** The settings of the database where the links will be stored. Any database in the [SQLAlchemy dialect list](https://docs.sqlalchemy.org/en/13/dialects/) is currently supported. For the `name` property, create a database on the server first and enter its name here.
2. **locale:** The locale matching the language of the sites to be fetched. For example, enter `en_EN` for English.
3. **request:** General properties of the HTTP request.
4. **parser:** Optionally, title and description strings can be trimmed to the character limits given here during the parsing phase.

```yaml
database:
  dialect: mysql+pymysql
  url: localhost
  port: 3306
  name: aristotle
  userName: root
  password: root
locale: en_EN
request:
  timeout: 3
  userAgent: Mozilla/5.0 (X11; Linux x86_***) AppleWebKit/537.11 (KHTML) Chrome/23.0.1271.97 Safari/537.11
parser:
  titleCharLimit: 100
  descriptionCharLimit: 300
```

#### config/sources-{locale}.yaml

```yaml
article:
  - domain: cnn.com
    active: true
    link: https://edition.cnn.com/
    filterForLink:
      mandatoryWords: ["/politics/"]
      permissibleWords: []
      impermissibleWords: []
    tagForMetadata:
      title:
      description:
      image:
      publishDate:
      publishDateFormat: "%Y-%m-%d"
technology:
  - domain: mashable.com
    active: true
    link: https://mashable.com
    filterForLink:
      mandatoryWords: ["-"]
      permissibleWords: ['/article/']
      impermissibleWords: []
    tagForMetadata:
      title:
      description:
      image:
      publishDate: datetime
      publishDateFormat: "%d.%m.%Y"
```

## Development

If you'd like to contribute to the project, feel free to clone a development version of this repository locally:

`git clone https://github.com/egcodes/aristotle.git`

Once you have a copy of the source, you can embed it in your Python package or install it into your site-packages:

```
$ pip3 install -r requirements.txt
$ python3 setup.py install
```

### Requirements

- Python 3.x
- beautifulsoup4>=4.9.1
- requests>=2.24.0
- PyYAML>=5.3.1
- SQLAlchemy>=1.3.18

You must also install the driver package for the database dialect you use. For example, if you are using MySQL, the `PyMySQL` package must be installed.
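
As a rough illustration of how the `database` settings in `config/properties.yaml` relate to the dialect package, the sketch below assembles a SQLAlchemy-style connection URL (`dialect+driver://user:password@host:port/database`) from those values. The helper name `build_db_url` is hypothetical and is not taken from the Aristotle source; the project's own `db.py` may build its connection differently. The settings are written as a plain dictionary here so the example runs without PyYAML.

```python
from urllib.parse import quote_plus

# Mirrors the `database` block of config/properties.yaml shown in this README.
database = {
    "dialect": "mysql+pymysql",
    "url": "localhost",
    "port": 3306,
    "name": "aristotle",
    "userName": "root",
    "password": "root",
}


def build_db_url(cfg):
    """Hypothetical helper: assemble a SQLAlchemy-style connection URL."""
    # Percent-encode credentials so characters such as '@' or '/' stay valid.
    user = quote_plus(cfg["userName"])
    password = quote_plus(cfg["password"])
    return (
        f"{cfg['dialect']}://{user}:{password}"
        f"@{cfg['url']}:{cfg['port']}/{cfg['name']}"
    )


db_url = build_db_url(database)
print(db_url)
```

The `+pymysql` suffix in `dialect` is what selects the driver, which is why the matching package has to be installed alongside SQLAlchemy. A string in this form can be passed to `sqlalchemy.create_engine`.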
