aristotle
Category: Virtual/Augmented Reality (VR/AR)
Development tool: Python
File size: 149KB
Downloads: 0
Upload date: 2021-03-08 19:01:28
Uploader: sh-1993
Description: Highly customizable news collector
File list:
LICENSE (35142, 2021-03-09)
aristotle.png (124681, 2021-03-09)
aristotle (0, 2021-03-09)
aristotle\config (0, 2021-03-09)
aristotle\config\logging.yaml (943, 2021-03-09)
aristotle\config\properties.yaml (610, 2021-03-09)
aristotle\config\sources-en_EN.yaml (963, 2021-03-09)
aristotle\config\sources-tr_TR.yaml (22497, 2021-03-09)
aristotle\connection.py (1044, 2021-03-09)
aristotle\crawler.py (3463, 2021-03-09)
aristotle\db.py (1232, 2021-03-09)
aristotle\entity.py (963, 2021-03-09)
aristotle\itest (0, 2021-03-09)
aristotle\itest\__init__.py (0, 2021-03-09)
aristotle\itest\itest_crawler.py (289, 2021-03-09)
aristotle\link_parser.py (1874, 2021-03-09)
aristotle\main.py (575, 2021-03-09)
aristotle\meta_parser.py (1695, 2021-03-09)
aristotle\news.py (5659, 2021-03-09)
aristotle\properties_parser.py (1112, 2021-03-09)
aristotle\settings.py (1226, 2021-03-09)
aristotle\tests (0, 2021-03-09)
aristotle\tests\__init__.py (0, 2021-03-09)
aristotle\tests\test_crawler.py (70, 2021-03-09)
aristotle\tests\test_db.py (65, 2021-03-09)
aristotle\tests\test_link_parser.py (63, 2021-03-09)
aristotle\tests\test_meta_parser.py (63, 2021-03-09)
aristotle\tests\test_news.py (67, 2021-03-09)
aristotle\tests\test_properties_parser.py (63, 2021-03-09)
aristotle\tests\test_util.py (63, 2021-03-09)
aristotle\util.py (866, 2021-03-09)
requirements.txt (71, 2021-03-09)
Aristotle is a highly customizable tool that collects links from sites.
![Aristotle](https://github.com/egcodes/aristotle/blob/master/aristotle.png)
Using the properties in the config files, it scans all the defined sites and saves the metadata (title, description, imageLink, publishDate) of each site to the database.
## Usage
#### config/properties.yaml
The main settings are:
1. **database:** Connection settings for the database where the links will be stored. The dialects listed at https://docs.sqlalchemy.org/en/13/dialects/ are currently supported. For the `name` property, create a database on the server first and enter its name here.
2. **locale:** The locale of the sites to be fetched. For example, enter `en_EN` for English.
3. **request:** General settings for the HTTP requests (timeout and user agent).
4. **parser:** During parsing, title and description strings can optionally be trimmed to the character limits given here.
```yaml
database:
  dialect: mysql+pymysql
  url: localhost
  port: 3306
  name: aristotle
  userName: root
  password: root
locale: en_EN
request:
  timeout: 3
  userAgent: Mozilla/5.0 (X11; Linux x86_***) AppleWebKit/537.11 (KHTML) Chrome/23.0.1271.97 Safari/537.11
parser:
  titleCharLimit: 100
  descriptionCharLimit: 300
```
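As a rough illustration of how these settings could be consumed, the sketch below turns the `database` section into a SQLAlchemy-style connection URL and applies a `parser` character limit. It is not taken from the project's code: `build_database_url` and `trim` are hypothetical helpers, and the config is written as a plain dict so that the example runs without PyYAML.

```python
from urllib.parse import quote_plus

# Mirrors the sample config/properties.yaml as a plain dict (assumption:
# the real tool loads the same structure from the YAML file).
properties = {
    "database": {
        "dialect": "mysql+pymysql",
        "url": "localhost",
        "port": 3306,
        "name": "aristotle",
        "userName": "root",
        "password": "root",
    },
    "parser": {"titleCharLimit": 100, "descriptionCharLimit": 300},
}


def build_database_url(db):
    """Assemble a URL of the form dialect://user:password@host:port/name."""
    # Escape credentials so characters such as '@' or '/' do not break the URL.
    user = quote_plus(db["userName"])
    password = quote_plus(db["password"])
    return f"{db['dialect']}://{user}:{password}@{db['url']}:{db['port']}/{db['name']}"


def trim(text, limit):
    """Cut text to at most `limit` characters (hypothetical helper)."""
    return text[:limit]


db_url = build_database_url(properties["database"])
print(db_url)
print(len(trim("x" * 500, properties["parser"]["titleCharLimit"])))
```

A URL in this form is what SQLAlchemy's `create_engine` accepts, which is why the `dialect` value uses the `dialect+driver` notation.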
#### config/sources-{locale}.yaml
```yaml
article:
  - domain: cnn.com
    active: true
    link: https://edition.cnn.com/
    filterForLink:
      mandatoryWords: ["/politics/"]
      permissibleWords: []
      impermissibleWords: []
    tagForMetadata:
      title:
      description:
      image:
      publishDate:
      publishDateFormat: "%Y-%m-%d"
technology:
  - domain: mashable.com
    active: true
    link: https://mashable.com
    filterForLink:
      mandatoryWords: ["-"]
      permissibleWords: ['/article/']
      impermissibleWords: []
    tagForMetadata:
      title:
      description:
      image:
      publishDate: datetime
      publishDateFormat: "%d.%m.%Y"
```
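The README does not spell out how the three `filterForLink` lists are evaluated, so the following is only a sketch under assumed semantics: every mandatory word must occur in the URL, at least one permissible word must occur when that list is non-empty, and no impermissible word may occur. The function names and the sample URLs are hypothetical. The date helper shows how a `publishDateFormat` value maps onto Python's `strptime` patterns.

```python
from datetime import datetime


def link_passes_filter(url, filter_for_link):
    """Apply the assumed filterForLink rules to a single URL."""
    mandatory = filter_for_link.get("mandatoryWords") or []
    permissible = filter_for_link.get("permissibleWords") or []
    impermissible = filter_for_link.get("impermissibleWords") or []

    # Assumption: all mandatory words must be present.
    if not all(word in url for word in mandatory):
        return False
    # Assumption: an empty permissible list places no restriction.
    if permissible and not any(word in url for word in permissible):
        return False
    # Assumption: a single impermissible word rejects the link.
    return not any(word in url for word in impermissible)


def parse_publish_date(raw, publish_date_format):
    """Parse a date string with a strptime pattern such as "%Y-%m-%d"."""
    return datetime.strptime(raw, publish_date_format)


cnn_filter = {
    "mandatoryWords": ["/politics/"],
    "permissibleWords": [],
    "impermissibleWords": [],
}

# Made-up URLs, used only to exercise the filter.
print(link_passes_filter("https://edition.cnn.com/2021/03/08/politics/example", cnn_filter))
print(link_passes_filter("https://edition.cnn.com/2021/03/08/sport/example", cnn_filter))
print(parse_publish_date("2021-03-08", "%Y-%m-%d").date())
```

Under these assumptions the mashable.com entry would keep only links that contain both a hyphen and `/article/`.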
## Development
If you'd like to contribute to the project, feel free to clone a development version of this repository locally:
`git clone https://github.com/egcodes/aristotle.git`
Once you have a copy of the source, you can embed it in your Python package, or install it into your site-packages easily:
```
$ pip3 install -r requirements.txt
$ python3 setup.py install
```
### Requirements
- Python 3.x
- beautifulsoup4>=4.9.1
- requests>=2.24.0
- PyYAML>=5.3.1
- SQLAlchemy>=1.3.18
You must also install the driver package for the database dialect you use.
For example, if you are using MySQL with the `mysql+pymysql` dialect, the `PyMySQL` package must be installed.
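A quick way to check whether the driver is available before starting a crawl is sketched below. It assumes the driver name after the `+` in the dialect string matches the importable module name, which holds for `pymysql` but not for every driver; both helper names are hypothetical.

```python
from importlib.util import find_spec


def driver_module_name(dialect):
    """Return the part after '+' in a 'dialect+driver' string, or None."""
    _, separator, driver = dialect.partition("+")
    return driver if separator else None


def is_module_installed(module_name):
    """True if a top-level module with this name can be found for import."""
    return find_spec(module_name) is not None


module = driver_module_name("mysql+pymysql")
if module is not None and not is_module_installed(module):
    print(f"Driver module '{module}' is missing; try: pip3 install PyMySQL")
```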