ferret
Category: Python libraries
Development tool: HTML
File size: 7450KB
Downloads: 0
Upload date: 2017-10-06 16:00:20
Uploader: sh-1993
Description: ferret, a modern Pythonic library for extracting data from news pages
File list:
GOOSE-LICENSE.txt (10850, 2017-10-07)
ferret (0, 2017-10-07)
ferret\__init__.py (0, 2017-10-07)
ferret\cleaner (0, 2017-10-07)
ferret\cleaner\__init__.py (0, 2017-10-07)
ferret\cleaner\tag.py (278, 2017-10-07)
ferret\cleaner\text.py (601, 2017-10-07)
ferret\extractors (0, 2017-10-07)
ferret\extractors\__init__.py (0, 2017-10-07)
ferret\extractors\content_extractor.py (9684, 2017-10-07)
ferret\extractors\published_date_extractor.py (6851, 2017-10-07)
ferret\extractors\title_extractor.py (7410, 2017-10-07)
ferret\main.py (4184, 2017-10-07)
ferret\util (0, 2017-10-07)
ferret\util\__init__.py (0, 2017-10-07)
ferret\util\date.py (5650, 2017-10-07)
ferret\util\http.py (103, 2017-10-07)
ferret\util\tag.py (435, 2017-10-07)
ferret\util\title.py (913, 2017-10-07)
pytest.ini (61, 2017-10-07)
requirements-dev.txt (73, 2017-10-07)
requirements-scrapely.txt (49, 2017-10-07)
requirements.txt (205, 2017-10-07)
setup.cfg (21, 2017-10-07)
setup.py (630, 2017-10-07)
tests (0, 2017-10-07)
tests\regression (0, 2017-10-07)
tests\regression\en (0, 2017-10-07)
tests\regression\en\test_publish_date_extraction.py (1853, 2017-10-07)
tests\regression\en\test_title_extraction.py (1141, 2017-10-07)
tests\regression\pt (0, 2017-10-07)
tests\regression\pt\test_content_extractor.py (854, 2017-10-07)
tests\regression\pt\test_publish_date_extraction.py (4438, 2017-10-07)
tests\regression\pt\test_title_extraction.py (3749, 2017-10-07)
tests\resources (0, 2017-10-07)
tests\resources\en (0, 2017-10-07)
tests\resources\en\9to5mac (0, 2017-10-07)
... ...
![ferret-logo](https://cloud.githubusercontent.com/assets/4680755/24068678/e7aedea0-0b73-11e7-9dd3-3775f959e46f.png)
## Quick Start
The library is straightforward to use:
```python
from ferret.main import Ferret
ferret = Ferret(url='http://g1.globo.com/politica/blog/cristiana-lobo/post/setor-de-propina-da-odebrecht-movimentou-us-33-bi-diz-delator.html')
ferret.get_article()
```
Ferret also accepts two optional arguments, `html` and `lang`, which can be passed separately or together.
```python
from ferret.main import Ferret
# html: page markup supplied by the caller; lang: language code of the page
ferret = Ferret(
    url='http://g1.globo.com/politica/blog/cristiana-lobo/post/setor-de-propina-da-odebrecht-movimentou-us-33-bi-diz-delator.html',
    html='Titulo da pagina de notcias',
    lang='pt',
)
ferret.get_article()
```
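If the page has already been downloaded elsewhere, the optional arguments can be assembled programmatically. The following is a minimal sketch, not part of Ferret itself: `fetch_html`, `ferret_kwargs`, and `extract_article` are hypothetical helper names, and the only Ferret calls used are the constructor keywords (`url`, `html`, `lang`) and `get_article()` shown above.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and decode it with the charset the server declares."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def ferret_kwargs(url, html=None, lang=None):
    """Collect constructor keywords, leaving out optional ones that were not given."""
    kwargs = {"url": url}
    if html is not None:
        kwargs["html"] = html
    if lang is not None:
        kwargs["lang"] = lang
    return kwargs


def extract_article(url, html=None, lang=None):
    """Run Ferret on a page; the import is deferred so the helpers above work without it."""
    from ferret.main import Ferret  # assumes ferret is installed

    return Ferret(**ferret_kwargs(url, html=html, lang=lang)).get_article()
```

A call such as `extract_article(url, html=fetch_html(url), lang='pt')` would then pass the pre-fetched markup and the language code in one step.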
## Contribute
## License
Authored and maintained by Victor Martinez.
Ferret uses a small routine inherited from python-goose's code. See their license [here](https://github.com/grangier/python-goose/blob/develop/LICENSE.txt).
## Credits to Logo
The logo image is copyrighted by skaterjob at [Vecteezy](https://www.vecteezy.com/vector-art/128860-chinchilla-vector-set).