ferret

Category: Python tool libraries
Development tool: HTML
File size: 7450KB
Downloads: 0
Upload date: 2017-10-06 16:00:20
Uploader: sh-1993
Description: ferret, a modern Pythonic library to extract data from news pages

File list:
GOOSE-LICENSE.txt (10850, 2017-10-07)
ferret (0, 2017-10-07)
ferret\__init__.py (0, 2017-10-07)
ferret\cleaner (0, 2017-10-07)
ferret\cleaner\__init__.py (0, 2017-10-07)
ferret\cleaner\tag.py (278, 2017-10-07)
ferret\cleaner\text.py (601, 2017-10-07)
ferret\extractors (0, 2017-10-07)
ferret\extractors\__init__.py (0, 2017-10-07)
ferret\extractors\content_extractor.py (9684, 2017-10-07)
ferret\extractors\published_date_extractor.py (6851, 2017-10-07)
ferret\extractors\title_extractor.py (7410, 2017-10-07)
ferret\main.py (4184, 2017-10-07)
ferret\util (0, 2017-10-07)
ferret\util\__init__.py (0, 2017-10-07)
ferret\util\date.py (5650, 2017-10-07)
ferret\util\http.py (103, 2017-10-07)
ferret\util\tag.py (435, 2017-10-07)
ferret\util\title.py (913, 2017-10-07)
pytest.ini (61, 2017-10-07)
requirements-dev.txt (73, 2017-10-07)
requirements-scrapely.txt (49, 2017-10-07)
requirements.txt (205, 2017-10-07)
setup.cfg (21, 2017-10-07)
setup.py (630, 2017-10-07)
tests (0, 2017-10-07)
tests\regression (0, 2017-10-07)
tests\regression\en (0, 2017-10-07)
tests\regression\en\test_publish_date_extraction.py (1853, 2017-10-07)
tests\regression\en\test_title_extraction.py (1141, 2017-10-07)
tests\regression\pt (0, 2017-10-07)
tests\regression\pt\test_content_extractor.py (854, 2017-10-07)
tests\regression\pt\test_publish_date_extraction.py (4438, 2017-10-07)
tests\regression\pt\test_title_extraction.py (3749, 2017-10-07)
tests\resources (0, 2017-10-07)
tests\resources\en (0, 2017-10-07)
tests\resources\en\9to5mac (0, 2017-10-07)
... ...

![ferret-logo](https://cloud.githubusercontent.com/assets/4680755/24068678/e7aedea0-0b73-11e7-9dd3-3775f959e46f.png)

## Quick Start

The library is straightforward to use:

```python
from ferret.main import Ferret

ferret = Ferret(url='http://g1.globo.com/politica/blog/cristiana-lobo/post/setor-de-propina-da-odebrecht-movimentou-us-33-bi-diz-delator.html')
ferret.get_article()
```

Ferret also takes two optional arguments: the page HTML and/or its language.

```python
from ferret.main import Ferret

ferret = Ferret(
    url='http://g1.globo.com/politica/blog/cristiana-lobo/post/setor-de-propina-da-odebrecht-movimentou-us-33-bi-diz-delator.html',
    html='Titulo da pagina de notícias',
    lang='pt',
)
ferret.get_article()
```

## Contribute

## License

Authored and maintained by Victor Martinez.

Ferret uses a small routine inherited from python-goose's code. See their license [here](https://github.com/grangier/python-goose/blob/develop/LICENSE.txt).

## Credits to Logo

The logo image is copyrighted by skaterjob at [Vecteezy](https://www.vecteezy.com/vector-art/128860-chinchilla-vector-set).
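
## Example: passing the optional arguments

Since `html` and `lang` are optional, a caller may want to pass them only when they are actually available. The following is a minimal sketch under stated assumptions, not part of Ferret itself: `build_ferret_kwargs` and `fetch_article` are hypothetical helpers, and the only names taken from the Quick Start are `Ferret`, `get_article`, and the `url`, `html` and `lang` keywords.

```python
def build_ferret_kwargs(url, html=None, lang=None):
    """Collect keyword arguments for Ferret, skipping optional ones left unset.

    Hypothetical helper for illustration; it is not part of the ferret package.
    """
    if not url:
        raise ValueError("url is required")
    kwargs = {"url": url}
    # Only forward the optional arguments the caller actually supplied.
    if html is not None:
        kwargs["html"] = html
    if lang is not None:
        kwargs["lang"] = lang
    return kwargs


def fetch_article(url, html=None, lang=None):
    """Build a Ferret instance from the collected arguments and extract the article."""
    # Imported lazily so build_ferret_kwargs can be used without ferret installed.
    from ferret.main import Ferret

    return Ferret(**build_ferret_kwargs(url, html=html, lang=lang)).get_article()
```

With this sketch, `fetch_article(url)` mirrors the first Quick Start call, and `fetch_article(url, lang='pt')` forwards only the language hint.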
