article-parser

Category: Code editor
Development tool: Python
File size: 9KB
Downloads: 0
Upload date: 2023-05-23 03:12:05
Uploader: sh-1993
Description: Extract an article or news story by URL or HTML, parse the title and content, and output the result in markdown format.

File list:
LICENSE (1064, 2023-04-06)
article_parser (0, 2023-04-06)
article_parser\Extractor.py (4091, 2023-04-06)
article_parser\__init__.py (622, 2023-04-06)
requirements.txt (92, 2023-04-06)
setup.py (1535, 2023-04-06)
test (0, 2023-04-06)
test\__init__.py (0, 2023-04-06)
test\test_parser.py (1186, 2023-04-06)

# article-parser

![GitHub Repo stars](https://img.shields.io/github/stars/myifeng/article-parser) ![GitHub Workflow Status](https://img.shields.io/github/workflow/status/myifeng/article-parser/CI) [![python](https://img.shields.io/pypi/pyversions/article-parser)](https://pypi.org/project/article-parser/) [![pypi](https://img.shields.io/pypi/v/article-parser)](https://pypi.org/project/article-parser/) [![wheel](https://img.shields.io/pypi/wheel/article-parser)](https://pypi.org/project/article-parser/) [![license](https://img.shields.io/github/license/myifeng/article-parser)](https://pypi.org/project/article-parser/) ![PyPI - Downloads](https://img.shields.io/pypi/dd/article-parser)

Extract an article or news story by URL or HTML, parse the title and content, and output the result in markdown format.

## How to install

`article-parser` is available on PyPI: https://pypi.org/project/article-parser/

```
$ pip install article-parser
```

## Basic Usage

```python
>>> import article_parser
>>> article_parser.parse(
...     url='',         # The URL of the article.
...     html='',        # The HTML of the article.
...     threshold=0.9,  # The ratio of text to the entire document, default 0.9.
...     output='html',  # Result output format, support ``markdown`` and ``html``, default ``html``.
...     **kwargs        # Optional arguments that `request` takes.
... )

# Output markdown
>>> title, content = article_parser.parse(url="http://www.chinadaily.com.cn/a/202009/22/WS5f6962b2a31024ad0ba7afcb.html", output='markdown', timeout=5)

# Output html
>>> title, content = article_parser.parse(url="http://www.chinadaily.com.cn/a/202009/22/WS5f6962b2a31024ad0ba7afcb.html", timeout=5)
```

## Example

[Djokovic wins record 36th Masters title in Rome - Chinadaily.com.cn](http://www.chinadaily.com.cn/a/202009/22/WS5f6962b2a31024ad0ba7afcb.html)

* Markdown

```python
>>> import article_parser
>>> title, content = article_parser.parse(url="http://www.chinadaily.com.cn/a/202009/22/WS5f6962b2a31024ad0ba7afcb.html", output='markdown', timeout=5)
>>> print(title)
>>> print('----------------')
>>> print(content)
Djokovic wins record 36th Masters title in Rome
----------------
![](http://img2.chinadaily.com.cn/images/202009/22/5f6962b2a31024adbd959228.jpeg)

Serbia's Novak Djokovic kisses the trophy after winning the final against Argentina's Diego Schwartzman at Italian Open, Foro Italico, Rome, Italy, Sept 21, 2020. [Photo/Agencies]

ROME - Novak Djokovic won a record 36th Masters crown as he beat Diego Schwartzman in the men's final of the ATP Italian Open on Monday.

Djokovic, the world number one and the top seed at the tournament, won 7-5, 6-3 against Argentine Schwartzman to lift his 36th Masters title, one more than Rafael Nadal.

The Serb said he did not play his best tennis this time in Rome, but could find it when needed.

Simona Halep, top seed of the women's draw, won her first title in Rome after defending champion Karolina Pliskova of the Czech Republic retired while trailing 6-0, 2-1 in the final.
```

* HTML

```python
>>> import article_parser
>>> title, content = article_parser.parse(url="http://www.chinadaily.com.cn/a/202009/22/WS5f6962b2a31024ad0ba7afcb.html", timeout=5)
>>> print(title)
>>> print('----------------')
>>> print(content)
Djokovic wins record 36th Masters title in Rome
----------------
Serbia's Novak Djokovic kisses the trophy after winning the final against Argentina's Diego Schwartzman at Italian Open, Foro Italico, Rome, Italy, Sept 21, 2020. [Photo/Agencies]

ROME - Novak Djokovic won a record 36th Masters crown as he beat Diego Schwartzman in the men's final of the ATP Italian Open on Monday.

Djokovic, the world number one and the top seed at the tournament, won 7-5, 6-3 against Argentine Schwartzman to lift his 36th Masters title, one more than Rafael Nadal.

The Serb said he did not play his best tennis this time in Rome, but could find it when needed.

Simona Halep, top seed of the women's draw, won her first title in Rome after defending champion Karolina Pliskova of the Czech Republic retired while trailing 6-0, 2-1 in the final.

```

## Contributors

[![All contributions](https://contrib.rocks/image?repo=myifeng/article-parser)](https://github.com/myifeng/article-parser/graphs/contributors)
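
As a minimal sketch building on the `parse` call shown under Basic Usage, the helper below saves an article's markdown output to a file. The names `save_article` and `slugify`, the optional `parse` override, and the file-naming scheme are illustrative assumptions and are not part of `article-parser`. The only thing taken from the README is the call `article_parser.parse(url=..., output='markdown', **kwargs)` returning a `(title, content)` pair.

```python
import re
from pathlib import Path


def slugify(title):
    """Turn an article title into a filesystem-friendly file stem (hypothetical helper)."""
    # Drop punctuation, then collapse whitespace, underscores and hyphens into single hyphens.
    slug = re.sub(r"[^\w\s-]", "", title).strip().lower()
    return re.sub(r"[\s_-]+", "-", slug).strip("-") or "article"


def save_article(url, out_dir=".", parse=None, **kwargs):
    """Fetch an article as markdown and write it to <out_dir>/<slug>.md.

    `parse` is an assumption added for testability: any callable with the
    same keyword interface as `article_parser.parse` can be passed in.
    Extra keyword arguments (e.g. timeout=5) are forwarded unchanged.
    """
    if parse is None:
        # Imported lazily so the helper can also be exercised with a stub parser.
        import article_parser
        parse = article_parser.parse

    title, content = parse(url=url, output="markdown", **kwargs)

    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{slugify(title)}.md"
    # Use the parsed title as a top-level heading above the article body.
    path.write_text(f"# {title}\n\n{content}\n", encoding="utf-8")
    return path
```

With the package installed, a call such as `save_article("http://www.chinadaily.com.cn/a/202009/22/WS5f6962b2a31024ad0ba7afcb.html", out_dir="articles", timeout=5)` would be expected to write one `.md` file named after the article title. Passing a stub as `parse` lets the file-writing logic be checked without network access.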
