Taiwan-news-crawlers

Category: Data collection / Crawlers
Development tool: Python
File size: 22KB
Downloads: 0
Upload date: 2022-11-11 07:51:27
Uploader: sh-1993
Description: Scrapy-based crawlers for Taiwanese news

File list:
LICENSE (1067, 2018-04-16)
TaiwanNewsCrawler (0, 2018-04-16)
TaiwanNewsCrawler\__init__.py (0, 2018-04-16)
TaiwanNewsCrawler\items.py (376, 2018-04-16)
TaiwanNewsCrawler\pipelines.py (297, 2018-04-16)
TaiwanNewsCrawler\settings.py (3492, 2018-04-16)
TaiwanNewsCrawler\spiders (0, 2018-04-16)
TaiwanNewsCrawler\spiders\__init__.py (161, 2018-04-16)
TaiwanNewsCrawler\spiders\apple_realtimenews_spider.py (2694, 2018-04-16)
TaiwanNewsCrawler\spiders\apple_spider.py (2775, 2018-04-16)
TaiwanNewsCrawler\spiders\china_spider.py (1845, 2018-04-16)
TaiwanNewsCrawler\spiders\cna_spider.py (1994, 2018-04-16)
TaiwanNewsCrawler\spiders\cts_spider.py (1938, 2018-04-16)
TaiwanNewsCrawler\spiders\ettoday_spider.py (2561, 2018-04-16)
TaiwanNewsCrawler\spiders\ettoday_tag_spider.py (2679, 2018-04-16)
TaiwanNewsCrawler\spiders\liberty_realtimenews_spider.py (4091, 2018-04-16)
TaiwanNewsCrawler\spiders\liberty_spider.py (4435, 2018-04-16)
TaiwanNewsCrawler\spiders\liberty_tag_spider.py (3178, 2018-04-16)
TaiwanNewsCrawler\spiders\pts_spider.py (2421, 2018-04-16)
TaiwanNewsCrawler\spiders\setn_spider.py (2334, 2018-04-16)
TaiwanNewsCrawler\spiders\tvbs_spider.py (2492, 2018-04-16)
TaiwanNewsCrawler\spiders\udn_spider.py (2196, 2018-04-16)
requirements.txt (29, 2018-04-16)
scrapy.cfg (278, 2018-04-16)

# Taiwan-news-crawlers

[Scrapy](https://scrapy.org)-based crawlers for Taiwanese news, covering 10 media companies:

1. Apple Daily
2. China Times
3. Central News Agency (CNA)
4. Chinese Television System (CTS)
5. ETtoday
6. Liberty Times
7. Public Television Service (PTS)
8. SET News (SETN)
9. TVBS
10. UDN

## Getting Started

```
$ git clone https://github.com/TaiwanStat/Taiwan-news-crawlers.git
$ cd Taiwan-news-crawlers
$ pip install -r requirements.txt
$ scrapy crawl apple -o apple_news.json
```

## Prerequisites

- Python 3
- Scrapy 1.3.0

## Usage

```
scrapy crawl <spider> -o <output_file>
```

### Available spiders

1. apple
2. appleRealtime
3. china
4. cna
5. cts
6. ettoday
7. liberty
8. libertyRealtime
9. pts
10. setn
11. tvbs
12. udn

## Output

| Key | Value |
| :--- | :--- |
| website | the publisher |
| url | the URL of the original article |
| title | the news title |
| content | the news content |
| category | the news category |

## License

The MIT License
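
## Example: reading the output

The snippet below is a minimal sketch of how a JSON file written by `scrapy crawl <spider> -o <file>.json` might be consumed. It assumes the file holds a JSON array of records with the five keys listed in the Output table, and that `category` is a plain string. The helper names `load_articles` and `count_by_category` are hypothetical and not part of this repository.

```python
import json
from collections import Counter

# Keys taken from the Output table of the README.
EXPECTED_KEYS = ("website", "url", "title", "content", "category")


def load_articles(path):
    """Load the JSON array assumed to be written by `scrapy crawl ... -o <file>.json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_by_category(articles):
    """Count articles per category, skipping records that lack any expected key."""
    complete = [a for a in articles if all(k in a for k in EXPECTED_KEYS)]
    # Assumption: `category` is a hashable value such as a string.
    return Counter(a["category"] for a in complete)
```

With the file from the Getting Started example, `count_by_category(load_articles("apple_news.json")).most_common()` would list the categories ordered by article count.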
