Taiwan-news-crawlers
Category: Data collection / Web crawlers
Development tool: Python
File size: 22KB
Downloads: 0
Upload date: 2022-11-11 07:51:27
Uploader: sh-1993
Description: Scrapy-based crawlers for Taiwanese news
File list:
LICENSE (1067, 2018-04-16)
TaiwanNewsCrawler (0, 2018-04-16)
TaiwanNewsCrawler\__init__.py (0, 2018-04-16)
TaiwanNewsCrawler\items.py (376, 2018-04-16)
TaiwanNewsCrawler\pipelines.py (297, 2018-04-16)
TaiwanNewsCrawler\settings.py (3492, 2018-04-16)
TaiwanNewsCrawler\spiders (0, 2018-04-16)
TaiwanNewsCrawler\spiders\__init__.py (161, 2018-04-16)
TaiwanNewsCrawler\spiders\apple_realtimenews_spider.py (2694, 2018-04-16)
TaiwanNewsCrawler\spiders\apple_spider.py (2775, 2018-04-16)
TaiwanNewsCrawler\spiders\china_spider.py (1845, 2018-04-16)
TaiwanNewsCrawler\spiders\cna_spider.py (1994, 2018-04-16)
TaiwanNewsCrawler\spiders\cts_spider.py (1938, 2018-04-16)
TaiwanNewsCrawler\spiders\ettoday_spider.py (2561, 2018-04-16)
TaiwanNewsCrawler\spiders\ettoday_tag_spider.py (2679, 2018-04-16)
TaiwanNewsCrawler\spiders\liberty_realtimenews_spider.py (4091, 2018-04-16)
TaiwanNewsCrawler\spiders\liberty_spider.py (4435, 2018-04-16)
TaiwanNewsCrawler\spiders\liberty_tag_spider.py (3178, 2018-04-16)
TaiwanNewsCrawler\spiders\pts_spider.py (2421, 2018-04-16)
TaiwanNewsCrawler\spiders\setn_spider.py (2334, 2018-04-16)
TaiwanNewsCrawler\spiders\tvbs_spider.py (2492, 2018-04-16)
TaiwanNewsCrawler\spiders\udn_spider.py (2196, 2018-04-16)
requirements.txt (29, 2018-04-16)
scrapy.cfg (278, 2018-04-16)
# Taiwan-news-crawlers
[Scrapy](https://scrapy.org)-based crawlers for Taiwanese news, covering 10 media outlets:
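1. 蘋果日報 (Apple Daily)
2. 中國時報 (China Times)
3. 中央社 (Central News Agency, CNA)
4. 華視 (Chinese Television System, CTS)
5. 東森新聞雲 (ETtoday)
6. 自由時報 (Liberty Times)
7. 公視 (Public Television Service, PTS)
8. 三立 (SET News)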
9. TVBS
10. UDN
## Getting Started
```
$ git clone https://github.com/TaiwanStat/Taiwan-news-crawlers.git
$ cd Taiwan-news-crawlers
$ pip install -r requirements.txt
$ scrapy crawl apple -o apple_news.json
```
## Prerequisites
- Python 3
- Scrapy 1.3.0
## Usage
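```
$ scrapy crawl <spider> -o <output-file>
```
Replace `<spider>` with one of the names listed below and `<output-file>` with the file to write; Scrapy picks the export format from the file extension (for example, `.json`).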
### Available spiders
1. apple
2. appleRealtime
3. china
4. cna
5. cts
6. ettoday
7. liberty
8. libertyRealtime
9. pts
10. setn
11. tvbs
12. udn
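The usage command runs one spider at a time. The following is a minimal sketch for crawling every spider listed above in turn. It assumes it is run from the repository root (where `scrapy.cfg` lives) and that one `<spider>_news.json` file per spider is wanted. The helper names `build_crawl_command` and `crawl_all`, and the file-naming scheme, are hypothetical and not part of the project.

```python
import os
import shlex
import subprocess

# Spider names copied from the "Available spiders" list.
SPIDERS = [
    "apple", "appleRealtime", "china", "cna", "cts", "ettoday",
    "liberty", "libertyRealtime", "pts", "setn", "tvbs", "udn",
]


def build_crawl_command(spider: str, output_dir: str = ".") -> list[str]:
    """Return the argv for `scrapy crawl <spider> -o <output_dir>/<spider>_news.json`.

    The <spider>_news.json naming is a hypothetical convention modelled on
    the Getting Started example; the project itself does not define it.
    """
    if spider not in SPIDERS:
        raise ValueError(f"unknown spider: {spider!r}")
    output = os.path.join(output_dir, f"{spider}_news.json")
    return ["scrapy", "crawl", spider, "-o", output]


def crawl_all(output_dir: str = ".", dry_run: bool = True) -> list[list[str]]:
    """Run every spider sequentially; with dry_run=True only print the commands."""
    commands = [build_crawl_command(name, output_dir) for name in SPIDERS]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops the batch as soon as one crawl exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

Call `crawl_all()` first to preview the twelve commands. Then call `crawl_all(dry_run=False)` to execute them. Running each crawl as its own `scrapy` process keeps the sketch close to the documented CLI usage.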
## Output
| Key | Description |
| :--- | :--- |
| website | Name of the news outlet that published the article |
| url | URL of the original article |
| title | Headline of the article |
| content | Body text of the article |
| category | News category of the article |
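A `.json` output file holds a JSON array of items with the keys in the table above. The following is a minimal sketch for loading such a file and tallying articles per category. It assumes the file was produced by a single crawl, because `-o` appends to an existing file and a second run would leave invalid JSON. The helpers `load_news` and `count_by_category` are hypothetical names.

```python
import json
from collections import Counter
from pathlib import Path

# Field names taken from the Output table.
EXPECTED_KEYS = {"website", "url", "title", "content", "category"}


def load_news(path: str) -> list[dict]:
    """Load items written by `scrapy crawl <spider> -o <file>.json`.

    Raises ValueError if any item lacks one of the documented keys.
    """
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    for i, item in enumerate(items):
        missing = EXPECTED_KEYS - set(item)
        if missing:
            raise ValueError(f"item {i} is missing {sorted(missing)}")
    return items


def count_by_category(items: list[dict]) -> Counter:
    """Tally how many articles share each `category` value."""
    return Counter(item["category"] for item in items)
```

For example, after running `scrapy crawl apple -o apple_news.json`, calling `count_by_category(load_news("apple_news.json")).most_common(5)` lists the five most frequent categories.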
## License
The MIT License