warta-scrap
Category: Data collection / web crawlers
Development tool: Python
File size: 391 KB
Downloads: 0
Upload date: 2018-10-12 02:45:36
Uploader: sh-1993
Description: Indonesia Index News Crawler, including 10 online media
File list:
antara (0, 2018-10-12)
antara\antara (0, 2018-10-12)
antara\antara\__init__.py (0, 2018-10-12)
antara\antara\items.py (204, 2018-10-12)
antara\antara\middlewares.py (1904, 2018-10-12)
antara\antara\pipelines.py (286, 2018-10-12)
antara\antara\settings.py (3128, 2018-10-12)
antara\antara\spiders (0, 2018-10-12)
antara\antara\spiders\__init__.py (161, 2018-10-12)
antara\antara\spiders\antara_spider.py (1008, 2018-10-12)
antara\sampleResult.json (2897, 2018-10-12)
antara\scrapy.cfg (256, 2018-10-12)
detik (0, 2018-10-12)
detik\detik (0, 2018-10-12)
detik\detik\__init__.py (0, 2018-10-12)
detik\detik\items.py (202, 2018-10-12)
detik\detik\middlewares.py (1903, 2018-10-12)
detik\detik\pipelines.py (285, 2018-10-12)
detik\detik\settings.py (3118, 2018-10-12)
detik\detik\spiders (0, 2018-10-12)
detik\detik\spiders\__init__.py (161, 2018-10-12)
detik\detik\spiders\detik_spider.py (2012, 2018-10-12)
detik\sampleResult.json (5503, 2018-10-12)
detik\scrapy.cfg (254, 2018-10-12)
kompas (0, 2018-10-12)
kompas\kompas (0, 2018-10-12)
kompas\kompas\__init__.py (0, 2018-10-12)
kompas\kompas\items.py (204, 2018-10-12)
kompas\kompas\middlewares.py (1904, 2018-10-12)
kompas\kompas\pipelines.py (286, 2018-10-12)
kompas\kompas\settings.py (3128, 2018-10-12)
kompas\kompas\spiders (0, 2018-10-12)
kompas\kompas\spiders\__init__.py (161, 2018-10-12)
kompas\kompas\spiders\kompas_spider.py (1214, 2018-10-12)
kompas\sampleResult.json (5728, 2018-10-12)
kompas\scrapy.cfg (256, 2018-10-12)
liputan6 (0, 2018-10-12)
... ...
# warta-scrap
Indonesia Index News Crawler, including 10 online media.
### Online Media List:
- Detik.com
http://news.detik.com/indeks
- Republika.co.id
http://www.republika.co.id/indeks
- Viva.co.id
http://www.viva.co.id/indeks
- Kompas.com
http://indeks.kompas.com/
- Antaranews.com
http://www.antaranews.com/terkini
- Tempo.co
https://www.tempo.co/indeks
- Okezone.com
http://index.okezone.com/
- Liputan6.com
http://www.liputan6.com/indeks
- Merdeka.com
https://www.merdeka.com/berita-hari-ini/
- Tirto.id
https://tirto.id/indeks
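
For scripting against this list, the index pages can be kept in a small lookup table. The following is only a sketch: the names `INDEX_URLS` and `index_host` are hypothetical and do not come from the repository, and the URLs are copied verbatim from the list above.

```python
from urllib.parse import urlparse

# Index pages copied from the list above.
# The dict name and the helper below are illustrative, not part of the repo.
INDEX_URLS = {
    "Detik.com": "http://news.detik.com/indeks",
    "Republika.co.id": "http://www.republika.co.id/indeks",
    "Viva.co.id": "http://www.viva.co.id/indeks",
    "Kompas.com": "http://indeks.kompas.com/",
    "Antaranews.com": "http://www.antaranews.com/terkini",
    "Tempo.co": "https://www.tempo.co/indeks",
    "Okezone.com": "http://index.okezone.com/",
    "Liputan6.com": "http://www.liputan6.com/indeks",
    "Merdeka.com": "https://www.merdeka.com/berita-hari-ini/",
    "Tirto.id": "https://tirto.id/indeks",
}


def index_host(outlet):
    """Return the host name of an outlet's index page."""
    return urlparse(INDEX_URLS[outlet]).netloc
```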
### Installation
Open a terminal and clone this repo:
> git clone https://github.com/harryandriyan/warta-scrap
Go to the project folder:
> cd warta-scrap
Set up a virtualenv:
> virtualenv venv
Activate the virtualenv:
> . venv/bin/activate
Install the requirements:
> pip install -r requirements.txt
### How to use
Change into the folder of the outlet you want to crawl, for example:
> cd republika
Run the crawl command, for example:
> scrapy crawl republika -o sampleResult.json -t json
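
The same command can be driven from Python when several outlets need to be crawled in turn. This is a minimal sketch, not part of the repository: `crawl_command` and `run_crawl` are hypothetical helpers, and it assumes that each spider is named after its project folder, as in the `republika` example.

```python
import subprocess
from pathlib import Path


def crawl_command(spider, output="sampleResult.json"):
    """Build the argument list for the crawl command shown above."""
    return ["scrapy", "crawl", spider, "-o", output, "-t", "json"]


def run_crawl(repo_root, project):
    """Run one project's spider from inside its folder; return the exit code."""
    project_dir = Path(repo_root) / project
    # Assumption: the spider name equals the project folder name,
    # as in the `republika` example.
    result = subprocess.run(crawl_command(project), cwd=project_dir)
    return result.returncode
```

For example, `run_crawl(".", "republika")` run from the cloned `warta-scrap` folder is equivalent to the two shell commands above. Note that `-o` appends to an existing output file, so remove an old `sampleResult.json` before crawling again, or the result will not be valid JSON.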