pWebCrawler
Category: Data collection / Crawlers
Development tool: Jupyter Notebook
File size: 5963 KB
Upload date: 2020-04-27 15:45:56
Uploader: sh-1993
Description: crawl ProgrammableWeb data
File list:
.ipynb_checkpoints (0, 2020-04-27)
.ipynb_checkpoints\WSWordCloud-checkpoint.ipynb (7467, 2020-04-27)
.ipynb_checkpoints\WebServiceClassify-checkpoint.ipynb (23017, 2020-04-27)
CrawlerAPIContent.py (3746, 2020-04-27)
CrawlerAPIList.py (2451, 2020-04-27)
CrawlerAPITest.py (5450, 2020-04-27)
WSWordCloud.ipynb (248865, 2020-04-27)
WebServiceClassify.ipynb (23017, 2020-04-27)
api_8459.csv (4088997, 2020-04-27)
api_8459.xlsx (1873526, 2020-04-27)
api_wordcloud_data.csv (3804297, 2020-04-27)
apis (0, 2020-04-27)
apis\api_8459.json (5578544, 2020-04-27)
apis\categories_20.json (354, 2020-04-27)
apis\serv_net.csv (27897, 2020-04-27)
apis\servnet_id_mapping.csv (11292, 2020-04-27)
pWebAPIContentTest1.csv (8622, 2020-04-27)
pWebAPIListTest1.txt (205, 2020-04-27)
pWebSpider (0, 2020-04-27)
pWebSpider\pWebSpider (0, 2020-04-27)
pWebSpider\pWebSpider\__init__.py (0, 2020-04-27)
pWebSpider\pWebSpider\__pycache__ (0, 2020-04-27)
pWebSpider\pWebSpider\__pycache__\__init__.cpython-38.pyc (158, 2020-04-27)
pWebSpider\pWebSpider\__pycache__\settings.cpython-38.pyc (542, 2020-04-27)
pWebSpider\pWebSpider\items.py (291, 2020-04-27)
pWebSpider\pWebSpider\middlewares.py (3605, 2020-04-27)
pWebSpider\pWebSpider\pipelines.py (292, 2020-04-27)
pWebSpider\pWebSpider\run.py (381, 2020-04-27)
pWebSpider\pWebSpider\settings.py (3256, 2020-04-27)
pWebSpider\pWebSpider\spiders (0, 2020-04-27)
pWebSpider\pWebSpider\spiders\__init__.py (161, 2020-04-27)
pWebSpider\pWebSpider\spiders\__pycache__ (0, 2020-04-27)
pWebSpider\pWebSpider\spiders\__pycache__\__init__.cpython-38.pyc (166, 2020-04-27)
pWebSpider\pWebSpider\spiders\__pycache__\programmableweb.cpython-38.pyc (1521, 2020-04-27)
pWebSpider\pWebSpider\spiders\programmableweb.py (2075, 2020-04-27)
pWebSpider\pweb.code-workspace (60, 2020-04-27)
pWebSpider\scrapy.cfg (263, 2020-04-27)
... ...
# pWebCrawler
Crawl data from ProgrammableWeb.
- ##### `pWebAPIContentTest1.csv`
  - Sample of the output data
- ##### `pWebAPIListTest1.txt`
  - List of API detail-page URLs
- ##### `CrawlerAPIList.py`
  - Collects the list of detail-page URLs
  - Downloads the web pages with Selenium
- ##### `CrawlerAPITest.py`
  - Test script
- ##### `CrawlerAPIContent.py`
  - Crawls the detailed content of each page in the URL list
  - Fetches pages with `requests`
  - Parses pages with `bs4` (BeautifulSoup) and `re`
- ##### `pWebSpider`
  - Same functionality as `CrawlerAPIContent.py`, roughly twice as fast
  - Built on the Scrapy framework
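As a minimal sketch of the link-collection step in `CrawlerAPIList.py`, the code below pulls detail-page URLs out of a listing page's HTML using only the standard library. It assumes the HTML has already been downloaded (with Selenium this would be `driver.page_source`) and that detail pages live under paths of the form `/api/<name>`. That URL pattern, the helper names, and the one-URL-per-line output format are assumptions for illustration, not taken from the original script.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: API detail pages use paths like /api/<name>.
DETAIL_PATH = re.compile(r"^/api/[^/]+/?$")


class DetailLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links that look like API detail pages."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        if DETAIL_PATH.match(urlparse(url).path) and url not in self.links:
            self.links.append(url)  # keep first occurrence, preserve page order


def extract_detail_links(page_html, base_url="https://www.programmableweb.com"):
    """Hypothetical helper: return detail-page URLs found in a listing page."""
    parser = DetailLinkParser(base_url)
    parser.feed(page_html)
    parser.close()
    return parser.links


def save_url_list(urls, path):
    """Write one URL per line (assumed format of the URL list file)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(urls) + "\n")
```

Parsing is kept separate from downloading so the same helper works whether the HTML comes from Selenium, a saved file, or a plain HTTP request.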
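The detail-page step in `CrawlerAPIContent.py` can be sketched in the same spirit. The original uses `requests` and `bs4`; this stdlib-only stand-in fetches with `urllib.request` and parses with `re` alone, so it is a swapped-in technique, not the author's code. The `<h1>` title and `<label>`/`<span>` field markup are hypothetical and would need to be matched to the real pages.

```python
import csv
import re
import urllib.request
from html import unescape

# Hypothetical markup: title in the first <h1>, each spec rendered as
# <label>Name</label><span>Value</span>. Adjust to the real pages.
TITLE_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.S | re.I)
FIELD_RE = re.compile(r"<label>(.*?)</label>\s*<span>(.*?)</span>", re.S | re.I)
TAG_RE = re.compile(r"<[^>]+>")


def clean(fragment):
    """Drop nested tags, decode HTML entities, collapse whitespace."""
    return " ".join(unescape(TAG_RE.sub(" ", fragment)).split())


def parse_detail(page_html):
    """Return a dict with the page title plus every label/value pair found."""
    match = TITLE_RE.search(page_html)
    record = {"title": clean(match.group(1)) if match else ""}
    for label, value in FIELD_RE.findall(page_html):
        record[clean(label)] = clean(value)
    return record


def fetch(url, timeout=10):
    """Stdlib stand-in for requests.get(url).text; assumes UTF-8 pages."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def write_csv(records, path):
    """Write records to CSV; fields missing from a record are left empty."""
    fieldnames = sorted({key for rec in records for key in rec})
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```

A typical loop would read the URL list, call `parse_detail(fetch(url))` for each entry, and pass the collected dicts to `write_csv`. Regex parsing is brittle against markup changes, which is one reason the original reaches for `bs4`.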
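The README does not say why `pWebSpider` is faster. One plausible factor is concurrency: Scrapy, built on Twisted's asynchronous networking, keeps several requests in flight at once (its `CONCURRENT_REQUESTS` setting defaults to 16), while a plain loop waits for each response in turn. The thread-pool sketch below only illustrates that effect; it is not how Scrapy works internally, and `fake_fetch` is a hypothetical stand-in that sleeps to mimic network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, latency=0.05):
    """Hypothetical stand-in for a network request: sleeps to mimic latency."""
    time.sleep(latency)
    return f"<html>{url}</html>"


def crawl_sequential(urls, fetch=fake_fetch):
    # Each request waits for the previous one to finish.
    return [fetch(u) for u in urls]


def crawl_concurrent(urls, fetch=fake_fetch, max_workers=8):
    # Up to max_workers requests overlap; executor.map keeps input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    urls = [f"https://example.com/api/{i}" for i in range(16)]
    seq, t_seq = timed(crawl_sequential, urls)
    con, t_con = timed(crawl_concurrent, urls)
    assert seq == con
    print(f"sequential {t_seq:.2f}s, concurrent {t_con:.2f}s")
```

Against a real site the gain is bounded by server limits and politeness delays, so a measured speedup such as the roughly twofold one reported for `pWebSpider` can be well below the worker count.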