pWebCrawler

Category: Data collection / web crawlers
Development tool: Jupyter Notebook
File size: 5963KB
Downloads: 0
Upload date: 2020-04-27 15:45:56
Uploader: sh-1993
Description: crawl programmableweb data

File list:
.ipynb_checkpoints (0, 2020-04-27)
.ipynb_checkpoints\WSWordCloud-checkpoint.ipynb (7467, 2020-04-27)
.ipynb_checkpoints\WebServiceClassify-checkpoint.ipynb (23017, 2020-04-27)
CrawlerAPIContent.py (3746, 2020-04-27)
CrawlerAPIList.py (2451, 2020-04-27)
CrawlerAPITest.py (5450, 2020-04-27)
WSWordCloud.ipynb (248865, 2020-04-27)
WebServiceClassify.ipynb (23017, 2020-04-27)
api_8459.csv (4088997, 2020-04-27)
api_8459.xlsx (1873526, 2020-04-27)
api_wordcloud_data.csv (3804297, 2020-04-27)
apis (0, 2020-04-27)
apis\api_8459.json (5578544, 2020-04-27)
apis\categories_20.json (354, 2020-04-27)
apis\serv_net.csv (27897, 2020-04-27)
apis\servnet_id_mapping.csv (11292, 2020-04-27)
pWebAPIContentTest1.csv (8622, 2020-04-27)
pWebAPIListTest1.txt (205, 2020-04-27)
pWebSpider (0, 2020-04-27)
pWebSpider\pWebSpider (0, 2020-04-27)
pWebSpider\pWebSpider\__init__.py (0, 2020-04-27)
pWebSpider\pWebSpider\__pycache__ (0, 2020-04-27)
pWebSpider\pWebSpider\__pycache__\__init__.cpython-38.pyc (158, 2020-04-27)
pWebSpider\pWebSpider\__pycache__\settings.cpython-38.pyc (542, 2020-04-27)
pWebSpider\pWebSpider\items.py (291, 2020-04-27)
pWebSpider\pWebSpider\middlewares.py (3605, 2020-04-27)
pWebSpider\pWebSpider\pipelines.py (292, 2020-04-27)
pWebSpider\pWebSpider\run.py (381, 2020-04-27)
pWebSpider\pWebSpider\settings.py (3256, 2020-04-27)
pWebSpider\pWebSpider\spiders (0, 2020-04-27)
pWebSpider\pWebSpider\spiders\__init__.py (161, 2020-04-27)
pWebSpider\pWebSpider\spiders\__pycache__ (0, 2020-04-27)
pWebSpider\pWebSpider\spiders\__pycache__\__init__.cpython-38.pyc (166, 2020-04-27)
pWebSpider\pWebSpider\spiders\__pycache__\programmableweb.cpython-38.pyc (1521, 2020-04-27)
pWebSpider\pWebSpider\spiders\programmableweb.py (2075, 2020-04-27)
pWebSpider\pweb.code-workspace (60, 2020-04-27)
pWebSpider\scrapy.cfg (263, 2020-04-27)
... ...

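# pWebCrawler

crawl programmableweb data

- ##### `pWebAPIContentTest1.csv`
    - Sample output data
- ##### `pWebAPIListTest1.txt`
    - List of detail-page URLs
- ##### `CrawlerAPIList.py`
    - Collects the list of detail-page URLs
    - Uses selenium to download the pages
- ##### `CrawlerAPITest.py`
    - Test file
- ##### `CrawlerAPIContent.py`
    - Crawls the content of each page in the URL list
    - Uses request to fetch the pages
    - Uses the bs4 and re libraries to parse the pages
- ##### `pWebSpider`
    - Similar in function to CrawlerAPIContent.py, with a twofold performance improvement
    - Uses the scrapy framework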
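
The two-stage flow described for `CrawlerAPIList.py` and `CrawlerAPIContent.py` (read a list of detail-page URLs, fetch each page, parse it, write rows to CSV) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: it swaps in the standard library's `html.parser` and `urllib.request` for bs4 and the project's HTTP client so that it runs without third-party packages, and the page markup (`<h1>` for the name, `<div class="description">` for the description), the column names, and every function name are hypothetical.

```python
import csv
import io
import re
from html.parser import HTMLParser
from urllib.request import urlopen


class DetailPageParser(HTMLParser):
    """Captures text from <h1> and <div class="description"> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self._field = None  # name of the field currently being captured
        self.fields = {"name": "", "description": ""}

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._field = "name"
        elif tag == "div" and dict(attrs).get("class") == "description":
            self._field = "description"

    def handle_endtag(self, tag):
        # Assumes the captured elements contain no nested <h1>/<div>.
        if tag in ("h1", "div"):
            self._field = None

    def handle_data(self, data):
        if self._field:
            self.fields[self._field] += data


def read_url_list(text):
    """Return the non-empty, stripped lines of a URL list file's contents."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_html(url, timeout=10):
    """Download a page with the standard library (stand-in HTTP client)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_detail_page(html):
    """Extract the hypothetical fields and collapse whitespace with re."""
    parser = DetailPageParser()
    parser.feed(html)
    return {k: re.sub(r"\s+", " ", v).strip() for k, v in parser.fields.items()}


def crawl(urls, fetch=fetch_html):
    """Fetch and parse every URL; `fetch` is injectable for offline testing."""
    return [{"url": url, **parse_detail_page(fetch(url))} for url in urls]


def write_csv(rows, fileobj):
    """Write the parsed rows as CSV with a header line."""
    writer = csv.DictWriter(fileobj, fieldnames=["url", "name", "description"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Offline demo: a canned page replaces the network call.
    page = '<h1>Demo API</h1><div class="description">Returns demo data.</div>'
    out = io.StringIO()
    write_csv(crawl(["https://example.invalid/api/demo"], fetch=lambda u: page), out)
    print(out.getvalue())
```

Passing the fetch function as a parameter keeps parsing separate from downloading, so the parser can be checked against saved pages before any live crawl is attempted.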
