Python_Crawler-master - 联合开发网 (Pudn.com)


Category: Other
Development tool: Python
File size: 23KB
Downloads: 2
Upload date: 2018-06-11 14:40:30
Uploader: 时光酿醉

Description: A Python crawler that uses urllib2 to crawl Baidu Baike (Baidu Encyclopedia). Beginners learning web crawling can use it as a practice project.

File list:

.idea (0, 2017-02-04)
.idea\TestCrawler.iml (398, 2017-02-04)
.idea\inspectionProfiles (0, 2017-02-04)
.idea\inspectionProfiles\profiles_settings.xml (228, 2017-02-04)
.idea\misc.xml (225, 2017-02-04)
.idea\modules.xml (274, 2017-02-04)
.idea\workspace.xml (32299, 2017-02-04)
LICENSE (11357, 2017-02-04)
baike_spider (0, 2017-02-04)
baike_spider\__init__.py (0, 2017-02-04)
baike_spider\__pycache__ (0, 2017-02-04)
baike_spider\__pycache__\__init__.cpython-34.pyc (164, 2017-02-04)
baike_spider\__pycache__\html_downloder.cpython-34.pyc (636, 2017-02-04)
baike_spider\__pycache__\html_output.cpython-34.pyc (1283, 2017-02-04)
baike_spider\__pycache__\html_parser.cpython-34.pyc (1502, 2017-02-04)
baike_spider\__pycache__\url_manager.cpython-34.pyc (1282, 2017-02-04)
baike_spider\html_downloder.py (312, 2017-02-04)
baike_spider\html_output.py (1086, 2017-02-04)
baike_spider\html_parser.py (1719, 2017-02-04)
baike_spider\output.html (7892, 2017-02-04)
baike_spider\spider_main.py (1812, 2017-02-04)
baike_spider\url_manager.py (847, 2017-02-04)
test (0, 2017-02-04)
test\__init__.py (0, 2017-02-04)
test\test_urllib2.py (790, 2017-02-04)
test\testbs4.py (938, 2017-02-04)

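## A crawler implemented in Python

The test folder contains tests for the two components the crawler relies on: the page downloader, urllib, and the page parser, BeautifulSoup.

baike_spider is the project itself. This lightweight crawler uses the address of the Python entry on Baidu Baike as its root URL. From each page it extracts the URLs found in the page, together with the entry's title and summary. Any extracted URL that has not yet been crawled is crawled in turn, and the loop repeats. The collected titles and summaries are saved to an HTML file as output. The crawl is limited to 15 entries.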
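The crawl loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code shipped in the archive: the names `UrlManager` and `crawl`, and the signatures of the `download` and `parse` callables, are hypothetical. In the project itself the downloader would presumably wrap urllib (in Python 3, the functionality of `urllib2` lives in `urllib.request`) and the parser would wrap BeautifulSoup; here they are passed in as plain functions so the loop can be run without network access.

```python
from collections import deque


class UrlManager:
    """Tracks URLs waiting to be crawled and URLs already seen (hypothetical helper)."""

    def __init__(self):
        self.new_urls = deque()   # pending URLs, in discovery order
        self.seen_urls = set()    # every URL ever added, pending or crawled

    def add_new_url(self, url):
        # Ignore empty values and anything that was already queued or crawled.
        if url and url not in self.seen_urls:
            self.seen_urls.add(url)
            self.new_urls.append(url)

    def add_new_urls(self, urls):
        for url in urls:
            self.add_new_url(url)

    def has_new_url(self):
        return bool(self.new_urls)

    def get_new_url(self):
        return self.new_urls.popleft()


def crawl(root_url, download, parse, max_pages=15):
    """Crawl outward from root_url and return at most max_pages records.

    Assumed signatures (not taken from the original project):
      download(url)    -> page text, or None if the download failed
      parse(url, text) -> (iterable of linked URLs, record for this page)
    """
    urls = UrlManager()
    urls.add_new_url(root_url)
    records = []
    while urls.has_new_url() and len(records) < max_pages:
        url = urls.get_new_url()
        text = download(url)
        if text is None:
            continue  # skip pages that could not be downloaded
        links, record = parse(url, text)
        urls.add_new_urls(links)   # only URLs not seen before are queued
        records.append(record)
    return records


if __name__ == "__main__":
    # A tiny in-memory "site" standing in for real pages: page -> linked pages.
    site = {"a": ["b", "c"], "b": ["a", "c"], "c": []}
    pages = crawl(
        "a",
        download=lambda url: url if url in site else None,
        parse=lambda url, text: (site[url], {"url": url}),
    )
    print([page["url"] for page in pages])
```

Keeping every URL in a single `seen_urls` set is one simple way to guarantee that no page is crawled twice; the `max_pages` default of 15 mirrors the limit mentioned in the description.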
