4zhengquan_news
Category: Big data
Development tool: Jupyter Notebook
File size: 1705 KB
Downloads: 0
Upload date: 2020-01-10 03:58:20
Uploader: sh-1993
Description: News data scraping, cleaning, and formatting for the four major securities newspapers
File list:
.idea (0, 2020-01-10)
.idea\4zhengquan_news.iml (284, 2020-01-10)
.idea\inspectionProfiles (0, 2020-01-10)
.idea\inspectionProfiles\profiles_settings.xml (174, 2020-01-10)
.idea\misc.xml (288, 2020-01-10)
.idea\modules.xml (282, 2020-01-10)
.idea\vcs.xml (180, 2020-01-10)
.vs (0, 2020-01-10)
.vs\4zhengquan_news (0, 2020-01-10)
.vs\4zhengquan_news\v15 (0, 2020-01-10)
.vs\4zhengquan_news\v15\.suo (22528, 2020-01-10)
.vs\ProjectSettings.json (35, 2020-01-10)
.vs\VSWorkspaceState.json (148, 2020-01-10)
.vs\slnx.sqlite (77824, 2020-01-10)
img (0, 2020-01-10)
img\内容页.png (313856, 2020-01-10)
img\总览页.png (163038, 2020-01-10)
img\运行结果.png (267036, 2020-01-10)
img\运行过程.gif (1013895, 2020-01-10)
zhengquan_news (0, 2020-01-10)
zhengquan_news\data_ana.ipynb (8403, 2020-01-10)
zhengquan_news\get_news.py (2948, 2020-01-10)
zhengquan_news\四大证券报.xlsx (170360, 2020-01-10)
# 4zhengquan_news
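Scrapes, cleans, and formats news data from the four major securities newspapers.
This project is a crawler that scrapes the data from the "four major securities newspapers" special topic on Sina (新浪网):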
![Overview page](https://github.com/GaoHuaTJ/4zhengquan_news/blob/master/img/总览页.png)
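The approach to sending requests and handling the returned data is fairly straightforward.
The harder part was the subsequent data cleaning and formatting, which involved quite a few detours. The algorithm itself is not difficult, but it takes careful thought.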
![Content page](https://github.com/GaoHuaTJ/4zhengquan_news/blob/master/img/内容页.png)
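To give a rough idea of what the cleaning and formatting step can look like, here is a minimal sketch using only the Python standard library. It is not the code in `get_news.py`; the function names `clean_text` and `format_record`, the field names, and the sample HTML are all assumptions made for illustration.

```python
import html
import re

# Hypothetical helpers for illustration; not taken from get_news.py.
TAG_RE = re.compile(r"<[^>]+>")   # any HTML tag
SPACE_RE = re.compile(r"\s+")     # runs of whitespace, including full-width spaces


def clean_text(raw: str) -> str:
    """Strip tags, decode HTML entities, and collapse whitespace."""
    no_tags = TAG_RE.sub(" ", raw)
    decoded = html.unescape(no_tags)
    return SPACE_RE.sub(" ", decoded).strip()


def format_record(paper: str, title_html: str, body_html: str) -> dict:
    """Build one flat row, ready to be written to a table or spreadsheet."""
    return {
        "paper": clean_text(paper),
        "title": clean_text(title_html),
        "body": clean_text(body_html),
    }


# Made-up sample input, standing in for a scraped content page.
sample = format_record(
    "Sample Paper",
    "<h2>&nbsp;Sample&nbsp;headline </h2>",
    "<p>First paragraph.</p>\n<p>\u3000\u3000Second paragraph.</p>",
)
print(sample)
```

A list of such dictionaries could then be handed to a tool such as pandas for export to a spreadsheet, though whether the original project does it this way is not stated here.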
Output after running:
![Run result](https://github.com/GaoHuaTJ/4zhengquan_news/blob/master/img/运行结果.png)
The run process is shown below:
![Run process](https://github.com/GaoHuaTJ/4zhengquan_news/blob/master/img/运行过程.gif)