4zhengquan_news

Category: Big Data
Development tool: Jupyter Notebook
File size: 1705KB
Downloads: 0
Upload date: 2020-01-10 03:58:20
Uploader: sh-1993
Description: News data scraping, cleaning, and formatting for the four major securities newspapers (四大证券报)

File list:
.idea (0, 2020-01-10)
.idea\4zhengquan_news.iml (284, 2020-01-10)
.idea\inspectionProfiles (0, 2020-01-10)
.idea\inspectionProfiles\profiles_settings.xml (174, 2020-01-10)
.idea\misc.xml (288, 2020-01-10)
.idea\modules.xml (282, 2020-01-10)
.idea\vcs.xml (180, 2020-01-10)
.vs (0, 2020-01-10)
.vs\4zhengquan_news (0, 2020-01-10)
.vs\4zhengquan_news\v15 (0, 2020-01-10)
.vs\4zhengquan_news\v15\.suo (22528, 2020-01-10)
.vs\ProjectSettings.json (35, 2020-01-10)
.vs\VSWorkspaceState.json (148, 2020-01-10)
.vs\slnx.sqlite (77824, 2020-01-10)
img (0, 2020-01-10)
img\内容页.png (313856, 2020-01-10)
img\总览页.png (163038, 2020-01-10)
img\运行结果.png (267036, 2020-01-10)
img\运行过程.gif (1013895, 2020-01-10)
zhengquan_news (0, 2020-01-10)
zhengquan_news\data_ana.ipynb (8403, 2020-01-10)
zhengquan_news\get_news.py (2948, 2020-01-10)
zhengquan_news\四大证券报.xlsx (170360, 2020-01-10)

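# 4zhengquan_news
News data scraping, cleaning, and formatting for the four major securities newspapers (四大证券报).

This is a crawler that scrapes the data from the four major securities newspapers special section on Sina (新浪网):

![Overview page](https://github.com/GaoHuaTJ/4zhengquan_news/blob/master/img/总览页.png)

Sending the requests and receiving the response data is fairly straightforward. The harder part was the subsequent data cleaning and formatting, which took quite a few detours: the algorithm itself is not difficult, but it takes careful thought.

![Content page](https://github.com/GaoHuaTJ/4zhengquan_news/blob/master/img/内容页.png)

Output after running:

![Run results](https://github.com/GaoHuaTJ/4zhengquan_news/blob/master/img/运行结果.png)

The run process is shown below:

![Run process](https://github.com/GaoHuaTJ/4zhengquan_news/blob/master/img/运行过程.gif)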
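The contents of get_news.py are not shown on this page. As an illustration only, here is a minimal, standard-library-only sketch of the request, clean, and format pipeline the README describes. It assumes (hypothetically) that each entry on the listing page is an `<a href>` link followed by a date such as `(01月09日 07:30)`. The helpers `fetch_html`, `parse_list_page`, `clean`, and `to_csv_text`, along with the `NewsItem` record, are invented for this example. The real Sina markup and page encoding may differ.

```python
from __future__ import annotations

import csv
import io
import re
import urllib.request
from dataclasses import dataclass
from html.parser import HTMLParser

# Assumed date format on the listing page, e.g. "(01月09日 07:30)" (month 月, day 日).
DATE_RE = re.compile(r"(\d{1,2})月(\d{1,2})日")


@dataclass
class NewsItem:
    """One cleaned headline (hypothetical record type for this sketch)."""
    title: str
    url: str
    date: str  # ISO format, YYYY-MM-DD


def fetch_html(url: str, encoding: str = "utf-8", timeout: float = 10.0) -> str:
    """Download a page; the URL and the page encoding are caller-supplied assumptions."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode(encoding, errors="replace")


class _LinkParser(HTMLParser):
    """Collect [href, link text, text after the link] for every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.items: list[list[str]] = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self.items.append([dict(attrs).get("href") or "", "", ""])

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self.items:  # ignore text that appears before the first link
            self.items[-1][1 if self._in_link else 2] += data


def clean(raw: list[list[str]], year: int) -> list[NewsItem]:
    """Normalise whitespace, drop undated/navigation links and duplicate URLs."""
    seen: set[str] = set()
    out: list[NewsItem] = []
    for href, text, tail in raw:
        title = re.sub(r"\s+", " ", text).strip()
        match = DATE_RE.search(tail)
        if not href or not title or match is None or href in seen:
            continue
        seen.add(href)
        month, day = int(match.group(1)), int(match.group(2))
        # The listing shows only month/day, so the year must be supplied by the caller.
        out.append(NewsItem(title, href, f"{year:04d}-{month:02d}-{day:02d}"))
    return out


def parse_list_page(html: str, year: int) -> list[NewsItem]:
    """Parse raw listing HTML and return cleaned, de-duplicated items."""
    parser = _LinkParser()
    parser.feed(html)
    parser.close()
    return clean(parser.items, year)


def to_csv_text(items: list[NewsItem]) -> str:
    """Format the cleaned items as CSV text (header row first)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["title", "url", "date"])
    for item in items:
        writer.writerow([item.title, item.url, item.date])
    return buf.getvalue()


# Usage (requires network access; the listing URL is up to you):
# items = parse_list_page(fetch_html("<listing-page-url>"), year=2020)
# print(to_csv_text(items))
```

Because parsing and cleaning are kept separate from `fetch_html`, the cleaning rules can be tested offline on saved HTML. The repository's own output is an `.xlsx` file. This sketch writes CSV only to avoid third-party dependencies.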
