data-512-a1
Category: Artificial Intelligence / Neural Networks / Deep Learning
Development tool: Jupyter Notebook
File size: 962KB
Downloads: 0
Upload date: 2019-10-17 22:46:33
Uploader: sh-1993
Description: Data 512 A1:_Data_curation
File list:
LICENSE (1073, 2019-10-18)
en-wikipedia_traffic_200801-201910.csv (7423, 2019-10-18)
hcds-a1-data-curation.ipynb (161251, 2019-10-18)
pagecounts_desktop-site_200101-201810.json (14083, 2019-10-18)
pagecounts_mobile-site_200101-201810.json (3069, 2019-10-18)
pageviews_desktop_201507-201910.json (7202, 2019-10-18)
pageviews_mobile-app_201507-201910.json (7291, 2019-10-18)
pageviews_mobile-web_201507-201910.json (7355, 2019-10-18)
testviz.jpeg (1879836, 2019-10-18)
# data-512-a1
Data 512 A1:_Data_curation
Thomas Winegarden
# Goal of the Project
This README is part of assignment 1 of the class Human Centered Data Science (DATA 512) at The University of Washington. Assignment requirements can be found here: https://wiki.communitydata.science/Human_Centered_Data_Science_(Fall_2019)/Assignments#A1:_Data_curation
"The purpose of the assignment is to demonstrate that you can follow best practices for open scientific research in designing and implementing your project, and make your project fully reproducible by others: from data collection to data analysis." - Jonathan Morgan (JMO), DATA 512 (FALL 2019): A1
# Data Use & Licensing
The data is subject to the Wikimedia REST API terms and conditions: https://www.mediawiki.org/wiki/REST_API#Terms_and_conditions
This project is available under an MIT license.
# API Docs
Two different APIs provided by Wikimedia were used:
Legacy Page Count: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Legacy_Pagecounts
Page Views: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews
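As a minimal sketch of how the two endpoints might be addressed, the snippet below builds one request URL for each API. The URL templates follow the Wikimedia documentation linked above; the helper names (`legacy_pagecounts_url`, `pageviews_url`), the default arguments, and the example date range are illustrative assumptions and are not taken from the notebook.

```python
# URL templates as described in the Wikimedia AQS documentation linked above.
LEGACY_TEMPLATE = (
    "https://wikimedia.org/api/rest_v1/metrics/legacy/pagecounts/aggregate/"
    "{project}/{access_site}/{granularity}/{start}/{end}"
)
PAGEVIEWS_TEMPLATE = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/"
    "{project}/{access}/{agent}/{granularity}/{start}/{end}"
)


def legacy_pagecounts_url(access_site, start, end,
                          project="en.wikipedia.org", granularity="monthly"):
    """Build a Legacy Pagecounts URL (hypothetical helper).

    access_site is e.g. "desktop-site" or "mobile-site";
    start and end are timestamps in YYYYMMDDHH form.
    """
    return LEGACY_TEMPLATE.format(
        project=project, access_site=access_site,
        granularity=granularity, start=start, end=end,
    )


def pageviews_url(access, start, end, project="en.wikipedia.org",
                  agent="user", granularity="monthly"):
    """Build a Pageviews URL (hypothetical helper).

    access is e.g. "desktop", "mobile-app" or "mobile-web".
    agent="user" is assumed here so that crawler traffic is left out.
    """
    return PAGEVIEWS_TEMPLATE.format(
        project=project, access=access, agent=agent,
        granularity=granularity, start=start, end=end,
    )


if __name__ == "__main__":
    # Example date range only; the notebook may use different bounds.
    print(legacy_pagecounts_url("desktop-site", "2008010100", "2016080100"))
    print(pageviews_url("mobile-web", "2015070100", "2019100100"))
```

Keeping URL construction separate from the HTTP call makes it easy to check the request parameters before any data is downloaded.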
# Final field values
The repo contains 5 JSON files from the two Wikimedia APIs listed above.
The repo also contains a munged .csv of the final data.
Columns:

| Column | Value |
| --- | --- |
| year | YYYY |
| month | MM |
| pagecount_all_views | num_views (float) |
| pagecount_desktop_views | num_views (float) |
| pagecount_mobile_views | num_views (float) |
| pageview_all_views | num_views (float) |
| pageview_desktop_views | num_views (float) |
| pageview_mobile_views | num_views (float) |
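The sketch below shows one way the five JSON sources could be combined into rows with the columns listed above. It is not the notebook's code. It assumes that each API item carries a `timestamp` in YYYYMMDDHH form, that the count field is named `views` (Pageviews) or `count` (Legacy Pagecounts), that `pageview_mobile_views` is the sum of mobile-app and mobile-web traffic, that the `*_all_views` columns are desktop plus mobile, and that months missing from a source count as 0. The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

COLUMNS = [
    "year", "month",
    "pagecount_all_views", "pagecount_desktop_views", "pagecount_mobile_views",
    "pageview_all_views", "pageview_desktop_views", "pageview_mobile_views",
]


def monthly_totals(items, value_key):
    """Sum item[value_key] per (year, month) from YYYYMMDDHH timestamps."""
    totals = defaultdict(float)
    for item in items:
        ts = item["timestamp"]
        totals[(ts[:4], ts[4:6])] += float(item[value_key])
    return totals


def build_rows(pc_desktop, pc_mobile, pv_desktop, pv_mobile_app, pv_mobile_web):
    """Merge per-source monthly totals into one row per (year, month)."""
    sources = [pc_desktop, pc_mobile, pv_desktop, pv_mobile_app, pv_mobile_web]
    months = sorted(set().union(*sources))  # every month seen in any source
    rows = []
    for year, month in months:
        key = (year, month)
        pc_d = pc_desktop.get(key, 0.0)  # assumption: missing month -> 0
        pc_m = pc_mobile.get(key, 0.0)
        pv_d = pv_desktop.get(key, 0.0)
        # assumption: mobile pageviews = mobile app + mobile web
        pv_m = pv_mobile_app.get(key, 0.0) + pv_mobile_web.get(key, 0.0)
        rows.append({
            "year": year,
            "month": month,
            "pagecount_all_views": pc_d + pc_m,
            "pagecount_desktop_views": pc_d,
            "pagecount_mobile_views": pc_m,
            "pageview_all_views": pv_d + pv_m,
            "pageview_desktop_views": pv_d,
            "pageview_mobile_views": pv_m,
        })
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with the column order above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In use, each JSON file's item list would first be passed through `monthly_totals` with the appropriate field name, and the five resulting mappings handed to `build_rows`.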
# Important things to note
Data from the Pageviews API, which starts in July 2015, does not include views of Wikipedia by scrapers or web crawlers.
The notebook uses Python version 3.7.1.