academic-graph

Category: Papers
Development tool: Jupyter Notebook
File size: 4308KB
Downloads: 0
Upload date: 2017-02-07 13:26:21
Uploader: sh-1993
Description: data amusement on the Microsoft Academic Graph

File list:
batch2.sh (146, 2017-02-07)
batch_chi.sh (290, 2017-02-07)
batch_construct.sh (282, 2017-02-07)
batch_cryptoinfo.sh (388, 2017-02-07)
batch_ml.sh (151, 2017-02-07)
batch_plot_citations.ipynb (25184, 2017-02-07)
citation_join.py (4596, 2017-02-07)
citation_joins.ipynb (1846090, 2017-02-07)
construct_citation_table.ipynb (12561, 2017-02-07)
construct_citation_table.py (4026, 2017-02-07)
csrank_03.sh (168, 2017-02-07)
csrank_04.sh (162, 2017-02-07)
export_citations.ipynb (61710, 2017-02-07)
export_citations.py (2880, 2017-02-07)
filter_citations.ipynb (2535755, 2017-02-07)
gather_conf_data.py (374, 2017-02-07)
load_data.ipynb (93838, 2017-02-07)
load_data.py (471, 2017-02-07)
paperdb.sql (181, 2017-02-07)
plot_citations.ipynb (2526318, 2017-02-07)
plot_citations.py (32595, 2017-02-07)
plot_conf_citations.ipynb (2165019, 2017-02-07)
plot_conf_citations.py (26513, 2017-02-07)
prune_papers.ipynb (17381, 2017-02-07)
prune_papers.py (1817, 2017-02-07)

This repo contains the scripts to process the Microsoft Academic Graph (MAG). They profile the citation influence and reference heritage of a publication venue (e.g. a conference).

### Developer workflow to analyze a new conference/venue

0) Prep Paper.db (once for each new version of the MAG data)

First run prune_papers.ipynb, then import the result into sqlite:

    sqlite> create table paper_pruned(id TEXT, year INTEGER, venueid TEXT);
    sqlite> .separator ","
    sqlite> .import ./data_txt/Papers_pruned.txt paper_pruned

or

    sqlite3 Papers.db < paperdb.sql

Note: 75M+ of the 120M papers in total have unknown venues (Jan 2016 data). The figure is 73M+ of 126M for the Apr 2016 data.

1) Get the venue's citing and cited records (~30 min)

    python export_citations.py WSDM

Prep-step [_this has already been done by export_citations.py above_]: get the subset of papers published at the venue:

    xlx@braun:/data2/xlx/MicrosoftAcademicGraph$ grep WSDM data_txt/ConferenceSeries.txt
    42C7B402 WSDM Web Search and Data Mining
    xlx@braun:/data2/xlx/MicrosoftAcademicGraph$ grep 42C7B402 data_txt/Papers.txt > papers.WSDM.txt

2) Do the necessary joins (can take a few hours)

    python construct_citation_table.py MM
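Step 0's sqlite shell import can also be done without the sqlite3 CLI, using Python's standard `sqlite3` and `csv` modules. The sketch below makes three assumptions:

- `Papers_pruned.txt` is comma-separated.
- Each line has exactly three fields (`id`, `year`, `venueid`).
- The file has no header row.

The `.separator ","` / `.import` commands imply this layout. The helper name `import_pruned_papers` is hypothetical and not part of the repo.

```python
import csv
import sqlite3


def import_pruned_papers(db_path, txt_path):
    """Hypothetical helper: load the pruned paper list into a paper_pruned table.

    Mirrors the shell steps (create table, .separator ",", .import), assuming
    every line of txt_path is "id,year,venueid" with no header row.
    Rows are appended if the table already exists. Returns the total row count.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS paper_pruned "
            "(id TEXT, year INTEGER, venueid TEXT)"
        )
        with open(txt_path, newline="", encoding="utf-8") as f:
            # executemany consumes the csv reader lazily, so the whole file
            # is never held in memory (MAG has 100M+ papers)
            conn.executemany(
                "INSERT INTO paper_pruned (id, year, venueid) VALUES (?, ?, ?)",
                csv.reader(f),
            )
        conn.commit()
        (count,) = conn.execute("SELECT COUNT(*) FROM paper_pruned").fetchone()
        return count
    finally:
        conn.close()
```

For example, `import_pruned_papers("Papers.db", "./data_txt/Papers_pruned.txt")` should build the same table as the shell session. Because the column is declared `INTEGER`, SQLite's type affinity stores the year strings read by `csv` as integers.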
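The two `grep` calls in step 1's prep-step can be sketched in Python with one refinement. The venue ID is compared against whole tab-separated fields instead of as a substring, so it cannot accidentally match inside a title or a longer ID.

This is a minimal sketch under two assumptions:

- `ConferenceSeries.txt` and `Papers.txt` are tab-separated.
- In `ConferenceSeries.txt`, the series ID is in the first column and the short name in the second, as the grep output above suggests.

Both function names are hypothetical.

```python
def find_series_id(conference_series_path, short_name):
    """Return the conference series ID whose short name equals short_name, or None.

    Assumes tab-separated lines of the form "<id>\t<short name>\t<full name>".
    """
    with open(conference_series_path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) >= 2 and fields[1] == short_name:
                return fields[0]
    return None


def extract_venue_papers(papers_path, venue_id, out_path):
    """Copy every line of papers_path that contains venue_id as a whole field.

    Unlike `grep venue_id`, a field that merely contains venue_id as a
    substring does not match. Returns the number of lines written.
    """
    written = 0
    with open(papers_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if venue_id in line.rstrip("\r\n").split("\t"):
                dst.write(line)
                written += 1
    return written
```

A possible usage: `find_series_id("data_txt/ConferenceSeries.txt", "WSDM")`, then pass the result to `extract_venue_papers("data_txt/Papers.txt", ..., "papers.WSDM.txt")`.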
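Step 2's joins are performed by `construct_citation_table.py`, whose contents are not shown here. The sketch below illustrates the kind of join involved; it is not the script's actual logic. It attaches a year and venue to both ends of each reference, then counts the citations a venue's papers receive per citing year.

It assumes two tables:

- `paper_pruned`, the table from step 0.
- `paper_refs(citing_id, cited_id)`, a hypothetical table loaded from MAG's paper-reference list.

The function name is also hypothetical.

```python
import sqlite3


def citations_by_year(conn, venue_id):
    """Count citations received by papers of venue_id, grouped by citing year.

    Assumes (hypothetical layout):
      paper_pruned(id TEXT, year INTEGER, venueid TEXT)  -- from step 0
      paper_refs(citing_id TEXT, cited_id TEXT)          -- one row per reference
    References whose citing or cited paper is missing from paper_pruned are
    dropped by the inner joins. Returns [(citing_year, n_citations), ...].
    """
    query = """
        SELECT citing.year, COUNT(*)
        FROM paper_refs AS r
        JOIN paper_pruned AS cited  ON cited.id  = r.cited_id
        JOIN paper_pruned AS citing ON citing.id = r.citing_id
        WHERE cited.venueid = ?
        GROUP BY citing.year
        ORDER BY citing.year
    """
    return conn.execute(query, (venue_id,)).fetchall()
```

On MAG-sized tables, indexes such as `CREATE INDEX idx_refs_cited ON paper_refs(cited_id)` and one on `paper_pruned(id)` would likely matter a great deal for run time. That is a general expectation about SQLite joins, not a measurement from this repo.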
