academic-graph
Category: Papers
Development tool: Jupyter Notebook
File size: 4308KB
Downloads: 0
Upload date: 2017-02-07 13:26:21
Uploader: sh-1993
Description: data amusement on the Microsoft Academic Graph
File list:
batch2.sh (146, 2017-02-07)
batch_chi.sh (290, 2017-02-07)
batch_construct.sh (282, 2017-02-07)
batch_cryptoinfo.sh (388, 2017-02-07)
batch_ml.sh (151, 2017-02-07)
batch_plot_citations.ipynb (25184, 2017-02-07)
citation_join.py (4596, 2017-02-07)
citation_joins.ipynb (1846090, 2017-02-07)
construct_citation_table.ipynb (12561, 2017-02-07)
construct_citation_table.py (4026, 2017-02-07)
csrank_03.sh (168, 2017-02-07)
csrank_04.sh (162, 2017-02-07)
export_citations.ipynb (61710, 2017-02-07)
export_citations.py (2880, 2017-02-07)
filter_citations.ipynb (2535755, 2017-02-07)
gather_conf_data.py (374, 2017-02-07)
load_data.ipynb (93838, 2017-02-07)
load_data.py (471, 2017-02-07)
paperdb.sql (181, 2017-02-07)
plot_citations.ipynb (2526318, 2017-02-07)
plot_citations.py (32595, 2017-02-07)
plot_conf_citations.ipynb (2165019, 2017-02-07)
plot_conf_citations.py (26513, 2017-02-07)
prune_papers.ipynb (17381, 2017-02-07)
prune_papers.py (1817, 2017-02-07)
This repo contains scripts for processing the Microsoft Academic Graph (MAG) in order to profile the citation influence and reference heritage of a publication venue (e.g. a conference).
### Developer workflow to analyze a new conference/venue
0) Prepare Papers.db (once for each new version of the MAG data)

First run prune_papers.ipynb, then import the result into SQLite:

    sqlite> create table paper_pruned(id TEXT, year INTEGER, venueid TEXT);
    sqlite> .separator ","
    sqlite> .import ./data_txt/Papers_pruned.txt paper_pruned

or

    sqlite3 Papers.db < paperdb.sql

Note:
75M+ papers with unknown venues among 120M in all (Jan 2016)
73M+ papers with unknown venues among 126M in all (Apr 2016)
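
For readers who prefer to script the import rather than type it into the sqlite shell, the step above can be sketched in Python. This is a minimal sketch, not the repo's own code: it assumes Papers_pruned.txt is comma-separated with exactly three columns (id, year, venueid), as the `create table` statement and `.separator ","` suggest, and it assumes that an empty venueid marks a paper with an unknown venue. The function names `import_pruned_papers` and `count_unknown_venues` are hypothetical.

```python
import csv
import sqlite3


def import_pruned_papers(db_path, txt_path):
    """Load a comma-separated (id, year, venueid) file into paper_pruned."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS paper_pruned"
            "(id TEXT, year INTEGER, venueid TEXT)"
        )
        with open(txt_path, newline="", encoding="utf-8") as f:
            # Assumption: every valid record has exactly three fields.
            rows = (row for row in csv.reader(f) if len(row) == 3)
            conn.executemany("INSERT INTO paper_pruned VALUES (?, ?, ?)", rows)
    return conn


def count_unknown_venues(conn):
    """Return (unknown, total); assumes an empty venueid means 'unknown venue'."""
    total = conn.execute("SELECT COUNT(*) FROM paper_pruned").fetchone()[0]
    unknown = conn.execute(
        "SELECT COUNT(*) FROM paper_pruned WHERE venueid IS NULL OR venueid = ''"
    ).fetchone()[0]
    return unknown, total
```

Calling `count_unknown_venues(import_pruned_papers("Papers.db", "./data_txt/Papers_pruned.txt"))` would then give a quick check against the unknown-venue counts noted above, provided the empty-venueid assumption holds for your data.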
1) Get its citing and cited records (~30 mins)

    python export_citations.py WSDM

Prep step [_already done by export_citations.py_]: get the subset of papers published at the venue:

    xlx@braun:/data2/xlx/MicrosoftAcademicGraph$ grep WSDM data_txt/ConferenceSeries.txt
    42C7B402 WSDM Web Search and Data Mining
    xlx@braun:/data2/xlx/MicrosoftAcademicGraph$ grep 42C7B402 data_txt/Papers.txt > papers.WSDM.txt
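
The two grep commands in the prep step can be approximated in Python as follows. This is a sketch under stated assumptions, not what export_citations.py actually does: it assumes both files are tab-separated and that ConferenceSeries.txt lists the ID in the first column and the short name in the second, as the grep output suggests. Unlike grep, it matches the ID only as a whole field, which avoids accidental substring hits. The function names `find_series_id` and `filter_papers` are hypothetical.

```python
def find_series_id(series_path, short_name):
    """Return the ID whose short-name column equals short_name, else None."""
    with open(series_path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2 and fields[1] == short_name:
                return fields[0]
    return None


def filter_papers(papers_path, out_path, series_id):
    """Copy lines having series_id as a whole tab-separated field; return count."""
    kept = 0
    with open(papers_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if series_id in line.rstrip("\n").split("\t"):
                dst.write(line)
                kept += 1
    return kept
```

For example, `filter_papers("data_txt/Papers.txt", "papers.WSDM.txt", find_series_id("data_txt/ConferenceSeries.txt", "WSDM"))` would write the venue's papers to papers.WSDM.txt and return how many lines were kept.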
2) Do the necessary joins (can take a few hours)

    python construct_citation_table.py MM
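
To illustrate the kind of join this step involves, here is a minimal sketch that attaches year and venue to both sides of each citation pair. It is not the logic of construct_citation_table.py, which is not shown here. The `paper_pruned` table comes from step 0; the `citation(citing_id, cited_id)` table, the output table `citation_joined`, and the function name `build_citation_table` are all assumptions made for illustration.

```python
import sqlite3


def build_citation_table(conn):
    """Join each (citing, cited) pair with year and venue from paper_pruned."""
    with conn:
        # An index on the join key keeps the two lookups from scanning the table.
        conn.execute("CREATE INDEX IF NOT EXISTS idx_paper_id ON paper_pruned(id)")
        conn.execute("DROP TABLE IF EXISTS citation_joined")
        conn.execute(
            """
            CREATE TABLE citation_joined AS
            SELECT c.citing_id, p1.year AS citing_year, p1.venueid AS citing_venue,
                   c.cited_id,  p2.year AS cited_year,  p2.venueid AS cited_venue
            FROM citation AS c
            JOIN paper_pruned AS p1 ON p1.id = c.citing_id
            JOIN paper_pruned AS p2 ON p2.id = c.cited_id
            """
        )
```

Because these are inner joins, citation pairs whose papers are missing from `paper_pruned` are dropped. With over a hundred million papers, joins of this shape are a plausible reason the step takes hours.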