DBLPParser-master

Category: Network programming
Development tool: Python
File size: 211KB
Downloads: 8
Upload date: 2019-04-03 15:20:06
Uploader: 中二宫君
Description: Used for entity disambiguation. Complete code, worth downloading for reference and study.

File list:
LICENSE (1066, 2019-03-20)
img (0, 2019-03-20)
img\all_feature.png (61644, 2019-03-20)
img\article_feature.png (28538, 2019-03-20)
img\article_year.png (152492, 2019-03-20)
img\general.png (34336, 2019-03-20)
src (0, 2019-03-20)
src\dblp_parser.py (10016, 2019-03-20)
src\filter_and_statistic.py (6696, 2019-03-20)

# DBLP Dataset Parser

![Author](https://img.shields.io/badge/Author-Zhang%20Hao%20(Isaac%20Changhau)-blue.svg) ![Python](https://img.shields.io/badge/Python-3.6.5-brightgreen.svg)

This is a Python parser for the [DBLP dataset](https://dblp.uni-trier.de/). The XML dump can be downloaded [here](http://dblp.org/xml/) from the [DBLP Homepage](https://dblp.org/).

The parser requires the `dtd` file, so make sure you have both `dblp-XXX.xml` (the dataset) and `dblp-XXX.dtd`. Both files must be in the same directory, and the name of the `dtd` file should match the name given in the `<!DOCTYPE>` tag of the `xml` file. You can check this with the `head dblp-XXX.xml` command, as shown below:

```xml
Carmen Heine
Modell zur Produktion von Online-Hilfen.
...
```

A sample showing how to use the parser:

```python
def main():
    dblp_path = 'dataset/dblp.xml'
    save_path = 'article.json'
    try:
        # Check that the XML file (and its DTD) can be loaded before parsing.
        context_iter(dblp_path)
        log_msg("LOG: Successfully loaded \"{}\".".format(dblp_path))
    except IOError:
        log_msg("ERROR: Failed to load file \"{}\". Please check your XML and DTD files.".format(dblp_path))
        exit()
    parse_article(dblp_path, save_path, save_to_csv=False)  # saves as JSON by default
```

Some extracted results:

**Number of publications of each type**:

![general](/img/general.png)

**Number of occurrences of each attribute across all publications**:

![all_feature](/img/all_feature.png)

**Counts of five different features of articles**:

![article_feature](/img/article_feature.png)

**Distribution of the publication year of articles**:

![article_year](/img/article_year.png)
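
The requirement that the `dtd` file sit next to the `xml` file under the name declared in the XML head can be checked before starting a long parse. The following is a minimal sketch using only the standard library. The helper names `find_declared_dtd` and `dtd_is_available` are hypothetical and are not part of `dblp_parser.py`. The sketch assumes the declaration uses a `SYSTEM` identifier with a relative file name and appears within the first 4096 bytes of the file.

```python
import os
import re

# Matches a declaration such as <!DOCTYPE dblp SYSTEM "dblp.dtd">
# and captures the DTD file name (assumed to be a relative file name).
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\w+\s+SYSTEM\s+["\']([^"\']+)["\']')


def find_declared_dtd(xml_path, head_bytes=4096):
    """Return the DTD name declared near the top of the XML file, or None."""
    with open(xml_path, 'rb') as f:
        # latin-1 maps every byte to a character, so decoding cannot fail.
        head = f.read(head_bytes).decode('latin-1')
    match = DOCTYPE_RE.search(head)
    return match.group(1) if match else None


def dtd_is_available(xml_path):
    """Return True if the declared DTD exists in the XML file's directory."""
    dtd_name = find_declared_dtd(xml_path)
    if dtd_name is None:
        return False
    xml_dir = os.path.dirname(os.path.abspath(xml_path))
    return os.path.isfile(os.path.join(xml_dir, dtd_name))
```

Calling `dtd_is_available('dataset/dblp.xml')` before `main()` would give an early, explicit failure when the two files are mismatched, rather than an error partway through parsing.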
