DBLPParser-master
Category: Network programming
Development tool: Python
File size: 211KB
Downloads: 8
Upload date: 2019-04-03 15:20:06
Uploader: 中二宫君
Description: For entity disambiguation. Complete code, worth downloading for reference and study.
File list:
LICENSE (1066, 2019-03-20)
img (0, 2019-03-20)
img\all_feature.png (61644, 2019-03-20)
img\article_feature.png (28538, 2019-03-20)
img\article_year.png (152492, 2019-03-20)
img\general.png (34336, 2019-03-20)
src (0, 2019-03-20)
src\dblp_parser.py (10016, 2019-03-20)
src\filter_and_statistic.py (6696, 2019-03-20)
# DBLP Dataset Parser
![Author](https://img.shields.io/badge/Author-Zhang%20Hao%20(Isaac%20Changhau)-blue.svg) ![Python](https://img.shields.io/badge/Python-3.6.5-brightgreen.svg)
A Python parser for the [DBLP dataset](https://dblp.uni-trier.de/). The XML dump can be downloaded [here](http://dblp.org/xml/) from the [DBLP homepage](https://dblp.org/).
The parser requires the `dtd` file, so make sure you have both `dblp-XXX.xml` (the dataset) and `dblp-XXX.dtd`. Both files must be in the same directory, and the name of the `dtd` file must match the name given in the `<!DOCTYPE>` tag of the `xml` file. You can check this with the `head dblp-XXX.xml` command, as shown below:
```xml
<!DOCTYPE dblp SYSTEM "dblp-XXX.dtd">
<dblp>
...
<author>Carmen Heine</author>
<title>Modell zur Produktion von Online-Hilfen.</title>
...
```
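The check described above can also be scripted. The following is a minimal sketch, not part of the parser itself: the helper names `find_dtd_name` and `dtd_is_in_place` are hypothetical, and it assumes the `DOCTYPE` declaration appears within the first few kilobytes of the file and uses the `SYSTEM "..."` form.

```python
import os
import re


def find_dtd_name(xml_path, max_bytes=4096):
    """Return the DTD file name declared in the DOCTYPE, or None if not found.

    Hypothetical helper: only the first `max_bytes` bytes are inspected,
    which mirrors what `head dblp-XXX.xml` would show.
    """
    with open(xml_path, 'rb') as f:
        # latin-1 maps every byte to a character, so decoding cannot fail.
        head = f.read(max_bytes).decode('latin-1')
    match = re.search(r'<!DOCTYPE\s+\w+\s+SYSTEM\s+"([^"]+)"', head)
    return match.group(1) if match else None


def dtd_is_in_place(xml_path):
    """Return True if the declared DTD file sits in the same directory as the XML."""
    dtd_name = find_dtd_name(xml_path)
    if dtd_name is None:
        return False
    xml_dir = os.path.dirname(os.path.abspath(xml_path))
    return os.path.isfile(os.path.join(xml_dir, dtd_name))
```

Calling `dtd_is_in_place('dataset/dblp.xml')` before parsing would catch a missing or misnamed `dtd` file early.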
Sample usage of the parser:
```python
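# Assumes this runs alongside src/dblp_parser.py, which defines these helpers.
from dblp_parser import context_iter, log_msg, parse_article


def main():
    dblp_path = 'dataset/dblp.xml'  # the matching DTD must be in the same directory
    save_path = 'article.json'
    try:
        # Check up front that the dump can be opened.
        context_iter(dblp_path)
        log_msg("LOG: Successfully loaded \"{}\".".format(dblp_path))
    except IOError:
        log_msg("ERROR: Failed to load file \"{}\". Please check your XML and DTD files.".format(dblp_path))
        exit()
    parse_article(dblp_path, save_path, save_to_csv=False)  # saves as JSON by default


if __name__ == '__main__':
    main()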
```
Some extracted results:
**Number of publications of each type**:
![general](/img/general.png)
**Number of occurrences of each attribute across all publications**:
![all_feature](/img/all_feature.png)
**Counts of five different features among articles**:
![article_feature](/img/article_feature.png)
**Distribution of articles by publication year**:
![article_year](/img/article_year.png)
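
As a rough illustration of how a per-type publication count like the first chart could be computed, here is a minimal sketch using only the standard library. It is not the code from `src/filter_and_statistic.py`: the function `count_publication_types` and the `SAMPLE` records are made up for this example. The standard-library parser does not load an external DTD, so this sketch is meant for a toy sample rather than the real dump.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up records that mimic the shape of DBLP entries (illustration only).
SAMPLE = b"""<dblp>
<article key="a1"><author>A. Author</author><title>First.</title><year>2001</year></article>
<article key="a2"><author>B. Author</author><title>Second.</title><year>2003</year></article>
<phdthesis key="p1"><author>C. Author</author><title>Third.</title><year>2010</year></phdthesis>
</dblp>"""


def count_publication_types(source):
    """Count the direct children of the root element by tag name."""
    counts = Counter()
    depth = 0
    for event, elem in ET.iterparse(source, events=('start', 'end')):
        if event == 'start':
            depth += 1
        else:
            depth -= 1
            if depth == 1:  # a record directly under the root has just closed
                counts[elem.tag] += 1
                elem.clear()  # drop the record's children to limit memory use
    return counts


counts = count_publication_types(io.BytesIO(SAMPLE))
for tag, n in sorted(counts.items()):
    print(tag, n)
```

Streaming with `iterparse` and clearing each record after it is counted keeps memory use low, which matters for a file the size of the full DBLP dump.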