CNRec
所属分类:图神经网络
开发工具:Others
文件大小:672KB
下载次数:0
上传日期:2019-02-26 23:53:16
上 传 者:
sh-1993
说明: 基于知识图上最短实体距离的CNRec数据与基于内容的新闻推荐
(CNRec Data Associated with Content based News Recommendation via Shortest Entity Distance over Knowledge Graph)
文件列表:
CNRec.zip (734204, 2019-02-27)
# CNRec: Content News Recommendation Dataset
CNRec Dataset Associated with Content based News Recommendation via Shortest Entity Distance over Knowledge Graph
CNRec provides document to document similarity as well as whether a pair of articles was considered a good recommendation. The data set consists of 2700 pairs of news articles, selected from 30 groupings of 10 articles of human perceived similarity. In total we have 300 unique news articles originally published in a period of 3 consecutive days between August 25-28, 2014. Each article is paired with all other articles in the same group. This results in 45 pairs that should produce positive similarity ratings. Another 45 pairs are randomly generated across other groups resulting in 2700 total pairs. The 3 day period, as well as the grouping and pairing procedure, provides a ideal set of articles to process. It allows engineers to focus on direct algorithm design rather than filtering relevant articles by time, or overlapping entities, before computations.
## Annotation
Each pair of articles is rated by 6 human annotators against two questions:
1. In terms of content delivered, how similar do you think these two articles are? The annoators were given 3 choices:
* Not Similar
* Similar
* Very Similar
Their answers were converted into numerical values 0/1/2.
2. If one of these articles was recommended based on the other would you have followed the link? Each annotator choose between:
NO and YES, which were converted to numerical values 0/1.
## Dataset Files
CNRec.zip should contain the following files:
articleToID.csv
CNRec_All_Data.csv
CNRec_groundTruth.csv
and the following folder: CNRec_RawText
### CNRec_groundTruth
Contains the following fields:
* `art1`: id of the first article in the pair
* `art2`: id of the second article in the pair
* `meanGoodR`: the mean good recommendation rating across the six participants
* `meanSimRating`: the mean similarity rating across the six participants
* `GoodR_75`: A indicator value of 0 or 1 if it should be considered a good recommendation if the meanGoodR was >= 0.75
* `GoodR_50`: A indicator value of 0 or 1 if it should be considered a good recommendation if the meanGoodR was >= 0.5
* `pair_id`: the id of the pair of articles (note that the pair 1 0 and 0 1 share the same pair ID)
* `diversity_75`: A indicator value of 0 or 1 if it should be considered a good recommendation if the meanGoodR was >= 0.75 and the meanSimRating was <= 1
* `diversity_50`: A indicator value of 0 or 1 if it should be considered a good recommendation if the meanGoodR was >= 0.5 and the meanSimRating was <= 1
### CNRec_All_Data
Has the following fields:
* `art1`: id of the first article in the pair
* `art2`: id of the second article in the pair
* `rating`: the similarity rating of 0 / 1 / 2
* `goodR`: the good recommendation rating of 0 or 1
* `username`: Which user made the rating, either A, B, C, D ,E, or F
* `time`: the time at which the rating was made
* `pair_id`: the id of the pair of articles
### articleToID
Consists of two fields:
* `art`: the article ID
* `filename`: name of the article in the CNRec_RawText folder
### CNRec_RawText folder
Should contain 300 articles:
find CNRec_RawText/ -type f | wc -l
300
近期下载者:
相关文件:
收藏者: