Extractive-Text-Summarization-of-News-Articles

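Category: Numerical Algorithms/Artificial Intelligence
Development tool: Python
File size: 9538KB
Downloads: 0
Upload date: 2016-12-03 08:01:59
Uploader: sh-1993
Description: Extractive text summarization of news articles
(Extractive-Text-Summarization-of-News-Articles)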

File list:
Dataset (0, 2016-12-03)
Dataset\.DS_Store (6148, 2016-12-03)
Dataset\bbc-2 (0, 2016-12-03)
Dataset\bbc-2\.DS_Store (8196, 2016-12-03)
Dataset\bbc-2\business (0, 2016-12-03)
Dataset\bbc-2\business\001.txt (2560, 2016-12-03)
Dataset\bbc-2\business\002.txt (2252, 2016-12-03)
Dataset\bbc-2\business\003.txt (1552, 2016-12-03)
Dataset\bbc-2\business\004.txt (2412, 2016-12-03)
Dataset\bbc-2\business\005.txt (1570, 2016-12-03)
Dataset\bbc-2\business\006.txt (1187, 2016-12-03)
Dataset\bbc-2\business\007.txt (1669, 2016-12-03)
Dataset\bbc-2\business\008.txt (1922, 2016-12-03)
Dataset\bbc-2\business\009.txt (1494, 2016-12-03)
Dataset\bbc-2\business\010.txt (1449, 2016-12-03)
Dataset\bbc-2\business\011.txt (1144, 2016-12-03)
Dataset\bbc-2\business\012.txt (1847, 2016-12-03)
Dataset\bbc-2\business\013.txt (1830, 2016-12-03)
Dataset\bbc-2\business\014.txt (2981, 2016-12-03)
Dataset\bbc-2\business\015.txt (3808, 2016-12-03)
Dataset\bbc-2\business\016.txt (1393, 2016-12-03)
Dataset\bbc-2\business\017.txt (1299, 2016-12-03)
Dataset\bbc-2\business\018.txt (1002, 2016-12-03)
Dataset\bbc-2\business\019.txt (1733, 2016-12-03)
Dataset\bbc-2\business\020.txt (3854, 2016-12-03)
Dataset\bbc-2\business\021.txt (2046, 2016-12-03)
Dataset\bbc-2\business\022.txt (1933, 2016-12-03)
Dataset\bbc-2\business\023.txt (1267, 2016-12-03)
Dataset\bbc-2\business\024.txt (1954, 2016-12-03)
Dataset\bbc-2\business\025.txt (2704, 2016-12-03)
Dataset\bbc-2\business\026.txt (1829, 2016-12-03)
Dataset\bbc-2\business\027.txt (1620, 2016-12-03)
Dataset\bbc-2\business\028.txt (1249, 2016-12-03)
Dataset\bbc-2\business\029.txt (2492, 2016-12-03)
Dataset\bbc-2\business\030.txt (2487, 2016-12-03)
Dataset\bbc-2\business\031.txt (1888, 2016-12-03)
Dataset\bbc-2\business\032.txt (1733, 2016-12-03)
Dataset\bbc-2\business\033.txt (1348, 2016-12-03)
... ...

## Authors
Deepthi Girish Singapura
Varsha Chandan Bellara
Srushti Singatagere Basavaraj
Shiva Shankar Bidadi Nanjundaswamy
## Project Title
Extractive Text Summarization with Lexical Chain, Modified TextRank, TextRank and TConspectus for News Articles
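
TextRank, one of the algorithms named in the title, scores sentences by running PageRank over a graph whose edge weights are sentence similarities; an extractive summary then keeps the highest-scoring sentences. The block below is a minimal sketch of that idea, not the project's `main.py`. The regex sentence splitter, the word-overlap similarity (modelled on the measure from the original TextRank paper), the damping factor of 0.85, and measuring the compression ratio in sentences are all assumptions, and the function names are hypothetical.

```python
import math
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence: str) -> set[str]:
    # Case-fold and keep alphanumeric runs only.
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a: set[str], b: set[str]) -> float:
    # Shared words normalised by log sentence lengths (TextRank-style measure).
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def textrank(sentences: list[str], damping: float = 0.85, iterations: int = 50) -> list[float]:
    toks = [words(s) for s in sentences]
    n = len(sentences)
    # Symmetric edge weights, no self-loops.
    w = [[similarity(toks[i], toks[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight leaving each sentence
    scores = [1.0] * n
    for _ in range(iterations):  # weighted PageRank by power iteration
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text: str, ratio: float = 0.2) -> list[str]:
    # Keep round(ratio * number of sentences) top-ranked sentences, in original order.
    sentences = split_sentences(text)
    if not sentences:
        return []
    k = max(1, round(ratio * len(sentences)))
    scores = textrank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

For example, `summarize(article_text, ratio=0.2)` would return roughly a fifth of the article's sentences. Returning the chosen sentences in their original order, rather than by score, keeps the summary readable as a news excerpt.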
## Dataset
The raw BBC datasets can be downloaded from: http://mlg.ucd.ie/datasets/bbc.html
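
The file list above shows the layout the project expects after download: one folder per news category (for example `Dataset\bbc-2\business`) holding numbered `.txt` articles. Below is a minimal sketch of loading that layout into memory. The function name `load_bbc` is hypothetical, and decoding as UTF-8 with replacement characters is an assumption, since the README does not state the files' encoding.

```python
from pathlib import Path


def load_bbc(root: str) -> dict[str, dict[str, str]]:
    """Map each category folder under `root` to {file name: article text}."""
    corpus: dict[str, dict[str, str]] = {}
    # Only directories count as categories, so stray files such as .DS_Store are skipped.
    for category in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # Articles are named 001.txt, 002.txt, ... inside each category folder.
        articles = {
            f.name: f.read_text(encoding="utf-8", errors="replace")
            for f in sorted(category.glob("*.txt"))
        }
        if articles:  # skip folders that contain no articles
            corpus[category.name] = articles
    return corpus
```

With the layout above, `load_bbc("Dataset/bbc-2")["business"]["001.txt"]` would hold the text of the first business article.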
## Preprocessing
Article tokenization
Case folding of tokens
Stop-word removal
Lemmatization and removal of non-alphanumeric characters

## Usage
LexRank: python3 main.py
PageRank: python3 main.py
Lexical Chain: python3 main.py
Hybrid: python3 summarizer.py
Sumy: python3 extractSummary.py
Evaluate Summaries: python3 documentComparator.py

## Algorithm Evaluation
Generated summaries at several compression ratios (10%, 20%, 30% and 40%)
Represented each document as a vector
Compared all algorithm-generated summaries using cosine similarity
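
The evaluation steps above amount to turning each text into a term vector and scoring pairs of texts by the cosine of the angle between their vectors. This is a minimal sketch, not the project's `documentComparator.py`: it uses raw term-frequency vectors, a small hypothetical stop-word list, and no lemmatization. The README does not say which text each summary is compared against, so the sketch just compares any two texts.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a full list.
STOP_WORDS = {"a", "an", "and", "are", "at", "in", "is", "of", "on", "the", "to"}


def preprocess(text: str) -> list[str]:
    # Case folding, non-alphanumeric removal and stop-word removal (no lemmatization).
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def to_vector(text: str) -> Counter:
    # Bag-of-words term-frequency vector.
    return Counter(preprocess(text))


def cosine(a: Counter, b: Counter) -> float:
    # cos(theta) = (a . b) / (|a| |b|); 0.0 when either vector is empty.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

A score such as `100 * cosine(to_vector(generated), to_vector(reference))` can then be reported as a percentage. Cosine similarity ignores vector length, so a short summary is not penalised simply for having fewer words than the text it is compared with.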
## Accuracy

| Compression ratio | LexRank (%) | PageRank (%) | Lexical Chain (%) | Hybrid Algorithm (%) |
|---|---|---|---|---|
| 10% | 52.57 | 51.18 | 74.36 | 61.45 |
| 20% | 63.12 | 61.91 | 79.95 | 70.86 |
| 30% | 72.31 | 71.11 | 84.21 | 77.91 |
| 40% | 78.59 | 78.13 | 86.*** | 83.87 |
