Document-Clustering-and-title-Generation

Category: Clustering algorithms
Development tool: HTML
File size: 48445KB
Downloads: 0
Upload date: 2017-10-23 09:20:38
Uploader: sh-1993
Description: A news document classifier with a supervised learning algorithm.

File list (size in bytes, date):
.idea (0, 2017-10-23)
.idea\dictionaries (0, 2017-10-23)
.idea\dictionaries\Pankaj_Kumar.xml (148, 2017-10-23)
.idea\inspectionProfiles (0, 2017-10-23)
.idea\inspectionProfiles\Project_Default.xml (404, 2017-10-23)
.idea\inspectionProfiles\profiles_settings.xml (235, 2017-10-23)
.idea\major_project.iml (828, 2017-10-23)
.idea\markdown-navigator.xml (4007, 2017-10-23)
.idea\markdown-navigator (0, 2017-10-23)
.idea\markdown-navigator\profiles_settings.xml (104, 2017-10-23)
.idea\misc.xml (687, 2017-10-23)
.idea\modules.xml (278, 2017-10-23)
.idea\workspace.xml (56695, 2017-10-23)
Final_report.pdf (1047069, 2017-10-23)
MID_SEM REPORT.pdf (248990, 2017-10-23)
RDRPOSTagger (0, 2017-10-23)
RDRPOSTagger\FullUsage.html (198318, 2017-10-23)
RDRPOSTagger\InitialTagger (0, 2017-10-23)
RDRPOSTagger\InitialTagger\InitialTagger.py (2035, 2017-10-23)
RDRPOSTagger\InitialTagger\InitialTagger4En.py (1990, 2017-10-23)
RDRPOSTagger\InitialTagger\InitialTagger4Vn.py (3251, 2017-10-23)
RDRPOSTagger\InitialTagger\__init__.py (0, 2017-10-23)
RDRPOSTagger\InitialTagger\__pycache__ (0, 2017-10-23)
RDRPOSTagger\InitialTagger\__pycache__\InitialTagger.cpython-35.pyc (1598, 2017-10-23)
RDRPOSTagger\InitialTagger\__pycache__\InitialTagger4En.cpython-35.pyc (1739, 2017-10-23)
RDRPOSTagger\InitialTagger\__pycache__\__init__.cpython-35.pyc (179, 2017-10-23)
RDRPOSTagger\License.txt (769, 2017-10-23)
RDRPOSTagger\Models (0, 2017-10-23)
RDRPOSTagger\Models\MORPH (0, 2017-10-23)
RDRPOSTagger\Models\MORPH\Bulgarian.DICT (1023364, 2017-10-23)
RDRPOSTagger\Models\MORPH\Bulgarian.RDR (120040, 2017-10-23)
RDRPOSTagger\Models\MORPH\Czech.DICT (4515316, 2017-10-23)
RDRPOSTagger\Models\MORPH\Czech.RDR (1448880, 2017-10-23)
RDRPOSTagger\Models\MORPH\Dutch.DICT (3112520, 2017-10-23)
RDRPOSTagger\Models\MORPH\Dutch.RDR (704059, 2017-10-23)
RDRPOSTagger\Models\MORPH\French.DICT (801096, 2017-10-23)
RDRPOSTagger\Models\MORPH\French.RDR (205984, 2017-10-23)
RDRPOSTagger\Models\MORPH\German.DICT (2451206, 2017-10-23)
RDRPOSTagger\Models\MORPH\German.RDR (1181638, 2017-10-23)
... ...

# Document Clustering and Title Generation

A news document classifier with a supervised learning algorithm, and title generation with machine learning.

Required Python libraries:

1. Pandas
2. SciPy
3. NumPy
4. scikit-learn
5. Pygubu
6. tkinter

To run: run the mainApp.py file.

How to use:

1. A simple GUI pops up after running mainApp.py.
2. Select the classifier from the drop-down.
3. Select the partition size for training and testing.
4. Select the stop-word removal condition.
5. Click the classify button.
6. Wait for the operation to complete. Once it completes, the result is displayed.

List of features used:

1. Language model features
2. Title length feature
3. Part-of-speech language model feature
4. N-gram match feature
5. Content selection feature

File features used:

1. Whether the word is in the first sentence
2. The range of the file in which a word occurs

Features from news titles:

1. POS trigram and its probability
2. Content score and its probability of occurrence
3. BLEU score and its probability of occurrence

Features used from news contents:

1. Current story word
2. Word bigram context: both sides, -1 and +1
3. POS of the current story word
4. POS bigram of the current word: both sides, -1 and +1
5. POS trigram of the current word: both sides, -1, -2 and +1, +2
6. Word position in the lead sentence
7. Word position
8. First word occurrence position
9. Word TF-IDF range
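The usage steps above amount to a standard supervised text-classification loop: split the labelled articles into training and test partitions, optionally drop stop words, fit the chosen classifier, and report its score. The README does not say which classifiers the drop-down offers or how documents are vectorized, so this is only a minimal sketch assuming scikit-learn with TF-IDF features; `classify`, `CLASSIFIERS`, and the toy `DOCS`/`LABELS` corpus are hypothetical names, not taken from `mainApp.py`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-in for the GUI's classifier drop-down.
CLASSIFIERS = {
    "naive_bayes": MultinomialNB,
    "linear_svm": LinearSVC,
    "logistic_regression": LogisticRegression,
}


def classify(docs, labels, classifier="naive_bayes", test_size=0.25,
             remove_stop_words=True, random_state=0):
    """Split, vectorize with TF-IDF, train, and return test-set accuracy."""
    # test_size mirrors the "partition size" option; stratify keeps class ratios.
    x_train, x_test, y_train, y_test = train_test_split(
        docs, labels, test_size=test_size,
        random_state=random_state, stratify=labels)
    # The stop-word option toggles scikit-learn's built-in English list.
    vectorizer = TfidfVectorizer(
        stop_words="english" if remove_stop_words else None)
    model = make_pipeline(vectorizer, CLASSIFIERS[classifier]())
    model.fit(x_train, y_train)
    return accuracy_score(y_test, model.predict(x_test))


# Toy labelled corpus (invented for illustration only).
DOCS = [
    "the striker scored a late goal in the football match",
    "the team won the cup final after extra time",
    "the coach praised the players after the victory",
    "fans celebrated the championship title in the stadium",
    "parliament passed the new budget bill today",
    "the minister announced an election date",
    "voters queued at polling stations across the country",
    "the senate debated the tax reform proposal",
]
LABELS = ["sport"] * 4 + ["politics"] * 4

if __name__ == "__main__":
    for name in CLASSIFIERS:
        print(name, classify(DOCS, LABELS, classifier=name))
```

Here `test_size` plays the role of the partition-size selector and `remove_stop_words` the stop-word condition; on a corpus this small the accuracy value itself carries no meaning.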
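Several of the file and content features listed above are simple positional and context properties of each word. As a hedged illustration, the sketch below computes three of them for every distinct word: whether it appears in the first (lead) sentence, the relative position of its first occurrence, and its immediate left and right neighbours (the -1/+1 bigram context). The regex sentence splitter and tokenizer, the `<s>`/`</s>` boundary markers, and the name `word_features` are assumptions; the project's own definitions may differ.

```python
import re


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Lowercase alphanumeric tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def word_features(text):
    """Map each distinct word to positional and context features."""
    sentences = [tokenize(s) for s in split_sentences(text)]
    tokens = [w for sent in sentences for w in sent]
    first_sentence = set(sentences[0]) if sentences else set()
    n = len(tokens)
    features = {}
    for i, word in enumerate(tokens):
        if word in features:
            continue  # features describe the first occurrence only
        features[word] = {
            "in_first_sentence": word in first_sentence,
            # Relative position of the first occurrence: 0.0 = start of text.
            "first_position": i / n,
            # Word bigram context on both sides (-1 and +1).
            "left_context": tokens[i - 1] if i > 0 else "<s>",
            "right_context": tokens[i + 1] if i + 1 < n else "</s>",
        }
    return features


if __name__ == "__main__":
    for word, feats in word_features("Rain hits the city. The city floods again.").items():
        print(word, feats)
```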
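The title features include a POS trigram probability. One common way to obtain such a probability is a maximum-likelihood trigram model over tag sequences, multiplying `P(t_i | t_{i-2}, t_{i-1})` along the title. The sketch below assumes the tags have already been produced (for instance by the bundled RDRPOSTagger) and applies no smoothing, so any unseen trigram gives probability 0; `train_trigram_model` and `trigram_probability` are hypothetical helpers, and the project may estimate these probabilities differently.

```python
from collections import Counter


def train_trigram_model(tag_sequences):
    """Count POS trigrams and their two-tag histories (maximum-likelihood counts)."""
    trigrams, histories = Counter(), Counter()
    for tags in tag_sequences:
        # Pad with start/end markers so the first and last tags get full context.
        padded = ["<s>", "<s>"] + list(tags) + ["</s>"]
        for i in range(2, len(padded)):
            trigrams[tuple(padded[i - 2:i + 1])] += 1
            histories[tuple(padded[i - 2:i])] += 1
    return trigrams, histories


def trigram_probability(tags, trigrams, histories):
    """Product of P(t_i | t_{i-2}, t_{i-1}); 0.0 if any history was never seen."""
    padded = ["<s>", "<s>"] + list(tags) + ["</s>"]
    prob = 1.0
    for i in range(2, len(padded)):
        history = tuple(padded[i - 2:i])
        if histories[history] == 0:  # Counter returns 0 for missing keys
            return 0.0
        prob *= trigrams[tuple(padded[i - 2:i + 1])] / histories[history]
    return prob


if __name__ == "__main__":
    tri, hist = train_trigram_model([["NNP", "VBZ", "NN"], ["NNP", "VBZ", "JJ"]])
    print(trigram_probability(["NNP", "VBZ", "NN"], tri, hist))
```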
