小白菜爱吃肉

Points: 40
Uploaded files: 1
Downloads: 0
Registered: 2019-06-03 18:44:36

Uploads
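review_LDA.rar - Uses LDA to extract n topics from an English corpus and reports which topic each document belongs to:
1) Preprocess the English review data: tokenization, part-of-speech tagging, removal of stop words and junk strings
2) Keep only nouns, adjectives, and verbs
3) Represent each review as a TF-IDF vector, dropping words whose frequency falls in the bottom 2%
4) Fit the LDA model
5) Extract n topics and output the keywords of each topic (sorted by importance)
6) For each review, report which topic it belongs to (along with its probability for every topic)
7) Count how many reviews fall under each topic
Dependencies: python3, NLTK, enchant, sklearn, numpy, pickle, etc.; see the code for details
Dataset: 80,000+ English reviews
Output:
topic #1: view night river light building nice walk day beautiful skyline visit evening amazing spectacular stroll time floor architecture people amaze modern top enjoy cruise look photo fantastic skyscraper awesome picture
topic #2: garden bike nice beautiful visit peaceful ride chinese walk ancient temple town time rent cycle gate china history bicycle building middle hour oasis quiet busy look enjoy hire lot architecture
topic #3: ...
Uploaded 2019-06-03 19:45:16, downloaded 0 times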