hotelReviewsMining
Category: Feature extraction
Development tool: Python
File size: 107088 KB
Downloads: 0
Upload date: 2018-12-28 04:17:59
Uploader: sh-1993
Description: Mining hotel reviews (from TripAdvisor and Yelp) to identify latent hotel aspects using algorithms like LDA, word2vec, n-gram, etc.
File list:
data (0, 2018-12-28)
data\LDA_OpinRank_TA_review_topics-20_words-30_pass-10_noun.txt (10287, 2018-12-28)
data\LDA_OpinRank_topics-20_NOUN-20_pass-20 (copy).txt (6969, 2018-12-28)
data\LDA_OpinRank_topics-20_NOUN-20_pass-20.txt (6969, 2018-12-28)
data\LDA_OpinRank_topics-20_NOUN-20_pass-20_CLEANED.txt (2941, 2018-12-28)
data\OpinRank_TA_review.txt.gz (37796887, 2018-12-28)
data\OpinRank_TA_review.txt.gz_w2v_kMeans_Topics-20_Nouns-20.txt (13508, 2018-12-28)
data\OpinRank_TA_review.txt.gz_w2v_kMeans_Topics-20_Nouns-20.txt.xlsx (19465, 2018-12-28)
data\OpinRank_TA_review.txt.gz_w2v_kMeans_Topics-20_Nouns-20_CLEANED.txt (8822, 2018-12-28)
data\OpinRank_TA_review_FULL.txt.gz (37796887, 2018-12-28)
data\TripAdvisor100.xls (28161024, 2018-12-28)
data\hotelInfo_TA.xlsx (13006, 2018-12-28)
data\hotelInfo_YELP100.xlsx (21123, 2018-12-28)
data\opinion-lexicon-English (0, 2018-12-28)
data\opinion-lexicon-English\negative-words.txt (44805, 2018-12-28)
data\opinion-lexicon-English\opinion-lexicon-English.rar (23669, 2018-12-28)
data\opinion-lexicon-English\positive-words.txt (21227, 2018-12-28)
data\opinion-lexicon-English\source.txt (56, 2018-12-28)
data\yelp_Review100_OLD.xlsx (18765037, 2018-12-28)
estimateHotelRank (0, 2018-12-28)
estimateHotelRank\aspect-LDAw2v_COSSimScore_TEST.py (25815, 2018-12-28)
estimateHotelRank\aspect-LDAw2v_sentimentScore_Test.py (24309, 2018-12-28)
estimateHotelRank\base-LDAw2v_CountBased.py (23938, 2018-12-28)
lda (0, 2018-12-28)
lda\lda.py (4256, 2018-12-28)
ndcg (0, 2018-12-28)
ndcg\0_ndgc_summary.csv (194, 2018-12-28)
ndcg\_2X_3gram_result_reviews0.xlsx (10078, 2018-12-28)
ndcg\ndcg.py (3079, 2018-12-28)
regression (0, 2018-12-28)
regression\_2X_3gram_ALL_HOTEL_TOPIC_VALUES_0_r.csv (23187, 2018-12-28)
regression\_2X_3gram_reviews0-200.png (291507, 2018-12-28)
regression\regression.py (4042, 2018-12-28)
regression\regressionResult.txt (254, 2018-12-28)
word2vec (0, 2018-12-28)
word2vec\code (0, 2018-12-28)
... ...
Mining hotel reviews (from TripAdvisor and Yelp) to identify latent hotel aspects using algorithms like LDA, word2vec, n-gram, etc.
Latent hotel aspects are identified using different algorithms. Hotel reviews from TripAdvisor and Yelp are taken as input.
0. data (OpinRank_TA_review.txt.gz has been reduced in size because GitHub restricts files larger than 50 MB)
1. LDA: topic modeling on the OpinRank hotel review dataset
2. word2vec: clustering the OpinRank hotel reviews with word2vec and the k-means algorithm to identify aspect clusters
3. estimateHotelRank:
3.1 Baseline method: count-based method using a basic implementation of LDA and word2vec (base-LDAw2v_CountBased.py)
3.2 Sentiment-based score: scoring method based on the TextBlob sentiment score (aspect-LDAw2v_sentimentScore_Test.py)
3.3 Cosine-similarity-based score: scoring method based on the cosine similarity of the hotel review with a positive word list and a negative word list (aspect-LDAw2v_COSSimScore_TEST.py)
4. ndcg: NDCG applied to the result of estimateHotelRank
5. regression: regression applied to the result given by estimateHotelRank (e.g. _2X_3gram_ALL_HOTEL_TOPIC_VALUES_0_r.csv) to estimate the aspect weights (i.e. the regression coefficients).
Once the weights are estimated, the final calculation is done using those coefficients in step 3 (estimateHotelRank).
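
The cosine-similarity scoring described in step 3 can be illustrated with a minimal sketch. It assumes a plain bag-of-words representation and combines the two similarities by subtraction; both choices are assumptions made for illustration, and the actual script may represent reviews differently (for example with word2vec vectors). The function names and the tiny word lists are hypothetical; the real lists live in data\opinion-lexicon-English.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_a * norm_b)


def polarity_score(review, positive_words, negative_words):
    """Assumed combination: similarity to positives minus similarity to negatives."""
    review_vec = bag_of_words(review)
    pos_sim = cosine_similarity(review_vec, Counter(positive_words))
    neg_sim = cosine_similarity(review_vec, Counter(negative_words))
    return pos_sim - neg_sim


# Hypothetical miniature word lists, for illustration only.
POSITIVE = ["clean", "friendly", "great"]
NEGATIVE = ["dirty", "rude"]

good = polarity_score("The room was clean and the staff was friendly", POSITIVE, NEGATIVE)
bad = polarity_score("Dirty room and rude staff", POSITIVE, NEGATIVE)
```

A review that shares more words with the positive list than with the negative list gets a score above zero, and the reverse gives a score below zero.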
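
For step 4, the following is a minimal sketch of NDCG, assuming the common formulation with linear gain and a log2 rank discount. The variant used in ndcg.py is not stated in this README, so treat the gain function, the function names, and the optional cutoff `k` as assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances_in_predicted_order, k=None):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking.

    `relevances_in_predicted_order` holds the true relevance (for example the
    true hotel rating) of each hotel, listed in the order the model ranked them.
    """
    rels = list(relevances_in_predicted_order)
    ideal = sorted(rels, reverse=True)  # best possible ordering
    ideal_dcg = dcg(ideal[:k])
    if ideal_dcg == 0:
        return 0.0  # no relevant items at all
    return dcg(rels[:k]) / ideal_dcg


perfect = ndcg([3, 2, 1])   # predicted order matches the ideal order
reversed_ = ndcg([1, 2, 3])  # worst ordering of the same hotels
```

A ranking identical to the ideal ordering scores 1, and any other ordering of the same relevance values scores lower.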
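
Step 5 and the final weighted calculation can be sketched with ordinary least squares. This is an illustration under stated assumptions, not the code in regression.py: it fits a linear model without an intercept, solves the normal equations in pure Python, and uses made-up aspect scores. The names `fit_aspect_weights` and `weighted_score` are hypothetical.

```python
def fit_aspect_weights(X, y):
    """Least-squares weights w solving (X^T X) w = X^T y, with no intercept.

    X: one row per hotel, one column per aspect score.
    y: the target value per hotel (for example its true rating).
    """
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * target for row, target in zip(X, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("aspect scores are collinear; weights are not unique")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [rv - factor * cv for rv, cv in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def weighted_score(weights, aspect_scores):
    """Final hotel score: aspect scores combined with the fitted weights."""
    return sum(w * s for w, s in zip(weights, aspect_scores))


# Made-up data: four hotels, two aspects, targets generated with weights (2, 3).
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [2, 3, 5, 7]
weights = fit_aspect_weights(X, y)
final = weighted_score(weights, [1, 1])
```

In practice a library routine such as a least-squares solver would replace the hand-written elimination; the point here is only that the regression coefficients act as per-aspect weights in the final score.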