hotelReviewsMining

Category: Feature extraction
Development tool: Python
File size: 107088 KB
Downloads: 0
Upload date: 2018-12-28 04:17:59
Uploader: sh-1993
Description: Mining hotel reviews (from TripAdvisor and Yelp) to identify latent hotel aspects using algorithms such as LDA, word2vec, and n-grams.

File list (size in bytes, date):
data (0, 2018-12-28)
data\LDA_OpinRank_TA_review_topics-20_words-30_pass-10_noun.txt (10287, 2018-12-28)
data\LDA_OpinRank_topics-20_NOUN-20_pass-20 (copy).txt (6969, 2018-12-28)
data\LDA_OpinRank_topics-20_NOUN-20_pass-20.txt (6969, 2018-12-28)
data\LDA_OpinRank_topics-20_NOUN-20_pass-20_CLEANED.txt (2941, 2018-12-28)
data\OpinRank_TA_review.txt.gz (37796887, 2018-12-28)
data\OpinRank_TA_review.txt.gz_w2v_kMeans_Topics-20_Nouns-20.txt (13508, 2018-12-28)
data\OpinRank_TA_review.txt.gz_w2v_kMeans_Topics-20_Nouns-20.txt.xlsx (19465, 2018-12-28)
data\OpinRank_TA_review.txt.gz_w2v_kMeans_Topics-20_Nouns-20_CLEANED.txt (8822, 2018-12-28)
data\OpinRank_TA_review_FULL.txt.gz (37796887, 2018-12-28)
data\TripAdvisor100.xls (28161024, 2018-12-28)
data\hotelInfo_TA.xlsx (13006, 2018-12-28)
data\hotelInfo_YELP100.xlsx (21123, 2018-12-28)
data\opinion-lexicon-English (0, 2018-12-28)
data\opinion-lexicon-English\negative-words.txt (44805, 2018-12-28)
data\opinion-lexicon-English\opinion-lexicon-English.rar (23669, 2018-12-28)
data\opinion-lexicon-English\positive-words.txt (21227, 2018-12-28)
data\opinion-lexicon-English\source.txt (56, 2018-12-28)
data\yelp_Review100_OLD.xlsx (18765037, 2018-12-28)
estimateHotelRank (0, 2018-12-28)
estimateHotelRank\aspect-LDAw2v_COSSimScore_TEST.py (25815, 2018-12-28)
estimateHotelRank\aspect-LDAw2v_sentimentScore_Test.py (24309, 2018-12-28)
estimateHotelRank\base-LDAw2v_CountBased.py (23938, 2018-12-28)
lda (0, 2018-12-28)
lda\lda.py (4256, 2018-12-28)
ndcg (0, 2018-12-28)
ndcg\0_ndgc_summary.csv (194, 2018-12-28)
ndcg\_2X_3gram_result_reviews0.xlsx (10078, 2018-12-28)
ndcg\ndcg.py (3079, 2018-12-28)
regression (0, 2018-12-28)
regression\_2X_3gram_ALL_HOTEL_TOPIC_VALUES_0_r.csv (23187, 2018-12-28)
regression\_2X_3gram_reviews0-200.png (291507, 2018-12-28)
regression\regression.py (4042, 2018-12-28)
regression\regressionResult.txt (254, 2018-12-28)
word2vec (0, 2018-12-28)
word2vec\code (0, 2018-12-28)
... ...

Latent hotel aspects are identified using different algorithms, with hotel reviews from TripAdvisor and Yelp taken as input.

0. data: input datasets (OpinRank_TA_review.txt.gz has been reduced in size because GitHub warns against files larger than 50 MB).
1. LDA: topic modeling on the OpinRank hotel review dataset.
2. word2vec: clustering OpinRank hotel reviews with word2vec and the k-means algorithm.
3. estimateHotelRank:
   3.1 Baseline method: count-based method using a basic implementation of LDA and word2vec (base-LDAw2v_CountBased.py).
   3.2 Sentiment-based score: scoring method based on the TextBlob sentiment score (aspect-LDAw2v_sentimentScore_Test.py).
   3.3 Cosine-similarity-based score: scoring method based on the cosine similarity of each hotel review with a positive word list and a negative word list (aspect-LDAw2v_COSSimScore_TEST.py).
4. ndcg: NDCG applied to the results of estimateHotelRank.
5. regression: regression applied to the results of estimateHotelRank (e.g. _2X_3gram_ALL_HOTEL_TOPIC_VALUES_0_r.csv) to estimate the aspect weights (i.e. the regression coefficients). Once the weights are estimated, the final calculation is done in estimateHotelRank (step 3) using those coefficients.
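To make the LDA step (item 1) concrete, here is a minimal, dependency-free sketch of LDA fitted with collapsed Gibbs sampling. It is not the repository's lda/lda.py, which may well use a library implementation; the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy review tokens are all assumptions for illustration.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, top_n=3, seed=0):
    """Hypothetical helper: fit LDA to tokenized docs by collapsed Gibbs sampling.

    Returns the top_n most frequent words assigned to each topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topics = range(num_topics)
    ndk = [[0] * num_topics for _ in docs]      # tokens per (document, topic)
    nkw = [defaultdict(int) for _ in topics]    # tokens per (topic, word)
    nk = [0] * num_topics                       # tokens per topic
    z = []                                      # current topic of every token

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts...
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # ...then resample its topic from the full conditional
                # p(z = t | rest) ∝ (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Most frequent words per topic (ignoring words whose count dropped to 0).
    return [
        sorted((w for w in nkw[k] if nkw[k][w] > 0), key=lambda w: -nkw[k][w])[:top_n]
        for k in topics
    ]


# Toy, already-tokenized reviews (hypothetical data).
reviews = [
    "room bed shower room bed",
    "bed room pillow shower",
    "staff desk service staff",
    "service staff friendly desk",
]
docs = [review.split() for review in reviews]
for t, words in enumerate(lda_gibbs(docs, num_topics=2)):
    print(f"topic {t}: {words}")
```

With real data the input would be the review corpus (the data file names suggest nouns only were kept), and the top words per topic play the role of candidate hotel aspects.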
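The word2vec step (item 2) groups word vectors into clusters with k-means. A minimal sketch of the clustering half, assuming the vectors already exist: the hand-made 2-D vectors and words below are hypothetical stand-ins for vectors from a trained word2vec model, and the farthest-first initialisation is a choice made here for determinism, not something the source specifies.

```python
import math


def kmeans(vectors, k, iterations=100):
    """Lloyd's k-means with deterministic farthest-first initialisation.

    Returns (labels, centroids); labels[i] is the cluster index of vectors[i].
    """
    # Farthest-first init: start from the first vector, then repeatedly add the
    # vector farthest from all centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy 2-D "embeddings"; real word2vec vectors typically have 100 or more dimensions.
words = ["bed", "pillow", "mattress", "staff", "reception", "concierge"]
vectors = [(0.9, 0.1), (0.85, 0.2), (0.95, 0.15), (0.1, 0.9), (0.2, 0.85), (0.15, 0.95)]

labels, _ = kmeans(vectors, k=2)
clusters = {}
for word, label in zip(words, labels):
    clusters.setdefault(label, []).append(word)
print(clusters)  # → {0: ['bed', 'pillow', 'mattress'], 1: ['staff', 'reception', 'concierge']}
```

Each resulting cluster of nearby words can then be read as one latent aspect (here, roughly "bedding" and "front desk").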
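The count-based baseline (item 3.1) is only named, not specified, in the description. One plausible reading, sketched here under that assumption: count how many review tokens fall into each aspect's keyword set. The `ASPECTS` keyword sets and the `count_aspects` helper are hypothetical; in the repository the keywords would presumably come from the LDA/word2vec topic files under data/.

```python
import re

# Hypothetical aspect keyword sets (stand-ins for the LDA / word2vec topic words).
ASPECTS = {
    "room": {"room", "bed", "bathroom", "shower"},
    "service": {"staff", "service", "reception", "desk"},
    "location": {"location", "walk", "metro", "station"},
}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def count_aspects(reviews, aspects=ASPECTS):
    """Count how many tokens across all reviews belong to each aspect."""
    counts = {name: 0 for name in aspects}
    for review in reviews:
        for token in tokenize(review):
            for name, keywords in aspects.items():
                if token in keywords:
                    counts[name] += 1
    return counts


reviews = [
    "Great location, a short walk to the metro station.",
    "The room was small but the bed was comfortable. Staff were friendly.",
]
print(count_aspects(reviews))  # → {'room': 2, 'service': 1, 'location': 4}
```

Raw counts favour hotels with many reviews; dividing by the number of reviews or tokens is an obvious refinement if hotels are to be compared.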
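For the cosine-similarity score (item 3.3), a minimal sketch: treat the review and each word list as bag-of-words count vectors and subtract the similarity to the negative list from the similarity to the positive list. The subtraction rule and the helper names (`cos_sentiment`, `load_lexicon`) are assumptions, as is the lexicon file format (one word per line with `;`-prefixed header lines); the small word sets stand in for data/opinion-lexicon-English/positive-words.txt and negative-words.txt.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")


def load_lexicon(path):
    """Read a word list with one word per line.

    Skips blank lines and lines starting with ';' (assumed header/comment
    format of data/opinion-lexicon-English/*.txt).
    """
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return {s for s in (line.strip() for line in fh) if s and not s.startswith(";")}


def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cos_sentiment(review, positive, negative):
    """Hypothetical scoring rule: > 0 leans positive, < 0 leans negative."""
    doc = Counter(TOKEN_RE.findall(review.lower()))
    return cosine(doc, Counter(positive)) - cosine(doc, Counter(negative))


# Tiny stand-ins for positive-words.txt / negative-words.txt.
positive = {"clean", "friendly", "great", "comfortable"}
negative = {"dirty", "rude", "noisy", "broken"}

print(cos_sentiment("Clean room and friendly staff", positive, negative))  # ≈ 0.447
print(cos_sentiment("Dirty bathroom, noisy street", positive, negative))   # → -0.5
```

With the full lexicons (thousands of words each) the individual similarities become very small, so the sign and relative size of the difference matter more than its absolute value.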
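Item 4 evaluates the estimated hotel ranking with NDCG. A minimal sketch of the standard linear-gain form, DCG@k = Σ rel_i / log2(i + 1) over ranks i = 1..k, normalised by the DCG of the ideal ordering; whether ndcg/ndcg.py uses this gain or the exponential 2^rel − 1 variant is not stated in the source, and the relevance values below are hypothetical.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) for ranks i = 1..k."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg(relevances, k=None):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical ground-truth relevance (e.g. the hotels' actual ratings),
# listed in the order produced by an estimated ranking.
ranked_relevance = [3, 2, 3, 0, 1, 2]
print(round(dcg(ranked_relevance), 3))   # → 6.861
print(round(ndcg(ranked_relevance), 3))  # → 0.961
```

An NDCG of 1.0 means the estimated order matches the ideal order exactly; passing `k` restricts the evaluation to the top-k hotels.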
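Item 5 estimates aspect weights as regression coefficients and then reuses them for the final hotel score. A minimal ordinary-least-squares sketch with an intercept, solved through the normal equations (XᵀX)b = Xᵀy; the source does not say which regression variant regression/regression.py uses, and `fit_ols`, `hotel_score`, and all data below are hypothetical.

```python
def fit_ols(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    Returns [b0, b1, ..., bm] so that y ≈ b0 + b1*x1 + ... + bm*xm.
    Assumes X^T X is invertible (no perfectly collinear aspects).
    """
    rows = [[1.0, *map(float, x)] for x in X]  # prepend the intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def hotel_score(values, coef):
    """Final score: intercept plus the weighted sum of aspect values."""
    return coef[0] + sum(w * v for w, v in zip(coef[1:], values))


# Hypothetical per-hotel aspect values (e.g. room, service) and overall ratings.
aspects = [(0.2, 0.4), (0.4, 0.2), (0.6, 0.8), (0.8, 0.6), (1.0, 1.0)]
ratings = [2.0, 2.1, 3.4, 3.5, 4.5]

coef = fit_ols(aspects, ratings)
print([round(c, 3) for c in coef])  # → [1.0, 2.0, 1.5]

# Rank unseen hotels with the learned weights.
new_hotels = {"hotel_a": (0.9, 0.3), "hotel_b": (0.5, 0.9)}
ranking = sorted(new_hotels, key=lambda h: hotel_score(new_hotels[h], coef), reverse=True)
print(ranking)  # → ['hotel_b', 'hotel_a']
```

The toy ratings were generated exactly from 1.0 + 2.0·room + 1.5·service, so the fit recovers those weights; real aspect scores are noisy and the coefficients would only approximate the aspects' influence.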
