hotel-review-analysis: Sentiment analysis and aspect classification for hotel reviews using machine learning models with MonkeyLearn

  • Uploader: L8_393329
  • File size: 21.8KB
  • File format: zip
  • Upload date: 2022-06-08 09:37
hotel-review-analysis-master.zip
  • hotel-review-analysis-master/
      • scrapy.cfg (274B)
      • csv_monkey_converter.py (928B)
      • opinionTokenizer.py (2.5KB)
      • README.md (4.3KB)
      • classify_elastic/
          • classify_pipe.py (1.4KB)
          • Extract keywords.ipynb (3.7KB)
          • index_reviews.py (1.8KB)
          • queries/
              • Topic sentiment per city.json (1.1KB)
              • Overall sentiment per city.json (1KB)
              • Overall sentiment per hotel class.json (1.1KB)
          • index_opinion_units.py (1.3KB)
          • opinionTokenizer.py (2.6KB)
          • generate_files_for_indexing.py (1010B)
          • index_definition.json (1.2KB)
      • classify_and_plot_reviews.ipynb (6.2KB)
      • .gitignore (32B)
      • hotel_sentiment/
          • settings.py (3KB)
          • pipelines.py (294B)
          • __init__.py (0B)
          • items.py (1KB)
          • spiders/
              • booking_spider.py (3KB)
              • __init__.py (161B)
              • tripadvisor_spider_moreinfo.py (4KB)
              • booking_single_hotel_spider.py (1.9KB)
              • tripadvisor_spider.py (2.1KB)
Description
# Sentiment Analysis and Aspect Classification for Hotel Reviews

This is the source code for MonkeyLearn's series of posts on analyzing sentiment and aspects in hotel reviews using machine learning models.

This code runs in Python 2.7.

(May 2018 update: TripAdvisor and Booking.com have changed their sites considerably since these spiders were written, so the spiders no longer work. The blog posts and code are still useful as an example of how to build a Scrapy spider, but the examples themselves are no longer functional. We will probably fix the spiders in the future, since updating all the selectors is probably enough to get everything working again.)

### Code organization

The project itself is a Scrapy project that is used to gather training and testing data from sites such as TripAdvisor and Booking. In addition, there are several Python scripts and Jupyter notebooks that cover the other steps described in the posts.

### [Creating a sentiment analysis model with Scrapy and MonkeyLearn](https://blog.monkeylearn.com/creating-sentiment-analysis-model-with-scrapy-and-monkeylearn/)

The TripAdvisor spider (hotel_sentiment/spiders/tripadvisor_spider.py) is used to gather data to train a sentiment analysis classifier in MonkeyLearn. Review texts are used as the sample content and review stars are used as the category (1 and 2 stars = Negative, 4 and 5 stars = Positive).

To crawl ~15000 items from TripAdvisor use:

```sh
scrapy crawl tripadvisor -o itemsTripadvisor.csv -s CLOSESPIDER_ITEMCOUNT=15000
```

You can check out the generated machine learning sentiment analysis model [here](https://app.monkeylearn.com/categorizer/projects/cl_rZ2P7hbs/tab/main-tab).

### [Aspect Analysis from reviews using Machine Learning](https://blog.monkeylearn.com/aspect-analysis-from-reviews-using-machine-learning/)

The Booking spider (hotel_sentiment/spiders/booking_spider.py) is used to gather data to train an aspect classifier in MonkeyLearn. The data obtained with this spider can be manually tagged with each aspect (e.g. cleanliness, comfort & facilities, food, internet, location, staff, value for money) using MonkeyLearn's Sample tab or an external crowdsourcing service like Mechanical Turk.

To crawl from Booking use:

```sh
scrapy crawl booking -o itemsBooking.csv
```

You first have to add the URL of a starting city.

To crawl from a single hotel in Booking use:

```sh
scrapy crawl booking_singlehotel -o <hotel name>.csv
```

- ```opinionTokenizer.py``` is a simple script to obtain the "opinion units" from each review.
- ```classify_and_plot_reviews.ipynb``` is a simple notebook that uses the generated model to classify new reviews and then plots the results in a graph using Plotly.

You can check out the generated machine learning aspect classifier [here](https://app.monkeylearn.com/categorizer/projects/cl_TKb7XmdG/tab/main-tab).

### [Machine Learning over 1M hotel reviews finds interesting insights](https://blog.monkeylearn.com/machine-learning-1m-hotel-reviews-finds-interesting-insights/)

To crawl from TripAdvisor use:

```sh
scrapy crawl tripadvisor_more -a start_url="http://some_url" -o <hotel_name>.csv -s CLOSESPIDER_ITEMCOUNT=20000
```

Here ```start_url``` is the URL of a starting city to crawl from, such as https://www.tripadvisor.com/Hotels-g186338-London_England-Hotels.html.

The scripts and notebooks necessary to replicate the post are in the ```classify_elastic``` folder:

- ```classify_elastic/generate_files_for_indexing.py``` takes the csv file produced by Scrapy and generates two files that the other scripts use.
- ```classify_elastic/classify_pipe.py``` opens the ```opinion_units``` file, classifies it with MonkeyLearn according to topic and sentiment, and saves the results to a new csv file.
- ```classify_elastic/index_definition.json``` contains the mapping definitions used in ElasticSearch.
- ```classify_elastic/index_reviews.py``` indexes the reviews generated by ```generate_files_for_indexing.py``` into your ElasticSearch instance.
- ```classify_elastic/index_opinion_units.py``` indexes the classified opinion units into your ElasticSearch instance.
- ```classify_elastic/Extract keywords.ipynb``` shows how to extract keywords from the indexed data.

Finally, the ```queries``` folder contains some queries that were used to power the Kibana visualization.
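
As an illustration of the labelling rule used for the sentiment classifier (1 and 2 stars = Negative, 4 and 5 stars = Positive), here is a minimal sketch of how scraped rows could be turned into training samples. It is not taken from the repository: the function names and the `text` and `stars` field names are hypothetical, and skipping 3-star reviews is an assumption, since the README does not say how they are handled. The sketch is written for Python 3, whereas the repository targets Python 2.7.

```python
# Hypothetical helpers for illustration; not part of the original repository.

def stars_to_sentiment(stars):
    """Map a star rating to a sentiment label, or None if it has no label."""
    if stars in (1, 2):
        return "Negative"
    if stars in (4, 5):
        return "Positive"
    # Assumption: ratings outside the stated rule (e.g. 3 stars) are skipped.
    return None


def label_reviews(rows):
    """Return (text, label) pairs, dropping rows that get no label."""
    samples = []
    for row in rows:
        label = stars_to_sentiment(int(row["stars"]))
        if label is not None:
            samples.append((row["text"], label))
    return samples


# Made-up rows standing in for lines of the scraped csv file.
rows = [
    {"text": "Dirty room and rude staff.", "stars": "1"},
    {"text": "It was fine.", "stars": "3"},
    {"text": "Great location, lovely breakfast.", "stars": "5"},
]
samples = label_reviews(rows)
```

In practice the rows would come from the csv file written by the crawl, for example via `csv.DictReader`, with the field names adjusted to match the columns the spider actually emits.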