hotel-scrapy (Pudn.com)

Category: Data collection / web crawlers
Development tool: JavaScript
File size: 21KB
Downloads: 0
Upload date: 2018-03-18 15:59:09
Uploader: sh-1993

Description: hotel-scrapy, a simple Node.js web scraper for hotel data

File list:

.babelrc (109, 2018-03-18)
.flowconfig (58, 2018-03-18)
configs (0, 2018-03-18)
configs\index.js (92, 2018-03-18)
configs\trip-advisor.js (170, 2018-03-18)
package.json (556, 2018-03-18)
scraper.js (511, 2018-03-18)
src (0, 2018-03-18)
src\index.js (2967, 2018-03-18)
trip-advisor_london.json (10349, 2018-03-18)
yarn.lock (51906, 2018-03-18)

# Hotel Scrapy

Simple Node.js web scraper for scraping hotel data from websites.

Recursively finds all properties passed to the configuration, searching for each property a maximum of 10 times.

#### Prerequisites

* Install [Node.js @ >= v8.9.1](https://nodejs.org/en/)
* Install [Git](https://git-scm.com/downloads)

#### Setup

```
$ npm i
```

#### Usage

```
$ npm start
```

### Config

Configs are stored in the `configs/` folder and are organised by website. Inside each website's configuration file, e.g. `configs/trip-advisor.js`, you will see an object that looks something like the following:

```
module.exports = {
  filename: '...',
  base: '...',
  cities: {
    london: '...'
  }
}
```

The `base` property should contain the domain of the website you wish to scrape, and the `cities` property should contain the URL paths of the list pages you wish to scrape.

The `filename` property is the name of the output file. Before writing a new file, the scraper reads the previous file and merges the results, so each generated file also contains the output of previous runs.

Currently, only a config for Trip Advisor is set up.

### Output

The generated output is stored in the root of the project in JSON format (JavaScript Object Notation, key-value pairs).

In future this will be output in a different format. For now, if you wish to view the data as a table, search Google for a JSON-to-table formatter and paste in the contents of the file.

## Step by Step

1. Install Node.js - see [prerequisites](#prerequisites)
2. Install Git - see [prerequisites](#prerequisites)
3. From the GitHub page, copy the Git URL found by clicking the green "Clone or Download" button in the top right
    * Ensure you copy the HTTPS link, not the SSH link
    * Your link should look like this: `https://github.com/Jahans3/hotel-scrapy.git`
4. Open the command prompt (Windows) or terminal (macOS, Linux)
5. Within the command prompt/terminal, navigate to the folder where you wish to store the scraper
6. Type the following command (leave out the `$`): `$ git clone `
7. You should now see a folder named `hotel-scrapy/`
8. Still within the command prompt/terminal, navigate to `hotel-scrapy/`
9. Once inside `hotel-scrapy/`, run the following commands:
    * `$ npm install`
    * `$ npm start`
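
#### Config and merge sketch

To make the Config section more concrete, here is a minimal sketch of how a config object of that shape could be turned into list-page URLs, and how a new run could be merged into the previous output file. This is not the code in `src/index.js`. The helper names `buildCityUrls` and `mergeWithPrevious`, the example domain and path, and the assumption that the output is a flat JSON object keyed by hotel are all hypothetical and used for illustration only.

```javascript
const fs = require('fs');
const { URL } = require('url');

// Hypothetical config with the same shape as the example in the Config section.
// The filename, domain and path are placeholders, not the project's real values.
const exampleConfig = {
  filename: 'example_london.json',
  base: 'https://www.example.com',
  cities: {
    london: '/hotels/london'
  }
};

// Resolve each path in `cities` against `base`, giving one list-page URL per city.
function buildCityUrls(config) {
  const urls = {};
  Object.keys(config.cities).forEach(city => {
    urls[city] = new URL(config.cities[city], config.base).href;
  });
  return urls;
}

// Read the previous output file (if it exists), merge the new results on top,
// and write the combined object back. Assumes the file holds a flat JSON object;
// on a key clash the newer value wins.
function mergeWithPrevious(filePath, results) {
  let previous = {};
  if (fs.existsSync(filePath)) {
    previous = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  }
  const merged = Object.assign({}, previous, results);
  fs.writeFileSync(filePath, JSON.stringify(merged, null, 2));
  return merged;
}

console.log(buildCityUrls(exampleConfig));
// { london: 'https://www.example.com/hotels/london' }
```

Calling `mergeWithPrevious(exampleConfig.filename, newResults)` after each run would keep earlier entries in the file, which matches the behaviour described for `filename`. How the real scraper resolves clashes between old and new entries is not stated in the README, so the "newer value wins" rule above is only one plausible choice.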
