hotel-scrapy
Category: Data collection / web crawlers
Development tool: JavaScript
File size: 21KB
Downloads: 0
Upload date: 2018-03-18 15:59:09
Uploader: sh-1993
Description: hotel-scrapy
File list:
.babelrc (109, 2018-03-18)
.flowconfig (58, 2018-03-18)
configs (0, 2018-03-18)
configs\index.js (92, 2018-03-18)
configs\trip-advisor.js (170, 2018-03-18)
package.json (556, 2018-03-18)
scraper.js (511, 2018-03-18)
src (0, 2018-03-18)
src\index.js (2967, 2018-03-18)
trip-advisor_london.json (10349, 2018-03-18)
yarn.lock (51906, 2018-03-18)
# Hotel Scrapy
Simple Node.js web scraper for scraping hotel data from websites. It recursively searches for every property specified in the configuration, trying each property at most 10 times.
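The bounded recursive search described above could look roughly like the following. This is only a sketch: `findProperty`, `findAll`, `lookup` and `MAX_ATTEMPTS` are hypothetical names chosen for illustration and are not taken from `src/index.js`, and the real scraper may define an "attempt" differently.

```javascript
// Hypothetical sketch: names and structure are assumptions, not the project's code.
const MAX_ATTEMPTS = 10; // the limit of 10 searches per property stated above

// Recursively calls `lookup` for `property` until it returns a value or the
// attempt limit is reached. `lookup` stands in for whatever extracts a value
// from a page and returns undefined when nothing is found.
function findProperty(lookup, property, attempt = 1) {
  const value = lookup(property, attempt);
  if (value !== undefined) return value;
  if (attempt >= MAX_ATTEMPTS) return null; // give up: not found within the limit
  return findProperty(lookup, property, attempt + 1);
}

// Collects every requested property into a single result object.
function findAll(lookup, properties) {
  const result = {};
  for (const property of properties) {
    result[property] = findProperty(lookup, property);
  }
  return result;
}

// Example: a fake lookup that only finds "name", and only on the third attempt.
const fakeLookup = (property, attempt) =>
  property === 'name' && attempt === 3 ? 'Example Hotel' : undefined;

console.log(findAll(fakeLookup, ['name', 'phone']));
```

In this sketch a property that is never found ends up as `null` rather than stopping the whole run, which is one plausible way to keep partial results.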
#### Prerequisites
* Install [Node.js @ >= v8.9.1](https://nodejs.org/en/)
* Install [Git](https://git-scm.com/downloads)
#### Setup
```
$ npm i
```
#### Usage
```
$ npm start
```
### Config
Configs are stored in the `configs/` folder and are organised by website.
Inside each website's configuration file, e.g. `configs/trip-advisor.js`, you will see an object that looks something like the following:
```
module.exports = {
  filename: '...',
  base: '...',
  cities: {
    london: '...'
  }
}
```
The `base` property should contain the domain of the website you wish to scrape, and the `cities` property should map each city to the URL path of the list page you wish to scrape.
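To illustrate how `base` and `cities` might combine into the URLs that get scraped, here is a minimal sketch. The config values and the `listPageUrls` helper are hypothetical (the real values in `configs/trip-advisor.js` are elided above), and the sketch assumes `base` includes the scheme, e.g. `https://`.

```javascript
const { URL } = require('url');

// Hypothetical values for illustration only; not the project's real config.
const config = {
  filename: 'example-output',
  base: 'https://www.example.com',
  cities: {
    london: '/hotels/london'
  }
};

// Resolves each city's path against `base` to get the list-page URL.
function listPageUrls({ base, cities }) {
  return Object.entries(cities).map(([city, path]) => ({
    city,
    url: new URL(path, base).href
  }));
}

console.log(listPageUrls(config));
```

Adding another city would then only require another entry under `cities`.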
The `filename` property will be the name of the output file. Before writing a new file, the scraper reads the previous file and merges the results, so each generated file will also contain the output of the previous runs.
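The read-then-merge behaviour could be sketched as below. The shape of the output file is not documented here, so this assumes an array of hotel objects with a unique `name` field. That assumption, along with the helper names `readPrevious`, `mergeResults` and `writeMerged`, is hypothetical and may not match the actual implementation.

```javascript
const fs = require('fs');

// Returns the previously saved results, or an empty array on the first run.
function readPrevious(file) {
  if (!fs.existsSync(file)) return [];
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}

// Merges old and new records keyed by the (assumed) `name` field, so a hotel
// scraped in several runs appears once, with newer fields overriding older ones.
function mergeResults(previous, current) {
  const byName = new Map();
  for (const hotel of [...previous, ...current]) {
    byName.set(hotel.name, { ...byName.get(hotel.name), ...hotel });
  }
  return [...byName.values()];
}

// Reads the old file, merges in the new results, and writes the combined output.
function writeMerged(file, current) {
  const merged = mergeResults(readPrevious(file), current);
  fs.writeFileSync(file, JSON.stringify(merged, null, 2));
  return merged;
}
```

With this approach, deleting the output file before a run is the way to start from a clean slate.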
Currently, only a config for Trip Advisor is set up.
### Output
The generated output is stored in the root of the project in JSON format (JavaScript Object Notation, key-value pairs). In future this will be output in a different format. For now, if you wish to view it as a table, search for a JSON-to-table formatter and paste in the contents of the file.
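As an alternative to an online formatter, a short script can turn the JSON into CSV, which opens as a table in any spreadsheet program. The sketch below assumes the output is an array of flat objects, which may not match the real file. `toCsv` is a hypothetical helper, not part of the project.

```javascript
// Converts an array of flat objects into CSV text, one row per object.
function toCsv(records) {
  // Collect every key that appears, in first-seen order, for the header row.
  const columns = [];
  for (const record of records) {
    for (const key of Object.keys(record)) {
      if (!columns.includes(key)) columns.push(key);
    }
  }
  // Quote values containing commas, quotes or newlines; missing values become empty.
  const escape = value => {
    const text = value === undefined || value === null ? '' : String(value);
    return /[",\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
  };
  const header = columns.map(escape).join(',');
  const rows = records.map(record =>
    columns.map(column => escape(record[column])).join(',')
  );
  return [header, ...rows].join('\n');
}

// Example with made-up records.
console.log(toCsv([
  { name: 'Example Hotel, London', rating: 4 },
  { name: 'Another Hotel', phone: '000' }
]));
```

To use it on real output, the file contents could be loaded with `JSON.parse(fs.readFileSync(...))` and the result written to a `.csv` file.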
## Step by Step
1. Install Node.js - see [prerequisites](#prerequisites)
2. Install Git - see [prerequisites](#prerequisites)
3. From the GitHub page, copy the Git URL found by clicking the green "Clone or Download" button in the top right
   * Ensure you copy the HTTPS link, not the SSH link
   * Your link should look like this: `https://github.com/Jahans3/hotel-scrapy.git`
4. Open the command prompt (Windows) or terminal (macOS, Linux)
5. Within the command prompt/terminal, navigate to the folder where you wish to store the scraper
6. Type the following command, replacing `<URL>` with the link you copied in step 3 (leave out the `$`): `$ git clone <URL>`
7. You should now see a folder named `hotel-scrapy/`
8. Still within command prompt/terminal, navigate to `hotel-scrapy/`
9. Once inside `hotel-scrapy/`, run the following commands:
   * `$ npm install`
   * `$ npm start`