textbox_elastic_indexer
Category: Natural Language Processing
Development tool: Go
File size: 2112 KB
Downloads: 0
Upload date: 2017-07-28 13:59:04
Uploader: sh-1993
Description: Example of how to pre-process news articles with Textbox and index them in Elasticsearch
File list:
LICENSE (11357, 2017-07-28)
Makefile (226, 2017-07-28)
indexer.go (3672, 2017-07-28)
plots (0, 2017-07-28)
plots\dashboard.png (958751, 2017-07-28)
plots\keywords.png (752412, 2017-07-28)
plots\people.png (667719, 2017-07-28)
# Textbox helps you increase the relevance of your search
This is an example use case for Textbox: extracting keywords, places and people from news articles, so that any search over them returns more relevant results and the extracted fields can power visualizations.
For more background, read this blog post:
https://blog.machinebox.io/increase-the-power-of-your-search-engine-with-textbox-bd5f773a1410
![Dashboard](https://github.com/machinebox/textbox_elastic_indexer/blob/master/plots/dashboard.png)
## Requirements
* Textbox (https://machinebox.io/docs/textbox)
* Elasticsearch and Kibana (https://www.elastic.co)
## Dataset
We are going to use a subset of the BBCSport news dataset, which can be found here:
http://mlg.ucd.ie/datasets/bbc.html
http://mlg.ucd.ie/files/datasets/bbcsport-fulltext.zip
## Index the data
`indexer.go` pre-processes the articles using `textbox` and indexes the dataset into Elasticsearch.
You can download the dataset and run the indexer using the `Makefile`:
```
$ make run
```
Alternatively, you can do it manually:
```
# Get the dataset
$ wget http://mlg.ucd.ie/files/datasets/bbcsport-fulltext.zip
$ unzip bbcsport-fulltext.zip
# Run the indexer
$ go run indexer.go
```
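As a rough illustration of the indexing step, the sketch below builds the HTTP request that would store one pre-processed article. It is not taken from `indexer.go`. The helper name `newIndexRequest`, the base URL `http://localhost:9200` and the `/<index>/_doc/<id>` path (the form used by Elasticsearch 7 and later; older versions put a mapping type in that position) are assumptions. The index name `news_textbox` comes from the queries in this README.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// newIndexRequest is a hypothetical helper: it builds, but does not send,
// a PUT request that would store doc under the given id.
// Assumption: the "/<index>/_doc/<id>" endpoint of Elasticsearch 7+.
func newIndexRequest(baseURL, index, id string, doc interface{}) (*http.Request, error) {
	body, err := json.Marshal(doc)
	if err != nil {
		return nil, err
	}
	endpoint := fmt.Sprintf("%s/%s/_doc/%s", baseURL, url.PathEscape(index), url.PathEscape(id))
	req, err := http.NewRequest(http.MethodPut, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	doc := map[string]interface{}{
		"title":  "Radcliffe will compete in London",
		"places": []string{"London"},
		"people": []string{"Paula Radcliffe"},
	}
	// Assumed local Elasticsearch address.
	req, err := newIndexRequest("http://localhost:9200", "news_textbox", "123", doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To actually index the document, pass req to http.DefaultClient.Do,
	// check the response status code and close the response body.
}
```

Building the request separately from sending it keeps the sketch runnable without a live Elasticsearch instance.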
# Structure of the document in Elasticsearch
The raw articles are plain `.txt` files whose first line is the title. We extract the `title` from that line and use `textbox` to extract `keywords`, `places` and `people`.
Pre-processing with textbox gives each document more structure, as follows:
```
{
  "id": "123",
  "title": "Radcliffe will compete in London",
  "content": "Paula Radcliffe will compete in the Flora London Marathon...",
  "keywords": ["race director david bedford", "25th anniversary", "..."],
  "places": ["London"],
  "people": ["Paula Radcliffe"]
}
```
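A minimal sketch of how a raw article could be turned into a document of the shape above. It is an illustration, not the code in `indexer.go`. The names `Article`, `splitArticle` and `buildArticle` are hypothetical. The response shape decoded by `analysis` (a `sentences` list carrying `entities` with `text` and `type`, plus a `keywords` list) and the entity type names `place` and `person` are assumptions about what Textbox returns, so check them against the Textbox documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Article mirrors the example document in this README.
type Article struct {
	ID       string   `json:"id"`
	Title    string   `json:"title"`
	Content  string   `json:"content"`
	Keywords []string `json:"keywords"`
	Places   []string `json:"places"`
	People   []string `json:"people"`
}

// analysis is an ASSUMED shape of a Textbox response, reduced to the
// fields this sketch needs.
type analysis struct {
	Sentences []struct {
		Entities []struct {
			Text string `json:"text"`
			Type string `json:"type"`
		} `json:"entities"`
	} `json:"sentences"`
	Keywords []struct {
		Keyword string `json:"keyword"`
	} `json:"keywords"`
}

// splitArticle treats the first line as the title and the rest as content.
func splitArticle(raw string) (title, content string) {
	parts := strings.SplitN(strings.TrimSpace(raw), "\n", 2)
	title = strings.TrimSpace(parts[0])
	if len(parts) == 2 {
		content = strings.TrimSpace(parts[1])
	}
	return title, content
}

// appendUnique appends s only if list does not already contain it.
func appendUnique(list []string, s string) []string {
	for _, v := range list {
		if v == s {
			return list
		}
	}
	return append(list, s)
}

// buildArticle combines the raw text with an already fetched Textbox
// response (as JSON) into a structured Article.
func buildArticle(id, raw string, textboxJSON []byte) (Article, error) {
	var an analysis
	if err := json.Unmarshal(textboxJSON, &an); err != nil {
		return Article{}, err
	}
	title, content := splitArticle(raw)
	a := Article{ID: id, Title: title, Content: content}
	for _, k := range an.Keywords {
		a.Keywords = appendUnique(a.Keywords, k.Keyword)
	}
	for _, s := range an.Sentences {
		for _, e := range s.Entities {
			switch e.Type { // assumed entity type names
			case "place":
				a.Places = appendUnique(a.Places, e.Text)
			case "person":
				a.People = appendUnique(a.People, e.Text)
			}
		}
	}
	return a, nil
}

func main() {
	raw := "Radcliffe will compete in London\n\nPaula Radcliffe will compete in the Flora London Marathon..."
	// Hand-written sample in the assumed response shape, not real Textbox output.
	sample := []byte(`{"sentences":[{"entities":[
		{"text":"Paula Radcliffe","type":"person"},
		{"text":"London","type":"place"}]}],
		"keywords":[{"keyword":"25th anniversary"}]}`)
	a, err := buildArticle("123", raw, sample)
	if err != nil {
		panic(err)
	}
	out, _ := json.MarshalIndent(a, "", "  ")
	fmt.Println(string(out))
}
```

Deduplicating entities per article keeps `places` and `people` short when a name is mentioned several times, which suits exact-match queries and tag clouds.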
# Power up the search, getting more relevant results
Now that we have more structured data, we can run queries by place, using the `places` field:
```
GET news_textbox/_search
{
  "query": {
    "term": {
      "places.keyword": "London"
    }
  }
}
```
Or by people:
```
GET news_textbox/_search
{
  "query": {
    "term": {
      "people.keyword": "Paula Radcliffe"
    }
  }
}
```
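The same term queries can be generated from Go. The sketch below only builds the JSON body; `termQuery` is a hypothetical helper, and appending `.keyword` assumes the default dynamic mapping, in which Elasticsearch adds an exact-match `keyword` sub-field to each string field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// termQuery builds the body of an exact-match term query against the
// "<field>.keyword" sub-field, as in the queries above.
func termQuery(field, value string) ([]byte, error) {
	q := map[string]interface{}{
		"query": map[string]interface{}{
			"term": map[string]interface{}{
				field + ".keyword": value,
			},
		},
	}
	return json.Marshal(q)
}

func main() {
	body, err := termQuery("people", "Paula Radcliffe")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The resulting body can be sent to the `news_textbox/_search` endpoint with any HTTP client. Term queries are not analyzed, so the value has to match the stored entity exactly, including case.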
# Visualize with Kibana
You can also build visualizations to spot trends and get quick feedback.
## Tag clouds
### People tag cloud
![People](https://github.com/machinebox/textbox_elastic_indexer/blob/master/plots/people.png)
### Keywords tag cloud
![Keywords](https://github.com/machinebox/textbox_elastic_indexer/blob/master/plots/keywords.png)
### Dashboard
![Dashboard](https://github.com/machinebox/textbox_elastic_indexer/blob/master/plots/dashboard.png)