# **************IMPORTANT NOTICE ************
## As of __30 September__ 2015, the OpenCalais API on which this gem is built has been discontinued by Thomson Reuters. A new and significantly changed API is now in use by OpenCalais. You can read about the changes [here](http://www.opencalais.com/upgrade/). Unfortunately, this means that DoverToCalais is no longer functional. I don't know -at this stage- if or when I'll upgrade this gem to the new API. Thank you for your time and effort in using DoverToCalais.
# DoverToCalais
DoverToCalais allows the user to send a wide range of data sources (files & URLs)
to [OpenCalais](http://www.opencalais.com/about) and receive asynchronous responses when [OpenCalais](http://www.opencalais.com/about) has finished processing
the inputs. In addition, DoverToCalais enables response filtering in order to find relevant tags and/or tag values.
## What is OpenCalais?
In short -and quoting the [OpenCalais](http://www.opencalais.com/about) creators:
> "*The OpenCalais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing (NLP), machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.*"
In general, OpenCalais Simple XML Format (the one used by DoverToCalais) returns three kinds of tags: [Entities, Events](http://www.opencalais.com/documentation/calais-web-service-api/api-metadata/entity-index-and-definitions) and [Topics](http://www.opencalais.com/documentation/calais-web-service-api/api-metadata/document-categorization). ***Entities*** are static 'things', like Persons, Places, et al. that are involved in the textual context in some capacity. OpenCalais assigns a *relevance* score to each entity to indicate its relevance within the context of the data source's general topic. ***Events*** are facts or actions that pertain to one or more Entities. ***Topics*** are a characterisation or generic description of the data source's context.
We can use these tags and the information within them to extract relevant information from the data or to draw useful conclusions about it. For example, if the data source tags include an `<Event>` with the value of *'CompanyExpansion'*, I can then look for the `<City>` or `<Company>` tags to find out which company is expanding and if it's near my location (hint: they may be looking for more staff :)). Or, I could pick out all `<Company>`s involved in a `<JointVenture>`, or all `<Person>`s implicated in an `<Arrest>` in my `<City>`, etc.
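As a rough illustration of this kind of lookup, the sketch below parses a hand-written, simplified response with Ruby's REXML library rather than with DoverToCalais itself. The sample XML, the `SAMPLE_RESPONSE` constant and the `tag_values` helper are assumptions made for this example; a real OpenCalais response contains more elements and attributes.

```ruby
require 'rexml/document'

# Hand-written sample loosely modelled on the Simple XML Format.
# Element names and values are illustrative, not real OpenCalais output.
SAMPLE_RESPONSE = <<~XML
  <OpenCalaisSimple>
    <CalaisSimpleOutputFormat>
      <Company relevance="0.71">Acme Widgets</Company>
      <City relevance="0.42">Leeds</City>
      <Person relevance="0.18">Jane Doe</Person>
      <Event>CompanyExpansion</Event>
    </CalaisSimpleOutputFormat>
  </OpenCalaisSimple>
XML

# Hypothetical helper: returns the text of every element with the given name.
def tag_values(xml, tag_name)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//#{tag_name}").map { |element| element.text.strip }
end

# Only look up companies and cities when the event of interest is present.
if tag_values(SAMPLE_RESPONSE, 'Event').include?('CompanyExpansion')
  companies = tag_values(SAMPLE_RESPONSE, 'Company')
  cities    = tag_values(SAMPLE_RESPONSE, 'City')
  puts "Expanding: #{companies.join(', ')} (#{cities.join(', ')})"
end
```

The same helper could be pointed at `JointVenture` or `Arrest` events; only the tag names change.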
DoverToCalais, from version 0.2.1 onwards, also supports the OpenCalais rich [JSON Output format](http://www.opencalais.com/documentation/calais-web-service-api/interpreting-api-response/opencalais-json-output-format). This format returns relationships between entities, as well as the tags returned by the Simple XML format, thus allowing a deeper level of data analysis and detection.
## Why use OpenCalais?
There are many reasons, mainly to:
* incorporate tags into other applications, such as search, news aggregation, blogs, catalogs, etc.
* enrich search by looking for deeper, contextual meaning instead of merely phrases or keywords.
* help to discern relationships between semantic entities.
* facilitate data processing and analysis by allowing easy identification of relevant or important data sources and the discarding of irrelevant ones.
## DoverToCalais Features
1. **Multiple data source support**: Thanks to the power of [Yomu](https://github.com/Erol/yomu), DoverToCalais can process a vast range of files (and, of course, web pages), extract text from them and send
them to OpenCalais for analysis and tag generation.
2. **Asynchronous responses (callbacks)**:
Users can set callbacks to receive the processed meta-data, once the OpenCalais Web Service response has been received.
Furthermore, a user can set multiple callbacks for the same request (data source), thus enabling cleaner,
more modular code.
3. **Result filtering**: DoverToCalais uses the OpenCalais [Simple XML Format](http://www.opencalais.com/documentation/calais-web-service-api/interpreting-api-response/simple-format) as the preferred response format. The user can work directly with the XML-formatted response, or -if feeling a bit lazy- can take advantage of the DoverToCalais filtering functionality and receive specific entities, optionally based on specified conditions.
For more details of the features and code samples, see [Usage](#usage).
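The multiple-callback idea in feature 2 can be sketched in plain Ruby. This is not DoverToCalais's implementation; the `Request` class and its `on_response` and `complete` methods are hypothetical names used only to show how several independent handlers can be attached to one pending result.

```ruby
# Hypothetical stand-in for a pending request; not part of DoverToCalais.
class Request
  def initialize
    @callbacks = []
  end

  # Register a handler; may be called any number of times.
  def on_response(&block)
    @callbacks << block
    self
  end

  # Deliver the response to every registered handler, in registration order.
  def complete(response)
    @callbacks.each { |callback| callback.call(response) }
  end
end

log = []
request = Request.new
request.on_response { |response| log << "archive: #{response}" }
request.on_response { |response| log << "notify: #{response}" }

# Nothing has run yet; the handlers fire only once the response arrives.
request.complete('<OpenCalaisSimple/>')
puts log
```

Keeping each concern (archiving, notifying, and so on) in its own handler is what makes the calling code more modular.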
## Pre-requisites and dependencies
To use the OpenCalais Web Service and -by extension- DoverToCalais, one needs to possess an OpenCalais API key, which is easily obtainable from the [OpenCalais web site](http://www.opencalais.com/APIkey).
DoverToCalais requires the presence of a working [JRE](http://en.wikipedia.org/wiki/JRE#Execution_environment).
Also, if you're going to use the rich JSON output format, you'll need to have [Redis](http://redis.io/topics/quickstart) running on an accessible node.
## Installation
Add this line to your application's Gemfile:

    gem 'dover_to_calais'

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install dover_to_calais
## Compatibility
DoverToCalais has been developed in Ruby 1.9.3 and should work fine on post-1.9.3 MRI versions too. If anyone is successfully running it on other Ruby runtimes, please let me know.
## Usage
Using DoverToCalais is extremely simple.
### The Basics
As DoverToCalais uses the awesome-ness of [EventMachine](http://rubyeventmachine.com/), code must be placed within an EM *run* block:
```ruby
require 'eventmachine'
require 'dover_to_calais'

EM.run do
  # use Control + C to stop the EM
  Signal.trap('INT')  { EventMachine.stop }
  Signal.trap('TERM') { EventMachine.stop }

  # we need an API key to use OpenCalais
  DoverToCalais::API_KEY = 'my-opencalais-api-key'

  # create a new dover
  dover = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/world-africa-24412315')

  # parse the text and send it to OpenCalais
  dover.analyse_this
  puts 'do some stuff....'

  # set a callback for when we receive a response
  dover.to_calais { |response| puts response.error ? response.error : response }
  puts 'do some more stuff....'
end
```
This will produce the following result:
> do some stuff.... <br>
> do some more stuff.... <br>
> `<?xml version="1.0"?>` <br>
> `<OpenCalaisSimple>` <br>
> .......... <br>
> (the rest of the XML response from OpenCalais) <br>
As can be observed, the callback (#to_calais) is triggered after the rest of the code has been executed and only when the OpenCalais request has been completed.
Of course, we can analyse more than one source at a time:
```ruby
EM.run do
  # use Control + C to stop the EM
  Signal.trap('INT')  { EventMachine.stop }
  Signal.trap('TERM') { EventMachine.stop }

  DoverToCalais::API_KEY = 'my-opencalais-api-key'

  d1 = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/world-africa-24412315')
  d2 = DoverToCalais::Dover.new('/home/fred/Documents/RailsRecipes.pdf')
  d3 = DoverToCalais::Dover.new('//network-drive/annual_forecast.doc')

  d1.analyse_this
  d2.analyse_this
  d3.analyse_this
  puts 'do some stuff....'

  d1.to_calais { |response| puts response.error ? response.error : response }
  d2.to_calais { |response| puts response.error ? response.error : response }
  d3.to_calais { |response| puts response.error ? response.error : response }
  puts 'do some more stuff....'
end
```
This will output the two *puts* statements followed by the three callbacks (d1, d2, d3) in the order in which they are triggered, i.e. the first callback to receive a response from OpenCalais will fire first.
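The ordering described above can be illustrated without EventMachine or network access. The sketch below is a toy, single-threaded event loop with made-up latencies; `ToyLoop`, `respond_after` and the millisecond figures are assumptions for this example and say nothing about how long OpenCalais actually takes to respond.

```ruby
# Toy event loop with virtual time; hypothetical, not EventMachine.
class ToyLoop
  def initialize
    @pending = []
  end

  # Schedule a callback to fire after a simulated delay in milliseconds.
  def respond_after(delay_ms, &callback)
    @pending << [delay_ms, @pending.size, callback]
  end

  # Fire callbacks in order of simulated arrival (ties: scheduling order).
  def run
    @pending.sort_by { |delay_ms, sequence, _| [delay_ms, sequence] }
            .each { |_, _, callback| callback.call }
    @pending.clear
  end
end

def simulate
  output = []
  reactor = ToyLoop.new
  reactor.respond_after(300) { output << 'd1 response' }
  output << 'do some stuff....'
  reactor.respond_after(100) { output << 'd2 response' }
  reactor.respond_after(200) { output << 'd3 response' }
  output << 'do some more stuff....'
  reactor.run # callbacks only fire once the loop gets control
  output
end

puts simulate
```

With these simulated delays, the two "do some stuff" lines come first, followed by the responses in arrival order rather than in the order the requests were created.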
### Filtering the response
Why parse the response XML ourselves when DoverToCalais can do it