Intelligent Tagging - RESTful API

Intelligent Tagging - Tutorial

Last update: January 2020
Environment: Windows, Linux
Compilers: JDK 1.7 or greater
Prerequisites:
  • Java compiler, optional IDE, and an Open Calais access token (see Setup Steps)
  • Apache HTTP client (tested with http-components-client 4.4)
  • Java-json (optional)

Description

Our example opens and reads one or more input text files containing unstructured content, submits them for parsing and analysis, and creates one or more structured output text files containing the results of the analysis:

  • Our example picks up a file with the text to be tagged from inputDir (included with the tutorial).
  • Our example creates an XML file with the produced tags in outputDir (included with the tutorial).
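The flow described above can be sketched as a loop over the files in the input folder, with one planned output file per input file. This is only an illustration: the class name and the rule of appending ".xml" to the input file name are assumptions, not necessarily what the tutorial source does.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class TaggingPlanSketch {

    // Hypothetical naming rule: apireq.txt -> apireq.txt.xml in the output folder.
    static File outputFileFor(File input, File outputDir) {
        return new File(outputDir, input.getName() + ".xml");
    }

    // Returns the regular files found directly inside inputDir (no recursion).
    static List<File> inputFiles(File inputDir) {
        List<File> result = new ArrayList<File>();
        File[] children = inputDir.listFiles(); // null if inputDir is not a directory
        if (children != null) {
            for (File child : children) {
                if (child.isFile()) {
                    result.add(child);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File inputDir = new File(args.length > 0 ? args[0] : "inputDir");
        File outputDir = new File(args.length > 1 ? args[1] : "outputDir");
        // Print which output file each input file would produce.
        for (File input : inputFiles(inputDir)) {
            System.out.println(input + " -> " + outputFileFor(input, outputDir));
        }
    }
}
```

The sketch only plans the work; the actual tagging request is covered in the code review section.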

Please note that any language that supports HTTP can be used to implement the request.

Setup Steps

The steps include:

  • If you don't already have a free Open Calais access token, register for MyRefinitiv, and then log in to PermID.org with your credentials. An Open Calais access token is automatically e-mailed to you.
  • Review the example source code
  • Put your file(s) to be tagged into the inputDir folder (the tutorial includes a sample input text file called apireq.txt)
  • Make sure outputDir exists and is writable (the tutorial includes the inputDir and outputDir folders)
  • Build and run
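The folder checks in the setup steps can also be done programmatically before any request is sent. The following is a minimal sketch using java.nio.file; the class and method names are hypothetical and are not part of the tutorial source.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class FolderCheckSketch {

    // True if dir exists, is a directory, and can be read (suitable as inputDir).
    static boolean isReadableFolder(Path dir) {
        return Files.isDirectory(dir) && Files.isReadable(dir);
    }

    // True if dir exists, is a directory, and can be written to (suitable as outputDir).
    static boolean isWritableFolder(Path dir) {
        return Files.isDirectory(dir) && Files.isWritable(dir);
    }

    public static void main(String[] args) {
        Path inputDir = Paths.get(args.length > 0 ? args[0] : "inputDir");
        Path outputDir = Paths.get(args.length > 1 ? args[1] : "outputDir");
        System.out.println("inputDir readable:  " + isReadableFolder(inputDir));
        System.out.println("outputDir writable: " + isWritableFolder(outputDir));
    }
}
```

Failing early with a clear message is usually easier to diagnose than an I/O error raised after the response has already been received.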

Review the example code

The steps include:

  • Create HTTP client

    // create HTTP client
    HttpClientCalaisPost httpClientPost = new HttpClientCalaisPost();

  • Create PostMethod

    // specify end-point URL
    private static final String CALAIS_URL = "https://api-eit.refinitiv.com/permid/calais";

    PostMethod method = new PostMethod(CALAIS_URL);

  • Specify mandatory parameters

    method.setRequestHeader("X-AG-Access-Token", uniqueAccessKey);
    method.setRequestHeader("Content-Type", "text/raw");
    method.setRequestHeader("x-calais-selectiveTags", "company,person,industry,socialtags,topic");

    // Set response/output format
    method.setRequestHeader("outputformat", "xml/rdf" /*"application/json"*/);

  • Set request entity to be our input file

    method.setRequestEntity(new FileRequestEntity(file, null));
        
        
    
  • Execute the post method on the client and release connection

    try {
        int returnCode = client.executeMethod(method);
        if (returnCode == HttpStatus.SC_NOT_IMPLEMENTED) {
            System.err.println("The Post method is not implemented by this URI");
            // still consume the response body
            method.getResponseBodyAsString();
        } else if (returnCode == HttpStatus.SC_OK) {
            System.out.println("File post succeeded: " + file);
            saveResponse(file, method);
        } else {
            System.err.println("File post failed: " + file);
            System.err.println("Got code: " + returnCode);
            System.err.println("response: " + method.getResponseBodyAsString());
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        method.releaseConnection();
    }
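The walk-through above uses the PostMethod class from Commons HttpClient 3.1. For comparison, the same request can be sketched with the JDK's own java.net.HttpURLConnection, which is a different client from the one the tutorial uses. The URL and header names and values are taken from the steps above; the class name, method names and the responseFile parameter are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

final class CalaisPostSketch {

    static final String CALAIS_URL = "https://api-eit.refinitiv.com/permid/calais";

    // Collects the request headers from the tutorial steps in one place.
    static Map<String, String> headers(String accessToken, String outputFormat) {
        Map<String, String> headers = new LinkedHashMap<String, String>();
        headers.put("X-AG-Access-Token", accessToken);
        headers.put("Content-Type", "text/raw");
        headers.put("x-calais-selectiveTags", "company,person,industry,socialtags,topic");
        headers.put("outputformat", outputFormat); // "xml/rdf" or "application/json"
        return headers;
    }

    // Posts the input file and returns the HTTP status code.
    // On 200 the response body is written to responseFile.
    static int post(Path input, Path responseFile, String accessToken, String outputFormat)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(CALAIS_URL).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            for (Map.Entry<String, String> header : headers(accessToken, outputFormat).entrySet()) {
                conn.setRequestProperty(header.getKey(), header.getValue());
            }
            // The request body is the raw content of the input file.
            try (OutputStream body = conn.getOutputStream()) {
                Files.copy(input, body);
            }
            int status = conn.getResponseCode();
            if (status == HttpURLConnection.HTTP_OK) {
                try (InputStream in = conn.getInputStream()) {
                    Files.copy(in, responseFile, StandardCopyOption.REPLACE_EXISTING);
                }
            }
            return status;
        } finally {
            conn.disconnect(); // counterpart of method.releaseConnection()
        }
    }
}
```

A caller would invoke post(...) once per input file and treat any status other than 200 as a failure, mirroring the branches in the tutorial code.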

Build and run

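The quickest way to build and run is with an IDE such as Eclipse or NetBeans.

To build and run from the command line (the commands below use Windows classpath syntax; on Linux, separate the classpath entries with ":" instead of ";" and use "/" instead of "\" in paths):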

Build

    	
    javac -cp ".;prereqs\httpclient-4.4.jar;prereqs\commons-codec-1.10.jar;prereqs\commons-httpclient-3.1.jar;prereqs\commons-logging-1.2.jar" tr\test\*.java

Run

The program takes three arguments:

  1. Input folder name to process
  2. Output folder name to store the response from Calais
  3. Open Calais access token

    java -cp ".;prereqs\httpclient-4.4.jar;prereqs\commons-codec-1.10.jar;prereqs\commons-httpclient-3.1.jar;prereqs\commons-logging-1.2.jar" tr.test.HttpClientCalaisPost inputDir outputDir YOURTOKENGOESHERE
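A program that takes these three arguments would typically validate them before doing any work. The sketch below shows one way to do that; the class name, the validate helper and the messages are hypothetical, and the tutorial's own HttpClientCalaisPost may handle its arguments differently.

```java
final class ArgsSketch {

    static final String USAGE =
            "Usage: java tr.test.HttpClientCalaisPost <inputDir> <outputDir> <token>";

    // Returns null when the arguments look usable, otherwise a short error message.
    static String validate(String[] args) {
        if (args.length != 3) {
            return "Expected 3 arguments but got " + args.length;
        }
        if (args[2].trim().isEmpty()) {
            return "The access token must not be empty";
        }
        return null;
    }

    public static void main(String[] args) {
        String error = validate(args);
        if (error != null) {
            System.err.println(error);
            System.err.println(USAGE);
            return;
        }
        // The token is deliberately not printed.
        System.out.println("Input folder:  " + args[0]);
        System.out.println("Output folder: " + args[1]);
    }
}
```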

Understanding the input

The input file is plain text:

    	
            

This is a text about google, about Google is this text

 

Megan Smith

In September 2014, President Obama named Megan Smith the United States Chief Technology Officer (CTO) in the Office of Science and Technology Policy.  In this role, she serves as an Assistant to the President.  As U.S. CTO, Smith focuses on how technology policy and innovation can advance the future of our nation.

 

Megan Smith is an award-winning entrepreneur, engineer, and tech evangelist, most recently serving as a Vice President at Google[x], where she worked on a range of projects and co-created the company’s “SolveForX” innovation community project as well as its “WomenTechmakers” tech-diversity initiative.

Request URL:

    	
    https://api-eit.refinitiv.com/permid/calais

The request is an HTTP POST to this URL, which in this example consists only of the host and the path /permid/calais. The mandatory headers are included with the request (see the code), and the request body is the content of our input file. With this request we ask the Open Calais service at api-eit.refinitiv.com to tag the information contained within our file.

Understanding the expected output

This is the command-line output we receive when we submit the example file included with this tutorial for tagging:

    	
            

working on all files in C:\projects\Web\OpenCalais\OpenCalaisHTTP\inputDir

File post succeeded: inputDir\apireq.txt

This is an excerpt from the tagged file:

    	
            

<?xml version="1.0" encoding="UTF-8"?>

<!--Use of the Calais Web Service is governed by the Terms of Service located at http://www.opencalais.com. By using this service or the results of the service you agree to these terms of service.--><!--Relations: 

Company: General Magic, Google, Malala Fund, PlanetOut, apple japan, mit media lab, technology review, vital voices

Person: Megan Smith, Obama

--><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:c="http://s.opencalais.com/1/pred/"><rdf:Description c:calaisRequestID="b901d7cc-68bd-f17d-16f8-5e2508851e2f" c:id="http://id.opencalais.com/SR7fEPxwmQlEzUUmcTkKSw" c:ontology="http://trit-us-east-1-sharedamd.int.refinitiv.com/owlschema/13.0-rc2/onecalais.owl.allmetadata.xml" rdf:about="http://d.opencalais.com/dochash-1/1ebb8fc9-2538-3117-881b-661763e348e5"><rdf:type rdf:resource="http://s.opencalais.com/1/type/sys/DocInfo"/><c:document><![CDATA[This is a text about google, about Google is this text 

 

Megan Smith 

In September 2014, President Obama named Megan Smith the United States Chief Technology Officer (CTO) in the Office of Science and Technology Policy.  In this role, she serves as an Assistant to the President.  As U.S. CTO, Smith focuses on how technology policy and innovation can advance the future of our nation. 
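One way to inspect an xml/rdf response like the excerpt above is to parse it and list the rdf:type resources it contains. The following is a minimal sketch using the JDK's built-in DOM parser. The class name, the method name and the shortened SAMPLE document are made up for illustration; only the namespaces, the rdf:about value and the DocInfo type URI are taken from the excerpt.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RdfTypesSketch {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Heavily shortened stand-in for a real response (assumption for illustration).
    static final String SAMPLE =
            "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\" xmlns:c=\"http://s.opencalais.com/1/pred/\">"
            + "<rdf:Description rdf:about=\"http://d.opencalais.com/dochash-1/"
            + "1ebb8fc9-2538-3117-881b-661763e348e5\">"
            + "<rdf:type rdf:resource=\"http://s.opencalais.com/1/type/sys/DocInfo\"/>"
            + "</rdf:Description></rdf:RDF>";

    // Returns the rdf:resource value of every rdf:type element, in document order.
    static List<String> typeResources(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rdfXml)));
            NodeList types = doc.getElementsByTagNameNS(RDF_NS, "type");
            List<String> result = new ArrayList<String>();
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                result.add(type.getAttributeNS(RDF_NS, "resource"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse RDF/XML", e);
        }
    }

    public static void main(String[] args) {
        for (String resource : typeResources(SAMPLE)) {
            System.out.println(resource);
        }
    }
}
```

On a full response the same loop would list one type URI per rdf:Description element, which gives a quick overview of what was tagged.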

With the outputformat request header set to "application/json", the response is JSON instead. This is an excerpt from the tagged file:

"http:\/\/d.opencalais.com\/comphash-1\/30c4d804-1991-3421-8014-56d6bf13d59e":{
"_typeGroup":"entities",
"_type":"Company",
"forenduserdisplay":"false",
"name":"mit media lab",
"confidencelevel":"0.944",
"recognizedas":"name",
"_typeReference":"http:\/\/s.opencalais.com\/1\/type\/em\/e\/Company",
"instances":[{
"detection":"[  \n \nShe has served on the boards of MIT, ]MIT Media Lab[, MIT Technology Review, and Vital Voices; as a]",
"prefix":"  \n \nShe has served on the boards of MIT, ",
"exact":"MIT Media Lab",
"suffix":", MIT Technology Review, and Vital Voices; as a",
"offset":1710,
"length":13},
{"detection":"[she completed her master's thesis work at the ]MIT Media Lab[.  \n]",
"prefix":"she completed her master's thesis work at the ",
"exact":"MIT Media Lab",
"suffix":".  \n",
"offset":2060,
"length":13}],
"relevance":0.2,
"resolutions":[{
"name":"MIT Media Lab",
"permid":"5035087856",
"ispublic":"false",
"commonname":"MIT Media Lab",
"score":1,
"id":"https:\/\/permid.org\/1-5035087856"}],
"confidence":{
"statisticalfeature":"0.746",
"dblookup":"0.0",
"resolution":"1.0",
"aggregate":"0.944"}},
"http:\/\/d.opencalais.com\/pershash-1\/b90f7a7b-9e5a-3842-8972-2d854e6024e2":{
"_typeGroup":"entities",
"_type":"Person",
"forenduserdisplay":"true",
"name":"Megan Smith",
"persontype":"N\/A",
"nationality":"N\/A",
"confidencelevel":"0.806",
"commonname":"Megan Smith",
"confidence":{
"statisticalfeature":"0.516",
"dblookup":"0.95",
"resolution":"0.6655559",
"aggregate":"0.806"},
"resolutions":[{
"name":"Megan J Smith",
"personid":"1948800",
"paid":"34414846458",
"officerid":"2294220",
"commonname":"Megan Smith",
"score":0.6655559}],
"_typeReference":"http:\/\/s.opencalais.com\/1\/type\/em\/e\/Person",
"permid":"https:\/\/permid.org\/1-404011",
"instances":[{
"detection":"[2000 miles across the Australian outback.   \n \n]She[ has served on the boards of MIT, MIT Media Lab,]",
"prefix":"2000 miles across the Australian outback.   \n \n",
"exact":"She",
"suffix":" has served on the boards of MIT, MIT Media Lab,",
"offset":1673,
"length":3},
{"detection":"[Smith \nIn September 2014, President Obama named ]Megan Smith[ the United States Chief Technology Officer (CTO)]",
"prefix":"Smith \nIn September 2014, President Obama named ",
"exact":"Megan Smith",
"suffix":" the United States Chief Technology Officer (CTO)",
"offset":112,
"length":11},

In addition to identifying and tagging individual text strings, Open Calais further enriches your data with metadata tags designed to describe the text. Open Calais automatically analyzes your input text and performs the following processes:

  • Named Entity and Relationship Recognition
  • Aboutness Tagging
    • Social Tagging – Classifies the document based on Wikipedia folksonomy.
    • Category Tagging – Identifies the topics discussed in the document. The list of possible topics is defined by the Refinitiv Coding Services (RCS) and International Press Telecommunications Council (IPTC) taxonomies.
    • Industry Tagging – Identifies the industries related to the text. The list of industries that can be identified is defined by the TRBC Business Classification taxonomy.
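Returning to the JSON excerpt above: each entry in an instances array carries prefix, exact, suffix, offset and length fields. Judging from the excerpt, offset and length appear to be a character position and a character count in the submitted text, and detection appears to be the prefix and suffix in square brackets around the exact match. The sketch below illustrates that reading; it is an interpretation of the excerpt rather than documented behavior, and the class and method names are hypothetical.

```java
final class InstanceFieldsSketch {

    // Recovers the matched string from the submitted text, assuming offset and
    // length are a character position and a character count.
    static String exactAt(String documentText, int offset, int length) {
        return documentText.substring(offset, offset + length);
    }

    // Rebuilds the "detection" value from its parts: [prefix]exact[suffix].
    static String detection(String prefix, String exact, String suffix) {
        return "[" + prefix + "]" + exact + "[" + suffix + "]";
    }

    public static void main(String[] args) {
        // Values copied from the Person instance in the JSON excerpt.
        String prefix = "Smith \nIn September 2014, President Obama named ";
        String exact = "Megan Smith";
        String suffix = " the United States Chief Technology Officer (CTO)";

        // Stand-in for the submitted document; here the match starts right after the prefix.
        String text = prefix + exact + suffix;
        int offset = prefix.length();

        System.out.println(exactAt(text, offset, exact.length()));
        System.out.println(detection(prefix, exact, suffix));
    }
}
```

With the real input file, the offset and length values from the response would be used in place of the locally computed ones.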

Learn more

For more information, including developer guides, the FAQ, and release notes, see the documentation.