Intelligent Tagging - RESTful API
Want an overview of how Intelligent Tagging processes and tags documents, plus a few output examples? Read "How Does Intelligent Tagging Work".
- Open Calais (the free, limited version of Intelligent Tagging) - Register for MyRefinitiv, then log in to PermID.org with your credentials; a free access token is emailed to you automatically.
- Hosted Premium Intelligent Tagging - Request an API Key.
- On Premise Intelligent Tagging - Install Intelligent Tagging. (The Installation and Administration Guide is bundled with the software.)
Not sure which Intelligent Tagging package to use? Read this...
Forming the API Call
The API request to tag content is made via a simple HTTP REST interface.
- Free Open Calais:
- Hosted Premium Intelligent Tagging:
- On Premise Intelligent Tagging:
POST http://<HOST SERVER IP or HOSTNAME>/tag/rs/enrich
- Intelligent Tagging for internal customers who connect through API Gateway:
- Internal Intelligent Tagging (for internal customers who do not connect through API Gateway):
Please contact us for a link to the tagging method.
Tagging requests must always be sent to port 80. (Intelligent Tagging administration services run on port 8080, which is not used for tagging.)
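The On Premise endpoint pattern and the port rule can be captured in a small helper. This is a minimal Python sketch, not part of any Intelligent Tagging SDK: the function name `onprem_enrich_url` is hypothetical, and it only builds the URL string from the `POST http://<HOST SERVER IP or HOSTNAME>/tag/rs/enrich` pattern above, refusing any explicit port other than 80.

```python
from urllib.parse import urlsplit

# Tagging path for On Premise Intelligent Tagging, as given above.
ENRICH_PATH = "/tag/rs/enrich"
# Tagging requests go to port 80; port 8080 serves administration only.
TAGGING_PORT = 80


def onprem_enrich_url(host: str) -> str:
    """Build the On Premise tagging URL for `host` (an IP or hostname).

    Hypothetical helper: rejects an explicit port other than 80 so that a
    request is never sent to the administration port by mistake.
    """
    parts = urlsplit(f"//{host}")  # parse "host[:port]" without a scheme
    if not parts.hostname:
        raise ValueError("missing host")
    if parts.port not in (None, TAGGING_PORT):
        raise ValueError(
            f"tagging requests must use port {TAGGING_PORT}, got {parts.port}"
        )
    # Port 80 is the HTTP default, so it is omitted from the URL.
    return f"http://{parts.hostname}{ENRICH_PATH}"
```

For example, `onprem_enrich_url("10.0.0.5")` yields `http://10.0.0.5/tag/rs/enrich`, while `onprem_enrich_url("10.0.0.5:8080")` raises `ValueError`.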
Mandatory Request Headers
- Content-Type: Indicates the input MIME type. Valid values: text/html, text/xml, text/raw, application/pdf. PDF input is supported for premium users only; PDF files are submitted as binary streams.
- x-ag-access-token: Relevant only to free Open Calais and Hosted Premium Intelligent Tagging. The value of this header is your license key.
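The two mandatory headers can be assembled as in the following sketch. The helper `mandatory_headers` is hypothetical; it checks the MIME type (ignoring parameters such as `charset`) against the valid values listed above and adds `x-ag-access-token` only when a token is supplied, i.e. for free Open Calais and Hosted Premium Intelligent Tagging.

```python
from typing import Dict, Optional

# Valid Content-Type values listed above (application/pdf is premium-only).
VALID_CONTENT_TYPES = {"text/html", "text/xml", "text/raw", "application/pdf"}


def mandatory_headers(content_type: str, access_token: Optional[str] = None) -> Dict[str, str]:
    """Return the mandatory request headers for a tagging call (hypothetical helper).

    Pass `access_token` for free Open Calais or Hosted Premium Intelligent
    Tagging; leave it as None for On Premise deployments.
    """
    # Compare only the MIME type, ignoring parameters like "; charset=utf-8".
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime not in VALID_CONTENT_TYPES:
        raise ValueError(f"unsupported Content-Type: {content_type}")
    headers = {"Content-Type": content_type}
    if access_token is not None:
        headers["x-ag-access-token"] = access_token
    return headers
```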
A full set of optional headers is also available for customizing the Intelligent Tagging workflow to your business needs. Read about all supported input headers.
Tip: By default, the tagging response is returned in RDF format. You can use the outputFormat header to get the tagging response in JSON format instead.
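A tagging request can then be sent with Python's standard `urllib.request`. In this sketch the helper names are hypothetical, the `outputFormat` value `application/json` is an assumption (check the supported input headers reference for the exact value), and the 60-second client timeout is an arbitrary choice; the server may still return its own timeout error for very complex documents.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: "application/json" is the outputFormat value that requests JSON.
JSON_OUTPUT = "application/json"


def build_tag_request(url: str, body: bytes, headers: Dict[str, str],
                      want_json: bool = True) -> urllib.request.Request:
    """Prepare (but do not send) a POST tagging request."""
    all_headers = dict(headers)
    if want_json:
        # Without this header the response is returned as RDF.
        all_headers["outputFormat"] = JSON_OUTPUT
    return urllib.request.Request(url, data=body, headers=all_headers, method="POST")


def parse_tag_response(raw: bytes, want_json: bool = True) -> Any:
    """Decode the response body: parsed JSON, or RDF as a string."""
    text = raw.decode("utf-8")
    return json.loads(text) if want_json else text


def tag(url: str, body: bytes, headers: Dict[str, str],
        want_json: bool = True, timeout: float = 60.0) -> Any:
    """Send the request and return the decoded tagging response."""
    req = build_tag_request(url, body, headers, want_json)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_tag_response(resp.read(), want_json)
```

Keeping request construction and response parsing separate from `tag` makes both parts testable without a running server.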
The input size limitation applies to the entire document, including the body and XML tags, but excluding the HTTP headers. A submission that exceeds the input size limit is not processed, and an error message is returned.
Note: The size limitation defines the maximum file size that the system can process. However, processing time depends on the complexity of the text within the file, and a timeout error may be generated if a file is too complex (contains too many entities and relations) to be processed within the time limit.
- Free Open Calais - The maximum input size is 100 KB per request (measured in kilobytes, not characters) for all supported input file types (raw text, XML, HTML).
- Hosted Premium Intelligent Tagging - The maximum input size is 500 KB per request (measured in kilobytes, not characters) for all supported input file types (raw text, XML, HTML, PDF). If you require support for larger input files, please contact us.
- Intelligent Tagging On Premise - The maximum file size per request is:
  - HTML - 45 MB
  - XML - 1.5 MB
  - raw text - 1.5 MB
  - PDF - 45 MB
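A client-side pre-check can reject oversized submissions before they reach the server. This sketch assumes 1 KB = 1024 bytes and 1 MB = 1024 * 1024 bytes (the text does not specify), uses hypothetical deployment labels, and expects a bare MIME type without a `charset` parameter. Because the limits are in bytes rather than characters, it measures the encoded body.

```python
# Per-request input size limits, in bytes, taken from the text above.
# Assumption: 1 KB = 1024 bytes and 1 MB = 1024 * 1024 bytes.
KB = 1024
MB = 1024 * KB

TEXT_TYPES = ("text/raw", "text/xml", "text/html")

# Hypothetical deployment labels; each maps MIME type -> max body size.
LIMITS = {
    "open_calais": {t: 100 * KB for t in TEXT_TYPES},
    "hosted_premium": {t: 500 * KB for t in TEXT_TYPES + ("application/pdf",)},
    "on_premise": {
        "text/html": 45 * MB,
        "text/xml": 3 * MB // 2,  # 1.5 MB
        "text/raw": 3 * MB // 2,  # 1.5 MB
        "application/pdf": 45 * MB,
    },
}


def check_input_size(deployment: str, mime: str, body: bytes) -> None:
    """Raise ValueError if `body` is over the limit for `deployment` and `mime`.

    The limit covers the whole document (text plus XML/HTML tags) but not the
    HTTP headers, so the encoded body is measured in bytes, not characters.
    A body exactly at the limit is treated as acceptable (an assumption).
    """
    limits = LIMITS[deployment]
    if mime not in limits:
        raise ValueError(f"{mime} is not accepted by {deployment}")
    size = len(body)
    if size > limits[mime]:
        raise ValueError(
            f"{size} bytes exceeds the {limits[mime]}-byte limit for {mime} on {deployment}"
        )
```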
Intelligent Tagging On Premise - Each instance of Intelligent Tagging On Premise supports up to 4 concurrent requests. If you are interested in processing a higher volume of data, please contact us.
(This limit does not apply to Hosted Premium Intelligent Tagging or free Open Calais.)
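To stay within the four-concurrent-request limit of a single On Premise instance, a client can cap the size of its worker pool. This is a minimal sketch using `concurrent.futures.ThreadPoolExecutor`; `tag_all` is a hypothetical helper, and `tag_one` stands for whatever function sends a single tagging request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Each On Premise instance supports up to 4 concurrent requests.
MAX_CONCURRENT_REQUESTS = 4


def tag_all(documents: Iterable[T], tag_one: Callable[[T], R],
            max_workers: int = MAX_CONCURRENT_REQUESTS) -> List[R]:
    """Tag documents in parallel with at most `max_workers` requests in flight.

    Each worker thread runs one `tag_one` call at a time, so the pool size is
    the upper bound on concurrent requests. Results keep the input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(tag_one, documents))
```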
- We recommend that before submission, you remove from the input document any redundant or irrelevant text (such as ads, disclaimers, repeated generic text such as “contact customer support for further advice…,” trademarks, etc.).
- Text content should be UTF-8 encoded; otherwise, specify the charset in the Content-Type header, e.g. text/xml; charset=utf-8.
- If your text includes accented characters, for example "Ségolène Royal", and you do not set the encoding to UTF-8, Intelligent Tagging strips these characters from the output, corrupting the original text.
- Intelligent Tagging expects URL-encoded arguments to be encoded using UTF-8. HttpClient defaults to a different encoding, so you must instruct it to use UTF-8 for proper URL-encoding of your arguments.
- To optimize tagging of text files, define the x-calais-DocumentTitle header.
- For binary documents (e.g. PDF), the HTTP body should contain the binary stream.
- To optimize tagging of research reports (PDF format only), also define the x-calais-contentClass header.
- Please make sure that your PDF files contain text objects; Intelligent Tagging does not extract text from images in PDF files.
- If you are tagging non-English languages, it is highly recommended to use the x-calais-language header to override the automatic language detection functionality.
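Several of these recommendations translate directly into how a text request is prepared. The sketch below uses hypothetical helper names (`prepare_text_request`, `utf8_url_encode`): it encodes the body as UTF-8, declares the charset, sets x-calais-DocumentTitle and x-calais-language only when given, and shows UTF-8 URL-encoding with `urllib.parse.quote`. Valid x-calais-language values are not listed here; consult the supported input headers reference.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import quote


def prepare_text_request(
    text: str,
    mime: str = "text/raw",
    title: Optional[str] = None,
    language: Optional[str] = None,
) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for a text submission (hypothetical helper)."""
    # Declare the charset explicitly so accented characters are not stripped.
    headers = {"Content-Type": f"{mime}; charset=utf-8"}
    if title is not None:
        # Python's http.client encodes header values as Latin-1 when sending.
        headers["x-calais-DocumentTitle"] = title
    if language is not None:
        # Overrides automatic language detection (recommended for non-English).
        headers["x-calais-language"] = language
    # The body is sent as UTF-8 bytes, matching the declared charset.
    return headers, text.encode("utf-8")


def utf8_url_encode(value: str) -> str:
    """Percent-encode `value` using UTF-8, as Intelligent Tagging expects."""
    return quote(value, safe="", encoding="utf-8")


print(utf8_url_encode("Ségolène"))  # S%C3%A9gol%C3%A8ne
```

Note that "Ségolène Royal" is 14 characters but 16 bytes in UTF-8, which is also why the size limits above are counted in bytes.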