How does Intelligent Tagging work?

Intelligent Tagging automatically analyzes your input text and performs the following processes:

  • Named Entity and Relationship Recognition – Intelligent Tagging identifies and tags mentions (text strings) of things like companies, people, deals, geographical locations, industries, physical assets, organizations, products, events, etc., based on a list of predefined metadata types.
  • Aboutness Tagging – Intelligent Tagging assigns tags (topic tags, social tags, industry tags, slugline tags) that describe what the input document is about as a whole.

In this overview, we illustrate the Intelligent Tagging response in the JSON format. For a detailed RDF output format example, see RDF Response - A Sample RDF Output file with Explanation.

 

Named Entity and Relationship Recognition

During processing, Intelligent Tagging automatically scans and analyzes the input text, searching for mentions of things like companies, people, cities, industries, products, deals, alliances, company earnings announcements, company layoffs, IPOs, stock splits, business relationships, etc.

Intelligent Tagging classifies mentions of straightforward things like companies, people, cities, telephone numbers, etc. as Entities; more complex mentions that indicate relationships between things are classified as Relations. Some examples of relations are: deals, IPOs, analyst recommendations, company reorganizations, product recalls.

For the complete list of Intelligent Tagging Entity and Relation types, see the API User Guide.

Intelligent Tagging outputs the following named entity and relationship tags:

  • Instance Tag – Each mention found by Intelligent Tagging is expressed as an Instance tag.
  • Entity Markup Tag – Each group of one or more instances deemed to refer to a unique thing is expressed as an Entity Markup tag. (For example, multiple mentions of the same person will generate a single Entity Markup tag of the type Person; multiple mentions of the same company will generate a single Entity Markup tag of the type Company; multiple mentions of the same deal will generate a single Entity Markup tag of the type Deal; etc.) This is what we call the “extracted entity” or the “extracted relation.”
  • Relevance Tag – A tag that indicates how central the extracted entity or relation is to the containing document.
  • Confidence Tag – A tag that indicates the likelihood that an extracted entity (for example, a company or person) is indeed a company or person. (Please note that the entity markup tag itself also displays the confidence score. You can get the confidence score from either tag, according to your preference.)
  • Disambiguation Tag – Intelligent Tagging attempts to map an extracted entity or relation to the corresponding entity and unique ID in the relevant Refinitiv dataset. If the mapping is successful, a Disambiguation tag is generated. The mapping is what enables all the instances, extracted entities, and extracted relations that refer to the same thing to be unambiguously identified (and thus linked) across all documents processed by Intelligent Tagging.

Note: In the JSON output format, all of the tags related to an extracted entity or relation are nested within the entity markup tag. 

Instance Tag

Each mention of a predefined entity or relation type found by Intelligent Tagging is expressed as an Instance tag in the output file. The Instance tag describes the mention. It includes the “found” text string itself, the surrounding text, and the offset and length of the text string.

Each instance is assigned a unique ID.

For example, Intelligent Tagging found the following mentions of Tim Cook, the CEO of Apple, Inc., in an article about the anticipated launch of the Apple Watch:

“All Eyes on Apple’s Cook as Watch Launch Expected”

    	
            

 "instances": [

            {

                "detection": "[\n<Title> All eyes on Apple's ]Cook[ as Watch launch expected</Title> \n<Body> Edwin]",

                "prefix": "\n<Title> All eyes on Apple's ",

                "exact": "Cook",

                "suffix": " as Watch launch expected</Title> \n<Body> Edwin",

                "offset": 40,

                "length": 4

            },

“Apple Inc Chief Executive Officer Tim Cook on Monday is expected to announce details of the first product developed under his leadership, a watch that Apple hopes will transform the market of wearable technology.”

    	
            

    {
        "detection": "[9 (Reuters) - Apple Inc Chief Executive Officer ]Tim Cook[ on Monday is expected to announce details of the]",
        "prefix": "9 (Reuters) - Apple Inc Chief Executive Officer ",
        "exact": "Tim Cook",
        "suffix": " on Monday is expected to announce details of the",
        "offset": 199,
        "length": 8
    },

“Apple Inc Chief Executive Officer Tim Cook on Monday is expected to announce details of the first product developed under his leadership, a watch that Apple hopes will transform the market of wearable technology.”

    	
            

 

    {
        "detection": "[details of the first product developed under ]his[ leadership, a watch that Apple hopes will]",
        "prefix": "details of the first product developed under ",
        "exact": "his",
        "suffix": " leadership, a watch that Apple hopes will",
        "offset": 287,
        "length": 3
    },

“Apple will have to ‘tweak’ its stores to handle the watch, Cook told the Telegraph newspaper recently.”

    	
            

    {
        "detection": "[have to “tweak” its stores to handle the watch, ]Cook[ told the Telegraph newspaper recently. \n \nCook]",
        "prefix": "have to “tweak” its stores to handle the watch, ",
        "exact": "Cook",
        "suffix": " told the Telegraph newspaper recently. \n \nCook",
        "offset": 2079,
        "length": 4
    },
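Judging by the samples above, the detection string appears to be the prefix and suffix, each in square brackets, wrapped around the exact text, and length matches the size of the exact text. The following Python sketch checks that reading against one of the sample instances. The helper names rebuild_detection and exact_span are ours and are not part of the Intelligent Tagging API; the interpretation of offset as the start position of the mention in the processed document is an assumption.

```python
def rebuild_detection(instance: dict) -> str:
    """Recombine prefix, exact and suffix the way the sample 'detection' strings look."""
    return "[" + instance["prefix"] + "]" + instance["exact"] + "[" + instance["suffix"] + "]"


def exact_span(instance: dict) -> tuple[int, int]:
    """Return (start, end) positions of the mention.

    Assumption: 'offset' is the start of the mention in the processed
    document and 'length' is the number of characters in 'exact'.
    """
    start = instance["offset"]
    return start, start + instance["length"]


# Instance copied from the sample output above.
instance = {
    "detection": "[9 (Reuters) - Apple Inc Chief Executive Officer ]Tim Cook[ on Monday is expected to announce details of the]",
    "prefix": "9 (Reuters) - Apple Inc Chief Executive Officer ",
    "exact": "Tim Cook",
    "suffix": " on Monday is expected to announce details of the",
    "offset": 199,
    "length": 8,
}

print(rebuild_detection(instance) == instance["detection"])
print(exact_span(instance))
```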

 

Entity Markup Tag

Intelligent Tagging identifies the instances that refer to the same thing and links them to each other. Each group of instances deemed to refer to a unique thing (e.g. one or more mentions of the same company, or one or more mentions of the same person or the same deal) results in a single Entity Markup tag in the output file. This is what we call the “extracted entity” or the “extracted relation.”

Note that the original mentions do not have to be identical text strings in order to be recognized as referring to the same thing. For example, Tim Cook may be referred to as “Tim Cook,” “Timothy Cook,” “Timothy Donald Cook,” “Mr. Cook,” “Cook,” and even “he” or “his.” Each mention that refers to Tim Cook is expressed as an Instance in the output file; Intelligent Tagging identifies the instances that refer to the same thing (in this case, Tim Cook), and outputs a single extracted entity or relation, expressed in the output file as an entity markup tag.

The person entity (em/e/person) tag extracted for Tim Cook:

    	
            

 "http://d.opencalais.com/pershash-1/e4808181-2cd0-3670-b992-7467229ba691": {

        "_typeGroup": "entities",

        "_type": "Person",

        "forenduserdisplay": "true",

        "name": "Tim Cook",

        "persontype": "economic",

        "nationality": "N/A",

        "confidencelevel": "0.999",

        "commonname": "Tim Cook",

        "_typeReference": "http://s.opencalais.com/1/type/em/e/Person",

        "permid": "https://permid.org/1-404011",

Every entity markup tag has one or more related instances.

Intelligent Tagging assigns a unique ID (a hash tag) to the extracted entity. In this example, the hash tag for the extracted entity, Tim Cook, is pershash-1/e4808181-2cd0-3670-b992-7467229ba691. In the RDF output format, the same hash tag is displayed by the "subject" attribute of all the instance tags that identify mentions of Tim Cook, linking them to the extracted entity and to each other. Likewise, the "subject" attribute of any related Relevance, Confidence, and Resolution tags also displays the same hash tag. In the JSON output, this isn't necessary, because all related tags are already nested within the entity markup tag. 
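Because the JSON output nests the instances inside the entity markup tag, collecting every mention of an extracted entity can be sketched in a few lines of Python. This is a minimal sketch, assuming the response has already been parsed into a dict keyed by hash ID, as in the samples in this overview; the function name mentions_by_entity is hypothetical, and the sample response is abridged.

```python
def mentions_by_entity(response: dict) -> dict[str, list[str]]:
    """Map each extracted entity's name to the 'exact' text of its instances."""
    result = {}
    for key, item in response.items():
        # Skip anything that is not an entity markup tag (topics, social tags, ...).
        if not isinstance(item, dict) or item.get("_typeGroup") != "entities":
            continue
        # Fall back to the hash ID when an entity carries no name.
        result[item.get("name", key)] = [inst["exact"] for inst in item.get("instances", [])]
    return result


# Abridged response built from the samples in this overview.
response = {
    "http://d.opencalais.com/pershash-1/e4808181-2cd0-3670-b992-7467229ba691": {
        "_typeGroup": "entities",
        "_type": "Person",
        "name": "Tim Cook",
        "instances": [
            {"exact": "Cook", "offset": 40, "length": 4},
            {"exact": "Tim Cook", "offset": 199, "length": 8},
            {"exact": "his", "offset": 287, "length": 3},
            {"exact": "Cook", "offset": 2079, "length": 4},
        ],
    },
}

print(mentions_by_entity(response))
```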

 

Relevance Tag

All extracted entities have an associated Relevance tag that indicates how central the entity is to the containing document. Relevance scores range from 0 to 1. The higher the score, the more relevant the entity is to the containing document.

    	
              "relevance": 0.8,
        
        
    

The high relevance score indicates that the person Tim Cook is indeed central to this story.

If you are working with the RDF output format, please note that the subject attribute of the Relevance tag is what links it to the relevant entity tag.

 

 

Confidence Scoring

Intensive efforts are devoted to making tagging as accurate as possible; however, automated tagging will never be 100% accurate. Therefore, Intelligent Tagging implements confidence scoring. Confidence scoring indicates the likelihood that an extracted entity (for example, a person or company) is indeed a person or company.

The extracted entities and relations that implement confidence scores display the confidencelevel attribute within the Entity Markup tag. Some entities and relations also generate a related Confidence tag. The higher the confidence score, the more confident we are that the extracted entity (for example, a person or company) is indeed a person or company.

Note: The same confidence score is displayed in both the Confidence tag and the related entity markup tag. You can retrieve the score from either tag.

The consuming application can use the confidence score to achieve higher accuracy results by ignoring entities and relations and their related tags with confidence scores below a specified level. However, note that when you raise the specified level, you are boosting precision at the expense of recall, increasing the risk of ignoring tags that are correct. If you choose to filter data based on this feature, you should adjust the confidence threshold according to the use case.
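The threshold filtering described above might look like the following Python sketch. It assumes a parsed JSON response and treats any item that carries a confidencelevel attribute the same way; the function name filter_by_confidence and the threshold values are ours. The two sample items come from different examples in this overview and are combined here for illustration only.

```python
def filter_by_confidence(response: dict, threshold: float) -> dict:
    """Keep only items whose confidencelevel is at or above the threshold.

    Items without a confidencelevel attribute are kept unchanged, since not
    every metadata type implements confidence scoring.
    """
    kept = {}
    for key, item in response.items():
        level = item.get("confidencelevel") if isinstance(item, dict) else None
        # The samples show confidencelevel as a string, so convert before comparing.
        if level is None or float(level) >= threshold:
            kept[key] = item
    return kept


PERSON = "http://d.opencalais.com/pershash-1/e4808181-2cd0-3670-b992-7467229ba691"
SLUGLINE = "http://d.opencalais.com/dochash-1/2771afb0-3124-3a21-ac80-ba4a94fbc8d3/Slugline/1"

sample = {
    PERSON: {"_typeGroup": "entities", "name": "Tim Cook", "confidencelevel": "0.999"},
    SLUGLINE: {"_typeGroup": "sluglines", "slugline": "BRITAIN-EU/BANKS", "confidencelevel": "0.2617"},
}

strict = filter_by_confidence(sample, 0.5)   # favors precision
lenient = filter_by_confidence(sample, 0.2)  # favors recall
print(len(strict), len(lenient))
```

Raising the threshold from 0.2 to 0.5 drops the lower-scored item, which illustrates the precision/recall trade-off: a stricter threshold discards more tags, including some that may be correct.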

In the following example, the high confidencelevel value, 0.999, indicates a high likelihood that the extracted entity of the type person is indeed a person.

The em/e/person tag extracted for Tim Cook:

    	
            

"http://d.opencalais.com/pershash-1/e4808181-2cd0-3670-b992-7467229ba691": {

        "_typeGroup": "entities",

        "_type": "Person",

        "forenduserdisplay": "true",

        "name": "Tim Cook",

        "persontype": "economic",

        "nationality": "N/A",

        "confidencelevel": "0.999",

        "commonname": "Tim Cook",

        "_typeReference": "http://s.opencalais.com/1/type/em/e/Person",

        "permid": "https://permid.org/1-404011",

        "instances": [

 

The related Confidence tag:

    	
            

 "confidence": {

            "statisticalfeature": "0.999",

            "dblookup": "0.95",

            "resolution": "0.0",

            "aggregate": "0.999"

        }

If you are working with the RDF output format, please note that the subject attribute of the Confidence tag is what links it to the relevant entity tag. 

 

Disambiguation Tag

Intelligent Tagging attempts to map extracted entities and relations to the corresponding entities and unique IDs in the relevant Refinitiv dataset. If the mapping is successful, a Disambiguation tag is created in the output file. The linking to a Refinitiv unique ID is an exact and specific identity recognition. The mapping is what enables all the instances, extracted entities, and extracted relations that refer to the same thing to be unambiguously identified (and thus linked) across all documents processed by Intelligent Tagging.

Additionally, the mapping offers you the opportunity to further enrich your data with information from the Refinitiv datasets. For further information about how to leverage the Refinitiv IDs, browse to https://permid.org.

The Disambiguation tag (resolution tag) extracted for Tim Cook:

    	
            

  "resolutions": [

            {

                "name": "Timothy D. Cook",

                "personid": "88090",

                "paid": "34413199178",

                "officerid": "88090",                

                "commonname": "Tim Cook",

                "score": 0.9358713

                "id": "https://permid.org/1-34413199178"

 

The Refinitiv unique ID (the "paid" attribute value in this example) displayed in the resolutions tag will be consistent across all documents processed by Intelligent Tagging. (Note that the "paid" value is equivalent to a permid.)

For RDF output format users: The hash tag generated by Intelligent Tagging is a local ID that links extracted entities or relations with their related tags within the containing document (local disambiguation). 

If you are working with the RDF output format, please note that the subject attribute of the Resolution tag is what links it to the relevant entity tag.
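For JSON users, the cross-document linking that disambiguation enables can be sketched as an index from resolved ID to the documents in which it appears. This is a minimal Python sketch under stated assumptions: the document names doc-a and doc-b and the keys entity-1 and entity-2 are hypothetical, the function name documents_by_resolved_id is ours, and the resolution entry is abridged from the sample above.

```python
from collections import defaultdict


def documents_by_resolved_id(responses: dict[str, dict]) -> dict[str, list[str]]:
    """Index document names by the 'id' values found in 'resolutions' arrays."""
    index = defaultdict(list)
    for doc_name, response in responses.items():
        for item in response.values():
            if not isinstance(item, dict):
                continue
            for res in item.get("resolutions", []):
                # Record each document once per resolved ID.
                if "id" in res and doc_name not in index[res["id"]]:
                    index[res["id"]].append(doc_name)
    return dict(index)


# Resolution abridged from the sample above.
tim_cook_resolution = {
    "name": "Timothy D. Cook",
    "paid": "34413199178",
    "id": "https://permid.org/1-34413199178",
}

# Two hypothetical tagging responses; the local keys differ per document,
# but the resolved ID is the same.
responses = {
    "doc-a": {"entity-1": {"_typeGroup": "entities", "resolutions": [tim_cook_resolution]}},
    "doc-b": {"entity-2": {"_typeGroup": "entities", "resolutions": [tim_cook_resolution]}},
}

print(documents_by_resolved_id(responses))
```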

 

Aboutness Tags

In addition to identifying and tagging individual text strings, Intelligent Tagging further enriches your data with metadata tags designed to describe the piece of content as a whole:

  • Social Tag – Classifies the document based on Wikipedia folksonomy.
  • Topic Tag – Identifies the topics discussed in the document. The reference list of topics is drawn from the RCS (Refinitiv Classification Services) and IPTC (International Press Telecommunications Council) taxonomies. RCS topic tagging is a premium feature.
  • Industry Tag – Identifies the industries related to the text. The list of industries that can be identified is defined by the Refinitiv Business Classification (TRBC) taxonomy. Industry tagging is available to premium users.
  • Slugline Tag – Classifies the document based on Reuters sluglines. Slugline tagging is a premium feature.

 

Social Tag

A Social Tag is an association of the submitted text to related Wikipedia categories or articles. Social tags attempt to emulate how a person would tag a specific piece of content. For example, if you submit a story about Barack Obama and a piece of legislation, at least one reasonable tag would be “U.S. Legislation.” A story about the relative merits of BMWs, Ferraris, and Porsches would probably be tagged with “sports cars,” “luxury makes,” “auto racing,” and “motorsport.”

The story about the Apple Watch Launch generated the following social tags: Human-computer interaction, Wearable devices, Computing, Technology, Wearable computers, Ubiquitous computing, GPS navigation devices, Apple Watch, Smartwatch, Apple Inc., Wearable technology, Apple Store.

The SocialTag function does not identify individual items within the text, but rather attempts to provide common-sense tags for the piece of content as a whole.

Social tags are derived from the Wikipedia folksonomy. They are periodically updated to keep them current.

Note: You can use input headers to limit the number of social tags in the tagging output.

A few examples:

    	
            

 "http://d.opencalais.com/dochash-1/d4301421-738a-3ff2-ab5b-09d18d5b91fe/SocialTag/2": {

        "_typeGroup": "socialTag",

        "id": "http://d.opencalais.com/dochash-1/d4301421-738a-3ff2-ab5b-09d18d5b91fe/SocialTag/2",

        "socialTag": "http://d.opencalais.com/genericHasher-1/90e52124-7665-3b7f-960c-cf083bad15af",

        "forenduserdisplay": "true",

        "name": "Wearable devices",

        "importance": "1",

        "originalValue": "Wearable devices"

    	
            

  "http://d.opencalais.com/dochash-1/d4301421-738a-3ff2-ab5b-09d18d5b91fe/SocialTag/8": {

        "_typeGroup": "socialTag",

        "id": "http://d.opencalais.com/dochash-1/d4301421-738a-3ff2-ab5b-09d18d5b91fe/SocialTag/8",

        "socialTag": "http://d.opencalais.com/genericHasher-1/f77c4a2f-aa04-37b8-9d59-11f87c900d94",

        "forenduserdisplay": "true",

        "name": "Apple Watch",

        "importance": "2",

        "originalValue": "Apple Watch"

    	
            

 "http://d.opencalais.com/dochash-1/d4301421-738a-3ff2-ab5b-09d18d5b91fe/SocialTag/6": {

        "_typeGroup": "socialTag",

        "id": "http://d.opencalais.com/dochash-1/d4301421-738a-3ff2-ab5b-09d18d5b91fe/SocialTag/6",

        "socialTag": "http://d.opencalais.com/genericHasher-1/121898b1-37bb-3e07-b50a-18dfe642405f",

        "forenduserdisplay": "true",

        "name": "Ubiquitous computing",

        "importance": "2",

        "originalValue": "Ubiquitous computing"

 

Topic (DocCat) Tag

Intelligent Tagging identifies the topic or topics that are being discussed in the document. For example, “Macroeconomics,” “Equities,” “Sports,” “Entertainment,” “Politics,” “Oil & Gas Products,” “Mergers/Acquisitions/Takeovers,” “Computer Hardware,” “Consumer Financial Services,” “Software and IT Services,” etc.

A DocCat (topic) tag is designed to give a general notion of what an input document is about. There is no specific entity recognition in the text, but rather deduction about what the text is about.

The reference list of topics is drawn from the RCS (Refinitiv Classification Services) taxonomy, the IPTC (International Press Telecommunications Council) taxonomy, and the Self Service Classification project taxonomy. RCS topic tagging is a premium feature.

Each identified topic results in a Topic (DocCat) tag. It is possible that multiple topics will be identified, or that no topic will be identified if the document does not discuss anything currently defined by the relevant taxonomies.

Following are some of the Topic tags that were extracted by Intelligent Tagging from the story about the Apple Watch Launch.

An IPTC taxonomy topic:

    	
            

"http://d.opencalais.com/dochash-1/d4301421-738a-3ff2-ab5b-09d18d5b91fe/cat/1": {

        "_typeGroup": "topics",

        "forenduserdisplay": "false",

        "score": 0.981,

        "name": "Technology_Internet"

 

An RCS taxonomy topic:

    	
            

"http://d.opencalais.com/dochash-1/d4301421-738a-3ff2-ab5b-09d18d5b91fe/cat/3": {

        "_typeGroup": "topics",

        "forenduserdisplay": "false",

        "rcscode": "B:279",

        "name": "Technology Equipment",

        "permid": "4294952722",

        "score": 0.077

 

Industry Tag

Industry tagging is available to premium users.

During processing, Intelligent Tagging identifies the industries that are related to the companies mentioned in the text. For example, “Management Consultant Services,” “Information Services,” “Biotechnology & Medical Services,” “Integrated Telecommunications Services – NEC,” “Handbags and Luggage Retailers,” “Petroleum Refining,” etc.

The list of industries that can be identified is defined by the Refinitiv Business Classification (TRBC) taxonomy. Industry Tags include a unique Refinitiv ID. This ID enables extracting information about the industry from the Refinitiv dataset, and also supports linkage across documents processed by Intelligent Tagging.

The following Industry tags were extracted by Intelligent Tagging from the story about the Apple Watch Launch:

    	
            

 "http://d.opencalais.com/dochash-1/d4301421-738a-3ff2-ab5b-09d18d5b91fe/Industry/5": {

        "_typeGroup": "industry",

        "forenduserdisplay": "false",

        "name": "Watches",

        "rcscode": "B:1340",

        "trbccode": "5320202027",

        "permid": "4294951661",

        "relevance": 0.2

    	
            

 "http://d.opencalais.com/dochash-1/d4301421-738a-3ff2-ab5b-09d18d5b91fe/Industry/6": {

        "_typeGroup": "industry",

        "forenduserdisplay": "false",

        "name": "Phones & Smart Phones",

        "rcscode": "B:1769",

        "trbccode": "5710602011",

        "permid": "4294951232",

        "relevance": 0.8

    	
            

    "http://d.opencalais.com/dochash-1/d4301421-738a-3ff2-ab5b-09d18d5b91fe/Industry/7": {

        "_typeGroup": "industry",

        "forenduserdisplay": "false",

        "name": "Phones & Handheld Devices - NEC",

        "rcscode": "B:1768",

        "trbccode": "5710602010",

        "permid": "4294951233",

        "relevance": 0.2

 

Slugline Tag

Slugline tagging is a premium feature.

Slugline tagging classifies documents using Reuters slug lines, providing another way to consistently classify news documents across multiple sources.

A slug line is a keyword phrase that describes the main event of a news article. For example, “SYRIA-REFUGEES/DARAYA,” “USA-KENYA-TRUMP,” “CHINA-BANKS/CCB-RESULTS.”

Reuters editors and journalists create and assign slug lines to news articles as part of the publishing process. There are strict rules in place that standardize how slug lines are assigned to ensure that they are applied consistently.

The Intelligent Tagging reference list of slug lines consists of the slug lines assigned to Reuters news stories in the past three months. The reference list is updated daily.

Each Intelligent Tagging output document may contain up to 8 slugline tags.

The following slugline tags were extracted from a story about the Bank of England and Brexit.

Note: The isactive and creationdate attributes are not currently in use. The placeholder values true and 2017-01-01T00:00:00.000Z appear consistently and should be disregarded.

    	
            

"http://d.opencalais.com/dochash-1/2771afb0-3124-3a21-ac80-ba4a94fbc8d3/Slugline/1": {

"_typeGroup": "sluglines",

"id": "http://d.opencalais.com/dochash-1/2771afb0-3124-3a21-ac80-ba4a94fbc8d3/Slugline/1",

"forenduserdisplay": "true",

"slugline": "BRITAIN-EU/BANKS",

"isactive": "true",

"creationdate": "2017-01-01T00:00:00.000Z",

"slugid": "sluglines/BRITAIN-EU/BANKS",

"confidencelevel": "0.2617"

    	
            

"http://d.opencalais.com/dochash-1/2771afb0-3124-3a21-ac80-ba4a94fbc8d3/Slugline/2": {

"_typeGroup": "sluglines",

"id": "http://d.opencalais.com/dochash-1/2771afb0-3124-3a21-ac80-ba4a94fbc8d3/Slugline/2",

"forenduserdisplay": "true",

"slugline": "BRITAIN-EU/BANKS-BOE",

"isactive": "true",

"creationdate": "2017-01-01T00:00:00.000Z",

"slugid": "sluglines/BRITAIN-EU/BANKS-BOE",

"confidencelevel": "0.2446"

 

    	
            

"http://d.opencalais.com/dochash-1/2771afb0-3124-3a21-ac80-ba4a94fbc8d3/Slugline/3": {

"_typeGroup": "sluglines",

"id": "http://d.opencalais.com/dochash-1/2771afb0-3124-3a21-ac80-ba4a94fbc8d3/Slugline/3",

"forenduserdisplay": "true",

"slugline": "BRITAIN-EU/LAWS",

"isactive": "true",

"creationdate": "2017-01-01T00:00:00.000Z",

"slugid": "sluglines/BRITAIN-EU/LAWS",

"confidencelevel": "0.2145"

    	
            

"http://d.opencalais.com/dochash-1/2771afb0-3124-3a21-ac80-ba4a94fbc8d3/Slugline/4": {

"_typeGroup": "sluglines",

"id": "http://d.opencalais.com/dochash-1/2771afb0-3124-3a21-ac80-ba4a94fbc8d3/Slugline/4",

"forenduserdisplay": "true",

"slugline": "BRITAIN-EU/CARNEY",

"isactive": "true",

"creationdate": "2017-01-01T00:00:00.000Z",

"slugid": "sluglines/BRITAIN-EU/CARNEY",

"confidencelevel": "0.2134"

    	
            

"http://d.opencalais.com/dochash-1/2771afb0-3124-3a21-ac80-ba4a94fbc8d3/Slugline/5": {

"_typeGroup": "sluglines",

"id": "http://d.opencalais.com/dochash-1/2771afb0-3124-3a21-ac80-ba4a94fbc8d3/Slugline/5",

"forenduserdisplay": "true",

"slugline": "BRITAIN-EU/REGULATOR",

"isactive": "true",

"creationdate": "2017-01-01T00:00:00.000Z",

"slugid": "sluglines/BRITAIN-EU/REGULATOR",

"confidencelevel": "0.2027"

 

Overlapping Metadata Types

The same thing can be identified by more than one Intelligent Tagging metadata type. For example, the tagging output might include both entity tags and topic tags which identify the same country, company, or industry. Remember, an entity tag is assigned based on an explicit mention in the text. An aboutness tag may be assigned regardless of whether the subject is explicitly mentioned in the text.

So for example, if the input document mentions German companies like Volkswagen and SAP, the tagging output might include a DocCat (topic) tag for Germany. But unless there is an explicit mention of Germany in the text, the tagging output will not include an em/e/Country (country entity) tag for Germany.

Continuing with this example, if there is an explicit mention of Germany in the text, then the tagging output may include both an em/e/Country (country entity) tag for Germany and a DocCat (topic) tag for Germany. In this case, both tags would output the same RCS code for Germany, as the reference lists for both metadata types come from the same (RCS) taxonomy.

Likewise, the tagging output might include both an Industry tag and a topic tag for the same industry. The reference lists used by the Industry tag and the DocCat (topic) tags related to industries both come from the same (TRBC) taxonomy.

 

Best Practice Recommendations

So which tags should you use? That depends on your use case. For example, if your use case requires the highest recall, you would probably want to use all the metadata tag types for best results.

A few suggestions:

  • Use Entity tags to:

 - Filter content according to whether or not a specific entity is mentioned.

 For example, you could use the entity tags to retrieve all documents that mention a particular company or country.

 - Highlight the entity instances in the narrative text (markup highlighting).

 - Detect all mentions of an entity.

  • Use topic tags (DocCat or Social tags) to filter documents according to whether or not they are about a specific subject.
  • Use topic tags to filter search results after applying a query.
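As an illustration of the markup highlighting suggestion above, the following Python sketch wraps each mention in marker strings using the offset and length attributes of its instances. It assumes that the offsets index into the same text string you hold; the sample text, its offsets, the marker strings, and the function name highlight are all made up for this example and are not taken from Intelligent Tagging output.

```python
def highlight(text: str, instances: list[dict], before: str = "[[", after: str = "]]") -> str:
    """Wrap every mention in marker strings, based on 'offset' and 'length'."""
    # Work from the last mention to the first so earlier offsets stay valid.
    for inst in sorted(instances, key=lambda i: i["offset"], reverse=True):
        start = inst["offset"]
        end = start + inst["length"]
        text = text[:start] + before + text[start:end] + after + text[end:]
    return text


# Hypothetical text and instances, for illustration only.
text = "Tim Cook spoke. Cook smiled."
instances = [
    {"exact": "Tim Cook", "offset": 0, "length": 8},
    {"exact": "Cook", "offset": 16, "length": 4},
]

print(highlight(text, instances))
```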

 

Additional Resources

  • For additional examples, and detailed information on parsing and interpreting the Intelligent Tagging response, see the API User Guide.
  • Abstraction Layer Developer Guide - Since the Intelligent Tagging output does not group metadata types, we highly recommend using the CalaisModel Abstraction Layer to simplify parsing the Intelligent Tagging output. Please note that the Abstraction Layer is relevant to the RDF/XML response format only.
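For the JSON response format, a simple grouping of the output by metadata type can be sketched by hand. The Python example below groups the keys of a parsed response by the _typeGroup attribute seen in the samples in this overview. The function name group_by_type_group is ours, and the response is heavily abridged with shortened keys; this is not a substitute for the Abstraction Layer.

```python
from collections import defaultdict


def group_by_type_group(response: dict) -> dict[str, list[str]]:
    """Group response keys by the '_typeGroup' attribute of each item."""
    groups = defaultdict(list)
    for key, item in response.items():
        # Items without a _typeGroup attribute are left out of the grouping.
        if isinstance(item, dict) and "_typeGroup" in item:
            groups[item["_typeGroup"]].append(key)
    return dict(groups)


# Heavily abridged from the samples in this overview; keys shortened for readability.
response = {
    "pershash-1/e4808181": {"_typeGroup": "entities", "name": "Tim Cook"},
    "SocialTag/2": {"_typeGroup": "socialTag", "name": "Wearable devices"},
    "SocialTag/8": {"_typeGroup": "socialTag", "name": "Apple Watch"},
    "cat/1": {"_typeGroup": "topics", "name": "Technology_Internet"},
    "Industry/5": {"_typeGroup": "industry", "name": "Watches"},
}

for type_group, keys in group_by_type_group(response).items():
    print(type_group, keys)
```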