Introduction to machine readable news with Elektron Message API

Overview

This article examines the Thomson Reuters Machine Readable News offering. It discusses its features and most common use cases, and provides specific implementation details from the standpoint of an application developer.

Machine readable news

Thomson Reuters Machine Readable News (MRN) is an advanced service for automating the consumption and systematic analysis of news. It delivers deep historical news archives, ultra-low latency structured news and news analytics directly to your applications. This enables algorithms to exploit the power of news to seize opportunities, capitalize on market inefficiencies and manage event risk.

Possible Use Cases

So, how can one make use of this data?

  • Correlation: a large family of use cases stems from correlating the MRN data stream with another data stream, for instance a real-time price stream. Please refer to the Elektron SDK examples on developers.thomsonreuters.com for detailed examples of real-time price subscription;
  • Archiving: storing news on a subset of instruments for a fixed period of time for further use. To get started, one of the examples noted later in this article illustrates storing data into a MySQL database using EclipseLink;
  • Alerting: getting notifications on outlier news that can be considered an event for trading. An outlier can be a very positive sentiment in a neutral or negative context (or vice versa), or an abnormal number of news items relevant to a specific organization within a short period of time. All of these characteristics can be quantified, specified and built into the workflow of your application.
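As an illustration of how one of these characteristics might be quantified, the following minimal sketch counts news items inside a sliding time window and flags a burst. The class name NewsBurstDetector, the window length and the threshold are all hypothetical choices for illustration, not part of MRN or EMA. The sketch assumes one detector instance per organization and timestamps that arrive in non-decreasing order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: flags an abnormal number of news items in a short period.
final class NewsBurstDetector {
    private final long windowMillis;   // length of the sliding window
    private final int threshold;       // item count that is treated as a burst
    private final Deque<Long> timestamps = new ArrayDeque<>();

    NewsBurstDetector(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    // Records one news item and returns true when the number of items
    // seen within the window has reached the threshold.
    boolean record(long timestampMillis) {
        timestamps.addLast(timestampMillis);
        // Drop items that have fallen out of the window.
        while (!timestamps.isEmpty()
                && timestamps.peekFirst() <= timestampMillis - windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size() >= threshold;
    }
}
```

A suitable threshold would have to be calibrated against the normal news flow for each organization.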

Tools

In this article we are going to use the Java edition of the Elektron Message API (EMA), a data-neutral, multi-threaded, easy-to-use API providing access to Open Message Model (OMM) and Reuters Wire Format (RWF) data. If you are not familiar with EMA, we strongly recommend that you check out the tutorials for Java or C/C++.

Data model

Let's start looking into the specifics, by first examining how MRN data is formed:

  • core MRN data item is a UTF-8 JSON string;
  • JSON string is compressed using gzip;
  • compressed JSON is split into a number of fragments, each fitting into a single RSSL update, as some data items are too large to fit into a single message;
  • data fragments are added to an update message as the FRAGMENT field value in a FieldList envelope.
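
To make the steps above concrete, here is a minimal sketch that simulates the publishing side using only the Java standard library. The names MrnFragmentSimulator, gzip, split and maxFragmentSize are hypothetical and not part of EMA, and the real fragment size is decided by the publisher, so treat the value you pass in as an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper that mimics how an MRN item is prepared; not part of EMA.
final class MrnFragmentSimulator {

    // Encode the JSON string as UTF-8 and compress it with gzip.
    static byte[] gzip(String json) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Cut the compressed bytes into chunks of at most maxFragmentSize bytes.
    // The list index plus one corresponds to FRAG_NUM; the length of the
    // compressed array corresponds to TOT_SIZE.
    static List<byte[]> split(byte[] compressed, int maxFragmentSize) {
        if (maxFragmentSize <= 0) {
            throw new IllegalArgumentException("maxFragmentSize must be positive");
        }
        List<byte[]> fragments = new ArrayList<>();
        for (int from = 0; from < compressed.length; from += maxFragmentSize) {
            int to = Math.min(from + maxFragmentSize, compressed.length);
            fragments.add(Arrays.copyOfRange(compressed, from, to));
        }
        return fragments;
    }
}
```

Note that the split happens after compression, so an individual fragment is generally not a valid gzip stream on its own.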

Therefore, in order to parse this data, we need to reverse this process. Let's consider how the fragments belonging to one item are identified and how the complete story is reassembled.

Five fields, as well as the RIC itself, are necessary to determine whether the entire item has been received in its various fragments and how to concatenate the fragments to reconstruct the item:

  • MRN_SRC: identifier of the scoring/processing system that published the FRAGMENT;
  • GUID: globally unique identifier for the data item, all messages for this data item will have the same GUID value;
  • FRAGMENT: compressed data item fragment;
  • TOT_SIZE: total size in bytes of the fragmented data;
  • FRAG_NUM: sequence number of fragments within a data item, which is set to 1 for the first fragment of each item published and is incremented for each subsequent fragment for the same item.

A single MRN data item publication is uniquely identified by the combination of RIC, MRN_SRC and GUID.

For a given RIC/MRN_SRC/GUID combination, when a data item requires only a single message, TOT_SIZE equals the number of bytes in the FRAGMENT and FRAG_NUM equals 1. When multiple messages are required, the data item can be deemed fully received once the sum of the number of bytes of each FRAGMENT equals TOT_SIZE. The consumer will also observe that the FRAG_NUM values range from 1 to the number of fragments, with no intermediate integers skipped. In other words, a data item transmitted over three messages will contain FRAG_NUM values of 1, 2 and 3.
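
The completeness rule can be sketched as follows. This is an illustration under stated assumptions rather than the approach used by the examples referenced in this article: MrnItemAssembler and its add method are hypothetical names, the field values are assumed to have been extracted from the message already, TOT_SIZE is recorded when FRAG_NUM is 1, and an item with a missing or out-of-sequence fragment is simply discarded.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical reassembly helper; names are illustrative only.
final class MrnItemAssembler {

    // Accumulated state for one RIC/MRN_SRC/GUID combination.
    private static final class Pending {
        final long totalSize;
        int lastFragNum = 0;
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Pending(long totalSize) { this.totalSize = totalSize; }
    }

    private final Map<String, Pending> pending = new HashMap<>();

    // Adds one fragment. Returns the complete compressed item once the
    // accumulated byte count equals TOT_SIZE, otherwise null.
    byte[] add(String ric, String mrnSrc, String guid,
               int fragNum, long totSize, byte[] fragment) {
        String key = ric + '|' + mrnSrc + '|' + guid;
        Pending p = pending.get(key);
        if (fragNum == 1) {
            // First fragment: start a new item and remember TOT_SIZE.
            p = new Pending(totSize);
            pending.put(key, p);
        } else if (p == null || fragNum != p.lastFragNum + 1) {
            // First fragment was missed or a FRAG_NUM was skipped: give up on this item.
            pending.remove(key);
            return null;
        }
        p.lastFragNum = fragNum;
        p.bytes.write(fragment, 0, fragment.length);
        if (p.bytes.size() < p.totalSize) {
            return null; // still waiting for more fragments
        }
        pending.remove(key);
        // An overshoot means the fragments are inconsistent with TOT_SIZE.
        return p.bytes.size() == p.totalSize ? p.bytes.toByteArray() : null;
    }
}
```

Keying on all three identifiers rather than on GUID alone follows the uniqueness rule stated above.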

High level structure of the app

Our application is a consumer, so it will:

  1. Establish a connection with the source of the data (provider) via an access point, directly (Elektron Edge Device) or via the distribution infrastructure, such as Thomson Reuters Enterprise Platform (or TREP);
  2. Issue one or more subscription requests for MRN RICs;
  3. Register for data item updates and parse them;
  4. Register for status events and handle them appropriately.

For more details on writing an EMA consumer app, please refer to the Developer's Guide.

Let us have a look at the data published using the News Text Analytics domain and MRN-specific consumption. Firstly, we are going to request the following RICs and subscribe to their updates:

Content set                                    RIC
Real-time News                                 MRN_STORY
News Analytics: Company and C&E assets         MRN_TRNA
News Analytics: Macroeconomic News & events    MRN_TRNA_DOC
News Sentiment Indices                         MRN_TRSI

Example:

// Connect to the provider
OmmConsumerConfig config = EmaFactory.createOmmConsumerConfig();
OmmConsumer consumer = EmaFactory.createOmmConsumer(config.host(_ip + ":" + _port).username(_userName));
ReqMsg reqMsg = EmaFactory.createReqMsg();
// Subscribe to each MRN RIC; the loop index is passed as the closure object
for (int i = 0; i < _ricsMRN.length; i++) {
    consumer.registerClient(reqMsg.domainType(EmaRdm.MMT_NEWS_TEXT_ANALYTICS).serviceName(
            _serviceName).name(_ricsMRN[i]), _appClient, Integer.valueOf(i));
}

The registered client (_appClient) receives the following callbacks:

  • onRefreshMsg
  • onUpdateMsg
  • onStatusMsg

Looking for fragments:

if (fieldEntry.loadType() == DataTypes.BUFFER && fieldEntry.fieldId() == FRAGMENT) {
    // this entry carries a compressed fragment, processed as described next
}

If there is only one fragment, or the length we have assembled so far is equal to TOT_SIZE, we are ready to convert to JSON and possibly pretty-print:

if (fieldEntry.buffer().buffer().array().length == totalSize) {
    // there is only one segment, we are ready
    // unzip using gzip
    String strFlatFrag = unzipPayload(fieldEntry.buffer().buffer().array());
    System.out.println("=>FRAGMENT JSON STRING: " + strFlatFrag);
    try {
        JSONObject jsonResponse = new JSONObject(strFlatFrag);
        // pretty-print json response
        int spacesToIndentEachLevel = 2;
        System.out.println("FRAGMENT JSON PRETTY:\n" + jsonResponse.toString(spacesToIndentEachLevel));
    } 
    catch (Exception e) {
        System.err.println("Exception parsing json: " + e);
        e.printStackTrace(System.err);
    }
}
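
The unzipPayload helper is called above but not shown. One possible implementation, using only java.util.zip, is sketched below. The wrapper class name MrnUnzip is hypothetical, and the choice to rethrow I/O failures as an unchecked exception is an assumption made to keep the call site simple.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical holder class for the helper used in the snippet above.
final class MrnUnzip {

    // Decompresses a complete gzip payload and decodes it as a UTF-8 string.
    static String unzipPayload(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[4096];
            int n;
            // Read until the end of the compressed stream is reached.
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("payload is not valid gzip data", e);
        }
    }
}
```

The input must be the fully reassembled item; passing a partial fragment would normally fail because the gzip stream is truncated.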

But what if it is just one of several fragments? We should keep each fragment until we have them all, storing them in a hash table:

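Hashtable<String,ArrayList<ByteBuffer>> fragBuilderHash;
...
alFrags = fragBuilderHash.get(guid);
if (alFrags == null) {
    // first fragment seen for this GUID
    alFrags = new ArrayList<ByteBuffer>();
}
alFrags.add(fieldEntry.buffer().buffer());
fragBuilderHash.put(guid, alFrags);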

When the total combined length of the fragments in fragBuilderHash equals TOT_SIZE, we are ready to proceed. We concatenate, unzip, convert to JSON and are ready to put the data to good use.
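
The concatenation step might look like the sketch below. MrnFragmentJoiner and concatenate are hypothetical names. The sketch assumes the list holds the fragments in FRAG_NUM order and that each buffer's position and limit delimit the fragment bytes; if that does not hold for the buffers your application stores, the copy logic needs adjusting.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical helper that joins stored fragments into one compressed payload.
final class MrnFragmentJoiner {

    static byte[] concatenate(List<ByteBuffer> fragments) {
        // Sum the readable bytes of every fragment to size the result.
        int total = 0;
        for (ByteBuffer fragment : fragments) {
            total += fragment.remaining();
        }
        ByteBuffer joined = ByteBuffer.allocate(total);
        for (ByteBuffer fragment : fragments) {
            // duplicate() leaves the stored buffer's position untouched.
            joined.put(fragment.duplicate());
        }
        // The backing array is exactly 'total' bytes long.
        return joined.array();
    }
}
```

Using remaining() rather than the length of the backing array avoids copying unused capacity when a buffer is a view onto a larger array.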

Two complete examples are available:

  • Console Example: pretty-printing data to the screen is the simplest and most obvious of the use cases. Please refer to: MRN Console Example on GitHub;
  • GUI Viewer: displaying Real-time News (story), News Analytics and Sentiment Index side by side. Please refer to: MRN GUI Viewer on GitHub.

References