Encoding and Decoding non-ASCII text using EMA and RFA C++/.NET

Overview

This article explains how to encode and decode RMTES strings containing non-ASCII text using the EMA and RFA C++ and .NET editions. The TREP APIs currently do not provide an RMTES encoder, so developers often ask how to publish data containing non-ASCII text, such as Chinese or Korean, to the Elektron Real-Time network. In addition, users often find that their application cannot display non-ASCII text properly, because the data it receives contains what look like garbage characters at the beginning of the string. This article covers the background of these issues and their solutions.

A problem in the Consumer application

Some fields in the data dictionary use the RMTES_STRING data type, which is designed for local-language text. Users often see what looks like a garbage string such as "%0" at the beginning of the text. Below is sample output from an API example that shows the problem. The data comes from field DSPLY_NMLL, which holds the local-language instrument name, for RICs ".HSI" and "0005.HK".

Note that FID DSPLY_NMLL is defined as RMTES_STRING in the RDMFieldDictionary file.

DSPLY_NMLL "LCL LANG DSP NM"     1352  NULL        ALPHANUMERIC       32  RMTES_STRING    32

.HSI


        FieldEntry [  1080] PREF_DISP             2191
        FieldEntry [  1352] DSPLY_NMLL            "%0恒生指數"
        FieldEntry [  1709] RDN_EXCHD2                    HSI         (456)

0005.HK

        FieldEntry [  1080] PREF_DISP             5780
        FieldEntry [  1352] DSPLY_NMLL            "%0匯豐控股"
        FieldEntry [  1404] ODD_VOLUME            18364

Looking at the raw data in the RFA and EMA tracing logs below, the apparent garbage is a three-byte sequence at the beginning of the text: 0x1B, 0x25 and 0x30, which correspond to the ESC control character (not printable), "%" and "0".

.HSI

        <fieldEntry fieldId="1080" data="088F"/>
        <fieldEntry fieldId="1352" data="1B25 30E6 8192 E794 9FE6 8C87 E695 B8"/>
        <fieldEntry fieldId="1709" data="01C8"/>

0005.HK

        <fieldEntry fieldId="1080" data="1694"/>
        <fieldEntry fieldId="1352" data="1B25 30E5 8CAF E8B1 90E6 8EA7 E882 A1"/>
        <fieldEntry fieldId="1404" data="0E47 BC"/>

In fact, these three bytes are not garbage. They are an escape sequence used by Thomson Reuters to mark text encoded with UTF-8. You may also see other sequences at the beginning of a string; these can indicate other character sets used by Thomson Reuters internal publishers. If the application calls DataBuffer.getAsString() in RFA or getAscii() in EMA, it receives the raw string including the escape sequence, because these methods do not run an RMTES decoder. An RFA application can normally use the RMTESConverter interface to get the actual UTF-8 text from the DataBuffer, while an EMA application can simply call fieldEntry.getRmtes().toString(). The next section gives more background on the escape sequence.
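
To make the byte layout concrete, here is a minimal sketch that recognizes the three-byte sequence in a raw field value and returns the UTF-8 payload that follows it. The helper names hasUtf8EscapePrefix and stripUtf8EscapePrefix are hypothetical and are not part of RFA or EMA. This is not an RMTES decoder: it only handles the UTF-8 case described above, so the API converters remain the better choice in a real application.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an RFA or EMA API): returns true when the raw
// field bytes start with the three-byte sequence 0x1B 0x25 0x30.
bool hasUtf8EscapePrefix(const std::string& raw)
{
    return raw.size() >= 3 &&
           static_cast<unsigned char>(raw[0]) == 0x1B &&
           static_cast<unsigned char>(raw[1]) == 0x25 &&
           static_cast<unsigned char>(raw[2]) == 0x30;
}

// Hypothetical helper: returns the bytes after the prefix, or the input
// unchanged when the prefix is absent. Other RMTES escape sequences are
// deliberately not interpreted here.
std::string stripUtf8EscapePrefix(const std::string& raw)
{
    return hasUtf8EscapePrefix(raw) ? raw.substr(3) : raw;
}
```

A plain ASCII value such as "HSI" passes through unchanged, while the DSPLY_NMLL bytes shown in the trace above lose only their first three bytes.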

RMTES Encoding

RMTES uses ISO 2022 escape sequences to select the character set in use. It supports the Reuters Basic Character Set (RBCS), UTF-8, Japanese Latin and Katakana (JIS C 6220 - 1969), Japanese Kanji (JIS X 0208 - 1990), and the Chinese National Standard (CNS 11643-1986). RMTES also supports sequences for character repetition and for partial updates. These character sets and formats are used internally by Thomson Reuters systems. Unfortunately, no open RMTES encoder library is provided for external users or customers. Application developers who build publishing or contributing applications often ask how to publish RMTES strings that contain non-ASCII text such as Japanese or Chinese characters. Implementing their own RMTES encoder is not easy, because the format described in the RMTES document is very complex.

However, there is an alternative. RMTES provides a switching function that changes from the default ISO 2022 scheme to the UTF-8 character set. In other words, the developer can publish UTF-8 data in an RMTES field: the application uses the switching function to mark a UTF-8 string as an RMTES string and then publishes it to the Thomson Reuters system.

The switching function is permitted only at the very start of a field and consists of three bytes: 1B 25 30.

0x1B 0x25 0x30

The application can prepend 0x1B, 0x25, 0x30 to the UTF-8 string and encode the result as an RMTES type. The escape sequence tells the RMTES parser or decoder that what follows is a UTF-8 string. This is why the sample data from field DSPLY_NMLL for RICs ".HSI" and "0005.HK" starts with the three-byte escape sequence 0x1B, 0x25 and 0x30.
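
The prepend step can be sketched independently of any API. The function name toRmtesUtf8 is hypothetical; it only builds the byte sequence, and the caller is still responsible for passing the result to the API as an RMTES-typed value.

```cpp
#include <string>

// Hypothetical helper (not an RFA or EMA API): builds the bytes of an RMTES
// field value by placing 0x1B 0x25 0x30 in front of an already UTF-8 encoded
// string. The input is assumed to be valid UTF-8; it is not checked.
std::string toRmtesUtf8(const std::string& utf8Text)
{
    static const char kUtf8Switch[] = { 0x1B, 0x25, 0x30 };
    std::string out(kUtf8Switch, sizeof(kUtf8Switch));
    out += utf8Text;
    return out;
}
```

For the 12-byte UTF-8 encoding of the Hang Seng Index name, the result is 15 bytes long, which matches the length of the DSPLY_NMLL data in the trace shown earlier.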

Be careful when using the three-byte sequence: it can make the encoded string longer than the cached field length, which is the size in bytes given in the RWF LEN column of RDMFieldDictionary. This can cause display issues when the data passes through the infrastructure.

Non-ASCII text such as Chinese, Thai, Japanese and Korean can be represented in the UTF-8 character set, so the application can use this approach to encode it. The next section covers the implementation in RFA and EMA C++/.NET.
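
One way to guard against this is to check the encoded length before publishing. The sketch below is an assumption about how an application might do that, not something the APIs provide: fitsRwfLength and truncateUtf8ToFit are hypothetical names, and truncating at a character boundary is only one possible policy (rejecting the value is another).

```cpp
#include <cstddef>
#include <string>

// Length of the 0x1B 0x25 0x30 prefix.
const std::size_t kEscapeLen = 3;

// True when prefix + UTF-8 text fits in rwfLen bytes (for example 32 for
// DSPLY_NMLL in the dictionary entry shown earlier).
bool fitsRwfLength(const std::string& utf8Text, std::size_t rwfLen)
{
    return utf8Text.size() + kEscapeLen <= rwfLen;
}

// Shortens the text so that prefix + text fits, cutting only between
// characters. Assumes the input is valid UTF-8.
std::string truncateUtf8ToFit(const std::string& utf8Text, std::size_t rwfLen)
{
    if (rwfLen <= kEscapeLen) return std::string();
    const std::size_t maxBytes = rwfLen - kEscapeLen;
    if (utf8Text.size() <= maxBytes) return utf8Text;

    std::size_t cut = maxBytes;
    // A byte of the form 10xxxxxx is a continuation byte; cutting in front
    // of it would split a character, so move the cut back to a lead byte.
    while (cut > 0 &&
           (static_cast<unsigned char>(utf8Text[cut]) & 0xC0) == 0x80)
        --cut;
    return utf8Text.substr(0, cut);
}
```

Each Chinese character in the examples takes three bytes in UTF-8, so a 32-byte field leaves room for at most nine such characters after the prefix.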

Publishing non-ASCII RMTES string in RFA application

As described earlier, a publishing or contributing application can encode non-ASCII text with UTF-8 instead of implementing an RMTES encoder according to the RMTES document. The application just prepends the three-byte escape sequence 0x1B, 0x25, and 0x30 to the UTF-8 string and sets the result in the DataBuffer class as an RMTES string.

Below is sample code for RFA C++. To test it, you can add it to a provider example such as the Provider_Interactive example provided with the RFA C++ package. Modify the method MarketPriceStreamItem::encodeMarketPriceFieldList in file MarketPriceStreamItem.cpp to publish an additional FID that uses the RMTES_STRING data type. In this case, we add FID 1352, DSPLY_NMLL, which holds the local-language instrument name.

The sample code below uses the C++11 u8 literal to create a UTF-8 string. You can use another library or a different approach to create the UTF-8 string. Note that the u8 literal requires Visual Studio 2013 or later, and you may need to add -std=c++11 when compiling the example on Linux.

    // Set up the DSPLY_NMLL field
    field.setFieldID(1352);

    // Local language instrument name for Hang Seng Index
    std::string utf8Str = u8"恒生指數";
    unsigned char displayNameBytes[50];
    memset(displayNameBytes, 0, 50);

    // Set the three-byte escape sequence
    displayNameBytes[0] = 0x1B;
    displayNameBytes[1] = 0x25;
    displayNameBytes[2] = 0x30;

    // Append the UTF-8 string after the escape sequence
    // (the text must fit in the remaining 47 bytes of the buffer)
    memcpy(displayNameBytes + 3, utf8Str.c_str(), utf8Str.length());

    // Create an RFA Buffer and set it on the DataBuffer. The type should be StringRMTESEnum
    Buffer bf;
    bf.setFrom(displayNameBytes, (int)utf8Str.length() + 3);
    dataBuffer.setBuffer(bf, DataBuffer::StringRMTESEnum);
    field.setData(dataBuffer);
    pfieldListWIt->bind(field);
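
If the u8 literal is not available, or the encoding of the source file is uncertain, one alternative is to spell out the UTF-8 bytes with hex escapes. This is only a sketch; the function name hangSengNameUtf8 is hypothetical, and the byte values are the ones visible in the tracing log for DSPLY_NMLL. Note also that from C++20 onward a u8 literal has type const char8_t[] and can no longer initialize a std::string directly, so the hex-escape form is also a way around that.

```cpp
#include <string>

// Hypothetical helper: returns the UTF-8 bytes of the Hang Seng Index local
// name. Writing the bytes as hex escapes keeps the result independent of the
// source file encoding and of compiler support for u8 literals.
std::string hangSengNameUtf8()
{
    return std::string("\xE6\x81\x92"   // U+6052
                       "\xE7\x94\x9F"   // U+751F
                       "\xE6\x8C\x87"   // U+6307
                       "\xE6\x95\xB8"); // U+6578
}
```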

The following code is for an RFA.NET provider application. You can try the sample C# code with Provider_Interactive from the RFA.NET package by modifying the method EncodeMarketPriceFieldList in file MarketPriceStreamItem.cs.


            // Setup DSPLY_NMLL field
            field.FieldID = 1352;

            // Local language instrument name for Hang Seng Index
            var utf8Str = Encoding.UTF8; //using Encoding class from System.Text
            List<byte> displayNameBytes = new List<byte>();

            //Set three bytes escape sequences
            displayNameBytes.Add(0x1B);
            displayNameBytes.Add(0x25);
            displayNameBytes.Add(0x30);

            //Add bytes data for the UTF-8 string to displayNameBytes bytes array
            displayNameBytes.AddRange(utf8Str.GetBytes("恒生指數"));

            //create buffer and set it to data buffer, the type should be StringRMTESEnum
            RFA.Common.Buffer bf=new RFA.Common.Buffer();
            bf.SetFrom(displayNameBytes.ToArray(),displayNameBytes.Count);
            dataBuffer.SetBuffer(bf, DataBuffer.DataBufferEnum.StringRMTES);
            field.Data = dataBuffer;
            fieldListWIt.Bind(field);

Below is a sample outgoing RSSL tracing log generated by the RFA provider application. The data is identical to the sample data described earlier.

<fieldEntry fieldId="1352" data="1B25 30E6 8192 E794 9FE6 8C87 E695 B8"/>
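
When comparing application bytes against a trace like this, it can help to print them in the same layout. The following is a small sketch with a hypothetical function name, toTraceHex, which assumes the layout seen in the trace lines in this article: uppercase hex with a space after every two bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: formats raw bytes as uppercase hex, inserting a space
// after every two bytes, to match the data="..." attribute in the trace.
std::string toTraceHex(const std::string& bytes)
{
    static const char kDigits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i > 0 && i % 2 == 0) out += ' ';
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        out += kDigits[b >> 4];
        out += kDigits[b & 0x0F];
    }
    return out;
}
```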

The output below is from RFA StarterConsumer with an RMTES decoder added; it displays the Chinese text correctly, without the garbage characters. Decoding is described in the Decoding RMTES section.

        FieldEntry [     2] RDNDISPLAY            100
        FieldEntry [     4] RDN_EXCHID                    SES         (155)
        FieldEntry [    38] DIVPAYDATE            12/10/2005
        FieldEntry [  1352] DSPLY_NMLL            "恒生指數"
        FieldEntry [     6] TRDPRC_1              1.00
        FieldEntry [    22] BID                   0.99

Publishing non-ASCII RMTES string in EMA application

Using the same approach in EMA C++ is easier than in RFA. You build the byte sequence the same way as in RFA C++, add it to the FieldList, and publish the data to the wire. To test the code, we use the interactive provider example 200MarketPriceStreaming from the EMA C++ package to publish the same Chinese text. You can modify the method AppClient::processMarketPriceRequest in file IProvider.cpp with the code below.

    // Local language instrument name for Hang Seng Index
    std::string utf8Str = u8"恒生指數";
    char displayNameBytes[50];
    memset(displayNameBytes, 0, 50);

    // Set the three-byte escape sequence
    displayNameBytes[0] = 0x1B;
    displayNameBytes[1] = 0x25;
    displayNameBytes[2] = 0x30;

    // Append the UTF-8 string after the escape sequence
    memcpy(displayNameBytes + 3, utf8Str.c_str(), utf8Str.length());

    EmaBuffer emaBuffer;
    emaBuffer.setFrom(displayNameBytes, (int)utf8Str.length() + 3);
    event.getProvider().submit( RefreshMsg().name( reqMsg.getName() ).serviceName( reqMsg.getServiceName() ).
        state( OmmState::OpenEnum, OmmState::OkEnum, OmmState::NoneEnum, "Refresh Completed" ).solicited( true ).
        payload( FieldList().
            addRmtes(1352, emaBuffer ).  // Add fid 1352 DSPLY_NMLL to FieldList
            addEnum( 15, 840 ).
            ...
            complete() ).
        complete(), event.getHandle() );

The outgoing message in the EMA tracing log shows the same value as RFA. You should get the same result as with RFA C++ when an OMM consumer that has an RMTES decoder subscribes to the data from the provider example.

<fieldEntry fieldId="1352" data="1B25 30E6 8192 E794 9FE6 8C87 E695 B8"/>

Decoding RMTES

TREP APIs such as RFA and EMA provide an RMTES converter or parser interface for converting the encoded RMTES string payload received as part of the OMM data to a Unicode string. This helps display news in international languages in UCS2 format, or transfer data through the network in ISO 2022 and UTF-8. The following sections provide guidelines for applications that want to display non-ASCII strings correctly.
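
To illustrate how the UTF-8 and UCS2 forms of the same text relate, here is a minimal sketch that turns UTF-8 bytes into 16-bit code units. The function name utf8ToUcs2 is hypothetical and this is not how the API converters are implemented; it handles only one- to three-byte sequences (the range UCS2 can represent), does not validate continuation bytes, and stops at the first byte it cannot handle.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: converts UTF-8 to 16-bit code units. Four-byte
// sequences and truncated input end the conversion early.
std::vector<unsigned short> utf8ToUcs2(const std::string& utf8)
{
    std::vector<unsigned short> out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        if (b0 < 0x80) {                                  // 0xxxxxxx
            out.push_back(b0);
            i += 1;
        } else if ((b0 & 0xE0) == 0xC0 && i + 1 < utf8.size()) {  // 110xxxxx
            const unsigned char b1 = static_cast<unsigned char>(utf8[i + 1]);
            out.push_back(static_cast<unsigned short>(
                ((b0 & 0x1F) << 6) | (b1 & 0x3F)));
            i += 2;
        } else if ((b0 & 0xF0) == 0xE0 && i + 2 < utf8.size()) {  // 1110xxxx
            const unsigned char b1 = static_cast<unsigned char>(utf8[i + 1]);
            const unsigned char b2 = static_cast<unsigned char>(utf8[i + 2]);
            out.push_back(static_cast<unsigned short>(
                ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)));
            i += 3;
        } else {
            break;  // unsupported or truncated sequence
        }
    }
    return out;
}
```

The 12 UTF-8 bytes of the Hang Seng Index name become four code units, one per character.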

Displaying non-ASCII RMTES string in RFA Consumer application

RFA provides the RMTESConverter interface to convert an RMTES string back to a readable string, in either UTF-8 or UCS2 form. The RMTESConverter interface provides the following major functions:

  • setBuffer(): binds a buffer to a converter (with a heap allocation). If the offset passed into the function is -1, the buffer passed in contains full field data and RFA will internally cache the buffer in memory. If the offset is equal to or greater than 0, the buffer passed in contains partial field data and RFA will internally apply the partial field data on the earlier cached buffer based on offset. The cached buffer should contain full field data.
  • getAsCharString(): returns a UTF-8-formatted string as an RFA_String.
  • getAsShortString(): returns a UCS2-formatted string as an RFA_Vector in RFA C++ or a List in RFA.NET. Each item in the vector is a short that holds one character, and each item in the List is a char that holds one character. The size of the vector or List is the total number of characters.
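
The offset convention described for setBuffer() can be pictured with a simplified model. The class below is purely illustrative and is not RFA code: FieldCacheSketch and apply are hypothetical names, the model works on plain bytes rather than real RMTES partial-update sequences, and padding with spaces when an update extends past the cached data is an assumption.

```cpp
#include <cstddef>
#include <string>

// Illustrative model only: keeps the last full field value and overwrites
// part of it when a partial update arrives.
class FieldCacheSketch {
public:
    // offset == -1: data is the full field value and replaces the cache.
    // offset >= 0:  data is partial and overwrites the cache from offset.
    void apply(const std::string& data, int offset)
    {
        if (offset < 0) {
            cache_ = data;
            return;
        }
        const std::size_t start = static_cast<std::size_t>(offset);
        if (start + data.size() > cache_.size())
            cache_.resize(start + data.size(), ' ');  // assumed padding
        cache_.replace(start, data.size(), data);
    }

    const std::string& value() const { return cache_; }

private:
    std::string cache_;
};
```

The model shows why the cached buffer has to hold full field data first: a partial update only makes sense relative to an earlier complete value.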

Sample Codes for RFA C++

RFA C++ examples such as StarterConsumer cannot decode a data buffer that contains non-ASCII text. However, you can add RMTESConverter to StarterConsumer where it decodes the data buffer. Open the StarterConsumer project and add the code below to the method void StandardOut::out( const DataBuffer& dataBuffer ) in StandardOut.cpp. Note that this is only sample code for decoding a complete RMTES string; it does not support partial RMTES updates. To decode partial RMTES updates, the application has to cache the RMTESConverter object and apply all received changes to it. You can find more details about RMTESConverter in the RFA C++ Development guide.

#include "Common/RMTESConverter.h"
...
case DataBuffer::StringRMTESEnum:
    {
        RMTESConverter converter;
        // Set the DataBuffer content on the RMTESConverter
        converter.setBuffer(dataBuffer.getBuffer());
        // Get the UTF-8 string from the RMTESConverter
        RFA_String utf8Str = converter.getAsCharString();
        int buffSize = utf8Str.size();
        sData = new char[buffSize + 1];
        strncpy(sData, (char*)utf8Str.c_str(), buffSize);
        sData[buffSize] = '\0';
        write("\"%s\"", sData);
        delete[] sData;
    }
    break;

Open StarterConsumer_.log in a text editor; you should see the Chinese text displayed correctly.

    FieldEntry [  1051] GV2_DATE    5/5/2017
    FieldEntry [  1080] PREF_DISP    2191
    FieldEntry [  1352] DSPLY_NMLL    "恒生指數"
    FieldEntry [  1709] RDN_EXCHD2            HSI         (456)
    FieldEntry [  3263] PREV_DISP    0

Sample Codes for RFA.NET

As with RFA C++, we use StarterConsumer to demonstrate decoding the RMTES string and showing the non-ASCII text returned by the DSPLY_NMLL FID. Open the StarterConsumer project from the RFA.NET package, open file StandOut.cs, and add the code below to the method public void Out(DataBuffer dataBuffer).

case DataBuffer.DataBufferEnum.StringRMTES:
    {
        RMTESConverter conv = new RMTESConverter();
        // Set the DataBuffer content on the RMTESConverter
        conv.SetBuffer(dataBuffer.GetBuffer());
        // Get the UTF-8 string from the RMTESConverter
        var utf8Str = conv.GetAsCharString();
        RFA_String sData = new RFA_String(utf8Str.ToArray());
        Console.Write("\"" + sData.ToString() + "\"");
        if (fileWriter != null)
        {
            fileWriter.Write("\"" + sData.ToString() + "\"");
        }
    }
    break;

If the application has to work with partial RMTES update messages, it has to cache the RMTESConverter object as explained for RFA C++. Please refer to the RFA.NET Development guide for more details about RMTES usage.

Displaying non-ASCII RMTES string in EMA C++ Consumer application

Using RMTES in EMA C++ is easier than in RFA C++/.NET because EMA provides a built-in RMTES decoder. If needed, the application can cache RmtesBuffer objects and apply all received changes to them. Please refer to the EMA C++ Reference Manual for more information about the RmtesBuffer class.

Sample Codes for EMA C++

The sample code below shows how an application can cache an RmtesBuffer object and apply all received changes to it.

// Create an RmtesBuffer object and cache it in the application
thomsonreuters::ema::access::RmtesBuffer rmtesBuffer;

...
//Decoding FieldEntry
const FieldEntry& fe = fl.getEntry();
cout << "Name: " << fe.getName() << " Value: ";
rmtesBuffer.apply( fe.getRmtes() );
cout << rmtesBuffer.toString() << endl;

If the application only needs to decode an RMTES field and does not need to handle partial updates, it can simply call thomsonreuters::ema::access::FieldEntry::getRmtes() to get an RmtesBuffer object while decoding a FieldEntry whose DataType is RmtesEnum.

Conclusions

In this article, we explained how to encode and decode non-ASCII text in RMTES fields using the EMA and RFA C++ or .NET editions. We also provided code snippets to demonstrate the API usage. Developers can try the code with the example applications provided in the RFA or EMA packages.
