Retrieving timeseries data for all options on a given underlying using TRTH REST API in Python

Introduction

The Thomson Reuters Tick History (TRTH) product provides historical market data, including intraday Time and Sales, Quotes and Market Depth content, going back to January 1996. It also provides end-of-day timeseries market data and auxiliary capabilities such as criteria search for the instruments it covers, including equities, indices, foreign exchange, money, fixed income and derivatives instruments.

This article illustrates how to download timeseries data for all option instruments on a given underlying using the Thomson Reuters Tick History REST API. The example uses Python 3.6 with the following libraries: requests, json, shutil, datetime, sys, pandas and time.

Obtaining authentication token

Before we can download data from TRTH with the REST API, we need to obtain an authentication token. Authentication tokens are valid for 24 hours. I found it convenient to write a short script that lets me paste an existing authentication token if I already have one, and requests a new token if I don’t or if my existing token has expired.
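If you record when you requested a token, you can skip the validity round trip while the token is clearly still fresh. The sketch below is a minimal local check, assuming you store the issue time yourself; `token_is_fresh` and the 30-minute safety margin are hypothetical choices, and the server-side check described next remains the authoritative test.

```python
from datetime import datetime, timedelta

# Tokens are valid for 24 hours after they are issued (as stated above).
TOKEN_LIFETIME = timedelta(hours=24)

def token_is_fresh(issued_at, now=None, margin=timedelta(minutes=30)):
    """Return True if a token issued at `issued_at` should still be usable at `now`.

    `margin` (a hypothetical default) treats tokens close to expiry as stale.
    """
    if now is None:
        now = datetime.now()
    return now < issued_at + TOKEN_LIFETIME - margin

issued = datetime(2017, 10, 1, 9, 0)
print(token_is_fresh(issued, now=datetime(2017, 10, 1, 20, 0)))  # True
print(token_is_fresh(issued, now=datetime(2017, 10, 2, 9, 0)))   # False
```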

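Here’s the function that requests a new token given TRTH user credentials (username and password). The imports at the top of this snippet are used throughout the rest of the example.

import json
import shutil
import sys
import time
import datetime as dt

import pandas as pd
import requests

def NewToken(un, pw):
    # Request a new authentication token; exit if credentials are missing or rejected.
    # Relies on the global proxyServers dictionary defined in the script below.
    if not (un and pw):
        print('Username or password is empty')
        sys.exit()

    requestUrl = 'https://hosted.datascopeapi.reuters.com/RestApi/v1/Authentication/RequestToken'
    requestHeaders = {
        'Prefer': 'respond-async',
        'Content-Type': 'application/json'
    }
    requestBody = {
        'Credentials': {
            'Username': un,
            'Password': pw
        }
    }
    r1 = requests.post(requestUrl, json=requestBody, headers=requestHeaders, proxies=proxyServers)

    if r1.status_code == 200:
        jsonResponse = json.loads(r1.text.encode('ascii', 'ignore'))
        # The token string is returned in the "value" field of the response
        return jsonResponse["value"]
    else:
        print('Authentication failed')
        sys.exit()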

And here’s the script that uses this function to obtain a new authentication token unless we already have a valid one. To check the validity of an existing token, I send a request for user details using the TRTH REST API. If the response status is 200 OK, the token is valid; otherwise I request a new token.

userName = 'your TRTH username'
passWord = 'your TRTH account password'
proxyServers = {'https':'your proxy address and port number'}  # use {} if you connect without a proxy
token = 'you can copy and paste your existing authentication token here'
if token:
    #check the validity of the authentication token
    requestUrl = 'https://hosted.datascopeapi.reuters.com/RestApi/v1/Users/Users(' + userName + ')'
    requestHeaders = {
        'Prefer':'respond-async',
        'Content-Type':'application/json',
        'Authorization': 'token ' + token
    }
    r1 = requests.get(requestUrl, headers = requestHeaders, proxies = proxyServers)
    #if the response status is 200 OK then the token is valid, otherwise request new token
    if r1.status_code != 200:
        token = NewToken(userName, passWord)
else:
    #token is empty, request new token.
    token = NewToken(userName, passWord)

print(token)
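The check-then-renew logic above can also be wrapped in a function. In this sketch the validity check and the renewal are passed in as callables, so the decision logic can be exercised without network access; `ensure_token` is a hypothetical name, and the commented wiring assumes the Users request and `NewToken()` shown above.

```python
def ensure_token(token, is_valid, request_new):
    """Return `token` if it is non-empty and passes `is_valid`, else a new token.

    is_valid(token) -> bool   e.g. a Users request returning 200 OK
    request_new() -> str      e.g. lambda: NewToken(userName, passWord)
    """
    if token and is_valid(token):
        return token
    return request_new()

# Hypothetical wiring to the script above:
# token = ensure_token(
#     token,
#     is_valid=lambda t: requests.get(
#         requestUrl,
#         headers={'Authorization': 'token ' + t},
#         proxies=proxyServers).status_code == 200,
#     request_new=lambda: NewToken(userName, passWord),
# )
```

Keeping the HTTP calls outside the function is a design choice: the renewal rule stays in one place and can be tested with simple lambdas.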

Creating a list of option instruments on a given underlying

Next, I search for all option instruments on a given underlying (here the S&P 500 index, .SPX) that expire today or later.

Method: POST

Endpoint: https://hosted.datascopeapi.reuters.com/RestApi/v1/Search/FuturesAndOptionsSearch

requestUrl = 'https://hosted.datascopeapi.reuters.com/RestApi/v1/Search/FuturesAndOptionsSearch'

requestHeaders = {
    'Prefer':'respond-async;odata.maxpagesize=5000',
    'Content-Type':'application/json',
    'Authorization': 'token ' + token
}

requestBody = {
    'SearchRequest': {
      '@odata.context': 'http://hosted.datascopeapi.reuters.com/RestApi/v1/$metadata#ThomsonReuters.Dss.Api.Search.FuturesAndOptionsSearchRequest',
      'FuturesAndOptionsType': 'Options',
      'UnderlyingRic': '.SPX',
      'ExpirationDate': {
        '@odata.type': '#ThomsonReuters.Dss.Api.Search.DateValueComparison',
        'ComparisonOperator': 'GreaterThanEquals',
        'Value': str(dt.date.today())
      }
    }
}

r2 = requests.post(requestUrl, json = requestBody, headers = requestHeaders, proxies = proxyServers)
r2Json = json.loads(r2.text.encode('ascii', 'ignore'))

Search uses server-driven paging, which limits each page of results to a maximum of 250 rows unless the odata.maxpagesize preference is set in the Prefer request header. Subsequent pages are retrieved by calling the nextlink included in the payload, and continuing to call the nextlink in each received payload until a payload arrives without one, which indicates that no more data is available.

In many use cases it is not necessary to retrieve more than the first 250 rows. In this use case, however, the search was expected at the time of writing to return over 10,000 rows. Sticking with the default page size of 250 would make the search unnecessarily slow, because the processing time of a request is of the same order of magnitude for 10,000 rows as for 250 rows.

While you can set odata.maxpagesize to an arbitrarily large number, there’s no guarantee that the web service will honor the requested page size. Your code should always account for the possibility of receiving a partial result set with a nextlink pointing to the next page. For this example I chose an odata.maxpagesize of 5,000, which lets me illustrate how to work with server-driven paging without unnecessarily slowing down the search.
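The page size is requested by appending odata.maxpagesize to the respond-async preference in the Prefer header. A small helper keeps that string consistent across requests; `prefer_header` is a hypothetical name, and the output format simply mirrors the header used in the search request above.

```python
def prefer_header(max_page_size=None):
    """Build a Prefer header value, optionally asking for a specific page size."""
    value = 'respond-async'
    if max_page_size is not None:
        value += ';odata.maxpagesize=' + str(int(max_page_size))
    return value

print(prefer_header())      # respond-async
print(prefer_header(5000))  # respond-async;odata.maxpagesize=5000
```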

For more details on server-driven paging in the DataScope Select/TRTH REST API, see this article in DataScope Select Help (you’ll be required to sign in with DataScope Select or TRTH credentials).

instrumentList = r2Json['value']
nextLink = r2Json['@odata.nextlink'] if '@odata.nextlink' in r2Json else False

while nextLink:
    print('requesting the next batch of option RICs from ' + nextLink)
    r2 = requests.post(nextLink, json = requestBody, headers=requestHeaders, proxies = proxyServers)
    r2Json = json.loads(r2.text.encode('ascii', 'ignore'))
    instrumentList = instrumentList + r2Json['value']
    nextLink = r2Json['@odata.nextlink'] if '@odata.nextlink' in r2Json else False

print(str(len(instrumentList)) + ' option RICs returned from search')

Transform the instrument list into the format required for extraction

The instrument list returned from search looks like this:

    [
        {
            "Identifier": "SPXl151710250.U",
            "IdentifierType": "Ric",
            "Source": "OPQ",
            "Key": "VjF8MHgwMDEwMGIwMDBiYmYzZTI1fDB4MDAxMDBiMDAwYmJmM2UyNHxPUFF8RFZRVXxPUFR8fER8fFNQWGwxNTE3MTAyNTAuVXw3MTU0",
            "Description": "SPX Dec7 1025.0C",
            "InstrumentType": "DerivativeQuote",
            "Status": "Valid"
        },
        {
            "Identifier": "SPXl151711250.U",
            "IdentifierType": "Ric",
            "Source": "OPQ",
            "Key": "VjF8MHgwMDEwMGIwMDBiYmYzZTQ5fDB4MDAxMDBiMDAwYmJmM2U0OHxPUFF8RFZRVXxPUFR8fER8fFNQWGwxNTE3MTEyNTAuVXw3MTU0",
            "Description": "SPX Dec7 1125.0C",
            "InstrumentType": "DerivativeQuote",
            "Status": "Valid"
        },
        {
            "Identifier": "SPXL151710000.U",
            "IdentifierType": "Ric",
            "Source": "OPQ",
            "Key": "VjF8MHgwMDEwMGIwMDBiYmYzZTRifDB4MDAxMDBiMDAwYmJmM2U0YXxPUFF8RFZRVXxPUFR8fER8fFNQWEwxNTE3MTAwMDAuVXw3MTU0",
            "Description": "SPX Dec7 100.0 C",
            "InstrumentType": "DerivativeQuote",
            "Status": "Valid"
        }, ...
    ]

It contains more information than the timeseries request needs, so it cannot be used directly. We need to transform the search result by keeping only the Identifier and IdentifierType keys of each entry.

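# Load the search results into a DataFrame, one row per instrument
tmpDF = pd.DataFrame.from_dict(instrumentList, orient='columns')
# Keep only the two columns the extraction request needs
tmpDF = tmpDF[['Identifier','IdentifierType']]
# Convert back to a list of {'Identifier': ..., 'IdentifierType': ...} dictionaries
instrumentList = tmpDF.to_dict('records')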
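If you’d rather not depend on pandas for this step, a plain list comprehension produces the same list of dictionaries for well-formed search results. A minimal sketch; `to_identifiers` is a hypothetical helper name.

```python
def to_identifiers(search_results):
    """Keep only the Identifier and IdentifierType keys of each search result."""
    return [
        {'Identifier': item['Identifier'], 'IdentifierType': item['IdentifierType']}
        for item in search_results
    ]

sample = [{'Identifier': 'SPXl151710250.U', 'IdentifierType': 'Ric',
           'Source': 'OPQ', 'Status': 'Valid'}]
print(to_identifiers(sample))
# [{'Identifier': 'SPXl151710250.U', 'IdentifierType': 'Ric'}]
```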

Send an on-demand extraction request for the list of instruments retrieved from search

Method: POST

Endpoint: https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRaw

requestUrl='https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRaw'

requestHeaders={
    'Prefer':'respond-async',
    'Content-Type':'application/json',
    'Authorization': 'token ' + token
}

In this example we’re requesting end-of-day data, which in TRTH terminology is known as ElektronTimeseries.

requestBody={
  'ExtractionRequest': {
    '@odata.type': '#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.ElektronTimeseriesExtractionRequest',
    'ContentFieldNames': [
      'RIC',
      'Expiration Date',
      'Put Call Flag',
      'Trade Date',
      'Bid',
      'Ask',
      'Last',
      'Security Description'
    ],
    'IdentifierList': {
      '@odata.type': '#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList',  
      'InstrumentIdentifiers': instrumentList,
    },    
    'Condition': {
      'StartDate': str(dt.date.today() - dt.timedelta(days=5)),
      'EndDate': str(dt.date.today())
    }
  }
}
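To reuse this request for other underlyings or date windows, the body can be built by a function. This sketch reproduces the structure of the request body above; `build_timeseries_request` and its parameters are hypothetical, and passing `end_date` explicitly makes the five-day window easy to check.

```python
import datetime as dt

def build_timeseries_request(instruments, fields, end_date=None, days_back=5):
    """Return an ElektronTimeseries ExtractRaw body covering `days_back` days up to `end_date`."""
    if end_date is None:
        end_date = dt.date.today()
    start_date = end_date - dt.timedelta(days=days_back)
    return {
        'ExtractionRequest': {
            '@odata.type': '#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.ElektronTimeseriesExtractionRequest',
            'ContentFieldNames': list(fields),
            'IdentifierList': {
                '@odata.type': '#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList',
                'InstrumentIdentifiers': list(instruments),
            },
            'Condition': {
                'StartDate': str(start_date),  # ISO format, e.g. 2017-10-01
                'EndDate': str(end_date),
            },
        }
    }

body = build_timeseries_request(
    [{'Identifier': 'SPXl151710250.U', 'IdentifierType': 'Ric'}],
    ['RIC', 'Trade Date', 'Last'],
    end_date=dt.date(2017, 10, 6))
print(body['ExtractionRequest']['Condition'])
# {'StartDate': '2017-10-01', 'EndDate': '2017-10-06'}
```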

r3 = requests.post(requestUrl, json=requestBody, headers=requestHeaders, proxies = proxyServers)

In most cases the response status we receive, after a wait of about 30 seconds, is 202 Accepted, which means the request has not yet completed. For a small dataset, however, we may receive 200 OK, meaning the extraction has already completed. For more information on the workflow of on-demand extractions see this tutorial.

We always recommend including error handling for other possible response statuses.

print ('response status from the extraction request = ' + str(r3.status_code))
if r3.status_code >= 400 :
    print(r3.text.encode('ascii', 'ignore'))
    sys.exit()
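The three outcomes described above can be made explicit with a small dispatch function. This is a sketch; `next_step` and its return labels are hypothetical, and the mapping only restates the statuses discussed in this section.

```python
def next_step(status_code):
    """Map an ExtractRaw response status to what the script should do next."""
    if status_code == 200:
        return 'download'    # extraction already completed
    if status_code == 202:
        return 'poll'        # still running: poll the URL in the Location header
    if status_code >= 400:
        return 'fail'        # client or server error: print the body and stop
    return 'unexpected'      # any other status deserves a closer look

print(next_step(202))  # poll
```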

Poll the status of the request using the received location URL, and get the jobId and extraction notes

If the response status we received is 202 Accepted, we need to wait until the server completes the request. Because the server processes the extraction asynchronously, we periodically poll the URL returned in the Location header of the 202 response until the status changes.

# Only a 202 Accepted response carries a Location header to poll;
# a 200 OK response means the extraction has already completed.
if r3.status_code == 202:
    requestUrl = r3.headers['location']
    requestHeaders = {
        'Prefer': 'respond-async',
        'Content-Type': 'application/json',
        'Authorization': 'token ' + token
    }
    r4 = requests.get(requestUrl, headers=requestHeaders, proxies=proxyServers)
    while r4.status_code == 202:
        print(str(dt.datetime.now()) + ' Server is still processing the extraction. Checking the status again in 30 seconds')
        time.sleep(30)
        r4 = requests.get(requestUrl, headers=requestHeaders, proxies=proxyServers)
else:
    # 200 OK: use the original response as the completed status response
    r4 = r3
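The polling loop above waits indefinitely. If you prefer an upper bound, the loop can be written with a maximum number of attempts. A sketch under stated assumptions: `poll_until_done`, the 120-attempt limit and the injectable `sleep` parameter are hypothetical choices, and `get_response` is any callable returning an object with a `status_code` attribute, such as a `requests.get` call on the location URL.

```python
import time

def poll_until_done(get_response, interval=30, max_attempts=120, sleep=time.sleep):
    """Call get_response() until its status_code is no longer 202 Accepted.

    Sleeps `interval` seconds between polls and raises TimeoutError after
    `max_attempts` polls that all returned 202. Returns the first non-202 response.
    """
    for attempt in range(1, max_attempts + 1):
        response = get_response()
        if response.status_code != 202:
            return response
        if attempt < max_attempts:
            sleep(interval)
    raise TimeoutError('extraction still running after %d polls' % max_attempts)

# Hypothetical wiring to the code above:
# r4 = poll_until_done(lambda: requests.get(requestUrl, headers=requestHeaders,
#                                           proxies=proxyServers))
```

Passing `sleep` as a parameter lets you test the loop without actually waiting.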

Get the extraction results using the received jobId and save the compressed data to disk

To retrieve the data, the request status must be 200 OK, which indicates that the extraction is complete. First we display the jobId and the extraction notes. The extraction notes contain information about the request, various internal IDs and timestamps associated with the extraction, error messages (if any) and the extraction quota status. If the request completed successfully, the notes contain the message "Processing completed successfully". Then we retrieve the timeseries data and save it to disk.
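Since a successful extraction is signalled by that message in the notes, a quick check before downloading can catch failed extractions early. A minimal sketch; `extraction_succeeded` is a hypothetical helper, and it assumes the notes are the list of strings found under "Notes" in the status response, as in the code below.

```python
def extraction_succeeded(notes):
    """Return True if any extraction note contains the success message."""
    return any('Processing completed successfully' in note for note in notes)

# Hypothetical usage with the status response parsed below:
# if not extraction_succeeded(r4Json["Notes"]):
#     print('Check the extraction notes before using the data')
```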

if r4.status_code == 200:
    r4Json = json.loads(r4.text.encode('ascii', 'ignore'))
    jobId = r4Json["JobId"]
    print('jobId: ' + jobId + '\n')
    notes = r4Json["Notes"]
    print('Extraction notes:\n' + notes[0])
    requestUrl = 'https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults' + "('" + jobId + "')" + '/$value'
    requestHeaders = {
        'Prefer': 'respond-async',
        'Content-Type': 'text/plain',
        'Accept-Encoding': 'gzip',
        'Authorization': 'token ' + token
    }
    r5 = requests.get(requestUrl, headers=requestHeaders, proxies=proxyServers, stream=True)
    # Keep the payload gzip-compressed so it is written to disk exactly as received
    r5.raw.decode_content = False
    print('Response headers for content: type: ' + r5.headers.get('Content-Type', '')
          + ' - encoding: ' + r5.headers.get('Content-Encoding', 'none') + '\n')
    # Example output file name; choose your own
    fileName = 'SPX_options_timeseries.csv.gz'
    with open(fileName, 'wb') as fo:
        shutil.copyfileobj(r5.raw, fo)
else:
    print('Extraction failed with status ' + str(r4.status_code))
    print(r4.text.encode('ascii', 'ignore'))

The content is gzip-compressed plain text in CSV format. Depending on the nature of the data, the time range and the number of instruments, the response can be quite long and contain tens of thousands of lines.
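Because the saved file is gzip-compressed CSV, it can be read back without unpacking it first. This sketch uses only the standard library; `read_gzipped_csv` is a hypothetical helper, and it assumes the first line of the file is a header row.

```python
import csv
import gzip

def read_gzipped_csv(path):
    """Yield each row of a gzip-compressed CSV file as a dict keyed by the header row."""
    with gzip.open(path, mode='rt', newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            yield row

# Hypothetical usage with the file saved above:
# for row in read_gzipped_csv('SPX_options_timeseries.csv.gz'):
#     print(row)
```

If you already use pandas, `pd.read_csv(fileName, compression='gzip')` loads the same file into a DataFrame in one call.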

Request tuning and best practices

Requests for raw data, tick data and market depth data can generate very large result sets. To optimize the retrieval times, see BEST PRACTICES FOR THE TICK HISTORY REST API.