Using Tick History REST API in R language - Part 3

This is the third article of the educational series that is going to show you how to use the httr package to retrieve data from Thomson Reuters Tick History (TRTH) REST APIs in R. In this article, we will look at the On-Demand custom report. You can find the first and second parts here and here.

Before we start, you are encouraged to visit the REST API Reference tree at the REST API Help Home Page as a primary source of reference for all restful functions and supported conditions.

You can also view the full list of functions in the httr package here.

The R script on this article is available on GitHub.

Tick History Reporting

The Tick History REST API has two kinds of reports:

  1. Standard reporting, also called Venue by Day (VBD), which automatically generates a day’s complete trading data for a given exchange. We covered this report in the previous article.
  2. Custom reporting, in which you specify all report attributes.

With custom reports, there are two ways in which you can identify the report’s fields and the instruments to report on.

  1. On-Demand, in which you specify the fields and instruments directly in a single HTTP request. The report begins running as soon as you submit the request.
  2. Stored & scheduled, in which the instrument list and report template are stored in your account and you can specify the timing when the report begins running.

On-Demand Workflow

Before we start coding, we have to understand the workflow of the On-Demand custom report. The workflow follows the basic pattern of submitting a report request and then retrieving the report. Here are the basic steps:

  1. Submit a report request that specifies:

    • The instruments on which you’re reporting.
    • The report logic (the type of report, the report fields, the time period, and additional properties).
    • An authentication token

      For all report types, you can use the ExtractRaw endpoint. If the report completes within the time specified by the wait preference, the report runs synchronously and you can omit the next report status polling step. (The wait preference defaults to 30 seconds; you can set it to a different time in the Prefer: respond-async header field.).

  2. Check the report status if the report had run for longer than the waiting period. This is done by polling the returned report’s job location. Although this step is optional as the report could complete within the wait time, in reality, the report is rarely complete within the default 30 seconds wait time.

  3. Retrieve the report once the report request has completed.

Specifying the Instruments

To specify the instruments, you need to know the identifier for each instrument. TRTH supports wide varieties of identifiers such as RIC, ISIN, CUSIP, SEDOL and more. However, we recommend using Reuters Instrument Code (RIC) over other identifiers. Some instruments are traded across multiple exchanges, and using ISIN or CUSIP will return reports from all exchanges. By using RIC, you will be able to request a specific exchange.

One way to discover a RIC is to issue the /Search/HistoricalSearch endpoint against one of the instrument’s secondary identifiers.

This is the sample HTTP request:

POST https://hosted.datascopeapi.reuters.com/RestApi/v1/Search/HistoricalSearch
Authorization: Token <your_auth_token_goes_here>
Prefer: respond-async
Content-Type: application/json; odata=minimalmetadata
{
  "Request": {
    "Identifier": "US0378331005",
    "IdentifierType": "Isin",
    "Range": {
      "Start": "2008-01-01T00:00:00.000Z",
      "End": "2008-02-01T00:00:00.000Z"
    }
  }
}

Supported identifier types are Ric, Isin, Cusip, Sedol. The search will look for the identifier in all supported types if not specified.

Start and End date are the Date range in which the instrument was active. If the instrument exists but was not active in the specified date range, it will not be returned.

With the above HTTP, we could write a Historical Search function as followed:

TRTHHistoricalSearch <- function(token,identifier,startDateTime,endDateTime,identifierType=NULL) {
  url <- "https://hosted.datascopeapi.reuters.com/RestApi/v1/Search/HistoricalSearch"
  b <- list(
    Request=list(
      Identifier=identifier,
      Range=list(
        Start=startDateTime,
        End=endDateTime
      )
    )
  )
  identifierTypeArg <- match.arg(identifierType,c(NULL,"Ric","Isin","Cusip","Sedol"))
  if (!is.null(identifierType)) {
    b[["Request"]][["IdentifierType"]] <- jsonlite::unbox(identifierTypeArg)
  }
  r <- httr::POST(url,add_headers(prefer = "respond-async",Authorization = token),content_type_json(),body = b,encode = "json")
  warn_for_status(r)
  a<-content(r, "parsed", "application/json", encoding="UTF-8")
  return(a)
}

The response returns the RICs that meet the conditions specified in the request:

List of 2
 $ @odata.context: chr "https://hosted.datascopeapi.reuters.com/RestApi/v1/$metadata#Collection(ThomsonReuters.Dss.Api.Search.HistoricalSearchResult)"
 $ value         :List of 25
  ..$ :List of 11
  .. ..$ Identifier    : chr "AAPL.BE"
  .. ..$ IdentifierType: chr "Ric"
  .. ..$ Source        : chr ""
  .. ..$ Key           : chr "VjF8MHgzMDAwMDAwMDAwMDAwMDAwfDB4MzAwMDAwMDAwMDAwMDAwMHx8fHx8fHxBQVBMLkJFfA"
  .. ..$ Description   : chr "Historical Instrument"
  .. ..$ InstrumentType: chr "Unknown"
  .. ..$ Status        : chr "Valid"
  .. ..$ DomainCode    : chr "6"
  .. ..$ FirstDate     : chr "1996-01-01T00:00:00.000Z"
  .. ..$ LastDate      : chr "2017-10-20T00:00:00.000Z"
  .. ..$ History       : list()
  ..$ :List of 11
  .. ..$ Identifier    : chr "AAPL.C"
  .. ..$ IdentifierType: chr "Ric"
  .. ..$ Source        : chr ""
  .. ..$ Key           : chr "VjF8MHgzMDAwMDAwMDAwMDAwMDAwfDB4MzAwMDAwMDAwMDAwMDAwMHx8fHx8fHxBQVBMLkN8"
  .. ..$ Description   : chr "Historical Instrument"
  .. ..$ InstrumentType: chr "Unknown"
  .. ..$ Status        : chr "Valid"
  .. ..$ DomainCode    : chr "6"
  .. ..$ FirstDate     : chr "2002-07-20T00:00:00.000Z"
  .. ..$ LastDate      : chr "2017-10-20T00:00:00.000Z"
  .. ..$ History       : list()
  ..$ :List of 11
  .. ..$ Identifier    : chr "AAPLs.INS"
  .. ..$ IdentifierType: chr "Ric"
  .. ..$ Source        : chr ""
  .. ..$ Key           : chr "VjF8MHgzMDAwMDAwMDAwMDAwMDAwfDB4MzAwMDAwMDAwMDAwMDAwMHx8fHx8fHxBQVBMcy5JTlN8"
  .. ..$ Description   : chr "Historical Instrument"
  .. ..$ InstrumentType: chr "Unknown"
  .. ..$ Status        : chr "Valid"
  .. ..$ DomainCode    : chr "6"
  .. ..$ FirstDate     : chr "2007-04-11T00:00:00.000Z"
  .. ..$ LastDate      : chr "2008-11-03T00:00:00.000Z"
  .. ..$ History       : list()

From the list, you can pick the RIC of specific exchange.

Specifying What to Include in the Report

You specify what to include in an On-demand custom report by first choosing the type of report (also referred to as the report template) and then choosing which of its fields to use.

Below are the types of reports that are available to Tick History. They belong to several categories:

  • Tick History
    • Time and Sales
    • Market Depth
    • Intraday Summaries
    • Raw
  • End of Day Pricing
    • Elektron Timeseries
  • Corporate Actions
    • Standard Events
  • Reference Data
    • Terms and Conditions
    • Historical Reference

Each type of report provides a different set of fields. For every report types except Raw reports, you can specify which of its fields you want. The Raw report always returns all the fields provided by the venue.

To see which fields each report type offers, you can issue the endpoint /Extractions/GetValidContentFieldTypes to return all fields for a given report type.

GET https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/GetValidContentFieldTypes(ReportTemplateType=ThomsonReuters.Dss.Api.Extractions.ReportTemplates.ReportTemplateTypes'TickHistoryTimeAndSales')
Authorization: Token <your_auth_token_goes_here>
Prefer: respond-async

Replace TickHistoryTimeAndSales with the enum of the report template. The list can be found on REST API Reference Tree

With the above HTTP, we could write a function as followed:

TRTHGetValidContentFieldTypes <- function(token,reportTemplateTypes=c("TickHistoryTimeAndSales","TickHistoryMarketDepth","TickHistoryIntradaySummaries")) {
  reportTemplateTypesArg <- match.arg(reportTemplateTypes)
  url <- paste0("https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/GetValidContentFieldTypes(ReportTemplateType=ThomsonReuters.Dss.Api.Extractions.ReportTemplates.ReportTemplateTypes'",reportTemplateTypesArg,"')")
  r <- GET(url,add_headers(prefer = "respond-async",Authorization = token))
  stop_for_status(r)
  a<-content(r, "parsed", "application/json", encoding="UTF-8")
  return(a)
}

Once you know your report type and fields, you can create the JSON body of the On-demand request.

The body contains several parts:

  • The type of extraction in @odata.type format.

    The @odata.type of each Tick History report are:
    • “#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.TickHistoryTimeAndSalesExtractionRequest”
    • “#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.TickHistoryMarketDepthExtractionRequest”
    • “#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.TickHistoryIntradaySummariesExtractionRequest”
    • “#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.TickHistoryRawExtractionRequest”
  • The list of instrument identifiers, each one with its type.
  • The list of field names.
  • The conditions include the date range and additional properties for the request. You can check the REST API Reference Tree for the list of additional properties.

Below is a sample request body of Time and Sales report request.

{
  "ExtractionRequest": {
    "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.TickHistoryTimeAndSalesExtractionRequest",
    "ContentFieldNames": [
          "Trade - Price",
          "Trade - Volume",
          "Trade - Exchange Time"
    ],
    "IdentifierList": {
      "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",  
      "InstrumentIdentifiers": [
      {
        "Identifier": "AAPL.O",
        "IdentifierType": "Ric"
      },
      {
        "Identifier": "TRI.N",
        "IdentifierType": "Ric"
      }]
    },
    "Condition": {
      "MessageTimeStampIn": "GmtUtc",
      "ReportDateRangeType": "Range",
      "QueryStartDate": "2017-09-29T00:00:00.000Z",
      "QueryEndDate": "2017-09-29T12:00:00.000Z"
    }
  }
}

Submit a request

We can perform an on-demand extraction using the API’s /Extractions/ExtractRaw endpoint.

POST https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRaw
Authorization: Token <your_auth_token_goes_here>
Prefer: respond-async
Content-Type: application/json; odata=minimalmetadata
<your_request_body_goes_here>

If the results are ready before the request times out, the report runs synchronously and you can omit the next report status polling step. If not, it will return a 202 response and a monitor URL in response. In a later step, you will use this monitor URL to poll the report’s status to determine when the report has completed.

So let’s start the request function with the httr::POST() to /Extractions/ExtractRaw endpoint.

  url <- "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRaw"
  r <- httr::POST(url,add_headers(prefer = "respond-async",Authorization = token),content_type_json(),body = b,encode = "json")

Poll the Report Status

If the results are not ready within the 30-second timeout, the response will return an HTTP status 202 Accepted.

HTTP 202 Accepted
Status: InProgress
Progress: 1
Preference-Applied: respond-async
Location: https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRawResult(ExtractionId='0x05806ed296dc2064')

In the request function, we use httr::status_code(), which extract status code from a response. If the code is 202, the function returns the monitor URL.

  url <- "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRaw"
  r <- httr::POST(url,add_headers(prefer = "respond-async",Authorization = token),content_type_json(),body = b,encode = "json")
  if (httr::status_code(r) == 202) {
    message("The request has been accepted but has not yet completed executing asynchronously.\r\nReturn monitor URL\r\n",r$headers$location)
    return(invisible(r$headers$location))
  }

We will issue a GET against this monitor URL to poll the report to determine when it has completed. Note that it may return an HTTP status 202 Accepted response again. This means the request has not yet completed and we should wait a few minutes before polling the extraction status again.

  r <- httr::GET(location,add_headers(prefer = "respond-async",Authorization = token))
  if (httr::status_code(r) == 202) {
    message("The request has not yet completed executing asynchronously.\r\nPlease wait a bit and check the request status again.\r\n")
    return(invisible(r$headers$location))
  }

Retrieve the Report

If you receive an HTTP status 200 OK response, the body will contain a jobId and Notes.

HTTP 200 OK
{
    "@odata.context": "https://hosted.datascopeapi.reuters.com/RestApi/v1/$metadata#RawExtractionResults/$entity",
    "JobId": "0x05806ed296dc2064",
    "Notes": [
        "Extraction Services Version 10.8.35959 (90ec8e35cd86), Built Nov 22 2016 01:01:43\nUser ID: 9001552\nExtraction ID: 110000124473\nSchedule: 0x05806ed296dc2064 (ID = 0x0000000000000000)\nInput List (1 items):  (ID = 0x05806ed296dc2064) Created: 23/11/2016 02:23:06 PM Last Modified: 23/11/2016 02:23:06 PM\nReport Template (1 fields): _OnD_0x05806ed296dc2064 (ID = 0x05806ed2e2dc2064) Created: 23/11/2016 02:22:09 PM Last Modified: 23/11/2016 02:22:09 PM\nSchedule dispatched via message queue (0x05806ed296dc2064), Data source identifier (14F11734E1A148ECA885A209462DBF8B)\nSchedule Time: 23/11/2016 02:22:09 PM\nProcessing started at 23/11/2016 02:22:09 PM\nProcessing completed successfully at 23/11/2016 02:23:07 PM\nExtraction finished at 23/11/2016 08:23:07 PM UTC, with servers: tm01n01, TRTH (46.763 secs)\nHistorical Instrument <RIC,IBM.N> expanded to 1 RIC: IBM.N.\n(RIC,IBM.N,NYS) having end time greater than the cutoff time of NYS is changed to the cutoff time of 22/11/2016 03:02:00 PM\nQuota Message: INFO: Tick History Compliance Quota Count Before Extraction: 55; Instruments Extracted: 1; Tick History Compliance Quota Count After Extraction: 55, 0.423076923076923% of Limit; Tick History Compliance Quota Limit: 13000\nManifest: #RIC,Domain,Start,End,Status,Count\nManifest: IBM.N,Market Price,2016-11-21T14:30:09.882619765Z,2016-11-22T20:59:59.994472189Z,Active,86927\n"
    ]
}

We can then retrieve the file by sending GET request to the /Extractions/RawExtractionResults('{{jobId}}')/$value endpoint.

GET https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('0x05806ed296dc2064')/$value
Authorization: Token <your_auth_token_goes_here>
Prefer: respond-async

We retrieve the files and save them to local disk. Do not forget that the httr does not support concatenated gzip files and we have to disable the httr’s content decoding.

TRTHRawExtractionResults <- function(token,jobid,Path,Overwrite = TRUE) {
  url <- paste0("https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('",jobid,"')/$value")
  r <- httr::GET(url,add_headers(prefer = "respond-async",Authorization = token),config(http_content_decoding=0),write_disk(Path,Overwrite),progress())
  stop_for_status(r)
  return(r)
}

Then, in the request and poll function, we add another check for status 200.

TRTHExtractRaw <- function(token,b,path,overwrite = FALSE) {
  url <- "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRaw"
  r <- httr::POST(url,add_headers(prefer = "respond-async",Authorization = token),content_type_json(),body = b,encode = "json")
  if (httr::status_code(r) == 202) {
    message("The request has been accepted but has not yet completed executing asynchronously.\r\nReturn monitor URL\r\n",r$headers$location)
    return(invisible(r$headers$location))
  } else if(status_code(r) == 200) {
    a<-content(r, "parsed", "application/json", encoding="UTF-8")
    message(a$Notes)
    return(TRTHRawExtractionResults(token,a$JobID,path,overwrite))
  } else {
    warn_for_status(r)
    a<-content(r, "parsed", "application/json", encoding="UTF-8")
    return(a)
  }
}
TRTHCheckRequestStatus <- function(token,location,path,overwrite = FALSE) {
  r <- GET(location,add_headers(prefer = "respond-async",Authorization = token))
  if (status_code(r) == 202) {
    message("The request has not yet completed executing asynchronously.\r\nPlease wait a bit and check the request status again.\r\n")
    return(invisible(r$headers$location))
  } else if(status_code(r) == 200) {
    a<-content(r, "parsed", "application/json", encoding="UTF-8")
    message(a$Notes)
    return(TRTHRawExtractionResults(token,a$JobID,path,overwrite))
  } else {
    warn_for_status(r)
    a<-content(r, "parsed", "application/json", encoding="UTF-8")
    return(a)
  }
}

 

Best Practices

Below are some of the best practices that Tick History users are encouraged to follow:

Specify Only the Dates, Fields, and Instruments You Need

Report performance can be very sensitive to the length of the date range on which you are reporting, to the number of fields you are reporting on (even if some are empty), and to the number of instruments. You should always specify only the date range, fields, and instruments that you need.

Use a Realistic Polling Interval

A large report that will take a long time to execute does not need a short polling interval. So set the interval to an appropriate period relative to the report.

Querying Multiple Instruments Together Is Faster Than One at a Time

Submitting a single job that reports on multiple instruments is generally faster than submitting several jobs that each report on one instrument.

Retrieve Reports When Completed, Not in the Sequence Submitted

Reports of the same type (for example, Tick History Market Depth reports) run sequentially, and complete in the order in which you submitted them. Reports of different types run in parallel (independently of each other) and complete when they are done. You can poll report jobs to determine when each one is done, which enables you to retrieve each one as soon as possible.

Do Not Resubmit After a Timeout

If your procedure that polls a report job times out on your system, do not resubmit the report job. It is only your polling procedure that has timed out: the original report job is still queued to execute or is still executing, on the DataScope Select platform. Resubmitting the report job will not get you the report faster. (In fact, resubmitting it might have the opposite effect, because you will now have an additional instance of the job running.)

If your polling procedure frequently times out, consider increasing your timeout period, or contact Thomson Reuters Support to investigate.

The limit

A single request should not have more than 30,000 instruments.

Each user can have up to 50 Tick History requests running at any one time.

Each user can extract up to 5,000,000 RIC-days in requests that are running at any one time.