1. Home
  2. Article Catalog
  3. Calculating the Median of an Index's Constituents with Datastream (DSWS)

Datastream API (DSWS) Median Index Data

Author:
Jonathan Legrand
Developer Advocate Developer Advocate

In this article, we will create a Python function that will take the median measure of all (non 'NaN') values of a specific field for any index (or list of indices) of choice using Refinitiv's DataStream Web Services (DSWS).

Contents:

Get Coding 

Import Libraries 

For full replication, note that the version of libraries used

    	
            

import sys # ' sys ' is only needed to display our Pyhon version

print("This code is running on Python version " + sys.version[0:5])

This code is running on Python version 3.7.7

We need to gather our data. Since Refinitiv's DataStream Web Services (DSWS) allows for access to ESG data covering nearly 70% of global market cap and over 400 metrics, naturally it is more than appropriate. We can access DSWS via the Python library "DatastreamDSWS" that can be installed simply by using pip install.

    	
            

import DatastreamDSWS as DSWS

 

# We can use our Refinitiv's Datastream Web Socket (DSWS) API keys that allows us to be identified by

# Refinitiv's back-end services and enables us to request (and fetch) data:

# Credentials are placed in a text file so that it may be used in this code without showing it itself.

(DSWS_username, DSWS_password) = (open("Datastream_username.txt","r"),

                                  open("Datastream_password.txt","r"))

 

ds = DSWS.Datastream(username = str(DSWS_username.read()),

                     password = str(DSWS_password.read()))

 

# It is best to close the files we opened in order to make sure that we don't stop any other

# services/programs from accessing them if they need to.

DSWS_username.close()

DSWS_password.close()

 

 

# # Alternatively one can use the following:

# import getpass

# dsusername = input()

# dspassword = getpass.getpass()

# ds = DSWS.Datastream(username = dsusername, password = dspassword)

The libraries in the cell bellow are  native to Python.

    	
            

import calendar # This library will be used to get the last day of any one month.

from datetime import date # ' datetime ' and ' date ' is crucial to manipulate dates.

import datetime # ' datetime ' and ' date ' is crucial to manipulate dates.

The libraries in the cell bellow are not native to Python - id est (i.e.): they are not installed upon Python's installation. This means that they have their own version independently of Python's version used.

    	
            

import dateutil # ' datetime ' and ' date ' is crucial to manipulate dates.

import numpy as np

import pandas as pd

for i,j in zip(["dateutil", "np","pd"],

               [ dateutil,   np,  pd]):

    print(f"The {i} library imported in this code is version: {j.__version__}")

The dateutil library imported in this code is version: 2.8.1
The np library imported in this code is version: 1.18.5
The pd library imported in this code is version: 1.1.4

Collecting Data: A Simple Example 

1st: We have to specify our list of fields. In this example, we have:

  • EVT1FD12: Enterprise Value's ForwarD contract with 12 month maturity (12FRW).
  • EBT1FD12: Earnings Before Taxes' 12FRW.
  • EBD1FD12: Earnings Before Depreciation and amortisation's 12FRW.
  • X(CPS1FD12)*X(IBNOSH): product of the Cashflow Per Share amount's 12FRW and the Institutional Brokers Estimate Service( i.e.: I/B/E/S pulled from datastream's estimate data)'s Number Of Shares.
  • X(MV)~IBCUR: Market Value IBES currency.
  • PEFD12: market share Price to Equity ratio's 12FRW.
  • SAL1FD12: SALes' 12FRW.
  • X(BPS1FD12)*X(IBNOSH): product of the Book value Per Share's 12FRW and the Institutional Brokers Estimate Service's Number Of Shares.

The '  ~USD ' is there to revert the Currency to United States Dollars.

N.B.: X is the variable, so X(...)*X(...) means that we multiply them

    	
            

list_of_fields = ['X(EVT1FD12)~USD', 'X(EBT1FD12)~USD', 'X(EBD1FD12)~USD', 'X(CPS1FD12)~USD*X(IBNOSH)',

                  'X(MV)~USD', 'PEFD12', 'X(SAL1FD12)~USD', 'X(BPS1FD12)~USD*X(IBNOSH)']

Now that we know the fields that we want to capture, we will ask for DSWS to collect such data for all constituents of the MSCI World Index. We need to do a few changed to the mnemonics/tickers used in the DSWS function ' ds.get_data ':

  • We need to add an 'L' at the start to indicate that we are requesting for the list of the constituents of the index.
  • We need to remove '$' as it is not recognised by the backend. We need to remove all currency symbols for this to work.
  • We need to add the month and year (in mmyy format) at the end to specify the point in time that we are trying to replicate. This means that we are capturing data for the list of constituents of that index on that month.
  • We need to add '|L' at the end to indicate to the DSWS service that we are requesting data for a list, not a single object.

N.B.: start = '2013-08-31' collects data from August 2013. If someone was to use ds.get_data("LMSWRLD0313|L", start = '2014-08-31'), they'd be collecting data starting in August 2014 for the constituents listed under the MSWRLD index back in March 2013.

    	
            

df = ds.get_data("LMSWRLD0313|L",

                 ['NAME',

                  list_of_fields[0]],

                 start = '2013-08-31',

                 kind = 0)

Now that we have stored the data we are looking for in the Python object 'df', we can sort through it:

    	
            

# ' df["Datatype"] == "X(EVT1FD12)~USD" ' here sieves through ' df ' and only

#    returns data where what is in the outer most square brackets is

#    true, i.e.: where values in the 'Datatype' column is 'X(EVT1FD12)~USD'.

df[df["Datatype"] == "X(EVT1FD12)~USD"] # Note that ' "X(EVT1FD12)~USD" ' is the same as ' list_of_fields[0] '

  Instrument Datatype Value Dates
1608 MS:65955 X(EVT1FD12)~USD NA 2013-08-31
1609 MS:1016 X(EVT1FD12)~USD 49373.9 2013-08-31
1610 MS:388 X(EVT1FD12)~USD 52651.6 2013-08-31
1611 MS:10 X(EVT1FD12)~USD 54133.7 2013-08-31
1612 MS:96942 X(EVT1FD12)~USD 77434.2 2013-08-31
... ... ... ... ...
3211 MS:65552 X(EVT1FD12)~USD 11510.2 2013-08-31
3212 MS:3695 X(EVT1FD12)~USD 8305.29 2013-08-31
3213 MS:3698 X(EVT1FD12)~USD NA 2013-08-31
3214 MS:4502 X(EVT1FD12)~USD NA 2013-08-31
3215 MS:2138 X(EVT1FD12)~USD 82122.3 2013-08-31

1608 rows × 4 columns

 

There are a lot of repeated values, let's use ' .unique() ' to only retrieve unique values within the 'Instrument' column:

    	
            df[df["Instrument"] == df["Instrument"].unique()[0]] # ' [0] ' here is needed to indicate that we want the 0th unique value in question.
        
        
    
  Instrument Datatype Value Dates
0 MS:65955 NAME A P MOLLER MAERSK A 2013-08-31
1608 MS:65955 X(EVT1FD12)~USD NA 2013-08-31
    	
            df[df["Datatype"] == list_of_fields[0]]
        
        
    
  Instrument Datatype Value Dates
1608 MS:65955 X(EVT1FD12)~USD NA 2013-08-31
1609 MS:1016 X(EVT1FD12)~USD 49373.9 2013-08-31
1610 MS:388 X(EVT1FD12)~USD 52651.6 2013-08-31
1611 MS:10 X(EVT1FD12)~USD 54133.7 2013-08-31
1612 MS:96942 X(EVT1FD12)~USD 77434.2 2013-08-31
... ... ... ... ...
3211 MS:65552 X(EVT1FD12)~USD 11510.2 2013-08-31
3212 MS:3695 X(EVT1FD12)~USD 8305.29 2013-08-31
3213 MS:3698 X(EVT1FD12)~USD NA 2013-08-31
3214 MS:4502 X(EVT1FD12)~USD NA 2013-08-31
3215 MS:2138 X(EVT1FD12)~USD 82122.3 2013-08-31

1608 rows × 4 columns

    	
            

print(list_of_fields[0]) # Printing out the field of interest's name

df[df["Datatype"] == list_of_fields[0]][df["Value"] != 'NA']["Value"].median() # ' .median() ' gets us - you guessed it - the median value.

X(EVT1FD12)~USD

10479.682

Creating Python Function to programmatically collect data

    	
            

def DSWS_Median_Index_Data(indices = ["MSWRLD"],

                           list_of_fields = ['X(EVT1FD12)~USD', 'X(EBT1FD12)~USD', 'X(EBD1FD12)~USD', 'X(CPS1FD12)~USD*X(IBNOSH)',

                                             'X(MV)~USD', 'PEFD12', 'X(SAL1FD12)~USD', 'X(BPS1FD12)~USD*X(IBNOSH)'], # Don't include non numerical fields like 'GDSCN'.

                          start = '2013-01-01',

                          end = '2021-01-31',

                          track_progress = False,

                          add_to_global_DF = False):

    """

    ' DSWS_Median_Index_Data ' requests data defined in ' list_of_fields ' from Datastream's API - DataStream Web

    Services (DSWS) - for a series of indices of choice - defined in 'indices' - and returns three lists full of Pandas data-frames:

    1st - the raw output from the requests (with columns 'Instrument', 'Datatype', 'Value' and 'Dates');

    2nd - the monthly median from the raw output from the requests (with columns of the requested metrics and an index of dates);

    3rd - the raw data pulled from the DSWS request (and where issues are most likely to lie).

    

    One could re-write this basing themselves off datetime objects, it would be equivalent; its code may look neater too.

    

    indices (list): List of strings of the DSWS mnemonic for the index for which data is requested.

    Default: indices = ["MSWRLD"]

    

    list_of_fields (list): List of strings of the DSWS fields we are after. This has to be comprised of 49 elements or less.

    Don't include non numerical fields like 'GDSCN'. That will result in a printed error

    Default: list_of_fields = ['EVT1FD12', 'EBT1FD12', 'EBD1FD12', 'X(CPS1FD12)*X(IBNOSH)', 'X(MV)~IBCUR', 'PEFD12', 'SAL1FD12', 'X(BPS1FD12)*X(IBNOSH)', '(X(SAL1FD12)*X(IBUNIT))', '354E(X)', 'X(INC1FD12)*X(IBUNIT)', 'X(FCF1FD12)*X(IBNOSH)', 'X(FCF1FD12)*X(IBNOSH)*X(IBUNIT)', 'GDSCN']

    

    start (str): String of the start date from which we are asking for data in 'yyy-mm-dd' format.

    Default: start = '2013-01-01'

    

    end (str): String of the end date until which we are asking for data in 'yyy-mm-dd' format.

    Default: end = '2021-01-31'

    

    track_progress (Boolean): If set to True, the ' DSWS_Median_Index_Data ' function will print out each mnemonic requested of DSWS

    Default: track_progress = False

    

    

    track_progress (Boolean): If set to True, each string of the transformed index will be printed off - remember that there is one per month.

    Default: track_progress = False

    

    add_to_global_DF (Boolean): If set to True, on top of returning the 3 lists of Pandas data-frames described above, this function will append lists named DF, _DF, and debug_df.

    This does mean that you need to include a line such as 'DF, _DF, debug_df = [], [], []' beforehand as a result.

    Default: add_to_global_DF = False

    """

    

    # We would like to keep the name of the companies in question.

    if 'NAME' not in list_of_fields:

        list_of_fields = ['NAME'] + list_of_fields

    

    # We are often going to need to split our dates into more useful objects, that is what 'Proccess_Date' does: 

    def Proccess_Date(date):

        d = date.split("-")

        year, year_2, month = d[0], d[0][2:4], d[1]

        day = calendar.monthrange(int(year), int(month))[1] # gets last day for that month

        return(year, year_2, month, day)

    

    # Finding the number of months between ' start ' and ' end '.

    from_year, from_year_2, from_month, from_day = Proccess_Date(date = start)

    to_year, to_year_2, to_month, to_day = Proccess_Date(date = end)

    start_date = datetime.datetime(int(from_year), int(from_month), int(from_day))

    end_date = datetime.datetime(int(to_year), int(to_month), int(to_day))

    num_months = (end_date.year - start_date.year) * 12 + (end_date.month - start_date.month) + 1 # ' + 1 ' to include the last month.

    

    if add_to_global_DF == True:

        # If the user is afraid that the function could fail halfway through,

        #    (s)he can append an existing data-frame via this if statement.

        global DF

        global _DF

        global debug_df

    else:

        DF, _DF, debug_df = [], [], [] # These ate the lists of the data-frames of interest that will be populated in the following loop.

    

    for a in indices: # Iterating through the list of indices:

        df_list = [] # List to be appended/populated with the for loop bellow.

        # for d in range(num_months+2): # ' +1 ' since the ' range ' function is exclusive with respect to its upper limit, and an additional 1 for go over the month inclusive n the range of the argument.

        for d in range(num_months):

            _start_date = start_date + dateutil.relativedelta.relativedelta(months=+d)

            from_year, from_year_2, from_month, from_day = Proccess_Date(date = _start_date.strftime("%Y-%m-%d"))

 

            if track_progress == True:

                print(f"L{a}{from_month}{from_year_2}|L")

 

            df_list.append(

                ds.get_data(

                    tickers = f"L{a}{from_month}{from_year_2}|L",

                    fields = list_of_fields,

                    start = _start_date.strftime("%Y-%m-%d"),

                    kind = 0))

 

        df = pd.concat(df_list).reset_index(drop = True)

        

        date_range = df[df["Dates"].notna()]["Dates"].unique()

        

        debug_df.append(df)

        

        # This try loop is in case a field value requests non-numerical data that would break the code (since a median value could then not be computed).

        try:

            _df = pd.DataFrame(

                data = [[df[df["Value"] != "NA"][df["Datatype"] == i][df["Dates"] == j]["Value"].median()

                         for i in list_of_fields[1:]] for j in date_range], # ' list_of_fields[1:] ' as opposed to ' list_of_fields ' because we don't want 'NAME'.

                columns = pd.MultiIndex.from_product([[a], list_of_fields[1:]],

                                                     names = ['Index', 'Metrics']),

                index = date_range)

        except:

            for i in list_of_fields[1:]:

                for j in date_range:

                    try:

                        df[df["Value"] != "NA"][df["Datatype"] == i][df["Dates"] == j]["Value"].median()

                    except:

                        print(f"field '{i}' does not return numerical data; a median could thus not be computed for it.")

                        print("The program has stopped. Please run it again without this field.")

        

        DF.append(df)

        _DF.append(_df)

    return DF, _DF, debug_df

Example setting 'track_progress' and 'add_to_global_DF' to True

Note that - in this example - we made sure to normalize all currency-denominated values to the same currency (USD), otherwise taking the median value of different currencies wouldn't make much sense.

    	
            

DF, _DF, debug_df = [], [], []

 

test1, test1_df, test1_deb_df = DSWS_Median_Index_Data(

    indices = ["MSWRLD", "S&PCOMP"],

    list_of_fields = ['X(EVT1FD12)~USD', 'X(EBT1FD12)~USD', 'X(EBD1FD12)~USD',

                      'X(CPS1FD12)~USD*X(IBNOSH)', 'X(MV)~USD', 'X(PEFD12)',

                      'X(SAL1FD12)~USD', 'X(BPS1FD12)~USD*X(IBNOSH)'],

    start = '2013-01-01',

    end = '2013-03-01',

    track_progress = True,

    add_to_global_DF = True)

LMSWRLD0113|L
LMSWRLD0213|L
LMSWRLD0313|L

LS&PCOMP0113|L
LS&PCOMP0213|L
LS&PCOMP0313|L

    	
            test1[0]
        
        
    
  Instrument Datatype Value Dates
0 MS:65955 NAME A P MOLLER MAERSK A 2013-01-31
1 MS:1016 NAME A P MOLLER MAERSK B 2013-01-31
2 MS:388 NAME ABB LTD N 2013-01-31
3 MS:10 NAME ABBOTT LABORATORIES 2013-01-31
4 MS:96942 NAME ABBVIE 2013-01-31
... ... ... ... ...
43447 MS:65552 X(BPS1FD12)~USD*X(IBNOSH) 6266.1 2013-03-31
43448 MS:3695 X(BPS1FD12)~USD*X(IBNOSH) 3074.03 2013-03-31
43449 MS:3698 X(BPS1FD12)~USD*X(IBNOSH) 34543.6 2013-03-31
43450 MS:4502 X(BPS1FD12)~USD*X(IBNOSH) 4837.31 2013-03-31
43451 MS:2138 X(BPS1FD12)~USD*X(IBNOSH) 20133.9 2013-03-31

43452 rows × 4 columns

    	
            test1_df[0]
        
        
    
Index MSWRLD
Metrics X(EVT1FD12)~USD X(EBT1FD12)~USD X(EBD1FD12)~USD X(CPS1FD12)~USD*X(IBNOSH) X(MV)~USD X(PEFD12) X(SAL1FD12)~USD X(BPS1FD12)~USD*X(IBNOSH)
31/01/2013 9768.0295 1008.731 1159.4745 1033.763 9083.045 13.934 5136.3315 5321.7885
28/02/2013 9748.715 993.057 1166.615 1045.071 9107.255 14.3305 5113.3065 5254.477
31/03/2013 9876.253 1000.378 1151.172 1037.7785 9345.155 14.8205 5117.3 5243.3455
    	
            test1_df[1]
        
        
    
Index S&PCOMP
Metrics X(EVT1FD12)~USD X(EBT1FD12)~USD X(EBD1FD12)~USD X(CPS1FD12)~USD*X(IBNOSH) X(MV)~USD X(PEFD12) X(SAL1FD12)~USD X(BPS1FD12)~USD*X(IBNOSH)
2013-01-31 14746.49 1403.990 1956.5220 1353.3355 13275.135 14.0380 9011.652 5988.997
2013-02-28 15450.20 1404.525 1910.5655 1371.0870 13112.900 14.4570 9073.121 5870.180
2013-03-31 15799.75 1411.040 1921.3090 1368.2955 13821.085 14.8685 9089.930 5936.011
    	
            test1_deb_df[0]
        
        
    
  Instrument Datatype Value Dates
0 MS:65955 NAME A P MOLLER MAERSK A 2013-01-31
1 MS:1016 NAME A P MOLLER MAERSK B 2013-01-31
2 MS:388 NAME ABB LTD N 2013-01-31
3 MS:10 NAME ABBOTT LABORATORIES 2013-01-31
4 MS:96942 NAME ABBVIE 2013-01-31
... ... ... ... ...
43447 MS:65552 X(BPS1FD12)~USD*X(IBNOSH) 6266.1 2013-03-31
43448 MS:3695 X(BPS1FD12)~USD*X(IBNOSH) 3074.03 2013-03-31
43449 MS:3698 X(BPS1FD12)~USD*X(IBNOSH) 34543.6 2013-03-31
43450 MS:4502 X(BPS1FD12)~USD*X(IBNOSH) 4837.31 2013-03-31
43451 MS:2138 X(BPS1FD12)~USD*X(IBNOSH) 20133.9 2013-03-31
    	
            DF[0]
        
        
    
  Instrument Datatype Value Dates
0 MS:65955 NAME A P MOLLER MAERSK A 2013-01-31
1 MS:1016 NAME A P MOLLER MAERSK B 2013-01-31
2 MS:388 NAME ABB LTD N 2013-01-31
3 MS:10 NAME ABBOTT LABORATORIES 2013-01-31
4 MS:96942 NAME ABBVIE 2013-01-31
... ... ... ... ...
43447 MS:65552 X(BPS1FD12)~USD*X(IBNOSH) 6266.1 2013-03-31
43448 MS:3695 X(BPS1FD12)~USD*X(IBNOSH) 3074.03 2013-03-31
43449 MS:3698 X(BPS1FD12)~USD*X(IBNOSH) 34543.6 2013-03-31
43450 MS:4502 X(BPS1FD12)~USD*X(IBNOSH) 4837.31 2013-03-31
43451 MS:2138 X(BPS1FD12)~USD*X(IBNOSH) 20133.9 2013-03-31
         
    	
            _DF[0]
        
        
    
Index MSWRLD
Metrics X(EVT1FD12)~USD X(EBT1FD12)~USD X(EBD1FD12)~USD X(CPS1FD12)~USD*X(IBNOSH) X(MV)~USD X(PEFD12) X(SAL1FD12)~USD X(BPS1FD12)~USD*X(IBNOSH)
31/01/2013 9768.0295 1008.731 1159.4745 1033.763 9083.045 13.934 5136.3315 5321.7885
28/02/2013 9748.715 993.057 1166.615 1045.071 9107.255 14.3305 5113.3065 5254.477
31/03/2013 9876.253 1000.378 1151.172 1037.7785 9345.155 14.8205 5117.3 5243.3455
    	
            debug_df[0]
        
        
    
  Instrument Datatype Value Dates
0 MS:65955 NAME A P MOLLER MAERSK A 31/01/2013
1 MS:1016 NAME A P MOLLER MAERSK B 31/01/2013
2 MS:388 NAME ABB LTD N 31/01/2013
3 MS:10 NAME ABBOTT LABORATORIES 31/01/2013
4 MS:96942 NAME ABBVIE 31/01/2013
... ... ... ... ...
43447 MS:65552 X(BPS1FD12)~USD*X(IBNOSH) 6266.1 31/03/2013
43448 MS:3695 X(BPS1FD12)~USD*X(IBNOSH) 3074.03 31/03/2013
43449 MS:3698 X(BPS1FD12)~USD*X(IBNOSH) 34543.6 31/03/2013
43450 MS:4502 X(BPS1FD12)~USD*X(IBNOSH) 4837.31 31/03/2013
43451 MS:2138 X(BPS1FD12)~USD*X(IBNOSH) 20133.9 31/03/2013

You can find information inscribed inbetween the triple double quotes (*i.e.*:""" """) above with the Python functino ' help() ':

    	
            help(DSWS_Median_Index_Data)
        
        
    

Help on function DSWS_Median_Index_Data in module __main__:

DSWS_Median_Index_Data(indices=['MSWRLD'], list_of_fields=['X(EVT1FD12)~USD', 'X(EBT1FD12)~USD', 'X(EBD1FD12)~USD', 'X(CPS1FD12)~USD*X(IBNOSH)', 'X(MV)~USD', 'PEFD12', 'X(SAL1FD12)~USD', 'X(BPS1FD12)~USD*X(IBNOSH)'], start='2013-01-01', end='2021-01-31', track_progress=False, add_to_global_DF=False)
' DSWS_Median_Index_Data ' requests data defined in ' list_of_fields ' from Datastream's API - DataStream Web
Services (DSWS) - for a series of indices of choice - defined in 'indices' - and returns three lists full of Pandas data-frames:
1st - the raw output from the requests (with columns 'Instrument', 'Datatype', 'Value' and 'Dates');
2nd - the monthly median from the raw output from the requests (with columns of the requested metrics and an index of dates);
3rd - the raw data pulled from the DSWS request (and where issues are most likely to lie).

One could re-write this basing themselves off datetime objects, it would be equivalent; its code may look neater too.

indices (list): List of strings of the DSWS mnemonic for the index for which data is requested.
Default: indices = ["MSWRLD"]

list_of_fields (list): List of strings of the DSWS fields we are after. This has to be comprised of 49 elements or less.
Don't include non numerical fields like 'GDSCN'. That will result in a printed error
Default: list_of_fields = ['EVT1FD12', 'EBT1FD12', 'EBD1FD12', 'X(CPS1FD12)*X(IBNOSH)', 'X(MV)~IBCUR', 'PEFD12', 'SAL1FD12', 'X(BPS1FD12)*X(IBNOSH)', '(X(SAL1FD12)*X(IBUNIT))', '354E(X)', 'X(INC1FD12)*X(IBUNIT)', 'X(FCF1FD12)*X(IBNOSH)', 'X(FCF1FD12)*X(IBNOSH)*X(IBUNIT)', 'GDSCN']

start (str): String of the start date from which we are asking for data in 'yyy-mm-dd' format.
Default: start = '2013-01-01'

end (str): String of the end date until which we are asking for data in 'yyy-mm-dd' format.
Default: end = '2021-01-31'

track_progress (Boolean): If set to True, the ' DSWS_Median_Index_Data ' function will print out each mnemonic requested of DSWS
Default: track_progress = False


track_progress (Boolean): If set to True, each string of the transformed index will be printed off - remember that there is one per month.
Default: track_progress = False

add_to_global_DF (Boolean): If set to True, on top of returning the 3 lists of Pandas data-frames described above, this function will append lists named DF, _DF, and debug_df.
This does mean that you need to include a line such as 'DF, _DF, debug_df = [], [], []' beforehand as a result.
Default: add_to_global_DF = False

 

References

You can find more detail regarding the DSWS API and related technologies for this article from the following resources:

For any question related to this example or Eikon Data API, please use the Developers Community Q&A Forum.