Datastream API (DSWS) Median Index Data

March 01,2021

Author:

Jonathan Legrand

In this article, we will create a Python function that will take the median measure of all (non 'NaN') values of a specific field for any index (or list of indices) of choice using Refinitiv's DataStream Web Services (DSWS).

Get Coding

Import Libraries

For full replication, note that the version of libraries used

    	
            import sys # ' sys ' is only needed to display our Pyhon version
print("This code is running on Python version " + sys.version[0:5])

This code is running on Python version 3.7.7

We need to gather our data. Since Refinitiv's DataStream Web Services (DSWS) allows for access to ESG data covering nearly 70% of global market cap and over 400 metrics, naturally it is more than appropriate. We can access DSWS via the Python library "DatastreamDSWS" that can be installed simply by using pip install.

    	
            import DatastreamDSWS as DSWS
 
# We can use our Refinitiv's Datastream Web Socket (DSWS) API keys that allows us to be identified by
# Refinitiv's back-end services and enables us to request (and fetch) data:
# Credentials are placed in a text file so that it may be used in this code without showing it itself.
(DSWS_username, DSWS_password) = (open("Datastream_username.txt","r"),
                                  open("Datastream_password.txt","r"))
 
ds = DSWS.Datastream(username = str(DSWS_username.read()),
                     password = str(DSWS_password.read()))
 
# It is best to close the files we opened in order to make sure that we don't stop any other
# services/programs from accessing them if they need to.
DSWS_username.close()
DSWS_password.close()
 
 
# # Alternatively one can use the following:
# import getpass
# dsusername = input()
# dspassword = getpass.getpass()
# ds = DSWS.Datastream(username = dsusername, password = dspassword)

The libraries in the cell below are native to Python.

    	
            import calendar # This library will be used to get the last day of any one month.
from datetime import date # ' datetime ' and ' date ' is crucial to manipulate dates.
import datetime # ' datetime ' and ' date ' is crucial to manipulate dates.

The libraries in the cell below are not native to Python - id est (i.e.): they are not installed upon Python's installation. This means that they have their own version independently of Python's version used.

    	
            import dateutil # ' datetime ' and ' date ' is crucial to manipulate dates.
import numpy as np
import pandas as pd
for i,j in zip(["dateutil", "np","pd"],
               [ dateutil,   np,  pd]):
    print(f"The {i} library imported in this code is version: {j.__version__}")

The dateutil library imported in this code is version: 2.8.1
The np library imported in this code is version: 1.18.5
The pd library imported in this code is version: 1.1.4

Collecting Data: A Simple Example

1st: We have to specify our list of fields. In this example, we have:

EVT1FD12: Enterprise Value's ForwarD contract with 12 month maturity (12FRW).
EBT1FD12: Earnings Before Taxes' 12FRW.
EBD1FD12: Earnings Before Depreciation and amortisation's 12FRW.
X(CPS1FD12)*X(IBNOSH): product of the Cashflow Per Share amount's 12FRW and the Institutional Brokers Estimate Service( i.e.: I/B/E/S pulled from datastream's estimate data)'s Number Of Shares.
X(MV)~IBCUR: Market Value IBES currency.
PEFD12: market share Price to Equity ratio's 12FRW.
SAL1FD12: SALes' 12FRW.
X(BPS1FD12)*X(IBNOSH): product of the Book value Per Share's 12FRW and the Institutional Brokers Estimate Service's Number Of Shares.

The ' ~USD ' is there to revert the Currency to United States Dollars.

N.B.: X is the variable, so X(...)*X(...) means that we multiply them

    	
            list_of_fields = ['X(EVT1FD12)~USD', 'X(EBT1FD12)~USD', 'X(EBD1FD12)~USD', 'X(CPS1FD12)~USD*X(IBNOSH)',
                  'X(MV)~USD', 'PEFD12', 'X(SAL1FD12)~USD', 'X(BPS1FD12)~USD*X(IBNOSH)']

Now that we know the fields that we want to capture, we will ask for DSWS to collect such data for all constituents of the MSCI World Index. We need to do a few changed to the mnemonics/tickers used in the DSWS function ' ds.get_data ':

We need to add an 'L' at the start to indicate that we are requesting for the list of the constituents of the index.
We need to remove '$' as it is not recognised by the backend. We need to remove all currency symbols for this to work.
We need to add the month and year (in mmyy format) at the end to specify the point in time that we are trying to replicate. This means that we are capturing data for the list of constituents of that index on that month.
We need to add '|L' at the end to indicate to the DSWS service that we are requesting data for a list, not a single object.

N.B.: start = '2013-08-31' collects data from August 2013. If someone was to use ds.get_data("LMSWRLD0313|L", start = '2014-08-31'), they'd be collecting data starting in August 2014 for the constituents listed under the MSWRLD index back in March 2013.

    	
            df = ds.get_data("LMSWRLD0313|L",
                 ['NAME',
                  list_of_fields[0]],
                 start = '2013-08-31',
                 kind = 0)

Now that we have stored the data we are looking for in the Python object 'df', we can sort through it:

    	
            # ' df["Datatype"] == "X(EVT1FD12)~USD" ' here sieves through ' df ' and only
#    returns data where what is in the outer most square brackets is
#    true, i.e.: where values in the 'Datatype' column is 'X(EVT1FD12)~USD'.
df[df["Datatype"] == "X(EVT1FD12)~USD"] # Note that ' "X(EVT1FD12)~USD" ' is the same as ' list_of_fields[0] '

	Instrument	Datatype	Value	Dates
1608	MS:65955	X(EVT1FD12)~USD	NA	2013-08-31
1609	MS:1016	X(EVT1FD12)~USD	49373.9	2013-08-31
1610	MS:388	X(EVT1FD12)~USD	52651.6	2013-08-31
1611	MS:10	X(EVT1FD12)~USD	54133.7	2013-08-31
1612	MS:96942	X(EVT1FD12)~USD	77434.2	2013-08-31
...	...	...	...	...
3211	MS:65552	X(EVT1FD12)~USD	11510.2	2013-08-31
3212	MS:3695	X(EVT1FD12)~USD	8305.29	2013-08-31
3213	MS:3698	X(EVT1FD12)~USD	NA	2013-08-31
3214	MS:4502	X(EVT1FD12)~USD	NA	2013-08-31
3215	MS:2138	X(EVT1FD12)~USD	82122.3	2013-08-31

1608 rows × 4 columns

There are a lot of repeated values, let's use ' .unique() ' to only retrieve unique values within the 'Instrument' column:

    	
            df[df["Instrument"] == df["Instrument"].unique()[0]] # ' [0] ' here is needed to indicate that we want the 0th unique value in question.

	Instrument	Datatype	Value	Dates
0	MS:65955	NAME	A P MOLLER MAERSK A	2013-08-31
1608	MS:65955	X(EVT1FD12)~USD	NA	2013-08-31

    	
            df[df["Datatype"] == list_of_fields[0]]

	Instrument	Datatype	Value	Dates
1608	MS:65955	X(EVT1FD12)~USD	NA	2013-08-31
1609	MS:1016	X(EVT1FD12)~USD	49373.9	2013-08-31
1610	MS:388	X(EVT1FD12)~USD	52651.6	2013-08-31
1611	MS:10	X(EVT1FD12)~USD	54133.7	2013-08-31
1612	MS:96942	X(EVT1FD12)~USD	77434.2	2013-08-31
...	...	...	...	...
3211	MS:65552	X(EVT1FD12)~USD	11510.2	2013-08-31
3212	MS:3695	X(EVT1FD12)~USD	8305.29	2013-08-31
3213	MS:3698	X(EVT1FD12)~USD	NA	2013-08-31
3214	MS:4502	X(EVT1FD12)~USD	NA	2013-08-31
3215	MS:2138	X(EVT1FD12)~USD	82122.3	2013-08-31

1608 rows × 4 columns

    	
            print(list_of_fields[0]) # Printing out the field of interest's name
df[df["Datatype"] == list_of_fields[0]][df["Value"] != 'NA']["Value"].median() # ' .median() ' gets us - you guessed it - the median value.

X(EVT1FD12)~USD

10479.682

Creating Python Function to programmatically collect data

    	
            def DSWS_Median_Index_Data(indices = ["MSWRLD"],
                           list_of_fields = ['X(EVT1FD12)~USD', 'X(EBT1FD12)~USD', 'X(EBD1FD12)~USD', 'X(CPS1FD12)~USD*X(IBNOSH)',
                                             'X(MV)~USD', 'PEFD12', 'X(SAL1FD12)~USD', 'X(BPS1FD12)~USD*X(IBNOSH)'], # Don't include non numerical fields like 'GDSCN'.
                          start = '2013-01-01',
                          end = '2021-01-31',
                          track_progress = False,
                          add_to_global_DF = False):
    """
    ' DSWS_Median_Index_Data ' requests data defined in ' list_of_fields ' from Datastream's API - DataStream Web
    Services (DSWS) - for a series of indices of choice - defined in 'indices' - and returns three lists full of Pandas data-frames:
    1st - the raw output from the requests (with columns 'Instrument', 'Datatype', 'Value' and 'Dates');
    2nd - the monthly median from the raw output from the requests (with columns of the requested metrics and an index of dates);
    3rd - the raw data pulled from the DSWS request (and where issues are most likely to lie).
    
    One could re-write this basing themselves off datetime objects, it would be equivalent; its code may look neater too.
    
    indices (list): List of strings of the DSWS mnemonic for the index for which data is requested.
    Default: indices = ["MSWRLD"]
    
    list_of_fields (list): List of strings of the DSWS fields we are after. This has to be comprised of 49 elements or less.
    Don't include non numerical fields like 'GDSCN'. That will result in a printed error
    Default: list_of_fields = ['EVT1FD12', 'EBT1FD12', 'EBD1FD12', 'X(CPS1FD12)*X(IBNOSH)', 'X(MV)~IBCUR', 'PEFD12', 'SAL1FD12', 'X(BPS1FD12)*X(IBNOSH)', '(X(SAL1FD12)*X(IBUNIT))', '354E(X)', 'X(INC1FD12)*X(IBUNIT)', 'X(FCF1FD12)*X(IBNOSH)', 'X(FCF1FD12)*X(IBNOSH)*X(IBUNIT)', 'GDSCN']
    
    start (str): String of the start date from which we are asking for data in 'yyy-mm-dd' format.
    Default: start = '2013-01-01'
    
    end (str): String of the end date until which we are asking for data in 'yyy-mm-dd' format.
    Default: end = '2021-01-31'
    
    track_progress (Boolean): If set to True, the ' DSWS_Median_Index_Data ' function will print out each mnemonic requested of DSWS
    Default: track_progress = False
    
    
    track_progress (Boolean): If set to True, each string of the transformed index will be printed off - remember that there is one per month.
    Default: track_progress = False
    
    add_to_global_DF (Boolean): If set to True, on top of returning the 3 lists of Pandas data-frames described above, this function will append lists named DF, _DF, and debug_df.
    This does mean that you need to include a line such as 'DF, _DF, debug_df = [], [], []' beforehand as a result.
    Default: add_to_global_DF = False
    """
    
    # We would like to keep the name of the companies in question.
    if 'NAME' not in list_of_fields:
        list_of_fields = ['NAME'] + list_of_fields
    
    # We are often going to need to split our dates into more useful objects, that is what 'Proccess_Date' does: 
    def Proccess_Date(date):
        d = date.split("-")
        year, year_2, month = d[0], d[0][2:4], d[1]
        day = calendar.monthrange(int(year), int(month))[1] # gets last day for that month
        return(year, year_2, month, day)
    
    # Finding the number of months between ' start ' and ' end '.
    from_year, from_year_2, from_month, from_day = Proccess_Date(date = start)
    to_year, to_year_2, to_month, to_day = Proccess_Date(date = end)
    start_date = datetime.datetime(int(from_year), int(from_month), int(from_day))
    end_date = datetime.datetime(int(to_year), int(to_month), int(to_day))
    num_months = (end_date.year - start_date.year) * 12 + (end_date.month - start_date.month) + 1 # ' + 1 ' to include the last month.
    
    if add_to_global_DF == True:
        # If the user is afraid that the function could fail halfway through,
        #    (s)he can append an existing data-frame via this if statement.
        global DF
        global _DF
        global debug_df
    else:
        DF, _DF, debug_df = [], [], [] # These ate the lists of the data-frames of interest that will be populated in the following loop.
    
    for a in indices: # Iterating through the list of indices:
        df_list = [] # List to be appended/populated with the for loop below.
        # for d in range(num_months+2): # ' +1 ' since the ' range ' function is exclusive with respect to its upper limit, and an additional 1 for go over the month inclusive n the range of the argument.
        for d in range(num_months):
            _start_date = start_date + dateutil.relativedelta.relativedelta(months=+d)
            from_year, from_year_2, from_month, from_day = Proccess_Date(date = _start_date.strftime("%Y-%m-%d"))
 
            if track_progress == True:
                print(f"L{a}{from_month}{from_year_2}|L")
 
            df_list.append(
                ds.get_data(
                    tickers = f"L{a}{from_month}{from_year_2}|L",
                    fields = list_of_fields,
                    start = _start_date.strftime("%Y-%m-%d"),
                    kind = 0))
 
        df = pd.concat(df_list).reset_index(drop = True)
        
        date_range = df[df["Dates"].notna()]["Dates"].unique()
        
        debug_df.append(df)
        
        # This try loop is in case a field value requests non-numerical data that would break the code (since a median value could then not be computed).
        try:
            _df = pd.DataFrame(
                data = [[df[df["Value"] != "NA"][df["Datatype"] == i][df["Dates"] == j]["Value"].median()
                         for i in list_of_fields[1:]] for j in date_range], # ' list_of_fields[1:] ' as opposed to ' list_of_fields ' because we don't want 'NAME'.
                columns = pd.MultiIndex.from_product([[a], list_of_fields[1:]],
                                                     names = ['Index', 'Metrics']),
                index = date_range)
        except:
            for i in list_of_fields[1:]:
                for j in date_range:
                    try:
                        df[df["Value"] != "NA"][df["Datatype"] == i][df["Dates"] == j]["Value"].median()
                    except:
                        print(f"field '{i}' does not return numerical data; a median could thus not be computed for it.")
                        print("The program has stopped. Please run it again without this field.")
        
        DF.append(df)
        _DF.append(_df)
    return DF, _DF, debug_df

Example setting 'track_progress' and 'add_to_global_DF' to True

Note that - in this example - we made sure to normalize all currency-denominated values to the same currency (USD), otherwise taking the median value of different currencies wouldn't make much sense.

    	
            DF, _DF, debug_df = [], [], []
 
test1, test1_df, test1_deb_df = DSWS_Median_Index_Data(
    indices = ["MSWRLD", "S&PCOMP"],
    list_of_fields = ['X(EVT1FD12)~USD', 'X(EBT1FD12)~USD', 'X(EBD1FD12)~USD',
                      'X(CPS1FD12)~USD*X(IBNOSH)', 'X(MV)~USD', 'X(PEFD12)',
                      'X(SAL1FD12)~USD', 'X(BPS1FD12)~USD*X(IBNOSH)'],
    start = '2013-01-01',
    end = '2013-03-01',
    track_progress = True,
    add_to_global_DF = True)

LMSWRLD0113|L
LMSWRLD0213|L
LMSWRLD0313|L

LS&PCOMP0113|L
LS&PCOMP0213|L
LS&PCOMP0313|L

    	
            test1[0]

	Instrument	Datatype	Value	Dates
0	MS:65955	NAME	A P MOLLER MAERSK A	2013-01-31
1	MS:1016	NAME	A P MOLLER MAERSK B	2013-01-31
2	MS:388	NAME	ABB LTD N	2013-01-31
3	MS:10	NAME	ABBOTT LABORATORIES	2013-01-31
4	MS:96942	NAME	ABBVIE	2013-01-31
...	...	...	...	...
43447	MS:65552	X(BPS1FD12)~USD*X(IBNOSH)	6266.1	2013-03-31
43448	MS:3695	X(BPS1FD12)~USD*X(IBNOSH)	3074.03	2013-03-31
43449	MS:3698	X(BPS1FD12)~USD*X(IBNOSH)	34543.6	2013-03-31
43450	MS:4502	X(BPS1FD12)~USD*X(IBNOSH)	4837.31	2013-03-31
43451	MS:2138	X(BPS1FD12)~USD*X(IBNOSH)	20133.9	2013-03-31

43452 rows × 4 columns

    	
            test1_df[0]

Index	MSWRLD
Metrics	X(EVT1FD12)~USD	X(EBT1FD12)~USD	X(EBD1FD12)~USD	X(CPS1FD12)~USD*X(IBNOSH)	X(MV)~USD	X(PEFD12)	X(SAL1FD12)~USD	X(BPS1FD12)~USD*X(IBNOSH)
31/01/2013	9768.0295	1008.731	1159.4745	1033.763	9083.045	13.934	5136.3315	5321.7885
28/02/2013	9748.715	993.057	1166.615	1045.071	9107.255	14.3305	5113.3065	5254.477
31/03/2013	9876.253	1000.378	1151.172	1037.7785	9345.155	14.8205	5117.3	5243.3455

    	
            test1_df[1]

Metrics	X(EVT1FD12)~USD	X(EBT1FD12)~USD	X(EBD1FD12)~USD	X(CPS1FD12)~USD*X(IBNOSH)	X(MV)~USD	X(PEFD12)	X(SAL1FD12)~USD	X(BPS1FD12)~USD*X(IBNOSH)
Index	S&PCOMP
2013-01-31	14746.49	1403.990	1956.5220	1353.3355	13275.135	14.0380	9011.652	5988.997
2013-02-28	15450.20	1404.525	1910.5655	1371.0870	13112.900	14.4570	9073.121	5870.180
2013-03-31	15799.75	1411.040	1921.3090	1368.2955	13821.085	14.8685	9089.930	5936.011

    	
            test1_deb_df[0]

	Instrument	Datatype	Value	Dates
0	MS:65955	NAME	A P MOLLER MAERSK A	2013-01-31
1	MS:1016	NAME	A P MOLLER MAERSK B	2013-01-31
2	MS:388	NAME	ABB LTD N	2013-01-31
3	MS:10	NAME	ABBOTT LABORATORIES	2013-01-31
4	MS:96942	NAME	ABBVIE	2013-01-31
...	...	...	...	...
43447	MS:65552	X(BPS1FD12)~USD*X(IBNOSH)	6266.1	2013-03-31
43448	MS:3695	X(BPS1FD12)~USD*X(IBNOSH)	3074.03	2013-03-31
43449	MS:3698	X(BPS1FD12)~USD*X(IBNOSH)	34543.6	2013-03-31
43450	MS:4502	X(BPS1FD12)~USD*X(IBNOSH)	4837.31	2013-03-31
43451	MS:2138	X(BPS1FD12)~USD*X(IBNOSH)	20133.9	2013-03-31

    	
            DF[0]

	Instrument	Datatype	Value	Dates
0	MS:65955	NAME	A P MOLLER MAERSK A	2013-01-31
1	MS:1016	NAME	A P MOLLER MAERSK B	2013-01-31
2	MS:388	NAME	ABB LTD N	2013-01-31
3	MS:10	NAME	ABBOTT LABORATORIES	2013-01-31
4	MS:96942	NAME	ABBVIE	2013-01-31
...	...	...	...	...
43447	MS:65552	X(BPS1FD12)~USD*X(IBNOSH)	6266.1	2013-03-31
43448	MS:3695	X(BPS1FD12)~USD*X(IBNOSH)	3074.03	2013-03-31
43449	MS:3698	X(BPS1FD12)~USD*X(IBNOSH)	34543.6	2013-03-31
43450	MS:4502	X(BPS1FD12)~USD*X(IBNOSH)	4837.31	2013-03-31
43451	MS:2138	X(BPS1FD12)~USD*X(IBNOSH)	20133.9	2013-03-31

    	
            _DF[0]

Index	MSWRLD
Metrics	X(EVT1FD12)~USD	X(EBT1FD12)~USD	X(EBD1FD12)~USD	X(CPS1FD12)~USD*X(IBNOSH)	X(MV)~USD	X(PEFD12)	X(SAL1FD12)~USD	X(BPS1FD12)~USD*X(IBNOSH)
31/01/2013	9768.0295	1008.731	1159.4745	1033.763	9083.045	13.934	5136.3315	5321.7885
28/02/2013	9748.715	993.057	1166.615	1045.071	9107.255	14.3305	5113.3065	5254.477
31/03/2013	9876.253	1000.378	1151.172	1037.7785	9345.155	14.8205	5117.3	5243.3455

    	
            debug_df[0]

	Instrument	Datatype	Value	Dates
0	MS:65955	NAME	A P MOLLER MAERSK A	31/01/2013
1	MS:1016	NAME	A P MOLLER MAERSK B	31/01/2013
2	MS:388	NAME	ABB LTD N	31/01/2013
3	MS:10	NAME	ABBOTT LABORATORIES	31/01/2013
4	MS:96942	NAME	ABBVIE	31/01/2013
...	...	...	...	...
43447	MS:65552	X(BPS1FD12)~USD*X(IBNOSH)	6266.1	31/03/2013
43448	MS:3695	X(BPS1FD12)~USD*X(IBNOSH)	3074.03	31/03/2013
43449	MS:3698	X(BPS1FD12)~USD*X(IBNOSH)	34543.6	31/03/2013
43450	MS:4502	X(BPS1FD12)~USD*X(IBNOSH)	4837.31	31/03/2013
43451	MS:2138	X(BPS1FD12)~USD*X(IBNOSH)	20133.9	31/03/2013

You can find information inscribed inbetween the triple double quotes (*i.e.*:""" """) above with the Python functino ' help() ':

    	
            help(DSWS_Median_Index_Data)

Help on function DSWS_Median_Index_Data in module __main__:

DSWS_Median_Index_Data(indices=['MSWRLD'], list_of_fields=['X(EVT1FD12)~USD', 'X(EBT1FD12)~USD', 'X(EBD1FD12)~USD', 'X(CPS1FD12)~USD*X(IBNOSH)', 'X(MV)~USD', 'PEFD12', 'X(SAL1FD12)~USD', 'X(BPS1FD12)~USD*X(IBNOSH)'], start='2013-01-01', end='2021-01-31', track_progress=False, add_to_global_DF=False)
' DSWS_Median_Index_Data ' requests data defined in ' list_of_fields ' from Datastream's API - DataStream Web
Services (DSWS) - for a series of indices of choice - defined in 'indices' - and returns three lists full of Pandas data-frames:
1st - the raw output from the requests (with columns 'Instrument', 'Datatype', 'Value' and 'Dates');
2nd - the monthly median from the raw output from the requests (with columns of the requested metrics and an index of dates);
3rd - the raw data pulled from the DSWS request (and where issues are most likely to lie).

One could re-write this basing themselves off datetime objects, it would be equivalent; its code may look neater too.

indices (list): List of strings of the DSWS mnemonic for the index for which data is requested.
Default: indices = ["MSWRLD"]

list_of_fields (list): List of strings of the DSWS fields we are after. This has to be comprised of 49 elements or less.
Don't include non numerical fields like 'GDSCN'. That will result in a printed error
Default: list_of_fields = ['EVT1FD12', 'EBT1FD12', 'EBD1FD12', 'X(CPS1FD12)*X(IBNOSH)', 'X(MV)~IBCUR', 'PEFD12', 'SAL1FD12', 'X(BPS1FD12)*X(IBNOSH)', '(X(SAL1FD12)*X(IBUNIT))', '354E(X)', 'X(INC1FD12)*X(IBUNIT)', 'X(FCF1FD12)*X(IBNOSH)', 'X(FCF1FD12)*X(IBNOSH)*X(IBUNIT)', 'GDSCN']

start (str): String of the start date from which we are asking for data in 'yyy-mm-dd' format.
Default: start = '2013-01-01'

end (str): String of the end date until which we are asking for data in 'yyy-mm-dd' format.
Default: end = '2021-01-31'

track_progress (Boolean): If set to True, the ' DSWS_Median_Index_Data ' function will print out each mnemonic requested of DSWS
Default: track_progress = False

track_progress (Boolean): If set to True, each string of the transformed index will be printed off - remember that there is one per month.
Default: track_progress = False

add_to_global_DF (Boolean): If set to True, on top of returning the 3 lists of Pandas data-frames described above, this function will append lists named DF, _DF, and debug_df.
This does mean that you need to include a line such as 'DF, _DF, debug_df = [], [], []' beforehand as a result.
Default: add_to_global_DF = False