Creating a WDI Heatmap with DataStream Web Services and Google Maps

This example demonstrates how to retrieve World Development Indicators from DataStream Web Services and then use Google Maps (gmaps) to create a heatmap with the retrieved data. To run the example, the DataStream account and Google Maps' API key are required. To use other maps, such as Geopandas, please refer to this article.

To get the Google Maps' API key, please refer to Get an API Key.

This example creates a heatmap of the yearly deposit interest rate (%) across countries. However, it can be applied to any World Development Indicators.

Loading required packages

The required packages for this example are:

  • DataStreamDSWS: Python API package for Refinitiv Datastream Webservice
  • pandas: Powerful data structures for data analysis, time series, and statistics
  • numpy: NumPy is the fundamental package for array computing with Python
  • gmaps: Google maps plugin for Jupyter notebooks
  • IPython.display: Public API for display tools in IPython
  • ipywidgets: IPython HTML widgets for Jupyter

You also need to install lxml package (pip install lxml) used by pandas when parsing the HTML file.

import DatastreamDSWS as DSWS
import pandas as pd
import numpy as np
import gmaps
from IPython.display import display
import ipywidgets as widgets

Setting credentials

The DataStream account and Google Maps' API key are required to run the example.

ds = DSWS.Datastream(username = '<username>', password = '<password>')
gmaps.configure(api_key='<api_key>')

Loading an instrument list

In DataStream, the World Development Indicators are available under the Economics » International Historical Sources » World Bank Development Indicators (WDI) category. The data is categorized by countries.

You can use the indication name to search for instruments of all available countries, and then refine the search results by selecting the results from World Bank WDI. For example, the search query for Deposit Interest Rate (%) from World Bank WDI is "DESC:and(Deposit Interest Rate (%)) DSSRC:and(World Bank WDI)".

After that, the search results can be downloaded to an excel file. This excel file can be read into a data frame object by calling the pandas.read_html method. Then, only active instruments are selected. In the data frame , the Symbol column contains instrument names and the Market column contains country names.

df = pd.read_html('DepositeInterestRate.xls')  
df[0] = df[0][df[0].Activity=="Active"]
df[0]
  Name Symbol RIC Start Date End Date Hist. Category Market Source Frequency Adjustment Forecast Full Name Activity
0 DEPOSIT INTEREST RATE (%) NGWDKJQDR - 1/1/1970 12/31/2019 1970 Economics Nigeria World Bank WDI Annual Not seasonally adjusted Historical Nigeria, Interest Rates, Deposit Interest Rate... Active
1 DEPOSIT INTEREST RATE (%) BLWDKJQDR - 1/1/2000 12/31/2019 2000 Economics Bulgaria World Bank WDI Annual Not seasonally adjusted Historical Bulgaria, Interest Rates, Deposit Interest Rat... Active
2 DEPOSIT INTEREST RATE (%) MYWDKJQDR - 1/1/1966 12/31/2019 1966 Economics Malaysia World Bank WDI Annual Not seasonally adjusted Historical Malaysia, Interest Rates, Deposit Interest Rat... Active
3 DEPOSIT INTEREST RATE (%) NZWDKJQDR - 1/1/1990 12/31/2019 1990 Economics New Zealand World Bank WDI Annual Not seasonally adjusted Historical New Zealand, Interest Rates, Deposit Interest ... Active
4 DEPOSIT INTEREST RATE (%) SPWDKJQDR - 1/1/1977 12/31/2019 1977 Economics Singapore World Bank WDI Annual Not seasonally adjusted Historical Singapore, Interest Rates, Deposit Interest Ra... Active
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
175 DEPOSIT INTEREST RATE (%) IAWDKJQDR - 1/1/2003 12/31/2016 2003 Economics Iran World Bank WDI Annual Not seasonally adjusted Historical Iran, Islamic Republic of, Interest Rates, Dep... Active
176 DEPOSIT INTEREST RATE (%) UKWDKJQDR - 1/1/1960 12/31/1998 1960 Economics United Kingdom World Bank WDI Annual Not seasonally adjusted Historical United Kingdom, Interest Rates, Deposit Intere... Active
177 DEPOSIT INTEREST RATE (%) ICWDKJQDR - 1/1/1973 12/31/2004 1973 Economics Iceland World Bank WDI Annual Not seasonally adjusted Historical Iceland, Interest Rates, Deposit Interest Rate... Active
178 DEPOSIT INTEREST RATE (%) UAWDKJQDR - 1/1/1980 12/31/2001 1980 Economics United Arab Emirates World Bank WDI Annual Not seasonally adjusted Historical United Arab Emirates, Interest Rates, Deposit ... Active
179 DEPOSIT INTEREST RATE (%) LYWDKJQDR - 1/1/1968 12/31/2014 1968 Economics Libya World Bank WDI Annual Not seasonally adjusted Historical Libya, Interest Rates, Deposit Interest Rate (... Active

180 rows × 14 columns

Requesting data

The instrument names in the Symbol column are split into several batch requests. Each batch request containing 20 instruments is sent to the DataStream Web Services to request yearly data for the last ten years. All results are joined to create a data frame object. Then, missing data is removed.

item_list=df[0]["Symbol"].values
batch_list = np.array_split(item_list,len(item_list)/20)
data = None
for batch in batch_list:
    temp_dat = ds.get_data(tickers=','.join(batch), start='-10Y',freq='Y')

    if  data is None:
        data = temp_dat
    else:
        data = data.join(temp_dat)
data.dropna(how='all', inplace=True)
data.dropna(axis=1,how='all',inplace=True)
data
Instrument NGWDKJQDR BLWDKJQDR MYWDKJQDR NZWDKJQDR SPWDKJQDR AOWDKJQDR TOWDKJQDR RSWDKJQDR BIWDKJQDR MCWDKJQDR ... MDWDKJQDR USWDKJQDR BAWDKJQDR LVWDKJQDR YAWDKJQDR MIWDKJQDR EOWDKJQDR LAWDKJQDR IAWDKJQDR LYWDKJQDR
Field                     ...                    
Dates                                          
2010-06-30 6.52 4.08 2.50 4.58 0.21 12.76 8.39 6.01 0.47 3.69 ... 10.29 0.31 1.23 0.77 18.67 3.60 1.11 3.0 11.94 2.5
2011-06-30 5.69 3.37 2.91 4.27 0.17 6.31 8.50 4.44 0.40 3.76 ... 10.65 0.30 1.02 0.82 20.00 4.11 1.27 NaN 11.16 2.5
2012-06-29 8.41 3.08 2.98 4.11 0.14 3.60 7.99 5.53 0.23 3.83 ... 10.40 0.28 1.08 0.38 19.50 11.08 0.67 NaN 14.81 2.5
2013-06-28 7.95 2.41 2.97 3.83 0.14 3.15 7.65 5.59 0.28 3.91 ... 10.15 NaN 1.06 0.19 15.25 18.41 0.42 NaN 14.76 2.5
2014-06-30 9.34 1.66 3.05 4.01 0.14 3.53 7.26 6.04 0.30 3.89 ... 10.95 NaN 0.98 NaN NaN 13.17 0.51 NaN 16.94 2.5
2015-06-30 9.15 0.61 3.13 3.73 0.17 3.31 7.00 9.20 0.34 3.80 ... 14.67 NaN 0.99 NaN NaN 11.59 0.50 NaN 16.30 NaN
2016-06-30 7.50 0.17 3.03 3.23 0.19 5.54 6.93 6.97 0.33 3.46 ... 15.00 NaN NaN NaN NaN 11.57 0.39 NaN 12.80 NaN
2017-06-30 9.55 0.05 2.92 3.30 0.14 6.34 6.49 5.86 0.32 3.12 ... 15.00 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2018-06-29 9.70 0.03 3.14 3.26 0.16 6.88 NaN 5.36 0.31 3.09 ... 12.75 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2019-06-28 8.81 0.02 2.98 3.00 0.20 6.42 NaN 5.40 0.37 3.01 ... 13.00 NaN NaN NaN NaN NaN NaN NaN NaN NaN

10 rows × 153 columns

Formatting data

The DataFrame must be re-formatted to a long format with Dates, Symbol, and Value columns. The pandas.DataFrame.melt is called to reformat the DataFrame. Then, rows with missing data or non-positive data are removed. Finally, the Dates column has been converted to Year.

data = data.reset_index().melt(id_vars=["Dates"], 
        var_name="Symbol", 
        value_name="Value")
data.dropna(inplace=True)
data = data[data['Value']>0]
data = data.rename({'Dates': 'Year'}, axis=1)
data["Year"] = [int(sublist.split('-', 1)[0])  for sublist in data['Year']]
data
  Year Symbol Value
0 2010 NGWDKJQDR 6.52
1 2011 NGWDKJQDR 5.69
2 2012 NGWDKJQDR 8.41
3 2013 NGWDKJQDR 7.95
4 2014 NGWDKJQDR 9.34
... ... ... ...
1520 2010 LYWDKJQDR 2.50
1521 2011 LYWDKJQDR 2.50
1522 2012 LYWDKJQDR 2.50
1523 2013 LYWDKJQDR 2.50
1524 2014 LYWDKJQDR 2.50

1344 rows × 3 columns

Adding country names

The current data frame doesn't have country names. To add country names, the DataFrame.merge method is called to join the Market column from the list downloaded from the DataStream to the current DataFrame. Then, the required columns (Year, Symbol, Market, and Value) are selected.

data = df[0].merge(data, on=['Symbol'], how='left').dropna()[['Year','Symbol','Market','Value']]
data
  Year Symbol Market Value
0 2010.0 NGWDKJQDR Nigeria 6.52
1 2011.0 NGWDKJQDR Nigeria 5.69
2 2012.0 NGWDKJQDR Nigeria 8.41
3 2013.0 NGWDKJQDR Nigeria 7.95
4 2014.0 NGWDKJQDR Nigeria 9.34
... ... ... ... ...
1366 2010.0 LYWDKJQDR Libya 2.50
1367 2011.0 LYWDKJQDR Libya 2.50
1368 2012.0 LYWDKJQDR Libya 2.50
1369 2013.0 LYWDKJQDR Libya 2.50
1370 2014.0 LYWDKJQDR Libya 2.50

1344 rows × 4 columns

Adding latitudes and longitudes

The latitudes and longitudes of countries are available at countries.csv. Then, loading this file to a data frame object.

countries = pd.read_csv("countries.csv")
countries
  country latitude longitude name
0 AD 42.546245 1.601554 Andorra
1 AE 23.424076 53.847818 United Arab Emirates
2 AF 33.939110 67.709953 Afghanistan
3 AG 17.060816 -61.796428 Antigua and Barbuda
4 AI 18.220554 -63.068615 Anguilla
... ... ... ... ...
241 YE 15.552727 48.516388 Yemen
242 YT -12.827500 45.166244 Mayotte
243 ZA -30.559482 22.937506 South Africa
244 ZM -13.133897 27.849332 Zambia
245 ZW -19.015438 29.154857 Zimbabwe

246 rows × 4 columns

Next, the DataFrame.merge method is called to join the latitude and longitude columns from the countries data frame to the result data frame.

result = pd.merge( data, countries,  left_on='Market', right_on='name', how='left')
result
  Year Symbol Market Value country latitude longitude name
0 2010.0 NGWDKJQDR Nigeria 6.52 NG 9.081999 8.675277 Nigeria
1 2011.0 NGWDKJQDR Nigeria 5.69 NG 9.081999 8.675277 Nigeria
2 2012.0 NGWDKJQDR Nigeria 8.41 NG 9.081999 8.675277 Nigeria
3 2013.0 NGWDKJQDR Nigeria 7.95 NG 9.081999 8.675277 Nigeria
4 2014.0 NGWDKJQDR Nigeria 9.34 NG 9.081999 8.675277 Nigeria
... ... ... ... ... ... ... ... ...
1339 2010.0 LYWDKJQDR Libya 2.50 LY 26.335100 17.228331 Libya
1340 2011.0 LYWDKJQDR Libya 2.50 LY 26.335100 17.228331 Libya
1341 2012.0 LYWDKJQDR Libya 2.50 LY 26.335100 17.228331 Libya
1342 2013.0 LYWDKJQDR Libya 2.50 LY 26.335100 17.228331 Libya
1343 2014.0 LYWDKJQDR Libya 2.50 LY 26.335100 17.228331 Libya

1344 rows × 8 columns

Due to the mismatch of country names, some rows may not have latitudes and longitudes.

result[result['name'].isna()]
  Year Symbol Market Value country latitude longitude name

The mismatch of country names can be resolved by modifying the name column in the countries.csv or adding a new entry into the countries.csv. Then, re-run this step.

Otherwise, the below code can be called to remove those rows.

result = result[result['name'].notna()]
result

Creating a heatmap with Google Map

Next, a heatmap of deposit interest rate (%) across countries is created by using Google Map. The below code is modified from an example in the jupyter-gmaps document to create a widget which displays an interactive heatmap based on yearly data.

class GmapsWidget(object):

    def __init__(self, df, title):
        self._df = df
        self._heatmap = None
        self._marker_layer = None
        self._markers = None
        self._slider = None
        self._checkbox = None
        self._is_show_marker = False

        initial_year = min(self._df['Year'])

        title_widget = widgets.HTML('<h3>{}</h3><h4>Data from <a href="http://product.datastream.com/browse/">DataStream</a></h4'.format(title))

        map_figure = self._render_map(initial_year)
        controls = self._render_controls(initial_year)
        self._container = widgets.VBox([title_widget, controls, map_figure])

    def render(self):
        display(self._container)

    def _on_checkbox_change(self, b):        
        if b.new == True:
            year = self._slider.value
            temp = gmaps.marker_layer(self._locations_for_year(year), 
                                      info_box_content=self._content_for_year(year))                     

            self._marker_layer.markers = temp.markers            
            self._is_show_marker = True
        elif b.new == False:
            self._marker_layer.markers = []
            self._is_show_marker = False

    def _on_year_change(self, change):           
        year = self._slider.value       
        self._heatmap.locations = self._locations_for_year(year)
        self._heatmap.weights=self._weights_for_year(year)
        self._heatmap.max_intensity = self._max_weights_for_year(year)
        self._total_box.value = self._total_casualties_text_for_year(year)        

        if self._is_show_marker == True:
            temp = gmaps.marker_layer(self._locations_for_year(year), 
                                      info_box_content=self._content_for_year(year))
            self._marker_layer.markers = temp.markers
        return self._container

    def _render_map(self, initial_year):               
        fig = gmaps.figure()  
        self._marker_layer = gmaps.marker_layer([])

        self._heatmap = gmaps.heatmap_layer(self._locations_for_year(initial_year),
                                            weights=self._weights_for_year(initial_year),
                                            max_intensity=self._max_weights_for_year(initial_year),
                                            point_radius=10)
        fig.add_layer(self._heatmap)
        fig.add_layer(self._marker_layer)
        return fig

    def _render_controls(self, initial_year):

        self._slider = widgets.IntSlider(value=initial_year,min=min(self._df['Year']),
                                         max=max(self._df['Year']),
                                         description='Year',
                                         continuous_update=False)
        self._checkbox = widgets.Checkbox(False, description='Show Marker')
        self._total_box = widgets.Label(value=self._total_casualties_text_for_year(initial_year))
        self._slider.observe(self._on_year_change, names='value')
        self._checkbox.observe(self._on_checkbox_change)
        controls = widgets.HBox([self._slider, self._checkbox, self._total_box],layout={'justify_content': 'space-between'})
        return controls

    def _weights_for_year(self, year):
        return [x for x in self._df[self._df['Year'] == year]['Value']]

    def _content_for_year(self, year):       
        return [ '{}: {}%'.format(y,x) for (x,y) in zip(self._df[self._df['Year']==year]['Value'].values, 
                                                        self._df[self._df['Year']==year]['name'].values)]
    def _max_weights_for_year(self, year):
        return self._df[self._df['Year'] == year]['Value'].max()

    def _locations_for_year(self, year):
        return self._df[self._df['Year'] == year][['latitude', 'longitude']]

    def _total_casualties_for_year(self, year):
        return int(self._df[self._df['Year'] == year]['Year'].count())

    def _total_casualties_text_for_year(self, year):
        return '{} countries, min: {}%, mean: {:.2f}%, max: {}%'.format(self._total_casualties_for_year(year),
                                                                self._df[self._df['Year'] == year]['Value'].min(),
                                                                self._df[self._df['Year'] == year]['Value'].mean(),
                                                                self._df[self._df['Year'] == year]['Value'].max())


GmapsWidget(result, 'Deposit Interest Rate (%)').render()

 

References

  1. Bugnion, P., 2016. Jupyter-Gmaps — Gmaps 0.8.3-Dev Documentation. [online] Jupyter-gmaps.readthedocs.io. Available at: https://jupyter-gmaps.readthedocs.io/en/latest/index.html [Accessed 12 June 2020].
  2. Google Developers. 2012. Countries.Csv | Dataset Publishing Language | Google Developers. [online] Available at: https://developers.google.com/public-data/docs/canonical/countries_csv [Accessed 12 June 2020].
  3. Product.datastream.com. 2020. Datastream Login. [online] Available at: http://product.datastream.com/browse/ [Accessed 12 June 2020].
  4. Google Developers. 2020. Get An API Key | Maps Embed API | Google Developers. [online] Available at: https://developers.google.com/maps/documentation/embed/get-api-key [Accessed 12 June 2020].
  5. Rungruengrayubkul, V., 2020. Using Datastream To Generate Coronavirus Cumulative Cases In Geographical Map | DEVELOPER COMMUNITY. [online] Developers.refinitiv.com. Available at: https://developers.refinitiv.com/article/using-datastream-generate-coronavirus-cumulative-cases-geographical-map [Accessed 12 June 2020].
  6. Maps.worldbank.org. n.d. World Bank Maps. [online] Available at: https://maps.worldbank.org/ [Accessed 12 June 2020].