ARTICLE

Creating a WDI Heatmap with DataStream Web Services and Google Maps

Jirapongse Phuriphanvichai
Developer Advocate Developer Advocate

This example demonstrates how to retrieve World Development Indicators from DataStream Web Services and then use Google Maps (gmaps) to create a heatmap with the retrieved data. To run the example, the DataStream account and Google Maps' API key are required. To use other maps, such as Geopandas, please refer to this article.

To get the Google Maps' API key, please refer to Get an API Key.

This example creates a heatmap of the yearly deposit interest rate (%) across countries. However, it can be applied to any World Development Indicators.

Loading required packages

The required packages for this example are:

  • DataStreamDSWS: Python API package for Refinitiv Datastream Webservice
  • pandas: Powerful data structures for data analysis, time series, and statistics
  • numpy: NumPy is the fundamental package for array computing with Python
  • gmaps: Google maps plugin for Jupyter notebooks
  • IPython.display: Public API for display tools in IPython
  • ipywidgets: IPython HTML widgets for Jupyter

You also need to install lxml package (pip install lxml) used by pandas when parsing the HTML file.

    	
            import DatastreamDSWS as DSWS
import pandas as pd
import numpy as np
import gmaps
from IPython.display import display
import ipywidgets as widgets

Setting credentials

The DataStream account and Google Maps' API key are required to run the example.

    	
            ds = DSWS.Datastream(username = '<username>', password = '<password>')
gmaps.configure(api_key='<api_key>')

Loading an instrument list

In DataStream, the World Development Indicators are available under the Economics » International Historical Sources » World Bank Development Indicators (WDI) category. The data is categorized by countries.

You can use the indication name to search for instruments of all available countries, and then refine the search results by selecting the results from World Bank WDI. For example, the search query for Deposit Interest Rate (%) from World Bank WDI is "DESC:and(Deposit Interest Rate (%)) DSSRC:and(World Bank WDI)".

After that, the search results can be downloaded to an excel file. This excel file can be read into a data frame object by calling the pandas.read_html method. Then, only active instruments are selected. In the data frame , the Symbol column contains instrument names and the Market column contains country names.

    	
            df = pd.read_html('DepositeInterestRate.xls')  
df[0] = df[0][df[0].Activity=="Active"]
df[0]

Requesting data

The instrument names in the Symbol column are split into several batch requests. Each batch request containing 20 instruments is sent to the DataStream Web Services to request yearly data for the last ten years. All results are joined to create a data frame object. Then, missing data is removed.

    	
            item_list=df[0]["Symbol"].values
batch_list = np.array_split(item_list,len(item_list)/20)
data = None
for batch in batch_list:
    temp_dat = ds.get_data(tickers=','.join(batch), start='-10Y',freq='Y')
   
    if  data is None:
        data = temp_dat
    else:
        data = data.join(temp_dat)
data.dropna(how='all', inplace=True)
data.dropna(axis=1,how='all',inplace=True)
data 

Formatting data

The DataFrame must be re-formatted to a long format with Dates, Symbol, and Value columns. The pandas.DataFrame.melt is called to reformat the DataFrame. Then, rows with missing data or non-positive data are removed. Finally, the Dates column has been converted to Year.

    	
            data = data.reset_index().melt(id_vars=["Dates"], 
        var_name="Symbol",
        value_name="Value")
data.dropna(inplace=True)
data = data[data['Value']>0]
data = data.rename({'Dates': 'Year'}, axis=1)
data["Year"] = [int(sublist.split('-', 1)[0])  for sublist in data['Year']]
data

Adding country names

The current data frame doesn't have country names. To add country names, the DataFrame.merge method is called to join the Market column from the list downloaded from the DataStream to the current DataFrame. Then, the required columns (Year, Symbol, Market, and Value) are selected.

    	
            data = df[0].merge(data, on=['Symbol'], how='left').dropna()[['Year','Symbol','Market','Value']]
data

Adding latitudes and longitudes

The latitudes and longitudes of countries are available at countries.csv. Then, loading this file to a data frame object.

    	
            countries = pd.read_csv("countries.csv")
countries

Next, the DataFrame.merge method is called to join the latitude and longitude columns from the countries data frame to the result data frame.

    	
            result = pd.merge( data, countries,  left_on='Market', right_on='name', how='left')
result

Due to the mismatch of country names, some rows may not have latitudes and longitudes.

    	
            result[result['name'].isna()]
        
        
    

The mismatch of country names can be resolved by modifying the name column in the countries.csv or adding a new entry into the countries.csv. Then, re-run this step.

Otherwise, the below code can be called to remove those rows.

    	
            result = result[result['name'].notna()]
result

Creating a heatmap with Google Map

Next, a heatmap of deposit interest rate (%) across countries is created by using Google Map. The below code is modified from an example in the jupyter-gmaps document to create a widget which displays an interactive heatmap based on yearly data.

    	
            

class GmapsWidget(object):

    def __init__(self, df, title):
        self._df = df
        self._heatmap = None
        self._marker_layer = None
        self._markers = None
        self._slider = None
        self._checkbox = None
        self._is_show_marker = False
       
        initial_year = min(self._df['Year'])

        title_widget = widgets.HTML('<h3>{}</h3><h4>Data from <a href="http://product.datastream.com/browse/">DataStream</a></h4'.format(title))

        map_figure = self._render_map(initial_year)
        controls = self._render_controls(initial_year)
        self._container = widgets.VBox([title_widget, controls, map_figure])

    def render(self):
        display(self._container)

    def _on_checkbox_change(self, b):       
        if b.new == True:
            year = self._slider.value
            temp = gmaps.marker_layer(self._locations_for_year(year),
                                      info_box_content=self._content_for_year(year))                    
          
            self._marker_layer.markers = temp.markers           
            self._is_show_marker = True
        elif b.new == False:
            self._marker_layer.markers = []
            self._is_show_marker = False
       
    def _on_year_change(self, change):          
        year = self._slider.value      
        self._heatmap.locations = self._locations_for_year(year)
        self._heatmap.weights=self._weights_for_year(year)
        self._heatmap.max_intensity = self._max_weights_for_year(year)
        self._total_box.value = self._total_casualties_text_for_year(year)       
     
        if self._is_show_marker == True:
            temp = gmaps.marker_layer(self._locations_for_year(year),
                                      info_box_content=self._content_for_year(year))
            self._marker_layer.markers = temp.markers
        return self._container

    def _render_map(self, initial_year):              
        fig = gmaps.figure() 
        self._marker_layer = gmaps.marker_layer([])
       
        self._heatmap = gmaps.heatmap_layer(self._locations_for_year(initial_year),
                                            weights=self._weights_for_year(initial_year),
                                            max_intensity=self._max_weights_for_year(initial_year),
                                            point_radius=10)
        fig.add_layer(self._heatmap)
        fig.add_layer(self._marker_layer)
        return fig

    def _render_controls(self, initial_year):
       
        self._slider = widgets.IntSlider(value=initial_year,min=min(self._df['Year']),
                                         max=max(self._df['Year']),
                                         description='Year',
                                         continuous_update=False)
        self._checkbox = widgets.Checkbox(False, description='Show Marker')
        self._total_box = widgets.Label(value=self._total_casualties_text_for_year(initial_year))
        self._slider.observe(self._on_year_change, names='value')
        self._checkbox.observe(self._on_checkbox_change)
        controls = widgets.HBox([self._slider, self._checkbox, self._total_box],layout={'justify_content': 'space-between'})
        return controls
   
    def _weights_for_year(self, year):
        return [x for x in self._df[self._df['Year'] == year]['Value']]
      
    def _content_for_year(self, year):      
        return [ '{}: {}%'.format(y,x) for (x,y) in zip(self._df[self._df['Year']==year]['Value'].values,
                                                        self._df[self._df['Year']==year]['name'].values)]
    def _max_weights_for_year(self, year):
        return self._df[self._df['Year'] == year]['Value'].max()
   
    def _locations_for_year(self, year):
        return self._df[self._df['Year'] == year][['latitude', 'longitude']]

    def _total_casualties_for_year(self, year):
        return int(self._df[self._df['Year'] == year]['Year'].count())

    def _total_casualties_text_for_year(self, year):
        return '{} countries, min: {}%, mean: {:.2f}%, max: {}%'.format(self._total_casualties_for_year(year),
                                                                self._df[self._df['Year'] == year]['Value'].min(),
                                                                self._df[self._df['Year'] == year]['Value'].mean(),
                                                                self._df[self._df['Year'] == year]['Value'].max())


GmapsWidget(result, 'Deposit Interest Rate (%)').render()

References

  1. Bugnion, P., 2016. Jupyter-Gmaps — Gmaps 0.8.3-Dev Documentation. [online] Jupyter-gmaps.readthedocs.io. Available at: https://jupyter-gmaps.readthedocs.io/en/latest/index.html [Accessed 12 June 2020].
  2. Google Developers. 2012. Countries.Csv | Dataset Publishing Language | Google Developers. [online] Available at: https://developers.google.com/public-data/docs/canonical/countries_csv [Accessed 12 June 2020].
  3. Product.datastream.com. 2020. Datastream Login. [online] Available at: http://product.datastream.com/browse/ [Accessed 12 June 2020].
  4. Google Developers. 2020. Get An API Key | Maps Embed API | Google Developers. [online] Available at: https://developers.google.com/maps/documentation/embed/get-api-key [Accessed 12 June 2020].
  5. Rungruengrayubkul, V., 2020. Using Datastream To Generate Coronavirus Cumulative Cases In Geographical Map | DEVELOPER COMMUNITY. [online] Developers.refinitiv.com. Available at: https://developers.refinitiv.com/article/using-datastream-generate-coronavirus-cumulative-cases-geographical-map [Accessed 12 June 2020].
  6. Maps.worldbank.org. n.d. World Bank Maps. [online] Available at: https://maps.worldbank.org/ [Accessed 12 June 2020].