How to set up and run R Data Science Development Environment with Jupyter on Docker

Wasin Waeosri
Developer Advocate

Introduction

Data Scientists and Financial coders need to interact with various Data Science/Financial development tools such as the Anaconda (or Miniconda) Python distribution platform, the Python programming language, the R programming language, the Matplotlib library, the Pandas library, the Jupyter application, and much more.

One of the hardest parts of being a Data Developer is setting up those tools. You need to install a lot of software and libraries in the correct order to set up your Data Science development environment. Example steps are the following:

  1. Install Python or Anaconda/Miniconda
  2. Create a new virtual environment (It is not recommended to install programs into your base environment)
  3. Install Jupyter
  4. Install Data Science libraries such as Matplotlib, Pandas, Plotly, Bokeh, etc.
  5. If you are using R, install R and then its libraries
  6. If you are using Julia, install Julia and then its libraries
  7. And so on.

If you need to share your code/project with your peers, replicating the above steps in your colleagues' environments is very complex too.

The good news is that you can reduce the effort of setting up the workbench with the Docker containerization platform. You may think Docker is only for DevOps engineers or hardcore developers, but the Jupyter Docker Stacks let you create a ready-to-use Jupyter application with Data Science/Financial libraries in a few commands.

This article is the second part of the series that demonstrates how to set up a Jupyter Notebook environment with Docker to consume and display financial data from Refinitiv Data Platform without performing the installation steps above. If you are not familiar with Jupyter Docker Stacks, please see more detail in the first part article.

This second article focuses on Jupyter with the R programming language.

Introduction to Jupyter Docker Stacks

The Jupyter Docker Stacks are a set of ready-to-run Docker images containing Jupyter applications and interactive computing tools with scientific, mathematical, and data analysis libraries pre-installed. With Jupyter Docker Stacks, setting up the environment is reduced to just the following steps:

  1. Install Docker and sign up for a DockerHub account (free).
  2. Run a command to pull an image that contains Jupyter and preinstalled packages based on the image type.
  3. Work with your notebook file.
  4. If you need additional libraries that are not preinstalled with the image, you can create your own image with a Dockerfile that installs those libraries.

Docker also helps the team share the development environment by letting your peers replicate the same environment easily. You can share the notebooks, Dockerfile, dependencies-list files with your colleagues, then they just run one or two commands to run the same environment.

Jupyter Docker Stacks provide various images for developers based on their requirements. Please see more detail about all image types on the Selecting an Image page.

Running the Jupyter Docker R-Notebook Image

If you are using the R programming language in your Data Science or Finance/Statistics work, Jupyter Docker Stacks provide a jupyter/r-notebook Docker image for you. You can pull the jupyter/r-notebook image and start a container running a Jupyter Notebook server with the R kernel with a single command.

    	
    docker run -p 8888:8888 --name notebook -v <your working directory>:/home/jovyan/work -e JUPYTER_ENABLE_LAB=yes --env-file .env -it jupyter/r-notebook:70178b8e48d7

The above command sets the following container options:

  • -p 8888:8888: Exposes the server on host port 8888.
  • -v <your working directory>:/home/jovyan/work: Mounts the working directory on the host as the /home/jovyan/work folder in the container so that files are shared between your host machine and the container.
  • -e JUPYTER_ENABLE_LAB=yes: Runs JupyterLab instead of the default classic Jupyter Notebook.
  • --name notebook: Defines the container name as notebook.
  • -it: Enables interactive mode with a pseudo-TTY when running the container.
  • --env-file .env: Passes the variables defined in a .env file to the container as environment variables.

Note:

  • This article is based on the jupyter/r-notebook tag 70178b8e48d7.
  • Docker destroys the container and its data when you remove the container, so you always need the -v option to keep your files on the host.
  • The default notebook username of a container is jovyan (you can change it to something else).
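
As a rough sketch, the command and options above could be wrapped in a small shell function so that the host folder does not need to be retyped each time. The function name run_notebook, the DOCKER override variable, and the default of mounting the current directory are assumptions made for this illustration; they are not part of the Jupyter Docker Stacks.

```shell
# Hypothetical helper around the docker run command shown above.
# DOCKER can be overridden (for example DOCKER=echo) to preview the arguments.
DOCKER="${DOCKER:-docker}"

run_notebook() {
  workdir="${1:-$PWD}"                            # host folder to mount (assumed default: current directory)
  image="${2:-jupyter/r-notebook:70178b8e48d7}"   # image tag used in this article
  "$DOCKER" run -p 8888:8888 --name notebook \
    -v "$workdir":/home/jovyan/work \
    -e JUPYTER_ENABLE_LAB=yes \
    --env-file .env \
    -it "$image"
}

# Usage: run_notebook /path/to/your/notebooks
```

Setting DOCKER=echo before calling run_notebook prints the assembled arguments instead of starting a container, which is a cheap way to review the options first.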

When the container starts, the console output includes the notebook server URL.

You can access the JupyterLab application by opening the notebook server URL in your browser. It starts with the /home/jovyan/ location. Please note that only the notebooks and files in the work folder can be saved to the host machine (<your working directory> folder). This JupyterLab application comes with both Python and R kernels for the notebook application.

The jupyter/r-notebook Docker image comes with popular R packages pre-installed for HTTP REST API requests, JSON handling, basic plotting, and other operations.

The files in the <your working directory> folder will be available in the JupyterLab application the next time you start a container, so you can work with your files as in a normal JupyterLab/Anaconda environment.

To stop the container, just press Ctrl+C to exit it.

Alternatively, you may run docker stop <container name> to stop the container and docker rm <container name> to remove it.

    	
            

    docker stop notebook
    ...
    docker rm notebook

The example notebook of this scenario is the rdp_apis_r_esg_notebook.ipynb file in the /r/notebook/ folder (also available in the /r/notebook folder of the GitHub repository). Please see the full detail regarding how to run this example notebook in the How to run the Jupyter Docker R-Notebook section.

With the pre-installed R Data Science and development packages, developers are ready to build a notebook or dashboard with the RDP APIs (or other Refinitiv HTTP REST APIs) content. You can request data from Refinitiv with the HTTP library, perform data analysis and then plot a graph for data visualization.

How to use other R Libraries

If you are using libraries that do not come with the jupyter/r-notebook Docker image, such as the Plotly R library, you can install them directly from a notebook cell with the following command.

    	
    install.packages("plotly")

However, this solution installs the package only inside the running container, so it is lost when you remove the container and must be reinstalled in every new container.

A better solution is to create a new Docker image from Jupyter Docker Stacks that contains the required libraries, and then all containers generated from the image can use the libraries without any manual installation. Like the other Jupyter Docker Stacks, developers can create their Dockerfile with an instruction to install R packages on top of the jupyter/r-notebook image.

Example with Refinitiv Data APIs and Plotly on R-Notebook

Let's demonstrate by building a Docker image that includes the Plotly library, and then running the R notebook application that retrieves historical data from Refinitiv Data Platform (RDP) APIs and draws charts with the Plotly R library.

What are Refinitiv Data Platform (RDP) APIs?

The Refinitiv Data Platform (RDP) APIs provide various Refinitiv data and content for developers via easy-to-use Web-based APIs.

RDP APIs give developers seamless and holistic access to all of the Refinitiv content such as Historical Pricing, Environmental Social and Governance (ESG), News, Research, etc., and let them commingle it with their own content, enriching, integrating, and distributing the data through a single interface, delivered wherever they need it. The RDP APIs delivery mechanisms are the following:

  • Request - Response: RESTful web service (HTTP GET, POST, PUT or DELETE)
  • Alert: a mechanism to receive asynchronous updates (alerts) to a subscription.
  • Bulks: delivers substantial payloads, like the end-of-day pricing data for the whole venue.
  • Streaming: delivers messages in real time.

This example project focuses on the Request-Response: RESTful web service delivery method only.

For more detail regarding Refinitiv Data Platform, please see the following APIs resources:

First, create a Dockerfile in the /r/ folder with the following content:

    	
            

    # Start from a core stack version
    FROM jupyter/r-notebook:70178b8e48d7

    LABEL maintainer="Your name and email address"

    # Install package
    RUN R -e "install.packages('plotly', repos='http://cran.rstudio.com/')"

    ENV JUPYTER_ENABLE_LAB=yes

Then build a Docker image named jupyter_rdp_r_plotly with the following command:

    	
    docker build . -t jupyter_rdp_r_plotly
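
Before starting a container, it may be worth confirming that the package really is in the image, because R's install.packages typically reports a failed installation as a warning rather than an error, so the build can finish even when Plotly did not install. The sketch below assumes Rscript is on the image's PATH (the Dockerfile already calls R the same way); the function name check_plotly and the DOCKER override variable are hypothetical.

```shell
# Hypothetical check that the Plotly R package can be loaded in the built image.
# DOCKER can be overridden (for example DOCKER=echo) to preview the arguments.
DOCKER="${DOCKER:-docker}"

check_plotly() {
  image="${1:-jupyter_rdp_r_plotly}"
  # --rm removes the throwaway container once Rscript exits;
  # library() stops with a non-zero exit status if plotly is missing.
  "$DOCKER" run --rm "$image" Rscript -e 'library(plotly)'
}

# Usage: check_plotly && echo "plotly is available"
```

A non-zero exit status from check_plotly suggests rebuilding the image and reading the build log for the installation warning.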
        
        
    

Once the Docker image is built successfully, you can run the following command to start a container running a Jupyter R Notebook server with the Plotly R library on your machine.

    	
    docker run -p 8888:8888 --name notebook -v <project /r/notebook/ directory>:/home/jovyan/work --env-file .env -it jupyter_rdp_r_plotly

Then you can start creating notebook applications with the R language that consume Refinitiv content via the RDP APIs HTTP REST services and plot the data with the Plotly library. Please see more detail in the rdp_apis_r_plotly_notebook.ipynb example notebook file in the /r/notebook folder of the GitHub repository.

The rdp_apis_r_plotly_notebook.ipynb example notebook uses the pre-installed httr library to authenticate and request the historical data from RDP APIs HTTP REST services.
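
To make the request flow concrete, here is a minimal sketch of the same two HTTP calls using curl instead of httr, so it is not the notebook's code. It reads the endpoint variables from the .env file used in this article. The function names and the form field names in the token request (grant_type, scope, takeSignOnControl, client_id) are assumptions based on an OAuth2 password grant and should be confirmed against the RDP authentication documentation before use.

```shell
# Join the base URL and an endpoint path taken from the .env variables.
rdp_url() {
  printf '%s%s' "${RDP_BASE_URL%/}" "$1"
}

# Hypothetical token request; the form field names are assumptions.
rdp_request_token() {
  curl --silent --show-error --request POST "$(rdp_url "$RDP_AUTH_URL")" \
    --header 'Accept: application/json' \
    --data-urlencode 'grant_type=password' \
    --data-urlencode "username=$RDP_USER" \
    --data-urlencode "password=$RDP_PASSWORD" \
    --data-urlencode "client_id=$RDP_APP_KEY" \
    --data-urlencode 'scope=trapi' \
    --data-urlencode 'takeSignOnControl=true'
}

# Hypothetical interday summaries request for one instrument code,
# sent with the access token as a Bearer credential.
rdp_interday_summaries() {
  token="$1"
  ric="$2"
  curl --silent --show-error \
    --header "Authorization: Bearer $token" \
    "$(rdp_url "${RDP_HISTORICAL_PRICE_URL}${RDP_HISTORICAL_INTERDAY_SUMMARIES_URL}${ric}")"
}
```

Both request functions only send anything when called, and they expect the RDP_* variables to be exported in the current shell.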

Then the notebook uses the Plotly R library to draw the historical data chart.

This example notebook is based on the RDPHistoricalRExample.ipynb example of Setup Jupyter Notebook for R (GitHub) with some modifications to match the Jupyter Docker scenario.

You can find full details on how to run this example notebook in the How to build and run the Jupyter Docker R-Notebook customized image with Plotly section.

Demo prerequisites

This example requires the following software and libraries.

  1. RDP Access credentials.
  2. Docker Desktop/Engine version 20.10.x
  3. DockerHub account (free subscription).
  4. Internet connection.

Please contact your Refinitiv representative to help you access the RDP account and services. You can find more detail regarding the RDP access credentials setup in the Getting Started for User ID section of the Getting Start with Refinitiv Data Platform article.

How to run the Examples

The first step is to unzip or download the example project folder into a directory of your choice, then set up Python or R Docker environments based on your preference.

Caution: You should not share a .env file with your peers or commit/push it to version control. You should add the file to the .gitignore file to avoid adding it to version control or a public repository accidentally.
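
One way to follow this advice is a small shell helper that adds the entry only when it is missing. The function name ignore_env_file is hypothetical, and the sketch assumes a Git repository with a .gitignore file in the project folder.

```shell
# Hypothetical helper: add .env to a .gitignore file exactly once.
ignore_env_file() {
  gitignore="${1:-.gitignore}"
  touch "$gitignore"
  # Nothing to do when an exact ".env" line is already present.
  if grep -qxF '.env' "$gitignore"; then
    return 0
  fi
  # Make sure the new entry starts on its own line.
  if [ -n "$(tail -c 1 "$gitignore")" ]; then
    printf '\n' >> "$gitignore"
  fi
  printf '%s\n' '.env' >> "$gitignore"
}

# Usage, from the folder that holds the .env file:
# ignore_env_file .gitignore
```

Note that .gitignore only affects untracked files. If a .env file has already been committed, it also needs to be removed from the index (for example with git rm --cached .env) and the credentials should be changed.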

How to run the Jupyter Docker R-Notebook

First, open the project folder in the command prompt and go to the r subfolder. Then create a file named .env in that folder with the following content.

    	
            

    # RDP Core Credentials
    RDP_USER=<Your RDP User>
    RDP_PASSWORD=<Your RDP Password>
    RDP_APP_KEY=<Your RDP App Key>

    # RDP Core Endpoints
    RDP_BASE_URL=https://api.refinitiv.com
    RDP_AUTH_URL=/auth/oauth2/v1/token
    RDP_ESG_URL=/data/environmental-social-governance/v2/views/scores-full
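
Because a missing variable only shows up later as a failed request inside the notebook, a quick check of the file before starting the container can save time. The following is a minimal sketch; the function name check_env_file is hypothetical, and it only verifies that each key has a non-empty value, so unedited placeholders such as <Your RDP User> still pass.

```shell
# Hypothetical check: every listed key must appear as KEY=<non-empty value>.
check_env_file() {
  env_file="$1"
  shift
  status=0
  for key in "$@"; do
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      printf 'missing or empty: %s\n' "$key" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage, with the keys from the .env file above:
# check_env_file .env RDP_USER RDP_PASSWORD RDP_APP_KEY RDP_BASE_URL RDP_AUTH_URL RDP_ESG_URL
```

The key list is passed as arguments so the same function can also be used for a .env file that carries additional endpoint variables.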

Run the following Docker run command in a command prompt to pull a Jupyter Docker R-Notebook image and run its container.

    	
    docker run -p 8888:8888 --name notebook -v <project /r/notebook/ directory>:/home/jovyan/work -e JUPYTER_ENABLE_LAB=yes --env-file .env -it jupyter/r-notebook:70178b8e48d7

The Jupyter Docker R-Notebook will run the Jupyter server and print the server URL in the console. Click on that URL to open the JupyterLab application in the web browser.

Finally, open the work folder and open the rdp_apis_r_esg_notebook.ipynb example notebook file, then run through each notebook cell.

How to build and run the Jupyter Docker R-Notebook customized image with Plotly

First, open the project folder in the command prompt and go to the r subfolder. Then create a file named .env in that folder with the following content.

    	
            

    # RDP Core Credentials
    RDP_USER=<Your RDP User>
    RDP_PASSWORD=<Your RDP Password>
    RDP_APP_KEY=<Your RDP App Key>

    # RDP Core Endpoints
    RDP_BASE_URL=https://api.refinitiv.com
    RDP_AUTH_URL=/auth/oauth2/v1/token
    RDP_ESG_URL=/data/environmental-social-governance/v2/views/scores-full
    RDP_HISTORICAL_PRICE_URL=/data/historical-pricing/v1
    RDP_HISTORICAL_INTERDAY_SUMMARIES_URL=/views/interday-summaries/
    RDP_HISTORICAL_EVENT_URL=/views/events/

Run the following Docker build command to build the Docker image named jupyter_rdp_r_plotly:

    	
    docker build . -t jupyter_rdp_r_plotly

Once Docker builds the image successfully, run the following command to start a container.

    	
    docker run -p 8888:8888 --name notebook -v <project /r/notebook/ directory>:/home/jovyan/work --env-file .env -it jupyter_rdp_r_plotly

The jupyter_rdp_r_plotly container will run the Jupyter server and print the server URL in the console. Click on that URL to open the JupyterLab application in the web browser.

Lastly, open the work folder and open the rdp_apis_r_plotly_notebook.ipynb example notebook file, then run through each notebook cell.

Conclusion

Docker is an open containerization platform for developing, testing, deploying, and running any software application. The Jupyter Docker Stacks provide a ready-to-use and consistent development environment for Data Scientists, Financial coders, and their teams. Developers no longer need to set up their environment/workbench (Anaconda, virtual environment, Jupyter installation, etc.) manually, which is the most complex task for them. Developers can just run a single command to start the Jupyter notebook server from Jupyter Docker Stacks and continue their work.

The Jupyter Docker Stacks provide a handful of libraries for Data Science/Financial development for various requirements (Python, R, Machine Learning, and much more). If developers need additional libraries, Jupyter Docker Stacks let developers create their Dockerfile with an instruction to install those dependencies. All containers generated from the customized image can use the libraries without any manual installation.

References

You can find more details regarding the Refinitiv Data Platform Libraries, Plotly, Jupyter Docker Stacks, and related technologies for this notebook from the following resources:

For any questions related to Refinitiv Data Platform, please use the following forum on the Developers Community Q&A page.

GitHub

https://github.com/Refinitiv-API-Samples/Article.RDP.RDPLibrary.Python.R.JupyterDocker