Prerequisites: Some experience in using VS Code and installing extensions
Running BigQuery SQL inside Visual Studio Code
Many organizations are now hosting or accessing Big Data on Google BigQuery - the Enterprise Data warehouse - which is part of the Google Cloud Platform offering.
BigQuery supports the querying of that data using standard SQL (Structured Query Language)
If you have a standard Google Cloud / BigQuery account, you can run these queries in the Google Cloud Platform Console - as described in this previous article.
However, for many users, access to the console is not always available. Often the user is provided with a 'Service' account (see https://cloud.google.com/iam/docs/overview for more details) - this could be due to company policy or firewall/security type reasons.
Service Account users
I was recently granted access to our historical data archive which is warehoused on BigQuery - this access was in the form of a Service account. It was explained to me during the onboarding that I would not be able to use the GCP Console to access the archive.
However, my colleague who did the introduction - introduced me to an open-source VS Code extension that allows a user to query the data archive with relative ease. In this article, I will introduce the excellent BigQuery Runner extension and illustrate how you can use it with VS Code to test and execute your Google BigQuery SQL statements.
Before we start, I just wanted to thank the author of BigQuery Runner - Daisuke Mino - for their work on the extension and the speed with which they responded to questions and addressed any concerns or issues with the extension. Whilst I was testing the extension I identified an issue with downloading the BigQuery response - and this was fixed within 24hrs - excellent stuff!
BigQuery Runner for VS Code
If like me, you only have Service Account only access to BigQuery, I think the BigQuery Runner extension with VS Code offers a great way of running SQL queries against the BigQuery data and downloading the query results in a variety of formats.
Another great feature of the BigQuery Runner extension is that whilst you are working on your SQL query, an estimate of the number of bytes read by the query will be displayed in the VS Code status bar. I find this is a very important feature - to help ensure I don't accidentally execute a very costly query - exceeding spend limits on an account can lock the account.
Installing the BigQuery Runner extension
As VS Code users, most of you are probably familiar with installing a VS Code extension - however, just to recap - you can install the BigQuery Runner extension as follows:
- Click the extension button in the VS code Activity bar
- Type 'BigQuery' into the search box
- From the results, select the BigQuery Runner item,
- The BigQuery Runner extension page should be displayed
- Click the Install button on the extension page
Whilst you are in the Extensions section, you should also select and install the SQL (BiqQuery) extension - it provides Syntax Highlighting for Google BigQuery SQL - making it easier to spot typos, etc.
Configure the BigQuery Runner extension
Before we can start using the BigQuery Runner extension we can configure a few basics - the most important of which is our Service Account details.
When you are allocated/granted a BigQuery service account you should be issued with a JSON file containing your service account details that look something like the stuff below:
"project_id": "<your project_id>",
"private_key_id": "<your private_key_id>",
"private_key": "-----BEGIN PRIVATE KEY-----\nXII
C299=\n-----END PRIVATE KEY-----\n",
"client_email": "<your service_account email>",
"client_id": "<your client_id>",
"client_x509_cert_url": "<your google apis URL>"
The normal way of configuring your Google Service Account is to set an environmental variable which points to the location of the above JSON file e.g.
If you have already set the above environmental variable to point to your JSON file - the BigQuery Runner extension will pick up on that and use it for connecting and authenticating with the BigQuery service.
If, however, you do not have the above variable set, you can configure the location of the JSON via the BigQuery extensions settings - by editing the standard VS Code settings.json file:
"bigqueryRunner.keyFilename": "<path to your Service Account JSON file>",
There are a few other settings you can configure for the BigQuery Runner extension - such as Icon, Rows per Page for the output, and so on - but I just left all these at their default values.
Constructing your SQL query
If you already have a .sql file containing your BigQuery SQL, you can just open the file.
Otherwise, create a New Text File and select SQL (BigQuery) for the language choice
I have noticed that if I already have an existing BQ SQL file loaded if I then create a new file & start typing some SQL, the auto-suggest works quite well and starts offering up possible matches - not just in terms of the SQL language - but more importantly Field names from the BigQuery data repository.
For example, with my particular data set - if I start typing 'Price' for a field, I will see other potential matches:
I can then scroll down and press Tab to select the most appropriate suggested field.
BigQuery Dry Run
As you are typing your SQL or once you complete constructing your SQL statement, the BigQuery Runner will execute a Dry Run of your query and display an estimate of the number of bytes read by the query - in the VS Code status bar.
This can be a very important feature - to help ensure you don't accidentally execute a very costly query - exceeding any spend limits on your account.
Run your SQL Query
Once you are happy with your SQL Query, you can execute it for 'real' by clicking the Run button
Once you run your Query, the results will be displayed in a new output window.
As well as the Results, the window has additional Tabs which include Table, Schema and Job details.
One of the other key features of this output window - which may not be totally obvious (I certainly missed it initially) - is the Download Button in the bottom right-hand corner.
Downloading the Results
Clicking the Download button will present a popup menu with a choice of download formats - JSON, CSV, Text and Markdown - allowing you to export the results in the format which you find the easiest to work with.
Refinitiv Tick History on BigQuery
In case you are wondering about the data and SQL statement I am referencing in my screenshots above - it is the Refinitiv Tick History which is warehoused on Google BigQuery.
Refinitiv Tick History is an archive of historical tick data drawn from our real-time networks. It covers a universe of more than 90 million active, and more than 900 million retired instruments.
It is seriously big data - approximately 4 terabytes of data are added daily to our database - which at the time of writing exceeds 9 petabytes and has 38.7 trillion rows of data! Given the size of this data, I am constantly amazed at just how quickly my SQL queries are execute and the results provided to me. As someone who dabbled with Oracle databases and MS SQL-Server many moons ago - I still cannot get my head round just how performant Google BigQuery is!
If you are a developer interested in accessing the Refinitiv Tick History data, you can refer to the Refinitiv Tick History API page for further details - including Tutorials etc.
I hope you found this quick overview of the BigQuery Runner extension informative and that you agree that BigQuery Runner is a very useful extension indeed.