The world’s leading publication for data science, AI, and ML professionals.

Fantastic Beasts of BigQuery and When to Use Them

Unveiling the Traits of BigQuery Studio, DataFrames, Generative AI/AI Functions, and DuetAI

"BigQuery is an all-in-one Google service with DB-BI-ML-GenAI features." [Photo by Korng Sok on Unsplash]
"BigQuery is an all-in-one Google service with DB-BI-ML-GenAI features." [Photo by Korng Sok on Unsplash]

Discover more from the BigQuery world

One of my favourite books is "Fantastic Beasts and Where to Find Them" by J.K. Rowling. It is a story about the world of magical creatures who get loose in the non-wizard world. It’s also a story that shows how Maj (wizards) and No-Maj (non-wizard) people form a friendship to protect magical creatures. On this mission, the lead No-Maj character discovers a world of magic and falls in love with all the challenges it offers, wishing he was a wizard too.

As a No-Maj myself, my transition from mechanical engineering to the data world was initially filled with challenges. Each time I entered a new data area, I was thinking: "If only I were a wizard". 😉

When I first started learning about databases (DB) and business intelligence (BI), I had this thought in my head.

As I progressed to machine learning (ML) topics, this thought was again present.

Nowadays, I am trying to manoeuvre through generative AI (GenAI) development, and—you guessed it—this thought is again with me.

Even after gaining experience in DB, BI, and ML, the GenAI area would be more challenging for me if it weren’t for one Google service—BigQuery (BQ).

Do you know why?

Because Bigquery is offering "all-in-one" when it comes to the "DB-BI-ML-GenAI" combination. Or, as Google has announced in one of its webinars, it is covering features "From Data to AI" [1].

And how I thought it should be announced: "Fantastic Beasts of BigQuery".

On top of my all-time favourite BigQuery feature—BQML—Google has recently implemented additional transformative features, making BQ more similar to an analytical development environment and less to a database environment.

These new features enable data professionals to conduct end-to-end analytical tasks without the need to switch between multiple tools.

With end-to-end tasks, I have in mind performing exploratory data analysis (EDA), predictive modelling using either SQL or Python with Spark, and creating new insights by using generative AI features. And all this can now be done with the assistance of the code co-creating feature.

The recent evolution of BigQuery’s ecosystem motivated me to write this post and to show the new BQ advances that will transform the way (we) data professionals work and possibly make us feel a bit like Maj people. 😉

In other words, this post aims to present when the new BQ features can be used in the analytical workflow.

But before we start the explanation, I need to share the names of these "fantastic beasts":

  1. BigQuery Studio and BigQuery DataFrames
  2. BigQuery GenAI and AI Functions
  3. DuetAI in BigQuery

Let’s begin now by unveiling their unique traits and pointing out how they can be leveraged to enhance your performance.

Fantastic Beasts of BigQuery

To hint at the purpose of the new BQ features, I created an illustration of how they align with the knowledge data discovery process (analytical workflow).

The new BigQuery features aligned with analytical workflow [Image by author using draw.io]
The new BigQuery features aligned with analytical workflow [Image by author using draw.io]

As visible from the picture, at the base of the analytical workflow is DuetAI, which is an AI coding assistance feature. Except for the coding support, DuetAI is a chatbot, and you can use it for brainstorming.

This means that data professionals can ask the chatbot different questions related to input problem definitions (e.g., how can I subset my dataset or could you explain a specific function) and recommendations on how to structure the analytical output (e.g., how can I present my findings).

In between the analytical input → output flow, other features come in handy:

  • In Phase I, i.e., the data preparation and understanding phase, BigQuery SQL (for subsetting and wrangling datasets), GenAI/AI functions (for enriching datasets), BigQuery DataFrames, and other Python libraries (for exploring datasets) can be used via Bigquery Studio.
  • In Phase II, i.e., the data modelling and insights synthesis phase, BigQuery SQL or BQML functions can be used alongside BigQuery DataFrames in BQ Studio (for BI/ML model creation) to get the required analytical output (descriptive or predictive outcome).

Now we will showcase the magical traits of these features.

BigQuery Studio and DataFrames

Don’t get confused here with BigQuery Studio vs. Looker Studio (the former Data Studio). While the latter is a self-service BI tool, the first one is a new collaborative workspace that supports the complete analytical workflow.

With this said, BigQuery Studio has the following main traits [2]:

#1: Supports multiple languages and tools in a unified interface.

By this, I mean that it eases the work between different data professions, as:

  • data engineering (ingestion and data wrangling),
  • data analytics (descriptive statistics/EDA), and
  • data science tasks (predictive modelling) can be done in one environment, or better yet, in one notebook.

The BQ Studio provides the notebook Colab interface and enables data professionals to use either SQL, Python, Spark, or natural language in BigQuery (in combination with DuetAI) in one notebook file. In addition, developed notebooks can later be accessed via Vertex AI for ML workflows.

When it comes to data ingestion formats, it supports both structured, semi-structured, and unstructured formats from different cloud platforms.

Google's presentation of BigQuery Studio notebook [2]
Google’s presentation of BigQuery Studio notebook [2]

#2: Enhances collaboration and versioning by connecting to external code repositories.

I have to say this trait is something I have been wishing for a while. Although it doesn’t support (yet) all git commands, the BQ Studio supports software development practices like continuous integration/continuous deployment (CI/CD), version history, and source control of data code assets. Simply put, it is now possible to review the history of a notebook and revert to or branch from a specific notebook version.

Google's presentation of version control BQ Studio feature [2]
Google’s presentation of version control BQ Studio feature [2]

#3: Enforces security and data governance within BigQuery.

BQ Studio enforces security because it reduces the need to share the data outside of BigQuery. In other words, by adopting unified credential management between services, analysts can, e.g., access Vertex AI foundational models to perform complex analytical tasks (like sentiment analysis) without sharing data with third-party tools.

On top of this, there are data governance traits, including data lineage tracking, data profiling, and enforcing quality constraints. I can only add "Amen to that".

Google's presentation of data governance features [2]
Google’s presentation of data governance features [2]

To summarise the above-listed traits, it is evident that BigQuery Studio is a magical feature, as it enables tasks from data ingestion to data modelling by enforcing security and governance.

The story would not be complete if Google didn’t provide an additional feature that can be used within BigData Studio notebooks for Data Analysis and modelling purposes—BigQuery DataFrames.

By installing the bigframes package (similar to the installation of any other Python package with pip), data professionals can use the following Python APIs [3]:

  • bigframes.[pandas](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) which is a pandas API for analysis and manipulation, and
  • bigframes.ml, i.e., a scikit-learn API for machine learning.

And with the machine learning topic, I wanted to finalise this section.

The reason for this is that in the next section, I will elaborate on the new BigQuery ML functions.

BigQuery GenAI and AI functions

As mentioned in the introduction, I am a big fan of the BQML functions because they enable data professionals to create predictive models using SQL syntax.

In addition to its already nice portfolio of BQML functions for supervised and unsupervised learning, Google has now added my next favourite functions: generative AI and AI functions.

When it comes to the generative AI function ML.GENERATE_TEXT, I recently wrote a blog to show its traits.

The New Generative AI Function in BigQuery

In summary, the function can be used to create new insights from unstructured text stored in the BigQuery datasets. Or more precisely, you can use it to create new classes or attributes (classification analysis, sentiment analysis, or entity extraction), summarise or rewrite natural text records, and generate ads or ideation concepts.

I would say that SQL-based data engineering now has next-level powers.

Alongside this magic function, other powerful new SQL-based AI functions are:

  • ML.UNDERSTAND_TEXT – a function that helps in text analysis of the records stored in BigQuery and supports similar features as the ML.GENERATE_TEXT function. Meaning that it supports entity, sentiment, classification, and syntax analysis.
  • ML.TRANSLATE – a function used for text translation from one language to another.
  • ML.PROCESS_DOCUMENT – a function used for processing unstructured documents from an object table (e.g., invoices).
  • ML.TRANSCRIBE – a function used for transcribing audio files from an object table.
  • ML.ANNOTATE_IMAGE – a function used for image annotation from object tables.

Although these functions are written in SQL, understanding their query structure and parameters is a must for proper usage. To speed up the learning curve, a little coding assistance—Duetai—comes in handy.

DuetAI in BigQuery

Simple as that: the DuetAI magic feature can help data professionals write, improve, and understand their multi-language code within the BQ and BQ Studio environments.

More precisely, this feature possesses the following traits:

#1: Creates queries or code from scratch within the BQ environment or BQ Studio notebooks

Example of the code completion by using DuetAI in BigQuery [Image by author]
Example of the code completion by using DuetAI in BigQuery [Image by author]

#2: Explains queries and code snippets within the BQ environment or BQ Studio notebooks

Example of the code completion by using DuetAI in BigQuery [Image by author]
Example of the code completion by using DuetAI in BigQuery [Image by author]

#3: Enhances code quality within the BQ environment or BQ Studio notebooks

By this, I mean that DuetAI can improve code by assisting with:

  • syntax correction: it can identify and suggest corrections for syntax errors.
  • logic improvement: it can suggest alternative ways to structure the code, improving the overall efficiency and readability.
  • documentation generation: it can automatically generate documentation for the code, making it easier to understand.

Finally, with this magic feature, I will conclude the section and presentation of the new BigQuery "fantastic beasts".

A World of Endless Possibilities

In the book "Fantastic Beasts and Where to Find Them", J.K. Rowling showed how magical creatures become less scary for Maj and even No-Maj people when they learn about their positive traits. Similarly, in this blog post, I wanted to showcase the new fantastic features of BigQuery and point out their positive traits on different analytical levels.

Illustration of the three BigQuery beasts created by the author using DALL-E extension in ChatGPT (Correct number of beasts, but a wrong number of the names ;))
Illustration of the three BigQuery beasts created by the author using DALL-E extension in ChatGPT (Correct number of beasts, but a wrong number of the names ;))

My goal was to present how the new features can support you in the complete analytical workflow and ease your work, whether you are a data engineer, analyst, or scientist. In addition, I wanted to point out how they can enhance collaboration among team members with different data backgrounds.

Hopefully, you will engage in hands-on magic yourself and learn more about the "Fantastic Beasts of BigQuery".

Happy learning!

Thank you for reading my post. Stay connected for more stories on Medium and Linkedin.

Knowledge resources

[1] Google Cloud webinar: "Cloud OnBoard: From Data to AI with BigQuery and Vertex AI", accessed December 10th 2023, https://cloudonair.withgoogle.com/events/cloud-onboard-data-to-ai

[2] Google Cloud blog: "Announcing BigQuery Studio – a collaborative analytics workspace to accelerate data-to-AI workflows", accessed December 11th 2023, https://cloudonair.withgoogle.com/events/cloud-onboard-data-to-AI

[3] Google Cloud documentation: "BigQuery DataFrames", accessed December 11th 2023, https://cloud.google.com/python/docs/reference/bigframes/latest


Related Articles