In 2019, Google announced that a new version of Google Analytics was on the roadmap: GA4 (previously called App + Web). The product was officially launched at the end of 2020, after a successful beta testing period.
The main difference between GA3 (Universal Analytics) and GA4 is the data schema, which moves from session-based to event-based. At Semetis, we have been encouraging our clients to adopt GA4 since the start of 2021. It has become clear that this product is the future: GA3 will receive no further improvements, although maintenance will continue for a period of time. If you want more information, go through our other articles, where we cover what the tool is about, how to set up your GA4 account, how to use your existing enhanced ecommerce dataLayer for GA4, and many other topics.
One major improvement of GA4 over GA3 is the ability to export your data to Google BigQuery (Google Cloud Platform) for free and for all accounts, whereas GA3 only allowed this for premium accounts (GA360). This feature is the focus of this article.
What is BigQuery?
BigQuery is a fully managed enterprise data warehouse that lets you store and analyze your data. Its serverless architecture lets you use SQL queries to answer your organization's biggest questions with zero infrastructure management. BigQuery works well with all sizes of data, from a 100-row Excel spreadsheet to several petabytes. Most importantly, it can execute a complex query on that data within a few seconds (terabytes) or minutes (petabytes).
You can store your raw data in BigQuery, transform that data using SQL and store the output in another BigQuery table. A typical workflow covers three stages:
- Data ingestion - you can import data (1) manually, (2) using the API available in most programming languages or (3) using a data integration solution such as Fivetran, Stitch, Supermetrics, etc.
- Data transformation - you can manipulate the data using BigQuery's own SQL syntax, which comes with very clear documentation. Moreover, built-in features like machine learning, geospatial analysis and business intelligence are also available.
- Data activation - the insights generated through your data transformation can then be leveraged in various ways: export to CSV/Excel files, connection to BI tools for visualizations, or export to other platforms, for example as audiences in Google Analytics.
Finally, it is important to note that BigQuery is not free of charge: its cost depends on the amount of data stored and processed. Many GCP products (including BigQuery) have a “Free Tier”, a set of monthly limits under which you do not incur any cost. The BigQuery “Free Tier” gives 10 GB of storage and 1 TB of processing per month. In our experience, costs remain extremely low; see Google's BigQuery pricing documentation for more information.
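To make the “Free Tier” logic concrete, here is a minimal Python sketch that computes how much of a month's usage falls outside the limits quoted above. The `MonthlyUsage` class and both functions are hypothetical helpers written for this article, and the unit prices are deliberately left as parameters: they are placeholders, not actual Google prices, which vary by region and change over time.

```python
from dataclasses import dataclass

# Monthly free-tier limits quoted in this article.
FREE_STORAGE_GB = 10.0
FREE_PROCESSING_TB = 1.0


@dataclass
class MonthlyUsage:
    storage_gb: float    # data stored in BigQuery during the month
    processed_tb: float  # data scanned by queries during the month


def billable_usage(usage: MonthlyUsage) -> MonthlyUsage:
    """Return only the part of the usage that exceeds the free tier."""
    return MonthlyUsage(
        storage_gb=max(0.0, usage.storage_gb - FREE_STORAGE_GB),
        processed_tb=max(0.0, usage.processed_tb - FREE_PROCESSING_TB),
    )


def estimate_cost(
    usage: MonthlyUsage,
    price_per_gb_stored: float,
    price_per_tb_processed: float,
) -> float:
    """Estimate the monthly bill from caller-supplied (assumed) unit prices."""
    billable = billable_usage(usage)
    return (
        billable.storage_gb * price_per_gb_stored
        + billable.processed_tb * price_per_tb_processed
    )
```

For example, a property storing 25 GB and scanning 0.4 TB in a month would only be billed for 15 GB of storage, since its query volume stays under the 1 TB limit.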
How to set up the connection?
Because both products are part of the Google ecosystem, setting up a connection to send your raw GA4 data to BigQuery is extremely easy: it takes a few clicks and requires no code. All you need is a GA4 account and a Google Cloud Platform (GCP) account. If you do not already have a GCP account, you can start for free in one of two ways:
1. Use the “BigQuery sandbox”, available to anyone with a Google Account. It lets you use BigQuery for free (you do not need to enter any credit card details) in order to make sure that it is the right solution for your company. Many limitations apply, such as “Free Tier” usage only; see Google's documentation for the full list;
2. Create a full account, either directly or by upgrading from the sandbox. GCP offers a 90-day trial with $300 of credit, which lets you try all of its services. Note that you will need to enter a credit card, which will be billed as soon as the trial period ends or your credit runs out.
How can you leverage GA4 data in BigQuery?
Once the connection is up and running, a dataset named “analytics_<property_id>” is created in BigQuery under the GCP project you selected during the connection set-up. If you chose the “daily” frequency, a new table (or partitioned table) containing all the raw data of the previous day is created every morning.
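As an illustration of this naming scheme, the following Python sketch builds the name of yesterday's table and a simple query that counts events by name. It assumes the daily tables are named `events_YYYYMMDD` and contain an `event_name` column, as described in Google's GA4 export schema documentation rather than in this article. The project ID and property ID used below are made up, and the helper functions are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def dataset_id(property_id: str) -> str:
    """Dataset created by the GA4 link: analytics_<property_id>."""
    return f"analytics_{property_id}"


def daily_table(project: str, property_id: str, day: date) -> str:
    """Fully qualified name of the daily export table for a given day.

    Assumes the events_YYYYMMDD naming of the GA4 daily export.
    """
    return f"{project}.{dataset_id(property_id)}.events_{day:%Y%m%d}"


def events_per_name_query(
    project: str, property_id: str, today: Optional[date] = None
) -> str:
    """Build a query counting yesterday's events by event name."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    table = daily_table(project, property_id, yesterday)
    return (
        "SELECT event_name, COUNT(*) AS event_count\n"
        f"FROM `{table}`\n"
        "GROUP BY event_name\n"
        "ORDER BY event_count DESC"
    )


if __name__ == "__main__":
    # Hypothetical project and property IDs, for illustration only.
    print(events_per_name_query("my-project", "123456789"))
```

The resulting SQL can be pasted into the BigQuery console or passed to one of the BigQuery client libraries.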
GA4 data can be used for a wide variety of projects:
- Answering any business question related to the data. One of our clients wanted to understand how their users interact with the product filters of their website as part of a CRO/UX mission. With the right tracking plan in place, we were able to create a very detailed dashboard showing filter usage by country, the groups of filters used most, the filters leading to more conversions, etc. Moreover, the whole architecture was implemented on GCP, enabling a fully automated daily update of the data;
- Audience creation;
- LTV prediction;
- Churn prediction;
- And many more.
Those projects can very easily integrate the built-in machine learning capabilities of BigQuery ML, and the resulting insights can then be pushed to different platforms for activation (e.g. pushing audiences to GA).
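To give an idea of what a BigQuery ML project looks like, here is a rough Python sketch that generates the two SQL statements a churn model would typically need: one to train a logistic regression and one to score users. The model name, table names and the `churned` label column are all hypothetical, and the sketch assumes you have already built a training table with one row per user from your GA4 export. It is one possible approach, not a description of a specific client set-up.

```python
def churn_model_sql(
    model: str, training_table: str, label_column: str = "churned"
) -> str:
    """Build a BigQuery ML statement training a logistic regression.

    All columns of the training table other than the label are used
    as features.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'LOGISTIC_REG',\n"
        f"  input_label_cols = ['{label_column}']\n"
        ")\n"
        "AS\n"
        f"SELECT * FROM `{training_table}`"
    )


def churn_prediction_sql(model: str, scoring_table: str) -> str:
    """Build a statement scoring every row of a table with the model."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, TABLE `{scoring_table}`)"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(churn_model_sql("my-project.ml.churn", "my-project.ml.churn_training"))
    print(churn_prediction_sql("my-project.ml.churn", "my-project.ml.users_to_score"))
```

The predictions can then be written to a table and used as the basis for an audience.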
Finally, it is important to note that the data you get in BigQuery differs slightly from the data you see in GA4:
- PII (personally identifiable information) dimensions such as IP addresses are not available in BigQuery;
- Variables based on Google’s internal data and extrapolations such as age and gender are not available in BigQuery;
- Any data created in Google Analytics, such as segments or advertising costs (if linked with a platform), is not available in BigQuery.
How to get started?
Thanks to the broad adoption of this technology, many resources and use cases are available online. A good starting point is the BigQuery cookbook with example queries or the course “Insights from Data with BigQuery”, both provided by Google (only for GA3 for the moment). In order to generate proper insights, you will also need to make sure that your tracking plan captures the right data in the appropriate format. Do not hesitate to reach out to us if you would like to discuss which use case to start with, how to properly set up your tracking plan, or how to create fully automated data transformation pipelines on GCP.