Calculate sessions in GA4's BigQuery


Spent over a decade helping organizations thrive through re-platforming, digital analytics, and marketing automation. Now, I’m pivoting to Data Privacy and Governance. I specialize in translating abstract frameworks into actionable practices ensuring growth and protection work in tandem.

The most reliable way to calculate sessions in the Google Analytics 4 (GA4) BigQuery export is to count unique combinations of user_pseudo_id and ga_session_id.

Let's explore the nuances.

Defining a Session

In GA4, a session represents a period of user interaction with your website or app. GA4's session definition is more flexible than the one in Universal Analytics (UA).

UA relied heavily on timeouts and specific triggers to determine session boundaries: by default, a session ended after 30 minutes of inactivity, at midnight, or when the campaign source changed.

GA4, in contrast, prioritizes user engagement and allows for more nuanced session initiation.

Why user_pseudo_id and ga_session_id are Essential

  • user_pseudo_id: This is a randomly generated, client-side identifier* that allows GA4 to recognize a user across sessions. It's critical for distinguishing between different users.

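  • ga_session_id: This identifies a session for a given user. In the BigQuery export it is stored as an event parameter inside event_params, and its value is the Unix timestamp (in seconds) at which the session started. Two users who start a session in the same second share the same value, so it is only unique in combination with user_pseudo_id. It allows GA4 to group events that belong to the same session.

In Google BigQuery, counting the distinct combinations of these two fields ensures that: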

  • Each user's sessions are counted separately.

  • Each session is counted only once, even if it contains multiple events.
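
To see why the combination matters, here is a small sketch that uses inline, made-up rows rather than a real export table. It assumes ga_session_id has already been pulled out into its own column (in the actual export it sits inside event_params), and the user IDs and timestamps are hypothetical.

```sql
-- Hypothetical sample rows standing in for GA4 events.
WITH events AS (
  SELECT * FROM UNNEST([
    STRUCT('user_a' AS user_pseudo_id, 1700000000 AS ga_session_id, 'page_view' AS event_name),
    STRUCT('user_a', 1700000000, 'scroll'),      -- same user, same session
    STRUCT('user_b', 1700000000, 'page_view'),   -- different user, same session ID
    STRUCT('user_b', 1700003600, 'page_view')    -- user_b's second session
  ])
)
SELECT
  -- Counting the session ID alone merges user_a's and user_b's sessions.
  COUNT(DISTINCT ga_session_id) AS distinct_session_ids,
  -- Counting the user + session ID pair keeps them apart.
  COUNT(DISTINCT CONCAT(user_pseudo_id, '-', CAST(ga_session_id AS STRING))) AS sessions
FROM events;
```

With these four rows, the query returns 2 for distinct_session_ids and 3 for sessions. The '-' separator is a design choice to keep the concatenated key unambiguous.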

Addressing the Nuances: Session Start Without session_start or user_engagement

Why a session can start without these seemingly crucial events:

  1. Custom Events and Measurement Protocol:
  • GA4 is designed to be highly flexible and accommodate various data collection methods. When you use the Measurement Protocol or server-side tracking, you might send events that don't include session_start or user_engagement.

  • If GA4 receives a custom event (e.g., purchase, add_to_cart, click) and no active session exists for that user_pseudo_id, it infers that a new session has started. This is particularly relevant in scenarios where you want to capture server-side actions or offline conversions.

  • This is very important for CRM integrations. For example, if a user makes a purchase over the phone, and that purchase is sent to GA4 via the measurement protocol, that purchase event will start a new session if one does not already exist.
  2. Delayed session_start or user_engagement:
  • In some cases, the session_start event might be delayed due to network latency or client-side (web browser, apps on user’s device) processing.

  • If a user quickly views a page (page_view) and then leaves before the 10-second user engagement threshold is reached, the user_engagement event might not fire.

  • However, GA4 still recognizes the page_view as a session initiation, because a user did interact with the site.

  • This is most common with bounce situations. A user hits a page, and leaves very quickly.

  3. App Background/Foreground:
  • In mobile apps, sessions can be impacted by the app going into the background and then coming back to the foreground. GA4 is designed to handle this, and may not always send a session_start event upon foregrounding if the background time was short.
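
One way to check how often this happens in your own data is to group events by session and count the sessions that contain no session_start or no user_engagement event. This is only a sketch: the table path and date range are placeholders, and it assumes ga_session_id is read from event_params as an integer value.

```sql
WITH session_events AS (
  SELECT
    user_pseudo_id,
    -- ga_session_id is an event parameter, so pull it out of event_params.
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
    event_name
  FROM `your_project.your_dataset.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240107'  -- placeholder date range
),
sessions AS (
  SELECT
    user_pseudo_id,
    ga_session_id,
    COUNTIF(event_name = 'session_start') AS session_start_events,
    COUNTIF(event_name = 'user_engagement') AS user_engagement_events
  FROM session_events
  WHERE ga_session_id IS NOT NULL  -- skip events that are not tied to a session
  GROUP BY user_pseudo_id, ga_session_id
)
SELECT
  COUNT(*) AS sessions,
  COUNTIF(session_start_events = 0) AS sessions_without_session_start,
  COUNTIF(user_engagement_events = 0) AS sessions_without_user_engagement
FROM sessions;
```

Sessions that began before the first day of the date range may show up as lacking session_start simply because that event sits in an earlier daily table, so read the result as an upper bound.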

Other Methods Have Shortcomings

  • Counting session_start events: This will undercount sessions, as it misses those that start with other event types.

  • Counting user_engagement events: This is also unreliable, as many sessions do not include user_engagement events.

  • Time-based calculations: Trying to infer sessions based on time gaps between events can be complex and inaccurate, especially with GA4's flexible session definition.

BigQuery Implementation

Here's a basic SQL query to calculate sessions in BigQuery:

SQL

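SELECT
  -- ga_session_id is read from event_params
  COUNT(DISTINCT CONCAT(
    user_pseudo_id, '-',
    CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
  )) AS session_count
FROM `your_project.your_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN 'YYYYMMDD' AND 'YYYYMMDD';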
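
The same count can be broken down by day. The sketch below groups by event_date; the table path and the date range are placeholders to swap for your own, and it assumes ga_session_id is stored as an integer value in event_params.

```sql
SELECT
  event_date,  -- stored as a YYYYMMDD string in the export
  COUNT(DISTINCT CONCAT(
    user_pseudo_id, '-',
    CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
  )) AS session_count
FROM `your_project.your_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240107'  -- placeholder date range
GROUP BY event_date
ORDER BY event_date;
```

A session that runs past midnight has events on two dates and is counted once on each day here, so the daily figures can add up to slightly more than the total for the whole range.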

Takeaways

  • GA4's session definition is more flexible than UA's.

  • Counting unique combinations of user_pseudo_id and ga_session_id is the most accurate method for session calculation in BigQuery.

  • Be aware that sessions can start without session_start or user_engagement events, especially in custom event tracking and Measurement Protocol scenarios.

  • GA4 sessions are user-focused: any user interaction can start a new session.


*client-side identifier

A client-side identifier is essentially a piece of data that's stored and managed on the user's device (the "client"), such as their web browser or mobile app. Its purpose is to help identify that specific user or device over time.

Here's a breakdown:

  • Where it lives:
      • Client-side identifiers reside on the user's device. This contrasts with server-side identifiers, which are stored on the servers of a website or application.

  • How it works:
      • These identifiers can be implemented in various ways, with common methods including:
          • Cookies: Small text files stored in a web browser.
          • Local storage: A more modern browser feature that allows websites to store larger amounts of data.
          • Device identifiers: Unique codes assigned to mobile devices by their operating systems.

  • Purpose:
      • Client-side identifiers serve several purposes:
          • Tracking user behavior: Websites and apps use them to track user activity, such as pages visited, items added to a shopping cart, or interactions with specific features.
          • Personalization: They enable websites to personalize the user experience by remembering preferences or displaying relevant content.
          • Authentication: They can be used to maintain user login sessions.
          • Analytics: They are critical for analytics platforms, like Google Analytics, to distinguish between unique users and track their interactions.

  • GA4 Context:
      • In Google Analytics 4 (GA4), the user_pseudo_id is a prime example of a client-side identifier. It allows GA4 to recognize a user across multiple sessions, even if they don't explicitly log in.

In essence, client-side identifiers are tools that allow websites and applications to "remember" users and their actions, enhancing the user experience and providing valuable data for analysis.
