How Data Thresholding in GA4 is Hiding Data and How to Avoid It

SEO Associate
Bianca Matos
April 28, 2023
Data thresholding in GA4

Google Analytics 4 presents many challenges for marketers as we start to embrace the inevitable: the end of Universal Analytics. While you’re finding your way around the new tool, you may run into issues with data missing from reports, especially if you are running UA alongside GA4. If you’re noticing this, it could be because data thresholding has been applied to your reports.

In this blog, we will explore data thresholding in detail, explain when it is applied, and show how you can tell if it is affecting your reports. We will also discuss how to avoid data thresholding and the workarounds available.

What is Data Thresholding?

Data thresholding is a feature in GA4 that is designed to prevent users from being identified based on their demographics, interests, or other signals available in your data. You’ll see data thresholding applied when creating custom reports in “Explore” or viewing built-in reports.  

For example, if a report contains age, gender, or interest categories as a segment, a threshold may be applied, and some data will be kept hidden to prevent users from being identified. These thresholds are applied by Google and cannot be adjusted. 

When is Data Thresholding Applied?

Google says that data thresholding can be applied to reports for these reasons: Google Signals, demographic information, search query information, and the selected date range. It is most often applied to reports that have a low user count. The data is still there, but it is not shown in the report.

Google Signals

If you have enabled Google Signals and have a low user count in the specified date range, your data in a report or exploration may be withheld. Google Signals can be found under Admin > Data Collection.

Enabling Google Signals allows you to track users across devices and platforms. When enabled, it collects data from users who have signed into a Google account and enabled the feature in their account settings. It can provide insights into your audience’s demographics, interests, and other characteristics. Google Signals lets you populate demographics data in GA4 and lets you reuse Google Analytics audiences as a retargeting audience in Google Ads.

But enabling Google Signals will also lead to data thresholding. In our experience, thresholding tends to occur when counts drop below roughly 40-50 hits. This could significantly impact your reporting for websites with small volumes of traffic or for tracking low-frequency events.

Demographic Information

If a report or exploration includes demographic information and the reporting identity relies on the device ID, you may see the row containing that data withheld if there are a small number of users.

GA4 only starts collecting demographic data from the moment you enable Google Signals. Historic data is not available. Because users have to be signed into a Google account and opt into having their demographics shared, the demographic data you do have may not be a complete data set.

Search Query Information

If a report or exploration includes search query information, the row containing the data may be withheld if there are not enough total users. You’ll see this applied if you’ve connected Search Console to GA4 or if you are viewing your site’s search terms.

Date Range

Because data thresholding is applied due to low user or event counts, it may also be applied when viewing reports or explorations that have a narrow date range. Expanding your date range may increase your user or event count, which can bring previously thresholded rows back into view.
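
To make the date range effect concrete, here is a minimal sketch in Python. It is a toy model, not Google's actual logic: the cutoff of 50 is a made-up value (Google does not publish the real one), the age-bracket numbers are invented, and adding weekly user counts together ignores returning visitors.

```python
from collections import Counter

# Hypothetical cutoff for illustration only; the real value is not published.
ASSUMED_MIN_USERS = 50


def apply_threshold(rows, min_users=ASSUMED_MIN_USERS):
    """Keep only the rows whose user count meets the assumed minimum."""
    return {key: users for key, users in rows.items() if users >= min_users}


def combine(*periods):
    """Add up per-row user counts across periods (ignores returning visitors)."""
    total = Counter()
    for period in periods:
        total.update(period)
    return dict(total)


# Invented user counts per age bracket for two consecutive weeks.
week_1 = {"18-24": 30, "25-34": 80}
week_2 = {"18-24": 35, "25-34": 70}

one_week = apply_threshold(week_1)
two_weeks = apply_threshold(combine(week_1, week_2))

print(one_week)   # the "18-24" row is withheld
print(two_weeks)  # both rows are visible over the wider range
```

With one week selected, the 18-24 row falls under the assumed cutoff and disappears. Over two weeks its combined count clears the cutoff, so the row shows up again.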

How to Tell if Data Thresholding is Applied to Your Reports

If you see an orange exclamation icon while viewing reports or explorations, this warning indicates that data thresholding has been applied to your report. This icon can appear even though it says that your report is unsampled, based on 100% of available data.

Google has not specified the exact number that triggers data thresholding, so it is hard to determine exactly when it will appear. Here’s what the icon looks like in GA4:

Screenshot showing the data thresholding icon in GA4
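
If you pull reports through the Google Analytics Data API instead of the interface, there is no icon to look for. The report response includes a metadata field, subjectToThresholding, that serves a similar purpose. The sketch below assumes you already have the JSON response loaded as a Python dictionary. The sample payload and the helper name is_thresholded are ours, for illustration only.

```python
def is_thresholded(report_response):
    """Return True if the report's metadata flags thresholding."""
    metadata = report_response.get("metadata", {})
    return bool(metadata.get("subjectToThresholding", False))


# Made-up example payload, trimmed to the parts this check needs.
sample_response = {
    "rowCount": 1,
    "metadata": {"subjectToThresholding": True},
}

if is_thresholded(sample_response):
    print("Some rows were withheld by data thresholding.")
else:
    print("No thresholding flag on this report.")
```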

How to Avoid Data Thresholding in Reports

If you don’t plan on using demographic information in reports or using GA4 audiences for retargeting in Google Ads, keep Google Signals disabled on your GA4 property. By not enabling it, data thresholding is not applied to your reports.  

Unfortunately, if you need those features, there is no real way to completely prevent data thresholding from occurring. If you have Google Signals enabled and decide to disable it, data thresholding will not be applied to future data, but any date range that includes the period when it was enabled will still be impacted. For example, year-over-year or quarter-over-quarter comparisons will be affected because Google Signals was once enabled.

What Can Be Done If I Want to Keep Google Signals On?

If you decide to keep Google Signals on and want to avoid data thresholding in your reports, you have one workaround option: changing the default reporting identity.  

GA4’s default reporting identity setting affects how Google Analytics calculates the users of your website. You can view the various options by going to Admin > Reporting Identity.

There you will see the two main options, “Blended” and “Observed”. By clicking on Show All, you can see the other option, “Device-based”.

Reporting identity options available in GA4

Change Reporting Identity in GA4

When you change the reporting identity it changes how Google Analytics identifies and categorizes users in your reports. There are three different ways Analytics can identify a user: 

  • Device ID – also known as a client ID. It’s a random integer stored by first-party cookies, and it is set automatically. It gets stored on the user’s first visit and is set for two years.  
  • User ID – this ID is set by you with a unique identifier. After the user has logged into your site, your authentication system assigns them an ID.  
  • Google Signals – only available for users who have turned on Ads Personalization in their Google account.  

In GA4 you have three options to choose from when identifying your users: Blended, Observed, and Device-based.

  • Device-based – is the most basic one as it just uses device ID. If the same user uses multiple browsers or devices, GA4 will treat them as separate users.
  • Observed – uses cookie data, Google Signals, and user ID which can help remove duplicates for certain users by understanding that a single person uses different devices. 
  • Blended – is the most advanced option. It includes all the methods used by Observed but also uses machine learning to fill in the gaps and model data. Google Consent Mode needs to be enabled to use this feature.
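
To see why the choice of identity changes your user counts, here is a simplified sketch in Python. It is not how GA4 is implemented: the event records and field names are invented, and the "observed" branch only models user ID with a fallback to device ID, leaving Google Signals and modeled data out entirely.

```python
def count_users(events, identity="device-based"):
    """Count distinct users under a simplified reporting identity."""
    seen = set()
    for event in events:
        if identity == "device-based":
            # Every device or browser counts as its own user.
            key = ("device", event["device_id"])
        else:
            # Prefer the user ID when present, otherwise fall back to device ID.
            user_id = event.get("user_id")
            key = ("user", user_id) if user_id else ("device", event["device_id"])
        seen.add(key)
    return len(seen)


# Invented events: one logged-in person on two devices, plus one anonymous visitor.
events = [
    {"device_id": "phone-1", "user_id": "u42"},
    {"device_id": "laptop-1", "user_id": "u42"},
    {"device_id": "tablet-9", "user_id": None},
]

device_count = count_users(events, "device-based")
observed_count = count_users(events, "observed")

print(device_count)    # 3 users: one per device
print(observed_count)  # 2 users: the two u42 devices are merged
```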

If “Observed” or “Blended” is selected as your reporting identity, thresholding is likely being applied to your reports. By changing to “Device-based”, Google Signals will not be used to calculate users and thresholding will not be applied.  

Blended Reporting Identity Enabled

Blended reporting identity in GA4 with data missing
Data missing in Events Admin panel with Blended reporting identity enabled
GA4 report with data missing because of Blended reporting identity
Data missing in Events report with Blended reporting identity enabled

Device-based Reporting Identity Enabled

Device-based reporting identity in GA4 with data viewable
Data viewable in Events Admin panel with Device-based reporting identity enabled
GA4 report with data viewable with Device-based reporting identity
Data viewable in Events report with Device-based reporting identity enabled

What is great about reporting identity is that you can switch between types as often as you like, whenever you like. The data stored in your property will not be affected by changing it. Reporting identity is also applied retroactively and doesn’t affect data collection.

If you decide to change to Device-based reporting identity, be aware that things like user ID won’t be taken into account, which can mean that your user counts are less accurate.

Final Thoughts 

Although data thresholding can be frustrating for those who want to see all of their website data, Google considers it necessary to maintain privacy and ensure compliance with data protection regulations. By understanding when and why data thresholding is applied, you can better interpret your reports and use workarounds such as changing your default reporting identity to minimize its impact.

Remember, data thresholding is applied in GA4, and data is withheld, when all of these conditions are met:

  • Some or all data has been collected by Google Signals  
  • Reporting Identity is either Blended or Observed
  • The report you’re viewing contains rows with small user/event/session numbers 
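
The checklist above can be written as a quick sanity check. This is a rough sketch in Python that only mirrors the three conditions as we have listed them. The cutoff of 50 is an assumption based on what we have observed, not a number published by Google, and the function name is ours.

```python
# Assumed cutoff for illustration; Google does not publish the real value.
ASSUMED_MIN_COUNT = 50


def may_be_thresholded(signals_data_present, reporting_identity,
                       smallest_row_count, min_count=ASSUMED_MIN_COUNT):
    """Return True when all three conditions from the checklist hold."""
    uses_signals_identity = reporting_identity.lower() in {"blended", "observed"}
    has_small_rows = smallest_row_count < min_count
    return signals_data_present and uses_signals_identity and has_small_rows


print(may_be_thresholded(True, "Blended", 12))       # all three conditions hold
print(may_be_thresholded(True, "Device-based", 12))  # identity does not use Google Signals
print(may_be_thresholded(True, "Observed", 400))     # no small rows in the report
```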

Navigating GA4 can be complex, especially when it comes to reporting and the new features. At Ontario SEO, we can help you set up GA4, migrate from UA, and make sure that your data is being properly collected and displayed. Contact us today to help you get solutions to your GA4 questions and concerns.