Profiles Sync Sample Queries
On this page, you'll find queries that you can run with Profiles Sync to address common use cases.
Info
The examples in this guide are based on a Snowflake installation. If you're using another warehouse, you may need to adjust the syntax.
The queries on this page use two example schemas:
ps_segment, a schema where Segment lands dataps_ materialize, a schema with your produced materializations
These schema names may not match your own.
These queries let you view and manage identity graphs, which give you insight into unified customer profiles generated by identity resolution.
This example queries the id_graph_udpates table to measure the rate at which Segment creates and merges profiles, as well as the type of event that triggered the profile change:
1SELECT2DATE_TRUNC('hour',timestamp) as hr,3CASE4WHEN canonical_segment_id=segment_id5THEN 'profile creation' ELSE 'profile merge'6END as profile_event,7triggering_event_type,8COUNT(DISTINCT triggering_event_id) as event_count9FROM ps_segment.id_graph_updates10GROUP BY 1,2,3
Segment's configurable identifier limits let you set maximum values for identifiers like email. These maximum configured values help prevent two separate users from being merged into a single Profile.
The following query lets you view Profiles that have reached a configured limit for the email identifier:
1WITH agg AS (2SELECT3canonical_segment_id,4COUNT(LOWER(TRIM(external_id_value))) as value_count,5LISTAGG(external_id_value,', ') as external_id_values6FROM ps_materialize.external_id_mapping7WHERE external_id_type='email'8GROUP BY 19)10SELECT11canonical_segment_id,12external_id_values,13value_count14FROM agg15WHERE value_count > 5 -- set to your configured limit
When a merge occurs, Segment selects and associates a single trait value with a profile. This logic depends on how you materialize the profile_traits table.
You can break out a profile, though, to see the trait versions that existed before the merge. As a result, you can identify a particular trait's origin.
The following example inspects a particular profile, use_XX, and trait, trait_1. The query reports the profile's last observed trait, its source ID, and any profiles Segment has since merged into the profile:
1SELECT * FROM (2SELECT3ids.canonical_segment_id,4ident.segment_id,5ident.event_source_id,6ident.trait_1,7row_number() OVER(PARTITION BY ident.segment_id ORDER BY ident.timestamp DESC) as rn8FROM ps_segment.identifies as ident9INNER JOIN ps_materialize.id_graph as ids10ON ids.segment_id = ident.segment_id11AND ids.canonical_segment_id = 'use_XXX'12AND ident.trait_1 IS NOT NULL13) WHERE rn=1
The following three snippets will provide a full list of your customers, along with:
- The profile IDs merged into that customer:
1SELECT2canonical_segment_id,3LISTAGG(segment_id, ', ') as associated_segment_ids4FROM ps_materialize.id_graph5GROUP BY 1
- The external IDs associated with that customer:
1SELECT2canonical_segment_id,3LISTAGG(external_id_value || '(' || external_id_type || ')', ', ') as associated_segment_ids4FROM ps_materialize.external_id_mapping5GROUP BY 1
- The customer's traits:
SELECT * FROM ps_materialize.profile_traits WHERE merged_to IS NULL
Provides the latest subscription status set for all identifiers in the space. This query will not include identifiers that have no subscription status ever set.
1SELECT evt1.user_id, evt1.channel, evt1._id id, evt1.status, evt1.received_at2FROM ps_segment.CHANNEL_SUBSCRIPTION_UPDATED evt13JOIN (4SELECT _id, MAX(received_at) AS max_received_at5FROM ps_segment.CHANNEL_SUBSCRIPTION_UPDATED6GROUP BY _id7) evt28ON evt1._id = evt2._id AND evt1.received_at = evt2.max_received_at9ORDER BY 1
To get complete user histories, join event tables to the identity graph and aggregate or filter with id_graph.canonical_segment_id:
1SELECT2id_graph.canonical_segment_id,3pages.*4FROM ps_segment.pages5LEFT JOIN ps_materialize.id_graph6ON id_graph.segment_id = pages.segment_id7WHERE canonical_segment_id = 'use_XX..'
Suppose you want to track a user's entrances and exits of the audience aud_1. Running the following query would return all qualifying entrance and exits:
1SELECT2id_graph.canonical_segment_id,3identifies.aud_1,4identifies.timestamp5FROM ps_segment.identifies6INNER JOIN ps_materialize.id_graph7ON id_graph.segment_id = identifies.segment_id8AND identifies.aud_1 IS NOT NULL
This query works with any Trait or Audience membership, whether computed in Engage or instrumented upstream.
Yes. Engage sends updates to Audience membership (as a boolean) and computed trait value updates as traits on an Identify call that Segment forwards to your data warehouse.
The column name corresponds to the Audience or Trait key shown on the settings page:
Surface these values the same way as any other trait value:
- The Trait's complete history will be in
identifies - The Trait's current state for each customer will be in
profile_traits
Identity merges change Segment's understanding of who performed historical events.
For example, if profile_b completed a "Product Purchased" event but Segment understands that profile_b should be merged into profile_a, Segment deduces that profile_a performed that initial "Product Purchased" event.
With that in mind, here's how to differentiate between segment_id and canonical_segment_id:
segment_idis a unique identifier representing Segment's understanding of who performed an action at the time the action happened.canonical_segment_idis a unique identifier representing Segment's current understanding of who performed that action.
The mapping between these two identifiers materializes in your id_graph table. If a profile has not been merged away, then segment_id is equivalent to canonical_segment_id. If a profile has been merged away, id_graph reflects that state.
As a result, you can retrieve a customer's complete event history by joining an event table, like product_purchased to id_graph.
For more information, view the Profiles Sync tables guide.
Profiles Sync mimics the materialization performed by Segment Unify. A user's merges, external IDs, and traits should be expected whether they're queried in the warehouse, Profile API, or viewed in the UI.
The following edge cases might drive slight (< 0.01%) variation:
- Data processed by Unify hasn't yet landed in Profiles Sync.
- If you rebuild or use non-incremental materialization for
profile_traits, Profiles Sync will fully calculate traits against a user. As a result, Profiles Sync would ensure that all traits reflect the most recently observed value for fully-merged users.
By contrast, Segment Unify and incrementally-built Profiles Sync materializations won't combine already-computed traits across two merged profiles at the moment of merge. Instead, one profile's traits will be chosen across the board.
The external_id_hash is a hash of the external_id_type and external_id_value using SHA-1. This field corresponds to the primary_key for the table: hash (external_id_type and external_id_value).
For example, in BigQuery the logic is: TO_HEX(SHA1(concat(external_id_type, external_id_value))) as seg_hash.