Profiles Sync Sample Queries

Plan availability: Free ✗, Team ✗, Business ✓, Add-on ✓

On this page, you'll find queries that you can run with Profiles Sync to address common use cases.

Info

The examples in this guide are based on a Snowflake installation. If you're using another warehouse, you may need to adjust the syntax.

About example schemas

The queries on this page use two example schemas:

ps_segment, a schema where Segment lands data
ps_materialize, a schema with your produced materializations

These schema names may not match your own.

Monitor and diagnose identity graphs

These queries let you view and manage identity graphs, which give you insight into unified customer profiles generated by identity resolution.

Show how many profiles Segment creates and merges per hour

This example queries the id_graph_updates table to measure the rate at which Segment creates and merges profiles, as well as the type of event that triggered the profile change:

SELECT
    DATE_TRUNC('hour',timestamp) as hr,
    CASE
      WHEN canonical_segment_id=segment_id
      THEN 'profile creation' ELSE 'profile merge'
    END as profile_event,
    triggering_event_type,
    COUNT(DISTINCT triggering_event_id) as event_count
FROM ps_segment.id_graph_updates
GROUP BY 1,2,3

Isolate profiles that have reached an identifier's maximum configured value

Segment's configurable identifier limits let you set maximum values for identifiers like email. These maximum configured values help prevent two separate users from being merged into a single Profile.

The following query lets you view Profiles that have reached a configured limit for the email identifier:

WITH agg AS (
    SELECT
        canonical_segment_id,
        COUNT(LOWER(TRIM(external_id_value))) as value_count,
        LISTAGG(external_id_value,', ') as external_id_values
    FROM ps_materialize.external_id_mapping
    WHERE external_id_type='email'
    GROUP BY 1
)
SELECT
    canonical_segment_id,
    external_id_values,
    value_count
FROM agg
WHERE value_count > 5 -- set to your configured limit

Reconstruct a profile's traits

Identify the source that generated the value for a particular trait for a canonical profile as well as its child profiles

When a merge occurs, Segment selects and associates a single trait value with a profile. This logic depends on how you materialize the profile_traits table.

You can break out a profile, though, to see the trait versions that existed before the merge. As a result, you can identify a particular trait's origin.

The following example inspects a particular profile, use_XXX, and trait, trait_1. For use_XXX and each profile Segment has merged into it, the query returns the last observed value of the trait and the ID of the source that sent it:

SELECT * FROM (
  SELECT
    ids.canonical_segment_id,
    ident.segment_id,
    ident.event_source_id,
    ident.trait_1,
    row_number() OVER(PARTITION BY ident.segment_id ORDER BY ident.timestamp DESC) as rn
  FROM ps_segment.identifies as ident
  INNER JOIN ps_materialize.id_graph as ids
    ON ids.segment_id = ident.segment_id
    AND ids.canonical_segment_id = 'use_XXX'
    AND ident.trait_1 IS NOT NULL
) WHERE rn=1

Measure and model your customer base

Pull a complete list of your customers, along with their merges, external identifiers, or traits

The following three snippets will provide a full list of your customers, along with:

The profile IDs merged into that customer:

SELECT
  canonical_segment_id,
  LISTAGG(segment_id, ', ') as associated_segment_ids
FROM ps_materialize.id_graph
GROUP BY 1

The external IDs associated with that customer:

SELECT
  canonical_segment_id,
  LISTAGG(external_id_value || ' (' || external_id_type || ')', ', ') as associated_external_ids
FROM ps_materialize.external_id_mapping
GROUP BY 1

The customer's traits:

SELECT * FROM ps_materialize.profile_traits WHERE merged_to IS NULL

Pull the latest subscription status set for every profile identifier in the space

The following query returns the latest subscription status set for each identifier in the space. It doesn't include identifiers that have never had a subscription status set.

SELECT evt1.user_id, evt1.channel, evt1._id id, evt1.status, evt1.received_at
FROM ps_segment.CHANNEL_SUBSCRIPTION_UPDATED evt1
JOIN (
  SELECT _id, MAX(received_at) AS max_received_at
  FROM ps_segment.CHANNEL_SUBSCRIPTION_UPDATED
  GROUP BY _id
) evt2
ON evt1._id = evt2._id AND evt1.received_at = evt2.max_received_at
ORDER BY 1
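As an alternative sketch, Snowflake's QUALIFY clause can filter on a window function directly, which avoids the self-join. This assumes the same table and columns as the query above. One difference to be aware of: if two rows for the same _id share the same received_at, the join version returns both rows, while this version returns only one of them.

```sql
-- Sketch: latest subscription status per identifier, without a self-join.
-- Keeps exactly one row per _id (the row with the most recent received_at).
SELECT
    user_id,
    channel,
    _id AS id,
    status,
    received_at
FROM ps_segment.CHANNEL_SUBSCRIPTION_UPDATED
QUALIFY ROW_NUMBER() OVER (PARTITION BY _id ORDER BY received_at DESC) = 1
ORDER BY 1
```

QUALIFY is not available in every warehouse, so you may need to keep the join form if you're not using Snowflake.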

Show all pages visited by a user

To get complete user histories, join event tables to the identity graph and aggregate or filter with id_graph.canonical_segment_id:

SELECT
    id_graph.canonical_segment_id,
    pages.*
FROM ps_segment.pages
LEFT JOIN ps_materialize.id_graph
    ON id_graph.segment_id = pages.segment_id
WHERE canonical_segment_id = 'use_XX..'

Show the complete history of a trait or audience membership associated with a customer

Suppose you want to track a user's entrances into and exits from the audience aud_1. The following query returns every Identify call that set aud_1, which covers all qualifying entrances and exits:

SELECT
    id_graph.canonical_segment_id,
    identifies.aud_1,
    identifies.timestamp
FROM ps_segment.identifies
INNER JOIN ps_materialize.id_graph
    ON id_graph.segment_id = identifies.segment_id
    AND identifies.aud_1 IS NOT NULL

This query works with any Trait or Audience membership, whether computed in Engage or instrumented upstream.
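If you only want the moments when membership actually changed, one possible approach is to compare each row with the previous one for the same customer. The sketch below assumes aud_1 is stored as a boolean column, and the labels 'entered' and 'exited' are illustrative names rather than values Segment produces.

```sql
-- Sketch: keep only the rows where aud_1 differs from the customer's previous value.
WITH membership AS (
    SELECT
        id_graph.canonical_segment_id,
        identifies.aud_1,
        identifies.timestamp,
        -- Previous aud_1 value for the same customer, NULL for the first row
        LAG(identifies.aud_1) OVER (
            PARTITION BY id_graph.canonical_segment_id
            ORDER BY identifies.timestamp
        ) AS previous_aud_1
    FROM ps_segment.identifies
    INNER JOIN ps_materialize.id_graph
        ON id_graph.segment_id = identifies.segment_id
        AND identifies.aud_1 IS NOT NULL
)
SELECT
    canonical_segment_id,
    CASE WHEN aud_1 THEN 'entered' ELSE 'exited' END AS membership_event,
    timestamp
FROM membership
WHERE previous_aud_1 IS NULL       -- first observed value
   OR previous_aud_1 <> aud_1      -- value flipped
ORDER BY canonical_segment_id, timestamp
```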

FAQs

Can I view Engage Audience membership and Computed Trait values in my Warehouse?

Yes. Engage sends Audience membership updates (as a boolean) and Computed Trait value updates as traits on an Identify call, which Segment forwards to your data warehouse.

The column name corresponds to the Audience or Trait key shown on the settings page.

Surface these values the same way as any other trait value:

The Trait's complete history will be in identifies
The Trait's current state for each customer will be in profile_traits
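As a minimal sketch of the second case, the following query reads the current value of an audience key from profile_traits. It assumes an audience key of aud_1 (the example key used elsewhere on this page) and that profile_traits exposes a canonical_segment_id column; adjust both to match your own tables.

```sql
-- Sketch: current aud_1 membership for each customer that has not been merged away.
-- aud_1 and canonical_segment_id are assumed column names.
SELECT
    canonical_segment_id,
    aud_1
FROM ps_materialize.profile_traits
WHERE merged_to IS NULL
  AND aud_1 IS NOT NULL
```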

What is the relationship between `segment_id` and `canonical_segment_id`? Are they unique?

Identity merges change Segment's understanding of who performed historical events.

For example, if profile_b completed a "Product Purchased" event but Segment understands that profile_b should be merged into profile_a, Segment deduces that profile_a performed that initial "Product Purchased" event.

With that in mind, here's how to differentiate between segment_id and canonical_segment_id:

segment_id is a unique identifier representing Segment's understanding of who performed an action at the time the action happened.
canonical_segment_id is a unique identifier representing Segment's current understanding of who performed that action.

The mapping between these two identifiers materializes in your id_graph table. If a profile has not been merged away, then segment_id is equivalent to canonical_segment_id. If a profile has been merged away, id_graph reflects that state.

As a result, you can retrieve a customer's complete event history by joining an event table, like product_purchased, to id_graph.
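To see the mapping itself, a query along these lines lists the profiles that have been merged away, since those are the rows where the two identifiers differ. It assumes the ps_materialize example schema used on this page.

```sql
-- Sketch: profiles that have been merged into another profile.
-- For profiles that were never merged away, segment_id equals canonical_segment_id,
-- so they are excluded by the filter.
SELECT
    segment_id,
    canonical_segment_id
FROM ps_materialize.id_graph
WHERE segment_id <> canonical_segment_id
ORDER BY canonical_segment_id
```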

For more information, view the Profiles Sync tables guide.

Does Profiles Sync data ever differ from Unify data?

Profiles Sync mimics the materialization performed by Segment Unify. A user's merges, external IDs, and traits should match whether you query them in the warehouse, retrieve them through the Profile API, or view them in the UI.

The following edge cases might drive slight (< 0.01%) variation:

Data processed by Unify hasn't yet landed in Profiles Sync.
If you rebuild or use non-incremental materialization for profile_traits, Profiles Sync will fully calculate traits against a user. As a result, Profiles Sync would ensure that all traits reflect the most recently observed value for fully-merged users.

By contrast, Segment Unify and incrementally-built Profiles Sync materializations won't combine already-computed traits across two merged profiles at the moment of merge. Instead, one profile's traits will be chosen across the board.

What hash function is used for the external_id_hash field by Profiles Sync?

The external_id_hash is a SHA-1 hash of external_id_type concatenated with external_id_value. This field serves as the primary key for the table. For example, in BigQuery the logic is: TO_HEX(SHA1(concat(external_id_type, external_id_value))) as seg_hash.
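In Snowflake, a roughly equivalent check might look like the sketch below, which recomputes the hash and compares it with the stored value. It assumes that the two fields are concatenated with no delimiter (as in the BigQuery example), that external_id_hash is stored as a lowercase hex string, and that the field is available on ps_segment.external_id_mapping_updates. Verify each assumption against your own tables.

```sql
-- Sketch: recompute the SHA-1 hash and compare it with the stored external_id_hash.
-- Snowflake's SHA1 returns a 40-character hex-encoded string.
SELECT
    external_id_type,
    external_id_value,
    external_id_hash,
    SHA1(CONCAT(external_id_type, external_id_value)) AS recomputed_hash,
    external_id_hash = SHA1(CONCAT(external_id_type, external_id_value)) AS hashes_match
FROM ps_segment.external_id_mapping_updates
LIMIT 10
```

If hashes_match is false for your data, the stored hash may use different casing or encoding than this sketch assumes.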