Databricks for Profiles Sync


FREE x
TEAM x
BUSINESS
ADDON

Databricks for Profiles Sync lets you sync Segment profiles into your Databricks Lakehouse.


Getting started

Before getting started with Databricks Profiles Sync, note the following prerequisites for setup.

Warehouse size and performance

A SQL warehouse is required for compute. Segment recommends a warehouse with the following characteristics:

  • Size: small
  • Type: Serverless, otherwise Pro
  • Clusters: minimum of 2, maximum of 6
Success!

To improve Delta Lake query performance, Segment recommends creating a compaction job for each table using OPTIMIZE, following Databricks recommendations.
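As a rough illustration of the per-table compaction recommendation, the sketch below builds one OPTIMIZE statement per table. The function name and the catalog, schema, and table names are hypothetical placeholders, not names Segment creates. How you schedule and run the statements (for example, as a Databricks job) is left to your setup.

```python
def build_optimize_statements(catalog, schema, tables):
    """Return one OPTIMIZE statement per table name.

    All names passed in are placeholders; substitute the catalog,
    schema, and tables that exist in your own workspace.
    """
    def quote(identifier):
        # Wrap the identifier in backticks, doubling any embedded backtick.
        return "`" + identifier.replace("`", "``") + "`"

    return [
        f"OPTIMIZE {quote(catalog)}.{quote(schema)}.{quote(table)}"
        for table in tables
    ]


if __name__ == "__main__":
    # Hypothetical names used only for demonstration.
    for statement in build_optimize_statements(
        "segment", "profiles_sync", ["example_table_a", "example_table_b"]
    ):
        print(statement)
```

Each generated statement can then be run against the SQL warehouse on whatever cadence suits your data volume.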

Info

Segment recommends manually starting your SQL warehouse before setting up your Databricks destination. If the SQL warehouse isn't running, Segment attempts to start it to validate the connection, and the connection may time out when you click Test Connection during setup.


Set up Databricks for Profiles Sync

  1. From your Segment app, navigate to Unify > Profiles Sync.
  2. Click Add Warehouse.
  3. Select Databricks as your warehouse type.
  4. Use the following steps to connect your warehouse.

Connect your Databricks warehouse

Use the five steps below to connect to your Databricks warehouse.

Warning

To configure your warehouse, you'll need read and write permissions.

Step 1: Name your schema

Pick a name to help you identify this space in the warehouse, or use the default name provided. You can't change this name once the warehouse is connected.

Step 2: Enter the Databricks compute resources URL

Segment uses your Databricks workspace URL to access your workspace API.

Check your browser's address bar when inside the workspace. The workspace URL should resemble: https://<workspace-deployment-name>.cloud.databricks.com. Remove any characters after this portion and note the URL for later use.
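If you copy the URL straight from the address bar, it usually carries a path, query string, or fragment. The sketch below shows one way to trim it down to the workspace portion described above. The function name is hypothetical, and the example URL is a placeholder rather than a real workspace.

```python
from urllib.parse import urlsplit


def normalize_workspace_url(address_bar_url):
    """Keep only the scheme and host of a URL copied from the browser."""
    parts = urlsplit(address_bar_url.strip())
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("Expected an https:// workspace URL")
    # Drop the path, query string, and fragment that follow the host.
    return f"https://{parts.hostname}"


if __name__ == "__main__":
    # Placeholder URL for demonstration only.
    print(normalize_workspace_url(
        "https://my-workspace.cloud.databricks.com/sql/warehouses?o=123"
    ))
```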

Step 3: Enter a Unity catalog name

This catalog is the target catalog where Segment lands your schemas and tables.

  1. Follow the Databricks guide for creating a catalog. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use.
  2. Select the catalog you've just created.
    1. Select the Permissions tab, then click Grant.
    2. Select the Segment service principal from the dropdown, and check ALL PRIVILEGES.
    3. Click Grant.
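The steps above use the Databricks UI. If you prefer to script the grant, Databricks SQL also has a GRANT statement, and the sketch below builds one as a string. It is an illustration under assumptions: the function name is hypothetical, the catalog name is a placeholder, and it assumes the service principal is referenced by its application ID, which you should confirm against the Databricks documentation for your workspace.

```python
def build_catalog_grant(catalog, principal):
    """Build a GRANT statement giving a principal all privileges on a catalog.

    `principal` is assumed to be the service principal's application ID;
    both arguments are placeholders for values from your own workspace.
    """
    def quote(identifier):
        # Wrap the identifier in backticks, doubling any embedded backtick.
        return "`" + identifier.replace("`", "``") + "`"

    return f"GRANT ALL PRIVILEGES ON CATALOG {quote(catalog)} TO {quote(principal)}"


if __name__ == "__main__":
    # Hypothetical catalog name and application ID.
    print(build_catalog_grant("segment", "00000000-0000-0000-0000-000000000000"))
```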

Step 4: Add the SQL warehouse details from your Databricks warehouse

Next, add the details of the SQL warehouse you use for compute.

  • HTTP Path: The connection details for your SQL warehouse.
  • Port: The port number of your SQL warehouse.
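Before pasting these values into Segment, a quick sanity check can catch copy-paste mistakes. The sketch below assumes the HTTP path follows the shape commonly shown in a SQL warehouse's connection details (`/sql/1.0/warehouses/<id>`) and that the port is a valid TCP port (443 is typical). Both the pattern and the function name are assumptions for illustration, not a rule Segment enforces.

```python
import re

# Assumed shape of a SQL warehouse HTTP path; adjust if yours differs.
HTTP_PATH_PATTERN = re.compile(r"^/sql/1\.0/(warehouses|endpoints)/[0-9A-Za-z]+$")


def check_warehouse_details(http_path, port):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not HTTP_PATH_PATTERN.match(http_path):
        problems.append(f"HTTP path {http_path!r} does not look like /sql/1.0/warehouses/<id>")
    if not 1 <= port <= 65535:
        problems.append(f"Port {port} is outside the valid range 1-65535")
    return problems


if __name__ == "__main__":
    # Placeholder warehouse ID for demonstration only.
    print(check_warehouse_details("/sql/1.0/warehouses/abc123def456", 443))
```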

Step 5: Add the service principal client ID and client secret

Segment uses the service principal to access your Databricks workspace and associated APIs.

Service principal client ID: Follow the Databricks guide for adding a service principal to your account. The service principal's name can be anything, but Segment recommends something that identifies its purpose (for example, "Segment Profiles Sync"). Segment doesn't require Account admin or Marketplace admin roles.

The service principal needs the following setup:

Client secret: Follow the Databricks instructions to generate an OAuth secret.

Once you've configured your warehouse, test the connection and click Next.


With selective sync, you can choose exactly which tables you want synced to the Databricks warehouse. By default, Segment also syncs materialized view tables.

Select tables to sync, then click Next. Segment creates the warehouse and connects Databricks to your Profiles Sync space.

You can view the sync status and the tables you're syncing from the Profiles Sync overview page.

Learn more about using selective sync with Profiles Sync.