Databricks Data Graph Setup
Warning
Data Graph, Reverse ETL, and Profiles Sync require different warehouse permissions.
On this page, you'll learn how to connect your Databricks data warehouse to Segment for the Data Graph.
Segment assumes that you already have a workspace that includes the datasets you'd like to use for the Data Graph. Sign in to Databricks with admin permissions to create new resources and provide the Data Graph with the necessary permissions.
Segment recommends setting up a new Service Principal user and only giving this user permissions to access the required catalogs and schemas.
If you already have a Service Principal user you'd like to use, grant it "Can use" permissions for your data warehouse and proceed to Step 2.
- Log in to the Databricks UI as an Admin.
- Click User Management.
- Select the Service principals tab.
- Click Add Service Principal.
- Enter a Service Principal user name and click Add.
- Select the Service Principal user you just created and click Generate secret.
- Save the Secret and Client ID to a safe place. You'll need these values to connect your Databricks warehouse to Segment.
- Navigate to Workspaces and select your Workspace.
- Select the "Permissions" tab and click Add Permissions.
- Add the newly created Service Principal user and click Save.
- Log in to the Databricks UI as an Admin.
- Navigate to SQL Warehouses.
- Select your warehouse and click Permissions.
- Add the Service Principal user and grant them "Can use" access.
- Click Add.
Segment requires write access to this catalog for internal bookkeeping and to store checkpoint tables for the queries that are executed. Therefore, Segment recommends creating a new catalog for this purpose. This is also the catalog you'll be required to specify when connecting Databricks with the Segment app.
Info
Segment recommends creating a new database for the Data Graph. If you choose to use an existing database that has also been used for Segment Reverse ETL, you must follow the additional instructions to update user access for the Segment Reverse ETL catalog.
1CREATE CATALOG IF NOT EXISTS `SEGMENT_LINKED_PROFILES_DB`;2-- Copy the saved Client ID from previously generated secret3GRANT USAGE ON CATALOG `SEGMENT_LINKED_PROFILES_DB` TO `${client_id}`;4GRANT CREATE ON CATALOG `SEGMENT_LINKED_PROFILES_DB` TO `${client_id}`;5GRANT SELECT ON CATALOG `SEGMENT_LINKED_PROFILES_DB` TO `${client_id}`;
Run the following SQL to grant the Data Graph read-only access to the Profiles Sync catalog:
GRANT USAGE, SELECT, USE SCHEMA ON CATALOG `${profiles_sync_catalog}` TO `${client_id}`;
Run the following SQL to grant your Service Principal user read-only access to any additional catalogs you want to use for the Data Graph.
1-- ********** REPEAT THIS COMMAND FOR EACH CATALOG YOU WANT TO USE FOR THE DATA GRAPH **********2GRANT USAGE, SELECT, USE SCHEMA ON CATALOG `${catalog}` TO `${client_id}`;
Restrict access to specific schemas by running the following SQL:
1GRANT USAGE ON CATALOG `${catalog}` TO `${client_id}`;2USE CATALOG `${catalog}`;3GRANT USAGE, SELECT ON SCHEMA `${schema_1}` TO `${client_id}`;4GRANT USAGE, SELECT ON SCHEMA `${schema_2}` TO `${client_id}`;5...
Sign in to the Databricks CLI with your Client ID secret and run the following SQL to verify the Service Principal user has the correct permissions for a given table.
Success!
If this command succeeds, you can view the table.
1USE DATABASE ${linked_read_only_database} ;2SHOW SCHEMAS;3SELECT * FROM ${schema}.${table} LIMIT 10;
To connect your warehouse to the Data Graph:
- Navigate to Unify > Data Graph. This should be a Unify space with Profiles Sync already set up.
- Click Connect warehouse.
- Select Databricks as your warehouse type.
- Enter your warehouse credentials. You can find these details in your Databricks workspace by navigating to SQL Warehouse > Connection details. Segment requires the following settings to connect to your Databricks warehouse:
- Hostname: The address of your Databricks server
- Http Path: The address of your Databricks compute resources
- Port: The port used to connect to your Databricks warehouse. The default port is 443, but your port might be different
- Catalog: The catalog you designated in Step 2
- Service principal client ID: The client ID used to access to your Databricks warehouse
- OAuth secret: The OAuth secret used to connect to your Databricks warehouse
- Test your connection, then click Save.
If Segment Reverse ETL has ever run in the catalog you are configuring as the Segment connection catalog, a Segment-managed schema is already created and you need to provide the new Segment user access to the existing catalog. Run the following SQL if you run into an error on the Segment app indicating that the user doesn't have sufficient privileges on an existing _segment_reverse_etl catalog.
GRANT ALL PRIVILEGES ON SCHEMA ${segment_internal_catalog}.__segment_reverse_etl TO `${client_id}`;