Skip to contentSkip to navigationSkip to topbar
On this page
Looking for more inspiration?Visit the
(information)
You're in the right place! Segment documentation is now part of Twilio Docs. The content you are used to is still here—just in a new home with a refreshed look.

Databricks Reverse ETL Setup


Set up Databricks as your Reverse ETL source.

At a high level, when you set up Databricks for Reverse ETL, the configured service-principal needs read permissions for any resources (databases, schemas, tables) the query needs to access. Segment keeps track of changes to your query results with a managed schema (__SEGMENT_REVERSE_ETL), which requires the configured service-principal to allow write permissions for that schema. Segment supports only OAuth (M2M) authentication. To generate a client ID and Secret, follow the steps listed in Databricks' OAuth machine-to-machine (M2M) authentication(link takes you to an external page) documentation.

(information)

Databricks Reverse ETL sources support Segment's dbt extension

If you have an existing dbt account with a Git repository, you can use Segment's dbt extension to centralize model management and versioning, reduce redundancies, and run CI checks to prevent breaking changes.


Required permissions

required-permissions page anchor
  • Make sure the service principal you use to connect to Segment has permissions to use that warehouse. In the Databricks console go to SQL warehouses and select the warehouse you're using. Navigate to Overview > Permissions and make sure the service principal you use to connect to Segment has can use permissions.

    Note the Service Principal UUID from the User Management Page(link takes you to an external page) (under Service Principals) for the following SQL operations.

  • To grant access to read data from the tables used in the model query, run:

    1
    GRANT USAGE ON SCHEMA <schema_name> TO `<service principal ID you are using to connect to Segment>`;
    2
    GRANT SELECT, READ_METADATA ON SCHEMA <schema_name> TO `<service principal ID you are using to connect to Segment>`;
  • To grant Segment access to create a schema to keep track of the running syncs, run:

    GRANT CREATE on catalog <name of the catalog, usually hive_metastore or main if using unity-catalog> TO `<service principal ID you are using to connect to Segment>`;
  • If you want to create the schema yourself instead and then give Segment access to it, run:

    1
    CREATE SCHEMA IF NOT EXISTS __segment_reverse_etl;
    2
    GRANT ALL PRIVILEGES ON SCHEMA __segment_reverse_etl TO `<service principal ID you are using to connect to Segment>`;

To set up Databricks as your Reverse ETL source:

  1. Log in to your Databricks account.
  2. Navigate to Workspaces and select the workspace you want to use.
  3. Select SQL in the main navigation.
  4. Select SQL Warehouses and select the warehouse you want to use. Note that Segment doesn't support the Compute connection parameters.
  5. Go to the Connection details tab and keep this page open.
  6. Open your Segment workspace(link takes you to an external page).
  7. Navigate to Connections > Sources and select the Reverse ETL tab.
  8. Click + Add Reverse ETL source.
  9. Select Databricks and click Add Source.
  10. Enter the configuration setting for your Databricks source based on information from step 5:
    • Hostname: adb-xxxxxxx.azuredatabricks.net
    • Http Path: /sql/1.0/warehouses/xxxxxxxxx
    • Port: 443 (default)
    • Service principal client ID: <your client ID>
    • OAuth secret: <OAuth secret used during connection>
    • Catalog [optional]: If not specified, Segment will use the default catalog
  11. Click Test Connection to see if the connection works. If the connection fails, make sure you have the right permissions and credentials, then try again.
  12. Click Add source if the test connection is successful.
(warning)

Segment authenticates with your workspace using OAuth (M2M) authentication at Databricks' recommendation

Segment previously supported token-based authentication, but now uses OAuth (M2M) authentication at the recommendation of Databricks. If you previously set up your source using token-based authentication, Segment will continue to support it. If you want to create a new source or update the connection settings of an existing source, Segment only supports OAuth machine-to-machine (M2M) authentication(link takes you to an external page).

After you've successfully added your Databricks source, add a model and follow the rest of the steps in the Reverse ETL setup guide.