Connection fields
| Field | Required | Description |
|---|---|---|
host | yes | Workspace URL, e.g. dbc-12345678-abcd.cloud.databricks.com |
httpPath | yes | SQL warehouse HTTP path, e.g. /sql/1.0/warehouses/abc123def456 |
token | yes | Personal access token or service principal token |
catalog | yes | Unity Catalog catalog name |
schema | no | Default schema |
Recommended: service principal
For production use, create a service principal rather than using a personal access token tied to a human user:- In your Databricks account console, open User management → Service principals → Add service principal.
- Name it (
traffical-reader). - In the workspace, Workspace settings → Identity and access → Service principals → Add. Add the service principal you just created.
- Generate an OAuth token for the service principal (Databricks docs cover the flow per cloud).
- Grant Unity Catalog permissions (next section).
Unity Catalog grants
The service principal needs read access to your source data and read + write access to a dedicated Traffical schema. Both live in a single Unity Catalog catalog (typically the catalog where your analytics data lives).MODIFY covers INSERT, UPDATE, and DELETE. Combined with CREATE TABLE and USE SCHEMA, that’s everything the pipeline needs in the Traffical schema.
When configuring the connection, set catalog: analytics and the Traffical working schema name (traffical) in the schema field.
SQL warehouse sizing
The pipeline runs short queries. Recommendations:- A serverless SQL warehouse with
Smallcluster size handles most workloads. Auto-stop after 5–10 minutes idle is plenty. - For very large fact tables, consider
MediumorLarge— but watch the cost dashboard. - Use a dedicated warehouse for Traffical so the pipeline’s queries don’t compete with BI dashboards for compute.
Recommended table design
Delta tables withZORDER on user/entity columns make joins fast:
OPTIMIZE jobs should be scheduled separately (Databricks docs explain). Without them, query performance degrades over time as small files accumulate.
Example definitions
Assignment definition SQL:Network access
Databricks is HTTPS-only and Traffical connects via the SQL API. No firewall setup is needed by default. If your workspace has IP access lists configured, add Traffical’s egress IPs (listed in Settings → Warehouse) to the allowed list. For PrivateLink setups, talk to us — that’s a custom setup but feasible.Gotchas
- Personal access tokens expire. If you use a PAT instead of a service principal, the connection will fail when the token expires. Service principals are recommended for that reason.
- Hive Metastore vs Unity Catalog. Traffical supports Unity Catalog. If your data lives in the legacy Hive Metastore (
hive_metastorecatalog), it works, but Unity Catalog is the recommended path going forward. - Auto-stop too aggressive. If your SQL warehouse is configured to stop after 1 minute idle, the pipeline pays a cold-start delay on every run. A 5–10 minute idle is a better tradeoff.
- DBR runtime. SQL warehouses use Photon, which is what Traffical expects. All-purpose clusters work too but are slower and more expensive for ad-hoc queries.