- Allocations — variants the user can be assigned to, each with a bucket range and a set of parameter overrides
- Conditions (optional) — context predicates restricting who is eligible
- Eligible bucket range (optional) — restricts the policy to a sub-range of the layer
- Algorithm configuration — only for adaptive policies (Thompson Sampling, contextual bandits, etc.)
- Rollout configuration (optional) — turns the policy into a progressive rollout
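To make the parts above concrete, here is a minimal TypeScript sketch of a policy as a data structure. The type names (PolicyDef, AllocationDef, ConditionDef), field names such as eligibleBucketRange, and the 0 to 999 bucket space are hypothetical illustrations, not Traffical's actual schema.

```typescript
// Hypothetical policy shape: type and field names are illustrative,
// not Traffical's actual bundle schema.
type ParamValue = string | number | boolean;

interface BucketRange {
  start: number; // inclusive
  end: number; // inclusive
}

interface AllocationDef {
  name: string; // e.g. "control", "treatment_a", "low_price"
  bucketRange: BucketRange;
  overrides: Record<string, ParamValue>; // parameter key -> value for this variant
}

type ConditionOp =
  | "eq" | "neq" | "gt" | "gte" | "lt" | "lte"
  | "in" | "nin" | "contains" | "regex";

interface ConditionDef {
  field: string; // context field, e.g. "country"
  op: ConditionOp;
  value: ParamValue | ParamValue[];
}

interface PolicyDef {
  allocations: AllocationDef[];
  conditions?: ConditionDef[]; // optional; all must pass
  eligibleBucketRange?: BucketRange; // optional sub-range of the layer
  algorithm?: { type: string; [option: string]: unknown }; // adaptive policies only
  rollout?: Record<string, unknown>; // optional progressive-rollout settings
}

// Example: a static 50/50 test limited to two countries (illustrative values,
// assuming a layer with buckets 0..999).
const checkoutButtonTest: PolicyDef = {
  allocations: [
    { name: "control", bucketRange: { start: 0, end: 499 }, overrides: { button_color: "blue" } },
    { name: "treatment_a", bucketRange: { start: 500, end: 999 }, overrides: { button_color: "green" } },
  ],
  conditions: [{ field: "country", op: "in", value: ["US", "CA"] }],
};
```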
Policy states
A policy moves through a small lifecycle:

| State | Description |
|---|---|
| draft | Not yet active. Users are not assigned. |
| running | Active. Users are assigned based on allocations and conditions. |
| paused | Temporarily inactive. Users fall through to the next policy or to defaults. |
| completed | Finished. The winning allocation can be promoted to the parameter default and the policy archived. |
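The fall-through behavior in the table can be sketched as follows, assuming the policies in a layer are evaluated in order and the first running policy that assigns the user wins. LayerPolicy, its assign callback, and resolveParams are hypothetical names; conditions and bucket logic are folded into assign to keep the focus on state handling.

```typescript
type PolicyState = "draft" | "running" | "paused" | "completed";

interface LayerPolicy {
  key: string;
  state: PolicyState;
  // Returns this user's overrides, or null if the user is not assigned
  // (e.g. conditions fail or the bucket is outside every allocation).
  assign: (bucket: number) => Record<string, string> | null;
}

// Only running policies assign users; draft, paused, and completed ones are skipped.
function resolveParams(
  policies: LayerPolicy[],
  defaults: Record<string, string>,
  bucket: number,
): Record<string, string> {
  for (const policy of policies) {
    if (policy.state !== "running") continue; // fall through to the next policy
    const overrides = policy.assign(bucket);
    if (overrides !== null) return { ...defaults, ...overrides };
  }
  return { ...defaults }; // nothing assigned: parameter defaults apply
}

const exampleDefaults = { headline: "Welcome" };
const examplePolicies: LayerPolicy[] = [
  { key: "old_test", state: "paused", assign: () => ({ headline: "Hi there" }) },
  { key: "headline_test", state: "running", assign: (b) => (b < 500 ? { headline: "Hello" } : null) },
];
console.log(resolveParams(examplePolicies, exampleDefaults, 42).headline); // Hello
console.log(resolveParams(examplePolicies, exampleDefaults, 900).headline); // Welcome
```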
Static vs adaptive
Static policies have allocations with fixed bucket ranges. They don’t change unless you explicitly update them or attach a rollout. Use static policies for:
- A/B tests with fixed traffic splits
- Targeted overrides for specific segments
- Holdout groups
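For a static split, each allocation owns a fixed slice of the layer. The helper below is a hypothetical sketch (splitBuckets is not a documented API) that turns percentages into contiguous, non-overlapping bucket ranges. It assumes a layer of 1,000 buckets and models a holdout as an allocation with no overrides, so its users keep the parameter defaults.

```typescript
type Overrides = Record<string, string | number | boolean>;

interface StaticAllocation {
  name: string;
  start: number; // first bucket, inclusive
  end: number; // last bucket, inclusive
  overrides: Overrides;
}

interface Share {
  name: string;
  percent: number;
  overrides: Overrides;
}

// Hypothetical helper: carve [0, bucketCount) into contiguous inclusive ranges.
function splitBuckets(bucketCount: number, shares: Share[]): StaticAllocation[] {
  const total = shares.reduce((sum, s) => sum + s.percent, 0);
  if (Math.abs(total - 100) > 1e-9) throw new Error(`percentages sum to ${total}, expected 100`);
  const result: StaticAllocation[] = [];
  let next = 0;
  let cumulative = 0;
  for (const share of shares) {
    cumulative += share.percent;
    // Derive each end from the cumulative share so rounding never leaves gaps.
    const end = Math.round((cumulative / 100) * bucketCount) - 1;
    result.push({ name: share.name, start: next, end, overrides: share.overrides });
    next = end + 1;
  }
  return result;
}

// A 45/45 price test with a 10% holdout (assumed: empty overrides = defaults).
const pricingAllocations = splitBuckets(1000, [
  { name: "control", percent: 45, overrides: { price: 9.99 } },
  { name: "low_price", percent: 45, overrides: { price: 7.99 } },
  { name: "holdout", percent: 10, overrides: {} },
]);
console.log(pricingAllocations.map((a) => `${a.name}: [${a.start}, ${a.end}]`).join("\n"));
// control: [0, 449]
// low_price: [450, 899]
// holdout: [900, 999]
```

Deriving each range from the cumulative percentage means the last allocation always ends at the final bucket, even when a share does not divide the bucket count evenly.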
Allocations
An allocation maps a bucket range to a set of parameter overrides:
- Name — a label like control, treatment_a, or low_price
- Bucket range — [start, end] within the layer’s bucket space
- Overrides — parameter keys and the values to use for users in this allocation
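This page does not specify how users are mapped to buckets, so the sketch below assumes a common approach: hash a unit ID (salted with a layer ID) into a 1,000-bucket space with 32-bit FNV-1a, then pick the allocation whose [start, end] range contains that bucket, treating both ends as inclusive. The hash, bucket count, salting scheme, and every function name (fnv1a, bucketFor, inRange, findAllocation) are hypothetical, not Traffical's actual algorithm.

```typescript
interface Allocation {
  name: string;
  bucketRange: [number, number]; // [start, end], both inclusive (assumption)
  overrides: Record<string, string | number | boolean>;
}

// 32-bit FNV-1a over UTF-16 code units; a stand-in for whatever hash the SDK uses.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Map a unit (e.g. a user ID) to a bucket in [0, bucketCount). Salting with the
// layer ID keeps assignments independent across layers (hypothetical scheme).
function bucketFor(unitId: string, layerId: string, bucketCount = 1000): number {
  return fnv1a(`${layerId}:${unitId}`) % bucketCount;
}

function inRange(bucket: number, [start, end]: [number, number]): boolean {
  return bucket >= start && bucket <= end;
}

// Return the allocation whose range contains the bucket, or undefined if none does.
function findAllocation(bucket: number, allocations: Allocation[]): Allocation | undefined {
  return allocations.find((a) => inRange(bucket, a.bucketRange));
}

const buttonAllocations: Allocation[] = [
  { name: "control", bucketRange: [0, 499], overrides: { button_color: "blue" } },
  { name: "treatment_a", bucketRange: [500, 999], overrides: { button_color: "green" } },
];
const userBucket = bucketFor("user-123", "checkout_layer");
console.log(userBucket, findAllocation(userBucket, buttonAllocations)?.name);
```

The same inRange check could also serve for a policy's optional eligible bucket range.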
Targeting conditions
A policy can declare conditions that restrict eligibility based on context fields. All conditions must pass (AND).
Condition operators
| Operator | Description | Example |
|---|---|---|
| eq | Equals | locale eq "en-US" |
| neq | Not equals | plan neq "free" |
| gt / gte | Greater than (or equal) | age gte 18 |
| lt / lte | Less than (or equal) | cart_total lt 50 |
| in | In list | country in ["US", "CA", "GB"] |
| nin | Not in list | role nin ["admin", "internal"] |
| contains | String contains | email contains "@company.com" |
| regex | Regex match | user_agent regex "Mobile.*" |
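A minimal sketch of how these operators and the AND rule could be evaluated against a context object. The Condition shape and function names are hypothetical, and several behaviors are assumptions the table does not pin down: a missing context field fails every operator (including neq and nin), numeric operators require numbers on both sides, regex matches anywhere in the string (JavaScript RegExp.test semantics), and unknown operators fail closed.

```typescript
type Scalar = string | number | boolean;
type Operator =
  | "eq" | "neq" | "gt" | "gte" | "lt" | "lte"
  | "in" | "nin" | "contains" | "regex";

// Hypothetical condition shape, e.g. { field: "age", op: "gte", value: 18 }.
interface Condition {
  field: string;
  op: Operator;
  value: Scalar | Scalar[];
}

type EvalContext = Record<string, Scalar | undefined>;

function evaluate(cond: Condition, ctx: EvalContext): boolean {
  const actual = ctx[cond.field];
  if (actual === undefined) return false; // assumption: missing fields never match
  const expected = cond.value;
  // Numeric comparisons only apply when both sides are numbers.
  const compare = (test: (a: number, b: number) => boolean): boolean =>
    typeof actual === "number" && typeof expected === "number" && test(actual, expected);
  // String operators only apply when both sides are strings.
  const text = (test: (a: string, b: string) => boolean): boolean =>
    typeof actual === "string" && typeof expected === "string" && test(actual, expected);

  switch (cond.op) {
    case "eq": return actual === expected;
    case "neq": return actual !== expected;
    case "gt": return compare((a, b) => a > b);
    case "gte": return compare((a, b) => a >= b);
    case "lt": return compare((a, b) => a < b);
    case "lte": return compare((a, b) => a <= b);
    case "in": return Array.isArray(expected) && expected.includes(actual);
    case "nin": return Array.isArray(expected) && !expected.includes(actual);
    case "contains": return text((a, b) => a.includes(b));
    case "regex": return text((a, b) => new RegExp(b).test(a));
    default: return false; // unknown operator from untyped data: fail closed
  }
}

// All conditions must pass (AND); a policy with no conditions matches everyone.
function matchesAll(conditions: Condition[], ctx: EvalContext): boolean {
  return conditions.every((c) => evaluate(c, ctx));
}
```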
Adaptive algorithms
Adaptive policies specify an algorithm and a goal:

| Algorithm | Use case |
|---|---|
| thompson_bernoulli | Binary outcomes (clicked / didn’t, converted / didn’t). Beta-Bernoulli Thompson Sampling. |
| epsilon_greedy | Exploits the best-known variant most of the time; explores randomly with probability epsilon. |
| ucb1 | Upper Confidence Bound. Picks the variant with the highest optimistic estimate; exploration shrinks as each variant accumulates observations. |
| linear_contextual | Personalized scoring per user. The SDK uses context features and trained coefficients (shipped in the bundle) to score each allocation locally. |
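For thompson_bernoulli, the core selection step can be sketched as follows: keep success and failure counts per allocation, draw once from each allocation's Beta posterior, and serve the allocation with the highest draw. This is a generic Beta-Bernoulli Thompson Sampling sketch, not Traffical's implementation; the uniform Beta(1, 1) prior, the ArmStats shape, and the seeded mulberry32 generator (used so runs are reproducible) are assumptions.

```typescript
interface ArmStats {
  name: string;
  successes: number; // e.g. conversions
  failures: number; // exposures without a conversion
}

// mulberry32: a small seedable PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via the Box-Muller transform.
function sampleNormal(rng: () => number): number {
  const u1 = 1 - rng(); // in (0, 1], avoids log(0)
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Gamma(shape, 1) draw using the Marsaglia-Tsang method.
function sampleGamma(shape: number, rng: () => number): number {
  if (shape < 1) {
    // Boost for shape < 1: Gamma(shape) = Gamma(shape + 1) * U^(1/shape).
    return sampleGamma(shape + 1, rng) * Math.pow(1 - rng(), 1 / shape);
  }
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = sampleNormal(rng);
    const v = Math.pow(1 + c * x, 3);
    if (v <= 0) continue;
    const u = 1 - rng();
    if (u < 1 - 0.0331 * x ** 4) return d * v;
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

// Beta(a, b) draw as X / (X + Y) with X ~ Gamma(a), Y ~ Gamma(b).
function sampleBeta(a: number, b: number, rng: () => number): number {
  const x = sampleGamma(a, rng);
  const y = sampleGamma(b, rng);
  return x / (x + y);
}

// One Thompson Sampling decision: sample every posterior, serve the largest draw.
function thompsonSelect(arms: ArmStats[], rng: () => number): string {
  if (arms.length === 0) throw new Error("no allocations");
  let best = "";
  let bestDraw = -Infinity;
  for (const arm of arms) {
    const draw = sampleBeta(arm.successes + 1, arm.failures + 1, rng); // Beta(1, 1) prior
    if (draw > bestDraw) {
      bestDraw = draw;
      best = arm.name;
    }
  }
  return best;
}

const demoRng = mulberry32(42);
const demoArms: ArmStats[] = [
  { name: "control", successes: 90, failures: 10 },
  { name: "treatment_a", successes: 10, failures: 90 },
];
console.log(thompsonSelect(demoArms, demoRng));
```

Because each decision samples from the posteriors instead of taking their means, an allocation with little data still wins occasionally, which is how Thompson Sampling keeps exploring.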
Per-entity adaptive policies
A standard A/B test learns one answer for the whole user base. Sometimes that’s the wrong shape. Each product page might have a different best image order. Each merchant might convert better with a different recommendation algorithm. Each user segment might respond to a different email tone. A per-entity adaptive policy runs one bandit per entity:
- entityKeys — the field(s) in context that identify the entity (e.g. productId, userId, merchantId).
- resolutionMode: "bundle" — entity weights are shipped in the config bundle; the SDK resolves locally (sub-millisecond, but weights are only as fresh as the bundle).
- resolutionMode: "edge" — the SDK calls Traffical for each decision and gets the latest weights. Higher latency, but real-time freshness.
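Here is a minimal sketch of the bundle resolution path. It assumes a hypothetical bundle layout in which each entity's weights are keyed by the values of its entityKeys joined with "|", and it treats the weights as selection probabilities. EntityBundle, entityKey, sampleByWeight, and resolveEntity are illustrative names, the fallback for unknown or unidentifiable entities is an assumption, and edge mode (fetching fresh weights per decision) is not shown.

```typescript
type Ctx = Record<string, string | number | undefined>;

// Hypothetical bundle layout: composite entity key -> allocation name -> weight.
interface EntityBundle {
  weights: Record<string, Record<string, number>>;
  fallback: Record<string, number>; // assumed prior for entities with no learned weights
}

// Join the values of the policy's entityKeys, e.g. ["merchantId", "productId"] -> "m1|7".
function entityKey(entityKeys: string[], context: Ctx): string | null {
  const parts: string[] = [];
  for (const key of entityKeys) {
    const value = context[key];
    if (value === undefined) return null; // the entity cannot be identified
    parts.push(String(value));
  }
  return parts.join("|");
}

// Pick an allocation with probability proportional to its weight (an assumption).
function sampleByWeight(weights: Record<string, number>, rng: () => number): string {
  const entries = Object.entries(weights);
  if (entries.length === 0) throw new Error("no allocations to choose from");
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  let r = rng() * total;
  for (const [name, w] of entries) {
    r -= w;
    if (r < 0) return name;
  }
  return entries[entries.length - 1][0]; // guard against floating-point drift
}

// Bundle mode: resolve locally from the weights shipped in the config bundle.
function resolveEntity(entityKeys: string[], context: Ctx, bundle: EntityBundle, rng: () => number): string {
  const key = entityKey(entityKeys, context);
  const weights = key === null ? bundle.fallback : (bundle.weights[key] ?? bundle.fallback);
  return sampleByWeight(weights, rng);
}

const demoBundle: EntityBundle = {
  weights: {
    p1: { hero_first: 0.8, gallery_first: 0.2 },
    "m1|7": { a: 0, b: 1 },
  },
  fallback: { hero_first: 0.5, gallery_first: 0.5 },
};
console.log(resolveEntity(["productId"], { productId: "p1" }, demoBundle, Math.random));
```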
Dynamic allocations
When each entity has a different number of options (e.g. each product has a different image count), use dynamicAllocations.
For example, with context.imageCount = 5, the SDK creates allocations ["0", "1", "2", "3", "4"] and selects the one with the highest learned weight. The selected index is reported as the allocation name on the resulting decision.
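The dynamic-allocation behavior described above can be sketched in a few lines. The function names are hypothetical, and two details are assumptions: indices with no learned weight yet score 0, and ties go to the lowest index.

```typescript
// Build allocation names "0".."count-1" from a per-entity count such as context.imageCount.
function dynamicAllocationNames(count: number): string[] {
  return Array.from({ length: count }, (_, i) => String(i));
}

// Select the index with the highest learned weight; weights for indices
// outside 0..count-1 are ignored.
function selectDynamic(count: number, learnedWeights: Record<string, number>): string {
  const names = dynamicAllocationNames(count);
  if (names.length === 0) throw new Error("count must be at least 1");
  let best = names[0];
  let bestWeight = learnedWeights[best] ?? 0; // assumption: unseen index scores 0
  for (const name of names.slice(1)) {
    const weight = learnedWeights[name] ?? 0;
    if (weight > bestWeight) { // strict '>' keeps the lowest index on ties
      best = name;
      bestWeight = weight;
    }
  }
  return best; // reported as the allocation name on the decision
}

console.log(selectDynamic(5, { "0": 0.1, "3": 0.7, "4": 0.2 })); // 3
```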
See the per-entity adaptive pattern for an end-to-end walkthrough.
Contextual bandits (personalized policies)
A contextual bandit personalizes the assignment based on user context features. Different users see different variants, and the model learns which features predict which variant performs best. The training pipeline produces coefficients per allocation, and those coefficients ship in the config bundle. At resolution time the SDK computes a score per allocation from the user’s context and selects via softmax, all locally with no network call. Two things you need to know:
- Context logging allowlist — only the context fields you explicitly opt in to are logged with exposure events. This protects PII while still giving the trainer signal to learn from.
- Exploration is preserved — the softmax has a temperature (gamma) and a minimum action probability (actionProbabilityFloor), so the model keeps exploring as it learns.
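The resolution step described above can be sketched as a linear score per allocation, a temperature-scaled softmax with a probability floor, and an allowlist filter for exposure logging. This is an illustrative sketch, not the SDK's code. The AllocationModel layout (an intercept plus per-feature coefficients), dividing scores by gamma, and enforcing actionProbabilityFloor by mixing with a uniform distribution are all assumptions.

```typescript
type Features = Record<string, number>;

// Hypothetical per-allocation model shipped in the bundle.
interface AllocationModel {
  name: string;
  intercept: number;
  coefficients: Record<string, number>; // feature name -> learned weight
}

// Linear score: intercept + sum(coefficient * feature). Missing features count as 0.
function linearScore(model: AllocationModel, features: Features): number {
  let s = model.intercept;
  for (const [feature, coef] of Object.entries(model.coefficients)) {
    s += coef * (features[feature] ?? 0);
  }
  return s;
}

// Softmax over score / gamma, then mixed with uniform so every allocation keeps
// at least actionProbabilityFloor probability (requires floor * count <= 1).
function actionProbabilities(
  models: AllocationModel[],
  features: Features,
  gamma: number,
  actionProbabilityFloor: number,
): number[] {
  const k = models.length;
  if (actionProbabilityFloor * k > 1) throw new Error("floor too large for this many allocations");
  const scaled = models.map((m) => linearScore(m, features) / gamma);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => actionProbabilityFloor + (1 - k * actionProbabilityFloor) * (e / total));
}

// Draw an allocation index from the probability vector.
function sampleIndex(probs: number[], rng: () => number): number {
  let r = rng();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r < 0) return i;
  }
  return probs.length - 1; // guard against floating-point drift
}

// Keep only allowlisted context fields when attaching context to exposure events.
function loggableContext(context: Record<string, unknown>, allowlist: string[]): Record<string, unknown> {
  const logged: Record<string, unknown> = {};
  for (const field of allowlist) {
    if (field in context) logged[field] = context[field];
  }
  return logged;
}

const demoModels: AllocationModel[] = [
  { name: "short_email", intercept: 0.2, coefficients: { is_mobile: 1.5 } },
  { name: "long_email", intercept: 0.5, coefficients: { tenure_years: 0.3 } },
];
const demoProbs = actionProbabilities(demoModels, { is_mobile: 1, tenure_years: 2 }, 1.0, 0.05);
console.log(demoModels[sampleIndex(demoProbs, Math.random)].name, demoProbs);
```

A smaller gamma makes the choice greedier, a larger one flattens it toward uniform, and the floor guarantees that even a low-scoring allocation keeps being served often enough for the trainer to learn about it.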
Eligible bucket ranges
A policy can optionally narrow itself to a sub-range of the layer’s bucket space. Users whose bucket falls outside that range are not eligible for the policy.
Next steps
A/B testing
Run a static policy end-to-end.
Optimization
Adaptive policies, contextual bandits, per-entity bandits.
Rollouts
Turn a policy into a progressive rollout with health checks.
Canonical experiments
The common patterns and how to model them.