When to use adaptive policies
Use adaptive policies when:
- You want to minimise regret, that is, the cost of showing the inferior variant longer than necessary.
- The best variant may change over time (seasonality, mix shifts).
- You want continuous optimization rather than a one-time decision.
- You want personalized selection based on user context (contextual bandits).
Use a static policy instead when:
- You need a clean point-in-time decision with an unambiguous winner for stakeholders.
- The cost of the variant is large enough that you want a deliberate ramp afterwards (use rollouts).
- Your goal metric is delayed or noisy (the engine will still work, but a static test gives you a more interpretable result).
Algorithms
| Algorithm | What it does | Best for |
|---|---|---|
| thompson_bernoulli | Beta-Bernoulli Thompson Sampling. Bayesian exploration/exploitation. | Binary outcomes (clicked/didn’t, converted/didn’t) |
| epsilon_greedy | Mostly exploits the best-known variant; explores randomly with probability ε. | Simple baseline with predictable exploration rate |
| ucb1 | Upper Confidence Bound. Optimistic exploration that naturally decreases over time. | When you want a deterministic, parameter-free explorer |
| linear_contextual | Personalized scoring per user via a trained linear model. Resolved locally. | Personalization, when the best variant depends on user context |
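To make the first three rows concrete, here is a minimal sketch of the textbook selection rules behind those algorithm names. It is an illustration only, not Traffical's implementation: the function signatures, the uniform Beta(1, 1) prior, the UCB1 exploration constant, and the sample numbers are all assumptions.

```python
import math
import random

def thompson_bernoulli(stats, rng):
    """Draw a plausible conversion rate per variant from its Beta posterior; pick the largest."""
    draws = {
        name: rng.betavariate(1 + conversions, 1 + (exposures - conversions))
        for name, (conversions, exposures) in stats.items()
    }
    return max(draws, key=draws.get)

def epsilon_greedy(stats, rng, epsilon=0.1):
    """Explore uniformly with probability epsilon; otherwise exploit the best observed rate."""
    if rng.random() < epsilon:
        return rng.choice(sorted(stats))
    return max(stats, key=lambda name: stats[name][0] / max(stats[name][1], 1))

def ucb1(stats):
    """Pick the highest observed rate plus confidence bonus; untried variants go first."""
    total = sum(exposures for _, exposures in stats.values())

    def upper_bound(name):
        conversions, exposures = stats[name]
        if exposures == 0:
            return math.inf
        return conversions / exposures + math.sqrt(2 * math.log(total) / exposures)

    return max(stats, key=upper_bound)

# stats maps variant name -> (conversions, exposures); the numbers are made up.
stats = {"low_price": (30, 1000), "mid_price": (60, 1000), "high_price": (45, 1000)}
rng = random.Random(7)
picks = [thompson_bernoulli(stats, rng) for _ in range(1000)]
```

In this sketch Thompson Sampling mostly picks `mid_price` but still occasionally samples the others, whereas `ucb1` is fully deterministic for a given set of counts.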
Setting up an adaptive policy
- Define the goal event in .traffical/config.yaml or in the dashboard.
- Create the policy in the dashboard:
  - Kind: adaptive
  - Algorithm: thompson_bernoulli
  - Goal event: purchase
  - Goal type: conversion_rate
  - Min exposures before shift: 100
  - Window size: 60 minutes
- Define allocations. Initial ranges can be uniform; the engine will adjust them.

  | Allocation | Initial range | Override |
  |---|---|---|
  | low_price | 0–3332 | pricing.discount_pct: 5 |
  | mid_price | 3333–6665 | pricing.discount_pct: 10 |
  | high_price | 6666–9999 | pricing.discount_pct: 20 |
- Implement in code, the same way as for a static policy.
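The initial ranges in the allocation table can be read as inclusive bucket intervals over 0–9999. The sketch below shows one way such ranges could map a user to an override. The hashing scheme, the salt, and the function names are hypothetical; the source does not describe how the SDK assigns buckets.

```python
import hashlib

# (name, first bucket, last bucket, pricing.discount_pct) copied from the allocation table.
ALLOCATIONS = [
    ("low_price", 0, 3332, 5),
    ("mid_price", 3333, 6665, 10),
    ("high_price", 6666, 9999, 20),
]

def bucket_for(unit_id: str, salt: str = "pricing-policy") -> int:
    """Hash a unit id into a stable bucket in 0..9999 (hypothetical scheme)."""
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000

def allocation_for(bucket: int):
    """Return the allocation whose inclusive range contains the bucket."""
    for name, low, high, discount_pct in ALLOCATIONS:
        if low <= bucket <= high:
            return name, discount_pct
    raise ValueError(f"bucket {bucket} is outside 0..9999")

name, discount_pct = allocation_for(bucket_for("user-123"))
```

When the engine shifts traffic, only the range boundaries would change under this reading; the lookup itself stays the same.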
Tuning knobs
| Knob | What it does |
|---|---|
| minExposuresBeforeShift | The engine won’t change allocations until each allocation has seen at least this many exposures. Prevents premature shifts on noisy early data. |
| maxShiftPerUpdate | Caps how much traffic can move in a single update (e.g. 0.1 = at most 10% of traffic shifts per cycle). |
| windowSizeMinutes | How often the engine re-evaluates. Shorter windows = faster reaction, but noisier. |
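As a rough illustration of how the first two knobs could interact, the sketch below applies them to one update, which would run once per window. The function name and the exact capping rule (scaling the whole step so that the total traffic moved stays under the cap) are assumptions, not the engine's documented behaviour.

```python
def next_weights(current, target, exposures,
                 min_exposures_before_shift=100, max_shift_per_update=0.1):
    """Move allocation weights toward a target, subject to the two guards."""
    # Guard 1: hold steady until every allocation has seen enough exposures.
    if any(exposures[name] < min_exposures_before_shift for name in current):
        return dict(current)

    deltas = {name: target[name] - current[name] for name in current}
    # Traffic moved is half the sum of absolute changes: what some allocations
    # lose is exactly what the others gain.
    moved = 0.5 * sum(abs(delta) for delta in deltas.values())
    # Guard 2: shrink the step if it would move more traffic than the cap allows.
    scale = 1.0 if moved <= max_shift_per_update else max_shift_per_update / moved
    return {name: current[name] + scale * deltas[name] for name in current}

current = {"a": 0.5, "b": 0.5}
target = {"a": 0.9, "b": 0.1}
capped = next_weights(current, target, {"a": 500, "b": 500})
held = next_weights(current, target, {"a": 500, "b": 40})
```

Here `capped` moves only 10% of traffic toward `a` even though the target asks for 40%, and `held` is unchanged because `b` has fewer than 100 exposures.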
Contextual bandits
The linear_contextual algorithm personalizes selection: different users get different variants based on context features. The training pipeline learns coefficients per allocation per feature; those coefficients ship in the config bundle; the SDK uses them to score each allocation for the current user at resolution time.
What it looks like in code
The code is identical to any other adaptive policy; you just pass more context.
Scoring
For each allocation, the SDK computes a score from the learned coefficients and the current user's context features. It then applies a softmax with temperature gamma to convert scores into selection probabilities, and enforces a minimum probability per allocation (actionProbabilityFloor) so the model keeps exploring.
| Setting | Meaning |
|---|---|
| gamma | Softmax temperature. Lower = more deterministic (exploits learned best); higher = more uniform (more exploration). Typical: 0.1–0.5. |
| actionProbabilityFloor | Minimum probability any allocation can have. Guarantees ongoing exploration. Typical: 0.05. |
| defaultAllocationScore | Score for cold-start (no learned coefficients yet). |
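The three settings can be sketched together as follows. This is a minimal illustration under stated assumptions: the score is taken to be a plain dot product of coefficients and numeric features, and the floor is enforced by mixing the softmax with a uniform distribution. The SDK's actual formula may differ, and the allocation and feature names are invented.

```python
import math

def selection_probabilities(coefficients, context, gamma=0.2,
                            action_probability_floor=0.05,
                            default_allocation_score=0.0):
    """Turn per-allocation linear scores into floored softmax probabilities."""
    count = len(coefficients)
    if count * action_probability_floor > 1:
        raise ValueError("floor is too high for this many allocations")

    scores = {}
    for name, weights in coefficients.items():
        if not weights:
            # Cold start: no learned coefficients for this allocation yet.
            scores[name] = default_allocation_score
        else:
            scores[name] = sum(w * context.get(feature, 0.0)
                               for feature, w in weights.items())

    # Softmax with temperature gamma; subtracting the max keeps exp() stable.
    top = max(scores.values())
    exps = {name: math.exp((score - top) / gamma) for name, score in scores.items()}
    total = sum(exps.values())

    # Reserve `floor` for every allocation and share the rest by softmax weight.
    spare = 1 - count * action_probability_floor
    return {name: action_probability_floor + spare * value / total
            for name, value in exps.items()}

coefficients = {
    "hero_video": {"is_mobile": -0.5, "engagement": 1.0},
    "hero_static": {"is_mobile": 0.3, "engagement": 0.1},
    "hero_new": None,  # not trained yet, falls back to the default score
}
context = {"is_mobile": 1.0, "engagement": 0.2}
probabilities = selection_probabilities(coefficients, context)
```

Mixing with a uniform distribution keeps the probabilities summing to 1 while guaranteeing the floor; clipping and renormalising would be another reasonable choice.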
Context logging allowlist
To train the model, the platform needs to see the context fields with each exposure event. To protect PII, only fields you explicitly allowlist are logged.
When to use contextual bandits
When the optimal variant differs per user, and you can describe “who they are” with a handful of features:
- Homepage layout by engagement level and device type
- CTA copy by referral source and purchase history
- Recommendation style by user segment
Per-entity bandits
For optimization at the entity level, where each product, merchant, or category learns independently, use a per-entity adaptive policy.
Resolution modes
| Mode | How it works | Latency | Freshness |
|---|---|---|---|
| bundle | Entity weights ship in the config bundle. SDK resolves locally. | Sub-ms | Bundle refresh cadence (typically hourly for per-entity policies) |
| edge | SDK calls Traffical for each decision. Returns the latest weights. | ~50ms | Real-time |
bundle is the right choice for high-traffic entities (product pages, search results). edge is for low-traffic, high-stakes entities where you need every decision to use the freshest possible weights.
Dynamic allocations
When each entity has a different number of options (e.g. each product has a different image count), use dynamicAllocations.
With context.imageCount = 5, the SDK creates five allocations on the fly and selects from them based on learned weights. The selected index is reported as the allocation name in the decision metadata.
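The behaviour described above can be approximated with a short sketch. The function name, the zero-based index naming, and the fallback weight of 1.0 for options without learned weights are assumptions made for illustration; they are not taken from the SDK.

```python
import random

def select_dynamic(option_count, learned_weights, rng):
    """Create allocations "0".."option_count - 1" and sample one by weight."""
    names = [str(index) for index in range(option_count)]
    # Options with no learned weight yet fall back to 1.0 (an assumption).
    weights = [learned_weights.get(name, 1.0) for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

context = {"imageCount": 5}
learned = {"0": 0.5, "2": 3.0}  # made-up weights for two of the five images
chosen = select_dynamic(context["imageCount"], learned, random.Random(42))
```

The returned string plays the role of the allocation name that would appear in the decision metadata.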
Next steps
Per-entity adaptive pattern
Full walkthrough with metric setup.
Contextual bandit pattern
Personalized layouts and CTAs.
Warehouse-native
Training contextual bandits from warehouse data.