Adaptive policies go beyond static A/B tests. Instead of fixed allocations, the optimization engine shifts traffic toward better-performing variants based on observed rewards — automatically, continuously, and without code changes.

When to use adaptive policies

Use adaptive policies when:
  • You want to minimize regret: the cost of showing the inferior variant longer than necessary.
  • The best variant may change over time (seasonality, mix shifts).
  • You want continuous optimization rather than a one-time decision.
  • You want personalized selection based on user context (contextual bandits).
Stick with static A/B tests when:
  • You need a clean point-in-time decision with an unambiguous winner for stakeholders.
  • The cost of the variant is large enough that you want a deliberate ramp afterwards (use rollouts).
  • Your goal metric is delayed or noisy (the engine will still work, but a static test gives you a more interpretable result).

Algorithms

| Algorithm | What it does | Best for |
| --- | --- | --- |
| thompson_bernoulli | Beta-Bernoulli Thompson Sampling. Bayesian exploration/exploitation. | Binary outcomes (clicked/didn’t, converted/didn’t) |
| epsilon_greedy | Mostly exploits the best-known variant; explores randomly with probability ε. | Simple baseline with predictable exploration rate |
| ucb1 | Upper Confidence Bound. Optimistic exploration that naturally decreases over time. | When you want a deterministic, parameter-free explorer |
| linear_contextual | Personalized scoring per user via a trained linear model. Resolved locally. | Personalization: when the best variant depends on user context |
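
To make the ucb1 row concrete, here is a minimal sketch of the textbook UCB1 selection rule. It is not Traffical's implementation; `ArmStats` and `selectUcb1` are hypothetical names introduced for illustration, and rewards are assumed to lie in [0, 1].

```typescript
// Textbook UCB1 sketch. ArmStats and selectUcb1 are hypothetical names,
// not part of the Traffical SDK or engine.
interface ArmStats {
  name: string;
  pulls: number;     // exposures observed so far
  rewardSum: number; // sum of observed rewards, each assumed to be in [0, 1]
}

function selectUcb1(arms: ArmStats[]): string {
  if (arms.length === 0) throw new Error("at least one arm is required");

  // Any arm that has never been shown is tried first.
  const untried = arms.find((arm) => arm.pulls === 0);
  if (untried) return untried.name;

  const totalPulls = arms.reduce((sum, arm) => sum + arm.pulls, 0);
  let best = arms[0];
  let bestScore = -Infinity;
  for (const arm of arms) {
    const mean = arm.rewardSum / arm.pulls;
    // The bonus shrinks as an arm accumulates pulls, so exploration fades.
    const bonus = Math.sqrt((2 * Math.log(totalPulls)) / arm.pulls);
    const score = mean + bonus;
    if (score > bestScore) {
      best = arm;
      bestScore = score;
    }
  }
  return best.name;
}
```

The rule needs no tuning parameter and no random numbers, which is why the table describes it as deterministic and parameter-free.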

Setting up an adaptive policy

  1. Define the goal event in .traffical/config.yaml or in the dashboard:
    events:
      purchase:
        valueType: currency
        unit: USD
    
  2. Create the policy in the dashboard:
    • Kind: adaptive
    • Algorithm: thompson_bernoulli
    • Goal event: purchase
    • Goal type: conversion_rate
    • Min exposures before shift: 100
    • Window size: 60 minutes
  3. Define allocations. Initial ranges can be uniform; the engine will adjust them.
    | Allocation | Initial range | Override |
    | --- | --- | --- |
    | low_price | 0–3332 | pricing.discount_pct: 5 |
    | mid_price | 3333–6665 | pricing.discount_pct: 10 |
    | high_price | 6666–9999 | pricing.discount_pct: 20 |
  4. Implement in code — same as a static policy:
    // Resolve parameters for this user, with a default discount of 0.
    const params = traffical.getParams({
      context: { userId: "user_789" },
      defaults: { "pricing.discount_pct": 0 },
    });
    
    applyDiscount(params["pricing.discount_pct"]);
    
    // Track the goal event defined in step 1 for the same user.
    traffical.track("purchase", {
      unitKey: "user_789",
      properties: { order_total: orderTotal },
    });
    
The SDK resolves from the bundle. The engine periodically retrains and republishes the bundle with updated bucket ranges. Your code never changes.
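
As a rough illustration of local resolution against bucket ranges, the sketch below maps a unit key to a bucket in 0–9999 and looks up the allocation whose range contains it. The hash (32-bit FNV-1a) is an assumption chosen for the example; the page does not say which hash the SDK uses. `BucketRange`, `bucketFor`, and `resolveAllocation` are hypothetical names.

```typescript
// Hypothetical types and helpers, not the Traffical SDK API.
interface BucketRange {
  allocation: string;
  start: number; // inclusive
  end: number;   // inclusive
}

const BUCKET_COUNT = 10000;

// Assumption: 32-bit FNV-1a over UTF-16 code units, reduced to 0..9999.
// The real SDK may hash differently.
function bucketFor(unitKey: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < unitKey.length; i++) {
    hash ^= unitKey.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % BUCKET_COUNT;
}

// Returns the allocation whose range contains the bucket, if any.
function resolveAllocation(
  bucket: number,
  ranges: BucketRange[],
): string | undefined {
  return ranges.find((r) => bucket >= r.start && bucket <= r.end)?.allocation;
}

// The initial uniform ranges from step 3.
const initialRanges: BucketRange[] = [
  { allocation: "low_price", start: 0, end: 3332 },
  { allocation: "mid_price", start: 3333, end: 6665 },
  { allocation: "high_price", start: 6666, end: 9999 },
];

const allocation = resolveAllocation(bucketFor("user_789"), initialRanges);
```

Under this model, a republished bundle only changes the `start` and `end` values. A given user keeps the same bucket, and the calling code stays as it is.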

Tuning knobs

| Knob | What it does |
| --- | --- |
| minExposuresBeforeShift | The engine won’t change allocations until each allocation has seen at least this many exposures. Prevents premature shifts on noisy early data. |
| maxShiftPerUpdate | Caps how much traffic can move in a single update (e.g. 0.1 = at most 10% of traffic shifts per cycle). |
| windowSizeMinutes | How often the engine re-evaluates. Shorter windows = faster reaction, but noisier. |
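
One way to picture how the first two knobs interact is the sketch below. It is an assumption about the mechanics, not the engine's code: "traffic moved" is taken to be the sum of the share increases, and the update is scaled down proportionally when that sum exceeds the cap. `ShiftOptions` and `nextShares` are hypothetical names.

```typescript
// Hypothetical sketch of one update cycle; not the Traffical engine.
interface ShiftOptions {
  minExposuresBeforeShift: number;
  maxShiftPerUpdate: number; // fraction of total traffic, e.g. 0.1
}

// current, target, and exposures are parallel arrays, one entry per allocation.
// Shares are fractions of traffic that sum to 1.
function nextShares(
  current: number[],
  target: number[],
  exposures: number[],
  opts: ShiftOptions,
): number[] {
  // Hold allocations steady until every allocation has enough exposures.
  if (exposures.some((n) => n < opts.minExposuresBeforeShift)) {
    return [...current];
  }

  const deltas = target.map((t, i) => t - current[i]);
  // Traffic moved = total share gained by the allocations that grow.
  const moved = deltas.reduce((sum, d) => sum + Math.max(d, 0), 0);
  if (moved <= opts.maxShiftPerUpdate) return [...target];

  // Scale every delta so the total movement equals the cap.
  const scale = opts.maxShiftPerUpdate / moved;
  return current.map((c, i) => c + deltas[i] * scale);
}
```

With shares of 0.5/0.5, a target of 0.9/0.1, and maxShiftPerUpdate set to 0.1, this sketch moves to roughly 0.6/0.4 in one cycle and would need further cycles to reach the target.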

Contextual bandits

The linear_contextual algorithm personalizes selection: different users get different variants based on context features. The training pipeline learns coefficients per allocation per feature; those coefficients ship in the config bundle; the SDK uses them to score each allocation for the current user at resolution time.

What it looks like in code

The code is identical to any other adaptive policy — you just pass more context:
const params = traffical.getParams({
  context: {
    userId: "user_789",
    "user.engagement_score": user.engagementScore,
    "user.device_type": device.type,
    "user.days_since_signup": user.daysSinceSignup,
    "session.referrer": document.referrer,
  },
  defaults: {
    "homepage.layout": "standard",
  },
});
The model scores allocations using the context fields passed above. Fields that were not part of the model’s training set are ignored.

Scoring

For each allocation, the SDK computes:
score(allocation) = intercept(allocation) + Σ (coefficient(feature, allocation) × value(feature))
It then applies softmax with temperature gamma to convert scores into selection probabilities, and enforces a minimum probability per allocation (actionProbabilityFloor) so the model keeps exploring.
| Setting | Meaning |
| --- | --- |
| gamma | Softmax temperature. Lower = more deterministic (exploits learned best); higher = more uniform (more exploration). Typical: 0.1–0.5. |
| actionProbabilityFloor | Minimum probability any allocation can have. Guarantees ongoing exploration. Typical: 0.05. |
| defaultAllocationScore | Score for cold-start (no learned coefficients yet). |
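
The scoring formula and the softmax step can be sketched as follows. Several details are assumptions, since the page does not specify them: features are treated as already numeric (how categorical fields such as user.device_type are encoded is not covered here), missing features count as 0, and the floor is enforced by mixing the softmax output with a uniform floor. `AllocationModel`, `score`, and `selectionProbabilities` are hypothetical names.

```typescript
// Hypothetical sketch of linear scoring plus softmax; not the SDK's code.
type Features = Record<string, number>;

interface AllocationModel {
  name: string;
  intercept: number;
  coefficients: Features; // one coefficient per feature name
}

// score = intercept + sum of coefficient * feature value.
// Assumption: a feature missing from the context contributes 0.
function score(model: AllocationModel, features: Features): number {
  let total = model.intercept;
  for (const [feature, coefficient] of Object.entries(model.coefficients)) {
    total += coefficient * (features[feature] ?? 0);
  }
  return total;
}

// Softmax with temperature gamma, then a per-allocation probability floor.
// Assumption: the floor is applied as floor + (1 - k * floor) * softmax,
// which keeps every probability at or above the floor and the sum at 1.
function selectionProbabilities(
  scores: number[],
  gamma: number,
  floor: number,
): number[] {
  const k = scores.length;
  if (k === 0) return [];
  if (gamma <= 0) throw new Error("gamma must be positive");
  if (floor < 0 || floor * k > 1) throw new Error("floor is out of range");

  // Subtract the max score before exponentiating for numerical stability.
  const max = Math.max(...scores);
  const weights = scores.map((s) => Math.exp((s - max) / gamma));
  const sum = weights.reduce((a, b) => a + b, 0);
  return weights.map((w) => floor + (1 - k * floor) * (w / sum));
}
```

In this sketch, a small gamma pushes nearly all of the non-floor probability onto the top-scoring allocation, while every other allocation still keeps at least actionProbabilityFloor.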

Context logging allowlist

To train the model, the platform needs to see the context fields with each exposure event. To protect PII, only fields you explicitly add to the allowlist are logged:
{
  "contextLogging": {
    "allowedFields": [
      "user.engagement_score",
      "user.device_type",
      "user.days_since_signup",
      "session.referrer"
    ]
  }
}
Anything outside the allowlist is dropped before storage.
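
Conceptually, the allowlist acts as a filter over the context object. The sketch below shows that idea only; where the filtering actually runs (SDK or platform) is not stated here, and `filterLoggedContext` is a hypothetical helper.

```typescript
// Hypothetical helper illustrating allowlist filtering; not the SDK API.
type ContextValue = string | number | boolean;

function filterLoggedContext(
  context: Record<string, ContextValue>,
  allowedFields: readonly string[],
): Record<string, ContextValue> {
  const allowed = new Set(allowedFields);
  const logged: Record<string, ContextValue> = {};
  for (const [key, value] of Object.entries(context)) {
    // Keep only explicitly allowed fields; everything else is dropped.
    if (allowed.has(key)) logged[key] = value;
  }
  return logged;
}
```

A field such as an email address passed in the context would be absent from the result unless its key appears in allowedFields.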

When to use contextual bandits

When the optimal variant differs per user, and you can describe “who they are” with a handful of features:
  • Homepage layout by engagement level and device type
  • CTA copy by referral source and purchase history
  • Recommendation style by user segment
The features should be available at decision time. Don’t include features derived from the outcome you’re trying to predict.

Per-entity bandits

For optimization at the entity level — each product, each merchant, each category learns independently — use a per-entity adaptive policy:
entityConfig:
  entityKeys: ["productId"]
  resolutionMode: "bundle"      # or "edge"
Each entity gets its own bandit. A product with 1000 views has its own learned weights; a product with 3 views falls back to the global prior.
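
The fallback behaviour can be pictured as a lookup with a data threshold. This is a sketch under assumptions: the page does not say how the engine decides that an entity has too little data, so `minViews` is a made-up parameter, and `EntityStats` and `weightsFor` are hypothetical names.

```typescript
// Hypothetical sketch of per-entity lookup with a global fallback.
interface EntityStats {
  views: number;
  weights: number[]; // learned weights, one per allocation
}

function weightsFor(
  entityId: string,
  perEntity: Map<string, EntityStats>,
  globalPrior: number[],
  minViews: number, // assumed threshold; the real rule is not documented here
): number[] {
  const stats = perEntity.get(entityId);
  // Unknown or low-traffic entities use the shared prior.
  if (!stats || stats.views < minViews) return globalPrior;
  return stats.weights;
}
```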

Resolution modes

| Mode | How it works | Latency | Freshness |
| --- | --- | --- | --- |
| bundle | Entity weights ship in the config bundle. SDK resolves locally. | Sub-ms | Bundle refresh cadence (typically hourly for per-entity policies) |
| edge | SDK calls Traffical for each decision. Returns the latest weights. | ~50ms | Real-time |
bundle is the right choice for high-traffic entities (product pages, search results). edge is for low-traffic, high-stakes entities where you need every decision to use the freshest possible weights.

Dynamic allocations

When each entity has a different number of options (e.g. each product has a different image count), use dynamicAllocations:
entityConfig:
  entityKeys: ["productId"]
  resolutionMode: "bundle"
  dynamicAllocations:
    countKey: "imageCount"
If context.imageCount = 5, the SDK creates five allocations on the fly and selects from them based on learned weights. The selected index is reported as the allocation name in the decision metadata.
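
A minimal sketch of that behaviour, assuming zero-based indices, non-negative weights, and a neutral weight of 1 for any index that has no learned weight yet (none of which is confirmed above). `selectDynamicAllocation` is a hypothetical function, and the random source is injectable so the example can be tested.

```typescript
// Hypothetical sketch of dynamic allocation selection; not the SDK's code.
function selectDynamicAllocation(
  count: number,
  learnedWeights: number[],
  random: () => number = Math.random,
): string {
  if (!Number.isInteger(count) || count < 1) {
    throw new Error("count must be a positive integer");
  }
  // One allocation per index. Assumption: unseen indices get weight 1.
  const weights = Array.from({ length: count }, (_, i) => learnedWeights[i] ?? 1);
  const total = weights.reduce((a, b) => a + b, 0);

  // Weighted random pick: walk the cumulative weights until the draw is used up.
  let remaining = random() * total;
  for (let i = 0; i < count; i++) {
    remaining -= weights[i];
    if (remaining < 0) return String(i);
  }
  return String(count - 1); // guard against floating-point rounding
}
```

The returned string plays the role of the allocation name that would appear in the decision metadata.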

Next steps

Per-entity adaptive pattern

Full walkthrough with metric setup.

Contextual bandit pattern

Personalized layouts and CTAs.

Warehouse-native

Training contextual bandits from warehouse data.