skuf.ai Forecasting Methodology

How skuf.ai decomposes demand into independent component estimators (baseline, seasonality, events, ML residuals) and learns how to blend them — instead of picking one ML algorithm.

Contents
  1. Why decomposition
  2. Levels: SKU × Location × Day
  3. Preprocessing
  4. Baseline (BSR) — ISR + Bayesian
  5. Seasonality profiles + escalation matching
  6. Events — ML uplift
  7. Centrifuge — component blending
  8. New product forecasting (library / analog / launch)
  9. Size profile forecasting
  10. Day-level disaggregation
  11. Backtesting & accuracy
  12. Variance analysis

1 · Why decomposition

Most demand-forecasting products on the market are single-algorithm: they pick ETS, Prophet, ARIMA, or a tree model and fit it to every series. The problem is that no single algorithm fits every product or every location.

skuf.ai's approach instead: decompose demand into independent components, estimate each with a method appropriate to it, and learn how to blend them.

The component slots:

Component | What it captures | How it's estimated
Baseline (BSR) | Average lifecycle demand rate | ISR (analog match) + Bayesian update from actuals
Seasonality | Within-year repeating shape | 52-week profile from attribute-grouped history
Events | Lift from promotions, holidays, etc. | RF / Gradient Boosting on (event × SKU) attributes
ML residuals | Whatever the components above didn't explain | ETS / ARIMA / Trend / Avg / Random Forest / Gradient Boosting (chosen per series; see §7.4)

2 · Levels: SKU × Location × Day

The full dimensional space for retail demand is SKU × Location × Day: every unit sold has an item ID, a place, and a moment in time.

Forecasting at this raw grain is statistically noisy, because most SKU-Loc-Day cells have zero sales. skuf.ai accepts data at SKU-Loc-Day but forecasts at Product-Loc-Week, then disaggregates back to the input grain.

Each bridge between grains is a learnable profile built from history. Sections 9 and 10 cover them in detail.

3 · Preprocessing

Before any estimator runs, sales data goes through three corrections:

All three corrections are configurable per-forecast. They run as a single pass before estimators — iterating preprocessing with estimator outputs is on the methodology roadmap.

4 · Baseline (BSR) — ISR + Bayesian

4.1 Initial Sales Rate (new products)

For products with no sales history, skuf.ai computes an Initial Sales Rate (ISR) using attribute-based analog matching: find similar product-locations that DO have history, take a similarity-weighted average of their baselines.

If no similar product-location exists at the most specific attribute level, the algorithm escalates to a broader level (e.g. sub-category × price-band), and keeps broadening until it finds a match. This follows the simple-to-complex escalation hierarchy that the methodology describes.

4.2 Bayesian update for active SKUs

Once a product has actual sales history, the baseline becomes a Bayesian blend of the library prior (ISR) and the observed BSR:

posterior = (prior_strength × prior_BSR + n × actual_BSR) / (prior_strength + n)

where n is the number of observed sales periods and prior_strength is a pseudo-count (default 8). The prior's weight shrinks as observed history grows — short history trusts the library prior, long history trusts the SKU's own pattern.

Each result surfaces bsrPrior / bsrActual / bsrPosterior / priorWeight for full transparency.
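The posterior formula above can be sketched in a few lines of Python. This is a minimal illustration, not skuf.ai's implementation: the function name is hypothetical, and the definition of priorWeight as prior_strength / (prior_strength + n) is an assumption inferred from the formula.

```python
def bayesian_bsr(prior_bsr: float, actual_bsr: float, n: int,
                 prior_strength: float = 8.0) -> dict:
    """Blend the library prior (ISR) with the observed BSR using a pseudo-count."""
    if n < 0 or prior_strength < 0 or prior_strength + n == 0:
        raise ValueError("n and prior_strength must be non-negative and not both zero")
    total = prior_strength + n
    posterior = (prior_strength * prior_bsr + n * actual_bsr) / total
    return {
        "bsrPrior": prior_bsr,
        "bsrActual": actual_bsr,
        "bsrPosterior": posterior,
        # Assumption: priorWeight is the share of the posterior owed to the prior.
        "priorWeight": prior_strength / total,
    }


# Short history leans on the prior; long history leans on the SKU's own rate.
short = bayesian_bsr(prior_bsr=10.0, actual_bsr=20.0, n=2)
long = bayesian_bsr(prior_bsr=10.0, actual_bsr=20.0, n=52)
```

With the default pseudo-count of 8, two observed periods leave 80% of the weight on the prior, while 52 periods leave roughly 13%.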

5 · Seasonality — profiles + escalation matching

5.1 Profile generation

A seasonality profile is a 52-element array whose entries sum to 1 — each entry represents the share of annual demand falling in that ISO week.

skuf.ai generates profiles per combination of analyst-chosen attribute columns over the last 104 weeks of history (most recent 52 = Y-1, prior 52 = Y-2). Each profile gets two quality scores:

Pruning rules drop profiles that aren't trustworthy:

5.2 4-tier escalation matching

For a target SKU, the matcher walks four reliability tiers from strict to lax:

Tier | Min reliability | Behaviour
Excellent | 0.85 | Tightly repeatable Y/Y
Good | 0.65 | Repeatable enough to trust
Acceptable | 0.50 | Has signal but more variable
Weak | 0.30 | Last-resort fallback

At each tier, the matcher picks the richest profile that clears that tier's reliability threshold. The first tier with a candidate wins. Every forecast result is tagged with which tier was hit, so users can see at a glance whether a SKU has solid or weak seasonal signal.
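The tier walk can be sketched as follows. The thresholds come from the table above; the profile fields are hypothetical, and "richness" is assumed here to mean the number of attribute columns that define a profile, which the text does not spell out.

```python
from typing import Optional

# Tier thresholds, strict to lax, as listed in the table above.
TIERS = [("Excellent", 0.85), ("Good", 0.65), ("Acceptable", 0.50), ("Weak", 0.30)]


def match_profile(candidates: list[dict]) -> Optional[tuple[str, dict]]:
    """Return (tier name, profile) for the first tier that has a candidate."""
    for tier_name, min_reliability in TIERS:
        eligible = [p for p in candidates if p["reliability"] >= min_reliability]
        if eligible:
            # Assumption: "richest" = defined by the most attribute columns.
            return tier_name, max(eligible, key=lambda p: p["richness"])
    return None  # no profile clears even the Weak threshold


candidates = [
    {"id": "category", "reliability": 0.90, "richness": 1},
    {"id": "category+color", "reliability": 0.70, "richness": 2},
    {"id": "category+color+fabric", "reliability": 0.40, "richness": 3},
]
tier, profile = match_profile(candidates)
fallback_tier, fallback_profile = match_profile(candidates[1:])
```

In this example the strictest tier already has a candidate, so the less rich but more reliable "category" profile wins; removing it drops the match to the Good tier.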

6 · Events — ML uplift

Promotions, holidays, and other events drive demand spikes that the baseline + seasonality components don't capture. skuf.ai supports three approaches — each appropriate in different contexts.

6.1 Lookup tier

For each event type, compute the average uplift ratio (event-window sales / baseline-window sales) across historical events. Apply at forecast time as a multiplicative factor.
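A minimal sketch of the lookup tier, assuming the average is a plain mean of per-event ratios and that events with a zero baseline window are skipped. The function names and the input shapes are hypothetical.

```python
from collections import defaultdict


def build_uplift_table(history):
    """history: iterable of (event_type, event_window_sales, baseline_window_sales)."""
    ratios = defaultdict(list)
    for event_type, event_sales, baseline_sales in history:
        if baseline_sales > 0:  # a zero baseline makes the ratio undefined
            ratios[event_type].append(event_sales / baseline_sales)
    return {etype: sum(r) / len(r) for etype, r in ratios.items()}


def apply_uplift(forecast, scheduled_events, table):
    """scheduled_events maps week index -> event type; other weeks are unchanged."""
    return [
        value * table.get(scheduled_events.get(i), 1.0)
        for i, value in enumerate(forecast)
    ]


history = [("memorial_day", 150, 100), ("memorial_day", 130, 100), ("bogo", 200, 100)]
table = build_uplift_table(history)
boosted = apply_uplift([100.0, 100.0, 100.0], {1: "memorial_day"}, table)
```

Here the two historical Memorial Day events average to an uplift of about 1.4, so only the week carrying that event is scaled.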

6.2 ML tier (per-SKU uplift)

Train a Random Forest (or Gradient Boosting) regression on the full (event × SKU) attribute matrix:

At forecast time, the model predicts the uplift per (event × SKU) pair, so different SKUs see different uplifts based on their attributes. It falls back to the lookup table when prediction fails on an individual pair.

Important caveat: the lookup and ML uplift tiers are applied as post-fit multipliers to the forecast horizon. They modify forecasted values for upcoming weeks where events are scheduled, but they do not affect backtest MAPE because MAPE is scored on the holdout before the multiplier is applied. They're useful for forward-looking decisions ("Memorial Day is coming — boost the forecast"), not for proving accuracy.

6.3 Covariate tier (recommended for MAPE wins)

When the goal is measurable accuracy improvement, treat events as exogenous regressors instead of post-fit multipliers. skuf.ai builds a per-SKU exogenous matrix:

The tree-based ML models (Random Forest, Gradient Boost, LightGBM) consume this matrix at fit time, concatenated to their autoregressive lag features. The model learns event effects from history and predicts them on the holdout automatically — so backtest MAPE reflects the real lift. Set useExogenousCovariates: true on the job config.
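As a rough illustration of "concatenated to their autoregressive lag features", the sketch below builds training rows from lagged sales plus event indicators and log price (the covariates named in §7.5). The function name, the number of lags, and the input layout are assumptions, not the product's actual feature pipeline.

```python
import math


def build_feature_rows(sales, event_flags, prices, n_lags=4):
    """Return (X, y): each row is n_lags lagged sales + event indicators + log price."""
    X, y = [], []
    for t in range(n_lags, len(sales)):
        lags = list(sales[t - n_lags:t])                       # autoregressive part
        exog = list(event_flags[t]) + [math.log(prices[t])]    # exogenous part
        X.append(lags + exog)
        y.append(sales[t])
    return X, y


sales = [10, 12, 11, 30, 13, 12]
event_flags = [[0, 0], [0, 0], [0, 0], [1, 0], [0, 0], [0, 1]]  # two event types
prices = [20.0, 20.0, 20.0, 15.0, 20.0, 20.0]
X, y = build_feature_rows(sales, event_flags, prices)
```

Because the event and price columns describe the target week itself, a model trained on rows like these can score event weeks inside the holdout, which is why the effect shows up in backtest MAPE.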

📐 Measured impact

On a synthetic specialty-retailer dataset (45 SKUs × 104 weeks × five event types), enabling event + price covariates on the Auto leaderboard dropped mean MAPE from 33.14% → 24.41% (a 26% relative reduction) while keeping the same model architectures.

7 · Centrifuge — component blending

The Centrifuge composes the four component slots (level baseline, seasonal, ML, event) on a single SKU's series and learns how to weight them.

7.1 Weighted-sum mode (default)

Each component produces a holdout backtest. The Centrifuge runs scipy.optimize.minimize (SLSQP, sum-to-1, non-negative constraints) to find the weights that minimise blended-holdout MAPE.

forecast = Σᵢ wᵢ · component_i_forecast (with Σwᵢ = 1, wᵢ ≥ 0)
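A minimal sketch of the weight search, using scipy.optimize.minimize with SLSQP, a sum-to-1 equality constraint, and non-negativity bounds. The function name and the sample numbers are made up. Note that MAPE is not smooth, so a gradient-based solver like SLSQP should be read as finding a good blend rather than a guaranteed global optimum.

```python
import numpy as np
from scipy.optimize import minimize


def fit_blend_weights(component_holdouts: np.ndarray, actuals: np.ndarray) -> np.ndarray:
    """component_holdouts: (n_components, n_holdout); actuals: (n_holdout,), non-zero."""
    n_components = component_holdouts.shape[0]

    def blended_mape(w):
        blended = w @ component_holdouts
        return float(np.mean(np.abs((actuals - blended) / actuals)))

    result = minimize(
        blended_mape,
        x0=np.full(n_components, 1.0 / n_components),  # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_components,            # w_i >= 0
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],  # sum w_i = 1
    )
    # Clean up tiny numerical violations of the constraints.
    weights = np.clip(result.x, 0.0, None)
    return weights / weights.sum()


actuals = np.array([100.0, 120.0, 90.0, 110.0])
components = np.array([
    [105.0, 105.0, 105.0, 105.0],  # level
    [98.0, 125.0, 85.0, 112.0],    # seasonal
    [110.0, 115.0, 95.0, 100.0],   # ML
])
weights = fit_blend_weights(components, actuals)
blended_forecast = weights @ components
```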

7.2 Residual decomposition mode

Components forecast sequentially: level first, then seasonal on (raw − level) residuals, then ML on the remaining residuals. The final forecast is additive:

forecast = level + seasonal + ml_on_residuals + event

Useful when components capture orthogonal structure (trend vs. periodic vs. irregular) rather than competing predictions.
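The sequential fit can be illustrated with deliberately simple stand-ins for each component: a mean for the level, a per-position average for the seasonal term, and a short trailing mean in place of the ML model. These estimators are assumptions chosen to keep the sketch short; only the ordering and the additive combination come from the text.

```python
def residual_decomposition_forecast(history, horizon, period, event=None):
    """Fit level, then seasonal on residuals, then an ML stand-in on what remains."""
    n = len(history)

    # 1. Level: here simply the historical mean.
    level = sum(history) / n
    resid = [y - level for y in history]

    # 2. Seasonal: average residual at each position in the cycle.
    seasonal = []
    for pos in range(period):
        vals = [resid[t] for t in range(pos, n, period)]
        seasonal.append(sum(vals) / len(vals) if vals else 0.0)
    resid = [r - seasonal[t % period] for t, r in enumerate(resid)]

    # 3. ML on remaining residuals: stand-in = mean of the last few residuals.
    tail = resid[-4:]
    ml = sum(tail) / len(tail)

    # 4. Event: additive per-period term, zero when nothing is scheduled.
    event = event or [0.0] * horizon
    return [level + seasonal[(n + h) % period] + ml + event[h] for h in range(horizon)]


history = [10, 20] * 6
plain = residual_decomposition_forecast(history, horizon=4, period=2)
with_event = residual_decomposition_forecast(history, 4, 2, event=[0.0, 5.0, 0.0, 0.0])
```

On this perfectly periodic series the level and seasonal terms explain everything, so the residual stand-in contributes zero and the event term is the only thing that changes the forecast.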

7.3 Plug-in component selection

Callers specify which components to include via a components array. Seasonal needs a profile library + BSR pair; Event needs an event multiplier vector. Level + ML always run on any series.

7.4 ML-component options

The ML slot inside Centrifuge is itself pluggable. The dropdown offers six families, each with different strengths — the SLSQP optimiser in 7.1 automatically down-weights whichever family loses the holdout backtest, so picking the "wrong" one degrades gracefully rather than catastrophically.

Family | When it tends to win | Engine
ETS (default) | Smooth trends + periodic seasonality, low noise. | statsmodels ExponentialSmoothing
ARIMA | Stationary series with autocorrelated residuals. | statsmodels ARIMA
Linear Trend | Short series where you only trust a slope + intercept. | sklearn.LinearRegression
Simple Average | Very short or near-flat series; an honest baseline. | last-12 mean + 1/10 trend term
Random Forest | Residuals with nonlinear interaction structure (step changes, threshold effects, lifecycle elbows). | sklearn.RandomForestRegressor on autoregressive lag features (depth ~ history/4, plus optional t-52 seasonal lag)
Gradient Boosting | Smooth nonlinear structure where boosting's bias-variance trade-off beats RF. | sklearn.GradientBoostingRegressor, same feature stack as RF

Tree-based families forecast recursively: predict step 1, append it to history, rebuild the lag-feature vector, predict step 2, repeat. They need ≥8 history points before they activate; below that, the helper falls back to the simple average baseline so the Centrifuge always produces a valid ML component.
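The recursive loop and the short-history fallback can be sketched independently of the model. In this illustration `predict_one` is a hypothetical callable that maps a lag vector to a one-step prediction; a fitted scikit-learn regressor could be wrapped as `lambda lags: model.predict([lags])[0]`. The fallback here uses only the last-12 mean and omits the trend term mentioned in the table.

```python
MIN_HISTORY = 8  # activation threshold stated in the text


def simple_average_forecast(history, horizon, window=12):
    """Fallback: flat forecast at the mean of the most recent points."""
    if not history:
        raise ValueError("history must not be empty")
    tail = history[-window:]
    return [sum(tail) / len(tail)] * horizon


def recursive_forecast(history, horizon, predict_one, n_lags=4):
    """Predict one step, append it to history, rebuild lags, repeat."""
    if len(history) < MIN_HISTORY:
        return simple_average_forecast(history, horizon)
    extended = list(history)
    forecast = []
    for _ in range(horizon):
        lags = extended[-n_lags:]       # rebuild the lag-feature vector
        next_value = predict_one(lags)  # one-step-ahead prediction
        forecast.append(next_value)
        extended.append(next_value)     # feed the prediction back in
    return forecast


def trend_stub(lags):
    """Stand-in for a fitted model: continue the last observed step."""
    return lags[-1] + (lags[-1] - lags[-2])


long_series = recursive_forecast(list(range(1, 11)), horizon=3, predict_one=trend_stub)
short_series = recursive_forecast([4, 6], horizon=3, predict_one=trend_stub)
```

One design consequence worth noting: because each step consumes earlier predictions, errors compound over the horizon, which is a general property of recursive multi-step forecasting.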

This menu matches the deck's original spec — "regression tree like Random Forest and Gradient Boosting" — without dropping the classical models that win on the cleaner half of any catalog.

7.5 Covariate-aware Centrifuge

When the ML slot is Random Forest or Gradient Boost, Centrifuge accepts the same per-SKU exogenous matrix described in §6.3 (event indicators + log price). The tree learns event + price effects from history as additional regressors alongside its autoregressive lag features. Set useExogenousCovariates: true alongside the standard Centrifuge config to opt in.

On the same synthetic dataset that takes Auto from 33.14% → 24.41% MAPE, covariate-aware Centrifuge moves from 43.51% → 30.89%. Auto + covariates beats Centrifuge + covariates on this data because Auto's per-SKU model selection lets a leaderboard of ARIMA / ETS / Prophet / RF / GB / LightGBM pick the best fit per SKU, whereas Centrifuge applies one fixed architecture (level + seasonal + ML) uniformly. Centrifuge's differentiated value is component transparency — it returns named weighted contributions per forecast so planners can audit "30% from the seasonal library, 25% from the ML lift, 45% from the level baseline." Auto returns a single winning model and doesn't expose this.

8 · New Product Forecasting

SKUs with no historical data fall through every model that needs lag features. skuf.ai provides three independent approaches — pick by what's available and the question being asked.

8.1 Seasonality library + Bayesian BSR

This approach combines the ISR from §4.1 with the 4-tier matching from §5.2. Match the new SKU's attributes to a 52-week profile in the library, look up bsr_median as the annual baseline prior, and forecast bsr × profile[start_week + i] for each week i. Best when a strong matching profile exists and you trust the analog-attribute calibration.
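The per-week calculation is small enough to show directly. In this sketch the function name is hypothetical, start_week is assumed to be a 0-based index into the profile, and the index is assumed to wrap around at week 52 for horizons that cross the year boundary.

```python
def library_forecast(bsr_annual, profile, start_week, horizon):
    """Weekly forecast = annual baseline × that week's share of annual demand."""
    if len(profile) != 52:
        raise ValueError("profile must have 52 weekly shares")
    # Assumption: wrap around the year when start_week + i passes week 52.
    return [bsr_annual * profile[(start_week + i) % 52] for i in range(horizon)]


# Toy profile: the last 8 weeks of the year carry double weight.
raw = [1.0] * 44 + [2.0] * 8
profile = [w / sum(raw) for w in raw]  # normalise so shares sum to 1

forecast = library_forecast(bsr_annual=6000.0, profile=profile, start_week=42, horizon=4)
full_year = library_forecast(6000.0, profile, start_week=0, horizon=52)
```

Since the profile sums to 1, a full 52-week forecast adds back up to the annual baseline.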

8.2 Analog blend

Two analog-finder modes feed the same blender:

Once analogs are chosen, /forecast blends their historical series — score-weighted average → univariate forecast (Auto's leaderboard picks per analog) → combine into a single forecast for the new SKU. An optional ramp-up curve scales down the first N periods to model launch behaviour.

Use the toggle in the New Product panel to pick which mode runs: AI-powered when you need reasoning, rule-based when you need reproducibility or want to skip an LLM round-trip.

8.3 Attribute-Based Launch Model

Train a Ridge + Random Forest ensemble regression on historical SKUs' attributes → first-period sales volume, then predict launch volume for the new item directly from its attributes. No time-series matching needed.

Use when: (a) you have ≥20 historical SKUs to train on, (b) the new SKU's attributes are well-represented in the training data, (c) you want a single point estimate + CI rather than a weekly forecast curve. The implied total-life output is designed to flow into the Sell-Through curve panel: it auto-fills the total-life input there, so you can convert a first-period prediction into a weekly schedule.

9 · Size profile forecasting

For apparel and SKU-with-variants categories, the forecast at SKU level needs to split across sizes. skuf.ai mirrors the BSR approach:

  1. Initial estimation: aggregate historical units per (attribute combo × size), normalise to shares summing to 1 per attribute group.
  2. Bayesian update: when this SKU has its own observed size mix, blend it with the library prior using the same pseudo-count formula as BSR. Three modes: library prior (no actuals), Bayesian (both), actuals only (no library match).
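The two steps above can be sketched as a single blending function that also reports which of the three modes applied. The function name is hypothetical, and n is assumed to count observed periods (as in the BSR formula); the text does not say what plays the role of n for size mixes.

```python
def blend_size_profile(library_shares, actual_shares, n, prior_strength=8.0):
    """Return (size -> share, mode). Inputs are dicts of shares summing to 1, or None."""
    if library_shares is None and actual_shares is None:
        raise ValueError("need a library profile, observed shares, or both")
    if library_shares is None:
        return dict(actual_shares), "actuals_only"
    if actual_shares is None or n == 0:
        return dict(library_shares), "library_prior"

    total = prior_strength + n
    sizes = set(library_shares) | set(actual_shares)
    blended = {
        s: (prior_strength * library_shares.get(s, 0.0)
            + n * actual_shares.get(s, 0.0)) / total
        for s in sizes
    }
    return blended, "bayesian"


library = {"S": 0.2, "M": 0.5, "L": 0.3}
observed = {"S": 0.4, "M": 0.4, "L": 0.2}
blended, mode = blend_size_profile(library, observed, n=8)
prior_only, prior_mode = blend_size_profile(library, None, n=0)
```

Because both inputs sum to 1 and the weights sum to 1, the blended shares also sum to 1 without a separate renormalisation step.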

10 · Day-level disaggregation

The forecast operates at weekly grain; replenishment systems need daily. A Mon-Sun profile (7 weights summing to 1) splits weekly forecasts into days.

Three profile scopes:

The apply step accepts per-week overrides so users can pin a different profile to specific ISO weeks (e.g. promotion weeks behave differently from normal weeks).
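A minimal sketch of the apply step with per-week overrides. The function name, the dict-keyed-by-ISO-week input, and the sample weights are assumptions for illustration.

```python
def disaggregate_to_days(weekly_forecast, default_profile, overrides=None):
    """weekly_forecast: {iso_week: units}. Profiles are 7 Mon-Sun weights summing to 1."""
    overrides = overrides or {}
    daily = {}
    for week, units in weekly_forecast.items():
        profile = overrides.get(week, default_profile)  # pinned profile wins
        if len(profile) != 7 or abs(sum(profile) - 1.0) > 1e-9:
            raise ValueError(f"profile for week {week} must be 7 weights summing to 1")
        daily[week] = [units * w for w in profile]
    return daily


default = [0.10, 0.10, 0.10, 0.10, 0.20, 0.25, 0.15]
promo = [0.05, 0.05, 0.05, 0.05, 0.30, 0.30, 0.20]  # weekend-heavy promotion week

daily = disaggregate_to_days({21: 700.0, 22: 1000.0}, default, overrides={22: promo})
```

Week 21 is split with the default profile, while week 22 uses the pinned promotion profile; in both cases the seven daily values add back up to the weekly forecast.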

11 · Backtesting & accuracy

Every forecast is backtested on a held-out tail of recent history. The holdout window size is user-configurable (deck-aligned default: 7 weeks). Each forecast reports seven accuracy metrics:

Metric | What it tells you
MAPE | Mean Absolute Percentage Error: most common, but sensitive to zero actuals
RMSE | Root Mean Squared Error: penalises large errors more
MAE | Mean Absolute Error: less sensitive to outliers than RMSE
SMAPE | Symmetric MAPE: more robust than MAPE to zero actuals
MASE | Mean Absolute Scaled Error: values below 1 beat the naive forecast
R² | Coefficient of determination: variance explained
PI 80% | % of holdout actuals inside the 80% confidence band; the displayed CI band is honest when this is near 80%

Confidence bands are computed as forecast ± 1.282 × residual_std (the two-sided 80% interval under a normal assumption), so the band on every chart corresponds to the PI 80% coverage metric.
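The band and the coverage check can be sketched together. The function names are hypothetical, and residual_std is passed in directly; how it is estimated (for example, from in-sample residuals) is an assumption the text does not settle.

```python
Z_80 = 1.282  # two-sided 80% quantile of the standard normal


def band_80(forecast_value, residual_std):
    """Return (lower, upper) bounds of the 80% band around one forecast value."""
    half_width = Z_80 * residual_std
    return forecast_value - half_width, forecast_value + half_width


def pi80_coverage(actuals, forecasts, residual_std):
    """Share of holdout actuals that fall inside the 80% band."""
    inside = 0
    for actual, forecast_value in zip(actuals, forecasts):
        lower, upper = band_80(forecast_value, residual_std)
        if lower <= actual <= upper:
            inside += 1
    return inside / len(actuals)


actuals = [100.0, 110.0, 90.0, 120.0, 105.0]
forecasts = [102.0, 100.0, 95.0, 118.0, 104.0]
coverage = pi80_coverage(actuals, forecasts, residual_std=5.0)
```

In this toy holdout four of the five actuals land inside the band, so coverage is 0.8 and the displayed band would count as well calibrated.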

12 · Variance analysis

The Variance Analysis Diagnostic Panel surfaces forecast-quality structure at portfolio level — designed to answer "where should I focus my tuning efforts?":