skuf.ai Forecasting Methodology

How skuf.ai decomposes demand into independent component estimators (baseline, seasonality, events, ML residuals) and learns how to blend them — instead of picking one ML algorithm.

Contents

Why decomposition
Levels: SKU × Location × Day
Preprocessing
Baseline (BSR) — ISR + Bayesian
Seasonality profiles + escalation matching
Events — ML uplift
Centrifuge — component blending
New product forecasting (library / analog / launch)
Size profile forecasting
Day-level disaggregation
Backtesting & accuracy
Variance analysis

1 · Why decomposition

Most demand-forecasting products on the market are single-algorithm: they pick ETS, Prophet, ARIMA, or a tree-based model and fit it to every series. The problem is that no single algorithm fits every product or every location:

The world isn't linear → models that assume linearity break on lifecycle curves and event spikes.
Real demand is noisy unless you look at it from multiple perspectives.
Different products and locations behave differently — one model can't capture both.
External factors (events, promotions, weather) get absorbed as "residual error" or hallucinated.

skuf.ai's approach instead: decompose demand into independent components, estimate each with a method appropriate to it, and learn how to blend them.

The component slots:

Component	What it captures	How it's estimated
Baseline (BSR)	Average lifecycle demand rate	ISR (analog match) + Bayesian update from actuals
Seasonality	Within-year repeating shape	52-week profile from attribute-grouped history
Events	Lift from promotions, holidays, etc.	RF / Gradient Boosting on (event × SKU) attributes
ML residuals	Whatever the components above didn't explain	ETS / ARIMA / Trend / Avg / Random Forest / Gradient Boosting (chosen per series — see §7.4)

2 · Levels: SKU × Location × Day

The full dimensional space for retail demand is SKU × Location × Day: every unit sold has an item ID, a place, and a moment in time.

Forecasting at this raw grain is statistically noisy — most SKU-Loc-Day cells have zero sales. skuf.ai accepts data at SKU-Loc-Day but forecasts at Product-Loc-Week, then disaggregates back to the input grain:

Size profile bridges Product → SKU (Style/Color × Size).
Day-level disaggregation bridges Week → Day.

Each bridge is a learnable profile built from history. Sections 9 and 10 cover them in detail.

3 · Preprocessing

Before any estimator runs, sales data goes through three corrections:

Stockout correction — periods where sales were artificially low because inventory was missing. Detected by configurable threshold, replaced with neighbour interpolation.
Outlier detection — z-score-based, with configurable cap/interpolate behaviour.
Excluded date ranges — user-defined date windows where actuals are masked (e.g. system outages, data-quality issues).

All three corrections are configurable per-forecast. They run as a single pass before estimators — iterating preprocessing with estimator outputs is on the methodology roadmap.
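
The single-pass ordering above can be sketched in a few lines of Python with NumPy. This is an illustration only: the function names, the use of on-hand inventory to flag stockouts, and the default thresholds are assumptions, not skuf.ai's actual implementation.

```python
import numpy as np

def interpolate_masked(values, mask):
    """Linearly interpolate masked points from their unmasked neighbours."""
    values = np.asarray(values, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    out = values.copy()
    if mask.any() and not mask.all():
        idx = np.arange(len(values))
        out[mask] = np.interp(idx[mask], idx[~mask], values[~mask])
    return out

def preprocess(sales, on_hand, stockout_threshold=0.0, z_cap=3.0, excluded=()):
    """Apply the three corrections once, in order (all parameter names are hypothetical)."""
    sales = np.asarray(sales, dtype=float)
    # 1. Stockout correction. Assumption: a period counts as a stockout when
    #    on-hand inventory is at or below the configured threshold.
    stockout = np.asarray(on_hand, dtype=float) <= stockout_threshold
    cleaned = interpolate_masked(sales, stockout)
    # 2. Outlier detection: cap points beyond z_cap standard deviations.
    mean, std = cleaned.mean(), cleaned.std()
    if std > 0:
        cleaned = np.clip(cleaned, mean - z_cap * std, mean + z_cap * std)
    # 3. Excluded date ranges: mask actuals as NaN (inclusive index ranges).
    for start, end in excluded:
        cleaned[start:end + 1] = np.nan
    return cleaned

cleaned = preprocess(sales=[10, 12, 0, 11, 13], on_hand=[5, 5, 0, 5, 5], excluded=[(4, 4)])
```

In this toy series the zero in the third period is treated as a stockout and replaced by the midpoint of its neighbours, and the last period is masked out.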

4 · Baseline (BSR) — ISR + Bayesian

4.1 Initial Sales Rate (new products)

For products with no sales history, skuf.ai computes an Initial Sales Rate (ISR) using attribute-based analog matching: find similar product-locations that DO have history, take a similarity-weighted average of their baselines.

If no similar product-location exists at the most specific attribute level, the algorithm escalates to a broader level (e.g. sub-category × price-band), and keeps broadening until it finds a match. This is the same simple-to-complex escalation hierarchy the methodology describes.

4.2 Bayesian update for active SKUs

Once a product has actual sales history, the baseline becomes a Bayesian blend of the library prior (ISR) and the observed BSR:

posterior = (prior_strength × prior_BSR + n × actual_BSR) / (prior_strength + n)

where n is the number of observed sales periods and prior_strength is a pseudo-count (default 8). The prior's weight shrinks as observed history grows — short history trusts the library prior, long history trusts the SKU's own pattern.

Each result surfaces bsrPrior / bsrActual / bsrPosterior / priorWeight for full transparency.
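
A small worked example of the pseudo-count formula, written as a sketch. The function name is hypothetical, and the definition of priorWeight as prior_strength / (prior_strength + n) is an assumption inferred from the formula, not something the source states.

```python
def bayesian_bsr(prior_bsr, actual_bsr, n, prior_strength=8.0):
    """Blend the library prior with the observed rate using a pseudo-count."""
    posterior = (prior_strength * prior_bsr + n * actual_bsr) / (prior_strength + n)
    return {
        "bsrPrior": prior_bsr,
        "bsrActual": actual_bsr,
        "bsrPosterior": posterior,
        # Assumed definition: share of the posterior that comes from the prior.
        "priorWeight": prior_strength / (prior_strength + n),
    }

# The same prior (10) and observed rate (20) at three history lengths.
results = {n: bayesian_bsr(prior_bsr=10.0, actual_bsr=20.0, n=n) for n in (0, 8, 52)}
```

With no history the posterior equals the prior; at n = 8 the two are weighted equally; at n = 52 the prior contributes roughly 13%.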

5 · Seasonality — profiles + escalation matching

5.1 Profile generation

A seasonality profile is a 52-element array whose entries sum to 1 — each entry represents the share of annual demand falling in that ISO week.

skuf.ai generates profiles per combination of analyst-chosen attribute columns over the last 104 weeks of history (most recent 52 = Y-1, prior 52 = Y-2). Each profile gets two quality scores:

Reliability: Pearson correlation between the Y-1 and Y-2 mean profiles. It measures repeatability (range -1..1; ≥0.65 is Good and ≥0.85 is Excellent in the tier table in §5.2).
Richness: a 0-1 specificity score combining attribute depth and the inverse of group size — high richness = "this profile is specific to a small, similar group of SKUs."
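
To make the profile and the reliability score concrete, here is a minimal NumPy sketch. It assumes each year is already aggregated to 52 weekly group totals; the function names are illustrative, and the richness score is omitted because its exact formula is not given here.

```python
import numpy as np

def to_profile(weekly_units):
    """Normalise 52 weekly totals into shares of annual demand (sums to 1)."""
    units = np.asarray(weekly_units, dtype=float)
    if units.shape != (52,):
        raise ValueError("expected exactly 52 weekly values")
    total = units.sum()
    if total <= 0:
        raise ValueError("group has no volume")
    return units / total

def reliability(y1_units, y2_units):
    """Pearson correlation between the Y-1 and Y-2 profiles."""
    return float(np.corrcoef(to_profile(y1_units), to_profile(y2_units))[0, 1])

weeks = np.arange(52)
y1 = 100 + 40 * np.sin(2 * np.pi * weeks / 52)  # synthetic seasonal year
y2 = 0.9 * y1                                   # same shape, lower volume
y3 = 100 - 40 * np.sin(2 * np.pi * weeks / 52)  # opposite seasonal shape
```

Because profiles are shares, a year with the same shape at a different volume still scores a reliability near 1, while an inverted shape scores near -1.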

Pruning rules drop profiles that aren't trustworthy:

Groups with fewer than min_skus (default 5) covered SKUs.
Average weekly volume below ~20 units (configurable).
Y-1 vs Y-2 group-total volume delta exceeding 50% (configurable).

5.2 4-tier escalation matching

For a target SKU, the matcher walks four reliability tiers from strict to lax:

Tier	Min reliability	Behaviour
Excellent	0.85	Tightly repeatable Y/Y
Good	0.65	Repeatable enough to trust
Acceptable	0.50	Has signal but more variable
Weak	0.30	Last-resort fallback

At each tier, the matcher picks the richest profile that clears that tier's reliability threshold. The first tier with a candidate wins. Every forecast result is tagged with which tier was hit, so users can see at a glance whether a SKU has solid or weak seasonal signal.
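
The tier walk can be sketched as follows. The thresholds come from the table above; the candidate dictionary layout and the function name are assumptions made for illustration.

```python
TIERS = [("Excellent", 0.85), ("Good", 0.65), ("Acceptable", 0.50), ("Weak", 0.30)]

def match_profile(candidates):
    """Return (tier_name, profile) for the first tier that has a candidate.

    candidates: list of dicts with "reliability" and "richness" keys
    (a hypothetical layout). Within a tier, the richest profile wins.
    """
    for tier_name, min_reliability in TIERS:
        eligible = [c for c in candidates if c["reliability"] >= min_reliability]
        if eligible:
            return tier_name, max(eligible, key=lambda c: c["richness"])
    return None, None  # no profile clears even the Weak threshold

tier, profile = match_profile([
    {"id": "A", "reliability": 0.90, "richness": 0.3},
    {"id": "B", "reliability": 0.70, "richness": 0.9},
])
```

Here profile A wins at the Excellent tier even though B is richer, because B does not clear 0.85 and the first tier with a candidate ends the search.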

6 · Events — ML uplift

Promotions, holidays, and other events drive demand spikes that the baseline + seasonality components don't capture. skuf.ai supports three approaches — each appropriate in different contexts.

6.1 Lookup tier

For each event type, compute the average uplift ratio (event-window sales / baseline-window sales) across historical events. Apply at forecast time as a multiplicative factor.
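
A minimal sketch of the lookup tier, assuming history is available as (event type, event-window sales, baseline-window sales) tuples. The data layout and function names are hypothetical.

```python
from collections import defaultdict

def build_uplift_table(history):
    """Average the uplift ratio (event sales / baseline sales) per event type."""
    ratios = defaultdict(list)
    for event_type, event_sales, baseline_sales in history:
        if baseline_sales > 0:  # skip events with no usable baseline
            ratios[event_type].append(event_sales / baseline_sales)
    return {event_type: sum(r) / len(r) for event_type, r in ratios.items()}

def apply_uplift(forecast, scheduled, table):
    """Multiply each forecast week by its event's factor (1.0 when no event).

    scheduled: dict mapping week index -> event type.
    """
    return [value * table.get(scheduled.get(i), 1.0) for i, value in enumerate(forecast)]

table = build_uplift_table([
    ("promotion", 150.0, 100.0),
    ("promotion", 130.0, 100.0),
    ("holiday", 200.0, 100.0),
])
adjusted = apply_uplift([100.0, 100.0, 100.0], {1: "promotion"}, table)
```

Only the week with a scheduled promotion is scaled; the other weeks pass through unchanged.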

6.2 ML tier (per-SKU uplift)

Train a Random Forest (or Gradient Boosting) regression on the full (event × SKU) attribute matrix:

Event features: type, sub-type, month, duration, discount %, cannibalization.
SKU features: every attribute column in the dataset (category, brand, etc.).
Target: observed uplift ratio.

At forecast time, predict the uplift per (event × SKU) pair — so different SKUs see different uplifts based on their attributes. Falls back to the lookup table when prediction fails on an individual pair.

Important caveat: the lookup and ML uplift tiers are applied as post-fit multipliers to the forecast horizon. They modify forecasted values for upcoming weeks where events are scheduled, but they do not affect backtest MAPE because MAPE is scored on the holdout before the multiplier is applied. They're useful for forward-looking decisions ("Memorial Day is coming — boost the forecast"), not for proving accuracy.

6.3 Covariate tier (recommended for MAPE wins)

When the goal is measurable accuracy improvement, treat events as exogenous regressors instead of post-fit multipliers. skuf.ai builds a per-SKU exogenous matrix:

One column per active event type (has_holiday, has_promotion, has_other) — set to 1 in weeks the event covers, 0 otherwise.
One column for log price (when the dataset carries a price column).
Rows aligned to both historical weeks and the forecast horizon, using week-vs-event-window overlap so events landing mid-week still register.

The tree-based ML models (Random Forest, Gradient Boost, LightGBM) consume this matrix at fit time, concatenated to their autoregressive lag features. The model learns event effects from history and predicts them on the holdout automatically — so backtest MAPE reflects the real lift. Set useExogenousCovariates: true on the job config.
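
The week-vs-event-window overlap rule is the part most easily gotten wrong, so here is a standard-library sketch of it. The row layout, argument names, and example dates are assumptions; only the has_<type> and log-price columns come from the description above.

```python
from datetime import date, timedelta
import math

def build_exog(week_starts, events, prices=None):
    """Return one dict per week with has_<type> flags and an optional log_price.

    week_starts: list of date, covering history and the forecast horizon.
    events: list of dicts with "type", "start", "end" (inclusive dates).
    prices: optional list of positive prices aligned to week_starts.
    """
    event_types = sorted({e["type"] for e in events})
    rows = []
    for i, week_start in enumerate(week_starts):
        week_end = week_start + timedelta(days=6)
        row = {}
        for event_type in event_types:
            # Overlap test: the event window and the week share at least one day.
            row[f"has_{event_type}"] = int(any(
                e["type"] == event_type and e["start"] <= week_end and e["end"] >= week_start
                for e in events
            ))
        if prices is not None:
            row["log_price"] = math.log(prices[i])
        rows.append(row)
    return rows

weeks = [date(2024, 5, 20) + timedelta(weeks=k) for k in range(3)]
events = [
    {"type": "holiday", "start": date(2024, 5, 27), "end": date(2024, 5, 27)},
    {"type": "promotion", "start": date(2024, 5, 29), "end": date(2024, 6, 6)},
]
exog = build_exog(weeks, events, prices=[20.0, 16.0, 16.0])
```

The promotion starts on a Wednesday and ends on a Thursday, yet it flags both weeks it touches, which is the behaviour the overlap rule is meant to guarantee.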

📐 Measured impact

On a synthetic specialty-retailer dataset (45 SKUs × 104 weeks × five event types), enabling event + price covariates on the Auto leaderboard dropped mean MAPE from 33.14% → 24.41% (a 26% relative reduction) while keeping the same model architectures.

7 · Centrifuge — component blending

The Centrifuge composes the four component slots (level baseline, seasonal, ML, event) on a single SKU's series and learns how to weight them.

7.1 Weighted-sum mode (default)

Each component produces a holdout backtest. The Centrifuge runs scipy.optimize.minimize (SLSQP, sum-to-1, non-negative constraints) to find the weights that minimise blended-holdout MAPE.

forecast = Σᵢ wᵢ · component_i_forecast (with Σwᵢ = 1, wᵢ ≥ 0)
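
A minimal sketch of the weight fit, assuming the component holdout forecasts are stacked in a (components × holdout weeks) array and that holdout actuals are non-zero. The function names and the synthetic data are illustrative; only the SLSQP method and the two constraints come from the description above.

```python
import numpy as np
from scipy.optimize import minimize

def mape(actuals, forecast):
    """Mean absolute percentage error as a fraction (assumes non-zero actuals)."""
    actuals = np.asarray(actuals, dtype=float)
    return float(np.mean(np.abs((actuals - forecast) / actuals)))

def fit_blend_weights(component_holdouts, actuals):
    """Find non-negative weights summing to 1 that minimise blended holdout MAPE."""
    X = np.asarray(component_holdouts, dtype=float)  # shape (n_components, holdout_len)
    k = X.shape[0]
    result = minimize(
        lambda w: mape(actuals, w @ X),
        x0=np.full(k, 1.0 / k),                 # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * k,                # w_i >= 0
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],  # sum to 1
    )
    return result.x

actuals = np.array([100.0, 120.0, 90.0, 110.0, 130.0, 95.0, 105.0])  # 7-week holdout
components = np.vstack([actuals * 1.02, actuals * 1.5])  # one accurate, one biased
weights = fit_blend_weights(components, actuals)
```

MAPE is not smooth everywhere, so a gradient-based solver such as SLSQP can stall on kinks; checking the blended MAPE against the equal-weight starting point is a cheap sanity test.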

7.2 Residual decomposition mode

Components forecast sequentially: level first, then seasonal on (raw − level) residuals, then ML on the remaining residuals. The final forecast is additive:

forecast = level + seasonal + ml_on_residuals + event

Useful when components capture orthogonal structure (trend vs. periodic vs. irregular) rather than competing predictions.

7.3 Plug-in component selection

Callers specify which components to include via a components array. Seasonal needs a profile library + BSR pair; Event needs an event multiplier vector. Level + ML always run on any series.

7.4 ML-component options

The ML slot inside Centrifuge is itself pluggable. The dropdown offers six families, each with different strengths. Because the SLSQP optimiser in §7.1 weights the ML component by its holdout-backtest performance, a poorly suited family is down-weighted automatically, so picking the "wrong" one degrades gracefully rather than catastrophically.

Family	When it tends to win	Engine
ETS (default)	Smooth trends + periodic seasonality, low noise.	`statsmodels` ExponentialSmoothing
ARIMA	Stationary series with autocorrelated residuals.	`statsmodels` ARIMA
Linear Trend	Short series where you only trust a slope + intercept.	`sklearn.linear_model.LinearRegression`
Simple Average	Very short or near-flat series; an honest baseline.	last-12 mean + 1/10 trend term
Random Forest	Residuals with nonlinear interaction structure (step changes, threshold effects, lifecycle elbows).	`sklearn.ensemble.RandomForestRegressor` on autoregressive lag features (depth ~ history/4, plus optional t-52 seasonal lag)
Gradient Boosting	Smooth nonlinear structure where boosting's bias-variance trade-off beats RF.	`sklearn.ensemble.GradientBoostingRegressor`, same feature stack as RF

Tree-based families forecast recursively: predict step 1, append it to history, rebuild the lag-feature vector, predict step 2, repeat. They need ≥8 history points before they activate; below that, the helper falls back to the simple average baseline so the Centrifuge always produces a valid ML component.
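
The recursive loop can be sketched with scikit-learn as below. This is a simplified illustration: the function name is hypothetical, the lag depth follows the "history/4" rule of thumb, the optional t-52 lag is left out, and the short-history fallback uses a plain mean rather than the full simple-average baseline with its trend term.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

MIN_HISTORY = 8  # below this, tree models do not activate

def recursive_forecast(history, horizon, n_lags=None):
    """Forecast `horizon` steps by feeding each prediction back in as a lag."""
    y = [float(v) for v in history]
    if len(y) < MIN_HISTORY:
        # Simplified fallback (assumption): plain mean of what history exists.
        return [float(np.mean(y))] * horizon
    n_lags = n_lags or max(1, len(y) // 4)
    # Each training row holds the n_lags values preceding its target.
    X = [y[i - n_lags:i] for i in range(n_lags, len(y))]
    targets = y[n_lags:]
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, targets)
    forecast = []
    for _ in range(horizon):
        step = float(model.predict([y[-n_lags:]])[0])  # predict the next step
        forecast.append(step)
        y.append(step)  # append so the next lag vector includes it
    return forecast

history = list(50 + 10 * np.sin(np.arange(30) / 3.0))  # synthetic series
forecast = recursive_forecast(history, horizon=4)
```

Because each step consumes earlier predictions, errors compound over the horizon, which is one reason a holdout backtest matters for these families.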

This menu matches the deck's original spec — "regression tree like Random Forest and Gradient Boosting" — without dropping the classical models that win on the cleaner half of any catalog.

7.5 Covariate-aware Centrifuge

When the ML slot is Random Forest or Gradient Boost, Centrifuge accepts the same per-SKU exogenous matrix described in §6.3 (event indicators + log price). The tree learns event + price effects from history as additional regressors alongside its autoregressive lag features. Set useExogenousCovariates: true alongside the standard Centrifuge config to opt in.

On the same synthetic dataset that takes Auto from 33.14% → 24.41% MAPE, covariate-aware Centrifuge moves from 43.51% → 30.89%. Auto + covariates beats Centrifuge + covariates on this data because Auto's per-SKU model selection lets a leaderboard of ARIMA / ETS / Prophet / RF / GB / LightGBM pick the best fit per SKU, whereas Centrifuge applies one fixed architecture (level + seasonal + ML) uniformly. Centrifuge's differentiated value is component transparency — it returns named weighted contributions per forecast so planners can audit "30% from the seasonal library, 25% from the ML lift, 45% from the level baseline." Auto returns a single winning model and doesn't expose this.

8 · New Product Forecasting

SKUs with no historical data fall through every model that needs lag features. skuf.ai provides three independent approaches — pick by what's available and the question being asked.

8.1 Seasonality library + Bayesian BSR

Already described in §4.1 (ISR) and §5.2 (4-tier matching). Match the new SKU's attributes to a 52-week profile in the library, look up bsr_median as the annual baseline prior, and forecast bsr × profile[(start_week + i) mod 52] per week, so the index wraps at the year boundary. Best when a strong matching profile exists and you trust the analog-attribute calibration.
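
As a sketch, the weekly forecast is just the annual baseline multiplied by the profile share for each week. The function name is hypothetical, and two details are assumptions: start_week is treated as a 0-based index into the profile, and the index wraps modulo 52 when the horizon crosses a year boundary.

```python
def library_forecast(bsr_annual, profile, start_week, horizon):
    """Weekly forecast = annual baseline x profile share, wrapping at week 52."""
    if len(profile) != 52:
        raise ValueError("profile must have 52 entries")
    return [bsr_annual * profile[(start_week + i) % 52] for i in range(horizon)]

flat_profile = [1.0 / 52] * 52
flat = library_forecast(bsr_annual=5200.0, profile=flat_profile, start_week=50, horizon=4)

# A degenerate profile with all demand in one week makes the wrap-around visible.
peaked_profile = [0.0] * 52
peaked_profile[1] = 1.0
peaked = library_forecast(bsr_annual=1000.0, profile=peaked_profile, start_week=51, horizon=3)
```

With a flat profile every week gets about 1/52 of annual demand; with the peaked profile, a forecast starting at index 51 reaches the peak on its third week.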

8.2 Analog blend

Two analog-finder modes feed the same blender:

Rule-based (/api/sku/new-product/find-analogs) — deterministic attribute scoring: +2 for each exact attribute match, +1.5 for "similar" numeric values (within 20%), +0.5 for "partial" (within 50%). Returns a ranked list with the matches explicitly listed per analog so the choice is defensible.
AI-powered (/api/sku/new-product/suggest-analogs) — Claude reads the target attributes and the catalog and selects analogs based on broader semantic similarity, returning a one-line reason per pick. Catches cases like "T-shirt is a reasonable analog for a Polo even though subcategory differs" that the rule-based matcher misses.
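
The rule-based scoring can be sketched as follows. The point values and percentage bands come from the description above; the function names, the catalog layout, and the choice to measure percentage difference relative to the target's value are assumptions.

```python
def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def score_analog(target, candidate):
    """Score one candidate against the target and list which attributes matched."""
    score, matches = 0.0, []
    for attr, target_value in target.items():
        cand_value = candidate.get(attr)
        if cand_value is None:
            continue
        if cand_value == target_value:
            score += 2.0
            matches.append((attr, "exact"))
        elif _is_number(target_value) and _is_number(cand_value) and target_value != 0:
            rel_diff = abs(cand_value - target_value) / abs(target_value)
            if rel_diff <= 0.20:
                score += 1.5
                matches.append((attr, "similar"))
            elif rel_diff <= 0.50:
                score += 0.5
                matches.append((attr, "partial"))
    return score, matches

def rank_analogs(target, catalog):
    """Return (sku, score, matches) tuples, best score first."""
    scored = [(sku, *score_analog(target, attrs)) for sku, attrs in catalog.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

target = {"category": "Tops", "brand": "Acme", "price": 50.0}
catalog = {
    "A": {"category": "Tops", "brand": "Other", "price": 55.0},
    "B": {"category": "Tops", "brand": "Acme", "price": 70.0},
    "C": {"category": "Bottoms", "brand": "Other", "price": 120.0},
}
ranked = rank_analogs(target, catalog)
```

Returning the matched attributes alongside the score is what makes the ranking defensible: each analog carries the reasons it was picked.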

Once analogs are chosen, /forecast blends their historical series — score-weighted average → univariate forecast (Auto's leaderboard picks per analog) → combine into a single forecast for the new SKU. An optional ramp-up curve scales down the first N periods to model launch behaviour.

Use the toggle in the New Product panel to pick which mode runs: AI-powered when you need reasoning, rule-based when you need reproducibility or want to skip an LLM round-trip.

8.3 Attribute-Based Launch Model

Train a Ridge + Random Forest ensemble regression on historical SKUs' attributes → first-period sales volume, then predict launch volume for the new item directly from its attributes. No time-series matching needed.

OrdinalEncodes categoricals, StandardScales numerics in a sklearn Pipeline.
Ridge regression always trains; Random Forest joins when n_samples ≥ 20.
Ensemble = average of Ridge and RF predictions (50/50).
5-fold cross-validation on the training set produces residual σ for an 80% prediction interval.
Returns predicted_first_period, lower_80, upper_80, implied_total_life (predicted ÷ defaultDecayFirst), feature_importances, and r2_score.
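
The bullets above map onto scikit-learn roughly as in this sketch. It is an approximation under stated assumptions: the function names are hypothetical, the residual σ is taken from cross-validated predictions of the averaged ensemble, the interval is symmetric at ±1.282σ, and the implied_total_life, feature_importances, and r2_score outputs are omitted.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

Z_80 = 1.282  # two-sided 80% interval under a normal assumption

def make_pipeline(model, cat_cols, num_cols):
    """Ordinal-encode categoricals, scale numerics, then fit the model."""
    pre = ColumnTransformer([
        ("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), cat_cols),
        ("num", StandardScaler(), num_cols),
    ])
    return Pipeline([("pre", pre), ("model", model)])

def fit_launch_model(X, y, cat_cols, num_cols):
    """Ridge always trains; Random Forest joins when there are >= 20 samples."""
    models = [make_pipeline(Ridge(), cat_cols, num_cols)]
    if len(X) >= 20:
        rf = RandomForestRegressor(n_estimators=100, random_state=0)
        models.append(make_pipeline(rf, cat_cols, num_cols))
    # 5-fold CV predictions of the averaged ensemble give the residual sigma.
    cv_pred = np.mean([cross_val_predict(m, X, y, cv=5) for m in models], axis=0)
    sigma = float(np.std(np.asarray(y, dtype=float) - cv_pred))
    for m in models:
        m.fit(X, y)
    return models, sigma

def predict_launch(models, sigma, new_item):
    """Average the fitted models and attach an 80% prediction interval."""
    pred = float(np.mean([m.predict(new_item)[0] for m in models]))
    return {
        "predicted_first_period": pred,
        "lower_80": pred - Z_80 * sigma,
        "upper_80": pred + Z_80 * sigma,
    }
```

A typical call passes a DataFrame of historical SKU attributes and their first-period volumes to fit_launch_model, then a one-row DataFrame for the new item to predict_launch.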

Use when: (a) you have ≥20 historical SKUs to train on, (b) the new SKU's attributes are well-represented in the training data, (c) you want a single point estimate + CI rather than a weekly forecast curve. The implied total-life output is designed to flow into the Sell-Through curve panel — auto-fills the total-life input there so you can convert a first-period prediction into a weekly schedule.

9 · Size profile forecasting

For apparel and other categories whose SKUs are size variants of a product, the Product-level (Style/Color) forecast needs to be split across sizes. skuf.ai mirrors the BSR approach:

Initial estimation: aggregate historical units per (attribute combo × size), normalise to shares summing to 1 per attribute group.
Bayesian update: when the product has its own observed size mix, blend it with the library prior using the same pseudo-count formula as BSR. Three modes: library prior (no actuals), Bayesian (both), actuals only (no library match).
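
A sketch of the three modes, reusing the BSR pseudo-count formula on each size share. The function name, the dictionary layout, and the use of the number of observed periods as n are assumptions.

```python
def blend_size_profile(library=None, observed=None, n_periods=0, prior_strength=8.0):
    """Return (mode, shares) where shares map size -> fraction and sum to 1."""
    if library and observed and n_periods > 0:
        mode = "bayesian"
        sizes = sorted(set(library) | set(observed))
        raw = {
            s: (prior_strength * library.get(s, 0.0) + n_periods * observed.get(s, 0.0))
            / (prior_strength + n_periods)
            for s in sizes
        }
    elif library:
        mode, raw = "library_prior", dict(library)   # no actuals yet
    elif observed:
        mode, raw = "actuals_only", dict(observed)   # no library match
    else:
        raise ValueError("need a library profile, observed shares, or both")
    total = sum(raw.values())
    return mode, {s: v / total for s, v in raw.items()}  # renormalise for safety

library = {"S": 0.2, "M": 0.5, "L": 0.3}
observed = {"S": 0.4, "M": 0.4, "L": 0.2}
mode, shares = blend_size_profile(library, observed, n_periods=8)
```

With eight observed periods and the default pseudo-count of 8, the blended mix sits halfway between the library prior and the observed mix.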

10 · Day-level disaggregation

The forecast operates at weekly grain; replenishment systems need daily. A Mon-Sun profile (7 weights summing to 1) splits weekly forecasts into days.

Three profile scopes:

Global — one profile from all history.
Per-SKU — learns one SKU's specific day-of-week pattern.
Special-week — learns the profile of specific ISO weeks (e.g. Black Friday) from past instances.

Apply step accepts per-week overrides so users can pin a different profile to specific ISO weeks (e.g. promotion weeks behave differently than normal).
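
The learn and apply steps can be sketched as below. The function names and the week-key format are illustrative assumptions; the 7-weight profile and per-week overrides follow the description above.

```python
def learn_day_profile(units_by_weekday):
    """Turn Mon..Sun unit totals into 7 weights that sum to 1."""
    if len(units_by_weekday) != 7:
        raise ValueError("expected 7 values, Monday through Sunday")
    total = float(sum(units_by_weekday))
    if total <= 0:
        raise ValueError("no volume to learn a profile from")
    return [u / total for u in units_by_weekday]

def disaggregate(weekly_forecast, default_profile, overrides=None):
    """Split each weekly total into 7 daily values, honouring per-week overrides."""
    overrides = overrides or {}
    return {
        week: [units * w for w in overrides.get(week, default_profile)]
        for week, units in weekly_forecast.items()
    }

default_profile = learn_day_profile([10, 10, 10, 10, 20, 25, 15])
black_friday_profile = learn_day_profile([5, 5, 5, 5, 50, 20, 10])
daily = disaggregate(
    {"2024-W47": 700.0, "2024-W48": 1000.0},
    default_profile,
    overrides={"2024-W48": black_friday_profile},
)
```

Each week's daily values still sum to the weekly forecast, so the disaggregation changes the shape within the week but never the total.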

11 · Backtesting & accuracy

Every forecast is backtested on a held-out tail of recent history. The holdout window size is user-configurable (deck-aligned default: 7 weeks). Each forecast reports seven accuracy metrics:

Metric	What it tells you
MAPE	Mean Absolute Percentage Error — most common, sensitive to zero actuals
RMSE	Root Mean Squared Error — penalises large errors more
MAE	Mean Absolute Error — less sensitive to outliers than RMSE
SMAPE	Symmetric MAPE — robust to zero actuals
MASE	Mean Absolute Scaled Error — <1 beats naive forecast
R²	Coefficient of determination — variance explained
PI 80%	% of holdout actuals inside the 80% confidence band — the displayed CI band is honest when this is near 80%

Bands are computed as forecast ± 1.282 × residual_std, the two-sided 80% interval under a normal-residual assumption, so the band on every chart corresponds to the PI 80% coverage metric.
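
A sketch of how several of these metrics and the band coverage could be computed on a holdout. The function name is hypothetical, residual_std is taken as an input, the SMAPE variant shown is one of several in common use, and MASE and R² are omitted for brevity.

```python
import math

Z_80 = 1.282  # two-sided 80% interval under a normal assumption

def holdout_metrics(actuals, forecasts, residual_std):
    """Compute MAPE, SMAPE, MAE, RMSE and 80% band coverage on a holdout.

    MAPE assumes non-zero actuals; SMAPE assumes actual and forecast
    are not both zero.
    """
    errors = [a - f for a, f in zip(actuals, forecasts)]
    n = len(errors)
    half_width = Z_80 * residual_std
    return {
        "MAPE": 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actuals)) / n,
        "SMAPE": 100.0 * sum(
            2.0 * abs(e) / (abs(a) + abs(f)) for e, a, f in zip(errors, actuals, forecasts)
        ) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        # Share of holdout actuals that fall inside forecast +/- half_width.
        "PI80": 100.0 * sum(abs(e) <= half_width for e in errors) / n,
    }

metrics = holdout_metrics(actuals=[100.0, 200.0], forecasts=[110.0, 180.0], residual_std=10.0)
```

In this two-point example the band half-width is 12.82, so the first actual falls inside the band and the second does not.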

12 · Variance analysis

The Variance Analysis Diagnostic Panel surfaces forecast-quality structure at portfolio level — designed to answer "where should I focus my tuning efforts?":

Accuracy tier donut (Excellent / Good / Acceptable / Poor / Unreliable).
MAPE histogram, by model, by demand pattern, by hierarchy level.
Forecast bias pie + top-deviation SKU table.
Portfolio tracking signal — CUSUM(bias/MAD) with ±4 control limits to flag systematic drift across the SKU portfolio.
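
The tracking signal in the last item can be sketched with the textbook definition: cumulative forecast error divided by the running mean absolute deviation. Whether skuf.ai computes it exactly this way is an assumption, as are the function name and the error sign convention (actual minus forecast).

```python
def tracking_signal(errors, limit=4.0):
    """Return (signal, out_of_control) per period for a series of forecast errors.

    errors: per-period portfolio errors, assumed to be actual minus forecast.
    """
    cum_error, cum_abs, out = 0.0, 0.0, []
    for t, e in enumerate(errors, start=1):
        cum_error += e
        cum_abs += abs(e)
        mad = cum_abs / t  # running mean absolute deviation
        signal = cum_error / mad if mad > 0 else 0.0
        out.append((signal, abs(signal) > limit))
    return out

biased = tracking_signal([10.0] * 5)                    # persistent under-forecast
balanced = tracking_signal([10.0, -10.0, 10.0, -10.0])  # errors cancel out
```

A persistent one-sided error pushes the signal past the ±4 limit within a few periods, while errors that alternate in sign keep it near zero.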