How skuf.ai decomposes demand into independent component estimators (baseline, seasonality, events, ML residuals) and learns how to blend them — instead of picking one ML algorithm.
Most demand-forecasting products on the market are single-algorithm: they pick ETS, Prophet, ARIMA, or a tree model and fit it to every series. The problem is that no single algorithm fits every product or every location.
skuf.ai's approach instead: decompose demand into independent components, estimate each with a method appropriate to it, and learn how to blend them.
The component slots:
| Component | What it captures | How it's estimated |
|---|---|---|
| Baseline (BSR) | Average lifecycle demand rate | ISR (analog match) + Bayesian update from actuals |
| Seasonality | Within-year repeating shape | 52-week profile from attribute-grouped history |
| Events | Lift from promotions, holidays, etc. | RF / Gradient Boosting on (event × SKU) attributes |
| ML residuals | Whatever the components above didn't explain | ETS / ARIMA / Trend / Avg / Random Forest / Gradient Boosting (chosen per series — see §7.4) |
The full dimensional space for retail demand is SKU × Location × Day:
every unit sold has an item ID, a place, and a moment in time.
Forecasting at this raw grain is statistically noisy — most SKU-Loc-Day cells have zero sales. skuf.ai accepts data at SKU-Loc-Day but forecasts at Product-Loc-Week, then disaggregates back to the input grain:
Each bridge is a learnable profile built from history. Sections 8 and 9 cover them in detail.
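The grain change described above can be sketched in a few lines. This is a minimal illustration, not skuf.ai's implementation: the `SKU_TO_PRODUCT` mapping, the row layout, and both function names are hypothetical, and the "bridge" here is the simplest possible one (a historical SKU share within each product-location).

```python
from collections import defaultdict
from datetime import date

# Hypothetical SKU -> product mapping (e.g. size variants of one style).
SKU_TO_PRODUCT = {"TEE-S": "TEE", "TEE-M": "TEE"}

def aggregate_to_product_loc_week(rows):
    """Roll (sku, loc, day, units) rows up to (product, loc, ISO year, ISO week)."""
    totals = defaultdict(float)
    for sku, loc, day, units in rows:
        iso_year, iso_week, _weekday = day.isocalendar()
        totals[(SKU_TO_PRODUCT[sku], loc, iso_year, iso_week)] += units
    return dict(totals)

def sku_share_profile(rows):
    """Historical share of each SKU within its product-location (a simple bridge)."""
    sku_units = defaultdict(float)
    product_units = defaultdict(float)
    for sku, loc, _day, units in rows:
        sku_units[(sku, loc)] += units
        product_units[(SKU_TO_PRODUCT[sku], loc)] += units
    shares = {}
    for (sku, loc), units in sku_units.items():
        total = product_units[(SKU_TO_PRODUCT[sku], loc)]
        if total > 0:  # skip product-locations with no sales at all
            shares[(sku, loc)] = units / total
    return shares

rows = [
    ("TEE-S", "NYC", date(2024, 1, 1), 1),
    ("TEE-M", "NYC", date(2024, 1, 2), 3),
    ("TEE-M", "NYC", date(2024, 1, 9), 0),
]
weekly = aggregate_to_product_loc_week(rows)
shares = sku_share_profile(rows)
```

Multiplying a Product-Loc-Week forecast by `shares` pushes it back down to SKU level; the day-level split works the same way with a day-of-week profile.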
Before any estimator runs, sales data goes through three corrections:
All three corrections are configurable per-forecast. They run as a single pass before estimators — iterating preprocessing with estimator outputs is on the methodology roadmap.
For products with no sales history, skuf.ai computes an Initial Sales Rate (ISR) using attribute-based analog matching: find similar product-locations that DO have history, take a similarity-weighted average of their baselines.
If no similar product-location exists at the most specific attribute level, the algorithm escalates to a broader level (e.g. sub-category × price-band) and so on. This is the same simple-to-complex escalation hierarchy the methodology describes.
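A minimal sketch of analog matching with escalation follows. The similarity measure (fraction of shared attributes), the data layout, and the function name are assumptions made for illustration; the source does not specify how similarity is scored.

```python
def initial_sales_rate(target, library, levels):
    """Similarity-weighted analog baseline, escalating from specific to broad levels.

    target  : dict of attribute -> value for the new product-location
    library : list of (attributes dict, baseline rate) pairs that have history
    levels  : attribute-name tuples ordered from most specific to broadest
    """
    for level in levels:
        # Analogs must agree with the target on every attribute in this level.
        analogs = [(attrs, rate) for attrs, rate in library
                   if all(attrs.get(a) == target.get(a) for a in level)]
        if not analogs:
            continue  # nothing at this level: escalate to a broader one
        # Assumed similarity: fraction of the target's attributes the analog shares.
        weights = [sum(attrs.get(a) == v for a, v in target.items()) / len(target)
                   for attrs, _rate in analogs]
        total = sum(weights)
        if total == 0:  # degenerate case: fall back to equal weights
            weights = [1.0] * len(analogs)
            total = float(len(analogs))
        isr = sum(w * rate for w, (_attrs, rate) in zip(weights, analogs)) / total
        return isr, level
    return None, None

target = {"subcategory": "polo", "price_band": "mid", "color": "navy"}
library = [
    ({"subcategory": "tee", "price_band": "mid", "color": "navy"}, 10.0),
    ({"subcategory": "tee", "price_band": "mid", "color": "red"}, 4.0),
]
levels = [("subcategory", "price_band"), ("price_band",)]
isr, level_used = initial_sales_rate(target, library, levels)
```

Here no analog matches at the specific level, so the search escalates to price band alone and weights the navy tee more heavily than the red one.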
Once a product has actual sales history, the baseline becomes a Bayesian blend of the library prior (ISR) and the observed BSR:
where n is the number of observed sales periods and
prior_strength is a pseudo-count (default 8). The prior's weight
shrinks as observed history grows — short history trusts the library prior,
long history trusts the SKU's own pattern.
Each result surfaces bsrPrior / bsrActual /
bsrPosterior / priorWeight for full transparency.
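The blend formula itself is not reproduced in this section. One standard pseudo-count form that is consistent with the description (prior weight shrinking as n grows, default prior_strength of 8) is sketched below; treat the exact expression as an assumption rather than skuf.ai's confirmed formula.

```python
def blend_baseline(bsr_prior, bsr_actual, n, prior_strength=8.0):
    """Pseudo-count shrinkage between the library prior (ISR) and observed BSR.

    Assumed form: priorWeight = prior_strength / (prior_strength + n).
    """
    prior_weight = prior_strength / (prior_strength + n)
    bsr_posterior = prior_weight * bsr_prior + (1.0 - prior_weight) * bsr_actual
    return {
        "bsrPrior": bsr_prior,
        "bsrActual": bsr_actual,
        "bsrPosterior": bsr_posterior,
        "priorWeight": prior_weight,
    }

short_history = blend_baseline(10.0, 20.0, n=8)    # prior and actuals weighted equally
long_history = blend_baseline(10.0, 20.0, n=24)    # actuals dominate
```

With eight observed periods the prior and the actuals carry equal weight; at 24 periods the prior's weight has dropped to 0.25.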
A seasonality profile is a 52-element array whose entries sum to 1 — each entry represents the share of annual demand falling in that ISO week.
skuf.ai generates profiles per combination of analyst-chosen attribute columns over the last 104 weeks of history (most recent 52 = Y-1, prior 52 = Y-2). Each profile gets two quality scores:
Pruning rules drop profiles that aren't trustworthy:
For a target SKU, the matcher walks four reliability tiers from strict to lax:
| Tier | Min reliability | Behaviour |
|---|---|---|
| Excellent | 0.85 | Tightly repeatable Y/Y |
| Good | 0.65 | Repeatable enough to trust |
| Acceptable | 0.50 | Has signal but more variable |
| Weak | 0.30 | Last-resort fallback |
At each tier, the matcher picks the richest profile that clears that tier's reliability threshold. The first tier with a candidate wins. Every forecast result is tagged with which tier was hit, so users can see at a glance whether a SKU has solid or weak seasonal signal.
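The tier walk can be sketched as follows. The candidate layout and the `richness` field (for example, the number of attribute columns defining the profile) are hypothetical; only the tier names and thresholds come from the table above.

```python
# Tier thresholds from the table above, ordered strict to lax.
TIERS = [("Excellent", 0.85), ("Good", 0.65), ("Acceptable", 0.50), ("Weak", 0.30)]

def match_profile(candidates):
    """Return (profile name, tier) for the first tier that has any candidate.

    candidates: list of dicts with 'name', 'reliability', and 'richness' keys
    (the 'richness' measure is an assumption for this sketch).
    """
    for tier, min_reliability in TIERS:
        eligible = [c for c in candidates if c["reliability"] >= min_reliability]
        if eligible:
            best = max(eligible, key=lambda c: c["richness"])
            return best["name"], tier
    return None, None  # no profile clears even the Weak threshold

candidates = [
    {"name": "category", "reliability": 0.90, "richness": 1},
    {"name": "category+color+fit", "reliability": 0.70, "richness": 3},
]
matched = match_profile(candidates)
```

Note that the richer profile loses here: it only clears the Good threshold, and the first tier with a candidate wins.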
Promotions, holidays, and other events drive demand spikes that the baseline + seasonality components don't capture. skuf.ai supports three approaches — each appropriate in different contexts.
For each event type, compute the average uplift ratio
(event-window sales / baseline-window sales) across historical
events. Apply at forecast time as a multiplicative factor.
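A minimal sketch of the lookup approach, assuming history is available as (event type, event-window sales, baseline-window sales) triples and that scheduled events are keyed by forecast week index. Both layouts and the function names are illustrative.

```python
from collections import defaultdict

def build_uplift_table(history):
    """Average uplift ratio per event type from historical events."""
    ratios = defaultdict(list)
    for event_type, event_sales, baseline_sales in history:
        if baseline_sales > 0:  # ratio is undefined without baseline sales
            ratios[event_type].append(event_sales / baseline_sales)
    return {etype: sum(vals) / len(vals) for etype, vals in ratios.items()}

def apply_uplift(forecast, schedule, table):
    """Multiply forecast weeks that have a scheduled event by its uplift ratio."""
    return [value * table.get(schedule.get(week), 1.0)
            for week, value in enumerate(forecast)]

history = [("holiday", 150.0, 100.0), ("holiday", 130.0, 100.0), ("promo", 200.0, 100.0)]
table = build_uplift_table(history)
adjusted = apply_uplift([10.0, 10.0, 10.0], {1: "holiday"}, table)
```

Weeks without a scheduled event, or with an event type that has no history, keep a multiplier of 1.0.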
Train a Random Forest (or Gradient Boosting) regression on the full (event × SKU) attribute matrix:
At forecast time, predict the uplift per (event × SKU) pair — so different SKUs see different uplifts based on their attributes. Falls back to the lookup table when prediction fails on an individual pair.
Important caveat: the lookup and ML uplift tiers are applied as post-fit multipliers to the forecast horizon. They modify forecasted values for upcoming weeks where events are scheduled, but they do not affect backtest MAPE because MAPE is scored on the holdout before the multiplier is applied. They're useful for forward-looking decisions ("Memorial Day is coming — boost the forecast"), not for proving accuracy.
When the goal is measurable accuracy improvement, treat events as exogenous regressors instead of post-fit multipliers. skuf.ai builds a per-SKU exogenous matrix:
- Event indicators (has_holiday, has_promotion, has_other): set to 1 in weeks the event covers, 0 otherwise.
- A price covariate (price column).
The tree-based ML models (Random Forest, Gradient Boost, LightGBM) consume
this matrix at fit time, concatenated to their autoregressive lag
features. The model learns event effects from history and
predicts them on the holdout automatically — so backtest MAPE reflects the
real lift. Set useExogenousCovariates: true on the job config.
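The feature layout can be illustrated with a small design-matrix builder. This is a sketch under assumptions: the column order, the number of lags, and the use of log price (mentioned later for Centrifuge) are choices made here, and the function names are hypothetical.

```python
import math

EVENT_COLUMNS = ("has_holiday", "has_promotion", "has_other")

def exogenous_row(week_events, price):
    """Event indicators (1.0/0.0) followed by log price (price must be > 0)."""
    flags = [1.0 if col in week_events else 0.0 for col in EVENT_COLUMNS]
    return flags + [math.log(price)]

def design_matrix(sales, events_by_week, prices, n_lags=2):
    """Concatenate autoregressive lags with the exogenous columns for each week."""
    X, y = [], []
    for t in range(n_lags, len(sales)):
        lags = [float(sales[t - k]) for k in range(1, n_lags + 1)]  # most recent first
        X.append(lags + exogenous_row(events_by_week.get(t, set()), prices[t]))
        y.append(float(sales[t]))
    return X, y

sales = [5, 6, 9, 7]
events_by_week = {2: {"has_promotion"}}
prices = [1.0, 1.0, 1.0, 1.0]
X, y = design_matrix(sales, events_by_week, prices)
```

Because the event flags sit in the same rows as the lags, a tree model fitted on `X` sees the promotion in week 2 next to that week's sales spike, which is why the effect shows up in the holdout score.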
On a synthetic specialty-retailer dataset (45 SKUs × 104 weeks × five event types), enabling event + price covariates on the Auto leaderboard dropped mean MAPE from 33.14% → 24.41% (a 26% relative reduction) while keeping the same model architectures.
The Centrifuge composes the four component slots (level baseline, seasonal, ML, event) on a single SKU's series and learns how to weight them.
Each component produces a holdout backtest. The Centrifuge runs
scipy.optimize.minimize (SLSQP, sum-to-1, non-negative
constraints) to find the weights that minimise blended-holdout MAPE.
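A minimal sketch of the weight search with `scipy.optimize.minimize`. The function name and input layout are assumptions; the sketch also assumes non-zero holdout actuals, since MAPE is undefined otherwise.

```python
import numpy as np
from scipy.optimize import minimize

def blend_weights(component_preds, actuals):
    """Find non-negative, sum-to-1 weights that minimise blended holdout MAPE.

    component_preds: one holdout prediction series per component
    actuals:         holdout actuals (assumed non-zero)
    """
    P = np.asarray(component_preds, dtype=float)  # shape (n_components, n_holdout)
    y = np.asarray(actuals, dtype=float)
    k = P.shape[0]

    def mape(w):
        blended = w @ P
        return float(np.mean(np.abs((y - blended) / y)))

    result = minimize(
        mape,
        x0=np.full(k, 1.0 / k),                  # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * k,                 # non-negative weights
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],  # sum to 1
    )
    return result.x

actuals = [100.0, 120.0, 90.0, 110.0]
good = list(actuals)                 # a component that tracks the holdout closely
bad = [2.0 * v for v in actuals]     # a component that overshoots badly
weights = blend_weights([good, bad], actuals)
```

MAPE is not smooth at zero error, so SLSQP can stall on some series; checking `result.success` and falling back to equal weights is a reasonable safeguard in practice.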
Components forecast sequentially: level first, then seasonal on (raw − level) residuals, then ML on the remaining residuals. The final forecast is additive:
Useful when components capture orthogonal structure (trend vs. periodic vs. irregular) rather than competing predictions.
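The sequential residual flow can be sketched with deliberately simple stand-in estimators: a mean for the level, per-position means for the seasonal component, and a mean of recent leftovers for the ML slot. These stand-ins are assumptions for illustration; only the ordering and the additive combination come from the text.

```python
def additive_forecast(history, horizon, period):
    """Level, then seasonal on (raw - level), then ML on what remains; sum them."""
    n = len(history)
    if n < period:
        raise ValueError("need at least one full seasonal cycle of history")

    # Step 1: level component (stand-in: overall mean).
    level = sum(history) / n
    resid_after_level = [y - level for y in history]

    # Step 2: seasonal component fitted on (raw - level) residuals.
    seasonal = []
    for pos in range(period):
        vals = resid_after_level[pos::period]
        seasonal.append(sum(vals) / len(vals))
    resid_after_seasonal = [r - seasonal[i % period]
                            for i, r in enumerate(resid_after_level)]

    # Step 3: ML component on the remaining residuals (stand-in: recent mean).
    tail = resid_after_seasonal[-period:]
    ml = sum(tail) / len(tail)

    # Final forecast is additive across the three components.
    return [level + seasonal[(n + h) % period] + ml for h in range(horizon)]

forecast = additive_forecast([10.0, 20.0, 10.0, 20.0], horizon=2, period=2)
```

In this toy series the seasonal step explains everything left after the level, so the ML stand-in contributes zero.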
Callers specify which components to include via a components
array. Seasonal needs a profile library + BSR pair; Event needs an event
multiplier vector. Level + ML always run on any series.
The ML slot inside Centrifuge is itself pluggable. The dropdown offers six families, each with different strengths — the SLSQP optimiser in 7.1 automatically down-weights whichever family loses the holdout backtest, so picking the "wrong" one degrades gracefully rather than catastrophically.
| Family | When it tends to win | Engine |
|---|---|---|
| ETS (default) | Smooth trends + periodic seasonality, low noise. | statsmodels ExponentialSmoothing |
| ARIMA | Stationary series with autocorrelated residuals. | statsmodels ARIMA |
| Linear Trend | Short series where you only trust a slope + intercept. | sklearn.LinearRegression |
| Simple Average | Very short or near-flat series; an honest baseline. | last-12 mean + 1/10 trend term |
| Random Forest | Residuals with nonlinear interaction structure (step changes, threshold effects, lifecycle elbows). | sklearn.RandomForestRegressor on autoregressive lag features (depth ~ history/4, plus optional t-52 seasonal lag) |
| Gradient Boosting | Smooth nonlinear structure where boosting's bias-variance trade-off beats RF. | sklearn.GradientBoostingRegressor, same feature stack as RF |
Tree-based families forecast recursively: predict step 1, append it to history, rebuild the lag-feature vector, predict step 2, repeat. They need ≥8 history points before they activate; below that, the helper falls back to the simple average baseline so the Centrifuge always produces a valid ML component.
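The recursive loop and the short-history fallback can be sketched as below. `predict_one` is a hypothetical stand-in for a fitted tree model's one-step prediction, and the fallback is simplified to a plain mean of the last 12 points (the table above also lists a small trend term, omitted here).

```python
MIN_HISTORY = 8  # tree-based families need at least this many points

def recursive_forecast(history, horizon, predict_one, n_lags=4):
    """Multi-step forecast by feeding each prediction back in as history.

    predict_one: callable taking lag features (most recent first) and
                 returning the next value; stands in for a fitted model.
    """
    if not history:
        raise ValueError("history must not be empty")
    if len(history) < MIN_HISTORY:
        tail = history[-12:]
        return [sum(tail) / len(tail)] * horizon  # simplified average fallback

    series = list(history)
    forecast = []
    for _ in range(horizon):
        lags = series[-n_lags:][::-1]   # rebuild the lag-feature vector
        y_hat = predict_one(lags)
        forecast.append(y_hat)
        series.append(y_hat)            # prediction becomes history for the next step
    return forecast

steps = recursive_forecast(list(range(1, 11)), 3, lambda lags: lags[0] + 1)
fallback = recursive_forecast([1.0, 2.0, 3.0], 2, lambda lags: lags[0] + 1)
```

One consequence of this design is that errors compound: a biased step-1 prediction feeds into every later lag vector, which is one reason long horizons from recursive models deserve extra scrutiny.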
This menu matches the deck's original spec — "regression tree like Random Forest and Gradient Boosting" — without dropping the classical models that win on the cleaner half of any catalog.
When the ML slot is Random Forest or Gradient Boost, Centrifuge accepts the
same per-SKU exogenous matrix described in §6.3 (event indicators + log
price). The tree learns event + price effects from history as additional
regressors alongside its autoregressive lag features. Set
useExogenousCovariates: true alongside the standard Centrifuge
config to opt in.
On the same synthetic dataset that takes Auto from 33.14% → 24.41% MAPE, covariate-aware Centrifuge moves from 43.51% → 30.89%. Auto + covariates beats Centrifuge + covariates on this data because Auto's per-SKU model selection lets a leaderboard of ARIMA / ETS / Prophet / RF / GB / LightGBM pick the best fit per SKU, whereas Centrifuge applies one fixed architecture (level + seasonal + ML) uniformly. Centrifuge's differentiated value is component transparency — it returns named weighted contributions per forecast so planners can audit "30% from the seasonal library, 25% from the ML lift, 45% from the level baseline." Auto returns a single winning model and doesn't expose this.
SKUs with no historical data fall through every model that needs lag features. skuf.ai provides three independent approaches — pick by what's available and the question being asked.
Already described in §4.1 (ISR) and §5.2 (4-tier matching). Match the new
SKU's attributes to a 52-week profile in the library, look up
bsr_median as the annual baseline prior, and forecast
bsr × profile[start_week + i] per week. Best when a strong
matching profile exists and you trust the analog-attribute calibration.
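The weekly forecast from a matched profile is a single multiplication per week. The sketch below assumes a 0-based `start_week` index and wraps around the year boundary with a modulo; the wrap-around is an assumption, since the source writes the index simply as start_week + i.

```python
WEEKS_PER_YEAR = 52

def library_forecast(bsr, profile, start_week, horizon):
    """Forecast bsr * profile[week] per week, wrapping past week 52."""
    if len(profile) != WEEKS_PER_YEAR:
        raise ValueError("profile must have 52 entries")
    return [bsr * profile[(start_week + i) % WEEKS_PER_YEAR]
            for i in range(horizon)]

flat_profile = [1.0 / WEEKS_PER_YEAR] * WEEKS_PER_YEAR
flat = library_forecast(520.0, flat_profile, start_week=50, horizon=4)

spike_profile = [0.0] * WEEKS_PER_YEAR
spike_profile[0] = 1.0  # all annual demand falls in the first week
spike = library_forecast(100.0, spike_profile, start_week=51, horizon=2)
```

With a flat profile and an annual baseline of 520, every week forecasts 10 units, including the weeks that wrap into the next year.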
Two analog-finder modes feed the same blender:
- Rule-based (/api/sku/new-product/find-analogs): deterministic attribute scoring. +2 for each exact attribute match, +1.5 for "similar" numeric values (within 20%), +0.5 for "partial" (within 50%). Returns a ranked list with the matches explicitly listed per analog so the choice is defensible.
- AI-powered (/api/sku/new-product/suggest-analogs): Claude reads the target attributes and the catalog and selects analogs based on broader semantic similarity, returning a one-line reason per pick. Catches cases like "T-shirt is a reasonable analog for a Polo even though subcategory differs" that the rule-based matcher misses.
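The rule-based scoring can be sketched directly from the point values above. Two details are assumptions: the percentage difference is measured relative to the target's value, and the function and field names are illustrative rather than the endpoint's actual schema.

```python
def score_analog(target, candidate):
    """Score one candidate against the target and list which rules fired."""
    score, matches = 0.0, []
    for attr, t_val in target.items():
        c_val = candidate.get(attr)
        if c_val is None:
            continue
        if c_val == t_val:
            score += 2.0
            matches.append((attr, "exact"))
        elif (isinstance(t_val, (int, float)) and isinstance(c_val, (int, float))
              and t_val != 0):
            rel_diff = abs(c_val - t_val) / abs(t_val)  # assumed: relative to target
            if rel_diff <= 0.20:
                score += 1.5
                matches.append((attr, "similar"))
            elif rel_diff <= 0.50:
                score += 0.5
                matches.append((attr, "partial"))
    return score, matches

def rank_analogs(target, catalog):
    """Return (sku, score, matches) tuples sorted from best to worst."""
    scored = [(sku,) + score_analog(target, attrs) for sku, attrs in catalog.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

target = {"category": "tops", "price": 20.0, "weight": 100}
catalog = {
    "A": {"category": "tops", "price": 23.0, "weight": 140},
    "B": {"category": "bottoms", "price": 20.0, "weight": 300},
}
ranked = rank_analogs(target, catalog)
```

Returning the fired rules alongside the score is what makes the ranking auditable: a planner can see that analog A won on category, price, and weight rather than trusting a bare number.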
Once analogs are chosen, /forecast blends their historical
series — score-weighted average → univariate forecast (Auto's leaderboard
picks per analog) → combine into a single forecast for the new SKU. An
optional ramp-up curve scales down the first N periods to model
launch behaviour.
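A sketch of the final combine step, assuming each analog already has its own forecast and that the ramp-up is linear. The linear ramp shape and the function name are assumptions; the source only says the first N periods are scaled down.

```python
def blend_analog_forecasts(analog_forecasts, scores, ramp_periods=0):
    """Score-weighted average of analog forecasts with an optional launch ramp."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    horizon = len(analog_forecasts[0])
    blended = [
        sum(s * fc[t] for s, fc in zip(scores, analog_forecasts)) / total
        for t in range(horizon)
    ]
    # Assumed linear ramp: period t is scaled by (t + 1) / (ramp_periods + 1).
    for t in range(min(ramp_periods, horizon)):
        blended[t] *= (t + 1) / (ramp_periods + 1)
    return blended

no_ramp = blend_analog_forecasts([[10.0] * 3, [20.0] * 3], scores=[3.0, 1.0])
with_ramp = blend_analog_forecasts([[10.0] * 3, [20.0] * 3], scores=[3.0, 1.0],
                                   ramp_periods=1)
```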
Use the toggle in the New Product panel to pick which mode runs. AI-powered when you need reasoning; rule-based when you need reproducibility or to skip an LLM round-trip.
Train a Ridge + Random Forest ensemble regression on historical SKUs' attributes → first-period sales volume, then predict launch volume for the new item directly from its attributes. No time-series matching needed.
Outputs: predicted_first_period, lower_80, upper_80, implied_total_life (predicted ÷ defaultDecayFirst), feature_importances, and r2_score.

Use when: (a) you have ≥20 historical SKUs to train on, (b) the new SKU's attributes are well-represented in the training data, (c) you want a single point estimate + CI rather than a weekly forecast curve. The implied total-life output is designed to flow into the Sell-Through curve panel: it auto-fills the total-life input there so you can convert a first-period prediction into a weekly schedule.
For apparel and SKU-with-variants categories, the forecast at SKU level needs to split across sizes. skuf.ai mirrors the BSR approach:
The forecast operates at weekly grain; replenishment systems need daily. A Mon-Sun profile (7 weights summing to 1) splits weekly forecasts into days.
Three profile scopes:
The apply step accepts per-week overrides so users can pin a different profile to specific ISO weeks (e.g. promotion weeks behave differently from normal weeks).
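The weekly-to-daily split with per-week overrides can be sketched as follows. The dictionary layout keyed by ISO week and the function name are assumptions for illustration.

```python
def disaggregate_to_days(weekly_forecast, default_profile, overrides=None):
    """Split each weekly total into Mon-Sun values using a 7-weight profile.

    weekly_forecast: dict of ISO week -> forecast units
    overrides:       optional dict of ISO week -> alternative 7-weight profile
    """
    overrides = overrides or {}
    daily = {}
    for week, units in weekly_forecast.items():
        profile = overrides.get(week, default_profile)
        if len(profile) != 7 or abs(sum(profile) - 1.0) > 1e-6:
            raise ValueError("profile must have 7 weights summing to 1")
        daily[week] = [units * w for w in profile]
    return daily

default_profile = [0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2]
promo_profile = [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5]   # weekend-heavy promotion week
daily = disaggregate_to_days({21: 70.0, 22: 40.0}, default_profile,
                             overrides={22: promo_profile})
```

Because every profile sums to 1, the daily values always add back up to the weekly forecast, so the split never changes total demand.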
Every forecast is backtested on a held-out tail of recent history. The holdout window size is user-configurable (deck-aligned default: 7 weeks). Each forecast reports seven accuracy metrics:
| Metric | What it tells you |
|---|---|
| MAPE | Mean Absolute Percentage Error — most common, sensitive to zero actuals |
| RMSE | Root Mean Squared Error — penalises large errors more |
| MAE | Mean Absolute Error — robust to outliers |
| SMAPE | Symmetric MAPE — robust to zero actuals |
| MASE | Mean Absolute Scaled Error — <1 beats naive forecast |
| R² | Coefficient of determination — variance explained |
| PI 80% | % of holdout actuals inside the 80% confidence band — the displayed CI band is honest when this is near 80% |
Confidence bands use the proper z=1.282 × residual_std formula (80% normal), so the band corresponds to the PI 80% coverage metric on every chart.
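The seven metrics and the band check can be computed in a few lines. This is a sketch under assumptions: `residual_std` is passed in by the caller (the source does not say which residuals it is computed from), MASE is scaled by a one-step naive forecast on the training series, and MAPE skips zero actuals.

```python
import math

def holdout_metrics(actuals, preds, train, residual_std, z=1.282):
    """Holdout accuracy metrics plus coverage of the z * residual_std band."""
    n = len(actuals)
    errors = [a - p for a, p in zip(actuals, preds)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # MAPE skips zero actuals, where the percentage error is undefined.
    pct = [abs(e) / abs(a) for e, a in zip(errors, actuals) if a != 0]
    mape = 100.0 * sum(pct) / len(pct) if pct else float("nan")

    sym = [2.0 * abs(e) / (abs(a) + abs(p))
           for e, a, p in zip(errors, actuals, preds) if abs(a) + abs(p) > 0]
    smape = 100.0 * sum(sym) / len(sym) if sym else 0.0

    # MASE: holdout MAE relative to a one-step naive forecast on the training data.
    naive_mae = sum(abs(train[i] - train[i - 1])
                    for i in range(1, len(train))) / (len(train) - 1)
    mase = mae / naive_mae if naive_mae > 0 else float("inf")

    mean_actual = sum(actuals) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actuals)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # 80% band: prediction +/- z * residual_std, then share of actuals inside it.
    half_width = z * residual_std
    inside = sum(1 for a, p in zip(actuals, preds)
                 if p - half_width <= a <= p + half_width)
    pi80 = 100.0 * inside / n

    return {"MAPE": mape, "RMSE": rmse, "MAE": mae, "SMAPE": smape,
            "MASE": mase, "R2": r2, "PI80": pi80}

metrics = holdout_metrics(
    actuals=[100.0, 110.0, 90.0, 100.0],
    preds=[90.0, 110.0, 100.0, 100.0],
    train=[80.0, 90.0, 100.0, 90.0],
    residual_std=5.0,
)
```

In this toy example the band covers only half of the holdout actuals, well under the nominal 80%, which is the signal that the displayed band is too narrow for that series.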
The Variance Analysis Diagnostic Panel surfaces forecast-quality structure at portfolio level — designed to answer "where should I focus my tuning efforts?":