HorizonAI SKU Forecaster

🚀 Getting Started

SKU Forecaster helps you predict future demand for your products using advanced time series forecasting. Here's how to get started:

1. Load your data: Upload a CSV/Excel file or connect to a database using Pipelines
2. Map your columns: Tell the app which columns contain SKU IDs, dates, and values
3. Select SKUs: Choose which products to forecast (or select all)
4. Run forecast: Click "Run Forecast" and select your preferred model
5. Review & export: Analyze results, compare models, and export to CSV

💡 Quick Start

For the fastest results, drag and drop a CSV file onto the app. It will auto-detect your columns and you can start forecasting immediately.

🔌 Data Sources & Pipelines

Pipelines let you connect to external data sources and automatically refresh your forecasts. Create a pipeline once, run it anytime.

Supported Connectors

Connector	Use Case	Status
🗄️ SQLite	Local database files (.db, .sqlite)	Full Support
🐘 PostgreSQL	Production databases, data warehouses	Full Support
🔷 SQL Server	Azure SQL, Microsoft SQL Server	Full Support
❄️ Snowflake	Cloud data warehouse	Full Support
🗂️ Parquet	Columnar data files, data lakes	Full Support
📊 CSV/Excel	File uploads (drag & drop)	Full Support

Creating a Pipeline

1. Click + New Pipeline in the sidebar
2. Select your data source type
3. Enter connection details (host, credentials, etc.)
4. Click Test Connection to verify
5. Preview data and map columns
6. Save the pipeline for future use

Custom SQL Queries

All database connectors support custom SQL queries. Use this to:

Filter data by date range
Join multiple tables
Aggregate data at different levels
Apply business logic before forecasting

SELECT 
    product_id as sku,
    sale_date as date,
    SUM(quantity) as value
FROM sales
WHERE sale_date >= '2023-01-01'
GROUP BY product_id, sale_date
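
If you want to check a query like this before saving it in a pipeline, you can try it locally first. The sketch below is only an illustration: it runs the same query against an in-memory SQLite database using Python's standard sqlite3 module. The sample rows and the load_daily_totals helper are made up for the example, and the start date is passed as a query parameter instead of being written into the SQL.

```python
import sqlite3

# Same shape as the query above; the start date is a bound parameter (?).
QUERY = """
SELECT
    product_id AS sku,
    sale_date AS date,
    SUM(quantity) AS value
FROM sales
WHERE sale_date >= ?
GROUP BY product_id, sale_date
ORDER BY product_id, sale_date
"""


def load_daily_totals(conn, start_date):
    """Return (sku, date, value) rows aggregated per SKU and day."""
    return conn.execute(QUERY, (start_date,)).fetchall()


# Hypothetical demo data so the sketch runs without an external database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id TEXT, sale_date TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("A1", "2022-12-31", 5),   # excluded by the date filter
        ("A1", "2023-01-01", 3),
        ("A1", "2023-01-01", 4),   # same SKU and day, summed with the row above
        ("B2", "2023-01-02", 10),
    ],
)

rows = load_daily_totals(conn, "2023-01-01")
print(rows)
```

The result has exactly the three columns the app expects (sku, date, value), one row per SKU and date.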

📈 Running Forecasts

Basic Forecasting

1. Load your data (file upload or pipeline)
2. Select SKUs from the list (or click "Select All")
3. Set Forecast Horizon (number of periods to predict)
4. Choose a forecasting model (or use Auto)
5. Click Run Forecast

Column Mapping

For accurate forecasts, map these columns correctly:

📦 SKU Column

Unique product identifier (SKU, item_id, product_code)

📅 Date Column

Time period (date, week, month, period)

📊 Value Column

What to forecast (sales, units, revenue, quantity)
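
To make the three roles concrete, here is a small sketch of what column mapping amounts to: renaming your source columns to sku, date, and value and dropping the rest. It is not the app's own code. The source column names (item_id, week, units) and the map_columns helper are hypothetical.

```python
import csv
import io

# Hypothetical mapping from source column names to the three roles.
COLUMN_MAP = {"item_id": "sku", "week": "date", "units": "value"}


def map_columns(rows, column_map):
    """Rename mapped columns, drop unmapped ones, and parse value as a number."""
    mapped = []
    for row in rows:
        missing = [src for src in column_map if src not in row]
        if missing:
            raise KeyError(f"missing columns: {missing}")
        record = {role: row[src] for src, role in column_map.items()}
        record["value"] = float(record["value"])
        mapped.append(record)
    return mapped


# Inline sample standing in for an uploaded CSV file.
sample = io.StringIO(
    "item_id,week,units,store\n"
    "A1,2023-01-02,12,NYC\n"
    "A1,2023-01-09,15,NYC\n"
)
records = map_columns(csv.DictReader(sample), COLUMN_MAP)
print(records)
```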

Forecast Options

Horizon: Number of future periods (default: 12)
Model: Forecasting algorithm (Auto recommended)
Confidence: Prediction intervals shown on charts

🧮 Forecasting Models

The app includes multiple forecasting methods. Use Auto to let the system choose the best model for each SKU.

🔄 Auto (Recommended)

Runs a leaderboard of all models below and picks the lowest-MAPE winner per SKU. With covariates enabled, the tree models (RF/GB/LightGBM) see event indicators + price as features and typically dominate.
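
The idea behind a per-SKU leaderboard can be sketched in a few lines. This example is an illustration only: it uses three toy candidates (naive, mean, seasonal naive) instead of the app's real models, holds out the last four periods, and picks the candidate with the lowest holdout MAPE. All function names and the sample series are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent, skipping zero actuals."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        return float("inf")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def naive(train, horizon):
    """Repeat the last observed value."""
    return [train[-1]] * horizon


def mean_forecast(train, horizon):
    """Repeat the historical average."""
    return [sum(train) / len(train)] * horizon


def seasonal_naive(train, horizon, season=4):
    """Repeat the last full season (season length is an assumption)."""
    last = train[-season:]
    return [last[i % season] for i in range(horizon)]


CANDIDATES = {"naive": naive, "mean": mean_forecast, "seasonal_naive": seasonal_naive}


def pick_winner(series, holdout=4):
    """Score every candidate on the holdout and return (winner, scores)."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: mape(test, fn(train, len(test))) for name, fn in CANDIDATES.items()}
    return min(scores, key=scores.get), scores


history = {
    "SKU-A": [10, 20, 30, 40, 10, 20, 30, 40, 10, 20, 30, 40],  # strongly seasonal
    "SKU-B": [50, 52, 48, 50, 52, 48, 50, 54, 49, 51, 50, 50],  # flat with noise
}
winners = {sku: pick_winner(series)[0] for sku, series in history.items()}
print(winners)
```

Different SKUs can end up with different winners, which is the point of choosing per SKU instead of once for the whole dataset.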

📊 ETS

Exponential smoothing with trend and seasonality. Great for stable demand. State-space, no exogenous regressors.

📉 ARIMA

Autoregressive integrated moving average model. Good for data with trends and autocorrelation.

🔮 Prophet

Additive model open-sourced by Facebook (now Meta). Handles holidays and multiple seasonalities.

🌲 LightGBM

Gradient-boosted trees on lag features. Accepts exogenous covariates (events, price) when enabled.

🌳 Random Forest

Tree ensemble on lag features. Accepts exogenous covariates. Robust on noisy data with nonlinear interactions.

📈 Gradient Boost

scikit-learn's gradient boosting on lag features. Accepts exogenous covariates. Often wins on smooth nonlinear structure.

🌀 Centrifuge

Bundled decomposition: level + seasonal + ML weighted blend. Returns named component weights for audit. Accepts covariates inside its ML slot.

⚡ Croston / SBA / TSB

For intermittent/sporadic demand (lots of zeros).
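
For readers who have not met these methods, here is a minimal sketch of textbook Croston and its SBA variant. It is not the app's implementation. The smoothing constant alpha=0.1 and the way the first demand initialises the state are assumptions, and TSB is left out for brevity. The method smooths the size of non-zero demands and the gap between them separately, then forecasts their ratio.

```python
def croston(demand, alpha=0.1, variant="croston"):
    """Forecast the per-period demand rate of an intermittent series.

    variant: "croston" (classic) or "sba" (Syntetos-Boylan Approximation).
    """
    size = None        # smoothed size of non-zero demands
    interval = None    # smoothed number of periods between demands
    periods_since = 1  # periods elapsed since the last non-zero demand

    for qty in demand:
        if qty > 0:
            if size is None:
                # Initialise from the first observed demand (one common choice).
                size, interval = float(qty), float(periods_since)
            else:
                size += alpha * (qty - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1

    if size is None:
        return 0.0  # no demand observed at all

    rate = size / interval
    if variant == "sba":
        rate *= 1 - alpha / 2  # SBA bias correction
    return rate


sporadic = [0, 0, 6, 0, 0, 6, 0, 0, 6]
print(croston(sporadic), croston(sporadic, variant="sba"))
```

In this sample, six units arrive every third period, so the classic forecast is two units per period and SBA scales it down slightly.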

Model Selection Tips

Short history (<24 periods): Use ETS or Simple Average
Seasonal patterns: Use ETS, Prophet, or Auto
Intermittent demand: Use Croston, SBA, or TSB
Have event / price data: Use Auto with covariates enabled — the tree models pick up the signal and Auto picks the best fit per SKU
Need component-level explainability: Use Centrifuge (returns level / seasonal / ML weights)
Not sure: Use Auto. It scores every model on your data and picks the best one per SKU

💡 Install Optional Dependencies

For best results, install: pip install statsmodels prophet lightgbm. Without these, the app falls back to simpler methods.
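
To see which of these packages are already present in your Python environment, a quick check like the following may help. It is a sketch using only the standard library, and missing_packages is a hypothetical helper name.

```python
import importlib.util

# Optional packages named in the install command above.
OPTIONAL_PACKAGES = ["statsmodels", "prophet", "lightgbm"]


def missing_packages(names):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(OPTIONAL_PACKAGES)
if missing:
    print("Missing optional packages. Try: pip install " + " ".join(missing))
else:
    print("All optional forecasting packages are installed.")
```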

📦 Large Datasets

The app handles datasets with millions of rows and thousands of SKUs efficiently.

Large Dataset Features

Automatic detection: Datasets over 5M rows trigger large dataset mode
Background processing: Forecasts run as background jobs
Parallel workers: Multiple CPUs used for faster processing
Progress tracking: Real-time updates on job progress
Batch processing: SKUs processed in efficient batches
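
Batching with parallel workers follows a common pattern, sketched below. This is an assumption about the general approach, not the app's actual job runner. forecast_batch is a placeholder for the real work, and a thread pool is used to keep the example short (a process pool from the same module is the usual choice for CPU-bound forecasting).

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def forecast_batch(batch):
    """Placeholder for the real per-batch forecasting work."""
    return {sku: "done" for sku in batch}


def run_in_batches(skus, batch_size=100, workers=4):
    """Process batches on a worker pool and report progress as each one finishes."""
    results = {}
    batches = make_batches(skus, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for done, partial in enumerate(pool.map(forecast_batch, batches), start=1):
            results.update(partial)
            print(f"batch {done}/{len(batches)} complete")
    return results


all_results = run_in_batches([f"SKU-{i}" for i in range(250)], batch_size=100)
print(len(all_results))
```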

SKU Selection Options

🎯 All SKUs

Process every SKU in the dataset

📊 Top N by Volume

Focus on highest-volume products

🎲 Random Sample

Quick test on random subset

✅ Manual Selection

Pick specific SKUs to forecast

Background Jobs

For large forecasts, jobs run in the background:

1. Configure your forecast settings
2. Click Start Background Job
3. Monitor progress in the Jobs panel
4. Results auto-save when complete
5. Download or view results anytime

🆕 New Product Forecasting

Forecast demand for SKUs with no sales history. skuf.ai offers three complementary approaches in the purple New Product Mode panel — pick by what data you have and what question you need answered.

Interactive walkthrough — 8 slides, ~3 minutes

See the full new-product flow with mock UI screenshots: the AI / Rule analog toggle, the Attribute-Based Launch Model output, and the recommended combined workflow.

Step 1 — Define attributes

Open New Product Mode → set the new SKU's attributes (category, subcategory, color, brand, price, etc.). The dropdowns pull from your active dataset, so only values actually present in your catalog appear. The badge counter shows how many attributes you've set.

Step 2 — Pick your method

🔍 Analog Blend (Find Similar SKUs)

Find historical SKUs that resemble your new one, then weighted-average their forecasts. A toggle above the button lets you pick how analogs get chosen:

🤖 AI-Powered

Claude reads your target attributes and the catalog, returns analogs with a one-line reason per pick. Catches semantic similarity (e.g. T-shirts as analogs for Polos when subcategory differs). Slower (LLM round-trip) but interpretable.

📐 Rule-Based

Deterministic exact-attribute scoring: +2 per exact match, +1.5 for "similar" numerics. Returns the matches per analog so the ranking is defensible. Fast, repeatable, no LLM dependency.

After analogs are returned, adjust the selected set (the top 5 by score are picked by default), then click Run Forecast. Result is a weekly forecast for the new SKU, blended from analog histories.
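
The rule-based scoring and blending can be illustrated with a short sketch. The +2 and +1.5 points and the top-5 default come from the description above. Everything else is an assumption made for the example: the 20% tolerance that defines a "similar" numeric, the use of scores as blend weights, and all function names and sample data.

```python
def score_analog(target, candidate, numeric_tolerance=0.2):
    """Return (score, matches): +2 per exact match, +1.5 per similar numeric."""
    score, matches = 0.0, []
    for attr, wanted in target.items():
        have = candidate.get(attr)
        if have is None:
            continue
        if have == wanted:
            score += 2.0
            matches.append(f"{attr}: exact")
        elif isinstance(wanted, (int, float)) and isinstance(have, (int, float)):
            # Assumed rule: numerics within 20% of the target count as similar.
            if wanted != 0 and abs(have - wanted) / abs(wanted) <= numeric_tolerance:
                score += 1.5
                matches.append(f"{attr}: similar")
    return score, matches


def top_analogs(target, catalog, k=5):
    """Rank catalog SKUs by score and keep the best k with a positive score."""
    scored = [(sku, *score_analog(target, attrs)) for sku, attrs in catalog.items()]
    scored = [entry for entry in scored if entry[1] > 0]
    return sorted(scored, key=lambda entry: entry[1], reverse=True)[:k]


def blend(analogs, histories):
    """Score-weighted average of analog histories (weighting is an assumption)."""
    total = sum(score for _, score, _ in analogs)
    length = min(len(histories[sku]) for sku, _, _ in analogs)
    return [
        sum(score * histories[sku][t] for sku, score, _ in analogs) / total
        for t in range(length)
    ]


target = {"category": "Tops", "brand": "Acme", "price": 20.0}
catalog = {
    "S1": {"category": "Tops", "brand": "Acme", "price": 22.0},
    "S2": {"category": "Tops", "brand": "Other", "price": 40.0},
    "S3": {"category": "Shoes", "brand": "Zed", "price": 90.0},
}
analogs = top_analogs(target, catalog)
weekly = blend(analogs, {"S1": [10, 10], "S2": [25, 25]})
print(analogs, weekly)
```

Returning the list of matches next to each score is what makes the ranking easy to defend: you can show exactly which attributes earned each analog its place.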

🧠 Attribute-Based Launch Model

A Ridge + Random Forest ensemble regression trained on every historical SKU's attributes → first-period sales volume. Predicts a single launch volume for the new item directly from its attributes — no analog matching needed.

Output: predicted first-period volume + 80% prediction interval, implied total life volume, feature importances, R² of the fit.
Needs: ≥3 historical SKUs to train (≥20 to engage Random Forest; Ridge always runs).
Best when: the new SKU's attributes are well-represented in training and you want a quick number with a defensible prediction interval rather than a weekly curve.
Pairs with Sell-Through: the modal has an "Auto-Fill Sell-Through" button that pipes implied_total_life into the decay-curve panel to convert the point estimate into a weekly schedule.

📅 Seasonal Component Forecast

Match the new SKU's attributes to a 52-week profile in your seasonality library, then multiply by a Base Sales Rate (BSR) the library carries from analog historical SKUs. Best when you have a strong matching profile and trust the analog calibration. See the methodology page section 4 (BSR) and section 5 (Seasonality library) for the math.
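
At its core, this method multiplies a base weekly rate by a seasonal index. The sketch below shows only that arithmetic and is not the app's implementation. It assumes the 52-week profile is an index that averages 1.0 across the year, and the function name and sample profile are hypothetical.

```python
def seasonal_forecast(base_sales_rate, weekly_index, start_week=1, horizon=52):
    """Scale a base weekly rate by a 52-week seasonal index.

    start_week is 1-based; the profile wraps around at the end of the year.
    """
    if len(weekly_index) != 52:
        raise ValueError("weekly_index must contain 52 values")
    return [
        base_sales_rate * weekly_index[(start_week - 1 + i) % 52]
        for i in range(horizon)
    ]


# Hypothetical profile: a slow first half of the year and a busy second half.
profile = [0.5] * 26 + [1.5] * 26
print(seasonal_forecast(100, profile, start_week=26, horizon=2))
```

A launch in week 26 with a BSR of 100 starts at half the base rate and then jumps as the profile enters its busy half.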

Common attribute columns

Category / Subcategory
Price point / Price tier
Brand
Size / Pack size
Color / Style
Channel (online, retail, wholesale)
Region / Market

💡 Pick the right method

Strong library + matching attributes → Seasonal Component.
Need weekly curve + history-based defensibility → Analog Blend.
≥20 historical SKUs + just need launch volume → Attribute-Based Launch Model.

Many teams use Attribute-Based to set initial buy quantity and Analog Blend or Seasonal Component to schedule weekly receipts.

📊 Covariates & External Data

Improve forecast accuracy by feeding skuf.ai the upstream signals that drive demand. The platform supports two distinct paths:

Path A — Auto-built event + price covariates (recommended)

When you create planning events and your dataset has a price column, skuf.ai can automatically build a per-SKU exogenous matrix and pass it to the tree-based models in Auto's leaderboard (Random Forest, Gradient Boost, LightGBM) at fit time. Set useExogenousCovariates: true on the job config and the system handles the rest:

One indicator column per active event type (has_holiday, has_promotion, has_other) — set to 1 in weeks the event covers, 0 otherwise.
One column for log price (when available).
Both historical and forecast-horizon weeks get rows, so the tree models can use event and price effects on the backtest holdout (which is what improves the measured MAPE) and on future weeks.
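
The matrix described above might look like the following sketch. The column names has_holiday, has_promotion, has_other and the log price column follow the description. The event and price structures, the week-overlap rule, and the build_covariates helper are assumptions made for illustration.

```python
import math
from datetime import date, timedelta

EVENT_TYPES = ["holiday", "promotion", "other"]


def build_covariates(week_starts, events, prices):
    """Build one row per week with event indicators and log price."""
    rows = []
    for week_start in week_starts:
        week_end = week_start + timedelta(days=6)
        row = {"week": week_start.isoformat()}
        for event_type in EVENT_TYPES:
            # An event is active if its date range overlaps the week.
            active = any(
                e["type"] == event_type
                and e["start"] <= week_end
                and e["end"] >= week_start
                for e in events
            )
            row[f"has_{event_type}"] = 1 if active else 0
        price = prices.get(week_start)
        row["log_price"] = math.log(price) if price else None
        rows.append(row)
    return rows


# Three consecutive weeks; in practice the list spans history and the horizon.
weeks = [date(2024, 1, 1) + timedelta(weeks=i) for i in range(3)]
events = [{"type": "promotion", "start": date(2024, 1, 10), "end": date(2024, 1, 12)}]
prices = {weeks[0]: 10.0, weeks[1]: 10.0, weeks[2]: 8.0}

matrix = build_covariates(weeks, events, prices)
for row in matrix:
    print(row)
```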

📐 Measured impact

On a synthetic specialty-retailer benchmark (45 SKUs × 104 weeks, 5 event types, price elasticity), enabling auto-built covariates dropped Auto's mean MAPE from 33.14% to 24.41% on the same backtest — a 26% relative reduction with no model change, just better inputs.

Path B — Custom column covariates (advanced)

For exogenous signals beyond events and price (weather, marketing spend, macro indicators), include them as columns in your data file and select them in the Covariates tab.

1. Include covariate columns in your data file
2. Open the Covariates tab
3. Select which columns to use
4. Set lag values if needed (e.g., price effect delayed 1 period)
5. Run the forecast; covariates are included automatically for the models that support them
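
A lag of 1 simply means each period sees the covariate value from the period before. The helper below is a hypothetical sketch of that shift, not the app's own code.

```python
def lag_series(values, lag, fill=None):
    """Shift a series forward by lag periods, padding the start with fill."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    kept = max(len(values) - lag, 0)
    return [fill] * (len(values) - kept) + list(values[:kept])


weekly_price = [10.0, 12.0, 9.0, 9.5]
print(lag_series(weekly_price, 1))
```

The first period has no earlier value to draw on, so it is padded with the fill value.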

Which models use covariates

Random Forest, Gradient Boost, LightGBM — consume exogenous columns alongside autoregressive lag features.
Centrifuge with RF / GB in its ML slot — same matrix flows in.
SARIMAX, Prophet: both support exogenous regressors natively, but the app does not pass covariates to them yet (additional wiring planned).
ETS, ARIMA, Croston, SBA, TSB, simple, trend — state-space or sparse-demand models, no exog by design.

⚠️ Future Values Required

Custom column covariates require values for the forecast horizon. If you include "price" as a covariate, you must provide planned prices for the upcoming weeks. Auto-built event covariates handle this automatically — the planning_events table covers forward dates as well as historical ones.

🤖 AI Features

AI-powered features help you understand your data and get actionable insights.

AI Insights

Get natural language explanations of your forecast results:

Why a forecast is trending up or down
Key patterns and seasonality detected
Anomalies and outliers flagged
Recommendations for improvement

Dataset Intelligence

Automatic analysis of your data quality and patterns:

Data completeness checks
Outlier detection
Seasonality analysis
SKU classification (smooth, intermittent, lumpy)
Relationship mapping between SKUs
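
SKU demand classes are often assigned with the Syntetos-Boylan scheme, which compares the average interval between demands (ADI) and the squared coefficient of variation of demand sizes (CV²) against cutoffs of 1.32 and 0.49. The sketch below shows that scheme as an illustration. Whether the app uses these exact rules is an assumption, and the scheme's fourth class, erratic, is not in the list above.

```python
from statistics import mean, pstdev


def classify_demand(series, adi_cut=1.32, cv2_cut=0.49):
    """Classify a demand series as smooth, erratic, intermittent, or lumpy."""
    nonzero = [x for x in series if x > 0]
    if not nonzero:
        return "no demand"
    adi = len(series) / len(nonzero)                # average periods per demand
    cv2 = (pstdev(nonzero) / mean(nonzero)) ** 2    # variability of demand size
    if adi < adi_cut:
        return "smooth" if cv2 < cv2_cut else "erratic"
    return "intermittent" if cv2 < cv2_cut else "lumpy"


print(classify_demand([10, 11, 9, 10, 12, 10]))
print(classify_demand([0, 0, 5, 0, 0, 5, 0, 0, 5]))
print(classify_demand([0, 0, 1, 0, 0, 20, 0, 0, 2]))
```

The class is a useful hint for model choice: intermittent and lumpy SKUs are the ones where Croston, SBA, or TSB tend to be worth trying.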

AI Agent

Natural language interface to run complex analyses:

"Forecast the top 50 SKUs by volume for the next 6 months"
"Find products with declining sales trends"
"Compare ETS vs ARIMA accuracy for seasonal items"

💡 API Key Required

AI features require an Anthropic API key. Add it in Settings → API Keys, or set the ANTHROPIC_API_KEY environment variable.
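
If you use the environment variable, you can confirm it is visible to a process with a check like this one. It is a sketch only. anthropic_key_configured is a hypothetical helper, and it reports whether a key is set, not whether the key is valid.

```python
import os


def anthropic_key_configured(env=os.environ):
    """Return True if ANTHROPIC_API_KEY is set to a non-blank value."""
    return bool(env.get("ANTHROPIC_API_KEY", "").strip())


if anthropic_key_configured():
    print("ANTHROPIC_API_KEY is set.")
else:
    print("ANTHROPIC_API_KEY is not set; AI features will be unavailable.")
```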

📐 Size Profile Libraries

For apparel and any catalog with a size dimension, learn how demand splits across sizes within each style group — then use the curve to disaggregate a style-level forecast into per-size forecasts.

Train and apply size profile libraries

List existing libraries, train a new one from any dataset (pick a size column + the attributes to group by), and apply a library to a forecast vector to split it across sizes.

When you apply a library, the response includes shares (the share each size carries within the matched attribute group), forecastBySize (the per-size forecast vectors), and matched (exact, bayesian when you supplied actualSizeSales for a blend, or none if no group matched).
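
The relationship between shares and forecastBySize can be sketched as a simple proportional split. This illustrates the exact-match case only. It assumes each period's style-level forecast is multiplied by each size's share, and the Bayesian blend with actualSizeSales is not shown. split_by_size is a hypothetical name.

```python
def split_by_size(forecast, shares):
    """Split a style-level forecast vector into per-size vectors by share."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    # Dividing by the total keeps the split valid even if shares do not sum to 1.
    return {
        size: [value * share / total for value in forecast]
        for size, share in shares.items()
    }


shares = {"S": 0.25, "M": 0.5, "L": 0.25}
forecast_by_size = split_by_size([100, 200], shares)
print(forecast_by_size)
```

The per-size vectors always add back up to the style-level forecast, period by period.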

📋 Results & Export

Understanding Results

Each forecast result includes:

Point forecast: Expected value for each period
Prediction intervals: Upper and lower bounds (typically 80%)
Model used: Which algorithm generated the forecast
Accuracy metrics: MAPE, MAE, RMSE from backtesting

Forecast Accuracy (MAPE)

MAPE Range	Interpretation
< 10%	Excellent - highly accurate
10-20%	Good - reliable for planning
20-30%	Acceptable - use with caution
> 30%	Poor - consider more data or different model
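
For reference, the three accuracy metrics and the bands in the table can be computed as in this sketch. The function names are hypothetical. Because the table leaves the boundaries open, the sketch assumes that exactly 10% counts as Good, exactly 20% as Acceptable, and exactly 30% as Acceptable.

```python
import math


def accuracy_metrics(actual, forecast):
    """Return MAPE (percent), MAE, and RMSE for paired actuals and forecasts."""
    errors = [a - f for a, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # MAPE is undefined for zero actuals, so those periods are skipped.
    pct = [abs(e) / abs(a) for e, a in zip(errors, actual) if a != 0]
    mape = 100.0 * sum(pct) / len(pct) if pct else float("nan")
    return {"mape": mape, "mae": mae, "rmse": rmse}


def interpret_mape(mape):
    """Map a MAPE value in percent to the bands in the table above."""
    if mape < 10:
        return "Excellent"
    if mape < 20:
        return "Good"
    if mape <= 30:
        return "Acceptable"
    return "Poor"


metrics = accuracy_metrics([100, 200, 400], [110, 180, 400])
print(metrics, interpret_mape(metrics["mape"]))
```

MAPE is a percentage and easy to compare across SKUs, while MAE and RMSE are in the same units as the value column.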

Export Options

📊 CSV Export

Download forecasts as CSV for Excel/BI tools

💾 Save Results

Save to app for later viewing

📈 Charts

Interactive charts with zoom/pan

🔗 API Access

Access results via REST API

Saved Results

Access previous forecasts from the sidebar under "Forecast History". Each saved result includes:

Full forecast data
Model configuration used
Timestamp and metadata
Quick re-run capability

🔧 Troubleshooting

Common Issues

❌ "Python not found" error

Install Python 3.9+ and ensure it's in your PATH. To verify, run python --version.

❌ Forecasts fail with "Exit 1"

This often means a required Python package is missing. Install the core dependencies: pip install pandas numpy statsmodels

❌ Database connection failed

Verify credentials, check firewall rules, and ensure the database allows remote connections.

❌ "No data" after column mapping

Check that your SKU/date columns are correct. Dates should be parseable (YYYY-MM-DD works best).
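
To find the rows that are likely causing the problem, you can scan the date column before uploading. The sketch below uses only the standard library. The list of accepted formats is an assumption for the example and does not describe which formats the app's parser supports.

```python
from datetime import datetime

# Formats to try, in order; extend this list to match your data.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y"]


def unparseable_dates(values, formats=CANDIDATE_FORMATS):
    """Return the values that match none of the candidate date formats."""
    bad = []
    for value in values:
        for fmt in formats:
            try:
                datetime.strptime(value.strip(), fmt)
                break  # parsed successfully, move on to the next value
            except ValueError:
                continue
        else:
            bad.append(value)
    return bad


sample_dates = ["2023-01-31", "01/31/2023", "Jan 31st", "2023-02-30"]
print(unparseable_dates(sample_dates))
```

This catches impossible calendar dates as well as unexpected formats.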

❌ Slow performance with large files

Use database connectors instead of CSV for 1M+ rows. Enable background jobs for large forecasts.

Getting Help

Check server logs: npm run dev shows detailed errors
Browser console: Press F12 → Console tab for frontend errors
Test endpoints: Run the test suite with npm test

Performance Tips

Use Top N selection for initial testing
Filter data in SQL queries before loading
Install statsmodels so the app does not fall back to simpler methods
Use background jobs for 1000+ SKUs
Enable parallel workers (automatic on multi-core systems)