Demand Forecasting for Regional Distribution Centers
Built a time series forecasting system in R that compared five models on 2.5 years of weekly distribution center data, then deployed the winning Prophet model as a production-ready Dockerized REST API.

Project overview
Regional distribution centers run blind without accurate demand forecasts: overstaffing burns budget, understaffing congests operations. This project used the tidymodels + modeltime + timetk ecosystem to compare five forecasting models (ARIMA Regression, ARIMA Boost, Prophet, Exponential Smoothing, and Seasonal Decomposition) on weekly shipment data enriched with external regressors such as temperature, a fuel cost index, and a holiday-peak flag. The winning model, Prophet, was packaged with vetiver, containerized in Docker, and validated through a live REST API endpoint, producing an end-to-end forecasting system from raw data to production deployment.
Problem
Distribution center managers lack reliable tools for predicting weekly unit volumes, forcing them to guess on staffing levels. Guessing too high wastes labor costs; guessing too low creates operational congestion and customer-facing delays. The challenge was building a forecasting system that could reliably anticipate demand across both normal weeks and sharp seasonal spikes, using 2.5 years of historical data enriched with external signals like temperature, fuel costs, and holiday timing.
Approach / process
I used a defensive forecasting strategy that evaluated both univariate models (Exponential Smoothing, Seasonal Decomposition) and regressor-based models (Prophet, ARIMA Regression, ARIMA Boost) side by side. A key design decision was comparing these two families rather than assuming external features would always help — this allowed the data to reveal which signal source was actually more valuable. All models were built within the tidymodels + modeltime framework, evaluated on a 24-week time-based test split, and compared on MAE, MAPE, RMSE, and R².
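The evaluation setup above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: it assumes a weekly data frame named `shipments` with a `date` column and a `units` outcome (hypothetical names), and uses timetk's `time_series_split()` to carve off the last 24 weeks as the time-based test set.

```r
# Sketch of the time-based evaluation split (assumed data frame `shipments`
# with columns `date` and `units`; names are illustrative).
library(timetk)
library(modeltime)
library(rsample)

# Hold out the final 24 weeks; `cumulative = TRUE` trains on all prior history
splits <- time_series_split(
  shipments,
  date_var   = date,
  assess     = "24 weeks",
  cumulative = TRUE
)

train <- training(splits)
test  <- testing(splits)

# Once the five models are fitted and collected in a modeltime_table,
# they are calibrated on the holdout and compared on MAE, MAPE, RMSE, R²:
# models_tbl %>%
#   modeltime_calibrate(new_data = test) %>%
#   modeltime_accuracy()
```

A time-ordered split matters here: a random split would leak future weeks into training and overstate accuracy on seasonal data.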
Implementation details
Built a complete R pipeline in forecast_pipeline.R using modeltime, timetk, and tidymodels. Defined a shared recipe with date, is_peak_period, avg_temp_f, and transport_cost_idx as regressors, and wrapped each model spec in a tidymodels workflow() before fitting to ensure vetiver compatibility downstream. Assembled all five models into a modeltime_table, calibrated them against the test split, and generated a model accuracy comparison and a faceted forecast plot.

The winning Prophet model was refit on the full dataset and scored over a 24-week forward forecast window using seasonally naive regressor projections. It was then packaged with vetiver_model(), pinned to a local board, containerized with Docker, and validated via live httr endpoint calls. The trained model object was also uploaded to S3 for external scoring.
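The workflow-wrapping and deployment steps can be condensed into a sketch like the following. It assumes the `train`/`test` splits and column names described above; the model name `"dc_demand_prophet"` and the pins folder path are illustrative, and only two of the five model specs are shown for brevity.

```r
# Condensed sketch of the modeling + deployment pipeline (assumed objects:
# `train`, `test`; assumed names: "dc_demand_prophet", "pins" board folder).
library(tidymodels)
library(modeltime)
library(vetiver)
library(pins)

# Shared recipe: date plus the three external regressors
rec <- recipe(
  units ~ date + is_peak_period + avg_temp_f + transport_cost_idx,
  data = train
)

# Each spec is wrapped in a workflow() so vetiver can serve it later
wflw_prophet <- workflow() %>%
  add_recipe(rec) %>%
  add_model(prophet_reg() %>% set_engine("prophet")) %>%
  fit(train)

wflw_arima <- workflow() %>%
  add_recipe(rec) %>%
  add_model(arima_reg() %>% set_engine("auto_arima")) %>%
  fit(train)

# Assemble, calibrate on the 24-week holdout, compare accuracy
models_tbl <- modeltime_table(wflw_prophet, wflw_arima)  # ...plus the other three
calib_tbl  <- models_tbl %>% modeltime_calibrate(new_data = test)
calib_tbl %>% modeltime_accuracy()

# Refit the winner on the full series, then pin and package for serving
best_fit <- fit(wflw_prophet, dplyr::bind_rows(train, test))
v        <- vetiver_model(best_fit, "dc_demand_prophet")
board    <- board_folder("pins")
vetiver_pin_write(board, v)
vetiver_write_plumber(board, "dc_demand_prophet")  # generates plumber.R
vetiver_write_docker(v)                            # generates the Dockerfile
```

After `docker build` and `docker run`, the containerized endpoint can be smoke-tested with an `httr::POST()` to its `/predict` route, passing a data frame of future dates and regressor values encoded as JSON.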