What is time series demand forecasting?

How to build statistical forecasting models in the retail sector

Photo by Nathália Rosa on Unsplash

I would like to give detailed information about the Walmart dataset on Kaggle. My motivation for this project is that a good forecast helps us make better decisions about the future. Hence, I will be talking about how to build forecasting models in the retail sector.

Demand planning is part of the supply chain management process. Predicting future demand is one of the most valuable activities an organization can undertake. A demand plan's impact is felt throughout the business, from sales and marketing to manufacturing and distribution. When forecasting models are built correctly, demand planning positions the company to deliver superior customer service while meeting its financial objectives.

The failure of a statistical forecasting and planning approach can lead to a broad range of serious problems, such as

  • missed deadlines (unsatisfied customers)
  • lost credibility (a damaged reputation)
  • excess inventory (which can lead to waste as well as financial losses)

To reduce the high costs of failure, businesses need to utilize historical sales and adopt a sophisticated demand planning strategy that leverages data and marketing insights. This strategy starts with building accurate forecasting models. The forecasting methods covered here are as follows:

  1. Moving Average
  2. Simple Exponential Smoothing
  3. Double Exponential Smoothing
  4. Triple Exponential Smoothing

Before jumping into the forecasting models, I performed exploratory data analysis to understand how I could use the data in them.

Data Analysis

The training dataset includes historical weekly sales from February 2010 to November 2012. The table below shows 10 observations from the training dataset.

Figure 1: Snapshot of the dataset

The dataset includes historical sales data for 45 Walmart stores located in different regions. Each store contains many departments (81 in total), and my objective was to predict the department sales for each store accurately. First, I analyzed the total sales of the departments during 2011. According to the plot in Figure 2, the top 5 departments are dept. 92, dept. 95, dept. 38, dept. 72 and dept. 90. Although there is no detailed information about the departments, based on my few conversations with Walmart associates:

  • department 92 — dry grocery,
  • department 38 — apparel,
  • department 72 — electronics,
  • department 95 — beverage and snacks,
  • department 90 — dairy products

Figure 2: Department Sales across the U.S.

Afterward, I checked the total sales of the stores during 2011. The plot demonstrates that the top 5 stores are store 4, store 10, store 13, store 14 and store 20.

Figure 3: Store Sales across the U.S.

Besides, I performed a granularity analysis at the store & department level (the line graph in Figure 4 shows the weekly sales of department 72 at store 4). The three most prominent holidays are the Super Bowl, Thanksgiving, and Christmas, which fall within the following weeks in the dataset:

  • Thanksgiving — 25 November 2010
  • Christmas — 25 December 2010
  • Super Bowl — 6 February 2011
  • Thanksgiving — 25 November 2011
  • Christmas — 25 December 2011
  • Super Bowl — 5 February 2012

Based on the plot, Thanksgiving and Christmas played a significant role in the growth of sales (the weeks containing Thanksgiving and Christmas are the highest points in the dataset). The third-highest point demonstrates the Super Bowl effect.

Figure 4: Weekly Time Series Data

Forecasting Approach

To build efficient forecasting models, I needed to answer two questions:

  • Which stores and departments play a significant role in terms of business?
  • Is the data of each ‘store & dept’ sufficient to build a forecasting model?

According to the store and department plots above, I found the top departments and stores that contributed 95% of the total sales. Then, I checked the length of the time-series data for each store & department and realized that the weekly sales history differs from one store & department combination to another in the given dataset. My forecasting model does not produce a forecast for any store & department combination whose history is shorter than 13 weeks. Beyond that, I chose the forecasting method based on the size of the weekly sales history. For instance, to run the Holt-Winters method, I need sufficient data to decompose actual demand into level, trend and seasonality components. Hence, I applied the Holt-Winters model to the store & department datasets with at least 104 weeks of data (two full years). Based on the modeling framework, my holdout period is the last 13 weeks for each SKU, and I produced 39 forecasting periods for each SKU. A sketch of this selection rule follows.
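As a minimal sketch, the method selection can be written as a simple rule on history length. The thresholds follow the text above; the function name is hypothetical, and the separate sales-volume criterion used to route low-volume SKUs to the moving average is omitted for simplicity:

```python
def select_method(n_weeks: int) -> str:
    """Pick a forecasting method from the length of a SKU's sales history.

    Thresholds follow the rules described in the text.
    """
    if n_weeks < 13:
        return "no_forecast"          # too little history to forecast at all
    elif n_weeks < 52:
        return "simple_exponential"   # short history: level only
    elif n_weeks < 104:
        return "double_exponential"   # enough data for level + trend
    else:
        return "triple_exponential"   # two full years: level + trend + seasonality
```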

Figure 5: Forecasting Model

Afterward, I checked the forecasting lag of each store & dept, since there were data gaps in the forecasting model. For instance, department 47 in store 2 has a training dataset until 20 July 2012, yet its first forecasting period is 14 December 2012 (a 147-day forecasting lag). Data scientists have to be careful and know when their training set ends and the true forecasting period starts. Thus, at this initial step I eliminated the datasets with a forecasting lag of more than 7 days (to be considered in a later phase). A small sketch of this check follows.
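A minimal pandas sketch of the lag filter, assuming hypothetical per-SKU columns 'train_end' (last training week) and 'forecast_start' (first forecasting week), both already parsed as datetimes:

```python
import pandas as pd

def filter_by_forecast_lag(skus: pd.DataFrame, max_lag_days: int = 7) -> pd.DataFrame:
    """Keep only SKUs whose gap between the end of the training data
    and the first forecasting period is at most max_lag_days."""
    lag = (skus["forecast_start"] - skus["train_end"]).dt.days
    return skus[lag <= max_lag_days]
```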

Figure 6: Forecasting Lag

Pre-processing

When I checked the dataset, I observed negative and zero sales values, which represent returns from customers. There are 1258 negative sales values, and 73 observations did not have any sales in the train dataset. To build a sufficient forecasting model, I replaced the negative sales of the departments at each store with the average of the department's sales at that store. Then, I modified the data type of the 'date' column to datetime to use dates efficiently with the pandas package. Besides, I created a unique SKU code by combining the store & department. In addition, I removed the attributes which are not essential for statistical forecasting models, such as holidays, promotions, CPI and unemployment. These will be leveraged in future machine learning model development.
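A sketch of these pre-processing steps, assuming the column names of the Kaggle Walmart training file ('Store', 'Dept', 'Date', 'Weekly_Sales'); excluding the negative values themselves from the replacement average is my assumption:

```python
import pandas as pd

train = pd.read_csv("train.csv")

# Parse the date column so pandas can treat it as a proper datetime.
train["Date"] = pd.to_datetime(train["Date"])

# Average sales per store & department, ignoring negative values (returns).
non_negative = train["Weekly_Sales"].where(train["Weekly_Sales"] >= 0)
avg_sales = non_negative.groupby([train["Store"], train["Dept"]]).transform("mean")

# Replace negative sales with that average.
mask = train["Weekly_Sales"] < 0
train.loc[mask, "Weekly_Sales"] = avg_sales[mask]

# Unique SKU code per store & department combination.
train["SKU"] = train["Store"].astype(str) + "_" + train["Dept"].astype(str)
```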

Statistical Forecasting Models

Statistical forecasting methods are intuitive to build and implement in demand planning processes. The goal is to isolate patterns in past data. The patterns that arise most often are:

  • Trend — Stable increasing or decreasing pattern.
  • Seasonality — Pattern repeated at fixed intervals (weekly, monthly, etc.)
  • Cycles — Pattern repeated cyclically at variable intervals.
  • Randomness — Random pattern with no recognizable structure.

Figure 7 demonstrates the idea visually by separating historical data into several components. It represents the historical sales of department 94 in store 4.

Figure 7: Components of Time Series Model
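A decomposition like the one in Figure 7 can be sketched with statsmodels, reusing the `train` frame and SKU code from the pre-processing sketch above. The additive model is my assumption here; a multiplicative decomposition would require strictly positive values:

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# Weekly sales of department 94 in store 4 as a date-indexed series.
sales = (train[train["SKU"] == "4_94"]
         .set_index("Date")["Weekly_Sales"]
         .sort_index())

# Split the series into trend, seasonal and residual components
# (period=52 assumes yearly seasonality in weekly data).
result = seasonal_decompose(sales, model="additive", period=52)
result.plot()
```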

Moreover, each forecasting model has different parameters, such as the moving average window (ma), alpha (level), beta (trend), phi (damped trend), and gamma (seasonality), which are hyper-parameters that need to be tuned on the training set. They are optimized by running the whole job, checking the accuracy, and adjusting to improve the system. Hence, I applied hyper-parameter tuning by minimizing the loss function (error rate).
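A generic grid-search sketch of this tuning loop. Here `fit_predict` is a hypothetical callable standing in for any of the models below, and the loss is the demand-scaled error defined in the Results section:

```python
import itertools
import numpy as np

def mae_pct(actual, forecast) -> float:
    """Mean absolute error scaled by total demand (the loss to minimize)."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return np.sum(np.abs(actual - forecast)) / np.sum(actual)

def grid_search(fit_predict, param_grid: dict, history, holdout):
    """Try every parameter combination and keep the one with the lowest loss.

    fit_predict(history, horizon, **params) must return a forecast
    covering the holdout period.
    """
    best_params, best_loss = None, float("inf")
    for combo in itertools.product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), combo))
        forecast = fit_predict(history, len(holdout), **params)
        loss = mae_pct(holdout, forecast)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss
```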

1. Moving Average

The first forecasting model I developed is the simplest. The main idea is that future demand is similar to the recent demand observed. With this model, I will assume that the forecast is the average of the demand during the last n periods.

Figure 8: Moving Average Formula

Based on the formula above, I ran a moving average model for SKUs with low sales volume. Since these store and department combinations contributed less to the total sales, I decided not to experiment with the later techniques presented in this article. I optimized the 'MA' window for each SKU by maximizing forecasting accuracy. I would like to illustrate one example, which belongs to department 59 at store 30. For this SKU, the optimal MA value is 6, which means that the model considers the last 6 periods to predict the next period. Although the moving average model works well on average (accuracy is 75.2%), it has difficulty capturing sharp fluctuations. In other words, it ignores complex relationships in the data. A sketch of the model follows Figure 9.

Figure 9: Moving Avg. Example
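A minimal sketch of the flat moving-average forecast: every future period gets the average of the last n observations. A rolling one-step-ahead variant is an equally plausible reading of the method:

```python
import numpy as np
import pandas as pd

def moving_average_forecast(history: pd.Series, horizon: int, n: int = 6) -> np.ndarray:
    """Forecast each future period as the mean of the last n observations."""
    level = history.iloc[-n:].mean()
    return np.repeat(level, horizon)
```

The window n (the 'MA' value) can then be tuned per SKU with the grid_search sketch above, e.g. with param_grid={"n": range(2, 13)}.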

2. Simple Exponential

Simple exponential smoothing is one of the simplest ways to forecast time series data. The model learns from historical sales and then forecasts future demand as its last estimation of the level. The underlying idea of the algorithm is that, at each period, the model learns a bit from the most recent demand observation and remembers a bit of the last forecast it made.

Figure 10: Simple Expo. Components Formula

According to the given formula, I performed a simple exponential model for the SKUs whose data size is below 52 weeks. When implementing the model in Python, I optimized the 'alpha' value for each SKU by maximizing its forecasting accuracy. I would like to demonstrate one example, which belongs to department 96 in store 17. The optimal alpha value is 0.6, which points to learning more from the most recent demand observation than from the last forecast. This resulted in 83.6% forecast accuracy. Even though the simple smoothing model identifies the pattern, there is still a lag between the predicted and actual values. A sketch of the model follows Figure 11.

Figure 11: Simple Exponential Example
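A sketch using statsmodels' implementation (the article does not confirm which library was used). alpha=0.6 reproduces the value reported above, and optimized=False keeps it fixed rather than letting statsmodels re-fit it:

```python
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

# Training series of one SKU (dept 96 in store 17), reusing the
# pre-processed 'train' frame from the sketch above.
history = (train[train["SKU"] == "17_96"]
           .set_index("Date")["Weekly_Sales"]
           .sort_index())

model = SimpleExpSmoothing(history).fit(smoothing_level=0.6, optimized=False)
forecast = model.forecast(13)  # 13-week holdout horizon
```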

3. Double Exponential

A major issue with simple exponential smoothing is its inability to identify a trend in the data. The trend can be described as the average variation of the time series level between two consecutive periods. (Remember that the level is the average value around which the demand varies over time.) This new model estimates the trend with a weight, beta (β), giving more or less importance to the most recent observations.

The fundamental idea is that each component (level and trend) is updated each period based on two pieces of information:

  • The most recent observation of this component
  • The previous estimation of this component

Figure 12: Double Expo. Components Formula

Using the formula above, I performed a double exponential model for the SKUs whose data size is between 52 and 104 weeks. I tuned the 'alpha' and 'beta' values by maximizing forecasting accuracy. I would like to demonstrate one example, which belongs to department 93 at store 16. The optimal parameters for alpha and beta are 0.4 and 0.2, respectively. Although the historical sales fluctuate substantially, the double exponential model works well on average (the forecasting accuracy is 90.7%). A sketch of the model follows Figure 13.

Figure 13: Double Exponential Example
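A sketch with statsmodels' Holt model (the `smoothing_trend` argument name assumes statsmodels 0.12 or newer). The parameter values match the example above:

```python
from statsmodels.tsa.holtwinters import Holt

# Training series of one SKU (dept 93 at store 16).
history = (train[train["SKU"] == "16_93"]
           .set_index("Date")["Weekly_Sales"]
           .sort_index())

model = Holt(history).fit(
    smoothing_level=0.4,   # alpha: weight on the most recent level
    smoothing_trend=0.2,   # beta: weight on the most recent trend
    optimized=False,
)
forecast = model.forecast(13)
```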

4. Triple Exponential

Simple exponential and double exponential models do not recognize seasonal patterns and thus cannot extrapolate any seasonal behavior into the future. To learn a seasonal pattern, we add a third component to exponential smoothing. First, the level and trend are deseasonalized. Then, seasonality factors are added into the general formula.

Figure 14: Triple Expo. Components Formula

The fundamental idea is that the forecasting model learns multiplicative seasonal factors that are applied to each period inside a full seasonal cycle. The forecast then consists of the level (a) plus the damped trend (b), multiplied by the seasonal factor (s). The advantage of the damping factor is that it drastically reduces the impact of the trend on the future forecast. With a damped trend, the model gradually forgets the trend and the forecast settles at a stable level.

Figure 15: Triple Expo. General Formula

In light of the formulas explained above, I ran a multiplicative triple exponential model for the SKUs whose data size is above 104 weeks, using a 52-week seasonality length. To maximize forecast accuracy, I implemented hyper-parameter tuning to decide the optimal 'alpha', 'beta', 'phi' and 'gamma' values. I would like to show one example, which represents department 94 in store 8. For it, the optimal parameter combination is (alpha=0.4, beta=0.2, phi=0.2, gamma=0.2). Based on that, the model learns by giving more importance to the most recent observations. The forecasting accuracy of this SKU is 96.45%. A sketch of the model follows Figure 16.

Figure 16: Triple Exponential Example
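A sketch of the damped multiplicative Holt-Winters model in statsmodels (argument names assume statsmodels 0.12+; the smoothing values match the example above):

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Training series of one SKU (dept 94 in store 8).
history = (train[train["SKU"] == "8_94"]
           .set_index("Date")["Weekly_Sales"]
           .sort_index())

model = ExponentialSmoothing(
    history,
    trend="add",
    damped_trend=True,
    seasonal="mul",
    seasonal_periods=52,   # weekly data with yearly seasonality
).fit(
    smoothing_level=0.4,     # alpha
    smoothing_trend=0.2,     # beta
    smoothing_seasonal=0.2,  # gamma
    damping_trend=0.2,       # phi
    optimized=False,
)
forecast = model.forecast(39)  # 39 forecasting periods per SKU
```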

Results

After creating a forecast model, the next step is to quantify its accuracy. MAE% (the Mean Absolute Error scaled by the average demand, as shown in Figure 17) is a very good indicator of forecast accuracy and gives more insight than other KPIs. The problem with KPIs such as plain MAE and Root Mean Square Error (RMSE) is that they are not scaled to the average demand.

Figure 17: Forecast Error Formula
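As a small sketch, the accuracy reported in the result table can be computed as one minus this scaled error (the same quantity as the mae_pct helper in the tuning sketch):

```python
import numpy as np

def forecast_accuracy(actual, forecast) -> float:
    """Accuracy = 1 - MAE%, where MAE% = sum(|error|) / sum(demand)."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return 1.0 - np.sum(np.abs(actual - forecast)) / np.sum(actual)
```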

Based on this formula, I created a final result table that shows the forecasting error and accuracy, along with the associated model, for each store & department combination. The table below is just a sample of the output for the 1776 store & department combinations.

Figure 18: Result Table

Finally, I plotted the distribution of forecast accuracy for all store & department combinations. The distribution is negatively skewed, and the forecasting model performs well on average even though there are several outlier points. The average accuracy is 88.2% across Walmart stores.

Figure 19: Accuracy Distribution

Next Steps

One limitation of the forecasting models explained above is their inability to deal with external inputs (GDP growth, unemployment rate, weather, etc.). To address this, I plan to build machine learning algorithms on this dataset that concentrate on both external and internal factors such as holidays, promotions, CPI and unemployment. Then, I am going to compare the statistical forecasting methods and the machine learning models by checking their forecast accuracy.

This was a detailed analysis of statistical forecasting models in retail management. I would like to share my baseline code (not the final version) to help data science candidates. Please leave any comments you have, and let me know if you would like me to explain anything further!
