Predictive Analytics with Business Intelligence Tools

Guest blog entry by Simon Grässli

Business intelligence (BI) tools are ideally suited for users without extensive knowledge of data analysis because of their graphical user interfaces. The main functionality of BI tools is the descriptive analysis of data, e.g. presenting historical sales figures for different departments. However, more and more companies want to use predictive capabilities such as forecasting sales figures. BI tools with predictive capabilities can be particularly useful for small and medium-sized businesses that cannot afford data analysts but would still benefit from solving prediction problems. So how well do BI tools perform when it comes to advanced analytics?

In this blog post, I will evaluate the capabilities of BI tools for predictive analytics. Specifically, I will solve a prediction problem from a real business case with different BI tools and compare their performance. The tools I will consider are Power BI Desktop, Tableau Desktop and Excel. I have not analyzed open-source tools such as Pentaho and RapidMiner because they are more complex to use.

Sales forecasting

A common business case involving a prediction problem is revenue forecasting. Accurate sales forecasts make it possible, for example, to improve inventory management or staff planning. As a specific use case, I will test the BI tools by forecasting sales figures for a department in a Walmart store. The data is provided by Kaggle for Walmart’s recruiting competition.[1] The dataset contains historical weekly sales data for different departments in 45 Walmart stores. In addition, the dataset contains weekly information on temperature, unemployment and consumer prices. Weekly observations are provided for 2 years and 9 months, from February 2010 to October 2012.

I will use a subset of the data, namely the figures for one department in one store. This makes the setting more representative of a small or medium-sized business. I will split the data into a training set and a test set. The training set contains the first 2 years of data, starting in February 2010; the remaining 9 months of 2012 form the test set. I feed the training data into each BI tool, which then produces a sales forecast for the following 9 months. Finally, I compare the forecasted weekly sales figures with the actual sales figures from the test set. This allows me to assess the accuracy of the forecasts and compare it across the BI tools.
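For readers who want to reproduce this setup, a minimal sketch of the split in R could look as follows. The column names assume the layout of the Kaggle train.csv file (Store, Dept, Date, Weekly_Sales), and store 1, department 1 is an arbitrary choice:

```r
# Read the Kaggle training data and keep one department in one store.
sales <- read.csv("train.csv", stringsAsFactors = FALSE)
sales$Date <- as.Date(sales$Date)

one_dept <- subset(sales, Store == 1 & Dept == 1)
one_dept <- one_dept[order(one_dept$Date), ]

# Weekly series starting in February 2010, with 52 weeks per season.
y <- ts(one_dept$Weekly_Sales, frequency = 52, start = c(2010, 5))

# First 2 years for training, the remaining roughly 9 months of 2012 for testing.
train <- window(y, end = c(2012, 4))
test  <- window(y, start = c(2012, 5))
```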

Performance of BI tools

Let’s start with Power BI. Power BI has a built-in forecasting model that is a modified version of the Holt-Winters algorithm.[2] This model uses historical sales data to predict future sales figures. It also attempts to detect seasonality automatically. Seasonality describes a predictable pattern that repeats over a fixed period of time. In our case, Power BI automatically detects a season of 13 weeks, which is clearly wrong and leads to poor forecasts. Fortunately, Power BI allows the length of a season to be defined manually. In our data, one season is one year, i.e. 52 weeks. After changing the seasonality, we get a reasonable forecast of the sales figures for the next 9 months, as the following graph shows. The black line represents the sales forecasts, while the grey area represents a 95% prediction interval.[3] The red dotted line shows the actual sales figures during the test period. This allows us to visually assess the performance of the prediction model.

[Figure: Power BI forecast of weekly sales for the test period, showing the forecast (black line), the 95% prediction interval (grey area) and the actual sales figures (red dotted line).]

A comparison of the forecasted sales figures (black line) with the actual sales figures (red dotted line) yields a mean absolute prediction error of USD 1’100, or 12%.
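For comparison, a similar forecast can be produced outside the BI tool with a plain additive Holt-Winters model in R. Power BI’s exact modification of the algorithm is not public, so this is only an approximation and the error figures will not match exactly; `train` and `test` refer to the split sketched above:

```r
library(forecast)

# Holt-Winters exponential smoothing; the 52-week season is taken from the
# ts frequency. Note that it needs at least two full seasons of training data.
fit <- HoltWinters(train)
fc  <- forecast(fit, h = length(test), level = 95)

# Mean absolute error (in USD) and mean absolute percentage error on the test set.
mae  <- mean(abs(fc$mean - test))
mape <- mean(abs(fc$mean - test) / test) * 100
```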

For weekly sales forecasts with an annual season, Power BI delivers a reasonable performance. However, it is not possible to include other features such as temperature or economic information in the analyses. In addition, Power BI cannot handle multiple seasonalities. For example, if we had daily sales figures, there could be one cycle within a week and one over the year. In this case, however, Power BI could not take both patterns into account.
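As an illustration of how such a case could be handled outside a BI tool: the R forecast package can fit models with several seasonal periods, for example TBATS. The daily series below is synthetic, purely to show the mechanics:

```r
library(forecast)

# Synthetic daily sales with a weekly and a yearly cycle (2 years of data).
set.seed(1)
days <- 1:730
daily_sales <- 1000 + 200 * sin(2 * pi * days / 7) +
  300 * sin(2 * pi * days / 365.25) + rnorm(730, sd = 50)

# Declare both seasonal periods and fit a TBATS model.
y_daily <- msts(daily_sales, seasonal.periods = c(7, 365.25))
fit     <- tbats(y_daily)
fc      <- forecast(fit, h = 90)   # forecast the next ~3 months
```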

Tableau turns out to be less flexible for time series forecasting. It has a built-in forecasting tool that uses an exponential smoothing model.[4] It, too, automatically selects a season length of 13 weeks. In contrast to Power BI, however, there is no way to manually override the automatically selected seasonality. As a result, Tableau’s forecasts are poor, with a mean absolute prediction error of USD 1’900, or 22%. It is also not possible to include other features.

Excel also comes with an integrated forecasting model. Again, the tool applies an exponential smoothing model. Although it is not able to automatically recognize the appropriate length of a season, it allows manual adjustment. Forecasting the weekly sales figures for the 9 months in the test period results in a mean absolute forecast error of USD 1’360, or 14%. The performance of the Excel forecasting tool is therefore slightly worse than that of Power BI. It is not possible to include features other than historical sales.

If you want to include other variables such as temperature or unemployment in the prediction model, you need to write your own R code. Both Power BI and Tableau can execute R scripts and visualize the results. I trained a simple autoregressive integrated moving average (ARIMA) model with temperature as an independent variable. Like the models above, it uses the previous sales data as the basis for forecasting sales. However, it also includes information about the temperature in a particular week to predict sales in that week. Unfortunately, the mean absolute prediction error in the test set increases to USD 1’600, or 17%. If other variables such as the consumer price index and the unemployment rate are included, the error increases further to USD 2’100. The decline in performance when additional features are included is likely because these variables are not actually predictive of sales. In the training set, however, the model picked up correlations between sales figures and these features that arose simply by chance. The model therefore overfits and performs worse on the test set. The conclusion that the features in this dataset are not important for predicting sales figures is supported by the winning model of the Kaggle competition, which does not take the features into account.[5]
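The exact specification of that model is not given above, but a sketch along these lines could look as follows. The temperature vectors are hypothetical names and are assumed to have been prepared beforehand from the dataset’s temperature column, aligned with the training and test windows:

```r
library(forecast)

# temp_train / temp_test: weekly temperatures aligned with the training and
# test windows (hypothetical names; prepared from the dataset beforehand).
xreg_train <- cbind(Temperature = temp_train)
xreg_test  <- cbind(Temperature = temp_test)

# ARIMA model with temperature as an external regressor.
fit_x <- auto.arima(train, xreg = xreg_train)
fc_x  <- forecast(fit_x, xreg = xreg_test)

# Mean absolute prediction error on the test set.
mae_x <- mean(abs(fc_x$mean - test))
```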

Conclusion

BI tools work well if there is just one recurring pattern over time and no other features are needed for the prediction. However, since BI tools are limited in their ability to automatically detect the length of a season, it is crucial whether the season length can be set manually. Of the three tools analyzed, Tableau lacks this functionality and can therefore not be used reliably for forecasts. In contrast, Power BI and Excel allow the season length to be selected. They can therefore be used to make relatively reliable forecasts without programming skills. However, for time series with more complex relationships, it is still necessary to use a scripting language to build a model tailored to the available data.

Footnotes

[1] https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting

[2] https://powerbi.microsoft.com/en-us/blog/describing-the-forecasting-models-in-power-view/

[3] According to the model, there is a 95% probability that the actual sales figure will fall within the grey area.

[4] Tableau does not disclose exactly which model it uses. https://onlinehelp.tableau.com/current/pro/desktop/en-us/help.html#forecast_how_it_works.html%3FTocPath%3DBuild%2520Charts%2520and%2520Analyze%2520Data%7CBuild%2520Data%2520Views%2520from%2520Scratch%7CAnalyze%2520Data%7CForecast%2520Data%7C_____1

[5] https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/discussion/8125