Prophet, released by Facebook Open Source, is a forecasting library for Python and R. Straddling the line between statistical modeling and machine learning, it is an open source tool I have not seen talked about much, even though it is rather interesting. Prophet forecasts time series data, such as weather or page views.
To get started, here is a short tutorial on Prophet.
As seen on the issues page on GitHub, Prophet has some problems on Windows computers. While most of the functionality works, some parts, such as graphing the predictions, do not, partly because of Prophet itself and partly because of its main dependency, PyStan, which also seems to have problems on Windows. To get started, install PyStan first, for example with pip install pystan.
Once PyStan is installed, install Prophet itself on your machine, for example with pip install fbprophet (recent releases are published under the name prophet).
For this example, we will be using a dataset from PageViews, a Wikimedia service that shows the daily page views of any Wikipedia page. It also offers related tools, such as redirect views and comparisons across languages, which makes it useful for tracking the popularity of subjects.
For this tutorial, we will be looking at the pageviews of newly famous politician Alexandria Ocasio-Cortez, mainly because reporting around her has exploded, and that shows up clearly in her pageview numbers.
To start, we import the libraries we need and load the dataset.
One does not need matplotlib for Prophet, but it is useful for an initial look at the dataset. We need Pandas to shape the CSV file into a data frame usable by Prophet. Prophet requires two columns: ds, the dates of the time series (month/day/year in this dataset), and y, the value we are forecasting, which is pageviews in this case.
Before working with Prophet, here are some simple tools from Pandas for having a look at the dataset.
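The loading and inspection steps can be sketched roughly as follows. This is only an illustration: the helper name to_prophet_frame, the CSV column names, and the first two sample rows are assumptions made up for the example, not the actual export (only the 6/27/2018 figure comes from this post).

```python
import io

import pandas as pd

# Placeholder CSV standing in for the downloaded file. The column names and
# the first two rows are illustrative assumptions; the real export has one
# row per day for the whole date range.
SAMPLE_CSV = """Date,Alexandria Ocasio-Cortez
06/25/2018,24
06/26/2018,32
06/27/2018,181780
"""


def to_prophet_frame(raw, date_col, value_col):
    """Return a copy with the two columns Prophet expects: ds (dates) and y."""
    df = raw[[date_col, value_col]].rename(columns={date_col: "ds", value_col: "y"})
    # Parse month/day/year strings into real datetimes.
    return df.assign(ds=pd.to_datetime(df["ds"], format="%m/%d/%Y"))


raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
df = to_prophet_frame(raw, "Date", "Alexandria Ocasio-Cortez")

print(df.head())      # first rows of the reshaped frame
print(df.describe())  # count, mean, std, min, quartiles, max of y
# With matplotlib installed, df.plot(x="ds", y="y") draws the raw series.
```

With a real download, pd.read_csv would take the path to the CSV file instead of the in-memory sample.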
Result of describe():

                       y
    count      21.000000
    mean     8683.809524
    std     39661.332898
    min         2.000000
    25%        12.000000
    50%        24.000000
    75%        32.000000
    max    181780.000000
As seen, the page is quite new, with there being a very large difference between the min, max, mean, and standard deviation (std). That max is from the last day of the set, 6/27/2018, the day after she won her primary in New York's 14th congressional district against incumbent Democratic Representative Joseph Crowley.
Now here is the basic code to work with Prophet.
First, Prophet() creates the Prophet object that will analyze the data frame. Next, we fit our dataset to that object. Finally, we ask for a forecast, with periods=365 specifying that we want to look 365 days (one year) ahead.
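Those three steps map onto a handful of Prophet calls. Here is a minimal sketch, assuming df is a data frame with ds and y columns. The wrapper name forecast_pageviews is made up for this example, while Prophet(), fit, make_future_dataframe, and predict are the library's own calls. The import sits inside the function so the file still loads where Prophet is missing, and it tries the newer package name prophet before falling back to fbprophet.

```python
def forecast_pageviews(df, periods=365):
    """Fit Prophet on a frame with 'ds' and 'y' columns and forecast ahead.

    Returns the fitted model and the forecast data frame, which covers the
    original dates plus `periods` additional days.
    """
    try:
        from prophet import Prophet      # package name in recent releases
    except ImportError:
        from fbprophet import Prophet    # package name in older releases

    model = Prophet()                    # create the model with default settings
    model.fit(df)                        # fit it to the historical data
    # Extend the date column `periods` days past the end of the history.
    future = model.make_future_dataframe(periods=periods)
    forecast = model.predict(future)     # adds yhat, yhat_lower, yhat_upper, ...
    return model, forecast
```

After calling model, forecast = forecast_pageviews(df), model.plot(forecast) draws the forecast and forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail() prints its last rows.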
And now for the last portion, where we can look at the results of the analysis.
Sometimes when running the code, the phrase Hessian Matrix will show up. The Hessian Matrix is a matrix of 2nd partial derivatives.
For example, take the function $f(x, y, z) = x^2 + y^2 + z^2$. A partial derivative differentiates with respect to one variable while treating the others as constants, so the first partial derivatives are $2x$, $2y$, and $2z$. Differentiating each of those again gives the second partial derivatives: 2 when we differentiate twice with respect to the same variable, and 0 for every mixed pair, since $2x$ does not depend on $y$ or $z$. We can arrange these results into a matrix, with one row per variable:

$$
H = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}
$$

The first row holds the second partial derivatives involving $x$, the second row those involving $y$, and the third row those involving $z$.
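As a quick numerical check of that 3x3 matrix, the sketch below approximates each second partial derivative with central finite differences. This is only an illustration in plain Python, not what Prophet or Stan does internally; the function names f and hessian and the step size h are choices made for the example.

```python
def f(x, y, z):
    """The example function x^2 + y^2 + z^2."""
    return x ** 2 + y ** 2 + z ** 2


def hessian(func, point, h=1e-3):
    """Approximate the matrix of second partial derivatives of func at point."""
    n = len(point)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate func with coordinate i moved by di and j moved by dj.
                p = list(point)
                p[i] += di
                p[j] += dj
                return func(*p)

            # Central difference formula for d^2 f / (dx_i dx_j).
            result[i][j] = (
                shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)
            ) / (4 * h * h)
    return result


H = hessian(f, (1.0, 2.0, 3.0))
for row in H:
    print(["{:.4f}".format(abs(value)) for value in row])
```

Because the function is quadratic, the result is the same at any point: values very close to 2 on the diagonal and very close to 0 everywhere else.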
Now for the results of the analysis.
First let us look at the print statement:
                ds           yhat     yhat_lower     yhat_upper
    381 2019-06-23  687963.404739  652022.556382  728502.770550
    382 2019-06-24  687961.602067  649358.178347  726809.029122
    383 2019-06-25  688009.537470  649664.221806  725880.952722
    384 2019-06-26  748545.400747  708648.671636  789416.559359
    385 2019-06-27  700957.348619  659259.490072  738761.610851
Y-hat (yhat) is the predicted value of y, which in this case is daily page views. Here we are printing the tail of our forecast, so for example row 381 (2019-06-23) has a yhat of 687963.404739. The yhat_lower and yhat_upper values are the lower and upper bounds of the forecast's uncertainty interval, which Prophet sets to 80% by default.
Now let us use pandas to look at a description of the forecast values.
                   trend     yhat_lower  ...  multiplicative_terms_upper           yhat
    count     386.000000     386.000000  ...                       386.0     386.000000
    mean   347070.886224  307883.437338  ...                         0.0  347062.884321
    std    206902.507379  208053.515036  ...                         0.0  207938.495639
    min     -9902.806374  -54435.244935  ...                         0.0  -13006.850626
    25%    168583.245236  129992.390964  ...                         0.0  168733.079942
    50%    347070.857915  310056.019017  ...                         0.0  350458.653974
    75%    525558.470594  487103.625723  ...                         0.0  525716.758356
    max    704046.083273  708648.671636  ...                         0.0  748545.400747
The count of 386 includes the 21 original data points from before we ran Prophet, plus the 365 forecast days. The min value is a bit strange, since page views cannot be negative, but apart from that the numbers look reasonable. Still, one should not take these values at face value, not because they are necessarily wrong, but because it is good to double-check one's claims.
To finish off, let us compare our predictions to the actual PageViews results for the two days after the dataset ends.
6/28/2018: 133,285
6/29/2018: 83,014
With the last value, the forecast is rather close! The discrepancy can be explained by the fact that her page is so new and the changes in its values are rather extreme. To test Prophet better, it would be wiser to do this on a larger dataset that goes back further in time, for example the page for The United States of America.
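To put a number on how extreme those swings are, here is a small sketch that computes the day-over-day percentage change from the figures quoted in this post (the 6/27/2018 peak and the two actual values above). The helper name pct_change is made up for the example.

```python
def pct_change(previous, current):
    """Percentage change going from `previous` to `current`."""
    return (current - previous) / previous * 100.0


# Daily page views quoted in this post.
views = [
    ("6/27/2018", 181780),
    ("6/28/2018", 133285),
    ("6/29/2018", 83014),
]

# Pair each day with the day before it and compute the change.
changes = [
    (day, pct_change(prev_count, count))
    for (_, prev_count), (day, count) in zip(views, views[1:])
]

for day, change in changes:
    print(f"{day}: {change:+.1f}% versus the previous day")
```

Both days show a drop of more than a quarter compared with the day before, which is a hard pattern for any model to learn from only 21 data points.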
Future ways to use Prophet include applying it to other time series datasets, such as stock prices, adding an additional variable to capture special changes in the series, or checking the model for seasonality.