Facebook Prophet Evaluation

Can the prophet predict the most erratic of all timeseries?
Published

July 17, 2021

At this point I have done quite a lot of deep learning, both computer vision and natural language processing. One area I’ve done less of is time series analysis.

Facebook has released Prophet, an easy-to-use forecasting tool for a single numeric time series (Taylor and Letham 2018). I’m going to try to predict the daily price of bitcoin with it. Given how volatile the cryptocurrency world is, I have no confidence that this will work.

Taylor, Sean J., and Benjamin Letham. 2018. “Forecasting at Scale.” The American Statistician 72 (1): 37–45. https://doi.org/10.1080/00031305.2017.1380080.
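For context, the paper describes Prophet as a decomposable model: a trend $g(t)$, periodic seasonality $s(t)$ and holiday effects $h(t)$ are added together with an error term $\epsilon_t$. Each seasonality of period $P$ (in days, e.g. 7 for weekly or 365.25 for yearly) is a truncated Fourier series of order $N$ with fitted coefficients $a_n, b_n$:

```latex
y(t) = g(t) + s(t) + h(t) + \epsilon_t

s(t) = \sum_{n=1}^{N} \left( a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right) \right)
```

Nothing in this sum models sudden market shocks directly; a change in behaviour can only show up through the trend $g(t)$ changing slope.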

Data Collection and Cleaning

Let’s start by getting the data. I’m using the Binance exchange data from CryptoDataDownload. This needs some special handling: the HTTPS certificate for the site is invalid, and the CSV needs some cleanup (its first row is a link rather than the header).

Code
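import io

import pandas as pd
import requests

# The site's HTTPS certificate is invalid, so certificate verification is
# skipped. This triggers urllib3's InsecureRequestWarning (shown below).
response = requests.get(
    "https://www.cryptodatadownload.com/cdd/Binance_BTCUSDT_d.csv",
    verify=False
)
# Fail loudly on a 4xx/5xx instead of trying to parse an error page
response.raise_for_status()

# The first row is a link to cryptodatadownload, not the header, so skip it
df = pd.read_csv(
    io.StringIO(response.content.decode("utf-8")),
    skiprows=1
)

# Parse the dates, drop the redundant unix timestamp and index by date
df["date"] = pd.to_datetime(df.date)
df = df.drop(columns="unix")
df = df.set_index("date")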
/home/matthew/.cache/pypoetry/virtualenvs/blog-HrtMnrOS-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.cryptodatadownload.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  warnings.warn(
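The `skiprows=1` step can be checked offline. Below is a minimal sketch that wraps the parsing from the cell above in a function and runs it on a tiny made-up sample. The banner line, the reduced set of columns and all the values are illustrative assumptions standing in for the real file, and `load_cdd_csv` is a hypothetical helper name.

```python
import io

import pandas as pd

# Made-up sample shaped like the download: a banner line first (placeholder),
# then the real header, then newest-first daily rows. Values are illustrative.
SAMPLE = (
    "https://www.cryptodatadownload.com\n"
    "unix,date,symbol,open,close\n"
    "1626480000,2021-07-17,BTC/USDT,100.0,101.0\n"
    "1626393600,2021-07-16,BTC/USDT,99.0,100.0\n"
)


def load_cdd_csv(text: str) -> pd.DataFrame:
    """Parse CryptoDataDownload-style CSV text into a date-indexed frame."""
    # skiprows=1 drops the banner so the second line is used as the header
    frame = pd.read_csv(io.StringIO(text), skiprows=1)
    frame["date"] = pd.to_datetime(frame["date"])
    # The unix column duplicates the date, so it is dropped
    return frame.drop(columns="unix").set_index("date")


print(load_cdd_csv(SAMPLE))
```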
Code
df
              symbol      open      high       low     close    Volume BTC   Volume USDT  tradecount
date
2021-07-17  BTC/USDT  31383.86  31549.41  31305.00  31338.05    579.784658  1.820279e+07     11935.0
2021-07-16  BTC/USDT  31874.49  32249.18  31020.00  31383.87  48499.864154  1.538343e+09   1067591.0
2021-07-15  BTC/USDT  32820.03  33185.25  31133.00  31880.00  51639.576353  1.652078e+09   1099367.0
2021-07-14  BTC/USDT  32729.12  33114.03  31550.00  32820.02  46777.823484  1.515692e+09   1123129.0
2021-07-13  BTC/USDT  33086.94  33340.00  32202.25  32729.77  41126.361008  1.348583e+09    956053.0
...              ...       ...       ...       ...       ...           ...           ...         ...
2017-08-21  BTC/USDT   4086.29   4119.62   3911.79   4016.00    685.120000  2.770592e+06         NaN
2017-08-20  BTC/USDT   4139.98   4211.08   4032.62   4086.29    463.540000  1.915636e+06         NaN
2017-08-19  BTC/USDT   4108.37   4184.69   3850.00   4139.98    371.150000  1.508239e+06         NaN
2017-08-18  BTC/USDT   4285.08   4371.52   3938.77   4108.37   1178.070000  4.994494e+06         NaN
2017-08-17  BTC/USDT   4469.93   4485.39   4200.74   4285.08    647.860000  2.812379e+06         NaN

1432 rows × 8 columns

Code
df.to_parquet(
    "/data/blog/2021-07-17-facebook-prophet-evaluation/btc.gz.parquet",
    compression="gzip"
)

Data Review

Now we can try to apply Prophet to this. Before doing so, let’s have a quick look at the data. I want to hold out the year 2021 as the data to predict. Is this reasonable?

Code
(
    df.index.year
        .value_counts()
        .sort_index()
)
2017    138
2018    365
2019    365
2020    366
2021    198
Name: date, dtype: int64

The value to predict is the open price of BTC each day. What does that look like?

Code
df.open.describe()
count     1432.000000
mean     13551.845182
std      13510.322929
min       3189.020000
25%       6563.485000
50%       8809.975000
75%      11507.597500
max      63575.010000
Name: open, dtype: float64
Code
df.open.plot() ; None

This already looks very hard to predict. The price has gone nuts in 2021: the maximum open of about 63,575 is more than five times the 75th percentile of about 11,508. I may have to revise the evaluation.
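To put a rough number on "gone nuts", one option is to summarise each calendar year by its max/min price ratio and the standard deviation of daily log returns. This is a sketch of my own rather than something computed in this post; `yearly_volatility` is a hypothetical helper, and it assumes a price Series with a DatetimeIndex such as `df.open`.

```python
import numpy as np
import pandas as pd


def yearly_volatility(prices: pd.Series) -> pd.DataFrame:
    """Per-year max/min ratio and std of daily log returns for a price series."""
    # The download is newest-first, so sort oldest-first before differencing
    prices = prices.sort_index()
    # Log return for day t is log(p_t) - log(p_{t-1}); the first day is NaN
    log_returns = np.log(prices).diff()
    by_year = prices.groupby(prices.index.year)
    return pd.DataFrame({
        "max_over_min": by_year.max() / by_year.min(),
        "log_return_std": log_returns.groupby(log_returns.index.year).std(),
    })
```

Calling `yearly_volatility(df.open)` would give one row per year, which makes it easier to compare 2021 against the earlier years than eyeballing the plot.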


Data Split and Prediction

So let’s start by training on 2017-2020 and testing on 2021.

Code
train_df = (
    df[df.index.year < 2021]
        .reset_index()
        [["date", "open"]]
        .rename(columns={"date": "ds", "open": "y"})
)
test_df = (
    df[df.index.year == 2021]
        .reset_index()
        [["date", "open"]]
        .rename(columns={"date": "ds", "open": "y"})
)

len(train_df), len(test_df)
(1234, 198)
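The same split is repeated later for a different year, so it could be wrapped in a small helper. A minimal sketch, assuming a frame indexed by date like `df`; `to_prophet_frame` and `year_split` are hypothetical names. The only Prophet-specific part is the two-column `ds`/`y` layout the cell above already builds.

```python
from typing import Tuple

import pandas as pd


def to_prophet_frame(frame: pd.DataFrame, column: str = "open") -> pd.DataFrame:
    """Turn a date-indexed frame into the ds/y layout that Prophet fits on."""
    # reset_index names the column "index" when the index is unnamed
    index_name = frame.index.name or "index"
    return (
        frame.reset_index()[[index_name, column]]
        .rename(columns={index_name: "ds", column: "y"})
    )


def year_split(frame: pd.DataFrame, train_before: int, test_year: int,
               column: str = "open") -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Train on every year before `train_before`, test on `test_year`."""
    years = frame.index.year
    train = to_prophet_frame(frame[years < train_before], column)
    test = to_prophet_frame(frame[years == test_year], column)
    return train, test
```

With this, the two splits in this post would be `year_split(df, train_before=2021, test_year=2021)` and `year_split(df, train_before=2019, test_year=2019)`.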
Code
%%time

from prophet import Prophet

model = Prophet()
model.fit(train_df) ; None
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
CPU times: user 804 ms, sys: 35.8 ms, total: 840 ms
Wall time: 838 ms
Code
forecast = model.predict(test_df)

model.plot(forecast) ; None

Code
pd.merge(
    df[["open"]],
    forecast[["ds", "yhat"]],
    left_index=True,
    right_on="ds",
    how="outer"
).set_index("ds").plot() ; None

So Prophet was not able to see that BTC would go bananas. I’m not really surprised. Would training on 2017-2018 and predicting 2019 work better?
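The comparison so far is purely visual. A minimal sketch of putting numbers on it, assuming frames shaped like `test_df` (`ds`, `y`) and `forecast` (`ds`, `yhat`) from the cells above. The choice of mean absolute error and mean absolute percentage error is mine, and `forecast_errors` is a hypothetical helper.

```python
import pandas as pd


def forecast_errors(actual: pd.DataFrame, forecast: pd.DataFrame) -> dict:
    """MAE and MAPE of forecast["yhat"] against actual["y"], matched on "ds"."""
    # Inner join keeps only the days present in both frames
    merged = actual[["ds", "y"]].merge(forecast[["ds", "yhat"]], on="ds", how="inner")
    abs_error = (merged["y"] - merged["yhat"]).abs()
    return {
        "mae": abs_error.mean(),
        "mape_percent": (abs_error / merged["y"].abs()).mean() * 100,
        "days": len(merged),
    }
```

Calling `forecast_errors(test_df, forecast)` after each fit would give a number to compare the 2021 and 2019 experiments by, rather than two plots.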

Code
train_df = (
    df[df.index.year < 2019]
        .reset_index()
        [["date", "open"]]
        .rename(columns={"date": "ds", "open": "y"})
)
test_df = (
    df[df.index.year == 2019]
        .reset_index()
        [["date", "open"]]
        .rename(columns={"date": "ds", "open": "y"})
)

len(train_df), len(test_df)
(503, 365)
Code
%%time

from prophet import Prophet

model = Prophet()
model.fit(train_df) ; None
INFO:prophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
CPU times: user 115 ms, sys: 35.7 ms, total: 150 ms
Wall time: 149 ms
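The log above says Prophet disabled yearly seasonality this time, unlike the 2017-2020 fit. My understanding (an assumption about Prophet's defaults, consistent with both logs in this post but not verified here) is that in its automatic mode Prophet only enables yearly seasonality when there are at least two years (730 days) of history, and daily seasonality only when observations are less than a day apart. As the log says, passing `yearly_seasonality=True` overrides this. The sketch below imitates those two rules with the standard library; `auto_seasonality` and `daily_dates` are hypothetical helpers, not part of Prophet.

```python
from datetime import date, timedelta
from typing import Dict, List


def auto_seasonality(dates: List[date]) -> Dict[str, bool]:
    """Guess which seasonalities Prophet's automatic mode would enable.

    Assumed rules (my reading of Prophet's defaults, not its actual code):
    yearly needs at least 730 days between first and last observation,
    daily needs observations spaced less than one day apart.
    """
    ordered = sorted(dates)
    span = ordered[-1] - ordered[0]
    smallest_gap = min(later - earlier for earlier, later in zip(ordered, ordered[1:]))
    return {
        "yearly": span >= timedelta(days=730),
        "daily": smallest_gap < timedelta(days=1),
    }


def daily_dates(start: date, end: date) -> List[date]:
    """Every calendar day from start to end inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


# Date ranges of the two training sets in this post
print(auto_seasonality(daily_dates(date(2017, 8, 17), date(2020, 12, 31))))
# {'yearly': True, 'daily': False}
print(auto_seasonality(daily_dates(date(2017, 8, 17), date(2018, 12, 31))))
# {'yearly': False, 'daily': False}
```

If this reading is right, the 2019 forecast had no yearly component at all, so it could not repeat a pattern from the previous year.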
Code
forecast = model.predict(test_df)

model.plot(forecast) ; None

Code
pd.merge(
    df[df.index.year < 2020][["open"]],
    forecast[["ds", "yhat"]],
    left_index=True,
    right_on="ds",
    how="outer"
).set_index("ds").plot() ; None

Here there seems to be a repeated pattern, but Prophet has failed to match any previous behaviour. My best guess is that it is continuing the downward trend from the peak around Q1 2018. All in all, quite disappointing. I can’t use Prophet to make my fortune with crypto :-P
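To put a number on "failed to match", one could also check how often the actual price landed inside Prophet's uncertainty band. As far as I know, Prophet's forecast frame includes `yhat_lower` and `yhat_upper` columns and the default `interval_width` is 0.80, so a well-calibrated model would cover roughly 80% of days. `interval_coverage` is a hypothetical helper; it assumes frames shaped like `test_df` and `forecast` above.

```python
import pandas as pd


def interval_coverage(actual: pd.DataFrame, forecast: pd.DataFrame) -> float:
    """Fraction of actual["y"] values inside [yhat_lower, yhat_upper], matched on ds."""
    merged = actual[["ds", "y"]].merge(
        forecast[["ds", "yhat_lower", "yhat_upper"]], on="ds", how="inner"
    )
    # between() is inclusive of both bounds by default
    inside = merged["y"].between(merged["yhat_lower"], merged["yhat_upper"])
    return float(inside.mean())
```

Calling `interval_coverage(test_df, forecast)` well below 0.8 would confirm numerically what the plot suggests.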