Matthew’s Blog - Facebook Prophet Evaluation

At this point I have done quite a lot of deep learning, both computer vision and natural language processing. One area where I've done less work is timeseries analysis.

Facebook has released Prophet, an easy-to-use forecasting tool for a single numeric timeseries (Taylor and Letham 2018). I'm going to try to predict the daily price of bitcoin with it. Given how volatile the cryptocurrency world is, I have no confidence that this will work.

Taylor, Sean J., and Benjamin Letham. 2018. “Forecasting at Scale.” The American Statistician 72 (1): 37–45. https://doi.org/10.1080/00031305.2017.1380080.

Data Collection and Cleaning

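Let’s start by getting the data. I’m using the Binance exchange data from CryptoDataDownload. This requires some special handling: the HTTPS certificate for the site is invalid, so certificate verification has to be skipped, and the first row of the CSV is a link back to the site rather than the header, so it has to be skipped too.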

Code

#hide_output
import pandas as pd
import requests
import io

# skip ssl certificate verification
response = requests.get(
    "https://www.cryptodatadownload.com/cdd/Binance_BTCUSDT_d.csv",
    verify=False
)

# first row is a link to cryptodatadownload, need to skip it
df = pd.read_csv(
    io.StringIO(response.content.decode("utf-8")),
    skiprows=1
)

df["date"] = pd.to_datetime(df.date)
df = df.drop(columns="unix")
df = df.set_index("date")

/home/matthew/.cache/pypoetry/virtualenvs/blog-HrtMnrOS-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.cryptodatadownload.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  warnings.warn(
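
The parsing step can be tried without touching the network. The following is a minimal sketch, not the code used for this post: `SAMPLE_CSV` and `load_prices` are hypothetical names, the banner text and unix timestamps in the sample are illustrative, and the column list is trimmed. Sorting the index oldest first is an extra step the cell above does not perform.

```python
import io

import pandas as pd

# Illustrative sample only: the banner text and unix values are assumptions,
# and most of the real columns are left out to keep the example short.
SAMPLE_CSV = """https://www.CryptoDataDownload.com
unix,date,symbol,open
1626480000,2021-07-17,BTC/USDT,31383.86
1626393600,2021-07-16,BTC/USDT,31874.49
"""


def load_prices(csv_text: str) -> pd.DataFrame:
    """Parse CSV text that has one banner line above the header row."""
    frame = pd.read_csv(
        io.StringIO(csv_text),
        skiprows=1,            # drop the banner line so the header is read correctly
        parse_dates=["date"],  # convert the date column while reading
    )
    # The file lists the newest day first, so sort to get chronological order.
    return frame.drop(columns="unix").set_index("date").sort_index()


prices = load_prices(SAMPLE_CSV)
print(prices)
```

Parsing the dates inside `read_csv` replaces the separate `pd.to_datetime` call; either approach should give the same index.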

Code

df

	symbol	open	high	low	close	Volume BTC	Volume USDT	tradecount
date
2021-07-17	BTC/USDT	31383.86	31549.41	31305.00	31338.05	579.784658	1.820279e+07	11935.0
2021-07-16	BTC/USDT	31874.49	32249.18	31020.00	31383.87	48499.864154	1.538343e+09	1067591.0
2021-07-15	BTC/USDT	32820.03	33185.25	31133.00	31880.00	51639.576353	1.652078e+09	1099367.0
2021-07-14	BTC/USDT	32729.12	33114.03	31550.00	32820.02	46777.823484	1.515692e+09	1123129.0
2021-07-13	BTC/USDT	33086.94	33340.00	32202.25	32729.77	41126.361008	1.348583e+09	956053.0
...	...	...	...	...	...	...	...	...
2017-08-21	BTC/USDT	4086.29	4119.62	3911.79	4016.00	685.120000	2.770592e+06	NaN
2017-08-20	BTC/USDT	4139.98	4211.08	4032.62	4086.29	463.540000	1.915636e+06	NaN
2017-08-19	BTC/USDT	4108.37	4184.69	3850.00	4139.98	371.150000	1.508239e+06	NaN
2017-08-18	BTC/USDT	4285.08	4371.52	3938.77	4108.37	1178.070000	4.994494e+06	NaN
2017-08-17	BTC/USDT	4469.93	4485.39	4200.74	4285.08	647.860000	2.812379e+06	NaN

1432 rows × 8 columns

Code

df.to_parquet(
    "/data/blog/2021-07-17-facebook-prophet-evaluation/btc.gz.parquet",
    compression="gzip"
)

Data Review

So now we can try to apply Prophet to this. Before we do so, let's have a quick look at the data. I want to hold out 2021 as the year to predict. Is this reasonable?

Code

(
    df.index.year
        .value_counts()
        .sort_index()
)

2017    138
2018    365
2019    365
2020    366
2021    198
Name: date, dtype: int64

The value to predict is the open price of BTC each day. What does that look like?

Code

df.open.describe()

count     1432.000000
mean     13551.845182
std      13510.322929
min       3189.020000
25%       6563.485000
50%       8809.975000
75%      11507.597500
max      63575.010000
Name: open, dtype: float64

Code

df.open.plot() ; None

This already looks impossible to predict. The value has gone nuts in 2021. I may have to revise the evaluation.

Data Split and Prediction

So let's start by training on 2017-2020 and predicting 2021.

Code

train_df = (
    df[df.index.year < 2021]
        .reset_index()
        [["date", "open"]]
        .rename(columns={"date": "ds", "open": "y"})
)
test_df = (
    df[df.index.year == 2021]
        .reset_index()
        [["date", "open"]]
        .rename(columns={"date": "ds", "open": "y"})
)

len(train_df), len(test_df)

(1234, 198)
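
Since the same split is repeated later with a different year, it could be wrapped in a small helper. This is a sketch under assumptions: `to_prophet_frame` and `split_by_year` are hypothetical names, and the toy frame only stands in for the real price data. The `ds` and `y` column names are the ones Prophet expects.

```python
import pandas as pd


def to_prophet_frame(frame: pd.DataFrame, column: str = "open") -> pd.DataFrame:
    """Return a two-column frame named the way Prophet expects (ds, y)."""
    return (
        frame.reset_index()[["date", column]]
        .rename(columns={"date": "ds", column: "y"})
    )


def split_by_year(frame: pd.DataFrame, test_year: int):
    """Train on every year before test_year, test on test_year itself."""
    train = to_prophet_frame(frame[frame.index.year < test_year])
    test = to_prophet_frame(frame[frame.index.year == test_year])
    return train, test


# Toy data standing in for the real price frame (values are made up).
dates = pd.date_range("2020-12-30", "2021-01-02", freq="D", name="date")
toy = pd.DataFrame({"open": [1.0, 2.0, 3.0, 4.0]}, index=dates)

train_df, test_df = split_by_year(toy, test_year=2021)
print(len(train_df), len(test_df))
```

With the real frame, `split_by_year(df, 2021)` should produce the same two frames as the cell above.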

Code

%%time

from prophet import Prophet

model = Prophet()
model.fit(train_df) ; None

INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.

CPU times: user 804 ms, sys: 35.8 ms, total: 840 ms
Wall time: 838 ms

Code

forecast = model.predict(test_df)

model.plot(forecast) ; None

Code

pd.merge(
    df[["open"]],
    forecast[["ds", "yhat"]],
    left_index=True,
    right_on="ds",
    how="outer"
).set_index("ds").plot() ; None

So Prophet was not able to see that BTC was about to go bananas. I’m not really surprised. I wonder if training on 2017-2018 and predicting 2019 would work better?
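
The plot makes the miss obvious, but a number makes it easier to compare one split against another. Here is a minimal sketch of two common error measures, mean absolute error and mean absolute percentage error. The function name `forecast_errors` is hypothetical and the toy frames are made up; they only mimic the shape of `test_df` (`ds`, `y`) and of Prophet's forecast (`ds`, `yhat`).

```python
import pandas as pd


def forecast_errors(actual: pd.DataFrame, forecast: pd.DataFrame) -> dict:
    """Compare actual y with predicted yhat, matched on the ds column."""
    joined = actual.merge(forecast[["ds", "yhat"]], on="ds", how="inner")
    error = joined["yhat"] - joined["y"]
    return {
        "mae": error.abs().mean(),                               # in price units
        "mape": (error.abs() / joined["y"].abs()).mean() * 100,  # in percent
    }


# Toy frames shaped like test_df and the forecast (values are made up).
days = pd.to_datetime(["2021-01-01", "2021-01-02"])
actual = pd.DataFrame({"ds": days, "y": [100.0, 200.0]})
predicted = pd.DataFrame({"ds": days, "yhat": [110.0, 180.0]})

scores = forecast_errors(actual, predicted)
print(scores)
```

On the real data this would be called as `forecast_errors(test_df, forecast)`.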

Code

train_df = (
    df[df.index.year < 2019]
        .reset_index()
        [["date", "open"]]
        .rename(columns={"date": "ds", "open": "y"})
)
test_df = (
    df[df.index.year == 2019]
        .reset_index()
        [["date", "open"]]
        .rename(columns={"date": "ds", "open": "y"})
)

len(train_df), len(test_df)

(503, 365)

Code

%%time

from prophet import Prophet

model = Prophet()
model.fit(train_df) ; None

INFO:prophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.

CPU times: user 115 ms, sys: 35.7 ms, total: 150 ms
Wall time: 149 ms

Code

forecast = model.predict(test_df)

model.plot(forecast) ; None

Code

pd.merge(
    df[df.index.year < 2020][["open"]],
    forecast[["ds", "yhat"]],
    left_index=True,
    right_on="ds",
    how="outer"
).set_index("ds").plot() ; None

In this split the price seems to follow a repeated pattern, but Prophet has failed to reproduce any of the previous behaviour. My best guess is that it has extrapolated the downward trend from the peak around Q1 2018. All in all, quite disappointing. Can’t use Prophet to make my fortune with crypto :-P
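
The guess about the trend could be checked against the `trend` column that Prophet includes in its forecast frame. The sketch below is only an illustration: `mean_daily_trend_change` is a hypothetical helper and the toy frame is made up to mimic the `ds` and `trend` columns. A negative result on the real forecast would be consistent with a downward trend being carried forward.

```python
import pandas as pd


def mean_daily_trend_change(forecast: pd.DataFrame) -> float:
    """Average change per day of the trend column across the forecast window."""
    ordered = forecast.sort_values("ds")
    days = (ordered["ds"].iloc[-1] - ordered["ds"].iloc[0]).days
    return (ordered["trend"].iloc[-1] - ordered["trend"].iloc[0]) / days


# Toy frame mimicking two columns of a Prophet forecast (values are made up).
toy_forecast = pd.DataFrame(
    {
        "ds": pd.date_range("2019-01-01", periods=5, freq="D"),
        "trend": [100.0, 95.0, 90.0, 85.0, 80.0],
    }
)

slope = mean_daily_trend_change(toy_forecast)
print(slope)
```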