A guide to ETH price prediction

Andrea Armanni · Published in Ocean Protocol · Nov 10, 2022

Analysing the winning submission of the ETH prediction challenge

Accurately predicting crypto prices can be challenging due to their extremely volatile nature. On top of this, there can also be external macro and micro factors such as political and/or economic conditions influencing capital inflow and outflow.

The large number of external factors, combined with the transparent nature of blockchains, gives us access to a wide variety of data that can be used to find patterns and feature vectors for our price prediction models.

This blog post explains how to use supervised machine learning techniques and regression algorithms for the Ocean ETH Predict data challenge. In particular, it walks through the techniques used by the winning contestant to provide insights into their work.

The objective of the data challenge was to predict the ETH price at hourly intervals over a period of 24 hours, using a provided dataset of historical ETH price data. The Ocean.py library was used to privately and securely consume and publish data assets, in this case the output of the models.

Let’s explore!

Step 1 — Retrieve and Read the Data

For this challenge we will be using historical data provided via the predict-eth README. The data feed returns the most recent 500 hours of OHLC (Open, High, Low, Close) history of ETH/USDT, from Binance.

# get the data
import pandas as pd
from datetime import datetime as dt
from dateutil.relativedelta import relativedelta

url = 'https://cexa.oceanprotocol.io/ohlc?exchange=binance&pair=ETH/USDT&period=1h'
df = pd.read_json(url)
df.columns = ['dt1','open','high','low','close','volume']
# convert the millisecond timestamps to UTC+0
a = [dt.fromtimestamp(x/1000) - relativedelta(hours=6) for x in df.dt1]
df['dt1'] = a
df.to_csv('data/ETH_USDT_free.csv', index=False)
df.set_index('dt1', inplace=True)
print(df.shape)
display(df[-3:])  # display() assumes a Jupyter/IPython environment
_ = df['close'].plot(figsize=(10, 4))

As shown above, the contestant converted the timestamps to the UTC+0 timezone, as per the competition guidelines.
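Note that subtracting a fixed 6-hour offset only yields UTC if the machine running the notebook sits at UTC-6. A more portable sketch, assuming the raw dt1 column holds millisecond Unix timestamps, lets pandas do the conversion directly:

# portable alternative to the fromtimestamp/relativedelta line above:
# parse millisecond timestamps as UTC, then drop the timezone marker
df['dt1'] = pd.to_datetime(df['dt1'], unit='ms', utc=True).dt.tz_localize(None)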

Step 2 — Functions & Models Training

Once the training data was collected, cleaned, and/or augmented [Related: Value from data pipelines], the next step was to identify the best algorithm for predicting the ETH price.

There are many types of machine learning algorithms, such as regression, decision trees, support vector machines (SVMs), and more, each with its own benefits and drawbacks. In this specific case, the contestant examined three candidates: exponential smoothing, a last-available-value baseline, and the Prophet model.
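As a minimal sketch of the exponential-smoothing baseline, assuming statsmodels with hourly seasonality (the contestant's exact configuration is not shown in the post):

# hedged sketch of an exponential-smoothing baseline (not the contestant's code)
from statsmodels.tsa.holtwinters import ExponentialSmoothing

train = df['close'][:-24]  # hold out the final day for evaluation
es = ExponentialSmoothing(train, trend='add', seasonal='add', seasonal_periods=24)
es_pred = es.fit().forecast(24)  # 24 hourly forecasts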

To evaluate the accuracy of each model, the following metrics were used: MAE (mean absolute error), MSE (mean squared error), MAPE (mean absolute percentage error), SMAPE (symmetric MAPE), and NMSE (normalised mean squared error).

import numpy as np
from sklearn import metrics

def ts_metrics(y_true, y_pred):
    # return a dict of standard time-series error metrics
    return {
        'mae': metrics.mean_absolute_error(y_true, y_pred),
        'mse': metrics.mean_squared_error(y_true, y_pred),
        'mape': np.mean(np.abs((y_true - y_pred) / y_true)),
        'smape': np.mean(np.abs(2 * (y_true - y_pred) / (y_true + np.abs(y_pred)))),
        'nmse': np.sum((y_true - y_pred)**2) / np.sum(y_true**2),
    }

To run the analysis, the contestant picked one day as an evaluation set and used the data before that day for training. Then one more parameter, called “hours_skip”, was added to the training routine: the number of most recent values to exclude from the training set.
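The post does not include the contestant's splitting code, but a minimal sketch of such a split might look like this, with train_eval_split as a hypothetical helper:

def train_eval_split(series, eval_hours=24, hours_skip=0):
    # hold out the last `eval_hours` observations as the evaluation day
    eval_set = series[-eval_hours:]
    # end the training window `hours_skip` observations before the evaluation day
    train_set = series[:-(eval_hours + hours_skip)]
    return train_set, eval_set

train, evaluation = train_eval_split(df['close'], eval_hours=24, hours_skip=0)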

As shown in the picture below, Prophet turned out to be the most accurate model, so it was chosen as the prediction algorithm for this challenge.
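For illustration, this is how the last-available-value baseline could be scored on the held-out day with the ts_metrics function defined above (an assumed usage, not the contestant's exact code):

# naive baseline: repeat the last training value for every hour of the evaluation day
y_true = evaluation.values
y_pred = np.full_like(y_true, train.iloc[-1])
print(ts_metrics(y_true, y_pred))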

Step 3 — Final Predictions & Exploration

Now that the model had been tested and no relevant drop in quality was identified on the provided dataset, the contestant proceeded with the Prophet model, enabling weekly and daily seasonality:
from prophet import Prophet

# 1) prepare the dataframe in Prophet's expected format: 'ds' (datetime) and 'y' (value)
x = df.close.reset_index()
x.columns = ['ds', 'y']

# 2) build and fit the model
m = Prophet(weekly_seasonality=True, daily_seasonality=True)
m.fit(x)

# 3) predict 50 hours ahead
t1 = 50
df1 = m.make_future_dataframe(periods=t1, freq='h')
df_out = m.predict(df1)

# 4) plot actuals vs. forecast
_ = x.set_index('ds').plot(figsize=(10, 4))
_ = df_out.set_index('ds')['yhat'].plot()
_ = _.legend(['real', 'prophet'])

Finally, by running m.plot_components(df_out) we can see the overall trend that ETH is following, as well as its price performance broken down into daily and weekly seasonality.
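In code form (note that Prophet's plot_components takes the forecast dataframe produced by m.predict, so it assumes the m and df_out objects from the block above):

# decompose the forecast into trend, weekly, and daily seasonality panels
fig = m.plot_components(df_out)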

Step 4 — Upload & Publish Predictions via Ocean.py

It’s now time to use the Ocean.py library to publish the output of the model in a tamper-proof manner. We start by publishing the prediction to permanent decentralized storage using a convenient Python approach: a wrapper of the Bundlr Network.

# Put the csv online
from pybundlr import pybundlr

file_name = "output.csv"
url = pybundlr.fund_and_upload(file_name, "matic", alice_wallet.private_key)
print(f"Your csv url: {url}")

Once the prediction is uploaded to Arweave via the Bundlr wrapper, we use the Ocean.py library to publish the asset as a data NFT on Ocean. This way, the prediction is recorded on the blockchain, giving the judges a tamper-proof audit trail.
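A sketch of that publishing step, based on the ocean.py v4 quickstarts of the period. The exact signature of create_url_asset varies between ocean.py releases, so treat it as an assumption; the ocean and alice_wallet objects come from the standard challenge setup:

# publish the uploaded csv as a data NFT (hedged sketch; API may differ by version)
name = "ETH predictions"
(data_nft, datatoken, ddo) = ocean.assets.create_url_asset(name, url, alice_wallet)
print(f"Published asset, did={ddo.did}")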

Hypothesis & Conclusion

While it is possible to analyse the historical price of ETH to find patterns and fit a price function, the normalised mean squared error tells us that we may still miss the price by hundreds of dollars. This is due to several reasons, such as:

  1. The model is too basic and does not capture intra-series patterns.
  2. Not enough data sources were used in training the model. Features such as whale ratio or a momentum factor could have been incorporated to provide further insight into price movements (see the sketch after this list).
  3. Unpredictable external factors may lead to over- or underestimation of the price, making predictions inherently imprecise.
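As a hypothetical illustration of point 2, a simple momentum feature could be derived from the same OHLC dataframe (this was not part of the winning submission):

# 24-hour momentum: percentage change of the close over the last day
df['momentum_24h'] = df['close'].pct_change(24)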

Crypto price prediction can be an interesting and fun task, and it can produce increasingly better results as better features are engineered into our models.

This ETH Predict challenge was the first of a long series of challenges. We aim to incentivise the community of data scientists to learn how to use Ocean.py to create value across data pipelines whilst, over time, increasing the accuracy of their predictions.

About Ocean Protocol

Ocean Protocol is a decentralized data exchange platform spearheading the movement to democratise AI, break down data silos, and provide open access to quality data. Ocean’s intuitive marketplace technology allows data to be published, discovered, and consumed in a secure, privacy-preserving manner, giving power back to data owners. Ocean resolves the tradeoff between using private data and the risks of exposing it.

Follow Ocean Protocol on Twitter, Telegram, or GitHub. And chat directly with the Ocean community on Discord.
