TimeGPT on top of Spark.
Outline:
1. Installation
Install Spark through Fugue. Fugue provides an easy-to-use interface for distributed computing that lets users execute Python code on top of several distributed computing frameworks, including Spark.
You can install `fugue` with `pip`, for example `pip install "fugue[spark]"`.
Note: If executing on a distributed Spark cluster, ensure that the `nixtla` library is installed across all the workers.
2. Load Data
You can load your data as a pandas DataFrame. In this tutorial, we
will use a dataset that contains hourly electricity prices from
different markets.
| | unique_id | ds | y |
|---|---|---|---|
| 0 | BE | 2016-10-22 00:00:00 | 70.00 |
| 1 | BE | 2016-10-22 01:00:00 | 37.10 |
| 2 | BE | 2016-10-22 02:00:00 | 37.10 |
| 3 | BE | 2016-10-22 03:00:00 | 44.75 |
| 4 | BE | 2016-10-22 04:00:00 | 37.10 |
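
As a minimal sketch of the expected input format, the five rows shown in the table can be built by hand as a pandas DataFrame. The real tutorial dataset is larger and is normally read from a file; only the column names (`unique_id`, `ds`, `y`) and the values below come from the table above.

```python
import pandas as pd

# Rebuild the five rows shown in the table above.
# Column layout: series identifier, timestamp, target value.
df = pd.DataFrame(
    {
        "unique_id": ["BE", "BE", "BE", "BE", "BE"],
        "ds": pd.to_datetime(
            [
                "2016-10-22 00:00:00",
                "2016-10-22 01:00:00",
                "2016-10-22 02:00:00",
                "2016-10-22 03:00:00",
                "2016-10-22 04:00:00",
            ]
        ),
        "y": [70.00, 37.10, 37.10, 44.75, 37.10],
    }
)

print(df.head())
```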
3. Initialize Spark
Initialize Spark and convert the pandas DataFrame to a Spark
DataFrame.
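
One way this step might look is sketched below. The helper name `pandas_to_spark` is hypothetical and exists only for this illustration; the calls it wraps, `SparkSession.builder.getOrCreate()` and `createDataFrame`, are standard PySpark. It assumes `pyspark` and a Java runtime are available when the helper creates the session itself.

```python
def pandas_to_spark(pdf, spark=None):
    """Return (spark, spark_df) for a pandas DataFrame.

    `pandas_to_spark` is a hypothetical helper used only in this sketch.
    If no session is passed in, a local or existing one is obtained.
    """
    if spark is None:
        # Imported lazily so the helper can be defined without pyspark installed.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()
    # createDataFrame accepts a pandas DataFrame and infers the Spark schema.
    return spark, spark.createDataFrame(pdf)


# Usage (requires pyspark and a Java runtime):
# spark, spark_df = pandas_to_spark(df)
# spark_df.show(5)
```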
4. Use TimeGPT on Spark
Using TimeGPT on top of Spark is almost identical to the
non-distributed case. The only difference is that you need to use a
Spark DataFrame.
First, instantiate the `NixtlaClient` class.
👍 Use an Azure AI endpoint: To use an Azure AI endpoint, set the `base_url` argument: `nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")`
Then use any method from the `NixtlaClient` class, such as `forecast` or `cross_validation`.
📘 Available models in Azure AI: If you are using an Azure AI endpoint, please be sure to set `model="azureai"`: `nixtla_client.forecast(..., model="azureai")`. For the public API, we support two models: `timegpt-1` and `timegpt-1-long-horizon`. By default, `timegpt-1` is used. Please see this tutorial on how and when to use `timegpt-1-long-horizon`.
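
The two steps above (instantiate the client, then call a method on a Spark DataFrame) could be wrapped roughly as follows. This is a sketch under assumptions: `forecast_with_timegpt` is a hypothetical helper name, the horizon value in the usage comment is illustrative, and the API key is a placeholder. Extra keyword arguments such as `model="azureai"` are forwarded unchanged to `forecast`.

```python
def forecast_with_timegpt(spark_df, h, api_key=None, client=None, **kwargs):
    """Forecast `h` steps ahead for every series in a Spark DataFrame.

    `forecast_with_timegpt` is a hypothetical helper used only in this sketch.
    """
    if client is None:
        # Imported lazily; requires the nixtla package.
        from nixtla import NixtlaClient

        # If api_key is None, the client falls back to the NIXTLA_API_KEY
        # environment variable.
        client = NixtlaClient(api_key=api_key)
    # Same call as in the non-distributed case, but with a Spark DataFrame.
    return client.forecast(spark_df, h=h, **kwargs)


# Usage (requires a valid API key and a Spark DataFrame `spark_df`):
# fcst_df = forecast_with_timegpt(spark_df, h=14, api_key="your api_key")
# fcst_df.show(5)
```

Because the input is a Spark DataFrame, the result is expected to be a Spark DataFrame as well, so it can be inspected with `show` or converted with `toPandas` when it is small enough to collect.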
You can also use exogenous variables with TimeGPT on top of Spark.
To do this, please refer to the Exogenous
Variables
tutorial. Just keep in mind that you need to use a Spark DataFrame
instead of a pandas DataFrame.
5. Stop Spark
When you are done, stop the Spark session.
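
Stopping the session is a single `spark.stop()` call. As an optional pattern, not something the tutorial prescribes, a small context manager can guarantee the session is stopped even if a forecast raises. The name `spark_session` and its `builder` parameter are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def spark_session(builder=None):
    """Yield a Spark session and always stop it afterwards.

    `spark_session` is a hypothetical helper used only in this sketch.
    """
    if builder is None:
        # Imported lazily so the helper can be defined without pyspark installed.
        from pyspark.sql import SparkSession

        builder = SparkSession.builder
    spark = builder.getOrCreate()
    try:
        yield spark
    finally:
        # Release cluster resources whether or not the body succeeded.
        spark.stop()


# Usage (requires pyspark and a Java runtime):
# with spark_session() as spark:
#     spark_df = spark.createDataFrame(df)
#     ...
```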

