module datasetsforecast.favorita
function numpy_balance
*arrs: NumPy arrays.
np.ndarray: NumPy array with balanced combinations.
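One way to read "balanced combinations" is as the Cartesian product of the input arrays, so that every value of one array is paired with every value of the others. The sketch below illustrates that reading only; it is an assumption, not the library's code, and `balanced_combinations` is a hypothetical name.

```python
import numpy as np


def balanced_combinations(*arrs):
    """Return every combination of the input values, one combination per row."""
    # meshgrid with 'ij' indexing keeps the first array as the slowest-varying axis.
    grids = np.meshgrid(*arrs, indexing="ij")
    # Stack the grids on a trailing axis, then flatten to (n_combinations, n_arrays).
    return np.stack(grids, axis=-1).reshape(-1, len(arrs))


combos = balanced_combinations(np.array([1, 2]), np.array([10, 20, 30]))
print(combos)
```

Each row of the result holds one combination, so two arrays of lengths 2 and 3 give a (6, 2) array.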
function numpy_ffill
Forward fill (ffill) of missing values.
Fills missing values in an array by propagating the last non-missing value forward.
For example, if the array has the following values: 0 1 2 3 1 2 NaN 4
The ffill method would fill the missing values as follows: 0 1 2 3 1 2 2 4
Args:
arr(np.ndarray): NumPy array.
np.ndarray: NumPy array with forward-filled values.
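The forward-fill behaviour described above can be sketched for a one-dimensional array as follows. This is a minimal illustration, not the library's implementation: `ffill_1d` is a hypothetical helper, and the choice to leave leading NaNs untouched is an assumption.

```python
import numpy as np


def ffill_1d(arr):
    """Forward-fill NaNs in a 1-D array with the last non-missing value."""
    arr = np.asarray(arr, dtype=float)
    missing = np.isnan(arr)
    # For each position keep its own index if observed, else 0 as a placeholder.
    idx = np.where(~missing, np.arange(arr.size), 0)
    # A running maximum turns each placeholder into the index of the last observed value.
    idx = np.maximum.accumulate(idx)
    return arr[idx]


filled = ffill_1d([0, 1, 2, 3, 1, 2, np.nan, 4])
print(filled)
```

With the example values from the text, the NaN at position 6 is replaced by the 2 that precedes it.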
function numpy_bfill
Backward fill (bfill) of missing values.
Fills missing values in an array by propagating the next non-missing value backward.
For example, if the array has the following values: 0 1 2 3 1 2 NaN 4
The bfill method would fill the missing values as follows: 0 1 2 3 1 2 4 4
Args:
arr(np.ndarray): NumPy array.
np.ndarray: NumPy array with backward-filled values.
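The backward-fill behaviour can be sketched in the same way by running the accumulation from the end of the array. Again this is a one-dimensional illustration under stated assumptions: `bfill_1d` is a hypothetical helper, and trailing NaNs are left as they are.

```python
import numpy as np


def bfill_1d(arr):
    """Backward-fill NaNs in a 1-D array with the next non-missing value."""
    arr = np.asarray(arr, dtype=float)
    n = arr.size
    missing = np.isnan(arr)
    # Observed positions keep their own index; missing ones get the last index as a placeholder.
    idx = np.where(~missing, np.arange(n), n - 1)
    # A running minimum over the reversed indices finds the next observed position.
    idx = np.minimum.accumulate(idx[::-1])[::-1]
    return arr[idx]


filled = bfill_1d([0, 1, 2, 3, 1, 2, np.nan, 4])
print(filled)
```

With the example values from the text, the NaN at position 6 is replaced by the 4 that follows it.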
function one_hot_encoding
df(pd.DataFrame): DataFrame with categorical columns.
index_col(str): The index column to avoid encoding.
pd.DataFrame: DataFrame with one hot encoded categorical columns.
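To make the role of `index_col` concrete, the sketch below one-hot encodes every categorical column except the index column. It uses `pandas.get_dummies` as a stand-in technique; the library's own encoder and its output column names may differ, and `one_hot_sketch` is a hypothetical name.

```python
import pandas as pd


def one_hot_sketch(df, index_col):
    """One-hot encode all categorical columns of df, leaving index_col as-is."""
    # Encode everything except the index column.
    encoded = pd.get_dummies(df.drop(columns=[index_col]), dtype=float)
    # Put the untouched index column back as the first column.
    encoded.insert(0, index_col, df[index_col])
    return encoded


stores = pd.DataFrame(
    {"store": ["s1", "s2", "s3"], "city": ["Quito", "Quito", "Guayaquil"]}
)
encoded = one_hot_sketch(stores, index_col="store")
print(encoded)
```

The `store` column passes through unchanged, while `city` becomes one indicator column per distinct city.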
function nested_one_hot_encoding
df(pd.DataFrame): DataFrame with hierarchically-nested categorical columns.
index_col(str): The index column to avoid encoding.
pd.DataFrame: DataFrame with one hot encoded hierarchically-nested categorical columns.
function get_levels_from_S_df
S_df(pd.DataFrame): Summing matrix of size (base, bottom), see aggregate method.
list: Hierarchical aggregation indexes, where each entry is a level.
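One way to recover levels from a summing matrix is sketched below. It assumes that the rows of `S_df` are ordered level by level and that each level partitions all bottom series, so the row sums within a level add up to the number of bottom series. Both assumptions, and the helper name `levels_from_summing_matrix`, are illustrative and not taken from the library.

```python
import numpy as np
import pandas as pd


def levels_from_summing_matrix(S_df):
    """Group the row labels of a summing matrix into hierarchical levels."""
    n_bottom = S_df.shape[1]
    # Running count of bottom series covered by the rows seen so far.
    covered = S_df.sum(axis=1).to_numpy().cumsum()
    # A level ends whenever the running count reaches a multiple of n_bottom.
    ends = np.flatnonzero(covered % n_bottom == 0)
    starts = np.concatenate(([0], ends[:-1] + 1))
    return [S_df.index[s:e + 1].tolist() for s, e in zip(starts, ends)]


# Toy hierarchy: one total, two groups, four bottom series.
S = pd.DataFrame(
    np.vstack([np.ones((1, 4)), [[1, 1, 0, 0], [0, 0, 1, 1]], np.eye(4)]),
    index=["total", "A", "B", "a1", "a2", "b1", "b2"],
    columns=["a1", "a2", "b1", "b2"],
)
levels = levels_from_summing_matrix(S)
print(levels)
```

Each entry of the returned list holds the row labels of one level, from the most aggregated level down to the bottom series.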
function distance_to_holiday
function make_holidays_distance_df
class CodeTimer
method __init__
class Favorita200
Favorita200(freq: str = 'D', horizon: int = 34, seasonality: int = 7, test_size: int = 34, tags_names: Tuple[str] = ('Country', 'Country/State', 'Country/State/City', 'Country/State/City/Store'))
method __init__
class Favorita500
Favorita500(freq: str = 'D', horizon: int = 34, seasonality: int = 7, test_size: int = 34, tags_names: Tuple[str] = ('Country', 'Country/State', 'Country/State/City', 'Country/State/City/Store'))
method __init__
class FavoritaComplete
class FavoritaRawData
Favorita Raw Data.
Raw subset datasets from the Favorita 2018 Kaggle competition. This class contains utilities to download, load and filter portions of the dataset.
If you prefer, you can also download the original dataset directly from Kaggle:
method download
directory(str): Directory where data will be downloaded.
method unzip
class FavoritaData
Favorita Data.
The processed Favorita grocery dataset contains the daily sales history of items, with additional information on promotions, items, stores, and holidays. It comprises 371,312 series from January 2013 to August 2017, organized in a geographic hierarchy of states, cities, and stores. This wrangling matches that of the DPMN paper.
References: Kin G. Olivares, O. Nganba Meetei, Ruijun Ma, Rohan Reddy, Mengfei Cao, Lee Dicker (2022). "Probabilistic Hierarchical Forecasting with Deep Poisson Mixtures". International Journal of Forecasting, special issue. https://doi.org/10.1016/j.ijforecast.2023.04.007
method load
directory(str): Directory where data will be downloaded and saved.
group(str): Dataset group name in 'Favorita200', 'Favorita500', 'FavoritaComplete'.
cache(bool, optional): If True, saves and loads. Defaults to True.
verbose(bool, optional): Whether to print partial outputs. Defaults to False.
tuple: A tuple containing:
- Y_df (pd.DataFrame): Target base time series with columns ['item_id', 'hier_id', 'ds', 'y'].
- S_df (pd.DataFrame): Hierarchical constraints dataframe of size (base, bottom).
- tags (dict): Dictionary with hierarchical level information.
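A call using the documented arguments and return values might look like the sketch below. It assumes that `load` can be called directly on the class, which this page does not state; the wrapper `load_favorita_group` and the `./data` path are hypothetical. Calling it downloads data, so the example only defines the wrapper.

```python
def load_favorita_group(directory, group="Favorita200"):
    """Load one Favorita group and return (Y_df, S_df, tags)."""
    # Imported lazily so the sketch can be defined without the package installed.
    from datasetsforecast.favorita import FavoritaData

    # Assumption: `load` is callable on the class without creating an instance.
    Y_df, S_df, tags = FavoritaData.load(directory=directory, group=group)
    return Y_df, S_df, tags


# Example call (downloads data on first use, hence commented out):
# Y_df, S_df, tags = load_favorita_group("./data", group="Favorita200")
```

The three returned objects correspond to the tuple described above: the target series, the hierarchical constraints, and the level information.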
method load_preprocessed
directory(str): Directory where data will be downloaded and saved.
group(str): Dataset group name in 'Favorita200', 'Favorita500', 'FavoritaComplete'.
cache(bool, optional): If True, saves and loads. Defaults to True.
verbose(bool, optional): Whether to print partial outputs. Defaults to False.
tuple: A tuple containing:
- static_bottom (pd.DataFrame): Static variables of bottom level series.
- static_agg (pd.DataFrame): Static variables of aggregate level series.
- temporal_bottom (pd.DataFrame): Temporal variables of bottom level series.
- temporal_agg (pd.DataFrame): Temporal variables of aggregate level series.
This file was automatically generated via lazydocs.

