Add temporal hierarchical forecasting #127

Closed
antoinecarme opened this issue Apr 30, 2020 · 11 comments

@antoinecarme
Owner

PyAF hierarchical forecasting is still missing a temporal aspect. Try to prototype some kind of signal aggregation based on temporal hierarchies.

A good starting point is :

Athanasopoulos, G., Hyndman, R.J., Kourentzes, N., and Petropoulos, F. (2016) Forecasting with temporal hierarchies.

Expected deliverable : Jupyter notebook.

@antoinecarme
Owner Author

R package :

https://github.com/robjhyndman/thief

antoinecarme pushed a commit that referenced this issue May 1, 2020
antoinecarme pushed a commit that referenced this issue May 1, 2020
antoinecarme pushed a commit that referenced this issue May 1, 2020
antoinecarme pushed a commit that referenced this issue May 1, 2020
antoinecarme pushed a commit that referenced this issue May 1, 2020
antoinecarme pushed a commit that referenced this issue May 1, 2020
antoinecarme pushed a commit that referenced this issue May 1, 2020
Allow using temporal hierarchies (WIP). First doc
antoinecarme pushed a commit that referenced this issue May 1, 2020
Allow using temporal hierarchies (WIP). First script
antoinecarme pushed a commit that referenced this issue May 1, 2020
antoinecarme pushed a commit that referenced this issue May 2, 2020
antoinecarme pushed a commit that referenced this issue May 2, 2020
Some experiments with pandas datetime functions
antoinecarme pushed a commit that referenced this issue May 2, 2020
+    def computeTimeFrequency_in_seconds(self, iTime):
antoinecarme pushed a commit that referenced this issue May 2, 2020
Adapt model horizon to time resolution
User specifies horizon for the most detailed level.
antoinecarme pushed a commit that referenced this issue May 2, 2020
Use pandas resampling functions to perform time-based aggregation.
Allow advanced periods in python ('2D' == 2 days, '30T' == 30 minutes, etc.)

Adapt model horizon to time resolution
User specifies horizon for the most detailed level.
antoinecarme pushed a commit that referenced this issue May 2, 2020
antoinecarme pushed a commit that referenced this issue May 2, 2020
Added an artificial hourly signal (ozone) with sophisticated frequencies ["H", "6H", "12H", "D"]
antoinecarme pushed a commit that referenced this issue May 2, 2020
antoinecarme pushed a commit that referenced this issue May 2, 2020
antoinecarme pushed a commit that referenced this issue May 3, 2020
Added some tests with different time hierarchies
antoinecarme pushed a commit that referenced this issue May 3, 2020
antoinecarme pushed a commit that referenced this issue May 3, 2020
antoinecarme pushed a commit that referenced this issue May 3, 2020
Updated these docs (a check for regressions)
antoinecarme pushed a commit that referenced this issue May 3, 2020
Temporal horizons should be computed only when training the model
antoinecarme pushed a commit that referenced this issue May 3, 2020
Temporal horizons should be computed only when training the model
antoinecarme pushed a commit that referenced this issue May 3, 2020
Temporal horizons should be computed only when training the model
antoinecarme pushed a commit that referenced this issue May 3, 2020
Temporal horizons should be computed only when training the model
antoinecarme pushed a commit that referenced this issue May 3, 2020
@antoinecarme
Owner Author

thief allows defining some specific categories of temporal aggregates :

https://github.com/robjhyndman/thief/blob/3cf654c53c0448182bd3847fa692ddee0badcfb2/R/tsaggregates.R#L62

 if(m==4L)
  {
    names(y.out)[mout==4L] <- "Annual"
    names(y.out)[mout==2L] <- "Biannual"
    names(y.out)[mout==1L] <- "Quarterly"
  }
  else if(m == 12L)
  {
    names(y.out) <- paste(mout,"-Monthly",sep="")
    names(y.out)[mout==12L] <- "Annual"
    names(y.out)[mout==6L] <- "Biannual"
    names(y.out)[mout==3L] <- "Quarterly"
    names(y.out)[mout==1L] <- "Monthly"
  }
  else if(m == 7L)
  {
    names(y.out)[mout==7L] <- "Weekly"
    names(y.out)[mout==1L] <- "Daily"
  }
  else if(m == 24L | m == 168L | m == 8760L)
  {
    names(y.out) <- paste(mout,"-Hourly",sep="")
    j <- mout%%24L == 0L
    names(y.out)[j] <- paste(mout[j]/24L,"-Daily",sep="")
    j <- mout%%168L == 0L
    names(y.out)[j] <- paste(mout[j]/168L,"-Weekly",sep="")
    j <- mout%%8760L == 0L
    names(y.out)[j] <- paste(mout[j]/8760L,"-Yearly",sep="")
    names(y.out)[mout==8760L] <- "Annual"
    names(y.out)[mout==2190L] <- "Quarterly"
    names(y.out)[mout==168L] <- "Weekly"
    names(y.out)[mout==24L] <- "Daily"
    names(y.out)[mout==1L] <- "Hourly"
  }
  else if(m == 48L | m == 336L | m == 17520L)
  {
    j <- mout%%2L == 0L
    names(y.out)[j] <- paste(mout[j]/2L,"-Hourly",sep="")
    j <- mout%%48L == 0L
    names(y.out)[j] <- paste(mout[j]/48L,"-Daily",sep="")
    j <- mout%%336L == 0L
    names(y.out)[j] <- paste(mout[j]/336L,"-Weekly",sep="")
    j <- mout%%17520L == 0L
    names(y.out)[j] <- paste(mout[j]/17520L,"-Yearly",sep="")
    names(y.out)[mout==17520L] <- "Annual"
    names(y.out)[mout==4380L] <- "Quarterly"
    names(y.out)[mout==336L] <- "Weekly"
    names(y.out)[mout==48L] <- "Daily"
    names(y.out)[mout==2L] <- "Hourly"
    names(y.out)[mout==1L] <- "Half-hourly"
  }
  else if(m == 52L)
  {
    names(y.out) <- paste(mout,"-Weekly",sep="")
    names(y.out)[mout==52L] <- "Annual"
    names(y.out)[mout==26L] <- "Biannual"
    names(y.out)[mout==13L] <- "Quarterly"
    names(y.out)[mout==1L] <- "Weekly"
  }
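
The naming table above is keyed on the aggregation factor `mout`, the number of base observations summed into one aggregate. As a rough Python illustration of that idea (not a port of thief: `aggregation_factors` and `aggregate_blocks` are hypothetical helper names, and dropping incomplete blocks from the start of the series is an assumption of this sketch), the levels can be taken as the divisors of the seasonal period `m`, each level summing non-overlapping blocks:

```python
def aggregation_factors(m):
    """Return all divisors of the seasonal period m, in increasing order."""
    return [k for k in range(1, m + 1) if m % k == 0]


def aggregate_blocks(values, k):
    """Sum non-overlapping blocks of k consecutive observations.

    Assumption: leading observations that do not fill a complete block
    are dropped, so the most recent observations are always kept.
    """
    usable = len(values) - len(values) % k
    trimmed = values[len(values) - usable:]
    return [sum(trimmed[i:i + k]) for i in range(0, usable, k)]


# Nine quarterly observations (m = 4): factors 1, 2 and 4 correspond to the
# "Quarterly", "Biannual" and "Annual" names in the R excerpt.
quarterly = [1, 2, 3, 4, 5, 6, 7, 8, 9]
levels = {k: aggregate_blocks(quarterly, k) for k in aggregation_factors(4)}
```

With nine observations, the biannual and annual levels each drop the first quarter so that every block is complete.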

@antoinecarme
Owner Author

Pandas allows creating more sophisticated time periods (offsets) and aggregating signals from one time resolution to another (resampling) :

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html
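
A minimal sketch of this kind of resampling, assuming a daily signal aggregated by summation. The column names `Date` and `Signal` and the helper `aggregate_periods` are made up for the illustration and are not PyAF code:

```python
import pandas as pd


def aggregate_periods(df, time_col, signal_col, periods):
    """Resample one signal to each requested period, summing within each bin."""
    series = df.set_index(time_col)[signal_col]
    return {period: series.resample(period).sum() for period in periods}


# Four full weeks of daily data, starting on Monday 2020-01-06.
dates = pd.date_range("2020-01-06", periods=28, freq="D")
df = pd.DataFrame({"Date": dates, "Signal": range(1, 29)})

levels = aggregate_periods(df, "Date", "Signal", ["D", "W", "2W"])
```

Summation keeps the total identical at every level, which is what makes the levels comparable. Note that recent pandas releases deprecate some of the single-letter aliases used in this thread (for example "H" and "T" in favour of "h" and "min"), so the exact spelling may depend on the installed version.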

@antoinecarme
Owner Author

Pandas offset aliases :

https://github.com/pandas-dev/pandas/blob/14eda586582513c68f32f0a1f00ecfe8d6c7f8f3/pandas/_libs/tslibs/frequencies.pyx#L72

    # Quarterly frequencies with various fiscal year ends.
    # eg, Q42005 for Q-OCT runs Aug 1, 2005 to Oct 31, 2005
    "Q-DEC": 2000,    # Quarterly - December year end
    "Q-JAN": 2001,    # Quarterly - January year end
    "Q-FEB": 2002,    # Quarterly - February year end
    "Q-MAR": 2003,    # Quarterly - March year end
    "Q-APR": 2004,    # Quarterly - April year end
    "Q-MAY": 2005,    # Quarterly - May year end
    "Q-JUN": 2006,    # Quarterly - June year end
    "Q-JUL": 2007,    # Quarterly - July year end
    "Q-AUG": 2008,    # Quarterly - August year end
    "Q-SEP": 2009,    # Quarterly - September year end
    "Q-OCT": 2010,    # Quarterly - October year end
    "Q-NOV": 2011,    # Quarterly - November year end

    "M": 3000,        # Monthly

    "W-SUN": 4000,    # Weekly - Sunday end of week
    "W-MON": 4001,    # Weekly - Monday end of week
    "W-TUE": 4002,    # Weekly - Tuesday end of week
    "W-WED": 4003,    # Weekly - Wednesday end of week
    "W-THU": 4004,    # Weekly - Thursday end of week
    "W-FRI": 4005,    # Weekly - Friday end of week
    "W-SAT": 4006,    # Weekly - Saturday end of week

    "B": 5000,        # Business days
    "D": 6000,        # Daily
    "H": 7000,        # Hourly
    "T": 8000,        # Minutely
    "S": 9000,        # Secondly
    "L": 10000,       # Millisecondly
    "U": 11000,       # Microsecondly
    "N": 12000}       # Nanosecondly

@antoinecarme
Owner Author

antoinecarme commented May 4, 2020

Pandas also allows more complex period specifications ("6H" stands for a 6-hour period).
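
To make the "multiple + unit" reading concrete, here is a small standard-library sketch that converts such a specification to seconds. It is not how pandas or PyAF parse offsets; `period_in_seconds` is a hypothetical helper, and it deliberately supports only fixed-length units (an assumption, since months, quarters and years have no fixed duration):

```python
import re

# Fixed-length units only (assumption of this sketch).
UNIT_SECONDS = {"S": 1, "T": 60, "H": 3600, "D": 86400, "W": 7 * 86400}


def period_in_seconds(spec):
    """Convert a spec such as "6H" or "30T" to a duration in seconds."""
    match = re.fullmatch(r"(\d*)([A-Z]+)", spec)
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported period specification: {spec!r}")
    multiple = int(match.group(1) or 1)  # a bare "H" means 1 hour
    if multiple < 1:
        raise ValueError(f"period multiple must be positive: {spec!r}")
    return multiple * UNIT_SECONDS[match.group(2)]
```

For instance, `period_in_seconds("6H")` gives 21600 and `period_in_seconds("30T")` gives 1800.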

@antoinecarme
Owner Author

PyAF hierarchical forecasting will be designed to allow pandas-friendly time hierarchies like the ones used in the sample test scripts listed below.

Sample test scripts : https://github.com/antoinecarme/pyaf/tree/Temporal_Hierarchy/tests/temporal_hierarchy

test_temporal_demo_1.py => PERIODS = ["D", "W", "Q"]
test_temporal_demo_daily_D_W_2W.py => PERIODS = ["D", "W", "2W"]
test_temporal_demo_daily_D_W_2W_Q.py => PERIODS = ["D", "W", "2W", "Q"]
test_temporal_demo_daily_D_W_M.py => PERIODS = ["D", "W", "M"]
test_temporal_demo_daily_D_W_M_Q.py => PERIODS = ["D", "W", "M", "Q"]
test_temporal_demo_daily_D_W_Q.py => PERIODS = ["D", "W", "Q"]
test_temporal_demo_hourly_H_6H_12H_D.py => PERIODS = ["H", "6H", "12H", "D"]
test_temporal_demo_hourly_H_6H_12H_D_W.py => PERIODS = ["H", "6H", "12H", "D", "W"]
test_temporal_demo_hourly_H_D.py => PERIODS = ["H", "D"]
test_temporal_demo_minutely_T_10T_30T_H.py => PERIODS = ["T", "10T", "30T", "H"]
test_temporal_demo_minutely_T_H_12H_D.py => PERIODS = ["T", "H", "12H", "D"]
test_temporal_demo_minutely_T_H.py => PERIODS = ["T", "H"]
test_temporal_demo_monthly_M_2M_6M_12M.py => PERIODS = ["M", "2M", "6M", "12M"]
test_temporal_demo_monthly_M_2M_6M.py => PERIODS = ["M", "2M", "6M"]
test_temporal_demo_monthly_M_Q_A.py => PERIODS = ["M", "Q", "A"]
test_temporal_demo_weekly_W_2W_M_Q.py => PERIODS = ["W", "2W", "M", "Q"]
test_temporal_demo_weekly_W_Q_A.py => PERIODS = ["W", "Q", "A"]

@antoinecarme
Owner Author

First Jupyter notebook, forecasting the GOOG stock with a temporal hierarchy :

lHierarchy['Periods']= ["D", "W" , "2W" , "M"]

Daily, weekly, bi-weekly and monthly signals are analyzed.

https://github.com/antoinecarme/pyaf/blob/Temporal_Hierarchy/notebooks_sandbox/temporal_hierarchy/Temporal_Hierarchy_prototyping_GOOG.ipynb

@antoinecarme
Owner Author

Another Jupyter notebook, for an hourly (artificial) time series based on the ozone dataset :

lHierarchy['Periods']= ["H", "6H" , "12H" , "D"]

Hourly, 6-hourly, 12-hourly and daily signals are analyzed.

https://github.com/antoinecarme/pyaf/blob/Temporal_Hierarchy/notebooks_sandbox/temporal_hierarchy/Temporal_Hierarchy_prototyping_ozone_hourly.ipynb
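
Since the horizon is specified for the most detailed level, each coarser level needs a shorter horizon covering the same time span. The sketch below shows one plausible rule (rounding up so that every level covers at least the requested span). The rounding rule and the helper name `horizons_per_level` are assumptions for illustration, not taken from the PyAF code:

```python
# Durations in seconds for the four levels of this hierarchy.
SECONDS = {"H": 3600, "6H": 6 * 3600, "12H": 12 * 3600, "D": 24 * 3600}


def horizons_per_level(periods, base_horizon):
    """Map each period to the number of steps covering the base horizon.

    periods[0] is the most detailed level; base_horizon counts its steps.
    """
    span = base_horizon * SECONDS[periods[0]]
    # Integer ceiling division: -(-a // b) == ceil(a / b) for positive ints.
    return {p: -(-span // SECONDS[p]) for p in periods}


# A 36-hour horizon at the hourly level.
horizons = horizons_per_level(["H", "6H", "12H", "D"], 36)
```

Here 36 hourly steps translate into 6, 3 and 2 steps at the 6-hour, 12-hour and daily levels; the daily level is rounded up from 1.5 days.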

@antoinecarme self-assigned this May 5, 2020
@antoinecarme changed the title from "Add a prototyping document for temporal hierarchical forecasting" to "Add temporal hierarchical forecasting" May 7, 2020
@antoinecarme
Owner Author

Three types of hierarchical forecasting are now available, selected by iHierarchy['Type'] : "Grouped", "Temporal", and any other value, which falls back to the default signal hierarchy :

    def create_signal_hierarchy(self , iInputDS, iTime, iSignal, iHorizon, iHierarchy, iExogenousData = None):
        lSignalHierarchy = None;
        if(iHierarchy['Type'] == "Grouped"):
            from .TS import Signal_Grouping as siggroup
            lSignalHierarchy = siggroup.cSignalGrouping();
        elif(iHierarchy['Type'] == "Temporal"):
            from .TS import Temporal_Hierarchy as temphier
            lSignalHierarchy = temphier.cTemporalHierarchy();
        else:
            from .TS import SignalHierarchy as sighier
            lSignalHierarchy = sighier.cSignalHierarchy();
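
One consequence of the final `else` branch is that a misspelled type does not raise an error; it silently selects the default hierarchy. The standalone sketch below reproduces only that dispatch behaviour with placeholder classes (the class names and `build_hierarchy` are stand-ins invented for the example, not PyAF's classes):

```python
class DefaultHierarchy:
    """Stand-in for the default signal hierarchy."""


class GroupedHierarchy:
    """Stand-in for the grouped signal hierarchy."""


class TemporalHierarchy:
    """Stand-in for the temporal hierarchy."""


BUILDERS = {"Grouped": GroupedHierarchy, "Temporal": TemporalHierarchy}


def build_hierarchy(hierarchy_spec):
    """Pick a hierarchy class from the 'Type' key, defaulting like the else branch."""
    return BUILDERS.get(hierarchy_spec["Type"], DefaultHierarchy)()


temporal = build_hierarchy({"Type": "Temporal", "Periods": ["D", "W", "2W", "M"]})
fallback = build_hierarchy({"Type": "temporal"})  # wrong case: falls back
```

The comparison is case-sensitive, so "temporal" is treated like any other unknown value.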

@antoinecarme
Owner Author

Closing

@antoinecarme
Owner Author

Final fixes before 2.0

Added some trivial checks and their error messages :

  1. When time is not physical (integer and real series are not allowed as time columns)
  2. When a requested period is finer than the time resolution of the dataset (cannot ask for hours in a daily dataset)
  3. When the hierarchy is not increasing ( ['6H' , 'H'] and ['D' , 'H' , 'W'] are not valid specifications)

Added one test for each case.
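
As an illustration of the third check only, here is a standard-library sketch that rejects non-increasing hierarchies. It is not the PyAF implementation: `check_increasing` and `nominal_seconds` are hypothetical names, and the month, quarter and year lengths are nominal values chosen only to order the periods:

```python
import re

DAY = 86400
# Nominal durations, used only for ordering (assumption of this sketch).
NOMINAL_SECONDS = {"T": 60, "H": 3600, "D": DAY, "W": 7 * DAY,
                   "M": 30 * DAY, "Q": 91 * DAY, "A": 365 * DAY}


def nominal_seconds(spec):
    """Approximate duration of a spec such as "6H", "2W" or "Q"."""
    match = re.fullmatch(r"(\d*)([A-Z])", spec)
    if match is None or match.group(2) not in NOMINAL_SECONDS:
        raise ValueError(f"unsupported period specification: {spec!r}")
    return int(match.group(1) or 1) * NOMINAL_SECONDS[match.group(2)]


def check_increasing(periods):
    """Raise ValueError unless each period is strictly longer than the previous one."""
    for previous, current in zip(periods, periods[1:]):
        if nominal_seconds(current) <= nominal_seconds(previous):
            raise ValueError(
                f"hierarchy is not increasing: {previous!r} is followed by {current!r}")


check_increasing(["H", "6H", "12H", "D"])  # valid, returns silently
```

Both invalid examples from the list above, ['6H' , 'H'] and ['D' , 'H' , 'W'], raise a `ValueError` with this check.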

antoinecarme pushed a commit that referenced this issue Jun 8, 2020
Added three error messages
antoinecarme pushed a commit that referenced this issue Jun 8, 2020
Added some tests for three new error messages
antoinecarme pushed a commit that referenced this issue Jun 8, 2020
antoinecarme pushed a commit that referenced this issue Jun 8, 2020