Migrate to ml-stars (#418)

* update readme
* change mlprimitives to mlstars
* change s3 bucket
* fix mlstars import
* add tqdm to requirements
* fix imports
* change tutorials to mlstars
* update ml-stars
* fix order
* update mlstars
sintel-dev · May 19, 2023 · 601b1d4
1 parent a55f17b
commit 601b1d4

Showing 141 changed files with 293 additions and 287 deletions.
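
Most of the changes below are a mechanical rename of the `mlprimitives` package name to `mlstars` in imports and primitive paths. As a rough illustration only, a rename of this kind could be scripted along the following lines. The helper name `migrate_text` is hypothetical and is not part of this commit, Orion, or ml-stars; it is a minimal sketch that assumes only whole-word, lowercase occurrences should be rewritten, so capitalized names such as `MLPrimitives` in URLs are left for manual review.

```python
import re

# Hypothetical helper, not part of the commit: matches the old package
# name only as a whole word, and only in lowercase.
_OLD_NAME = re.compile(r"\bmlprimitives\b")


def migrate_text(text: str) -> str:
    """Replace whole-word occurrences of ``mlprimitives`` with ``mlstars``."""
    return _OLD_NAME.sub("mlstars", text)


# Example inputs taken from lines touched by the diff below.
print(migrate_text("from mlprimitives import load_primitive"))
print(migrate_text(
    "mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate#1"
))
```

A text substitution like this does not cover the other changes in the commit, such as the new S3 bucket name or the added `tqdm` requirement, and any rewritten documentation links would still need to be checked by hand.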
diff --git a/README.md b/README.md
@@ -18,22 +18,20 @@
 
 A machine learning library for unsupervised time series anomaly detection.
 
-| Important Links                     |                                                                      |
-| ----------------------------------- | -------------------------------------------------------------------- |
-| :computer: **[Website]**            | Check out the Sintel Website for more information about the project. |
-| :book: **[Documentation]**          | Quickstarts, User and Development Guides, and API Reference.         |
-| :star: **[Tutorials]**              | Checkout our notebooks                                               |
-| :octocat: **[Repository]**          | The link to the Github Repository of this library.                   |
-| :scroll: **[License]**              | The repository is published under the MIT License.                   |
-| :keyboard: **[Development Status]** | This software is in its Pre-Alpha stage.                             |
+| Important Links                               |                                                                      |
+| --------------------------------------------- | -------------------------------------------------------------------- |
+| :computer: **[Website]**                      | Check out the Sintel Website for more information about the project. |
+| :book: **[Documentation]**                    | Quickstarts, User and Development Guides, and API Reference.         |
+| :star: **[Tutorials]**                        | Checkout our notebooks                                               |
+| :octocat: **[Repository]**                    | The link to the Github Repository of this library.                   |
+| :scroll: **[License]**                        | The repository is published under the MIT License.                   |
 | [![][Slack Logo] **Community**][Community]    | Join our Slack Workspace for announcements and discussions.          |
 
 [Website]: https://sintel.dev/
 [Documentation]: https://sintel-dev.github.io/Orion
 [Tutorials]: https://github.com/sintel-dev/Orion/tree/master/tutorials
 [Repository]: https://github.com/sintel-dev/Orion
 [License]: https://github.com/sintel-dev/Orion/blob/master/LICENSE
-[Development Status]: https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha
 [Community]: https://join.slack.com/t/sintel-space/shared_invite/zt-q147oimb-4HcphcxPfDAM0O9_4PaUtw
 [Slack Logo]: https://github.com/sintel-dev/Orion/blob/master/docs/images/slack.png
 
@@ -87,20 +85,20 @@ which should show a signal with `timestamp` and `value`.
 4  1222905600 -0.370746
 ```
 
-In this example we use `lstm_dynamic_threshold` pipeline and set some hyperparameters (in this case training epochs as 5).
+In this example we use `aer` pipeline and set some hyperparameters (in this case training epochs as 5).
 
 ```python3
 from orion import Orion
 
 hyperparameters = {
-    'keras.Sequential.LSTMTimeSeriesRegressor#1': {
+    'orion.primitives.aer.AER#1': {
         'epochs': 5,
         'verbose': True
     }
 }
 
 orion = Orion(
-    pipeline='lstm_dynamic_threshold',
+    pipeline='aer',
     hyperparameters=hyperparameters
 )
 
@@ -136,8 +134,8 @@ We run the benchmark on **11** datasets with their known grounth truth. We recor
 | LSTM Autoencoder          |          6         |
 | Dense Autoencoder         |          6         |
 | VAE                       |          7         |
-| GANF                      |          6         |
-| Azure                     |          0         |
+| [GANF](https://arxiv.org/pdf/2202.07857.pdf)                                                  |          6         |
+| [Azure](https://azure.microsoft.com/en-us/products/cognitive-services/anomaly-detector/)      |          0         |
 
 
 You can find the scores of each pipeline on every signal recorded in the [details Google Sheets document](https://docs.google.com/spreadsheets/d/1HaYDjY-BEXEObbi65fwG0om5d8kbRarhpK4mvOZVmqU/edit?usp=sharing). The summarized results can also be browsed in the following [summary Google Sheets document](https://docs.google.com/spreadsheets/d/1ZPUwYH8LhDovVeuJhKYGXYny7472HXVCzhX6D6PObmg/edit?usp=sharing).
@@ -151,24 +149,22 @@ Additional resources that might be of interest:
 
 # Citation
 
-If you use **Orion** which is part of the **Sintel** ecosystem for your research, please consider citing the following paper:
+If you use **AER** for your research, please consider citing the following paper:
+
+Lawrence Wong, Dongyu Liu, Laure Berti-Equille, Sarah Alnegheimish, Kalyan Veeramachaneni. [AER: Auto-Encoder with Regression for Time Series Anomaly Detection](https://arxiv.org/pdf/2212.13558.pdf).
 
-Sarah Alnegheimish, Dongyu Liu, Carles Sala, Laure Berti-Equille, Kalyan Veeramachaneni. [Sintel: A Machine Learning Framework to Extract Insights from Signals](https://dl.acm.org/doi/pdf/10.1145/3514221.3517910).
 ```
-@inproceedings{alnegheimish2022sintel,
-  title={Sintel: A Machine Learning Framework to Extract Insights from Signals},
-  author={Alnegheimish, Sarah and Liu, Dongyu and Sala, Carles and Berti-Equille, Laure and Veeramachaneni, Kalyan},  
-  booktitle={Proceedings of the 2022 International Conference on Management of Data},
-  pages = {1855–1865},
-  numpages = {11},
-  publisher={Association for Computing Machinery},
-  doi = {10.1145/3514221.3517910},
-  series = {SIGMOD '22},
+@inproceedings{wong2022aer,
+  title={AER: Auto-Encoder with Regression for Time Series Anomaly Detection},
+  author={Wong, Lawrence and Liu, Dongyu and Berti-Equille, Laure and Alnegheimish, Sarah and Veeramachaneni, Kalyan},
+  booktitle={2022 IEEE International Conference on Big Data (IEEE BigData)},
+  pages={1152-1161},
+  doi={10.1109/BigData55660.2022.10020857},
+  organization={IEEE},
   year={2022}
 }
 ```
 
-
 If you use **TadGAN** for your research, please consider citing the following paper:
 
 Alexander Geiger, Dongyu Liu, Sarah Alnegheimish, Alfredo Cuesta-Infante, Kalyan Veeramachaneni. [TadGAN - Time Series Anomaly Detection Using Generative Adversarial Networks](https://arxiv.org/pdf/2009.07769v3.pdf).
@@ -184,3 +180,20 @@ Alexander Geiger, Dongyu Liu, Sarah Alnegheimish, Alfredo Cuesta-Infante, Kalyan
   year={2020}
 }
 ```
+
+If you use **Orion** which is part of the **Sintel** ecosystem for your research, please consider citing the following paper:
+
+Sarah Alnegheimish, Dongyu Liu, Carles Sala, Laure Berti-Equille, Kalyan Veeramachaneni. [Sintel: A Machine Learning Framework to Extract Insights from Signals](https://dl.acm.org/doi/pdf/10.1145/3514221.3517910).
+```
+@inproceedings{alnegheimish2022sintel,
+  title={Sintel: A Machine Learning Framework to Extract Insights from Signals},
+  author={Alnegheimish, Sarah and Liu, Dongyu and Sala, Carles and Berti-Equille, Laure and Veeramachaneni, Kalyan},  
+  booktitle={Proceedings of the 2022 International Conference on Management of Data},
+  pages={1855–1865},
+  numpages={11},
+  publisher={Association for Computing Machinery},
+  doi={10.1145/3514221.3517910},
+  series={SIGMOD '22},
+  year={2022}
+}
+```
diff --git a/docs/user_guides/primitives_pipelines/pipelines.rst b/docs/user_guides/primitives_pipelines/pipelines.rst
@@ -8,7 +8,7 @@ The main component in the Orion project are the **Orion Pipelines**, which consi
 
 As ``MLPipeline`` instances, **Orion Pipelines**:
 
-* consist of a list of one or more `MLPrimitives <https://mlbazaar.github.io/MLPrimitives/>`__
+* consist of a list of one or more `mlstars <https://sintel-dev.github.io/ml-stars/>`__
 * can be *fitted* on some data and later on used to *predict* anomalies on more data
 * can be *scored* by comparing their predictions with some known anomalies
 * have *hyperparameters* that can be *tuned* to improve their anomaly detection performance
@@ -153,7 +153,7 @@ Since pipelines are composed of :ref:`primitives`, you can discover the interpre
                                 "value": np.random.randint(0, 10, 500)})
 
     hyperparameters = {
-        "mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate#1": {
+        "mlstars.custom.timeseries_preprocessing.time_segments_aggregate#1": {
             "interval": 300
         },
         'keras.Sequential.LSTMTimeSeriesRegressor#1': {

diff --git a/docs/user_guides/primitives_pipelines/primitives.rst b/docs/user_guides/primitives_pipelines/primitives.rst
@@ -4,7 +4,7 @@
 Primitives
 ==========
 
-Primitives are data processing units. They are defined by the code that performs the actual processing and an annotated ``json`` file. To read more about primitives and their composition, visit `MLPrimitives <https://mlbazaar.github.io/MLPrimitives/>`__.
+Primitives are data processing units. They are defined by the code that performs the actual processing and an annotated ``json`` file. To read more about primitives and their composition, visit `mlstars <https://sintel-dev.github.io/ml-stars/>`__.
 
 Preprocessing
 -------------

diff --git a/docs/user_guides/primitives_pipelines/primitives/AER.rst b/docs/user_guides/primitives_pipelines/primitives/AER.rst
@@ -40,7 +40,7 @@ argument                    type                description
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
     X = np.ones((64, 100, 1))
     y = X[:,:, [0]] # signal to reconstruct from X (channel 0)

diff --git a/docs/user_guides/primitives_pipelines/primitives/DenseSeq2Seq.rst b/docs/user_guides/primitives_pipelines/primitives/DenseSeq2Seq.rst
@@ -47,7 +47,7 @@ argument                type                description
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
     X = np.array([1] * 100).reshape(1, -1, 1)
 

diff --git a/docs/user_guides/primitives_pipelines/primitives/LSTMSeq2Seq.rst b/docs/user_guides/primitives_pipelines/primitives/LSTMSeq2Seq.rst
@@ -49,7 +49,7 @@ argument                type                description
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
     X = np.array([1] * 100).reshape(1, -1, 1)
 

diff --git a/docs/user_guides/primitives_pipelines/primitives/LSTMTimeSeriesRegressor.rst b/docs/user_guides/primitives_pipelines/primitives/LSTMTimeSeriesRegressor.rst
@@ -7,7 +7,7 @@ LSTM
 
 **description**: this is a prediction model with double stacked LSTM layers used as a time series regressor. you can read more about it in the `related paper <https://arxiv.org/pdf/1802.04431.pdf>`__.
 
-see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/primitives/keras.Sequential.LSTMTimeSeriesRegressor.json>`__.
+see `json <https://github.com/MLBazaar/mlstars/blob/master/mlstars/primitives/keras.Sequential.LSTMTimeSeriesRegressor.json>`__.
 
 ====================== =================== ===========================================================================================================================================
 argument                type                description  
@@ -48,7 +48,7 @@ argument                type                description
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
     X = np.array([1] * 100).reshape(1, -1, 1)
     y = np.array([[1]])

diff --git a/docs/user_guides/primitives_pipelines/primitives/MinMaxScaler.rst b/docs/user_guides/primitives_pipelines/primitives/MinMaxScaler.rst
@@ -7,7 +7,7 @@ MinMaxScaler
 
 **description**: this primitive transforms features by scaling each feature to a given range.
 
-see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/primitives/sklearn.preprocessing.MinMaxScaler.json>`__.
+see `json <https://github.com/MLBazaar/mlstars/blob/master/mlstars/primitives/sklearn.preprocessing.MinMaxScaler.json>`__.
 
 ==================== =================== =============================================================================================================
 argument              type                description  
@@ -33,7 +33,7 @@ argument              type                description
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
     X = np.array(range(5)).reshape(-1, 1)
     primitive = load_primitive('sklearn.preprocessing.MinMaxScaler', 

diff --git a/docs/user_guides/primitives_pipelines/primitives/SimpleImputer.rst b/docs/user_guides/primitives_pipelines/primitives/SimpleImputer.rst
@@ -7,7 +7,7 @@ SimpleImputer
 
 **description**: this primitive is an imputation transformer for filling missing values.
 
-see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/primitives/sklearn.impute.SimpleImputer.json>`__.
+see `json <https://github.com/MLBazaar/mlstars/blob/master/mlstars/primitives/sklearn.impute.SimpleImputer.json>`__.
 
 ==================== ========================================================= ==========================================
 argument              type                                                      description  
@@ -35,7 +35,7 @@ argument              type
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
     X = np.array([1] * 4 + [np.nan]).reshape(-1, 1)
     primitive = load_primitive('sklearn.impute.SimpleImputer', 

diff --git a/docs/user_guides/primitives_pipelines/primitives/TadGAN.rst b/docs/user_guides/primitives_pipelines/primitives/TadGAN.rst
@@ -44,7 +44,7 @@ argument                    type                description
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
     X = np.array([1] * 100).reshape(1, -1, 1)
     y = X[:,:, [0]] # signal to reconstruct from X (channel 0)

diff --git a/docs/user_guides/primitives_pipelines/primitives/VAE.rst b/docs/user_guides/primitives_pipelines/primitives/VAE.rst
@@ -46,7 +46,7 @@ argument                    type                description
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
     X = np.array([1] * 100).reshape(1, -1, 1)
 

diff --git a/docs/user_guides/primitives_pipelines/primitives/arima.rst b/docs/user_guides/primitives_pipelines/primitives/arima.rst
@@ -7,7 +7,7 @@ ARIMA
 
 **description**: this is an Autoregressive Integrated Moving Average (ARIMA) prediction model.
 
-see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/primitives/statsmodels.tsa.arima_model.Arima.json>`__.
+see `json <https://github.com/MLBazaar/mlstars/blob/master/mlstars/primitives/statsmodels.tsa.arima_model.Arima.json>`__.
 
 ==================== =================== ==================================================================
 argument              type                description  
@@ -35,7 +35,7 @@ argument              type                description
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
     X = np.array(range(100)).reshape(-1, 1)
     primitive = load_primitive('statsmodels.tsa.arima_model.Arima', 

diff --git a/docs/user_guides/primitives_pipelines/primitives/fillna.rst b/docs/user_guides/primitives_pipelines/primitives/fillna.rst
@@ -35,7 +35,7 @@ argument            type
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
     X = np.array([1] * 4 + [np.nan]).reshape(-1, 1)
     primitive = load_primitive('orion.primitives.timeseries_preprocessing.fillna', 

diff --git a/docs/user_guides/primitives_pipelines/primitives/find_anomalies.rst b/docs/user_guides/primitives_pipelines/primitives/find_anomalies.rst
@@ -39,7 +39,7 @@ argument                    type                 description
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
     primitive = load_primitive('orion.primitives.timeseries_anomalies.find_anomalies',
         arguments={"anomaly_padding": 1})

diff --git a/docs/user_guides/primitives_pipelines/primitives/intervals_to_mask.rst b/docs/user_guides/primitives_pipelines/primitives/intervals_to_mask.rst
@@ -3,11 +3,11 @@
 intervals to mask
 ~~~~~~~~~~~~~~~~~
 
-**path**: ``mlprimitives.custom.timeseries_preprocessing.intervals_to_mask``
+**path**: ``mlstars.custom.timeseries_preprocessing.intervals_to_mask``
 
 **description**: this primitive creates boolean mask from given intervals.
 
-see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/primitives/mlprimitives.custom.timeseries_preprocessing.intervals_to_mask.json>`__.
+see `json <https://github.com/MLBazaar/mlstars/blob/master/mlstars/primitives/mlstars.custom.timeseries_preprocessing.intervals_to_mask.json>`__.
 
 ==================== =============================== =================================================================================================================================
 argument              type                            description  
@@ -28,9 +28,9 @@ argument              type                            description
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
-    primitive = load_primitive('mlprimitives.custom.timeseries_preprocessing.intervals_to_mask')
+    primitive = load_primitive('mlstars.custom.timeseries_preprocessing.intervals_to_mask')
 
     index = np.array(range(10))
     intervals = [(1, 3), (7, 7)]

diff --git a/docs/user_guides/primitives_pipelines/primitives/reconstruction_errors.rst b/docs/user_guides/primitives_pipelines/primitives/reconstruction_errors.rst
@@ -38,7 +38,7 @@ argument                    type                description
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
     primitive = load_primitive('orion.primitives.timeseries_errors.reconstruction_errors')
     y = np.array([[1]] * 100)

diff --git a/docs/user_guides/primitives_pipelines/primitives/regression_errors.rst b/docs/user_guides/primitives_pipelines/primitives/regression_errors.rst
@@ -37,7 +37,7 @@ argument                    type                description
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
     primitive = load_primitive('orion.primitives.timeseries_errors.regression_errors')
     y = np.array([[1]] * 100)

diff --git a/docs/user_guides/primitives_pipelines/primitives/rolling_window_sequences.rst b/docs/user_guides/primitives_pipelines/primitives/rolling_window_sequences.rst
@@ -3,11 +3,11 @@
 rolling window sequence
 ~~~~~~~~~~~~~~~~~~~~~~~
 
-**path**: ``mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences``
+**path**: ``mlstars.custom.timeseries_preprocessing.rolling_window_sequences``
 
 **description**: this primitive generates many sub-sequences of the original sequence. it uses a rolling window approach to create the sub-sequences out of time series data.
 
-see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/primitives/mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences.json>`__.
+see `json <https://github.com/MLBazaar/mlstars/blob/master/mlstars/primitives/mlstars.custom.timeseries_preprocessing.rolling_window_sequences.json>`__.
 
 ==================== ============================================================== ==================================================================
  argument             type                                                           description  
@@ -41,9 +41,9 @@ see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/pri
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
-    primitive = load_primitive('mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences', 
+    primitive = load_primitive('mlstars.custom.timeseries_preprocessing.rolling_window_sequences', 
         arguments={"window_size": 10, "target_size": 1, "step_size": 1, "target_column": 0})
 
     X = np.array([1] * 50).reshape(-1, 1)

diff --git a/docs/user_guides/primitives_pipelines/primitives/score_anomalies.rst b/docs/user_guides/primitives_pipelines/primitives/score_anomalies.rst
@@ -44,7 +44,7 @@ argument                    type                description
     :okwarning:
 
     import numpy as np
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
     primitive = load_primitive('orion.primitives.tadgan.score_anomalies', 
         arguments={"error_smooth_window": 10, "critic_smooth_window": 10,

diff --git a/docs/user_guides/primitives_pipelines/primitives/time_segments_aggregate.rst b/docs/user_guides/primitives_pipelines/primitives/time_segments_aggregate.rst
@@ -3,11 +3,11 @@
 time segments aggregate
 ~~~~~~~~~~~~~~~~~~~~~~~
 
-**path**: ``mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate``
+**path**: ``mlstars.custom.timeseries_preprocessing.time_segments_aggregate``
 
 **description**: this primitive creates an equi-spaced time series by aggregating values over fixed specified interval.
 
-see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/primitives/mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate.json>`__.
+see `json <https://github.com/MLBazaar/mlstars/blob/master/mlstars/primitives/mlstars.custom.timeseries_preprocessing.time_segments_aggregate.json>`__.
 
 ==================== =========================================== =============================================================================================================================
 argument              type                                        description  
@@ -28,9 +28,9 @@ argument              type                                        description
 .. ipython:: python
     :okwarning:
 
-    from mlprimitives import load_primitive
+    from mlstars import load_primitive
 
-    primitive = load_primitive('mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate', 
+    primitive = load_primitive('mlstars.custom.timeseries_preprocessing.time_segments_aggregate', 
         arguments={"time_column": "timestamp", "interval":10, "method":'mean'})
 
     df = pd.DataFrame({

diff --git a/orion/benchmark.py b/orion/benchmark.py
@@ -28,7 +28,7 @@
 
 LOGGER = logging.getLogger(__name__)
 
-BUCKET = 'd3-ai-orion'
+BUCKET = 'sintel-orion'
 S3_URL = 'https://{}.s3.amazonaws.com/{}'
 
 BENCHMARK_PATH = os.path.join(os.path.join(