Migrate to ml-stars #418

Merged: 10 commits, May 19, 2023
65 changes: 39 additions & 26 deletions README.md
@@ -18,22 +18,20 @@

A machine learning library for unsupervised time series anomaly detection.

| Important Links | |
| ----------------------------------- | -------------------------------------------------------------------- |
| :computer: **[Website]** | Check out the Sintel Website for more information about the project. |
| :book: **[Documentation]** | Quickstarts, User and Development Guides, and API Reference. |
| :star: **[Tutorials]**             | Check out our notebooks.                                              |
| :octocat: **[Repository]**         | The link to the GitHub repository of this library.                    |
| :scroll: **[License]** | The repository is published under the MIT License. |
| :keyboard: **[Development Status]** | This software is in its Pre-Alpha stage. |
| Important Links | |
| --------------------------------------------- | -------------------------------------------------------------------- |
| :computer: **[Website]** | Check out the Sintel Website for more information about the project. |
| :book: **[Documentation]** | Quickstarts, User and Development Guides, and API Reference. |
| :star: **[Tutorials]**                        | Check out our notebooks.                                              |
| :octocat: **[Repository]**                    | The link to the GitHub repository of this library.                    |
| :scroll: **[License]** | The repository is published under the MIT License. |
| [![][Slack Logo] **Community**][Community] | Join our Slack Workspace for announcements and discussions. |

[Website]: https://sintel.dev/
[Documentation]: https://sintel-dev.github.io/Orion
[Tutorials]: https://github.com/sintel-dev/Orion/tree/master/tutorials
[Repository]: https://github.com/sintel-dev/Orion
[License]: https://github.com/sintel-dev/Orion/blob/master/LICENSE
[Development Status]: https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha
[Community]: https://join.slack.com/t/sintel-space/shared_invite/zt-q147oimb-4HcphcxPfDAM0O9_4PaUtw
[Slack Logo]: https://github.com/sintel-dev/Orion/blob/master/docs/images/slack.png

@@ -87,20 +85,20 @@ which should show a signal with `timestamp` and `value`.
4 1222905600 -0.370746
```

In this example we use the `lstm_dynamic_threshold` pipeline and set some hyperparameters (in this case, the number of training epochs to 5).
In this example we use the `aer` pipeline and set some hyperparameters (in this case, the number of training epochs to 5).

```python3
from orion import Orion

hyperparameters = {
'keras.Sequential.LSTMTimeSeriesRegressor#1': {
'orion.primitives.aer.AER#1': {
'epochs': 5,
'verbose': True
}
}

orion = Orion(
pipeline='lstm_dynamic_threshold',
pipeline='aer',
hyperparameters=hyperparameters
)
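
# A typical next step (a sketch assuming the Orion API shown elsewhere in this
# README, not part of this hunk) would be to train the pipeline and detect
# anomalies in one call:
#
#     anomalies = orion.fit_detect(data)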

@@ -136,8 +134,8 @@ We run the benchmark on **11** datasets with their known ground truth.
| LSTM Autoencoder | 6 |
| Dense Autoencoder | 6 |
| VAE | 7 |
| GANF | 6 |
| Azure | 0 |
| [GANF](https://arxiv.org/pdf/2202.07857.pdf) | 6 |
| [Azure](https://azure.microsoft.com/en-us/products/cognitive-services/anomaly-detector/) | 0 |


You can find the scores of each pipeline on every signal recorded in the [details Google Sheets document](https://docs.google.com/spreadsheets/d/1HaYDjY-BEXEObbi65fwG0om5d8kbRarhpK4mvOZVmqU/edit?usp=sharing). The summarized results can also be browsed in the following [summary Google Sheets document](https://docs.google.com/spreadsheets/d/1ZPUwYH8LhDovVeuJhKYGXYny7472HXVCzhX6D6PObmg/edit?usp=sharing).
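
The benchmark can also be re-run locally. The following is a hedged sketch assuming the `benchmark` entry point in `orion/benchmark.py`; the pipeline names and argument are illustrative:

```python3
from orion.benchmark import benchmark

# evaluate a small subset of pipelines on the benchmark datasets
scores = benchmark(pipelines=['aer', 'lstm_dynamic_threshold'])
print(scores)
```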
@@ -151,24 +149,22 @@ Additional resources that might be of interest:

# Citation

If you use **Orion**, which is part of the **Sintel** ecosystem, for your research, please consider citing the following paper:
If you use **AER** for your research, please consider citing the following paper:

Lawrence Wong, Dongyu Liu, Laure Berti-Equille, Sarah Alnegheimish, Kalyan Veeramachaneni. [AER: Auto-Encoder with Regression for Time Series Anomaly Detection](https://arxiv.org/pdf/2212.13558.pdf).

Sarah Alnegheimish, Dongyu Liu, Carles Sala, Laure Berti-Equille, Kalyan Veeramachaneni. [Sintel: A Machine Learning Framework to Extract Insights from Signals](https://dl.acm.org/doi/pdf/10.1145/3514221.3517910).
```
@inproceedings{alnegheimish2022sintel,
title={Sintel: A Machine Learning Framework to Extract Insights from Signals},
author={Alnegheimish, Sarah and Liu, Dongyu and Sala, Carles and Berti-Equille, Laure and Veeramachaneni, Kalyan},
booktitle={Proceedings of the 2022 International Conference on Management of Data},
pages = {1855–1865},
numpages = {11},
publisher={Association for Computing Machinery},
doi = {10.1145/3514221.3517910},
series = {SIGMOD '22},
@inproceedings{wong2022aer,
title={AER: Auto-Encoder with Regression for Time Series Anomaly Detection},
author={Wong, Lawrence and Liu, Dongyu and Berti-Equille, Laure and Alnegheimish, Sarah and Veeramachaneni, Kalyan},
booktitle={2022 IEEE International Conference on Big Data (IEEE BigData)},
pages={1152-1161},
doi={10.1109/BigData55660.2022.10020857},
organization={IEEE},
year={2022}
}
```


If you use **TadGAN** for your research, please consider citing the following paper:

Alexander Geiger, Dongyu Liu, Sarah Alnegheimish, Alfredo Cuesta-Infante, Kalyan Veeramachaneni. [TadGAN - Time Series Anomaly Detection Using Generative Adversarial Networks](https://arxiv.org/pdf/2009.07769v3.pdf).
@@ -184,3 +180,20 @@ Alexander Geiger, Dongyu Liu, Sarah Alnegheimish, Alfredo Cuesta-Infante, Kalyan Veeramachaneni
year={2020}
}
```

If you use **Orion**, which is part of the **Sintel** ecosystem, for your research, please consider citing the following paper:

Sarah Alnegheimish, Dongyu Liu, Carles Sala, Laure Berti-Equille, Kalyan Veeramachaneni. [Sintel: A Machine Learning Framework to Extract Insights from Signals](https://dl.acm.org/doi/pdf/10.1145/3514221.3517910).
```
@inproceedings{alnegheimish2022sintel,
title={Sintel: A Machine Learning Framework to Extract Insights from Signals},
author={Alnegheimish, Sarah and Liu, Dongyu and Sala, Carles and Berti-Equille, Laure and Veeramachaneni, Kalyan},
booktitle={Proceedings of the 2022 International Conference on Management of Data},
pages={1855–1865},
numpages={11},
publisher={Association for Computing Machinery},
doi={10.1145/3514221.3517910},
series={SIGMOD '22},
year={2022}
}
```
4 changes: 2 additions & 2 deletions docs/user_guides/primitives_pipelines/pipelines.rst
@@ -8,7 +8,7 @@ The main components in the Orion project are the **Orion Pipelines**

As ``MLPipeline`` instances, **Orion Pipelines**:

* consist of a list of one or more `MLPrimitives <https://mlbazaar.github.io/MLPrimitives/>`__
* consist of a list of one or more `mlstars <https://sintel-dev.github.io/ml-stars/>`__
* can be *fitted* on some data and later on used to *predict* anomalies on more data
* can be *scored* by comparing their predictions with some known anomalies
* have *hyperparameters* that can be *tuned* to improve their anomaly detection performance
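
A minimal usage sketch of such a pipeline, assuming the ``aer`` pipeline and the ``Orion`` API shown in the README quickstart (the toy data below is purely illustrative):

.. code-block:: python

    import numpy as np
    import pandas as pd

    from orion import Orion

    # toy signal with the expected ``timestamp`` and ``value`` columns
    data = pd.DataFrame({
        "timestamp": np.arange(500) * 300 + 1222819200,
        "value": np.random.random(500),
    })

    # load a registered pipeline by name and override one hyperparameter
    orion = Orion(
        pipeline='aer',
        hyperparameters={'orion.primitives.aer.AER#1': {'epochs': 1}},
    )

    # fit the pipeline and detect anomalies in a single call
    anomalies = orion.fit_detect(data)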
@@ -153,7 +153,7 @@ Since pipelines are composed of :ref:`primitives`, ...
"value": np.random.randint(0, 10, 500)})

hyperparameters = {
"mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate#1": {
"mlstars.custom.timeseries_preprocessing.time_segments_aggregate#1": {
"interval": 300
},
'keras.Sequential.LSTMTimeSeriesRegressor#1': {
2 changes: 1 addition & 1 deletion docs/user_guides/primitives_pipelines/primitives.rst
@@ -4,7 +4,7 @@
Primitives
==========

Primitives are data processing units. They are defined by the code that performs the actual processing and an annotated ``json`` file. To read more about primitives and their composition, visit `MLPrimitives <https://mlbazaar.github.io/MLPrimitives/>`__.
Primitives are data processing units. They are defined by the code that performs the actual processing and an annotated ``json`` file. To read more about primitives and their composition, visit `mlstars <https://sintel-dev.github.io/ml-stars/>`__.
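
For example, a primitive can be loaded directly with the ``load_primitive`` helper used throughout the primitive pages below (a minimal sketch; the ``feature_range`` override is only illustrative):

.. code-block:: python

    from mlstars import load_primitive

    # load the annotated sklearn MinMaxScaler primitive, overriding one argument
    primitive = load_primitive('sklearn.preprocessing.MinMaxScaler',
                               arguments={"feature_range": (0, 1)})

    # the loaded primitive can then be fitted to and applied on data,
    # as the individual primitive pages below illustrate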

Preprocessing
-------------
2 changes: 1 addition & 1 deletion docs/user_guides/primitives_pipelines/primitives/AER.rst
@@ -40,7 +40,7 @@ argument type description
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

X = np.ones((64, 100, 1))
y = X[:,:, [0]] # signal to reconstruct from X (channel 0)
@@ -47,7 +47,7 @@ argument type description
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

X = np.array([1] * 100).reshape(1, -1, 1)

@@ -49,7 +49,7 @@ argument type description
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

X = np.array([1] * 100).reshape(1, -1, 1)

@@ -7,7 +7,7 @@ LSTM

**description**: this is a prediction model with double-stacked LSTM layers used as a time series regressor. You can read more about it in the `related paper <https://arxiv.org/pdf/1802.04431.pdf>`__.

see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/primitives/keras.Sequential.LSTMTimeSeriesRegressor.json>`__.
see `json <https://github.com/MLBazaar/mlstars/blob/master/mlstars/primitives/keras.Sequential.LSTMTimeSeriesRegressor.json>`__.

====================== =================== ===========================================================================================================================================
argument type description
@@ -48,7 +48,7 @@ argument type description
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

X = np.array([1] * 100).reshape(1, -1, 1)
y = np.array([[1]])
@@ -7,7 +7,7 @@ MinMaxScaler

**description**: this primitive transforms features by scaling each feature to a given range.

see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/primitives/sklearn.preprocessing.MinMaxScaler.json>`__.
see `json <https://github.com/MLBazaar/mlstars/blob/master/mlstars/primitives/sklearn.preprocessing.MinMaxScaler.json>`__.

==================== =================== =============================================================================================================
argument type description
@@ -33,7 +33,7 @@ argument type description
:okwarning:
import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive
X = np.array(range(5)).reshape(-1, 1)
primitive = load_primitive('sklearn.preprocessing.MinMaxScaler',
@@ -7,7 +7,7 @@ SimpleImputer

**description**: this primitive is an imputation transformer for filling missing values.

see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/primitives/sklearn.impute.SimpleImputer.json>`__.
see `json <https://github.com/MLBazaar/mlstars/blob/master/mlstars/primitives/sklearn.impute.SimpleImputer.json>`__.

==================== ========================================================= ==========================================
argument type description
@@ -35,7 +35,7 @@ argument type
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

X = np.array([1] * 4 + [np.nan]).reshape(-1, 1)
primitive = load_primitive('sklearn.impute.SimpleImputer',
@@ -44,7 +44,7 @@ argument type description
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

X = np.array([1] * 100).reshape(1, -1, 1)
y = X[:,:, [0]] # signal to reconstruct from X (channel 0)
2 changes: 1 addition & 1 deletion docs/user_guides/primitives_pipelines/primitives/VAE.rst
@@ -46,7 +46,7 @@ argument type description
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

X = np.array([1] * 100).reshape(1, -1, 1)

4 changes: 2 additions & 2 deletions docs/user_guides/primitives_pipelines/primitives/arima.rst
@@ -7,7 +7,7 @@ ARIMA

**description**: this is an Autoregressive Integrated Moving Average (ARIMA) prediction model.

see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/primitives/statsmodels.tsa.arima_model.Arima.json>`__.
see `json <https://github.com/MLBazaar/mlstars/blob/master/mlstars/primitives/statsmodels.tsa.arima_model.Arima.json>`__.

==================== =================== ==================================================================
argument type description
@@ -35,7 +35,7 @@ argument type description
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

X = np.array(range(100)).reshape(-1, 1)
primitive = load_primitive('statsmodels.tsa.arima_model.Arima',
@@ -35,7 +35,7 @@ argument type
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

X = np.array([1] * 4 + [np.nan]).reshape(-1, 1)
primitive = load_primitive('orion.primitives.timeseries_preprocessing.fillna',
@@ -39,7 +39,7 @@ argument type description
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

primitive = load_primitive('orion.primitives.timeseries_anomalies.find_anomalies',
arguments={"anomaly_padding": 1})
@@ -3,11 +3,11 @@
intervals to mask
~~~~~~~~~~~~~~~~~

**path**: ``mlprimitives.custom.timeseries_preprocessing.intervals_to_mask``
**path**: ``mlstars.custom.timeseries_preprocessing.intervals_to_mask``

**description**: this primitive creates a boolean mask from the given intervals.

see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/primitives/mlprimitives.custom.timeseries_preprocessing.intervals_to_mask.json>`__.
see `json <https://github.com/MLBazaar/mlstars/blob/master/mlstars/primitives/mlstars.custom.timeseries_preprocessing.intervals_to_mask.json>`__.

==================== =============================== =================================================================================================================================
argument type description
@@ -28,9 +28,9 @@ argument type description
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

primitive = load_primitive('mlprimitives.custom.timeseries_preprocessing.intervals_to_mask')
primitive = load_primitive('mlstars.custom.timeseries_preprocessing.intervals_to_mask')

index = np.array(range(10))
intervals = [(1, 3), (7, 7)]
@@ -38,7 +38,7 @@ argument type description
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

primitive = load_primitive('orion.primitives.timeseries_errors.reconstruction_errors')
y = np.array([[1]] * 100)
@@ -37,7 +37,7 @@ argument type description
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

primitive = load_primitive('orion.primitives.timeseries_errors.regression_errors')
y = np.array([[1]] * 100)
@@ -3,11 +3,11 @@
rolling window sequence
~~~~~~~~~~~~~~~~~~~~~~~

**path**: ``mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences``
**path**: ``mlstars.custom.timeseries_preprocessing.rolling_window_sequences``

**description**: this primitive generates many sub-sequences of the original sequence. It uses a rolling window approach to create the sub-sequences out of time series data.

see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/primitives/mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences.json>`__.
see `json <https://github.com/MLBazaar/mlstars/blob/master/mlstars/primitives/mlstars.custom.timeseries_preprocessing.rolling_window_sequences.json>`__.

==================== ============================================================== ==================================================================
argument type description
@@ -41,9 +41,9 @@ see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/pri
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

primitive = load_primitive('mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences',
primitive = load_primitive('mlstars.custom.timeseries_preprocessing.rolling_window_sequences',
arguments={"window_size": 10, "target_size": 1, "step_size": 1, "target_column": 0})

X = np.array([1] * 50).reshape(-1, 1)
@@ -44,7 +44,7 @@ argument type description
:okwarning:

import numpy as np
from mlprimitives import load_primitive
from mlstars import load_primitive

primitive = load_primitive('orion.primitives.tadgan.score_anomalies',
arguments={"error_smooth_window": 10, "critic_smooth_window": 10,
@@ -3,11 +3,11 @@
time segments aggregate
~~~~~~~~~~~~~~~~~~~~~~~

**path**: ``mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate``
**path**: ``mlstars.custom.timeseries_preprocessing.time_segments_aggregate``

**description**: this primitive creates an equi-spaced time series by aggregating values over a fixed, specified interval.

see `json <https://github.com/MLBazaar/MLPrimitives/blob/master/mlprimitives/primitives/mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate.json>`__.
see `json <https://github.com/MLBazaar/mlstars/blob/master/mlstars/primitives/mlstars.custom.timeseries_preprocessing.time_segments_aggregate.json>`__.

==================== =========================================== =============================================================================================================================
argument type description
@@ -28,9 +28,9 @@ argument type description
.. ipython:: python
:okwarning:

from mlprimitives import load_primitive
from mlstars import load_primitive

primitive = load_primitive('mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate',
primitive = load_primitive('mlstars.custom.timeseries_preprocessing.time_segments_aggregate',
arguments={"time_column": "timestamp", "interval":10, "method":'mean'})

df = pd.DataFrame({
2 changes: 1 addition & 1 deletion orion/benchmark.py
@@ -28,7 +28,7 @@

LOGGER = logging.getLogger(__name__)

BUCKET = 'd3-ai-orion'
BUCKET = 'sintel-orion'
S3_URL = 'https://{}.s3.amazonaws.com/{}'
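# e.g. S3_URL.format(BUCKET, 'some-signal.csv') -> 'https://sintel-orion.s3.amazonaws.com/some-signal.csv' (file name is illustrative)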

BENCHMARK_PATH = os.path.join(os.path.join(