Code style enforcement #245

Merged: 5 commits, Oct 18, 2019
15 changes: 15 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,15 @@
repos:
- repo: https://github.com/pre-commit/mirrors-isort
  rev: v4.3.21
  hooks:
  - id: isort
    language_version: python3.7
- repo: https://github.com/ambv/black
  rev: stable
  hooks:
  - id: black
    language_version: python3.7
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v2.3.0
  hooks:
  - id: flake8
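
For context, the three hooks configured above are two formatters and a linter: isort groups and alphabetizes imports, black rewrites code to a uniform layout (double quotes, trailing commas, wrapped calls), and flake8 reports anything still out of line. A minimal, purely illustrative sketch (not part of this PR) of code that already matches the style these hooks converge on:

```python
# Illustrative only: a file in the style isort, black, and flake8 converge on.
from typing import Dict  # standard library imports first (isort)

import pandas as pd  # third-party imports in their own group (isort)

METADATA = {
    "author": "Alex Dunn",  # black prefers double quotes
    "email": "ardunn@lbl.gov",  # black adds the trailing comma
}


def describe(df: pd.DataFrame) -> Dict[str, int]:
    """Return basic shape information for a dataframe."""
    return {"rows": len(df), "columns": len(df.columns)}
```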
66 changes: 43 additions & 23 deletions CONTRIBUTING.md
@@ -1,44 +1,64 @@
# Contributing to automatminer

We love your input! We want to make contributing to automatminer as easy and transparent as possible, whether it's:
* Reporting a bug
* Discussing the current state of the code
* Submitting a fix
* Proposing or implementing new features
* Becoming a maintainer

- Reporting a bug
- Discussing the current state of the code
- Submitting a fix
- Proposing or implementing new features
- Becoming a maintainer

## Reporting bugs, getting help, and discussion

At any time, feel free to start a thread on the automatminer [Discourse forum](https://hackingmaterials.discourse.group/c/matminer/automatminer).

If you are making a bug report, incorporate as many elements of the following as possible to ensure a timely response and avoid the need for followups:
* A quick summary and/or background
* Steps to reproduce - be specific! **Provide sample code.**
* What you expected would happen, compared to what actually happens
* The full stack trace of any errors you encounter
* Notes (possibly including why you think this might be happening, or steps you tried that didn't work)

- A quick summary and/or background
- Steps to reproduce - be specific! **Provide sample code.**
- What you expected would happen, compared to what actually happens
- The full stack trace of any errors you encounter
- Notes (possibly including why you think this might be happening, or steps you tried that didn't work)

We love thorough bug reports as this means the development team can make quick and meaningful fixes. When we confirm your bug report, we'll move it to the GitHub issues where its progress can be further tracked.

## Contributing code modifications or additions through Github
We use github to host code, to track issues and feature requests, as well as accept pull requests.
## Contributing code modifications or additions through GitHub

We use GitHub to host code, to track issues and feature requests, as well as accept pull requests.

Pull requests are the best way to propose changes to the codebase. Follow the [Github flow](https://www.atlassian.com/git/tutorials/comparing-workflows/forking-workflow) for more information on this procedure.
Pull requests are the best way to propose changes to the codebase. Follow the [GitHub flow](https://www.atlassian.com/git/tutorials/comparing-workflows/forking-workflow) for more information on this procedure.

The basic procedure for making a PR is:
* Fork the repo and create your branch from master.
* Commit your improvements to your branch and push to your Github fork (repo).
* When you're finished, go to your fork and make a Pull Request. It will automatically update if you need to make further changes.

- Fork the repo on GitHub and clone it to your machine.

```sh
git clone https://github.com/<your_github_name>/automatminer
```

- Install both regular and development dependencies and setup the `git` pre-commit hook.

```sh
pip install -r requirements.txt && pre-commit install
```

This step is important as your changes may otherwise contain style violations that will throw errors when running our CI on your pull request.
- Commit your improvements and push to your GitHub fork.
- When you're finished, go to your fork and make a pull request. It will automatically update if you need to make further changes.

### How to Make a **Great** Pull Request

We have a few tips for writing good PRs that are accepted into the main repo:

* Use the Google Code style for all of your code. Find an example [here.](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html)
* Your code should have (4) spaces instead of tabs.
* If needed, update the documentation.
* **Write tests** for new features! Good tests are 100%, absolutely necessary for good code. We use the python `unittest` framework -- see some of the other tests in this repo for examples, or review the [Hitchhiker's guide to python](https://docs.python-guide.org/writing/tests/) for some good resources on writing good tests.
* Understand your contributions will fall under the same license as this repo.
- Use the Google code style for all of your code. Find an example [here.](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html)
- Your code should use 4 spaces for indentation instead of tabs.
- If needed, update the documentation.
- **Write tests** for new features! Good tests are absolutely necessary for good code. We use the Python `unittest` framework -- see some of the other tests in this repo for examples, the [Hitchhiker's guide to Python](https://docs.python-guide.org/writing/tests/) for resources on writing good tests, or the short sketch after this list.
- Understand that your contributions will fall under the same license as this repo.
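
As a purely illustrative companion to the style and testing items above, here is a small sketch of a Google-style docstring with 4-space indentation and a matching `unittest` test case; the function and class names are made up for this example and are not part of automatminer:

```python
import unittest


def scale(values, factor=2.0):
    """Multiply every value by a constant factor.

    Args:
        values ([float]): The numbers to scale.
        factor (float): The multiplier applied to each value.

    Returns:
        ([float]): The scaled values, in the original order.
    """
    return [v * factor for v in values]


class TestScale(unittest.TestCase):
    def test_scale_doubles_by_default(self):
        self.assertEqual(scale([1.0, 2.0]), [2.0, 4.0])


if __name__ == "__main__":
    unittest.main()
```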

When you submit your PR, our CI service will automatically run your tests.
When you submit your PR, our CI service will automatically run your tests.
We welcome good discussion on the best ways to write your code, and the comments on your PR are an excellent area for discussion.

#### References
This document was adapted from the open-source contribution guidelines for Facebook's Draft, as well as briandk's [contribution template](https://gist.github.com/briandk/3d2e8b3ec8daf5a27a62).

This document was adapted from the open-source contribution guidelines for Facebook's Draft, as well as briandk's [contribution template](https://gist.github.com/briandk/3d2e8b3ec8daf5a27a62).
16 changes: 8 additions & 8 deletions automatminer/__init__.py
@@ -1,10 +1,10 @@
from automatminer.preprocessing import DataCleaner, FeatureReducer
from automatminer.automl import TPOTAdaptor, SinglePipelineAdaptor
from automatminer.featurization import AutoFeaturizer
from automatminer.pipeline import MatPipe
from automatminer.presets import get_preset_config
from automatminer.automl import SinglePipelineAdaptor, TPOTAdaptor # noqa
from automatminer.featurization import AutoFeaturizer # noqa
from automatminer.pipeline import MatPipe # noqa
from automatminer.preprocessing import DataCleaner, FeatureReducer # noqa
from automatminer.presets import get_preset_config # noqa

__author__ = 'Alex Dunn, Qi Wang, Alex Ganose, Alireza Faghaninia, Anubhav Jain'
__author_email__ = 'ardunn@lbl.gov'
__license__ = 'Modified BSD'
__author__ = "Alex Dunn, Qi Wang, Alex Ganose, Alireza Faghaninia, Anubhav Jain"
__author_email__ = "ardunn@lbl.gov"
__license__ = "Modified BSD"
__version__ = "2019.10.14"
10 changes: 7 additions & 3 deletions automatminer/base.py
@@ -3,9 +3,13 @@
"""
import abc
import logging

from automatminer.utils.log import (
    AMM_LOGGER_BASENAME,
    initialize_logger,
    initialize_null_logger,
)
from sklearn.base import BaseEstimator
from automatminer.utils.log import initialize_logger, \
    initialize_null_logger, AMM_LOGGER_BASENAME

__authors__ = ["Alex Dunn <ardunn@lbl.gov>", "Alex Ganose <aganose@lbl.gov>"]

@@ -24,7 +28,7 @@ def logger(self):
    @logger.setter
    def logger(self, new_logger):
        """Set a new logger.

        Args:
            new_logger (Logger, bool): A boolean or custom logger object to use
                for logging. Alternatively, if set to True, the default automatminer
81 changes: 48 additions & 33 deletions automatminer/pipeline.py
@@ -1,20 +1,25 @@
"""
The highest level classes for pipelines.
"""
import os
import copy
import os
import pickle
from typing import Dict

import pandas as pd

from automatminer.base import LoggableMixin, DFTransformer
from automatminer.base import DFTransformer, LoggableMixin
from automatminer.presets import get_preset_config
from automatminer.utils.ml import regression_or_classification
from automatminer.utils.pkg import check_fitted, set_fitted, \
    return_attrs_recursively, AutomatminerError, VersionError, get_version, \
    save_dict_to_file
from automatminer.utils.log import AMM_DEFAULT_LOGGER
from automatminer.utils.ml import regression_or_classification
from automatminer.utils.pkg import (
    AutomatminerError,
    VersionError,
    check_fitted,
    get_version,
    return_attrs_recursively,
    save_dict_to_file,
    set_fitted,
)


class MatPipe(DFTransformer, LoggableMixin):
@@ -88,15 +93,23 @@ class MatPipe(DFTransformer, LoggableMixin):
        target (str): The name of the column where target values are held.
    """

    def __init__(self, autofeaturizer=None, cleaner=None, reducer=None,
                 learner=None, logger=AMM_DEFAULT_LOGGER):
    def __init__(
        self,
        autofeaturizer=None,
        cleaner=None,
        reducer=None,
        learner=None,
        logger=AMM_DEFAULT_LOGGER,
    ):
        transformers = [autofeaturizer, cleaner, reducer, learner]
        if not all(transformers):
            if any(transformers):
                raise AutomatminerError("Please specify all dataframe"
                                        "transformers (autofeaturizer, learner,"
                                        "reducer, and cleaner), or none (to use"
                                        "default).")
                raise AutomatminerError(
                    "Please specify all dataframe"
                    "transformers (autofeaturizer, learner,"
                    "reducer, and cleaner), or none (to use"
                    "default)."
                )
            else:
                config = get_preset_config("express")
                autofeaturizer = config["autofeaturizer"]
@@ -117,7 +130,7 @@ def __init__(self, autofeaturizer=None, cleaner=None, reducer=None,
        super(MatPipe, self).__init__()

    @staticmethod
    def from_preset(preset: str = 'express', **powerups):
    def from_preset(preset: str = "express", **powerups):
        """
        Get a preset MatPipe from a string using
        automatminer.presets.get_preset_config
@@ -238,8 +251,7 @@ def predict(self, df, ignore=None):
        return merged_df

    @set_fitted
    def benchmark(self, df, target, kfold, fold_subset=None, cache=False,
                  ignore=None):
    def benchmark(self, df, target, kfold, fold_subset=None, cache=False, ignore=None):
        """
        If the target property is known for all data, perform an ML benchmark
        using MatPipe. Used for getting an idea of how well AutoML can predict
@@ -292,22 +304,26 @@ def benchmark(self, df, target, kfold, fold_subset=None, cache=False,
            if os.path.exists(cache_src):
                self.logger.warning(
                    "Cache src {} already found! Ensure this featurized data "
                    "matches the df being benchmarked.".format(cache_src))
                    "matches the df being benchmarked.".format(cache_src)
                )
            self.logger.warning("Running pre-featurization for caching.")
            self.autofeaturizer.fit_transform(df, target)
        elif cache_src and not cache:
            raise AutomatminerError(
                "Caching was enabled in AutoFeaturizer but not in benchmark. "
                "Either disable caching in AutoFeaturizer or enable it by "
                "passing cache=True to benchmark.")
                "passing cache=True to benchmark."
            )
        elif cache and not cache_src:
            raise AutomatminerError(
                "MatPipe cache is enabled, but no cache_src was defined in "
                "autofeaturizer. Pass the cache_src argument to AutoFeaturizer "
                "or use the cache_src get_preset_config powerup.")
                "or use the cache_src get_preset_config powerup."
            )
        else:
            self.logger.debug("No caching being used in AutoFeaturizer or "
                              "benchmark.")
            self.logger.debug(
                "No caching being used in AutoFeaturizer or " "benchmark."
            )

        if not fold_subset:
            fold_subset = list(range(kfold.n_splits))
@@ -372,25 +388,20 @@ def summarize(self, filename=None) -> Dict[str, str]:
            "drop_na_targets",
        ]
        cleaner_data = {
            attr: str(getattr(self.cleaner, attr))
            for attr in cleaner_attrs
            attr: str(getattr(self.cleaner, attr)) for attr in cleaner_attrs
        }

        reducer_attrs = [
            "reducers",
            "reducer_params",
        ]
        reducer_attrs = ["reducers", "reducer_params"]
        reducer_data = {
            attr: str(getattr(self.reducer, attr))
            for attr in reducer_attrs
            attr: str(getattr(self.reducer, attr)) for attr in reducer_attrs
        }

        attrs = {
            "featurizers": self.autofeaturizer.featurizers,
            "ml_model": str(self.learner.best_pipeline),
            "feature_reduction": reducer_data,
            "data_cleaning": cleaner_data,
            "features": self.learner.features
            "features": self.learner.features,
        }
        if filename:
            save_dict_to_file(attrs, filename)
@@ -416,12 +427,16 @@ def save(self, filename="mat.pipe"):

        temp_logger = copy.deepcopy(self._logger)
        loggables = [
            self, self.learner, self.reducer, self.cleaner, self.autofeaturizer
            self,
            self.learner,
            self.reducer,
            self.cleaner,
            self.autofeaturizer,
        ]
        for loggable in loggables:
            loggable._logger = AMM_DEFAULT_LOGGER

        with open(filename, 'wb') as f:
        with open(filename, "wb") as f:
            pickle.dump(self, f)

        # Reassign live memory objects for further use in this object
Expand All @@ -446,7 +461,7 @@ def load(filename, logger=True, supress_version_mismatch=False):
Returns:
pipe (MatPipe): A MatPipe object.
"""
with open(filename, 'rb') as f:
with open(filename, "rb") as f:
pipe = pickle.load(f)

if pipe.version != get_version() and not supress_version_mismatch:
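
To make the reformatted pipeline code above easier to follow, here is a hedged usage sketch of MatPipe based only on the methods visible in this diff (from_preset, predict, save, load); the dataset file, column names, train/test split, and the fit call are assumptions for illustration, not something this PR defines:

```python
import pandas as pd

from automatminer.pipeline import MatPipe

# Hypothetical dataset: a "composition" column plus a "gap" target column.
df = pd.read_csv("band_gaps.csv")
train, test = df.iloc[:800], df.iloc[800:]

pipe = MatPipe.from_preset("express")  # presets come from get_preset_config
pipe.fit(train, "gap")  # assumed DFTransformer-style fit(df, target)
predictions = pipe.predict(test.drop(columns=["gap"]))

pipe.save("mat.pipe")  # pickles the pipeline, as in save() above
pipe = MatPipe.load("mat.pipe")  # restores it, with the version check in load()
```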
2 changes: 1 addition & 1 deletion automatminer/preprocessing/__init__.py
@@ -1 +1 @@
from .core import DataCleaner, FeatureReducer
from .core import DataCleaner, FeatureReducer # noqa