Hierarchical predictors #623

FBruzzesi · 2024-02-29T09:16:35Z

Description

- Introduces `HierarchicalPredictor`, `HierarchicalClassifier` and `HierarchicalRegressor` (partially fixes #620, "HierarchicalPredictor and HierarchicalTransformer")
- Introduces `ShrinkageMixin` to abstract the shrinkage functionality shared by `HierarchicalPredictor` and `GroupedPredictor`
- Adds a new built-in shrinkage function, `equal_shrinkage`
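To make the mixin idea concrete, here is a minimal sketch of how a shared mixin could turn a `shrinkage` argument (a built-in name or any callable) into normalized per-level weights, so that both a grouped and a hierarchical predictor can inherit the same logic. The names `ShrinkageMixinSketch`, `_resolve_shrinkage`, `_shrinkage_weights` and `TinyPredictor` are hypothetical and are not the PR's actual implementation.

```python
import numpy as np


def equal_shrinkage(group_sizes) -> np.ndarray:
    """Each group is weighted equally (mirrors the function added in this PR)."""
    return np.ones(len(group_sizes))


class ShrinkageMixinSketch:
    """Hypothetical mixin: turns a `shrinkage` spec into normalized weights."""

    # Hypothetical registry of built-in shrinkage functions, keyed by name.
    _BUILTIN_SHRINKAGE = {"equal": equal_shrinkage}

    def _resolve_shrinkage(self, shrinkage):
        # Accept either any callable or the name of a built-in function.
        if callable(shrinkage):
            return shrinkage
        if shrinkage in self._BUILTIN_SHRINKAGE:
            return self._BUILTIN_SHRINKAGE[shrinkage]
        raise ValueError(f"Unknown shrinkage: {shrinkage!r}")

    def _shrinkage_weights(self, shrinkage, group_sizes):
        # Compute raw weights for each level, then rescale them to sum to 1.
        raw = np.asarray(self._resolve_shrinkage(shrinkage)(group_sizes), dtype=float)
        return raw / raw.sum()


class TinyPredictor(ShrinkageMixinSketch):
    """Stands in for any predictor class that reuses the mixin."""


weights = TinyPredictor()._shrinkage_weights("equal", [100, 40, 10])
print(weights)  # three equal weights of 1/3
```

Keeping the name-to-function lookup in one place means a new built-in (such as another decay scheme) only has to be registered once to become available to every predictor using the mixin.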

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

My code follows the style guidelines (flake8)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (also to the readme.md)
I have added tests that prove my fix is effective or that my feature works
I have added tests to check whether the new feature adheres to the sklearn convention
New and existing unit tests pass locally with my changes

FBruzzesi · 2024-02-29T09:17:14Z

docs/api/shrinkage-functions.md

@@ -0,0 +1,33 @@
+# Shrinkage


Shrinkage functions deserve their own page, at least in the API docs.

FBruzzesi · 2024-02-29T09:17:55Z

docs/user-guide/meta-models.md

@@ -177,6 +177,50 @@ This transformer also has use-cases beyond fairness. You could use this transfor

 For example, for predicting house prices, using the surface of a house relatively to houses in the same neighborhood could be a more relevant feature than the surface relative to all houses.

+## Hierarchical Prediction
+
+!!! info "New in version 0.8.0"


Again, a totally made-up version number. It needs to be fixed accordingly.

I'm fine with 0.8.0. Just gotta get this reviewed.

FBruzzesi · 2024-02-29T09:19:08Z

sklego/meta/_shrinkage_utils.py

+def equal_shrinkage(group_sizes) -> np.ndarray:
+    """Each group is weighted equally.
+
+    Parameters
+    ----------
+    group_sizes : array-like
+        The number of observations in each group, must implement the `__len__` method.
+
+    Returns
+    -------
+    np.ndarray
+        The weights for each group.
+    """
+    return np.ones(len(group_sizes))


Not sure if we want this to exist, but it felt like an easy one to add
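A minimal sketch of how weights such as those returned by `equal_shrinkage` could be consumed. It assumes, for illustration only, that the weights are normalized to sum to one and used for a weighted average of the predictions made along a row's path in the hierarchy (global model first, most specific group last). The helper `shrink_predictions` and the toy numbers are hypothetical.

```python
import numpy as np


def equal_shrinkage(group_sizes) -> np.ndarray:
    """Each group is weighted equally."""
    return np.ones(len(group_sizes))


def shrink_predictions(level_predictions, group_sizes, shrinkage=equal_shrinkage):
    """Hypothetical helper: blend per-level predictions with shrinkage weights.

    level_predictions: shape (n_samples, n_levels), one column per level of the
        hierarchy (global model first, most specific group last).
    group_sizes: number of observations in each level's group.
    """
    weights = np.asarray(shrinkage(group_sizes), dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return np.asarray(level_predictions, dtype=float) @ weights


# One sample whose global, col1-level and col2-level models predict 10, 20 and 60.
preds = np.array([[10.0, 20.0, 60.0]])
print(shrink_predictions(preds, group_sizes=[1000, 100, 10]))  # [30.]
```

With equal shrinkage the group sizes are ignored entirely, which is what makes it a useful, easy-to-reason-about baseline next to size-dependent schemes.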

sklego/meta/_shrinkage_utils.py

tests/test_meta/test_hierarchical_predictor.py

docs/user-guide/meta-models.md

FBruzzesi · 2024-02-29T19:24:19Z

Forgot to mention a couple of ideas:

The `estimators_` attribute is something like the following dictionary:

```
{
    (1,): LogisticRegression(),
    (1, 'A'): LogisticRegression(),
    (1, 'B'): LogisticRegression(),
    (1, 'A', 'X'): LogisticRegression(),
    (1, 'B', 'Y'): LogisticRegression(),
}
```

which is ok-ish, but not super interpretable. One idea could be to have a namedtuple:

```
from collections import namedtuple

groups, grp_values = ["col1", "col2"], [1, 2]
estimator = namedtuple("estimator", groups)

estimator(*grp_values)
# estimator(col1=1, col2=2)
```

This hits an edge case if a group name has a "_" prefix, since namedtuple field names cannot start with an underscore (maybe we can trim it), but the end result would be:

```
{
    estimator(sklego_global_estimator=1): LogisticRegression(),
    estimator(sklego_global_estimator=1, col1="A"): LogisticRegression(),
    estimator(sklego_global_estimator=1, col1="B"): LogisticRegression(),
    estimator(sklego_global_estimator=1, col1="A", col2="X"): LogisticRegression(),
    estimator(sklego_global_estimator=1, col1="B", col2="Y"): LogisticRegression(),
}
```

dataclasses would also work similarly.
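A small sketch of the namedtuple idea, including the underscore edge case: `collections.namedtuple` raises `ValueError` for field names that start with `_`, so a leading underscore has to be trimmed first. The helper `make_key_type`, the `lstrip` trimming, and the use of `defaults=` so that upper-level keys can omit deeper group values are assumptions for illustration, not what the PR implements. String placeholders stand in for fitted models.

```python
from collections import namedtuple


def make_key_type(groups):
    """Hypothetical helper: a namedtuple type whose fields are the group names.

    Field names may not start with an underscore, so leading underscores are
    trimmed. Every field except the first defaults to None, so that keys for
    upper levels of the hierarchy can omit the deeper group values.
    """
    fields = [g.lstrip("_") for g in groups]
    return namedtuple("estimator", fields, defaults=(None,) * (len(fields) - 1))


Estimator = make_key_type(["_sklego_global_estimator", "col1", "col2"])

estimators = {
    Estimator(sklego_global_estimator=1): "global model",
    Estimator(sklego_global_estimator=1, col1="A"): "model for col1=A",
    Estimator(sklego_global_estimator=1, col1="A", col2="X"): "model for A/X",
}

print(Estimator(1, "A"))  # estimator(sklego_global_estimator=1, col1='A', col2=None)

# Namedtuples are tuples, so plain-tuple lookups still hit the same entries.
print(estimators[(1, "A", None)])  # model for col1=A

# Without trimming, the underscore-prefixed field name is rejected.
try:
    namedtuple("estimator", ["_sklego_global_estimator"])
except ValueError as err:
    print("ValueError:", err)
```

One trade-off of the `None` defaults is that every key has the same length, so a missing level and a genuine `None` group value become indistinguishable.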

Possibility of accessing an estimator directly from the predictor, namely being able to do `hierarchical_predictor[(1, "A", "X")]` in place of `hierarchical_predictor.estimators_[(1, "A", "X")]`.
Related to both of the above: since adding the `1` for the global model is not very ergonomic, we could create shortcuts by aliasing externally what is in the first point as:
```
{
    "__sklego_global_estimator__": LogisticRegression(),
    ('A',): LogisticRegression(),
    ('B',): LogisticRegression(),
    ('A', 'X'): LogisticRegression(),
    ('B', 'Y'): LogisticRegression(),
}
```
and all the associated "conversions" from the first and/or second point, if accepted
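A rough sketch of the second and third ideas combined: a `__getitem__` that forwards to `estimators_`, accepts keys with or without the leading `1`, and maps a `"__sklego_global_estimator__"` alias to the global model. The class `HierarchicalLookupSketch` is hypothetical; it only assumes the `estimators_` layout shown above, with string placeholders standing in for fitted models.

```python
GLOBAL_ALIAS = "__sklego_global_estimator__"


class HierarchicalLookupSketch:
    """Hypothetical wrapper showing dictionary-style access to estimators_."""

    def __init__(self, estimators_):
        # Internal keys keep the leading 1 for the global model, e.g. (1, "A", "X").
        self.estimators_ = estimators_

    def _to_internal_key(self, key):
        if key == GLOBAL_ALIAS:
            return (1,)
        # A bare value such as "A" is treated as a one-level key ("A",).
        key = tuple(key) if isinstance(key, (tuple, list)) else (key,)
        # Accept both (1, "A") and the shorter form ("A",).
        # Caveat: if the first group column can itself equal 1, this is ambiguous.
        return key if key[:1] == (1,) else (1, *key)

    def __getitem__(self, key):
        return self.estimators_[self._to_internal_key(key)]


predictor = HierarchicalLookupSketch({
    (1,): "global",
    (1, "A"): "A",
    (1, "B"): "B",
    (1, "A", "X"): "A/X",
    (1, "B", "Y"): "B/Y",
})
print(predictor[GLOBAL_ALIAS])    # global
print(predictor[("A", "X")])      # A/X
print(predictor[(1, "A", "X")])   # A/X
```

The ambiguity noted in the comment is one argument for keeping an explicit string alias for the global model rather than relying only on stripping the leading `1`.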

koaning · 2024-03-02T15:50:35Z

At the moment I'd be fine with this:

```
{
    (1,): LogisticRegression(),
    (1, 'A'): LogisticRegression(),
    (1, 'B'): LogisticRegression(),
    (1, 'A', 'X'): LogisticRegression(),
    (1, 'B', 'Y'): LogisticRegression(),
}
```

But it would be preferable to have this documented in the code with comments. I agree that it's merely "ok", but it feels clear enough if the comment is there, no?
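Since it is the key structure that needs documenting, here is a small sketch of how such keys could be derived: every key is a prefix of `(1, *group_values)` for some training row, where the constant `1` stands for the global model. The helper `hierarchy_keys` and the toy rows are hypothetical; the PR's actual construction may differ.

```python
def hierarchy_keys(group_rows):
    """Hypothetical helper: collect every prefix of (1, *group_values).

    The leading 1 is a constant "global" group shared by all rows, so the
    shortest key (1,) identifies the global estimator.
    """
    keys = set()
    for row in group_rows:
        path = (1, *row)
        for depth in range(1, len(path) + 1):
            keys.add(path[:depth])
    # Sort by depth first, then by value, so parents come before children.
    return sorted(keys, key=lambda k: (len(k), k))


rows = [("A", "X"), ("B", "Y"), ("A", "X")]
for key in hierarchy_keys(rows):
    print(key)
```

Duplicate rows collapse into the same keys, which yields exactly the five entries in the dictionary above.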

FBruzzesi · 2024-03-04T11:30:25Z

> But it would be preferable to have this documented in the code with comments. I agree that it's merely "ok", but it feels clear enough if the comment is there, no?

Just added a few comments on those. Here is how it looks:

koaning · 2024-03-04T11:47:53Z

Ah yeah, that's even nicer. Just the comments would've been sufficient, but adding it to the docs is for sure a nice touch.

koaning

My main observation is that we might want to add an extra test that uses a dummy model to help predict the values that we'd expect. Other than that; this looks great! Good work :)

FBruzzesi · 2024-03-09T17:29:55Z

> My main observation is that we might want to add an extra test that uses a dummy model to help predict the values that we'd expect. Other than that; this looks great! Good work :)

I am thinking out loud here, but maybe the easiest way to check this is to use a model with deterministic/fake predictions. Does that sound reasonable?

koaning · 2024-03-09T20:08:17Z

Oh, that was totally in line with what I had in mind. I've used dummy models for this in the past, but you're also free to pick another method, as long as we have a test that checks our assumptions about how we shrink play out.
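One way such a test could look, sketched without sklego itself: fit a constant "mean" model per level (the same idea as scikit-learn's `DummyRegressor(strategy="mean")`), blend the level predictions with equal shrinkage, and compare against a value computed by hand. `MeanModel`, the test name and the toy data are hypothetical, and the weighted-average blending rule is an assumption about how shrinkage is applied.

```python
import numpy as np


class MeanModel:
    """Stand-in for a dummy regressor: always predicts the training mean."""

    def fit(self, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, n):
        return np.full(n, self.mean_)


def test_equal_shrinkage_blends_level_means():
    # Toy data: one group column with groups "A" and "B".
    groups = np.array(["A", "A", "B", "B"])
    y = np.array([1.0, 3.0, 10.0, 14.0])

    global_model = MeanModel().fit(y)                  # mean 7.0
    group_a_model = MeanModel().fit(y[groups == "A"])  # mean 2.0

    # Path for a row in group "A": global model, then the "A" model.
    level_preds = np.column_stack([global_model.predict(1), group_a_model.predict(1)])
    weights = np.ones(2) / 2                           # equal shrinkage, normalized
    blended = level_preds @ weights

    # Expected by hand: (7.0 + 2.0) / 2 = 4.5
    assert np.allclose(blended, [4.5])


test_equal_shrinkage_blends_level_means()
```

Because every level predicts a known constant, any deviation from the hand-computed value points directly at the shrinkage logic rather than at the underlying estimators.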

koaning

LGTM

FBruzzesi added 16 commits February 17, 2024 15:27

- grouped predictor patch (4267560)
- add regr e clf to meta init (9a6a471)
- check in fit (123e94a)
- WIP (52e8d0a)
- minimal cleanup (339c9a3)
- docstrings and api (d9db769)
- shrinkage utils and mixin (17d429b)
- Merge branch 'patch/grouped-shrinkage' into feature/hierarchical (39556e8)
- docstrings (fe1cd42)
- mixin transfer completed (3908ecb)
- more docstrings (dab9ddc)
- docstrings (7e37600)
- merge main (6ba7e3e)
- user guide (bbb15bd)
- init tests (4aa1c56)
- unittests (26f8235)

FBruzzesi commented Feb 29, 2024

sklego/meta/_shrinkage_utils.py (resolved)

FBruzzesi commented Feb 29, 2024

tests/test_meta/test_hierarchical_predictor.py (resolved)

koaning reviewed Feb 29, 2024

docs/user-guide/meta-models.md (outdated, resolved)

koaning reviewed Feb 29, 2024

docs/user-guide/meta-models.md (outdated, resolved)

- exp_decay_shrinkage (9605a0a)
- estimators_ attribute documentation (3369f97)

koaning requested changes Mar 6, 2024


koaning and others added 2 commits March 9, 2024 21:08

- Merge branch 'main' into feature/hierarchical (995288e)
- add test cases (a1b7922)

FBruzzesi requested a review from koaning March 13, 2024 16:43

koaning approved these changes Mar 15, 2024


koaning merged commit b7e9f77 into koaning:main Mar 15, 2024
14 checks passed

FBruzzesi deleted the feature/hierarchical branch April 10, 2024 08:42
