Add DCRBaselineProtection Metric #728

lajohn4747 · 2025-02-19T20:36:38Z

resolves #720
CU-86b3w869f

Without subsampling, the metric takes an average of 84 seconds to score a synthetic sample of 1,000 rows against the demo dataset fake_hotels_guest (400 rows used for training and 100 rows used for validation).
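
For anyone who wants to reproduce a timing like the one above, here is a minimal sketch of averaging wall-clock time over several runs. The helper `average_runtime` and the stand-in `placeholder_metric` are hypothetical names, not part of this PR; the real metric call would replace the placeholder.

```python
import time
from statistics import mean


def average_runtime(compute, repeats=3):
    """Return the mean wall-clock seconds over `repeats` calls to `compute`."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        compute()
        durations.append(time.perf_counter() - start)
    return mean(durations)


def placeholder_metric():
    # Hypothetical stand-in for the metric computation being timed.
    return sum(i * i for i in range(10_000))


if __name__ == '__main__':
    seconds = average_runtime(placeholder_metric, repeats=3)
    print(f'{seconds:.4f} seconds on average')
```

`time.perf_counter` is used rather than `time.time` because it is monotonic and intended for measuring short durations.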

sdv-team · 2025-02-19T20:36:43Z

Task linked: CU-86b3w869f SDMetrics - Add DCRBaselineProtection metric #720

codecov · 2025-02-19T20:38:46Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.47%. Comparing base (b1d3612) to head (9ce76f0).
Report is 1 commits behind head on dcr_feature_branch.

Additional details and impacted files

@@                  Coverage Diff                   @@
##           dcr_feature_branch     #728      +/-   ##
======================================================
+ Coverage               95.39%   95.47%   +0.07%     
======================================================
  Files                     114      115       +1     
  Lines                    4491     4570      +79     
======================================================
+ Hits                     4284     4363      +79     
  Misses                    207      207

| Flag | Coverage Δ |
|---|---|
| integration | `80.30% <100.00%> (+0.34%)` ⬆️ |
| unit | `83.69% <100.00%> (+0.28%)` ⬆️ |

lajohn4747 · 2025-02-26T05:27:58Z

sdmetrics/single_table/privacy/util.py

@@ -148,3 +148,16 @@ def allow_nan_array(attributes):
            ret.append(entry)

    return ret
+
+
+def validate_num_samples_num_iteration(num_rows_subsample, num_iterations):


I put this in a separate function because we reuse it for DCROverfittingProtection.
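
The diff above shows only the helper's signature, so the following is a sketch of what such a shared validator could look like, not the code from this PR. The specific rules (positive integers, and no repeated iterations when there is no subsampling) and the error messages are assumptions made for illustration.

```python
def validate_num_samples_num_iteration(num_rows_subsample, num_iterations):
    """Validate subsampling arguments (assumed rules, for illustration only)."""

    def _is_positive_int(value):
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool) and value >= 1

    # Assumption: num_iterations must be a positive integer.
    if not _is_positive_int(num_iterations):
        raise ValueError(
            f'num_iterations ({num_iterations}) must be an integer greater than or equal to 1.'
        )

    if num_rows_subsample is None:
        # Assumption: several iterations are only meaningful with subsampling.
        if num_iterations > 1:
            raise ValueError('num_iterations should be 1 when there is no subsampling.')
        return

    # Assumption: a subsample size, when given, must be a positive integer.
    if not _is_positive_int(num_rows_subsample):
        raise ValueError(
            f'num_rows_subsample ({num_rows_subsample}) must be an integer '
            'greater than or equal to 1.'
        )
```

Keeping the checks in one function means both metrics would reject invalid arguments in the same way, for example `validate_num_samples_num_iteration(None, 2)` raises `ValueError` under the assumed rules.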

R-Palazzo

Looking good!

I just left one or two suggestions.

sdmetrics/single_table/privacy/dcr_baseline_protection.py

tests/unit/single_table/privacy/test_dcr_baseline_protection.py

sdmetrics/single_table/privacy/dcr_baseline_protection.py

lajohn4747 added 10 commits February 18, 2025 18:14

Add DCR measurement funcitons

4cd3460

Update naming

a07a330

WIP

7363776

WIP

7dfcdee

Make sure col_range is always set

bf45829

Merge branch 'issue_720_dcr_baseline_protection' into issue_720_compl…

663116d

…ete_dcr_baseline_protection

WIP

77eb387

Add test for shuffled data

63846c5

Merge branch 'issue_720_dcr_baseline_protection' into issue_720_compl…

2977de5

…ete_dcr_baseline_protection

WIP

9338b8c

lajohn4747 added 8 commits February 19, 2025 16:16

Finish adding tests

c7458fb

Use apply instead of iterating through the dataframe

e21085f

Merge branch 'issue_720_dcr_baseline_protection' into issue_720_compl…

570e48d

…ete_dcr_baseline_protection

Rename variable

71e2ed2

Merge branch 'issue_720_dcr_baseline_protection' into issue_720_compl…

6d14f7a

…ete_dcr_baseline_protection

Merge branch 'main' into issue_720_dcr_baseline_protection

71f7811

Merge branch 'issue_720_dcr_baseline_protection' into issue_720_compl…

ebae16b

…ete_dcr_baseline_protection

Add SingleTableMetric variables to DCRBaselineProtection

24a0504

lajohn4747 self-assigned this Feb 19, 2025

lajohn4747 requested review from amontanez24, frances-h and R-Palazzo February 19, 2025 23:06

lajohn4747 marked this pull request as ready for review February 19, 2025 23:07

lajohn4747 requested a review from a team as a code owner February 19, 2025 23:07

lajohn4747 added 4 commits February 20, 2025 09:26

Fix lint

7a5d0af

Fix typo in description

4eb132a

Remove unused index

9f4c949

Use sum instead of a list

14338b3

lajohn4747 added 12 commits February 25, 2025 00:15

Merge branch 'issue_720_dcr_baseline_protection' into issue_720_compl…

556e2fc

…ete_dcr_baseline_protection

Add random generator

f1aed99

Fix naming

33784b0

Merge branch 'issue_720_dcr_baseline_protection' into issue_720_compl…

e78f9be

…ete_dcr_baseline_protection

wip

e8f815e

WIP

a5b9590

Wip

d0aeffc

Wip

dd08b11

Fix typo in docstrings

0d68b38

Merge branch 'issue_720_dcr_baseline_protection' into issue_720_compl…

b745deb

…ete_dcr_baseline_protection

Add validation unit test

8a41721

Update docstring

af3f1b9

lajohn4747 commented Feb 26, 2025

lajohn4747 marked this pull request as ready for review February 26, 2025 05:28

lajohn4747 requested review from amontanez24, R-Palazzo and frances-h February 26, 2025 05:28

R-Palazzo reviewed Feb 26, 2025

Base automatically changed from issue_720_dcr_baseline_protection to dcr_feature_branch February 26, 2025 15:12

frances-h reviewed Feb 26, 2025

sdmetrics/single_table/privacy/dcr_baseline_protection.py (three outdated, resolved review threads)

Update randomizer function and warn about large subsampling

c58b385

lajohn4747 requested review from frances-h and R-Palazzo February 26, 2025 18:56

lajohn4747 added 2 commits February 26, 2025 15:53

Merge branch 'dcr_feature_branch' into issue_720_complete_dcr_baselin…

b5df24a

…e_protection

Clean up import name

9ce76f0

R-Palazzo approved these changes Feb 27, 2025

frances-h approved these changes Feb 27, 2025

lajohn4747 merged commit b53dd20 into dcr_feature_branch Feb 27, 2025
55 checks passed

lajohn4747 deleted the issue_720_complete_dcr_baseline_protection branch February 27, 2025 16:07

lajohn4747 mentioned this pull request Feb 27, 2025

Add DCRBaselineProtection and DCROverfittingProtection metrics #735

Draft
