Implement estimator for Hugging Face models #2245

Merged
61 commits
b0b5be0
adding core hf estimator
GiulioZizzo Jul 3, 2023
2e32345
initial image processing
GiulioZizzo Jul 3, 2023
7a33398
adding huggingface to trainer tests
GiulioZizzo Jul 7, 2023
8faa487
updates to huggingface estimator
GiulioZizzo Jul 7, 2023
b7ed033
initial test_adversarial_trainer from unittest to pytest conversion
GiulioZizzo Jul 14, 2023
7728cb2
update test_deeplearning_common.py for huggingface
GiulioZizzo Jul 19, 2023
9e0246f
updating test for test_clean_label_backdoor_attack
GiulioZizzo Jul 21, 2023
101a309
update to test_adversarial_trainer_madry_pgd
GiulioZizzo Jul 24, 2023
c314c11
add huggingface workflow
GiulioZizzo Jul 24, 2023
b6d3fb0
Create ci-huggingface-2.yml
GiulioZizzo Jul 24, 2023
5005e08
checking workflow
GiulioZizzo Jul 24, 2023
f42e115
updating failing tests
GiulioZizzo Jul 24, 2023
1f3bfcc
updates to test_adversarial_trainer for huggingface
GiulioZizzo Jul 24, 2023
11c8d91
adding initial functionality for getting activations
GiulioZizzo Jul 24, 2023
8d96eea
updating huggingface get activations
GiulioZizzo Jul 25, 2023
df40e1c
updates to test_adversarial_trainer_FBF for huggingface
GiulioZizzo Jul 25, 2023
9399ccd
initial refactor for getting activations with pytorch. Potentially mo…
GiulioZizzo Jul 25, 2023
551da0b
using logits in pytorch estimator to follow the default loss function
GiulioZizzo Jul 26, 2023
c3d52ef
updating keyword args
GiulioZizzo Jul 26, 2023
4249610
initial activation defence test refactor
GiulioZizzo Jul 27, 2023
17f4f46
refactor test_activation_defence_pytest for pytest
GiulioZizzo Jul 28, 2023
f49978c
check for HF output type. Development to HF tests for TRADES
GiulioZizzo Jul 31, 2023
ef805e8
adding option for small classifier to run on CI actions
GiulioZizzo Jul 31, 2023
d209c0f
refactor for hidden trigger backdoor. Added tiebreak check for hidden…
GiulioZizzo Jul 31, 2023
ae3b343
update get_activations for huggingface with correct return if using f…
GiulioZizzo Jul 31, 2023
6498386
initial addition of test_projected_gradient_descent in pytorch format
GiulioZizzo Jul 31, 2023
126460a
adding framework check to some tests in test_adversarial_trainer
GiulioZizzo Aug 1, 2023
d9326ff
adding tests for pgd checks
GiulioZizzo Aug 1, 2023
c1c5b94
adding hf tabular classifier
GiulioZizzo Aug 1, 2023
2bab5d6
correctly checking framework in test
GiulioZizzo Aug 2, 2023
de1d477
remove unneeded files. CI fixes.
GiulioZizzo Aug 16, 2023
86645ec
CI updates and style edits
GiulioZizzo Aug 17, 2023
fffed5a
re adding old test_adversarial_trainer_madry_pgd.py due to legacy tes…
GiulioZizzo Aug 17, 2023
e2f2ab7
adjusting pre-processors for HF. Adding progress bar to pgd trainer t…
GiulioZizzo Aug 21, 2023
88a2791
updates to progress bar display
GiulioZizzo Aug 21, 2023
ed65928
updates to progress bar display
GiulioZizzo Aug 21, 2023
ad3d581
adding initial notebook
GiulioZizzo Aug 21, 2023
d178ab6
Updating displayed progress bar info. Updated notebook. Type hinting …
GiulioZizzo Aug 23, 2023
5d7b08d
Black formatting. Executed notebook
GiulioZizzo Aug 24, 2023
daa7c62
Merge branch 'dev_1.16.0' into huggingface_integration
GiulioZizzo Aug 24, 2023
9208af9
Switch to small HF model for FBF test. Formatting edits
GiulioZizzo Aug 24, 2023
124f6c8
update allowed values for HF FBF. Formatting edits
GiulioZizzo Aug 24, 2023
2e7a29a
Initial review edits
GiulioZizzo Aug 25, 2023
e67388f
initial round of review edits
GiulioZizzo Aug 25, 2023
e520782
CI cleanup
GiulioZizzo Aug 26, 2023
41ac37d
removing option for large model
GiulioZizzo Aug 26, 2023
5cccfba
moving location of hf classifier. updating clean label attack to work…
GiulioZizzo Aug 26, 2023
f9fcc3f
moving progress bars to different PR. Re-running notebook
GiulioZizzo Aug 26, 2023
0178cbe
Simplifying poison changes. Removing missed progress bar edits
GiulioZizzo Aug 26, 2023
0914f99
Using clip values in notebook. CI error fixes
GiulioZizzo Aug 27, 2023
9ed5e32
remove unneeded components
GiulioZizzo Aug 31, 2023
d0f3fa1
remove unneeded change
GiulioZizzo Sep 1, 2023
b3cbf95
Merge branch 'dev_1.16.0' into huggingface_integration
beat-buesser Sep 8, 2023
5d6f11a
Update tests/attacks/poison/test_hidden_trigger_backdoor.py
GiulioZizzo Sep 13, 2023
89bd452
review cleanup
GiulioZizzo Sep 13, 2023
44bdefe
adding additional options for arguments to hf to match pytorch classi…
GiulioZizzo Sep 13, 2023
6f37944
mypy error fix
GiulioZizzo Sep 13, 2023
25d357a
clean up of dp instahide test
GiulioZizzo Sep 14, 2023
fe91036
clean up of dp instahide test
GiulioZizzo Sep 14, 2023
3d4fe0d
Merge branch 'dev_1.16.0' into huggingface_integration
beat-buesser Sep 18, 2023
6a6f094
Merge branch 'dev_1.16.0' into huggingface_integration
beat-buesser Sep 18, 2023
Simplifying poison changes. Removing missed progress bar edits
Signed-off-by: GiulioZizzo <giulio.zizzo@yahoo.co.uk>
GiulioZizzo committed Aug 26, 2023
commit 0178cbe2a33876553b002c23bc4d02a3ae94b188
2 changes: 1 addition & 1 deletion art/attacks/poisoning/backdoor_attack.py
@@ -79,7 +79,7 @@ def poison(  # pylint: disable=W0221
         poisoned = np.copy(x)

         if callable(self.perturbation):
-            return self.perturbation(poisoned, channels_first=channels_first), y_attack
+            return self.perturbation(poisoned), y_attack

         for perturb in self.perturbation:
             poisoned = perturb(poisoned)
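With this change the attack calls a perturbation with the data as its only argument, so any layout option has to be bound into the callable beforehand. The following is a minimal sketch of that pattern in plain Python, not the library's implementation: `stamp_pixel` and `apply_perturbation` are hypothetical names, images are nested lists rather than NumPy arrays, and the label handling (`y_attack`) is left out.

```python
from functools import partial


def stamp_pixel(image, channels_first=False, value=1.0):
    """Hypothetical perturbation: set the bottom-right pixel of one image to `value`."""
    # Copy all three nesting levels so the caller's image is left untouched.
    out = [[list(inner) for inner in middle] for middle in image]
    if channels_first:
        # Layout (C, H, W): mark the last pixel of every channel plane.
        for plane in out:
            plane[-1][-1] = value
    else:
        # Layout (H, W, C): replace all channels of the last pixel.
        out[-1][-1] = [value] * len(out[-1][-1])
    return out


def apply_perturbation(perturbation, image):
    """Same branching as the hunk: one callable, or a sequence of callables."""
    if callable(perturbation):
        return perturbation(image)
    poisoned = image
    for perturb in perturbation:
        poisoned = perturb(poisoned)
    return poisoned


# One channel, 2x2 pixels, channels-first layout.
chw_image = [[[0.0, 0.0], [0.0, 0.0]]]

# Bind the layout once; the result takes the image as its only argument.
stamp_chw = partial(stamp_pixel, channels_first=True)

single_result = apply_perturbation(stamp_chw, chw_image)
chained_result = apply_perturbation([stamp_chw, stamp_chw], chw_image)
```

Binding the option with `functools.partial` (or a small closure) keeps the attack's call site free of layout arguments.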
4 changes: 1 addition & 3 deletions art/attacks/poisoning/clean_label_backdoor_attack.py
@@ -142,9 +142,7 @@ def poison(  # pylint: disable=W0221
             logger.warning("%d indices without change: %s", len(idx_no_change), idx_no_change)

         # Add backdoor and poison with the same label
-        poisoned_input, _ = self.backdoor.poison(
-            perturbed_input, self.target, broadcast=broadcast, channels_first=channels_first
-        )
+        poisoned_input, _ = self.backdoor.poison(perturbed_input, self.target, broadcast=broadcast)
         data[selected_indices] = poisoned_input

         return data, estimated_labels
4 changes: 0 additions & 4 deletions art/estimators/classification/pytorch.py
@@ -23,14 +23,12 @@

 import copy
 import logging
-import math
 import os
 import time
 from typing import Any, Dict, List, Optional, Tuple, Union, TYPE_CHECKING

 import numpy as np
 import six
-from tqdm.auto import tqdm

 from art import config
 from art.estimators.classification.classifier import (
@@ -397,8 +395,6 @@ def fit(  # pylint: disable=W0221
         import torch
         from torch.utils.data import TensorDataset, DataLoader

-        display_progress_bar = kwargs.get("display_progress_bar", False)
-
         # Set model mode
         self._model.train(mode=training_mode)

17 changes: 11 additions & 6 deletions tests/attacks/poison/test_clean_label_backdoor_attack.py
@@ -36,15 +36,20 @@ def test_poison(art_warning, get_default_mnist_subset, image_dl_estimator, frame
         (x_train, y_train), (_, _) = get_default_mnist_subset
         classifier, _ = image_dl_estimator()
         target = to_categorical([9], 10)[0]
+        if framework in ["pytorch", "huggingface"]:
+            def mod(x):
+                original_dtype = x.dtype
+                x = add_pattern_bd(x, channels_first=True)
+                return x.astype(original_dtype)
+        else:
+            def mod(x):
+                original_dtype = x.dtype
+                x = add_pattern_bd(x)
+                return x.astype(original_dtype)

         backdoor = PoisoningAttackBackdoor(add_pattern_bd)

         attack = PoisoningAttackCleanLabelBackdoor(backdoor, classifier, target)
-        if framework in ["pytorch", "huggingface"]:
-            print('HERE!!')
-            poison_data, poison_labels = attack.poison(x_train, y_train, channels_first=True)
-        else:
-            poison_data, poison_labels = attack.poison(x_train, y_train)
+        poison_data, poison_labels = attack.poison(x_train, y_train)

         np.testing.assert_equal(poison_data.shape, x_train.shape)
         np.testing.assert_equal(poison_labels.shape, y_train.shape)
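The `mod` wrappers in this hunk do two things: they fix the data layout for the pattern function and they cast the result back to the input's dtype. The sketch below illustrates that idea with NumPy alone. It assumes a hypothetical `add_corner_pattern` in place of `add_pattern_bd`, written to return `float64` on purpose so the cast back is visible, and `make_mod` is an illustrative helper that is not part of the PR.

```python
import numpy as np


def add_corner_pattern(x, channels_first=False):
    """Hypothetical stand-in for a backdoor pattern function.

    Sets the bottom-right pixel of every image to 1 and returns float64,
    mimicking a perturbation that does not preserve the input dtype.
    """
    out = x.astype(np.float64)  # astype returns a copy by default
    if channels_first:
        out[:, :, -1, -1] = 1.0  # layout (N, C, H, W)
    else:
        out[:, -1, -1, :] = 1.0  # layout (N, H, W, C)
    return out


def make_mod(channels_first):
    """Build a one-argument perturbation that keeps the input dtype."""

    def mod(x):
        original_dtype = x.dtype
        x = add_corner_pattern(x, channels_first=channels_first)
        return x.astype(original_dtype)

    return mod


# Channels-first batch, as used by the PyTorch-style estimators in the test.
x_nchw = np.zeros((2, 1, 4, 4), dtype=np.float32)
poisoned_nchw = make_mod(channels_first=True)(x_nchw)

# Channels-last batch for the remaining frameworks.
x_nhwc = np.zeros((2, 4, 4, 1), dtype=np.float32)
poisoned_nhwc = make_mod(channels_first=False)(x_nhwc)
```

Keeping the dtype stable matters when the poisoned samples are written back into an existing array or passed to an estimator that expects a specific floating-point type.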