
Commit 887200b

Merge pull request #5 from leap-laboratories/update-readme
Update README to explain how LocalLLMAttributor and APILLMAttributor …
2 parents e7eb5bb + 8105721 commit 887200b

5 files changed (+1637 −68 lines)

README.md

+170 −68
@@ -1,28 +1,85 @@
 # LLM Attribution Library
 
-The LLM Attribution Library is a Python package designed to compute the attributions of each token in an input string to the generated tokens in a language model. This is particularly useful for understanding the influence of specific input tokens on the output of a language model.
-
-![Attribution Table](docs/assets/table.png)
-
-- [LLM Attribution Library](#llm-attribution-library)
-- [Technical Overview](#technical-overview)
-- [Requirements](#requirements)
-- [Packaging](#packaging)
-- [Linting](#linting)
-- [Installation](#installation)
-- [Usage](#usage)
-- [Limitations](#limitations)
-- [Batch dimensions](#batch-dimensions)
-- [Input Embeddings](#input-embeddings)
-- [GPU Acceleration](#gpu-acceleration)
-- [Logging](#logging)
-- [Cleaning Up](#cleaning-up)
-- [Development](#development)
-- [Testing](#testing)
-
-## Technical Overview
-
-The library uses gradient-based attribution to quantify the influence of input tokens on the output of a GPT-2 model. For each output token, it computes the gradients with respect to the input embeddings. The L1 norm of these gradients is then used as the attribution score, representing the total influence of each input token on the output. This approach provides a direct measure of the sensitivity of the output to changes in the input, aiding in model interpretation and diagnosis.
+The LLM Attribution Library is designed to compute the contribution of each token in a prompt to the generated response of a language model.
+
+It can be used with both local LLMs:
+
+![Local LLM Attribution Table](docs/assets/local-llm-attribution.png)
+
+And OpenAI LLMs accessible through an API:
+![API-accessible LLM Attribution Table](docs/assets/api-llm-attribution.png)
+
+## Index
+- [Quickstart](#quickstart)
+- [Requirements](#requirements)
+- [Packaging](#packaging)
+- [Linting](#linting)
+- [Installation](#installation)
+- [API Design](#api-design)
+- [BaseLLMAttributor](#basellmattributor)
+- [LocalLLMAttributor](#localllmattributor)
+- [APILLMAttributor](#apillmattributor)
+- [PerturbationStrategy and AttributionStrategy](#perturbationstrategy-and-attributionstrategy)
+- [ExperimentLogger](#experimentlogger)
+- [Limitations](#limitations)
+- [Batch dimensions](#batch-dimensions)
+- [Input Embeddings](#input-embeddings)
+- [GPU Acceleration](#gpu-acceleration)
+- [Development](#development)
+- [Testing](#testing)
+
+## Quickstart
+
+Example of gradient-based attribution using gemma-2b locally:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from attribution.attribution import Attributor
+
+model_id = "google/gemma-2b-it"
+model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto").cuda()
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+embeddings = model.get_input_embeddings().weight.detach()
+
+attributor = Attributor(model=model, embeddings=embeddings, tokenizer=tokenizer)
+attr_scores, token_ids = attributor.get_attributions(
+    input_string="the five continents are asia, europe, afri",
+    generation_length=7,
+)
+
+attributor.print_attributions(
+    word_list=tokenizer.convert_ids_to_tokens(token_ids),
+    attr_scores=attr_scores,
+    token_ids=token_ids,
+    generation_length=7,
+)
+```
+
+Attribution using GPT-3 via OpenAI's API:
+
+```python
+from attribution.api_attribution import APILLMAttributor
+from attribution.experiment_logger import ExperimentLogger
+from attribution.token_perturbation import NthNearestPerturbationStrategy
+
+attributor = APILLMAttributor()
+logger = ExperimentLogger()
+
+input_text = "The clock shows 9:47 PM. How many minutes 'til 10?"
+attributor.compute_attributions(
+    input_text,
+    perturbation_strategy=NthNearestPerturbationStrategy(n=-1),
+    attribution_strategies=["cosine", "prob_diff", "token_displacement"],
+    logger=logger,
+    perturb_word_wise=True,
+)
+
+logger.print_sentence_attribution()
+logger.print_attribution_matrix(exp_id=1)
+```
+
+Usage examples can be found in the `examples/` folder and a more in-depth case study in `research/gemma-2b-case-study.ipynb`.
 
 ## Requirements
 
@@ -74,43 +131,106 @@ uv pip compile requirements.in -o requirements.txt
 uv pip compile requirements-dev.in -o requirements-dev.txt
 ```
 
-## Usage
 
-Usage examples can be found in the `examples/` folder.
+## API Design
+
+The attributors are designed to compute the contribution made by each token in an input string to the tokens generated by a language model.
 
-The following shows a simple example of attribution using gemma-2b:
+### BaseLLMAttributor
+
+`BaseLLMAttributor` is an abstract base class that defines the interface for all LLM attributors. It declares the `compute_attributions` method, which must be implemented by any concrete attributor class. This method takes an input text and computes the attribution scores for each token.
 
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
+class BaseLLMAttributor(ABC):
+    @abstractmethod
+    def compute_attributions(
+        self, input_text: str, **kwargs
+    ) -> Optional[Tuple[torch.Tensor, torch.Tensor]]:
+        pass
+```
 
-from attribution.attribution import Attributor
+### LocalLLMAttributor
 
-model_id = "google/gemma-2b-it"
-model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto").cuda()
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-embeddings = model.get_input_embeddings().weight.detach()
+`LocalLLMAttributor` uses a local model to compute attributions. The model, tokenizer, and embeddings are passed to the constructor.
 
-attributor = Attributor(model=model, embeddings=embeddings, tokenizer=tokenizer)
-attr_scores, token_ids = attributor.get_attributions(
-    input_string="the five continents are asia, europe, afri",
-    generation_length=7,
-)
+```python
+class LocalLLMAttributor:
+    def __init__(
+        self,
+        model: nn.Module,
+        tokenizer: transformers.PreTrainedTokenizerBase,
+        embeddings: torch.Tensor,
+        device: Optional[str] = None,
+        log_level: int = logging.WARNING,
+    ):
+        ...
+    def compute_attributions(
+        self, input_string: str, **kwargs
+    ) -> Tuple[torch.Tensor, torch.Tensor]:
+        ...
+```
 
-attributor.print_attributions(
-    word_list=tokenizer.convert_ids_to_tokens(token_ids),
-    attr_scores=attr_scores,
-    token_ids=token_ids,
-    generation_length=7,
-)
+The `compute_attributions` method generates tokens from the input string and computes the gradients of the output with respect to the input embeddings. These gradients are used to compute the attribution scores.
+
+`LocalLLMAttributor` uses gradient-based attribution to quantify the influence of input tokens on the output of a model. For each output token, it computes the gradients with respect to the input embeddings. The L1 norm of these gradients is then used as the attribution score, representing the total influence of each input token on the output.
+
+
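Concretely, the gradient step described above could be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes a HuggingFace-style causal LM that accepts `inputs_embeds` and returns `.logits`, and the function name is hypothetical.

```python
import torch

# Illustrative sketch of L1-norm gradient attribution (assumed interface,
# not the library's actual code).
def l1_gradient_attribution(model, input_embeddings: torch.Tensor, output_token_id: int) -> torch.Tensor:
    # input_embeddings: (seq_len, dim) embeddings of the input tokens.
    embeds = input_embeddings.detach().clone().requires_grad_(True)
    logits = model(inputs_embeds=embeds.unsqueeze(0)).logits  # add batch dim
    score = logits[0, -1, output_token_id]  # logit of the generated token
    score.backward()
    # L1 norm of the gradient over the embedding dim: one score per input token.
    return embeds.grad.abs().sum(dim=-1)
```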
+#### Cleaning Up
+
+A convenience method is provided to clean up memory used by Python and Torch. This can be useful when running the library in a cloud notebook environment:
+
+```python
+local_attributor.cleanup()
 ```
 
-### Limitations
+### APILLMAttributor
+
+`APILLMAttributor` uses the OpenAI API to compute attributions. Given that gradients are not accessible, the attributor perturbs the input with a given `PerturbationStrategy` and measures the magnitude of change of the generated output with an `attribution_strategy`.
+
+The `compute_attributions` method (a schematic of this loop follows the class signature below):
+1. Sends a chat completion request to the OpenAI API.
+2. Uses a `PerturbationStrategy` to modify the input prompt, and sends the perturbed input to OpenAI's API to generate a perturbed output. Each token of the input prompt is perturbed separately to obtain an attribution score for each input token.
+3. Uses an `attribution_strategy` to compute the magnitude of change between the original and perturbed output.
+4. Logs attribution scores to an `ExperimentLogger` if passed.
+
+```python
+class APILLMAttributor(BaseLLMAttributor):
+    def __init__(
+        self,
+        model: Optional[PreTrainedModel] = None,
+        tokenizer: Optional[PreTrainedTokenizer] = None,
+        token_embeddings: Optional[np.ndarray] = None,
+    ):
+        ...
+    def compute_attributions(self, input_text: str, **kwargs):
+        ...
+```
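The loop behind steps 1-3 might look like the following schematic; `generate_fn`, `attribution_fn`, and the other names are placeholders for illustration, not the library's internals.

```python
# Schematic of the perturb-and-compare loop described above (placeholder
# names). `generate_fn` calls the API; `attribution_fn` is one of the
# strategies described in the next section.
def perturb_and_score(input_tokens, perturbation_strategy, generate_fn, attribution_fn):
    original_output = generate_fn(input_tokens)
    scores = []
    for i, token_id in enumerate(input_tokens):
        perturbed = list(input_tokens)
        perturbed[i] = perturbation_strategy.get_replacement_token(token_id)
        # A larger change in the output means a higher attribution for token i.
        scores.append(attribution_fn(original_output, generate_fn(perturbed)))
    return scores
```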
+
+### PerturbationStrategy and AttributionStrategy
+
+`PerturbationStrategy` is an abstract base class that defines the interface for all perturbation strategies. It declares the `get_replacement_token` method, which must be implemented by any concrete perturbation strategy class. This method takes a token id and returns a replacement token id.
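For example, a custom strategy could be sketched like this, assuming `PerturbationStrategy` is exported from `attribution.token_perturbation` alongside `NthNearestPerturbationStrategy`; the class below is hypothetical.

```python
from attribution.token_perturbation import PerturbationStrategy

# Hypothetical strategy: replace every token with one fixed token id.
class FixedTokenPerturbationStrategy(PerturbationStrategy):
    def __init__(self, replacement_id: int):
        self.replacement_id = replacement_id

    def get_replacement_token(self, token_id: int) -> int:
        return self.replacement_id
```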
+
+The `attribution_strategy` parameter is a string that specifies the method to use for computing attributions. The available strategies are "cosine", "prob_diff", and "token_displacement"; each is sketched in code after the list below.
+
+- **Cosine Similarity Attribution**: Measures the cosine similarity between the embeddings of the original and perturbed outputs. The embeddings are obtained from a pre-trained model (e.g. GPT2). The cosine similarity is calculated for each pair of tokens in the same position on the original and perturbed outputs. For example, it compares the token in position 0 in the original response to the token in position 0 in the perturbed response. Additionally, the cosine similarity of the entire sentence embeddings is computed. The difference in sentence similarity and the per-token similarities are returned.
 
-#### Batch dimensions
+- **Probability Difference Attribution**: Calculates the absolute difference in probabilities for each token in the original and perturbed outputs. The probabilities are obtained from the `top_logprobs` field of the tokens, which contains the most likely tokens for each token position. The mean of these differences is returned, as well as the probability difference for each token position.
+
+- **Token Displacement Attribution**: Calculates the displacement of each token in the original output within the perturbed output's `top_logprobs` predicted tokens. The `top_logprobs` field contains the most likely tokens for each token position. If a token from the original output is not found in the `top_logprobs` of the perturbed output, a maximum displacement value is assigned. The mean of these displacements is returned, as well as the displacement of each original output token.
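The three strategies can be sketched as follows. Tensor shapes, the mean-pooled sentence embedding, the structure of `top_logprobs`, and the cap of 20 are assumptions made for illustration, not the library's exact code.

```python
import torch
import torch.nn.functional as F

def cosine_attribution(orig_emb: torch.Tensor, pert_emb: torch.Tensor):
    """orig_emb, pert_emb: (seq_len, dim) embeddings of the two outputs."""
    token_sims = F.cosine_similarity(orig_emb, pert_emb, dim=-1)  # per position
    # Sentence embedding assumed here to be the mean of token embeddings.
    sent_sim = F.cosine_similarity(orig_emb.mean(0), pert_emb.mean(0), dim=0)
    # Lower similarity => larger change => higher attribution.
    return 1 - sent_sim, 1 - token_sims

def prob_diff_attribution(orig_probs, pert_probs):
    """Per-position probabilities of the original tokens under each output."""
    diffs = [abs(p - q) for p, q in zip(orig_probs, pert_probs)]
    return sum(diffs) / len(diffs), diffs

MAX_DISPLACEMENT = 20  # assumed cap when a token is absent from top_logprobs

def token_displacement_attribution(orig_ids, pert_top_tokens):
    """pert_top_tokens: per position, candidate token ids ordered by probability."""
    disps = [
        cands.index(t) if t in cands else MAX_DISPLACEMENT
        for t, cands in zip(orig_ids, pert_top_tokens)
    ]
    return sum(disps) / len(disps), disps
```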
+
+
+### ExperimentLogger
+
+The `ExperimentLogger` class is used to log the results of different experiment runs. It provides methods for starting and stopping an experiment, logging the input and output tokens, and logging the attribution scores. The `api_llm_attribution.ipynb` notebook shows an example of how to use `ExperimentLogger` to compare the results of different attribution strategies.
+
+
+## Limitations
+
+### Batch dimensions
 
 Currently this library only supports models that take inputs with a batch dimension. This is common across most modern models, but not always the case (e.g. GPT2).
 
-#### Input Embeddings
+### Input Embeddings
 
 This library only supports models that have a common interface to pass in embeddings, and generate outputs without sampling of the form:
 
@@ -120,7 +240,7 @@ outputs = model(inputs_embeds=input_embeddings)
 
 This format is common across HuggingFace models.
 
-### GPU Acceleration
+## GPU Acceleration
 
 To run the attribution process on a device of your choice, pass the device identifier into the `Attributor` class constructor:
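The constructor snippet itself is elided from this diff; assuming it matches the `device` parameter in the `LocalLLMAttributor` signature above, it would look something like this, where `"cuda:0"` is just an example identifier:

```python
attributor = Attributor(
    model=model,
    embeddings=embeddings,
    tokenizer=tokenizer,
    device="cuda:0",  # example device identifier
)
```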

@@ -136,27 +256,6 @@ The device identifier must match the device used on the first embeddings layer
 
 If no device is specified, the model device will be used by default.
 
-### Logging
-
-The library uses the `logging` module to log messages. You can configure the logging level via an optional argument in the `Attributor` class constructor:
-
-```python
-import logging
-
-attributor = Attributor(
-    model=model,
-    tokenizer=tokenizer,
-    log_level=logging.INFO
-)
-```
-
-### Cleaning Up
-
-A convenience method is provided to clean up memory used by Python and Torch. This can be useful when running the library in a cloud notebook environment:
-
-```python
-attributor.cleanup()
-```
 
 ## Development
 
@@ -181,3 +280,6 @@ To run the integration tests:
 ```bash
 python -m pytest tests/integration
 ```
+
+## Research
+Some preliminary exploration and research into using attribution and quantitatively measuring attribution success can be found in the `research/` folder of this repository. We'd be excited to see this small library expand, including algorithmic improvements, further attribution and perturbation methods, and more rigorous and exhaustive experimentation. We welcome pull requests and issues from external collaborators.

docs/assets/api-llm-attribution.png (289 KB): file renamed without changes.

0 commit comments