Algorithm changes to handle output tokens shifting position in response to perturbation #8

Merged
merged 27 commits into from
Jul 5, 2024
Changes from all commits
28555e3
renamed llm-attribution references to PIZZA
jessicarumbelow Jun 25, 2024
f9ccceb
chnaging image names to PIZZA
jessicarumbelow Jun 25, 2024
d6a821f
Renamed APILLMAttributor to OpenAIAttributor to reflect openai focus.…
jessicarumbelow Jun 25, 2024
c703b8c
removed model passing to the OpenAIAttributor, using token_embeddings…
jessicarumbelow Jun 25, 2024
79f8b9b
updating readme
jessicarumbelow Jun 25, 2024
1f1c7b8
moved research notebooks into examples folder.
jessicarumbelow Jun 25, 2024
e84b756
Added whole perturbed input logging, added skip special tokens on dec…
jessicarumbelow Jun 25, 2024
4a1e3f2
saved examples notebook
jessicarumbelow Jun 25, 2024
55e0bd5
saved examples notebook WITHOUT API KEY
jessicarumbelow Jun 25, 2024
ed7899b
fixed formatting
jessicarumbelow Jun 25, 2024
9ea823f
fixed formatting
jessicarumbelow Jun 25, 2024
eedf68b
Word-wise perturbation relied on the tokenizer adding leading spaces,…
jessicarumbelow Jun 25, 2024
b384d03
Added "ignore_output_token_location" to make attribution work even if…
jessicarumbelow Jun 25, 2024
1cbe975
Fixed cosine-similarity attribution so it doesn't assume same output …
jessicarumbelow Jun 26, 2024
0ead688
removed token displacement metric, achanged tokenizer access in get_s…
jessicarumbelow Jun 26, 2024
b9ca2ad
Made colour map nicer, refactored cosine_similarity so it works on an…
jessicarumbelow Jun 26, 2024
1369a0c
fixed single embedding vector squeeze bug, changed default fixed pert…
jessicarumbelow Jun 28, 2024
532201f
fiddling with examples
jessicarumbelow Jun 28, 2024
fcd23c4
used ruff to fix formatting errors
jessicarumbelow Jul 3, 2024
db7f294
address Robbie's readme comments
jessicarumbelow Jul 3, 2024
dda365b
addressed PR comments - thanks guys
jessicarumbelow Jul 4, 2024
e3b118f
move default openai model to default arg
robbiemccorkell Jul 5, 2024
37e48fa
upgrade required python version to satisfy torch sub dependency networkx
robbiemccorkell Jul 5, 2024
5286147
replace source code in readme with arguments docs
robbiemccorkell Jul 5, 2024
0c86801
format readme
robbiemccorkell Jul 5, 2024
49e6b5c
add readme line explaining acronym
robbiemccorkell Jul 5, 2024
98aafd6
deleted top level pizza
jessicarumbelow Jul 5, 2024
2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
3.9
3.10
163 changes: 84 additions & 79 deletions README.md
@@ -1,35 +1,70 @@
# LLM Attribution Library

The LLM Attribution Library is designed to compute the contribution of each token in a prompt to the generated response of a language model.

It can be used with both local LLMs:

![Local LLM Attribution Table](docs/assets/local-llm-attribution.png)

And OpenAI LLMs accessible through an API:
![API-accessible LLM Attribution Table](docs/assets/api-llm-attribution.png)

## Index
- [Quickstart](#quickstart)
- [Requirements](#requirements)
- [Packaging](#packaging)
- [Linting](#linting)
- [Installation](#installation)
- [API Design](#api-design)
- [BaseLLMAttributor](#basellmattributor)
- [LocalLLMAttributor](#localllmattributor)
- [APILLMAttributor](#apillmattributor)
- [PerturbationStrategy and AttributionStrategy](#perturbationstrategy-and-attributionstrategy)
- [ExperimentLogger](#experimentlogger)
- [Limitations](#limitations)
- [Batch dimensions](#batch-dimensions)
- [Input Embeddings](#input-embeddings)
- [GPU Acceleration](#gpu-acceleration)
- [Development](#development)
- [Testing](#testing)
# PIZZA

[`P`rompt `I`nput `Z`onal `Z`? `A`ttribution analyzer](https://www.bmj.com/content/349/bmj.g7092)

PIZZA is an LLM Attribution Library designed to compute the contribution of each token in a prompt to the generated response of a language model.

It can be used with OpenAI LLMs accessible through an API:

![API-accessible LLM Attribution Table](docs/assets/api-PIZZA.png)

and local LLMs:

![Local LLM Attribution Table](docs/assets/local-PIZZA.png)

## Index

- [PIZZA](#pizza)
- [Index](#index)
- [Quickstart](#quickstart)
- [Requirements](#requirements)
- [Packaging](#packaging)
- [Linting](#linting)
- [Installation](#installation)
- [API Design](#api-design)
- [BaseLLMAttributor](#basellmattributor)
- [LocalLLMAttributor](#localllmattributor)
- [Cleaning Up](#cleaning-up)
- [OpenAIAttributor](#openaiattributor)
- [PerturbationStrategy and AttributionStrategy](#perturbationstrategy-and-attributionstrategy)
- [ExperimentLogger](#experimentlogger)
- [Limitations](#limitations)
- [Batch dimensions](#batch-dimensions)
- [Input Embeddings](#input-embeddings)
- [GPU Acceleration](#gpu-acceleration)
- [Development](#development)
- [Testing](#testing)
- [Research](#research)

## Quickstart

Attribution via OpenAI's API:

**!!! This will use API credits !!!**

```python
from attribution.api_attribution import OpenAIAttributor
from attribution.experiment_logger import ExperimentLogger
from attribution.token_perturbation import NthNearestPerturbationStrategy

# set your "OPENAI_API_KEY" environment variable to your openai API key, or pass it here:
attributor = OpenAIAttributor(openai_api_key=YOUR_OPENAI_API_KEY)
logger = ExperimentLogger()

input_text = "The clock shows 9:47 PM. How many minutes 'til 10?"
attributor.compute_attributions(
    input_text,
    perturbation_strategy=NthNearestPerturbationStrategy(n=-1),
    attribution_strategies=["cosine", "prob_diff"],
    logger=logger,
    perturb_word_wise=True,
)

logger.print_sentence_attribution()
logger.print_attribution_matrix(exp_id=1)

```

Example of gradient-based attribution using gemma-2b locally:

```python
@@ -55,31 +90,7 @@ attributor.print_attributions(
)
```

Attrubution using GPT-3 via OpenAI's API:

```python
from attribution.api_attribution import APILLMAttributor
from attribution.experiment_logger import ExperimentLogger
from attribution.token_perturbation import NthNearestPerturbationStrategy

attributor = APILLMAttributor()
logger = ExperimentLogger()

input_text = "The clock shows 9:47 PM. How many minutes 'til 10?"
attributor.compute_attributions(
input_text,
perturbation_strategy=NthNearestPerturbationStrategy(n=-1),
attribution_strategies=["cosine", "prob_diff", "token_displacement"],
logger=logger,
perturb_word_wise=True,
)

logger.print_sentence_attribution()
logger.print_attribution_matrix(exp_id=1)

```

Usage examples can be found in the `examples/` folder a more in-depth case study in `research/gemma-2b-case-study.ipynb`.
Usage examples can be found in the `examples/` folder.

## Requirements

@@ -96,13 +107,13 @@ This project uses [ruff](https://github.com/astral-sh/ruff) for linting and form
1. First, clone the repository:

```bash
git clone git@github.com:leap-laboratories/llm-attribution.git
git clone git@github.com:leap-laboratories/PIZZA.git
```

2. Navigate into the cloned directory:

```bash
cd llm-attribution
cd PIZZA
```

3. Create a virtual environment and activate it:
@@ -119,19 +130,20 @@ uv pip install -r requirements.txt
```

Or to install the dependencies needed to do development on the library:

```bash
uv pip install -r requirements-dev.txt
```

Now, you should be able to import and use the library in your Python scripts.

5. To update the lock files that specify the dependency versions:

```bash
uv pip compile requirements.in -o requirements.txt
uv pip compile requirements-dev.in -o requirements-dev.txt
```


## API Design

The attributors are designed to compute the contribution made by each token in an input string to the tokens generated by a language model.
@@ -174,7 +186,6 @@ The `compute_attributions` method generates tokens from the input string and com

`LocalLLMAttributor` uses gradient-based attribution to quantify the influence of input tokens on the output of a model. For each output token, it computes the gradients with respect to the input embeddings. The L1 norm of these gradients is then used as the attribution score, representing the total influence of each input token on the output.
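
As a standalone illustration of this idea (not the library's internals), the sketch below computes L1-norm gradient attributions for a single next-token prediction with a small HuggingFace model; the model choice and variable names are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative stand-in model; any causal LM with accessible input embeddings works.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The clock shows 9:47 PM.", return_tensors="pt").input_ids

# Embed the input manually so gradients can flow back to the input embeddings.
input_embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)

# Score of the most likely next token (a single "output token" for this sketch).
logits = model(inputs_embeds=input_embeds).logits
output_score = logits[0, -1].max()

# Gradient of that output score with respect to every input embedding.
output_score.backward()

# L1 norm over the embedding dimension -> one attribution score per input token.
attribution_scores = input_embeds.grad[0].abs().sum(dim=-1)
print(attribution_scores)
```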


#### Cleaning Up

A convenience method is provided to clean up memory used by Python and Torch. This can be useful when running the library in a cloud notebook environment:
@@ -183,47 +194,41 @@ A convenience method is provided to clean up memory used by Python and Torch.
local_attributor.cleanup()
```

### APILLMAttributor
### OpenAIAttributor

`APILLMAttributor` uses the OpenAI API to compute attributions. Given that gradients are not accessible, the attributor perturbs the input with a given `PerturbationStrategy` and measures the magnitude of change of the generated output with an `attribution_strategy`.
`OpenAIAttributor` uses the OpenAI API to compute attributions. Given that gradients are not accessible, the attributor perturbs the input with a given `PerturbationStrategy` and measures the magnitude of change of the generated output with an `attribution_strategy`.

The `compute_attributions` method:

1. Sends a chat completion request to the OpenAI API.
2. Uses a `PerturbationStrategy` to modify the input prompt, and sends the perturbed input to OpenAI's API to generate a perturbed output. Each token of the input prompt is perturbed separately, to obtain an attribution score for each input token.
3. Uses an `attribution_strategy` to compute the magnitude of change between the original and perturbed output.
4. Logs attribution scores to an `ExperimentLogger` if passed.

```python
class APILLMAttributor(BaseLLMAttributor):
def __init__(
self,
model: Optional[PreTrainedModel] = None,
tokenizer: Optional[PreTrainedTokenizer] = None,
token_embeddings: Optional[np.ndarray] = None,
):
...
def compute_attributions(self, input_text: str, **kwargs):
...
```
**Initialization Parameters:**

- `openai_api_key` (`Optional[str]`): Your OpenAI API key. If not provided, the class attempts to retrieve the API key from the `OPENAI_API_KEY` environment variable.
- `openai_model` (`Optional[str]`): The identifier for the OpenAI model you wish to use. If not specified, a default model is used. This allows for flexibility in choosing between different models for different tasks or preferences.
  - Default: `gpt-3.5-turbo`
- `tokenizer` (`Optional[PreTrainedTokenizer]`): An instance of a tokenizer compatible with the chosen model. If not provided, the class defaults to using the `GPT2Tokenizer` with the "gpt2" model, suitable for general-purpose text processing.
- `token_embeddings` (`Optional[np.ndarray]`): Pre-computed embeddings for tokens. If not provided, the class will default to using embeddings from the `GPT2LMHeadModel` for the "gpt2" model. This is useful for advanced use cases where custom or pre-computed embeddings are desired.
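
As an illustrative sketch (not prescriptive), initialization with defaults versus explicit overrides might look like this; the exact array expected for `token_embeddings` is an assumption based on the defaults described above:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

from attribution.api_attribution import OpenAIAttributor

# Simplest case: the key is read from the OPENAI_API_KEY environment variable
# and the defaults listed above are used.
attributor = OpenAIAttributor()

# Explicit configuration (values shown are illustrative):
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
token_embeddings = (
    GPT2LMHeadModel.from_pretrained("gpt2").transformer.wte.weight.detach().numpy()
)
attributor = OpenAIAttributor(
    openai_api_key="sk-...",       # or rely on the environment variable
    openai_model="gpt-3.5-turbo",  # the documented default
    tokenizer=tokenizer,
    token_embeddings=token_embeddings,
)
```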

### PerturbationStrategy and AttributionStrategy

`PerturbationStrategy` is an abstract base class that defines the interface for all perturbation strategies. It declares the `get_replacement_token` method, which must be implemented by any concrete perturbation strategy class. This method takes a token id and returns a replacement token id.
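
As an illustration, a custom strategy might look like the sketch below; the import path and exact method signature are assumptions based on the description above (see `NthNearestPerturbationStrategy` in the Quickstart for the library's own implementation):

```python
from attribution.token_perturbation import PerturbationStrategy


class FixedTokenPerturbationStrategy(PerturbationStrategy):
    """Hypothetical strategy: replace every perturbed token with one fixed token id."""

    def __init__(self, replacement_token_id: int = 0):
        self.replacement_token_id = replacement_token_id

    def get_replacement_token(self, token_id: int) -> int:
        # The interface takes a token id and returns a replacement token id.
        return self.replacement_token_id
```
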

The `attribution_strategy` parameter is a string that specifies the method to use for computing attributions. The available strategies are "cosine", "prob_diff", and "token_displacement".
The `attribution_strategy` parameter is a string that specifies the method to use for computing attributions. The available strategies are "cosine" and "prob_diff".

- **Cosine Similarity Attribution**: Measures the cosine similarity between the embeddings of the original and perturbed outputs. The embeddings are obtained from a pre-trained model (e.g. GPT-2). The cosine similarity is calculated for each pair of tokens in the same position in the original and perturbed outputs. For example, it compares the token in position 0 in the original response to the token in position 0 in the perturbed response. Additionally, the cosine similarity of the entire sentence embeddings is computed. The sentence-level similarity difference and the per-token similarities are returned (a standalone sketch of this comparison appears after this list).

- **Probability Difference Attribution**: Calculates the absolute difference in probabilities for each token in the original and perturbed outputs. The probabilities are obtained from the `top_logprobs` field of the tokens, which contains the most likely tokens for each token position. The mean of these differences is returned, as well as the probability difference for each token position.

- **Token Displacement Attribution**: Calculates the displacement of each token in the original output within the perturbed output's `top_logprobs` predicted tokens. The `top_logprobs` field contains the most likely tokens for each token position. If a token from the original output is not found in the `top_logprobs` of the perturbed output, a maximum displacement value is assigned. The mean of these displacements is returned, as well as the displacement of each original output token.
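
Below is a standalone sketch of the per-token and sentence-level cosine comparison described in the first bullet, using GPT-2 token embeddings as stand-ins; the mean-pooled sentence embedding is an assumption for illustration, not the library's implementation:

```python
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
embedding_table = GPT2LMHeadModel.from_pretrained("gpt2").transformer.wte.weight.detach()


def embed(text: str) -> torch.Tensor:
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    return embedding_table[ids]  # one embedding vector per token


original = embed("13 minutes")
perturbed = embed("10 minutes")

# Similarity between tokens in the same position of the original and perturbed outputs.
n = min(len(original), len(perturbed))
token_similarity = F.cosine_similarity(original[:n], perturbed[:n], dim=-1)

# Sentence-level similarity (mean-pooled embeddings, purely for illustration).
sentence_similarity = F.cosine_similarity(
    original.mean(dim=0), perturbed.mean(dim=0), dim=0
)
print(token_similarity, sentence_similarity)
```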


### ExperimentLogger

The `ExperimentLogger` class is used to log the results of different experiment runs. It provides methods for starting and stopping an experiment, logging the input and output tokens, and logging the attribution scores. The `api_llm_attribution.ipynb` notebook shows an example of how to use `ExperimentLogger` to compare the results of different attribution strategies.
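
For example, building on the Quickstart, a logger can be passed to `compute_attributions` and then used to inspect and compare the logged strategies:

```python
from attribution.api_attribution import OpenAIAttributor
from attribution.experiment_logger import ExperimentLogger
from attribution.token_perturbation import NthNearestPerturbationStrategy

attributor = OpenAIAttributor()
logger = ExperimentLogger()

# Run two attribution strategies in one call; results are logged per experiment.
attributor.compute_attributions(
    "The clock shows 9:47 PM. How many minutes 'til 10?",
    perturbation_strategy=NthNearestPerturbationStrategy(n=-1),
    attribution_strategies=["cosine", "prob_diff"],
    logger=logger,
    perturb_word_wise=True,
)

# Token-level summary across the sentence, then the full matrix for one experiment.
logger.print_sentence_attribution()
logger.print_attribution_matrix(exp_id=1)
```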


## Limitations

### Batch dimensions
@@ -242,7 +247,7 @@ This format is common across HuggingFace models.

## GPU Acceleration

To run the attribution process on a device of your choice, pass the device identifier into the `Attributor` class constructor:
To run the attribution process on a device of your choice (for local attribution), pass the device identifier into the `Attributor` class constructor:

```python
attributor = Attributor(
@@ -256,7 +261,6 @@ The device identifier must match the device used on the first embeddings layer

If no device is specified, the model device will be used by default.


## Development

To contribute to the library, you will need to install the development requirements:
@@ -281,5 +285,6 @@ To run the integration tests:
python -m pytest tests/integration
```

## Research
Some preliminary exploration and research into using attribution and quantitatively measuring attribution success can be found in the research folder of this repository. We'd be excited to see expansion of this small library, including both algorithmic improvements, further attribution and perturbation methods, and more rigorous and exhaustive experimentation. We welcome pull requests and issues from external collaborators.
## Research

Some preliminary exploration and research into using attribution and quantitatively measuring attribution success can be found in the examples folder of this repository. We'd be excited to see expansion of this small library, including both algorithmic improvements, further attribution and perturbation methods, and more rigorous and exhaustive experimentation. We welcome pull requests and issues from external collaborators.