
Algorithm changes to handle output tokens shifting position in response to perturbation #8

Merged: 27 commits merged into main, Jul 5, 2024

Conversation

@jessicarumbelow (Contributor) commented on Jun 28, 2024

Algorithm changes to handle common instances where the output tokens change location in response to perturbation. For example:

"It's 9:47. How long until 10?" -> "13 minutes."

We perturb it (say, with word-wise fixed perturbation, in this case removing "It's"), and get:

"9:47. How long until 10?" -> "It's 13 minutes until 10."

Previously, this phrasing change would cause us to calculate attribution for "It's" in the input, based on the difference between the output tokens "13" and "It's". That doesn't make sense and was leading to instability and weird results.

I've changed it so that we look for matching tokens in the original vs perturbed output (and top 20 logprobs). This means we now compare the probabilities/cosine similarities of "13" and "13" instead of "13" and "It's".
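
For intuition, here is a minimal sketch of the matching step. This is not the repo's actual code; the function name and the assumption that each perturbed-output position carries a dict of its top-20 candidate tokens mapped to logprobs (roughly the OpenAI API shape) are mine.

# A minimal sketch of the matching idea, not the repo's actual code.
LOW_LOGPROB = -100.0  # fallback when a token is missing from the perturbed output entirely

def matched_logprobs(original_tokens, perturbed_top_logprobs):
    """For each original output token, take its logprob wherever it appears in the
    perturbed output's top logprobs, instead of comparing whatever token happens
    to sit at the same position."""
    new_logprobs = []
    for tok in original_tokens:
        found = LOW_LOGPROB
        for candidates in perturbed_top_logprobs:  # one dict per perturbed position
            if tok in candidates:
                found = candidates[tok]
                break
        new_logprobs.append(found)
    return new_logprobs

# e.g. matched_logprobs(["13", " minutes"], [{"It's": -0.1}, {"13": -0.2}, {" minutes": -0.3}])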

This PR also includes a couple of bug fixes and some minor changes that seemed sensible to me, so I'm afraid it's a bit of a mixed bag.

Minor:

I renamed the repo to PIZZA, for obvious reasons.
I also did a fair bit of tidying up for a general audience, e.g.:

  • Renamed APILLMAttributor to OpenAIAttributor, since that's the only API we support right now.
  • Removed local model passing to OpenAIAttributor – we actually only need the embeddings and tokenizer, so the previous use of the local model (and the resulting reliance on the embeddings being accessible via the default huggingface signature) was unnecessary. This way the user should be able to use any embedding matrix (see the sketch after this list).
  • Made the OpenAI model selection an argument instead of hardcoded.
  • Made a new notebook for the blog post examples. This is paused until Andrew's improvements are ready.
  • Consolidated all example code into the examples folder – previously it was split between "research" and "examples".
  • Changed the debug logging from logging just the perturbed token to logging the entire perturbed input. I feel this is much more readable and makes interpreting the results more intuitive, since you can see the same input that the model saw.
  • Changed some of the ordering in the readme to highlight the OpenAI attribution, since that's the most interesting part to my mind.
  • Changed tokenizer access in get_sentence_embeddings, for consistency.
  • Made the colour map nicer (centered, visually divergent).
  • Changed the default fixed perturbation substrate to the empty string "" (since using the " " space token generally causes double spaces in perturbed inputs).
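
As a rough illustration of the bullet about dropping the local model: the attributor only needs a tokenizer and an input-embedding matrix. This is a hedged setup sketch; the commented-out argument names are assumptions, not the verified OpenAIAttributor signature.

# Hypothetical setup sketch: only the tokenizer and the embedding matrix are
# needed, not a full local model passed into the attributor.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
embeddings = AutoModelForCausalLM.from_pretrained("gpt2").get_input_embeddings().weight.detach()

# Any (vocab_size, hidden_dim) embedding matrix would do here; it is only used
# to embed output tokens for the cosine-similarity metric.
# attributor = OpenAIAttributor(          # argument names are assumed
#     openai_model="gpt-3.5-turbo",
#     tokenizer=tokenizer,
#     token_embeddings=embeddings,
# )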

Major:

  • Added "ignore_output_token_location" to make attribution work even if the output tokens change position.
  • Similarly, changed cosine-similarity attribution so it doesn't assume same output length and token position after perturbation.
  • Refactored cosine_similarity so it works on any input and output strings (rather than assuming openAI choices, and getting the strings from them).

Bug fixes:

  • Word-wise perturbation previously relied on the tokenizer adding leading spaces, which wasn't documented anywhere and is a weird gotcha for people in the future who might want to use a different tokenizer that doesn't have an "add leading spaces" argument. I fixed it so that we don't assume the tokenizer does this and handle splitting words on spaces separately (a minimal sketch follows this list).
  • Fixed a bug where cosine similarity (encoded by the local tokenizer) was assumed to have the same number of tokens as the OpenAI token output – this was causing output to get cut off in logging.
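
As promised above, a minimal sketch of word-wise splitting that doesn't lean on tokenizer leading-space behaviour. The function and argument names here are made up for illustration and may differ from the repo's implementation.

# Illustrative sketch only; the actual implementation may differ.
def word_wise_units(text: str, substrate: str = ""):
    """Split the input into word-level perturbation units by splitting on spaces,
    without relying on the tokenizer adding leading spaces. Each perturbed variant
    replaces one word with the substrate (default: the empty string, i.e. removal)."""
    words = text.split(" ")
    perturbed_inputs = []
    for i in range(len(words)):
        kept = words[:i] + ([substrate] if substrate else []) + words[i + 1:]
        perturbed_inputs.append(" ".join(kept))
    return words, perturbed_inputs

# Example: removing "It's" from "It's 9:47. How long until 10?" yields
# "9:47. How long until 10?"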

@sebastian-sosa (Contributor) commented on Jul 2, 2024

There are a couple of files whose formatting doesn't adhere to what the ruff linter expects. Do you mind installing the ruff extension in VS Code and enabling it, so that it fixes styling errors on save?

It may also be good to install the Pylance VS Code extension.

@jessicarumbelow changed the title from "Jessica" to "Algorithm changes to handle output tokens shifting position in response to perturbation" on Jul 2, 2024
@robbiemccorkell (Contributor) left a comment:

Some comments, mostly on style and bits in the README.

Also, as I think Seba said already, it looks like some linter/formatting errors have crept in, which probably means you don't have the Ruff extension installed.

This is partly my fault, because I didn't put instructions in the repo on how to set up the pre-commit hook, which would have caught this. I also didn't add a formatting check to the pipeline, though the linter is failing in CI.

    new_lp = all_top_logprobs[all_toks.index(otl.token)]
else:
    new_lp = -100
Contributor:

We can move this number to a constant defined on top of the script.
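
For concreteness, the suggestion might look something like this; the constant name and the helper wrapper are assumptions, not the actual code.

MISSING_TOKEN_LOGPROB = -100  # defined once at the top of the script

def logprob_for(token, all_toks, all_top_logprobs):
    """Return the logprob of `token` from the perturbed output's top logprobs,
    or the fallback constant if it doesn't appear at all."""
    if token in all_toks:
        return all_top_logprobs[all_toks.index(token)]
    return MISSING_TOKEN_LOGPROB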

Contributor:

Should we move this to /examples?

Also, this may be a small nitpick, but given that python script names usually are lowercase and snake_case, maybe we can make it example_PIZZA.ipynb

# Return difference in sentence similarity and token similarities
return self_similarity - sentence_similarity, 1 - token_similarities
cd = 1-cosine_similarity(original_output_emb, perturbed_output_emb)
token_distance = cd.min(axis=-1)
Contributor:

What's the intuition behind how this would behave for longer inputs/outputs?

I'm thinking that with longer strings, the chances of having frequent tokens that don't add semantic meaning (the, a, etc.) increase, and therefore the chance of token_distance being low because these tokens appear in different places also increases.

Contributor:

On a different note, do you mind explaining to me why we compute the min across axis=-1?

Is this dimension of length token_embeddings.shape[-1], so same as the vocabulary length? I may be wrong though.

Contributor (author):

Sure! So, cosine_similarity actually gives us a pairwise score between every unperturbed output embedding and every perturbed output embedding. So if unperturbed output embeddings is shape (10,768), and the perturbed is (6, 768), cosine_distance = 1-cosine_similarity(original_output_emb, perturbed_output_emb) gives us a matrix of shape (10, 6). And we want the minimum values for each unperturbed output embedding, so we take the min over the last dimension to give an attribution vector length 10 – i.e. one element for each token in the original unperturbed output.
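
A small sketch of those shapes, using sklearn's pairwise cosine_similarity and random embeddings as stand-ins for the real token embeddings:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

original_output_emb = np.random.rand(10, 768)   # 10 unperturbed output tokens
perturbed_output_emb = np.random.rand(6, 768)   # 6 perturbed output tokens

# Pairwise distance between every unperturbed token and every perturbed token
cosine_distance = 1 - cosine_similarity(original_output_emb, perturbed_output_emb)  # shape (10, 6)

# For each unperturbed token, keep the distance to its closest perturbed token,
# giving one attribution value per original output token.
token_distance = cosine_distance.min(axis=-1)   # shape (10,)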

Contributor (author):

Yeah, we can't really capture repetitions - it's not recognised as a change in the output, because we pick the lowest distance for each embedding. I'm not sure how to get around this though.

Contributor:

Thanks! That makes sense. Yeah we're computing the pairwise similarities instead of comparing elements in the same position as before. I think that's good although I agree it has some limitations -- repeated tokens and frequent tokens which appear twice for different reasons may affect the attribution score. I think we can leave it as is for now 👍🏼

@robbiemccorkell (Contributor) left a review:

Approved from my side. Thanks for these changes.

@sebastian-sosa (Contributor) left a review:

Approved! I left a small comment on a duplicate notebook. Thanks for all these changes, it's looking really good.

Contributor:

Thanks for moving this! It looks like a second copy was created, though. Do you mind deleting Example_PIZZA.ipynb or merging it with examples/example_PIZZA.ipynb?


@robbiemccorkell merged commit 79d4136 into main on Jul 5, 2024 (1 check passed).
@robbiemccorkell deleted the jessica branch on July 5, 2024 at 14:30.