
Feature/implement the fuzz tests in robustness #1190

Merged

Conversation

@chakravarthik27 (Collaborator) commented Mar 25, 2025

Harness Setup:

from langtest import Harness

harness = Harness(
    task="question-answering",
    model={
        # Target model under test
        "model": "llama3.1",
        "hub": "ollama",
        "type": "chat",
    },
    data={
        # Clinical QA benchmark, small test split
        "data_source": "MedQA",
        "split": "test-tiny",
    },
    config={
        "tests": {
            "defaults": {
                "min_pass_rate": 0.5,
            },
            "clinical": {
                "medfuzz": {
                    "min_pass_rate": 0.1,
                    # LLM used to generate the adversarial (fuzzed) prompts
                    "attacker_llm": {
                        "model": "gpt-4o-mini",
                        "hub": "openai",
                        "type": "chat",
                    },
                }
            }
        }
    }
)
harness.generate().run().report()


@chakravarthik27 chakravarthik27 self-assigned this Mar 25, 2025
@chakravarthik27 chakravarthik27 requested a review from Copilot March 25, 2025 15:20
@Copilot (Contributor) left a comment


Pull Request Overview

This PR implements fuzz tests for robustness by introducing the MedFuzz feature. Key changes include:

  • Refactoring CSV loading in utils.py to download remote files.
  • Creating new LLM interaction classes (TargetLLM, AttackerLLM) and a MedFuzz class in clinical.py for processing clinical samples.
  • Extending sample data types with HTML highlighting to display differences for MedFuzz samples.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File descriptions:

  • langtest/transform/utils.py: Refactored load_csv to download remote files; added the TargetLLM and AttackerLLM classes.
  • langtest/transform/clinical.py: Introduced the MedFuzz class and integrated LLM interactions for fuzz testing.
  • langtest/utils/custom_types/helpers.py: Added the highlight_differences_both function for generating HTML diff highlights.
  • langtest/utils/custom_types/sample.py: Added the MedFuzzSample subclass, which overrides to_dict to incorporate HTML diff highlighting.
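The HTML diff highlighting described for MedFuzzSample could, in principle, be built on Python's standard difflib. The sketch below is a hypothetical illustration of the general technique only; the function name and styling are assumptions, not the actual highlight_differences_both implementation from this PR:

```python
import difflib
from html import escape

def highlight_differences(original: str, perturbed: str) -> str:
    """Wrap words of `perturbed` that differ from `original` in a red <span>.

    Hypothetical sketch of the technique; not the actual
    highlight_differences_both implementation from this PR.
    """
    a, b = original.split(), perturbed.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    parts = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        chunk = escape(" ".join(b[j1:j2]))
        if not chunk:
            continue  # pure deletions contribute no words from `perturbed`
        if op == "equal":
            parts.append(chunk)
        else:
            parts.append(f'<span style="color:red">{chunk}</span>')
    return " ".join(parts)
```

Rendering the perturbed text with changed words highlighted lets a reviewer see at a glance which parts of a clinical question the attacker model rewrote.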
Comments suppressed due to low confidence (3)

langtest/transform/utils.py:574

  • The new CSV downloading mechanism uses requests.get on the 'filepath' variable assuming it is a URL. Consider validating the input or adding error handling to ensure that non-URL file paths are managed appropriately.
# save the csv file into `~/.langtest/` directory
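One way to address this review comment (hypothetical helper name; not the actual langtest code) is to classify the source before attempting a download, so local paths are never passed to requests.get:

```python
from pathlib import Path
from urllib.parse import urlparse

def resolve_csv_source(filepath: str) -> str:
    """Classify a CSV source as a remote URL or an existing local file.

    Hypothetical validation helper, not the actual langtest implementation.
    """
    parsed = urlparse(filepath)
    if parsed.scheme in ("http", "https"):
        return "remote"
    if Path(filepath).exists():
        return "local"
    raise FileNotFoundError(
        f"CSV source is neither a URL nor an existing file: {filepath!r}"
    )
```

A loader could then branch on the returned tag, downloading into `~/.langtest/` only in the "remote" case and reading the file directly otherwise.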

langtest/transform/utils.py:1052

  • The error message here is unclear. Consider rephrasing it to clearly indicate the unsupported configuration in the LLM client.
raise TypeError("Unsupported hub and model and Only LLM")
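A possible rephrasing that names the offending configuration values could look like the sketch below (purely illustrative; the helper name and the exact wording of what is supported are assumptions, not the langtest code):

```python
def make_unsupported_config_error(hub: str, model: str) -> TypeError:
    """Build a clearer error for an unsupported LLM client configuration.

    Hypothetical rewording of the unclear message flagged in review.
    """
    return TypeError(
        f"Unsupported LLM client configuration: hub={hub!r}, model={model!r}. "
        "Check that the hub/model pair is one of the supported LLM backends."
    )
```

Including the actual `hub` and `model` values in the message makes the failure diagnosable without a debugger.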

langtest/transform/clinical.py:981

  • Slicing the joined expected_results with [:1] might unintentionally truncate the value. Verify that this behavior is intended, or adjust to preserve the full expected result as needed.
med_sample.expected_results = "".join(map(str, med_sample.expected_results))[:1]
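The concern can be seen with a small example: joining first and then slicing with `[:1]` keeps only one character of the whole joined string, whereas slicing the list first preserves the complete first expected result (illustrative values, not data from this PR):

```python
# Demonstrates the difference between slicing the joined string ([:1],
# which keeps a single character) and slicing the list of results first.
expected_results = ["Answer A", "Answer B"]

joined_then_sliced = "".join(map(str, expected_results))[:1]   # one character
sliced_then_joined = "".join(map(str, expected_results[:1]))   # first result
```

If the intent is to keep only a single-letter answer label, the `[:1]` on the joined string happens to work; for multi-character answers it silently discards everything after the first character.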

@chakravarthik27 chakravarthik27 linked an issue Apr 4, 2025 that may be closed by this pull request
@chakravarthik27 chakravarthik27 merged commit 034d18d into release/2.7.0 Apr 8, 2025
3 checks passed
Development

Successfully merging this pull request may close these issues.

Implement the Fuzz Tests in Robustness