Strip for LoRA modules #3331

Open
nikita-malininn wants to merge 5 commits into develop

Conversation

@nikita-malininn (Collaborator) commented Mar 5, 2025

Changes

  • Added strip method for LoRA modules

Reason for changes

  • To enable IR model conversion (see the usage sketch after this description)

Related tickets

  • 159708

Tests

  • Updated

On top of #3322
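
A minimal sketch of the intended usage, assuming the new strip entry point is exposed through the existing nncf.strip API and that the model has been tuned with FQ-LoRA (the model and input names here are illustrative):

```python
import nncf
import openvino as ov
import torch

# `tuned_model` is assumed to be a PyTorch model tuned with FQ-LoRA.
# Stripping folds the LoRA adapters and quantizers back into plain weights,
# which makes the model convertible to OpenVINO IR.
stripped_model = nncf.strip(tuned_model)

# Convert the stripped model to IR; the example input shape is illustrative.
example_input = torch.ones(1, 128, dtype=torch.long)
ov_model = ov.convert_model(stripped_model, example_input=example_input)
ov.save_model(ov_model, "model.xml")
```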

@github-actions github-actions bot added NNCF PT Pull requests that updates NNCF PyTorch NNCF Common Pull request that updates NNCF Common experimental NNCF OpenVINO Pull requests that updates NNCF OpenVINO NNCF PTQ Pull requests that updates NNCF PTQ API Public API-impacting changes labels Mar 5, 2025
),
ids=["asym", "sym"],
)
def test_fq_lora_tuning(mode, backup_mode, compression_kwargs, _seed):

Contributor:

Suggest extending this test by calling strip, exporting to OV, and checking similarity for the OV model (you can pre-compute the similarity for the "stripped to float" model or compute it in the test).
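
A hedged sketch of what that extension might look like (assuming an HF-style model with .logits outputs; the helper names, example input, and similarity threshold are assumptions, not part of the existing test):

```python
import nncf
import openvino as ov
import torch

# Strip the tuned model and export it to OpenVINO.
stripped_model = nncf.strip(model)
ov_model = ov.convert_model(stripped_model, example_input=example_input)
compiled = ov.compile_model(ov_model)

# Compare OV outputs with the tuned Torch model (or with a pre-computed
# reference for the "stripped to float" model).
with torch.no_grad():
    torch_logits = model(example_input).logits
ov_logits = torch.from_numpy(compiled(example_input.numpy())[0])
similarity = torch.nn.functional.cosine_similarity(
    torch_logits.flatten(), ov_logits.flatten(), dim=0
)
assert similarity > 0.98  # the threshold is illustrative
```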

result_dtype=original_dtype,
)

elif isinstance(quantizer, SymmetricLoraQuantizer):

Contributor:

An ordinary FQ should also be supported here, since some layers can be selected for INT8 (first/last or by mixed precision) and will then be represented by an ordinary FQ without LoRA.
You can check the number of u8/u4 constants after export to OV, for example as in the sketch below.
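
Counting the compressed constants in the converted model could look roughly like this (a sketch; `stripped_model` and `example_input` are assumed to exist in the test):

```python
import openvino as ov

ov_model = ov.convert_model(stripped_model, example_input=example_input)

# Count weight constants by element type: layers kept in INT8 (first/last or
# selected by mixed precision) show up as u8 constants, the rest as u4.
constants = [op for op in ov_model.get_ops() if op.get_type_name() == "Constant"]
num_u8 = sum(1 for op in constants if op.get_element_type() == ov.Type.u8)
num_u4 = sum(1 for op in constants if op.get_element_type() == ov.Type.u4)

# Expected counts depend on the model and the mixed-precision setup;
# the assertions below are illustrative.
assert num_u8 > 0
assert num_u4 > 0
```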

original_shape = original_weight.shape
original_eps = torch.finfo(original_dtype).eps

# Quantize-dequantize using universal quantization formula

Contributor:

I'd add a reference to the markdown that defines this "universal" formula; otherwise it may not be clear what is meant here.

Contributor:

It would be helpful to include a note explaining why the weights are not directly quantized. Please mention that this approach is necessary to prevent floating-point errors that can occur due to the different order of operations during quantization when using Torch for tuning and OpenVINO (OV) for inference.
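
A minimal illustration of that quantize-dequantize idea (the parameter names and the exact formula here are illustrative, not the PR's implementation): the weight is snapped to the integer grid and immediately mapped back to float, so the value stored in the exported model matches what OpenVINO reconstructs at inference time.

```python
import torch

def quant_dequant(weight: torch.Tensor, scale: torch.Tensor,
                  level_low: int, level_high: int) -> torch.Tensor:
    # Quantize to the integer grid, then immediately dequantize back to float.
    # Keeping the result as a float weight (rather than quantizing directly and
    # storing integers) avoids floating-point mismatches caused by the different
    # order of operations between Torch (tuning) and OpenVINO (inference).
    q = torch.clamp(torch.round(weight / scale), level_low, level_high)
    return q * scale
```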

@github-actions github-actions bot removed NNCF Common Pull request that updates NNCF Common experimental NNCF OpenVINO Pull requests that updates NNCF OpenVINO NNCF PTQ Pull requests that updates NNCF PTQ labels Mar 10, 2025
@nikita-malininn nikita-malininn marked this pull request as ready for review March 11, 2025 16:49
@nikita-malininn nikita-malininn requested a review from a team as a code owner March 11, 2025 16:49