Refactor to simplify scanresults model #214

abraemer · 2025-02-03T14:36:51Z

Summary of changes

* Slightly improve OpossumProvider to allow construction of Opossum where review_results=None.
* Fix a bug in ScanResults where the logic around the attribution keys assumed attribution_to_id to be a defaultdict and would break otherwise.
* Change fields of ScanResults that were either list | None or dict | None to disallow None and use an empty container instead. This removes some null checks throughout the codebase and requires some adaptation of the tests, mainly to the round-trip of OpossumFileModel, because the fields are now normalized.
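
To illustrate the second point, here is a minimal sketch of a key lookup that does not rely on defaultdict behaviour. The class name `ScanResultsSketch`, the string-typed attribution, and the UUID-based key are assumptions made for illustration; they are not taken from the actual ScanResults implementation.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ScanResultsSketch:
    # Hypothetical stand-in for ScanResults; only the field name
    # attribution_to_id comes from the PR description.
    attribution_to_id: dict[str, str] = field(default_factory=dict)

    def _get_attribution_key(self, attribution: str) -> str:
        # Explicit membership check: works for a plain dict and does not
        # depend on defaultdict inserting a value on first access.
        if attribution not in self.attribution_to_id:
            self.attribution_to_id[attribution] = str(uuid.uuid4())
        return self.attribution_to_id[attribution]
```

Calling `_get_attribution_key` twice with the same attribution returns the same key and leaves exactly one entry in the mapping.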

Context and reason for change

This PR aims to simplify the internal OpossumModel by making the types on the fields stricter. This makes the subsequent implementation of merging of opossums simpler.

* disallow `None` for fields that are containers. Use empty container instead.
* Change converter from OpossumFileModel accordingly
* Equality of OpossumFileModel in tests needs to respect that None might change to [] or {}
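
A rough sketch of what this tightening could look like, assuming a file-facing model with optional containers and an internal entity with mandatory ones. The classes `FileModelSketch` and `ScanResultsSketch`, the function `to_scan_results`, and both field names are hypothetical and only stand in for the real OpossumFileModel and ScanResults.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileModelSketch:
    # Mirrors the on-disk shape: container fields may be absent (None).
    sources: Optional[dict[str, str]] = None
    breakpoints: Optional[list[str]] = None


@dataclass
class ScanResultsSketch:
    # Internal entity: containers are never None, only possibly empty.
    sources: dict[str, str] = field(default_factory=dict)
    breakpoints: list[str] = field(default_factory=list)


def to_scan_results(model: FileModelSketch) -> ScanResultsSketch:
    # Normalize missing containers to empty ones at the boundary, so the
    # rest of the code needs no None checks.
    return ScanResultsSketch(
        sources=dict(model.sources or {}),
        breakpoints=list(model.breakpoints or []),
    )
```

With this shape, code that consumes `ScanResultsSketch` can iterate over `breakpoints` or look up `sources` directly, whether or not the input file contained those fields.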

Hellgartner

Nice PR showcasing how good it is to decouple the working entities from the models directly associated with the file.

The only thing I would like to discuss is whether we should keep the change transparent for the end file, i.e. restore the Nones when converting back to the OpossumModel?
That would also mean we could avoid changing the tests.

I double-checked OpossumUI: when loading the data into OpossumUI, it makes no difference whether the entry is missing from the JSON entirely or is simply an empty dict/list.

However, IMHO I would prefer keeping the .opossum files unmutated while processing.

src/opossum_lib/core/entities/scan_results.py

tests/core/entities/generators/opossum_provider.py

tests/core/entities/test_opossum.py

tests/data/expected_scancode.json

tests/input_formats/opossum/services/test_conversion_roundtrip.py

tests/shared/comparison_helpers.py

abraemer · 2025-02-05T12:49:03Z

> The only thing I would like to discuss is whether we should keep the change transparent for the end file, i.e. restore the Nones when converting back to the OpossumModel?

> However, IMHO I would prefer keeping the .opossum files unmutated while processing.

I think the main issue here is that we cannot distinguish whether the input was missing the field or had an empty container. Thus "keeping ... unmutated" is impossible in principle. We need to make an a priori arbitrary choice whether we want to write [] or skip the field.
Personally I like that the structure we write always has all the keys. That makes it easy for someone to look at and see everything that could be there without experimentation.
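
The argument can be sketched as follows. The helper names and the key `"breakpoints"` are made up for this example; the real converter may look different.

```python
def read_field(raw: dict, key: str) -> list:
    # A missing key, an explicit null and [] all collapse to [] on the way in.
    return list(raw.get(key) or [])


def write_always(key: str, value: list) -> dict:
    # Option A: always emit the key, even for an empty container.
    return {key: value}


def write_skip_empty(key: str, value: list) -> dict:
    # Option B: leave empty containers out of the output.
    return {key: value} if value else {}


missing_in = {}
empty_in = {"breakpoints": []}

# After reading, both inputs look identical, so neither writer can
# reproduce both original files.
assert read_field(missing_in, "breakpoints") == read_field(empty_in, "breakpoints")
```

Option A changes files that omitted the field, and option B changes files that contained an empty container, so one of the two input shapes is always rewritten.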

* rename ScanResults.get_attribution_key to clarify that it is private and can modify the internal state
* move _assert_equal_or_both_falsy into the only file that uses it
* rename tests for internal opossum model and make generation of attribution_to_id explicit

Hellgartner · 2025-02-05T13:49:03Z

> Thus "keeping ... unmutated" is impossible in principle.

Agreed. However, all files we have seen so far adhered to the principle of simply leaving empty fields out, so this is a break with existing practice.

Hellgartner

Discussed None vs. empty list and agreed to keep the empty list.

abraemer added 2 commits January 31, 2025 16:27

test: allow for empty review_result in OpossumProvider

f161f2c

fix: remove assumption of defaultdict in ScanResults

faa9e51

abraemer force-pushed the refactor-simplify-scanresults branch 2 times, most recently from a998764 to 6248152 Compare February 3, 2025 15:21

refactor: tighten types of ScanResults

0acbf7d

* disallow `None` for fields that are containers. Use empty container instead.
* Change converter from OpossumFileModel accordingly
* Equality of OpossumFileModel in tests needs to respect that None might change to [] or {}

abraemer force-pushed the refactor-simplify-scanresults branch from 6248152 to 0acbf7d Compare February 3, 2025 15:24

vasily-pozdnyakov assigned Hellgartner Feb 4, 2025

Hellgartner requested changes Feb 4, 2025


test: OpossumProvider sometimes generates review results per default

551e394

abraemer requested a review from Hellgartner February 5, 2025 14:31

Hellgartner approved these changes Feb 5, 2025


Hellgartner merged commit dca7a40 into main Feb 5, 2025
10 checks passed

Hellgartner deleted the refactor-simplify-scanresults branch February 5, 2025 16:03
