Use OO to clearly define interfaces #193

abraemer · 2025-01-20T17:54:15Z

We could use classes to structure the interfaces between the file format frontends, the internal opossum model and the .opossum file model.

Depends on #190 #199

Sketch of architecture

File Frontend

This stage should contain only the model/logic for reading/parsing the respective input file format and the logic for how to map it to an Opossum. Each frontend should be its own submodule.

The interface of this stage could be as simple as (using ScanCode as an example throughout):

Each file format has a similarly named class, e.g. `ScanCodeFile`.
This class can be constructed via a classmethod `from_path(path_to_file)`.
It has one (public) method `to_opossum()` that returns an instance of `Opossum` (the internal format).
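The point of this issue is to make the interfaces explicit, so the contract above could also be written down in code as an abstract base class. Here is a minimal sketch. It uses stdlib dataclasses instead of the pydantic models, and `Opossum` is a placeholder with a single hypothetical `resources` field. `FileFrontend` is a hypothetical name, and this `ScanCodeFile` only keeps the `path` of each entry in ScanCode's `files` list.

```python
from __future__ import annotations

import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Opossum:
    # Placeholder for the internal model; the real class holds much more data.
    resources: list[str] = field(default_factory=list)


class FileFrontend(ABC):
    """Hypothetical contract that every file format frontend implements."""

    @classmethod
    @abstractmethod
    def from_path(cls, path_to_file: str | Path) -> FileFrontend:
        """Read and validate the input file, returning a frontend instance."""

    @abstractmethod
    def to_opossum(self) -> Opossum:
        """Map the parsed file contents to the internal Opossum model."""


@dataclass
class ScanCodeFile(FileFrontend):
    # Hypothetical, heavily simplified ScanCode model: only the scanned paths.
    files: list[str]

    @classmethod
    def from_path(cls, path_to_file: str | Path) -> ScanCodeFile:
        contents = json.loads(Path(path_to_file).read_text(encoding="utf-8"))
        return cls(files=[entry["path"] for entry in contents.get("files", [])])

    def to_opossum(self) -> Opossum:
        return Opossum(resources=sorted(self.files))
```

With the abstract base class, a new frontend that forgets to implement `to_opossum()` fails at instantiation time with a `TypeError` instead of later in the pipeline. A `typing.Protocol` would be a structural alternative that does not require inheritance.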

Notes:

`from_path(path_to_file)` should be a very simple method: basically just opening the file, reading its contents, and using e.g. `ScanCodeFile.validate(file_contents)` to create the instance.
AFAIK the constructor should not be misappropriated for this task, so a classmethod is preferred.
This also makes testing the file format rather easy: one can create instances of the class by other means, run `to_opossum()`, and compare the result to a reference `Opossum` object.
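The notes above might translate into something like the following sketch. A hand-written `validate` classmethod stands in for pydantic validation (with pydantic v2 models, `model_validate` would play that role), and `Opossum` is again a placeholder with a hypothetical `resources` field. The test constructs the `ScanCodeFile` directly, so no file is involved.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class Opossum:
    # Placeholder internal model; the real one is far richer.
    resources: list[str] = field(default_factory=list)


@dataclass
class ScanCodeFile:
    files: list[str]

    @classmethod
    def validate(cls, file_contents: dict[str, Any]) -> ScanCodeFile:
        # Stand-in for pydantic validation: check the structure, then build the instance.
        entries = file_contents.get("files")
        if not isinstance(entries, list):
            raise ValueError("ScanCode output must contain a 'files' list")
        return cls(files=[entry["path"] for entry in entries])

    @classmethod
    def from_path(cls, path_to_file: str | Path) -> ScanCodeFile:
        # Deliberately thin: open, read, parse, and delegate to validate().
        with open(path_to_file, encoding="utf-8") as file:
            return cls.validate(json.load(file))

    def to_opossum(self) -> Opossum:
        return Opossum(resources=sorted(self.files))


def test_scancode_to_opossum() -> None:
    # Build the frontend object in memory instead of reading a file.
    scancode_file = ScanCodeFile(files=["src/main.py", "README.md"])
    expected = Opossum(resources=["README.md", "src/main.py"])
    assert scancode_file.to_opossum() == expected
```

If the function lives in a `test_*.py` module, pytest collects it automatically.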

Internal `Opossum`

The internal `Opossum` class (and related model classes) is the level where all the Opossum-related logic happens. As such, it:

should be capable of holding all possible data that a .opossum file might contain.
should provide an easy-to-use interface that the file frontends can use to construct instances (maybe there could be more convenience functions for construction?)
should offer an (additional?) equality check between `Opossum` instances that makes sensible choices about which fields it ignores, i.e. two Opossums are equivalent if they differ only in the IDs of their externalAttributions. Having this check makes tests a lot more convenient. Alternatively, we can always check for exact equality and control these IDs some other way.
will contain the logic needed to merge two instances (or maybe this lives outside the class? `opossum1.merge_with(opossum2)` vs. `merge_opossums(opossum1, opossum2)`)
has a method `to_opossum_file_format()` that constructs the corresponding `OpossumFile` instance used for writing the data to file
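One way to get the ID-insensitive equality check is to compare canonical forms in which every attribution ID is replaced by a serialization of the attribution's content. The sketch below uses a heavily simplified `Opossum` with two hypothetical fields, loosely modeled on the `externalAttributions` and `resourcesToAttributions` parts of a .opossum input file. `is_equivalent_to` is a hypothetical method name.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Opossum:
    # Simplified fields: attribution ID -> attribution data,
    # and resource path -> list of attribution IDs.
    external_attributions: dict[str, dict[str, Any]] = field(default_factory=dict)
    resources_to_attributions: dict[str, list[str]] = field(default_factory=dict)

    def _canonical_form(self) -> tuple[list[str], dict[str, list[str]]]:
        # Replace each attribution ID by a stable serialization of its content.
        def content_key(attribution_id: str) -> str:
            return json.dumps(self.external_attributions[attribution_id], sort_keys=True)

        all_attributions = sorted(content_key(i) for i in self.external_attributions)
        per_resource = {
            path: sorted(content_key(i) for i in ids)
            for path, ids in self.resources_to_attributions.items()
        }
        return all_attributions, per_resource

    def is_equivalent_to(self, other: Opossum) -> bool:
        """Like ==, but ignores the concrete attribution IDs."""
        return self._canonical_form() == other._canonical_form()
```

The dataclass-generated `==` stays strict, so each test can choose explicitly between exact and ID-insensitive comparison.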

Notes:

`Opossum` also contains a field for the output.json part of an .opossum file. However, this field will be None for all frontends except the .opossum one.
Constructing an `Opossum` is a difficult task that involves many different objects from opossum_model.py. Can we simplify this process somehow?
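One possible answer to the last question is a small builder that hides the ID bookkeeping from the frontends. This is only a sketch. `OpossumBuilder` and `add_attribution` are hypothetical names, and the `Opossum` fields are simplified stand-ins. Reusing the ID of an identical attribution is a design choice added here for illustration; the issue does not specify it.

```python
from __future__ import annotations

import json
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Opossum:
    # Simplified stand-in for the internal model (hypothetical field names).
    external_attributions: dict[str, dict[str, Any]] = field(default_factory=dict)
    resources_to_attributions: dict[str, list[str]] = field(default_factory=dict)


class OpossumBuilder:
    """Hypothetical helper that hides ID bookkeeping from the frontends."""

    def __init__(self) -> None:
        self._ids_by_content: dict[str, str] = {}
        self._opossum = Opossum()

    def add_attribution(self, resource_path: str, **attribution: Any) -> str:
        # Reuse the ID of an identical attribution instead of duplicating it.
        key = json.dumps(attribution, sort_keys=True)
        attribution_id = self._ids_by_content.get(key)
        if attribution_id is None:
            attribution_id = str(uuid.uuid4())
            self._ids_by_content[key] = attribution_id
            self._opossum.external_attributions[attribution_id] = attribution
        ids = self._opossum.resources_to_attributions.setdefault(resource_path, [])
        if attribution_id not in ids:
            ids.append(attribution_id)
        return attribution_id

    def build(self) -> Opossum:
        return self._opossum
```

A frontend would then call `add_attribution(...)` once per finding and `build()` at the end, instead of assembling the model objects by hand.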

`OpossumFile` backend

This is again a quite simple stage, consisting basically only of the pydantic models required to model the file.
The `OpossumFile` class has one method, `save_to(output_path)`, that writes the final .opossum file to the specified path.
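Here is a sketch of `save_to`. It assumes that a .opossum file is a ZIP archive containing `input.json` and, optionally, `output.json`. Plain dicts stand in for the pydantic models. With pydantic v2 models, `model_dump_json()` would replace the `json.dumps` calls.

```python
from __future__ import annotations

import json
import zipfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class OpossumFile:
    # Assumed layout: input.json is always written, output.json only if present.
    input_file: dict[str, Any]
    output_file: dict[str, Any] | None = None

    def save_to(self, output_path: str | Path) -> None:
        with zipfile.ZipFile(output_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
            archive.writestr("input.json", json.dumps(self.input_file, indent=2))
            if self.output_file is not None:
                archive.writestr("output.json", json.dumps(self.output_file, indent=2))
```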

Thoughts on `.opossum` input

There are 2 options:

1. Have `OpossumFile` double as the frontend for .opossum files, giving it also a classmethod `from_path` and an instance method `to_opossum()`.
2. Have a separate `OpossumFileReader` or similar just for the input part.

Advantages of 1. are:

It makes the round trips OpossumFile -> Opossum -> OpossumFile and Opossum -> OpossumFile -> Opossum very easy to test (they come basically for free).

Disadvantages of 1. are:

The class might become somewhat large, although the writing part is only a very small function.
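This sketch illustrates why option 1 makes the round trips cheap. `OpossumFile` carries the frontend methods (`from_path`, `to_opossum`) and is also produced by `Opossum.to_opossum_file_format()`. The models are simplified dataclasses with hypothetical fields, and the ZIP layout (`input.json` plus optional `output.json`) is an assumption.

```python
from __future__ import annotations

import json
import zipfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class Opossum:
    # Simplified internal model; `output` is None unless read from a .opossum file.
    resources: dict[str, Any] = field(default_factory=dict)
    output: dict[str, Any] | None = None

    def to_opossum_file_format(self) -> OpossumFile:
        return OpossumFile(input_file={"resources": self.resources}, output_file=self.output)


@dataclass
class OpossumFile:
    # Option 1: the backend model doubles as the frontend for .opossum input.
    input_file: dict[str, Any]
    output_file: dict[str, Any] | None = None

    @classmethod
    def from_path(cls, path_to_file: str | Path) -> OpossumFile:
        # Assumes a ZIP archive with input.json and an optional output.json.
        with zipfile.ZipFile(path_to_file) as archive:
            input_file = json.loads(archive.read("input.json"))
            output_file = None
            if "output.json" in archive.namelist():
                output_file = json.loads(archive.read("output.json"))
        return cls(input_file=input_file, output_file=output_file)

    def to_opossum(self) -> Opossum:
        return Opossum(resources=self.input_file["resources"], output=self.output_file)


def test_round_trips() -> None:
    # Both directions run purely in memory; no temporary file is needed.
    opossum = Opossum(resources={"src": {"main.py": 1}})
    assert opossum.to_opossum_file_format().to_opossum() == opossum
    opossum_file = OpossumFile(input_file={"resources": {}}, output_file={"metadata": {}})
    assert opossum_file.to_opossum().to_opossum_file_format() == opossum_file
```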

Advantages of 2. are:

clearer separation between code pieces that fulfill different tasks
One could even go as far as moving the "frontend" part into its own folder like all the other frontends, perhaps copying the model definition. The repository would then be split very strictly into frontend/backend submodules (however, I think this is overengineered).

Disadvantages of 2. are:

The round trip is a bit harder to test for consistency: going from Opossum via OpossumFile to OpossumFileReader would, at worst, involve writing to a temporary file.
I think the naming is a bit more awkward.
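For comparison, here is a sketch of the option 2 round trip Opossum -> OpossumFile -> file -> OpossumFileReader -> Opossum. The reader only accepts a path, so the data goes through a temporary directory. All classes are simplified stand-ins with hypothetical fields, and the ZIP layout is assumed.

```python
from __future__ import annotations

import json
import tempfile
import zipfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class Opossum:
    resources: dict[str, Any] = field(default_factory=dict)

    def to_opossum_file_format(self) -> OpossumFile:
        return OpossumFile(input_file={"resources": self.resources})


@dataclass
class OpossumFile:
    # Option 2: write-only backend model.
    input_file: dict[str, Any]

    def save_to(self, output_path: str | Path) -> None:
        with zipfile.ZipFile(output_path, "w") as archive:
            archive.writestr("input.json", json.dumps(self.input_file))


@dataclass
class OpossumFileReader:
    # Option 2: read-only frontend for .opossum files.
    input_file: dict[str, Any]

    @classmethod
    def from_path(cls, path_to_file: str | Path) -> OpossumFileReader:
        with zipfile.ZipFile(path_to_file) as archive:
            return cls(input_file=json.loads(archive.read("input.json")))

    def to_opossum(self) -> Opossum:
        return Opossum(resources=self.input_file["resources"])


def round_trip_via_file(opossum: Opossum) -> Opossum:
    # The reader only accepts paths, so the data must be written to disk first.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "round_trip.opossum"
        opossum.to_opossum_file_format().save_to(path)
        return OpossumFileReader.from_path(path).to_opossum()
```

`tempfile.TemporaryDirectory` cleans up after itself. The main extra cost is that the test now touches the file system.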

Proposed pipeline in code

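```python
# ScanCodeFile, Opossum and OpossumFile are the classes proposed above.
input_file = "some/path/to/a/file.json"
# Frontend: parse the input file and map it to the internal model.
opossum = ScanCodeFile.from_path(input_file).to_opossum()
# Backend: convert to the file model and write the .opossum file.
opossum_file = opossum.to_opossum_file_format()
output_path = "another/path/to/the/output.opossum"
opossum_file.save_to(output_path)
```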


This was referenced Jan 21, 2025:

Introduce base model #195 (Closed)
Final: Switch flow #199 (Closed)
Refactor uses opossum model for opossum files #202 (Merged)

This was referenced Jan 27, 2025:

Introduce new generator for testing ScanCode and migrate tests #204 (Merged)
Finalize switch flow #207 (Merged)

abraemer closed this as completed Jan 31, 2025
