Use OO to clearly define interfaces #193

Closed
abraemer opened this issue Jan 20, 2025 · 0 comments

abraemer commented Jan 20, 2025

We could use classes to structure the interfaces between the file format frontends, the internal opossum model and the .opossum file model.

Depends on #190 #199

Sketch of architecture

File Frontend

This stage should contain only the model/logic for reading/parsing the respective input file format and the logic for how to map it to an Opossum. Each frontend should be its own submodule.

The interface of this stage could be as simple as (using ScanCode as an example throughout):

  1. Each file format has a similarly named class, e.g. "ScanCodeFile"
  2. This class can be constructed via a classmethod from_path(path_to_file)
  3. It has one (public) method to_opossum() that returns an instance of Opossum (the internal format)

Notes:

  • from_path(path_to_file) should be a very simple method: basically just opening the file, reading its contents and using e.g. ScanCodeFile.validate(file_contents) to create the instance.
  • AFAIK the constructor should not be misappropriated for this task, so using a classmethod is preferred
  • This means that testing the file format is also rather easy: one can create instances of the class by other means, then run to_opossum() and compare the result to a reference Opossum object
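
A minimal sketch of this frontend interface, to make the three points above concrete. It is not the project's implementation: the Opossum stand-in, the field names and the hand-written validate() are assumptions for illustration (the real models would presumably be pydantic models, and the real Opossum holds far more than a list of paths).

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class Opossum:
    """Hypothetical stand-in for the internal model: resource paths only."""

    resources: list[str] = field(default_factory=list)


@dataclass
class ScanCodeFile:
    """Frontend model for one input format (the fields are assumptions)."""

    files: list[dict[str, Any]]

    @classmethod
    def validate(cls, file_contents: dict[str, Any]) -> ScanCodeFile:
        # Stand-in for schema validation, e.g. by a pydantic model.
        files = file_contents.get("files")
        if not isinstance(files, list):
            raise ValueError("expected a top-level 'files' list")
        return cls(files=files)

    @classmethod
    def from_path(cls, path_to_file: str | Path) -> ScanCodeFile:
        # Deliberately thin: open, parse, delegate to validate().
        with open(path_to_file, encoding="utf-8") as handle:
            return cls.validate(json.load(handle))

    def to_opossum(self) -> Opossum:
        # The only public conversion: map the parsed file to the internal model.
        return Opossum(resources=[entry["path"] for entry in self.files])


# Testing without touching the file system: build the instance directly.
scan = ScanCodeFile(files=[{"path": "src/main.py"}])
assert scan.to_opossum() == Opossum(resources=["src/main.py"])
```

Because from_path() contains no mapping logic, the conversion can be tested entirely on in-memory instances, as the last two lines show.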

Internal Opossum

The internal Opossum class (and related model classes) is the level where all the Opossum-related logic happens. As such it:

  • should be capable of holding all possible data that a .opossum file might contain.
  • should provide an easy-to-use interface for the file frontends to use to construct the instances (Maybe there could be more convenience functions for construction?)
  • should offer an (additional?) equality check between Opossum instances that makes sensible choices about which fields to ignore, e.g. two Opossums are equivalent if they differ only in the IDs of their externalAttributions. Having this check makes tests a lot more convenient. Alternatively, we can always check for exact equality and control these IDs some other way.
  • will contain the logic needed to merge two instances (or maybe this lives outside the class? opossum1.merge_with(opossum2) vs. merge_opossums(opossum1, opossum2))
  • has a function to_opossum_file_format() which constructs the corresponding OpossumFile instance used for writing the data to file
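
The ID-insensitive equality check mentioned above could look roughly like the following sketch. The model is heavily simplified and the names (Attribution, external_attributions, resources_to_attributions, is_equivalent_to) are hypothetical; the idea is only to show how replacing each ID by the data it points to makes the IDs drop out of the comparison.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribution:
    """Hypothetical attribution payload; its ID is stored separately."""

    package_name: str
    license_name: str


@dataclass
class Opossum:
    # Maps an attribution ID to its data (simplified assumption).
    external_attributions: dict[str, Attribution] = field(default_factory=dict)
    # Maps a resource path to the attribution IDs attached to it.
    resources_to_attributions: dict[str, list[str]] = field(default_factory=dict)

    def _without_ids(self) -> dict[str, list[Attribution]]:
        # Replace every ID by the attribution it refers to, sorted so that
        # the order of IDs does not matter either. Attributions that are not
        # linked to any resource are ignored by this simplified check.
        return {
            path: sorted(
                (self.external_attributions[attribution_id] for attribution_id in ids),
                key=lambda a: (a.package_name, a.license_name),
            )
            for path, ids in self.resources_to_attributions.items()
        }

    def is_equivalent_to(self, other: Opossum) -> bool:
        return self._without_ids() == other._without_ids()


mit = Attribution(package_name="left-pad", license_name="MIT")
first = Opossum({"id-1": mit}, {"/src/index.js": ["id-1"]})
second = Opossum({"other-id": mit}, {"/src/index.js": ["other-id"]})

assert first != second  # exact equality is sensitive to the IDs
assert first.is_equivalent_to(second)  # the relaxed check is not
```

Keeping this as a separate method, rather than overriding __eq__, leaves exact equality available for tests that do control the IDs.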

Notes:

  • Opossum also contains a field for the output.json part of an .opossum file. However, this will be None for all frontends except the .opossum one.
  • Constructing this Opossum is a difficult task that involves many different objects from opossum_model.py. Can we simplify this process somehow?

OpossumFile backend

This is again quite a simple stage, consisting basically only of the pydantic models required for modeling the file.
The OpossumFile class has one method, save_to(output_path), that writes the final .opossum file to the specified path.
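
As a rough sketch of how small save_to() can be: the version below assumes that an .opossum file is a zip archive holding input.json and, optionally, output.json, and it uses plain dictionaries in place of the pydantic models. The field names input_file and output_file are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any
from zipfile import ZIP_DEFLATED, ZipFile


@dataclass
class OpossumFile:
    # Plain dicts stand in for the pydantic models of the real file format.
    input_file: dict[str, Any]
    output_file: dict[str, Any] | None = None

    def save_to(self, output_path: str | Path) -> None:
        # Assumption: .opossum is a zip archive with one JSON member per part.
        with ZipFile(output_path, "w", compression=ZIP_DEFLATED) as archive:
            archive.writestr("input.json", json.dumps(self.input_file, indent=4))
            if self.output_file is not None:
                archive.writestr("output.json", json.dumps(self.output_file, indent=4))
```

With pydantic models in place of the dictionaries, the json.dumps calls would be replaced by the models' own JSON serialization.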

Thoughts on .opossum input

There are 2 options:

  1. having OpossumFile double as the frontend for .opossum files, thus also giving it a classmethod from_path and an instance method to_opossum().
  2. having a separate OpossumFileReader or similar just for the input part.

Advantages of 1. are:

  • It makes the round-trip OpossumFile -> Opossum -> OpossumFile or Opossum -> OpossumFile -> Opossum very easy to test (as it's basically free)

Disadvantages of 1. are:

  • The class might become a bit large? However, writing is a very small function.

Advantages of 2. are:

  • clearer separation between code pieces that fulfill different tasks
  • One could even go as far as to separate out the "frontend" part into its own folder like all the other frontends and perhaps copy the model definition. Then the repository would be separated very strictly into frontend/backend submodules (however, I think this is overengineered).

Disadvantages of 2. are:

  • Round-trip testing for consistency is a bit harder. In the worst case, the Opossum -> OpossumFileReader step would involve writing to a temporary file.
  • I think naming is a bit more awkward
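
To illustrate why the round-trip is "basically free" under option 1, here is a sketch with stripped-down stand-ins for both classes. It assumes that OpossumFile would expose a to_opossum() method like the other frontends; the fields and the flat list of resources are simplifications, not the real models.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Opossum:
    """Simplified internal model (hypothetical fields)."""

    resources: list[str] = field(default_factory=list)
    output_file: dict[str, Any] | None = None

    def to_opossum_file_format(self) -> OpossumFile:
        return OpossumFile(
            input_file={"resources": list(self.resources)},
            output_file=self.output_file,
        )


@dataclass
class OpossumFile:
    """Simplified file model that doubles as the .opossum frontend."""

    input_file: dict[str, Any]
    output_file: dict[str, Any] | None = None

    def to_opossum(self) -> Opossum:
        return Opossum(
            resources=list(self.input_file["resources"]),
            output_file=self.output_file,
        )


def test_round_trips() -> None:
    # Opossum -> OpossumFile -> Opossum, entirely in memory.
    opossum = Opossum(resources=["/a", "/a/b.py"], output_file={"metadata": {}})
    assert opossum.to_opossum_file_format().to_opossum() == opossum

    # OpossumFile -> Opossum -> OpossumFile, again without touching disk.
    opossum_file = OpossumFile(input_file={"resources": ["/c"]})
    assert opossum_file.to_opossum().to_opossum_file_format() == opossum_file


test_round_trips()
```

With option 2, the first of these two checks would need either a temporary file or an extra in-memory entry point on the reader.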

Proposed pipeline in code

input_file = "some/path/to/a/file.json"
opossum = ScanCodeFile.from_path(input_file).to_opossum()
opossum_file = opossum.to_opossum_file_format()
output_path = "another/path/to/the/output.opossum"
opossum_file.save_to(output_path)