Sketch of architecture
File Frontend
This stage should contain only the model/logic for reading and parsing the respective input file format, plus the logic for mapping it to an Opossum. Each frontend should be its own submodule.
The interface of this stage could be as simple as (using ScanCode as an example throughout):
Each file format has a correspondingly named class, e.g. ScanCodeFile.
This class can be constructed via a classmethod from_path(path_to_file).
It has one (public) method to_opossum() that returns an instance of Opossum (the internal format).
Notes:
from_path(path_to_file) should be a very simple method: it basically just opens the file, reads its contents and uses e.g. ScanCodeFile.validate(file_contents) to create the instance.
AFAIK the constructor should not be repurposed for this task, so a classmethod is preferred.
This also makes testing the file format rather easy: one can create instances of the class by other means, run to_opossum() and compare the result to a reference Opossum object.
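A minimal sketch of this frontend interface, assuming a heavily simplified ScanCode layout (a top-level "files" list whose entries have a "path" and a "type") and a stripped-down Opossum that only holds resource paths. ScanCodeEntry and all field names are hypothetical, and plain dataclasses stand in for the pydantic models so the example has no dependencies; validate() here only mimics what the pydantic validation would do.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Opossum:
    # Hypothetical stand-in for the internal model: only resource paths.
    resources: set[str] = field(default_factory=set)


@dataclass
class ScanCodeEntry:
    # Hypothetical subset of a ScanCode "files" entry.
    path: str
    type: str  # "file" or "directory"


@dataclass
class ScanCodeFile:
    files: list[ScanCodeEntry]

    @classmethod
    def validate(cls, file_contents: dict) -> ScanCodeFile:
        # Stand-in for pydantic validation of the parsed JSON.
        return cls(files=[ScanCodeEntry(e["path"], e["type"]) for e in file_contents["files"]])

    @classmethod
    def from_path(cls, path_to_file: str | Path) -> ScanCodeFile:
        # Deliberately trivial: open, read, validate; no mapping logic here.
        with open(path_to_file, encoding="utf-8") as f:
            return cls.validate(json.load(f))

    def to_opossum(self) -> Opossum:
        # All ScanCode-specific mapping logic lives in this one method.
        return Opossum(resources={e.path for e in self.files if e.type == "file"})


# Testing without touching the file system: build the instance directly.
scancode = ScanCodeFile(files=[ScanCodeEntry("src/app.py", "file"), ScanCodeEntry("src", "directory")])
assert scancode.to_opossum() == Opossum(resources={"src/app.py"})
```

The last two lines follow the testing approach described above: the instance is created directly, so no input file is needed to check the mapping.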
Internal Opossum
The internal Opossum class (and related model classes) is the level where all the Opossum-related logic happens. As such it:
should be capable of holding all possible data that a .opossum file might contain.
should provide an easy-to-use interface that the file frontends can use to construct instances (maybe with additional convenience functions for construction?).
should offer an (additional?) equality check between Opossum instances that makes sensible choices about which fields it ignores, i.e. two Opossums are equivalent if they differ only in the IDs of their externalAttributions. Having this check makes tests a lot more convenient. Alternatively, we can always check for exact equality and control these IDs some other way.
will contain the logic needed to merge two instances (or maybe this lives outside the class? opossum1.merge_with(opossum2) vs. merge_opossums(opossum1, opossum2)).
has a method to_opossum_file_format() that constructs the corresponding OpossumFile instance used for writing the data to file.
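A sketch of the ID-insensitive equality check and of the free-function merge variant, under these assumptions: the hypothetical OpossumPackageInfo stands in for an attribution, external_attributions maps IDs to attributions, and resources_to_attributions maps resource paths to attribution IDs. is_equivalent_to is one possible reading of "differ only in their IDs": it resolves every ID to its attribution and compares contents. The merge re-keys clashing IDs of the second instance with a numeric suffix, which is an arbitrary choice for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OpossumPackageInfo:
    # Hypothetical minimal attribution; the real model has many more fields.
    package_name: str
    license_name: str | None = None


@dataclass
class Opossum:
    # Attribution ID -> attribution (the externalAttributions).
    external_attributions: dict[str, OpossumPackageInfo] = field(default_factory=dict)
    # Resource path -> IDs of the attributions attached to it.
    resources_to_attributions: dict[str, list[str]] = field(default_factory=dict)

    def _id_free_view(self) -> tuple[Counter, dict[str, Counter]]:
        # Resolve every ID to the attribution it points to, so the IDs drop out.
        all_attributions = Counter(self.external_attributions.values())
        per_resource = {
            path: Counter(self.external_attributions[i] for i in ids)
            for path, ids in self.resources_to_attributions.items()
        }
        return all_attributions, per_resource

    def is_equivalent_to(self, other: Opossum) -> bool:
        # Compares attribution contents instead of attribution IDs.
        return self._id_free_view() == other._id_free_view()


def merge_opossums(first: Opossum, second: Opossum) -> Opossum:
    # Free-function variant; a method would read first.merge_with(second).
    merged = Opossum(
        dict(first.external_attributions),
        {path: list(ids) for path, ids in first.resources_to_attributions.items()},
    )
    renamed: dict[str, str] = {}
    for old_id, attribution in second.external_attributions.items():
        new_id, suffix = old_id, 1
        while new_id in merged.external_attributions:
            # Re-key IDs of the second instance that clash with existing ones.
            new_id, suffix = f"{old_id}-{suffix}", suffix + 1
        renamed[old_id] = new_id
        merged.external_attributions[new_id] = attribution
    for path, ids in second.resources_to_attributions.items():
        merged.resources_to_attributions.setdefault(path, []).extend(renamed[i] for i in ids)
    return merged


a = Opossum({"id-1": OpossumPackageInfo("react", "MIT")}, {"/src/app.js": ["id-1"]})
b = Opossum({"uuid-7": OpossumPackageInfo("react", "MIT")}, {"/src/app.js": ["uuid-7"]})
assert a != b  # the generated __eq__ still sees the differing IDs
assert a.is_equivalent_to(b)
```

Keeping the dataclass's exact == and adding a separate method corresponds to the "(additional?)" variant, so each test can pick the strictness it needs.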
Notes:
Opossum also contains a field for the output.json part of a .opossum file. However, this field will be None for all frontends except the .opossum one.
Constructing an Opossum is a difficult task that involves many different objects from opossum_model.py. Can we simplify this process somehow?
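One possible way to simplify construction is a small builder that lets frontends attach attributions to resource paths while it takes care of ID generation. OpossumBuilder, its methods, the "attribution-N" ID scheme and the deduplication of identical attributions are hypothetical design choices, shown against a minimal dataclass model standing in for opossum_model.py.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OpossumPackageInfo:
    # Hypothetical minimal attribution; the real model has many more fields.
    package_name: str
    license_name: str | None = None


@dataclass
class Opossum:
    external_attributions: dict[str, OpossumPackageInfo] = field(default_factory=dict)
    resources_to_attributions: dict[str, list[str]] = field(default_factory=dict)


class OpossumBuilder:
    """Hypothetical helper: frontends add resources with attributions, the builder manages IDs."""

    def __init__(self) -> None:
        self._ids = (f"attribution-{n}" for n in itertools.count(1))
        self._id_of: dict[OpossumPackageInfo, str] = {}
        self._opossum = Opossum()

    def add_resource(self, path: str, *attributions: OpossumPackageInfo) -> OpossumBuilder:
        ids = self._opossum.resources_to_attributions.setdefault(path, [])
        for attribution in attributions:
            # Design choice of this sketch: identical attributions share one ID.
            if attribution not in self._id_of:
                self._id_of[attribution] = next(self._ids)
                self._opossum.external_attributions[self._id_of[attribution]] = attribution
            ids.append(self._id_of[attribution])
        return self  # allows chaining

    def build(self) -> Opossum:
        return self._opossum


react = OpossumPackageInfo("react", "MIT")
opossum = OpossumBuilder().add_resource("/src/app.js", react).add_resource("/src/index.js", react).build()
# Both resources now point to the single entry "attribution-1".
```

A frontend's to_opossum() would then only describe which attributions belong to which paths; whether identical attributions should really be merged into one entry is open for discussion.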
OpossumFile backend
This is again quite a simple stage, consisting basically only of the pydantic models required to model the file.
The OpossumFile class has one method, save_to(output_path), that writes the final .opossum file to the specified path.
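A minimal sketch of save_to, assuming that a .opossum file is a zip archive containing input.json and, optionally, output.json. OpossumInputFile and its reduced set of fields are hypothetical; with pydantic models, json.dumps(asdict(...)) would be replaced by the model's model_dump_json().

```python
from __future__ import annotations

import json
import zipfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class OpossumInputFile:
    # Hypothetical subset of input.json; field names mirror the JSON keys.
    metadata: dict = field(default_factory=dict)
    resources: dict = field(default_factory=dict)
    externalAttributions: dict = field(default_factory=dict)
    resourcesToAttributions: dict = field(default_factory=dict)


@dataclass
class OpossumFile:
    input_file: OpossumInputFile
    # Only set when the data originally came from a .opossum file.
    output_file: dict | None = None

    def save_to(self, output_path: str | Path) -> None:
        # Assumed layout: a zip archive with input.json and an optional output.json.
        with zipfile.ZipFile(output_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
            archive.writestr("input.json", json.dumps(asdict(self.input_file)))
            if self.output_file is not None:
                archive.writestr("output.json", json.dumps(self.output_file))
```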
Thoughts on .opossum input
There are 2 options:
1. Having OpossumFile double as the frontend for .opossum files, thus also giving it a classmethod from_path and an instance method to_opossum().
2. Having a separate OpossumFileReader or similar just for the input part.
Advantages of 1. are:
It makes the round trips OpossumFile -> Opossum -> OpossumFile and Opossum -> OpossumFile -> Opossum very easy to test (they come basically for free).
Disadvantages of 1. are:
The class might become a bit large? However, writing is a very small function.
Advantages of 2. are:
clearer separation between pieces of code that fulfill different tasks
One could even go as far as moving the "frontend" part into its own folder like all the other frontends, perhaps copying the model definition. The repository would then be split very strictly into frontend/backend submodules (however, I think this is overengineered).
Disadvantages of 2. are:
A somewhat harder round trip when testing for consistency: Opossum -> OpossumFile -> OpossumFileReader would, at worst, involve writing to a temporary file.
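For option 2, the consistency test could look roughly like this: the Opossum is converted to an OpossumFile, written to a temporary file, read back with OpossumFileReader and compared. The models are reduced to a flat list of resource paths purely for brevity (the real input.json stores a resource tree), and round_trips is a hypothetical test helper.

```python
from __future__ import annotations

import json
import tempfile
import zipfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Opossum:
    # Hypothetical internal model reduced to a set of resource paths.
    resources: set[str] = field(default_factory=set)

    def to_opossum_file_format(self) -> OpossumFile:
        return OpossumFile(input_json={"resources": sorted(self.resources)})


@dataclass
class OpossumFile:
    input_json: dict

    def save_to(self, output_path: str | Path) -> None:
        with zipfile.ZipFile(output_path, "w") as archive:
            archive.writestr("input.json", json.dumps(self.input_json))


@dataclass
class OpossumFileReader:
    # Option 2: a separate frontend used only for reading .opossum files.
    input_json: dict

    @classmethod
    def from_path(cls, path: str | Path) -> OpossumFileReader:
        with zipfile.ZipFile(path) as archive:
            return cls(input_json=json.loads(archive.read("input.json")))

    def to_opossum(self) -> Opossum:
        return Opossum(resources=set(self.input_json["resources"]))


def round_trips(opossum: Opossum) -> bool:
    # Opossum -> OpossumFile -> temporary file -> OpossumFileReader -> Opossum
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "roundtrip.opossum"
        opossum.to_opossum_file_format().save_to(path)
        return OpossumFileReader.from_path(path).to_opossum() == opossum
```

With option 1, OpossumFile itself would provide from_path and to_opossum(), so the in-memory round trip OpossumFile -> Opossum -> OpossumFile would not need the temporary file at all.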
We could use classes to structure the interfaces between the file format frontends, the internal opossum model and the .opossum file model.
Depends on #190 #199
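To make the idea of class-based interfaces between the stages concrete, the whole pipeline could be chained as below. FileFrontend (a typing.Protocol describing the frontend interface), convert, and the simplified file layouts are hypothetical; only ScanCodeFile, Opossum, OpossumFile, from_path, to_opossum, to_opossum_file_format and save_to are names from this issue.

```python
from __future__ import annotations

import json
import zipfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol


@dataclass
class OpossumFile:
    # Backend stage: the file model plus writing (assumed zip layout).
    input_json: dict

    def save_to(self, output_path: str | Path) -> None:
        with zipfile.ZipFile(output_path, "w") as archive:
            archive.writestr("input.json", json.dumps(self.input_json))


@dataclass
class Opossum:
    # Internal stage, hypothetically reduced to resource paths.
    resources: set[str] = field(default_factory=set)

    def to_opossum_file_format(self) -> OpossumFile:
        return OpossumFile(input_json={"resources": sorted(self.resources)})


class FileFrontend(Protocol):
    # The interface every frontend (ScanCodeFile, ...) is expected to satisfy.
    @classmethod
    def from_path(cls, path_to_file: str | Path) -> FileFrontend: ...

    def to_opossum(self) -> Opossum: ...


@dataclass
class ScanCodeFile:
    # Simplified frontend: a JSON object with a "files" list of {"path", "type"}.
    files: list[dict]

    @classmethod
    def from_path(cls, path_to_file: str | Path) -> ScanCodeFile:
        with open(path_to_file, encoding="utf-8") as f:
            return cls(files=json.load(f)["files"])

    def to_opossum(self) -> Opossum:
        return Opossum(resources={e["path"] for e in self.files if e["type"] == "file"})


def convert(frontend: type[FileFrontend], input_path: str | Path, output_path: str | Path) -> None:
    # Frontend -> Opossum -> OpossumFile -> .opossum on disk
    frontend.from_path(input_path).to_opossum().to_opossum_file_format().save_to(output_path)
```

A call such as convert(ScanCodeFile, "scancode.json", "result.opossum") then reads the ScanCode output and writes the .opossum file; adding another input format only requires a class that satisfies the same two-method protocol.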