The `opossum-file` package is composed of three primary components:
- Input format readers: responsible for reading and converting different input file formats into the internal `Opossum` representation.
- Internal representation: a data structure used for all operations on `opossum` files, such as merging, which provides an easier-to-work-with format than the on-disk representation.
- On-disk representation: the format used to save `opossum` files to disk, defined using `pydantic`.
The following sections provide a detailed overview of each component.
`opossum-file` supports multiple input file formats, which are converted into the internal `Opossum` representation before further processing. This conversion is handled through the `InputReader` interface, which consists of a single method, `read() -> Opossum`. The file path is set via the constructor; a complete invocation looks like `ScancodeFileReader(path).read()`.
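As a minimal sketch of the shape described above, the interface can be pictured as an abstract base class with a single `read()` method. The `Opossum` placeholder, its `resources` field, and `ExampleFileReader` are hypothetical stand-ins for illustration, not the package's actual definitions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Opossum:
    """Placeholder for the internal representation (the field is hypothetical)."""

    resources: dict = field(default_factory=dict)


class InputReader(ABC):
    """Interface sketch: a single method that returns the internal representation."""

    @abstractmethod
    def read(self) -> Opossum: ...


class ExampleFileReader(InputReader):
    """Hypothetical reader; the file path is supplied via the constructor."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def read(self) -> Opossum:
        # A real reader would parse self.path here; this one only records the file name.
        return Opossum(resources={self.path.name: 1})


# Same call shape as ScancodeFileReader(path).read()
opossum = ExampleFileReader(Path("scan.json")).read()
```

Keeping the path in the constructor means every reader exposes the same argument-free `read()` call, so calling code does not need to know which format it is dealing with.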
To add support for a new input file format, follow these steps:
- Create a new subfolder in `src/opossum_lib/input_formats/<format_name>`.
- Define the schema using `pydantic` (if applicable) in `<format_name>/entities`.
- Implement the conversion from the new schema to `Opossum` in `<format_name>/services/convert_to_opossum.py`.
- Write tests for the new format, mirroring the folder structure in `tests`.
- Create a subclass of `InputReader` with a `pathlib.Path` constructor and an instance method `.read()` returning an `Opossum` instance.
- Integrate the new reader with the CLI by adding a new argument in `src/opossum_lib/cli.py`, using existing arguments as a blueprint.
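The schema, conversion, and reader steps above might fit together roughly as follows. This is a sketch under stated assumptions: the JSON input format with a `files` key is invented, `ExampleFormatModel` uses a dataclass where the project would use `pydantic`, and `Opossum` is a placeholder rather than the real class.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path, PurePosixPath


@dataclass
class ExampleFormatModel:
    """Stand-in for a schema in <format_name>/entities (pydantic in the real project)."""

    files: list[str]


@dataclass
class Opossum:
    """Placeholder for the internal representation (the field is hypothetical)."""

    resources: dict = field(default_factory=dict)


def convert_to_opossum(model: ExampleFormatModel) -> Opossum:
    """Stand-in for <format_name>/services/convert_to_opossum.py."""
    root: dict = {}
    for file_path in model.files:
        node = root
        # Walk the path parts and create nested entries as needed.
        for part in PurePosixPath(file_path).parts:
            node = node.setdefault(part, {})
    return Opossum(resources=root)


class ExampleFormatReader:
    """Hypothetical reader; it would subclass InputReader in the real code base."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def read(self) -> Opossum:
        raw = json.loads(self.path.read_text(encoding="utf-8"))
        return convert_to_opossum(ExampleFormatModel(files=raw["files"]))
```

Separating the schema from the conversion keeps parsing errors and mapping logic in different places, which is also what makes the mirrored test layout straightforward.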
All operations on `opossum` files, such as merging, are performed using an internal representation of the data. This representation differs from the on-disk format in two key aspects:
- The `resourcesToAttribution` join map is resolved by inlining attributions from `externalAttributions` into the corresponding resources.
- The folder structure defined by `resources` is reflected by resources directly containing their child resources.
This data structure ensures consistency between `resources`, `resourcesToAttribution`, and `externalAttributions` without requiring updates to multiple locations.
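To illustrate the two differences, the sketch below resolves a join map while walking a nested folder structure. It assumes a simplified on-disk layout (folders as nested dicts, files as `1`, folder paths ending in `/`); the `Resource` class and `build_tree` function are hypothetical names, not the package's own.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Internal-style node: holds its children and its attributions directly."""

    children: dict[str, "Resource"] = field(default_factory=dict)
    attributions: list[dict] = field(default_factory=list)


def build_tree(resources, join_map, external_attributions, prefix="/"):
    """Resolve the join map while walking the nested folder structure."""
    node = Resource()
    for name, child in resources.items():
        is_folder = isinstance(child, dict)
        path = prefix + name + ("/" if is_folder else "")
        if is_folder:
            child_node = build_tree(child, join_map, external_attributions, path)
        else:
            child_node = Resource()
        # Inline the attributions referenced by ID for this path.
        ids = join_map.get(path, [])
        child_node.attributions = [external_attributions[i] for i in ids]
        node.children[name] = child_node
    return node


# Assumed, simplified on-disk-style inputs.
resources = {"src": {"main.py": 1}}
join_map = {"/src/main.py": ["uuid-1"]}
external_attributions = {"uuid-1": {"packageName": "requests"}}

root = build_tree(resources, join_map, external_attributions)
```

After this step, moving or removing a resource carries its attributions with it, so there is no separate map that could fall out of sync.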
The on-disk format of `opossum` files is defined using `pydantic` in `opossum_file_model.py`. The conversion between the internal `Opossum` and `OpossumFileModel` is implemented in `convert_to_opossum.py`, following the same structure as other input file formats. To write an instance of `OpossumFileModel` to file, use the `write_opossum_file` function from `write_opossum_file.py`.
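The direction from the internal tree back to on-disk-style structures can be sketched as a flattening step. This does not reproduce `OpossumFileModel` or `write_opossum_file`: the `Resource` class, the `flatten` function, and the simplified layout (files as `1`, folder paths ending in `/`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical internal node with inlined attributions (ID -> attribution)."""

    children: dict[str, "Resource"] = field(default_factory=dict)
    attributions: dict[str, dict] = field(default_factory=dict)
    is_file: bool = False


def flatten(node, path="/", join_map=None, attributions=None):
    """Return (resources, join_map, attributions) in a simplified on-disk-style layout."""
    join_map = {} if join_map is None else join_map
    attributions = {} if attributions is None else attributions
    resources = {}
    for name, child in node.children.items():
        child_path = path + name + ("" if child.is_file else "/")
        if child.attributions:
            # Rebuild the join map and the shared attribution table.
            join_map[child_path] = list(child.attributions)
            attributions.update(child.attributions)
        if child.is_file:
            resources[name] = 1
        else:
            resources[name], _, _ = flatten(child, child_path, join_map, attributions)
    return resources, join_map, attributions


leaf = Resource(is_file=True, attributions={"uuid-1": {"packageName": "requests"}})
tree = Resource(children={"src": Resource(children={"main.py": leaf})})
resources, join_map, attributions = flatten(tree)
```

Because the three output structures are derived from one tree in a single pass, they are consistent by construction at the moment of writing.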