Commit 23b8c81

Merge pull request #250 from opossum-tool/docs-architecture
docs: add documentation about the structure/architecture of the code
2 parents 36d5e3d + 95f513a commit 23b8c81

5 files changed, +52 -0 lines changed

README.md

+4
@@ -95,6 +95,10 @@ To execute the code directly (i.e. without building it), use
uv run opossum-file [OPTIONS] COMMAND [ARGS]...
```

## Architecture

The architecture of the code is described in [a separate document](docs/architecture.md).

## Code quality tooling

To lint and test your changes, run

docs/architecture.md

+45
@@ -0,0 +1,45 @@
<!--
SPDX-FileCopyrightText: TNG Technology Consulting GmbH <https://www.tngtech.com>

SPDX-License-Identifier: Apache-2.0
-->

# Architecture of `opossum-file`

The `opossum-file` package is composed of three primary components:

1. [**Input format readers**](#input-format-readers): responsible for reading and converting different input file formats into the internal `Opossum` representation.
1. [**Internal representation**](#internal-representation-of-opossum-files): a data structure used for all operations on `opossum` files, such as merging, which provides an easier-to-work-with format than the on-disk representation.
1. [**On-disk representation**](#on-disk-opossum-format): the format used to save `opossum` files to disk, defined using `pydantic`.

![Architecture diagram](opossum-file-architecture.png)

The following sections provide a detailed overview of each component.

## Input format readers

`opossum-file` supports multiple input file formats, each of which is converted into the internal `Opossum` representation before further processing. This conversion goes through the `InputReader` interface, which consists of a single method, `read() -> Opossum`. The file path is set via the constructor, so a complete invocation looks like `ScancodeFileReader(path).read()`.
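
As a rough illustration of the reader interface described above, the following sketch shows one way such an interface could look. The `Opossum` placeholder and the `ExampleFileReader` class are hypothetical stand-ins introduced for this example; the actual classes in `opossum_lib` may be defined differently.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Opossum:
    """Placeholder for the internal representation (assumed, heavily simplified)."""

    resources: dict = field(default_factory=dict)


class InputReader(ABC):
    """Interface with a single method, read() -> Opossum."""

    @abstractmethod
    def read(self) -> Opossum: ...


class ExampleFileReader(InputReader):
    """Hypothetical reader: the path is set via the constructor."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def read(self) -> Opossum:
        # A real reader would parse and convert the file; this returns a stub.
        return Opossum(resources={"source": str(self.path)})


# Complete invocation, analogous to ScancodeFileReader(path).read()
result = ExampleFileReader(Path("scan.json")).read()
```

Keeping the path in the constructor means every reader exposes the same argument-free `read()` call, regardless of the input format.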

### Adding a New Input File Reader

To add support for a new input file format, follow these steps:

1. Create a new subfolder `src/opossum_lib/input_formats/<format_name>`.
1. Define the schema using `pydantic` (if applicable) in `<format_name>/entities`.
1. Implement the conversion from the new schema to `Opossum` in `<format_name>/services/convert_to_opossum.py`.
1. Write tests for the new format, mirroring the folder structure in `tests`.
1. Create a subclass of `InputReader` whose constructor takes a `pathlib.Path` and whose instance method `.read()` returns an `Opossum` instance.
1. Integrate the new reader with the CLI by adding a new argument in `src/opossum_lib/cli.py`, using existing arguments as a blueprint.
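
The schema, conversion, and reader steps above could be sketched as follows for a made-up JSON format. Everything here is an assumption for illustration: `MyFormatFile`, `MyFormatFileReader`, the `files`/`path`/`licenses` keys, and the simplified `Opossum` and `InputReader` stand-ins are hypothetical, and plain dataclasses are used in place of `pydantic` to keep the sketch dependency-free.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Opossum:
    """Stand-in for the internal representation; the real class is richer."""

    resources: dict[str, list[str]] = field(default_factory=dict)


class InputReader(ABC):
    """Stand-in for the reader interface."""

    @abstractmethod
    def read(self) -> Opossum: ...


# Schema of the hypothetical format (would live in <format_name>/entities).
@dataclass
class MyFormatFile:
    files: list[dict]


# Conversion (would live in <format_name>/services/convert_to_opossum.py).
def convert_to_opossum(parsed: MyFormatFile) -> Opossum:
    return Opossum(
        resources={entry["path"]: entry.get("licenses", []) for entry in parsed.files}
    )


# Reader with a pathlib.Path constructor and a .read() method.
class MyFormatFileReader(InputReader):
    def __init__(self, path: Path) -> None:
        self.path = path

    def read(self) -> Opossum:
        raw = json.loads(self.path.read_text(encoding="utf-8"))
        return convert_to_opossum(MyFormatFile(files=raw["files"]))
```

Separating parsing (the schema) from conversion keeps the format-specific logic testable without touching the file system.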

## Internal Representation of Opossum Files

All operations on `opossum` files, such as merging, are performed using an internal representation of the data. This representation differs from the on-disk format in two key aspects:

- The `resourcesToAttribution` join map is resolved by inlining attributions from `externalAttributions` into the corresponding resources.
- The folder structure defined by `resources` is reflected by resources directly containing their child resources.

This data structure ensures consistency between `resources`, `resourcesToAttribution`, and `externalAttributions` without requiring updates to multiple locations.
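
To make the two differences above concrete, here is a minimal sketch that resolves the join map and nests child resources. The `Resource` class, the `build_tree` helper, and the simplified shapes of the three maps (nested dicts for folders, a leaf marker for files, path strings as join-map keys) are assumptions for illustration, not the actual `opossum_lib` data model.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical internal node holding its attributions and children directly."""

    attributions: list[dict] = field(default_factory=list)
    children: dict[str, "Resource"] = field(default_factory=dict)


def build_tree(resources, resources_to_attribution, external_attributions, path=""):
    """Turn the flat on-disk maps into a nested tree with inlined attributions."""
    node = Resource(
        attributions=[
            external_attributions[attribution_id]
            for attribution_id in resources_to_attribution.get(path or "/", [])
        ]
    )
    if isinstance(resources, dict):  # folders are dicts, files are leaf markers
        for name, child in resources.items():
            node.children[name] = build_tree(
                child, resources_to_attribution, external_attributions, f"{path}/{name}"
            )
    return node


# Simplified on-disk style input (assumed shapes)
resources = {"src": {"main.py": 1}}
resources_to_attribution = {"/src/main.py": ["id-1"]}
external_attributions = {"id-1": {"packageName": "example"}}

root = build_tree(resources, resources_to_attribution, external_attributions)
```

Once attributions live on the node itself, moving or merging a resource no longer requires keeping three separate maps in sync.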

## On-Disk Opossum Format

The on-disk format of `opossum` files is defined using [`pydantic`](https://docs.pydantic.dev/latest/) in [`opossum_file_model.py`](../src/opossum_lib/shared/entities/opossum_file_model.py). The conversion between the internal `Opossum` and `OpossumFileModel` is implemented in [`convert_to_opossum.py`](../src/opossum_lib/input_formats/opossum/services/convert_to_opossum.py), following the same structure as other input file formats. To write an instance of `OpossumFileModel` to a file, use the `write_opossum_file` function from [`write_opossum_file.py`](../src/opossum_lib/core/services/write_opossum_file.py).
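
The direction from the internal structure back to the flat on-disk maps could look roughly like the sketch below. It is not the project's conversion code: the nested-dict node layout, the `to_on_disk` helper, and the use of plain `json` instead of `OpossumFileModel` and `write_opossum_file` are all assumptions made to keep the example self-contained.

```python
import json


def to_on_disk(node, path="", join_map=None, attributions=None):
    """Split a nested node into (resources, join map, attributions)."""
    join_map = {} if join_map is None else join_map
    attributions = {} if attributions is None else attributions
    if node["attributions"]:
        # Record which attribution ids belong to this path, and store the data once.
        join_map[path or "/"] = list(node["attributions"])
        attributions.update(node["attributions"])
    children = node["children"]
    if children:
        tree = {
            name: to_on_disk(child, f"{path}/{name}", join_map, attributions)[0]
            for name, child in children.items()
        }
    else:
        tree = 1  # leaf marker for files (assumed convention)
    return tree, join_map, attributions


# Hypothetical internal tree: attributions are stored on the node itself
internal = {
    "attributions": {},
    "children": {
        "src": {
            "attributions": {},
            "children": {
                "main.py": {
                    "attributions": {"id-1": {"packageName": "example"}},
                    "children": {},
                }
            },
        }
    },
}

resources, join_map, attributions = to_on_disk(internal)
serialized = json.dumps(
    {
        "resources": resources,
        "resourcesToAttribution": join_map,
        "externalAttributions": attributions,
    }
)
```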

docs/opossum-file-architecture.png

47.3 KB
@@ -0,0 +1,3 @@
SPDX-FileCopyrightText: TNG Technology Consulting GmbH <https://www.tngtech.com>

SPDX-License-Identifier: Apache-2.0

src/opossum_lib/shared/services/__init__.py

Whitespace-only changes.
