Introducing Model object (the first iteration) #118

asofter · 2024-03-14T15:38:05Z

Description

This is the first iteration of introducing a Model object that each scanner will use instead of relying on data and source. This way, we can reuse file reading, cache variables, and provide context.

The Model uses a context manager to open a stream of bytes and pass it to the scanners, instead of relying on each scanner to open the file itself. Scanners can still scan in a custom way if they need to.
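A minimal sketch of what such a Model could look like, assuming a plain Python class that implements the context-manager protocol. Model, get_stream(), _stream and _source_file_used appear in this PR; get_source(), _context and the rewind on every get_stream() call are illustrative assumptions, not the modelscan implementation.

```python
from pathlib import Path
from typing import IO, Any, Dict, Optional, Union


class Model:
    """Sketch: a Model that owns opening/closing the byte stream scanners read from."""

    def __init__(self, source: Union[str, Path], stream: Optional[IO[bytes]] = None) -> None:
        self._source = Path(source)
        self._stream = stream
        # True only when this object opened the file itself, so it must also close it.
        self._source_file_used = False
        # Hypothetical per-model cache that scanners could share.
        self._context: Dict[str, Any] = {}

    def get_source(self) -> Path:
        return self._source

    def get_stream(self) -> IO[bytes]:
        if self._stream is None:
            raise RuntimeError("stream is only available inside a 'with Model(...)' block")
        # Assumption: rewind so every scanner starts reading from the beginning.
        self._stream.seek(0)
        return self._stream

    def __enter__(self) -> "Model":
        if self._stream is None:
            self._stream = self._source.open("rb")
            self._source_file_used = True
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Close only streams we opened; caller-provided streams stay open.
        if self._stream is not None and self._source_file_used:
            self._stream.close()
            self._stream = None
            self._source_file_used = False
```

A scanner would then read from model.get_stream() rather than opening the file from source itself, while a stream supplied by the caller (for example, bytes already in memory) is left open for the caller to manage.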

As this refactoring might take time, I decided to split it into multiple PRs. This PR keeps the same scanning approach but passes a Model to each scanner.


asofter · 2024-03-15T11:22:05Z

modelscan/model.py

+        if self._stream and self._source_file_used:
+            self._stream.close()
+
+    def __enter__(self) -> "Model":


Use contextmanager
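If "contextmanager" here refers to contextlib.contextmanager, the close-only-if-owned logic from the diff above could also be written as a generator-based context manager. This is a sketch; open_model_stream is a hypothetical name and not part of modelscan.

```python
from contextlib import contextmanager
from pathlib import Path
from typing import IO, Iterator, Optional, Union


@contextmanager
def open_model_stream(
    source: Union[str, Path], stream: Optional[IO[bytes]] = None
) -> Iterator[IO[bytes]]:
    """Hypothetical helper: yield a binary stream for `source`.

    A stream supplied by the caller is yielded unchanged and left open;
    otherwise the file is opened here and closed when the block exits.
    """
    owns_stream = stream is None
    if stream is None:
        stream = open(source, "rb")
    try:
        yield stream
    finally:
        if owns_stream:
            stream.close()
```

The try/finally ensures an owned file is closed even if a scanner raises inside the with block.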

asofter · 2024-03-15T11:22:49Z

modelscan/modelscan.py

+                    "ModelScan", ErrorCategories.PATH, "Path is not valid", str(path)
+                )
+            )
+        except ModelIsDir:


No scanner uses folder scanning, so we can fall back to it when the upstream Path is not a file.
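A sketch of that fallback, assuming the caller passes a pathlib.Path: a file is yielded as-is, and only a directory is walked. iterate_files and the FileNotFoundError for invalid paths are assumptions; the PR itself reports an invalid path as an ErrorCategories.PATH error.

```python
from pathlib import Path
from typing import Iterator


def iterate_files(path: Path) -> Iterator[Path]:
    """Hypothetical helper: yield `path` if it is a file, else fall back to walking it."""
    if path.is_file():
        yield path
        return
    if path.is_dir():
        # Sorted so the scan order is deterministic.
        for child in sorted(path.rglob("*")):
            if child.is_file():
                yield child
        return
    raise FileNotFoundError(f"Path is not valid: {path}")
```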

asofter · 2024-03-15T11:23:26Z

modelscan/scanners/h5/scan.py

@@ -46,33 +45,16 @@ def scan(
                [],
            )

-        if data:
-            logger.warning(
-                f"{self.full_name()} got data bytes. It only support direct file scanning."


It also supports scanning bytes.

asofter · 2024-03-19T20:07:00Z

modelscan/modelscan.py

+
+        for file in files:
+            with Model(file) as model:
+                yield model


always scan the main file
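One reading of "always scan the main file", sketched with the standard-library zipfile module: the top-level file is yielded first, and if it is also a zip archive, its members are yielded afterwards, named path:member in the same style the tests use (for example safe{file_extension}:config.json). iterate_models and the (name, stream) pairs are hypothetical; in the PR each item is wrapped in a Model instead.

```python
import zipfile
from pathlib import Path
from typing import IO, Iterator, Tuple


def iterate_models(path: Path) -> Iterator[Tuple[str, IO[bytes]]]:
    """Hypothetical: yield the main file first, then zip members if it is an archive."""
    with open(path, "rb") as main_stream:
        # The main file is always scanned, even when it is also an archive.
        yield str(path), main_stream
        if zipfile.is_zipfile(main_stream):
            # is_zipfile moves the file position, so rewind before reopening.
            main_stream.seek(0)
            with zipfile.ZipFile(main_stream) as archive:
                for info in archive.infolist():
                    if info.is_dir():
                        continue
                    with archive.open(info) as member:
                        yield f"{path}:{info.filename}", member
```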


iamfaisalkhan

LGTM. I added some optional comments; we can address them in a follow-up PR.

iamfaisalkhan · 2024-03-19T22:28:43Z

modelscan/scanners/h5/scan.py

        # Todo: source isn't guaranteed to be a file

-        with h5py.File(source, "r") as model_hdf5:
+        with h5py.File(model.get_stream(), "r") as model_hdf5:


Optional: Do we need 'r' since it is an open stream?

iamfaisalkhan · 2024-03-19T22:33:33Z

tests/test_modelscan.py

+            for skipped_file in results["summary"]["skipped"]["skipped_files"]
+        ]
+    ) == {
+        "safe_zip_pytorch.pt",


Is this file skipped because we are unrolling the .pt file? If so, listing it among the skipped files is confusing.

iamfaisalkhan · 2024-03-21T14:49:33Z

tests/test_modelscan.py

+    assert [
+        skipped_file["source"]
+        for skipped_file in results["summary"]["skipped"]["skipped_files"]
+    ] == ["test.zip"]


This might still be a little confusing, since test.zip was scanned for its contents.

iamfaisalkhan · 2024-03-21T14:53:14Z

modelscan/scanners/keras/scan.py

        machine_learning_library_name = "Keras"

        # if self._check_json_data(source, config_file):

-        operators_in_model = self._get_keras_operator_names(source, config_file)
+        operators_in_model = self._get_keras_operator_names(model)
        if operators_in_model:
            if "JSONDecodeError" in operators_in_model:


Outside the scope of this PR, but this should be handled by catching the exception.
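A sketch of the exception-based alternative, assuming the Keras config is JSON shaped like {"config": {"layers": [{"class_name": ...}]}}. Here get_keras_operator_names takes raw bytes and scan_keras_config is a hypothetical caller; neither matches the real modelscan signatures (the PR's version takes a Model).

```python
import json
from typing import Any, Dict, List


def get_keras_operator_names(config_json: bytes) -> List[str]:
    """Hypothetical helper: list layer class names from a Keras-style model config.

    Raises json.JSONDecodeError on malformed input instead of returning a sentinel.
    """
    config = json.loads(config_json)
    if not isinstance(config, dict):
        return []
    layers = config.get("config", {}).get("layers", [])
    return [layer["class_name"] for layer in layers if "class_name" in layer]


def scan_keras_config(config_json: bytes) -> Dict[str, Any]:
    """Hypothetical caller: handle the parse failure where it happens."""
    try:
        operators = get_keras_operator_names(config_json)
    except json.JSONDecodeError as exc:
        # Replaces checking for a "JSONDecodeError" string inside the result list.
        return {"error": f"JSON decode error: {exc.msg}"}
    return {"operators": operators}
```

Keeping the error as an exception means the operator list can only ever contain operator names, so the caller no longer has to inspect it for an error marker.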

iamfaisalkhan · 2024-03-21T14:54:52Z

tests/test_modelscan.py

-                ]
-            ) == {
-                f"safe{file_extension}:metadata.json",
-                f"safe{file_extension}:config.json",


Should the json files still be listed as skipped?

In this case, the scanned list contains only the main file, so it will be a bit confusing.

asofter added 3 commits March 13, 2024 17:22

* new model

80fc9db

* updates to the model

a997b6d

* use model in each scanner

705d17f

asofter requested review from iamfaisalkhan and swashko March 14, 2024 15:38

asofter added 3 commits March 14, 2024 16:38

Merge branch 'main' into introduce-model

5eea800

* fix after merge

abe38a3

* there is no scanner using folder, so we can extract it right away

a1363ec

* use data stream instead of opening in each scanner

asofter commented Mar 15, 2024


asofter added 2 commits March 18, 2024 19:07

Introduce middlewares (#119)

ee812c2

* simplify code

b7e0e3f

asofter commented Mar 19, 2024


asofter added 2 commits March 19, 2024 21:11

Merge remote-tracking branch 'origin/main' into introduce-model

c110330

# Conflicts:
#	modelscan/modelscan.py
#	modelscan/scanners/pickle/scan.py

* simplify code based on the feedback

1c5fa7f

iamfaisalkhan approved these changes Mar 21, 2024


asofter merged commit 5f1818b into main Mar 21, 2024
8 checks passed

asofter deleted the introduce-model branch March 21, 2024 15:15

asofter mentioned this pull request Mar 21, 2024

Follow-up for Model object introduction #121

Merged
