Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing Model object (the first iteration) #118

Merged
merged 10 commits into from
Mar 21, 2024
Merged

Conversation

asofter
Copy link
Contributor

@asofter asofter commented Mar 14, 2024

Description

This is the first iteration of introducing Model object that will be used by each scanner instead of relying on data and source. That way, we could re-use file reading and cache variables and provide context.

This Model uses context manager to open a stream of bytes and pass them to scanners separately instead of relying on scanners to handle it. They still can just scan in a custom way.

As this refactoring might take time, I decided to split it into multiple PRs, and in this PR, I used the same approach but with passing Model.

if self._stream and self._source_file_used:
self._stream.close()

def __enter__(self) -> "Model":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use contextmanager

"ModelScan", ErrorCategories.PATH, "Path is not valid", str(path)
)
)
except ModelIsDir:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no scanner using folder scanning, so we can do it as a fallback if Path upstream is not file

@@ -46,33 +45,16 @@ def scan(
[],
)

if data:
logger.warning(
f"{self.full_name()} got data bytes. It only support direct file scanning."
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It supports also bytes scanning


for file in files:
with Model(file) as model:
yield model
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always scan the main file

asofter added 2 commits March 19, 2024 21:11
# Conflicts:
#	modelscan/modelscan.py
#	modelscan/scanners/pickle/scan.py
Copy link
Member

@iamfaisalkhan iamfaisalkhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Added some optional comments, we can address them in follow-up PR.

# Todo: source isn't guaranteed to be a file

with h5py.File(source, "r") as model_hdf5:
with h5py.File(model.get_stream(), "r") as model_hdf5:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: Do we need 'r' since it is an open stream?

for skipped_file in results["summary"]["skipped"]["skipped_files"]
]
) == {
"safe_zip_pytorch.pt",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is being skipped since we are unrolling the .pt file? In that case this being in skipped file is confusing.

assert [
skipped_file["source"]
for skipped_file in results["summary"]["skipped"]["skipped_files"]
] == ["test.zip"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might still be little confusing since test.zip was scanned for it's content.

machine_learning_library_name = "Keras"

# if self._check_json_data(source, config_file):

operators_in_model = self._get_keras_operator_names(source, config_file)
operators_in_model = self._get_keras_operator_names(model)
if operators_in_model:
if "JSONDecodeError" in operators_in_model:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside the scope of this PR, but this should be handled by catching exception.

]
) == {
f"safe{file_extension}:metadata.json",
f"safe{file_extension}:config.json",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should json files still be in skipped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the scanned has only the main file, so it will be a bit confusing.

@asofter asofter merged commit 5f1818b into main Mar 21, 2024
8 checks passed
@asofter asofter deleted the introduce-model branch March 21, 2024 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants