-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introducing Model object (the first iteration) #118
Conversation
* use data stream instead of opening in each scanner
if self._stream and self._source_file_used: | ||
self._stream.close() | ||
|
||
def __enter__(self) -> "Model": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use contextmanager
modelscan/modelscan.py
Outdated
"ModelScan", ErrorCategories.PATH, "Path is not valid", str(path) | ||
) | ||
) | ||
except ModelIsDir: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no scanner using folder scanning, so we can do it as a fallback if Path upstream is not file
@@ -46,33 +45,16 @@ def scan( | |||
[], | |||
) | |||
|
|||
if data: | |||
logger.warning( | |||
f"{self.full_name()} got data bytes. It only support direct file scanning." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It supports also bytes scanning
|
||
for file in files: | ||
with Model(file) as model: | ||
yield model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always scan the main file
# Conflicts: # modelscan/modelscan.py # modelscan/scanners/pickle/scan.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Added some optional comments, we can address them in follow-up PR.
# Todo: source isn't guaranteed to be a file | ||
|
||
with h5py.File(source, "r") as model_hdf5: | ||
with h5py.File(model.get_stream(), "r") as model_hdf5: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: Do we need 'r' since it is an open stream?
tests/test_modelscan.py
Outdated
for skipped_file in results["summary"]["skipped"]["skipped_files"] | ||
] | ||
) == { | ||
"safe_zip_pytorch.pt", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is being skipped since we are unrolling the .pt file? In that case this being in skipped file is confusing.
assert [ | ||
skipped_file["source"] | ||
for skipped_file in results["summary"]["skipped"]["skipped_files"] | ||
] == ["test.zip"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might still be little confusing since test.zip was scanned for it's content.
machine_learning_library_name = "Keras" | ||
|
||
# if self._check_json_data(source, config_file): | ||
|
||
operators_in_model = self._get_keras_operator_names(source, config_file) | ||
operators_in_model = self._get_keras_operator_names(model) | ||
if operators_in_model: | ||
if "JSONDecodeError" in operators_in_model: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outside the scope of this PR, but this should be handled by catching exception.
] | ||
) == { | ||
f"safe{file_extension}:metadata.json", | ||
f"safe{file_extension}:config.json", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should json files still be in skipped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, the scanned has only the main file, so it will be a bit confusing.
Description
This is the first iteration of introducing
Model
object that will be used by each scanner instead of relying ondata
andsource
. That way, we could re-use file reading and cache variables and provide context.This
Model
uses context manager to open a stream of bytes and pass them to scanners separately instead of relying on scanners to handle it. They still can just scan in a custom way.As this refactoring might take time, I decided to split it into multiple PRs, and in this PR, I used the same approach but with passing
Model
.