Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change allowed_missing_percentage default #72

Merged
merged 4 commits into from
Feb 2, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Now supports Python 3.9 (including CI test runs) [#40](https://github.com/iomega/spec2vec/issues/40)

## Changed

- changed default setting for `allowed_missing_percentage` to 50.0 to be less strict on model coverage [#72](https://github.com/iomega/spec2vec/pull/72)

## [0.5.0] - 2021-06-18

## Changed
Expand Down
8 changes: 3 additions & 5 deletions spec2vec/Spec2Vec.py
Original file line number Diff line number Diff line change
Expand Up @@ -78,13 +78,11 @@ def spectrum_processing(s):

.. testoutput::

Removed adduct M-H from compound name.
...
['CCMSLIB00001058300', 'CCMSLIB00001058289', 'CCMSLIB00001058303', ...

"""
def __init__(self, model: Word2Vec, intensity_weighting_power: Union[float, int] = 0,
allowed_missing_percentage: Union[float, int] = 0, progress_bar: bool = False):
allowed_missing_percentage: Union[float, int] = 50, progress_bar: bool = False):
"""

Parameters
Expand All @@ -99,8 +97,8 @@ def __init__(self, model: Word2Vec, intensity_weighting_power: Union[float, int]
allowed_missing_percentage:
Set the maximum allowed percentage of the document that may be missing
from the input model. This is measured as percentage of the weighted, missing
words compared to all word vectors of the document. Default is 0, which
means no missing words are allowed.
words compared to all word vectors of the document. Default is 100, which
means up to 50% missing words are allowed (not a very strict setting!).
progress_bar:
Set to True to monitor the embedding creating with a progress bar.
Default is False.
Expand Down
2 changes: 1 addition & 1 deletion spec2vec/vector_operations.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ def _check_model_coverage():
f"Weighted missing percentage not covered by the given model is {missing_percentage:.2f}%.")

message = ("Missing percentage is larger than set maximum.",
"Consider retraining the used model or increasing the allowed percentage.")
"Consider retraining the used model or change the `allowed_missing_percentage`.")
assert missing_percentage <= allowed_missing_percentage, message

idx_not_in_model = [i for i, x in enumerate(document.words) if x not in model.wv.key_to_index]
Expand Down