ontology.py: saving unsanitized words #824

Closed
ajdapretnar opened this issue Apr 21, 2022 · 1 comment · Fixed by #896
@ajdapretnar
Collaborator

Describe the bug
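Ontology generation fails when given raw strings that contain slashes, e.g. "functional/adaptive interpretation". The failure occurs at line 260 of ontology.py (`save_embedding`): the word is used as a file name without being stripped of slashes, so the slash is treated as a path separator and the file cannot be created.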

To Reproduce
Steps to reproduce the behavior:

  1. Have an ontology with words containing slashes.
  2. Run Generate.

Expected behavior
Words are sanitized before saving their embeddings.
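
One way the expected sanitization could look is sketched below. This is only an illustration, not the add-on's code: `safe_filename` is a hypothetical helper, and the choice of replaced characters and of a hash suffix are assumptions. The idea is to replace characters that are unsafe in file names and append a short hash of the original word, so that two different words that clean to the same string still get distinct files.

```python
import hashlib
import re


def safe_filename(word: str) -> str:
    """Hypothetical helper: map an arbitrary word to a file-system-safe name."""
    # Replace path separators and other characters that are problematic
    # in file names on common platforms.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", word).strip()
    # A short hash of the *original* word keeps names distinct when two
    # different words collapse to the same cleaned string.
    digest = hashlib.sha256(word.encode("utf-8")).hexdigest()[:8]
    return f"{cleaned}_{digest}.npy"


name = safe_filename("functional/adaptive interpretation")
```

With such a helper, `save_embedding` could join `self.cache_dir` with `safe_filename(word)` instead of `f'{word}.npy'`, and the loading code would need to apply the same mapping.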

Orange version:
3.33.dev

Text add-on version:
1.7.0

Additional context

------------------------- FileNotFoundError Exception -------------------------
Traceback (most recent call last):
  File "/Users/ajda/orange/orange3/Orange/widgets/utils/concurrent.py", line 591, in _on_task_done
    super()._on_task_done(future)
  File "/Users/ajda/orange/orange3/Orange/widgets/utils/concurrent.py", line 547, in _on_task_done
    self.on_exception(ex)
  File "/Users/ajda/orange/orange3-text/orangecontrib/text/widgets/owontology.py", line 774, in on_exception
    raise ex
  File "/Users/ajda/.pyenv-x86/versions/3.9.10/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/ajda/orange/orange3-text/orangecontrib/text/widgets/owontology.py", line 40, in _run
    return handler(*args, callback=callback)
  File "/Users/ajda/orange/orange3-text/orangecontrib/text/ontology.py", line 289, in generate
    self._get_embeddings(words, wrap_callback(callback, end=0.1)),
  File "/Users/ajda/orange/orange3-text/orangecontrib/text/ontology.py", line 371, in _get_embeddings
    self.storage.save_embedding(words[i], embeddings[i, :])
  File "/Users/ajda/orange/orange3-text/orangecontrib/text/ontology.py", line 260, in save_embedding
    np.save(os.path.join(self.cache_dir, f'{word}.npy'), emb)
  File "<__array_function__ internals>", line 180, in save
  File "/Users/ajda/.pyenv-x86/versions/py3.9/lib/python3.9/site-packages/numpy/lib/npyio.py", line 515, in save
    file_ctx = open(file, "wb")
FileNotFoundError: [Errno 2] No such file or directory: '/Users/ajda/Library/Caches/Orange/3.33.0.dev/ontology/182 funkcionalna/adaptacijska interpretacija.npy'
-------------------------------------------------------------------------------
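
The failure can be reproduced outside Orange with a few lines. The traceback shows that `np.save` ends up calling `open(file, "wb")`, so the sketch below uses plain `open` on a temporary directory standing in for the cache directory; `try_save` is a hypothetical helper written only for this illustration.

```python
import os
import tempfile


def try_save(cache_dir: str, word: str) -> str:
    """Mimic the failing call: build the path the same way and open it for writing."""
    path = os.path.join(cache_dir, f"{word}.npy")
    try:
        with open(path, "wb") as f:
            f.write(b"")
    except FileNotFoundError:
        # The slash makes the part before it a directory that does not exist.
        return "FileNotFoundError"
    return "ok"


with tempfile.TemporaryDirectory() as cache_dir:
    plain_result = try_save(cache_dir, "interpretation")
    slash_result = try_save(cache_dir, "functional/adaptive interpretation")
```

The word without a slash is written normally, while the word with a slash points into a subdirectory `functional/` that was never created.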

@lanzagar
Contributor

The current caching system could be changed to something other than writing a separate file for each word.
Also, if it is not an LRU cache with a limited size, there should probably be a way to clear the cache if/when it gets big (is it currently cleared only when running tests?).
@PrimozGodec said he can coordinate this with @djukicn. We can also discuss it together.
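
To make the suggestion concrete, here is a rough sketch of a size-limited LRU cache keyed by the word itself rather than by a file name. It is an assumption-laden illustration, not a proposal for the final design: `EmbeddingLRUCache`, `max_items` and the method names are all hypothetical, and persistence (for example writing the whole mapping to a single file) is left out.

```python
from collections import OrderedDict


class EmbeddingLRUCache:
    """Hypothetical bounded cache; not the add-on's actual storage class."""

    def __init__(self, max_items: int = 1000):
        self.max_items = max_items
        self._items = OrderedDict()

    def get(self, word):
        """Return the cached embedding, or None if the word is not cached."""
        if word not in self._items:
            return None
        # Mark the word as most recently used.
        self._items.move_to_end(word)
        return self._items[word]

    def save(self, word, emb):
        """Store an embedding and evict the least recently used entries."""
        self._items[word] = emb
        self._items.move_to_end(word)
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)

    def clear(self):
        """Explicitly drop everything, e.g. from a button in the widget."""
        self._items.clear()

    def __len__(self):
        return len(self._items)


cache = EmbeddingLRUCache(max_items=2)
cache.save("functional/adaptive interpretation", [0.1, 0.2])
cache.save("ontology", [0.3, 0.4])
cache.get("functional/adaptive interpretation")  # refresh this entry
cache.save("embedding", [0.5, 0.6])  # evicts "ontology"
```

Because words are used only as dictionary keys, slashes need no sanitization here, and the size limit plus `clear()` cover both points raised above.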

4 participants