ontology.py: saving unsanitized words #824

ajdapretnar · 2022-04-21T15:38:10Z

Describe the bug
Ontology generation fails when a word contains a slash, e.g. "functional/adaptive interpretation". The failure occurs at L260 of ontology.py (`save_embedding`): the word is used as a file name without being sanitized, so the slash is treated as a path separator and the resulting directory does not exist.

To Reproduce
Steps to reproduce the behavior:

Have an ontology with words containing slashes.
Run Generate.

Expected behavior
Words are sanitized before saving their embeddings.
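
A minimal sketch of one possible sanitization, assuming the per-word `.npy` files are kept: hash each word to get a file name that cannot contain path separators. The helpers `safe_cache_filename` and `embedding_path` are hypothetical names used for illustration, not part of the add-on.

```python
import hashlib
import os


def safe_cache_filename(word: str) -> str:
    """Return a file name for `word` that is safe on any file system.

    Hypothetical helper: the SHA-256 hex digest contains only the characters
    0-9 and a-f, so slashes and other special characters in the word can
    never end up in the file name.
    """
    digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
    return f"{digest}.npy"


def embedding_path(cache_dir: str, word: str) -> str:
    """Build the full cache path for a word's embedding (hypothetical helper)."""
    return os.path.join(cache_dir, safe_cache_filename(word))
```

With helpers like these, the call in `save_embedding` could become `np.save(embedding_path(self.cache_dir, word), emb)`. Hashing was chosen here over replacing slashes with another character because a replacement can map two different words (for example "a/b" and "a_b") to the same file.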

Orange version:
3.33.dev

Text add-on version:
1.7.0

Additional context

------------------------- FileNotFoundError Exception -------------------------
Traceback (most recent call last):
  File "/Users/ajda/orange/orange3/Orange/widgets/utils/concurrent.py", line 591, in _on_task_done
    super()._on_task_done(future)
  File "/Users/ajda/orange/orange3/Orange/widgets/utils/concurrent.py", line 547, in _on_task_done
    self.on_exception(ex)
  File "/Users/ajda/orange/orange3-text/orangecontrib/text/widgets/owontology.py", line 774, in on_exception
    raise ex
  File "/Users/ajda/.pyenv-x86/versions/3.9.10/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/ajda/orange/orange3-text/orangecontrib/text/widgets/owontology.py", line 40, in _run
    return handler(*args, callback=callback)
  File "/Users/ajda/orange/orange3-text/orangecontrib/text/ontology.py", line 289, in generate
    self._get_embeddings(words, wrap_callback(callback, end=0.1)),
  File "/Users/ajda/orange/orange3-text/orangecontrib/text/ontology.py", line 371, in _get_embeddings
    self.storage.save_embedding(words[i], embeddings[i, :])
  File "/Users/ajda/orange/orange3-text/orangecontrib/text/ontology.py", line 260, in save_embedding
    np.save(os.path.join(self.cache_dir, f'{word}.npy'), emb)
  File "<__array_function__ internals>", line 180, in save
  File "/Users/ajda/.pyenv-x86/versions/py3.9/lib/python3.9/site-packages/numpy/lib/npyio.py", line 515, in save
    file_ctx = open(file, "wb")
FileNotFoundError: [Errno 2] No such file or directory: '/Users/ajda/Library/Caches/Orange/3.33.0.dev/ontology/182 funkcionalna/adaptacijska interpretacija.npy'
-------------------------------------------------------------------------------


lanzagar · 2022-04-25T08:33:18Z

The current caching system could be changed to something other than a separate file for each word.
Also, if it is not an LRU cache with a limited size, there should probably be a way to clear the cache when it gets big (currently it seems to be cleared only when running tests?).
@PrimozGodec said he can coordinate this with @djukicn. We can also discuss it together.
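
For discussion, a rough sketch of what a size-limited LRU cache could look like, assuming embeddings are kept in memory and keyed by the word itself (so no file names are involved). `LRUEmbeddingCache` and `max_size` are hypothetical names, not existing code in the add-on.

```python
from collections import OrderedDict


class LRUEmbeddingCache:
    """Hypothetical bounded cache that evicts the least recently used word."""

    def __init__(self, max_size=1000):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self._items = OrderedDict()

    def get(self, word):
        """Return the cached embedding, or None if the word is not cached."""
        if word not in self._items:
            return None
        # Mark the word as most recently used.
        self._items.move_to_end(word)
        return self._items[word]

    def put(self, word, emb):
        """Store an embedding and evict the oldest entries over the limit."""
        self._items[word] = emb
        self._items.move_to_end(word)
        while len(self._items) > self.max_size:
            # popitem(last=False) removes the least recently used entry.
            self._items.popitem(last=False)

    def clear(self):
        """Drop all cached embeddings."""
        self._items.clear()

    def __len__(self):
        return len(self._items)
```

Because words are only dictionary keys here, slashes in them are harmless, and the size limit plus `clear()` cover the unbounded-growth concern.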

ajdapretnar assigned djukicn Apr 21, 2022

lanzagar assigned PrimozGodec Apr 25, 2022

ajdapretnar mentioned this issue Jul 18, 2022

[ENH] Ontology widget documentation #881

Merged

3 tasks

PrimozGodec mentioned this issue Aug 30, 2022

[FIX] Ontology - remove cache and other fixes #896

Merged

3 tasks

ajdapretnar closed this as completed in #896 Oct 5, 2022
