Concordance: tokenization and small fixes #232

ajdapretnar · 2017-04-07T12:19:35Z

Issue

Fixes #208.

Description of changes

Concordance computes its tokens internally and keeps them separate from corpus.tokens.
Input and output are set to Corpus.
The widget no longer crashes when the connection is deleted.

Includes

Code changes
Tests
Documentation

ajdapretnar · 2017-04-07T12:21:56Z

@janezd Crucial bug: when deleting a connection from, say, Corpus to Concordance (with a selection in Concordance), the widget crashes. The problem is in modelReset.emit(). How should we go about fixing this?

codecov-io · 2017-04-07T12:39:59Z

Codecov Report

Merging #232 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #232   +/-   ##
=======================================
  Coverage   93.81%   93.81%           
=======================================
  Files          32       32           
  Lines        1600     1600           
  Branches      294      294           
=======================================
  Hits         1501     1501           
  Misses         60       60           
  Partials       39       39

nikicc · 2017-05-16T17:59:41Z

orangecontrib/text/widgets/owconcordance.py

-                if self.corpus.has_tokens() else 'n/a'
+                len(self.corpus))
+            self.n_tokens = sum(map(len, self.model.tokens)) \
+                if self.model.tokens is not None else 'n/a'
            self.n_types = len(self.corpus.dictionary) \


self.corpus.dictionary should be replaced with the number of types from Concordance's internal tokenization.

Currently, we show the number of tokens from Concordance's internal tokenization and the number of types from the corpus's tokenization. A more serious problem than this mismatch is that the call to self.corpus.dictionary runs the default preprocessor if the corpus is not yet preprocessed. Hence, when passing unpreprocessed data to Concordance (e.g. Corpus -> Concordance), preprocessing is run twice: once for the concordances (as it should be) and once because we call self.corpus.dictionary.

We should probably add something like self.n_types = len(set(tokens)) to ConcordanceModel's set_tokens method and use that instead.
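
A minimal sketch of that suggestion, not the code that was merged. It assumes the model stores tokens as one list of strings per document (which is what sum(map(len, self.model.tokens)) in the diff implies), so the lists have to be flattened before building the set; a plain set(tokens) would fail on unhashable lists. The class name ConcordanceModelSketch is hypothetical and stands in for ConcordanceModel with everything except the token bookkeeping left out.

```python
from itertools import chain


class ConcordanceModelSketch:
    """Hypothetical stand-in for ConcordanceModel; only token counts are shown."""

    def __init__(self):
        self.tokens = None
        self.n_tokens = 'n/a'
        self.n_types = 'n/a'

    def set_tokens(self, tokens):
        # Assumption: tokens is a list with one list of token strings
        # per document, or None when the widget has no data.
        self.tokens = tokens
        if tokens is None:
            self.n_tokens = self.n_types = 'n/a'
            return
        # Total number of tokens across all documents.
        self.n_tokens = sum(map(len, tokens))
        # Flatten first: the per-document lists themselves are unhashable.
        self.n_types = len(set(chain.from_iterable(tokens)))
```

With tokens [["the", "cat", "sat"], ["the", "dog"]] this gives n_tokens == 5 and n_types == 4. The widget could then read both counts from the model and never touch self.corpus.dictionary, so the default preprocessor would not be triggered a second time.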

ajdapretnar · 2017-05-25T09:03:05Z

Yaaay, after rebasing I have inadvertently lost @janezd's tests. 😞 How do I fix this, and why on Earth does Git override someone else's commit??? That's bs. 😠

ajdapretnar · 2017-05-25T09:51:10Z

@nikicc After a lot of painful edits, this is ready to merge. :)

ajdapretnar changed the title ~~Concordance tokens~~ Concordance: tokenization and small fixes Apr 7, 2017

nikicc suggested changes May 16, 2017

ajdapretnar added 3 commits May 25, 2017 10:57

Concordance performs internal tokenization.

16103b4

Concordance: Change input from Table to Corpus

a3549c4

Prevent crash upon removing data

809dc5b

ajdapretnar force-pushed the concordance-tokens branch from 017924d to 809dc5b May 25, 2017 08:58

Concordances: Add tests

136eb35

ajdapretnar mentioned this pull request May 25, 2017

[FIX] Concordance: selection settings #249

Merged

3 tasks

nikicc approved these changes May 25, 2017

nikicc merged commit e77756f into biolab:master May 26, 2017

ajdapretnar deleted the concordance-tokens branch June 29, 2017 07:33
