Replace CoreNLP with spaCy #3

Open
2 of 5 tasks
bermeitinger-b opened this issue Jul 26, 2017 · 4 comments

Comments

@bermeitinger-b
Member

bermeitinger-b commented Jul 26, 2017

Starting the CoreNLP server is a hassle for everyone: it is big, relatively slow, and clunky to use.
The alternatives are spaCy and NLTK.

First experiments show that NLTK's named entity recognition (NER) is not very accurate and that its sentence splitter is worse than CoreNLP's.
The next candidate is spaCy, which gives good results in simple experiments. Before we implement it, we have to check the following:

  • Are the sentence splitter and tokenizer better than CoreNLP's?
  • Do the licenses of spaCy and its models allow us to deploy them?
  • Is the NER better than CoreNLP's?
  • Can we achieve higher throughput?
  • Is it parallelizable? CoreNLP doesn't cope well with more than 2-4 simultaneous requests.
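To make the accuracy questions above measurable, the output of both tools could be scored against a small hand-annotated sample. The following is a minimal sketch, not part of PyCobalt: it assumes sentences and entities have already been converted to character-offset spans (`(start, end)` for sentences, `(start, end, label)` for entities), and the helper name `span_f1` is hypothetical. It uses exact-match scoring, so a boundary that is off by one character counts as an error.

```python
from typing import Hashable, Iterable, Tuple


def span_f1(
    gold: Iterable[Hashable], predicted: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between two collections of spans.

    A span is any hashable value, e.g. (start, end) character offsets for a
    sentence or (start, end, label) for a named entity. Duplicates are ignored.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: both entity boundaries are right,
# but the second entity is labelled GPE instead of ORG.
gold = [(0, 6, "PERSON"), (16, 22, "ORG")]
pred = [(0, 6, "PERSON"), (16, 22, "GPE")]
print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Scoring the spaCy output and the CoreNLP output against the same gold sample gives directly comparable numbers for the sentence splitter and the NER. The two tools use different entity label names (for example, CoreNLP's ORGANIZATION versus spaCy's ORG), so labels would need to be mapped to a common scheme first.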
@leonardossz

Are you referring to the CoreNLP server (https://stanfordnlp.github.io/CoreNLP/corenlp-server.html)?

@bermeitinger-b
Member Author

Yes. PyCobalt currently uses CoreNLP as its NLP tool for POS tagging and NER, and running it is clunky: starting the CoreNLP server is cumbersome even with Docker. With spaCy, all the code runs directly in Python. This benchmark shows that spaCy is considerably faster, while its NER is slightly worse.
Without CoreNLP, PyCobalt could be published as a "simple" Python module.
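The speed argument, together with the throughput and parallelism questions from the checklist, can be checked with a small harness that times any annotation function at different concurrency levels. This is a minimal sketch using only the standard library; `measure_throughput` and `fake_annotate` are hypothetical names, and the annotation function is assumed to stand in for either an HTTP request to the CoreNLP server or an in-process spaCy call. Threads suit the server case, because the client mostly waits on the network. For a CPU-bound in-process pipeline, Python's GIL limits what threads can gain, so a process pool would be the fairer comparison.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def measure_throughput(
    annotate: Callable[[str], object],
    texts: Sequence[str],
    workers: int = 1,
) -> float:
    """Return documents per second when `annotate` is applied to every text,
    with at most `workers` calls running concurrently."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the iterator waits for every call to finish.
        list(pool.map(annotate, texts))
    elapsed = time.perf_counter() - start
    return len(texts) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Hypothetical stand-in for a server round trip: 20 ms of waiting per document.
    def fake_annotate(text: str) -> list:
        time.sleep(0.02)
        return text.split()

    docs = ["Stanford is in California."] * 8
    for n in (1, 2, 4):
        rate = measure_throughput(fake_annotate, docs, n)
        print(n, "workers:", round(rate), "docs/s")
```

Pointing `annotate` at a real CoreNLP server and increasing `workers` should show whether throughput stops improving beyond the 2-4 simultaneous requests mentioned in the issue.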

@swathimithran

Great move! I think spaCy will be much better than CoreNLP. I am eagerly waiting for this update. Please let me know if you need any help.

@bermeitinger-b
Member Author

bermeitinger-b commented Jul 27, 2017 via email

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants