splitComplex option sometimes not working #214

Closed
RicBent opened this issue Mar 7, 2025 · 2 comments

@RicBent
Contributor

RicBent commented Mar 7, 2025

As far as I understand the flag, splitComplex is supposed to break down morphemes as far as possible, as documented in morphemes.txt. However, for some cases, this does not happen:

Sentence : 일이 잘 돼 가요?
Emitted Morpheme: 어요/EF
Definition: 어요 EF 26693 complex 어/EF 요/JX 0112

Minimal reproducible example in Python (but also happens in C++):

import kiwipiepy

k = kiwipiepy.Kiwi()
# With split_complex=True, complex morphemes such as 어요/EF are expected to be decomposed.
print(k.analyze('일이 잘 돼 가요?', split_complex=True))

Result:

[([Token(form='일', tag='NNG', start=0, len=1),
   Token(form='이', tag='JKS', start=1, len=1),
   Token(form='잘', tag='MAG', start=3, len=1),
   Token(form='되', tag='VV', start=5, len=1),
   Token(form='어', tag='EC', start=5, len=1),
   Token(form='가', tag='VX', start=7, len=1),
   Token(form='어요', tag='EF', start=7, len=2),
   Token(form='?', tag='SF', start=9, len=1)],
  -24.439472198486328)]
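
A rough way to spot this kind of leftover automatically is sketched below. It is only an illustration: `COMPLEX_FORMS`, `find_unsplit` and `check_sentence` are hypothetical names made up for this example, and the table holds just the one entry quoted in this report, whereas the real definitions live in morphemes.txt. It assumes `analyze` returns a list of `(tokens, score)` pairs and that each token exposes `form` and `tag`, as the result above suggests.

```python
from typing import Dict, Iterable, List, Tuple

Morph = Tuple[str, str]  # (form, tag)

# Hypothetical, hand-written table: only the single entry quoted in this issue.
# The full set of complex morphemes is defined in morphemes.txt, not here.
COMPLEX_FORMS: Dict[Morph, List[Morph]] = {
    ("어요", "EF"): [("어", "EF"), ("요", "JX")],
}


def find_unsplit(tokens: Iterable[Morph]) -> List[Morph]:
    """Return every (form, tag) pair that is still a known complex morpheme."""
    return [tok for tok in tokens if tok in COMPLEX_FORMS]


def check_sentence(text: str) -> List[Morph]:
    """Analyze `text` with split_complex=True and report unsplit morphemes."""
    # Imported lazily so find_unsplit can be used without kiwipiepy installed.
    import kiwipiepy

    kiwi = kiwipiepy.Kiwi()
    # Take the top-ranked analysis: a (tokens, score) pair.
    tokens, _score = kiwi.analyze(text, split_complex=True)[0]
    return find_unsplit((t.form, t.tag) for t in tokens)
```

Calling `check_sentence('일이 잘 돼 가요?')` on an affected version would be expected to report `('어요', 'EF')`, and an empty list once the split works as intended.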
@bab2min
Owner

bab2min commented Mar 8, 2025

@RicBent
Thank you for reporting the bug. As you mentioned, 어요/EF should be split into 어/EF and 요/JX when split_complex=True. I will examine the issue and fix it as soon as possible.

bab2min self-assigned this Mar 8, 2025
bab2min mentioned this issue Mar 8, 2025
bab2min closed this as completed in 7502470 Mar 8, 2025
bab2min added a commit that referenced this issue Mar 8, 2025
@RicBent
Contributor Author

RicBent commented Mar 9, 2025

Thank you a lot for fixing this so quickly!
