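Bug: when the same wrong character appears more than once, only the first occurrence is corrected

1. kenlm
I found that if a word from the confusion set appears more than once in a sentence, only its first occurrence is corrected.
For example, with this confusion set:
莪 我
祢 你
Example sentence:
from pycorrector import Corrector
s = "莪想说莪爱祢"
m_custom = Corrector(custom_confusion_path_or_dict="./my_custom_confusion.txt")
m_custom.correct(s)
Result:
{'source': '莪想说莪爱祢', 'target': '我想说莪爱你', 'errors': [('莪', '我', 0), ('祢', '你', 5)]}
The second "莪" was not replaced.

2. confusion pipeline
With the confusion pipeline, the same example gives a different result: neither "莪" is changed.
from pycorrector import ConfusionCorrector
confusion_dict = {"莪": "我", "祢": "你"}
model_confusion = ConfusionCorrector(custom_confusion_path_or_dict=confusion_dict)
model_confusion.correct("莪想说莪爱祢")
Result:
{'source': '莪想说莪爱祢', 'target': '莪想说莪爱你', 'errors': [('莪', '我', 0), ('祢', '你', 5)]}
The first "莪" is detected, but neither occurrence of "莪" is replaced.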
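For reference, the behavior one would expect here can be sketched in plain Python. This is only an illustration of replacing every occurrence and recording each position, not pycorrector's implementation and not the code from any fix; the helper name `replace_all_confusions` is hypothetical, and the output shape simply mirrors the result dictionaries shown above.

```python
import re


def replace_all_confusions(sentence, confusion):
    """Hypothetical helper: replace every occurrence of each confusion-set key.

    Returns a dict shaped like the results above, with one error tuple
    (wrong, right, start_index) per occurrence in the source sentence.
    """
    if not confusion:
        return {"source": sentence, "target": sentence, "errors": []}

    # Try longer keys first so a long variant is not shadowed by its prefix.
    keys = sorted(confusion, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    errors = []

    def _substitute(match):
        wrong = match.group(0)
        right = confusion[wrong]
        # match.start() is an offset into the original sentence.
        errors.append((wrong, right, match.start()))
        return right

    target = pattern.sub(_substitute, sentence)
    return {"source": sentence, "target": target, "errors": errors}


result = replace_all_confusions("莪想说莪爱祢", {"莪": "我", "祢": "你"})
print(result)
```

With the confusion set from this issue, both occurrences of "莪" (indices 0 and 3) and the single "祢" (index 5) are reported and replaced.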
I have fixed these two issues and will submit a PR later.
Fix issue shibing624#470: Improve how confusion words are located and/or replaced
ba0a2b2
Merge pull request #471 from treya-lin/dev
2499e79
Fix issue #470: Improve how confusion words are located and/or replaced
done