
Getting the best language match from predictions #6

Open
goodmami opened this issue Jul 6, 2017 · 4 comments

Comments

@goodmami
Member

goodmami commented Jul 6, 2017

Currently (when the code works), it only returns the True/False prediction and its score (as model.Distribution objects). More than one of the languages, or none of them, may be chosen as True. Instead, the prediction scores should be used to rank the candidate languages for a span, and the top-ranked language should be used as the final prediction.

@MackieBlackburn
Collaborator

Should this be done by modifying the test() function in main.py?

@goodmami
Member Author

Yeah I suppose. Here's the relevant code block in the test() function:

for dist in model.test(instances):
    # currently this just dumps each Distribution's attributes and classes
    print(dir(dist))
    print(dist.classes())

You could write a function to normalize the values (e.g. set the one with the highest confidence of a False value to 0, the one with the highest confidence of a True value to 1, and scale everything else accordingly). Then replace the code block above with something like:

ranked_list = normalize_probabilities(model.test(instances))
if len(ranked_list) != 0:
    top = ranked_list[0]
    ...
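
For concreteness, here is a minimal sketch of what such a helper could look like. It assumes each prediction can be reduced to a (language, score) pair, where the score is the model's confidence that the language is present; the name normalize_probabilities and that pairing are assumptions for illustration, not the existing models.py API:

def normalize_probabilities(predictions):
    """Rescale scores to [0, 1] and return (language, score) pairs ranked best-first.

    `predictions` is assumed to be an iterable of (language, score) pairs;
    the real Distribution objects would need to be unpacked into that form first.
    """
    pairs = list(predictions)
    if not pairs:
        return []
    scores = [score for _, score in pairs]
    low, high = min(scores), max(scores)
    span = (high - low) or 1.0  # avoid dividing by zero when all scores are equal
    ranked = [(lang, (score - low) / span) for lang, score in pairs]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

With something along those lines, ranked_list[0] would give the single best language for the span, which is what the final prediction needs.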

@MackieBlackburn
Collaborator

MackieBlackburn commented Jul 31, 2017

Reviewing the code, it looks like the model.test() function returns Distribution objects, each of which contains a dictionary mapping classes to probabilities. Each Distribution object also has a best_class field, so if I'm not mistaken this issue might be solved by doing:

for dist in model.test(instances):
    print(dir(dist))
    top = dist.best_class

I can put some normalization code into the Distribution class to make sure the probabilities are normalized.
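
As a rough illustration of that idea (a sketch only; the real Distribution class in models.py may store its probabilities differently, so the attribute names here are assumptions):

class Distribution:
    """Simplified stand-in for model.Distribution, used only to show normalization."""

    def __init__(self, class_probs):
        # class_probs: dict mapping each class (e.g. True/False) to a raw score
        self.class_probs = dict(class_probs)

    def normalize(self):
        """Rescale the stored scores in place so they sum to 1."""
        total = sum(self.class_probs.values())
        if total > 0:
            self.class_probs = {c: p / total for c, p in self.class_probs.items()}

    @property
    def best_class(self):
        """Return the class with the highest (normalized) probability."""
        return max(self.class_probs, key=self.class_probs.get)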

@goodmami
Member Author

Hmm, possibly. I didn't write models.py, but I thought it returned a distribution for each language, and that the classes were True and False. If so, a language with a high probability for False would simply get best_class == False for its own distribution, which doesn't tell you which language is the best match across the whole set.

I could be wrong though.
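
To make that concern concrete, here is a hypothetical set of per-language distributions for one span (the values are invented for illustration only):

# Hypothetical per-language class probabilities for a single span.
predictions = {
    'eng': {True: 0.35, False: 0.65},  # best_class would be False
    'spa': {True: 0.20, False: 0.80},  # best_class would be False
    'fra': {True: 0.10, False: 0.90},  # best_class would be False
}

# Ranking by P(True) still picks a single best language even though
# every distribution's best_class is False.
best_language = max(predictions, key=lambda lang: predictions[lang][True])
print(best_language)  # -> 'eng'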
