Better averaging of judges ratings #1281

Open
epugh opened this issue Mar 15, 2025 · 2 comments

epugh commented Mar 15, 2025

Is your feature request related to a problem? Please describe.
Today we just straight up average. @david-fisher doesn't love this!

Describe the solution you'd like
If I have three or more ratings, I take the three highest.

If all three agree, I am happy and use that value.

If all three do NOT agree, I take the min of the three.
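
A minimal sketch of the rule described above, in Python purely for illustration. The function name `combine_ratings` is hypothetical, and the behavior with fewer than three ratings is an assumption: the issue does not cover that case, so the sketch falls back to today's plain average.

```python
from statistics import mean


def combine_ratings(ratings):
    """Combine several judges' ratings for one query/document pair.

    Hypothetical helper that illustrates the proposed rule only.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if len(ratings) < 3:
        # Assumption: the proposal does not say what to do here,
        # so keep the current behavior (plain average).
        return mean(ratings)
    # Take the three highest ratings.
    top_three = sorted(ratings, reverse=True)[:3]
    # If all three agree, min() returns the shared value;
    # otherwise it returns the lowest of the three.
    return min(top_three)
```

For example, `combine_ratings([3, 3, 3, 1])` returns `3` because the top three agree, while `combine_ratings([3, 2, 3, 0])` returns `2`. Because the minimum of the three highest values is the third-highest value, the rule as written amounts to picking the third-highest rating.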

jvia commented Apr 7, 2025

Curious why this would be a better approach.

@david-fisher

When combining the judgments from multiple judges, we can make a variety of assumptions. If we assume:

  1. Judges are generally subject matter experts (SMEs)
  2. Judges are rational
  3. Judges have individual biases when interpreting relevance

Appealing to 1, we take the three highest of all the available judgments. Note that many additional judgments could all agree with these three. Here we are being optimistic.
If the judges do not agree, we appeal to 2 and 3. If a rational judge deems the candidate poorer than the other judge or judges do, we trust that judge and use their value. Here we are being both pessimistic about the individual judgment and optimistic about the direction of the bias. That is, we expect judges to overrate in the general case.

As cases grow, and the relative quality of the judges becomes clearer, alternative combinations could be used.
