Detect outliers using LOF #904
// sum neighbour set LRD scores
let lrd_scores: T = neighbours
    .iter()
    .map(|(neighbour, _)| local_reachability_densities[neighbour.data])
I suspect there's a way to calculate LRDs for a subset (the point's neighbours) as we go, but I can't get it right, so we're stuck with `local_reachability_densities` and a redundant `kth_dist` calculation for now. I think it would ultimately end up being two passes over the data anyway, but it might use less space.
The paper notes that calculating outliers robustly may require an "ensemble" over a range of nearest-neighbour values. Since the R*-tree is expensive to fill and can be re-used, it would make sense to have a "prepared outlier calculator" API that holds on to the tree, so that it can be re-used for each new value. All suggestions for an API gratefully received.
This is done!
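For illustration, the prepared-calculator idea discussed above could be sketched roughly as follows. This is not the API added by this PR: `PreparedLof`, `new`, and `outliers` are hypothetical names, and a brute-force sorted distance table stands in for the R*-tree so that the example has no dependencies. The point is only that the expensive neighbour structure is built once and then queried with several neighbour counts, using two passes (LRDs first, then LOF scores).

```rust
/// Hypothetical "prepared" LOF calculator (illustrative only, not this PR's API).
/// It caches, for every point, all other points sorted by distance; this is a
/// brute-force stand-in for the R*-tree so that the example is dependency-free.
pub struct PreparedLof {
    /// sorted[i] holds (distance, index) pairs for every other point, nearest first.
    sorted: Vec<Vec<(f64, usize)>>,
}

impl PreparedLof {
    /// Build the neighbour table once; this is the expensive, reusable step.
    pub fn new(points: &[[f64; 2]]) -> Self {
        let sorted: Vec<Vec<(f64, usize)>> = points
            .iter()
            .enumerate()
            .map(|(i, p)| {
                let mut dists: Vec<(f64, usize)> = points
                    .iter()
                    .enumerate()
                    .filter(|(j, _)| *j != i)
                    .map(|(j, q)| ((p[0] - q[0]).hypot(p[1] - q[1]), j))
                    .collect();
                dists.sort_by(|a, b| a.0.total_cmp(&b.0));
                dists
            })
            .collect();
        Self { sorted }
    }

    /// LOF score for every point using `k` neighbours.
    /// Simplifications: ties at the k-th distance are ignored, and duplicate
    /// points (zero distances) are not handled specially.
    /// Panics if `k` is 0 or not smaller than the number of points.
    pub fn outliers(&self, k: usize) -> Vec<f64> {
        let n = self.sorted.len();
        assert!(k >= 1 && k < n, "k must satisfy 1 <= k < number of points");
        let kth_dist = |i: usize| self.sorted[i][k - 1].0;

        // Pass 1: local reachability density (LRD) of every point.
        let lrd: Vec<f64> = (0..n)
            .map(|i| {
                let reach_sum: f64 = self.sorted[i][..k]
                    .iter()
                    .map(|&(d, j)| d.max(kth_dist(j)))
                    .sum();
                k as f64 / reach_sum
            })
            .collect();

        // Pass 2: mean LRD of the neighbours divided by the point's own LRD.
        (0..n)
            .map(|i| {
                let neighbour_lrd: f64 =
                    self.sorted[i][..k].iter().map(|&(_, j)| lrd[j]).sum();
                neighbour_lrd / (k as f64 * lrd[i])
            })
            .collect()
    }
}
```

With a table like this, an ensemble is just a loop over `k` values calling `outliers(k)` on the same prepared value. Scores near 1 indicate a point whose density matches its neighbours', and scores well above 1 indicate a likely outlier.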
Nice work! I have some questions, but all my suggestions were minor and should be considered "take it or leave it".
All review comments addressed, I think.
LGTM! Not sure if you were looking for more reviews, so I'll leave it to you if you want to merge it.
Thanks for the review! I think it's in good shape now but I'll leave it open for a couple of days / until @frewsxcv has kicked the tires on it.
ed2bddc to 0db5263
Outlier detection here uses the Local Outlier Factor (LOF), an unsupervised algorithm for detecting outliers in groups of points (in the abstract sense) by computing each point's local reachability density relative to a specified number of its nearest neighbours.
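For reference, the standard LOF quantities from the original paper (Breunig et al., 2000) can be written as below. Here $k$ is the neighbour count, $N_k(p)$ is the set of $k$ nearest neighbours of $p$, $d$ is the distance function, and $k\text{-distance}(o)$ is the distance from $o$ to its $k$-th nearest neighbour. The notation is the paper's, not necessarily the naming used in this PR's code.

```latex
\begin{aligned}
\text{reach-dist}_k(p, o) &= \max\{\, k\text{-distance}(o),\ d(p, o) \,\} \\
\text{lrd}_k(p) &= \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \text{reach-dist}_k(p, o) \right)^{-1} \\
\text{LOF}_k(p) &= \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\text{lrd}_k(o)}{\text{lrd}_k(p)}
\end{aligned}
```

A score close to 1 means the point is about as dense as its neighbours, while a score substantially greater than 1 marks it as a candidate outlier.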
bors r+
Build succeeded
CHANGES.md if knowledge of this change could be valuable to users.
Still draft due to:
perf: two of the LRD passes look very similar. I need to figure out whether this can be tidied up, or whether it's ultimately not going to affect perf much. Can't think of a particularly more elegant way to do this.