v0.5.1 #86
this bug was introduced in 19512eb
We should run the demo notebooks in GitHub Actions to automate these checks.
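A rough sketch of what such a check could call from a CI job. The `examples` directory name and the helper names are assumptions for illustration, not part of this repository; the script relies only on the `jupyter nbconvert --execute` command-line interface.

```python
import subprocess
from pathlib import Path


def build_command(notebook: Path, output_dir: str) -> list[str]:
    """Build the nbconvert command that executes one notebook."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--output-dir", output_dir,
        str(notebook),
    ]


def run_notebooks(directory: str = "examples", output_dir: str = "executed") -> list[Path]:
    """Execute every notebook under `directory` and return the ones that failed."""
    failed = []
    for notebook in sorted(Path(directory).glob("*.ipynb")):
        result = subprocess.run(build_command(notebook, output_dir))
        if result.returncode != 0:
            failed.append(notebook)
    return failed
```

A workflow step could call `run_notebooks()` and fail the job when the returned list is non-empty.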
I don't have the rights to edit your branch, but a (partial) fix is here. I forgot to replace
Looks like the issue is with defining positive pairs, because in this multilabel case, a positive pair has to be between a profile with a specific value in [image]. That's why the original code exploded the dataframe.
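To illustrate why exploding helps, here is a minimal sketch (not the copairs implementation) that finds, for each label value, the pairs of profiles sharing it. The column name `Metadata_target` and the toy data are assumptions made up for the example.

```python
from itertools import combinations

import pandas as pd

# Toy profiles: each row carries a list of labels (hypothetical data).
profiles = pd.DataFrame(
    {"Metadata_target": [["A", "B"], ["B"], ["C"], ["A", "C"]]}
)

# One row per (profile, label); the original row index is preserved.
exploded = profiles.explode("Metadata_target")

# For each label value, every pair of profiles that carry it is a positive pair.
positive_pairs = {}
for target, group in exploded.groupby("Metadata_target"):
    ids = sorted(int(i) for i in group.index)
    positive_pairs[target] = list(combinations(ids, 2))
```

Profiles 0 and 3 are paired under `A`, 0 and 1 under `B`, and 2 and 3 under `C`, so a profile with several labels can take part in positive pairs for each of them.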
Yes, I noticed that I misunderstood the problem in my original implementation. It's not far from the actual solution, but I need to ask you a couple of questions before I implement that one.
I am almost there. With this commit I get very similar results, but I still need to check where the differing numbers of positive and total pairs come from. I am applying some filter that I should not be applying (I am getting 74 final rows instead of 64).
Just for a sanity check, the data is exactly the same, right? @alxndrkalinin
Got it working thanks to John's help. I was not ordering the keys and counts properly. This commit already includes all the changes. I also fixed a comment in build_rank_list that misdescribed what np.lexsort does. The result is the same as in the picture above; the main difference is that I'm using Metadata_target_list as the sameby name (due to the issue I mentioned with pandas complaining when trying to group by a column that holds lists).
Note that I slightly changed the interface: find_pairs_multilabel now returns keys and counts in addition to pairs. When the multilabel_col is not in sameby, it returns only pairs.
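For reference, a small standalone illustration of how `np.lexsort` orders its keys (the arrays are made up and unrelated to `build_rank_list`): the last key passed is the primary sort key, and earlier keys break ties.

```python
import numpy as np

primary = np.array([1, 0, 1, 0])
secondary = np.array([3, 2, 1, 4])

# Keys are given in reverse priority: the LAST key (primary) is sorted first,
# and ties within it are broken by the preceding key (secondary).
order = np.lexsort((secondary, primary))

sorted_rows = list(zip(primary[order].tolist(), secondary[order].tolist()))
```

Here `order` indexes the rows so that `primary` is ascending and, within equal `primary` values, `secondary` is ascending.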
Opened a pull request on your branch to make it easier. I already rebased alxndrkalinin#3.
@afermg thanks, can you also check tests? I get this:
Got it working. It required some minor fixes and adjusting the multilabel test to filter the pairs only when
PR Overview
This PR bumps the version to 0.5.1 and fixes the failing phenotypic activity and consistency examples by addressing a bug in the multilabel AP calculation and updating supporting code and notebooks.
- Fix and refactor the multilabel pairing function in src/copairs/matching.py.
- Update test behavior in tests/test_matching_multilabel.py to account for the modified output structure.
- Adjust grouping, type casts, and citation details in map modules, pyproject.toml, and README.md, as well as minor updates in example notebooks.
Reviewed Changes
File | Description |
---|---|
src/copairs/matching.py | Refactored find_pairs_multilabel with SQL condition and variable updates. |
tests/test_matching_multilabel.py | Modified test to extract the expected tuple element for sameby multilabel cases. |
pyproject.toml | Updated version from 0.5.0 to 0.5.1. |
src/copairs/map/map.py | Adjusted type annotations and grouping in mean_average_precision. |
README.md | Updated citation information. |
src/copairs/compute.py | Wrapped AP score computation with a np.errstate context for better numerical handling. |
src/copairs/map/multilabel.py | Updated parameter annotations and ensured proper dtype conversions. |
Example notebooks | Updated execution counts and model IDs. |
Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.
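As a sketch of the `np.errstate` pattern mentioned for src/copairs/compute.py: the function below is a made-up stand-in, and the specific settings (`divide` and `invalid` set to `"ignore"`) are an assumption, since the overview does not say which ones the PR uses.

```python
import numpy as np


def safe_ratio(numerator: np.ndarray, denominator: np.ndarray) -> np.ndarray:
    """Divide elementwise without emitting floating-point warnings."""
    # Inside the context, division by zero and 0/0 do not raise RuntimeWarning;
    # the results are still inf and nan, so callers must handle them.
    with np.errstate(divide="ignore", invalid="ignore"):
        return numerator / denominator


ratios = safe_ratio(np.array([1.0, 0.0, 2.0]), np.array([0.0, 0.0, 4.0]))
```

The context manager only changes how floating-point errors are reported within the block; the previous settings are restored on exit.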
Comments suppressed due to low confidence (2)
src/copairs/matching.py:571
- When shared_item is 'NOT', the SQL condition becomes 'WHERE NOT len(shared_items) > 0'. To ensure correct SQL syntax and readability, consider wrapping the condition in parentheses as 'WHERE NOT (len(shared_items) > 0)'.
f" WHERE {shared_item} len(shared_items) > 0"
src/copairs/matching.py:558
- [nitpick] The variable 'shared_item' is used as a flag to toggle the NOT operator in the SQL condition. Renaming it to something like 'negation_operator' or 'not_operator' could make its purpose clearer.
shared_item = ""  # or shared_item = "NOT"
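Both suggestions can be sketched together. This is not the code in src/copairs/matching.py: it uses the standard-library `sqlite3` module as a stand-in engine, and an integer column `n_shared` (hypothetical) in place of `len(shared_items)`.

```python
import sqlite3


def build_where(negate: bool) -> str:
    """Return the WHERE clause, with an explicitly parenthesized condition."""
    not_operator = "NOT " if negate else ""
    return f" WHERE {not_operator}(n_shared > 0)"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (ix1 INTEGER, ix2 INTEGER, n_shared INTEGER)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?, ?)",
    [(0, 1, 1), (0, 2, 0), (1, 2, 2)],
)


def select_pairs(negate: bool) -> list:
    """Select pairs that share items, or pairs that share none when negated."""
    query = "SELECT ix1, ix2 FROM pairs" + build_where(negate) + " ORDER BY ix1, ix2"
    return conn.execute(query).fetchall()
```

With the parentheses in place, the scope of `NOT` is visible in the generated SQL without relying on operator precedence.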
Running both phenotypic activity and consistency examples fails in the new version.
In the activity example, it was due to the bug introduced in the earlier version, fixed in a8a35c5.
@afermg phenotypic consistency returns incorrect result for multilabel AP calculation, which in turn fails mAP calculation—can you take a look?