v0.5.1 #86

alxndrkalinin · 2025-02-25T23:33:25Z

Running both phenotypic activity and consistency examples fails in the new version.

In the activity example is was due to the bug introduced in the earlier version, fixed in a8a35c5.

@afermg phenotypic consistency returns incorrect result for multilabel AP calculation, which in turn fails mAP calculation—can you take a look?

this bug was introduced in 19512eb

alxndrkalinin · 2025-02-25T23:35:34Z

We should include running demo notebooks in github actions to automate checks.

afermg · 2025-02-26T00:34:11Z

It is this one, right?

afermg · 2025-02-26T02:32:27Z

I don't have the rights to edit your branch, but a (partial) fix is here. I forgot to replace find_pairs with find_pairs_multilabel. We still have fewer results, but I think that has to do with the fact that I am getting pairs on whether or not two lists intersect, not on the specific items of this intersection.

alxndrkalinin · 2025-02-26T04:33:03Z

Looks like the issue is with defining positive pairs, because find_pairs_multilabel returns indices of rows from original dataframe, which is not what we need.

In this multilabel case, a positive pair has to be between a profile with a specific value in Metadata_target and another profile with the same value in Metadata_target. Then, AP score is calculated per each unique combination of profile and target:

That's why the original code exploded the dataframe.

afermg · 2025-02-26T14:37:08Z

Yes, I noticed that I misunderstood the problem in my original implementation. It's not far from the actual solution but I need to ask you a couple of questions before I implement that one.

afermg · 2025-02-26T20:39:42Z

I am almost there: With this commit I get very similar results, I need to check where the different number of pos and total pairs are coming from. There is some filter that I am applying that I should not do (I am getting 74 final rows instead of 64).

target_aps.loc[target_aps["Metadata_target_list"] == "HTR3A"]

    average_precision  n_pos_pairs  n_total_pairs Metadata_target_list  
63           0.500000            1             42                 EGFR  
25           0.142857            1             39                HTR2A  
31           1.000000            1             39                CHRM5  
38           0.045455            1             39                HTR1E  
40           0.083333            1             39                HTR2B  
..                ...          ...            ...                  ...  
13           0.105128            2             42                 HRH1  
15           0.076923            1             41                 XIAP  
19           0.071429            1             41                CHRM2  
24           0.500000            1             42                CHRM4  
71           0.100000            1             42                CHRM3

afermg · 2025-02-26T20:59:36Z

Just for a sanity check, the data is exactly the same, right? @alxndrkalinin

afermg · 2025-02-26T23:25:31Z

Got it working thanks to John's help. I was not ordering properly the keys and counts. This commit already includes all the changes. I also fixed a comment in build_rank_list that was mistaken as to what the np.lexsort does. This is the same as the picture above, the main difference is that I'm using Metadata_target_list as sameby name (due to the issue I've mentioned with pandas complaining when trying to group with a column that is a list.

     Metadata_broad_sample                                    Metadata_target  \
52  BRD-A69636825-003-04-7        CACNA1C|CACNA1S|CACNA2D1|CACNG1|HTR3A|KCNA5   
32  BRD-A72309220-001-04-1  HTR1A|HTR1B|HTR1D|HTR1E|HTR1F|HTR2A|HTR2B|HTR2...   
37  BRD-A72309220-001-04-1  HTR1A|HTR1B|HTR1D|HTR1E|HTR1F|HTR2A|HTR2B|HTR2...   
39  BRD-A72309220-001-04-1  HTR1A|HTR1B|HTR1D|HTR1E|HTR1F|HTR2A|HTR2B|HTR2...   
41  BRD-A72309220-001-04-1  HTR1A|HTR1B|HTR1D|HTR1E|HTR1F|HTR2A|HTR2B|HTR2...   
..                     ...                                                ...   
16  BRD-K74363950-004-01-0                      CHRM1|CHRM2|CHRM3|CHRM4|CHRM5   
19  BRD-K74363950-004-01-0                      CHRM1|CHRM2|CHRM3|CHRM4|CHRM5   
22  BRD-K74363950-004-01-0                      CHRM1|CHRM2|CHRM3|CHRM4|CHRM5   
28  BRD-K76908866-001-07-6                                              ERBB2   
61  BRD-K81258678-001-01-0                                               RELA   

    average_precision  n_pos_pairs  n_total_pairs Metadata_target_list  
52           0.500000            1             42                HTR3A  
32           0.406071            4             42                HTR1A  
37           0.142857            1             39                HTR1B  
39           0.142857            1             39                HTR1D  
41           0.142857            1             39                HTR1E  
..                ...          ...            ...                  ...  
16           0.105128            2             42                CHRM3  
19           0.105128            2             42                CHRM4  
22           0.105128            2             42                CHRM5  
28           0.500000            1             42                ERBB2  
61           0.100000            1             42                 RELA  

[64 rows x 6 columns]

Do take notice that I slightly changed the interface, and now find_pairs_multilabel provides keys and counts in addition to pairs. When the multilabel_col is not in sameby it instead returns only pairs.

afermg · 2025-02-26T23:38:06Z

Opened a pull request on your branch to make it easier. I already rebased alxndrkalinin#3.

alxndrkalinin · 2025-02-27T00:24:21Z

@afermg thanks, can you also check tests? I get this:

FAILED tests/test_matching_multilabel.py::test_sameby - duckdb.duckdb.BinderException: Binder Error: Referenced column "shared_items" not found in FROM clause!
FAILED tests/test_matching_multilabel.py::test_sameby_other_cols - duckdb.duckdb.BinderException: Binder Error: Referenced column "shared_items" not found in FROM clause!
FAILED tests/test_matching_multilabel.py::test_only_sameby_many_cols - duckdb.duckdb.BinderException: Binder Error: Referenced column "shared_items" not found in FROM clause!

afermg · 2025-02-27T01:09:27Z

Got it working. It required some minor fixes and adjusting the multilabel test to filter the pairs only when multilabel_col is in sameby.

PR Overview

This PR bumps the version to 0.5.1 and fixes issues related to phenotypic activity and consistency examples by addressing a bug in multilabel AP calculation and updating various supporting codes and notebooks.

Fix and refactor the multilabel pairing function in src/copairs/matching.py.
Update test behavior in tests/test_matching_multilabel.py to account for the modified output structure.
Adjust grouping, type casts, and citation details in map modules, pyproject.toml, and README.md, as well as minor updates in example notebooks.

Reviewed Changes

File	Description
src/copairs/matching.py	Refactored find_pairs_multilabel with SQL condition and variable updates.
tests/test_matching_multilabel.py	Modified test to extract the expected tuple element for sameby multilabel cases.
pyproject.toml	Updated version from 0.5.0 to 0.5.1.
src/copairs/map/map.py	Adjusted type annotations and grouping in mean_average_precision.
README.md	Updated citation information.
src/copairs/compute.py	Wrapped AP score computation with a np.errstate context for better numerical handling.
src/copairs/map/multilabel.py	Updated parameter annotations and ensured proper dtype conversions.
Examples notebooks	Updated execution counts and model IDs.

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

src/copairs/matching.py:571

When shared_item is 'NOT', the SQL condition becomes 'WHERE NOT len(shared_items) > 0'. To ensure correct SQL syntax and readability, consider wrapping the condition in parentheses as 'WHERE NOT (len(shared_items) > 0)'.

f" WHERE {shared_item} len(shared_items) > 0"

src/copairs/matching.py:558

[nitpick] The variable 'shared_item' is used as a flag to toggle the NOT operator in the SQL condition. Renaming it to something like 'negation_operator' or 'not_operator' could make its purpose clearer.

shared_item = ""  // or shared_item = "NOT"

src/copairs/map/map.py

tests/test_matching_multilabel.py

fix(map): keep sameby as columns in map_scores df

a8a35c5

this bug was introduced in 19512eb

chore(compte): silence division warning for ap_contiguous

6fc68e2

afermg and others added 3 commits February 25, 2025 22:38

fix(multilabel): replace find_pairs->find_pairs_multilabel

134ccaf

fix(multilabel): extract pairs from duckdb response

d788023

chire: ruff format

69e647b

afermg and others added 5 commits February 26, 2025 23:56

fix(multilabel): Cover both sameby and diffby cases

76866d8

fix(multilabel): examples are reproduced; fixed docs in build_rank_lists

a2e600e

chore: ruff formatting

745e6b2

fix(multilabel): ensure tests pass; run ruff

b196775

chore: rerun examples with fixes

d08f90a

alxndrkalinin requested a review from johnarevalo February 27, 2025 05:11

alxndrkalinin added 2 commits February 27, 2025 00:12

chore: add Alan and Suganya to reference

ba7c871

chore: bump version to 0.5.1

d8d672b

alxndrkalinin changed the title ~~Fix examples in 0.5.0~~ v0.5.1 Feb 27, 2025

chore(map): add/correct type annotations

37bdb51

alxndrkalinin requested a review from Copilot March 2, 2025 18:23

Copilot AI reviewed Mar 2, 2025

View reviewed changes

johnarevalo requested changes Mar 3, 2025

View reviewed changes

src/copairs/map/map.py Show resolved Hide resolved

tests/test_matching_multilabel.py Show resolved Hide resolved

johnarevalo mentioned this pull request Mar 10, 2025

Unify find_pairs_multilabel output #89

Open

johnarevalo merged commit e494da1 into cytomining:main Mar 10, 2025
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.5.1 #86

v0.5.1 #86

alxndrkalinin commented Feb 25, 2025

alxndrkalinin commented Feb 25, 2025

afermg commented Feb 26, 2025

afermg commented Feb 26, 2025

alxndrkalinin commented Feb 26, 2025

afermg commented Feb 26, 2025

afermg commented Feb 26, 2025 •

edited

Loading

afermg commented Feb 26, 2025

afermg commented Feb 26, 2025 •

edited

Loading

afermg commented Feb 26, 2025

alxndrkalinin commented Feb 27, 2025

afermg commented Feb 27, 2025 •

edited

Loading

v0.5.1 #86

v0.5.1 #86

Conversation

alxndrkalinin commented Feb 25, 2025

alxndrkalinin commented Feb 25, 2025

afermg commented Feb 26, 2025

afermg commented Feb 26, 2025

alxndrkalinin commented Feb 26, 2025

afermg commented Feb 26, 2025

afermg commented Feb 26, 2025 • edited Loading

afermg commented Feb 26, 2025

afermg commented Feb 26, 2025 • edited Loading

afermg commented Feb 26, 2025

alxndrkalinin commented Feb 27, 2025

afermg commented Feb 27, 2025 • edited Loading

Choose a reason for hiding this comment

PR Overview

Reviewed Changes

afermg commented Feb 26, 2025 •

edited

Loading

afermg commented Feb 26, 2025 •

edited

Loading

afermg commented Feb 27, 2025 •

edited

Loading