Parameter Tuning Code Integration #193
base: master
Conversation
… what idea to use
config/egfr.yaml
Outdated
should I add the gold standard to this?
test/evaluate/test_evaluate.py
Outdated
edge_freq.to_csv(OUT_DIR + 'node-ensemble.csv', sep="\t", index=False)
assert filecmp.cmp(OUT_DIR + 'node-ensemble.csv', EXPECT_DIR + 'expected-node-ensemble.csv', shallow=False)

def test_precision_recal_curve_ensemble_nodes(self):
I don't know how else to test the ensemble node outputs other than looking at the image
# adds evaluation per algorithm per dataset-goldstandard pair
# evaluation per algorithm will not run unless ml include and ml aggregate_per_algorithm are set to true
aggregate_per_algorithm: true
# TODO: should we decouple parts of eval that involve ml
There is a lot of coupling happening now. I put in a solution for now in config.py, but is it worth separating the functions into their own true/false flags?
Maybe deal with some of the coupling by giving warnings and stopping the flow rather than silently shutting things off.
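The warn-and-stop suggestion could look something like this minimal sketch. The function name and boolean parameters are assumptions for illustration, not SPRAS's actual config API:

```python
def validate_ml_coupling(ml_include: bool, aggregate_per_algorithm: bool) -> None:
    # Hypothetical config check (names are assumptions): fail loudly when the
    # settings are inconsistent instead of silently disabling the feature.
    if aggregate_per_algorithm and not ml_include:
        raise ValueError(
            "aggregate_per_algorithm is true but ml include is false; "
            "enable ml or disable aggregate_per_algorithm"
        )
```

Raising at config-parse time surfaces the dependency to the user immediately, rather than leaving them to discover downstream that the per-algorithm evaluation never ran.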
spras/analysis/ml.py
Outdated
@@ -142,8 +142,14 @@ def pca(dataframe: pd.DataFrame, output_png: str, output_var: str, output_coord:
    if not isinstance(labels, bool):
        raise ValueError(f"labels={labels} must be True or False")

    scaler = StandardScaler()
    # TODO: MinMaxScaler changes nothing about the data
I don't know if it is better to use StandardScaler or MinMaxScaler for the binary data.
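A quick way to see the difference on binary data (the toy matrix below is a made-up example, not data from the PR): min-max scaling maps each column's min to 0 and max to 1, which is an identity map on 0/1 columns, while standardization re-centers and rescales them. The formulas below mirror what scikit-learn's `MinMaxScaler` and `StandardScaler` compute:

```python
import numpy as np

# Toy binary feature matrix (hypothetical, for illustration only).
X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]], dtype=float)

# MinMaxScaler's transform: (x - min) / (max - min) per column.
# On a column containing only 0s and 1s this leaves the data unchanged.
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# StandardScaler's transform: (x - mean) / std per column.
# Here each column is half 0s and half 1s, so values become +/-1.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.array_equal(X, X_minmax))  # True: min-max scaling is a no-op here
print(X_std[0])                     # [-1.  1.]
```

This matches the TODO above: MinMaxScaler changes nothing about 0/1 data, so the choice only matters if the features can take other values.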
spras/evaluation.py
Outdated
for file in file_paths:
    df = pd.read_table(file, sep="\t", header=0, usecols=["Node1", "Node2"])
    # TODO: do we want to include the pathways that are empty for evaluation / in the pr_df?
Currently the code will add a precision and recall for empty pathways. Is that something we shouldn't include?
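For context on what an empty pathway contributes: with no predicted nodes there are no true or false positives, so precision is 0/0. The sketch below (a hypothetical illustration, not the PR's code) shows the common convention of reporting 0.0 in that case, which is also what scikit-learn's `precision_score` does by default while emitting a warning:

```python
# Hypothetical scoring of one empty pathway against a gold standard.
gold_standard = {"A", "B", "C"}
predicted = set()  # an algorithm returned an empty pathway

tp = len(predicted & gold_standard)
# tp / len(predicted) is 0/0 for an empty pathway; fall back to 0.0 by convention.
precision = tp / len(predicted) if predicted else 0.0
recall = tp / len(gold_standard)
print(precision, recall)  # 0.0 0.0
```

Including these (0, 0) points drags down per-pathway averages, which may or may not be the desired signal when comparing parameter settings.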
Snakefile
Outdated
final_input.extend(expand('{out_dir}{sep}{dataset_gold_standard_pair}-eval{sep}precision-recall-per-pathway.png',out_dir=out_dir,sep=SEP,dataset_gold_standard_pair=dataset_gold_standard_pairs))
final_input.extend(expand('{out_dir}{sep}{dataset_gold_standard_pair}-eval{sep}precision-recall-pca-chosen-pathway.txt',out_dir=out_dir,sep=SEP,dataset_gold_standard_pair=dataset_gold_standard_pairs))
final_input.extend(expand('{out_dir}{sep}{dataset_gold_standard_pair}-eval{sep}precision-recall-curve-ensemble-nodes.png',out_dir=out_dir,sep=SEP,dataset_gold_standard_pair=dataset_gold_standard_pairs,algorithm_params=algorithms_with_params))
# TODO: should we provide the node ensemble frequencies
Since we are already calculating the node ensembles, should we give them to the user?
Note for self: Try seeing if it makes more sense to separate the parameter tuning and evaluation code into their own classes for organization.
I think I need to redo the idea for using ensembling for parameter tuning. Currently the code takes in the ensemble file and builds a node ensemble file by processing the ensemble of edge frequencies to identify the highest frequency associated with each node (y_scores). These are then compared to the node gold standard (y_trues) and plotted (with no point labels) on a graph showing the PRC between the nodes in the output pathways and the gold standard, as well as the average precision and recall over all the nodes. I don't think this can be used to do ensemble parameter tuning the way I originally intended, but it could still help parameter tuning by informing a better grid search, or be useful for evaluation in general.
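The node-ensemble construction described above can be sketched roughly as follows. The column names, toy frequencies, and gold-standard set are all assumptions for illustration, not the PR's exact code or data:

```python
import pandas as pd
from sklearn.metrics import precision_recall_curve

# Toy edge-frequency ensemble (hypothetical values, not real output).
edge_freq = pd.DataFrame({
    "Node1": ["A", "A", "B"],
    "Node2": ["B", "C", "C"],
    "Frequency": [0.9, 0.4, 0.1],
})

# Assign each node the highest frequency of any edge touching it (y_scores).
node_freq = (
    pd.concat([
        edge_freq[["Node1", "Frequency"]].rename(columns={"Node1": "Node"}),
        edge_freq[["Node2", "Frequency"]].rename(columns={"Node2": "Node"}),
    ])
    .groupby("Node")["Frequency"]
    .max()
)

# Compare against the node gold standard (y_trues) and compute the PRC.
gold_nodes = {"A", "B"}
y_true = [int(node in gold_nodes) for node in node_freq.index]
precision, recall, thresholds = precision_recall_curve(y_true, node_freq.to_numpy())
```

Because the node score is a max over incident edges, a node kept by even one high-frequency edge scores high, which is part of why this ranks nodes for evaluation well but does not directly select a parameter setting.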
I think I need to redo the idea for using ensembling for parameter tuning.
Is there a way to break this huge pull request into smaller parts? Currently it combines evaluation, parameter tuning, ensembling, etc. That is making it hard to coordinate all of those decisions and also review the big pull request. I'm wondering if we can start small and merge in a subset of the file changes to lock in some progress, like the evaluation code.
If it is all interdependent, we'll have to deal with that and proceed as is.
input/gs-egfr.txt
Outdated
Can you please add some notes about how you prepared this file and link it back to the related issue? We'll want to be able to track that later.
…ameter tuning methods
No description provided.