feat/primekg loader #49

awmulyadi · 2025-01-08T11:45:04Z

For authors

Description

I have developed code for loading a series of biomedical knowledge-graph datasets: PrimeKG, StarkQA-PrimeKG, and BioBridge-PrimeKG. The key updates are as follows:

Source Code: In aiagents4pharma/talk2knowledgegraphs/datasets, I added an abstract Dataset class and the implementation classes PrimeKG, StarkQAPrimeKG, and BioBridgePrimeKG, which load the corresponding datasets.
Pytest: I added test cases for the above classes under aiagents4pharma/talk2knowledgegraphs/tests/*.
Tutorial Notebooks: I added interactive notebooks showcasing the use cases of the classes under docs/notebooks/talk2knowledgegraphs/*.
Documentation: Finally, I updated the related MkDocs documentation, which is available in docs/talk2knowledgegraphs/*.
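
For readers unfamiliar with the pattern, the abstract-class-plus-implementations layout described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the method names `setup` and `load_data`, the `ToyKG` class, and its toy records are hypothetical and are not taken from the PR's actual code.

```python
from abc import ABC, abstractmethod


class Dataset(ABC):
    """Abstract loader interface (method names are assumed for illustration)."""

    @abstractmethod
    def setup(self) -> None:
        """Prepare whatever the loader needs, e.g. a local cache directory."""

    @abstractmethod
    def load_data(self) -> None:
        """Read the dataset into memory."""


class ToyKG(Dataset):
    """Hypothetical stand-in for a concrete loader such as PrimeKG."""

    def __init__(self) -> None:
        self.nodes = None
        self.edges = None
        self.ready = False

    def setup(self) -> None:
        # A real loader might create directories or check for cached files here.
        self.ready = True

    def load_data(self) -> None:
        if not self.ready:
            raise RuntimeError("call setup() before load_data()")
        # Toy records; a real loader would download or read files instead.
        self.nodes = [
            {"node_index": 0, "node_name": "aspirin"},
            {"node_index": 1, "node_name": "headache"},
        ]
        self.edges = [{"head_index": 0, "tail_index": 1, "relation": "indication"}]


kg = ToyKG()
kg.setup()
kg.load_data()
```

Because `Dataset` declares abstract methods, it cannot be instantiated directly, so every concrete loader is forced to provide the same entry points.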

Fixes #32

Type of change

Please delete options that are not relevant.

New feature (non-breaking change which adds functionality)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests you conducted to verify your changes. These may involve creating new test scripts or updating existing ones.

Added new test(s) in the tests folder
Added new function(s) to an existing test(s) (i.e., aiagents4pharma/talk2knowledgegraphs/tests/*)

Checklist

My code follows the style guidelines mentioned in the Code/DevOps guides
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (e.g. MkDocs)
My changes generate no new warnings
I have added or updated tests (in the tests folder) that prove my fix is effective or that my feature works
New and existing tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules

For reviewers

Checklist pre-approval

Is there enough documentation?
If a new feature has been added, or a bug fixed, has a test been added to confirm good behavior?
Does the test(s) successfully test edge/corner cases?
Does the PR pass the tests? (if the repository has continuous integration)

Checklist post-approval

Does this PR merge develop into main? If so, please make sure to add a prefix (feat/fix/chore) and/or a suffix BREAKING CHANGE (if it's a major release) to your commit message.
Does this PR close an issue? If so, please make sure to descriptively close this issue when the PR is merged.

Checklist post-merge

When you approve of the PR, merge and close it (Read this article to know about different merge methods on GitHub)
Did this PR merge develop into main, and is it supposed to run an automated release workflow (if applicable)? If so, please make sure to check under the "Actions" tab to see if the workflow has been initiated, and return later to verify that it has completed successfully.

dmccloskey

Nice work 💪. The notebooks were very easy to follow and demonstrate the functionality well, which will hopefully come in handy for our hackathon participants.

Please take a look at my questions/comments and let me know if anything needs to be clarified. In principle, I think everything looks good, provided the assumptions I note are correct 😉.

aiagents4pharma/talk2knowledgegraphs/datasets/biobridge_primekg.py

aiagents4pharma/talk2knowledgegraphs/pyproject.toml

aiagents4pharma/talk2knowledgegraphs/tests/test_biobridge_primekg_loader.py

dmccloskey · 2025-01-08T17:55:41Z

docs/notebooks/talk2knowledgegraphs/tutorial_starkqa_primekg_loader.ipynb

Are there any pre-trained embeddings for stark?

I have rechecked the Stark repository, and they do provide query and node embeddings generated with 'text-embedding-ada-002'. However, there is no information on edge embeddings. I have updated the StarkQAPrimeKG class to include the pre-computed query and node embeddings. We would still need to embed the edges (~8M) using 'text-embedding-ada-002' in the next step (KG construction). Alternatively, I am preparing code to embed both nodes and edges using Ollama's nomic-embed-text, to be included in the next PR.

Ref:
https://github.com/snap-stanford/stark/blob/main/emb_download.py

I see. I do not believe they used edge embeddings in their manuscript (?).

I wouldn't worry about embedding the edges using text-embedding-ada-002 if time is of the essence. It does not seem that it would help us reproduce their work. Please correct me if I am wrong.

Correct, in their manuscript and code repository, they didn't mention edge/relation embeddings. Thus, the methods used in their benchmark most likely didn't incorporate these features.
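
To illustrate why query and node embeddings alone can be enough for a retrieval-style benchmark, here is a minimal sketch that ranks nodes against a query by cosine similarity. It assumes, purely for illustration, that the embedding dictionaries map an integer ID to a plain list of floats; the actual layout of `query_emb_dict` and `node_emb_dict` in this PR may differ, the `rank_nodes` helper is hypothetical, and the vectors below are toy values rather than real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_nodes(query_id, query_emb_dict, node_emb_dict, top_k=2):
    """Return the top_k (node_id, score) pairs most similar to the query."""
    query_vec = query_emb_dict[query_id]
    scored = [
        (node_id, cosine_similarity(query_vec, node_vec))
        for node_id, node_vec in node_emb_dict.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data: one query and three candidate nodes (assumed ID -> vector layout).
query_emb_dict = {0: [1.0, 0.0, 0.0]}
node_emb_dict = {
    10: [0.9, 0.1, 0.0],
    11: [0.0, 1.0, 0.0],
    12: [0.5, 0.5, 0.0],
}

top = rank_nodes(0, query_emb_dict, node_emb_dict)
```

No edge embeddings appear anywhere in this kind of ranking, which is consistent with the observation above that they are probably not needed to reproduce the benchmark.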

dmccloskey

Ready to be merged whenever you are ready.

dmccloskey · 2025-01-09T15:59:33Z

aiagents4pharma/talk2knowledgegraphs/datasets/starkqa_primekg.py

+        self.starkqa_node_info: dict = None
+        self.query_emb_dict: dict = None
+        self.node_emb_dict: dict = None
+


Fine to include the relation embeddings in a subsequent PR as mentioned in another comment.

dmccloskey · 2025-01-09T16:00:05Z

aiagents4pharma/talk2knowledgegraphs/datasets/starkqa_primekg.py

+
+        return starkqa, starkqa_split_idx, starkqa_node_info
+
+    def _load_stark_embeddings(self) -> tuple:


github-actions · 2025-01-09T16:13:08Z

🎉 This PR is included in version 1.5.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

awmulyadi added 19 commits December 10, 2024 12:42

feat: add primekg loader tool

2e23586

chpre: fix comments to load dataframes

114d1f5

fix: change file_id of dataverse

e2f9298

feat: add starkqa-primekg loader tool

989d2c4

chore: merge changes from main

1c01cea

feat: add tutorials of talk2knowledgegraphs

95c2af0

fix: execute a notebook

2bae5b3

feat: include processed node information of StarkQA-PrimeKG

2c82016

fix: update the approach to load PrimeKG

d931aab

fix: reorganize the dataset loaders

319beca

feat: add biobridge-primekg loader

27deb4e

chore: merge changes from main

3e23007

feat: update mkdocs for talk2knowledgegraphs

29df106

feat: update gha workflows

783bc33

fix: modify workflows

542b38d

fix: remove argument for negative sampling

4b2dd9b

fix: remove check-path workflows

68a434b

fix: modify coverage

ac06b05

fix: modify coverage over multi-os

9e15375

awmulyadi self-assigned this Jan 8, 2025

awmulyadi requested a review from dmccloskey January 8, 2025 13:27

dmccloskey approved these changes Jan 8, 2025

awmulyadi added 6 commits January 9, 2025 11:47

feat: add embeddings for starkqa, node info for biobridge

40cce60

chore: merge changes from main

f474202

fix: change torch version

9bf8730

fix: update workflow for installing reqs

ad48b53

fix: update biobridge

d7e6eb9

fix: remove bash on workflows

632b625

awmulyadi requested a review from dmccloskey January 9, 2025 12:16

dmccloskey approved these changes Jan 9, 2025

awmulyadi merged commit 5320b55 into main Jan 9, 2025
6 checks passed

awmulyadi deleted the feat/primekg-loader branch January 9, 2025 16:12

github-actions bot added the released label Jan 9, 2025
