-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat/primekg loader #49
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work 💪. The notebooks were very easy to follow and demonstrate the functionality well which will hopefully come in handy for our hackathon participants.
Please take a look at my questions/comments and let me know if there is anything that needs to be clarified. In principle, I think everything looks good pending my assumptions I note are correct 😉.
aiagents4pharma/talk2knowledgegraphs/tests/test_biobridge_primekg_loader.py
Show resolved
Hide resolved
aiagents4pharma/talk2knowledgegraphs/tests/test_biobridge_primekg_loader.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any pre-trained embeddings for stark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have rechecked the repository of Stark about this, and apparently they provided the query and node embeddings using 'text-embedding-ada-002'. However, there is no information on edge embeddings. I have updated the class of StarkQAPrimeKG by including pre-loaded embeddings of queries and nodes. However, we also need to embed the edges (~8M) using the 'text-embedding-ada-002' in the next step (KG construction). Alternatively, I am preparing codes to embed both nodes and edges using Ollama's nomic-embed-text to be included in the next PR.
Ref:
https://github.com/snap-stanford/stark/blob/main/emb_download.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I do not believe they used edge embeddings in their manuscript (?).
I wouldn't worry about embedding the edges using text-embedding-ada-002
if time is of the essence. It does not seem that it would help us reproduce their work. Please correct me if I am wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, in their manuscript and code repository, they didn't mention edge/relation embeddings. Thus, the methods used in their benchmark most likely didn't incorporate these features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready to be merged whenever you are ready.
self.starkqa_node_info: dict = None | ||
self.query_emb_dict: dict = None | ||
self.node_emb_dict: dict = None | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine to include the relation embeddings in a subsequent PR as mentioned in another comment.
|
||
return starkqa, starkqa_split_idx, starkqa_node_info | ||
|
||
def _load_stark_embeddings(self) -> tuple: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
🎉 This PR is included in version 1.5.0 🎉 The release is available on GitHub release Your semantic-release bot 📦🚀 |
For authors
Description
Please:
I have developed a set of codes for loading a series of biomedical knowledge graph-based datasets, i.e., PrimeKG, StarkQA-PrimeKG, and BioBridge-PrimeKG. Following are the detailed key updates:
aiagents4pharma/talk2knowledgegraphs/datasets
, I included an abstract class ofDataset
followed by a set of implementation classes ofPrimeKG
,StarkQAPrimeKG
, andBioBridgePrimeKG
for loading the corresponding datasets.aiagents4pharma/talk2knowledgegraphs/tests/*
docs/notebooks/talk2knowledgegraphs/*
.docs/talk2knowledgegraphs/*
.Fixes #32
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests you conducted to verify your changes. These may involve creating new test scripts or updating existing ones.
tests
folderaiagents4pharma/talk2knowledgegraphs/tests/*
)Checklist
tests
folder) that prove my fix is effective or that my feature worksFor reviewers
Checklist pre-approval
Checklist post-approval
develop
intomain
? If so, please make sure to add a prefix (feat/fix/chore) and/or a suffix BREAKING CHANGE (if it's a major release) to your commit message.Checklist post-merge
develop
intomain
and is it suppose to run an automated release workflow (if applicable)? If so, please make sure to check under the "Actions" tab to see if the workflow has been initiated, and return later to verify that it has completed successfully.