feat: embeddings #58

Merged
merged 17 commits into from
Jan 14, 2025

Conversation

awmulyadi
Contributor

@awmulyadi awmulyadi commented Jan 13, 2025

For authors

Description

I have added code for computing embeddings, using textual and protein sequences as two example modalities, on the StarkQA-PrimeKG and BioBridge-PrimeKG datasets.

The key updates are:

  1. Source Code: In aiagents4pharma/talk2knowledgegraphs/utils/embeddings, I added an abstract class, Embeddings, and two implementation classes, EmbeddingWithSentenceTransformer and EmbeddingWithHuggingFace.
  2. Pytest: I added test cases for the above classes under aiagents4pharma/talk2knowledgegraphs/tests/*.
  3. Tutorial Notebooks: I added an interactive notebook showcasing use cases of the embedding classes under docs/notebooks/talk2knowledgegraphs/*. I also included a tutorial on evaluating StarkQA-PrimeKG with a Vector Similarity Search (VSS) model using textual embeddings, and a dedicated tutorial on multimodal embedding alignment using linear regression and a neural network as examples.
  4. Documentation: I updated the related MkDocs documentation, which is available in docs/talk2knowledgegraphs/*.
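
For readers unfamiliar with the pattern, the abstract-class-plus-implementations layout and the vector similarity search described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the method names `embed_documents` and `embed_query`, the `ToyHashEmbedding` class, and the `rank_by_similarity` helper are all assumptions made for the example, and the toy class hashes character trigrams instead of calling a real sentence-transformers or Hugging Face model.

```python
from abc import ABC, abstractmethod
import hashlib
import math


class Embeddings(ABC):
    """Abstract interface. Method names are assumed, not taken from the PR."""

    @abstractmethod
    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        """Embed a batch of input strings."""

    @abstractmethod
    def embed_query(self, text: str) -> list[float]:
        """Embed a single query string."""


class ToyHashEmbedding(Embeddings):
    """Hypothetical stand-in for a real model: hashed character trigrams."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed_query(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        padded = f"  {text.lower()}  "
        # Count each trigram in a bucket chosen by a stable hash.
        for i in range(len(padded) - 2):
            digest = hashlib.sha256(padded[i:i + 3].encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
        # Scale to unit length so vectors are comparable.
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(t) for t in texts]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_by_similarity(query_vec, doc_vecs, top_k=3):
    """Return indices of the top_k documents, most similar first."""
    scores = [cosine(query_vec, dv) for dv in doc_vecs]
    order = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


# Example: a text description, another description, and a protein-like sequence.
docs = [
    "aspirin inhibits cyclooxygenase",
    "insulin regulates blood glucose",
    "MKTAYIAKQRQISFVKSHFSRQ",
]
model = ToyHashEmbedding()
doc_vecs = model.embed_documents(docs)
top = rank_by_similarity(model.embed_query("insulin regulates blood glucose"), doc_vecs, top_k=1)
```

Keeping the ranking function independent of any concrete model means that swapping the toy class for a real embedding backend would only change how `doc_vecs` and the query vector are produced.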

Note: An issue in the sentence-transformers library causes tests to fail on macos-latest. For the time being, I run the tests on macOS-13 instead and have opened a separate issue, #59, to track this.

Fixes #33

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests you conducted to verify your changes. These may involve creating new test scripts or updating existing ones.

  • Added new test(s) in the tests folder
  • Added new function(s) to existing test(s) (i.e., aiagents4pharma/talk2knowledgegraphs/tests/*)

Checklist

  • My code follows the style guidelines mentioned in the Code/DevOps guides
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (e.g. MkDocs)
  • My changes generate no new warnings
  • I have added or updated tests (in the tests folder) that prove my fix is effective or that my feature works
  • New and existing tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

For reviewers

Checklist pre-approval

  • Is there enough documentation?
  • If a new feature has been added, or a bug fixed, has a test been added to confirm good behavior?
  • Does the test(s) successfully test edge/corner cases?
  • Does the PR pass the tests? (if the repository has continuous integration)

Checklist post-approval

  • Does this PR merge develop into main? If so, please make sure to add a prefix (feat/fix/chore) and/or a suffix BREAKING CHANGE (if it's a major release) to your commit message.
  • Does this PR close an issue? If so, please make sure to descriptively close this issue when the PR is merged.

Checklist post-merge

  • When you approve of the PR, merge and close it (Read this article to know about different merge methods on GitHub)
  • Did this PR merge develop into main, and is it supposed to run an automated release workflow (if applicable)? If so, please make sure to check under the "Actions" tab to see if the workflow has been initiated, and return later to verify that it has completed successfully.

@awmulyadi awmulyadi requested a review from dmccloskey January 14, 2025 09:37
Member

@dmccloskey dmccloskey left a comment

Please double check the notebooks for typos. The corrections can be added to a separate PR.

@awmulyadi awmulyadi merged commit 589a878 into main Jan 14, 2025
6 checks passed
Contributor

🎉 This PR is included in version 1.6.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@awmulyadi awmulyadi deleted the feat/textual-embeddings branch January 20, 2025 08:58
Development

Successfully merging this pull request may close these issues.

Transfer over text embeddings workflows Talk2KnowledgeGraphs
2 participants