Under development!
IbisGraph brings graph processing capabilities to your data warehouse or lakehouse by implementing the Pregel computation model on top of Ibis. This means you can run graph analytics directly where your data lives, without moving it to a specialized graph database or an in-memory system.
Key benefits:
- Process graph data in your existing data infrastructure
- Scale with your warehouse/lake resources
- Maintain data governance and security
- Leverage SQL engine optimizations
Supported backends include:
- DuckDB
- PostgreSQL
- SQLite
- Snowflake
- BigQuery
- Apache Spark
- And many others supported by Ibis
Install IbisGraph using pip:
pip install ibisgraph
You'll also need to install the appropriate Ibis backend. For example:
# For DuckDB
pip install "ibis-framework[duckdb]"
# For PostgreSQL
pip install "ibis-framework[postgres]"
# For Snowflake
pip install "ibis-framework[snowflake]"
Basic usage:
import ibis
import ibisgraph as ig
# Connect to your database
conn = ibis.duckdb.connect()
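# Create a graph
# nodes_table and edges_table are Ibis table expressions you have already
# loaded from your backend, for example with conn.table(...)
graph = ig.Graph(nodes_table, edges_table)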
# Run algorithms
pagerank = ig.centrality.pagerank(graph)
communities = ig.clustering.label_propagation(graph)
similarities = ig.similarity.node_similarity(graph)
For more detailed examples, check our documentation.
Is it a replacement for graph libraries like NetworkX or igraph?
- No, IbisGraph is not a replacement for traditional graph libraries. While it implements graph algorithms using Pregel (which can be expressed in SQL), it will generally be slower than specialized implementations. Its value comes from being able to process graph data where it already lives.
Will it work on Databricks, Snowflake, PostgreSQL, etc.?
- Yes. IbisGraph works with any backend supported by Ibis.
Why Pregel?
- Pregel operations can be naturally expressed using SQL operations, making it ideal for implementing graph algorithms in data warehouses and lakes.
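To make the Pregel-to-SQL mapping concrete, here is a minimal sketch that uses only Python's built-in `sqlite3` module. It is not IbisGraph's implementation, and the table names (`edges`, `state`) and the function `hop_distances` are hypothetical. Each superstep sends messages along edges with a JOIN, combines them per target vertex with a GROUP BY, and updates the vertex state until nothing changes.

```python
import sqlite3


def hop_distances(edges, source):
    """Hop distances from `source`, computed with Pregel-style supersteps in SQL.

    Illustrative only: the schema and names are assumptions, not IbisGraph's API.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    con.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    # Vertex state: best known distance from the source.
    con.execute("CREATE TABLE state (id INTEGER PRIMARY KEY, dist INTEGER)")
    con.execute("INSERT INTO state VALUES (?, 0)", (source,))

    while True:
        # One superstep:
        #   1. send dist + 1 along every out-edge (JOIN),
        #   2. combine messages per target vertex with MIN (GROUP BY),
        #   3. keep only messages that improve the target's state.
        improved = con.execute(
            """
            SELECT m.id, m.dist
            FROM (
                SELECT e.dst AS id, MIN(s.dist + 1) AS dist
                FROM edges AS e
                JOIN state AS s ON s.id = e.src
                GROUP BY e.dst
            ) AS m
            LEFT JOIN state AS t ON t.id = m.id
            WHERE t.id IS NULL OR m.dist < t.dist
            """
        ).fetchall()
        if not improved:
            break  # no vertex changed, so the computation has converged
        con.executemany("INSERT OR REPLACE INTO state VALUES (?, ?)", improved)

    result = dict(con.execute("SELECT id, dist FROM state"))
    con.close()
    return result


edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 1), (5, 1)]
distances = hop_distances(edges, source=1)
```

Vertices that are unreachable from the source never receive a message, so they do not appear in the result. The same join, aggregate, update loop is the general shape that makes Pregel a reasonable fit for SQL engines.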
Is it better than GraphFrames for PySpark users?
- As a GraphFrames committer, I can say that GraphFrames algorithms are generally better optimized for Apache Spark. However, IbisGraph provides a more Pythonic API and doesn't require JVM configuration.
When should I use IbisGraph?
- Use IbisGraph when you need to process connected data stored in a database, data lake, or warehouse without moving it out. Algorithms may run slower than in specialized tools like Neo4j; the main advantage is processing data in place.
Implemented:
- Graph abstraction using Ibis Tables
- Degree calculations (in/out/total)
- Jaccard similarity index
- Pregel computation framework
- PageRank algorithm
- Shortest Paths
- Label Propagation
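For readers unfamiliar with two of the simpler items above, the following plain-Python sketch shows what degree counts and the Jaccard similarity index compute on an edge list. It does not use IbisGraph, the function names are hypothetical, and it assumes a directed graph where similarity is measured over out-neighbors; IbisGraph's own definitions may differ.

```python
from collections import defaultdict


def degrees(edges):
    """Return in/out/total degree per node for a directed edge list."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    nodes = set(in_deg) | set(out_deg)
    return {
        n: {
            "in": in_deg.get(n, 0),
            "out": out_deg.get(n, 0),
            "total": in_deg.get(n, 0) + out_deg.get(n, 0),
        }
        for n in nodes
    }


def jaccard(edges, a, b):
    """Jaccard index of the out-neighbor sets of nodes a and b.

    Defined as |N(a) & N(b)| / |N(a) | N(b)|; returns 0.0 when both sets are empty.
    """
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
    na = neighbors.get(a, set())
    nb = neighbors.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


edges = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "z")]
deg = degrees(edges)
sim = jaccard(edges, "a", "b")  # a and b share one of three distinct neighbors
```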
Coming soon:
- Weakly Connected Components
- Strongly Connected Components
- Attribute Propagation
- Random Walks
- Node2vec
- Gremlin support
- OpenCypher support
- The feature you will suggest
We welcome contributions! Here's how to get started:
- Clone the repository:
git clone https://github.com/SemyonSinchenko/ibisgraph.git
cd ibisgraph
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
- Install development dependencies:
uv sync --all-groups
We use:
- Pick an Issue
  - Check existing issues or create a new one
  - Comment on the issue you want to work on
- Fork & Branch
  - Fork the repository
  - Create a feature branch
- Development
  - Write tests first
  - Implement your changes
  - Run tests:
    pytest
  - Run linter:
    ruff check .
  - Format code:
    ruff format .
- Submit PR
  - Create a Pull Request
  - Wait for review
  - Address feedback
IbisGraph follows the Benevolent Dictator governance model. While we welcome all contributions, final decisions rest with the project maintainer to ensure consistent direction.