Tiny GraphRAG
A tiny 1000 line implementation of the GraphRAG algorithm using only language models that run locally. This implementation is designed to be easy to be easily understandable, hackable and forkable and not dependent on any framework.
Notably it does not use OpenAI or any commercial LLM providers and can be run locally.
| Component | Implementation |
|---|---|
| Vector Database | pgvector |
| Embedding Model | sentence-transformers |
| Language Model | meta-llama/Llama-3.2-3B |
| Entity Extractor | GLiNER |
| Relation Extract | GLiREL |
| Graph Database | networkx |
| Inference | llama-cpp |
Usage
First clone the repository:
cd tiny-graphrag
To install the dependencies run:
To setup the local Postgres vector database in a docker container run:
Then create the database tables with:
To build the graph and embeddings for a document run:
This will run for some time and when complete with the metadata.
Graph saved to: graphs/1_graph.pkl
Then we can query the database with the following modes:
- Local Search: Uses the graph structure to find relevant entities and their relationships, providing context-aware results based on the document's knowledge graph.
- Global Search: Analyzes document communities and their summaries to provide high-level insights and thematic understanding.
- Naive RAG: Combines vector similarity and keyword matching using Reciprocal Rank Fusion to find relevant text chunks directly.
Choose the search mode based on your needs:
- Use
localfor specific factual queries where relationships between entities matter - Use
globalfor thematic questions or high-level document understanding - Use
naivefor simple queries where direct text chunk matching is sufficient
poetry run graphrag query local --graph graphs/1_graph.pkl "What did Barack Obama study at Columbia University?"
# Global search
poetry run graphrag query global --graph graphs/1_graph.pkl --doc-id 1 "What are the main themes of this document?"
# Naive RAG
poetry run graphrag query naive "What efforts did Barack Obama due to combat climate change?"
Performance
The pipeline performance can be improved by using either Apple Silicon or a Nvidia GPU to drive the embedding and language model inference. This will fallback to CPU if no other options are available but the addition of dedicated hardware will improve the performance.
Apple Silicon
To run with acceleration you'll need the apple extras with thinc-apple-ops
installed.
Nvidia GPU
To run with acceleration you'll need the nvidia extras to run transformers
and flash-attn.
Library
You can also use this library directly.
# Initialize the database first
engine = init_db("postgresql://admin:admin@localhost:5432/tiny-graphrag")
# Process and store a document
doc_id, graph_path = store_document(
filepath="data/Barack_Obama.txt",
title="Barack Obama Wikipedia",
engine=engine
)
# Create query engine with the database connection
query_engine = QueryEngine(engine)
# Local search
result = query_engine.local_search(
query="What did Barack Obama study at Columbia University?",
graph_path=graph_path
)
# Global search
result = query_engine.global_search(
query="What are the main themes of this document?",
doc_id=doc_id
)
# Naive RAG
result = query_engine.naive_search(
query="What efforts did Barack Obama due to combat climate change?"
)
Custom Entities and Relations
You can customize the entity types and relation types when storing documents. Here's an example using geographic entities:
geography_entity_types = [
"City",
"Country",
"Landmark",
"River"
]
# Define custom relation types with constraints
geography_relation_types = {
"capital of": {
"allowed_head": ["City"],
"allowed_tail": ["Country"]
},
"flows through": {
"allowed_head": ["River"],
"allowed_tail": ["City", "Country"]
},
"located in": {
"allowed_head": ["City", "Landmark"],
"allowed_tail": ["Country"]
}
}
# Store document with custom types
doc_id, graph_path = store_document(
filepath="data/geography.txt",
title="World Geography",
engine=engine,
entity_types=geography_entity_types,
relation_types=geography_relation_types,
)
The constraints ensure that relationships make semantic sense, e.g., only cities can be capitals of countries.
Modules
main.py- CLI entry point and command handlersbuild.py- Core document processing pipeline for constructing knowledge graphs from textquery.py- Query processing and response generationchunking.py- Text segmentation and preprocessing utilitiescommunities.py- Graph community segmentationconfig.py- Configuration settings and environment variablesdb.py- Database connection and vector store managemententity_types.py- Entity type definitions and classificationrel_types.py- Relationship type definitions between entitiesextract.py- Named entity recognition and relation extractionprompts.py- LLM prompt templatessearch.py- Vector similarity search and ranking algorithmsvisualize.py- Graph and community visualization
License
MIT License