Kùzu Graph Store#

This notebook walks through configuring Kùzu to be the backend for graph storage in LlamaIndex.

%pip install llama-index-llms-openai
%pip install llama-index-graph-stores-kuzu

# My OpenAI Key
import os

os.environ["OPENAI_API_KEY"] = "API_KEY_HERE"

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)

Prepare for Kùzu#

# Clean up all the directories used in this notebook
import shutil

shutil.rmtree("./test1", ignore_errors=True)
shutil.rmtree("./test2", ignore_errors=True)
shutil.rmtree("./test3", ignore_errors=True)

%pip install kuzu
import kuzu

db = kuzu.Database("test1")

Collecting kuzu
  Downloading kuzu-0.0.6-cp39-cp39-macosx_11_0_arm64.whl (5.5 MB)
     |████████████████████████████████| 5.5 MB 4.8 MB/s eta 0:00:01
?25hInstalling collected packages: kuzu
Successfully installed kuzu-0.0.6
WARNING: You are using pip version 21.2.4; however, version 23.2.1 is available.
You should consider upgrading via the '/Users/loganmarkewich/llama_index/llama-index/bin/python -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.

Using Knowledge Graph with KuzuGraphStore#

from llama_index.graph_stores.kuzu import KuzuGraphStore

graph_store = KuzuGraphStore(db)

INFO:numexpr.utils:Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.

Building the Knowledge Graph#

from llama_index.core import SimpleDirectoryReader, KnowledgeGraphIndex

from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from IPython.display import Markdown, display
import kuzu

documents = SimpleDirectoryReader(
    "../../../../examples/paul_graham_essay/data"
).load_data()

# define LLM

llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.llm = llm
Settings.chunk_size = 512

from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(graph_store=graph_store)

# NOTE: can take a while!
index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
)

Querying the Knowledge Graph#

First, we can query and send only the triplets to the LLM.

query_engine = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)

INFO:llama_index.indices.knowledge_graph.retriever:> Starting query: Tell me more about Interleaf

INFO:llama_index.indices.knowledge_graph.retriever:> Query keywords: ['Interleaf']
ERROR:llama_index.indices.knowledge_graph.retriever:Index was not constructed with embeddings, skipping embedding usage...
INFO:llama_index.indices.knowledge_graph.retriever:> Extracted relationships: The following are knowledge sequence in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]`
Interleaf ['made', 'software for creating documents']
Interleaf ['added', 'scripting language']
Interleaf ['taught', 'what not to do']

display(Markdown(f"<b>{response}</b>"))

Interleaf is a company that made software for creating documents. They also added a scripting language to their software. Additionally, they taught what not to do.

For more detailed answers, we can also send the text from where the retrieved tripets were extracted.

query_engine = index.as_query_engine(
    include_text=True, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)

INFO:llama_index.indices.knowledge_graph.retriever:> Starting query: Tell me more about Interleaf
INFO:llama_index.indices.knowledge_graph.retriever:> Query keywords: ['Interleaf']
ERROR:llama_index.indices.knowledge_graph.retriever:Index was not constructed with embeddings, skipping embedding usage...
INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 144f784c-d052-4fed-86f8-c895da6e13df: each student had. But the Accademia wasn't teaching me anything except Italia...
INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 7c877dd3-3375-4ab7-8745-e0dfbabfe5bd: learned some useful things at Interleaf, though they were mostly about what n...
INFO:llama_index.indices.knowledge_graph.retriever:> Extracted relationships: The following are knowledge sequence in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]`
Interleaf ['made', 'software for creating documents']
Interleaf ['added', 'scripting language']
Interleaf ['taught', 'what not to do']

display(Markdown(f"<b>{response}</b>"))

Interleaf was a company that made software for creating documents. They were inspired by Emacs and added a scripting language to their software, which was a dialect of Lisp. The company hired a Lisp hacker to write things in this scripting language. The narrator worked at Interleaf for a year but admits to being a bad employee. They found it difficult to understand most of the software because it was primarily written in C, a language they did not know or want to learn. Despite this, they were paid well and managed to save enough money to go back to RISD and pay off their college loans. The narrator also learned some valuable lessons at Interleaf, such as the importance of having product people rather than sales people running technology companies, the drawbacks of having too many people edit code, the impact of office space on productivity, the value of corridor conversations over planned meetings, the challenges of dealing with big bureaucratic customers, and the importance of being the “entry level” option in a market.

Query with embeddings#

# NOTE: can take a while!
db = kuzu.Database("test2")
graph_store = KuzuGraphStore(db)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
new_index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
    include_embeddings=True,
)

WARNING:llama_index.llms.openai_utils:Retrying llama_index.llms.openai_utils.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailableError: The server is overloaded or not ready yet..

rel_map = graph_store.get_rel_map()

# query using top 3 triplets plus keywords (duplicate triplets are removed)
query_engine = index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=5,
)
response = query_engine.query(
    "Tell me more about what the author worked on at Interleaf",
)

INFO:llama_index.indices.knowledge_graph.retriever:> Starting query: Tell me more about what the author worked on at Interleaf
INFO:llama_index.indices.knowledge_graph.retriever:> Query keywords: ['Interleaf', 'author', 'worked']
ERROR:llama_index.indices.knowledge_graph.retriever:Index was not constructed with embeddings, skipping embedding usage...
INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 144f784c-d052-4fed-86f8-c895da6e13df: each student had. But the Accademia wasn't teaching me anything except Italia...
INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 7c877dd3-3375-4ab7-8745-e0dfbabfe5bd: learned some useful things at Interleaf, though they were mostly about what n...
INFO:llama_index.indices.knowledge_graph.retriever:> Extracted relationships: The following are knowledge sequence in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]`
Interleaf ['made', 'software for creating documents']
Interleaf ['added', 'scripting language']
Interleaf ['taught', 'what not to do']

display(Markdown(f"<b>{response}</b>"))

At Interleaf, the author worked on creating software for creating documents. They also worked on adding a scripting language, which was inspired by Emacs and was a dialect of Lisp. However, the author admits to being a bad employee and not fully understanding the software, as it was primarily written in C. They also mention that they spent a lot of time working on their book “On Lisp” during their time at Interleaf. Overall, the author learned some useful things at Interleaf, particularly about what not to do in technology companies.

Visualizing the Graph#

%pip install pyvis

Collecting pyvis
  Downloading pyvis-0.3.2-py3-none-any.whl (756 kB)
     |████████████████████████████████| 756 kB 2.0 MB/s eta 0:00:01
?25hCollecting jsonpickle>=1.4.1
  Downloading jsonpickle-3.0.1-py2.py3-none-any.whl (40 kB)
     |████████████████████████████████| 40 kB 4.1 MB/s eta 0:00:01
?25hRequirement already satisfied: networkx>=1.11 in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from pyvis) (3.1)
Requirement already satisfied: ipython>=5.3.0 in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from pyvis) (8.10.0)
Requirement already satisfied: jinja2>=2.9.6 in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from pyvis) (3.1.2)
Requirement already satisfied: pexpect>4.3 in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (4.8.0)
Requirement already satisfied: backcall in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (0.2.0)
Requirement already satisfied: decorator in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (5.1.1)
Requirement already satisfied: pickleshare in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (0.7.5)
Requirement already satisfied: prompt-toolkit<3.1.0,>=3.0.30 in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (3.0.39)
Requirement already satisfied: appnope in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (0.1.3)
Requirement already satisfied: pygments>=2.4.0 in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (2.15.1)
Requirement already satisfied: traitlets>=5 in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (5.9.0)
Requirement already satisfied: jedi>=0.16 in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (0.18.2)
Requirement already satisfied: matplotlib-inline in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (0.1.6)
Requirement already satisfied: stack-data in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (0.6.2)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from jedi>=0.16->ipython>=5.3.0->pyvis) (0.8.3)
Requirement already satisfied: MarkupSafe>=2.0 in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from jinja2>=2.9.6->pyvis) (2.1.3)
Requirement already satisfied: ptyprocess>=0.5 in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from pexpect>4.3->ipython>=5.3.0->pyvis) (0.7.0)
Requirement already satisfied: wcwidth in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from prompt-toolkit<3.1.0,>=3.0.30->ipython>=5.3.0->pyvis) (0.2.6)
Requirement already satisfied: executing>=1.2.0 in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from stack-data->ipython>=5.3.0->pyvis) (1.2.0)
Requirement already satisfied: asttokens>=2.1.0 in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from stack-data->ipython>=5.3.0->pyvis) (2.2.1)
Requirement already satisfied: pure-eval in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from stack-data->ipython>=5.3.0->pyvis) (0.2.2)
Requirement already satisfied: six in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from asttokens>=2.1.0->stack-data->ipython>=5.3.0->pyvis) (1.16.0)
Installing collected packages: jsonpickle, pyvis
Successfully installed jsonpickle-3.0.1 pyvis-0.3.2
WARNING: You are using pip version 21.2.4; however, version 23.2.1 is available.
You should consider upgrading via the '/Users/loganmarkewich/llama_index/llama-index/bin/python -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.

## create graph
from pyvis.network import Network

g = index.get_networkx_graph()
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(g)
net.show("kuzugraph_draw.html")

kuzugraph_draw.html

[Optional] Try building the graph and manually add triplets!#

from llama_index.core.node_parser import SentenceSplitter

node_parser = SentenceSplitter()

nodes = node_parser.get_nodes_from_documents(documents)

# initialize an empty database
db = kuzu.Database("test3")
graph_store = KuzuGraphStore(db)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
index = KnowledgeGraphIndex(
    [],
    storage_context=storage_context,
)

# add keyword mappings and nodes manually
# add triplets (subject, relationship, object)

# for node 0
node_0_tups = [
    ("author", "worked on", "writing"),
    ("author", "worked on", "programming"),
]
for tup in node_0_tups:
    index.upsert_triplet_and_node(tup, nodes[0])

# for node 1
node_1_tups = [
    ("Interleaf", "made software for", "creating documents"),
    ("Interleaf", "added", "scripting language"),
    ("software", "generate", "web sites"),
]
for tup in node_1_tups:
    index.upsert_triplet_and_node(tup, nodes[1])

query_engine = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)

INFO:llama_index.indices.knowledge_graph.retriever:> Starting query: Tell me more about Interleaf
INFO:llama_index.indices.knowledge_graph.retriever:> Query keywords: ['Interleaf']
ERROR:llama_index.indices.knowledge_graph.retriever:Index was not constructed with embeddings, skipping embedding usage...
INFO:llama_index.indices.knowledge_graph.retriever:> Extracted relationships: The following are knowledge sequence in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]`
Interleaf ['made software for', 'creating documents']
Interleaf ['added', 'scripting language']

str(response)

'Interleaf is a software company that specializes in creating documents. They have also added a scripting language to their software.'