Retriever Query Engine with Custom Retrievers - Simple Hybrid Search
In this tutorial, we show you how to define a very simple version of hybrid search!
Combine keyword lookup retrieval with vector retrieval using "AND" and "OR" conditions.
Setup
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from llama_index import (
    VectorStoreIndex,
    SimpleKeywordTableIndex,
    SimpleDirectoryReader,
    ServiceContext,
    StorageContext,
)
from IPython.display import Markdown, display
Load Data
We first show how to convert a Document into a set of Nodes and insert them into a DocumentStore.
# load documents
documents = SimpleDirectoryReader("../data/paul_graham").load_data()
# initialize service context (set chunk size)
service_context = ServiceContext.from_defaults(chunk_size=1024)
node_parser = service_context.node_parser
nodes = node_parser.get_nodes_from_documents(documents)
# initialize storage context (by default it's in-memory)
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)
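As an optional sanity check, you can confirm the nodes landed in the docstore. This assumes the in-memory SimpleDocumentStore's `docs` property, which maps node IDs to nodes:

# optional sanity check: `docstore.docs` maps node_id -> node
print(f"{len(storage_context.docstore.docs)} nodes added to the docstore")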
Define Vector Index and Keyword Table Index over Same Data
We build a vector index and a keyword index over the same DocumentStore.
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
keyword_index = SimpleKeywordTableIndex(nodes, storage_context=storage_context)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 17050 tokens
> [build_index_from_nodes] Total embedding token usage: 17050 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
> [build_index_from_nodes] Total embedding token usage: 0 tokens
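If you want to reuse these indexes across sessions, the storage context can be persisted to disk and reloaded later. A minimal sketch (the `./storage` directory is an arbitrary choice, not part of the original flow):

# optionally persist the docstore and index structures to disk
storage_context.persist(persist_dir="./storage")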
Define Custom Retriever
We now define a custom retriever class that can implement basic hybrid search with both keyword lookup and semantic search.
Setting "AND" means we take the intersection of the two retrieved sets.
Setting "OR" means we take the union of the two retrieved sets.
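To make the two modes concrete before defining the class, here is a tiny standalone illustration of the set logic over hypothetical node IDs:

# toy example of the two combination modes
vector_ids = {"node-1", "node-2"}
keyword_ids = {"node-2", "node-3"}
print(vector_ids & keyword_ids)  # "AND" (intersection): {'node-2'}
print(vector_ids | keyword_ids)  # "OR" (union): {'node-1', 'node-2', 'node-3'}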
# import QueryBundle
from llama_index import QueryBundle
# import NodeWithScore
from llama_index.schema import NodeWithScore
# Retrievers
from llama_index.retrievers import (
    BaseRetriever,
    VectorIndexRetriever,
    KeywordTableSimpleRetriever,
)
from typing import List
class CustomRetriever(BaseRetriever):
    """Custom retriever that performs both semantic search and keyword search."""

    def __init__(
        self,
        vector_retriever: VectorIndexRetriever,
        keyword_retriever: KeywordTableSimpleRetriever,
        mode: str = "AND",
    ) -> None:
        """Init params."""
        self._vector_retriever = vector_retriever
        self._keyword_retriever = keyword_retriever
        if mode not in ("AND", "OR"):
            raise ValueError("Invalid mode.")
        self._mode = mode
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve nodes given query."""
        # run both retrievers on the same query
        vector_nodes = self._vector_retriever.retrieve(query_bundle)
        keyword_nodes = self._keyword_retriever.retrieve(query_bundle)

        vector_ids = {n.node.node_id for n in vector_nodes}
        keyword_ids = {n.node.node_id for n in keyword_nodes}

        # index every retrieved node by its id so results can be looked up
        combined_dict = {n.node.node_id: n for n in vector_nodes}
        combined_dict.update({n.node.node_id: n for n in keyword_nodes})

        if self._mode == "AND":
            retrieve_ids = vector_ids.intersection(keyword_ids)
        else:
            retrieve_ids = vector_ids.union(keyword_ids)

        retrieve_nodes = [combined_dict[rid] for rid in retrieve_ids]
        return retrieve_nodes
Plug Retriever into Query Engine
Plug the retriever into a query engine, and run some queries.
from llama_index import get_response_synthesizer
from llama_index.query_engine import RetrieverQueryEngine
# define custom retriever
vector_retriever = VectorIndexRetriever(index=vector_index, similarity_top_k=2)
keyword_retriever = KeywordTableSimpleRetriever(index=keyword_index)
custom_retriever = CustomRetriever(vector_retriever, keyword_retriever)
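Note that CustomRetriever defaults to mode="AND", which is what the queries below use. If you want the union of both result sets instead, you could build an OR-mode variant (not used further in this notebook):

# hypothetical OR-mode variant: union of both result sets instead of intersection
custom_retriever_or = CustomRetriever(vector_retriever, keyword_retriever, mode="OR")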
# define response synthesizer
response_synthesizer = get_response_synthesizer()
# assemble query engine
custom_query_engine = RetrieverQueryEngine(
    retriever=custom_retriever,
    response_synthesizer=response_synthesizer,
)
# vector query engine
vector_query_engine = RetrieverQueryEngine(
    retriever=vector_retriever,
    response_synthesizer=response_synthesizer,
)
# keyword query engine
keyword_query_engine = RetrieverQueryEngine(
    retriever=keyword_retriever,
    response_synthesizer=response_synthesizer,
)
response = custom_query_engine.query("What did the author do during his time at YC?")
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 12 tokens
> [retrieve] Total embedding token usage: 12 tokens
INFO:llama_index.indices.keyword_table.retrievers:> Starting query: What did the author do during his time at YC?
> Starting query: What did the author do during his time at YC?
INFO:llama_index.indices.keyword_table.retrievers:query keywords: ['time', 'yc', 'author']
query keywords: ['time', 'yc', 'author']
INFO:llama_index.indices.keyword_table.retrievers:> Extracted keywords: ['time', 'yc']
> Extracted keywords: ['time', 'yc']
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1919 tokens
> [get_response] Total LLM token usage: 1919 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> [get_response] Total embedding token usage: 0 tokens
print(response)
The author worked on YC, wrote essays, worked on a new version of Arc, wrote Hacker News in Arc, wrote YC's internal software in Arc, and dealt with disputes between cofounders, figuring out when people were lying to them, and fighting with people who maltreated the startups.
# hybrid search lets us avoid retrieving irrelevant nodes
# Yale is never mentioned in the essay
response = custom_query_engine.query("What did the author do during his time at Yale?")
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 11 tokens
> [retrieve] Total embedding token usage: 11 tokens
INFO:llama_index.indices.keyword_table.retrievers:> Starting query: What did the author do during his time at Yale?
> Starting query: What did the author do during his time at Yale?
INFO:llama_index.indices.keyword_table.retrievers:query keywords: ['yale', 'time', 'author']
query keywords: ['yale', 'time', 'author']
INFO:llama_index.indices.keyword_table.retrievers:> Extracted keywords: ['time']
> Extracted keywords: ['time']
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 0 tokens
> [get_response] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> [get_response] Total embedding token usage: 0 tokens
print(str(response))
len(response.source_nodes)
None
0
# in contrast, vector search will return an answer
response = vector_query_engine.query("What did the author do during his time at Yale?")
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 11 tokens
> [retrieve] Total embedding token usage: 11 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1871 tokens
> [get_response] Total LLM token usage: 1871 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> [get_response] Total embedding token usage: 0 tokens
print(str(response))
len(response.source_nodes)
The author did not attend Yale. The context information provided is about the author's work before and after college.
2
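For completeness, you could also run the same query through the keyword-only engine assembled above to see how keyword lookup behaves on its own (the result will depend on which keywords are extracted from the query):

# keyword-only retrieval for comparison
response = keyword_query_engine.query("What did the author do during his time at Yale?")
print(str(response))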