Recursive Retriever + Document Agents

This guide shows how to combine recursive retrieval and “document agents” for advanced decision making over heterogeneous documents.

Two observations motivate this approach to better retrieval:

  • Decoupling retrieval embeddings from chunk-based synthesis. Fetching documents by their summaries often returns more relevant context for a query than fetching raw chunks. Recursive retrieval directly allows this.

  • Within a document, users may need to dynamically perform tasks beyond fact-based question-answering. We introduce the concept of “document agents” - agents that have access to both vector search and summary tools for a given document.

Setup and Download Data

In this section, we’ll define imports and then download Wikipedia articles about different cities. Each article is stored separately.

from llama_index import (
    VectorStoreIndex,
    SummaryIndex,
    SimpleKeywordTableIndex,
    SimpleDirectoryReader,
    ServiceContext,
)
from llama_index.schema import IndexNode
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.llms import OpenAI
from pathlib import Path

import requests

wiki_titles = ["Toronto", "Seattle", "Chicago", "Boston", "Houston"]

for title in wiki_titles:
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            # 'exintro': True,
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]

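    data_path = Path("data")
    # create the output directory if it does not exist yet
    data_path.mkdir(exist_ok=True)

    # write as UTF-8 so non-ASCII characters in the article are preserved
    with open(data_path / f"{title}.txt", "w", encoding="utf-8") as fp:
        fp.write(wiki_text)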
# Load all wiki documents
city_docs = {}
for wiki_title in wiki_titles:
    city_docs[wiki_title] = SimpleDirectoryReader(
        input_files=[f"data/{wiki_title}.txt"]
    ).load_data()
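
The download loop above assumes every title resolves to a page with an "extract" field. As a minimal sketch of a more defensive version, the helper below pulls the text out of a response and raises a clear error otherwise. The function name `extract_page_text` is hypothetical, and the sample dictionary is hand-written to mimic the response shape; we assume that a missing title comes back as a page entry carrying a "missing" key.

```python
def extract_page_text(api_response: dict) -> str:
    """Return the plain-text extract of the first page in a MediaWiki query response."""
    pages = api_response.get("query", {}).get("pages", {})
    if not pages:
        raise ValueError("response contains no pages")
    page = next(iter(pages.values()))
    # Assumption: a title that does not exist yields an entry with a
    # "missing" key and no "extract" field.
    if "missing" in page or "extract" not in page:
        raise ValueError(f"no extract for page {page.get('title', '<unknown>')!r}")
    return page["extract"]


# Hand-written sample shaped like the API's JSON (not real Wikipedia content).
sample = {
    "query": {"pages": {"123": {"title": "Toronto", "extract": "Toronto is a city."}}}
}
print(extract_page_text(sample))
```

In the loop, `wiki_text = extract_page_text(response)` could stand in for the two lines that index into the response directly.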

Define LLM + Service Context

llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(llm=llm)

Build Document Agent for each Document

In this section we define “document agents” for each document.

First we define both a vector index (for semantic search) and summary index (for summarization) for each document. The two query engines are then converted into tools that are passed to an OpenAI function calling agent.

This document agent can dynamically choose to perform semantic search or summarization within a given document.

We create a separate document agent for each city.

from llama_index.agent import OpenAIAgent

# Build agents dictionary
agents = {}

for wiki_title in wiki_titles:
    # build vector index
    vector_index = VectorStoreIndex.from_documents(
        city_docs[wiki_title], service_context=service_context
    )
    # build summary index
    summary_index = SummaryIndex.from_documents(
        city_docs[wiki_title], service_context=service_context
    )
    # define query engines
    vector_query_engine = vector_index.as_query_engine()
    list_query_engine = summary_index.as_query_engine()

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name="vector_tool",
                description=f"Useful for retrieving specific context from {wiki_title}",
            ),
        ),
        QueryEngineTool(
            query_engine=list_query_engine,
            metadata=ToolMetadata(
                name="summary_tool",
                description=f"Useful for summarization questions related to {wiki_title}",
            ),
        ),
    ]

    # build agent
    function_llm = OpenAI(model="gpt-3.5-turbo-0613")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
    )

    agents[wiki_title] = agent
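
To make the control flow of a document agent concrete, here is a toy sketch in plain Python. It is not the LlamaIndex API: the `Tool` dataclass and `choose_tool` function are hypothetical, and the keyword check is only a stand-in for the decision that the function-calling LLM makes from the tool descriptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def choose_tool(query: str, tools: Dict[str, Tool]) -> Tool:
    """Hypothetical stand-in for the LLM's function-calling decision."""
    keywords = ("summary", "summarize", "overview")
    wants_summary = any(word in query.lower() for word in keywords)
    return tools["summary_tool"] if wants_summary else tools["vector_tool"]


# Two fake tools for one document; each just tags the query it receives.
tools = {
    "vector_tool": Tool(
        "vector_tool", "Retrieve specific context", lambda q: f"[vector search] {q}"
    ),
    "summary_tool": Tool(
        "summary_tool", "Summarize the whole document", lambda q: f"[summary] {q}"
    ),
}

print(choose_tool("Tell me about the sports teams in Boston", tools).name)
print(choose_tool("Give me a summary of Chicago", tools).name)
```

In the real agent the choice depends on the tool descriptions passed in `ToolMetadata`, which is why each description should match what its query engine actually does.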

Build Recursive Retriever over these Agents

Now we define a set of summary nodes, where each node links to the corresponding Wikipedia city article. We then define a RecursiveRetriever on top of these Nodes to route queries down to a given node, which will in turn route it to the relevant document agent.

Finally, we define a full query engine by wrapping the RecursiveRetriever in a RetrieverQueryEngine.

# define top-level nodes
nodes = []
for wiki_title in wiki_titles:
    # define index node that links to these agents
    wiki_summary = (
        f"This content contains Wikipedia articles about {wiki_title}. "
        f"Use this index if you need to look up specific facts about {wiki_title}.\n"
        "Do not use this index if you want to analyze multiple cities."
    )
    node = IndexNode(text=wiki_summary, index_id=wiki_title)
    nodes.append(node)
# define top-level retriever
vector_index = VectorStoreIndex(nodes)
vector_retriever = vector_index.as_retriever(similarity_top_k=1)
# define recursive retriever
from llama_index.retrievers import RecursiveRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers import get_response_synthesizer
# note: can pass `agents` dict as `query_engine_dict` since every agent can be used as a query engine
recursive_retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever},
    query_engine_dict=agents,
    verbose=True,
)
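
The two-step routing described above can be sketched without any library code. The example below is a simplified illustration under stated assumptions, not how RecursiveRetriever is implemented: word overlap stands in for embedding similarity, `(summary, index_id)` tuples stand in for IndexNode objects, and the names `overlap_score` and `recursive_query` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# (summary text, index_id) pairs play the role of IndexNode objects.
SummaryNode = Tuple[str, str]


def overlap_score(query: str, text: str) -> int:
    """Toy similarity: count of shared lowercase words (stand-in for embeddings)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def recursive_query(
    query: str,
    nodes: List[SummaryNode],
    engines: Dict[str, Callable[[str], str]],
) -> Tuple[str, str]:
    # Step 1: retrieve the single best summary node (like similarity_top_k=1).
    _, index_id = max(nodes, key=lambda node: overlap_score(query, node[0]))
    # Step 2: follow the node's index_id to the linked engine and query it.
    return index_id, engines[index_id](query)


cities = ("Boston", "Houston")
nodes = [(f"Wikipedia articles about {city}", city) for city in cities]
# Fake per-city "agents" that echo the query they were given.
engines = {city: (lambda q, c=city: f"{c} agent answered: {q}") for city in cities}

print(recursive_query("sports teams in Houston", nodes, engines))
```

The point of the indirection is that the text used for matching (the summary) is separate from whatever answers the query (the agent), so either can change without touching the other.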

Define Full Query Engine

This query engine uses the recursive retriever + response synthesis module to synthesize a response.

response_synthesizer = get_response_synthesizer(
    # service_context=service_context,
    response_mode="compact",
)
query_engine = RetrieverQueryEngine.from_args(
    recursive_retriever,
    response_synthesizer=response_synthesizer,
    service_context=service_context,
)
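
The "compact" response mode packs as much retrieved text as fits into each LLM call, so that fewer calls are needed. The sketch below illustrates that idea only roughly: it budgets by characters, whereas the library works against the prompt's token limit, and `compact_chunks` is a hypothetical helper rather than a LlamaIndex function.

```python
from typing import List


def compact_chunks(chunks: List[str], max_chars: int, sep: str = "\n\n") -> List[str]:
    """Greedily merge consecutive chunks so each merged block stays within max_chars."""
    packed: List[str] = []
    current = ""
    for chunk in chunks:
        candidate = chunk if not current else current + sep + chunk
        if len(candidate) <= max_chars or not current:
            # Either it fits, or the chunk alone is oversized and starts its own block.
            current = candidate
        else:
            # Adding this chunk would overflow: close the block and start a new one.
            packed.append(current)
            current = chunk
    if current:
        packed.append(current)
    return packed


blocks = compact_chunks(["aaaa", "bbbb", "cccc"], max_chars=10)
print(len(blocks))  # three chunks packed into two blocks
```

Each packed block would correspond to one synthesis call, so three retrieved chunks cost two calls here instead of three.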

Running Example Queries

# should use Boston agent -> vector tool
response = query_engine.query("Tell me about the sports teams in Boston")
Retrieving with query id None: Tell me about the sports teams in Boston
Retrieved node with id, entering: Boston
Retrieving with query id Boston: Tell me about the sports teams in Boston
=== Calling Function ===
Calling function: vector_tool with args: {
  "input": "Boston sports teams"
}
Got output: Boston has teams in the four major North American men's professional sports leagues: Major League Baseball (MLB), National Football League (NFL), National Basketball Association (NBA), and National Hockey League (NHL). The city is home to the Boston Red Sox (MLB), New England Patriots (NFL), Boston Celtics (NBA), and Boston Bruins (NHL). These teams have collectively won 39 championships in their respective leagues. Additionally, Boston has a Major League Soccer (MLS) team called the New England Revolution.
========================
Got response: Boston is home to several professional sports teams in the major North American leagues. Here are the teams:

1. Boston Red Sox (MLB): The Red Sox are one of the oldest and most successful baseball teams in MLB. They have won multiple World Series championships, including recent victories in 2004, 2007, 2013, and 2018.

2. New England Patriots (NFL): The Patriots are one of the most successful teams in NFL history. Led by legendary quarterback Tom Brady, they have won six Super Bowl championships, including victories in 2001, 2003, 2004, 2014, 2016, and 2018.

3. Boston Celtics (NBA): The Celtics are one of the most storied franchises in NBA history. They have won a record 17 NBA championships, including notable victories in the 1960s and recent success in 2008.

4. Boston Bruins (NHL): The Bruins are a successful NHL team with a passionate fan base. They have won six Stanley Cup championships, with victories in 1929, 1939, 1941, 1970, 1972, and 2011.

In addition to these major sports teams, Boston also has a Major League Soccer (MLS) team called the New England Revolution. The Revolution play their home games at Gillette Stadium in Foxborough, Massachusetts.

Overall, Boston has a rich sports culture and a history of success in various sports leagues. The city's teams have a dedicated fan base and are an integral part of the local community.

print(response)
Boston is home to several professional sports teams in the major North American leagues. The city has teams in MLB, NFL, NBA, and NHL. The Boston Red Sox are a successful baseball team with multiple World Series championships. The New England Patriots are a dominant NFL team with six Super Bowl championships. The Boston Celtics have a rich history in the NBA, winning a record 17 NBA championships. The Boston Bruins are a successful NHL team with six Stanley Cup championships. Additionally, Boston has a Major League Soccer team called the New England Revolution. Overall, Boston has a strong sports culture and its teams have a dedicated fan base.
# should use Houston agent -> vector tool
response = query_engine.query("Tell me about the sports teams in Houston")
Retrieving with query id None: Tell me about the sports teams in Houston
Retrieved node with id, entering: Houston
Retrieving with query id Houston: Tell me about the sports teams in Houston
Got response: Houston is home to several professional sports teams across different leagues. Here are some of the major sports teams in Houston:

1. Houston Texans (NFL): The Houston Texans are a professional football team and compete in the National Football League (NFL). They were established in 2002 and play their home games at NRG Stadium.

2. Houston Rockets (NBA): The Houston Rockets are a professional basketball team and compete in the National Basketball Association (NBA). They were established in 1967 and have won two NBA championships. The Rockets play their home games at the Toyota Center.

3. Houston Astros (MLB): The Houston Astros are a professional baseball team and compete in Major League Baseball (MLB). They were established in 1962 and have won one World Series championship. The Astros play their home games at Minute Maid Park.

4. Houston Dynamo (MLS): The Houston Dynamo is a professional soccer team and compete in Major League Soccer (MLS). They were established in 2005 and have won two MLS Cup championships. The Dynamo play their home games at BBVA Stadium.

5. Houston Dash (NWSL): The Houston Dash is a professional women's soccer team and compete in the National Women's Soccer League (NWSL). They were established in 2013 and have won one NWSL Challenge Cup. The Dash also play their home games at BBVA Stadium.

These are just a few of the sports teams in Houston. The city also has minor league baseball, basketball, and hockey teams, as well as college sports teams representing universities in the area.

print(response)
Houston is home to several professional sports teams across different leagues. Some of the major sports teams in Houston include the Houston Texans (NFL), Houston Rockets (NBA), Houston Astros (MLB), Houston Dynamo (MLS), and Houston Dash (NWSL). These teams compete in football, basketball, baseball, soccer, and women's soccer respectively. Additionally, Houston also has minor league baseball, basketball, and hockey teams, as well as college sports teams representing universities in the area.
# should use Chicago agent -> summary tool
response = query_engine.query(
    "Give me a summary on all the positive aspects of Chicago"
)
Retrieving with query id None: Give me a summary on all the positive aspects of Chicago
Retrieved node with id, entering: Chicago
Retrieving with query id Chicago: Give me a summary on all the positive aspects of Chicago
=== Calling Function ===
Calling function: summary_tool with args: {
  "input": "positive aspects of Chicago"
}
Got output: Chicago is known for its vibrant arts and culture scene, with numerous museums, theaters, and galleries that showcase a wide range of artistic expressions. The city is also home to several prestigious universities and colleges, including the University of Chicago, Northwestern University, and Illinois Institute of Technology, which consistently rank among the top "National Universities" in the United States. These institutions offer excellent educational opportunities for students in various fields of study. Chicago's culinary scene is also renowned, with regional specialties like deep-dish pizza, Chicago-style hot dogs, and Italian beef sandwiches. The city's diverse population has contributed to a unique food culture, with dishes like Chicken Vesuvio, the Puerto Rican-influenced jibarito, and the Maxwell Street Polish reflecting its cultural melting pot. Overall, Chicago embraces its cultural diversity through its arts, education, and culinary offerings.
========================
Got response: Chicago is known for its vibrant arts and culture scene, with numerous museums, theaters, and galleries that showcase a wide range of artistic expressions. The city is also home to several prestigious universities and colleges, including the University of Chicago, Northwestern University, and Illinois Institute of Technology, which consistently rank among the top "National Universities" in the United States. These institutions offer excellent educational opportunities for students in various fields of study. Chicago's culinary scene is also renowned, with regional specialties like deep-dish pizza, Chicago-style hot dogs, and Italian beef sandwiches. The city's diverse population has contributed to a unique food culture, with dishes like Chicken Vesuvio, the Puerto Rican-influenced jibarito, and the Maxwell Street Polish reflecting its cultural melting pot. Overall, Chicago embraces its cultural diversity through its arts, education, and culinary offerings.

print(response)
Chicago is known for its vibrant arts and culture scene, with numerous museums, theaters, and galleries that showcase a wide range of artistic expressions. The city is also home to several prestigious universities and colleges, including the University of Chicago, Northwestern University, and Illinois Institute of Technology, which consistently rank among the top "National Universities" in the United States. These institutions offer excellent educational opportunities for students in various fields of study. Chicago's culinary scene is also renowned, with regional specialties like deep-dish pizza, Chicago-style hot dogs, and Italian beef sandwiches. The city's diverse population has contributed to a unique food culture, with dishes like Chicken Vesuvio, the Puerto Rican-influenced jibarito, and the Maxwell Street Polish reflecting its cultural melting pot. Overall, Chicago embraces its cultural diversity through its arts, education, and culinary offerings.