Controllable Agents for RAG#

Adding agentic capabilities on top of your RAG pipeline lets you reason over much more complex questions.

However, a big pain point for agents is their lack of steerability and transparency. An agent may tackle a user query through chain-of-thought reasoning or planning, which requires repeated calls to an LLM. During this process it can be hard to inspect what’s going on, or to stop or correct execution midway.

This notebook shows you how to use our brand-new lower-level agent API, which allows controllable step-wise execution, on top of a RAG pipeline.

We showcase this over three Uber 10-Q filings (March, June, and September 2022).

!pip install llama-index

Setup Data#

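Here we import what we need and configure the LLM, then download and load a small dataset of Uber 10-Q filings for the quarters ending March, June, and September 2022.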

from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
    ServiceContext,
)
from llama_index.llms import OpenAI
from llama_index.tools import QueryEngineTool, ToolMetadata

# llm = OpenAI(model="gpt-3.5-turbo")
llm = OpenAI(model="gpt-4-1106-preview")
service_context = ServiceContext.from_defaults(llm=llm)

Download Data#

!mkdir -p 'data/10q/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_march_2022.pdf' -O 'data/10q/uber_10q_march_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_june_2022.pdf' -O 'data/10q/uber_10q_june_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_sept_2022.pdf' -O 'data/10q/uber_10q_sept_2022.pdf'

Load data#

march_2022 = SimpleDirectoryReader(
    input_files=["./data/10q/uber_10q_march_2022.pdf"]
).load_data()
june_2022 = SimpleDirectoryReader(
    input_files=["./data/10q/uber_10q_june_2022.pdf"]
).load_data()
sept_2022 = SimpleDirectoryReader(
    input_files=["./data/10q/uber_10q_sept_2022.pdf"]
).load_data()

Build indices/query engines/tools#

import os


def get_tool(name, full_name, documents=None):
    if not os.path.exists(f"./data/{name}"):
        # build vector index
        vector_index = VectorStoreIndex.from_documents(
            documents, service_context=service_context
        )
        vector_index.storage_context.persist(persist_dir=f"./data/{name}")
    else:
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=f"./data/{name}"),
            service_context=service_context,
        )
    query_engine = vector_index.as_query_engine(similarity_top_k=3)
    query_engine_tool = QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name=name,
            description=(
                "Provides information about Uber quarterly financials ending"
                f" {full_name}"
            ),
        ),
    )
    return query_engine_tool

march_tool = get_tool("march_2022", "March 2022", documents=march_2022)
june_tool = get_tool("june_2022", "June 2022", documents=june_2022)
sept_tool = get_tool("sept_2022", "September 2022", documents=sept_2022)

query_engine_tools = [march_tool, june_tool, sept_tool]

Setup Agent#

In this section we set up the agent over the query engine tools defined above.

from llama_index.agent import (
    AgentRunner,
    OpenAIAgentWorker,
    OpenAIAgent,
    ReActAgent,
)

# openai_step_engine = OpenAIAgentWorker.from_tools(
#     query_engine_tools, llm=llm, verbose=True
# )
# agent = AgentRunner(openai_step_engine)
# # alternative
# agent = OpenAIAgent.from_tools(query_engine_tools, llm=llm, verbose=True)

agent_llm = OpenAI(model="gpt-3.5-turbo")
# agent_llm = OpenAI(model="gpt-4-1106-preview")

agent = ReActAgent.from_tools(
    query_engine_tools, llm=agent_llm, verbose=True, max_iterations=20
)

Run Some Queries#

We now demonstrate the capabilities of our step-wise agent framework.

We show how it can handle complex queries, both end-to-end and step by step.

We then show how to steer the outputs by injecting feedback between steps.

Out of the box#

Calling chat attempts to run the task end-to-end. In the run below, the agent calls only one tool (march_2022) before answering, so the response covers a single quarter instead of all three.

response = agent.chat(
    "Analyze the changes in R&D expenditures and revenue in the past 3 quarters"
)

Added user message to memory: Analyze the changes in R&D expenditures and revenue in the past 3 quarters
Thought: To analyze the changes in R&D expenditures and revenue in the past 3 quarters, I can use the tools march_2022, june_2022, and sept_2022 to gather the financial information for each quarter.
Action: march_2022
Action Input: {'input': 'R&D expenditures, revenue'}
Observation: Research and development expenses for the three months ended March 31, 2022, were $587 million, which was a 14% increase from $515 million in the same period in 2021. Revenue for the three months ended March 31, 2022, was $6,854 million, a 136% increase from $2,903 million in the same period in 2021.
Thought: (Implicit) I can answer without any more tools!
Response: Observation: According to the information from the march_2022 tool, the research and development expenses for the three months ended March 31, 2022, were $587 million, which represents a 14% increase from $515 million in the same period in 2021. Additionally, the revenue for the three months ended March 31, 2022, was $6,854 million, showing a significant 136% increase from $2,903 million in the same period in 2021.

print(str(response))
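
One way to confirm which tools the agent actually consulted is to look at the tool outputs attached to the response. The sketch below assumes the response object exposes a sources list whose items carry a tool_name attribute; we believe the agent chat response in llama-index does this, but check it against your installed version. The helper name called_tools and the stand-in response object are hypothetical and used only for illustration.

```python
from types import SimpleNamespace


def called_tools(response):
    """Return the distinct tool names found in response.sources, in call order.

    Assumes `response.sources` is a list of objects with a `tool_name`
    attribute (an assumption about the response type); returns an empty
    list if the attribute is missing.
    """
    seen = []
    for source in getattr(response, "sources", None) or []:
        name = getattr(source, "tool_name", None)
        if name is not None and name not in seen:
            seen.append(name)
    return seen


# Stand-in for the response above, which only consulted the March tool.
fake_response = SimpleNamespace(
    sources=[SimpleNamespace(tool_name="march_2022")]
)

# Tool names as defined in the "Build indices/query engines/tools" section.
expected_tools = {"march_2022", "june_2022", "sept_2022"}
missing = sorted(expected_tools - set(called_tools(fake_response)))
print(missing)
```

If the assumption about sources holds, passing the real response from the cell above to called_tools should report only march_2022, which makes the missing quarters explicit.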

Test Step-Wise Execution#

The end-to-end chat didn’t work. Let’s try to break it down step-by-step, and inject our own feedback if things are going wrong.

# start task
task = agent.create_task(
    "Analyze the changes in R&D expenditures and revenue in the past 3 quarters"
)

This returns a Task object, which contains the input, additional state in extra_state, and other fields.

Now let’s try executing a single step of this task.

step_output = agent.run_step(task.task_id)

Added user message to memory: Analyze the changes in R&D expenditures and revenue in the past 3 quarters
Thought: To analyze the changes in R&D expenditures and revenue in the past 3 quarters, I can use the tools that provide information about Uber's quarterly financials for each quarter.
Action: sept_22
Action Input: {'input': 'R&D expenditures, revenue'}
Observation: Research and development expenses for the three months ended September 30, 2022, were $760 million, which was a 54% increase from $493 million in the same period in 2021. For the nine months ended September 30, 2022, research and development expenses were $2,051 million, a 37% increase from $1,496 million in the same period in 2021. The percentage of revenue allocated to research and development was 9% for both the three and nine months ended September 30, 2022, compared to 10% and 13%, respectively, for the same periods in 2021.

step_output = agent.run_step(task.task_id)

Thought: (Implicit) I can answer without any more tools!
Response: Observation: The research and development expenses for the three months ended September 30, 2022, were $760 million, which is a 54% increase from $493 million in the same period in 2021. For the nine months ended September 30, 2022, the research and development expenses were $2,051 million, a 37% increase from $1,496 million in the same period in 2021. The percentage of revenue allocated to research and development was 9% for both the three and nine months ended September 30, 2022, compared to 10% and 13%, respectively, for the same periods in 2021.

We run into the same issue: the agent produces a final response after consulting only the September filing, even though we haven’t analyzed the other two quarters yet! Can we steer it by adding user input?

step_output = agent.run_step(task.task_id, input="What about June?")

Added user message to memory: What about June?
Thought: To provide information about June, I can use the "june_22" tool.
Action: june_22
Action Input: {'input': 'R&D expenditures, revenue'}
Observation: Research and development expenses for the three months ended June 30, 2022, were $704 million, which was a 44% increase from $488 million in the same period in 2021. For the six months ended June 30, 2022, research and development expenses were $1,291 million, a 29% increase from $1,003 million in the same period in 2021. The percentage of revenue allocated to research and development decreased from 12% to 9% for the three-month period and from 15% to 9% for the six-month period year-over-year.

print(step_output.is_last)

False

step_output = agent.run_step(task.task_id, input="What about March?")

Added user message to memory: What about March?
Thought: To provide information about March, I can use the "march_22" tool.
Action: march_22
Action Input: {'input': 'R&D expenditures, revenue'}
Observation: Research and development expenses for the three months ended March 31, 2022, were $587 million, which was a 14% increase from $515 million in the same period in 2021. Revenue for the three months ended March 31, 2022, was $6,854 million, a 136% increase from $2,903 million in the same period in 2021.

step_output = agent.run_step(task.task_id)

Thought: I have analyzed the changes in R&D expenditures and revenue for the past 3 quarters using the available tools.
Response: Here are the observations for the changes in R&D expenditures and revenue for the past 3 quarters:

- September 2022:
  - R&D expenses: $760 million (54% increase from September 2021)
  - Revenue allocation to R&D: 9% (same as September 2021)

- June 2022:
  - R&D expenses: $704 million (44% increase from June 2021)
  - Revenue allocation to R&D: 9% (decreased from 12% in June 2021)

- March 2022:
  - R&D expenses: $587 million (14% increase from March 2021)
  - Revenue: $6,854 million (136% increase from March 2021)

Please note that the revenue information is only available for the March 2022 quarter.

Let me know if there's anything else I can help with!

Since the steps look good, we are now ready to call finalize_response to get back our response.

This also commits the task execution to the memory object held by our agent, which we can then inspect.

response = agent.finalize_response(task.task_id)

print(str(response))

Here are the observations for the changes in R&D expenditures and revenue for the past 3 quarters:

- September 2022:
  - R&D expenses: $760 million (54% increase from September 2021)
  - Revenue allocation to R&D: 9% (same as September 2021)

- June 2022:
  - R&D expenses: $704 million (44% increase from June 2021)
  - Revenue allocation to R&D: 9% (decreased from 12% in June 2021)

- March 2022:
  - R&D expenses: $587 million (14% increase from March 2021)
  - Revenue: $6,854 million (136% increase from March 2021)

Please note that the revenue information is only available for the March 2022 quarter.

Let me know if there's anything else I can help with!
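
The agent reports year-over-year changes for each quarter but does not compare the quarters with one another. As a quick sanity check, the snippet below recomputes the year-over-year percentages from the dollar figures in the observations above and derives the quarter-over-quarter change in R&D expenses. The pct_change helper is a hypothetical name; the figures are copied from the tool outputs.

```python
def pct_change(new, old):
    """Percentage change from old to new."""
    return (new - old) / old * 100


# R&D expenses in millions of USD: (current quarter, same quarter in 2021),
# taken from the observations above.
rd_expenses = {
    "March 2022": (587, 515),
    "June 2022": (704, 488),
    "September 2022": (760, 493),
}

# Year-over-year change per quarter, rounded to a whole percent.
yoy = {q: round(pct_change(cur, prev)) for q, (cur, prev) in rd_expenses.items()}
print(yoy)

# Quarter-over-quarter change between consecutive 2022 quarters.
quarters = list(rd_expenses)
qoq = {
    f"{prev_q} -> {cur_q}": round(
        pct_change(rd_expenses[cur_q][0], rd_expenses[prev_q][0]), 1
    )
    for prev_q, cur_q in zip(quarters, quarters[1:])
}
print(qoq)
```

The recomputed year-over-year figures (14%, 44%, 54%) match the agent's summary, and R&D expenses grew roughly 19.9% from the March quarter to the June quarter and 8.0% from June to September.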

Setup Human In the Loop Chat#

With these capabilities, it’s easy to set up human-in-the-loop (or LLM-in-the-loop) feedback when interacting with an agent, especially for long-running tasks.

We set up a double loop: the outer loop handles tasks (the user “chatting” with an agent), and the inner loop controls the intermediate steps of each task.

agent_llm = OpenAI(model="gpt-3.5-turbo")
# agent_llm = OpenAI(model="gpt-4-1106-preview")

agent = ReActAgent.from_tools(
    query_engine_tools, llm=agent_llm, verbose=True, max_iterations=20
)

def chat_repl(exit_when_done: bool = True):
    """Chat REPL.

    Args:
        exit_when_done(bool): if True, automatically exit when step is finished.
            Set to False if you want to keep going even if step is marked as finished by the agent.
            If False, you need to explicitly call "exit" to finalize a task execution.

    """
    task_message = None
    while task_message != "exit":
        task_message = input(">> Human: ")
        if task_message == "exit":
            break

        task = agent.create_task(task_message)

        response = None
        step_output = None
        message = None
        while message != "exit":
            if message is None or message == "":
                step_output = agent.run_step(task.task_id)
            else:
                step_output = agent.run_step(task.task_id, input=message)
            if exit_when_done and step_output.is_last:
                print(
                    ">> Task marked as finished by the agent, executing task execution."
                )
                break

            message = input(
                ">> Add feedback during step? (press enter/leave blank to continue, and type 'exit' to stop): "
            )
            if message == "exit":
                break

        if step_output is None:
            print(">> You haven't run the agent. Task is discarded.")
        elif not step_output.is_last:
            print(">> The agent hasn't finished yet. Task is discarded.")
        else:
            response = agent.finalize_response(task.task_id)
        print(f"Agent: {str(response)}")

chat_repl()

>> Human:  What are the risk factors in the last two quarters?

Added user message to memory: What are the risk factors in the last two quarters?
Thought: I need to use a tool to help me answer the question.
Action: march_2022
Action Input: {'input': 'risk factors'}
Observation: The risk factors affecting the business include:

- Significant reliance on Gross Bookings from trips in large metropolitan areas, which may be negatively affected by various conditions including economic, social, weather, regulatory conditions, and circumstances like COVID-19.
- Potential failure to offer autonomous vehicle technologies on the platform, or offering technologies that may be inferior or perceived as less safe compared to competitors.
- Dependence on retaining and attracting high-quality personnel, with the risk that attrition or unsuccessful succession planning could adversely affect the business.
- Risks of security or data privacy breaches, unauthorized access, alteration, or destruction of proprietary, confidential, employee, or platform user data.
- Threats from cyberattacks, such as malware, ransomware, viruses, spamming, and phishing attacks, which could harm reputation, business, and operating results.
- Climate change risks, including physical and transitional risks, that could adversely impact the business if not managed effectively.
- Commitments related to climate change that require significant investment and management time, with the potential need to revise timeframes for implementing these commitments due to circumstances beyond control.
- Reliance on third parties for distribution of products and offerings and for providing software used in certain products, with the risk that interference could adversely affect the business.
- The need for additional capital to support business growth, which might not be available on reasonable terms or at all.
- Risks associated with identifying, acquiring, and integrating suitable businesses, and the performance and integration of acquired businesses.
- The possibility of being blocked from or limited in providing or operating products and offerings in certain jurisdictions, potentially requiring modifications to the business model.
- Numerous legal and regulatory risks that could have an adverse impact on the business and future prospects.
- Extensive government regulation and oversight relating to the provision of payment and financial services.
- Risks related to the collection, use, transfer, disclosure, and other processing of data, which could result in legal actions and negative publicity.
- The need to protect intellectual property and the risk of incurring significant expenses if third parties claim misappropriation of their intellectual property.
- Volatility in the market price of common stock, which may not align with operating performance, potentially leading to the inability to resell shares at or above the purchase price.
- The COVID-19 pandemic's adverse impact on business, financial condition, and results of operations, with the potential for continued adverse effects.
- Adverse effects on the business if Drivers were classified as employees instead of independent contractors.
- Highly competitive mobility, delivery, and logistics industries with low barriers to entry and well-capitalized competitors.
- The possibility of lowering fares or service fees and offering Driver incentives and consumer discounts to remain competitive.
- A history of significant losses and the expectation of increased operating expenses without guaranteed profitability.
- The necessity of attracting or maintaining a critical mass of Drivers, consumers, merchants, shippers, and carriers to keep the platform appealing.
- The importance of maintaining and enhancing brand and reputation, with the risk that failure to do so will harm the business.
- Challenges related to historical workplace culture and the need for successful efforts to address these challenges.
- The risk of not effectively managing growth or optimizing organizational structure, which could adversely affect financial performance and future prospects.
- The potential for major safety incidents due to criminal, violent, inappropriate, or dangerous activity by platform users, impacting the ability to attract and retain users.
- Substantial investments in new offerings and technologies that are inherently risky and may not yield expected benefits.
- Dependence on the performance and reliability of Internet, mobile, and other infrastructures that are not under the company's control, with the risk of disruptions affecting the platform's availability and efficiency.

>> Add feedback during step? (press enter/leave blank to continue, and type 'exit' to stop):  I meant June and September

Added user message to memory: I meant June and September
Thought: I need to use a tool to help me answer the question.
Action: june_2022
Action Input: {'input': 'risk factors'}
Observation: The risk factors include:

Impact of COVID-19 pandemic or future disease outbreaks on business partners and third-party vendors, potentially leading to adverse impacts on the company's business, financial performance, and stock price.
Economic conditions affecting discretionary consumer spending, which may lead to shifts in consumer behavior towards lower-cost alternatives or reduced usage of the company's services.
Increases in fuel, food, labor, energy, and other costs due to inflation and other factors, which could increase operating costs for drivers, merchants, and carriers, potentially reducing their activity on the platform.
Dependence on the performance and reliability of Internet, mobile, and other infrastructures that are not under the company's control.
Risks associated with criminal, violent, inappropriate, or dangerous activity on the platform, which could affect the ability to attract and retain users.
Substantial investments in new offerings and technologies that are inherently risky and may not yield expected benefits.
Concentration of Gross Bookings in large metropolitan areas, which are susceptible to various conditions and circumstances.
Potential failure to offer autonomous vehicle technologies or to compete effectively with competitors in this area.
Challenges in retaining and attracting high-quality personnel.
Security or data privacy breaches, cyberattacks, and other unauthorized access to data.
Climate change risks, including physical and transitional risks, and the ability to manage such risks.
Commitments related to climate change that require significant investment and may be subject to revision based on external factors.
Dependence on third parties for distribution of the platform and provision of software.
Need for additional capital to support business growth, which may not be available on reasonable terms.
Risks associated with identifying, acquiring, and integrating businesses.
Legal and regulatory risks, including those related to payment and financial services.
Risks related to data collection, use, and processing.
Intellectual property protection and potential misappropriation claims.
Volatility in the market price of common stock and the potential inability to meet investor or analyst expectations.
Risks related to government regulation and oversight.
Risks from catastrophic events, including disease outbreaks, weather events, war, or terrorist attacks.

>> Add feedback during step? (press enter/leave blank to continue, and type 'exit' to stop):  

Thought: I need to use a tool to help me answer the question.
Action: sept_2022
Action Input: {'input': 'risk factors'}
Observation: Risk factors include:

- Software releases causing interruptions or negative experiences for platform users, potentially leading to loss of users, revenue, and legal or regulatory issues.
- Risks associated with the use of artificial intelligence, such as flawed algorithms, biased datasets, and potential legal and reputational harm due to unintentional bias and discrimination.
- Climate change risks, including physical risks like extreme weather events and transitional risks such as market shifts toward electric vehicles and lower carbon business models.
- Regulatory requirements related to climate change and the company's own climate commitments, which require significant investment and may be impacted by external factors.
- General economic risks, such as the impact of contagious disease outbreaks like the COVID-19 pandemic, which can reduce demand for services and affect travel behavior.
- The need to attract and maintain a critical mass of platform users, including Drivers, consumers, merchants, Shippers, and Carriers.
- The importance of maintaining and enhancing the company's brand and reputation.
- Challenges related to workplace culture and the need for effective management of growth.
- Safety incidents on the platform affecting the ability to attract and retain users.
- Investments in new offerings and technologies that are inherently risky.
- Dependence on trips in large metropolitan areas, which may be affected by various conditions.
- The potential failure to offer autonomous vehicle technologies or to compete effectively in that space.
- Security or data privacy breaches and cyberattacks.
- Dependence on third parties for distribution of the platform and software used in products.
- The need for additional capital to support business growth.
- Challenges related to identifying, acquiring, and integrating businesses.
- Legal and regulatory risks, including those related to payment and financial services.

>> Add feedback during step? (press enter/leave blank to continue, and type 'exit' to stop):  

Thought: I can answer without using any more tools.
Response: The risk factors mentioned in both June and September include the impact of the COVID-19 pandemic or future disease outbreaks, economic conditions affecting consumer spending, increases in costs, dependence on external infrastructures, potential engagement in inappropriate activities by platform users, substantial investments in new technologies, concentration of Gross Bookings in large metropolitan areas, risks associated with autonomous vehicle technologies, the need to retain high-quality personnel, security and data privacy breaches, cyberattacks, climate change risks, dependence on third parties, the need for additional capital, risks related to acquisitions and integration, legal and regulatory risks, government regulation and oversight, risks related to data collection and processing, challenges in protecting intellectual property, volatility in stock price, risks in certain jurisdictions, and the impact of catastrophic events.
>> Task marked as finished by the agent, executing task execution.
Agent: The risk factors mentioned in both June and September include the impact of the COVID-19 pandemic or future disease outbreaks, economic conditions affecting consumer spending, increases in costs, dependence on external infrastructures, potential engagement in inappropriate activities by platform users, substantial investments in new technologies, concentration of Gross Bookings in large metropolitan areas, risks associated with autonomous vehicle technologies, the need to retain high-quality personnel, security and data privacy breaches, cyberattacks, climate change risks, dependence on third parties, the need for additional capital, risks related to acquisitions and integration, legal and regulatory risks, government regulation and oversight, risks related to data collection and processing, challenges in protecting intellectual property, volatility in stock price, risks in certain jurisdictions, and the impact of catastrophic events.

>> Human:  exit