OpenAI Agent + Query Engine Experimental Cookbook

In this notebook, we try out the OpenAIAgent across a variety of query engine tools and datasets. We explore how OpenAIAgent can compare/replace existing workflows solved by our retrievers/query engines.

  • Auto retrieval

  • Joint SQL and vector search

AutoRetrieval from a Vector Database

Our existing β€œauto-retrieval” capabilities (in VectorIndexAutoRetriever) allow an LLM to infer the right query parameters for a vector database - including both the query string and metadata filter.

Since the OpenAI Function API can infer function parameters, we explore its capabilities in performing auto-retrieval here.

import pinecone
import os

api_key = os.environ["PINECONE_API_KEY"]
pinecone.init(api_key=api_key, environment="us-west1-gcp")
# dimensions are for text-embedding-ada-002
try:
    pinecone.create_index(
        "quickstart", dimension=1536, metric="euclidean", pod_type="p1"
    )
except Exception:
    # most likely index already exists
    pass
pinecone_index = pinecone.Index("quickstart")
# Optional: delete data in your pinecone index
pinecone_index.delete(deleteAll=True, namespace="test")
{}
from llama_index import VectorStoreIndex, StorageContext
from llama_index.vector_stores import PineconeVectorStore
from llama_index.schema import TextNode

nodes = [
    TextNode(
        text="Michael Jordan is a retired professional basketball player, widely regarded as one of the greatest basketball players of all time.",
        metadata={
            "category": "Sports",
            "country": "United States",
        },
    ),
    TextNode(
        text="Angelina Jolie is an American actress, filmmaker, and humanitarian. She has received numerous awards for her acting and is known for her philanthropic work.",
        metadata={
            "category": "Entertainment",
            "country": "United States",
        },
    ),
    TextNode(
        text="Elon Musk is a business magnate, industrial designer, and engineer. He is the founder, CEO, and lead designer of SpaceX, Tesla, Inc., Neuralink, and The Boring Company.",
        metadata={
            "category": "Business",
            "country": "United States",
        },
    ),
    TextNode(
        text="Rihanna is a Barbadian singer, actress, and businesswoman. She has achieved significant success in the music industry and is known for her versatile musical style.",
        metadata={
            "category": "Music",
            "country": "Barbados",
        },
    ),
    TextNode(
        text="Cristiano Ronaldo is a Portuguese professional footballer who is considered one of the greatest football players of all time. He has won numerous awards and set multiple records during his career.",
        metadata={
            "category": "Sports",
            "country": "Portugal",
        },
    ),
]
vector_store = PineconeVectorStore(pinecone_index=pinecone_index, namespace="test")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes, storage_context=storage_context)
Upserted vectors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5/5 [00:00<00:00,  9.61it/s]

Define Function Tool

Here we define the function interface, which is passed to OpenAI to perform auto-retrieval.

We were not able to get OpenAI to work with nested pydantic objects or tuples as arguments, so we converted the metadata filter keys and values into lists for the function API to work with.

# define function tool
from llama_index.tools import FunctionTool
from llama_index.vector_stores.types import (
    VectorStoreInfo,
    MetadataInfo,
    ExactMatchFilter,
    MetadataFilters,
)
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine

from typing import List, Tuple, Any
from pydantic import BaseModel, Field

# hardcode top k for now
top_k = 3

# define vector store info describing schema of vector store
vector_store_info = VectorStoreInfo(
    content_info="brief biography of celebrities",
    metadata_info=[
        MetadataInfo(
            name="category",
            type="str",
            description="Category of the celebrity, one of [Sports, Entertainment, Business, Music]",
        ),
        MetadataInfo(
            name="country",
            type="str",
            description="Country of the celebrity, one of [United States, Barbados, Portugal]",
        ),
    ],
)


# define pydantic model for auto-retrieval function
class AutoRetrieveModel(BaseModel):
    query: str = Field(..., description="natural language query string")
    filter_key_list: List[str] = Field(
        ..., description="List of metadata filter field names"
    )
    filter_value_list: List[str] = Field(
        ...,
        description=(
            "List of metadata filter field values (corresponding to names specified in filter_key_list)"
        ),
    )


def auto_retrieve_fn(
    query: str, filter_key_list: List[str], filter_value_list: List[str]
):
    """Auto retrieval function.

    Performs auto-retrieval from a vector database, and then applies a set of filters.

    """
    query = query or "Query"

    exact_match_filters = [
        ExactMatchFilter(key=k, value=v)
        for k, v in zip(filter_key_list, filter_value_list)
    ]
    retriever = VectorIndexRetriever(
        index, filters=MetadataFilters(filters=exact_match_filters), top_k=top_k
    )
    query_engine = RetrieverQueryEngine.from_args(retriever)

    response = query_engine.query(query)
    return str(response)


description = f"""\
Use this tool to look up biographical information about celebrities.
The vector database schema is given below:
{vector_store_info.json()}
"""

auto_retrieve_tool = FunctionTool.from_defaults(
    fn=auto_retrieve_fn,
    name="celebrity_bios",
    description=description,
    fn_schema=AutoRetrieveModel,
)

Initialize Agent

from llama_index.agent import OpenAIAgent
from llama_index.llms import OpenAI

agent = OpenAIAgent.from_tools(
    [auto_retrieve_tool], llm=OpenAI(temperature=0, model="gpt-4-0613"), verbose=True
)
response = agent.chat("Tell me about two celebrities from the United States. ")
print(str(response))
=== Calling Function ===
Calling function: celebrity_bios with args: {
  "query": "celebrities",
  "filter_key_list": ["country"],
  "filter_value_list": ["United States"]
}
Got output: 
Celebrities in the United States who are associated with Entertainment and Sports include Angelina Jolie and Michael Jordan.
========================
Angelina Jolie is an American actress, filmmaker, and humanitarian. She has received an Academy Award and three Golden Globe Awards, and has been cited as Hollywood's highest-paid actress. Jolie made her screen debut as a child alongside her father, Jon Voight, in Lookin' to Get Out (1982), and her film career began in earnest a decade later with the low-budget production Cyborg 2 (1993), followed by her first leading role in a major film, Hackers (1995).

Michael Jordan is a retired professional basketball player from the United States. He is widely regarded as one of the greatest basketball players in history. Jordan was one of the most effectively marketed athletes of his generation and was instrumental in popularizing the NBA around the world in the 1980s and 1990s. He played 15 seasons in the NBA, winning six championships with the Chicago Bulls.