Token Predictors

Token predictors let us estimate the token usage of an operation before actually performing it.

We first show how to predict LLM token usage with the MockLLMPredictor class. We then show how to predict embedding token usage with MockEmbedding.

# My OpenAI Key
import os
os.environ['OPENAI_API_KEY'] = "INSERT OPENAI KEY"

Using MockLLMPredictor

Predicting Usage of GPT Tree Index

Here we predict usage of GPTTreeIndex during index construction and querying, without making any LLM calls.

NOTE: Predicting query usage before the tree is built is only possible with GPTTreeIndex, due to the nature of tree traversal. Results will be more accurate if the tree has actually been built beforehand.

from llama_index import GPTTreeIndex, MockLLMPredictor, SimpleDirectoryReader, ServiceContext
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
llm_predictor = MockLLMPredictor(max_tokens=256)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
index = GPTTreeIndex.from_documents(documents, service_context=service_context)
print(llm_predictor.last_token_usage)
19495
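A predicted token count can also be turned into a rough dollar estimate. Below is a minimal sketch; the price per 1,000 tokens is a hypothetical placeholder, since actual rates vary by model and change over time.

# Minimal sketch: convert a predicted token count into an estimated cost.
# PRICE_PER_1K_TOKENS is a hypothetical placeholder; check your provider's
# current pricing for the model you actually use.
PRICE_PER_1K_TOKENS = 0.02

def estimated_cost(num_tokens: int, price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    return num_tokens / 1000 * price_per_1k

print(f"${estimated_cost(19495):.2f}")  # $0.39 at the assumed rate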
# default query
query_engine = index.as_query_engine(
    service_context=service_context
)
response = query_engine.query("What did the author do growing up?")
print(llm_predictor.last_token_usage)
5493

Predicting Usage of GPT Keyword Table Index Query

Here we build a real keyword table index over the data, but then predict query usage.

from llama_index import GPTKeywordTableIndex, MockLLMPredictor, ServiceContext, SimpleDirectoryReader
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = GPTKeywordTableIndex.from_documents(documents=documents)
llm_predictor = MockLLMPredictor(max_tokens=256)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
query_engine = index.as_query_engine(
    service_context=service_context
)
response = query_engine.query("What did the author do after his time at Y Combinator?")
print(llm_predictor.last_token_usage)
start token ct: 0
> Starting query: What did the author do after his time at Y Combinator?
query keywords: ['author', 'did', 'y', 'combinator', 'after', 'his', 'the', 'what', 'time', 'at', 'do']
Extracted keywords: ['combinator']
> Querying with idx: 3483810247393006047: of 2016 we moved to England. We wanted our kids...
> Querying with idx: 7597483754542696814: people edit code on our server through the brow...
> Querying with idx: 7572417251450701751: invited about 20 of the 225 groups to interview...
end token ct: 11313
> [query] Total token usage: 11313 tokens
11313

Predicting Usage of GPT List Index Query

Here we build a real list index over the data, but then predict query usage.

from llama_index import GPTListIndex, MockLLMPredictor, ServiceContext, SimpleDirectoryReader
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = GPTListIndex.from_documents(documents=documents)
llm_predictor = MockLLMPredictor(max_tokens=256)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
query_engine = index.as_query_engine(
    service_context=service_context
)
response = query_engine.query("What did the author do after his time at Y Combinator?")
start token ct: 0
> Starting query: What did the author do after his time at Y Combinator?
end token ct: 23941
> [query] Total token usage: 23941 tokens
print(llm_predictor.last_token_usage)
23941
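The index types traverse the data very differently, which is why the predicted query costs above range from 5493 tokens (tree) to 23941 tokens (list). The sketch below reuses only calls already shown in this guide to compare predicted query usage side by side across the two index types that are built for real here:

from llama_index import (
    GPTKeywordTableIndex,
    GPTListIndex,
    MockLLMPredictor,
    ServiceContext,
    SimpleDirectoryReader,
)

documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
llm_predictor = MockLLMPredictor(max_tokens=256)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# Build each index for real, then predict the cost of the same query.
for index_cls in (GPTKeywordTableIndex, GPTListIndex):
    index = index_cls.from_documents(documents=documents)
    query_engine = index.as_query_engine(service_context=service_context)
    query_engine.query("What did the author do after his time at Y Combinator?")
    print(index_cls.__name__, llm_predictor.last_token_usage)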

Using MockEmbedding

Predicting Usage of GPT Vector Store Index

Here we build a real vector index over the data, but then predict query usage.

from llama_index import GPTVectorStoreIndex, MockLLMPredictor, MockEmbedding, ServiceContext, SimpleDirectoryReader
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = GPTVectorStoreIndex.from_documents(documents=documents)
llm_predictor = MockLLMPredictor(max_tokens=256)
embed_model = MockEmbedding(embed_dim=1536)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model)
query_engine = index.as_query_engine(
    service_context=service_context,
)
response = query_engine.query(
    "What did the author do after his time at Y Combinator?",
)
> [query] Total LLM token usage: 4374 tokens
> [query] Total embedding token usage: 14 tokens
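To read the totals programmatically rather than from the logs, the predictor can be inspected as in the earlier sections. The embedding line below assumes MockEmbedding tracks a last_token_usage attribute analogous to the predictor's; if your version does not expose it, rely on the logged totals instead.

# LLM token usage, read the same way as in the sections above.
print(llm_predictor.last_token_usage)
# Embedding token usage; assumes MockEmbedding exposes last_token_usage.
print(embed_model.last_token_usage)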