Token Predictors
Using our token predictors, we can estimate the token usage of an operation before actually performing it. We first show how to predict LLM token usage with the MockLLMPredictor class; we then show how to also predict embedding token usage with MockEmbedding.
# My OpenAI Key
import os
os.environ['OPENAI_API_KEY'] = "INSERT OPENAI KEY"
Using MockLLMPredictor
Predicting Usage of GPT Tree Index
Here we predict the usage of GPTTreeIndex during both index construction and querying, without making any LLM calls.
NOTE: Predicting query usage before the tree is built is only possible with GPTTreeIndex due to the nature of tree traversal. Results will be more accurate if the GPTTreeIndex is actually built beforehand.
from llama_index import GPTTreeIndex, MockLLMPredictor, SimpleDirectoryReader, ServiceContext
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
llm_predictor = MockLLMPredictor(max_tokens=256)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
index = GPTTreeIndex.from_documents(documents, service_context=service_context)
print(llm_predictor.last_token_usage)
19495
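A predicted token count translates directly into a rough cost estimate. As a minimal sketch (the $0.02 per 1K tokens rate is an assumption; substitute your model's actual pricing):
# Hypothetical rate; adjust to your model's actual price per 1K tokens.
PRICE_PER_1K_TOKENS = 0.02
estimated_cost = llm_predictor.last_token_usage / 1000 * PRICE_PER_1K_TOKENS
print(f"Estimated build cost: ${estimated_cost:.2f}")
For the 19495 tokens predicted above, this works out to roughly $0.39.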
# default query
query_engine = index.as_query_engine(
    service_context=service_context
)
response = query_engine.query("What did the author do growing up?")
print(llm_predictor.last_token_usage)
5493
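The max_tokens argument on MockLLMPredictor sets the assumed completion length per LLM call, so the predicted count scales with it. A sketch re-running the same query under a larger assumed budget (the result will be higher than the 5493 above):
# A larger assumed completion length inflates the prediction accordingly.
big_predictor = MockLLMPredictor(max_tokens=512)
big_context = ServiceContext.from_defaults(llm_predictor=big_predictor)
query_engine_512 = index.as_query_engine(service_context=big_context)
response = query_engine_512.query("What did the author do growing up?")
print(big_predictor.last_token_usage)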
Predicting Usage of GPT Keyword Table Index Query
Here we build a real keyword table index over the data, but then predict query usage.
from llama_index import GPTKeywordTableIndex, MockLLMPredictor, ServiceContext, SimpleDirectoryReader
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = GPTKeywordTableIndex.from_documents(documents=documents)
llm_predictor = MockLLMPredictor(max_tokens=256)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
query_engine = index.as_query_engine(
    service_context=service_context
)
response = query_engine.query("What did the author do after his time at Y Combinator?")
print(llm_predictor.last_token_usage)
start token ct: 0
> Starting query: What did the author do after his time at Y Combinator?
query keywords: ['author', 'did', 'y', 'combinator', 'after', 'his', 'the', 'what', 'time', 'at', 'do']
Extracted keywords: ['combinator']
> Querying with idx: 3483810247393006047: of 2016 we moved to England. We wanted our kids...
> Querying with idx: 7597483754542696814: people edit code on our server through the brow...
> Querying with idx: 7572417251450701751: invited about 20 of the 225 groups to interview...
end token ct: 11313
> [query] Total token usage: 11313 tokens
11313
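Because this keyword table index was built for real, the mock prediction can serve as a gate before spending actual tokens: if the predicted count is acceptable, re-run the same query with the default service context, which uses a real LLM. A minimal sketch:
# Swap the mock predictor for the default (real) service context once
# the predicted cost looks acceptable.
real_query_engine = index.as_query_engine(
    service_context=ServiceContext.from_defaults()
)
response = real_query_engine.query(
    "What did the author do after his time at Y Combinator?"
)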
Predicting Usage of GPT List Index Query
Here we build a real list index over the data, but then predict query usage.
from llama_index import GPTListIndex, MockLLMPredictor, ServiceContext, SimpleDirectoryReader
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = GPTListIndex.from_documents(documents=documents)
llm_predictor = MockLLMPredictor(max_tokens=256)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
query_engine = index.as_query_engine(
    service_context=service_context
)
response = query_engine.query("What did the author do after his time at Y Combinator?")
start token ct: 0
> Starting query: What did the author do after his time at Y Combinator?
end token ct: 23941
> [query] Total token usage: 23941 tokens
print(llm_predictor.last_token_usage)
23941
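For the same question, the list index prediction (23941 tokens) is roughly double the keyword table's (11313), since a list query synthesizes an answer over every node rather than only over the nodes matching extracted keywords. Also note that last_token_usage reflects only the most recent call, so to budget several queries at once, accumulate the predictions yourself. A sketch:
# last_token_usage tracks only the most recent call; keep a running
# total manually when predicting a batch of queries.
total_predicted = 0
for q in [
    "What did the author do growing up?",
    "What did the author do after his time at Y Combinator?",
]:
    query_engine.query(q)
    total_predicted += llm_predictor.last_token_usage
print(f"Predicted total across queries: {total_predicted}")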
Using MockEmbedding
Predicting Usage of GPT Simple Vector Index
Here we build a real vector index over the data, but then predict query usage, using MockEmbedding alongside MockLLMPredictor to capture embedding token usage as well.
from llama_index import GPTVectorStoreIndex, MockLLMPredictor, MockEmbedding, ServiceContext, SimpleDirectoryReader
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = GPTVectorStoreIndex.from_documents(documents=documents)
llm_predictor = MockLLMPredictor(max_tokens=256)
embed_model = MockEmbedding(embed_dim=1536)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model)
query_engine = index.as_query_engine(
    service_context=service_context,
)
response = query_engine.query(
    "What did the author do after his time at Y Combinator?",
)
> [query] Total LLM token usage: 4374 tokens
> [query] Total embedding token usage: 14 tokens
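The same mock service context can also be supplied at build time, so that index construction, where most embedding tokens are spent, is predicted rather than paid for. A minimal sketch reusing the objects defined above; the predicted usage appears in the same log format shown here:
# Construct the index with mock models: no real LLM or embedding calls
# are made, and token usage is predicted instead.
mock_index = GPTVectorStoreIndex.from_documents(
    documents, service_context=service_context
)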