LLM Predictors

This page documents the LLM predictor classes in llama_index.llm_predictor, including their init params, methods, and properties.

class llama_index.llm_predictor.HuggingFaceLLMPredictor(max_input_size: int = 4096, max_new_tokens: int = 256, temperature: float = 0.7, do_sample: bool = False, system_prompt: str = '', query_wrapper_prompt: ~llama_index.prompts.prompts.SimpleInputPrompt = <llama_index.prompts.prompts.SimpleInputPrompt object>, tokenizer_name: str = 'StabilityAI/stablelm-tuned-alpha-3b', model_name: str = 'StabilityAI/stablelm-tuned-alpha-3b', model: ~typing.Optional[~typing.Any] = None, tokenizer: ~typing.Optional[~typing.Any] = None, device_map: str = 'auto', stopping_ids: ~typing.Optional[~typing.List[int]] = None, tokenizer_kwargs: ~typing.Optional[dict] = None, model_kwargs: ~typing.Optional[dict] = None)

HuggingFace-specific LLM predictor class.

Wrapper around an LLMPredictor to provide streamlined access to HuggingFace models.

Parameters
  • max_input_size (int) – Maximum input size (context window) of the model. Defaults to 4096.

  • max_new_tokens (int) – Maximum number of new tokens to generate. Defaults to 256.

  • temperature (float) – Sampling temperature for generation. Defaults to 0.7.

  • do_sample (bool) – Whether to sample during generation rather than decode greedily. Defaults to False.

  • system_prompt (str) – System prompt prepended to each formatted query. Defaults to ''.

  • query_wrapper_prompt (SimpleInputPrompt) – Prompt used to wrap the query string in the format expected by the model.

  • tokenizer_name (str) – Name of the HuggingFace tokenizer to load. Defaults to 'StabilityAI/stablelm-tuned-alpha-3b'.

  • model_name (str) – Name of the HuggingFace model to load. Defaults to 'StabilityAI/stablelm-tuned-alpha-3b'.

  • model (Optional[Any]) – Pre-instantiated model to use instead of loading model_name.

  • tokenizer (Optional[Any]) – Pre-instantiated tokenizer to use instead of loading tokenizer_name.

  • device_map (str) – Device map passed to HuggingFace when loading the model. Defaults to 'auto'.

  • stopping_ids (Optional[List[int]]) – Token ids that stop generation when produced.

  • tokenizer_kwargs (Optional[dict]) – Additional keyword arguments passed to the tokenizer.

  • model_kwargs (Optional[dict]) – Additional keyword arguments passed when loading the model.
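A minimal construction sketch, assuming the 0.5/0.6-era llama_index API in which a predictor is passed to ServiceContext.from_defaults; the StableLM query wrapper and stop token ids below are illustrative values, not requirements:

    from llama_index import ServiceContext
    from llama_index.llm_predictor import HuggingFaceLLMPredictor
    from llama_index.prompts.prompts import SimpleInputPrompt

    # Wrap each query in the chat format expected by StableLM-style models.
    query_wrapper_prompt = SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")

    hf_predictor = HuggingFaceLLMPredictor(
        max_input_size=4096,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=False,
        query_wrapper_prompt=query_wrapper_prompt,
        tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
        model_name="StabilityAI/stablelm-tuned-alpha-3b",
        device_map="auto",
        stopping_ids=[50278, 50279, 50277, 1, 0],  # illustrative StableLM stop tokens
    )

    # The predictor is usually handed to a ServiceContext rather than called directly.
    service_context = ServiceContext.from_defaults(llm_predictor=hf_predictor)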

async apredict(prompt: Prompt, **prompt_args: Any) → Tuple[str, str]

Async predict the answer to a query.

Parameters

prompt (Prompt) – Prompt to use for prediction.

Returns

Tuple of the predicted answer and the formatted prompt.

Return type

Tuple[str, str]

get_llm_metadata() → LLMMetadata

Get LLM metadata.

property last_token_usage: int

Get the last token usage.

predict(prompt: Prompt, **prompt_args: Any) → Tuple[str, str]

Predict the answer to a query.

Parameters

prompt (Prompt) – Prompt to use for prediction.

Returns

Tuple of the predicted answer and the formatted prompt.

Return type

Tuple[str, str]
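A sketch of calling predict directly on the predictor constructed above; QuestionAnswerPrompt is the era's standard QA prompt type, and the template text and arguments are illustrative:

    from llama_index.prompts.prompts import QuestionAnswerPrompt

    qa_prompt = QuestionAnswerPrompt(
        "Context information is below.\n"
        "{context_str}\n"
        "Given the context, answer the question: {query_str}\n"
    )

    # predict returns both the model's answer and the fully formatted prompt.
    answer, formatted_prompt = hf_predictor.predict(
        qa_prompt,
        context_str="LlamaIndex exposes several LLM predictor classes.",
        query_str="What does LlamaIndex expose?",
    )
    print(answer)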

stream(prompt: Prompt, **prompt_args: Any) → Tuple[Generator, str]

Stream the answer to a query.

NOTE: this is a beta feature. Better abstractions for response handling may be added in the future.

Parameters

prompt (Prompt) – Prompt to use for prediction.

Returns

Tuple of the response generator and the formatted prompt.

Return type

Tuple[Generator, str]
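A sketch of consuming the stream, continuing from the predict example above and assuming the returned generator yields text chunks as they are produced:

    # stream() returns (text_generator, formatted_prompt).
    response_gen, formatted_prompt = hf_predictor.stream(
        qa_prompt,
        context_str="LlamaIndex exposes several LLM predictor classes.",
        query_str="What does LlamaIndex expose?",
    )
    for text_chunk in response_gen:
        print(text_chunk, end="", flush=True)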

property total_tokens_used: int

Get the total tokens used so far.

class llama_index.llm_predictor.LLMPredictor(llm: Optional[BaseLanguageModel] = None, retry_on_throttling: bool = True, cache: Optional[BaseCache] = None)

LLM predictor class.

Wrapper around an LLMChain from Langchain.

Parameters
  • llm (Optional[langchain.llms.base.LLM]) – LLM from Langchain to use for predictions. Defaults to OpenAI’s text-davinci-003 model. Please see Langchain’s LLM Page for more details.

  • retry_on_throttling (bool) – Whether to retry on rate limit errors. Defaults to True.

  • cache (Optional[langchain.cache.BaseCache]) – Cache to use for LLM results; if provided, repeated prompts reuse cached responses.
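A minimal construction sketch, assuming the era's Langchain OpenAI wrapper; the model choice is illustrative and an OPENAI_API_KEY is expected in the environment:

    from langchain.llms import OpenAI
    from llama_index import LLMPredictor, ServiceContext

    # Swap in a specific Langchain LLM instead of relying on the default.
    llm_predictor = LLMPredictor(
        llm=OpenAI(temperature=0, model_name="text-davinci-003"),
        retry_on_throttling=True,
    )
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)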

async apredict(prompt: Prompt, **prompt_args: Any) → Tuple[str, str]

Async predict the answer to a query.

Parameters

prompt (Prompt) – Prompt to use for prediction.

Returns

Tuple of the predicted answer and the formatted prompt.

Return type

Tuple[str, str]
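An async sketch mirroring the predict example above (qa_prompt and the arguments are the same illustrative values):

    import asyncio

    async def run_query() -> None:
        # apredict mirrors predict but awaits the underlying LLM call.
        answer, formatted_prompt = await llm_predictor.apredict(
            qa_prompt,
            context_str="LlamaIndex exposes several LLM predictor classes.",
            query_str="What does LlamaIndex expose?",
        )
        print(answer)

    asyncio.run(run_query())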

get_llm_metadata() → LLMMetadata

Get LLM metadata.
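A small sketch of inspecting the metadata; the max_input_size and num_output fields are assumed to exist on LLMMetadata in this version:

    metadata = llm_predictor.get_llm_metadata()
    # Context window size and reserved output length, used when sizing prompts.
    print(metadata.max_input_size, metadata.num_output)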

property last_token_usage: int

Get the last token usage.

property llm: BaseLanguageModel

Get LLM.

predict(prompt: Prompt, **prompt_args: Any) → Tuple[str, str]

Predict the answer to a query.

Parameters

prompt (Prompt) – Prompt to use for prediction.

Returns

Tuple of the predicted answer and the formatted prompt.

Return type

Tuple[str, str]

stream(prompt: Prompt, **prompt_args: Any) → Tuple[Generator, str]

Stream the answer to a query.

NOTE: this is a beta feature. Better abstractions for response handling may be added in the future.

Parameters

prompt (Prompt) – Prompt to use for prediction.

Returns

Tuple of the response generator and the formatted prompt.

Return type

Tuple[Generator, str]

property total_tokens_used: int

Get the total tokens used so far.
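A sketch of reading the token counters documented above after a call; qa_prompt is the illustrative prompt from the predict example:

    answer, _ = llm_predictor.predict(
        qa_prompt,
        context_str="LlamaIndex exposes several LLM predictor classes.",
        query_str="What does LlamaIndex expose?",
    )
    # Tokens consumed by the last call vs. cumulative usage across all calls.
    print(llm_predictor.last_token_usage)
    print(llm_predictor.total_tokens_used)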

class llama_index.llm_predictor.StructuredLLMPredictor(llm: Optional[BaseLanguageModel] = None, retry_on_throttling: bool = True, cache: Optional[BaseCache] = None)

Structured LLM predictor class.

Wraps the base LLMPredictor and parses each prediction with the output parser attached to the prompt.

Parameters
  • llm (Optional[langchain.llms.base.LLM]) – LLM from Langchain to use for predictions. Defaults to OpenAI’s text-davinci-003 model. Please see Langchain’s LLM Page for more details.

  • retry_on_throttling (bool) – Whether to retry on rate limit errors. Defaults to True.

  • cache (Optional[langchain.cache.BaseCache]) – Cache to use for LLM results; if provided, repeated prompts reuse cached responses.
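A hedged sketch of structured prediction, following the common pattern of attaching an output parser to a prompt; the Langchain ResponseSchema/StructuredOutputParser helpers, the field names, and the template text are illustrative assumptions:

    from langchain.output_parsers import ResponseSchema, StructuredOutputParser
    from llama_index.llm_predictor import StructuredLLMPredictor
    from llama_index.output_parsers import LangchainOutputParser
    from llama_index.prompts.prompts import QuestionAnswerPrompt

    # Describe the structured fields we want the LLM to return.
    response_schemas = [
        ResponseSchema(name="answer", description="The answer to the question."),
        ResponseSchema(name="source", description="Where the answer came from."),
    ]
    lc_parser = StructuredOutputParser.from_response_schemas(response_schemas)
    output_parser = LangchainOutputParser(lc_parser)

    # Inject the parser's format instructions into the prompt and attach the parser.
    qa_template = output_parser.format(
        "Answer the question using the context below.\n"
        "{context_str}\n"
        "Question: {query_str}\n"
    )
    qa_prompt = QuestionAnswerPrompt(qa_template, output_parser=output_parser)

    structured_predictor = StructuredLLMPredictor()
    # predict() runs the LLM, then parses the raw text with the prompt's output parser.
    parsed_answer, formatted_prompt = structured_predictor.predict(
        qa_prompt,
        context_str="LlamaIndex was created by Jerry Liu.",
        query_str="Who created LlamaIndex?",
    )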

async apredict(prompt: Prompt, **prompt_args: Any) → Tuple[str, str]

Async predict the answer to a query.

Parameters

prompt (Prompt) – Prompt to use for prediction.

Returns

Tuple of the predicted answer and the formatted prompt.

Return type

Tuple[str, str]

get_llm_metadata() → LLMMetadata

Get LLM metadata.

property last_token_usage: int

Get the last token usage.

property llm: BaseLanguageModel

Get LLM.

predict(prompt: Prompt, **prompt_args: Any) → Tuple[str, str]

Predict the answer to a query.

Parameters

prompt (Prompt) – Prompt to use for prediction.

Returns

Tuple of the predicted answer and the formatted prompt.

Return type

Tuple[str, str]

stream(prompt: Prompt, **prompt_args: Any) → Tuple[Generator, str]

Stream the answer to a query.

NOTE: this is a beta feature. Better abstractions for response handling may be added in the future.

Parameters

prompt (Prompt) – Prompt to use for prediction.

Returns

Tuple of the response generator and the formatted prompt.

Return type

Tuple[Generator, str]

property total_tokens_used: int

Get the total tokens used so far.