LLM Predictors
Init params.
- class llama_index.llm_predictor.HuggingFaceLLMPredictor(max_input_size: int = 4096, max_new_tokens: int = 256, temperature: float = 0.7, do_sample: bool = False, system_prompt: str = '', query_wrapper_prompt: ~llama_index.prompts.prompts.SimpleInputPrompt = <llama_index.prompts.prompts.SimpleInputPrompt object>, tokenizer_name: str = 'StabilityAI/stablelm-tuned-alpha-3b', model_name: str = 'StabilityAI/stablelm-tuned-alpha-3b', model: ~typing.Optional[~typing.Any] = None, tokenizer: ~typing.Optional[~typing.Any] = None, device_map: str = 'auto', stopping_ids: ~typing.Optional[~typing.List[int]] = None, tokenizer_kwargs: ~typing.Optional[dict] = None, model_kwargs: ~typing.Optional[dict] = None)
HuggingFace-specific LLM predictor class.
Wrapper around an LLMPredictor to provide streamlined access to HuggingFace models.
- Parameters
max_input_size (int) – Maximum input size for the model. Defaults to 4096.
max_new_tokens (int) – Maximum number of new tokens to generate. Defaults to 256.
temperature (float) – Sampling temperature. Defaults to 0.7.
do_sample (bool) – Whether to sample during generation. Defaults to False.
system_prompt (str) – System prompt prepended to each query.
query_wrapper_prompt (SimpleInputPrompt) – Prompt used to wrap the query text for the model.
tokenizer_name (str) – Name of the HuggingFace tokenizer to load. Defaults to "StabilityAI/stablelm-tuned-alpha-3b".
model_name (str) – Name of the HuggingFace model to load. Defaults to "StabilityAI/stablelm-tuned-alpha-3b".
model (Optional[Any]) – Pre-loaded model to use instead of loading model_name.
tokenizer (Optional[Any]) – Pre-loaded tokenizer to use instead of loading tokenizer_name.
device_map (str) – Device map used when loading the model. Defaults to "auto".
stopping_ids (Optional[List[int]]) – Token ids at which generation stops.
tokenizer_kwargs (Optional[dict]) – Additional keyword arguments for the tokenizer.
model_kwargs (Optional[dict]) – Additional keyword arguments for the model.
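A minimal construction sketch based on the signature above; the query wrapper template and the stopping ids are illustrative assumptions (taken from common StableLM usage, not prescribed by this API), and loading the checkpoint requires the transformers and torch packages.

```python
from llama_index.llm_predictor import HuggingFaceLLMPredictor
from llama_index.prompts.prompts import SimpleInputPrompt

# Illustrative wrapper template for the tuned StableLM checkpoint;
# adjust to your model's expected chat format.
query_wrapper_prompt = SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")

hf_predictor = HuggingFaceLLMPredictor(
    max_input_size=4096,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=False,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],  # assumed stop-token ids for StableLM
)
```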
- async apredict(prompt: Prompt, **prompt_args: Any) → Tuple[str, str]
Async predict the answer to a query.
- Parameters
prompt (Prompt) – Prompt to use for prediction.
- Returns
Tuple of the predicted answer and the formatted prompt.
- Return type
Tuple[str, str]
- get_llm_metadata() → LLMMetadata
Get LLM metadata.
- property last_token_usage: int
Get the last token usage.
- predict(prompt: Prompt, **prompt_args: Any) → Tuple[str, str]
Predict the answer to a query.
- Parameters
prompt (Prompt) – Prompt to use for prediction.
- Returns
Tuple of the predicted answer and the formatted prompt.
- Return type
Tuple[str, str]
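A sketch of a predict call; the Prompt template, its variables, and the top-level import path are assumptions for illustration (prompt_args fill the template's named fields).

```python
from llama_index import Prompt  # assumed top-level export of the base Prompt class

qa_prompt = Prompt(
    "Context:\n{context_str}\n\nAnswer the question: {query_str}\n"
)

# Returns the model's answer plus the fully formatted prompt text.
answer, formatted_prompt = hf_predictor.predict(
    qa_prompt,
    context_str="LlamaIndex connects LLMs to external data sources.",
    query_str="What does LlamaIndex do?",
)
print(answer)
```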
- stream(prompt: Prompt, **prompt_args: Any) → Tuple[Generator, str]
Stream the answer to a query.
NOTE: this is a beta feature. Better abstractions for response handling are planned.
- Parameters
prompt (Prompt) – Prompt to use for prediction.
- Returns
Tuple of the generated token stream and the formatted prompt.
- Return type
Tuple[Generator, str]
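A consumption sketch matching the Tuple[Generator, str] signature; it assumes the generator yields text chunks incrementally and reuses the hypothetical qa_prompt from the previous example.

```python
token_gen, formatted_prompt = hf_predictor.stream(
    qa_prompt,
    context_str="LlamaIndex connects LLMs to external data sources.",
    query_str="What does LlamaIndex do?",
)
for chunk in token_gen:  # assumed to yield text chunks as they are produced
    print(chunk, end="", flush=True)
```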
- property total_tokens_used: int
Get the total tokens used so far.
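Both token counters are plain int properties; a short sketch of reading them after a call, reusing the objects from the examples above.

```python
answer, _ = hf_predictor.predict(
    qa_prompt,
    context_str="LlamaIndex connects LLMs to external data sources.",
    query_str="What does LlamaIndex do?",
)
print(hf_predictor.last_token_usage)   # tokens used by the most recent call
print(hf_predictor.total_tokens_used)  # running total across all calls
```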
- class llama_index.llm_predictor.LLMPredictor(llm: Optional[BaseLanguageModel] = None, retry_on_throttling: bool = True, cache: Optional[BaseCache] = None)
LLM predictor class.
Wrapper around an LLMChain from Langchain.
- Parameters
llm (Optional[langchain.llms.base.LLM]) – LLM from Langchain to use for predictions. Defaults to OpenAI's text-davinci-003 model. Please see Langchain's LLM page for more details.
retry_on_throttling (bool) – Whether to retry on rate limit errors. Defaults to True.
cache (Optional[langchain.cache.BaseCache]) – Cache to reuse LLM results.
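A construction sketch; the langchain OpenAI wrapper and its arguments come from langchain's own API, and the temperature value is illustrative.

```python
from langchain.llms import OpenAI
from llama_index.llm_predictor import LLMPredictor

# Omit `llm` entirely to fall back to the default text-davinci-003 model.
llm_predictor = LLMPredictor(
    llm=OpenAI(temperature=0, model_name="text-davinci-003"),
    retry_on_throttling=True,
)
```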
- async apredict(prompt: Prompt, **prompt_args: Any) → Tuple[str, str]
Async predict the answer to a query.
- Parameters
prompt (Prompt) – Prompt to use for prediction.
- Returns
Tuple of the predicted answer and the formatted prompt.
- Return type
Tuple[str, str]
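apredict mirrors predict but must be awaited; a minimal asyncio sketch reusing the hypothetical qa_prompt and llm_predictor from the examples above.

```python
import asyncio

async def main() -> None:
    answer, formatted_prompt = await llm_predictor.apredict(
        qa_prompt,
        context_str="LlamaIndex connects LLMs to external data sources.",
        query_str="What does LlamaIndex do?",
    )
    print(answer)

asyncio.run(main())
```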
- get_llm_metadata() → LLMMetadata
Get LLM metadata.
- property last_token_usage: int
Get the last token usage.
- property llm: BaseLanguageModel
Get LLM.
- predict(prompt: Prompt, **prompt_args: Any) → Tuple[str, str]
Predict the answer to a query.
- Parameters
prompt (Prompt) – Prompt to use for prediction.
- Returns
Tuple of the predicted answer and the formatted prompt.
- Return type
Tuple[str, str]
- stream(prompt: Prompt, **prompt_args: Any) → Tuple[Generator, str]
Stream the answer to a query.
NOTE: this is a beta feature. Better abstractions for response handling are planned.
- Parameters
prompt (Prompt) – Prompt to use for prediction.
- Returns
Tuple of the generated token stream and the formatted prompt.
- Return type
Tuple[Generator, str]
- property total_tokens_used: int
Get the total tokens used so far.
- class llama_index.llm_predictor.StructuredLLMPredictor(llm: Optional[BaseLanguageModel] = None, retry_on_throttling: bool = True, cache: Optional[BaseCache] = None)
Structured LLM predictor class.
- Parameters
llm (Optional[langchain.llms.base.LLM]) – LLM from Langchain to use for predictions. Defaults to OpenAI's text-davinci-003 model. Please see Langchain's LLM page for more details.
retry_on_throttling (bool) – Whether to retry on rate limit errors. Defaults to True.
cache (Optional[langchain.cache.BaseCache]) – Cache to reuse LLM results.
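Constructed with the same arguments as LLMPredictor (see the signature above); a minimal sketch with an illustrative langchain LLM.

```python
from langchain.llms import OpenAI
from llama_index.llm_predictor import StructuredLLMPredictor

# Same constructor arguments as LLMPredictor, per the signature above.
structured_predictor = StructuredLLMPredictor(llm=OpenAI(temperature=0))
```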
- async apredict(prompt: Prompt, **prompt_args: Any) → Tuple[str, str]
Async predict the answer to a query.
- Parameters
prompt (Prompt) – Prompt to use for prediction.
- Returns
Tuple of the predicted answer and the formatted prompt.
- Return type
Tuple[str, str]
- get_llm_metadata() → LLMMetadata
Get LLM metadata.
- property last_token_usage: int
Get the last token usage.
- property llm: BaseLanguageModel
Get LLM.
- predict(prompt: Prompt, **prompt_args: Any) → Tuple[str, str]
Predict the answer to a query.
- Parameters
prompt (Prompt) – Prompt to use for prediction.
- Returns
Tuple of the predicted answer and the formatted prompt.
- Return type
Tuple[str, str]
- stream(prompt: Prompt, **prompt_args: Any) → Tuple[Generator, str]
Stream the answer to a query.
NOTE: this is a beta feature. Better abstractions for response handling are planned.
- Parameters
prompt (Prompt) – Prompt to use for prediction.
- Returns
Tuple of the generated token stream and the formatted prompt.
- Return type
Tuple[Generator, str]
- property total_tokens_used: int
Get the total tokens used so far.