Finetuning
Finetuning modules.
- class llama_index.finetuning.EmbeddingAdapterFinetuneEngine(dataset: EmbeddingQAFinetuneDataset, embed_model: BaseEmbedding, batch_size: int = 10, epochs: int = 1, adapter_model: Optional[Any] = None, dim: Optional[int] = None, device: Optional[str] = None, model_output_path: str = 'model_output', model_checkpoint_path: Optional[str] = None, checkpoint_save_steps: int = 100, verbose: bool = False, bias: bool = False, **train_kwargs: Any)
Embedding adapter finetune engine.
- Parameters
dataset (EmbeddingQAFinetuneDataset) – Dataset to finetune on.
embed_model (BaseEmbedding) – Embedding model to finetune.
batch_size (int) – Batch size. Defaults to 10.
epochs (int) – Number of epochs. Defaults to 1.
dim (Optional[int]) – Dimension of embedding. Defaults to None.
adapter_model (Optional[BaseAdapter]) – Adapter model. Defaults to None, in which case a linear adapter is used.
device (Optional[str]) – Device to use. Defaults to None.
model_output_path (str) – Path to save model output. Defaults to "model_output".
model_checkpoint_path (Optional[str]) – Path to save model checkpoints. Defaults to None (don't save checkpoints).
checkpoint_save_steps (int) – Number of steps between checkpoint saves. Defaults to 100.
verbose (bool) – Whether to show progress bar. Defaults to False.
bias (bool) – Whether to use bias. Defaults to False.
- finetune(**train_kwargs: Any) → None
Finetune.
- classmethod from_model_path(dataset: EmbeddingQAFinetuneDataset, embed_model: BaseEmbedding, model_path: str, model_cls: Optional[Type[Any]] = None, **kwargs: Any) → EmbeddingAdapterFinetuneEngine
Load from model path.
- Parameters
dataset (EmbeddingQAFinetuneDataset) – Dataset to finetune on.
embed_model (BaseEmbedding) – Embedding model to finetune.
model_path (str) – Path to model.
model_cls (Optional[Type[Any]]) – Adapter model class. Defaults to None.
**kwargs (Any) – Additional kwargs (see __init__).
- get_finetuned_model(**model_kwargs: Any) → BaseEmbedding
Get finetuned model.
- smart_batching_collate(batch: List) → Tuple[Any, Any]
Smart batching collate.
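Conceptually, the default linear adapter is a learned square matrix W (plus an optional bias vector when bias=True) applied on top of the frozen query embedding. A minimal sketch in plain Python, with a hypothetical 4-dimensional embedding (the real adapter is a PyTorch module inside llama_index; this only illustrates the transform):

```python
# Hypothetical 4-dim example: the adapter computes W @ x + b on a
# frozen query embedding (b is used only when bias=True).
dim = 4
W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]  # identity init
b = [0.0] * dim

def adapt(query_embedding):
    """Apply the linear adapter transform to one query embedding."""
    return [
        sum(W[i][j] * query_embedding[j] for j in range(dim)) + b[i]
        for i in range(dim)
    ]

q = [0.1, -0.2, 0.3, 0.4]
# With identity weights and zero bias, the embedding passes through unchanged.
assert all(abs(a - b_) < 1e-12 for a, b_ in zip(adapt(q), q))
```

During finetuning only W and b are updated; the underlying embed_model stays frozen, which keeps training cheap and avoids re-embedding the corpus.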
- pydantic model llama_index.finetuning.EmbeddingQAFinetuneDataset
Embedding QA Finetuning Dataset.
- Parameters
queries (Dict[str, str]) – Dict id -> query.
corpus (Dict[str, str]) – Dict id -> string.
relevant_docs (Dict[str, List[str]]) – Dict query id -> list of doc ids.
JSON schema:
{
  "title": "EmbeddingQAFinetuneDataset",
  "description": "Embedding QA Finetuning Dataset.\n\nArgs:\n    queries (Dict[str, str]): Dict id -> query.\n    corpus (Dict[str, str]): Dict id -> string.\n    relevant_docs (Dict[str, List[str]]): Dict query id -> list of doc ids.",
  "type": "object",
  "properties": {
    "queries": {
      "title": "Queries",
      "type": "object",
      "additionalProperties": {"type": "string"}
    },
    "corpus": {
      "title": "Corpus",
      "type": "object",
      "additionalProperties": {"type": "string"}
    },
    "relevant_docs": {
      "title": "Relevant Docs",
      "type": "object",
      "additionalProperties": {
        "type": "array",
        "items": {"type": "string"}
      }
    }
  },
  "required": ["queries", "corpus", "relevant_docs"]
}
- Fields
corpus (Dict[str, str])
queries (Dict[str, str])
relevant_docs (Dict[str, List[str]])
- field corpus: Dict[str, str] [Required]
- field queries: Dict[str, str] [Required]
- field relevant_docs: Dict[str, List[str]] [Required]
- classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model
Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = 'allow' was set, since it adds all passed values.
- copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) → Model
Duplicate a model, optionally choose which fields to include, exclude and change.
- Parameters
include – fields to include in new model
exclude – fields to exclude from new model; as with values, this takes precedence over include
update – values to change/add in the new model. Note: the data is not validated before creating the new model; you should trust this data
deep – set to True to make a deep copy of the model
- Returns
new model instance
- dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → DictStrAny
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
- classmethod from_json(path: str) → EmbeddingQAFinetuneDataset
Load JSON.
- classmethod from_orm(obj: Any) → Model
- json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) → str
Generate a JSON representation of the model, include and exclude arguments as per dict().
encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().
- classmethod parse_file(path: Union[str, Path], *, content_type: str = None, encoding: str = 'utf8', proto: Protocol = None, allow_pickle: bool = False) → Model
- classmethod parse_obj(obj: Any) → Model
- classmethod parse_raw(b: Union[str, bytes], *, content_type: str = None, encoding: str = 'utf8', proto: Protocol = None, allow_pickle: bool = False) → Model
- save_json(path: str) → None
Save JSON.
- classmethod schema(by_alias: bool = True, ref_template: str = '#/definitions/{model}') → DictStrAny
- classmethod schema_json(*, by_alias: bool = True, ref_template: str = '#/definitions/{model}', **dumps_kwargs: Any) → str
- classmethod update_forward_refs(**localns: Any) → None
Try to update ForwardRefs on fields based on this Model, globalns and localns.
- classmethod validate(value: Any) → Model
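A dataset instance is just three aligned dictionaries. The sketch below builds the same structure with plain dicts (the IDs and texts are made up for illustration) and shows the JSON shape that save_json and from_json round-trip:

```python
import json

# Hypothetical IDs and texts, matching the schema above.
queries = {"q1": "What does the adapter finetune engine train?"}
corpus = {"d1": "EmbeddingAdapterFinetuneEngine trains an adapter on query embeddings."}
relevant_docs = {"q1": ["d1"]}  # query id -> ids of relevant corpus docs

dataset_dict = {
    "queries": queries,
    "corpus": corpus,
    "relevant_docs": relevant_docs,
}

# Every relevant doc id must exist in the corpus, and every query id
# in relevant_docs must exist in queries.
assert all(doc_id in corpus for ids in relevant_docs.values() for doc_id in ids)
assert all(qid in queries for qid in relevant_docs)

# Round-trip through JSON, as save_json/from_json would.
restored = json.loads(json.dumps(dataset_dict))
assert restored == dataset_dict
```

Keeping the three dicts keyed consistently is what lets downstream engines pair each query with its positive passages during training.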
- class llama_index.finetuning.OpenAIFinetuneEngine(base_model: str, data_path: str, verbose: bool = False, start_job_id: Optional[str] = None)
OpenAI Finetuning Engine.
- finetune() → None
Finetune model.
- classmethod from_finetuning_handler(finetuning_handler: OpenAIFineTuningHandler, base_model: str, data_path: str, **kwargs: Any) → OpenAIFinetuneEngine
Initialize from finetuning handler.
Used to finetune an OpenAI model into another OpenAI model (e.g. gpt-3.5-turbo on top of GPT-4).
- get_current_job() → Any
Get current job.
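data_path is expected to point at a JSONL file of chat-format training examples; the record below follows OpenAI's standard chat-finetuning format (the exact records produced by OpenAIFineTuningHandler may differ, so treat this as an illustrative assumption):

```python
import json

# One hypothetical training record in OpenAI's chat-finetuning JSONL format:
# each line of the file is a standalone JSON object with a "messages" list.
record = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is an embedding adapter?"},
        {"role": "assistant", "content": "A small learned transform applied to query embeddings."},
    ]
}

line = json.dumps(record)   # one JSONL line
parsed = json.loads(line)
assert parsed["messages"][0]["role"] == "system"
assert parsed["messages"][-1]["role"] == "assistant"
```

In the distillation flow from from_finetuning_handler, the assistant turns are typically the stronger model's (e.g. GPT-4's) responses, which the cheaper base_model then learns to imitate.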
- class llama_index.finetuning.SentenceTransformersFinetuneEngine(dataset: EmbeddingQAFinetuneDataset, model_id: str = 'BAAI/bge-small-en', model_output_path: str = 'exp_finetune', batch_size: int = 10, val_dataset: Optional[EmbeddingQAFinetuneDataset] = None, loss: Optional[Any] = None, epochs: int = 2, show_progress_bar: bool = True, evaluation_steps: int = 50)
Sentence Transformers Finetune Engine.
- finetune(**train_kwargs: Any) → None
Finetune model.
- get_finetuned_model(**model_kwargs: Any) → BaseEmbedding
Gets finetuned model.
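Conceptually, the engine turns the dataset into (query, positive passage) training pairs for a contrastive loss (when loss is None, a ranking-style loss with in-batch negatives is the usual sentence-transformers default; treat that specific choice as an assumption). A sketch of the pairing step in plain Python:

```python
# Hypothetical dataset content; the pairing logic mirrors how an
# EmbeddingQAFinetuneDataset maps query ids to relevant doc ids.
queries = {"q1": "How many epochs by default?", "q2": "Is the loss configurable?"}
corpus = {"d1": "epochs defaults to 2.", "d2": "The loss argument overrides the default."}
relevant_docs = {"q1": ["d1"], "q2": ["d2"]}

pairs = [
    (queries[qid], corpus[doc_id])
    for qid, doc_ids in relevant_docs.items()
    for doc_id in doc_ids
]

assert ("How many epochs by default?", "epochs defaults to 2.") in pairs
assert len(pairs) == 2
```

Passing a val_dataset enables periodic retrieval evaluation every evaluation_steps steps, which helps catch overfitting on small corpora.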
- llama_index.finetuning.generate_qa_embedding_pairs(nodes: List[TextNode], llm: Optional[LLM] = None, qa_generate_prompt_tmpl: str = 'Context information is below.\n\n---------------------\n{context_str}\n---------------------\n\nGiven the context information and not prior knowledge.\ngenerate only questions based on the below query.\n\nYou are a Teacher/ Professor. Your task is to setup {num_questions_per_chunk} questions for an upcoming quiz/examination. The questions should be diverse in nature across the document. Restrict the questions to the context information provided."\n', num_questions_per_chunk: int = 2) → EmbeddingQAFinetuneDataset
Generate examples given a set of nodes.
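The qa_generate_prompt_tmpl has two placeholders, {context_str} and {num_questions_per_chunk}, filled once per node. A minimal sketch of the substitution, using a shortened stand-in for the default template (the real default appears in the signature above and keeps the same two placeholders):

```python
# Shortened stand-in template with the same placeholders as the default.
tmpl = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Your task is to setup {num_questions_per_chunk} questions for an upcoming quiz."
)

prompt = tmpl.format(
    context_str="Adapters transform query embeddings.",
    num_questions_per_chunk=2,
)

assert "Adapters transform query embeddings." in prompt
assert "setup 2 questions" in prompt
```

Each generated question becomes an entry in queries, the source node's text becomes an entry in corpus, and relevant_docs links the question id back to that node's id, yielding an EmbeddingQAFinetuneDataset ready for the finetune engines above.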