Embeddings
Users have a few options to choose from when it comes to embeddings.
OpenAIEmbedding
: the default embedding class. Defaults to "text-embedding-ada-002".
LangchainEmbedding
: a wrapper around Langchain's embedding models.
OpenAI embeddings file.
- llama_index.embeddings.openai.OAEMM
alias of OpenAIEmbeddingModeModel
- llama_index.embeddings.openai.OAEMT
alias of OpenAIEmbeddingModelType
- class llama_index.embeddings.openai.OpenAIEmbedding(mode: str = OpenAIEmbeddingMode.TEXT_SEARCH_MODE, model: str = OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002, deployment_name: Optional[str] = None, embed_batch_size: int = 10, tokenizer: Optional[Callable] = None, callback_manager: Optional[CallbackManager] = None, **kwargs: Any)
OpenAI class for embeddings.
- Parameters
mode (str) –
Mode for embedding. Defaults to OpenAIEmbeddingMode.TEXT_SEARCH_MODE. Options are:
OpenAIEmbeddingMode.SIMILARITY_MODE
OpenAIEmbeddingMode.TEXT_SEARCH_MODE
model (str) –
Model for embedding. Defaults to OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002. Options are:
OpenAIEmbeddingModelType.DAVINCI
OpenAIEmbeddingModelType.CURIE
OpenAIEmbeddingModelType.BABBAGE
OpenAIEmbeddingModelType.ADA
OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002
deployment_name (Optional[str]) – Optional deployment of the model. Defaults to None. If this value is not None, mode and model are ignored. Only available when using AzureOpenAI.
- async aget_queued_text_embeddings(text_queue: List[Tuple[str, str]]) → Tuple[List[str], List[List[float]]]
Asynchronously get a list of text embeddings.
Calls the async embedding API to embed all queued texts in parallel. The text_queue argument must be passed in explicitly so the shared queue is not mutated while the async calls are in flight.
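The parallel call pattern can be sketched with asyncio.gather. Here fake_embed is a hypothetical stand-in for the real async embedding API call, so the sketch runs without network access:

```python
import asyncio
from typing import List, Tuple

async def fake_embed(text: str) -> List[float]:
    # Hypothetical stand-in for one async embedding API call.
    await asyncio.sleep(0)
    return [float(len(text))]

async def aget_queued_text_embeddings(
    text_queue: List[Tuple[str, str]],
) -> Tuple[List[str], List[List[float]]]:
    # Fire one coroutine per queued (id, text) pair and await them all in parallel.
    text_ids = [text_id for text_id, _ in text_queue]
    embeddings = await asyncio.gather(
        *(fake_embed(text) for _, text in text_queue)
    )
    return text_ids, list(embeddings)

ids, embeddings = asyncio.run(
    aget_queued_text_embeddings([("id1", "hello"), ("id2", "hi")])
)
```

Passing the queue as an argument, rather than reading shared state, is what makes the concurrent gather safe.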
- get_agg_embedding_from_queries(queries: List[str], agg_fn: Optional[Callable[[...], List[float]]] = None) → List[float]
Get aggregated embedding from multiple queries.
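A minimal sketch of the aggregation step, assuming the default agg_fn is a component-wise mean (the actual default is not documented here) and that the queries have already been embedded:

```python
from typing import Callable, List, Optional

def mean_agg(embeddings: List[List[float]]) -> List[float]:
    # Component-wise mean across all query embeddings.
    return [sum(dims) / len(embeddings) for dims in zip(*embeddings)]

def agg_embedding_from_query_embeddings(
    query_embeddings: List[List[float]],
    agg_fn: Optional[Callable[..., List[float]]] = None,
) -> List[float]:
    # Fall back to the mean when no aggregation function is supplied.
    agg_fn = agg_fn or mean_agg
    return agg_fn(query_embeddings)

agg = agg_embedding_from_query_embeddings([[1.0, 0.0], [0.0, 1.0]])
```

Collapsing several query variants into one vector lets a single retrieval pass serve all of them.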
- get_query_embedding(query: str) → List[float]
Get query embedding.
- get_queued_text_embeddings() → Tuple[List[str], List[List[float]]]
Get queued text embeddings.
Call embedding API to get embeddings for all queued texts.
- get_text_embedding(text: str) → List[float]
Get text embedding.
- property last_token_usage: int
Get the last token usage.
- queue_text_for_embedding(text_id: str, text: str) → None
Queue text for embedding.
Used for batching texts during embedding calls.
- similarity(embedding1: List, embedding2: List, mode: SimilarityMode = SimilarityMode.DEFAULT) → float
Get embedding similarity.
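Assuming SimilarityMode.DEFAULT is cosine similarity (a common default for comparing embeddings; the mode's exact meaning is not documented here), the computation looks like:

```python
import math
from typing import List

def cosine_similarity(embedding1: List[float], embedding2: List[float]) -> float:
    # Dot product of the two vectors divided by the product of their norms.
    dot = sum(a * b for a, b in zip(embedding1, embedding2))
    norm1 = math.sqrt(sum(a * a for a in embedding1))
    norm2 = math.sqrt(sum(b * b for b in embedding2))
    return dot / (norm1 * norm2)

same = cosine_similarity([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # unrelated direction
```

Cosine similarity ignores vector magnitude, which is why it is a natural fit for comparing embeddings of texts of different lengths.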
- property total_tokens_used: int
Get the total tokens used so far.
- class llama_index.embeddings.openai.OpenAIEmbeddingModeModel(value)
OpenAI embedding mode model.
- class llama_index.embeddings.openai.OpenAIEmbeddingModelType(value)
OpenAI embedding model type.
- async llama_index.embeddings.openai.aget_embedding(text: str, engine: Optional[str] = None, **kwargs: Any) → List[float]
Asynchronously get embedding.
NOTE: Copied from OpenAI's embedding utils: https://github.com/openai/openai-python/blob/main/openai/embeddings_utils.py
Copied here to avoid importing unnecessary dependencies like matplotlib, plotly, scipy, sklearn.
- async llama_index.embeddings.openai.aget_embeddings(list_of_text: List[str], engine: Optional[str] = None, **kwargs: Any) → List[List[float]]
Asynchronously get embeddings.
NOTE: Copied from OpenAI's embedding utils: https://github.com/openai/openai-python/blob/main/openai/embeddings_utils.py
Copied here to avoid importing unnecessary dependencies like matplotlib, plotly, scipy, sklearn.
- llama_index.embeddings.openai.get_embedding(text: str, engine: Optional[str] = None, **kwargs: Any) → List[float]
Get embedding.
NOTE: Copied from OpenAI's embedding utils: https://github.com/openai/openai-python/blob/main/openai/embeddings_utils.py
Copied here to avoid importing unnecessary dependencies like matplotlib, plotly, scipy, sklearn.
- llama_index.embeddings.openai.get_embeddings(list_of_text: List[str], engine: Optional[str] = None, **kwargs: Any) → List[List[float]]
Get embeddings.
NOTE: Copied from OpenAI's embedding utils: https://github.com/openai/openai-python/blob/main/openai/embeddings_utils.py
Copied here to avoid importing unnecessary dependencies like matplotlib, plotly, scipy, sklearn.
- llama_index.embeddings.openai.get_engine(mode: str, model: str, mode_model_dict: Dict[Tuple[OpenAIEmbeddingMode, str], OpenAIEmbeddingModeModel]) → OpenAIEmbeddingModeModel
Get engine.
We also introduce a LangchainEmbedding class, which is a wrapper around Langchain's embedding models.
A full list of embeddings can be found here.
Langchain Embedding Wrapper Module.
- class llama_index.embeddings.langchain.LangchainEmbedding(langchain_embedding: Embeddings, **kwargs: Any)
External embeddings (taken from Langchain).
- Parameters
langchain_embedding (langchain.embeddings.Embeddings) – Langchain embeddings class.
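A sketch of how such a wrapper can delegate to Langchain's Embeddings interface, which exposes embed_documents for texts and embed_query for queries. FakeLangchainEmbeddings and LangchainEmbeddingSketch are hypothetical stand-ins so the example runs without Langchain installed:

```python
from typing import List

class FakeLangchainEmbeddings:
    # Hypothetical stand-in implementing Langchain's Embeddings interface.
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t))] for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return [float(len(text))]

class LangchainEmbeddingSketch:
    # Sketch of the wrapper: delegate both call types to the Langchain model.
    def __init__(self, langchain_embedding) -> None:
        self._langchain_embedding = langchain_embedding

    def get_text_embedding(self, text: str) -> List[float]:
        # Documents go through embed_documents, one at a time here.
        return self._langchain_embedding.embed_documents([text])[0]

    def get_query_embedding(self, query: str) -> List[float]:
        # Queries use embed_query, which some models treat differently.
        return self._langchain_embedding.embed_query(query)

embed_model = LangchainEmbeddingSketch(FakeLangchainEmbeddings())
```

Keeping text and query paths separate matters because some embedding models apply different prompts or prefixes to queries than to documents.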
- async aget_queued_text_embeddings(text_queue: List[Tuple[str, str]]) → Tuple[List[str], List[List[float]]]
Asynchronously get a list of text embeddings.
Calls the async embedding API to embed all queued texts in parallel. The text_queue argument must be passed in explicitly so the shared queue is not mutated while the async calls are in flight.
- get_agg_embedding_from_queries(queries: List[str], agg_fn: Optional[Callable[[...], List[float]]] = None) → List[float]
Get aggregated embedding from multiple queries.
- get_query_embedding(query: str) → List[float]
Get query embedding.
- get_queued_text_embeddings() → Tuple[List[str], List[List[float]]]
Get queued text embeddings.
Call embedding API to get embeddings for all queued texts.
- get_text_embedding(text: str) → List[float]
Get text embedding.
- property last_token_usage: int
Get the last token usage.
- queue_text_for_embedding(text_id: str, text: str) → None
Queue text for embedding.
Used for batching texts during embedding calls.
- similarity(embedding1: List, embedding2: List, mode: SimilarityMode = SimilarityMode.DEFAULT) → float
Get embedding similarity.
- property total_tokens_used: int
Get the total tokens used so far.