Vector Store

Vector stores.

class llama_index.vector_stores.AwaDBVectorStore(table_name: str = 'llamaindex_awadb', log_and_data_dir: Optional[str] = None, **kwargs: Any)

AwaDB vector store.

In this vector store, embeddings are stored within an AwaDB table.

During query time, the index uses AwaDB to query for the top k most similar nodes.

Parameters
  • table_name (str) – Name of the AwaDB table. Defaults to "llamaindex_awadb".

  • log_and_data_dir (Optional[str]) – Directory where AwaDB stores its log and data. Defaults to None.
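
A minimal usage sketch (assumes the awadb package is installed; the data directory is illustrative):

from llama_index import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores import AwaDBVectorStore

documents = SimpleDirectoryReader("./data").load_data()  # illustrative path
vector_store = AwaDBVectorStore(table_name="llamaindex_awadb")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)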

add(nodes: List[BaseNode]) List[str]

Add nodes to AwaDB.

Parameters

nodes – List[BaseNode]: list of nodes with embeddings

Returns

Added node ids

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Get AwaDB client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

Returns

None

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query – vector store query

Returns

Query results

Return type

VectorStoreQueryResult

class llama_index.vector_stores.BagelVectorStore(collection: Any, **kwargs: Any)

Vector store for Bagel.
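
A construction sketch; the bagel client calls below follow Bagel's public quickstart and are assumptions, and the cluster name is illustrative:

import bagel
from bagel import Settings
from llama_index.vector_stores import BagelVectorStore

# assumption: REST client pointed at the hosted Bagel server
server_settings = Settings(bagel_api_impl="rest", bagel_server_host="api.bageldb.ai")
client = bagel.Client(server_settings)
cluster = client.get_or_create_cluster("llama_cluster")

vector_store = BagelVectorStore(collection=cluster)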

add(nodes: List[BaseNode], **kwargs: Any) List[str]

Add a list of nodes with embeddings to the vector store.

Parameters
  • nodes – List of nodes with embeddings.

  • kwargs – Additional arguments.

Returns

List of document ids.

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Get the Bagel cluster.

delete(ref_doc_id: str, **kwargs: Any) None

Delete a document from the vector store.

Parameters
  • ref_doc_id – Reference document id.

  • kwargs – Additional arguments.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query the vector store.

Parameters
  • query – Query to run.

  • kwargs – Additional arguments.

Returns

Query result.

class llama_index.vector_stores.CassandraVectorStore(session: Any, keyspace: str, table: str, embedding_dimension: int, ttl_seconds: Optional[int] = None, insertion_batch_size: int = 20)

Cassandra Vector Store.

An abstraction of a Cassandra table with vector-similarity-search. Documents, and their embeddings, are stored in a Cassandra table and a vector-capable index is used for searches. The table does not need to exist beforehand: if necessary it will be created behind the scenes.

All Cassandra operations are done through the cassIO library.

Parameters
  • session (cassandra.cluster.Session) – the Cassandra session to use

  • keyspace (str) – name of the Cassandra keyspace to work in

  • table (str) – table name to use. If not existing, it will be created.

  • embedding_dimension (int) – length of the embedding vectors in use.

  • ttl_seconds (Optional[int]) – expiration time for inserted entries. Default is no expiration.
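
A construction sketch (assumes a reachable Cassandra cluster; contact point, keyspace, and table name are illustrative):

from cassandra.cluster import Cluster
from llama_index.vector_stores import CassandraVectorStore

session = Cluster(["127.0.0.1"]).connect()
vector_store = CassandraVectorStore(
    session=session,
    keyspace="llama_keyspace",   # must already exist
    table="llama_vectors",       # created behind the scenes if missing
    embedding_dimension=1536,
)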

add(nodes: List[BaseNode]) List[str]

Add nodes to index.

Args

nodes: List[BaseNode]: list of nodes with embeddings

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Return the underlying cassIO vector table object

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Supported query modes: 'default' (most similar vectors) and 'mmr'.

Parameters

query (VectorStoreQuery) – the basic query definition. Defines:
  • mode (VectorStoreQueryMode): one of the supported modes
  • query_embedding (List[float]): query embedding to search against
  • similarity_top_k (int): top k most similar nodes
  • mmr_threshold (Optional[float]): the 0-to-1 MMR lambda. If present, takes precedence over the kwargs parameter. Ignored unless for MMR queries.

Args for query.mode == 'mmr' (ignored otherwise):
  • mmr_threshold (Optional[float]): the 0-to-1 lambda for MMR. Note that in principle mmr_threshold could come in the query.
  • mmr_prefetch_factor (Optional[float]): factor applied to top_k for the prefetch pool size. Defaults to 4.0.
  • mmr_prefetch_k (Optional[int]): prefetch pool size. This cannot be passed together with mmr_prefetch_factor.
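
For example, an MMR query might look like this (a sketch; the embedding is a placeholder):

from llama_index.vector_stores.types import VectorStoreQuery, VectorStoreQueryMode

query = VectorStoreQuery(
    query_embedding=[0.0] * 1536,  # placeholder; use a real query embedding
    similarity_top_k=5,
    mode=VectorStoreQueryMode.MMR,
)
result = vector_store.query(query, mmr_threshold=0.25)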

class llama_index.vector_stores.ChatGPTRetrievalPluginClient(endpoint_url: str, bearer_token: Optional[str] = None, retries: Optional[Retry] = None, batch_size: int = 100, **kwargs: Any)

ChatGPT Retrieval Plugin Client.

In this client, we make use of the endpoints defined by ChatGPT.

Parameters
  • endpoint_url (str) – URL of the ChatGPT Retrieval Plugin.

  • bearer_token (Optional[str]) – Bearer token for the ChatGPT Retrieval Plugin.

  • retries (Optional[Retry]) – Retry object for the ChatGPT Retrieval Plugin.

  • batch_size (int) – Batch size for the ChatGPT Retrieval Plugin.
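
A construction sketch (endpoint and token are illustrative):

import os

from llama_index.vector_stores import ChatGPTRetrievalPluginClient

client = ChatGPTRetrievalPluginClient(
    endpoint_url="http://localhost:8000",      # your plugin deployment
    bearer_token=os.getenv("BEARER_TOKEN"),
)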

add(nodes: List[BaseNode]) List[str]

Add nodes to index.

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: None

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Get nodes for response.

pydantic model llama_index.vector_stores.ChromaVectorStore

Chroma vector store.

In this vector store, embeddings are stored within a ChromaDB collection.

During query time, the index uses ChromaDB to query for the top k most similar nodes.

Parameters

chroma_collection (chromadb.api.models.Collection.Collection) – ChromaDB collection instance

Show JSON schema
{
   "title": "ChromaVectorStore",
   "description": "Chroma vector store.\n\nIn this vector store, embeddings are stored within a ChromaDB collection.\n\nDuring query time, the index uses ChromaDB to query for the top\nk most similar nodes.\n\nArgs:\n    chroma_collection (chromadb.api.models.Collection.Collection):\n        ChromaDB collection instance",
   "type": "object",
   "properties": {
      "stores_text": {
         "title": "Stores Text",
         "default": true,
         "type": "boolean"
      },
      "is_embedding_query": {
         "title": "Is Embedding Query",
         "default": true,
         "type": "boolean"
      },
      "flat_metadata": {
         "title": "Flat Metadata",
         "default": true,
         "type": "boolean"
      },
      "host": {
         "title": "Host",
         "type": "string"
      },
      "port": {
         "title": "Port",
         "type": "string"
      },
      "ssl": {
         "title": "Ssl",
         "type": "boolean"
      },
      "headers": {
         "title": "Headers",
         "type": "object",
         "additionalProperties": {
            "type": "string"
         }
      },
      "collection_kwargs": {
         "title": "Collection Kwargs",
         "type": "object"
      }
   },
   "required": [
      "ssl"
   ]
}

Fields
  • collection_kwargs (Dict[str, Any])

  • flat_metadata (bool)

  • headers (Optional[Dict[str, str]])

  • host (Optional[str])

  • is_embedding_query (bool)

  • port (Optional[str])

  • ssl (bool)

  • stores_text (bool)

field collection_kwargs: Dict[str, Any] [Optional]
field flat_metadata: bool = True
field headers: Optional[Dict[str, str]] = None
field host: Optional[str] = None
field is_embedding_query: bool = True
field port: Optional[str] = None
field ssl: bool [Required]
field stores_text: bool = True
add(nodes: List[BaseNode]) List[str]

Add nodes to index.

Args

nodes: List[BaseNode]: list of nodes with embeddings

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

classmethod class_name() str

Get class name.

classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) Model

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = 'allow' was set since it adds all passed values.

copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) Model

Duplicate a model, optionally choose which fields to include, exclude and change.

Parameters
  • include – fields to include in new model

  • exclude – fields to exclude from new model, as with values this takes precedence over include

  • update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data

  • deep – set to True to make a deep copy of the model

Returns

new model instance

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) DictStrAny

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

classmethod from_dict(data: Dict[str, Any], **kwargs: Any) Self
classmethod from_json(data_str: str, **kwargs: Any) Self
classmethod from_orm(obj: Any) Model
classmethod from_params(collection_name: str, host: Optional[str] = None, port: Optional[str] = None, ssl: bool = False, headers: Optional[Dict[str, str]] = None, collection_kwargs: Optional[dict] = None, **kwargs: Any) ChromaVectorStore
json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) unicode

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

classmethod parse_file(path: Union[str, Path], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) Model
classmethod parse_obj(obj: Any) Model
classmethod parse_raw(b: Union[str, bytes], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) Model
persist(persist_path: str, fs: Optional[AbstractFileSystem] = None) None
query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters
  • query_embedding (List[float]) – query embedding

  • similarity_top_k (int) – top k most similar nodes

classmethod schema(by_alias: bool = True, ref_template: unicode = '#/definitions/{model}') DictStrAny
classmethod schema_json(*, by_alias: bool = True, ref_template: unicode = '#/definitions/{model}', **dumps_kwargs: Any) unicode
to_dict(**kwargs: Any) Dict[str, Any]
to_json(**kwargs: Any) str
classmethod update_forward_refs(**localns: Any) None

Try to update ForwardRefs on fields based on this Model, globalns and localns.

classmethod validate(value: Any) Model
property client: Any

Return client.
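
A minimal usage sketch (assumes the chromadb package, version 0.4 or later; the path and collection name are illustrative):

import chromadb
from llama_index.vector_stores import ChromaVectorStore

chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("llama_collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)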

class llama_index.vector_stores.CognitiveSearchVectorStore(search_or_index_client: Any, id_field_key: str, chunk_field_key: str, embedding_field_key: str, metadata_string_field_key: str, doc_id_field_key: str, filterable_metadata_field_keys: Optional[Union[List[str], Dict[str, str], Dict[str, Tuple[str, MetadataIndexFieldType]]]] = None, index_name: Optional[str] = None, index_mapping: Optional[Callable[[Dict[str, str], Dict[str, Any]], Dict[str, str]]] = None, index_management: IndexManagement = IndexManagement.NO_VALIDATION, embedding_dimensionality: int = 1536, **kwargs: Any)
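
A construction sketch (assumes the azure-search-documents package; the IndexManagement import path, service endpoint, key, index name, and field names are illustrative assumptions):

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from llama_index.vector_stores import CognitiveSearchVectorStore
from llama_index.vector_stores.cogsearch import IndexManagement  # assumed module path

index_client = SearchIndexClient(
    endpoint="https://my-service.search.windows.net",  # illustrative
    credential=AzureKeyCredential("admin-key"),        # illustrative
)
vector_store = CognitiveSearchVectorStore(
    search_or_index_client=index_client,
    index_name="llama-index",
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,  # assumed enum member
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
)
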
add(nodes: List[BaseNode]) List[str]

Add nodes to index associated with the configured search client.

Args

nodes: List[BaseNode]: nodes with embeddings

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete documents from the Cognitive Search Index with doc_id_field_key field equal to ref_doc_id.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query vector store.

class llama_index.vector_stores.DeepLakeVectorStore(dataset_path: str = 'llama_index', token: Optional[str] = None, read_only: Optional[bool] = False, ingestion_batch_size: int = 1024, ingestion_num_workers: int = 4, overwrite: bool = False, exec_option: str = 'python', verbose: bool = True, **kwargs: Any)

The DeepLake Vector Store.

In this vector store we store the text, its embedding, and a few pieces of its metadata in a DeepLake dataset. This implementation allows the use of an already existing DeepLake dataset if it was created by this vector store. It also supports creating a new one if the dataset doesn't exist or if overwrite is set to True.
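
A minimal construction sketch (the dataset path is illustrative; hub:// paths with a token are also supported):

from llama_index.vector_stores import DeepLakeVectorStore

vector_store = DeepLakeVectorStore(
    dataset_path="./my_deeplake_dataset",
    overwrite=True,  # start fresh rather than reusing an existing dataset
)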

add(nodes: List[BaseNode]) List[str]

Add the embeddings and their nodes into DeepLake.

Parameters

nodes (List[BaseNode]) – List of nodes with embeddings to insert.

Returns

List of ids inserted.

Return type

List[str]

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Get client.

Returns

DeepLake vectorstore dataset.

Return type

Any

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query (VectorStoreQuery) – VectorStoreQuery class input, it has the following attributes: 1. query_embedding (List[float]): query embedding 2. similarity_top_k (int): top k most similar nodes

Returns

VectorStoreQueryResult

class llama_index.vector_stores.DocArrayHnswVectorStore(work_dir: str, dim: int = 1536, dist_metric: Literal['cosine', 'ip', 'l2'] = 'cosine', max_elements: int = 1024, ef_construction: int = 200, ef: int = 10, M: int = 16, allow_replace_deleted: bool = True, num_threads: int = 1)

Class representing a DocArray HNSW vector store.

This class is a lightweight Document Index implementation provided by DocArray. It stores vectors on disk with hnswlib and stores all other data in SQLite.
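
A minimal construction sketch (work_dir and dim are illustrative):

from llama_index.vector_stores import DocArrayHnswVectorStore

vector_store = DocArrayHnswVectorStore(
    work_dir="hnsw_index",  # vectors are persisted here on disk
    dim=1536,               # must match the embedding model's dimension
)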

add(nodes: List[BaseNode]) List[str]

Adds nodes to the vector store.

Parameters

nodes (List[BaseNode]) – List of nodes with embeddings.

Returns

List of document IDs added to the vector store.

Return type

List[str]

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Deletes a document from the vector store.

Parameters
  • ref_doc_id (str) – Document ID to be deleted.

  • **delete_kwargs (Any) – Additional arguments to pass to the delete method.

num_docs() int

Retrieves the number of documents in the index.

Returns

The number of documents in the index.

Return type

int

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Queries the vector store and retrieves the results.

Parameters

query (VectorStoreQuery) – Query for the vector store.

Returns

Result of the query from vector store.

Return type

VectorStoreQueryResult

class llama_index.vector_stores.DocArrayInMemoryVectorStore(index_path: Optional[str] = None, metric: Literal['cosine_sim', 'euclidian_dist', 'sgeuclidean_dist'] = 'cosine_sim')

Class representing a DocArray In-Memory vector store.

This class is a document index provided by DocArray that stores documents in memory.

add(nodes: List[BaseNode]) List[str]

Adds nodes to the vector store.

Parameters

nodes (List[BaseNode]) – List of nodes with embeddings.

Returns

List of document IDs added to the vector store.

Return type

List[str]

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Deletes a document from the vector store.

Parameters
  • ref_doc_id (str) – Document ID to be deleted.

  • **delete_kwargs (Any) – Additional arguments to pass to the delete method.

num_docs() int

Retrieves the number of documents in the index.

Returns

The number of documents in the index.

Return type

int

persist(persist_path: str, fs: Optional[AbstractFileSystem] = None) None

Persists the in-memory vector store to a file.

Parameters
  • persist_path (str) – The path to persist the index.

  • fs (fsspec.AbstractFileSystem, optional) – Filesystem to persist to. (doesn’t apply)

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Queries the vector store and retrieves the results.

Parameters

query (VectorStoreQuery) – Query for the vector store.

Returns

Result of the query from vector store.

Return type

VectorStoreQueryResult

class llama_index.vector_stores.ElasticsearchStore(index_name: str, es_client: Optional[Any] = None, es_url: Optional[str] = None, es_cloud_id: Optional[str] = None, es_api_key: Optional[str] = None, es_user: Optional[str] = None, es_password: Optional[str] = None, text_field: str = 'content', vector_field: str = 'embedding', batch_size: int = 200, distance_strategy: Optional[Literal['COSINE', 'DOT_PRODUCT', 'EUCLIDEAN_DISTANCE']] = 'COSINE')

Elasticsearch vector store.

Parameters
  • index_name – Name of the Elasticsearch index.

  • es_client – Optional. Pre-existing AsyncElasticsearch client.

  • es_url – Optional. Elasticsearch URL.

  • es_cloud_id – Optional. Elasticsearch cloud ID.

  • es_api_key – Optional. Elasticsearch API key.

  • es_user – Optional. Elasticsearch username.

  • es_password – Optional. Elasticsearch password.

  • text_field – Optional. Name of the Elasticsearch field that stores the text.

  • vector_field – Optional. Name of the Elasticsearch field that stores the embedding.

  • batch_size – Optional. Batch size for bulk indexing. Defaults to 200.

  • distance_strategy – Optional. Distance strategy to use for similarity search. Defaults to "COSINE".

Raises
  • ConnectionError – If AsyncElasticsearch client cannot connect to Elasticsearch.

  • ValueError – If neither es_client nor es_url nor es_cloud_id is provided.
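
A minimal construction sketch (URL and index name are illustrative; at least one of es_client, es_url, or es_cloud_id must be provided):

from llama_index.vector_stores import ElasticsearchStore

vector_store = ElasticsearchStore(
    index_name="llama_docs",
    es_url="http://localhost:9200",
)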

add(nodes: List[BaseNode], *, create_index_if_not_exists: bool = True) List[str]

Add nodes to Elasticsearch index.

Parameters
  • nodes – List of nodes with embeddings.

  • create_index_if_not_exists – Optional. Whether to create the Elasticsearch index if it doesn’t already exist. Defaults to True.

Returns

List of node IDs that were added to the index.

Raises
  • ImportError – If the elasticsearch['async'] Python package is not installed.

  • BulkIndexError – If AsyncElasticsearch async_bulk indexing fails.

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Async delete node from Elasticsearch index.

Parameters
  • ref_doc_id – ID of the node to delete.

  • delete_kwargs – Optional. Additional arguments to pass to AsyncElasticsearch delete_by_query.

Raises

Exception – If AsyncElasticsearch delete_by_query fails.

async aquery(query: VectorStoreQuery, custom_query: Optional[Callable[[Dict, Optional[VectorStoreQuery]], Dict]] = None, es_filter: Optional[List[Dict]] = None, **kwargs: Any) VectorStoreQueryResult

Asynchronous query index for top k most similar nodes.

Parameters
  • query (VectorStoreQuery) – query definition, including the query embedding

  • custom_query – Optional. custom query function that takes in the es query body and returns a modified query body. This can be used to add additional query parameters to the AsyncElasticsearch query.

  • es_filter – Optional. AsyncElasticsearch filter to apply to the query. If filter is provided in the query, this filter will be ignored.

Returns

Result of the query.

Return type

VectorStoreQueryResult

Raises

Exception – If AsyncElasticsearch query fails.

async async_add(nodes: List[BaseNode], *, create_index_if_not_exists: bool = True) List[str]

Asynchronous method to add nodes to Elasticsearch index.

Parameters
  • nodes – List of nodes with embeddings.

  • create_index_if_not_exists – Optional. Whether to create the AsyncElasticsearch index if it doesn’t already exist. Defaults to True.

Returns

List of node IDs that were added to the index.

Raises
  • ImportError – If elasticsearch python package is not installed.

  • BulkIndexError – If AsyncElasticsearch async_bulk indexing fails.

property client: Any

Get the async Elasticsearch client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete node from Elasticsearch index.

Parameters
  • ref_doc_id – ID of the node to delete.

  • delete_kwargs – Optional. Additional arguments to pass to Elasticsearch delete_by_query.

Raises

Exception – If Elasticsearch delete_by_query fails.

static get_user_agent() str

Get the user agent for the Elasticsearch client.

query(query: VectorStoreQuery, custom_query: Optional[Callable[[Dict, Optional[VectorStoreQuery]], Dict]] = None, es_filter: Optional[List[Dict]] = None, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters
  • query_embedding (List[float]) – query embedding

  • custom_query – Optional. custom query function that takes in the es query body and returns a modified query body. This can be used to add additional query parameters to the Elasticsearch query.

  • es_filter – Optional. Elasticsearch filter to apply to the query. If filter is provided in the query, this filter will be ignored.

Returns

Result of the query.

Return type

VectorStoreQueryResult

Raises

Exception – If Elasticsearch query fails.
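
For example, custom_query can adjust the generated request body before it is sent (a sketch; "explain" is a standard Elasticsearch search option, and my_query is a VectorStoreQuery built elsewhere):

def with_explain(es_query, query=None):
    # request scoring explanations alongside the hits
    es_query["explain"] = True
    return es_query

result = vector_store.query(my_query, custom_query=with_explain)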

class llama_index.vector_stores.EpsillaVectorStore(client: Any, collection_name: str = 'llama_collection', db_path: Optional[str] = './storage', db_name: Optional[str] = 'llama_db', dimension: Optional[int] = None, overwrite: bool = False, **kwargs: Any)

The Epsilla Vector Store.

In this vector store we store the text, its embedding, and a few pieces of its metadata in an Epsilla collection. This implementation allows the use of an already existing collection. It also supports creating a new one if the collection does not exist or if overwrite is set to True.

As a prerequisite, you need to install the pyepsilla package and have a running Epsilla vector database (for example, through its Docker image). See the following documentation for how to run an Epsilla vector database: https://epsilla-inc.gitbook.io/epsilladb/quick-start

Parameters
  • client (Any) – Epsilla client to connect to.

  • collection_name (Optional[str]) – Which collection to use. Defaults to "llama_collection".

  • db_path (Optional[str]) – The path where the database will be persisted. Defaults to "./storage".

  • db_name (Optional[str]) – Give a name to the loaded database. Defaults to "llama_db".

  • dimension (Optional[int]) – The dimension of the embeddings. If not provided, collection creation will be done on first insert. Defaults to None.

  • overwrite (Optional[bool]) – Whether to overwrite existing collection with same name. Defaults to False.

Returns

Vectorstore that supports add, delete, and query.

Return type

EpsillaVectorStore
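
A construction sketch (assumes the pyepsilla package and a local Epsilla server; the client arguments follow Epsilla's quickstart and are assumptions):

from pyepsilla import vectordb
from llama_index.vector_stores import EpsillaVectorStore

client = vectordb.Client(protocol="http", host="localhost", port="8888")  # assumed defaults
vector_store = EpsillaVectorStore(client=client, db_path="./storage", db_name="llama_db")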

add(nodes: List[BaseNode]) List[str]

Add nodes to Epsilla vector store.

Args

nodes: List[BaseNode]: list of nodes with embeddings

Returns

List of ids inserted.

Return type

List[str]

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

client() Any

Return the Epsilla client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query (VectorStoreQuery) – query.

Returns

Vector store query result.

class llama_index.vector_stores.FaissVectorStore(faiss_index: Any)

Faiss Vector Store.

Embeddings are stored within a Faiss index.

During query time, the index uses Faiss to query for the top k embeddings, and returns the corresponding indices.

Parameters

faiss_index (faiss.Index) – Faiss index instance
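
A minimal construction sketch (assumes the faiss package; a flat L2 index with 1536-dimensional vectors is illustrative):

import faiss
from llama_index.vector_stores import FaissVectorStore

faiss_index = faiss.IndexFlatL2(1536)  # dimension must match your embedding model
vector_store = FaissVectorStore(faiss_index=faiss_index)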

add(nodes: List[BaseNode]) List[str]

Add nodes to index.

NOTE: in the Faiss vector store, we do not store text in Faiss.

Args

nodes: List[BaseNode]: list of nodes with embeddings

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Return the faiss index.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

persist(persist_path: str = './storage/vector_store.json', fs: Optional[AbstractFileSystem] = None) None

Save to file.

This method saves the vector store to disk.

Parameters

persist_path (str) – The path to save the file to.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters
  • query_embedding (List[float]) – query embedding

  • similarity_top_k (int) – top k most similar nodes

class llama_index.vector_stores.LanceDBVectorStore(uri: str, table_name: str = 'vectors', nprobes: int = 20, refine_factor: Optional[int] = None, **kwargs: Any)

The LanceDB Vector Store.

Stores text and embeddings in LanceDB. The vector store will open an existing LanceDB dataset or create the dataset if it does not exist.

Parameters
  • uri (str, required) – Location where LanceDB will store its files.

  • table_name (str, optional) – The table name where the embeddings will be stored. Defaults to "vectors".

  • nprobes (int, optional) – The number of probes used. A higher number makes search more accurate but also slower. Defaults to 20.

  • refine_factor (int, optional) – Refine the results by reading extra elements and re-ranking them in memory. Defaults to None.

Raises

ImportError – Unable to import lancedb.

Returns

VectorStore that supports creating LanceDB datasets and querying them.

Return type

LanceDBVectorStore
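
A minimal construction sketch (the uri is illustrative; the dataset is created on first use if it does not exist):

from llama_index.vector_stores import LanceDBVectorStore

vector_store = LanceDBVectorStore(uri="/tmp/lancedb", table_name="vectors")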

add(nodes: List[BaseNode]) List[str]

Add nodes with embedding to vector store.

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: None

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

pydantic model llama_index.vector_stores.MetadataFilters

Metadata filters for vector stores.

Currently only supports exact match filters. TODO: support more advanced expressions.

Show JSON schema
{
   "title": "MetadataFilters",
   "description": "Metadata filters for vector stores.\n\nCurrently only supports exact match filters.\nTODO: support more advanced expressions.",
   "type": "object",
   "properties": {
      "filters": {
         "title": "Filters",
         "type": "array",
         "items": {
            "$ref": "#/definitions/ExactMatchFilter"
         }
      }
   },
   "required": [
      "filters"
   ],
   "definitions": {
      "ExactMatchFilter": {
         "title": "ExactMatchFilter",
         "description": "Exact match metadata filter for vector stores.\n\nValue uses Strict* types, as int, float and str are compatible types and were all\nconverted to string before.\n\nSee: https://docs.pydantic.dev/latest/usage/types/#strict-types",
         "type": "object",
         "properties": {
            "key": {
               "title": "Key",
               "type": "string"
            },
            "value": {
               "title": "Value",
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "number"
                  },
                  {
                     "type": "string"
                  }
               ]
            }
         },
         "required": [
            "key",
            "value"
         ]
      }
   }
}

Fields
field filters: List[ExactMatchFilter] [Required]
classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) Model

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = 'allow' was set since it adds all passed values.

copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) Model

Duplicate a model, optionally choose which fields to include, exclude and change.

Parameters
  • include – fields to include in new model

  • exclude – fields to exclude from new model, as with values this takes precedence over include

  • update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data

  • deep – set to True to make a deep copy of the model

Returns

new model instance

dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) DictStrAny

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

classmethod from_dict(filter_dict: Dict) MetadataFilters

Create MetadataFilters from json.
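
For example, both of the following build the same exact-match filter (a sketch; key and value are illustrative):

from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters

filters = MetadataFilters(filters=[ExactMatchFilter(key="author", value="alice")])
same_filters = MetadataFilters.from_dict({"author": "alice"})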

classmethod from_orm(obj: Any) Model
json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) unicode

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

classmethod parse_file(path: Union[str, Path], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) Model
classmethod parse_obj(obj: Any) Model
classmethod parse_raw(b: Union[str, bytes], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) Model
classmethod schema(by_alias: bool = True, ref_template: unicode = '#/definitions/{model}') DictStrAny
classmethod schema_json(*, by_alias: bool = True, ref_template: unicode = '#/definitions/{model}', **dumps_kwargs: Any) unicode
classmethod update_forward_refs(**localns: Any) None

Try to update ForwardRefs on fields based on this Model, globalns and localns.

classmethod validate(value: Any) Model
class llama_index.vector_stores.MetalVectorStore(api_key: str, client_id: str, index_id: str)
add(nodes: List[BaseNode]) List[str]

Add nodes to index.

Args

nodes: List[BaseNode]: list of nodes with embeddings.

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Return Metal client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query vector store.

class llama_index.vector_stores.MilvusVectorStore(uri: str = 'http://localhost:19530', token: str = '', collection_name: str = 'llamalection', dim: Optional[int] = None, embedding_field: str = 'embedding', doc_id_field: str = 'doc_id', similarity_metric: str = 'IP', consistency_level: str = 'Strong', overwrite: bool = False, text_key: Optional[str] = None, **kwargs: Any)

The Milvus Vector Store.

In this vector store we store the text, its embedding, and its metadata in a Milvus collection. This implementation allows the use of an already existing collection. It also supports creating a new one if the collection doesn't exist or if overwrite is set to True.

Parameters
  • uri (str, optional) – The URI to connect to, in the form "http://address:port".

  • token (str, optional) – The token for login. Empty if not using RBAC; if using RBAC it will most likely be "username:password".

  • collection_name (str, optional) – The name of the collection where data will be stored. Defaults to "llamalection".

  • dim (int, optional) – The dimension of the embedding vectors for the collection. Required if creating a new collection.

  • embedding_field (str, optional) – The name of the embedding field for the collection. Defaults to DEFAULT_EMBEDDING_KEY.

  • doc_id_field (str, optional) – The name of the doc_id field for the collection. Defaults to DEFAULT_DOC_ID_KEY.

  • similarity_metric (str, optional) – The similarity metric to use; currently supports IP and L2.

  • consistency_level (str, optional) – Which consistency level to use for a newly created collection. Defaults to "Strong".

  • overwrite (bool, optional) – Whether to overwrite an existing collection with the same name. Defaults to False.

  • text_key (str, optional) – The key under which text is stored in the passed collection. Used when bringing your own collection. Defaults to None.

Raises
  • ImportError – Unable to import pymilvus.

  • MilvusException – Error communicating with Milvus, more can be found in logging under Debug.

Returns

Vectorstore that supports add, delete, and query.

Return type

MilvusVectorStore
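
A minimal construction sketch (URI and dimension are illustrative):

from llama_index.vector_stores import MilvusVectorStore

vector_store = MilvusVectorStore(
    uri="http://localhost:19530",
    dim=1536,        # required when the collection is created for the first time
    overwrite=False,
)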

add(nodes: List[BaseNode]) List[str]

Add the embeddings and their nodes into Milvus.

Parameters

nodes (List[BaseNode]) – List of nodes with embeddings to insert.

Raises

MilvusException – Failed to insert data.

Returns

List of ids inserted.

Return type

List[str]

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

Raises

MilvusException – Failed to delete the doc.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters
  • query_embedding (List[float]) – query embedding

  • similarity_top_k (int) – top k most similar nodes

  • doc_ids (Optional[List[str]]) – list of doc_ids to filter by

  • node_ids (Optional[List[str]]) – list of node_ids to filter by

  • output_fields (Optional[List[str]]) – list of fields to return

  • embedding_field (Optional[str]) – name of embedding field

class llama_index.vector_stores.MyScaleVectorStore(myscale_client: Optional[Any] = None, table: str = 'llama_index', database: str = 'default', index_type: str = 'MSTG', metric: str = 'cosine', batch_size: int = 32, index_params: Optional[dict] = None, search_params: Optional[dict] = None, service_context: Optional[ServiceContext] = None, **kwargs: Any)

MyScale Vector Store.

In this vector store, embeddings and docs are stored within an existing MyScale cluster.

During query time, the index uses MyScale to query for the top k most similar nodes.

Parameters
  • myscale_client (httpclient) – clickhouse-connect httpclient of an existing MyScale cluster.

  • table (str, optional) – The name of the MyScale table where data will be stored. Defaults to "llama_index".

  • database (str, optional) – The name of the MyScale database where data will be stored. Defaults to "default".

  • index_type (str, optional) – The type of the MyScale vector index. Defaults to "MSTG".

  • metric (str, optional) – The metric type of the MyScale vector index. Defaults to "cosine".

  • batch_size (int, optional) – The number of documents to insert per batch. Defaults to 32.

  • index_params (dict, optional) – The index parameters for MyScale. Defaults to None.

  • search_params (dict, optional) – The search parameters for a MyScale query. Defaults to None.

  • service_context (ServiceContext, optional) – Vector store service context. Defaults to None.
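
A construction sketch (assumes the clickhouse-connect package; connection details are illustrative):

import clickhouse_connect
from llama_index.vector_stores import MyScaleVectorStore

client = clickhouse_connect.get_client(
    host="your-cluster.myscale.com",  # illustrative
    port=443,
    username="user",
    password="password",
)
vector_store = MyScaleVectorStore(myscale_client=client)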

add(nodes: List[BaseNode]) List[str]

Add nodes to index.

Args

nodes: List[BaseNode]: list of nodes with embeddings

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

drop() None

Drop the MyScale index and table.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query (VectorStoreQuery) – query

class llama_index.vector_stores.Neo4jVectorStore(username: str, password: str, url: str, embedding_dimension: int, database: str = 'neo4j', index_name: str = 'vector', node_label: str = 'Chunk', embedding_node_property: str = 'embedding', text_node_property: str = 'text', distance_strategy: str = 'cosine', retrieval_query: str = '', **kwargs: Any)
add(nodes: List[BaseNode]) List[str]

Add nodes with embedding to vector store.

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Get client.

create_new_index() None

This method constructs a Cypher query and executes it to create a new vector index in Neo4j.

database_query(query: str, params: Optional[dict] = None) List[Dict[str, Any]]

This method sends a Cypher query to the connected Neo4j database and returns the results as a list of dictionaries.

Parameters
  • query (str) – The Cypher query to execute.

  • params (dict, optional) – Dictionary of query parameters. Defaults to {}.

Returns

List of dictionaries containing the query results.

Return type

List[Dict[str, Any]]
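
For example (a sketch; credentials and the Cypher statement are illustrative, and Chunk is the default node label):

from llama_index.vector_stores import Neo4jVectorStore

vector_store = Neo4jVectorStore(
    username="neo4j",
    password="password",
    url="bolt://localhost:7687",
    embedding_dimension=1536,
)
rows = vector_store.database_query("MATCH (n:Chunk) RETURN count(n) AS chunks")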

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query vector store.

retrieve_existing_index() bool

Check whether the vector index exists in the Neo4j database and return its embedding dimension.

This method queries the Neo4j database for existing indexes and attempts to retrieve the dimension of the vector index with the specified name. If the index exists, its dimension is returned. If the index doesn’t exist, None is returned.

Returns

The embedding dimension of the existing index if found.

Return type

int or None

class llama_index.vector_stores.OpensearchVectorClient(endpoint: str, index: str, dim: int, embedding_field: str = 'embedding', text_field: str = 'content', method: Optional[dict] = None, **kwargs: Any)

Object encapsulating an Opensearch index that has vector search enabled.

If the index does not yet exist, it is created during init. Therefore, the underlying index is assumed to either: 1) not exist yet or 2) be created due to previous usage of this class.

Parameters
  • endpoint (str) – URL (http/https) of the OpenSearch endpoint

  • index (str) – Name of the OpenSearch index

  • dim (int) – Dimension of the vector

  • embedding_field (str) – Name of the field in the index to store embedding array in.

  • text_field (str) – Name of the field to grab text from

  • method (Optional[dict]) – OpenSearch "method" JSON object for configuring the KNN index. This includes engine, metric, and other config params. Defaults to: {"name": "hnsw", "space_type": "l2", "engine": "faiss", "parameters": {"ef_construction": 256, "m": 48}}

  • **kwargs – Optional arguments passed to the OpenSearch client from opensearch-py.
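
A minimal construction sketch (endpoint, index name, and dimension are illustrative):

from llama_index.vector_stores import OpensearchVectorClient, OpensearchVectorStore

client = OpensearchVectorClient(
    endpoint="http://localhost:9200",
    index="llama-index",
    dim=1536,
)
vector_store = OpensearchVectorStore(client)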

delete_doc_id(doc_id: str) None

Delete a document.

Parameters

doc_id (str) – document id

index_results(nodes: List[BaseNode], **kwargs: Any) List[str]

Store results in the index.

knn(query_embedding: List[float], k: int, filters: Optional[MetadataFilters] = None) VectorStoreQueryResult

Do knn search.

If there are no filters, do approximate knn search. If there are (pre-)filters, do an exhaustive exact knn search using 'painless scripting'.

Note that approximate knn search does not support pre-filtering.

Parameters
  • query_embedding (List[float]) – query embedding
  • k (int) – number of nearest neighbors to return
  • filters (Optional[MetadataFilters]) – metadata filters; when present, an exact search with pre-filtering is used

Returns

Up to k docs closest to query_embedding

class llama_index.vector_stores.OpensearchVectorStore(client: OpensearchVectorClient)

Elasticsearch/Opensearch vector store.

Parameters

client (OpensearchVectorClient) – Vector index client to use for data insertion/querying.

add(nodes: List[BaseNode]) List[str]

Add nodes to index.

Args

nodes: List[BaseNode]: list of nodes with embeddings.

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query_embedding (List[float]) – query embedding

pydantic model llama_index.vector_stores.PGVectorStore

Show JSON schema
{
   "title": "PGVectorStore",
   "description": "Abstract vector store protocol.",
   "type": "object",
   "properties": {
      "stores_text": {
         "title": "Stores Text",
         "default": true,
         "type": "boolean"
      },
      "is_embedding_query": {
         "title": "Is Embedding Query",
         "default": true,
         "type": "boolean"
      },
      "connection_string": {
         "title": "Connection String",
         "type": "string"
      },
      "async_connection_string": {
         "title": "Async Connection String",
         "type": "string"
      },
      "table_name": {
         "title": "Table Name",
         "type": "string"
      },
      "embed_dim": {
         "title": "Embed Dim",
         "type": "integer"
      },
      "hybrid_search": {
         "title": "Hybrid Search",
         "type": "boolean"
      },
      "text_search_config": {
         "title": "Text Search Config",
         "type": "string"
      },
      "debug": {
         "title": "Debug",
         "type": "boolean"
      },
      "flat_metadata": {
         "title": "Flat Metadata",
         "default": false,
         "type": "boolean"
      }
   },
   "required": [
      "connection_string",
      "async_connection_string",
      "table_name",
      "embed_dim",
      "hybrid_search",
      "text_search_config",
      "debug"
   ]
}

Fields
  • async_connection_string (str)

  • connection_string (str)

  • debug (bool)

  • embed_dim (int)

  • flat_metadata (bool)

  • hybrid_search (bool)

  • is_embedding_query (bool)

  • stores_text (bool)

  • table_name (str)

  • text_search_config (str)

field async_connection_string: str [Required]
field connection_string: str [Required]
field debug: bool [Required]
field embed_dim: int [Required]
field flat_metadata: bool = False
field hybrid_search: bool [Required]
field is_embedding_query: bool = True
field stores_text: bool = True
field table_name: str [Required]
field text_search_config: str [Required]
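
A direct-construction sketch based on the required fields above (connection strings and table name are illustrative):

from llama_index.vector_stores import PGVectorStore

vector_store = PGVectorStore(
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/vectordb",
    async_connection_string="postgresql+asyncpg://user:pass@localhost:5432/vectordb",
    table_name="llama_vectors",
    embed_dim=1536,
    hybrid_search=False,
    text_search_config="english",
    debug=False,
)
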
class Select(*entities: _ColumnsClauseArgument[Any])

Represents a SELECT statement.

The _sql.Select object is normally constructed using the _sql.select() function. See that function for details.

See also

_sql.select()

tutorial_selecting_data - in the 2.0 tutorial

add_columns(*entities: _ColumnsClauseArgument[Any]) Select[Any]

Return a new _expression.select() construct with the given entities appended to its columns clause.

E.g.:

my_select = my_select.add_columns(table.c.new_column)

The original expressions in the columns clause remain in place. To replace the original expressions with new ones, see the method _expression.Select.with_only_columns().

Parameters

*entities – column, table, or other entity expressions to be added to the columns clause

See also

_expression.Select.with_only_columns() - replaces existing expressions rather than appending.

orm_queryguide_select_multiple_entities - ORM-centric example

add_cte(*ctes: CTE, nest_here: bool = False) Self

Add one or more _sql.CTE constructs to this statement.

This method will associate the given _sql.CTE constructs with the parent statement such that they will each be unconditionally rendered in the WITH clause of the final statement, even if not referenced elsewhere within the statement or any sub-selects.

The optional :paramref:`.HasCTE.add_cte.nest_here` parameter when set to True will have the effect that each given _sql.CTE will render in a WITH clause rendered directly along with this statement, rather than being moved to the top of the ultimate rendered statement, even if this statement is rendered as a subquery within a larger statement.

This method has two general uses. One is to embed CTE statements that serve some purpose without being referenced explicitly, such as the use case of embedding a DML statement such as an INSERT or UPDATE as a CTE inline with a primary statement that may draw from its results indirectly. The other is to provide control over the exact placement of a particular series of CTE constructs that should remain rendered directly in terms of a particular statement that may be nested in a larger statement.

E.g.:

from sqlalchemy import table, column, select
t = table('t', column('c1'), column('c2'))

ins = t.insert().values({"c1": "x", "c2": "y"}).cte()

stmt = select(t).add_cte(ins)

Would render:

WITH anon_1 AS
(INSERT INTO t (c1, c2) VALUES (:param_1, :param_2))
SELECT t.c1, t.c2
FROM t

Above, the "anon_1" CTE is not referred to in the SELECT statement; however, it still accomplishes the task of running an INSERT statement.

Similarly in a DML-related context, using the PostgreSQL _postgresql.Insert construct to generate an "upsert":

from sqlalchemy import table, column
from sqlalchemy.dialects.postgresql import insert

t = table("t", column("c1"), column("c2"))

delete_statement_cte = (
    t.delete().where(t.c.c1 < 1).cte("deletions")
)

insert_stmt = insert(t).values({"c1": 1, "c2": 2})
update_statement = insert_stmt.on_conflict_do_update(
    index_elements=[t.c.c1],
    set_={
        "c1": insert_stmt.excluded.c1,
        "c2": insert_stmt.excluded.c2,
    },
).add_cte(delete_statement_cte)

print(update_statement)

The above statement renders as:

WITH deletions AS
(DELETE FROM t WHERE t.c1 < %(c1_1)s)
INSERT INTO t (c1, c2) VALUES (%(c1)s, %(c2)s)
ON CONFLICT (c1) DO UPDATE SET c1 = excluded.c1, c2 = excluded.c2

New in version 1.4.21.

Parameters
  • *ctes –

    zero or more CTE constructs.

    Changed in version 2.0: Multiple CTE instances are accepted

  • nest_here –

    if True, the given CTE or CTEs will be rendered as though they specified the :paramref:`.HasCTE.cte.nesting` flag to True when they were added to this HasCTE. Assuming the given CTEs are not referenced in an outer-enclosing statement as well, the CTEs given should render at the level of this statement when this flag is given.

    New in version 2.0.

alias(name: Optional[str] = None, flat: bool = False) Subquery

Return a named subquery against this _expression.SelectBase.

For a _expression.SelectBase (as opposed to a _expression.FromClause), this returns a Subquery object which behaves mostly the same as the _expression.Alias object that is used with a _expression.FromClause.

Changed in version 1.4: The _expression.SelectBase.alias() method is now a synonym for the _expression.SelectBase.subquery() method.

as_scalar() ScalarSelect[Any]

Deprecated since version 1.4: The _expression.SelectBase.as_scalar() method is deprecated and will be removed in a future release. Please refer to _expression.SelectBase.scalar_subquery().

property c: ReadOnlyColumnCollection[str, KeyedColumnElement[Any]]

Deprecated since version 1.4: The _expression.SelectBase.c and _expression.SelectBase.columns attributes are deprecated and will be removed in a future release; these attributes implicitly create a subquery that should be explicit. Please call _expression.SelectBase.subquery() first in order to create a subquery, which then contains this attribute. To access the columns that this SELECT object SELECTs from, use the _expression.SelectBase.selected_columns attribute.

column(column: _ColumnsClauseArgument[Any]) Select[Any]

Return a new _expression.select() construct with the given column expression added to its columns clause.

Deprecated since version 1.4: The _expression.Select.column() method is deprecated and will be removed in a future release. Please use _expression.Select.add_columns()

E.g.:

my_select = my_select.column(table.c.new_column)

See the documentation for _expression.Select.with_only_columns() for guidelines on adding /replacing the columns of a _expression.Select object.

property column_descriptions: Any

Return a plugin-enabled β€˜column descriptions’ structure referring to the columns which are SELECTed by this statement.

This attribute is generally useful when using the ORM, as an extended structure which includes information about mapped entities is returned. The section queryguide_inspection contains more background.

For a Core-only statement, the structure returned by this accessor is derived from the same objects that are returned by the Select.selected_columns accessor, formatted as a list of dictionaries which contain the keys name, type and expr, which indicate the column expressions to be selected:

>>> stmt = select(user_table)
>>> stmt.column_descriptions
[
    {
        'name': 'id',
        'type': Integer(),
        'expr': Column('id', Integer(), ...)},
    {
        'name': 'name',
        'type': String(length=30),
        'expr': Column('name', String(length=30), ...)}
]

Changed in version 1.4.33: The Select.column_descriptions attribute returns a structure for a Core-only set of entities, not just ORM-only entities.

See also

UpdateBase.entity_description - entity information for an insert(), update(), or delete()

queryguide_inspection - ORM background

property columns_clause_froms: List[FromClause]

Return the set of _expression.FromClause objects implied by the columns clause of this SELECT statement.

New in version 1.4.23.

See also

_sql.Select.froms - β€œfinal” FROM list taking the full statement into account

_sql.Select.with_only_columns() - makes use of this collection to set up a new FROM list

compare(other: ClauseElement, **kw: Any) bool

Compare this _expression.ClauseElement to the given _expression.ClauseElement.

Subclasses should override the default behavior, which is a straight identity comparison.

**kw are arguments consumed by subclass compare() methods and may be used to modify the criteria for comparison (see _expression.ColumnElement).

compile(bind: Optional[Union[Engine, Connection]] = None, dialect: Optional[Dialect] = None, **kw: Any) Compiled

Compile this SQL expression.

The return value is a Compiled object. Calling str() or unicode() on the returned value will yield a string representation of the result. The Compiled object also can return a dictionary of bind parameter names and values using the params accessor.

Parameters
  • bind – An Connection or Engine which can provide a Dialect in order to generate a Compiled object. If the bind and dialect parameters are both omitted, a default SQL compiler is used.

  • column_keys – Used for INSERT and UPDATE statements, a list of column names which should be present in the VALUES clause of the compiled statement. If None, all columns from the target table object are rendered.

  • dialect – A Dialect instance which can generate a Compiled object. This argument takes precedence over the bind argument.

  • compile_kwargs –

    optional dictionary of additional parameters that will be passed through to the compiler within all β€œvisit” methods. This allows any custom flag to be passed through to a custom compilation construct, for example. It is also used for the case of passing the literal_binds flag through:

    from sqlalchemy.sql import table, column, select
    
    t = table('t', column('x'))
    
    s = select(t).where(t.c.x == 5)
    
    print(s.compile(compile_kwargs={"literal_binds": True}))
    

See also

faq_sql_expression_string

correlate(*fromclauses: Union[Literal[None, False], _FromClauseArgument]) Self

Return a new _expression.Select which will correlate the given FROM clauses to that of an enclosing _expression.Select.

Calling this method turns off the _expression.Select object’s default behavior of β€œauto-correlation”. Normally, FROM elements which appear in a _expression.Select that encloses this one via its WHERE clause, ORDER BY, HAVING or columns clause will be omitted from this _expression.Select object’s FROM clause. Setting an explicit correlation collection using the _expression.Select.correlate() method provides a fixed list of FROM objects that can potentially take place in this process.

When _expression.Select.correlate() is used to apply specific FROM clauses for correlation, the FROM elements become candidates for correlation regardless of how deeply nested this _expression.Select object is, relative to an enclosing _expression.Select which refers to the same FROM object. This is in contrast to the behavior of β€œauto-correlation” which only correlates to an immediate enclosing _expression.Select. Multi-level correlation ensures that the link between enclosed and enclosing _expression.Select is always via at least one WHERE/ORDER BY/HAVING/columns clause in order for correlation to take place.

If None is passed, the _expression.Select object will correlate none of its FROM entries, and all will render unconditionally in the local FROM clause.

Parameters

*fromclauses – one or more FromClause or other FROM-compatible construct such as an ORM mapped entity to become part of the correlate collection; alternatively pass a single value None to remove all existing correlations.

See also

_expression.Select.correlate_except()

tutorial_scalar_subquery
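
As an illustrative sketch of explicit correlation (the users and addresses tables here are ad-hoc constructs invented for demonstration):

from sqlalchemy import table, column, select, func

users = table("users", column("id"), column("name"))
addresses = table("addresses", column("id"), column("user_id"))

# scalar subquery explicitly correlated against the enclosing "users"
# table; "addresses" remains in the subquery's own FROM clause
subq = (
    select(func.count(addresses.c.id))
    .where(addresses.c.user_id == users.c.id)
    .correlate(users)
    .scalar_subquery()
)

stmt = select(users.c.name, subq.label("address_count"))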

correlate_except(*fromclauses: Union[Literal[None, False], _FromClauseArgument]) Self

Return a new _expression.Select which will omit the given FROM clauses from the auto-correlation process.

Calling _expression.Select.correlate_except() turns off the _expression.Select object’s default behavior of β€œauto-correlation” for the given FROM elements. An element specified here will unconditionally appear in the FROM list, while all other FROM elements remain subject to normal auto-correlation behaviors.

If None is passed, or no arguments are passed, the _expression.Select object will correlate all of its FROM entries.

Parameters

*fromclauses – a list of one or more _expression.FromClause constructs, or other compatible constructs (i.e. ORM-mapped classes) to become part of the correlate-exception collection.

See also

_expression.Select.correlate()

tutorial_scalar_subquery

corresponding_column(column: KeyedColumnElement[Any], require_embedded: bool = False) Optional[KeyedColumnElement[Any]]

Given a _expression.ColumnElement, return the exported _expression.ColumnElement object from the _expression.Selectable.exported_columns collection of this _expression.Selectable which corresponds to that original _expression.ColumnElement via a common ancestor column.

Parameters
  • column – the target _expression.ColumnElement to be matched.

  • require_embedded – only return corresponding columns for the given _expression.ColumnElement, if the given _expression.ColumnElement is actually present within a sub-element of this _expression.Selectable. Normally the column will match if it merely shares a common ancestor with one of the exported columns of this _expression.Selectable.

See also

_expression.Selectable.exported_columns - the _expression.ColumnCollection that is used for the operation.

_expression.ColumnCollection.corresponding_column() - implementation method.

cte(name: Optional[str] = None, recursive: bool = False, nesting: bool = False) CTE

Return a new _expression.CTE, or Common Table Expression instance.

Common table expressions are a SQL standard whereby SELECT statements can draw upon secondary statements specified along with the primary statement, using a clause called β€œWITH”. Special semantics regarding UNION can also be employed to allow β€œrecursive” queries, where a SELECT statement can draw upon the set of rows that have previously been selected.

CTEs can also be applied to DML constructs UPDATE, INSERT and DELETE on some databases, both as a source of CTE rows when combined with RETURNING, as well as a consumer of CTE rows.

SQLAlchemy detects _expression.CTE objects, which are treated similarly to _expression.Alias objects, as special elements to be delivered to the FROM clause of the statement as well as to a WITH clause at the top of the statement.

For special prefixes such as PostgreSQL β€œMATERIALIZED” and β€œNOT MATERIALIZED”, the _expression.CTE.prefix_with() method may be used to establish these.

Changed in version 1.3.13: Added support for prefixes. In particular - MATERIALIZED and NOT MATERIALIZED.

Parameters
  • name – name given to the common table expression. Like _expression.FromClause.alias(), the name can be left as None in which case an anonymous symbol will be used at query compile time.

  • recursive – if True, will render WITH RECURSIVE. A recursive common table expression is intended to be used in conjunction with UNION ALL in order to derive rows from those already selected.

  • nesting –

    if True, will render the CTE locally to the statement in which it is referenced. For more complex scenarios, the HasCTE.add_cte() method using the :paramref:`.HasCTE.add_cte.nest_here` parameter may also be used to more carefully control the exact placement of a particular CTE.

    New in version 1.4.24.

    See also

    HasCTE.add_cte()

The following examples include two from PostgreSQL’s documentation at https://www.postgresql.org/docs/current/static/queries-with.html, as well as additional examples.

Example 1, non recursive:

from sqlalchemy import (Table, Column, String, Integer,
                        MetaData, select, func)

metadata = MetaData()

orders = Table('orders', metadata,
    Column('region', String),
    Column('amount', Integer),
    Column('product', String),
    Column('quantity', Integer)
)

regional_sales = select(
                    orders.c.region,
                    func.sum(orders.c.amount).label('total_sales')
                ).group_by(orders.c.region).cte("regional_sales")


top_regions = select(regional_sales.c.region).\
        where(
            regional_sales.c.total_sales >
            select(
                func.sum(regional_sales.c.total_sales) / 10
            )
        ).cte("top_regions")

statement = select(
            orders.c.region,
            orders.c.product,
            func.sum(orders.c.quantity).label("product_units"),
            func.sum(orders.c.amount).label("product_sales")
    ).where(orders.c.region.in_(
        select(top_regions.c.region)
    )).group_by(orders.c.region, orders.c.product)

result = conn.execute(statement).fetchall()

Example 2, WITH RECURSIVE:

from sqlalchemy import (Table, Column, String, Integer,
                        MetaData, select, func)

metadata = MetaData()

parts = Table('parts', metadata,
    Column('part', String),
    Column('sub_part', String),
    Column('quantity', Integer),
)

included_parts = select(\
    parts.c.sub_part, parts.c.part, parts.c.quantity\
    ).\
    where(parts.c.part=='our part').\
    cte(recursive=True)


incl_alias = included_parts.alias()
parts_alias = parts.alias()
included_parts = included_parts.union_all(
    select(
        parts_alias.c.sub_part,
        parts_alias.c.part,
        parts_alias.c.quantity
    ).\
    where(parts_alias.c.part==incl_alias.c.sub_part)
)

statement = select(
            included_parts.c.sub_part,
            func.sum(included_parts.c.quantity).
              label('total_quantity')
        ).\
        group_by(included_parts.c.sub_part)

result = conn.execute(statement).fetchall()

Example 3, an upsert using UPDATE and INSERT with CTEs:

from datetime import date
from sqlalchemy import (MetaData, Table, Column, Integer,
                        Date, select, literal, and_, exists)

metadata = MetaData()

visitors = Table('visitors', metadata,
    Column('product_id', Integer, primary_key=True),
    Column('date', Date, primary_key=True),
    Column('count', Integer),
)

# add 5 visitors for the product_id == 1
product_id = 1
day = date.today()
count = 5

update_cte = (
    visitors.update()
    .where(and_(visitors.c.product_id == product_id,
                visitors.c.date == day))
    .values(count=visitors.c.count + count)
    .returning(literal(1))
    .cte('update_cte')
)

upsert = visitors.insert().from_select(
    [visitors.c.product_id, visitors.c.date, visitors.c.count],
    select(literal(product_id), literal(day), literal(count))
        .where(~exists(update_cte.select()))
)

connection.execute(upsert)

Example 4, Nesting CTE (SQLAlchemy 1.4.24 and above):

value_a = select(
    literal("root").label("n")
).cte("value_a")

# A nested CTE with the same name as the root one
value_a_nested = select(
    literal("nesting").label("n")
).cte("value_a", nesting=True)

# Nested CTEs take precedence locally
# over CTEs at a higher level
value_b = select(value_a_nested.c.n).cte("value_b")

value_ab = select(value_a.c.n.label("a"), value_b.c.n.label("b"))

The above query will render the second CTE nested inside the first, shown with inline parameters below as:

WITH
    value_a AS
        (SELECT 'root' AS n),
    value_b AS
        (WITH value_a AS
            (SELECT 'nesting' AS n)
        SELECT value_a.n AS n FROM value_a)
SELECT value_a.n AS a, value_b.n AS b
FROM value_a, value_b

The same CTE can be set up using the HasCTE.add_cte() method as follows (SQLAlchemy 2.0 and above):

value_a = select(
    literal("root").label("n")
).cte("value_a")

# A nested CTE with the same name as the root one
value_a_nested = select(
    literal("nesting").label("n")
).cte("value_a")

# Nested CTEs take precedence locally
# over CTEs at a higher level
value_b = (
    select(value_a_nested.c.n).
    add_cte(value_a_nested, nest_here=True).
    cte("value_b")
)

value_ab = select(value_a.c.n.label("a"), value_b.c.n.label("b"))

Example 5, Non-Linear CTE (SQLAlchemy 1.4.28 and above):

edge = Table(
    "edge",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("left", Integer),
    Column("right", Integer),
)

root_node = select(literal(1).label("node")).cte(
    "nodes", recursive=True
)

left_edge = select(edge.c.left).join(
    root_node, edge.c.right == root_node.c.node
)
right_edge = select(edge.c.right).join(
    root_node, edge.c.left == root_node.c.node
)

subgraph_cte = root_node.union(left_edge, right_edge)

subgraph = select(subgraph_cte)

The above query will render 2 UNIONs inside the recursive CTE:

WITH RECURSIVE nodes(node) AS (
        SELECT 1 AS node
    UNION
        SELECT edge."left" AS "left"
        FROM edge JOIN nodes ON edge."right" = nodes.node
    UNION
        SELECT edge."right" AS "right"
        FROM edge JOIN nodes ON edge."left" = nodes.node
)
SELECT nodes.node FROM nodes

See also

_orm.Query.cte() - ORM version of _expression.HasCTE.cte().

distinct(*expr: _ColumnExpressionArgument[Any]) Self

Return a new _expression.select() construct which will apply DISTINCT to its columns clause.

Parameters

*expr –

optional column expressions. When present, the PostgreSQL dialect will render a DISTINCT ON (<expressions>) construct.

Deprecated since version 1.4: Using *expr in other dialects is deprecated and will raise _exc.CompileError in a future version.
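
A short sketch for both forms (the users table is a hypothetical ad-hoc construct):

from sqlalchemy import table, column, select

users = table("users", column("id"), column("name"))

# plain DISTINCT, portable across backends
stmt = select(users.c.name).distinct()

# DISTINCT ON (users.name) - PostgreSQL dialect only
pg_stmt = select(users).distinct(users.c.name)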

except_(*other: _SelectStatementForCompoundArgument) CompoundSelect

Return a SQL EXCEPT of this select() construct against the given selectable provided as positional arguments.

Parameters

*other –

one or more elements with which to create an EXCEPT.

Changed in version 1.4.28: multiple elements are now accepted.

except_all(*other: _SelectStatementForCompoundArgument) CompoundSelect

Return a SQL EXCEPT ALL of this select() construct against the given selectables provided as positional arguments.

Parameters

*other –

one or more elements with which to create an EXCEPT ALL.

Changed in version 1.4.28: multiple elements are now accepted.
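
As an illustrative sketch (the users and banned tables are hypothetical ad-hoc constructs):

from sqlalchemy import table, column, select

users = table("users", column("id"))
banned = table("banned", column("id"))

# ids present in users but not in banned
stmt = select(users.c.id).except_(select(banned.c.id))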

execution_options(**kw: Any) Self

Set non-SQL options for the statement which take effect during execution.

Execution options can be set at many scopes, including per-statement, per-connection, or per execution, using methods such as _engine.Connection.execution_options() and parameters which accept a dictionary of options such as :paramref:`_engine.Connection.execute.execution_options` and :paramref:`_orm.Session.execute.execution_options`.

The primary characteristic of an execution option, as opposed to other kinds of options such as ORM loader options, is that execution options never affect the compiled SQL of a query, only things that affect how the SQL statement itself is invoked or how results are fetched. That is, execution options are not part of what’s accommodated by SQL compilation nor are they considered part of the cached state of a statement.

The _sql.Executable.execution_options() method is generative, as is the case for the method as applied to the _engine.Engine and _orm.Query objects, which means when the method is called, a copy of the object is returned, which applies the given parameters to that new copy, but leaves the original unchanged:

statement = select(table.c.x, table.c.y)
new_statement = statement.execution_options(my_option=True)

An exception to this behavior is the _engine.Connection object, where the _engine.Connection.execution_options() method is explicitly not generative.

The kinds of options that may be passed to _sql.Executable.execution_options() and other related methods and parameter dictionaries include parameters that are explicitly consumed by SQLAlchemy Core or ORM, as well as arbitrary keyword arguments not defined by SQLAlchemy. This means the methods and/or parameter dictionaries may be used for user-defined parameters that interact with custom code, which may access the parameters using methods such as _sql.Executable.get_execution_options() and _engine.Connection.get_execution_options(), or within selected event hooks using a dedicated execution_options event parameter such as :paramref:`_events.ConnectionEvents.before_execute.execution_options` or _orm.ORMExecuteState.execution_options, e.g.:

from sqlalchemy import event

@event.listens_for(some_engine, "before_execute")
def _process_opt(conn, statement, multiparams, params, execution_options):
    "run a SQL function before invoking a statement"

    if execution_options.get("do_special_thing", False):
        conn.exec_driver_sql("run_special_function()")

Within the scope of options that are explicitly recognized by SQLAlchemy, most apply to specific classes of objects and not others. The most common execution options are covered by the references below.

See also

_engine.Connection.execution_options()

:paramref:`_engine.Connection.execute.execution_options`

:paramref:`_orm.Session.execute.execution_options`

orm_queryguide_execution_options - documentation on all ORM-specific execution options

exists() Exists

Return an _sql.Exists representation of this selectable, which can be used as a column expression.

The returned object is an instance of _sql.Exists.

See also

_sql.exists()

tutorial_exists - in the 2.0 style tutorial.

New in version 1.4.
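
As an illustrative sketch (the users and addresses tables are hypothetical ad-hoc constructs):

from sqlalchemy import table, column, select

users = table("users", column("id"), column("name"))
addresses = table("addresses", column("id"), column("user_id"))

# users that have at least one address
subq = select(addresses.c.id).where(addresses.c.user_id == users.c.id)
stmt = select(users.c.name).where(subq.exists())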

property exported_columns: ReadOnlyColumnCollection[str, ColumnElement[Any]]

A _expression.ColumnCollection that represents the β€œexported” columns of this _expression.Selectable, not including _sql.TextClause constructs.

The β€œexported” columns for a _expression.SelectBase object are synonymous with the _expression.SelectBase.selected_columns collection.

New in version 1.4.

See also

_expression.Select.exported_columns

_expression.Selectable.exported_columns

_expression.FromClause.exported_columns

fetch(count: _LimitOffsetType, with_ties: bool = False, percent: bool = False) Self

Return a new selectable with the given FETCH FIRST criterion applied.

This is a numeric value which usually renders as a FETCH {FIRST | NEXT} [ count ] {ROW | ROWS} {ONLY | WITH TIES} expression in the resulting select. This functionality is currently implemented for Oracle, PostgreSQL, and MSSQL.

Use _sql.GenerativeSelect.offset() to specify the offset.

Note

The _sql.GenerativeSelect.fetch() method will replace any clause applied with _sql.GenerativeSelect.limit().

New in version 1.4.

Parameters
  • count – an integer COUNT parameter, or a SQL expression that provides an integer result. When percent=True this will represent the percentage of rows to return, not the absolute value. Pass None to reset it.

  • with_ties – When True, the WITH TIES option is used to return any additional rows that tie for the last place in the result set according to the ORDER BY clause. The ORDER BY may be mandatory in this case. Defaults to False

  • percent – When True, count represents the percentage of the total number of selected rows to return. Defaults to False

See also

_sql.GenerativeSelect.limit()

_sql.GenerativeSelect.offset()
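
A minimal sketch (the users table is a hypothetical ad-hoc construct):

from sqlalchemy import table, column, select

users = table("users", column("id"), column("name"))

# FETCH FIRST 10 ROWS WITH TIES; an ORDER BY is required for WITH TIES
stmt = select(users).order_by(users.c.name).fetch(10, with_ties=True)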

filter(*criteria: _ColumnExpressionArgument[bool]) Self

A synonym for the _sql.Select.where() method.

filter_by(**kwargs: Any) Self

Apply the given filtering criterion as a WHERE clause to this select.
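
A minimal sketch of both _sql.Select.filter() and _sql.Select.filter_by() (the users table and values are hypothetical):

from sqlalchemy import table, column, select

users = table("users", column("id"), column("name"))

# .filter() is synonymous with .where()
stmt = select(users).filter(users.c.id > 5)

# .filter_by() compares keyword names against the statement's
# primary FROM element (here, the users table)
stmt2 = select(users).filter_by(name="alice")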

from_statement(statement: ReturnsRowsRole) ExecutableReturnsRows

Apply the columns which this Select would select onto another statement.

This operation is plugin-specific and will raise a not supported exception if this _sql.Select does not select from plugin-enabled entities.

The statement is typically either a _expression.text() or _expression.select() construct, and should return the set of columns appropriate to the entities represented by this Select.

See also

orm_queryguide_selecting_text - usage examples in the ORM Querying Guide

property froms: Sequence[FromClause]

Return the displayed list of _expression.FromClause elements.

Deprecated since version 1.4.23: The _expression.Select.froms attribute is moved to the _expression.Select.get_final_froms() method.

get_children(**kw: Any) Iterable[ClauseElement]

Return immediate child visitors.HasTraverseInternals elements of this visitors.HasTraverseInternals.

This is used for visit traversal.

**kw may contain flags that change the collection that is returned, for example to return a subset of items in order to cut down on larger traversals, or to return child items from a different context (such as schema-level collections instead of clause-level).

get_execution_options() _ExecuteOptions

Get the non-SQL options which will take effect during execution.

New in version 1.3.

See also

Executable.execution_options()

get_final_froms() Sequence[FromClause]

Compute the final displayed list of _expression.FromClause elements.

This method will run through the full computation required to determine what FROM elements will be displayed in the resulting SELECT statement, including shadowing individual tables with JOIN objects, as well as full computation for ORM use cases including eager loading clauses.

For ORM use, this accessor returns the post-compilation list of FROM objects; this collection will include elements such as eagerly loaded tables and joins. The objects will not be ORM-enabled and will not work as a replacement for the _sql.Select.select_froms() collection; additionally, the method performs poorly for an ORM-enabled statement, as it incurs the full ORM construction process.

To retrieve the FROM list that’s implied by the β€œcolumns” collection passed to the _sql.Select originally, use the _sql.Select.columns_clause_froms accessor.

To select from an alternative set of columns while maintaining the FROM list, use the _sql.Select.with_only_columns() method and pass the :paramref:`_sql.Select.with_only_columns.maintain_column_froms` parameter.

New in version 1.4.23: - the _sql.Select.get_final_froms() method replaces the previous _sql.Select.froms accessor, which is deprecated.

See also

_sql.Select.columns_clause_froms

get_label_style() SelectLabelStyle

Retrieve the current label style.

New in version 1.4.

group_by(_GenerativeSelect__first: Union[Literal[None, _NoArg.NO_ARG], _ColumnExpressionOrStrLabelArgument[Any]] = _NoArg.NO_ARG, *clauses: _ColumnExpressionOrStrLabelArgument[Any]) Self

Return a new selectable with the given list of GROUP BY criterion applied.

All existing GROUP BY settings can be suppressed by passing None.

e.g.:

stmt = select(table.c.name, func.max(table.c.stat)).\
    group_by(table.c.name)

Parameters

*clauses – a series of _expression.ColumnElement constructs which will be used to generate an GROUP BY clause.

See also

tutorial_group_by_w_aggregates - in the unified_tutorial

tutorial_order_by_label - in the unified_tutorial

having(*having: _ColumnExpressionArgument[bool]) Self

Return a new _expression.select() construct with the given expression added to its HAVING clause, joined to the existing clause via AND, if any.
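
As an illustrative sketch (the orders table is a hypothetical ad-hoc construct):

from sqlalchemy import table, column, select, func

orders = table("orders", column("region"), column("amount"))

# regions whose total sales exceed 1000
stmt = (
    select(orders.c.region, func.sum(orders.c.amount).label("total"))
    .group_by(orders.c.region)
    .having(func.sum(orders.c.amount) > 1000)
)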

inherit_cache: Optional[bool] = None

Indicate if this HasCacheKey instance should make use of the cache key generation scheme used by its immediate superclass.

The attribute defaults to None, which indicates that a construct has not yet taken into account whether or not its appropriate for it to participate in caching; this is functionally equivalent to setting the value to False, except that a warning is also emitted.

This flag can be set to True on a particular class, if the SQL that corresponds to the object does not change based on attributes which are local to this class, and not its superclass.

See also

compilerext_caching - General guidelines for setting the HasCacheKey.inherit_cache attribute for third-party or user-defined SQL constructs.

property inner_columns: _SelectIterable

An iterator of all _expression.ColumnElement expressions which would be rendered into the columns clause of the resulting SELECT statement.

This attribute is legacy as of 1.4 and is superseded by the _expression.Select.exported_columns collection.

intersect(*other: _SelectStatementForCompoundArgument) CompoundSelect

Return a SQL INTERSECT of this select() construct against the given selectables provided as positional arguments.

Parameters
  • *other –

    one or more elements with which to create an INTERSECT.

    Changed in version 1.4.28: multiple elements are now accepted.

  • **kwargs – keyword arguments are forwarded to the constructor for the newly created _sql.CompoundSelect object.

intersect_all(*other: _SelectStatementForCompoundArgument) CompoundSelect

Return a SQL INTERSECT ALL of this select() construct against the given selectables provided as positional arguments.

Parameters
  • *other –

    one or more elements with which to create an INTERSECT ALL.

    Changed in version 1.4.28: multiple elements are now accepted.

  • **kwargs – keyword arguments are forwarded to the constructor for the newly created _sql.CompoundSelect object.

is_derived_from(fromclause: Optional[FromClause]) bool

Return True if this ReturnsRows is β€˜derived’ from the given FromClause.

An example would be an Alias of a Table is derived from that Table.

join(target: _JoinTargetArgument, onclause: Optional[_OnClauseArgument] = None, *, isouter: bool = False, full: bool = False) Self

Create a SQL JOIN against this _expression.Select object’s criterion and apply generatively, returning the newly resulting _expression.Select.

E.g.:

stmt = select(user_table).join(address_table, user_table.c.id == address_table.c.user_id)

The above statement generates SQL similar to:

SELECT user.id, user.name FROM user JOIN address ON user.id = address.user_id

Changed in version 1.4: _expression.Select.join() now creates a _sql.Join object between a _sql.FromClause source that is within the FROM clause of the existing SELECT, and a given target _sql.FromClause, and then adds this _sql.Join to the FROM clause of the newly generated SELECT statement. This is completely reworked from the behavior in 1.3, which would instead create a subquery of the entire _expression.Select and then join that subquery to the target.

This is a backwards incompatible change as the previous behavior was mostly useless, producing an unnamed subquery rejected by most databases in any case. The new behavior is modeled after that of the very successful _orm.Query.join() method in the ORM, in order to support the functionality of _orm.Query being available by using a _sql.Select object with an _orm.Session.

See the notes for this change at change_select_join.

Parameters
  • target – target table to join towards

  • onclause – ON clause of the join. If omitted, an ON clause is generated automatically based on the _schema.ForeignKey linkages between the two tables, if one can be unambiguously determined, otherwise an error is raised.

  • isouter – if True, generate LEFT OUTER join. Same as _expression.Select.outerjoin().

  • full – if True, generate FULL OUTER join.

See also

tutorial_select_join - in the /tutorial/index

orm_queryguide_joins - in the queryguide_toplevel

_expression.Select.join_from()

_expression.Select.outerjoin()

join_from(from_: _FromClauseArgument, target: _JoinTargetArgument, onclause: Optional[_OnClauseArgument] = None, *, isouter: bool = False, full: bool = False) Self

Create a SQL JOIN against this _expression.Select object’s criterion and apply generatively, returning the newly resulting _expression.Select.

E.g.:

stmt = select(user_table, address_table).join_from(
    user_table, address_table, user_table.c.id == address_table.c.user_id
)

The above statement generates SQL similar to:

SELECT user.id, user.name, address.id, address.email, address.user_id
FROM user JOIN address ON user.id = address.user_id

New in version 1.4.

Parameters
  • from_ – the left side of the join, will be rendered in the FROM clause and is roughly equivalent to using the Select.select_from() method.

  • target – target table to join towards

  • onclause – ON clause of the join.

  • isouter – if True, generate LEFT OUTER join. Same as _expression.Select.outerjoin().

  • full – if True, generate FULL OUTER join.

See also

tutorial_select_join - in the /tutorial/index

orm_queryguide_joins - in the queryguide_toplevel

_expression.Select.join()

label(name: Optional[str]) Label[Any]

Return a β€˜scalar’ representation of this selectable, embedded as a subquery with a label.

See also

_expression.SelectBase.scalar_subquery().

lateral(name: Optional[str] = None) LateralFromClause

Return a LATERAL alias of this _expression.Selectable.

The return value is the _expression.Lateral construct also provided by the top-level _expression.lateral() function.

See also

tutorial_lateral_correlation - overview of usage.

limit(limit: _LimitOffsetType) Self

Return a new selectable with the given LIMIT criterion applied.

This is a numerical value which usually renders as a LIMIT expression in the resulting select. Backends that don’t support LIMIT will attempt to provide similar functionality.

Note

The _sql.GenerativeSelect.limit() method will replace any clause applied with _sql.GenerativeSelect.fetch().

Parameters

limit – an integer LIMIT parameter, or a SQL expression that provides an integer result. Pass None to reset it.

See also

_sql.GenerativeSelect.fetch()

_sql.GenerativeSelect.offset()

offset(offset: _LimitOffsetType) Self

Return a new selectable with the given OFFSET criterion applied.

This is a numeric value which usually renders as an OFFSET expression in the resulting select. Backends that don’t support OFFSET will attempt to provide similar functionality.

Parameters

offset – an integer OFFSET parameter, or a SQL expression that provides an integer result. Pass None to reset it.

See also

_sql.GenerativeSelect.limit()

_sql.GenerativeSelect.fetch()
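
A minimal sketch of LIMIT/OFFSET pagination (the users table is a hypothetical ad-hoc construct):

from sqlalchemy import table, column, select

users = table("users", column("id"), column("name"))

# page 3 of a result set with 10 rows per page
stmt = select(users).order_by(users.c.id).limit(10).offset(20)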

options(*options: ExecutableOption) Self

Apply options to this statement.

In the general sense, options are any kind of Python object that can be interpreted by the SQL compiler for the statement. These options can be consumed by specific dialects or specific kinds of compilers.

The most commonly known kind of option are the ORM level options that apply β€œeager load” and other loading behaviors to an ORM query. However, options can theoretically be used for many other purposes.

For background on specific kinds of options for specific kinds of statements, refer to the documentation for those option objects.

Changed in version 1.4: - added Executable.options() to Core statement objects towards the goal of allowing unified Core / ORM querying capabilities.

See also

loading_columns - refers to options specific to the usage of ORM queries

relationship_loader_options - refers to options specific to the usage of ORM queries

order_by(_GenerativeSelect__first: Union[Literal[None, _NoArg.NO_ARG], _ColumnExpressionOrStrLabelArgument[Any]] = _NoArg.NO_ARG, *clauses: _ColumnExpressionOrStrLabelArgument[Any]) Self

Return a new selectable with the given list of ORDER BY criteria applied.

e.g.:

stmt = select(table).order_by(table.c.id, table.c.name)

Calling this method multiple times is equivalent to calling it once with all the clauses concatenated. All existing ORDER BY criteria may be cancelled by passing None by itself. New ORDER BY criteria may then be added by invoking _expression.Select.order_by() again, e.g.:

# will erase all ORDER BY and ORDER BY new_col alone
stmt = stmt.order_by(None).order_by(new_col)

Parameters

*clauses – a series of _expression.ColumnElement constructs which will be used to generate an ORDER BY clause.

See also

tutorial_order_by - in the unified_tutorial

tutorial_order_by_label - in the unified_tutorial

outerjoin(target: _JoinTargetArgument, onclause: Optional[_OnClauseArgument] = None, *, full: bool = False) Self

Create a left outer join.

Parameters are the same as that of _expression.Select.join().

Changed in version 1.4: _expression.Select.outerjoin() now creates a _sql.Join object between a _sql.FromClause source that is within the FROM clause of the existing SELECT, and a given target _sql.FromClause, and then adds this _sql.Join to the FROM clause of the newly generated SELECT statement. This is completely reworked from the behavior in 1.3, which would instead create a subquery of the entire _expression.Select and then join that subquery to the target.

This is a backwards incompatible change as the previous behavior was mostly useless, producing an unnamed subquery rejected by most databases in any case. The new behavior is modeled after that of the very successful _orm.Query.join() method in the ORM, in order to support the functionality of _orm.Query being available by using a _sql.Select object with an _orm.Session.

See the notes for this change at change_select_join.

See also

tutorial_select_join - in the /tutorial/index

orm_queryguide_joins - in the queryguide_toplevel

_expression.Select.join()

outerjoin_from(from_: _FromClauseArgument, target: _JoinTargetArgument, onclause: Optional[_OnClauseArgument] = None, *, full: bool = False) Self

Create a SQL LEFT OUTER JOIN against this _expression.Select object’s criterion and apply generatively, returning the newly resulting _expression.Select.

Usage is the same as that of _selectable.Select.join_from().

params(_ClauseElement__optionaldict: Optional[Mapping[str, Any]] = None, **kwargs: Any) Self

Return a copy with _expression.bindparam() elements replaced.

Returns a copy of this ClauseElement with _expression.bindparam() elements replaced with values taken from the given dictionary:

>>> clause = column('x') + bindparam('foo')
>>> print(clause.compile().params)
{'foo':None}
>>> print(clause.params({'foo':7}).compile().params)
{'foo':7}

prefix_with(*prefixes: _TextCoercedExpressionArgument[Any], dialect: str = '*') Self

Add one or more expressions following the statement keyword, i.e. SELECT, INSERT, UPDATE, or DELETE. Generative.

This is used to support backend-specific prefix keywords such as those provided by MySQL.

E.g.:

stmt = table.insert().prefix_with("LOW_PRIORITY", dialect="mysql")

# MySQL 5.7 optimizer hints
stmt = select(table).prefix_with(
    "/*+ BKA(t1) */", dialect="mysql")

Multiple prefixes can be specified by multiple calls to _expression.HasPrefixes.prefix_with().

Parameters
  • *prefixes – textual or _expression.ClauseElement construct which will be rendered following the SELECT, INSERT, UPDATE, or DELETE keyword.

  • dialect – optional string dialect name which will limit rendering of this prefix to only that dialect.

reduce_columns(only_synonyms: bool = True) Select

Return a new _expression.select() construct with redundantly named, equivalently-valued columns removed from the columns clause.

β€œRedundant” here means two columns where one refers to the other either based on foreign key, or via a simple equality comparison in the WHERE clause of the statement. The primary purpose of this method is to automatically construct a select statement with all uniquely-named columns, without the need to use table-qualified labels as _expression.Select.set_label_style() does.

When columns are omitted based on foreign key, the referred-to column is the one that’s kept. When columns are omitted based on WHERE equivalence, the first column in the columns clause is the one that’s kept.

Parameters

only_synonyms – when True, limit the removal of columns to those which have the same name as the equivalent. Otherwise, all columns that are equivalent to another are removed.

replace_selectable(old: FromClause, alias: Alias) Self

Replace all occurrences of _expression.FromClause β€˜old’ with the given _expression.Alias object, returning a copy of this _expression.FromClause.

Deprecated since version 1.4: The Selectable.replace_selectable() method is deprecated, and will be removed in a future release. Similar functionality is available via the sqlalchemy.sql.visitors module.

scalar_subquery() ScalarSelect[Any]

Return a β€˜scalar’ representation of this selectable, which can be used as a column expression.

The returned object is an instance of _sql.ScalarSelect.

Typically, a select statement which has only one column in its columns clause is eligible to be used as a scalar expression. The scalar subquery can then be used in the WHERE clause or columns clause of an enclosing SELECT.

Note that the scalar subquery differentiates from the FROM-level subquery that can be produced using the _expression.SelectBase.subquery() method.

See also

tutorial_scalar_subquery - in the 2.0 tutorial
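
A minimal sketch (the users table is a hypothetical ad-hoc construct):

from sqlalchemy import table, column, select, func

users = table("users", column("id"), column("name"))

# single-column, single-row SELECT used as a column expression
max_id = select(func.max(users.c.id)).scalar_subquery()
stmt = select(users).where(users.c.id == max_id)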

select(*arg: Any, **kw: Any) Select

Deprecated since version 1.4: The _expression.SelectBase.select() method is deprecated and will be removed in a future release; this method implicitly creates a subquery that should be explicit. Please call _expression.SelectBase.subquery() first in order to create a subquery, which then can be selected.

select_from(*froms: _FromClauseArgument) Self

Return a new _expression.select() construct with the given FROM expression(s) merged into its list of FROM objects.

E.g.:

table1 = table('t1', column('a'))
table2 = table('t2', column('b'))
s = select(table1.c.a).\
    select_from(
        table1.join(table2, table1.c.a==table2.c.b)
    )

The β€œfrom” list is a unique set on the identity of each element, so adding an already present _schema.Table or other selectable will have no effect. Passing a _expression.Join that refers to an already present _schema.Table or other selectable will have the effect of concealing the presence of that selectable as an individual element in the rendered FROM list, instead rendering it into a JOIN clause.

While the typical purpose of _expression.Select.select_from() is to replace the default, derived FROM clause with a join, it can also be called with individual table elements, multiple times if desired, in the case that the FROM clause cannot be fully derived from the columns clause:

select(func.count('*')).select_from(table1)

property selected_columns: ColumnCollection[str, ColumnElement[Any]]

A _expression.ColumnCollection representing the columns that this SELECT statement or similar construct returns in its result set, not including _sql.TextClause constructs.

This collection differs from the _expression.FromClause.columns collection of a _expression.FromClause in that the columns within this collection cannot be directly nested inside another SELECT statement; a subquery must be applied first which provides for the necessary parenthesization required by SQL.

For a _expression.select() construct, the collection here is exactly what would be rendered inside the β€œSELECT” statement, and the _expression.ColumnElement objects are directly present as they were given, e.g.:

col1 = column('q', Integer)
col2 = column('p', Integer)
stmt = select(col1, col2)

Above, stmt.selected_columns would be a collection that contains the col1 and col2 objects directly. For a statement that is against a _schema.Table or other _expression.FromClause, the collection will use the _expression.ColumnElement objects that are in the _expression.FromClause.c collection of the from element.

A use case for the _sql.Select.selected_columns collection is to allow the existing columns to be referenced when adding additional criteria, e.g.:

def filter_on_id(my_select, id):
    return my_select.where(my_select.selected_columns['id'] == id)

stmt = select(MyModel)

# adds "WHERE id=:param" to the statement
stmt = filter_on_id(stmt, 42)

Note

The _sql.Select.selected_columns collection does not include expressions established in the columns clause using the _sql.text() construct; these are silently omitted from the collection. To use plain textual column expressions inside of a _sql.Select construct, use the _sql.literal_column() construct.

New in version 1.4.

self_group(against: Optional[OperatorType] = None) Union[SelectStatementGrouping, Self]

Apply a β€˜grouping’ to this _expression.ClauseElement.

This method is overridden by subclasses to return a β€œgrouping” construct, i.e. parenthesis. In particular it’s used by β€œbinary” expressions to provide a grouping around themselves when placed into a larger expression, as well as by _expression.select() constructs when placed into the FROM clause of another _expression.select(). (Note that subqueries should be normally created using the _expression.Select.alias() method, as many platforms require nested SELECT statements to be named).

As expressions are composed together, the application of self_group() is automatic - end-user code should never need to use this method directly. Note that SQLAlchemy’s clause constructs take operator precedence into account - so parenthesis might not be needed, for example, in an expression like x OR (y AND z) - AND takes precedence over OR.

The base self_group() method of _expression.ClauseElement just returns self.

set_label_style(style: SelectLabelStyle) Self

Return a new selectable with the specified label style.

There are three β€œlabel styles” available, _sql.SelectLabelStyle.LABEL_STYLE_DISAMBIGUATE_ONLY, _sql.SelectLabelStyle.LABEL_STYLE_TABLENAME_PLUS_COL, and _sql.SelectLabelStyle.LABEL_STYLE_NONE. The default style is _sql.SelectLabelStyle.LABEL_STYLE_TABLENAME_PLUS_COL.

In modern SQLAlchemy, there is not generally a need to change the labeling style, as per-expression labels are more effectively used by making use of the _sql.ColumnElement.label() method. In past versions, _sql.LABEL_STYLE_TABLENAME_PLUS_COL was used to disambiguate same-named columns from different tables, aliases, or subqueries; the newer _sql.LABEL_STYLE_DISAMBIGUATE_ONLY now applies labels only to names that conflict with an existing name so that the impact of this labeling is minimal.

The rationale for disambiguation is mostly so that all column expressions are available from a given _sql.FromClause.c collection when a subquery is created.

New in version 1.4: - the _sql.GenerativeSelect.set_label_style() method replaces the previous combination of .apply_labels(), .with_labels() and use_labels=True methods and/or parameters.

See also

_sql.LABEL_STYLE_DISAMBIGUATE_ONLY

_sql.LABEL_STYLE_TABLENAME_PLUS_COL

_sql.LABEL_STYLE_NONE

_sql.LABEL_STYLE_DEFAULT
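
A minimal sketch (the users table is a hypothetical ad-hoc construct; the import location of the label-style constants may vary by SQLAlchemy version):

from sqlalchemy import table, column, select
from sqlalchemy.sql import LABEL_STYLE_TABLENAME_PLUS_COL

users = table("users", column("id"), column("name"))

# renders columns as users_id, users_name
stmt = select(users).set_label_style(LABEL_STYLE_TABLENAME_PLUS_COL)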

slice(start: int, stop: int) Self

Apply LIMIT / OFFSET to this statement based on a slice.

The start and stop indices behave like the arguments to Python’s built-in range() function. This method provides an alternative to using LIMIT/OFFSET to get a slice of the query.

For example,

stmt = select(User).order_by(User.id).slice(1, 3)

renders as

SELECT users.id AS users_id,
       users.name AS users_name
FROM users ORDER BY users.id
LIMIT ? OFFSET ?
(2, 1)

Note

The _sql.GenerativeSelect.slice() method will replace any clause applied with _sql.GenerativeSelect.fetch().

New in version 1.4: Added the _sql.GenerativeSelect.slice() method generalized from the ORM.

See also

_sql.GenerativeSelect.limit()

_sql.GenerativeSelect.offset()

_sql.GenerativeSelect.fetch()

subquery(name: Optional[str] = None) Subquery

Return a subquery of this _expression.SelectBase.

A subquery is from a SQL perspective a parenthesized, named construct that can be placed in the FROM clause of another SELECT statement.

Given a SELECT statement such as:

stmt = select(table.c.id, table.c.name)

The above statement might look like:

SELECT table.id, table.name FROM table

The subquery form by itself renders the same way, however when embedded into the FROM clause of another SELECT statement, it becomes a named sub-element:

subq = stmt.subquery()
new_stmt = select(subq)

The above renders as:

SELECT anon_1.id, anon_1.name
FROM (SELECT table.id, table.name FROM table) AS anon_1

Historically, _expression.SelectBase.subquery() is equivalent to calling the _expression.FromClause.alias() method on a FROM object; however, as a _expression.SelectBase object is not directly a FROM object, the _expression.SelectBase.subquery() method provides clearer semantics.

New in version 1.4.

suffix_with(*suffixes: _TextCoercedExpressionArgument[Any], dialect: str = '*') Self

Add one or more expressions following the statement as a whole.

This is used to support backend-specific suffix keywords on certain constructs.

E.g.:

stmt = select(col1, col2).cte().suffix_with(
    "cycle empno set y_cycle to 1 default 0", dialect="oracle")

Multiple suffixes can be specified by multiple calls to _expression.HasSuffixes.suffix_with().

Parameters
  • *suffixes – textual or _expression.ClauseElement construct which will be rendered following the target clause.

  • dialect – Optional string dialect name which will limit rendering of this suffix to only that dialect.

union(*other: _SelectStatementForCompoundArgument) CompoundSelect

Return a SQL UNION of this select() construct against the given selectables provided as positional arguments.

Parameters
  • *other –

    one or more elements with which to create a UNION.

    Changed in version 1.4.28: multiple elements are now accepted.

  • **kwargs – keyword arguments are forwarded to the constructor for the newly created _sql.CompoundSelect object.

union_all(*other: _SelectStatementForCompoundArgument) CompoundSelect

Return a SQL UNION ALL of this select() construct against the given selectables provided as positional arguments.

Parameters
  • *other –

    one or more elements with which to create a UNION.

    Changed in version 1.4.28: multiple elements are now accepted.

  • **kwargs – keyword arguments are forwarded to the constructor for the newly created _sql.CompoundSelect object.
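
A short sketch combining both methods (the users table is a hypothetical ad-hoc construct):

from sqlalchemy import table, column, select

users = table("users", column("id"), column("name"))

s1 = select(users.c.name).where(users.c.id < 10)
s2 = select(users.c.name).where(users.c.id > 100)

u = s1.union(s2)        # UNION: duplicate rows removed
ua = s1.union_all(s2)   # UNION ALL: duplicate rows kept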

unique_params(_ClauseElement__optionaldict: Optional[Dict[str, Any]] = None, **kwargs: Any) Self

Return a copy with _expression.bindparam() elements replaced.

Same functionality as _expression.ClauseElement.params(), except adds unique=True to affected bind parameters so that multiple statements can be used.

where(*whereclause: _ColumnExpressionArgument[bool]) Self

Return a new _expression.select() construct with the given expression added to its WHERE clause, joined to the existing clause via AND, if any.
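
A minimal sketch showing that successive calls are joined via AND (the users table is a hypothetical ad-hoc construct):

from sqlalchemy import table, column, select

users = table("users", column("id"), column("name"))

# WHERE users.id > 5 AND users.name IS NOT NULL
stmt = select(users).where(users.c.id > 5).where(users.c.name.is_not(None))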

property whereclause: Optional[ColumnElement[Any]]

Return the completed WHERE clause for this _expression.Select statement.

This assembles the current collection of WHERE criteria into a single _expression.BooleanClauseList construct.

New in version 1.4.

with_for_update(*, nowait: bool = False, read: bool = False, of: Optional[_ForUpdateOfArgument] = None, skip_locked: bool = False, key_share: bool = False) Self

Specify a FOR UPDATE clause for this _expression.GenerativeSelect.

E.g.:

stmt = select(table).with_for_update(nowait=True)

On a database like PostgreSQL or Oracle, the above would render a statement like:

SELECT table.a, table.b FROM table FOR UPDATE NOWAIT

on other backends, the nowait option is ignored and instead would produce:

SELECT table.a, table.b FROM table FOR UPDATE

When called with no arguments, the statement will render with the suffix FOR UPDATE. Additional arguments can then be provided which allow for common database-specific variants.

Parameters
  • nowait – boolean; will render FOR UPDATE NOWAIT on Oracle and PostgreSQL dialects.

  • read – boolean; will render LOCK IN SHARE MODE on MySQL, FOR SHARE on PostgreSQL. On PostgreSQL, when combined with nowait, will render FOR SHARE NOWAIT.

  • of – SQL expression or list of SQL expression elements, (typically _schema.Column objects or a compatible expression, for some backends may also be a table expression) which will render into a FOR UPDATE OF clause; supported by PostgreSQL, Oracle, some MySQL versions and possibly others. May render as a table or as a column depending on backend.

  • skip_locked – boolean, will render FOR UPDATE SKIP LOCKED on Oracle and PostgreSQL dialects or FOR SHARE SKIP LOCKED if read=True is also specified.

  • key_share – boolean, will render FOR NO KEY UPDATE, or if combined with read=True will render FOR KEY SHARE, on the PostgreSQL dialect.

with_hint(selectable: _FromClauseArgument, text: str, dialect_name: str = '*') Self

Add an indexing or other executional context hint for the given selectable to this _expression.Select or other selectable object.

The text of the hint is rendered in the appropriate location for the database backend in use, relative to the given _schema.Table or _expression.Alias passed as the selectable argument. The dialect implementation typically uses Python string substitution syntax with the token %(name)s to render the name of the table or alias. E.g. when using Oracle, the following:

select(mytable).\
    with_hint(mytable, "index(%(name)s ix_mytable)")

Would render SQL as:

select /*+ index(mytable ix_mytable) */ ... from mytable

The dialect_name option will limit the rendering of a particular hint to a particular backend. For example, to add hints for both Oracle and SQL Server simultaneously:

select(mytable).\
    with_hint(mytable, "index(%(name)s ix_mytable)", 'oracle').\
    with_hint(mytable, "WITH INDEX ix_mytable", 'mssql')

See also

_expression.Select.with_statement_hint()

with_only_columns(*entities: _ColumnsClauseArgument[Any], maintain_column_froms: bool = False, **_Select__kw: Any) Select[Any]

Return a new _expression.select() construct with its columns clause replaced with the given entities.

By default, this method is exactly equivalent to as if the original _expression.select() had been called with the given entities. E.g. a statement:

s = select(table1.c.a, table1.c.b)
s = s.with_only_columns(table1.c.b)

should be exactly equivalent to:

s = select(table1.c.b)

In this mode of operation, _sql.Select.with_only_columns() will also dynamically alter the FROM clause of the statement if it is not explicitly stated. To maintain the existing set of FROMs including those implied by the current columns clause, add the :paramref:`_sql.Select.with_only_columns.maintain_column_froms` parameter:

s = select(table1.c.a, table2.c.b)
s = s.with_only_columns(table1.c.a, maintain_column_froms=True)

The above parameter performs a transfer of the effective FROMs in the columns collection to the _sql.Select.select_from() method, as though the following were invoked:

s = select(table1.c.a, table2.c.b)
s = s.select_from(table1, table2).with_only_columns(table1.c.a)

The :paramref:`_sql.Select.with_only_columns.maintain_column_froms` parameter makes use of the _sql.Select.columns_clause_froms collection and performs an operation equivalent to the following:

s = select(table1.c.a, table2.c.b)
s = s.select_from(*s.columns_clause_froms).with_only_columns(table1.c.a)

Parameters
  • *entities – column expressions to be used.

  • maintain_column_froms –

    boolean parameter that will ensure the FROM list implied from the current columns clause will be transferred to the _sql.Select.select_from() method first.

    New in version 1.4.23.

with_statement_hint(text: str, dialect_name: str = '*') Self

Add a statement hint to this _expression.Select or other selectable object.

This method is similar to _expression.Select.with_hint() except that it does not require an individual table, and instead applies to the statement as a whole.

Hints here are specific to the backend database and may include directives such as isolation levels, file directives, fetch directives, etc.

See also

_expression.Select.with_hint()

_expression.Select.prefix_with() - generic SELECT prefixing which also can suit some database-specific HINT syntaxes such as MySQL optimizer hints
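
A minimal sketch (the users table is a hypothetical ad-hoc construct, and the hint text is backend-specific, shown here only as an assumed example for the SQL Server dialect):

from sqlalchemy import table, column, select

users = table("users", column("id"), column("name"))

# a statement-level hint limited to a single dialect
stmt = select(users).with_statement_hint("OPTION (MAXDOP 1)", dialect_name="mssql")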

add(nodes: List[BaseNode]) List[str]

Add nodes to vector store.

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

classmethod class_name() str

Get class name.

async close() None
classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) Model

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = β€˜allow’ was set since it adds all passed values

copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) Model

Duplicate a model, optionally choose which fields to include, exclude and change.

Parameters
  • include – fields to include in new model

  • exclude – fields to exclude from new model, as with values this takes precedence over include

  • update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data

  • deep – set to True to make a deep copy of the model

Returns

new model instance

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) DictStrAny

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

classmethod from_dict(data: Dict[str, Any], **kwargs: Any) Self
classmethod from_json(data_str: str, **kwargs: Any) Self
classmethod from_orm(obj: Any) Model
classmethod from_params(host: Optional[str] = None, port: Optional[str] = None, database: Optional[str] = None, user: Optional[str] = None, password: Optional[str] = None, table_name: str = 'llamaindex', connection_string: Optional[str] = None, async_connection_string: Optional[str] = None, hybrid_search: bool = False, text_search_config: str = 'english', embed_dim: int = 1536, debug: bool = False) PGVectorStore

Construct a PGVectorStore from database connection parameters.
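
A minimal hedged sketch (connection values are placeholders):

from llama_index.vector_stores import PGVectorStore

store = PGVectorStore.from_params(
    host="localhost",
    port="5432",
    database="vector_db",
    user="postgres",
    password="password",
    table_name="llamaindex",
    embed_dim=1536,
)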

json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) unicode

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

classmethod parse_file(path: Union[str, Path], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) Model
classmethod parse_obj(obj: Any) Model
classmethod parse_raw(b: Union[str, bytes], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) Model
persist(persist_path: str, fs: Optional[AbstractFileSystem] = None) None
query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query vector store.

classmethod schema(by_alias: bool = True, ref_template: unicode = '#/definitions/{model}') DictStrAny
classmethod schema_json(*, by_alias: bool = True, ref_template: unicode = '#/definitions/{model}', **dumps_kwargs: Any) unicode
to_dict(**kwargs: Any) Dict[str, Any]
to_json(**kwargs: Any) str
classmethod update_forward_refs(**localns: Any) None

Try to update ForwardRefs on fields based on this Model, globalns and localns.

classmethod validate(value: Any) Model
property client: Any

Get client.

pydantic model llama_index.vector_stores.PineconeVectorStore

Pinecone Vector Store.

In this vector store, embeddings and docs are stored within a Pinecone index.

During query time, the index uses Pinecone to query for the top k most similar nodes.

Parameters
  • pinecone_index (Optional[pinecone.Index]) – Pinecone index instance

  • insert_kwargs (Optional[Dict]) – insert kwargs during upsert call.

  • add_sparse_vector (bool) – whether to add sparse vector to index.

  • tokenizer (Optional[Callable]) – tokenizer to use to generate sparse vectors

Show JSON schema
{
   "title": "PineconeVectorStore",
   "description": "Pinecone Vector Store.\n\nIn this vector store, embeddings and docs are stored within a\nPinecone index.\n\nDuring query time, the index uses Pinecone to query for the top\nk most similar nodes.\n\nArgs:\n    pinecone_index (Optional[pinecone.Index]): Pinecone index instance\n    insert_kwargs (Optional[Dict]): insert kwargs during `upsert` call.\n    add_sparse_vector (bool): whether to add sparse vector to index.\n    tokenizer (Optional[Callable]): tokenizer to use to generate sparse",
   "type": "object",
   "properties": {
      "stores_text": {
         "title": "Stores Text",
         "default": true,
         "type": "boolean"
      },
      "is_embedding_query": {
         "title": "Is Embedding Query",
         "default": true,
         "type": "boolean"
      },
      "flat_metadata": {
         "title": "Flat Metadata",
         "default": true,
         "type": "boolean"
      },
      "api_key": {
         "title": "Api Key",
         "type": "string"
      },
      "index_name": {
         "title": "Index Name",
         "type": "string"
      },
      "environment": {
         "title": "Environment",
         "type": "string"
      },
      "namespace": {
         "title": "Namespace",
         "type": "string"
      },
      "insert_kwargs": {
         "title": "Insert Kwargs",
         "type": "object"
      },
      "add_sparse_vector": {
         "title": "Add Sparse Vector",
         "type": "boolean"
      },
      "text_key": {
         "title": "Text Key",
         "type": "string"
      },
      "batch_size": {
         "title": "Batch Size",
         "type": "integer"
      }
   },
   "required": [
      "add_sparse_vector",
      "text_key",
      "batch_size"
   ]
}

Fields
  • add_sparse_vector (bool)

  • api_key (Optional[str])

  • batch_size (int)

  • environment (Optional[str])

  • flat_metadata (bool)

  • index_name (Optional[str])

  • insert_kwargs (Optional[Dict])

  • is_embedding_query (bool)

  • namespace (Optional[str])

  • stores_text (bool)

  • text_key (str)

field add_sparse_vector: bool [Required]
field api_key: Optional[str] = None
field batch_size: int [Required]
field environment: Optional[str] = None
field flat_metadata: bool = True
field index_name: Optional[str] = None
field insert_kwargs: Optional[Dict] = None
field is_embedding_query: bool = True
field namespace: Optional[str] = None
field stores_text: bool = True
field text_key: str [Required]
add(nodes: List[BaseNode]) List[str]

Add nodes to index.

Parameters

nodes (List[BaseNode]) – list of nodes with embeddings

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

classmethod class_name() str

Get class name.

classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) Model

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = 'allow' was set since it adds all passed values.

copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) Model

Duplicate a model, optionally choose which fields to include, exclude and change.

Parameters
  • include – fields to include in new model

  • exclude – fields to exclude from new model, as with values this takes precedence over include

  • update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data

  • deep – set to True to make a deep copy of the model

Returns

new model instance

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) DictStrAny

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

classmethod from_dict(data: Dict[str, Any], **kwargs: Any) Self
classmethod from_json(data_str: str, **kwargs: Any) Self
classmethod from_orm(obj: Any) Model
classmethod from_params(api_key: Optional[str] = None, index_name: Optional[str] = None, environment: Optional[str] = None, namespace: Optional[str] = None, insert_kwargs: Optional[Dict] = None, add_sparse_vector: bool = False, tokenizer: Optional[Callable] = None, text_key: str = 'text', batch_size: int = 100, **kwargs: Any) PineconeVectorStore
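
A minimal hedged sketch of building the store from parameters (the key, environment, and index name are placeholders):

from llama_index.vector_stores import PineconeVectorStore

store = PineconeVectorStore.from_params(
    api_key="your-api-key",
    index_name="quickstart",
    environment="us-west1-gcp",
)
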
json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) unicode

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

classmethod parse_file(path: Union[str, Path], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) Model
classmethod parse_obj(obj: Any) Model
classmethod parse_raw(b: Union[str, bytes], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) Model
persist(persist_path: str, fs: Optional[AbstractFileSystem] = None) None
query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query (VectorStoreQuery) – query, carrying the query embedding and similarity_top_k

classmethod schema(by_alias: bool = True, ref_template: unicode = '#/definitions/{model}') DictStrAny
classmethod schema_json(*, by_alias: bool = True, ref_template: unicode = '#/definitions/{model}', **dumps_kwargs: Any) unicode
to_dict(**kwargs: Any) Dict[str, Any]
to_json(**kwargs: Any) str
classmethod update_forward_refs(**localns: Any) None

Try to update ForwardRefs on fields based on this Model, globalns and localns.

classmethod validate(value: Any) Model
property client: Any

Return Pinecone client.

pydantic model llama_index.vector_stores.QdrantVectorStore

Qdrant Vector Store.

In this vector store, embeddings and docs are stored within a Qdrant collection.

During query time, the index uses Qdrant to query for the top k most similar nodes.

Parameters
  • collection_name (str) – name of the Qdrant collection

  • client (Optional[Any]) – QdrantClient instance from qdrant-client package

Show JSON schema
{
   "title": "QdrantVectorStore",
   "description": "Qdrant Vector Store.\n\nIn this vector store, embeddings and docs are stored within a\nQdrant collection.\n\nDuring query time, the index uses Qdrant to query for the top\nk most similar nodes.\n\nArgs:\n    collection_name: (str): name of the Qdrant collection\n    client (Optional[Any]): QdrantClient instance from `qdrant-client` package",
   "type": "object",
   "properties": {
      "stores_text": {
         "title": "Stores Text",
         "default": true,
         "type": "boolean"
      },
      "is_embedding_query": {
         "title": "Is Embedding Query",
         "default": true,
         "type": "boolean"
      },
      "flat_metadata": {
         "title": "Flat Metadata",
         "default": false,
         "type": "boolean"
      },
      "collection_name": {
         "title": "Collection Name",
         "type": "string"
      },
      "url": {
         "title": "Url",
         "type": "string"
      },
      "api_key": {
         "title": "Api Key",
         "type": "string"
      },
      "batch_size": {
         "title": "Batch Size",
         "type": "integer"
      },
      "client_kwargs": {
         "title": "Client Kwargs",
         "type": "object"
      }
   },
   "required": [
      "collection_name",
      "batch_size"
   ]
}

Fields
  • api_key (Optional[str])

  • batch_size (int)

  • client_kwargs (dict)

  • collection_name (str)

  • flat_metadata (bool)

  • is_embedding_query (bool)

  • stores_text (bool)

  • url (Optional[str])

field api_key: Optional[str] = None
field batch_size: int [Required]
field client_kwargs: dict [Optional]
field collection_name: str [Required]
field flat_metadata: bool = False
field is_embedding_query: bool = True
field stores_text: bool = True
field url: Optional[str] = None
add(nodes: List[BaseNode]) List[str]

Add nodes to index.

Args

nodes: List[BaseNode]: list of nodes with embeddings

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

classmethod class_name() str

Get class name.

classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) Model

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = 'allow' was set since it adds all passed values.

copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) Model

Duplicate a model, optionally choose which fields to include, exclude and change.

Parameters
  • include – fields to include in new model

  • exclude – fields to exclude from new model, as with values this takes precedence over include

  • update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data

  • deep – set to True to make a deep copy of the model

Returns

new model instance

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) DictStrAny

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

classmethod from_dict(data: Dict[str, Any], **kwargs: Any) Self
classmethod from_json(data_str: str, **kwargs: Any) Self
classmethod from_orm(obj: Any) Model
classmethod from_params(collection_name: str, url: Optional[str] = None, api_key: Optional[str] = None, client_kwargs: Optional[dict] = None, batch_size: int = 100, **kwargs: Any) QdrantVectorStore

Create a connection to a remote Qdrant vector store from a config.
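
A minimal hedged sketch (the URL and collection name are placeholders for a local Qdrant instance):

from llama_index.vector_stores import QdrantVectorStore

store = QdrantVectorStore.from_params(
    collection_name="llamaindex",
    url="http://localhost:6333",
)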

json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) unicode

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

classmethod parse_file(path: Union[str, Path], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) Model
classmethod parse_obj(obj: Any) Model
classmethod parse_raw(b: Union[str, bytes], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) Model
persist(persist_path: str, fs: Optional[AbstractFileSystem] = None) None
query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query (VectorStoreQuery) – query

classmethod schema(by_alias: bool = True, ref_template: unicode = '#/definitions/{model}') DictStrAny
classmethod schema_json(*, by_alias: bool = True, ref_template: unicode = '#/definitions/{model}', **dumps_kwargs: Any) unicode
to_dict(**kwargs: Any) Dict[str, Any]
to_json(**kwargs: Any) str
classmethod update_forward_refs(**localns: Any) None

Try to update ForwardRefs on fields based on this Model, globalns and localns.

classmethod validate(value: Any) Model
property client: Any

Return the Qdrant client.

class llama_index.vector_stores.RedisVectorStore(index_name: str, index_prefix: str = 'llama_index', prefix_ending: str = '/vector', index_args: Optional[Dict[str, Any]] = None, metadata_fields: Optional[List[str]] = None, redis_url: str = 'redis://localhost:6379', overwrite: bool = False, **kwargs: Any)
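
A minimal hedged sketch of constructing the store against a local Redis instance (the URL is a placeholder):

from llama_index.vector_stores import RedisVectorStore

store = RedisVectorStore(
    index_name="llamaindex",
    redis_url="redis://localhost:6379",
    overwrite=True,
)
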
add(nodes: List[BaseNode]) List[str]

Add nodes to the index.

Parameters

nodes (List[BaseNode]) – List of nodes with embeddings

Returns

List of ids of the documents added to the index.

Return type

List[str]

Raises

ValueError – If the index already exists and overwrite is False.

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: RedisType

Return the Redis client instance.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

delete_index() None

Delete the index and all documents.

persist(persist_path: str, fs: Optional[AbstractFileSystem] = None, in_background: bool = True) None

Persist the vector store to disk.

Parameters
  • persist_path (str) – Path to persist the vector store to. (doesn’t apply)

  • in_background (bool, optional) – Persist in background. Defaults to True.

  • fs (fsspec.AbstractFileSystem, optional) – Filesystem to persist to. (doesn’t apply)

Raises

redis.exceptions.RedisError – If there is an error persisting the index to disk.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query the index.

Parameters

query (VectorStoreQuery) – query object

Returns

query result

Return type

VectorStoreQueryResult

Raises
  • ValueError – If query.query_embedding is None.

  • redis.exceptions.RedisError – If there is an error querying the index.

  • redis.exceptions.TimeoutError – If there is a timeout querying the index.

  • ValueError – If no documents are found when querying the index.

class llama_index.vector_stores.RocksetVectorStore(collection: str, client: Optional[Any] = None, text_key: str = 'text', embedding_col: str = 'embedding', metadata_col: str = 'metadata', workspace: str = 'commons', api_server: Optional[str] = None, api_key: Optional[str] = None, distance_func: DistanceFunc = DistanceFunc.COSINE_SIM)
class DistanceFunc(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
add(nodes: List[BaseNode]) List[str]

Stores vectors in the collection.

Parameters

nodes (List[BaseNode]) – List of nodes with embeddings

Returns

Stored node IDs (List[str])

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Deletes nodes stored in the collection by their ref_doc_id.

Parameters

ref_doc_id (str) – The ref_doc_id of the document whose nodes are to be deleted

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Gets nodes relevant to a query.

Parameters

query (VectorStoreQuery) – vector store query

Returns

query results (llama_index.vector_stores.types.VectorStoreQueryResult)

classmethod with_new_collection(dimensions: Optional[int] = None, **rockset_vector_store_args: Any) RocksetVectorStore

Creates a new collection and returns its RocksetVectorStore.

Parameters
  • dimensions (Optional[int]) – The length of the vectors to enforce in the collection’s ingest transformation. By default, the collection will do no vector enforcement.

  • collection (str) – The name of the collection to be created

  • client (Optional[Any]) – Rockset client object

  • workspace (str) – The workspace containing the collection to be created (default: "commons")

  • text_key (str) – The key to the text of nodes (default: llama_index.vector_stores.utils.DEFAULT_TEXT_KEY)

  • embedding_col (str) – The DB column containing embeddings (default: llama_index.vector_stores.utils.DEFAULT_EMBEDDING_KEY)

  • metadata_col (str) – The DB column containing node metadata (default: "metadata")

  • api_server (Optional[str]) – The Rockset API server to use

  • api_key (Optional[str]) – The Rockset API key to use

  • distance_func (RocksetVectorStore.DistanceFunc) – The metric to measure vector relationship (default: RocksetVectorStore.DistanceFunc.COSINE_SIM)
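
A minimal hedged sketch (the collection name and dimension are placeholders):

from llama_index.vector_stores import RocksetVectorStore

store = RocksetVectorStore.with_new_collection(
    collection="llamaindex_demo",
    dimensions=1536,
)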

class llama_index.vector_stores.SimpleVectorStore(data: Optional[SimpleVectorStoreData] = None, fs: Optional[AbstractFileSystem] = None, **kwargs: Any)

Simple Vector Store.

In this vector store, embeddings are stored within a simple, in-memory dictionary.

Parameters

data (Optional[SimpleVectorStoreData]) – data object containing the embeddings and doc_ids. See SimpleVectorStoreData for more details.

add(nodes: List[BaseNode]) List[str]

Add nodes to index.

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: None

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

classmethod from_persist_dir(persist_dir: str = './storage', fs: Optional[AbstractFileSystem] = None) SimpleVectorStore

Load from persist dir.

classmethod from_persist_path(persist_path: str, fs: Optional[AbstractFileSystem] = None) SimpleVectorStore

Create a SimpleVectorStore from a persist path.

get(text_id: str) List[float]

Get embedding.

persist(persist_path: str = './storage/vector_store.json', fs: Optional[AbstractFileSystem] = None) None

Persist the SimpleVectorStore to a directory.
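
A hedged round-trip sketch using the persistence helpers above (the path shown is the documented default):

from llama_index.vector_stores import SimpleVectorStore

store = SimpleVectorStore()
store.persist(persist_path="./storage/vector_store.json")
loaded = SimpleVectorStore.from_persist_path("./storage/vector_store.json")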

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Get nodes for response.

class llama_index.vector_stores.SupabaseVectorStore(postgres_connection_string: str, collection_name: str, dimension: int = 1536, **kwargs: Any)

Supabase Vector Store.

In this vector store, embeddings are stored in a Postgres table using pgvector.

During query time, the index uses pgvector/Supabase to query for the top k most similar nodes.

Parameters
  • postgres_connection_string (str) – postgres connection string

  • collection_name (str) – name of the collection to store the embeddings in
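
A minimal hedged sketch (the connection string is a placeholder):

from llama_index.vector_stores import SupabaseVectorStore

store = SupabaseVectorStore(
    postgres_connection_string="postgresql://user:password@localhost:5432/postgres",
    collection_name="llamaindex",
    dimension=1536,
)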

add(nodes: List[BaseNode]) List[str]

Add nodes to index.

Parameters

nodes (List[BaseNode]) – list of nodes with embeddings

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: None

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete doc.

Parameters

ref_doc_id (str) – document id

get_by_id(doc_id: str) list

Get row ids by doc id.

Parameters

doc_id (str) – document id

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query (VectorStoreQuery) – vector store query containing the query embedding

class llama_index.vector_stores.TairVectorStore(tair_url: str, index_name: str, index_type: str = 'HNSW', index_args: Optional[Dict[str, Any]] = None, overwrite: bool = False, **kwargs: Any)
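
A minimal hedged sketch (the Tair URL is a placeholder):

from llama_index.vector_stores import TairVectorStore

store = TairVectorStore(
    tair_url="redis://localhost:6379",
    index_name="llamaindex",
    index_type="HNSW",
)
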
add(nodes: List[BaseNode]) List[str]

Add nodes to the index.

Parameters

nodes (List[BaseNode]) – List of nodes with embeddings

Returns

List of ids of the documents added to the index.

Return type

List[str]

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Tair

Return the Tair client instance.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete a document.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

delete_index() None

Delete the index and all documents.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query the index.

Parameters

query (VectorStoreQuery) – query object

Returns

query result

Return type

VectorStoreQueryResult

Raises

ValueError – If query.query_embedding is None.

class llama_index.vector_stores.TimescaleVectorStore(service_url: str, table_name: str, num_dimensions: int = 1536, time_partition_interval: Optional[timedelta] = None)
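
A minimal hedged sketch (the service URL is a placeholder):

from llama_index.vector_stores import TimescaleVectorStore

store = TimescaleVectorStore(
    service_url="postgres://user:password@localhost:5432/tsdb",
    table_name="llamaindex",
    num_dimensions=1536,
)
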
add(embedding_results: List[BaseNode]) List[str]

Add nodes with embedding to vector store.

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(embedding_results: List[BaseNode]) List[str]

Asynchronously add nodes with embedding to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query vector store.

class llama_index.vector_stores.VectorStoreQuery(query_embedding: Optional[List[float]] = None, similarity_top_k: int = 1, doc_ids: Optional[List[str]] = None, node_ids: Optional[List[str]] = None, query_str: Optional[str] = None, output_fields: Optional[List[str]] = None, embedding_field: Optional[str] = None, mode: VectorStoreQueryMode = VectorStoreQueryMode.DEFAULT, alpha: Optional[float] = None, filters: Optional[MetadataFilters] = None, mmr_threshold: Optional[float] = None, sparse_top_k: Optional[int] = None)

Vector store query.

class llama_index.vector_stores.VectorStoreQueryResult(nodes: Optional[Sequence[BaseNode]] = None, similarities: Optional[List[float]] = None, ids: Optional[List[str]] = None)

Vector store query result.
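
A hedged sketch tying the two together (store stands in for any previously constructed vector store from this module; the embedding values are placeholders):

from llama_index.vector_stores import VectorStoreQuery

query = VectorStoreQuery(
    query_embedding=[0.1, 0.2, 0.3],
    similarity_top_k=2,
)
result = store.query(query)
print(result.ids, result.similarities)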

pydantic model llama_index.vector_stores.WeaviateVectorStore

Weaviate vector store.

In this vector store, embeddings and docs are stored within a Weaviate collection.

During query time, the index uses Weaviate to query for the top k most similar nodes.

Parameters
  • weaviate_client (weaviate.Client) – WeaviateClient instance from weaviate-client package

  • index_name (Optional[str]) – name for Weaviate classes
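
A minimal hedged sketch using the weaviate-client package (the URL is a placeholder for a local instance):

import weaviate

from llama_index.vector_stores import WeaviateVectorStore

client = weaviate.Client("http://localhost:8080")
store = WeaviateVectorStore(weaviate_client=client, index_name="LlamaIndex")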

Show JSON schema
{
   "title": "WeaviateVectorStore",
   "description": "Weaviate vector store.\n\nIn this vector store, embeddings and docs are stored within a\nWeaviate collection.\n\nDuring query time, the index uses Weaviate to query for the top\nk most similar nodes.\n\nArgs:\n    weaviate_client (weaviate.Client): WeaviateClient\n        instance from `weaviate-client` package\n    index_name (Optional[str]): name for Weaviate classes",
   "type": "object",
   "properties": {
      "stores_text": {
         "title": "Stores Text",
         "default": true,
         "type": "boolean"
      },
      "is_embedding_query": {
         "title": "Is Embedding Query",
         "default": true,
         "type": "boolean"
      },
      "index_name": {
         "title": "Index Name",
         "type": "string"
      },
      "url": {
         "title": "Url",
         "type": "string"
      },
      "text_key": {
         "title": "Text Key",
         "type": "string"
      },
      "auth_config": {
         "title": "Auth Config",
         "type": "object"
      },
      "client_kwargs": {
         "title": "Client Kwargs",
         "type": "object"
      }
   },
   "required": [
      "index_name",
      "text_key"
   ]
}

Fields
  • auth_config (Dict[str, Any])

  • client_kwargs (Dict[str, Any])

  • index_name (str)

  • is_embedding_query (bool)

  • stores_text (bool)

  • text_key (str)

  • url (Optional[str])

field auth_config: Dict[str, Any] [Optional]
field client_kwargs: Dict[str, Any] [Optional]
field index_name: str [Required]
field is_embedding_query: bool = True
field stores_text: bool = True
field text_key: str [Required]
field url: Optional[str] = None
add(nodes: List[BaseNode]) List[str]

Add nodes to index.

Parameters

nodes – List[BaseNode]: list of nodes with embeddings

async adelete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id. NOTE: this is not implemented for all vector stores. If not implemented, it will just call delete synchronously.

async aquery(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Asynchronously query vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call query synchronously.

async async_add(nodes: List[BaseNode]) List[str]

Asynchronously add nodes to vector store. NOTE: this is not implemented for all vector stores. If not implemented, it will just call add synchronously.

classmethod class_name() str

Get class name.

classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) Model

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = 'allow' was set since it adds all passed values.

copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) Model

Duplicate a model, optionally choose which fields to include, exclude and change.

Parameters
  • include – fields to include in new model

  • exclude – fields to exclude from new model, as with values this takes precedence over include

  • update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data

  • deep – set to True to make a deep copy of the model

Returns

new model instance

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) DictStrAny

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

classmethod from_dict(data: Dict[str, Any], **kwargs: Any) Self
classmethod from_json(data_str: str, **kwargs: Any) Self