CassandraVectorStore

class llama_index.vector_stores.CassandraVectorStore(table: str, embedding_dimension: int, *, session: Optional[Any] = None, keyspace: Optional[str] = None, ttl_seconds: Optional[int] = None, insertion_batch_size: int = 20)

Bases: VectorStore

Cassandra Vector Store.

An abstraction of a Cassandra table with vector similarity search. Documents and their embeddings are stored in a Cassandra table, and a vector-capable index is used for searches. The table does not need to exist beforehand: if necessary, it is created behind the scenes.

All Cassandra operations are done through the CassIO library.

Note: in recent versions, only table and embedding_dimension can be passed positionally; all other arguments must be passed as keywords. Please revise your code if needed. This allows for a leaner usage in which the DB connection is set globally through a cassio.init(…) call, so that the DB details no longer need to be specified when creating a vector store, unless desired.

Parameters
  • table (str) – table name to use. If it does not exist, it will be created.

  • embedding_dimension (int) – length of the embedding vectors in use.

  • session (optional, cassandra.cluster.Session) – the Cassandra session to use. Can be omitted, or equivalently set to None, to use the DB connection set globally through cassio.init() beforehand.

  • keyspace (optional, str) – name of the Cassandra keyspace to work in. Can be omitted, or equivalently set to None, to use the DB connection set globally through cassio.init() beforehand.

  • ttl_seconds (optional, int) – expiration time for inserted entries. Default is no expiration (None).

  • insertion_batch_size (optional, int) – how many vectors are inserted concurrently, for use by bulk inserts. Defaults to 20.
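
The parameters above can be illustrated with a minimal sketch. The helper name make_store, the table name, the dimension 1536 and the one-hour TTL are hypothetical choices made for this example only; the import is deferred into the function so the snippet loads even where llama_index is not installed, and actually calling it requires a reachable Cassandra database.

```python
from typing import Any, Optional


def make_store(
    table: str,
    embedding_dimension: int,
    session: Optional[Any] = None,
    keyspace: Optional[str] = None,
):
    """Hypothetical helper: build a CassandraVectorStore from the documented parameters."""
    # Deferred import: only needed when the helper is actually called.
    from llama_index.vector_stores import CassandraVectorStore

    return CassandraVectorStore(
        table,                    # positional, as allowed by the signature
        embedding_dimension,      # positional, as allowed by the signature
        session=session,          # keyword-only; None falls back to the global cassio.init() connection
        keyspace=keyspace,        # keyword-only; None falls back to the global cassio.init() connection
        ttl_seconds=3600,         # illustrative value: entries expire after one hour
        insertion_batch_size=20,  # the documented default, spelled out
    )


# Style 1 (global connection), assuming a session object `my_session` exists:
#   import cassio
#   cassio.init(session=my_session, keyspace="my_keyspace")
#   store = make_store("vs_demo", 1536)
#
# Style 2 (explicit connection details):
#   store = make_store("vs_demo", 1536, session=my_session, keyspace="my_keyspace")
```

Either style should end up targeting the same table; the first simply avoids repeating the connection details at every construction site.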

Attributes Summary

client

Return the underlying CassIO vector table object.

flat_metadata

stores_text

Methods Summary

add(nodes, **add_kwargs)

Add nodes to index.

delete(ref_doc_id, **delete_kwargs)

Delete nodes using ref_doc_id.

query(query, **kwargs)

Query index for top k most similar nodes.

Attributes Documentation

client

Return the underlying CassIO vector table object.

flat_metadata: bool = True
stores_text: bool = True

Methods Documentation

add(nodes: List[BaseNode], **add_kwargs: Any) → List[str]

Add nodes to index.

Parameters

nodes (List[BaseNode]) – list of nodes with embeddings.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) – The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Supported query modes: ‘default’ (most similar vectors) and ‘mmr’.

Parameters

query (VectorStoreQuery) – the basic query definition. Defines:

  • mode (VectorStoreQueryMode) – one of the supported modes.

  • query_embedding (List[float]) – query embedding to search against.

  • similarity_top_k (int) – top k most similar nodes.

  • mmr_threshold (Optional[float]) – the 0-to-1 MMR lambda. If present, takes precedence over the mmr_threshold passed in kwargs. Ignored for non-MMR queries.

Args for query.mode == ‘mmr’ (ignored otherwise):

  • mmr_threshold (Optional[float]) – the 0-to-1 lambda for MMR. Note that, in principle, mmr_threshold can also come in the query.

  • mmr_prefetch_factor (Optional[float]) – factor applied to top_k to determine the prefetch pool size. Defaults to 4.0.

  • mmr_prefetch_k (Optional[int]) – prefetch pool size. This cannot be passed together with mmr_prefetch_factor.
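
The interplay between mmr_prefetch_factor and mmr_prefetch_k can be sketched in plain Python. This is an illustration of the rules stated above, not the library's actual code: the function name resolve_prefetch_k is hypothetical, and the final clamp that keeps the pool at least as large as top_k is an assumption of this sketch.

```python
from typing import Any

# Default stated in the documentation for mmr_prefetch_factor.
DEFAULT_MMR_PREFETCH_FACTOR = 4.0


def resolve_prefetch_k(similarity_top_k: int, **kwargs: Any) -> int:
    """Hypothetical helper: derive the MMR prefetch pool size from query kwargs."""
    # The two options are mutually exclusive, as documented.
    if "mmr_prefetch_factor" in kwargs and "mmr_prefetch_k" in kwargs:
        raise ValueError(
            "'mmr_prefetch_factor' and 'mmr_prefetch_k' cannot be passed together."
        )
    if "mmr_prefetch_k" in kwargs:
        # An explicit pool size is used as given.
        prefetch_k = int(kwargs["mmr_prefetch_k"])
    else:
        # Otherwise the factor (default 4.0) is applied to top_k.
        factor = kwargs.get("mmr_prefetch_factor", DEFAULT_MMR_PREFETCH_FACTOR)
        prefetch_k = int(similarity_top_k * factor)
    # Assumption of this sketch: never prefetch fewer candidates than top_k.
    return max(prefetch_k, similarity_top_k)


print(resolve_prefetch_k(5))                           # 20
print(resolve_prefetch_k(5, mmr_prefetch_factor=2.0))  # 10
print(resolve_prefetch_k(5, mmr_prefetch_k=50))        # 50
```

With a real store, the same keyword arguments would be passed alongside the query object, for example query(query, mmr_prefetch_k=50) with query.mode set to ‘mmr’.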