CassandraVectorStore

class llama_index.vector_stores.CassandraVectorStore(table: str, embedding_dimension: int, *, session: Optional[Any] = None, keyspace: Optional[str] = None, ttl_seconds: Optional[int] = None, insertion_batch_size: int = 20)

Bases: VectorStore

Cassandra Vector Store.

An abstraction of a Cassandra table with vector similarity search. Documents and their embeddings are stored in a Cassandra table, and a vector-capable index is used for searches. The table does not need to exist beforehand: if necessary, it is created automatically.

All Cassandra operations are done through the CassIO library.
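Conceptually, the store behaves like a table of rows (text plus embedding) that can be ranked by similarity to a query vector. The in-memory sketch below illustrates only that idea: ToyVectorTable is a hypothetical class, not part of llama_index or CassIO, and it uses an exact linear scan with cosine similarity, whereas the real table relies on its vector-capable index.

```python
import math
from typing import Dict, List, Tuple


def _cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorTable:
    """Hypothetical in-memory stand-in for a vector-search table."""

    def __init__(self, embedding_dimension: int) -> None:
        self.embedding_dimension = embedding_dimension
        # row id -> (text, embedding)
        self.rows: Dict[str, Tuple[str, List[float]]] = {}

    def put(self, row_id: str, text: str, vector: List[float]) -> None:
        # Reject vectors whose length does not match the declared dimension.
        if len(vector) != self.embedding_dimension:
            raise ValueError("vector has the wrong dimension")
        self.rows[row_id] = (text, vector)

    def search(self, query: List[float], top_k: int) -> List[Tuple[str, float]]:
        # Exact scan: score every row, keep the top_k most similar.
        scored = [
            (row_id, _cosine(query, vector))
            for row_id, (_text, vector) in self.rows.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


table = ToyVectorTable(embedding_dimension=2)
table.put("a", "about cats", [1.0, 0.0])
table.put("b", "about dogs", [0.0, 1.0])
table.put("c", "cats and dogs", [0.7, 0.7])
print([row_id for row_id, _score in table.search([1.0, 0.1], top_k=2)])  # ['a', 'c']
```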

Note: in recent versions, only table and embedding_dimension can be passed positionally; all other parameters are keyword-only. Please revise your code if needed. This allows a leaner usage pattern in which the DB connection is set globally through a cassio.init(…) call, after which the DB details no longer need to be specified when creating a vector store (though they still can be).
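The bare * in the class signature is what makes every parameter after embedding_dimension keyword-only. The sketch below uses a hypothetical stand-in function, make_store (not part of llama_index), with the same parameter list to show which call styles Python accepts; 1536 and the table and keyspace names are arbitrary example values.

```python
from typing import Any, Optional


def make_store(
    table: str,
    embedding_dimension: int,
    *,
    session: Optional[Any] = None,
    keyspace: Optional[str] = None,
    ttl_seconds: Optional[int] = None,
    insertion_batch_size: int = 20,
) -> dict:
    """Hypothetical stand-in mirroring the CassandraVectorStore signature."""
    # Everything after the bare `*` can only be passed by keyword.
    return {
        "table": table,
        "embedding_dimension": embedding_dimension,
        "session": session,
        "keyspace": keyspace,
        "ttl_seconds": ttl_seconds,
        "insertion_batch_size": insertion_batch_size,
    }


# Accepted: connection details omitted (a real store would use cassio.init).
store = make_store("my_table", 1536)

# Accepted: connection details given explicitly, by keyword.
store = make_store("my_table", 1536, session=None, keyspace="my_keyspace")

# Rejected: passing session and keyspace positionally raises TypeError.
try:
    make_store("my_table", 1536, None, "my_keyspace")
except TypeError as err:
    print("TypeError:", err)
```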

Parameters

table (str) – name of the table to use. If it does not exist, it will be created.
embedding_dimension (int) – length of the embedding vectors in use.
session (optional, cassandra.cluster.Session) – the Cassandra session to use. Can be omitted, or equivalently set to None, to use the DB connection set globally through cassio.init() beforehand.
keyspace (optional, str) – name of the Cassandra keyspace to work in. Can be omitted, or equivalently set to None, to use the DB connection set globally through cassio.init() beforehand.
ttl_seconds (optional, int) – expiration time, in seconds, for inserted entries. Default is no expiration (None).
insertion_batch_size (optional, int) – how many vectors are inserted concurrently, for use by bulk inserts. Defaults to 20.
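To make "how many vectors are inserted concurrently" concrete, here is a minimal sketch assuming bulk inserts are split into groups of insertion_batch_size whose writes run in parallel, with each group finishing before the next starts. This is not CassIO's or llama_index's code: insert_in_batches and fake_insert are hypothetical, and the database write is simulated with a short sleep.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def insert_in_batches(
    rows: Sequence[str],
    insert_one: Callable[[str], str],
    insertion_batch_size: int = 20,
) -> List[str]:
    """Insert rows concurrently, at most insertion_batch_size at a time."""
    results: List[str] = []
    with ThreadPoolExecutor(max_workers=insertion_batch_size) as pool:
        for start in range(0, len(rows), insertion_batch_size):
            batch = rows[start : start + insertion_batch_size]
            # map() runs the batch in parallel; extend() waits for all of it,
            # so only one batch is ever in flight.
            results.extend(pool.map(insert_one, batch))
    return results


in_flight = 0
peak = 0
lock = threading.Lock()


def fake_insert(row: str) -> str:
    # Hypothetical single-row write that records how many writes overlap.
    global in_flight, peak
    with lock:
        in_flight += 1
        peak = max(peak, in_flight)
    time.sleep(0.01)
    with lock:
        in_flight -= 1
    return row


ids = insert_in_batches([f"row-{i}" for i in range(45)], fake_insert, insertion_batch_size=20)
print(len(ids), "rows inserted; peak concurrency:", peak)
```

A larger batch size means more writes in flight at once, which can speed up bulk loads at the cost of more simultaneous load on the cluster.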

Attributes Summary

`client`	Return the underlying CassIO vector table object.
`flat_metadata`
`stores_text`

Methods Summary

`add`(nodes, **add_kwargs)	Add nodes to index.
`delete`(ref_doc_id, **delete_kwargs)	Delete nodes using ref_doc_id.
`query`(query, **kwargs)	Query index for top k most similar nodes.

Attributes Documentation

client: Return the underlying CassIO vector table object.

flat_metadata: bool = True

stores_text: bool = True
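This page lists flat_metadata without describing it. In llama_index, a True value generally indicates (our reading, not stated here) that node metadata must be flat: string keys with str, int, float or None values, and no nested dicts or lists. The hypothetical helper is_flat_metadata below sketches such a check; it is not the library's own validator.

```python
from typing import Any, Dict


def is_flat_metadata(metadata: Dict[Any, Any]) -> bool:
    """Hypothetical check: every key is a str and every value a scalar."""
    allowed = (str, int, float, type(None))
    return all(
        isinstance(key, str) and isinstance(value, allowed)
        for key, value in metadata.items()
    )


print(is_flat_metadata({"author": "Ada", "year": 1843, "score": 0.9}))  # True
print(is_flat_metadata({"tags": ["math", "history"]}))  # False
```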

Methods Documentation

add(nodes: List[BaseNode], **add_kwargs: Any) → List[str]

Add nodes to index.

Parameters: nodes (List[BaseNode]) – list of nodes with embeddings.
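The toy below sketches the contract of add as described here: every node must already carry its embedding, and a list of strings comes back, which we assume to be the ids of the stored nodes. ToyNode and ToyStore are hypothetical stand-ins, not llama_index classes (the real method takes BaseNode objects and writes to Cassandra).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyNode:
    """Hypothetical node: an id, its text, and a precomputed embedding."""
    node_id: str
    text: str
    embedding: Optional[List[float]] = None


class ToyStore:
    def __init__(self, embedding_dimension: int) -> None:
        self.embedding_dimension = embedding_dimension
        self.rows: Dict[str, ToyNode] = {}

    def add(self, nodes: List[ToyNode]) -> List[str]:
        # Validate everything first so a bad node leaves the store unchanged.
        for node in nodes:
            if node.embedding is None:
                raise ValueError(f"node {node.node_id} has no embedding")
            if len(node.embedding) != self.embedding_dimension:
                raise ValueError(f"node {node.node_id} has the wrong dimension")
        for node in nodes:
            self.rows[node.node_id] = node
        # Return the ids of the stored nodes, in input order.
        return [node.node_id for node in nodes]


store = ToyStore(embedding_dimension=3)
ids = store.add([
    ToyNode("n1", "first chunk", [0.1, 0.2, 0.3]),
    ToyNode("n2", "second chunk", [0.3, 0.2, 0.1]),
])
print(ids)  # ['n1', 'n2']
```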

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) – The doc_id of the document to delete.
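Because a source document is typically split into several nodes that all point back to it, deleting by ref_doc_id is expected to remove every such node, not a single row. A minimal in-memory sketch of that behaviour, where delete_by_ref_doc_id and the node_id-to-ref_doc_id dict are hypothetical stand-ins for the real table:

```python
from typing import Dict


def delete_by_ref_doc_id(rows: Dict[str, str], ref_doc_id: str) -> None:
    """Remove every node whose ref_doc_id matches (rows: node_id -> ref_doc_id)."""
    # Collect first, then delete, so the dict is not mutated while iterating.
    to_delete = [node_id for node_id, doc_id in rows.items() if doc_id == ref_doc_id]
    for node_id in to_delete:
        del rows[node_id]


rows = {"n1": "doc-A", "n2": "doc-A", "n3": "doc-B"}
delete_by_ref_doc_id(rows, "doc-A")
print(rows)  # {'n3': 'doc-B'}
```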

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Supported query modes: ‘default’ (most similar vectors) and ‘mmr’ (maximal marginal relevance).

Parameters

query (VectorStoreQuery) – the basic query definition. Defines:

  mode (VectorStoreQueryMode): one of the supported modes.
  query_embedding (List[float]): query embedding to search against.
  similarity_top_k (int): number of most similar nodes to return (top k).
  mmr_threshold (Optional[float]): the 0-to-1 MMR lambda. If present, it takes precedence over the kwargs parameter. Ignored except for MMR queries.

Keyword arguments for query.mode == ‘mmr’ (ignored otherwise):

mmr_threshold (Optional[float]): the 0-to-1 lambda for MMR. Note that mmr_threshold can also be set in the query itself, in which case the query's value takes precedence.
mmr_prefetch_factor (Optional[float]): factor applied to top_k to obtain the prefetch pool size. Defaults to 4.0.
mmr_prefetch_k (Optional[int]): prefetch pool size. This cannot be passed together with mmr_prefetch_factor.
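Maximal marginal relevance greedily picks results that are relevant to the query but dissimilar to results already picked; the lambda (mmr_threshold) weights relevance against diversity, with score = λ·sim(query, d) − (1 − λ)·max over picked s of sim(d, s). The sketch below follows the parameter rules listed above: a threshold on the query wins over the keyword argument, mmr_prefetch_k and mmr_prefetch_factor are mutually exclusive, and the pool defaults to top_k * 4.0. Everything else is an assumption for illustration and not the llama_index implementation: the function names, the query_threshold parameter standing in for the query object's field, cosine similarity, the 0.5 fallback lambda, never prefetching fewer than top_k, and an exact scan in place of the Cassandra index.

```python
import math
from typing import Any, Dict, List, Optional, Sequence, Tuple

DEFAULT_MMR_PREFETCH_FACTOR = 4.0  # default stated in the parameter list
FALLBACK_LAMBDA = 0.5  # assumption: this page gives no default lambda


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def resolve_mmr_params(
    top_k: int, query_threshold: Optional[float], kwargs: Dict[str, Any]
) -> Tuple[float, int]:
    """Return (lambda, prefetch pool size) for an MMR query."""
    if "mmr_prefetch_k" in kwargs and "mmr_prefetch_factor" in kwargs:
        raise ValueError("pass mmr_prefetch_k or mmr_prefetch_factor, not both")
    # A threshold set on the query wins over the keyword argument.
    lam = query_threshold if query_threshold is not None else kwargs.get("mmr_threshold")
    if lam is None:
        lam = FALLBACK_LAMBDA
    if "mmr_prefetch_k" in kwargs:
        pool_size = int(kwargs["mmr_prefetch_k"])
    else:
        factor = kwargs.get("mmr_prefetch_factor", DEFAULT_MMR_PREFETCH_FACTOR)
        pool_size = int(top_k * factor)
    # Assumption: never prefetch fewer candidates than we must return.
    return lam, max(pool_size, top_k)


def mmr_search(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    top_k: int,
    query_threshold: Optional[float] = None,
    **kwargs: Any,
) -> List[str]:
    """Greedy MMR over an exact-scan prefetch pool (illustrative only)."""
    lam, pool_size = resolve_mmr_params(top_k, query_threshold, kwargs)
    relevance = {key: cosine(query, vec) for key, vec in vectors.items()}
    # Prefetch: the pool_size candidates most similar to the query.
    pool = sorted(relevance, key=lambda key: relevance[key], reverse=True)[:pool_size]
    selected: List[str] = []
    while pool and len(selected) < top_k:
        # lam * relevance minus (1 - lam) * similarity to the closest pick so far.
        scores = {
            key: lam * relevance[key]
            - (1 - lam)
            * max((cosine(vectors[key], vectors[s]) for s in selected), default=0.0)
            for key in pool
        }
        best = max(pool, key=lambda key: scores[key])
        selected.append(best)
        pool.remove(best)
    return selected


vectors = {"a": [1.0, 0.0], "a2": [0.99, 0.1], "b": [0.6, 0.8]}
print(mmr_search([1.0, 0.0], vectors, top_k=2, query_threshold=1.0))  # ['a', 'a2']
print(mmr_search([1.0, 0.0], vectors, top_k=2, query_threshold=0.3))  # ['a', 'b']
```

With λ = 1 the ranking is purely by relevance and the near-duplicate "a2" is kept; lowering λ to 0.3 trades it for the more diverse "b". Shrinking the pool (for example mmr_prefetch_k=2) can exclude diverse candidates before MMR runs at all, which is presumably why the default pool is several times larger than top_k.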