Customizing Storage

By default, LlamaIndex hides away the complexities and lets you query your data in under 5 lines of code:

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = GPTVectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Summarize the documents.")

Under the hood, LlamaIndex also supports a swappable storage layer that allows you to customize where ingested documents (i.e., Node objects), embedding vectors, and index metadata are stored.
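
For instance, with the default stores you can persist everything to disk and reload it later (a minimal sketch; "./storage" is an assumed directory):

from llama_index import StorageContext, load_index_from_storage

# persist the index's storage context to disk
index.storage_context.persist(persist_dir="./storage")

# later, rebuild the storage context from disk and reload the index
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)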

Low-Level API

To do this, instead of the high-level API,

index = GPTVectorStoreIndex.from_documents(documents)

we use a lower-level API that gives more granular control:

from llama_index import GPTVectorStoreIndex, StorageContext
from llama_index.storage.docstore import SimpleDocumentStore
from llama_index.storage.index_store import SimpleIndexStore
from llama_index.vector_stores import SimpleVectorStore
from llama_index.node_parser import SimpleNodeParser

# create parser and parse documents into nodes
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)

# create storage context
storage_context = StorageContext.from_defaults(
    docstore=SimpleDocumentStore(),
    vector_store=SimpleVectorStore(),
    index_store=SimpleIndexStore(),
)

# create (or load) docstore and add nodes
storage_context.docstore.add_documents(nodes)

# build index
index = GPTVectorStoreIndex(nodes, storage_context=storage_context)
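
The resulting index behaves exactly like one built with the high-level API; for example:

query_engine = index.as_query_engine()
response = query_engine.query("Summarize the documents.")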

You can customize the underlying storage with a one-line change, instantiating different document stores, index stores, and vector stores. See the Document Stores, Vector Stores, and Index Stores guides for more details.
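
For example, to keep ingested documents and index metadata in MongoDB rather than in memory, you can swap in the Mongo-backed stores (a minimal sketch, assuming a reachable MongoDB instance; the connection URI below is a placeholder):

from llama_index import GPTVectorStoreIndex, StorageContext
from llama_index.storage.docstore import MongoDocumentStore
from llama_index.storage.index_store import MongoIndexStore

# placeholder connection string; point this at your own MongoDB deployment
MONGO_URI = "mongodb://localhost:27017"

# swap the in-memory defaults for Mongo-backed stores
storage_context = StorageContext.from_defaults(
    docstore=MongoDocumentStore.from_uri(uri=MONGO_URI),
    index_store=MongoIndexStore.from_uri(uri=MONGO_URI),
)

index = GPTVectorStoreIndex(nodes, storage_context=storage_context)

Because these stores write through to MongoDB as data is ingested, no separate persist step is needed.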