LlamaIndex supports composability of your indices, meaning that you can build indices on top of other indices. This lets you index your entire document hierarchy more effectively in order to feed custom knowledge to GPT.

Composability allows you to define lower-level indices for each document, and higher-order indices over a collection of documents. To see how this works, imagine defining 1) a tree index for the text within each document, and 2) a list index over each tree index (one per document) within your collection.

Defining Subindices

To see how this works, imagine you have 3 documents: doc1, doc2, and doc3.

from llama_index import GPTTreeIndex, SimpleDirectoryReader

doc1 = SimpleDirectoryReader('data1').load_data()
doc2 = SimpleDirectoryReader('data2').load_data()
doc3 = SimpleDirectoryReader('data3').load_data()

Now let’s define a tree index for each document. In Python, we have:

index1 = GPTTreeIndex.from_documents(doc1)
index2 = GPTTreeIndex.from_documents(doc2)
index3 = GPTTreeIndex.from_documents(doc3)

Defining Summary Text

You then need to explicitly define summary text for each subindex. This allows
the subindices to be used as Documents for higher-level indices.

index1_summary = "<summary1>"
index2_summary = "<summary2>"
index3_summary = "<summary3>"

You may choose to manually specify the summary text, or use LlamaIndex itself to generate a summary, for instance with the following:

summary = index1.query(
    "What is a summary of this document?", retriever_mode="all_leaf"
)
index1_summary = str(summary)

If specified, this summary text for each subindex can be used to refine the answer during query-time.
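To build intuition for why summaries matter at query time, here is a minimal pure-Python sketch (this is deliberately *not* LlamaIndex's actual retrieval logic) of the core idea: at the higher level, the summary text stands in for each subindex, so the query can be scored against summaries to decide which subindices are relevant. The summary strings and index ids below are made up for illustration.

```python
import re

def score(query: str, summary: str) -> int:
    """Toy relevance: count distinct query words that also appear in the summary."""
    query_words = set(re.findall(r"\w+", query.lower()))
    summary_words = set(re.findall(r"\w+", summary.lower()))
    return len(query_words & summary_words)

def route(query: str, summaries: dict) -> str:
    """Return the id of the subindex whose summary best matches the query."""
    return max(summaries, key=lambda idx: score(query, summaries[idx]))

# Hypothetical summaries for three subindices
summaries = {
    "index1": "An essay about growing up and learning to program.",
    "index2": "Financial reports for fiscal year 2022.",
    "index3": "A travel guide to New York City.",
}
best = route("Where did the author grow up?", summaries)
print(best)  # index1
```

The real library synthesizes answers rather than doing bag-of-words matching, but the role the summaries play — representing a whole subindex in a higher-level decision — is the same.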

Creating a Graph with a Top-Level Index

We can then create a graph with a list index on top of these 3 tree indices. We can query, save, and load the graph to/from disk like any other index.

from llama_index import GPTListIndex
from llama_index.indices.composability import ComposableGraph

graph = ComposableGraph.from_indices(
    GPTListIndex,
    [index1, index2, index3],
    index_summaries=[index1_summary, index2_summary, index3_summary],
)

Querying the Graph

During a query, we start with the top-level list index. Each node in the list corresponds to an underlying tree index. The query is executed recursively, starting from the root index, then the sub-indices. The default query engine for each index is called under the hood (i.e. index.as_query_engine()), unless otherwise configured by passing custom_query_engines to the ComposableGraphQueryEngine. Below we show an example that configures the tree index retrievers to use child_branch_factor=2 (instead of the default child_branch_factor=1).

More detail on how to configure ComposableGraphQueryEngine can be found here.

# set custom query engines. An example is provided below
custom_query_engines = {
    index.index_id: index.as_query_engine(child_branch_factor=2)
    for index in [index1, index2, index3]
}
query_engine = graph.as_query_engine(custom_query_engines=custom_query_engines)
response = query_engine.query("Where did the author grow up?")

Note that specifying a custom query engine for an index by id might require you to inspect the id, e.g. index1.index_struct.index_id. Alternatively, you can explicitly set the ids as follows:

index1.index_struct.index_id = "<index_id_1>"
index2.index_struct.index_id = "<index_id_2>"
index3.index_struct.index_id = "<index_id_3>"

So within a node, instead of fetching the text, we would recursively query the stored tree index to retrieve our answer.
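The recursive step above can be sketched in plain Python (again, a conceptual toy, not the library's implementation): a node in the top-level index stores a subindex rather than raw text, so answering a query descends into each subindex instead of reading stored text. The class names and documents below are invented for illustration.

```python
class LeafIndex:
    """Bottom-level index over raw text chunks."""
    def __init__(self, chunks):
        self.chunks = chunks

    def query(self, q):
        # Toy retrieval: return chunks sharing at least one word with the query.
        q_words = set(q.lower().split())
        return [c for c in self.chunks if q_words & set(c.lower().split())]

class ComposedIndex:
    """Higher-level index whose nodes each wrap a subindex."""
    def __init__(self, subindices):
        self.subindices = subindices

    def query(self, q):
        # Instead of fetching stored text from a node, recursively query
        # the subindex it wraps and combine the results.
        results = []
        for sub in self.subindices:
            results.extend(sub.query(q))
        return results

doc1 = LeafIndex(["the author grew up in england", "later moved abroad"])
doc2 = LeafIndex(["quarterly revenue rose", "costs fell sharply"])
top = ComposedIndex([doc1, doc2])
print(top.query("where did the author grow up"))
```

Because ComposedIndex.query only ever calls sub.query, a ComposedIndex can itself be a subindex of another ComposedIndex — which is, conceptually, what lets you stack indices to arbitrary depth.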

NOTE: You can stack indices as many times as you want, depending on the hierarchies of your knowledge base!

We can take a look at a code example below as well. We first build two tree indices, one over the Wikipedia NYC page, and the other over Paul Graham’s essay. We then define a keyword extractor index over the two tree indices.

Here is an example notebook.