Tree Index

Building the Tree Index

Tree-structured Index Data Structures.

llama_index.indices.tree.GPTTreeIndex

alias of TreeIndex

class llama_index.indices.tree.TreeAllLeafRetriever(index: TreeIndex)

GPT all leaf retriever.

This class builds a query-specific tree from leaf nodes to return a response. Using this query mode means the tree index does not need to be built at initialization, since a tree is rebuilt for each query.

Parameters

text_qa_template (Optional[QuestionAnswerPrompt]) – Question-Answer Prompt (see Prompt Templates).

retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]

Retrieve nodes given query.

Parameters

str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
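
A minimal usage sketch for the all-leaf mode, assuming the retriever is obtained via TreeIndex.as_retriever with retriever_mode="all_leaf" and that SimpleDirectoryReader is used for loading; the mode string, the reader, and the ./data path are assumptions taken from the wider library, not defined in this reference:

    from llama_index import SimpleDirectoryReader
    from llama_index.indices.tree import TreeIndex

    # Build the index without constructing the tree up front; the all-leaf
    # retriever rebuilds a query-specific tree for each query anyway.
    documents = SimpleDirectoryReader("./data").load_data()
    index = TreeIndex.from_documents(documents, build_tree=False)

    retriever = index.as_retriever(retriever_mode="all_leaf")
    nodes = retriever.retrieve("What does the corpus say about indexing?")
    for node_with_score in nodes:
        print(node_with_score.node.get_text()[:80])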

class llama_index.indices.tree.TreeIndex(nodes: Optional[Sequence[Node]] = None, index_struct: Optional[IndexGraph] = None, service_context: Optional[ServiceContext] = None, summary_template: Optional[Prompt] = None, insert_prompt: Optional[Prompt] = None, num_children: int = 10, build_tree: bool = True, use_async: bool = False, **kwargs: Any)

Tree Index.

The tree index is a tree-structured index, where each node is a summary of its children nodes. During index construction, the tree is built in a bottom-up fashion until we end up with a set of root nodes.

There are a few different options at query time (see Querying an Index). The main option is to traverse down the tree from the root nodes. A secondary option is to synthesize an answer directly from the root nodes.

Parameters
  • summary_template (Optional[SummaryPrompt]) – A Summarization Prompt (see Prompt Templates).

  • insert_prompt (Optional[TreeInsertPrompt]) – A Tree Insertion Prompt (see Prompt Templates).

  • num_children (int) – The number of children each node should have.

  • build_tree (bool) – Whether to build the tree during index construction.
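
A construction sketch, assuming from_documents forwards keyword arguments such as num_children and build_tree to the constructor; the reader and the ./data directory are illustrative assumptions:

    from llama_index import SimpleDirectoryReader
    from llama_index.indices.tree import TreeIndex

    documents = SimpleDirectoryReader("./data").load_data()

    # Build the tree bottom-up, summarizing up to 10 children per parent node.
    index = TreeIndex.from_documents(
        documents,
        num_children=10,
        build_tree=True,
    )

    # The main query-time option traverses the tree from the root nodes.
    query_engine = index.as_query_engine()
    print(query_engine.query("Summarize the collection."))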

delete(doc_id: str, **delete_kwargs: Any) None

Delete a document from the index. All nodes in the index related to the document will be deleted.

Parameters

doc_id (str) – A doc_id of the ingested document

delete_nodes(doc_ids: List[str], delete_from_docstore: bool = False, **delete_kwargs: Any) None

Delete a list of nodes from the index.

Parameters

doc_ids (List[str]) – A list of doc_ids from the nodes to delete

delete_ref_doc(ref_doc_id: str, delete_from_docstore: bool = False, **delete_kwargs: Any) None

Delete a document and its nodes by using ref_doc_id.
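
A deletion sketch, assuming Document accepts text plus a stable doc_id so the document can later be addressed by ref_doc_id; the IDs and text are illustrative:

    from llama_index import Document
    from llama_index.indices.tree import TreeIndex

    docs = [
        Document("Alpha section text.", doc_id="doc-alpha"),
        Document("Beta section text.", doc_id="doc-beta"),
    ]
    index = TreeIndex.from_documents(docs)

    # Remove the document and its nodes; optionally purge it from the docstore too.
    index.delete_ref_doc("doc-alpha", delete_from_docstore=True)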

property docstore: BaseDocumentStore

Get the docstore corresponding to the index.

classmethod from_documents(documents: Sequence[Document], storage_context: Optional[StorageContext] = None, service_context: Optional[ServiceContext] = None, **kwargs: Any) IndexType

Create index from documents.

Parameters

documents (Optional[Sequence[BaseDocument]]) – List of documents to build the index from.

property index_id: str

Get the index ID.

property index_struct: IS

Get the index struct.

index_struct_cls

alias of IndexGraph

insert(document: Document, **insert_kwargs: Any) None

Insert a document.
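
An insertion sketch, assuming Document accepts text and a doc_id; the contents and IDs are illustrative:

    from llama_index import Document
    from llama_index.indices.tree import TreeIndex

    # Start from a small index and insert an additional document into it;
    # the existing tree is updated rather than rebuilt from scratch.
    index = TreeIndex.from_documents([Document("Initial text.", doc_id="doc-1")])
    index.insert(Document("Newly ingested text.", doc_id="doc-2"))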

property ref_doc_info: Dict[str, RefDocInfo]

Retrieve a dict mapping of ingested documents and their nodes+metadata.

refresh(documents: Sequence[Document], **update_kwargs: Any) List[bool]

Refresh an index with documents that have changed.

This allows users to save LLM and Embedding model calls, while only updating documents that have any changes in text or extra_info. It will also insert any documents that previously were not stored.

refresh_ref_docs(documents: Sequence[Document], **update_kwargs: Any) List[bool]

Refresh an index with documents that have changed.

This allows users to save LLM and Embedding model calls, while only updating documents that have any changes in text or extra_info. It will also insert any documents that previously were not stored.
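
A refresh sketch, assuming stable doc_ids are used so changed documents can be matched against what is already ingested; the IDs and text are illustrative, and the returned booleans are taken to indicate which input documents triggered an insert or update:

    from llama_index import Document
    from llama_index.indices.tree import TreeIndex

    docs = [
        Document("Chapter one, first draft.", doc_id="ch-1"),
        Document("Chapter two, first draft.", doc_id="ch-2"),
    ]
    index = TreeIndex.from_documents(docs)

    # "ch-1" changed, "ch-2" is unchanged (skipped, saving LLM/embedding calls),
    # and "ch-3" is new and gets inserted.
    refreshed = index.refresh_ref_docs(
        [
            Document("Chapter one, revised.", doc_id="ch-1"),
            Document("Chapter two, first draft.", doc_id="ch-2"),
            Document("Chapter three, new.", doc_id="ch-3"),
        ]
    )
    print(refreshed)  # one bool per input document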

set_index_id(index_id: str) None

Set the index id.

NOTE: if you decide to set the index_id on the index_struct manually, you will need to explicitly call add_index_struct on the index_store to update the index store.

Parameters

index_id (str) – Index id to set.
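
A sketch of setting the index id before persisting, assuming the surrounding storage API (StorageContext, storage_context.persist, load_index_from_storage) behaves as in the rest of the library; those pieces and the ./storage path are assumptions, not part of this reference:

    from llama_index import Document, StorageContext, load_index_from_storage
    from llama_index.indices.tree import TreeIndex

    index = TreeIndex.from_documents([Document("Some text.", doc_id="doc-1")])

    # Give the index an explicit id so it can be addressed in a shared index store.
    index.set_index_id("my_tree_index")
    index.storage_context.persist(persist_dir="./storage")

    # Later: reload the same index by id from the persisted storage.
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    loaded = load_index_from_storage(storage_context, index_id="my_tree_index")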

update(document: Document, **update_kwargs: Any) None

Update a document and its corresponding nodes.

This is equivalent to deleting the document and then inserting it again.

Parameters
  • document (Union[BaseDocument, BaseIndex]) – document to update

  • insert_kwargs (Dict) – kwargs to pass to insert

  • delete_kwargs (Dict) – kwargs to pass to delete

update_ref_doc(document: Document, **update_kwargs: Any) None

Update a document and its corresponding nodes.

This is equivalent to deleting the document and then inserting it again.

Parameters
  • document (Union[BaseDocument, BaseIndex]) – document to update

  • insert_kwargs (Dict) – kwargs to pass to insert

  • delete_kwargs (Dict) – kwargs to pass to delete
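
An update sketch, assuming the revised document keeps the same doc_id so the old nodes can be located, deleted, and replaced; the text and IDs are illustrative:

    from llama_index import Document
    from llama_index.indices.tree import TreeIndex

    index = TreeIndex.from_documents([Document("Original text.", doc_id="doc-1")])

    # Same doc_id, new content: the old nodes are deleted and the document is
    # re-inserted, as described above.
    index.update_ref_doc(Document("Revised text.", doc_id="doc-1"))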

class llama_index.indices.tree.TreeRootRetriever(index: TreeIndex)

Tree root retriever.

This class directly retrieves the answer from the root nodes.

Unlike GPTTreeIndexLeafQuery, this class assumes the graph already stores the answer (because it was constructed with a query_str), so it does not traverse down the graph to synthesize an answer.

retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]

Retrieve nodes given query.

Parameters

str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
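
A root-retrieval sketch, assuming the retriever is obtained via as_retriever with retriever_mode="root"; the mode string, reader, and data path are assumptions:

    from llama_index import SimpleDirectoryReader
    from llama_index.indices.tree import TreeIndex

    documents = SimpleDirectoryReader("./data").load_data()
    index = TreeIndex.from_documents(documents)

    # Retrieve directly from the root summary nodes instead of traversing down.
    retriever = index.as_retriever(retriever_mode="root")
    root_nodes = retriever.retrieve("Give a high-level summary.")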

class llama_index.indices.tree.TreeSelectLeafEmbeddingRetriever(index: TreeIndex, query_template: Optional[Prompt] = None, text_qa_template: Optional[Prompt] = None, refine_template: Optional[Prompt] = None, query_template_multiple: Optional[Prompt] = None, child_branch_factor: int = 1, verbose: bool = False, **kwargs: Any)

Tree select leaf embedding retriever.

This class traverses the index graph using the embedding similarity between the query and the node text.

Parameters
  • query_template (Optional[TreeSelectPrompt]) – Tree Select Query Prompt (see Prompt Templates).

  • query_template_multiple (Optional[TreeSelectMultiplePrompt]) – Tree Select Query Prompt (Multiple) (see Prompt Templates).

  • text_qa_template (Optional[QuestionAnswerPrompt]) – Question-Answer Prompt (see Prompt Templates).

  • refine_template (Optional[RefinePrompt]) – Refinement Prompt (see Prompt Templates).

  • child_branch_factor (int) – Number of child nodes to consider at each level. If child_branch_factor is 1, then the query will only choose one child node to traverse for any given parent node. If child_branch_factor is 2, then the query will choose two child nodes.

  • embed_model (Optional[BaseEmbedding]) – Embedding model to use for embedding similarity.

retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]

Retrieve nodes given query.

Parameters

str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
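
An embedding-based traversal sketch, assuming retriever_mode="select_leaf_embedding" and that extra keyword arguments to as_retriever (here child_branch_factor) are forwarded to the retriever; both are assumptions about the wider library:

    from llama_index import SimpleDirectoryReader
    from llama_index.indices.tree import TreeIndex

    documents = SimpleDirectoryReader("./data").load_data()
    index = TreeIndex.from_documents(documents)

    # Traverse by embedding similarity, keeping the top 2 children at each level.
    retriever = index.as_retriever(
        retriever_mode="select_leaf_embedding",
        child_branch_factor=2,
    )
    nodes = retriever.retrieve("Which section discusses prompt templates?")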

class llama_index.indices.tree.TreeSelectLeafRetriever(index: TreeIndex, query_template: Optional[Prompt] = None, text_qa_template: Optional[Prompt] = None, refine_template: Optional[Prompt] = None, query_template_multiple: Optional[Prompt] = None, child_branch_factor: int = 1, verbose: bool = False, **kwargs: Any)

Tree select leaf retriever.

This class traverses the index graph and searches for a leaf node that can best answer the query.

Parameters
  • query_template (Optional[TreeSelectPrompt]) – Tree Select Query Prompt (see Prompt Templates).

  • query_template_multiple (Optional[TreeSelectMultiplePrompt]) – Tree Select Query Prompt (Multiple) (see Prompt Templates).

  • child_branch_factor (int) – Number of child nodes to consider at each level. If child_branch_factor is 1, then the query will only choose one child node to traverse for any given parent node. If child_branch_factor is 2, then the query will choose two child nodes.

retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]

Retrieve nodes given query.

Parameters

str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
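
An LLM-guided traversal sketch, assuming retriever_mode="select_leaf"; the mode string, reader, and data path are assumptions:

    from llama_index import SimpleDirectoryReader
    from llama_index.indices.tree import TreeIndex

    documents = SimpleDirectoryReader("./data").load_data()
    index = TreeIndex.from_documents(documents)

    # The LLM picks which child node to descend into at each level of the tree.
    retriever = index.as_retriever(retriever_mode="select_leaf")
    nodes = retriever.retrieve("Which leaf best answers questions about insertion?")
    for node_with_score in nodes:
        print(node_with_score.score, node_with_score.node.get_text()[:60])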