By default, when a query is executed on an index or a composed graph, LlamaIndex performs the following steps:
Retrieval step: Retrieve a set of nodes from the index given the query.
Synthesis step: Synthesize a response over the set of nodes.
Beyond standard retrieval and synthesis, LlamaIndex also provides a collection of modules for advanced second-stage processing (i.e. after retrieval and before synthesis).
After retrieving the initial candidate nodes, these modules further improve the quality and diversity of the nodes used for synthesis by e.g. filtering, re-ranking, or augmenting. Examples include keyword filters, LLM-based re-ranking, and temporal-reasoning based augmentation.
We first provide the high-level API interface, and provide some example modules, and finally discuss usage.
We are also very open to contributions! Take a look at our contribution guide if you are interested in contributing a Postprocessor.
The base class is
BaseNodePostprocessor, and the API interface is very simple:
class BaseNodePostprocessor: """Node postprocessor.""" @abstractmethod def postprocess_nodes( self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] ) -> List[NodeWithScore]: """Postprocess nodes."""
It takes in a list of Node objects, and outputs another list of Node objects.
The full API reference can be found here.
The postprocessor can be used as part of a
ResponseSynthesizer in a
QueryEngine, or on its own.
from llama_index.indices.postprocessor import ( FixedRecencyPostprocessor, ) node_postprocessor = FixedRecencyPostprocessor(service_context=service_context) query_engine = index.as_query_engine( similarity_top_k=3, node_postprocessors=[node_postprocessor] ) response = query_engine.query( "How much did the author raise in seed funding from Idelle's husband (Julian) for Viaweb?", )
Using as Independent Module (Lower-Level Usage)
The module can also be used on its own as part of a broader flow. For instance, here’s an example where you choose to manually postprocess an initial set of source nodes.
from llama_index.indices.postprocessor import ( FixedRecencyPostprocessor, ) # get initial response from vector index query_engine = index.as_query_engine( similarity_top_k=3, response_mode="no_text" ) init_response = query_engine.query(query_str) resp_nodes = [n.node for n in init_response.source_nodes] # use node postprocessor to filter nodes node_postprocessor = FixedRecencyPostprocessor(service_context=service_context) new_nodes = node_postprocessor.postprocess_nodes(resp_nodes) # use list index to synthesize answers list_index = ListIndex(new_nodes) query_engine = list_index.as_query_engine( node_postprocessors=[node_postprocessor] ) response = query_engine.query(query_str)
These postprocessors are simple modules that are already included by default.
A simple postprocessor module where you are able to specify
This will filter out nodes that don’t have required keywords, or contain excluded keywords.
A module where you are able to specify a
These postprocessors are able to exploit temporal relationships between nodes (e.g. prev/next relationships) in order to retrieve additional context, in the event that the existing context may not directly answer the question. They augment the set of retrieved nodes with context either in the future or the past (or both).
The most basic version is
PrevNextNodePostprocessor, which takes a fixed
num_nodes as well as
mode specifying “previous”, “next”, or “both”.
We also have
AutoPrevNextNodePostprocessor, which is able to infer
These postprocessors are able to ensure that only the most recent data is used as context, and that out of date context information is filtered out.
Imagine that you have three versions of a document, with slight changes between versions. For instance, this document may be describing patient history. If you ask a question over this data, you would want to make sure that you’re referencing the latest document, and that out of date information is not passed in.
We support recency filtering through the following modules.
FixedRecencyPostProcessor: sorts retrieved nodes by date in reverse order, and takes a fixed top-k set of nodes.
EmbeddingRecencyPostprocessor: sorts retrieved nodes by date in reverse order, and then
looks at subsequent nodes and filters out nodes that have high embedding
similarity with the current node. This allows us to maintain recent Nodes
that have “distinct” context, but filter out overlapping Nodes that
are outdated and overlap with more recent context.
TimeWeightedPostprocessor: adds time-weighting to retrieved nodes, using the formula
(1-time_decay) ** hours_passed.
The recency score is added to any score that the node already contains.