Forward/Backward Augmentation

This notebook shows how Node relationships (the previous/next links between chunks) can be used to pull extra context into a query, using Paul Graham's essay as the example.

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.indices.postprocessor import (
    PrevNextNodePostprocessor,
    AutoPrevNextNodePostprocessor,
)
from llama_index.node_parser import SimpleNodeParser
from llama_index.storage.docstore import SimpleDocumentStore

Parse Documents into Nodes, add to Docstore

from llama_index.storage.storage_context import StorageContext

# load documents
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()

# define service context (wrapper container around current classes)
service_context = ServiceContext.from_defaults(chunk_size=512)

# use node parser in service context to parse into nodes
nodes = service_context.node_parser.get_nodes_from_documents(documents)

# add to docstore
docstore = SimpleDocumentStore()
docstore.add_documents(nodes)

storage_context = StorageContext.from_defaults(docstore=docstore)
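
To make the idea of Node relationships concrete, here is a minimal, library-free sketch of what the parsing step conceptually produces: chunks that each remember the id of the chunk before and after them, stored in a lookup keyed by id. The names `ToyNode`, `split_into_linked_nodes`, and `build_toy_docstore` are hypothetical and only illustrate the data shape; they are not the llama_index classes used above, and the real parser splits by tokens rather than by whitespace-separated words.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for a parsed chunk (not the library's Node class)."""
    node_id: str
    text: str
    prev_id: Optional[str] = None
    next_id: Optional[str] = None


def split_into_linked_nodes(text: str, chunk_size: int) -> List[ToyNode]:
    """Split text into chunks of `chunk_size` words and link neighbouring chunks."""
    words = text.split()
    chunks = [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
    linked = [ToyNode(node_id=f"node-{i}", text=c) for i, c in enumerate(chunks)]
    for i, node in enumerate(linked):
        if i > 0:
            node.prev_id = linked[i - 1].node_id
        if i < len(linked) - 1:
            node.next_id = linked[i + 1].node_id
    return linked


def build_toy_docstore(linked: List[ToyNode]) -> Dict[str, ToyNode]:
    """Index nodes by id so that neighbours can be looked up later."""
    return {node.node_id: node for node in linked}


toy_nodes = split_into_linked_nodes("one two three four five six seven", chunk_size=3)
toy_docstore = build_toy_docstore(toy_nodes)
```

Keeping every node in a docstore, rather than only in the vector index, is what allows a postprocessor to follow these links at query time.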

Build Index

# build index 
index = GPTVectorStoreIndex(nodes, storage_context=storage_context)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 27212 tokens

Add PrevNext Node Postprocessor

node_postprocessor = PrevNextNodePostprocessor(docstore=docstore, num_nodes=4)
query_engine = index.as_query_engine(
    similarity_top_k=1,
    node_postprocessors=[node_postprocessor],
    response_mode="tree_summarize"
)
response = query_engine.query(
    "What did the author do after handing off Y Combinator to Sam Altman?", 
)
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 2522 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 17 tokens
print(response) 
After handing off Y Combinator to Sam Altman, the author decided to take up painting. He spent most of the rest of 2014 painting and got to be better than he had been before. However, in November he ran out of steam and stopped working on a painting. He then started writing essays again and wrote a bunch of new ones over the next few months. In March 2015, he started working on Lisp again and wrote a new Lisp, called Bel, in Arc. He had to ban himself from writing essays during most of this time in order to finish the project, which took 4 years from March 26, 2015 to October 12, 2019.
# Try querying index without node postprocessor
query_engine = index.as_query_engine(
    similarity_top_k=1,
    response_mode="tree_summarize"
)
response = query_engine.query(
    "What did the author do after handing off Y Combinator to Sam Altman?", 
)
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 583 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 17 tokens
print(response)
After handing off Y Combinator to Sam Altman, the author focused on helping his mother get out of the nursing home and back to her house. He flew up to Oregon to visit her regularly and used the time to think. He also spent time with his sister to help his mother.
# Try querying index without node postprocessor and higher top-k
query_engine = index.as_query_engine(
    similarity_top_k=3,
    response_mode="tree_summarize"
)
response = query_engine.query(
    "What did the author do after handing off Y Combinator to Sam Altman?", 
)
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 1547 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 17 tokens
print(response)
After handing off Y Combinator to Sam Altman, the author decided to take up painting as his next activity. He spent most of the rest of 2014 painting and eventually ran out of steam in November. He then stopped working on the painting and began to think about what he should do next.
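
The three runs above differ only in how much surrounding context reaches the LLM. As a rough illustration of what a prev/next postprocessor does with a single retrieved node, the sketch below walks the neighbour links of a toy docstore and returns the expanded set of ids in document order. Everything here (`make_chain`, `expand_with_neighbors`, the dict layout, and the exact handling of `num_nodes` and `mode`) is an assumption made for illustration and is not taken from the llama_index implementation.

```python
from typing import Dict, List


def make_chain(texts: List[str]) -> Dict[str, dict]:
    """Build a toy docstore: node id -> text plus prev/next ids (hypothetical layout)."""
    store: Dict[str, dict] = {}
    for i, text in enumerate(texts):
        store[f"n{i}"] = {
            "text": text,
            "prev": f"n{i - 1}" if i > 0 else None,
            "next": f"n{i + 1}" if i < len(texts) - 1 else None,
        }
    return store


def _walk(store: Dict[str, dict], start_id: str, link: str, limit: int) -> List[str]:
    """Follow `link` ("prev" or "next") from start_id, collecting at most `limit` ids."""
    found: List[str] = []
    current = store[start_id][link]
    while current is not None and len(found) < limit:
        found.append(current)
        current = store[current][link]
    return found


def expand_with_neighbors(
    store: Dict[str, dict], start_id: str, num_nodes: int, mode: str = "next"
) -> List[str]:
    """Return the start node plus up to `num_nodes` neighbours per direction, in document order."""
    if mode not in ("next", "previous", "both"):
        raise ValueError(f"unsupported mode: {mode}")
    before: List[str] = []
    after: List[str] = []
    if mode in ("previous", "both"):
        # Walking backwards yields nearest-first, so reverse to restore document order.
        before = list(reversed(_walk(store, start_id, "prev", num_nodes)))
    if mode in ("next", "both"):
        after = _walk(store, start_id, "next", num_nodes)
    return before + [start_id] + after


chain = make_chain(["a", "b", "c", "d", "e", "f"])
forward = expand_with_neighbors(chain, "n2", num_nodes=2, mode="next")
backward = expand_with_neighbors(chain, "n2", num_nodes=2, mode="previous")
clipped = expand_with_neighbors(chain, "n4", num_nodes=4, mode="next")
```

In this toy version the walk simply stops at the end of the chain, so asking for more neighbours than exist returns fewer nodes instead of failing.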

Add Auto Prev/Next Node Postprocessor

node_postprocessor = AutoPrevNextNodePostprocessor(
    docstore=docstore, 
    num_nodes=3,
    service_context=service_context,
    verbose=True
)
# Infer that we need to search nodes after current one
query_engine = index.as_query_engine(
    similarity_top_k=1,
    node_postprocessors=[node_postprocessor],
    response_mode="tree_summarize"
)
response = query_engine.query(
    "What did the author do after handing off Y Combinator to Sam Altman?", 
)
> Postprocessor Predicted mode: next
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 1987 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 17 tokens
print(response)
After handing off Y Combinator to Sam Altman, the author decided to take up painting. He spent most of the rest of 2014 painting and got better at it, but eventually ran out of steam and stopped working on it. He then started writing essays again and wrote a few that weren't about startups. In March 2015, he started working on Lisp again.
# Infer that we don't need to search previous or next
response = query_engine.query(
    "What did the author do during his time at Y Combinator?", 
)
> Postprocessor Predicted mode: none
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 571 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 14 tokens
print(response)
The author did a variety of things during his time at Y Combinator, including hacking, writing essays, investing in startups, creating a batch system for startup funding, building a tight alumni community, and working on a new version of Arc with Robert Morris. He also created Hacker News, an online news aggregator for startup founders.
# Infer that we need to search nodes before current one
response = query_engine.query(
    "What did the author do before handing off Y Combinator to Sam Altman?", 
)
> Postprocessor Predicted mode: previous
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 2057 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 17 tokens
print(response)
Before handing off Y Combinator to Sam Altman, the author wrote essays, worked on Y Combinator, wrote Hacker News in Arc, wrote all of Y Combinator's internal software in Arc, and worked hard at the parts of the job he didn't like. He also flew up to Oregon to visit his mother regularly and had time to think on those flights.
response = query_engine.query(
    "What did the author do before handing off Y Combinator to Sam Altman?", 
)
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 575 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 17 tokens
print(response)
The author spent the rest of 2013 gradually leaving the running of Y Combinator to Sam Altman, partly so he could learn the job and partly because the author was focused on visiting his mother in Oregon and helping her get out of a nursing home.
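
The "Predicted mode" lines in the output above show the auto postprocessor choosing between `next`, `previous`, and `none` per query; since it is given the `service_context`, that choice appears to be made by the LLM. The sketch below is not that mechanism. It swaps in a plain keyword rule (a hypothetical `predict_mode_heuristic` function) just to show the shape of the decision: map a query to a direction, then expand in that direction or leave the retrieved node alone.

```python
import re

# Hypothetical keyword lists; a real system would ask an LLM instead.
FORWARD_WORDS = {"after", "afterwards", "later", "next"}
BACKWARD_WORDS = {"before", "previously", "earlier", "prior"}


def predict_mode_heuristic(query: str) -> str:
    """Guess whether a query needs context after, before, or only at the retrieved node."""
    words = set(re.findall(r"[a-z']+", query.lower()))
    if words & FORWARD_WORDS:
        return "next"
    if words & BACKWARD_WORDS:
        return "previous"
    return "none"


queries = [
    "What did the author do after handing off Y Combinator to Sam Altman?",
    "What did the author do during his time at Y Combinator?",
    "What did the author do before handing off Y Combinator to Sam Altman?",
]
predicted = [predict_mode_heuristic(q) for q in queries]
```

A keyword rule like this is brittle (for example, a query containing both "after" and "before" resolves to `next` here), which is one reason to delegate the decision to a model.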