Colbert Rerank

If you’re opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

ColBERT is a fast and accurate retrieval model that enables scalable BERT-based search over large text collections in tens of milliseconds.
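
ColBERT scores a query against a passage by late interaction: each query token embedding is matched with its most similar passage token embedding, and those per-token maxima are aggregated. The sketch below illustrates that idea in plain Python. The function names `cosine` and `maxsim_score` are hypothetical, and the 2-d vectors are hand-written toy values, not model output. It averages the maxima to keep the score bounded; the original ColBERT formulation sums them, and the reranker package may differ in detail.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def maxsim_score(query_vecs, doc_vecs):
    """Average, over query tokens, of the best cosine match among document tokens."""
    best_per_query_token = [
        max(cosine(q, d) for d in doc_vecs) for q in query_vecs
    ]
    return sum(best_per_query_token) / len(best_per_query_token)


# Toy 2-d "token embeddings" (made up for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has a close match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.1]]              # has a close match for the first token only

print(maxsim_score(query, doc_a) > maxsim_score(query, doc_b))
```

A passage scores well only if every query token finds a good match somewhere in it, which is what distinguishes this from comparing a single pooled vector per text.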

This example shows how to use the ColBERTv2 model as a reranker.

!pip install llama-index
!pip install llama-index-core
!pip install --quiet transformers torch
!pip install llama-index-embeddings-openai
!pip install llama-index-llms-openai
!pip install llama-index-postprocessor-colbert-rerank
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
)

Download Data

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
import os

os.environ["OPENAI_API_KEY"] = "sk-"  # placeholder: set your own OpenAI API key
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

# build index
index = VectorStoreIndex.from_documents(documents=documents)

Retrieve the top 10 most relevant nodes, then filter with Colbert Rerank

from llama_index.postprocessor.colbert_rerank import ColbertRerank

colbert_reranker = ColbertRerank(
    top_n=5,
    model="colbert-ir/colbertv2.0",
    tokenizer="colbert-ir/colbertv2.0",
    keep_retrieval_score=True,
)

query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[colbert_reranker],
)
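
The configuration above is a two-stage pipeline: the index returns the 10 most similar nodes (`similarity_top_k=10`), and the reranker rescores them and keeps the best 5 (`top_n=5`). A minimal sketch of that control flow, assuming hypothetical scoring callables `first_stage` and `second_stage` and made-up toy data rather than the real retriever and ColBERT model:

```python
def retrieve_then_rerank(items, first_stage, second_stage, top_k, top_n):
    """Keep the top_k items by a cheap score, then the top_n of those by a costlier score."""
    candidates = sorted(items, key=first_stage, reverse=True)[:top_k]
    return sorted(candidates, key=second_stage, reverse=True)[:top_n]


# Toy data: (name, cheap_score, expensive_score); all values are made up.
items = [
    ("a", 0.9, 0.2),
    ("b", 0.8, 0.7),
    ("c", 0.7, 0.9),
    ("d", 0.1, 1.0),  # best expensive score, but never reaches stage two
]

kept = retrieve_then_rerank(
    items,
    first_stage=lambda item: item[1],
    second_stage=lambda item: item[2],
    top_k=3,
    top_n=2,
)
print([name for name, _, _ in kept])
```

An item dropped in the first stage cannot be recovered by the reranker, which is why `similarity_top_k` is set larger than `top_n`.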
response = query_engine.query(
    "What did Sam Altman do in this essay?",
)
for node in response.source_nodes:
    print(node.id_)
    print(node.node.get_content()[:120])
    print("reranking score: ", node.score)
    print("retrieval score: ", node.node.metadata["retrieval_score"])
    print("**********")
50157136-f221-4468-83e1-44e289f44cd5
When I was dealing with some urgent problem during YC, there was about a 60% chance it had to do with HN, and a 40% chan
reranking score:  0.6470144987106323
retrieval score:  0.8309200279065135
**********
87f0d691-b631-4b21-8123-8f71d383046b
Now that I could write essays again, I wrote a bunch about topics I'd had stacked up. I kept writing essays through 2020
reranking score:  0.6377773284912109
retrieval score:  0.8053000783543145
**********
10234ad9-46b1-4be5-8034-92392ac242ed
It's not that unprestigious types of work are good per se. But when you find yourself drawn to some kind of work despite
reranking score:  0.6301894187927246
retrieval score:  0.7975032272825491
**********
bc269bc4-49c7-4804-8575-cd6db47d70b8
It was as weird as it sounds. I resumed all my old patterns, except now there were doors where there hadn't been. Now wh
reranking score:  0.6282549500465393
retrieval score:  0.8026253284729862
**********
ebd7e351-64fc-4627-8ddd-2681d1ac33f8
As Jessica and I were walking home from dinner on March 11, at the corner of Garden and Walker streets, these three thre
reranking score:  0.6245909929275513
retrieval score:  0.7965812262372882
**********
print(response)
Sam Altman became the second president of Y Combinator after Paul Graham decided to step back from running the organization.
response = query_engine.query(
    "Which schools did Paul attend?",
)
for node in response.source_nodes:
    print(node.id_)
    print(node.node.get_content()[:120])
    print("reranking score: ", node.score)
    print("retrieval score: ", node.node.metadata["retrieval_score"])
    print("**********")
6942863e-dfc5-4a99-b642-967b99b71343
I didn't want to drop out of grad school, but how else was I going to get out? I remember when my friend Robert Morris g
reranking score:  0.6333063840866089
retrieval score:  0.7964996889742813
**********
477c5de0-8e05-494e-95cc-e221881fb5c1
What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and pro
reranking score:  0.5930159091949463
retrieval score:  0.7771872700578062
**********
0448df5c-7950-483d-bc63-15e9110da3bc
[15] We got 225 applications for the Summer Founders Program, and we were surprised to find that a lot of them were from
reranking score:  0.5160146951675415
retrieval score:  0.7782554326959897
**********
83af8efd-e992-4fd3-ada4-3c4c6f9971a1
Much to my surprise, the time I spent working on this stuff was not wasted after all. After we started Y Combinator, I w
reranking score:  0.5005874633789062
retrieval score:  0.7800375923908894
**********
bc269bc4-49c7-4804-8575-cd6db47d70b8
It was as weird as it sounds. I resumed all my old patterns, except now there were doors where there hadn't been. Now wh
reranking score:  0.4977223873138428
retrieval score:  0.782688582042514
**********
print(response)
Paul attended Cornell University for his graduate studies and later applied to RISD (Rhode Island School of Design) in the US.
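
Because `keep_retrieval_score=True` stores the original similarity in each node's metadata, the two rankings can be compared. The sketch below hard-codes the scores printed above for the second query (rounded to four decimals, node IDs shortened to their first eight characters) and sorts them both ways. The variable names are illustrative, and the comparison covers only the five nodes that survived reranking, not all ten retrieved candidates.

```python
# (short node id, reranking score, retrieval score), copied from the output above.
results = [
    ("6942863e", 0.6333, 0.7965),
    ("477c5de0", 0.5930, 0.7772),
    ("0448df5c", 0.5160, 0.7783),
    ("83af8efd", 0.5006, 0.7800),
    ("bc269bc4", 0.4977, 0.7827),
]

# Order the same five nodes by each score, highest first.
by_rerank = [r[0] for r in sorted(results, key=lambda r: r[1], reverse=True)]
by_retrieval = [r[0] for r in sorted(results, key=lambda r: r[2], reverse=True)]

print("reranked order: ", by_rerank)
print("retrieval order:", by_retrieval)
```

Among these five nodes, only the first keeps its position: the essay's opening chunk (`477c5de0`) has the lowest retrieval score but ranks second after reranking.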