DeepLake Vector Store
import os
import textwrap
from llama_index import VectorStoreIndex, SimpleDirectoryReader, Document
from llama_index.vector_stores import DeepLakeVectorStore
os.environ["OPENAI_API_KEY"] = "sk-********************************"
os.environ["ACTIVELOOP_TOKEN"] = "********************************"
!pip install deeplake
If you don't export the token in your environment, you can alternatively use the Deep Lake CLI to log in:
# !activeloop login -t <TOKEN>
# load documents
documents = SimpleDirectoryReader("../paul_graham_essay/data").load_data()
print("Document ID:", documents[0].doc_id, "Document Hash:", documents[0].doc_hash)
Document ID: 14935662-4884-4c57-ac2e-fa62da019665 Document Hash: 77ae91ab542f3abb308c4d7c77c9bc4c9ad0ccd63144802b7cbe7e1bb3a4094e
# dataset_path = "hub://adilkhan/paul_graham_essay"  # use a hub:// path to store the dataset in Deep Lake cloud; if no path is passed at all, DeepLakeVectorStore creates a local dataset called llama_index
from llama_index.storage.storage_context import StorageContext
dataset_path = "paul_graham_essay"
# Create an index over the documents
vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
Your Deep Lake dataset has been successfully created!
The dataset is private so make sure you are logged in!
This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/adilkhan/paul_graham_essay
hub://adilkhan/paul_graham_essay loaded successfully.
Evaluating ingest: 100%|██████████| 1/1 [00:21<00:00
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 17617 tokens
Dataset(path='hub://adilkhan/paul_graham_essay', tensors=['embedding', 'ids', 'metadata', 'text'])
  tensor      htype      shape      dtype   compression
 -------     -------    -------    -------  -----------
 embedding   generic   (6, 1536)    None       None
    ids       text       (6, 1)     str        None
  metadata    json       (6, 1)     str        None
    text      text       (6, 1)     str        None
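Once the dataset exists, you can reconnect to it later without re-ingesting the documents. A minimal sketch, assuming your llama_index version exposes VectorStoreIndex.from_vector_store:
# Reconnect to the stored embeddings; overwrite=False preserves existing data
vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=False)
index = VectorStoreIndex.from_vector_store(vector_store)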
If we don't pass a dataset path at all, DeepLakeVectorStore will create a local dataset called llama_index:
# Create an index over the documents
# vector_store = DeepLakeVectorStore(overwrite=True)
# storage_context = StorageContext.from_defaults(vector_store=vector_store)
# index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
llama_index loaded successfully.
Evaluating ingest: 100%|██████████| 1/1 [00:04<00:00
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 17617 tokens
Dataset(path='llama_index', tensors=['embedding', 'ids', 'metadata', 'text'])
  tensor      htype      shape      dtype   compression
 -------     -------    -------    -------  -----------
 embedding   generic   (6, 1536)    None       None
    ids       text       (6, 1)     str        None
  metadata    json       (6, 1)     str        None
    text      text       (6, 1)     str        None
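To clean up a local dataset created this way, Deep Lake provides a delete helper. A minimal sketch, assuming the default llama_index path from above:
import deeplake
# Remove the local dataset directory (deeplake.delete is Deep Lake's dataset removal helper)
deeplake.delete("llama_index")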
query_engine = index.as_query_engine()
response = query_engine.query(
"What did the author learn?",
)
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 4028 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 6 tokens
print(textwrap.fill(str(response), 100))
The author learned that working on things that are not prestigious can be a good thing, as it can
lead to discovering something real and avoiding the wrong track. The author also learned that
ignorance can be beneficial, as it can lead to discovering something new and unexpected. The author
also learned the importance of working hard, even at the parts of the job they don't like, in order
to set an example for others. The author also learned the value of unsolicited advice, as it can be
beneficial in unexpected ways, such as when Robert Morris suggested that the author should make sure
Y Combinator wasn't the last cool thing they did.
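The query engine embeds the question, retrieves the most similar chunks from the Deep Lake dataset, and synthesizes an answer over them. A minimal sketch of tuning retrieval and inspecting which chunks were used; similarity_top_k is a standard as_query_engine parameter, and the source-node attributes below are assumptions about your llama_index version:
# Retrieve three chunks per query instead of the default and inspect them
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What did the author learn?")
for source in response.source_nodes:
    # each source node carries its similarity score and the retrieved text
    print(source.score, source.node.get_text()[:100])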
response = query_engine.query("What was a hard moment for the author?")
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 4072 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 9 tokens
print(textwrap.fill(str(response), 100))
A hard moment for the author was when he was dealing with urgent problems during YC and about 60%
of them had to do with Hacker News, a news aggregator he had created. He was overwhelmed by the
amount of work he had to do to keep Hacker News running, and it was taking away from his ability to
focus on other projects. He was also haunted by the idea that his own work ethic set the upper bound
for how hard everyone else worked, so he felt he had to work very hard. He was also dealing with
disputes between cofounders, figuring out when people were lying to them, and fighting with people
who maltreated the startups. On top of this, he was given unsolicited advice from Robert Morris to
make sure Y Combinator wasn't the last cool thing he did, which made him consider quitting.
Deleting items from the database
import deeplake as dp
# Load the underlying Deep Lake dataset to look up the stored document ids
ds = dp.load("paul_graham_essay")
# Grab the stored ids so we can delete a document from the index by id
idx = ds.ids[0].numpy().tolist()
This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/adilkhan/paul_graham_essay
hub://adilkhan/paul_graham_essay loaded successfully.
index.delete(idx[0])
100%|██████████| 6/6 [00:00<00:00, 4501.13it/s]
Dataset(path='hub://adilkhan/paul_graham_essay', tensors=['embedding', 'ids', 'metadata', 'text'])
  tensor      htype      shape      dtype   compression
 -------     -------    -------    -------  -----------
 embedding   generic   (5, 1536)    None       None
    ids       text       (5, 1)     str        None
  metadata    json       (5, 1)     str        None
    text      text       (5, 1)     str        None
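To confirm the deletion, reload the dataset and check that the row count dropped from 6 to 5. A minimal sketch (len() on a Deep Lake dataset returns its number of rows):
# Reload and verify that one row was removed
ds = dp.load("paul_graham_essay")
print(len(ds))  # expected: 5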