Using NVIDIA's LLM API Catalog Connector¶
This notebook will guide you through understanding the basic usage of the NVIDIA
connector.
With this connector, you'll be able to connect to and generate from compatible models available at the NVIDIA API Catalog, such as:
- Google's gemma-7b
- Mistal AI's mistral-7b-instruct-v0.2
- And more!
We'll begin by ensuring llama-index
and associated packages are installed.
NOTE: Only models that have a base URL of
https://integrate.api.nvidia.com/v1
are compatible with this connector at this time.
!pip install llama-index-embeddings-openai llama-index-readers-file
Collecting llama-index-embeddings-openai Using cached llama_index_embeddings_openai-0.1.7-py3-none-any.whl.metadata (603 bytes) Requirement already satisfied: llama-index-core<0.11.0,>=0.10.1 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-embeddings-openai) (0.10.30) Requirement already satisfied: PyYAML>=6.0.1 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (6.0.1) Requirement already satisfied: SQLAlchemy>=1.4.49 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from SQLAlchemy[asyncio]>=1.4.49->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (2.0.29) Requirement already satisfied: aiohttp<4.0.0,>=3.8.6 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (3.9.5) Requirement already satisfied: dataclasses-json in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (0.6.4) Requirement already satisfied: deprecated>=1.2.9.3 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.2.14) Requirement already satisfied: dirtyjson<2.0.0,>=1.0.8 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.0.8) Requirement already satisfied: fsspec>=2023.5.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (2024.3.1) Requirement already satisfied: httpx in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (0.27.0) Requirement already satisfied: llamaindex-py-client<0.2.0,>=0.1.18 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (0.1.18) Requirement already satisfied: nest-asyncio<2.0.0,>=1.5.8 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.6.0) Requirement already satisfied: networkx>=3.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (3.1) Requirement already satisfied: nltk<4.0.0,>=3.8.1 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (3.8.1) Requirement already satisfied: numpy in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.24.4) Requirement already satisfied: openai>=1.1.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.22.0) Requirement already satisfied: pandas in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (2.0.3) Requirement already satisfied: pillow>=9.0.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (10.3.0) Requirement already satisfied: requests>=2.31.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (2.31.0) Requirement already satisfied: tenacity<9.0.0,>=8.2.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (8.2.3) Requirement already satisfied: tiktoken>=0.3.3 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (0.6.0) Requirement already satisfied: tqdm<5.0.0,>=4.66.1 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (4.66.2) Requirement already satisfied: typing-extensions>=4.5.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (4.11.0) Requirement already satisfied: typing-inspect>=0.8.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (0.9.0) Requirement already satisfied: wrapt in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.16.0) Requirement already satisfied: aiosignal>=1.1.2 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.3.1) Requirement already satisfied: attrs>=17.3.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (23.2.0) Requirement already satisfied: frozenlist>=1.1.1 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.4.1) Requirement already satisfied: multidict<7.0,>=4.5 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (6.0.5) Requirement already satisfied: yarl<2.0,>=1.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.6->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.9.4) Requirement already satisfied: pydantic>=1.10 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from llamaindex-py-client<0.2.0,>=0.1.18->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (2.7.0) Requirement already satisfied: anyio in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from httpx->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (4.3.0) Requirement already satisfied: certifi in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from httpx->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (2024.2.2) Requirement already satisfied: httpcore==1.* in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from httpx->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.0.5) Requirement already satisfied: idna in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from httpx->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (3.7) Requirement already satisfied: sniffio in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from httpx->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.3.1) Requirement already satisfied: h11<0.15,>=0.13 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from httpcore==1.*->httpx->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (0.14.0) Requirement already satisfied: click in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from nltk<4.0.0,>=3.8.1->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (8.1.7) Requirement already satisfied: joblib in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from nltk<4.0.0,>=3.8.1->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.4.0) Requirement already satisfied: regex>=2021.8.3 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from nltk<4.0.0,>=3.8.1->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (2024.4.16) Requirement already satisfied: distro<2,>=1.7.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from openai>=1.1.0->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.9.0) Requirement already satisfied: charset-normalizer<4,>=2 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from requests>=2.31.0->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (3.3.2) Requirement already satisfied: urllib3<3,>=1.21.1 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from requests>=2.31.0->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (2.2.1) Requirement already satisfied: greenlet!=0.4.17 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from SQLAlchemy>=1.4.49->SQLAlchemy[asyncio]>=1.4.49->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (3.0.3) Requirement already satisfied: mypy-extensions>=0.3.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from typing-inspect>=0.8.0->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.0.0) Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from dataclasses-json->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (3.21.1) Requirement already satisfied: python-dateutil>=2.8.2 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from pandas->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (2.9.0.post0) Requirement already satisfied: pytz>=2020.1 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from pandas->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (2024.1) Requirement already satisfied: tzdata>=2022.1 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from pandas->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (2024.1) Requirement already satisfied: packaging>=17.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from marshmallow<4.0.0,>=3.18.0->dataclasses-json->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (24.0) Requirement already satisfied: annotated-types>=0.4.0 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from pydantic>=1.10->llamaindex-py-client<0.2.0,>=0.1.18->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (0.6.0) Requirement already satisfied: pydantic-core==2.18.1 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from pydantic>=1.10->llamaindex-py-client<0.2.0,>=0.1.18->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (2.18.1) Requirement already satisfied: six>=1.5 in /home/chris/anaconda3/envs/nvidia-llama-index-api/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas->llama-index-core<0.11.0,>=0.10.1->llama-index-embeddings-openai) (1.16.0) Using cached llama_index_embeddings_openai-0.1.7-py3-none-any.whl (6.0 kB) Installing collected packages: llama-index-embeddings-openai Successfully installed llama-index-embeddings-openai-0.1.7
API Keys and Boilerplate¶
During the next cell we'll run some boilerplate to allow the examples to be executed smoothly in a notebook environment.
We'll also provide our API keys.
NOTE: You can create your NVIDIA API key using the
Get API Key
button in the code example window.
# llama-parse is async-first, running the async code in a notebook requires the use of nest_asyncio
import nest_asyncio
nest_asyncio.apply()
import os
# Using OpenAI API for embeddings
os.environ["OPENAI_API_KEY"] = "sk-"
# Using NVIDIA API Playground API Key for LLM
os.environ["NVIDIA_API_KEY"] = "nvapi-"
from llama_index.llms.nvidia import NVIDIA
from llama_index.core import VectorStoreIndex
from llama_index.core import Settings
llm = NVIDIA(model="mistralai/mistral-7b-instruct-v0.2")
Settings.llm = llm
We can observe which model our llm
object is currently associated with the .model
attribute.
llm.model
'mistralai/mistral-7b-instruct-v0.2'
Loading API Catalogue LLM¶
We can also load models using their API Catalogue address.
Let's use gemma-7b
as an example!
- Navigate to the model page
- Find the address in the
model
parameter (e.g."google/gemma-7b"
) - Verify it has the
base_url
of"https://integrate.api.nvidia.com/v1"
- Use
NVIDIA(model="model_name_here")
to point the connector at that model (e.g.NVIDIA(model="google/gemma-7b"
)
Let's see this in the code.
llm = NVIDIA(model="google/gemma-7b")
Let's confirm we've associated our NvidiaAIPlayground
LLM with the correct model!
llm.model
'google/gemma-7b'
Basic Functionality¶
Now we can explore the different ways you can use the connector within the LlamaIndex ecosystem!
Before we begin, lets set up a list of ChatMessage
objects - which is the expected input for some of the methods.
from llama_index.core.llms import ChatMessage, MessageRole
chat_messages = [
ChatMessage(
role=MessageRole.SYSTEM, content=("You are a helpful assistant.")
),
ChatMessage(
role=MessageRole.USER,
content=("What are the most popular house pets in North America?"),
),
]
We'll follow the same basic pattern for each example:
- We'll point our
NVIDIA
LLM to our desired model - We'll examine how to use the endpoint to achieve the desired task!
Complete: .complete()
¶
We can use .complete()
/.acomplete()
(which takes a string) to prompt a response from the selected model.
Let's use our default model for this task.
completion_llm = NVIDIA()
We can verify this is the expected default by checking the .model
attribute.
completion_llm.model
'mistralai/mistral-7b-instruct-v0.2'
Let's call .complete()
on our model with a string, in this case "Hello!"
, and observe the response.
completion_llm.complete("Hello!")
CompletionResponse(text=" Hello there! How can I help you today? I'm here to answer any questions you might have or provide information on a wide range of topics. So, feel free to ask me anything!\n\nIf you're looking for some general information, I can help you with that too. For example, I can tell you about the weather, current events, or provide definitions for various words and concepts. I can also help you with math problems, translate words and phrases, and even tell you a joke or two!\n\nSo, what would you like to know? Let me know and I'll do my best to help you out!\n\nIf you have any specific question or topic in mind, please let me know and I'll be glad to help you out. If you want some general information, I can provide you with that as well. For example, I can tell you about the weather, current events, or provide definitions for various words and concepts. I can also help you with math problems, translate words and phrases, and even tell you a joke or two!\n\nSo, what would you like to know? Let me know and I'll do my best to help you out!\n\nIf you have any specific question or topic in mind, please let me know and I'll be glad to help you out. If you want some general information, I can provide you with that as well. For example, I can tell you about the weather, current events, or provide definitions for various words and concepts. I can also help you with math problems, translate words and phrases, and even tell you a joke or two!\n\nSo, what would you like to know? Let me know and I'll do my best to help you out!\n\nIf you have any specific question or topic in mind, please let me know and I'll be glad to help you out. If you want some general information, I can provide you with that as well. For example, I can tell you about the weather, current events, or provide definitions for various words and concepts. I can also help you with math problems, translate words and phrases, and even tell you a joke or two!\n\nSo, what would you like to know? Let me know and I'll do my best to help you out!\n\nIf you have any specific question or topic in mind, please let me know and I'll be glad to help you out. If you want some", additional_kwargs={}, raw={'id': 'chatcmpl-f6906079-51e7-44bf-aaea-a9478397dfbf', 'choices': [Choice(finish_reason=None, index=0, logprobs=ChoiceLogprobs(content=None, text_offset=[], token_logprobs=[0.0, 0.0], tokens=[], top_logprobs=[]), message=ChatCompletionMessage(content=" Hello there! How can I help you today? I'm here to answer any questions you might have or provide information on a wide range of topics. So, feel free to ask me anything!\n\nIf you're looking for some general information, I can help you with that too. For example, I can tell you about the weather, current events, or provide definitions for various words and concepts. I can also help you with math problems, translate words and phrases, and even tell you a joke or two!\n\nSo, what would you like to know? Let me know and I'll do my best to help you out!\n\nIf you have any specific question or topic in mind, please let me know and I'll be glad to help you out. If you want some general information, I can provide you with that as well. For example, I can tell you about the weather, current events, or provide definitions for various words and concepts. I can also help you with math problems, translate words and phrases, and even tell you a joke or two!\n\nSo, what would you like to know? Let me know and I'll do my best to help you out!\n\nIf you have any specific question or topic in mind, please let me know and I'll be glad to help you out. If you want some general information, I can provide you with that as well. For example, I can tell you about the weather, current events, or provide definitions for various words and concepts. I can also help you with math problems, translate words and phrases, and even tell you a joke or two!\n\nSo, what would you like to know? Let me know and I'll do my best to help you out!\n\nIf you have any specific question or topic in mind, please let me know and I'll be glad to help you out. If you want some general information, I can provide you with that as well. For example, I can tell you about the weather, current events, or provide definitions for various words and concepts. I can also help you with math problems, translate words and phrases, and even tell you a joke or two!\n\nSo, what would you like to know? Let me know and I'll do my best to help you out!\n\nIf you have any specific question or topic in mind, please let me know and I'll be glad to help you out. If you want some", role='assistant', function_call=None, tool_calls=None))], 'created': 1713474670, 'model': 'mistralai/mistral-7b-instruct-v0.2', 'object': 'chat.completion', 'system_fingerprint': None, 'usage': CompletionUsage(completion_tokens=512, prompt_tokens=11, total_tokens=523)}, logprobs=None, delta=None)
As is expected by LlamaIndex - we get a CompletionResponse
in response.
Async Complete: .acomplete()
¶
There is also an async implementation which can be leveraged in the same way!
await completion_llm.acomplete("Hello!")
CompletionResponse(text=" Hello there! How can I help you today? I'm here to answer any questions you might have or provide information on a wide range of topics. So feel free to ask me anything!\n\nIf you're looking for a specific topic, just let me know and I'll do my best to provide you with accurate and up-to-date information. And if you have any requests for fun facts or trivia, I'm happy to oblige!\n\nSo, what would you like to know today? Let me help make your day a little brighter! 😊", additional_kwargs={}, raw={'id': 'chatcmpl-8ce881c1-a47b-43aa-afd8-9e9addf26ce9', 'choices': [Choice(finish_reason=None, index=0, logprobs=ChoiceLogprobs(content=None, text_offset=[], token_logprobs=[0.0, 0.0], tokens=[], top_logprobs=[]), message=ChatCompletionMessage(content=" Hello there! How can I help you today? I'm here to answer any questions you might have or provide information on a wide range of topics. So feel free to ask me anything!\n\nIf you're looking for a specific topic, just let me know and I'll do my best to provide you with accurate and up-to-date information. And if you have any requests for fun facts or trivia, I'm happy to oblige!\n\nSo, what would you like to know today? Let me help make your day a little brighter! 😊", role='assistant', function_call=None, tool_calls=None))], 'created': 1712175910, 'model': 'mistralai/mistral-7b-instruct-v0.2', 'object': 'chat.completion', 'system_fingerprint': None, 'usage': CompletionUsage(completion_tokens=123, prompt_tokens=11, total_tokens=134)}, logprobs=None, delta=None)
Chat: .chat()
¶
Now we can try the same thing using the .chat()
method. This method expects a list of chat messages - so we'll use the one we created above.
We'll use the mistralai/mixtral-8x7b-instruct-v0.1
model for the example.
chat_llm = NVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")
All we need to do now is call .chat()
on our list of ChatMessages
and observe our response.
You'll also notice that we can pass in a few additional key-word arguments that can influence the generation - in this case, we've used the seed
parameter to influence our generation and the stop
parameter to indicate we want the model to stop generating once it reaches a certain token!
NOTE: You can find information about what additional kwargs are supported by the model's endpoint by referencing the API documentation for the selected model. Mixtral's is located here as an example!
chat_llm.chat(chat_messages, seed=4, stop=["cat", "cats", "Cat", "Cats"])
ChatResponse(message=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content=" In North America, the most popular types of house pets are:\n\n1. Dogs: Man's best friend is the most popular pet in North America. They are known for their loyalty, companionship, and the variety of breeds that cater to different lifestyles and preferences.\n\n2. Cats", additional_kwargs={}), raw={'id': 'chatcmpl-b6ef95ca-e023-4dc8-8ee9-843f214169e9', 'choices': [Choice(finish_reason=None, index=0, logprobs=ChoiceLogprobs(content=None, text_offset=[], token_logprobs=[0.0, 0.0], tokens=[], top_logprobs=[]), message=ChatCompletionMessage(content=" In North America, the most popular types of house pets are:\n\n1. Dogs: Man's best friend is the most popular pet in North America. They are known for their loyalty, companionship, and the variety of breeds that cater to different lifestyles and preferences.\n\n2. Cats", role='assistant', function_call=None, tool_calls=None))], 'created': 1713474655, 'model': 'mistralai/mixtral-8x7b-instruct-v0.1', 'object': 'chat.completion', 'system_fingerprint': None, 'usage': CompletionUsage(completion_tokens=66, prompt_tokens=26, total_tokens=92)}, delta=None, logprobs=None, additional_kwargs={})
As expected, we receive a ChatResponse
in response.
Async Chat: (achat
)¶
We also have an async implementation of the .chat()
method which can be called in the following way.
await chat_llm.achat(chat_messages)
ChatResponse(message=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content=' The most popular house pets in North America are dogs and cats. According to the American Pet Products Association (APPA), as of 2021, approximately 69 million homes in the United States own a pet, and 63.4 million of those households have a dog, while 42.7 million have a cat. Birds, small mammals, reptiles, and fish are also popular pets, but to a lesser extent.', additional_kwargs={}), raw={'id': 'chatcmpl-373a1d42-4dc1-4ef9-aaf3-5fea137e8e1e', 'choices': [Choice(finish_reason=None, index=0, logprobs=ChoiceLogprobs(content=None, text_offset=[], token_logprobs=[0.0, 0.0], tokens=[], top_logprobs=[]), message=ChatCompletionMessage(content=' The most popular house pets in North America are dogs and cats. According to the American Pet Products Association (APPA), as of 2021, approximately 69 million homes in the United States own a pet, and 63.4 million of those households have a dog, while 42.7 million have a cat. Birds, small mammals, reptiles, and fish are also popular pets, but to a lesser extent.', role='assistant', function_call=None, tool_calls=None))], 'created': 1712177472, 'model': 'mistralai/mixtral-8x7b-instruct-v0.1', 'object': 'chat.completion', 'system_fingerprint': None, 'usage': CompletionUsage(completion_tokens=95, prompt_tokens=59, total_tokens=154)}, delta=None, logprobs=None, additional_kwargs={})
Stream: .stream_chat()
¶
We can also use the models found on build.nvidia.com
for streaming use-cases!
Let's select another model and observe this behaviour. We'll use Google's gemma-7b
model for this task.
stream_llm = NVIDIA(model="google/gemma-7b")
Let's call our model with .stream_chat()
, which again expects a list of ChatMessage
objects, and capture the response.
streamed_response = stream_llm.stream_chat(chat_messages)
streamed_response
<generator object llm_chat_callback.<locals>.wrap.<locals>.wrapped_llm_chat.<locals>.wrapped_gen at 0x7dd89853e320>
As we can see, the response is a generator with the streamed response.
Let's take a look at the final response once the generation is complete.
last_element = None
for last_element in streamed_response:
pass
print(last_element)
assistant: **Top Popular House Pets in North America:** **1. Dogs:** * Estimated 63.4 million pet dogs in households (2023) * Known for their loyalty, companionship, and trainability **2. Cats:** * Estimated 38.4 million pet cats in households (2023) * Known for their independence, affection, and low-maintenance nature **3. Fish:** * Estimated 14.5 million pet fish in households (2023) * Popular for their tranquility, beauty, and variety of species **4. Small mammals (guinea pigs, hamsters, rabbits):** * Estimated 14.4 million pet small mammals in households (2023) * Known for their playful and affectionate nature **5. Birds:** * Estimated 13.3 million pet birds in households (2023) * Known for their beauty, song, and intelligence **Other popular pets:** * Tortoises and reptiles * Hamsters and rodents * Invertebrates (such as spiders and hermit crabs) **Factors influencing pet popularity:** * **Lifestyle and living situation:** Urban dwellers are more likely to have cats, while suburban and rural residents are more likely to have dogs. * **Cost:** Dogs tend to be more expensive to own than cats. * **Personality and preferences:** Some people prefer the companionship of dogs, while others prefer the independence of cats. * **Availability:** Certain pets are easier to find or adopt than others. * **Trend and cultural influences:** Some pets become more popular than others due to trends or cultural preferences.
Async Stream: .astream_chat()
¶
We have the equivalent async method for streaming as well, which can be used in a similar way to the sync implementation.
streamed_response = await stream_llm.astream_chat(chat_messages)
streamed_response
<async_generator object llm_chat_callback.<locals>.wrap.<locals>.wrapped_async_llm_chat.<locals>.wrapped_gen at 0x787709eea460>
last_element = None
async for last_element in streamed_response:
pass
print(last_element)
assistant: Sure, here are the most popular house pets in North America: 1. Dogs 2. Cats 3. Fish 4. Small Mammals 5. Birds
Streaming Query Engine Responses¶
Let's look at a slightly more involved example using a query engine!
We'll start by loading some data (we'll be using the Hitchhiker's Guide to the Galaxy).
Loading Data¶
Let's first create a directory where our data can live.
!mkdir -p 'data/hhgttg'
We'll download our data from the above source.
!wget 'https://web.eecs.utk.edu/~hqi/deeplearning/project/hhgttg.txt' -O 'data/hhgttg/hhgttg.txt'
--2024-04-01 14:39:38-- https://web.eecs.utk.edu/~hqi/deeplearning/project/hhgttg.txt Resolving web.eecs.utk.edu (web.eecs.utk.edu)... 160.36.127.165 Connecting to web.eecs.utk.edu (web.eecs.utk.edu)|160.36.127.165|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 1534289 (1.5M) [text/plain] Saving to: ‘data/hhgttg/hhgttg.txt’ data/hhgttg/hhgttg. 100%[===================>] 1.46M 6.75MB/s in 0.2s 2024-04-01 14:39:39 (6.75 MB/s) - ‘data/hhgttg/hhgttg.txt’ saved [1534289/1534289]
We'll need to have an embedding model for this step! We'll use OpenAI's text-embedding-03-small
model to achieve this, and save it in our Settings
.
from llama_index.embeddings.openai import OpenAIEmbedding
openai_embedding = OpenAIEmbedding(model="text-embedding-3-small")
Settings.embed_model = openai_embedding
Now we can load our document and create an index leveraging the above created OpenAIEmbedding()
.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("data/hhgttg").load_data()
index = VectorStoreIndex.from_documents(documents)
Now we can create a simple query engine and set our streaming
parameter to True
.
streaming_qe = index.as_query_engine(streaming=True)
Let's send a query to our query engine, and then stream the response.
streaming_response = streaming_qe.query(
"What is the significance of the number 42?",
)
streaming_response.print_response_stream()
The significance of the number 42 is a central theme in "The Hitchhiker's Guide to the Galaxy" by Douglas Adams. The book is a comedic science fiction satire that follows the adventures of two intergalactic travelers, Arthur Dent and Ford Prefect, as they try to escape the destruction of Earth and uncover the true meaning of the number 42. Throughout the book, the number 42 is presented as the ultimate answer to the ultimate question of life, the universe, and everything. The question itself is never explicitly stated, but it is implied to be a deeply profound and existential one that has been sought after by philosophers, scientists, and thinkers throughout history. The idea of the number 42 as the ultimate answer is a playful jab at the idea of seeking ultimate knowledge and understanding, which is often seen as an impossible task. The number 42 is also a reference to the famous "42" answer in the "The Hitchhiker's Guide to the Galaxy" by Douglas Adams, which is a comedic science fiction satire that follows the adventures of two intergalactic travelers, Arthur Dent and Ford Prefect, as they try to escape the destruction of Earth and uncover the true meaning of the number 42. In the book, the supercomputer Deep Thought is asked to find the answer to the ultimate question, and after billions of years of computation, it determines that the answer is 42. The answer is so profound that it causes Deep Thought to become obsolete, as it is no longer needed to answer questions. The significance of the number 42 in "The Hitchhiker's Guide to the Galaxy" is a commentary on the nature of knowledge and the quest for ultimate understanding. It is a reminder that there are limits to what can be known and that the pursuit of knowledge should be done with a sense of humor and a willingness to accept the unknown.
Connecting to local NIMs¶
In addition to connecting to hosted NVIDIA NIMs, this connector can be used to connect to local microservice instances. This helps you take your applications local when necessary.
For instructions on how to setup local microservice instances, see https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/
from llama_index.llms.nvidia import NVIDIA
llm = NVIDIA(model="...").mode("nim", base_url="https://localhost.../v1")
llm.available_models