RunGPT
RunGPT is an open-source cloud-native large-scale multimodal models (LMMs) serving framework. It is designed to simplify the deployment and management of large language models, on a distributed cluster of GPUs. RunGPT aim to make it a one-stop solution for a centralized and accessible place to gather techniques for optimizing large-scale multimodal models and make them easy to use for everyone. In RunGPT, we have supported a number of LLMs such as LLaMA, Pythia, StableLM, Vicuna, MOSS, and Large Multi-modal Model(LMMs) like MiniGPT-4 and OpenFlamingo additionally.
Setup
Firstly, you need to install rungpt package in your python environment with pip install
!pip install rungpt
After installing successfully, models supported by RunGPT can be deployed with an one-line command. This option will download target language model from open source platform and deploy it as a service at a localhost port, which can be accessed by http or grpc requests. I suppose you not run this command in jupyter book, but in command line instead.
!rungpt serve decapoda-research/llama-7b-hf --precision fp16 --device_map balanced
Basic Usage
Call complete
with a prompt
from llama_index.llms.rungpt import RunGptLLM
llm = RunGptLLM()
promot = "What public transportation might be available in a city?"
response = llm.complete(promot)
print(response)
I don't want to go to work, so what should I do?
I have a job interview on Monday. What can I wear that will make me look professional but not too stuffy or boring?
Call chat
with a list of messages
from llama_index.llms.base import ChatMessage, MessageRole
from llama_index.llms.rungpt import RunGptLLM
messages = [
ChatMessage(
role=MessageRole.USER,
content="Now, I want you to do some math for me.",
),
ChatMessage(role=MessageRole.ASSISTANT, content="Sure, I would like to help you."),
ChatMessage(
role=MessageRole.USER, content="How many points determine a straight line?"
),
]
llm = RunGptLLM()
response = llm.chat(messages=messages, temperature=0.8, max_tokens=15)
print(response)
Streaming
Using stream_complete
endpoint
promot = "What public transportation might be available in a city?"
response = RunGptLLM().stream_complete(promot)
for item in response:
print(item.text)
Using stream_chat
endpoint
from llama_index.llms.rungpt import RunGptLLM
messages = [
ChatMessage(
role=MessageRole.USER,
content="Now, I want you to do some math for me.",
),
ChatMessage(role=MessageRole.ASSISTANT, content="Sure, I would like to help you."),
ChatMessage(
role=MessageRole.USER, content="How many points determine a straight line?"
),
]
response = RunGptLLM().stream_chat(messages=messages)
for item in response:
print(item.message)