Multimodal Ollama#

Open In Colab

This notebook shows you how to use our Ollama multimodal integration.

Supports complete, stream_complete, chat, stream_chat methods (async support coming soon).

Use on its own or plug into broader multi-modal use cases

Define Model#

from llama_index.multi_modal_llms import OllamaMultiModal
mm_model = OllamaMultiModal(model="llava")
!wget "https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg" -O jerry_images/test.png
--2024-02-03 11:41:04--  https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg
Resolving res.cloudinary.com (res.cloudinary.com)... 2606:4700::6813:a641, 2606:4700::6813:a741, 104.19.166.65, ...
Connecting to res.cloudinary.com (res.cloudinary.com)|2606:4700::6813:a641|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 181517 (177K) [image/jpeg]
Saving to: ‘jerry_images/test.png’

jerry_images/test.p 100%[===================>] 177.26K  --.-KB/s    in 0.01s   

2024-02-03 11:41:04 (14.6 MB/s) - ‘jerry_images/test.png’ saved [181517/181517]

Load Data#

from llama_index.multi_modal_llms.generic_utils import load_image_urls

image_urls = [
    # "https://www.visualcapitalist.com/wp-content/uploads/2023/10/US_Mortgage_Rate_Surge-Sept-11-1.jpg",
    # "https://www.sportsnet.ca/wp-content/uploads/2023/11/CP1688996471-1040x572.jpg",
    "https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg",
    # "https://www.cleverfiles.com/howto/wp-content/uploads/2018/03/minion.jpg",
]

image_documents = load_image_urls(image_urls)

Completion (Non-Streaming/Streaming)#

complete_response = mm_model.complete(
    prompt="Tell me more about this image", image_documents=image_documents
)
print(str(complete_response))
 The image shows the ancient Greek landmark known as the Colosseum, located in Rome, Italy. It is lit up with colorful lights and appears to be illuminated against a night sky. The Colosseum is a distinctive oval structure that has been used for various purposes over the centuries, including gladiatorial contests and public events. The colors of light suggest they could be representing national colors, such as those of Italy (red), white, and green. The photo captures the grandeur and historical significance of this iconic monument. 
response_gen = mm_model.stream_complete(
    prompt="Tell me more about this image",
    image_documents=image_documents,
)
for r in response_gen:
    print(r.delta, end="")
 This is an image of the Colosseum, a famous landmark located in Rome, Italy. It's illuminated at night with colorful lights, which gives it a festive appearance. The Colosseum is a significant historical and architectural structure that was used for gladiatorial contests and other public spectacles during the Roman Empire. 

Chat (Non-Streaming/Streaming)#

# chat
from llama_index.llms import ChatMessage, MessageRole

image_bytes_io = [d.resolve_image() for d in image_documents]

chat_response = mm_model.chat(
    [
        ChatMessage(
            role=MessageRole.USER,
            content="Tell me more about this image",
            additional_kwargs={"images": image_bytes_io},
        )
    ]
)
print(str(chat_response))
assistant:  This is an image of the Colosseum, also known as the Flavian Amphitheatre, which is a renowned landmark located in Rome, Italy. The Colosseum is one of the most famous and well-preserved ancient structures in the world. It was built during the Roman Empire and was used for public entertainment such as gladiatorial contests, reenactments of battles, and dramas based on classical mythology.

The structure has a distinctive elliptical shape with tiered seating for spectators, which is clearly visible in this image. The photo is taken at night, and the Colosseum is illuminated by colorful lights that highlight its arches and the overall outline of the building. The colors of the lights correspond to those of the Italian flag, symbolizing a sense of national pride or celebration.

The architecture and design of the Colosseum are indicative of the engineering prowess of the Romans during their peak period. It has become an iconic symbol of Roman civilization and continues to be a popular tourist attraction in Rome. 
# stream chat
from llama_index.llms import ChatMessage, MessageRole


image_bytes_io = [d.resolve_image() for d in image_documents]

chat_gen = mm_model.stream_chat(
    [
        ChatMessage(
            role=MessageRole.USER,
            content="Tell me more about this image",
            additional_kwargs={"images": image_bytes_io},
        )
    ]
)
for r in chat_gen:
    print(r.delta, end="")
 The image shows the Colosseum, a well-known landmark and amphitheater in Rome, Italy. It is illuminated with colorful lights, which appear to be red, green, and yellow, possibly indicating a special event or celebration, such as a national holiday given the colors of the Italian flag (red, white, and green). The Colosseum is captured at night under a clear sky, which adds to its dramatic presentation.