LlamaIndex 🦙 v0.10.19

Multi-Modal LLM using DashScope qwen-vl model for image reasoning

In this notebook, we show how to use the DashScope qwen-vl MultiModal LLM class/abstraction for image understanding and reasoning. Async is not currently supported.

We also show the functions currently supported for the DashScope LLM:

complete (sync): for a single prompt and a list of images
chat (sync): for multiple chat messages
stream complete (sync): for streaming the output of complete
stream chat (sync): for streaming the output of chat
multi-round conversation

!pip install -U llama-index-multi-modal-llms-dashscope

Use DashScope to understand Images from URLs

# Set API key
%env DASHSCOPE_API_KEY=YOUR_DASHSCOPE_API_KEY
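
The `%env` magic only works inside IPython or Jupyter. In a plain Python script, one option is to export the key in the shell and check for it before creating the LLM. The following is a minimal sketch, assuming the key lives in the `DASHSCOPE_API_KEY` environment variable used above; `require_api_key` is a hypothetical helper and not part of LlamaIndex or DashScope.

```python
import os


def require_api_key(name: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key from the environment, or fail early with a clear error.

    Hypothetical helper for illustration; it only reads an environment variable.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the LLM")
    return key
```

Calling `require_api_key()` at the top of a script surfaces a missing key immediately instead of at the first request.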

Initialize `DashScopeMultiModal` and Load Images from URLs

from llama_index.multi_modal_llms.dashscope import (
    DashScopeMultiModal,
    DashScopeMultiModalModels,
)

from llama_index.core.multi_modal_llms.generic_utils import load_image_urls


image_urls = [
    "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg",
]

image_documents = load_image_urls(image_urls)

dashscope_multi_modal_llm = DashScopeMultiModal(
    model_name=DashScopeMultiModalModels.QWEN_VL_MAX,
)

Complete a prompt with images

complete_response = dashscope_multi_modal_llm.complete(
    prompt="What's in the image?",
    image_documents=image_documents,
)
print(complete_response)

The image captures a serene moment on a sandy beach at sunset. A woman, dressed in a blue and white plaid shirt, is seated on the ground. She is holding a treat in her hand, which is being gently taken by a dog. The dog, wearing a blue harness, is sitting next to the woman, its paw resting on her leg. The backdrop of this heartwarming scene is the vast ocean, with the sun setting in the distance, casting a warm glow over the entire landscape. The image beautifully encapsulates the bond between the woman and her dog, set against the tranquil beauty of nature.

### Complete a prompt with multiple images
multi_image_urls = [
    "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg",
    "https://dashscope.oss-cn-beijing.aliyuncs.com/images/panda.jpeg",
]

multi_image_documents = load_image_urls(multi_image_urls)
complete_response = dashscope_multi_modal_llm.complete(
    prompt="What animals are in the pictures?",
    image_documents=multi_image_documents,
)
print(complete_response)

There is a dog in Picture 1, and there is a panda in Picture 2.

Stream complete a prompt with images

stream_complete_response = dashscope_multi_modal_llm.stream_complete(
    prompt="What's in the image?",
    image_documents=image_documents,
)

for r in stream_complete_response:
    print(r.delta, end="")

The image captures a serene moment on a sandy beach at sunset. A woman, dressed in a blue and white plaid shirt, is seated on the ground. She is holding a treat in her hand, which is being gently taken by a dog. The dog, wearing a blue harness, is sitting next to the woman, its paw resting on her leg. The backdrop of this heartwarming scene is the vast ocean, with the sun setting in the distance, casting a warm glow over the entire landscape. The image beautifully encapsulates the bond between the woman and her dog, set against the tranquil beauty of nature.
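
If the full text is needed after streaming, the deltas can be accumulated while they are printed. This is a sketch, assuming each streamed item exposes a string `.delta` attribute as in the loop above; `FakeChunk` and `collect_stream` are hypothetical names, and `FakeChunk` is a stand-in so the example runs without calling DashScope.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed response item with a `.delta` field."""

    delta: Optional[str]


def collect_stream(chunks: Iterable) -> str:
    """Print each delta as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        delta = chunk.delta or ""  # tolerate a missing or empty delta
        print(delta, end="", flush=True)
        parts.append(delta)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


full_text = collect_stream([FakeChunk("The image "), FakeChunk("shows a dog.")])
```

In the notebook, `collect_stream(stream_complete_response)` would take the place of the `for` loop, under the same assumption about `.delta`.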

Multi-round conversation with chat messages

from llama_index.core.base.llms.types import MessageRole
from llama_index.multi_modal_llms.dashscope.utils import (
    create_dashscope_multi_modal_chat_message,
)

chat_message_user_1 = create_dashscope_multi_modal_chat_message(
    "What's in the image?", MessageRole.USER, image_documents
)
chat_response = dashscope_multi_modal_llm.chat([chat_message_user_1])
print(chat_response.message.content[0]["text"])
chat_message_assistent_1 = create_dashscope_multi_modal_chat_message(
    chat_response.message.content[0]["text"], MessageRole.ASSISTANT, None
)
chat_message_user_2 = create_dashscope_multi_modal_chat_message(
    "what are they doing?", MessageRole.USER, None
)
chat_response = dashscope_multi_modal_llm.chat(
    [chat_message_user_1, chat_message_assistent_1, chat_message_user_2]
)
print(chat_response.message.content[0]["text"])

The image shows two photos of a panda sitting on a wooden log in an enclosure. In the top photo, the panda is sitting upright with its front paws on the log, facing three crows that are perched on the log. The panda looks alert and curious, while the crows seem to be observing the panda. In the bottom photo, the panda is lying down on the log, its head resting on its front paws. One crow has landed on the ground next to the log, and it seems to be interacting with the panda. The background of the photo shows green plants and a wire fence, creating a natural and relaxed atmosphere.
The woman is sitting on the beach with her dog, and they are giving each other high fives. The panda and the crows are sitting together on a log, and the panda seems to be communicating with the crows.
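
The code above reads the reply with `chat_response.message.content[0]["text"]`, which suggests the message content is a list of dicts. The sketch below is a more defensive way to pull out the text, assuming that shape; `first_text` is a hypothetical helper, and the exact content structure returned by DashScope is not confirmed here.

```python
from typing import Any


def first_text(content: Any) -> str:
    """Return the first text part of a message content value.

    Assumes content is either a plain string or a list of dicts in which
    text parts carry a "text" key (the shape implied by content[0]["text"]).
    """
    if isinstance(content, str):
        return content
    for part in content or []:
        if isinstance(part, dict) and "text" in part:
            return part["text"]
    return ""  # no text part found
```

With this helper, `first_text(chat_response.message.content)` would replace the direct index and avoid an `IndexError` or `KeyError` if the first part is not text.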

Stream Chat through a list of chat messages

stream_chat_response = dashscope_multi_modal_llm.stream_chat(
    [chat_message_user_1, chat_message_assistent_1, chat_message_user_2]
)
for r in stream_chat_response:
    print(r.delta, end="")

The woman is sitting on the beach, holding a treat in her hand, while the dog is sitting next to her, taking the treat from her hand.

Use images from local files

Local files are referenced with a file URL:
Linux and macOS: file:///home/images/test.png
Windows: file://D:/images/abc.png
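
The two forms above amount to `file://` followed by the absolute path with forward slashes. The following sketch builds such URLs from a path string; `to_file_url` is a hypothetical helper, not part of the DashScope integration. Note that the standard `Path.as_uri()` produces `file:///D:/...` for Windows paths, which differs from the Windows form listed above, so it is not used here.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_file_url(path: str) -> str:
    """Build a file URL in the two forms listed above (hypothetical helper)."""
    win = PureWindowsPath(path)
    if win.drive and win.is_absolute():
        # Windows path with a drive letter, e.g. D:\images\abc.png
        return "file://" + win.as_posix()
    posix = PurePosixPath(path)
    if not posix.is_absolute():
        raise ValueError(f"expected an absolute path, got {path!r}")
    # POSIX path: the leading "/" supplies the third slash
    return "file://" + str(posix)


print(to_file_url("/home/images/test.png"))
print(to_file_url(r"D:\images\abc.png"))
```

The resulting strings can then be placed in the list passed to `load_local_images`.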

from llama_index.multi_modal_llms.dashscope.utils import load_local_images

local_images = [
    "file://THE_FILE_PATH1",
    "file://THE_FILE_PATH2",
]

image_documents = load_local_images(local_images)
chat_message_local = create_dashscope_multi_modal_chat_message(
    "What animals are in the pictures?", MessageRole.USER, image_documents
)
chat_response = dashscope_multi_modal_llm.chat([chat_message_local])
print(chat_response.message.content[0]["text"])

There is a dog in Picture 1, and there is a panda in Picture 2.

Copyright © 2023, Jerry Liu
