OpenAI Pydantic Program

This guide shows you how to generate structured data with new OpenAI API via LlamaIndex. The user just needs to specify a Pydantic object.

We demonstrate two settings:

  • Extraction into an Album object (which can contain a list of Song objects)

  • Extraction into a DirectoryTree object (which can contain recursive Node objects)

Extraction into Album

This is a simple example of parsing an output into an Album schema, which can contain multiple songs.

from pydantic import BaseModel
from typing import List

from llama_index.program import OpenAIPydanticProgram

Define output schema

class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]

Define openai pydantic program

prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = OpenAIPydanticProgram.from_defaults(
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=True,
)

Run program to get structured output.

output = program(movie_name="The Shining")
Function call: Album with args: {
  "name": "The Shining",
  "artist": "Various Artists",
  "songs": [
    {
      "title": "Main Title",
      "length_seconds": 180
    },
    {
      "title": "Opening Credits",
      "length_seconds": 120
    },
    {
      "title": "The Overlook Hotel",
      "length_seconds": 240
    },
    {
      "title": "Redrum",
      "length_seconds": 150
    },
    {
      "title": "Here's Johnny",
      "length_seconds": 200
    }
  ]
}

The output is a valid Pydantic object that we can then use to call functions/APIs.

output
Album(name='The Shining', artist='Various Artists', songs=[Song(title='Main Title', length_seconds=180), Song(title='Opening Credits', length_seconds=120), Song(title='The Overlook Hotel', length_seconds=240), Song(title='Redrum', length_seconds=150), Song(title="Here's Johnny", length_seconds=200)])

Extraction into Album (Streaming)

We also support streaming a list of objects through our stream_list function.

Full credits to this idea go to openai_function_call repo: https://github.com/jxnl/openai_function_call/tree/main/examples/streaming_multitask

prompt_template_str = "{input_str}"
program = OpenAIPydanticProgram.from_defaults(
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=False,
)

output = program.stream_list(input_str="make up 5 random albums")
for obj in output:
    print(obj.json(indent=2))
{
  "name": "The Journey",
  "artist": "Unknown",
  "songs": [
    {
      "title": "Lost in the Woods",
      "length_seconds": 240
    },
    {
      "title": "Endless Horizon",
      "length_seconds": 320
    },
    {
      "title": "Mystic Dreams",
      "length_seconds": 280
    }
  ]
}
{
  "name": "Electric Pulse",
  "artist": "Synthwave Master",
  "songs": [
    {
      "title": "Neon Nights",
      "length_seconds": 300
    },
    {
      "title": "Cyber City",
      "length_seconds": 280
    },
    {
      "title": "Digital Dreams",
      "length_seconds": 320
    }
  ]
}
{
  "name": "Soulful Serenade",
  "artist": "Smooth Jazz Trio",
  "songs": [
    {
      "title": "Midnight Groove",
      "length_seconds": 280
    },
    {
      "title": "Saxophone Serenade",
      "length_seconds": 320
    },
    {
      "title": "Chill Vibes",
      "length_seconds": 240
    }
  ]
}
{
  "name": "Rock Revolution",
  "artist": "The Thunderbolts",
  "songs": [
    {
      "title": "High Voltage",
      "length_seconds": 320
    },
    {
      "title": "Guitar Shredder",
      "length_seconds": 280
    },
    {
      "title": "Rock Anthem",
      "length_seconds": 300
    }
  ]
}
{
  "name": "Pop Sensation",
  "artist": "The Starlets",
  "songs": [
    {
      "title": "Catchy Melody",
      "length_seconds": 240
    },
    {
      "title": "Dancefloor Hit",
      "length_seconds": 300
    },
    {
      "title": "Pop Princess",
      "length_seconds": 280
    }
  ]
}

Extraction into DirectoryTree object

This is directly inspired by jxnl’s awesome repo here: https://github.com/jxnl/openai_function_call.

That repository shows how you can use OpenAI’s function API to parse recursive Pydantic objects. The main requirement is that you want to β€œwrap” a recursive Pydantic object with a non-recursive one.

Here we show an example in a β€œdirectory” setting, where a DirectoryTree object wraps recursive Node objects, to parse a file structure.

# NOTE: defining recursive objects in a notebook causes errors
from directory import DirectoryTree, Node
DirectoryTree.schema()
{'title': 'DirectoryTree',
 'description': 'Container class representing a directory tree.\n\nArgs:\n    root (Node): The root node of the tree.',
 'type': 'object',
 'properties': {'root': {'title': 'Root',
   'description': 'Root folder of the directory tree',
   'allOf': [{'$ref': '#/definitions/Node'}]}},
 'required': ['root'],
 'definitions': {'NodeType': {'title': 'NodeType',
   'description': 'Enumeration representing the types of nodes in a filesystem.',
   'enum': ['file', 'folder'],
   'type': 'string'},
  'Node': {'title': 'Node',
   'description': 'Class representing a single node in a filesystem. Can be either a file or a folder.\nNote that a file cannot have children, but a folder can.\n\nArgs:\n    name (str): The name of the node.\n    children (List[Node]): The list of child nodes (if any).\n    node_type (NodeType): The type of the node, either a file or a folder.',
   'type': 'object',
   'properties': {'name': {'title': 'Name',
     'description': 'Name of the folder',
     'type': 'string'},
    'children': {'title': 'Children',
     'description': 'List of children nodes, only applicable for folders, files cannot have children',
     'type': 'array',
     'items': {'$ref': '#/definitions/Node'}},
    'node_type': {'description': 'Either a file or folder, use the name to determine which it could be',
     'default': 'file',
     'allOf': [{'$ref': '#/definitions/NodeType'}]}},
   'required': ['name']}}}
program = OpenAIPydanticProgram.from_defaults(
    output_cls=DirectoryTree,
    prompt_template_str="{input_str}",
    verbose=True,
)
input_str = """
root
β”œβ”€β”€ folder1
β”‚   β”œβ”€β”€ file1.txt
β”‚   └── file2.txt
└── folder2
    β”œβ”€β”€ file3.txt
    └── subfolder1
        └── file4.txt
"""

output = program(input_str=input_str)
Function call: DirectoryTree with args: {
  "root": {
    "name": "root",
    "children": [
      {
        "name": "folder1",
        "children": [
          {
            "name": "file1.txt",
            "children": [],
            "node_type": "file"
          },
          {
            "name": "file2.txt",
            "children": [],
            "node_type": "file"
          }
        ],
        "node_type": "folder"
      },
      {
        "name": "folder2",
        "children": [
          {
            "name": "file3.txt",
            "children": [],
            "node_type": "file"
          },
          {
            "name": "subfolder1",
            "children": [
              {
                "name": "file4.txt",
                "children": [],
                "node_type": "file"
              }
            ],
            "node_type": "folder"
          }
        ],
        "node_type": "folder"
      }
    ],
    "node_type": "folder"
  }
}

The output is a full DirectoryTree structure with recursive Node objects.

output
DirectoryTree(root=Node(name='root', children=[Node(name='folder1', children=[Node(name='file1.txt', children=[], node_type=<NodeType.FILE: 'file'>), Node(name='file2.txt', children=[], node_type=<NodeType.FILE: 'file'>)], node_type=<NodeType.FOLDER: 'folder'>), Node(name='folder2', children=[Node(name='file3.txt', children=[], node_type=<NodeType.FILE: 'file'>), Node(name='subfolder1', children=[Node(name='file4.txt', children=[], node_type=<NodeType.FILE: 'file'>)], node_type=<NodeType.FOLDER: 'folder'>)], node_type=<NodeType.FOLDER: 'folder'>)], node_type=<NodeType.FOLDER: 'folder'>))