January 21st, 2024

Reading and Writing Spatial Photos with Image I/O

A spatial photo in the Apple Vision Pro simulator.

This is a follow-up to my previous post on Reading and Writing Spatial Video with AVFoundation.

Writing Spatial Photos
Reading Spatial Photos

Spatial photos are even less documented than spatial video, but creating your own is not difficult once you know the basics. While spatial videos are stored in MV-HEVC video files, spatial photos are stored in HEIC files. A HEIC file can contain multiple images plus metadata, and a spatial photo file contains two images: one for the left eye and one for the right. Once you know which metadata to add, spatial photos are quite easy to create using the Image I/O framework.

Writing Spatial Photos

To start out, you will need two CGImages, one representing each eye. In this case we are using a solid red image for the left eye and a solid blue image for the right eye, but in practice you will most likely render a scene with a camera offset or capture two images using a stereo camera system:

import CoreImage
import ImageIO
import UniformTypeIdentifiers // provides UTType.heic, used below
let imageSize = CGRect(x: 0, y: 0, width: 3072, height: 3072)
let leftImage = CIContext().createCGImage(.red, from: imageSize)!
let rightImage = CIContext().createCGImage(.blue, from: imageSize)!

Create a new image destination pointing towards an output URL, specifying “public.heic” for the type identifier and “2” for the image count:

let destination = CGImageDestinationCreateWithURL(url as CFURL, UTType.heic.identifier as CFString, 2, nil)!

Create a properties dictionary. It identifies which image index holds the left eye and which holds the right, and it carries the camera intrinsics: a 3×3 matrix (focal lengths and principal point, in pixels) that relates the camera’s internal properties to an ideal pinhole-camera model. The intrinsics may be specific to the technique you are using to generate stereo images:

let properties = [
    kCGImagePropertyGroups: [
        kCGImagePropertyGroupIndex: 0,
        kCGImagePropertyGroupType: kCGImagePropertyGroupTypeStereoPair,
        kCGImagePropertyGroupImageIndexLeft: 0,
        kCGImagePropertyGroupImageIndexRight: 1,
    ],
    kCGImagePropertyHEIFDictionary: [
        kIIOMetadata_CameraModelKey: [
            kIIOCameraModel_Intrinsics: [
                1676.249856948853, 0, 1536,
                0, 1676.249856948853, 1536,
                0, 0, 1
            ] as CFArray
        ]
    ]
]
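
If you render your own stereo pair, you may not have an intrinsics matrix on hand. The following is a minimal sketch of one way to derive it, assuming square pixels, a principal point at the image center, and a known horizontal field of view. The function name pinholeIntrinsics and its horizontalFOVDegrees parameter are hypothetical and not part of Image I/O. With an 85 degree field of view and a 3072 pixel width, it yields roughly the focal length used in the dictionary above:

```swift
import Foundation

/// Builds a 3x3 pinhole intrinsics matrix, laid out like the array in the
/// properties dictionary: [fx, 0, cx, 0, fy, cy, 0, 0, 1].
/// Assumes square pixels and a principal point at the image center.
/// `horizontalFOVDegrees` is a hypothetical input for this sketch.
func pinholeIntrinsics(width: Double, height: Double, horizontalFOVDegrees: Double) -> [Double] {
    // Half of the field of view, converted to radians.
    let halfFOV = horizontalFOVDegrees * .pi / 180 / 2
    // Focal length in pixels: half the image width over tan(half the FOV).
    let focalLength = (width / 2) / tan(halfFOV)
    return [
        focalLength, 0, width / 2,
        0, focalLength, height / 2,
        0, 0, 1,
    ]
}

let intrinsics = pinholeIntrinsics(width: 3072, height: 3072, horizontalFOVDegrees: 85)
print(intrinsics[0]) // roughly 1676.25
```

The resulting array can be passed as the value for kIIOCameraModel_Intrinsics after casting it with `as CFArray`.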

Add the left and right images to your image destination, passing the properties to each call:

CGImageDestinationAddImage(destination, leftImage, properties as CFDictionary)
CGImageDestinationAddImage(destination, rightImage, properties as CFDictionary)

Finalize the image:

CGImageDestinationFinalize(destination)
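
The snippets above force-unwrap the destination and ignore the Bool returned by CGImageDestinationFinalize. As a sketch of how the same steps might be wrapped with basic error handling, the function below takes the two images and a properties dictionary like the one built earlier. The SpatialPhotoError type and the writeSpatialPhoto name are hypothetical:

```swift
import CoreGraphics
import Foundation
import ImageIO
import UniformTypeIdentifiers

// Hypothetical error type for this sketch.
enum SpatialPhotoError: Error {
    case cannotCreateDestination
    case cannotFinalize
}

/// Writes a left/right image pair to `url` as a two-image HEIC file.
func writeSpatialPhoto(left: CGImage, right: CGImage, properties: CFDictionary, to url: URL) throws {
    guard let destination = CGImageDestinationCreateWithURL(
        url as CFURL, UTType.heic.identifier as CFString, 2, nil
    ) else {
        throw SpatialPhotoError.cannotCreateDestination
    }
    // The left image is added first (index 0) and the right second (index 1),
    // matching the left/right indices in the group metadata.
    CGImageDestinationAddImage(destination, left, properties)
    CGImageDestinationAddImage(destination, right, properties)
    // Finalize returns false if the file could not be written.
    guard CGImageDestinationFinalize(destination) else {
        throw SpatialPhotoError.cannotFinalize
    }
}
```

A call would look like `try writeSpatialPhoto(left: leftImage, right: rightImage, properties: properties as CFDictionary, to: url)`.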

You should now have a spatial photo stored at url that you can view in the Apple Vision Pro simulator. There are other property values you can specify, but the ones above are the bare minimum for getting a spatial photo to display in the simulator. The example spatial photo from Apple contains values for camera extrinsics (with a position offset indicating it was taken with a two-camera system) as well as a horizontal disparity adjustment.

Reading Spatial Photos

Reading a spatial photo is as easy as creating a CGImageSource pointing to the spatial photo URL and copying the left and right image from it:

let source = CGImageSourceCreateWithURL(url as CFURL, nil)!
let leftImage: CGImage = CGImageSourceCreateImageAtIndex(source, 0, nil)!
let rightImage: CGImage = CGImageSourceCreateImageAtIndex(source, 1, nil)!
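
The force-unwraps above will crash on a missing file or on a HEIC that holds only one image. A more defensive sketch, assuming the left image sits at index 0 and the right at index 1 as in the file written earlier (other files may order them differently), could look like this. The readSpatialPhoto name is hypothetical:

```swift
import CoreGraphics
import Foundation
import ImageIO

/// Returns the left and right images of a spatial photo, or nil if the file
/// cannot be opened or does not contain at least two images.
/// Assumes left is at index 0 and right at index 1.
func readSpatialPhoto(at url: URL) -> (left: CGImage, right: CGImage)? {
    guard let source = CGImageSourceCreateWithURL(url as CFURL, nil),
          CGImageSourceGetCount(source) >= 2,
          let left = CGImageSourceCreateImageAtIndex(source, 0, nil),
          let right = CGImageSourceCreateImageAtIndex(source, 1, nil)
    else { return nil }

    // Dump file-level and per-image metadata for inspection. Where the
    // stereo-pair group appears is not assumed here.
    if let fileProperties = CGImageSourceCopyProperties(source, nil) {
        print(fileProperties)
    }
    if let imageProperties = CGImageSourceCopyPropertiesAtIndex(source, 0, nil) {
        print(imageProperties)
    }
    return (left, right)
}
```

Printing the properties is a quick way to compare your own files against a reference spatial photo and see which metadata keys it carries.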