

The intended audience of this article is macOS C++/Obj-C developers and architects. It is assumed that the reader is familiar with object-oriented programming and design.

For brevity and clarity, thread synchronization is omitted and is not discussed in detail in this article.


The Objective-C AVFoundation framework encapsulates media processing (capture, editing, …). It is robust, well documented and covers most A/V use cases. Some edge cases, however, are not supported by this framework, for example direct access to the buffers sent out from the device. This is especially important when the payload sent out from the device is already muxed and/or compressed: in such cases AVFoundation (AVCaptureSession specifically) will de-mux and/or decode the payload before making it accessible to the user. To get direct access to the buffers sent out from the device without any intermediate intervention we have to use a lower-level API, namely CoreMediaIO.

Apple’s CoreMediaIO is a low-level C++ framework for accessing and interacting with audio/video devices such as cameras, capture cards and even mirroring sessions of iOS devices.

The problem with CoreMediaIO is the lack of documentation, and the fact that the existing sample code is old and requires quite some tinkering to compile with the latest SDKs.

In this short article I will provide simple sample code demonstrating capture and format resolution using CoreMediaIO and some AVFoundation.


The CoreMediaIO APIs are provided through “CoreMediaIO.framework”. Make sure the project links against it and that “CoreMediaIO/CMIOHardware.h” is included/imported.

The first thing we have to do in order to start capturing is to find the device of interest. If we are interested in screen capture (for example, capturing the screen of an attached iOS device) we need to enable the CoreMediaIO ‘DAL’ plug-ins. This is demonstrated in the following code snippet:

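void EnableDALDevices()
{
    // Ask the system object to expose screen capture ( DAL ) devices
    CMIOObjectPropertyAddress prop = {
        kCMIOHardwarePropertyAllowScreenCaptureDevices,
        kCMIOObjectPropertyScopeGlobal,
        kCMIOObjectPropertyElementMaster };
    UInt32 allow = 1;
    CMIOObjectSetPropertyData(kCMIOObjectSystemObject,
                            &prop, 0, NULL,
                            sizeof(allow), &allow );
}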

Some devices are added or removed at runtime. To get runtime indications of device addition or removal, an A/V capture device notification is set using the NSNotificationCenter class; the AVCaptureDevice that was added or removed is indicated by the ‘object’ property of the ‘note’ block argument. This is demonstrated by the following code snippet. Be aware that no notifications will be received unless a Run Loop is running.

NSNotificationCenter *notiCenter = [NSNotificationCenter defaultCenter];
id connObs = [notiCenter addObserverForName:AVCaptureDeviceWasConnectedNotification
                                     object:nil
                                      queue:[NSOperationQueue mainQueue]
                                 usingBlock:^(NSNotification *note)
                                            {
                                                // Device addition logic
                                                // ( 'note.object' is the AVCaptureDevice )
                                            }];

id disconnObs = [notiCenter addObserverForName:AVCaptureDeviceWasDisconnectedNotification
                                        object:nil
                                         queue:[NSOperationQueue mainQueue]
                                    usingBlock:^(NSNotification *note)
                                            {
                                                // Device removal logic
                                            }];

// Notifications are delivered only while the Run Loop is running
[[NSRunLoop mainRunLoop] run];
[notiCenter removeObserver:connObs];
[notiCenter removeObserver:disconnObs];

The next step is to enumerate the attached capture devices. This is done either using the AVCaptureDevice class of AVFoundation or directly using the CoreMediaIO C++ APIs. Each capture device provides a unique identifier; in a later code snippet, that id will be used to find the device of interest.

The code snippet below demonstrates device enumeration using AVFoundation APIs. To filter a specific type of device, use the ‘devicesWithMediaType:’ method of the AVCaptureDevice class.

// Use 'devicesWithMediaType:' to filter devices by media type
// NSArray* devs = [AVCaptureDevice devicesWithMediaType:AVMediaTypeMuxed];
NSArray* devs = [AVCaptureDevice devices];
NSLog(@"devices: %d\n", (int)[devs count]);

for(AVCaptureDevice* d in devs) {
    NSLog(@"uniqueID: %@\n", [d uniqueID]);
    NSLog(@"modelID: %@\n", [d modelID]);
    NSLog(@"description: %@\n", [d localizedName]);
}

The next step is to find the device we want to use for capture. Capture devices in CoreMediaIO are identified by a CMIODeviceID; the following code snippet demonstrates how to resolve a device’s CMIODeviceID from its unique ID, which is known a priori and provided externally.

// Reads a property of 'objID' using an explicit scope
OSStatus GetPropertyData(CMIOObjectID objID, int32_t sel, CMIOObjectPropertyScope scope,
                         UInt32 qualifierDataSize, const void* qualifierData, UInt32 dataSize,
                         UInt32& dataUsed, void* data) {
    CMIOObjectPropertyAddress addr={ (CMIOObjectPropertySelector)sel, scope,
                                     kCMIOObjectPropertyElementMaster };
    return CMIOObjectGetPropertyData(objID, &addr, qualifierDataSize, qualifierData,
                                     dataSize, &dataUsed, data);
}

// Same as above, with the scope set to 0
OSStatus GetPropertyData(CMIOObjectID objID, int32_t selector, UInt32 qualifierDataSize,
                         const void* qualifierData, UInt32 dataSize, UInt32& dataUsed,
                         void* data) {
    return GetPropertyData(objID, selector, 0, qualifierDataSize,
                         qualifierData, dataSize, dataUsed, data);
}

// Resolves the size ( in bytes ) of a property using an explicit scope
OSStatus GetPropertyDataSize(CMIOObjectID objID, int32_t sel,
                             CMIOObjectPropertyScope scope, uint32_t& size) {
    CMIOObjectPropertyAddress addr={ (CMIOObjectPropertySelector)sel, scope,
                                     kCMIOObjectPropertyElementMaster };
    return CMIOObjectGetPropertyDataSize(objID, &addr, 0, 0, &size);
}

OSStatus GetPropertyDataSize(CMIOObjectID objID, int32_t selector, uint32_t& size) {
    return GetPropertyDataSize(objID, selector, 0, size);
}

OSStatus GetNumberDevices(uint32_t& cnt) {
    if(0 != GetPropertyDataSize(kCMIOObjectSystemObject, kCMIOHardwarePropertyDevices, cnt))
        return -1;
    cnt /= sizeof(CMIODeviceID);
    return 0;
}

// On entry 'cnt' is the capacity of 'pDevs', on return it is the number of devices
OSStatus GetDevices(uint32_t& cnt, CMIODeviceID* pDevs) {
    OSStatus status;
    uint32_t numberDevices = 0, used = 0;
    if((status = GetNumberDevices(numberDevices)) < 0)
        return status;
    const uint32_t capacity = cnt;
    cnt = numberDevices;
    if(numberDevices > capacity)
        return -1;// 'pDevs' is too small
    uint32_t size = numberDevices * sizeof(CMIODeviceID);
    return GetPropertyData(kCMIOObjectSystemObject, kCMIOHardwarePropertyDevices,
                         0, NULL, size, used, pDevs);
}

// Copies 'aString' into 'pText' as a NUL terminated UTF-8 string
template< const int C_Size >
Boolean CFStringCopyUTF8String(CFStringRef aString, char (&pText)[C_Size]) {
    // Quick rejection, UTF-8 needs at least one byte per character plus the NUL
    CFIndex length = CFStringGetLength(aString);
    if(sizeof(pText) < (size_t)(length + 1))
        return false;
    // Pass the real buffer size, the call fails if the text does not fit
    return CFStringGetCString(aString, pText, sizeof(pText), kCFStringEncodingUTF8);
}

template< const int C_Size >
OSStatus GetDeviceStrProp(CMIOObjectID objID, CMIOObjectPropertySelector sel,
                         char (&pValue)[C_Size]) {
    CFStringRef answer = NULL;
    UInt32     dataUsed= 0;
    OSStatus    status = GetPropertyData(objID, sel, 0, NULL, sizeof(answer),
                                         dataUsed, &answer);
    if(0 == status && NULL != answer) {// SUCCESS
        if(!CFStringCopyUTF8String(answer, pValue))
            pValue[0] = 0;// Does not fit, return an empty string
        CFRelease(answer);// The returned CFString is owned by the caller
    }
    return status;
}

Utility methods

OSStatus FindDeviceByUniqueId(const char* pUID, CMIODeviceID& devId) {
    OSStatus status = 0;
    uint32_t numDev = 0;
    if(0 != (status = GetNumberDevices(numDev)))
        return status;
    if(0 == numDev)
        return afpObjectNotFound;// No devices at all
    // Allocate memory on the stack
    CMIODeviceID* pDevs = (CMIODeviceID*)alloca(numDev * sizeof(*pDevs));
    if(0 != (status = GetDevices(numDev, pDevs)))
        return status;
    for(uint32_t i = 0; i < numDev; i++) {
        char pUniqueID[64] = { 0 };
        if(0 != (status = GetDeviceStrProp(pDevs[i], kCMIODevicePropertyDeviceUID, pUniqueID)))
            break;
        status = afpObjectNotFound;// Not Found…
        if(0 != strcmp(pUID, pUniqueID))
            continue;// Not the device we are looking for
        devId = pDevs[i];
        return 0;
    }
    return status;
}

Device resolution by UID

CoreMediaIO capture devices expose streams. Each stream is a data source and is identified by a CMIOStreamID; one stream might provide video payload, another might provide audio payload and others might provide multiplexed payload. When capturing, we have to select a stream and start pumping out data. The following code snippet demonstrates how to enumerate the available streams of a given device (identified by its CMIODeviceID) and how to resolve the payload format.

#include <algorithm>// std::min, std::sort

uint32_t GetNumberInputStreams(CMIODeviceID devID)
{
    uint32_t size = 0;
    GetPropertyDataSize(devID, kCMIODevicePropertyStreams,
                        kCMIODevicePropertyScopeInput, size);
    return size / sizeof(CMIOStreamID);
}

// On entry 'ioNumberStreams' is the capacity of 'streamList',
// on return it is the number of streams actually stored
OSStatus GetInputStreams(CMIODeviceID devID, uint32_t&
                        ioNumberStreams, CMIOStreamID* streamList)
{
    ioNumberStreams = std::min(GetNumberInputStreams(devID), ioNumberStreams);
    uint32_t size     = ioNumberStreams * sizeof(CMIOStreamID);
    uint32_t dataUsed = 0;
    OSStatus err = GetPropertyData(devID, kCMIODevicePropertyStreams,
                                    kCMIODevicePropertyScopeInput, 0,
                                    NULL, size, dataUsed, streamList);
    if(0 != err)
        return err;
    // Use the number of bytes actually written, not the requested size
    ioNumberStreams = dataUsed / sizeof(CMIOStreamID);
    CMIOStreamID* firstItem = &(streamList[0]);
    CMIOStreamID* lastItem = firstItem + ioNumberStreams;
    std::sort(firstItem, lastItem);
    return 0;
}

Utility methods

CMIODeviceID devId;
if(0 != FindDeviceByUniqueId("4e58df701eb87", devId))
    return;// Device not found

uint32_t numStreams = GetNumberInputStreams(devId);
CMIOStreamID* pStreams = (CMIOStreamID*)alloca(numStreams * sizeof(CMIOStreamID));
GetInputStreams(devId, numStreams, pStreams);
for(uint32_t i = 0; i < numStreams; i++) {
    CMFormatDescriptionRef fmt = 0;
    uint32_t                used = 0;
    if(0 != GetPropertyData(pStreams[i], kCMIOStreamPropertyFormatDescription,
                            0, NULL, sizeof(fmt), used, &fmt) || 0 == fmt)
        continue;// Format is not available ( yet )
    const CMMediaType  mt     = CMFormatDescriptionGetMediaType(fmt);
    const FourCharCode fourcc = CMFormatDescriptionGetMediaSubType(fmt);
    // 'mt' and 'fourcc' are 4 char codes packed into 32 bit integers, unpack
    // them ( most significant byte first ) into NUL terminated strings so
    // they could be printed.
    const char pMt[5]  = { char(mt >> 24), char(mt >> 16), char(mt >> 8), char(mt), 0 };
    const char pSub[5] = { char(fourcc >> 24), char(fourcc >> 16),
                           char(fourcc >> 8), char(fourcc), 0 };
    printf("media type: %s\nmedia sub type: %s\n", pMt, pSub);
    CFRelease(fmt);// The returned format description is owned by the caller
}

Stream format resolution
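
The media type and sub type handled above are four-character codes packed into 32-bit integers. As a minimal, platform-independent sketch (the helper name ‘FourCCToString’ is hypothetical and is not part of CoreMediaIO), such a code could be unpacked into a printable string as follows:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not part of CoreMediaIO: unpacks a 32-bit
// four-character code into a string, most significant byte first.
std::string FourCCToString(uint32_t code) {
    std::string text(4, ' ');
    for (int i = 0; i < 4; ++i) {
        const unsigned char c =
            static_cast<unsigned char>(code >> (24 - 8 * i));
        // Replace non-printable bytes so the result is always safe to log
        text[i] = (c >= 0x20 && c < 0x7F) ? static_cast<char>(c) : '?';
    }
    return text;
}
```

Shifting the integer instead of casting its address to ‘char*’ keeps the character order independent of the byte order of the machine, and non-printable bytes are replaced so that unusual codes can still be logged.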

The next and final stage is to start pumping data out of the stream. This is done by registering a callback that CoreMediaIO calls with the sampled payload. The following code snippet demonstrates how this is done and how to get access to the raw payload bytes.

CMSimpleQueueRef    queueRef = 0;// The queue that will be used to
                                 // process the incoming data
CMIOStreamCopyBufferQueue(strmID, [](CMIOStreamID streamID, void*, void* refCon) {
    // The callback ( lambda in our case ) being called by CoreMediaIO
    CMSimpleQueueRef queueRef = *(CMSimpleQueueRef*)refCon;
    CMSampleBufferRef sb = 0;
    while(0 != (sb = (CMSampleBufferRef)CMSimpleQueueDequeue(queueRef))) {
        size_t            len     = 0;// The 'len' of our payload
        size_t            lenTotal = 0;
        char*             pPayload = 0;// This is where the RAW media
                                     // data will be stored
        const CMTime     ts         = CMSampleBufferGetOutputPresentationTimeStamp(sb);
        const double     dSecTime = (double)ts.value / (double)ts.timescale;
        CMBlockBufferRef bufRef     = CMSampleBufferGetDataBuffer(sb);
        if(0 != bufRef && kCMBlockBufferNoErr ==
           CMBlockBufferGetDataPointer(bufRef, 0, &len, &lenTotal, &pPayload)) {
            assert(len == lenTotal);
            // TBD: Process 'len' bytes of 'pPayload' ( presented at 'dSecTime' )
        }
        CFRelease(sb);// Dequeued sample buffers must be released by the consumer
    }
}, &queueRef, &queueRef);
// Samples start flowing once the stream is started, e.g. by calling
// CMIODeviceStartStream(devId, strmID)

One last thing to note: in more than a few cases the actual capture format is not available until the first sample is sent. In such cases it should be resolved upon reception of the first sample. The following code snippet demonstrates how to resolve the audio sample format using a CMSampleBufferRef; the same can be done for video and other media types with a little more effort.

bool PrintAudioFormat(CMSampleBufferRef sb)
{
    CMFormatDescriptionRef    fmt    = CMSampleBufferGetFormatDescription(sb);
    if(0 == fmt)
        return false;// The sample carries no format description
    CMMediaType                mt    = CMFormatDescriptionGetMediaType(fmt);

    if(kCMMediaType_Audio != mt) {
        printf("Not an audio sample\n");
        return false;
    }
    CMAudioFormatDescriptionRef afmt = (CMAudioFormatDescriptionRef)fmt;
    const auto pAud = CMAudioFormatDescriptionGetStreamBasicDescription(afmt);
    if(0 == pAud)
        return false;
    // We are expecting PCM Audio
    if('lpcm' != pAud->mFormatID)// 'pAud->mFormatID' == fourCC
        return false;// Not a supported format
    printf("mChannelsPerFrame: %u\nmSampleRate: %.1f\n"
           "mBytesPerFrame: %u\nmBitsPerChannel: %u\n",
           (unsigned)pAud->mChannelsPerFrame, pAud->mSampleRate,
           (unsigned)pAud->mBytesPerFrame, (unsigned)pAud->mBitsPerChannel);
    return true;
}

Final words

What is provided in this article is just a glimpse of what is doable with CoreMediaIO; further information can be found in the reference links below.


CoreMediaIO, AVFoundation, AVCaptureSession, NSNotificationCenter, Run Loop, AVCaptureDevice


MP4 is a widely used container format for multimedia files. It is derived from Apple’s QuickTime file format and is agnostic to the actual codec used for encoding. It can contain multiple streams of video, audio and data (e.g. subtitles).

MP4 files are broken into two main parts: the payload part, where interleaved audio and video are stored, and the metadata part, where information describing the payload is stored. That information consists, for example, of the available streams, their payload format/compression type, …

So what are we trying to solve?

A piece of MP4 metadata of specific importance is the file Index. The Index points to the offset in the file where the payload (e.g. video) of a specific time is found; this way, the player knows where the payload for the first video frame is found and what data to play at any given time.

The following presents a high-level view of the MP4 file structure:

When recording a video file, the duration of the file and the amount of recorded data can (obviously) be known only once recording has finished, and thus the Index is stored at the end of the file.

MP4 files are commonly used on web sites for video playback. To play the file, a player (e.g. a web browser) must read the file from the remote site; files are read sequentially, starting at offset zero.

A player must read the Index before processing any video payload, and thus must read the file up to its end (where the Index resides) before being able to present the first video frame. For big MP4 files, this limitation might cause playback to start a long time after the play button was actually clicked, resulting in a poor user experience.

In this article I will show how to reduce playback latency to a minimum by moving the metadata chunk from the end of the file to its start, making it available for the player to consume before the first video payload is read, and thus enabling playback to commence before the file has been fully downloaded to the client machine (also known as progressive download).

Basic File structure

According to Chapter 1 (page 17) of the QuickTime file format specification, MP4 files are built of chunks of data called ATOMs. Each ATOM carries an id (uuid) identifying its type and a size (in bytes).

Some ATOMs contain data, while others contain a set of child ATOMs.
ATOMs can have a ‘size’ indicator of either 32 bits or 64 bits. In this article we assume a 64-bit size indicator; the following is the 64-bit ATOM structure:

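    // Logical view of an ATOM header. In the file itself a 32-bit size and the
    // type come first; when a 64-bit size is used it follows the type.
    struct ATOM {
        UINT64	size;       // Size of the ATOM in bytes, header included
        union {
            UINT uuid;      // The ATOM type as a 32-bit integer...
            CHAR name[4];   // ...or as four characters, e.g. 'moov'
        } type;
    };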
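
To make the ATOM layout more concrete, the following is a minimal sketch of reading ATOM headers from a memory buffer and listing the top-level ATOMs of a file. The names ‘AtomInfo’, ‘ParseAtomHeader’ and ‘ListTopLevelAtoms’ are hypothetical and are not taken from the project. The sketch assumes the on-disk header layout of the QuickTime/MP4 formats: a big-endian 32-bit size followed by the 4-byte type, where a size of 1 means a 64-bit size follows the type and a size of 0 means the ATOM extends to the end of the file.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, illustrative types and helpers (not taken from the project).
struct AtomInfo {
    std::string type;        // Four-character type, e.g. "moov"
    uint64_t    offset;      // Position of the ATOM header in the buffer
    uint64_t    size;        // Total ATOM size in bytes, header included
    uint32_t    headerSize;  // 8, or 16 when a 64-bit size is present
};

static uint32_t ReadBE32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}

static uint64_t ReadBE64(const uint8_t* p) {
    return (uint64_t(ReadBE32(p)) << 32) | ReadBE32(p + 4);
}

// Parses the ATOM header found at 'offset'.
// Returns false when the data is truncated or the size is invalid.
bool ParseAtomHeader(const uint8_t* data, uint64_t length,
                     uint64_t offset, AtomInfo& out) {
    if (offset > length || length - offset < 8)
        return false;
    const uint8_t* p = data + offset;
    uint64_t size = ReadBE32(p);
    uint32_t headerSize = 8;
    if (size == 1) {            // A 64-bit size follows the type field
        if (length - offset < 16)
            return false;
        size = ReadBE64(p + 8);
        headerSize = 16;
    } else if (size == 0) {     // The ATOM extends to the end of the file
        size = length - offset;
    }
    if (size < headerSize || size > length - offset)
        return false;
    out.type.assign(reinterpret_cast<const char*>(p + 4), 4);
    out.offset = offset;
    out.size = size;
    out.headerSize = headerSize;
    return true;
}

// Walks the buffer from offset zero and collects the top-level ATOMs.
std::vector<AtomInfo> ListTopLevelAtoms(const uint8_t* data, uint64_t length) {
    std::vector<AtomInfo> atoms;
    uint64_t offset = 0;
    AtomInfo atom;
    while (offset < length && ParseAtomHeader(data, length, offset, atom)) {
        atoms.push_back(atom);
        offset += atom.size;  // Sizes include the header, so this always advances
    }
    return atoms;
}
```

With such a listing, a file whose ‘moov’ ATOM appears after its ‘mdat’ ATOM can be recognized as a candidate for the relocation described in this article.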

The following figure presents a typical ATOM hierarchy:

There are three types of atoms we will need to deal with specifically:

‘mdat’ This atom holds the raw media data, such as compressed video and audio samples. The media data is stored according to time in an interleaved fashion, as can be seen in the accompanying figure.
‘moov’ Holds all metadata related to the media file; “it is essentially a container of other atoms. These atoms, taken together, describe the contents of a movie. At the highest level, movie atoms typically contain track atoms, which in turn contain media atoms. At the lowest level are the leaf atoms, which contain non-atom data, usually in the form of a table or a set of data elements. For example, a track atom contains an edit atom, which in turn contains an edit list atom, a leaf atom which contains data in the form of an edit list table.”
‘stco’ An indirect child of the ‘moov’ atom, available on a per media stream (‘trak’ atom) basis. It holds the offsets of the media payload within the ‘mdat’ section. The following simplified diagram presents a possible configuration:


Moving the metadata (‘moov’) ATOM to the beginning of the file pushes the ‘mdat’ ATOM further into the file by the size of the ‘moov’ ATOM, so the ‘stco’ offsets must be modified to align with the new ‘mdat’ position.

The process is therefore finalized by iterating through all of the ‘stco’ ATOMs and updating their offsets after the ‘moov’ ATOM has been moved to the beginning of the file.
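
The offset update can be sketched as follows. This is a minimal illustration under stated assumptions and not the project’s actual code: the function name ‘ShiftChunkOffsets’ is hypothetical, and the ‘stco’ ATOM is assumed to have the standard layout of an 8-byte header, 4 bytes of version/flags, a big-endian 32-bit entry count and then one big-endian 32-bit offset per entry. ATOMs holding 64-bit offsets (‘co64’) are not handled.

```cpp
#include <cstddef>
#include <cstdint>

static uint32_t LoadBE32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}

static void StoreBE32(uint8_t* p, uint32_t v) {
    p[0] = uint8_t(v >> 24);
    p[1] = uint8_t(v >> 16);
    p[2] = uint8_t(v >> 8);
    p[3] = uint8_t(v);
}

// Hypothetical helper: adds 'delta' (e.g. the size of the relocated 'moov'
// ATOM) to every entry of a 'stco' ATOM. 'stco' points at the first byte of
// the ATOM (its size field) and 'atomSize' is the total ATOM size in bytes.
// Returns false, leaving the table untouched, on malformed data or overflow.
bool ShiftChunkOffsets(uint8_t* stco, size_t atomSize, uint32_t delta) {
    const size_t kTableStart = 16;  // 8 header + 4 version/flags + 4 count
    if (atomSize < kTableStart)
        return false;
    const uint32_t count = LoadBE32(stco + 12);
    if (count > (atomSize - kTableStart) / 4)
        return false;               // The table does not fit in the ATOM
    // First pass: make sure no 32-bit offset would overflow
    for (uint32_t i = 0; i < count; ++i) {
        if (LoadBE32(stco + kTableStart + size_t(i) * 4) > UINT32_MAX - delta)
            return false;
    }
    // Second pass: rewrite the offsets in place
    for (uint32_t i = 0; i < count; ++i) {
        uint8_t* entry = stco + kTableStart + size_t(i) * 4;
        StoreBE32(entry, LoadBE32(entry) + delta);
    }
    return true;
}
```

Validating the whole table before writing keeps the file consistent: either every offset is shifted or none is.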

The project consists of a single ‘.cpp’ file implementing the logic described in this article. For simplicity, memory-mapped files are used for file access and modification.

Although the code was developed on Windows, the implementation was kept as simple as possible so that it can easily be ported to any platform.


QuickTime file format
Movie Atoms
MP4 Spec