MacOS, Media Capture using CoreMediaIO

Posted: July 6, 2015 in MacOS Programming, Video Streaming

Perspective

The intended audience of this article is MacOS C++/Obj-C developers and architects. It is assumed that the reader is familiar with object-oriented programming and design.

For the purpose of brevity and clarity, thread-synchronization aspects are omitted and not discussed in detail in this article.

Introduction

The Objective-C AVFoundation framework encapsulates media processing ( capture, editing, … ). It is robust, well documented, and covers most of the A/V use-cases. However, some edge use-cases are not supported by this framework, for example, being able to directly access the buffers sent out from the device. This is specifically important when the payload sent out from the device is already muxed and/or compressed; in such cases, AVFoundation ( AVCaptureSession specifically ) will de-mux and/or decode the payload before making it accessible to the user. To get direct access to the buffers sent out from the device without any intermediate intervention, we have to use a lower-level API, namely, CoreMediaIO.

Apple's CoreMediaIO is a low-level C framework for accessing and interacting with audio/video devices such as cameras, capture cards and even mirroring sessions of iOS devices.

The problem with CoreMediaIO is its lack of documentation, and the fact that the existing sample code is old and requires quite some tinkering to get it compiling with the latest SDKs.

In this short article I will provide simple sample code demonstrating capture and format resolution using CoreMediaIO and some AVFoundation.

Implementation

The CoreMediaIO API is provided through “CoreMediaIO.framework”; make sure to have it linked by the project, and to have “CoreMediaIO/CMIOHardware.h” included/imported.
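
For reference, the following sketch shows a minimal set of includes ( and an example build invocation ) covering the snippets in this article; the exact compiler flags are my own assumption and may need adjusting for your setup:

// Build ( for example ):
//   clang++ -std=c++11 -x objective-c++ capture.mm -framework CoreMediaIO \
//           -framework CoreMedia -framework CoreFoundation -framework CoreServices \
//           -framework AVFoundation -framework Foundation
#include <CoreMediaIO/CMIOHardware.h>  // CMIO objects, devices & streams
#include <CoreMedia/CoreMedia.h>       // CMSampleBuffer / CMFormatDescription
#include <CoreServices/CoreServices.h> // MacErrors ( afpObjectNotFound )
#import  <AVFoundation/AVFoundation.h> // AVCaptureDevice enumeration
#include <algorithm>                   // std::min / std::sort
#include <cassert>                     // assert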

The first thing we have to do in order to be able to start capturing is to find the device of interest. If we are interested in screen capture ( for example capturing the screen of an attached iOS device ) we need to enable the CoreMediaIO ‘DAL’ plug-ins, as demonstrated in the following code snippet:

void EnableDALDevices()
{
    // Opt-in for 'DAL' plug-in devices ( e.g. iOS screen mirroring );
    // without this, such devices will not be enumerated
    CMIOObjectPropertyAddress prop = {
        kCMIOHardwarePropertyAllowScreenCaptureDevices,
        kCMIOObjectPropertyScopeGlobal,
        kCMIOObjectPropertyElementMaster
    };

    UInt32 allow = 1;
    CMIOObjectSetPropertyData(kCMIOObjectSystemObject,
                            &prop, 0, NULL,
                            sizeof(allow), &allow );
}

Some devices are added or removed at runtime. To get runtime indications of device addition or removal, an A/V capture device notification is set up using the NSNotificationCenter class; the added/removed AVCaptureDevice is indicated by the ‘object‘ property of the ‘note‘ ^block argument. This is demonstrated by the following code snippet. Be aware that no notifications will be received unless a Run Loop is executing.

NSNotificationCenter *notiCenter = [NSNotificationCenter defaultCenter];
id connObs =[notiCenter addObserverForName:AVCaptureDeviceWasConnectedNotification
                                    object:nil
                                     queue:[NSOperationQueue mainQueue]
                                usingBlock:^(NSNotification *note)
                                            {
                                                // Device addition logic
                                            }];

id disconnObs =[notiCenter addObserverForName:AVCaptureDeviceWasDisconnectedNotification
                                     object:nil
                                        queue:[NSOperationQueue mainQueue]
                                 usingBlock:^(NSNotification *note)
                                            {
                                                // Device removal logic
                                            }];

[[NSRunLoop mainRunLoop] run];
[notiCenter removeObserver:connObs];
[notiCenter removeObserver:disconnObs];

The next step is to enumerate the attached capture devices. This is done either using the AVCaptureDevice class of AVFoundation or directly using the CoreMediaIO C APIs. Each capture device provides a unique identifier; in the next code snippets, that id will be used to find the device of interest.

The code snippet below demonstrates device enumeration using the AVFoundation APIs. To filter a specific type of device, use the ‘devicesWithMediaType:’ method of the AVCaptureDevice class.

// Use 'devicesWithMediaType:' to filter devs by media type
// NSArray* devs = [AVCaptureDevice devicesWithMediaType:AVMediaTypeMuxed];
NSArray* devs = [AVCaptureDevice devices];
NSLog(@"devices: %d\n", (int)[devs count]);

for(AVCaptureDevice* d in devs) {
    NSLog(@"uniqueID: %@\n", [d uniqueID]);
    NSLog(@"modelID: %@\n", [d modelID]);
    NSLog(@"description: %@\n", [d localizedName]);
}

The next step is to find the device we want to use for capture. Capture devices in CoreMediaIO are identified by a CMIODeviceID; the following code snippet demonstrates how to resolve a device's CMIODeviceID according to its unique ID, which is a-priori known and externally provided.

OSStatus GetPropertyData(CMIOObjectID objID, int32_t sel, CMIOObjectPropertyScope scope,
                         UInt32 qualifierDataSize, const void* qualifierData, UInt32 dataSize,
                         UInt32& dataUsed, void* data) {
    CMIOObjectPropertyAddress addr={ (CMIOObjectPropertySelector)sel, scope,
                                     kCMIOObjectPropertyElementMaster };
    return CMIOObjectGetPropertyData(objID, &addr, qualifierDataSize, qualifierData,
                                     dataSize, &dataUsed, data);
}

OSStatus GetPropertyData(CMIOObjectID objID, int32_t selector, UInt32 qualifierDataSize,
                         const void* qualifierData, UInt32 dataSize, UInt32& dataUsed,
                         void* data) {
    return GetPropertyData(objID, selector, kCMIOObjectPropertyScopeGlobal,
                         qualifierDataSize, qualifierData, dataSize, dataUsed, data);
}

OSStatus GetPropertyDataSize(CMIOObjectID objID, int32_t sel,
                             CMIOObjectPropertyScope scope, uint32_t& size) {
    CMIOObjectPropertyAddress addr={ (CMIOObjectPropertySelector)sel, scope,
                                     kCMIOObjectPropertyElementMaster };
    return CMIOObjectGetPropertyDataSize(objID, &addr, 0, 0, &size);
}

OSStatus GetPropertyDataSize(CMIOObjectID objID, int32_t selector, uint32_t& size) {
    return GetPropertyDataSize(objID, selector, kCMIOObjectPropertyScopeGlobal, size);
}

OSStatus GetNumberDevices(uint32_t& cnt) {
    if(0 != GetPropertyDataSize(kCMIOObjectSystemObject, kCMIOHardwarePropertyDevices, cnt))
        return -1;
    cnt /= sizeof(CMIODeviceID);
    return 0;
}

OSStatus GetDevices(uint32_t& cnt, CMIODeviceID* pDevs) {
    OSStatus status;
    uint32_t numberDevices = 0, used = 0;
    if((status = GetNumberDevices(numberDevices)) < 0)
        return status;
    if(numberDevices > cnt)// The caller provided buffer is too small
        return -1;
    cnt = numberDevices;
    uint32_t size = numberDevices * sizeof(CMIODeviceID);
    return GetPropertyData(kCMIOObjectSystemObject, kCMIOHardwarePropertyDevices,
                         0, NULL, size, used, pDevs);
}

template< const int C_Size >
Boolean CFStringCopyUTF8String(CFStringRef aString, char (&pText)[C_Size]) {
    // 'CFStringGetCString' receives the size of the destination buffer
    // and fails if the UTF8 representation does not fit
    return CFStringGetCString(aString, pText, sizeof(pText), kCFStringEncodingUTF8);
}

template< const int C_Size >
OSStatus GetDeviceStrProp(CMIOObjectID objID, CMIOObjectPropertySelector sel,
                         char (&pValue)[C_Size]) {
    CFStringRef answer = NULL;
    UInt32     dataUsed= 0;
    OSStatus    status = GetPropertyData(objID, sel, 0, NULL, sizeof(answer),
                                         dataUsed, &answer);
    if(0 == status) {// SUCCESS
        CFStringCopyUTF8String(answer, pValue);
        CFRelease(answer);// The returned string is owned by us, release it
    }
    return status;
}

Utility methods

OSStatus FindDeviceByUniqueId(const char* pUID, CMIODeviceID& devId) {
    OSStatus status = 0;
    uint32_t numDev = 0;
    if(((status = GetNumberDevices(numDev)) < 0) || (0 == numDev))
        return status;
    // Allocate memory on the stack
    CMIODeviceID* pDevs = (CMIODeviceID*)alloca(numDev * sizeof(*pDevs));
    if((status = GetDevices(numDev, pDevs)) < 0)
        return status;
    for(uint32_t i = 0; i < numDev; i++) {
        char pUniqueID[64];
        if((status = GetDeviceStrProp(pDevs[i], kCMIODevicePropertyDeviceUID, pUniqueID)) < 0)
            break;
        status = afpObjectNotFound;// Not Found…
        if(0 != strcmp(pUID, pUniqueID))
            continue;
        devId = pDevs[i];
        return 0;
    }
    return status;
}

Device resolution by UID

CoreMediaIO capture devices expose streams; each such stream is a data source and is indicated by a CMIOStreamID. One stream might provide video payload, another audio payload, and others might provide multiplexed payload. While capturing, we have to select a stream and start pumping out its data. The following code snippet demonstrates how to enumerate the available streams of a given device ( indicated by its CMIODeviceID ) and how to resolve the payload format.

uint32_t GetNumberInputStreams(CMIODeviceID devID)
{
    uint32_t size = 0;
    GetPropertyDataSize(devID, kCMIODevicePropertyStreams,
                        kCMIODevicePropertyScopeInput, size);
    return size / sizeof(CMIOStreamID);
}

OSStatus GetInputStreams(CMIODeviceID devID, uint32_t&
                        ioNumberStreams, CMIOStreamID* streamList)
{
    ioNumberStreams = std::min(GetNumberInputStreams(devID), ioNumberStreams);
    uint32_t size     = ioNumberStreams * sizeof(CMIOStreamID);
    uint32_t dataUsed = 0;
    OSStatus err = GetPropertyData(devID, kCMIODevicePropertyStreams,
                                    kCMIODevicePropertyScopeInput, 0,
                                    NULL, size, dataUsed, streamList);
    if(0 != err)
        return err;
    ioNumberStreams = dataUsed / sizeof(CMIOStreamID);// Streams actually returned
    CMIOStreamID* firstItem = &(streamList[0]);
    CMIOStreamID* lastItem = firstItem + ioNumberStreams;
    std::sort(firstItem, lastItem);
    return 0;
}

Utility methods

CMIODeviceID devId;
FindDeviceByUniqueId("4e58df701eb87", devId);

uint32_t numStreams = GetNumberInputStreams(devId);
CMIOStreamID* pStreams = (CMIOStreamID*)alloca(numStreams * sizeof(CMIOStreamID));
GetInputStreams(devId, numStreams, pStreams);
for(uint32_t i = 0; i < numStreams; i++) {
    CMFormatDescriptionRef fmt = 0;
    uint32_t               used;
    GetPropertyData(pStreams[i], kCMIOStreamPropertyFormatDescription,
                    0, NULL, sizeof(fmt), used, &fmt);
    CMMediaType  mt    = CMFormatDescriptionGetMediaType(fmt);
    FourCharCode fourcc= CMFormatDescriptionGetMediaSubType(fmt);
    // 'mt' and 'fourcc' are big-endian 4-char codes, unpack them into
    // null-terminated strings so they can be printed
    char mtStr[5]  = { (char)(mt>>24),     (char)(mt>>16),
                       (char)(mt>>8),      (char)mt,     0 };
    char fccStr[5] = { (char)(fourcc>>24), (char)(fourcc>>16),
                       (char)(fourcc>>8),  (char)fourcc, 0 };
    printf("media type: %s\nmedia sub type: %s\n", mtStr, fccStr);
    CFRelease(fmt);// The format description is owned by us, release it
}

Stream format resolution

The next and final stage is to start pumping data out of the stream. This is done by registering a callback to be called by CoreMediaIO with the sampled payload; the following code snippet demonstrates how this is done and how to get access to the raw payload bytes. Note that registering the queue alone is not enough, the stream must also be started, as sketched right after the snippet.

CMSimpleQueueRef    queueRef = 0;// The queue that will be used to
                                 // process the incoming data
CMIOStreamCopyBufferQueue(strmID, [](CMIOStreamID streamID, void*, void* refCon) {
    // The callback ( lambda in our case ) being called by CoreMediaIO
    CMSimpleQueueRef queueRef = *(CMSimpleQueueRef*)refCon;
    CMSampleBufferRef sb = 0;
    while(0 != (sb = (CMSampleBufferRef)CMSimpleQueueDequeue(queueRef))) {
        size_t           len      = 0;// The 'len' of our payload
        size_t           lenTotal = 0;
        char*            pPayload = 0;// This is where the RAW media
                                      // data will be stored
        const CMTime     ts       = CMSampleBufferGetOutputPresentationTimeStamp(sb);
        const double     dSecTime = (double)ts.value / (double)ts.timescale;
        CMBlockBufferRef bufRef   = CMSampleBufferGetDataBuffer(sb);
        CMBlockBufferGetDataPointer(bufRef, 0, &len, &lenTotal, &pPayload);
        assert(len == lenTotal);// Holds only for a contiguous block buffer
        // TBD: Process 'len' bytes of 'pPayload'
        CFRelease(sb);// The dequeued sample is owned by us, release it
    }
}, &queueRef, &queueRef);
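
A minimal sketch of starting and stopping the stream ( assuming ‘devId‘ and ‘strmID‘ were resolved as shown above ):

// Start pumping samples into the queue registered above
OSStatus status = CMIODeviceStartStream(devId, strmID);
assert(noErr == status);// TBD: Proper error handling

// ... samples are now delivered through the registered callback ...

// When done, stop the stream
CMIODeviceStopStream(devId, strmID);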

One last thing to note: in more than a few cases the actual capture format is not available until the first sample is sent; in such cases it should be resolved upon first sample reception. The following code snippet demonstrates how to resolve the audio sample format using a CMSampleBufferRef; the same can be done for video and other media types with a little more effort, as sketched further below.

bool PrintAudioFormat(CMSampleBufferRef sb)
{
    CMFormatDescriptionRef fmt = CMSampleBufferGetFormatDescription(sb);
    CMMediaType            mt  = CMFormatDescriptionGetMediaType(fmt);

    if(kCMMediaType_Audio != mt) {
        printf("Not an audio sample\n");
        return false;
    }

    CMAudioFormatDescriptionRef afmt = (CMAudioFormatDescriptionRef)fmt;
    const auto pAud = CMAudioFormatDescriptionGetStreamBasicDescription(afmt);
    if(0 == pAud)
        return false;
    // We are expecting PCM Audio
    if('lpcm' != pAud->mFormatID)// 'pAud->mFormatID' == fourCC
        return false;// Not a supported format
    printf("mChannelsPerFrame: %d\nmSampleRate: %.1f\n"\
           "mBytesPerFrame: %d\nmBitsPerChannel: %d\n",
         pAud->mChannelsPerFrame, pAud->mSampleRate,
         pAud->mBytesPerFrame, pAud->mBitsPerChannel);
    return true;
}
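
The video case can be handled in a similar manner; the following sketch ( the ‘PrintVideoFormat‘ helper is just an illustration following the same pattern ) resolves the codec fourCC and the frame dimensions:

bool PrintVideoFormat(CMSampleBufferRef sb)
{
    CMFormatDescriptionRef fmt = CMSampleBufferGetFormatDescription(sb);
    if((0 == fmt) || (kCMMediaType_Video != CMFormatDescriptionGetMediaType(fmt)))
        return false;// Not a video sample
    // Width/Height in encoded pixels
    const CMVideoDimensions dims =
        CMVideoFormatDescriptionGetDimensions((CMVideoFormatDescriptionRef)fmt);
    const FourCharCode codec = CMFormatDescriptionGetMediaSubType(fmt);// e.g. 'avc1'
    // Unpack the big-endian 4-char code so it can be printed
    char codecStr[5] = { (char)(codec>>24), (char)(codec>>16),
                         (char)(codec>>8),  (char)codec, 0 };
    printf("codec: %s\nwidth: %d\nheight: %d\n", codecStr, dims.width, dims.height);
    return true;
}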

Final words

What is provided in this article is just a glimpse of what is doable with CoreMediaIO; further information can be found in the reference links below.

References

CoreMediaIO, AVFoundation, AVCaptureSession, NSNotificationCenter, Run Loop, AVCaptureDevice

Comments
  1. Dhanesh says:

    I am following this blog as I too want to mirror the iOS device screen. But as I go through this, I have a few doubts,

    You only used AVCaptureDevice for enumeration, can't we use the same for screen mirroring?

    • nadavrub says:

      When enabling mirroring devices ( using ‘EnableDALDevices()’ ) the devices are not immediately added and [AVCaptureDevice devices] doesn’t list them right away; this is why ‘NSNotificationCenter’ is to be used to ~know~ when mirroring devices are available/loaded. From that point on you can use [AVCaptureDevice devices] to get the mirroring devices ( along with all other capture devices ).

      NOTE: As of the time this blog was written, OS-X Device Mirroring is bogus, see the following link for details: http://stackoverflow.com/questions/32295832/coremediaio-an-alleged-os-bug-memory-leak

  2. Hi Nadav,

    Thanks a lot for posting this, I think this is the only place on the whole net that shows some code for how to mirror an iOS device screen on OSX.

    Following your code example I reached the CMIOStreamCopyBufferQueue callback and am now trying to understand how to decode the “muxx” / “isr” media format that is returned from my iPad into something like a frame buffer and then to JPG.

    Any chance that you can guide me on how to achieve this ?

    • nadavrub says:

      The video coming through with the callback is H.264 encoded ( Annex B ). To make use of it you must extract the SPS and PPS NALUs, then have these NALUs passed through to your H.264 decoder of choice as the first packets, followed by the rest of the H.264 NALUs ( see the sketch below ). If dealing with DShow/MF, the SPS & PPS should be part of the media-type. As for the audio, it’s AAC, simply pass it through.

      However, if the only thing you need is snapping a JPEG image I recommend you enable the mirroring devices as described in the post, use the notification center to get informed when the mirroring device was added; from that point on it can be treated as a normal capture device, see the AVRecorder sample for standard capture device usage.
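
      A minimal Annex-B scanning sketch ( just an illustration, assuming a raw
      H.264 elementary-stream buffer ) for locating the SPS/PPS NALUs:

      void ScanAnnexB(const uint8_t* p, size_t len)
      {
          size_t pos = 0;
          while(pos + 4 < len) {
              // Find the next 00 00 01 / 00 00 00 01 start code
              if(!((0 == p[pos]) && (0 == p[pos+1]) &&
                  ((1 == p[pos+2]) || ((0 == p[pos+2]) && (1 == p[pos+3]))))) {
                  ++pos;
                  continue;
              }
              size_t  naluStart = pos + ((1 == p[pos+2]) ? 3 : 4);
              uint8_t naluType  = p[naluStart] & 0x1F;// Lower 5 bits of the header
              if(7 == naluType)      printf("SPS @ %zu\n", naluStart);
              else if(8 == naluType) printf("PPS @ %zu\n", naluStart);
              pos = naluStart;
          }
      }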

  3. Hi Nadav, great post… actually the only one all over the net that deals with mirroring iOS to OSX. Any chance that you can post an example of how to decode a “muxx” / “isr” buffer received from an iOS device in the CMIOStreamCopyBufferQueue callback, to something like JPG images for each frame?

  4. dinhnguyen says:

    Hi, can we use this approach for recording multiple devices at a time? I tried to create 2 instances of the app to record 2 devices plugged in, but only one is recorded.

    • nadavrub says:

      Yeah, that is the limitation; OS-X limits mirroring to a single device at the driver level. If you want to work around it you can either implement CarPlay ( mandates MFi membership & incorporation of Apple's authentication co-processor ), OR, reverse engineer the mirroring protocol ( should be similar to the AirPlay protocol, which has already been reverse-engineered ).
