This article is intended for developers with C++ and Objective-C experience; basic knowledge of the iOS operating system is assumed.


In this article I will describe an approach enabling code injection into an existing AppStore application. In a nutshell, this involves breaking the FairPlay encryption, injecting code, and re-signing and re-packaging the app.

Once the process described in this article is complete, the app ( with the injected code on board ) can be freely installed on any type of device, whether jail-broken or not.

High-level Flow diagram


Prerequisites

  • A developer account with Apple
    This is required so that, after injecting the dynamic library, the application can be re-packaged and installed on the iOS device; use this link to create one.
  • Xcode 6 ( supporting iOS framework compilation )
    With Xcode 6 it is possible to create custom frameworks, which are, in essence, dynamic libraries.
  • Two iOS devices
    Two devices are required for the process: one that will run the patched app, and another, jail-broken device with Cydia installed, where FairPlay is removed.
  • Mach-O View
    An application used to parse Apple executable files; use this link to download it.

The Mach-O File format

Executable files on OS-X and iOS are stored using the Mach-O format. This executable format is specific to Apple and is used to store multiple instances of the binary code, each using a different instruction set ( x86, ARM, … ), along with metadata telling the OS how the code is to be executed.

The Mach-O file format consists of a header followed by a set of load commands and one or more segments. Of specific interest is the set of load commands, which I will cover later in detail.
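To make the layout concrete, here is a minimal sketch of walking those load commands; the two structures are stripped-down stand-ins for the real <mach-o/loader.h> definitions ( an assumption made only to keep the snippet self-contained — on OS-X, include the real header instead ):

```cpp
#include <cstdint>
#include <vector>

// Minimal stand-ins for the <mach-o/loader.h> structures used below;
// on OS-X include the real header instead
struct mach_header_64 {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};
struct load_command { uint32_t cmd, cmdsize; };

// Walk the load commands located immediately after the header; each
// command states its own size, so we simply hop 'cmdsize' bytes forward
std::vector<uint32_t> ListCommandTypes(const uint8_t* pImage) {
    const mach_header_64* pHdr = (const mach_header_64*)pImage;
    const uint8_t* pPtr = pImage + sizeof(mach_header_64);
    std::vector<uint32_t> cmds;
    for (uint32_t i = 0; i < pHdr->ncmds; ++i) {
        const load_command* pCmd = (const load_command*)pPtr;
        cmds.push_back(pCmd->cmd);
        pPtr += pCmd->cmdsize;
    }
    return cmds;
}
```

The same hop-by-cmdsize iteration is the basis of the encryption check and the injection code presented later in the article.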

Application Packaging

Applications are packaged in bundles. A bundle is a ZIP file with an ‘.ipa’ extension; it contains the program/framework executable and all of the related resources such as strings, images, and NIBs, and, most importantly, the developer certificates and application entitlements.

A simple IPA is presented below. This IPA consists of the Mach-O executable ( named Tester ), the program resources, located under the Base.lproj folder, and the provisioning profile ( named embedded.mobileprovision ), where the authorized devices are defined.

It also contains the entitlements file ( named archived-expanded-entitlements.xcent ), where specific application capabilities and/or security permissions are defined, and ‘Info.plist‘, where the bundle id and other application-specific properties are set. Later on, these are used to re-sign and re-pack the IPA with the injected code.

FairPlay Workaround

FairPlay is Apple's DRM used to protect applications downloaded from the AppStore; it prevents execution of the application on unauthorized devices.

FairPlay removal requires a jail-broken device with Cydia and the Clutch tool installed. I have found this site to contain the most up-to-date iOS jailbreaks, and high-level instructions for the process are provided by this tutorial. Next, I will explain the FairPlay Mach-O protection, how Clutch removes the encryption, and why a jail-broken device is needed.

Mach-O executables are signed with the developer certificate. This certificate is used in conjunction with other OS information during FairPlay decryption. The code injection process mandates Mach-O modifications, and these, in turn, mandate re-signing with a different certificate ( e.g. a custom developer certificate ); this is why FairPlay must be removed before making any change to the Mach-O executable.

FairPlay encrypts part of the code segment of the Mach-O executable. The encrypted region is indicated using the LC_ENCRYPTION_INFO load command ( consisting of the ‘encryption_info_command’ structure ), as can be seen in the following Mach-O View snapshot:

Having ‘encryption_info_command::cryptid’ set to a value different than zero indicates that the Mach-O executable is FairPlay protected.
During Mach-O loading, the operating system decrypts the protected region, and thus the binary code resides unencrypted in memory during execution. Clutch takes advantage of that fact by starting the application and, once it is loaded in memory, copying the decrypted section back to the physical Mach-O file, replacing the encrypted segment. Obviously, accessing another process's memory mandates root privileges, and this is why a jail-broken device is needed.
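Conceptually, the dump step reduces to a single copy over the range described by LC_ENCRYPTION_INFO; the sketch below models it over plain buffers ( this is an illustration of the idea, not Clutch's actual implementation ):

```cpp
#include <cstdint>
#include <cstring>

// Sketch: overwrite the encrypted file range [cryptoff, cryptoff+cryptsize)
// with the decrypted bytes found at the same offsets of the loaded image;
// afterwards 'cryptid' must also be reset to 0 in LC_ENCRYPTION_INFO
void DumpDecryptedRegion(uint8_t* pFileImage, const uint8_t* pLoadedImage,
                         uint32_t cryptoff, uint32_t cryptsize) {
    memcpy(pFileImage + cryptoff, pLoadedImage + cryptoff, cryptsize);
}
```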

Code Injection

Once we have FairPlay removed and the Mach-O is unencrypted, we can inject custom code to be executed on behalf of the application ( in the application sandbox ). This is done by having the code compiled as a framework ( consisting of a dynamic library ), and having an LC_LOAD_DYLIB command ( referring to that dynamic library ) added to the Mach-O executable.

As can be seen in the image to the right, with Xcode 6, creating an iOS dynamic library was made much simpler. With older Xcode versions, one would have to tinker with the project files to have Xcode produce an iOS-compatible dynamic library; Xcode 6 has introduced a new type of project, the “Cocoa Touch Framework”, which, in essence, is a dynamic library packed together with its associated resources.

The compiler attribute “__attribute__((constructor))” is used to ensure the framework code executes upon module loading/application start-up, as illustrated below:

#import <Foundation/Foundation.h>
#import <UIKit/UIKit.h>

__attribute__((constructor))
void EntryPoint() {
    NSLog(@"Injected routine...\n");
    dispatch_async(dispatch_get_main_queue(), ^{
        UIAlertView *alert = [[UIAlertView alloc] initWithTitle:@"Hello"
                                                        message:@"Code Injected"
                                                       delegate:nil
                                              cancelButtonTitle:@"OK"
                                              otherButtonTitles:nil];
        [alert show];
    });
}

The framework code injected into the application executable might have references to common frameworks; for example, both the injected code and the application code might use UIKit.framework. This is a tricky situation, since only a single framework version is loaded during execution, and using different versions of the same framework might cause unpredictable behaviour. That said, it is essential either to have the injected code depend on a minimal set of external frameworks, or to make sure to compile the injected code with the same frameworks used by the application being injected into.

Once the framework to be injected has been generated, the application IPA ( a ZIP file ) must be extracted, and the injected framework must be copied to the application executable folder so that later on it can get loaded into the application memory address space; this is demonstrated in the image to the right.
A load command referring to the injected framework is then added to the Mach-O executable; this will make the OS load the framework when the Mach-O executable is launched.

After the injection has taken place, the application executable looks as follows ( in red is the path to the injected framework ). Note the ‘@executable_path’ notation; it denotes a path relative to the currently executing Mach-O.

The injected framework is added as the last LC_LOAD_DYLIB command, to ensure that all the modules the application depends on have already been loaded.

Sample Code ( download )

This article is accompanied by a simple, single-file sample code implementing the injection logic; the following is an explanation of the major parts of the code.

As stated before, in order to be able to inject the framework, the Mach-O must be unencrypted ( otherwise signature verification will fail and it will not get loaded by the OS ). The following code snippet iterates through all of the commands until it finds the ‘LC_ENCRYPTION_INFO‘ command; if found, it verifies no encryption is applied by evaluating ‘cryptid‘. The file is concluded to be unencrypted either when ‘cryptid‘ evaluates to zero, or when the ‘LC_ENCRYPTION_INFO‘ command is not found.

‘m_pCmdFirst‘ in the code below is located immediately after the mach_header ( whether 64-bit or not ).

template< typename T >// Supports both the 32-bit and 64-bit versions of ‘mach_header’
bool MachOParser<T>::IsMachOEncrypted() {
    uint8_t*      pPtr = (uint8_t*)m_pCmdFirst;
    load_command* pCmd = m_pCmdFirst;
    for (uint32_t i = 0;
        i < m_pMachO->ncmds;
        i++, pPtr += pCmd->cmdsize, pCmd = (load_command*)pPtr) {
        if (LC_ENCRYPTION_INFO != pCmd->cmd)
            continue;
        if (0 != ((encryption_info_command*)pCmd)->cryptid)
            return true;
        break;// We have found the encryption info command, no need to keep on searching
    }
    return false;
}
The next code snippet is responsible for injecting the ‘dylib_command‘ as the last ‘dylib_command‘ of the Mach-O. This is done using ‘m_pCmdLastLoadLib‘, which is pre-initialized to the last ‘dylib_command‘ upon Mach-O loading ( see the ‘ReloadCommands()‘ method of the accompanying code ).

‘cmdInjected‘ defines the injected command. The important variables are ‘dylib_command::cmd‘, which indicates the command type, and ‘dylib_command::dylib::name::offset‘, which indicates the offset at which the name of the framework is located within the Mach-O, relative to the beginning of the ‘dylib_command‘. In our case, the name of the framework is located directly after the command, and thus the relative offset is ‘sizeof(dylib_command)‘.

One more thing to note: all commands must be aligned to 4-byte lengths, and thus the name of the framework is padded with zeros to make sure it is appropriately aligned.
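The rounding rule itself is tiny; expressed as a stand-alone helper ( my own illustration, not part of the accompanying sources ), it reads:

```cpp
#include <cstdint>

// Round a load-command size up to the next multiple of 4; the added
// bytes are the zero padding appended after the framework name
uint32_t AlignCmdSize(uint32_t cbCmd) {
    return (cbCmd + 3) & ~3u;
}
```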

‘m_vecCommands‘ is pre-initialized with all commands upon Mach-O loading ( see the ‘ReloadCommands()‘ method of the accompanying code ).

template< typename T >// Supports both the 32-bit and 64-bit versions of ‘mach_header’
int MachOParser<T>::InjectDyLib(const char* pDynLibPath) {
    union {
        dylib_command    cmdInjected;
        char             __pRaw__[512];
    };
    cmdInjected.cmd = LC_LOAD_DYLIB;
    cmdInjected.dylib.compatibility_version = 0x00010000;
    cmdInjected.dylib.current_version = 0x00020000;
    cmdInjected.dylib.timestamp = 2;
    cmdInjected.dylib.name.offset = (uint32_t)sizeof(dylib_command);

    char* pLibNameStart = (char*)(&cmdInjected + 1);
    strncpy(pLibNameStart, pDynLibPath, sizeof(__pRaw__) - sizeof(dylib_command));
    cmdInjected.cmdsize = cmdInjected.dylib.name.offset +
                          (uint32_t)strlen(pLibNameStart) + 1;// +1, terminating zero
    const div_t d = div(cmdInjected.cmdsize, 4);
    if (0 != d.rem) {// Command size must be aligned to 4
        memset((char*)&cmdInjected + cmdInjected.cmdsize, 0, 4 - d.rem);
        cmdInjected.cmdsize += (4 - d.rem);
    }

    if (FALSE == IsThereEnoughSpaceForCmd((load_command*)&cmdInjected)) {
        // TBD: In case no space is available in the existing Mach-O, enlarge
        // the size of the file and update section offsets/RVAs
        return ENOBUFS;
    }

    char* pInjectionOffset = (char*)m_pCmdLastLoadLib + m_pCmdLastLoadLib->cmdsize;
    const char* pLoadCmdsEnd = (char*)m_vecCommands[m_vecCommands.size() - 1] +
                                m_vecCommands[m_vecCommands.size() - 1]->cmdsize;
    // Make space for the new command
    memmove(pInjectionOffset + cmdInjected.cmdsize,
            pInjectionOffset,
            (size_t)(pLoadCmdsEnd - pInjectionOffset));
    // Inject the dylib command
    memcpy(pInjectionOffset, &cmdInjected, cmdInjected.cmdsize);
    m_pMachO->ncmds++;// Account for the newly added command
    m_pMachO->sizeofcmds += cmdInjected.cmdsize;
    return 0;
}

Sign & Re-package

This is the last step before the patched application can be used on a non-jail-broken device. In this section, a method for signing/packaging using a provisioning profile is demonstrated, although any other approach will work ( potentially, even AppStore distribution ).

  1. Extract application IPA
iOS applications are packed in IPA files, which, in essence, are ZIP files containing all of the application resources. The first thing to do is to extract the application IPA to a known folder; this can be done using the unzip command-line utility in the following manner:

    unzip %Filename%.ipa -d %dest folder%
  2. Remove existing signature
When applications are signed, a per-file signature is generated. These signatures are stored in a special file named ‘CodeResources‘, located under the ‘_CodeSignature‘ folder; to remove them, the ‘_CodeSignature‘ folder is to be removed:

    rm -fR %dest folder%/Payload/%App

  3. Update Provisioning Profile
The provisioning profile defines the devices the application is allowed to execute on. An iOS developer account is required; click here to generate one. Once the provisioning profile is ready, it should be copied to the application @executable_path and named ‘embedded.mobileprovision‘:

    cp %profile file%.mobileprovision %dest folder%/Payload/%App

  4. Update Entitlements
The entitlements file is a simple ‘.plist‘ file that “confer[s] specific capabilities or security permissions to your iOS” app. It should be copied to the application @executable_path and named ‘Entitlements.plist‘. It refers to the developer via the ‘application-identifier‘ key, which must be prefixed with the developer id; this is illustrated below:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
        <key>application-identifier</key>
        <string>%Developer Id%.*</string>
    </dict>
    </plist>

    The %Developer Id% can be retrieved directly from the developer portal, or from the ‘User Id’/’Org Unit’ fields when inspecting the developer certificate using the KeyChain tool.

    Copy the ‘Entitlements‘ file to the ‘@executable_folder‘:

    cp %entitlements file%.pinfo %dest folder%/Payload/%app

  5. Update Framework
    Once the framework is compiled ( into a folder named %Proj Name%.framework ), it should be copied into the application's @executable_path; make sure to have any descendant _CodeSignature folder removed.

    cp -R %Name%.framework %dest folder%/Payload/%app

  6. Inject Framework to the App Executable
    This is where we use the above-mentioned code. Compile the accompanying code and run the following to inject the framework into the application executable:

    Injector “%app path%/Payload/” “@executable_path/%Proj name%.framework/%Proj name%

  7. Sign
    We need to replace the signature embedded in the Mach-O executables, and to re-generate the _CodeSignature folder using the developer identity. During the process of creating a developer account, the development signing certificates were installed on the local machine ( this might have been done automatically by Xcode ). These can be seen using the KeyChain tool, or via the ‘Code Signing Identity’ of the Xcode project target ( located at ‘project properties->Target->Build settings->Code signing->Code signing Identity‘ ). The codesign tool is to be used to sign the Mach-O binaries with this identity, as demonstrated below:

    codesign -s "%KeyChain cert name%" --force --deep "%dest folder%/Payload/%app"


    codesign -s "%KeyChain cert name%" --force --deep "%dest folder%/Payload/%app Name%.framework/%Proj Name%"

    Once the application executable's signature is updated/replaced, the _CodeSignature folder is to be reconstructed using the development signing certificates and the entitlements file:

    codesign -s "%KeyChain cert name%" --entitlements "%dest folder%/Payload/%app" --force --deep %dest folder%/Payload/%app

  8. Re-package
    The last step is to zip all of the resulting files together into an IPA; this is done using the zip command in the following manner:

    zip -r %dest folder%/%Name%.ipa “%dest folder%/Payload/

  9. Deploy
    That’s it, all is set; simply drag the resulting IPA onto iTunes and install the patched application on a provisioned device, as described in the ‘Installing Your App on Test Devices’ section of this link.
    If the app being packaged includes 3rd-party frameworks and/or extensions, these also need to be signed. Of specific importance is the order in which they are signed; for example, if the app is composed of the following dependent modules


    the order in which they are to be signed must be bottom to top, thus %tool app% -> %Extension% -> %app name%; otherwise, package verification will fail and the app will not get installed on the device.

    When signing a module ( app/framework/extension/… ), the generated _CodeSignature refers to all of its descendants. If any of the descendants is changed AFTER the _CodeSignature was generated, verification will fail due to signature inconsistency and the app will not be installed; to avoid that, module signing must proceed bottom to top.

    A handy way to verify validity is to use the following for each of the embedded modules:

    codesign --verify --verbose %App/Extension/Framework Name%
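The bottom-to-top requirement is, in effect, a post-order traversal of the module-embedding tree; the hypothetical sketch below ( not part of the accompanying sources ) emits the order in which codesign should be invoked:

```cpp
#include <string>
#include <vector>

// Hypothetical model of an app and its embedded modules
struct Module {
    std::string          name;
    std::vector<Module*> embedded;
};

// Sign descendants first ( post-order ), then the module itself;
// this guarantees every _CodeSignature covers already-final content
void SignBottomUp(const Module* pMod, std::vector<std::string>& order) {
    for (const Module* pChild : pMod->embedded)
        SignBottomUp(pChild, order);
    order.push_back(pMod->name); // codesign would run here
}
```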

End result

Background is intentionally blurred

Risks & Limitations

  • Usage of a provisioning profile ( ad-hoc distribution ) limits usage to up to 100 devices.
  • The injected framework must depend on a minimal set of external frameworks, to avoid different versions of the same framework being loaded by the Mach-O.
  • The current injection code takes for granted that there is enough space available between the last command and the first section for the injected ‘dylib_command‘. While I have seen no cases where there was not enough space, in theory such a case can exist; to deal with it, a page-aligned block should be inserted, and all section RVAs should be correspondingly modified.


The writer of this article takes no liability for any direct or indirect damage that usage of the accompanying source code and the demonstrated approach might cause.


Register as a developer with Apple, Mach-O file format, Mach-O View, Appels FairPlay, Creating provisioning profiles,
Application Entitlements, Bundle Structure, Cydia, Clutch Open-Source project, Creating Your Team Provisioning Profile, Working with the System Key-Chain, codesign tool, Beta Testing iOS Apps


The intended audience of this article is Windows driver C++ developers and architects. It is assumed that the reader is familiar with object-oriented programming and design and is intimately acquainted with the Windows operating system.

For the purpose of brevity and clarity, thread-synchronization and error-checking aspects are omitted and not discussed in detail in this article.


For a while I have been searching for a means of simulating Bluetooth HID devices under Windows desktop. This, apparently, is not that trivial, since the Bluetooth HID interface is reserved for operating-system use.

This article provides a brief review of the Windows 8 Bluetooth stack and profile drivers, describes its limitations with HID, and presents a work-around enabling HID device simulation using the Windows standard Bluetooth stack.

High level overview

The above presents the main modules related to our use-case: in green are custom modules developed by a 3rd party, in blue are protocols/APIs provided by the operating system, in red are operating-system modules we patch in order to achieve the desired functionality, and in orange are physical HW components.

A profile driver is a mini-port driver implementing a specific Bluetooth service. In contrast to RFCOMM services, which can be implemented using Winsock in user mode, services that directly use L2CAP ( such as HID ) mandate a kernel-mode profile driver ( KMDF ) implementation.

The HCI layer provides a unified API for communicating with Bluetooth controllers. Of specific interest to us is the HCI CoD ( Class of Device/Service ) indicating the type of Bluetooth peripheral. Unfortunately, with the Windows built-in Bluetooth stack, the CoD is limited to COD_MAJOR_COMPUTER, and this limits connectivity with various Bluetooth devices such as iOS, which mandates a ‘Peripheral’ major class and a minor class of e.g. ‘Keyboard’ ( where the CoD is 0x540 ). I have found this tool to be quite useful in generating proper Bluetooth CoDs.
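For reference, the 0x540 value follows directly from the Baseband assigned-numbers layout: the major device class occupies bits 8..12 and the minor class bits 2..7, so a ‘Peripheral’ major ( 0x05 ) with a ‘Keyboard’ minor ( 0x10 ) composes as shown below ( a small illustration of the bit layout; the service-class bits are omitted ):

```cpp
#include <cstdint>

// Compose a Class-of-Device value: major class in bits 8..12,
// minor class in bits 2..7 ( service classes, bits 13..23, omitted )
constexpr uint32_t MakeCoD(uint32_t major, uint32_t minor) {
    return (major << 8) | (minor << 2);
}
```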

SDP stands for Service Discovery Protocol; it is used to report the type of services provided by the Bluetooth device. This is, for example, where HID devices report their descriptors, or where a Bluetooth headset reports its audio interface.

L2CAP is a lower-level transport layer over which various other protocols are implemented ( e.g. RFCOMM ). It is responsible, among other things, for maintaining sequential packet connections to remote devices and for multiplexing data from various Bluetooth services, and it is the transport used for HID devices. With L2CAP, services are identified by a unique Protocol/Service Multiplexer ( PSM ) identifier; for HID, two specific, pre-defined PSMs are needed, Interrupt and Control ( 0x13 and 0x11 correspondingly ). The first is used for Device-to-Host communication and the latter for Host-to-Device. The Windows built-in Bluetooth stack reserves these PSMs for OS use, preventing HID device simulation; later on in the article I will explain how to work around this limitation.

bthport.sys is a kernel module encapsulating Bluetooth logic including, among others, HCI, SDP, and L2CAP. It is not a driver; rather, it is a dynamic library directly used by profile drivers and other system components to implement Bluetooth services. bthport.sys is responsible for reserving the HID PSMs ( 0x13 and 0x11 ) using the ‘bthport!BthIsSystemPSM‘ internal method; I will show, later in this article, how to patch this method to work around the PSM reservation.

Bluetooth L2CAP HID Connection Flow

The above high-level diagram presents the main steps in establishing a HID Bluetooth connection. In red is the initialization phase, where we register the PSMs to be used; once these are registered we are able to receive incoming L2CAP connections. The initialization phase is elaborated on in the next chapter.

Once the L2CAP connection is established HID Reports are sent to the controlled device and back indicating Key-Strokes and feedback from the device.

Driver Initialization

The diagram to the left illustrates the main steps in setting up an L2CAP HID profile driver. The first thing needed is to register a callback method to be invoked upon incoming L2CAP connections; this is done by querying for the profile-driver interface using WdfFdoQueryForInterface, allocating a Bluetooth Request Block ( BRB ), setting up the BRB, and dispatching it to the IoTarget.

Once the L2CAP callbacks are installed, the required PSMs are registered. This is done by dispatching a BRB_REGISTER_PSM with the desired PSMs, in our case: 0x1 for SDP, 0x11 for the Control channel, and 0x13 for Interrupt.

Registering PSMs reserved for OS use will fail, even when no HID device is currently connected/paired; the next chapter discusses an approach to work around that limitation.

Once the PSMs are registered, we need to use them in conjunction with the keyboard HID descriptor to set up the SDP record; for that purpose, bthport.sys exposes the “GUID_BTHDDI_SDP_PARSE_INTERFACE” interface, obtained using WdfFdoQueryForInterface.

Once the SDP record is ready, it is published to the IoTarget to be listed among the available Bluetooth services of the desktop machine.

Reserved PSMs Workaround

As mentioned earlier, with the Windows OS the HID PSMs are reserved and cannot be used by profile drivers. The PSM registration logic is implemented by bthport.sys; to work around the PSM limitation, a binary patch is applied.

bthport.sys implements an internal method called “BthIsSystemPSM”; this is where the magic happens and where the patch is applied. The process consists of the following steps:

  1. Upon driver startup, find bthport.sys!BthIsSystemPSM in the loaded binary image
    For that, we need the address of a reference method in bthport.sys and the offset to BthIsSystemPSM; this gives us access to the binary code responsible for the OS PSM reservation.
    The reference method we use is bthport.sys!BthAllocateBrb; this method is exposed through the BTH_PROFILE_DRIVER_INTERFACE we have previously retrieved by calling WdfFdoQueryForInterface.
    bthport.sys!BthIsSystemPSM is not accessible using WdfFdoQueryForInterface; getting its address is not straightforward and requires some low-level PE analysis. Using IDA ( Interactive Disassembler ) we can resolve the bthport.sys!BthIsSystemPSM and bthport.sys!BthAllocateBrb PE offsets, get the relative distance ( which is identical to the distance when the PE is loaded in memory ), and use it to find the bthport.sys!BthIsSystemPSM address in the loaded binary ( at runtime ).
  2. Binary-code modification
    The extracted binary code for bthport.sys!BthIsSystemPSM is the following:

    B8 ED FF 00 00 66 FF C9 66 85 C8 0F 94 C0 C3 CC

    Disassembling this results in the following; in green are the values of the relevant registers before executing the instruction on each line:

    01> b8 ed ff 00 00    mov  eax,0FFEDh    // ZF:1 AL:0x09 AX:0x109  CX:0x11
    02> 66 ff c9          dec  cx            // ZF:1 AL:0xed AX:0xffed CX:0x11
    03> 66 85 c8          test ax,cx         // ZF:0 AL:0xed AX:0xffed CX:0x10
    04> 0f 94 c0          sete al            // ZF:1 AL:0xed AX:0xffed CX:0x10
    05> c3                ret                // ZF:1 AL:0x01 AX:0xffed CX:0x10
    06> cc                int  3

    With the above assembly code, the PSM is passed through register CX ( 0x11 in our case ). This value is decremented by one on line #02; then, on line #03, a bitwise AND with the value in register AX ( 0xFFED ) is evaluated. The zero result sets the Zero Flag ( ZF ) to 1, and on line #04 ‘sete’ copies it into register AL. The calling code evaluates AL to decide whether the PSM can be used; in our case, a value of AL=1 will cause BRB_REGISTER_PSM to fail.

    In order to work around this PSM verification, we should cause the code to return with AL set to zero. This will prevent the calling code from rejecting our PSMs and will, in turn, enable L2CAP HID connections.

    Inspecting line #03 above, it is clear that “0 == (0xFFED & (0x11 - 1))”, and also that “0 == (0xFFED & (0x13 - 1))”; hence, we need to change the 0xFFED mask such that the result of the bitwise AND operation will be different than zero. This is achieved by changing the 0xFFED mask to 0xFFFD; having that set, the code executes as follows:

    01> b8 fd ff 00 00    mov  eax,0FFFDh    // ZF:1 AL:0x09 AX:0x109  CX:0x11
    02> 66 ff c9          dec  cx            // ZF:1 AL:0xfd AX:0xfffd CX:0x11
    03> 66 85 c8          test ax,cx         // ZF:0 AL:0xfd AX:0xfffd CX:0x10
    04> 0f 94 c0          sete al            // ZF:0 AL:0xfd AX:0xfffd CX:0x10
    05> c3                ret                // ZF:0 AL:0x00 AX:0xfffd CX:0x10
    06> cc                int  3

    The above returns AL=0, resulting in acceptance of the HID PSMs; the raw binary code we need to write looks as follows:

    B8 FD FF 00 00 66 FF C9 66 85 C8 0F 94 C0 C3 CC
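    The effect of the mask change can be double-checked in plain C++; the helper below models the check the assembly performs ( an illustrative model, not the kernel code itself ):

```cpp
#include <cstdint>

// Model of the check bthport!BthIsSystemPSM performs: the PSM is
// treated as reserved when ( mask & ( psm - 1 ) ) == 0
bool IsSystemPSM(uint16_t psm, uint16_t mask) {
    return 0 == (uint16_t)(mask & (uint16_t)(psm - 1));
}
```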

  3. Patch the binary code
    Once we have the offset between bthport.sys!BthIsSystemPSM and bthport.sys!BthAllocateBrb, we need to update the binary code with the above-mentioned modification. We can't, however, directly update the binary code; before doing so we need to clear the Write Protection ( WP ) bit of register cr0, apply the update, and then restore the original value of cr0. This is done using the __writecr0 and __readcr0 kernel intrinsics. Once we are done with the modification, our code can freely register the HID PSMs and intercept incoming HID L2CAP connections.
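    As a sanity check on the constant used in the sample code: the WP flag is bit 16 of cr0, and the 0xFFFFFFFFFFFEFFFF mask clears exactly that bit ( a small user-mode illustration of the mask arithmetic, not kernel code ):

```cpp
#include <cstdint>

// cr0.WP is bit 16; clearing it allows ring-0 writes to read-only pages
constexpr uint64_t WP_BIT  = 1ull << 16;
constexpr uint64_t WP_MASK = 0xFFFFFFFFFFFEFFFFull;

constexpr uint64_t ClearWP(uint64_t cr0) { return cr0 & WP_MASK; }
constexpr uint64_t RestoreWP(uint64_t cr0, uint64_t orig) {
    return cr0 | (orig & WP_BIT); // bring back the original WP value
}
```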

Sample Code

    // 000000000008B4D0 - BthIsSystemPSM PE Offset ( fixed )
    // 0000000000083698 - BthAllocateBrb PE Offset ( fixed )
    // bthport!BthIsSystemPSM: B8 ED FF 00 00 66 FF C9 66 85 C8 0F 94 C0 C3 CC
    UCHAR    pMachineCode[] = { 0xb8, 0xed, 0xff, 0x00,
                                0x00, 0x66, 0xff, 0xc9,
                                0x66, 0x85, 0xc8, 0x0f,
                                0x94, 0xc0, 0xc3, 0xcc };
    // The approx distance between 'itf.BthAllocateBrb' and 'bthport!BthIsSystemPSM'
    INT64    qwOffset    = (INT64)(0x8B4D0 - 0x83698);
    PUCHAR   pAddr       = (PUCHAR)((UINT64)itf.BthAllocateBrb + qwOffset);
    // Make the start address page aligned
    PUCHAR   pAddrStart  = (PUCHAR)((UINT64)pAddr & 0xfffffffffffff000);
    UINT64   qwTrailer   = *(UINT64*)(pMachineCode + sizeof(pMachineCode)
                            - sizeof(qwTrailer));
    PUCHAR   pAddrEnd    = pAddrStart + PAGE_SIZE - sizeof(qwTrailer);

    pAddrStart += sizeof(pMachineCode) - sizeof(qwTrailer);
    while (pAddrStart <= pAddrEnd) {
        if (*(UINT64*)(pAddrStart) == qwTrailer) {
            if (0 == memcmp(pAddrStart + sizeof(qwTrailer) - sizeof(pMachineCode),
                            pMachineCode, sizeof(pMachineCode) - sizeof(qwTrailer))) {
                NT_ASSERT(0xed == pAddrStart[-7]);
                const auto cr0 = __readcr0();
                const auto cr0noWP = cr0 & 0xFFFFFFFFFFFEFFFF;// Clear the WP bit
                __writecr0(cr0noWP);// Disable write protection
                pAddrStart[-7] = 0xfd;// Patch the code!!!
                __writecr0(cr0);// Restore write protection
                return TRUE;
            }
        }
        pAddrStart++;
    }
    return FALSE;

Risks & Limitations

  • All kernel modules run in a shared address space; any change done to bthport.sys will affect any other module/driver using it.
  • The HID PSMs are reserved by the OS for a reason; usage of this patch should be done with care when other Bluetooth HID devices are connected.
  • This binary patch assumes a specific bthport.sys version with a fixed bthport.sys!BthIsSystemPSM relative position; while the above sample code demonstrates some flexibility in finding the right offset, updated bthport.sys versions might require re-calculating the offsets.
  • As mentioned before, some devices expect specific HID CoD values. Windows OS doesn't support the HCI-level API required for changing the CoD; thus, this kernel patch will enable HID device simulation for e.g. Android devices, but not for iOS devices. The reader is encouraged to use the approach described in this article to patch this through as well.
  • The Windows kernel implements a mechanism called Kernel Patch Protection ( KPP ); this mechanism verifies no binary changes were applied to core kernel modules at runtime. At the time this article was written, bthport.sys wasn't one of these modules; this may ( or may not ) change in the future.


This article discussed implementing a HID device using the Windows desktop Bluetooth stack; this stack is limited and mandates a binary patch. When Windows OS is not a hard requirement, the reader is encouraged to use solutions where the above-mentioned functionality is natively supported, such as the Linux BlueZ stack.

The patch was implemented on Windows 8 OS ( x64 ) and should be verified if used on other/newer OS versions.


KDMF Profile drivers, RFCOMM, Assigned CoD Numbers – Bluetooth Baseband, Bluetooth Class of Device/Service (CoD) Generator, PSMs reserved for OS use, HID: Human Interface Device Class, Bluetooth Request Block, L2CAP Bluetooth Echo Sample, Service Discovery Protocol, Keyboard HID Descriptor, Hex-Rays Interactive Disassembler (IDA), A Guide to Kernel Exploitation, Kernel Patch Protection (KPP), Linux BlueZ stack,


The intended audience of this article is MacOS C++/Obj-C developers and architects. It is assumed that the reader is familiar with object-oriented programming and design.

For the purpose of brevity and clarity, thread-synchronization aspects are omitted and not discussed in detail in this article.


The Objective-C AVFoundation framework encapsulates media processing ( capture, editing, … ). It is robust, well documented, and covers most of the A/V use-cases; however, some edge-case uses are not supported by this framework, for example, directly accessing the buffers sent out from the device. This is specifically important when the payload sent out from the device is already muxed and/or compressed; in such cases, AVFoundation ( AVCaptureSession specifically ) will de-mux and/or decode the payload before making it accessible to the user. To get direct access to the buffers sent out from the device, without any intermediate intervention, we have to use a lower-level API, namely CoreMediaIO.

Apple's CoreMediaIO is a low-level C framework for accessing and interacting with audio/video devices such as cameras, capture cards, and even mirroring sessions of iOS devices.

The problem with CoreMediaIO is its lack of documentation, and the fact that the existing sample code is old and requires quite some tinkering to get it compiling with the latest SDKs.

In this short article I will provide simple sample code demonstrating capture and format resolution using CoreMediaIO and some AVFoundation.


The CoreMediaIO API is provided through the "CoreMediaIO.framework"; make sure to have it included by the project, and to have "CoreMediaIO/CMIOHardware.h" included/imported.

The first thing we have to do in order to be able to start capturing is to find the device of interest. If we are interested in screen capture ( for example, capturing the screen of an attached iOS device ) we need to enable the CoreMediaIO 'DAL' plug-ins. This is demonstrated in the following code snap:

void EnableDALDevices()
{
    CMIOObjectPropertyAddress prop = {
        kCMIOHardwarePropertyAllowScreenCaptureDevices,
        kCMIOObjectPropertyScopeGlobal,
        kCMIOObjectPropertyElementMaster
    };
    UInt32 allow = 1;
    CMIOObjectSetPropertyData( kCMIOObjectSystemObject,
                               &prop, 0, NULL,
                               sizeof(allow), &allow );
}

Some devices are added or removed at runtime. To get runtime indications of device addition or removal, an A/V capture device notification is set using the NSNotificationCenter class; the AVCaptureDevice added/removed is indicated by the 'object' property of the 'note' ^block argument. This is demonstrated by the following code snap; be aware that no notifications will be received unless a Run Loop is executed.

NSNotificationCenter *notiCenter = [NSNotificationCenter defaultCenter];
id connObs = [notiCenter addObserverForName:AVCaptureDeviceWasConnectedNotification
                                     object:nil
                                      queue:[NSOperationQueue mainQueue]
                                 usingBlock:^(NSNotification *note) {
                                                // Device addition logic
                                            }];

id disconnObs = [notiCenter addObserverForName:AVCaptureDeviceWasDisconnectedNotification
                                        object:nil
                                         queue:[NSOperationQueue mainQueue]
                                    usingBlock:^(NSNotification *note) {
                                                   // Device removal logic
                                               }];

[[NSRunLoop mainRunLoop] run];
[notiCenter removeObserver:connObs];
[notiCenter removeObserver:disconnObs];

The next step is to enumerate the attached capture devices. This is done either using the AVCaptureDevice class of AVFoundation, or directly using the CoreMediaIO C++ APIs. Each capture device provides a unique identifier; in the next code snap, that id will be used to find the device of interest.

The code snap below demonstrates device enumeration using the AVFoundation APIs. To filter a specific type of device use the 'devicesWithMediaType' method of the AVCaptureDevice class.

// Use 'devicesWithMediaType' to filter devs by media type
// NSArray* devs = [AVCaptureDevice devicesWithMediaType:AVMediaTypeMuxed];
NSArray* devs = [AVCaptureDevice devices];
NSLog(@"devices: %d\n", (int)[devs count]);

for(AVCaptureDevice* d in devs) {
    NSLog(@"uniqueID: %@\n", [d uniqueID]);
    NSLog(@"modelID: %@\n", [d modelID]);
    NSLog(@"description: %@\n", [d localizedName]);
}

The next step is to find the device we want to use for capture. Capture devices in CoreMediaIO are identified by CMIODeviceID; the following code snap demonstrates how to resolve the devices' CMIODeviceID according to their unique ID, which is a-priori known and externally provided.

template< const int C_Size >
Boolean CFStringCopyUTF8String(CFStringRef aString, char (&pText)[C_Size]) {
    CFIndex length = CFStringGetLength(aString);
    if(sizeof(pText) < (length + 1))
        return false;
    // Pass the actual buffer size so 'pText' can never be overflown
    return CFStringGetCString(aString, pText, sizeof(pText), kCFStringEncodingUTF8);
}

OSStatus GetPropertyData(CMIOObjectID objID, int32_t sel, CMIOObjectPropertyScope scope,
                         UInt32 qualifierDataSize, const void* qualifierData, UInt32 dataSize,
                         UInt32& dataUsed, void* data) {
    CMIOObjectPropertyAddress addr={ (CMIOObjectPropertySelector)sel, scope,
                                     kCMIOObjectPropertyElementMaster };
    return CMIOObjectGetPropertyData(objID, &addr, qualifierDataSize, qualifierData,
                                     dataSize, &dataUsed, data);
}

OSStatus GetPropertyData(CMIOObjectID objID, int32_t selector, UInt32 qualifierDataSize,
                         const void* qualifierData, UInt32 dataSize, UInt32& dataUsed,
                         void* data) {
    return GetPropertyData(objID, selector, 0, qualifierDataSize,
                           qualifierData, dataSize, dataUsed, data);
}

OSStatus GetPropertyDataSize(CMIOObjectID objID, int32_t sel,
                             CMIOObjectPropertyScope scope, uint32_t& size) {
    CMIOObjectPropertyAddress addr={ (CMIOObjectPropertySelector)sel, scope,
                                     kCMIOObjectPropertyElementMaster };
    return CMIOObjectGetPropertyDataSize(objID, &addr, 0, 0, &size);
}

OSStatus GetPropertyDataSize(CMIOObjectID objID, int32_t selector, uint32_t& size) {
    return GetPropertyDataSize(objID, selector, 0, size);
}

OSStatus GetNumberDevices(uint32_t& cnt) {
    if(0 != GetPropertyDataSize(kCMIOObjectSystemObject, kCMIOHardwarePropertyDevices, cnt))
        return -1;
    cnt /= sizeof(CMIODeviceID);
    return 0;
}

OSStatus GetDevices(uint32_t& cnt, CMIODeviceID* pDevs) {
    OSStatus status;
    uint32_t numberDevices = 0, used = 0;
    if((status = GetNumberDevices(numberDevices)) < 0)
        return status;
    if(numberDevices > cnt)
        return -1;// Insufficient capacity provided by the caller
    cnt = numberDevices;
    uint32_t size = numberDevices * sizeof(CMIODeviceID);
    return GetPropertyData(kCMIOObjectSystemObject, kCMIOHardwarePropertyDevices,
                           0, NULL, size, used, pDevs);
}

template< const int C_Size >
OSStatus GetDeviceStrProp(CMIOObjectID objID, CMIOObjectPropertySelector sel,
                          char (&pValue)[C_Size]) {
    CFStringRef answer  = NULL;
    UInt32      dataUsed= 0;
    OSStatus    status  = GetPropertyData(objID, sel, 0, NULL, sizeof(answer),
                                          dataUsed, &answer);
    if(0 == status)// SUCCESS
        CFStringCopyUTF8String(answer, pValue);
    return status;
}

Utility methods

OSStatus FindDeviceByUniqueId(const char* pUID, CMIODeviceID& devId) {
    OSStatus status = 0;
    uint32_t numDev = 0;
    if(((status = GetNumberDevices(numDev)) < 0) || (0 == numDev))
        return status;
    // Allocate memory on the stack
    CMIODeviceID* pDevs = (CMIODeviceID*)alloca(numDev * sizeof(*pDevs));
    if((status = GetDevices(numDev, pDevs)) < 0)
        return status;
    status = afpObjectNotFound;// Not Found…
    for(uint32_t i = 0; i < numDev; i++) {
        char pUniqueID[64];
        if(GetDeviceStrProp(pDevs[i], kCMIODevicePropertyDeviceUID, pUniqueID) < 0)
            continue;
        if(0 != strcmp(pUID, pUniqueID))
            continue;
        devId = pDevs[i];
        return 0;
    }
    return status;
}

Device resolution by UID

CoreMediaIO capture devices expose streams. Each such stream is a data source and is indicated using a CMIOStreamID type; one stream might provide video payload, another audio payload, and others might provide multiplexed payload. While capturing we have to select a stream and start pumping out data. The following code snap demonstrates how to enumerate the available streams of a given device ( indicated by its CMIODeviceID ) and how to resolve the payload format.

uint32_t GetNumberInputStreams(CMIODeviceID devID)
{
    uint32_t size = 0;
    GetPropertyDataSize(devID, kCMIODevicePropertyStreams,
                        kCMIODevicePropertyScopeInput, size);
    return size / sizeof(CMIOStreamID);
}

OSStatus GetInputStreams(CMIODeviceID devID, uint32_t& ioNumberStreams,
                         CMIOStreamID* streamList)
{
    ioNumberStreams = std::min(GetNumberInputStreams(devID), ioNumberStreams);
    uint32_t size     = ioNumberStreams * sizeof(CMIOStreamID);
    uint32_t dataUsed = 0;
    OSStatus err = GetPropertyData(devID, kCMIODevicePropertyStreams,
                                   kCMIODevicePropertyScopeInput, 0,
                                   NULL, size, dataUsed, streamList);
    if(0 != err)
        return err;
    ioNumberStreams = size / sizeof(CMIOStreamID);
    CMIOStreamID* firstItem = &(streamList[0]);
    CMIOStreamID* lastItem  = firstItem + ioNumberStreams;
    std::sort(firstItem, lastItem);
    return 0;
}

Utility methods

CMIODeviceID devId;
FindDeviceByUniqueId("4e58df701eb87", devId);

uint32_t numStreams = GetNumberInputStreams(devId);
CMIOStreamID* pStreams = (CMIOStreamID*)alloca(numStreams * sizeof(CMIOStreamID));
GetInputStreams(devId, numStreams, pStreams);
for(uint32_t i = 0; i < numStreams; i++) {
    CMFormatDescriptionRef fmt = 0;
    uint32_t               used;
    GetPropertyData(pStreams[i], kCMIOStreamPropertyFormatDescription,
                    0, NULL, sizeof(fmt), used, &fmt);
    CMMediaType  mt    = CMFormatDescriptionGetMediaType(fmt);
    uint8_t      null1 = 0;// 'mt' is a 4 char code, 'null1' terminates
                           // it so it can be printed as a string.
    FourCharCode fourcc= CMFormatDescriptionGetMediaSubType(fmt);
    uint8_t      null2 = 0;// 'fourcc' is a 4 char code, 'null2' terminates
                           // it so it can be printed as a string.
    printf("media type: %s\nmedia sub type: %s\n", (char*)&mt, (char*)&fourcc);
}

Stream format resolution
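The pointer-cast trick above relies on an adjacent stack variable for null termination, which is fragile. As a side note, a small portable helper can render a FourCharCode printable; this is a sketch of mine, not part of the original sample, and the function name is illustrative:

```cpp
#include <string>
#include <cstdint>

// Hypothetical helper: unpack a big-endian FourCharCode ( e.g. a CMMediaType
// value such as 0x736F756E, i.e. 'soun' ) into a printable 4-character string.
std::string FourCCToString(uint32_t fourcc)
{
    std::string s(4, '?');
    for (int i = 0; i < 4; ++i) {
        char c = char((fourcc >> (8 * (3 - i))) & 0xFF);
        s[i] = (c >= 32 && c < 127) ? c : '?'; // replace non-printable bytes
    }
    return s;
}
```

Unlike the in-place cast it does not depend on stack layout, so it is safe under any optimization level.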

The next and final stage is to start pumping data out of the stream. This is done by registering a callback to be called by CoreMediaIO with the sampled payload; the following code snap demonstrates how this is done and how to get access to the raw payload bytes.

CMSimpleQueueRef queueRef = 0;// The queue that will be used to
                              // process the incoming data
CMIOStreamCopyBufferQueue(strmID, [](CMIOStreamID streamID, void*, void* refCon) {
    // The callback ( a lambda in our case ) being called by CoreMediaIO
    CMSimpleQueueRef  queueRef = *(CMSimpleQueueRef*)refCon;
    CMSampleBufferRef sb = 0;
    while(0 != (sb = (CMSampleBufferRef)CMSimpleQueueDequeue(queueRef))) {
        size_t           len      = 0;// The 'len' of our payload
        size_t           lenTotal = 0;
        char*            pPayload = 0;// This is where the RAW media
                                      // data will be stored
        const CMTime     ts       = CMSampleBufferGetOutputPresentationTimeStamp(sb);
        const double     dSecTime = (double)ts.value / (double)ts.timescale;
        CMBlockBufferRef bufRef   = CMSampleBufferGetDataBuffer(sb);
        CMBlockBufferGetDataPointer(bufRef, 0, &len, &lenTotal, &pPayload);
        assert(len == lenTotal);
        // TBD: Process 'len' bytes of 'pPayload'
    }
}, &queueRef, &queueRef);

One last thing to note: in more than a few cases the actual capture format is not available until the first sample is sent, in which case it should be resolved upon first sample reception. The following code snap demonstrates how to resolve the audio sample format using CMSampleBufferRef; the same can be done for video and other media types with a little more effort.

bool PrintAudioFormat(CMSampleBufferRef sb)
{
    CMFormatDescriptionRef fmt = CMSampleBufferGetFormatDescription(sb);
    CMMediaType            mt  = CMFormatDescriptionGetMediaType(fmt);

    if(kCMMediaType_Audio != mt) {
        printf("Not an audio sample\n");
        return false;
    }
    CMAudioFormatDescriptionRef afmt = (CMAudioFormatDescriptionRef)fmt;
    const auto pAud = CMAudioFormatDescriptionGetStreamBasicDescription(afmt);
    if(0 == pAud)
        return false;
    // We are expecting PCM Audio
    if('lpcm' != pAud->mFormatID)// 'pAud->mFormatID' == fourCC
        return false;// Not a supported format
    printf("mChannelsPerFrame: %d\nmSampleRate: %.1f\n"\
           "mBytesPerFrame: %d\nmBitsPerChannel: %d\n",
           pAud->mChannelsPerFrame, pAud->mSampleRate,
           pAud->mBytesPerFrame, pAud->mBitsPerChannel);
    return true;
}

Final words

What is provided in this article is just a glimpse of what is doable with CoreMediaIO; further information can be found in the reference links below.


CoreMediaIO, AVFoundation, AVCaptureSession, NSNotificationCenter, Run Loop, AVCaptureDevice


The intended audience is C++ developers and architects. It is assumed that the reader of this article is familiar with object-oriented programming and design.

For brevity and clarity, thread-synchronization aspects are omitted and not discussed in detail in this article.


Compared to high-level languages such as C# & Java, C++ has a substantial disadvantage in memory management: C# & Java automate memory management using garbage collectors, whereas with C++ the developer is responsible for allocating and freeing memory, and this increases code complexity and the total development and debugging time.

This article describes an approach for achieving pseudo garbage collection with C++; this reduces development and, mainly, debugging time while keeping the fine memory control supported by C++.


To be able to automate memory allocation we have to keep track of the users of a given memory block/object; once all users are done consuming the memory block/object it can be automatically freed.

Keeping track of the memory/object references is done using a reference counter: the counter is increased each time a new user starts consuming the memory and decreased when consumption is done; when the counter hits zero the memory block/object is automatically freed.

Implementation guidelines

In order to achieve automated memory management, each object must implement reference counting and enable its users/consumers to control it. This is done by implementing the simple IRefCount interface described in 'Code Snap 1' below: the 'AddRef()' method increases the internal reference count of the object and returns the result, while 'Release()' decreases the reference count; on 'Release', when the reference count hits zero, the object deletes itself from memory. The following simple code snap illustrates this concept:

01 interface IRefCount {
02     virtual unsigned int AddRef(void) = 0;
03     virtual unsigned int Release(void) = 0;
04 };

Code Snap 1

To guarantee the object life-time is controlled only by its reference count, any other type of instantiation should be prevented, and thus all IRefCount objects must have their constructor and destructor declarations defined as non-public; instead, a special method used for instantiation is to be implemented. In 'Code Snap 2' below the 'CreateInstance' method on line #20 is used exactly for that.

The 'CreateInstance' method is responsible for allocating the memory required for the object, adding the first reference by calling 'AddRef()' and returning the instantiated object to the caller. Similarly, the 'Release' method on line #13 is responsible for removing the object from memory when the reference count hits zero; the reference count is maintained by the 'm_uiRefCount' variable on line #4.

01 class TestObj : public IRefCount
02 {
03 protected:
04     unsigned int m_uiRefCount;
06     TestObj() : m_uiRefCount(0) {}
07     ~TestObj() {}    
09 public:
10     virtual unsigned int AddRef(void) {
11         return ++m_uiRefCount;
12     }
13     virtual unsigned int Release(void) {
14         unsigned int uiRef = --m_uiRefCount;
15         if(0 == uiRef)
16             delete this;
17         return uiRef;
18     }
20     static bool CreateInstance(OUT TestObj** ppObj) {
21         if(0 == (*ppObj = new TestObj()))
22             return false;
23         (*ppObj)->AddRef();
24         return true;
25     }
26 };

Code Snap 2

There are two main pitfalls with reference counting. The first is a reference that is added and never released: a reference leak, which leads to dangling objects in memory ( a memory leak ). The second is an extra call to 'Release()' causing premature disposal of the object, which may lead to future access of an already deleted memory block. These problems can easily be avoided by following a few simple rules:

  • Add a reference during assignment.
  • Add a reference for objects returned as output parameters.
  • Make sure to always release the reference when done using the object.

The following example illustrates the implementation of these simple rules:

01 class SomeClass {
02 protected:
03     IRefCount* m_pObj;
04 public:
05     SomeClass() : m_pObj(0) {}
06     ~SomeClass() {
07         if(0 == m_pObj)
08             return;
09         // No need to hold a reference to the object any-more, release it
10         m_pObj->Release();
11     }
12     void set_Object(IRefCount* pObj) {
13         if(0 != m_pObj) {
14             // Release the reference to the existing object before assigning a new value
15             m_pObj->Release();
16         }
17         m_pObj = pObj;
18         if(0 != m_pObj) {
19             // Add a reference to account for the assignment
20             m_pObj->AddRef();
21         }
22     }
23     bool get_Object(IRefCount** ppObj) {
24         if(0 == m_pObj)
25             return false;
26         *ppObj = m_pObj;
27         // Add a reference to account for the output variable assignment
28         (*ppObj)->AddRef();
29         return true;
30     }
31 };

Code Snap 3

The SmartPtr class

Making sure object references are added and released as needed is tedious and error-prone; it is quite easy to forget to release or add a reference. To avoid that, we will use the SmartPtr class, which encapsulates the reference counting logic; this class is described in Code Snap 4 below.

00 template<class T_Interface >
01 class SmartPtr
02 {
03 public:
04     SmartPtr() : m_p(0) {
05     }
07     SmartPtr(T_Interface* lPtr) : m_p(0) {
08         if (lPtr != 0) {
09             m_p = lPtr;
10             m_p->AddRef();
11         }
12     }
14     SmartPtr(const SmartPtr& sp) : m_p((T_Interface*)sp) {
15         if (m_p)
16             m_p->AddRef();
17     }
19     ~SmartPtr() {
20         if (m_p) {
21             m_p->Release();
22             m_p = 0;
23         }
24     }
26     operator T_Interface*() const {
27         return m_p;
28     }
30     T_Interface& operator*() const {
31         _ASSERT(m_p != 0);
32         return *m_p;
33     }
35     T_Interface** operator&() {
36         return &m_p;
37     }
39     T_Interface* operator->() const {
40         _ASSERT(m_p != 0);
41         return m_p;
42     }
44     T_Interface* operator=(T_Interface* lPtr) {
45         if (lPtr == m_p)
46             return m_p;
47         if (0 != m_p)
48             m_p->Release();
49        if(0 != lPtr)
50             lPtr->AddRef();
51         m_p = lPtr;
52         return m_p;
53     }
55     T_Interface* operator=(const SmartPtr& sp) {
56         _ASSERT(&sp != 0);
57         if (0 != m_p)
58             m_p->Release();
59         m_p = (T_Interface*)sp;
60         if (m_p)
61             m_p->AddRef();
62         return m_p;
63     }
65     void Attach(T_Interface* lPtr) {
66        if (0 == lPtr)
67            return;
68        if (0 != m_p)
69            m_p->Release();
70        m_p = lPtr;
71     }
73     T_Interface* Detach() {
74         T_Interface* lPtr = m_p;
75         m_p = 0;
76         return lPtr;
77     }
79     void Release() {
80         if (m_p) {
81             m_p->Release();
82             m_p = 0;
83         }
84     }
86     T_Interface* m_p;
87 };

Code Snap 4

The following compares a modified version of Code Snap 3 that uses the SmartPtr class with the original version that does not. As can be seen, usage of the SmartPtr class reduces the lines of code needed by approximately half: no specialized constructor and destructor are needed, and 'set_Object' is reduced to a simple assignment operation.

There is one case, however, where the SmartPtr class doesn't automate reference counting, that is, when a reference to the object is passed as an output variable. This is demonstrated by the 'get_Object' method on line #8 of the left pane: in this case, after assigning the value to the output variable, the reference count must be manually increased.

Using SmartPtr:
01 class SomeClass {
02 protected:
03     SmartPtr<IRefCount> m_spObj;
04 public:
05     void set_Object(IRefCount* pObj) {
06        m_spObj = pObj;
07     }
08     bool get_Object(IRefCount** ppObj) {
09         if(0 == m_spObj)
10             return false;
11         *ppObj = m_spObj;
12         (*ppObj)->AddRef();
13         return true;
14     }
15 };

Not using SmartPtr:

01 class SomeClass {
02 protected:
03     IRefCount* m_pObj;
04 public:
05     SomeClass() : m_pObj(0) {}
06    ~SomeClass() {
07        if(0 == m_pObj)
08            return;
09        m_pObj->Release();
10    }
11     void set_Object(IRefCount* pObj) {
12         if(0 != m_pObj) {
13             m_pObj->Release();
14        }
15        if(0 != pObj) {
16             m_pObj = pObj;
17             m_pObj->AddRef();
18        }
19     }
20     bool get_Object(IRefCount** ppObj) {
21         if(0 == m_pObj)
22             return false;
23         *ppObj = m_pObj;
24         (*ppObj)->AddRef();
25         return true;
26     }
27 };

Code Snap 5

There is, however, one more limitation we need to solve: in order to achieve the automated memory management logic described in this article, as can be seen in Code Snap 2 above, each and every object must implement reference counting, and this is cumbersome and time consuming. To solve this we will use the RefCountObj class described below.

The RefCountObj class

Code Snap 6 below presents the RefCountObj class, which encapsulates the reference counting logic, abstracting out everything needed from the object except the definition of a reference counting interface; in other words, all that is needed from the object is to have the two virtual methods maintaining reference counting, namely 'AddRef' and 'Release', as described by the 'IRefCount' interface from Code Snap 1.

01 template< typename T >
02 class RefCountObj : public T
03 {
04 public:
05     static bool CreateInstance(OUT RefCountObj*& pObj) {
06         pObj = new RefCountObj();
07         return (0 != pObj);
08     }
10     static bool CreateInstance(OUT SmartPtr<RefCountObj>& spObj) {
11         RefCountObj* pObj = 0;
12         bool bRet = RefCountObj::CreateInstance(pObj);
13         if (true == bRet)
14             spObj = pObj;// Adds a reference
15         return bRet;
16     }
18     unsigned int AddRef(void) {
19         return ++m_uiRef;
20     }
22     unsigned int Release(void) {
23         unsigned int uiRef = --m_uiRef;
24         if (0 == uiRef)
25             delete this;
26         return uiRef;
27     }
29     protected:
30         std::atomic_uint m_uiRef;
32         RefCountObj() : m_uiRef(0) {}
33         virtual ~RefCountObj() {}
34 };

Code Snap 6

The following demonstrates 'RefCountObj' usage: on the right pane is Code Snap 2 with instantiation code added to the main function, and on the left pane is the same code with RefCountObj used to encapsulate the reference counting logic.
As can be seen, 'RefCountObj' usage reduces the code to the very basic class definition by abstracting out all reference counting logic; this way, the implemented class encapsulates specific use-cases with no need to deal with reference counting.

There is one exception to the above: all objects to be used with automated memory management must have pure virtual 'AddRef' and 'Release' methods, or directly inherit from the IRefCount interface defined in Code Snap 1 above; the method signatures must correspond with those defined by 'RefCountObj'.

Using RefCountObj:

class TestObj : public IRefCount
{
};

int main(void) {
    SmartPtr< RefCountObj<TestObj> > spObj;
    if(false == RefCountObj<TestObj>::CreateInstance(spObj))
        return -1;
    return 0;
}

Not using RefCountObj:

class TestObj : public IRefCount
{
protected:
    unsigned int m_uiRefCount;
    TestObj() : m_uiRefCount(0) {}
    ~TestObj() {}
public:
    virtual unsigned int AddRef(void) {
        return ++m_uiRefCount;
    }
    virtual unsigned int Release(void) {
        unsigned int uiRef = --m_uiRefCount;
        if(0 == uiRef)
            delete this;
        return uiRef;
    }
    static bool CreateInstance(TestObj** ppObj) {
        if(0 == (*ppObj = new TestObj()))
            return false;
        (*ppObj)->AddRef();
        return true;
    }
};

int main(void) {
    SmartPtr<TestObj> spObj;
    if(false == TestObj::CreateInstance(&spObj))
        return -1;
    return 0;
}
Code Snap 7

Main Advantages over std::shared_ptr

  • Higher efficiency: the reference count is implemented by each object and not allocated by the smart pointer as with std::shared_ptr, reducing memory fragmentation and emitted assembly code execution time.
  • DLL safety: the object must implement allocation and de-allocation of its own memory; this guarantees cross-DLL safety, hence an object allocated on the heap of one DLL is guaranteed to be deleted from the same heap. With std::shared_ptr, one can assign a pointer allocated in one DLL to a std::shared_ptr maintained by another; upon std::shared_ptr deletion of the referred object an invalid heap will be used for deletion, causing a memory leak at best, and a crash at worst.
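To make the intrusive-counting point concrete, here is a minimal, self-contained sketch ( the names are illustrative, not the article's classes ): the counter lives inside the object, so a single allocation covers both, and the object frees itself exactly when the last reference is released:

```cpp
#include <atomic>

static int g_alive = 0;// number of live objects, used to observe destruction

class Counted {
    std::atomic_uint m_ref;
public:
    Counted() : m_ref(1) { ++g_alive; }// created holding one reference
    ~Counted() { --g_alive; }
    unsigned AddRef()  { return ++m_ref; }
    unsigned Release() {
        const unsigned r = --m_ref;
        if (0 == r)
            delete this;// last reference gone, the object frees itself
        return r;
    }
};

// Returns true when the object survives while referenced and frees
// itself exactly when the last reference is released.
bool LifecycleDemo()
{
    Counted* p = new Counted();// ref count == 1
    p->AddRef();               // a second user, ref count == 2
    bool ok = (1 == p->Release()) && (1 == g_alive);// still alive
    ok = ok && (0 == p->Release()) && (0 == g_alive);// freed itself
    return ok;
}
```

A std::shared_ptr constructed from a raw pointer would instead allocate a separate control block for the counter; here no second allocation exists.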

Basic COM with Linux

Posted: April 4, 2015 in Design Patterns


This article is intended for C++ developers. It is assumed that the reader of this article is familiar with object-oriented programming and design.

Windows C++ developers working with COM might find this article useful in leveraging their existing Windows knowledge with Linux.


While COM is widely used on Windows operating systems, it is rarely used with Linux; in this article I will demonstrate a simple & light-weight Linux C++ implementation of the basic COM model.

This article is the first in a series discussing object-oriented design using COM & C++. The article starts with a short explanation of the basic ideas and follows with a simple source code example.

The core concepts are simple and easy to implement on many platforms other than Windows. At its very basic, COM solves two main problems: [1] cross-module object run-time type information, [2] object life-cycle management. These are fundamental concepts widely used in numerous projects; COM facilitates a simple yet flexible design pattern to solve them.

When to use and when not to use

COM was defined decades ago. Since then, new technologies have emerged, considerably reducing development cost compared to COM. However, while these technologies have proved effective in most cases, there are cases where performance and resource consumption are critical, and in these cases C++/COM proves essential.

Web and big-data applications ( for example ) have many highly optimized frameworks enabling implementation in a higher level language such as C# or Java; however, for specialized applications where performance is critical, development must be done in C/C++, and in these cases COM proves efficient. I have been extensively using COM while building multimedia / streaming engines on Windows, Linux and mobile devices.


Object life-cycle control: reference counting is used to keep the object alive as long as it is being used; thus, each of the object's consumers ( class, method, … ) increases its reference count while using the object ( by calling 'AddRef' ) and reduces the reference count when it has finished using the object ( by calling 'Release' ).

Object run-time type information: with COM, objects implement interfaces; each such interface is associated with a unique id, and this id is used by the object consumer ( e.g. a calling method ) to query for support of a specific interface. The method implementing this logic is called QueryInterface.

The IUnknown interface

The most fundamental COM construct is the IUnknown interface. This interface must be implemented by every COM object and interface; it defines methods for reference count control and run-time type information querying.

interface IUnknown
{
    virtual HRESULT QueryInterface(IN REFIID riid, OUT void** ppvObject) = 0;
    virtual UINT AddRef(void) = 0;
    virtual UINT Release(void) = 0;
};

Object life-cycle control is usually implemented using a class member variable for reference counting: calling AddRef increases the reference count by one while Release decreases it; when the reference count reaches zero the object is responsible for cleaning itself from memory.

The QueryInterface method is used to query the object for support of a specific interface. Implementation of the QueryInterface method involves iterating through the ids of the supported interfaces; if the queried interface is found to be supported, the object will increase its reference count and return a pointer reference through '*ppvObject'; if the queried interface was not found, E_NOINTERFACE is returned.

Implementation guidelines

Since COM objects maintain their own life-time using reference counting, external object life-time control should be prevented; for example, allocating a COM object on the stack will cause its resources to be released upon stack frame termination, making the reference count mechanism useless and misleading.

To ensure the object maintains its own life-cycle, the COM object constructors and destructor are defined as protected; this prevents the object from being directly created on the stack.

COM object creation is implemented using a special static class method usually called CreateInstance; this method allocates the object, initializes the reference count and returns the default interface, which can later be used to query for other interfaces.

The Sample Code

// {00000000-0000-0000-C000-000000000046}
constexpr GUID IID_IUnknown = { 0x00000000, 0x0000, 0x0000, { 0xC0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x46 } };

interface IUnknown
{
    virtual HRESULT QueryInterface(IN REFIID riid, OUT void** ppvObject) = 0;
    virtual UINT AddRef(void) = 0;
    virtual UINT Release(void) = 0;
};


This file contains the most basic definitions comprising basic COM behaviour; every COM object must implement all of the IUnknown interface methods.
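The interface ids above are plain 16-byte structures, and a byte-wise comparison is all that is needed to match them. The following sketch ( the struct mirrors the usual COM GUID layout; the helper name is mine, not part of the sample ) shows the comparison the sample's QueryInterface performs with memcmp:

```cpp
#include <cstdint>
#include <cstring>

// The usual COM GUID layout: 4 + 2 + 2 + 8 bytes, 16 bytes total, no padding.
struct GUID {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
};

// Byte-wise equality, exactly what the memcmp calls in QueryInterface do.
inline bool IsSameIID(const GUID& a, const GUID& b)
{
    return 0 == memcmp(&a, &b, sizeof(GUID));
}
```

Because the structure has no padding bytes, memcmp over sizeof(GUID) is a reliable equality test.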

// {12345678-1234-1234-0101-010101010101}
constexpr GUID IID_IRefCountPrinter = { 0x12345678, 0x1234, 0x1234, { 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01 } };

interface IRefCountPrinter : public IUnknown
{
    virtual void PrintRefCount() = 0;
};

// The ITester definition was elided from the original listing; the GUID value below is illustrative
// {87654321-4321-4321-0202-020202020202}
constexpr GUID IID_ITester = { 0x87654321, 0x4321, 0x4321, { 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02 } };

interface ITester : public IUnknown
{
    virtual void TestMe() = 0;
};

HRESULT CreateTester(OUT ITester** ppObj);


This is where we define the specialized interfaces we want our objects to support, and the instance creation factory methods ( 'CreateTester' in the example above ).

class TesterObj : public ITester
                , public IRefCountPrinter
{
protected:
    std::atomic_uint m_uiRefCount;

    TesterObj();
    virtual ~TesterObj();

public:
    // IUnknown implementation
    HRESULT QueryInterface(IN REFIID riid, OUT void** ppvObject);
    UINT AddRef(void);
    UINT Release(void);

    // ITester implementation
    void TestMe();

    // IRefCountPrinter implementation
    void PrintRefCount();

    static HRESULT CreateInstance(OUT IUnknown** ppUnk);
};


Here we define a basic COM object implementing our COM interfaces, consisting of the 'TestMe()' and 'PrintRefCount()' methods. We have also defined a class variable named 'm_uiRefCount' to keep track of the object's reference count; to support multi-threading we have to assure atomic access to the variable, and for that 'std::atomic_uint' is used.
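One detail worth spelling out: std::atomic's fetch_add/fetch_sub return the value held *before* the update, which is why the AddRef/Release implementations further down adjust the returned value by one to report the new count. A small sketch of this idiom ( the function names are mine, for illustration only ):

```cpp
#include <atomic>

// fetch_add/fetch_sub return the PREVIOUS value, hence the +1 / -1
// adjustments to report the count after the update.
unsigned AddRefDemo(std::atomic_uint& ref)
{
    return ref.fetch_add(1) + 1;// new reference count
}

unsigned ReleaseDemo(std::atomic_uint& ref)
{
    return ref.fetch_sub(1) - 1;// new reference count
}
```

Both the read-modify-write and the adjustment are race-free: the atomic operation hands each caller a unique previous value, so two concurrent AddRef calls can never report the same count.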

TesterObj::TesterObj()
         : m_uiRefCount(0) {
    printf("TesterObj::TesterObj(), ref count = %d\n", (UINT)m_uiRefCount);
}

TesterObj::~TesterObj() {
    printf("TesterObj::~TesterObj(), ref count = %d\n", (UINT)m_uiRefCount);
}

HRESULT TesterObj::QueryInterface(IN REFIID riid, OUT void** ppvObject) {
    if (0 == memcmp(&riid, &IID_IUnknown, sizeof(GUID)))
        *ppvObject = (IUnknown*)((ITester*)this);
    else if (0 == memcmp(&riid, &IID_ITester, sizeof(GUID)))
        *ppvObject = (ITester*)this;
    else if (0 == memcmp(&riid, &IID_IRefCountPrinter, sizeof(GUID)))
        *ppvObject = (IRefCountPrinter*)this;
    else
        return E_NOINTERFACE;
    AddRef();// A reference to the object is returned via '*ppvObject', add a ref
    return S_OK;
}

UINT TesterObj::AddRef(void) {
    return m_uiRefCount.fetch_add(1) + 1;
}

UINT TesterObj::Release(void) {
    const UINT uiRef = m_uiRefCount.fetch_sub(1) - 1;
    if (0 == uiRef)
        delete this;
    return uiRef;
}

void TesterObj::TestMe() {
    printf("TesterObj::TestMe(), This is a test!\n");
}

void TesterObj::PrintRefCount() {
    printf("TesterObj::PrintRefCount(), Ref count is %u\n", (UINT)m_uiRefCount);
}

HRESULT TesterObj::CreateInstance(OUT IUnknown** ppUnk) {
    TesterObj* pObj = new TesterObj();
    if (0 == pObj)
        return E_OUTOFMEMORY;
    pObj->AddRef();
    HRESULT hr = pObj->QueryInterface(IID_IUnknown, (void**)ppUnk);
    // if 'pObj->QueryInterface' has failed, it doesn't increase the
    // ref count, and this 'Release()' will destruct the object
    pObj->Release();
    return hr;
}


As seen above, the AddRef and Release methods maintain the object lifetime: AddRef increases the reference count while Release decreases it; when the reference count hits zero the object deletes itself from memory.
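The +1 / -1 corrections above follow from the fact that fetch_add/fetch_sub return the value held before the modification; a minimal, portable sketch of this reference-counting core ( stripped of the COM machinery ):

```cpp
#include <atomic>

// Minimal sketch of the reference-counting core used by TesterObj.
// fetch_add/fetch_sub return the value held BEFORE the operation,
// hence the +1 / -1 corrections to report the updated count.
struct RefCounted {
    std::atomic_uint m_uiRefCount{0};
    unsigned AddRef()  { return m_uiRefCount.fetch_add(1) + 1; }
    unsigned Release() { return m_uiRefCount.fetch_sub(1) - 1; } // 0 => safe to delete
};
```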

The QueryInterface method checks the requested IID against each of the supported interfaces; if the queried interface is supported, a pointer of that type is returned to the caller. Note that when an interface pointer is returned, the method increases the reference count by calling AddRef, to account for the new reference handed out; it is the responsibility of the caller to Release that reference once done with the object.

Instance creation is implemented using the static method CreateInstance, where an AddRef/Release pair brackets the QueryInterface call to guarantee object destruction upon QueryInterface failure.

HRESULT CreateTester(OUT ITester** ppObj) {
    IUnknown* pUnk = 0;
    HRESULT hr = S_OK;
    if (FAILED(hr = TesterObj::CreateInstance(&pUnk)))
        return hr;
    hr = pUnk->QueryInterface(IID_ITester, (void**)ppObj);
    pUnk->Release();
    return hr;
}


The factory method ‘CreateTester’ simply creates the object and queries it for the ITester interface.

#include "Interface.h"

int main(int argc, char *argv[]) {
    IRefCountPrinter* pRefCntPrinter = 0;
    ITester*          pTester        = 0;
    HRESULT           hr             = S_OK;

    if (FAILED(hr = CreateTester(&pTester))) {
        printf("failed creating object with hr = 0x%.8x\n", hr);
        return hr;
    }

    if (FAILED(hr = pTester->QueryInterface(IID_IRefCountPrinter, (void**)&pRefCntPrinter))) {
        printf("failed Querying for the IRefCountPrinter interface with hr = 0x%.8x\n", hr);
        return hr;
    }

    pRefCntPrinter->PrintRefCount();
    pTester->TestMe();
    pTester->Release();
    printf("Object is still alive\n");
    pRefCntPrinter->PrintRefCount();
    pRefCntPrinter->Release();
    return 0;
}


This is the main program used to instantiate and call upon the COM object methods; below is the execution output:

TesterObj::TesterObj(), ref count = 0
TesterObj::PrintRefCount(), Ref count is 2
TesterObj::TestMe(), This is a test!
Object is still alive
TesterObj::PrintRefCount(), Ref count is 1
TesterObj::~TesterObj(), ref count = 0


Component Object Model, IUnknown


This Article is designated for experienced C++ developers; it is assumed that the reader has basic experience with Windows OS driver development.


WinUSB is Microsoft's user-mode framework for communicating with USB devices.

The main aim of WinUSB is to reduce development cost by exposing a user-mode, application-level USB API; this saves the time and effort of developing a USB driver and enables the developer to focus on application logic.

As of Windows 8.1, compared to a fully fledged kernel-mode USB driver, WinUSB has a few limitations. One is the fact that while a USB device might expose multiple Configurations to choose from, WinUSB supports only the default Configuration ( the first one ).

This Article describes an approach enabling WinUSB to use any of the Configurations exposed by the USB device.

Few words about USB

“Universal Serial Bus (USB) is an industry standard developed in the mid-1990s that defines the cables, connectors and communications protocols used in a bus for connection, communication, and power supply between computers and electronic devices.” ( Quoting Wikipedia )

A USB device can expose multiple configurations, at any given moment only a single configuration can be used by the application communicating with the USB device.

While setting up a USB connection, the SW selects the configuration to be used for device communication.

Use this link for a thorough USB explanation.

Few words about windows drivers

In most cases, a driver is a module ( DLL ) implementing functionality related to operating a specialized HW device. In Windows, multiple drivers can be used to operate a single HW device, each implementing a subset of the required functionality.
These drivers are grouped in stacks, where each IO request travels from the top-most driver to the one below it; each driver, in its turn, can modify the IO being executed and execute additional logic. In addition, there is a special type of driver called a filter driver; these are used to ~filter~ the IOs going in and out of another driver. We will get deeper into the details regarding this type of driver later on.

With USB, there are several types of drivers: the USB host-controller driver, the Hub driver and more are provided by the OS and implement the generic USB Bus & Hub logic; a specialized USB device requires a function driver on top of the Host-Controller & Hub drivers, which then needs to implement only the functionality related to the specific HW device.

A detailed explanation of windows USB architecture can be found in this link.

The WinUSB.sys driver

Quoting MSDN, “Windows USB (WinUSB) is a generic driver for USB devices that was developed concurrently with the Windows Driver Frameworks (WDF) for Windows XP with SP2. The WinUSB architecture consists of a kernel-mode driver (Winusb.sys) and a user-mode dynamic link library (Winusb.dll) that exposes WinUSB functions. By using these functions, you can manage USB devices with user-mode software.”

USB 2.0 Driver stack


We are going to make WinUSB think it is selecting the first/default configuration while, under the hood, we switch the default configuration with the desired one. To achieve this we will implement a lower filter driver, one that intercepts all URBs sent downwards by WinUSB and, when needed, changes them.

We connect to the default device queue and specifically intercept the URB_FUNCTION_CONTROL_TRANSFER and URB_FUNCTION_GET_DESCRIPTOR_FROM_DEVICE URBs, where we change the configuration index from the default one requested by WinUSB to the one we want:

switch (pUrb->UrbHeader.Function) {
	case URB_FUNCTION_CONTROL_TRANSFER:
		if ((USB_REQUEST_GET_DESCRIPTOR != pUrb->UrbControlTransfer.SetupPacket[1]) ||
			(USB_CONFIGURATION_DESCRIPTOR_TYPE != pUrb->UrbControlTransfer.SetupPacket[3]))
			return TRUE;// This is not what we are looking for...
		if (USB_DEFAULT_CFG_INDEX == pUrb->UrbControlTransfer.SetupPacket[2])
			pUrb->UrbControlTransfer.SetupPacket[2] = m_btZeroCfgSwitch;
		else if (m_btZeroCfgSwitch == pUrb->UrbControlTransfer.SetupPacket[2])
			pUrb->UrbControlTransfer.SetupPacket[2] = USB_DEFAULT_CFG_INDEX;
		break;
	case URB_FUNCTION_GET_DESCRIPTOR_FROM_DEVICE:
		if (USB_DEFAULT_CFG_INDEX == pUrb->UrbControlDescriptorRequest.Index) {
			pUrb->UrbControlDescriptorRequest.Index = m_btZeroCfgSwitch;
		} else if (m_btZeroCfgSwitch == pUrb->UrbControlDescriptorRequest.Index) {
			pUrb->UrbControlDescriptorRequest.Index = USB_DEFAULT_CFG_INDEX;
		}
		break;
}

The figure above shows what is needed to switch the default configuration with the desired value; m_btZeroCfgSwitch indicates the configuration index to replace the default with. The two control requests we need to modify are URB_FUNCTION_GET_DESCRIPTOR_FROM_DEVICE and USB_REQUEST_GET_DESCRIPTOR, where we specifically intercept the extraction of the configuration descriptor.

The SetupPacket format is in accordance with Table 9-3 of the USB_3_1 spec shown below; for a configuration query, the low byte of wValue indicates the configuration index, and this is what we modify to make WinUSB use the configuration we want.
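To make the byte indices concrete, here is a portable sketch of the standard 8-byte setup packet together with the index-swap logic; the struct layout follows Table 9-3, while the helper name ‘SwapCfgIndex’ is illustrative only:

```cpp
#include <cstdint>

// Sketch of the standard 8-byte USB setup packet (Table 9-3); the byte
// offsets mirror the SetupPacket[] indices used by the filter driver.
#pragma pack(push, 1)
struct UsbSetupPacket {
    uint8_t  bmRequestType; // SetupPacket[0]
    uint8_t  bRequest;      // SetupPacket[1], 0x06 == GET_DESCRIPTOR
    uint8_t  wValueLo;      // SetupPacket[2], descriptor index - the byte we swap
    uint8_t  wValueHi;      // SetupPacket[3], descriptor type (2 == CONFIGURATION)
    uint16_t wIndex;        // SetupPacket[4..5]
    uint16_t wLength;       // SetupPacket[6..7]
};
#pragma pack(pop)

// Swap the default configuration index with the desired one (and back),
// mirroring the URB_FUNCTION_CONTROL_TRANSFER handling described above.
void SwapCfgIndex(UsbSetupPacket& sp, uint8_t defaultIdx, uint8_t desiredIdx) {
    if (sp.wValueLo == defaultIdx)
        sp.wValueLo = desiredIdx;
    else if (sp.wValueLo == desiredIdx)
        sp.wValueLo = defaultIdx;
}
```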

Coding & Concepts

The driver is implemented as a C++ KMDF lower filter driver; it consists of a simple C++ device class named UsbCfgDevice, where most of the logic is implemented, and a simple user-mode WinUSB application used to interact with the device.

The WinUSB user-mode app uses the WinUSB API to open the device and verify that the id of the selected configuration equals the id of the configuration at index zero ( the one we have overridden ). While this is true for any WinUSB application, when the default CFG is overridden the value of the selected configuration will be different from one ( assuming the CFGs are numbered incrementally by the HW device ).

Tracing is implemented using the built-in WPP framework; to monitor logging on the debuggee, create a monitoring session using ‘traceview.exe’ and the driver PDB ( make sure to set the logging level to verbose ).

The driver is HW specific; the HW for which it is installed ( along with WinUSB ) is defined in the associated INF ( explained next ).

Driver Installation

Driver installation is done using a standard INF file indicating the files and registry entries to be updated on the operating system.

Of specific importance are the following INF sections:

            Class     = USBDevice
            ClassGUID = {88BAE032-5A81-49f0-BC3D-A4FF138216D6}

The Class and ClassGUID indicate the type of the driver being installed, in accordance with the system-defined device setup classes.

            %DeviceName%=USB_Install, USB\VID_nnnn&PID_nnnn

Defines the HW ( using the Vendor Id and Product Id ) for which the driver is to be installed; this section can include multiple HW definitions, each having a specialized VID and PID ( ‘nnnn’ is replaced with the actual ids ).


Defines the driver as a lower filter driver ( installed below winusb.sys on the driver stack ), and the Configuration index we want to replace the default with.


References
- USB_3_1 spec
- USB Architecture
- WinUSB
- Lower filter drivers
- USB Request Blocks
- URB Header Structure
- INF Files
- Using traceview.exe
- Source Code


This article is addressed to developers dealing with image processing.

In this Article I will present a technical ( rather than theoretical ) perspective of the Hough Transform for the detection of lines, and of all the related transformations.


One of the basic problems in machine vision is line detection: given a noisy image or a video feed, the application should automatically detect optimal straight lines in an unsupervised manner, all this without any prior knowledge of the image.

The Algorithm

The first step is to detect the edges; since the human brain is much more sensitive to intensity changes than to color, a YUV color space is preferred as the image capture format.

The YUV image is then passed through an edge detection filter ( e.g. the Sobel operator ).
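As an illustration of this edge-detection step, here is a minimal sketch of the Sobel operator evaluated at a single luma ( Y ) pixel; the image representation is a plain 2D vector rather than any specific framework type:

```cpp
#include <cstdlib>
#include <vector>

// Minimal sketch of the Sobel operator on a grayscale (Y-plane) image;
// returns the |Gx| + |Gy| gradient-magnitude approximation at (x, y).
// (x, y) must not lie on the image border.
int SobelMagnitude(const std::vector<std::vector<int>>& img, int x, int y) {
    static const int Kx[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} }; // horizontal gradient
    static const int Ky[3][3] = { {-1,-2,-1}, { 0, 0, 0}, { 1, 2, 1} }; // vertical gradient
    int gx = 0, gy = 0;
    for (int j = -1; j <= 1; ++j)
        for (int i = -1; i <= 1; ++i) {
            gx += Kx[j + 1][i + 1] * img[y + j][x + i];
            gy += Ky[j + 1][i + 1] * img[y + j][x + i];
        }
    return std::abs(gx) + std::abs(gy);
}
```

A uniform region yields a magnitude of zero, while a vertical black-to-white edge yields a strong response.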

In order to reduce the algorithm execution time and improve the line detection resolution, it is important that the resulting edges contain as few pixels as possible and thus be as thin as possible; this can be achieved using, e.g., the Canny algorithm.

Once we have an image with fine edges, we have to figure out which set of edge points forms a line; this is where the Hough Transform comes in.

Using the Hough Transform, the monochrome edge image is transformed from image coordinates to the Hough space. In the Hough space, each line is expressed as a point whose coordinates are the angle of the line ( in the image plane ) relative to the horizontal axis of the image coordinates, and the length of the perpendicular from the origin of the Cartesian coordinate system to the line ( in the image plane ).

Figure 1

Figure 2

Figure 1 above presents the source YUV image; Figure 2 presents the image after executing the Canny edge detector. In green is one of the dominant image lines, in blue is the perpendicular to the line, and in red is the angle.

In the Hough space, all lines are expressed as points whose coordinates are angle and radius ( red and blue in Figure 2 above ).

The Hough Space is used to represent lines only; how then can we convert the set of edge pixels presented in Figure 2 above into the Hough Space? We simply consider all possible straight lines going through each of the edge pixels. For each distinct line we create a counter; each pixel having the same line passing through it increments the counter of that line, and thus multiple edge pixels located on the same line result in a Hough Space pixel with a higher counter value. This is demonstrated in Figure 3 below.

Figure 3

In dashed yellow are ~all possible~ lines going through the two red dots; in green is the only line going through both of the points. The counter for each of the yellow lines will be 1 while the counter for the green line will be 2, indicating a higher probability that a straight line passes through the edge pixels represented by the two red dots.

The Hough transform is a 2D-to-3D transformation: the input is a binary 2D edge image, and the result is a 3D image where the position encodes the angle and the radius, and the intensity encodes the probability of the existence of a straight line.

Lines in the Hough space are expressed using the line formula ‘int radius = (int)(x*cos(angle) + y*sin(angle))’; the derived ‘radius’ and ‘angle’ form a single Hough space point ( pixel ).
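The voting scheme built on this formula can be sketched as follows; this is a naive illustration ( angles quantized to whole degrees, rounding via lround ) and not the article's implementation:

```cpp
#include <cmath>
#include <utility>
#include <vector>

struct HoughPeak { int angle; int radius; int votes; };

// Every edge pixel votes for the (angle, radius) bins of all lines that
// could pass through it; collinear pixels pile up in a single bin.
// Radii are offset by maxRadius so negative values can be binned.
HoughPeak HoughVote(const std::vector<std::pair<int,int>>& edges, int maxRadius) {
    std::vector<std::vector<int>> acc(180, std::vector<int>(2 * maxRadius + 1, 0));
    for (const auto& p : edges)
        for (int angle = 0; angle < 180; ++angle) {
            double rad = angle * 3.14159265358979 / 180.0;
            int radius = (int)std::lround(p.first * std::cos(rad) + p.second * std::sin(rad));
            if (radius >= -maxRadius && radius <= maxRadius)
                ++acc[angle][radius + maxRadius];
        }
    HoughPeak best{0, 0, 0};                 // report the strongest bin
    for (int a = 0; a < 180; ++a)
        for (int r = 0; r < (int)acc[a].size(); ++r)
            if (acc[a][r] > best.votes)
                best = { a, r - maxRadius, acc[a][r] };
    return best;
}
```

Feeding it three collinear points such as (10,10), (20,20), (30,30) yields a single dominant bin whose vote count equals the number of points on the line.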

The intensity represents the number of edge pixels sharing a common line with the same orientation; Figure 4 below presents the result of the Hough transform ( the Hough space ).

Figure 4

The three most dominant peaks are surrounded with red circles; these indicate the three most dominant lines, specifically the left road border, the horizon, and the right road border ( from left to right ).

The next step is to resolve the specific lines; as noted, these lines are reflected by the most intense pixels. Figure 5 below presents the lines whose Hough Space pixel brightness is above 80%.

Figure 5

As we can see there is a lot of noise in the form of multiple lines; this noise is the result of the rough classification approach mentioned above. The features ( lines ) in this form are not usable.

Redundant lines ( noise ) must be dramatically reduced for the extracted features to be usable; our algorithm should extract the three most probable lines described in Figure 4 above.

There are more than a few methods to reduce the noise presented in Figure 5 above; one of the most robust is the mean-shift algorithm. This algorithm is used to find local stationary points where the 1st order derivative equals zero; we will use it to find local maxima, and thus resolve the three most probable lines.

Figure 6 below illustrates the result of executing the mean-shift algorithm on the Hough space; this results in a considerable reduction of the number of lines detected.

Figure 6

We can see that by using the mean-shift algorithm we were able to remove all noise, resulting in the exact three lines that were expected.

Line detection flow diagram

Figure 7

Final words

The Hough Transform is a robust unsupervised algorithm used to detect lines in an arbitrary image.

It enables detection of straight lines without resolving their start/end points. It is, however, a relatively expensive algorithm; optimizations of the Hough Transform include the Randomized Hough transform, the Hierarchical Hough transform and more.

Reducing the resolution of the Hough Space can dramatically improve its performance at the cost of result granularity.


Sobel operator, Canny edge detector, mean-shift algorithm, OpenCV


This Article is intended for experienced C++ developers; it is assumed that the reader is familiar with Windows API programming.


A memory overrun occurs, for example, when writing to a memory block more bytes than were actually allocated, possibly overwriting memory intended for another purpose; this might cause unpredictable behavior when the overwritten memory is later accessed.

While the symptom of a memory overrun is easily detected ( usually a crash of some sort ), the cause of a memory overrun is much harder to find; that is because the overrun memory might be used long after the overrun has actually happened.

In this Article I will demonstrate a simple approach to pin-point the cause ( rather than the symptom ) of a memory overrun as it happens: the debugger shows the line of code causing the breach.

Few words about Memory allocation

Memory is aimed to be sequentially allocated on the heap ( that is, when it is not fragmented ); this means that two sequential allocation requests will result in two adjacent blocks of memory on the heap. Each of these blocks consists of a small header, internally used by the OS, followed by a block of bytes of the requested size; this is illustrated in Figure 1 below:

    [ header ][ pFirst[13] ]  [ header ][ pSecond[13] ]

     header – Memory internally used by the OS to keep track of the allocated block
     pFirst/pSecond – X amount of bytes requested by HeapAlloc/malloc/new, …

Having the above in mind, when a memory overrun occurs, data written into ‘pFirst’ breaches the size initially allocated and overruns part of the next memory block; that is, data written to ‘pFirst[13+1]’ will overwrite the header of the next memory block, and might also overwrite ‘pSecond’ if the breach is large enough.


The code example in Figure 2 below demonstrates a simple memory overrun scenario where ‘memset’ is called to set 26 bytes of a memory block for which only 13 bytes were allocated, producing a memory overrun breaching into the ‘pSecond’ array address space.

Figure 2

Section (A), at ‘memset(pFirst, … )’, causes the memory overrun; the impact is not immediate.
Section (B) is where the application fails: while HeapFree ( called by the delete operator ) is trying to deallocate the memory, it accesses the memory block header ( internally managed by the OS ). This header was corrupted at (A) and contains invalid data, causing heap validation ( RtlValidateHeap ) to fail, which eventually leads to a premature termination of the application.

So what is so special about memory overruns? Well, as seen in the above example, the impact of a memory overrun is not immediate: the actual overrun occurred at (A) while the first place it had an impact is at (B), a ‘long time’ after the overrun actually occurred. The above is a simple example; there might be much more complex scenarios where multiple classes and threads are involved.

What if we could intercept the memory overrun at the point where it happens ( at A )? Then, obviously, it would be much easier to pin-point the bug.

Digging deeper

The OS manages memory using pages; a page is 4096 bytes in size ( for both x86 and x64 ), and a single allocated block of memory might span multiple pages. Each page has protection rights; the protection right given to a block of memory allocated on the heap is PAGE_READWRITE, which enables reading from and writing to that page. There are other protection rights; of specific interest is PAGE_GUARD, which prevents any access to the page: if the page is accessed, an EXCEPTION_GUARD_PAGE exception is raised. ‘How is this related to memory overruns?’ you might ask.
Well, having a guard page exactly at the end of each allocated block will immediately trigger EXCEPTION_GUARD_PAGE if the allocated block boundary is breached, and hence indicate the overrun as it happens, giving control to the debugger ( if attached ) for further analysis.


So how can we do that? Obviously this requires implementing custom allocation functions and/or overloading the existing methods ( e.g. the new operator ). The first thing to do is align the end ( and not the start ) of each memory block with the end of a memory page; this can be achieved using the _aligned_malloc method. Using this method with an alignment of PAGE_SIZE, we can guarantee that the allocated memory block will start ( though not end ) on a new page of memory; rounding the requested byte count up to the next multiple of PAGE_SIZE then lets us align the end of the memory block with the end of the page. Next, we need a guard page adjacent to the end of the allocated memory block; this can be done by allocating one page more than what is needed and aligning the end of the memory block to the start of that extra page. All that remains is to set the memory protection of that last ‘guard page’ to PAGE_GUARD, which can easily be achieved using the VirtualProtect API. Figure 3 below demonstrates how this can be done.

Figure 3

The above code snippet demonstrates the implementation of the ‘guard page’ concept with basic memory allocation functions.
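The size arithmetic behind this scheme can be sketched portably; the constant and helper names here are illustrative, and the actual _aligned_malloc/VirtualProtect calls are omitted:

```cpp
#include <cstddef>
#include <cstdint>

static const std::size_t PAGE_SIZE_BYTES = 4096; // x86/x64 page size

// Round a requested byte count up to the next multiple of the page size.
std::size_t RoundToPageSize(std::size_t cb) {
    return ((cb + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES) * PAGE_SIZE_BYTES;
}

// Given a page-aligned allocation base (as an address value) holding
// RoundToPageSize(cb) data bytes followed by one guard page, return the
// address handed to the caller: the block ENDS exactly where the guard
// page begins, so any overrun touches the guard page immediately.
std::uintptr_t UserBlockAddress(std::uintptr_t pageAlignedBase, std::size_t cb) {
    return pageAlignedBase + RoundToPageSize(cb) - cb;
}
```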

Note that the ‘realloc‘ method doesn’t really try to reallocate memory in the fashion HeapRealloc does; rather, it checks if the new byte count falls within the PAGE_SIZE boundary. If it does, internal structures are adjusted and memory is moved to reflect the new requested byte count; otherwise, a totally new block of memory is allocated at the requested size, data is copied, the old memory block is freed and a pointer to the new block is returned.
Trying to reallocate memory in the HeapRealloc fashion would force removing the PAGE_GUARD from the guard page for a short period of time, and this might let memory overruns pass unnoticed.
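The in-place check described above amounts to comparing page-rounded capacities; a sketch ( the helper name is assumed, not taken from the article's code ):

```cpp
#include <cstddef>

// Sketch: 'realloc' can adjust the block without moving it only while the
// page-rounded capacity stays the same; growing past it would mean touching
// (and temporarily unprotecting) the PAGE_GUARD page, so the block is moved
// to a fresh allocation instead.
bool CanResizeInPlace(std::size_t cbOld, std::size_t cbNew) {
    const std::size_t kPage = 4096;
    return (cbNew + kPage - 1) / kPage == (cbOld + kPage - 1) / kPage;
}
```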

The code snippet presented in Figure 4 below shows how to use the above memory management methods with some overloads of the new operator. These overloads must be defined at global scope to properly override the defaults.

Figure 4

Using the suite of methods presented in Figures 3 and 4 with the application presented in Figure 2 will cause a debugged application to break at (A) and not at (B) as with the standard CRT implementation, and hence will enable identification of the problem as it happens, making it much easier to resolve.

Final words

The above suite of methods is good while hunting for memory overruns; however, nothing comes for free: these methods have memory size and performance penalties, specifically when reallocating memory using ‘Memory::realloc’. Having that in mind, it is advised to use this suite of methods only in _DEBUG mode while using the standard CRT implementation in Release; this can easily be implemented by combining a few #ifdef statements.

It is worth noting that PAGE_GUARD raises an exception only once; if the exception is suppressed, no further PAGE_GUARD exceptions will be generated for that page. To support repeating exceptions use, e.g., PAGE_NOACCESS rather than PAGE_GUARD.


This Article is intended for developers experienced with C++ and low-level Windows API programming.


The focus of this article is to discuss an autonomous method of generating dump files without the need for any development tool to be installed.

It starts by giving a high-level explanation of what dump files are and what they are used for; it then presents a few of the most common development tools used to generate dump files and discusses the Windows exception model; finally, a way of generating dump files without any development tool is presented.

So what is a Dump file?

A dump file is an image of the process at a certain point in time; this process image can include various information such as the call stack & stack variables, the loaded module list, and even an image of the raw memory used by the application.
This valuable information can then be used to analyze the process state at the time the dump file was generated.

What is it used for?

In most cases ( but not only ), dump files are used to identify the root of an exceptional condition causing the process to abnormally terminate ( a 2nd chance exception ); having a dump file generated just before the application crashed enables postmortem analysis of the process state at the time of the crash, and thus enables pin-pointing the root of the problem.

Using MS Visual Studio to generate memory dumps

Microsoft Visual Studio enables the generation of memory dumps while breaking the execution of a debugged process; this can be done through the Debug->Save Dump As menu item, as illustrated in Figure 1 below.

Figure 1

Two dump file types are supported by the IDE: a ‘minidump’ that includes stack trace information ( resulting in small files ), and a ‘minidump with heap’ including the full memory image ( resulting in large files ).

Using ADPlus to generate dump files

Debugging Tools for Windows is a lightweight suite of tools for debugging applications; it is ideal for customer-site problem resolution, and for scenarios where it is not possible to install heavy-duty development environments such as Microsoft Visual Studio.

ADPlus ( also known as ‘AutoDump+’ ) is a lightweight tool used to automatically generate dump files; that is, upon abnormal process termination a dump file is automatically generated, enabling postmortem analysis of the process state at the time of the crash. It also supports automatic dump generation upon deadlocks; Figure 2 below presents a sample ADPlus command line.

ADPlus.exe -crash -pn winword.exe -o d:\Dumps

Figure 2

The above attaches ADPlus to winword.exe and generates dump files at ‘d:\dumps’ upon a winword.exe crash; click here for the full command line specification.

Analyzing Dump Files

Dump file analysis is the phase where the postmortem takes place. Starting with Microsoft Visual Studio 2010, it is possible to directly analyze dump files of un-managed applications through the IDE; this is done through the “File->Open->’File…’” menu and then by selecting the dump file to analyze ( ‘*.dmp’, ‘*.mdmp’, ‘*.hdmp’ extensions ).

Once opened, click the ‘Play’ icon and the IDE will take you to the point where the application broke.

It is important to note that for dump analysis to work properly, it is essential to keep the symbol files ( .pdb ) associated with the executable for which the dump was created; these should then be used during the analysis process.

Dump file analysis for managed applications is supported by Debugging Tools for Windows and will be covered in a specialized Article.

Process termination due to Exceptional condition

A process might be abnormally terminated due to an exceptional condition preventing normal process execution; such an exceptional condition is usually due to a programming error ( a SW bug ). A list describing common exceptions can be found here.

The operating system uses structured exceptions to indicate such exceptional behavior. The executing application gets the first chance to deal with the exception; if it is not dealt with, or dealt with but not suppressed, the operating system gets the second chance to deal with it, hence 1st and 2nd chance exceptions, respectively. Most of the time, when a 2nd chance exception is generated the operating system will terminate the application ( a crash ); an exception to this is, e.g., a debugger breakpoint ( DebugBreak() ) where, once intercepted, the OS will open a dialog letting the programmer choose whether to debug the application ( assuming a debugger is installed ) or suppress the exception.

Generating Dump files upon abnormal termination

No more than a few lines of code are needed to automatically generate dump files when the application crashes; Figure 3 below demonstrates what is needed.

Figure 3

The above code snippet uses structured exception handling to intercept 2nd chance exceptions; this is done by installing the unhandled exception handler ‘__TopLevelExceptionHandler’ ( using SetUnhandledExceptionFilter ), which intercepts all 2nd chance exceptions.

Once an exception has been intercepted, ‘__TopLevelExceptionHandler’ is invoked and performs the actual dump file generation.

The unhandled exception handler ( in our case ‘__TopLevelExceptionHandler’ ) executes in the context of the thread throwing the exception, and the thread stack is not unwound before the handler executes. This might limit the exception handler implementation in stack-overflow scenarios, where there might not be enough space left on the stack to execute the handler functionality; for this reason, ‘__TopLevelExceptionHandler’ creates a separate thread where the actual ~dumping~ process executes synchronously.

The actual dumping process is executed by the ‘__GenerateDumpFile’ method, specifically by using the MiniDumpWriteDump API.

By default the dump file is generated in the directory of the executing process; the name of the file includes the time, the exception code and the name of the process.
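As an illustration only, a possible naming helper; the exact format produced by the article's code is not shown, so the ‘&lt;process&gt;_&lt;UTC time&gt;_&lt;exception code&gt;.dmp’ layout here is an assumption ( std::gmtime is used for portability of the sketch ):

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical naming helper - the real code derives the process name from
// the executable path and uses the exception code passed to the handler.
std::string BuildDumpFileName(const char* procName, unsigned exCode, std::time_t when) {
    char stamp[32];
    std::strftime(stamp, sizeof stamp, "%Y%m%d-%H%M%S", std::gmtime(&when));
    char name[256];
    std::snprintf(name, sizeof name, "%s_%s_0x%.8X.dmp", procName, stamp, exCode);
    return name;
}
```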

The code can easily be integrated into any C++ application, enabling automatic dump file generation and reducing the cost of customer-site problem interception.

Final words

I have tried to keep the code provided with this article as clear & simple as possible. The generated dump files might take considerable disk space; integrating this code with any commercial product will require the implementation of a dump file recycling mechanism.


MP4 is a widely used container format for multimedia files; it is an extension of Apple’s QuickTime file format and is agnostic to the actual codec used for encoding. It can contain multiple streams of video, audio and data ( e.g. subtitles ).

MP4 files are broken into two main parts: the payload part, where interleaved audio & video are stored, and the metadata part, where information describing the payload is stored;
that information consists, for example, of the available streams, their payload format/compression type, …

So what are we trying to solve?

Of specific importance in the MP4 metadata is the file Index; the index points to the offset in the file where the payload ( e.g. video ) of a specific time is found. This way, the player knows where the payload for the first video frame is found, and what data to play at a given time.

The following present a high-level view of the MP4 file structure:

When ~recording~ a video file, the duration of the file and the amount of recorded data can ( obviously ) be known only once recording has finished, and thus the Index is stored at the end of the file.

MP4 files are commonly used on web sites for video playback. To play the file, a player ( e.g. a Web Browser ) must read the file from the remote site; files are read sequentially, starting at offset zero.

A player must read the Index before processing any video payload, and thus must read the file up to its end ( where the index resides ) before being able to present the first video frame. For big MP4 files, this limitation might cause playback to start a long time after the play button was actually clicked, resulting in a poor user experience.

In this article I will show how to reduce playback latency to a minimum by moving the metadata chunk from the end of the file to its start, making it available for the player to consume before the first video payload is read, and thus enabling playback to commence before the file is fully downloaded to the client machine ( also known as progressive download ).

Basic File structure

In accordance with Chapter 1 ( Page 17 ) of the QuickTime file format, the basic building block of MP4 files is a chunk of data called an ATOM; each ATOM has a unique type id ( uuid ) and a size ( in bytes ).

Some ATOMs contain data, while others contain a set of child ATOMs.
ATOMs can have a ‘size’ indicator of either 32 bits or 64 bits; in this article we assume a 64-bit size indicator. The following is the 64-bit ATOM structure:

    struct ATOM {
        UINT64	size;
        union {
            UINT uuid;
            CHAR name[4];
        } type;
    };
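A minimal sketch of walking the top-level ATOMs in a buffer; this sketch follows the common on-disk convention where a 32-bit big-endian size precedes the type field, with a size of 1 signalling that a 64-bit extended size follows:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Read an n-byte big-endian integer from a byte buffer.
static uint64_t ReadBE(const uint8_t* p, int n) {
    uint64_t v = 0;
    for (int i = 0; i < n; ++i) v = (v << 8) | p[i];
    return v;
}

// Walk the top-level atoms of 'buf' and collect their 4-char type codes;
// each atom is skipped as a whole, children are not descended into.
std::vector<std::string> WalkAtoms(const uint8_t* buf, std::size_t cb) {
    std::vector<std::string> types;
    std::size_t off = 0;
    while (off + 8 <= cb) {
        uint64_t size = ReadBE(buf + off, 4);
        std::string type((const char*)buf + off + 4, 4);
        std::size_t hdr = 8;
        if (size == 1 && off + 16 <= cb) { // 64-bit extended size follows
            size = ReadBE(buf + off + 8, 8);
            hdr = 16;
        }
        if (size < hdr || off + size > cb) break; // malformed atom, stop
        types.push_back(type);
        off += size;
    }
    return types;
}
```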

The following figure present a typical ATOM hierarchy:

There are three types of atoms we specifically need to deal with:

‘mdat’ – This atom holds the raw media data, such as compressed video & audio samples; the media data is stored by time, in an interleaved fashion, as can be seen in the figure to the right:
‘moov’ – Holds all metadata related to the media file; “it is essentially a container of other atoms. These atoms, taken together, describe the contents of a movie. At the highest level, movie atoms typically contain track atoms, which in turn contain media atoms. At the lowest level are the leaf atoms, which contain non-atom data, usually in the form of a table or a set of data elements. For example, a track atom contains an edit atom, which in turn contains an edit list atom, a leaf atom which contains data in the form of an edit list table.”
‘stco’ – An indirect child of the ‘moov’ atom, available on a per-media-stream ( ‘trak’ atom ) basis, pointing to the offsets of the media payload directly in the ‘mdat’ section; the following simplified diagram presents a possible configuration:


Moving the Metadata ( ‘moov’ ) ATOM to the beginning of the file requires modification of the ‘stco’ offsets so they are aligned with the new ‘mdat’ position.

The process is finalized by iterating through all of the ‘stco’ ATOMs and updating the offsets after moving the ‘moov’ ATOM to the beginning of the file.
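The offset fix-up itself is a straightforward pass over each table; a sketch ( the helper name is illustrative, and real code must also byte-swap the big-endian entries and handle 64-bit ‘co64’ tables ):

```cpp
#include <cstddef>
#include <cstdint>

// Moving the 'moov' atom (cbMoov bytes) to the front pushes the 'mdat'
// payload forward by cbMoov, so every chunk offset in every 'stco' table
// must grow by the same amount.
void AdjustChunkOffsets(std::uint32_t* pOffsets, std::size_t count, std::uint32_t cbMoov) {
    for (std::size_t i = 0; i < count; ++i)
        pOffsets[i] += cbMoov;
}
```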

The project consists of a single ‘.cpp’ file implementing the logic described in this article; for simplicity, memory mapped files were used for file modification and access.

While developed on Windows, the code was made as simple as possible so it can easily be ported to any platform.


QuickTime file format
Movie Atoms
MP4 Spec