I have been teaching myself to reverse engineer binary programs so that I can use these skills to reverse engineer malware. I have been learning assembly and playing with new tools such as Ghidra and radare2/Cutter.
I found that @MalwareTech had some great binary analysis challenges on his blog and decided to check them out.
This write-up covers the fourth challenge, shellcode1.exe: https://www.malwaretech.com/challenges-shellcode1
Let's open this binary in Cutter and analyze it with radare2. Once it is open, let's navigate to the entry function:
The first bit of interesting code we come across is:
| 0x00402285 push 0x10 ; 16
| 0x00402287 push 0 ; DWORD dwFlags
| 0x00402289 call dword [sym.imp.KERNEL32.dll_GetProcessHeap] ; 0x403008 ; HANDLE GetProcessHeap(void)
| 0x0040228f push eax ; HANDLE hHeap
| 0x00402290 call dword [sym.imp.KERNEL32.dll_HeapAlloc] ; 0x403004 ; LPVOID HeapAlloc(HANDLE hHeap, DWORD dwFlags, SIZE_T dwBytes)
| 0x00402296 mov dword [var_4h], eax
| 0x00402299 mov eax, dword [var_4h]
| 0x0040229c mov dword [eax], str.2b__:__B_bb ; [0x404040:4]=0x3a0a6232 ; "2b\n:\u06daB*bb\x1az\"*iJ\x9ar\xa2iR\xaa\x9a\xa2i2z\x92i*\u0082bzJ\xa2\x9a\xeb"
| 0x004022a2 push str.2b__:__B_bb ; 0x404040 ; "2b\n:\u06daB*bb\x1az\"*iJ\x9ar\xa2iR\xaa\x9a\xa2i2z\x92i*\u0082bzJ\xa2\x9a\xeb" ; const char *s
0x00402285 pushes 16 onto the stack and 0x00402287 pushes 0. These are the dwBytes and dwFlags arguments for the HeapAlloc call a few lines down, pushed right to left. 0x00402289 is a call to GetProcessHeap:
Retrieves a handle to the default heap of the calling process. This handle can then be used in subsequent calls to the heap functions.
0x0040228f pushes the heap handle returned in eax onto the stack.
Currently our stack is as follows (top of the stack first):
hHeap 0 16
0x00402290 is a call to HeapAlloc
Allocates a block of memory from a heap. The allocated memory is not movable.
DECLSPEC_ALLOCATOR LPVOID HeapAlloc( HANDLE hHeap, DWORD dwFlags, SIZE_T dwBytes );
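This call returns a pointer to the allocated memory, specifically a 16 byte block on the heap. The pointer is saved in var_4h.
With the pointer saved, we write the address of a string at 0x404040 into the first 4 bytes of that block, then push the same address onto the stack as the argument for the next call: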
| 0x0040229c mov dword [eax], str.2b__:__B_bb ; [0x404040:4]=0x3a0a6232 ; "2b\n:\u06daB*bb\x1az\"*iJ\x9ar\xa2iR\xaa\x9a\xa2i2z\x92i*\u0082bzJ\xa2\x9a\xeb"
| 0x004022a2 push str.2b__:__B_bb ; 0x404040 ; "2b\n:\u06daB*bb\x1az\"*iJ\x9ar\xa2iR\xaa\x9a\xa2i2z\x92i*\u0082bzJ\xa2\x9a\xeb" ; const char *s
Using the comment generated by radare2, we know our string starts at 0x404040 and the last byte is EB. Let's go look at 0x404040 in the hexdump view and grab those bytes:
32 62 0a 3a db 9a 42 2a 62 62 1a 7a 22 2a 69 4a 9a 72 a2 69 52 aa 9a a2 69 32 7a 92 69 2a c2 82 62 7a 4a a2 9a eb
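If you want those bytes in a form you can script against, here is a minimal Python sketch. The variable name `encoded` is mine and does not come from the binary. It also checks the bytes against the little-endian dword radare2 showed in its comment ([0x404040:4]=0x3a0a6232).

```python
# Bytes copied from the hexdump at 0x404040 (the variable name is illustrative).
encoded = bytes.fromhex(
    "32 62 0a 3a db 9a 42 2a 62 62 1a 7a 22 2a 69 4a 9a 72 a2 "
    "69 52 aa 9a a2 69 32 7a 92 69 2a c2 82 62 7a 4a a2 9a eb"
)

# The first four bytes, read as a little-endian dword, should match radare2's comment.
first_dword = int.from_bytes(encoded[:4], "little")

# A zero byte inside the buffer would cut strlen short, so check for one.
print(len(encoded), hex(first_dword), 0 in encoded)
```

There is no zero byte inside the buffer, so a strlen over it should cover every byte we copied.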
The next few lines of assembly call strlen, which returns the length of the string whose address we just pushed to the stack:
| 0x004022a7 call sub.ntdll.dll_strlen ; size_t strlen(const char *s)
| 0x004022ac add esp, 4
| 0x004022af mov ecx, dword [var_4h]
| 0x004022b2 mov dword [ecx + 4], eax
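As a quick sanity check on where the two writes into the heap block land (the string address at offset 0, the strlen result at offset 4), here is a rough Python model of that 16 byte block. The names are mine. The length of 38 is an assumption: it is the number of bytes in the hexdump, assuming a null terminator follows them. The zero padding is a simplification too, since HeapAlloc with dwFlags 0 does not promise zeroed memory.

```python
import struct

STRING_ADDR = 0x404040   # address of the encoded string in the binary
ASSUMED_LEN = 38         # assumed strlen result (bytes in the hexdump before a null)

# Offset 0: pointer to the string, offset 4: its length, both 32-bit little-endian.
# The remaining 8 bytes are never written by the code shown; zeros are a stand-in.
block = struct.pack("<II", STRING_ADDR, ASSUMED_LEN).ljust(16, b"\x00")

# Read the two fields back the same way the later code will.
ptr, length = struct.unpack_from("<II", block, 0)
print(hex(ptr), length)
```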
So var_4h now points at a small structure on the heap: the pointer to the string at offset 0 and its length at offset 4 (ecx + 4). Let's move on:
| 0x004022b5 push 0x40 ; '@' ; 64 ; DWORD flProtect
| 0x004022b7 push 0x1000 ; DWORD flAllocationType
| 0x004022bc push 0xd ; 13 ; SIZE_T dwSize
| 0x004022be push 0 ; LPVOID lpAddress
| 0x004022c0 call dword [sym.imp.KERNEL32.dll_VirtualAlloc] ; 0x403000 ; LPVOID VirtualAlloc(LPVOID lpAddress, SIZE_T dwSize, DWORD flAllocationType, DWORD flProtect)
| 0x004022c6 mov dword [s1], eax
At 0x004022c0 we are calling VirtualAlloc(0, 13, 0x1000, 0x40)
Reserves, commits, or changes the state of a region of pages in the virtual address space of the calling process. Memory allocated by this function is automatically initialized to zero.
Let's go over what the parameters we are sending mean.
- 0 is lpAddress, the starting address of the region to allocate. Passing 0 (NULL) lets the system decide where to place the region.
- 13 is the region size in bytes.
- 0x1000 translates to MEM_COMMIT:
Allocates memory charges (from the overall size of memory and the paging files on disk) for the specified reserved memory pages. The function also guarantees that when the caller later initially accesses the memory, the contents will be zero. Actual physical pages are not allocated unless/until the virtual addresses are actually accessed. To reserve and commit pages in one step, call VirtualAlloc with MEM_COMMIT | MEM_RESERVE. Attempting to commit a specific address range by specifying MEM_COMMIT without MEM_RESERVE and a non-NULL lpAddress fails unless the entire range has already been reserved. The resulting error code is ERROR_INVALID_ADDRESS. An attempt to commit a page that is already committed does not cause the function to fail. This means that you can commit pages without first determining the current commitment state of each page. If lpAddress specifies an address within an enclave, flAllocationType must be MEM_COMMIT.
- 0x40 translates to PAGE_EXECUTE_READWRITE:
Enables execute, read-only, or read/write access to the committed region of pages.
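The return value, the base address of the allocated region, is saved in the local variable s1.
Now we have the address of a region of memory that is readable, writable and executable (RWX). We asked for 13 bytes, but VirtualAlloc works in whole pages, so the region is rounded up to a full page.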
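To double check those two magic numbers, here is a small Python lookup, a sketch only. The constant values are the ones documented for the Windows memory API. The helper name `describe_virtualalloc` and the 4096 byte page size are my own assumptions for illustration.

```python
# Documented Windows constant values (only a handful, not the full set).
ALLOCATION_TYPES = {0x1000: "MEM_COMMIT", 0x2000: "MEM_RESERVE"}
PROTECTIONS = {
    0x04: "PAGE_READWRITE",
    0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE",
}

def describe_virtualalloc(size, alloc_type, protect, page_size=4096):
    """Return (allocation flag names, protection name, size rounded up to whole pages).

    page_size=4096 is an assumption; the real value depends on the system.
    """
    flags = [name for bit, name in ALLOCATION_TYPES.items() if alloc_type & bit]
    protection = PROTECTIONS.get(protect, hex(protect))
    rounded = -(-size // page_size) * page_size  # ceiling division to whole pages
    return flags, protection, rounded

# The values pushed before the call at 0x004022c0.
print(describe_virtualalloc(13, 0x1000, 0x40))
```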
Moving forward, it looks like we are setting up the stack to call memcpy(s1, 0x404068, 13), where s1 is the address of our newly allocated memory:
| 0x004022cc push 0xd ; 13 ; size_t n
| 0x004022ce push 0x404068 ; 'h@@' ; const void *s2
| 0x004022d3 mov edx, dword [s1]
| 0x004022d9 push edx ; void *s1
| 0x004022da call sub.ntdll.dll_memcpy ; void *memcpy(void *s1, const void *s2, size_t n)
| 0x004022df add esp, 0xc
| 0x004022e2 mov esi, dword [var_4h]
| 0x004022e5 call dword [s1]
So we are copying 13 bytes from 0x404068 into the executable memory. Then we load var_4h, the pointer to our heap block from earlier, into esi. 0x004022e5 is a call to s1, which is the memory address that we have copied to. Let's take a look at what's at 0x404068:
| 0x00404068 mov edi, dword [esi]
| 0x0040406a mov ecx, dword [esi + 4] ; [0x4:4]=-1 ; 4
| .-> 0x0040406d rol byte [edi + ecx - 1], 5
| `=< 0x00404072 loop 0x40406d
| 0x00404074 ret
This appears to be our shellcode, and we are copying it into executable memory. The call at 0x004022e5 runs this code.
Right away we see that we are loading the dword at [esi] into edi. That is the first field of our heap block, the pointer to the string at 0x404040.
We are then setting up a counter in ecx from [esi + 4], which is the strlen result we saw stored earlier. This is our way of iterating over the string, so to speak.
We then see:
| .-> 0x0040406d rol byte [edi + ecx - 1], 5
| `=< 0x00404072 loop 0x40406d
| 0x00404074 ret
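So it appears our shellcode takes the string (byte array) from earlier and rotates each byte left by 5 bits, working from the last byte back to the first. The bytes are modified in place, so when the shellcode returns, the buffer at 0x404040 holds the new string, which I can only assume is our flag.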
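To check my reading of the loop, here is a small Python model of those instructions. The function and variable names are mine, and it mirrors the registers only loosely, so treat it as a sketch rather than an exact emulation. It assumes the length is greater than zero, as it is in this binary.

```python
def rol8(value, count):
    """Rotate an 8-bit value left by count bits."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF

def run_shellcode(buf, length):
    """Model of 'rol byte [edi + ecx - 1], 5' followed by 'loop', with ecx = length."""
    ecx = length
    while ecx != 0:
        # Same index as the instruction: edi + ecx - 1, so the last byte goes first.
        buf[ecx - 1] = rol8(buf[ecx - 1], 5)
        ecx -= 1  # 'loop' decrements ecx and jumps back while it is non-zero
    return buf

# Try it on the first four bytes of the encoded string.
sample = bytearray([0x32, 0x62, 0x0A, 0x3A])
print(run_shellcode(sample, len(sample)))
```

Since every byte is rotated independently, the back-to-front order does not change the result; it just falls out of using ecx as both the counter and the index.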
Normally I would grab the hex, throw it in a Python list and iterate through it; however, Python does not have a built-in rotate operator.
Instead of grabbing one off GitHub or writing one myself, let's use another tool that I find very useful when working with data conversions: CyberChef. Feeding in the hex bytes and rotating each one left by 5 bits gives us readable text.
And just like that we have our flag.
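If you would rather stay in Python than switch tools, the rotate is only a couple of shifts and a mask. Here is a minimal sketch with names of my own choosing. It builds a 256 entry lookup table and lets `bytes.translate` map the whole buffer in one go.

```python
# Bytes copied from the hexdump at 0x404040.
encoded = bytes.fromhex(
    "32 62 0a 3a db 9a 42 2a 62 62 1a 7a 22 2a 69 4a 9a 72 a2 "
    "69 52 aa 9a a2 69 32 7a 92 69 2a c2 82 62 7a 4a a2 9a eb"
)

def rol8(value, count):
    """Rotate an 8-bit value left by count bits (0 <= count < 8)."""
    return ((value << count) | (value >> (8 - count))) & 0xFF

# Precompute the rotation for every possible byte value, then map the buffer.
ROL5_TABLE = bytes(rol8(b, 5) for b in range(256))
decoded = encoded.translate(ROL5_TABLE)

print(decoded.decode("ascii"))
```

A plain loop over the bytes would work just as well; the table is only a convenience.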
The rest of the assembly is what we have become familiar with in these challenges. When run, the program prints the MD5 value of the flag and pops a message box saying “We’ve been compromised!”
I learned a lot from this challenge. This is all new to me, so going in and getting my hands dirty has been great. We found a “string” that wasn't readable. We found byte code that was then copied into executable memory and run. And we were able to replicate what that byte code does and successfully translate our string into a readable flag.
10/10 will upload again.