BetterDoorThanAnNT - RedPwn 2019

TL;DR

As this post is really long, there will be another TL;DR later in this WU. Here is the very short story.

To sum up : you had to recover a series of xored ordinals from the kernel32.dll export table (AddressOfNameOrdinals) and find the first letter of the function name corresponding to each of them. Together, the letters give NOWWREUERSEWVMP. Knowing the flag format, I tried to find where to put underscores. As the flag looks like “NOW REVERSE VMP”, I tried NOW_REVERSE_VMP. The letters were probably not exact because they depend on the kernel32.dll version, and I probably did not have the exact version required.

flag{NOW_REVERSE_VMP}

Introduction

We’re welcomed with a long description saying that the binary has been written by Daax, a malware author. We can expect sneaky techniques to hide functions and strings, and probably anti-reversing techniques. The executable returns an error when you try to run it. This has to be a static analysis.


The Static Analysis way

TL;DR :

Defeating anti-disassembly techniques

This challenge has a few anti-disassembly techniques. For the rest of this WU, we will assume we have already defeated all of them. If you want to learn about anti-disassembly and how to defeat it, this is discussed in the Annex.

Here is a 4-minute GIF showing how I defeated it for this challenge (the video is explained in the Annex):

patchBinary

Finding main and its arguments

Loading it in IDA we arrive at this block of instructions :

.text:0040133D loc_40133D:                             ; CODE XREF: .text:0040133A↑j
.text:0040133D                 push    eax
.text:0040133E                 mov     eax, 39830h
.text:00401343                 xor     eax, 25B14h
.text:00401348                 cmp     eax, 1C324h
.text:0040134D                 pop     eax
.text:0040134E                 jz     short loc_401352 ; Jmp always taken

We quickly notice that the jmp is always taken :

>>> hex(0x39830^0x25b14)
'0x1c324'

So, why a jz instruction? It is used in a common anti-disassembly technique. Again, defeating anti-disassembly is discussed in the Annex. We can move on :

.text:00401330 loc_401330:                             ; CODE XREF: start↓j
.text:00401330                 push    11512478h
.text:00401335                 push    11111111h
.text:0040133A                 jmp     short loc_40133D
.text:0040133C ; -------------------------------------------------------------------------
.text:0040133C
.text:0040133C locret_40133C:                          ; CODE XREF: .text:00401363↓j
.text:0040133C                 retn
.text:0040133D ; -------------------------------------------------------------------------
.text:0040133D
.text:0040133D loc_40133D:                             ; CODE XREF: .text:0040133A↑j
.text:0040133D                 push    eax
.text:0040133E                 mov     eax, 39830h
.text:00401343                 xor     eax, 25B14h
.text:00401348                 cmp     eax, 1C324h
.text:0040134D                 pop     eax
.text:0040134E                 jz      short loc_401352
.text:0040134E ; -------------------------------------------------------------------------
.text:00401350                 db 0FFh ; ÿ
.text:00401351                 db  25h ; %
.text:00401352 ; -------------------------------------------------------------------------
.text:00401352
.text:00401352 loc_401352:                             ; CODE XREF: .text:0040134E↑j
.text:00401352                 push    eax             ; this subtracts 11111111h
.text:00401353                 mov     eax, [esp+8]    ; from 11512478h on the stack,
.text:00401357                 sub     eax, [esp+4]    ; which gives 0x00401367
.text:0040135B                 mov     [esp+8], eax
.text:0040135F                 pop     eax
.text:00401360                 add     esp, 4          ; points esp to 0x00401367
.text:00401363                 jmp     short locret_40133C  ; ret to 0x00401367
.text:00401363 ; -------------------------------------------------------------------------
.text:00401365                 db 0FFh ; ÿ
.text:00401366                 db  25h ; %
.text:00401367 ; -------------------------------------------------------------------------
.text:00401367                 push    0DAA5BADDh     ; <-- this will be important
.text:0040136C                 jmp     loc_4012F0

The jmp instruction at 0x00401363 jumps to 0x0040133C, a retn instruction that pops 4 bytes off the stack and returns to 0x00401367. There, the value 0xDAA5BADD is pushed onto the stack (it will become useful later on).

The program then jumps to 0x4012F0, and here we go through the same process again :

.text:004012F0 loc_4012F0:                             ; CODE XREF: .text:0040136C↓j
.text:004012F0                 push    11512438h
.text:004012F5                 push    11111111h
.text:004012FA                 jmp     short loc_4012FD
.text:004012FC ; -------------------------------------------------------------------------
.text:004012FC
.text:004012FC locret_4012FC:                          ; CODE XREF: .text:00401323↓j
.text:004012FC                 retn
.text:004012FD ; -------------------------------------------------------------------------
.text:004012FD
.text:004012FD loc_4012FD:                             ; CODE XREF: .text:004012FA↑j
.text:004012FD                 push    eax
.text:004012FE                 mov     eax, 39830h
.text:00401303                 xor     eax, 25B14h
.text:00401308                 cmp     eax, 1C324h
.text:0040130D                 pop     eax
.text:0040130E                 jz      short loc_401312 ; always jumps
.text:0040130E ; -------------------------------------------------------------------------
.text:00401310                 db 0FFh ; ÿ
.text:00401311                 db  25h ; %
.text:00401312 ; -------------------------------------------------------------------------
.text:00401312
.text:00401312 loc_401312:                             ; CODE XREF: .text:0040130E↑j
.text:00401312                 push    eax
.text:00401313                 mov     eax, [esp+8]    
.text:00401317                 sub     eax, [esp+4]
.text:0040131B                 mov     [esp+8], eax
.text:0040131F                 pop     eax
.text:00401320                 add     esp, 4
.text:00401323                 jmp     short locret_4012FC ;jmps to a ret
.text:00401323 ; -------------------------------------------------------------------------
.text:00401325                 db 0FFh ; ÿ
.text:00401326                 db  25h ; %
.text:00401327 ; -------------------------------------; call the pseudo_main of the prog
.text:00401327                 call    sub_4013B0     ; with 0xDAA5BADD as argument
.text:0040132C                 add     esp, 4
.text:0040132F                 retn

Following the program flow, we can quickly see that we end up calling sub_4013B0. The only value actually passed to sub_4013B0 is the 0xDAA5BADD pushed earlier at 0x00401367.
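The arithmetic behind this push/push/sub/retn trick is easy to check. The snippet below is only a small sketch; the helper name obfuscated_target is made up for illustration and does not exist in the binary.

```python
def obfuscated_target(first_push, second_push):
    """Return address computed by the stub: first_push - second_push (32-bit)."""
    return (first_push - second_push) & 0xFFFFFFFF

# Constants taken from the two listings above
print(hex(obfuscated_target(0x11512478, 0x11111111)))  # 0x401367
print(hex(obfuscated_target(0x11512438, 0x11111111)))  # 0x401327
```

The first result is the address of the push 0DAA5BADDh, the second one the address of the call to sub_4013B0.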

Reversing pseudo-main function & Asking for user input

Reading a bit of the code, we can see that, as expected, strings in this program are encrypted using XOR. Here is an example of a string being decrypted by the program :

  Buffer = xmmword_402F90;  // 0E174B4A464F08424E51045147554E5Ah
  v16 = 256;
  do
  {
    *((_BYTE *)&Buffer + v1) ^= (_BYTE)v1 + 31;
    ++v1;
  }
  while ( v1 < 0x10 );

Decrypting this buffer, for example, gives the string Enter the flag:.
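The loop above can be reproduced in a few lines of Python. This is a minimal sketch, assuming the xmmword constant shown by IDA is stored little-endian in memory (hence the byte reversal); the function name decrypt is hypothetical.

```python
def decrypt(buf, start=31):
    """XOR each byte with (index + start), as the decompiled loop does."""
    return bytes(b ^ ((i + start) & 0xFF) for i, b in enumerate(buf))

# xmmword_402F90 as displayed by IDA, reversed to get the in-memory byte order
raw = bytes.fromhex("0E174B4A464F08424E51045147554E5A")[::-1]
print(decrypt(raw))  # b'Enter the flag: '
```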

So the program asks the user for a flag and stores it in a buffer :

  WriteConsoleA(hConsoleOutput, &Buffer, v2 - 1, 0, 0);
  NumberOfCharsRead = 0;
  ReadConsoleA(hConsoleInput, userInput, 0xFFu, &NumberOfCharsRead, 0);
  v13[NumberOfCharsRead] = 0;

Then the program gives the input to a function:

  v4 = ((int (__cdecl *)(char *, int))loc_401000)(userInput, a1);

and checks its result. If the result is false, the program prints Oooopsie Daisie, else it prints the flag is {userinput} (I won’t detail this part for the sake of brevity, but it just decrypts the strings and uses WriteConsoleA to print them).

It also gives the variable a1 to the function, which is the first argument of our current function. We saw earlier that the only argument given to our current function was 0xDAA5BADD. So the function gets the user input and the value 0xDAA5BADD. We can assume that this function checks if our input is correct. If it is, the program says it’s the flag. I’ll rename this function check_input accordingly.

Checking input

Let’s get our hands dirty, the real challenge starts here !

In the first lines, we quickly see useful information about the flag format :

  CharUpperBuffA(inputUser, lenInputUser);
  if ( lenInputUser == 15 )

So we’re looking for a 15 char uppercase flag. Let’s continue.

Here is the part of the function that will be interesting.

if ( lenInputUser == 15 )
  {
    v5 = 0;
    v24 = 0;
    while ( 1 )
    {
       // On odd loop count v6 will be 0xDAA5 (high word)
      if ( v5 & 1 )
        v6 =  pushed_0xDAA5BADD >> 16;
       // On even loop count v6 will be 0xBADD (low word)
      else
        LOWORD(v6) = pushed_0xDAA5BADD;
      v23 = (unsigned __int16)(v6 ^ word_404000[v5]);  // something gets xored
      v7 = sub_401250()[6];  // interesting function
      v8 = v7[15];
      if ( *(int *)((char *)v7 + v8) == 0x4550 )   // "PE" in little-endian: chr(0x50) = "P" | chr(0x45) = "E"
      {
        v9 = *(int *)((char *)v7 + v8 + 120);
        v10 = 0;
        v22 = *(int *)((char *)v7 + v9 + 24);
        if ( v22 )
        {
          v11 = (_WORD *)((char *)v7 + *(int *)((char *)v7 + v9 + 36));
          v12 = (int *)((char *)v7 + *(int *)((char *)v7 + v9 + 32));
          while ( *v11 != (_WORD)v23 )
          {
            ++v10;
            ++v12;
            ++v11;
            if ( v10 >= v22 )
              goto LABEL_20;
          }
          v21 = (char *)v7 + *v12;
        }
      }
      else{ 
          [..] // printing that our input flag isn't valid
      }
LABEL_20:
      if ( *v21 != inputUser[v24] )  // checking user input validity
        break;
      v5 = v24 + 1;
      v24 = v5;
      if ( v5 >= 15 )
        return 1;
    }
  }

See the comments in the code.

Function sub_401250

We’re coming across this function.

  for ( i = *(*(__readfsdword(0x30u) + 0xC) + 0xC); ; i = *i )
  {
      //
      [..]
	  // Cut for brevity, operations on obfuscated string. String2 == "kernel32.dll"
    v10 = 0;
      // comparing library name to String2's name (kernel32.dll)
    if ( CompareStringW(0x400u, 1u, i[12], *(i + 22) >> 1, String2, 12) == 2 ) 
      break;
  }
  return i;
}

The function is comparing a name with the string kernel32.dll. But where is that name coming from? Ladies and gentlemen, let me introduce you to how obscure Windows internals are.

Here is the asm part that matters here. The following lines correspond to *(__readfsdword(0x30u) + 0xC).

.text:00401253                 mov     eax, large fs:30h
.text:00401259                 sub     esp, 1Ch
.text:0040125C                 mov     eax, [eax+0Ch]

The program loads the dword stored at FS:[0x30] into eax, then replaces it with the dword stored at that address + 0xC. Alright.

On Windows, the FS register is a special segment register that you normally don’t modify and that gives access to the TIB (Thread Information Block), a large block of memory describing many things about the currently running thread.

The TIB can also be called TEB (Thread Environment Block)

Here is a table listing all the fields accessible via the FS register on Windows. According to the table, offset 0x30 within the TEB is the pointer to the Process Environment Block (PEB). The PEB is a huge structure living within every process and describing all sorts of things. Our program reads the pointer stored at PEB+0xC. A description of the PEB struct in the latest x86 Windows can be found here.

https://www.vergiliusproject.com/ is a great website describing the internal structs of each version of Windows

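According to this website, the field at PEB+0xC is struct _PEB_LDR_DATA* Ldr;, a structure holding the heads of the linked lists of LDR_DATA_TABLE_ENTRY structures (one entry per loaded module). The second + 0xC in the decompiled loop selects InLoadOrderModuleList, the first of these lists on x86.

This is the beginning of a known technique used by malware to resolve functions from kernel32.dll without having to reference them directly in the executable (to prevent a simple strings from showing them, for example).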
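The walk performed by sub_401250 can be modelled on a fake address space. This is only a sketch: the offsets are assumed from the public 32-bit layouts of PEB, PEB_LDR_DATA and LDR_DATA_TABLE_ENTRY, the memory is a plain bytearray, and every address, module and helper name below is made up for illustration.

```python
import struct

# x86 offsets (assumed from the public 32-bit structure layouts)
PEB_LDR = 0x0C         # PEB.Ldr
LDR_LOAD_ORDER = 0x0C  # PEB_LDR_DATA.InLoadOrderModuleList (list head)
ENTRY_DLL_BASE = 0x18  # LDR_DATA_TABLE_ENTRY.DllBase
ENTRY_NAME_LEN = 0x2C  # BaseDllName.Length, in bytes
ENTRY_NAME_BUF = 0x30  # BaseDllName.Buffer

def u16(mem, addr):
    return struct.unpack_from("<H", mem, addr)[0]

def u32(mem, addr):
    return struct.unpack_from("<I", mem, addr)[0]

def find_module_base(mem, peb, wanted):
    """Walk the load-order list and return DllBase of `wanted`, or None."""
    head = u32(mem, peb + PEB_LDR) + LDR_LOAD_ORDER
    entry = u32(mem, head)                      # first Flink
    while entry != head:
        length = u16(mem, entry + ENTRY_NAME_LEN)
        buf = u32(mem, entry + ENTRY_NAME_BUF)
        name = bytes(mem[buf:buf + length]).decode("utf-16-le")
        if name.lower() == wanted.lower():      # case-insensitive compare
            return u32(mem, entry + ENTRY_DLL_BASE)
        entry = u32(mem, entry)                 # follow Flink
    return None

def build_fake_memory():
    """Fake address space: PEB at 0x00, Ldr at 0x40, two module entries."""
    mem = bytearray(0x300)
    peb, ldr = 0x00, 0x40
    head = ldr + LDR_LOAD_ORDER
    modules = [(0x080, 0x200, 0x00400000, "app.exe"),
               (0x100, 0x240, 0x76D00000, "KERNEL32.DLL")]
    struct.pack_into("<I", mem, peb + PEB_LDR, ldr)
    link = head
    for entry, name_addr, base, name in modules:
        raw = name.encode("utf-16-le")
        mem[name_addr:name_addr + len(raw)] = raw
        struct.pack_into("<I", mem, link, entry)   # previous Flink -> this entry
        struct.pack_into("<I", mem, entry + ENTRY_DLL_BASE, base)
        struct.pack_into("<H", mem, entry + ENTRY_NAME_LEN, len(raw))
        struct.pack_into("<I", mem, entry + ENTRY_NAME_BUF, name_addr)
        link = entry
    struct.pack_into("<I", mem, link, head)        # close the circular list
    return mem, peb

mem, peb = build_fake_memory()
print(hex(find_module_base(mem, peb, "kernel32.dll")))  # 0x76d00000
```

In the real process the starting pointer comes from FS:[0x30] instead of a hard-coded 0, but the chain of dereferences is the same.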

Input comparison with the flag using PEB?

So now we can assume that our program will be looking for some exported functions from kernel32.dll.

The previous function returned the LDR_DATA_TABLE_ENTRY of kernel32.dll.

As the decompilation of the next part of the function is a little quirky, we’ll just be looking at the assembly. We have this :

v7 = sub_401250()[6];  // loading kernel32.dll LDR_DATA_TABLE_ENTRY 

The [6] notation is not very telling (it is index 6 of a DWORD array, i.e. offset 0x18), so we will check the asm :

.text:00401054                 call    getKernel32_dll             ; get kernel32.dll's LDR_DATA_TABLE_ENTRY
.text:00401059                 mov     edi, [eax+18h]              ; DllBase: base address of the DLL
.text:0040105C                 mov     ecx, [edi+3Ch]              ; e_lfanew at [base+0x3C]: offset of the PE
                                                                   ; signature ("PE\0\0" according to the
                                                                   ; Microsoft doc)
.text:0040105F                 cmp     dword ptr [ecx+edi], 4550h  ; "PE"

Once the PE signature has been checked in the header, the program gets the EXPORT_DIRECTORY_TABLE:

.text:00401068                 mov     ecx, [ecx+edi+78h] ; base + signature offset + 0x78

Then, the program checks if the number of exported names is > 0 :

.text:0040106E                 mov     ebx, [ecx+edi+18h]   ; export directory NumberOfNames
.text:00401072                 mov     [ebp-0Ch], ebx       ; almost useless
.text:00401075                 test    ebx, ebx

If so, it stores the address of the export ordinal table in eax and the address of the export name pointer table in ecx :

.text:0040107D                 mov     eax, [ecx+edi+24h]   ; Export ordinal table
.text:00401081                 mov     ecx, [ecx+edi+20h]   ; Export name table

For the rest of this WU you need to know how the export table works. This is discussed here in the Annex.

Ordinal = (v8 + *(v10 + v8 + 0x24));
FunctionName = (v8 + *(v10 + v8 + 0x20));
while ( *Ordinal != xoredValue )
{
    ++v11;
    ++FunctionName;
    ++Ordinal;
    if ( v11 >= v23 )
        goto LABEL_20;
}
// LABEL_20 compares the first char of the name associated with
// the given ordinal with our input
firstCharOfFunctionName = (v8 + *FunctionName);

The xoredValue comes from earlier in the function :

       // On odd loop count v6 will be 0xDAA5 (high word)
      if ( loop_count & 1 )
        v6 =  pushed_0xDAA5BADD >> 16;
       // On even loop count v6 will be 0xBADD (low word)
      else
        LOWORD(v6) = pushed_0xDAA5BADD;
       // word from memory xored with either 0xDAA5 or 0xBADD
      v23 = (unsigned __int16)(v6 ^ word_404000[loop_count]);

As we can see, the function iterates over all the ordinals in the export table and compares each of them to the xored value calculated earlier. When one matches on a given loop count “i”, the first letter of the corresponding function name is compared to our userInput[i].
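The whole check can be summed up with a small Python model. This is a sketch under assumptions, not the program's actual code: the export table is a made-up toy one, the helper names are hypothetical, and a missing ordinal is simply treated as a mismatch.

```python
KEY = 0xDAA5BADD  # value pushed before the call to the pseudo-main

def key_word(i):
    """High word on odd loop counts, low word on even ones."""
    return (KEY >> 16) & 0xFFFF if i & 1 else KEY & 0xFFFF

def first_letter(names, name_ordinals, wanted):
    """Scan the ordinal table; return the first char of the matching name."""
    for name, ordinal in zip(names, name_ordinals):
        if ordinal == wanted:
            return name[0]
    return None  # simplification: the sketch just reports "no match"

def check_input(user_input, encoded, names, name_ordinals):
    if len(user_input) != len(encoded):
        return False
    for i, word in enumerate(encoded):
        wanted = word ^ key_word(i)
        if first_letter(names, name_ordinals, wanted) != user_input[i]:
            return False
    return True

# Toy export table (made up, not taken from any real kernel32.dll)
names = ["AddAtomA", "NeedCurrentDirectoryForExePathA", "OpenFile", "WriteFile"]
name_ordinals = [0, 1, 2, 3]
# Toy equivalent of word_404000, encoding the ordinals 1, 2, 3
encoded = [1 ^ 0xBADD, 2 ^ 0xDAA5, 3 ^ 0xBADD]

print(check_input("NOW", encoded, names, name_ordinals))  # True
```

With the real binary, encoded would be the 15 words of word_404000, and names and name_ordinals would come from the kernel32.dll the challenge expects.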

Calculating and finding the flag using xored values

First we need to calculate the ordinal values we’ll be looking for. They are obtained by xoring each word of word_404000 with either 0xBADD or 0xDAA5, depending on whether the loop count is even or odd.

# values extracted from word_404000 in IDA
s = [0xb928, 0xdea3, 0xbcca, 0xdc82, 0xbe50, 0xdbe7, 0xbf1a, 0xdbc0, 0xbe0f, 0xdf80, 0xbbbf, 0xdc8c, 0xbf12, 0xd94f, 0xbeea]
for i in range(len(s)):
    if i & 1 == 0:
        print(str(s[i]^0xBADD))
    else:
        print(str(s[i]^0xDAA5))
nofix@bash:~/test$ python xoredValues.py
1013
1030
1559
1575
1165
322
1479
357
1234
1317
354
1577
1487
1002
1079

Now we will load kernel32.dll into IDA to get the export table view (View->Subviews->Exports). Since we are reversing a 32-bit executable, the relevant copy is the 32-bit one: on 64-bit Windows it lives in C:\Windows\SysWOW64, while C:\Windows\System32 holds the 64-bit DLL.

This is where “Require Windows 1803+” from the challenge description is important :

Depending on your Windows version, kernel32.dll has a different export table, with different function names and more or fewer ordinals.

Now we just need to take the first letter of the function name behind each of our ordinals to recompose the flag, which gives : NOWWREUERSEWVMP.

Knowing the flag format, we can assume we need to find where to put underscores in that. We also assume that we don’t have the exact kernel32.dll required for this challenge. NOWWREUERSEWVMP looks a lot like NOW REVERSE VMP. Just putting underscores between the words validates the flag.

flag{NOW_REVERSE_VMP}

Bonus : patching the executable to reverse it dynamically

I wish I had found this earlier. ffs

It is safe to assume that if the binary fails at execution it’s because it has a broken header.

We’ll analyze the header using a Python tool named peanalysis. It may not be the best tool, as it sometimes prints output in decimal instead of hexadecimal, which made me miss what was wrong in the header multiple times.

After a long analysis of the result, I finally noticed something strange in the optional header :

found PE optional header (size: 224)
	[...cut for brevity...]
	 SizeOfStackCommit: 4096
	 SizeOfHeapReserve: 2147483647   <-- that looks strange
	 SizeOfHeapCommit: 4096
	 LoaderFlags: 0 (0x0)
	[...cut for brevity...]

According to Microsoft’s documentation about the PE file format, SizeOfHeapReserve is the amount of heap space to reserve when the program is loaded. The value 2147483647 is 0x7FFFFFFF in hex. That’s far too much to reserve in a 32-bit process, and it is probably what makes the program fail at execution. We’ll edit this field and set it to 0x00100000.
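The edit can be done with any hex editor; below is a hedged Python sketch of the same patch for the field flagged in the dump above (SizeOfHeapReserve). It assumes a PE32 (32-bit) optional header, where this field sits at offset 80; the function name patch_heap_reserve and the output file name are made up.

```python
import struct

HEAP_RESERVE_OFFSET = 80  # SizeOfHeapReserve inside a PE32 optional header

def patch_heap_reserve(image, new_value=0x00100000):
    """Overwrite SizeOfHeapReserve in-place and return the previous value."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    if bytes(image[e_lfanew:e_lfanew + 4]) != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    optional_header = e_lfanew + 4 + 20          # signature + COFF file header
    if struct.unpack_from("<H", image, optional_header)[0] != 0x10B:
        raise ValueError("not a PE32 optional header")
    field = optional_header + HEAP_RESERVE_OFFSET
    old_value = struct.unpack_from("<I", image, field)[0]
    struct.pack_into("<I", image, field, new_value)
    return old_value

# Hypothetical usage:
# data = bytearray(open("BetterDoorThanAnNT.exe", "rb").read())
# print(hex(patch_heap_reserve(data)))
# open("patched.exe", "wb").write(data)
```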

Execute it again:


Damn it worked ! Winning dance


Bonus : usage of FireEye’s Flare-Floss tool to decrypt strings automatically

Flare-Floss is a tool from FireEye which can look for xored strings inside a program and try to decrypt them. We can try to decrypt some of the xored strings in this binary :

nofix@bash:~/Downloads$ ./floss ./BetterDoorThanAnNT.exe --no-static-strings  

FLOSS decoded 1 strings
kernel32.dll

FLOSS extracted 0 stackstrings

Finished execution after 0.819930 seconds

We only decoded one string among the 3-4ish ones. It is still a good tool to try when we are facing an obfuscated binary.

The tool did not successfully decrypt all the strings because they aren’t encoded with a single byte or a fixed key string: each char is xored with a value that depends on its index (eg : decoded[i] = stringToDecode[i] ^ (i + 31)).

Annex

Defeating anti-disassembly

Disassembling a compiled binary isn’t a perfect science. Many reversing tools that disassemble binaries can actually get confused quite easily.


No one :

IDA :

science


There are two well known algorithms to disassemble a binary, linear and flow oriented.

Both have their pros and cons, but this won’t be discussed here.

Linear disassembly

Linear disassembly iterates over the code, disassembling one instruction at a time, blindly.

This can be easily fooled by putting some data inside the code.

For example :

_start:
	xor rax, rax
	jmp _exit
	db 'Fool'		; Some data inside the _start procedure
_exit:
	mov eax, 1
	int 0x80

objdump is a tool that uses linear disassembly. Here is the output of objdump -D fool -M intel :

0000000000401000 <_start>:
  401000:	48 31 c0             	xor    rax,rax
  401003:	eb 04                	jmp    401009 <_exit>
  401005:	46 6f                	rex.RX outs dx,DWORD PTR ds:[rsi]
  401007:	6f                   	outs   dx,DWORD PTR ds:[rsi]
  401008:	6c                   	ins    BYTE PTR es:[rdi],dx

0000000000401009 <_exit>:
  401009:	b8 01 00 00 00       	mov    eax,0x1
  40100e:	cd 80                	int    0x80

As we can see, it tried to interpret the string “Fool” (= 0x466f6f6c, you can see it in the opcodes) as instructions.

Flow oriented disassembly

The second well known algorithm is flow oriented disassembly. This algorithm disassembles instructions and notes down other locations to disassemble, looking for jmp and call instructions. It then disassembles every location it noted down, noting other locations to disassemble again, etc.

If the jmp is conditional, the algorithm jumps to the FALSE condition of this jump first. This is an important mechanism used to fool this algorithm.

IDA uses flow oriented disassembly. When there is a conflict, the graph view might show you the two possibilities, but text view will only show you the trusted one, the one with the FALSE condition.

Technique used in the binary

  • First anti-disassembly :
.text:0040134E                 jz      short near ptr loc_401350+2
.text:00401350
.text:00401350 loc_401350:                             ; CODE XREF: .text:0040134E↑j
.text:00401350                 jmp     dword ptr ds:24448B50h
.text:00401356 ; -------------------------------------------------------------------------
.text:00401356                 or      [ebx], ch
.text:00401358                 inc     esp
.text:00401359                 and     al, 4
.text:0040135B                 mov     [esp+8], eax
.text:0040135F                 pop     eax
.text:00401360                 add     esp, 4
.text:00401363                 jmp     short locret_40133C

The conditional jmp at 0040134E isn’t really one. The preceding xor/cmp always sets the zero flag, so this jz is equivalent to a jmp here. But IDA follows the FALSE branch first, and thus continues disassembling right after the jz, which leads to junk instructions from 00401350 to 00401359. To assist IDA, we just need to undefine the instructions at 00401350 (U) and patch the jz (opcode : 0x74) to a jmp (opcode : 0xEB). Then put the cursor on the loc_401352 that appeared and press (C) to disassemble.

  • Second anti-disassembly :
.text:00401330                 push    11512478h		; NOTICE THAT 11512478h-11111111h
.text:00401335                 push    11111111h		; EQUALS 0x00401367
.text:0040133A                 jmp     short loc_40133D
.text:0040133C ; ------------------------------------------------------------------------
.text:0040133C
.text:0040133C locret_40133C:                          
.text:0040133C                 retn						; pop stack value and ret to it
.text:0040133D ; ------------------------------------------------------------------------
.text:0040133D
.text:0040133D loc_40133D:                             
						       ;[...] brevity
.text:0040134E                 jz      short loc_401352
.text:0040134E ; ------------------------------------------------------------------------
.text:00401350                 db 0FFh ; ÿ
.text:00401351                 db  25h ; %
.text:00401352 ; ------------------------------------------------------------------------
.text:00401352
.text:00401352 loc_401352:   ;this part subtracts 0x11111111 from 0x11512478 on the stack
.text:00401352                 push    eax
.text:00401353                 mov     eax, [esp+8]
.text:00401357                 sub     eax, [esp+4]
.text:0040135B                 mov     [esp+8], eax
.text:0040135F                 pop     eax
.text:00401360                 add     esp, 4
.text:00401363                 jmp     short locret_40133C
.text:00401365 ; ------------------------------------------------------------------------
.text:00401365                 jmp     dword ptr ds:0A5BADD68h
.text:00401365 ; ------------------------------------------------------------------------
.text:0040136B                 db 0DAh
.text:0040136C                 dd 0FFFF7FE9h, 0CCCCCCFFh, 3 dup(0CCCCCCCCh)
.text:00401380 ; ------------------------------------------------------------------------

Here the jmp at 00401363 jumps to a ret instruction, which pops a value from the stack and sets the instruction pointer to it. As the value on top of the stack is the previously calculated 0x00401367, the program naturally jumps to this location. However, flow oriented disassembly does not calculate that, and assumes that execution continues right after the jmp instruction at 00401363, leading to junk disassembly (which seems fair, as the common behavior of a ret in a program is to return just after the calling instruction).

Once again, undefine the instructions at 0x00401365 (U) and disassemble at 0x00401367 (C) to correct it.
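The jz to jmp patch described for the first technique can be sketched in Python on the raw bytes. This is only an illustration: the helper names are made up, and the 4 bytes below are the jz and the two junk bytes (0FFh, 25h) from the listing, not a dump of the whole file.

```python
def short_jump_target(address, instruction):
    """Destination of a 2-byte short jump (opcode, signed 8-bit displacement)."""
    displacement = instruction[1]
    if displacement >= 0x80:
        displacement -= 0x100
    return address + 2 + displacement

def jz_to_jmp(code, offset):
    """Replace a short jz (0x74) with a short jmp (0xEB), keeping the target."""
    if code[offset] != 0x74:
        raise ValueError("no short jz at this offset")
    code[offset] = 0xEB

code = bytearray(b"\x74\x02\xff\x25")            # jz +2, followed by the junk bytes
print(hex(short_jump_target(0x40134E, code)))    # 0x401352
jz_to_jmp(code, 0)
print(code.hex())                                # eb02ff25
```

Since only the opcode byte changes, the displacement and therefore the destination (loc_401352) stay the same.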

Other anti-disassembly techniques

There are many other ways to trick flow oriented disassembly.

See:

The export table

Here is a picture describing it, taken from here.

1566305839271

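The export directory holds three parallel tables. The name pointer table (AddressOfNames) lists the exported function names. The ordinal table (AddressOfNameOrdinals) has one 16-bit entry per name: the entry at index i belongs to the name at index i. Each of these ordinals is in turn an index into the export address table (AddressOfFunctions), which holds the real address (RVA) of the function.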


Nofix -