"Learning Notes on IDA Reverse Engineering from Scratch - 14 (Introduction to Program Unpacking)"

What is packing?

This chapter demonstrates how to unpack a UPX-packed program.

Packing is a technique that hides a program's executable code by compressing or encrypting it, which makes reverse engineering harder. A packer adds one or more sections to the program, including a loader routine called the STUB. When the program starts, the stub decrypts the hidden code and writes it to another section in memory, or recreates the program's original sections, and then jumps to the restored code.
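The stub idea can be illustrated with a toy model. This is only a sketch, not how UPX works internally: zlib stands in for the packer's compression, and the names `pack` and `run_stub`, as well as the byte string that plays the role of program code, are hypothetical.

```python
import zlib

def pack(original_code: bytes) -> dict:
    """Toy 'packer': keep the code in compressed form and record its real size."""
    return {
        "packed": zlib.compress(original_code),   # hidden form stored on disk
        "unpacked_size": len(original_code),      # memory the stub must reserve
    }

def run_stub(image: dict) -> bytes:
    """Toy 'stub': rebuild the original code inside a reserved memory region."""
    region = bytearray(image["unpacked_size"])    # empty region, present only in memory
    restored = zlib.decompress(image["packed"])
    region[:len(restored)] = restored             # write the original code back
    return bytes(region)                          # a real stub would now jump here

original = b"\x55\x8b\xec" * 200                  # stand-in for program code
image = pack(original)
restored = run_stub(image)
print(len(image["packed"]) < len(original), restored == original)
```

The packed form is smaller than the original and does not contain it in readable form, yet the stub can recreate it exactly at run time, which is the property an unpacker relies on.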

Most packers protect the file by tampering with the Import Address Table (IAT) and the file header (HEADER). Many also add anti-debugging code to make it harder to recover the original file.

Use Detect It Easy (DIE) to check whether the program is packed.

The image shows that the program is packed with UPX version 3.91, is 32-bit, and targets the i386 architecture.
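As a rough illustration of what such a check looks for, the sketch below searches a file's bytes for the strings that an unmodified UPX build normally leaves behind: the section names `UPX0` and `UPX1` and the `UPX!` magic. This is an assumption-laden heuristic, not the signature logic DIE actually uses; the function names and the synthetic byte strings are hypothetical.

```python
UPX_MARKERS = (b"UPX0", b"UPX1", b"UPX!")

def find_upx_markers(data: bytes) -> list[str]:
    """Return the UPX marker strings that occur anywhere in the file bytes."""
    return [m.decode("ascii") for m in UPX_MARKERS if m in data]

def looks_upx_packed(data: bytes) -> bool:
    """Crude heuristic: an 'MZ' executable that carries at least one UPX marker."""
    return data[:2] == b"MZ" and bool(find_upx_markers(data))

# Synthetic byte strings stand in for real files here.
fake_packed = b"MZ" + b"\x00" * 64 + b"UPX0\x00\x00\x00\x00" + b"UPX1\x00\x00\x00\x00" + b"UPX!"
fake_plain = b"MZ" + b"\x00" * 64 + b"CODE\x00\x00\x00\x00" + b"DATA\x00\x00\x00\x00"
print(find_upx_markers(fake_packed), looks_upx_packed(fake_plain))
```

These markers can be renamed or stripped, so a missing marker does not prove that a file is unpacked.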

Loading the packed file

When loading the packed file, uncheck "Create Input Section" and check "Manual Load".

After you click OK, a dialog pops up; click OK again.

The image shows the entry point of the packed program.

The original program entry point.

The entry point of the packed program is at address 0x409BE0, while the entry point of the original file is at 0x401000.
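The address a disassembler shows as the entry point is the sum of two PE header fields, ImageBase and AddressOfEntryPoint. The sketch below reads both from a 32-bit PE header. The image base of 0x400000 and the `fake_pe` helper, which builds a header-only synthetic image, are assumptions made for illustration; the source only gives the final addresses.

```python
import struct

def entry_point_va(pe: bytes) -> int:
    """Return ImageBase + AddressOfEntryPoint for a 32-bit PE image."""
    if pe[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)        # offset of the PE header
    if pe[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    opt = e_lfanew + 4 + 20                                 # skip signature + COFF header
    (magic,) = struct.unpack_from("<H", pe, opt)
    if magic != 0x10B:
        raise ValueError("only PE32 optional headers are handled")
    (entry_rva,) = struct.unpack_from("<I", pe, opt + 16)   # AddressOfEntryPoint
    (image_base,) = struct.unpack_from("<I", pe, opt + 28)  # ImageBase
    return image_base + entry_rva

def fake_pe(entry_rva: int, image_base: int) -> bytes:
    """Build a header-only synthetic image for demonstration (hypothetical)."""
    buf = bytearray(0x200)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)
    buf[0x80:0x84] = b"PE\x00\x00"
    opt = 0x80 + 4 + 20
    struct.pack_into("<H", buf, opt, 0x10B)
    struct.pack_into("<I", buf, opt + 16, entry_rva)
    struct.pack_into("<I", buf, opt + 28, image_base)
    return bytes(buf)

packed_ep = entry_point_va(fake_pe(0x9BE0, 0x400000))
original_ep = entry_point_va(fake_pe(0x1000, 0x400000))
print(hex(packed_ep), hex(original_ep))
```

With the assumed image base, an entry RVA of 0x9BE0 gives 0x409BE0 for the packed file and an entry RVA of 0x1000 gives 0x401000 for the original.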

File and memory usage

Comparing the sections of the two files, the packed file has an additional section called upx0 directly after the file header, and this section occupies more memory than the sections of the original file do.

The original file.

The packed file.

The upx0 section of the packed file ends at 0x409000, while the sections after the file header of the original file span 0x401000 to 0x408200. Size on disk and size in memory can differ considerably: something that occupies only 1 KB on the hard disk may occupy 20 KB or more once it is loaded.

As shown in the image, the starting address of the CODE section in the original file is 0x401000, the size of the section in the file is 0x600 bytes, and the virtual size in memory is 0x1000 bytes.

Going back to the packed file, as shown in the image, the upx0 section starts at 0x401000 and its size in the file is 0, yet it occupies 0x8000 bytes in memory. The packer reserves this space so that the stub can write the original program code into it and then jump there to execute it.
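The difference between file size and memory size can be made concrete with the numbers quoted above. The `Section` class below is a hypothetical model for illustration, not an IDA or PE library type.

```python
from dataclasses import dataclass

@dataclass
class Section:
    name: str
    start: int         # virtual address where the section begins
    raw_size: int      # bytes stored in the file
    virtual_size: int  # bytes reserved in memory

    @property
    def end(self) -> int:
        """First address past the section in memory."""
        return self.start + self.virtual_size

    @property
    def memory_only_bytes(self) -> int:
        """Bytes that exist in memory but have no backing data in the file."""
        return max(self.virtual_size - self.raw_size, 0)

code = Section("CODE", 0x401000, 0x600, 0x1000)   # original file
upx0 = Section("upx0", 0x401000, 0x0, 0x8000)     # packed file

for s in (code, upx0):
    print(f"{s.name}: {s.start:#x}-{s.end:#x}, memory-only bytes {s.memory_only_bytes:#x}")
```

A section whose file size is 0 while its memory size is large, like upx0 here, is a typical sign of space set aside for code that will be written at run time.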

The jump at 0x401000 in the packed file.

The dword_ prefix of the name at 0x401000 indicates the DWORD data type, "?" means that the location is reserved in memory but holds no content, and dup gives the repeat count: 0xc00 dwords, which is 0x3000 bytes. The run starting at 0x404000 reserves another 0x1400 dwords, which is 0x5000 bytes.

So a total of 0x8000 bytes are used to store the content of the original code.
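The arithmetic can be checked in a few lines. This sketch assumes that both counts (0xc00 and 0x1400) are numbers of dwords, since that is the reading under which the two runs add up to 0x8000 bytes; `dup_bytes` is a hypothetical helper name.

```python
DWORD_SIZE = 4

def dup_bytes(count: int, element_size: int = DWORD_SIZE) -> int:
    """Size in bytes of a 'dd N dup(?)' style run of uninitialized items."""
    return count * element_size

first = dup_bytes(0xC00)                 # run starting at 0x401000
second = dup_bytes(0x1400)               # run starting at 0x404000 (assumed dword count)
print(hex(0x401000 + first))             # start of the second run
print(hex(first + second))               # total reserved size
print(hex(0x401000 + first + second))    # end of the section
```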

At 0x401000 in the original file, press x to see that there are two references here (we will come back to this part later).

The references of the executable code.

The upx1 section occupies 0xe00 bytes in the file and 0x1000 bytes in memory.

The file and memory usage of the upx1 section.

The packer has probably applied some simple encryption or compression to hide the original code. There are several references to 0x409000, the start of the upx1 section.

The reference at 0x409000.

One of the references comes from the executable code further down; click it to jump to that location.

The program entry point.

Stub and OEP

In the stub that follows the program entry point shown in the image, the ESI register is loaded with the address 0x409000. As shown in the image, the stub's executable code sits below the packed code of the original file, and both belong to the upx1 section. The upx1 section therefore contains the encrypted content of the original file followed by the stub code, which starts at 0x409be0.

Tracing the executable code.

In the upx0 section, there is a reference.

The reference at 0x401000.

In the image, there is an unconditional jump to 0x401000, which is the reference at 0x401000 in the previous image.

"jmp near" is a jump to a target address within the same code segment. After the stub finishes decrypting and has rebuilt the original code, it jumps to 0x401000, the original entry point (OEP), where the original program starts executing. The stub's own entry point is 0x409be0.

For a packed program, the location of the OEP is normally not known in advance. In this example the OEP is 0x401000.
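On x86, a near relative jump is commonly encoded as opcode E9 followed by a signed 32-bit displacement that is measured from the end of the 5-byte instruction. The sketch below encodes and decodes such a jump to 0x401000. The address `JMP_ADDR` is hypothetical, since the text does not state where the stub's final jump is located.

```python
import struct

def encode_jmp_near(src: int, target: int) -> bytes:
    """Encode 'jmp near target' (opcode E9 + rel32) for a jump located at src."""
    rel32 = (target - (src + 5)) & 0xFFFFFFFF     # relative to the next instruction
    return b"\xE9" + struct.pack("<I", rel32)

def decode_jmp_near(code: bytes, src: int) -> int:
    """Return the absolute target of an E9 rel32 jump located at src."""
    if code[0] != 0xE9:
        raise ValueError("not a near relative jmp")
    (rel32,) = struct.unpack_from("<i", code, 1)  # signed displacement
    return (src + 5 + rel32) & 0xFFFFFFFF

JMP_ADDR = 0x409D00   # hypothetical address of the stub's final jump
OEP = 0x401000

insn = encode_jmp_near(JMP_ADDR, OEP)
target = decode_jmp_near(insn, JMP_ADDR)
print(insn.hex(), hex(target))
```

Because the target lies below the jump, the displacement is negative. A long backward jump out of the stub's section into the freshly written section is a pattern worth looking for when searching for the jump to the OEP.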

Finding OEP

In most cases the original program is not available, so the OEP of a packed program cannot be read off directly. The following describes how to find it.

When the STUB has finished decrypting and has rebuilt the original code, it jumps to that code to run the program. In general, the first instruction executed in the section that receives the original code is the OEP.

Set a breakpoint on the jump that leads to the OEP, so that you can check whether the original code has already been rebuilt by the time execution reaches this point.
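This rule can be expressed as a small search over an execution trace: the first executed address that falls inside the section receiving the original code is taken as the OEP. The sketch below uses the upx0 range from this example; the trace itself is hypothetical and the function name `find_oep` is chosen only for illustration.

```python
def find_oep(trace, section_start, section_end):
    """Return the first executed address that falls inside the unpacked section."""
    for addr in trace:
        if section_start <= addr < section_end:
            return addr
    return None  # execution never reached the section

UPX0_START, UPX0_END = 0x401000, 0x409000

# Hypothetical trace: stub addresses in upx1, then the jump lands in upx0.
trace = [0x409BE0, 0x409BE1, 0x409BE6, 0x409D00, 0x401000, 0x401002]
oep = find_oep(trace, UPX0_START, UPX0_END)
print(hex(oep))
```

In practice the same idea is applied with a debugger breakpoint rather than a recorded trace, and packers that execute code inside the target section before the real entry point would defeat this simple version.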

Set a breakpoint at the jump location.

Select "Local Windows Debugger" for debugging, and then start debugging and run to the breakpoint.

Run to the breakpoint.

When the program stops at the breakpoint on the jump, press F8 to step over it.

A warning pops up stating that the upx0 section was originally interpreted as data; click Yes to interpret it as code.

Run to the OEP.

Now the program has decrypted the original code and jumped to it. The code here is very similar to the code at 0x401000 in the original program. However, because it is defined only as a location (loc_401000) and not as a function, it cannot be shown in graph view. IDA can fix this automatically.

Right-click in the bottom left corner of the IDA interface to open a hidden menu, and select "Reanalyze Program". Then go back to 0x401000, which is now displayed as sub_401000, indicating that it is a function. Press the space bar to switch to the graph view.

After reanalysis, the code at 0x401000 is recognized as a function.

So far, we have found the OEP and reached the point where the decrypted code is present in memory. The next step is to DUMP the program and rebuild the IAT to obtain an unpacked, runnable program.