Let's defuse the bomb together!
It has been two semesters since the last Data Lab, and I finally remembered that I have time to do the CSAPP lab. This lab is called Bomb Lab, and the story background is that Dr. Evil designed a bomb that can only be defused by correctly entering six strings. We need to reverse engineer and disassemble the bomb's executable file to figure out the six strings that can defuse the bomb.
Lab Overview#
Lab materials can be found on the CMU website: https://csapp.cs.cmu.edu/3e/labs.html
There are a total of three files in the compressed package:
.
├── README
├── bomb // Bomb executable file
└── bomb.c // Source code for the bomb
Let's first take a look at the bomb.c
file.
From this, we can see the general working process of the bomb:
- Read a line of string
- Use the
phase_i
function to determine if the input is valid
So, by studying the verification process of each phase
function, we can deduce the string content.
Use objdump -d bomb > bomb.s
to obtain the disassembled file of the bomb.
Use tmux
on the left side for GDB debugging and view the assembly code on the right side.
Next, let's start defusing the bomb!
Phase_1#
The assembly code for phase_1 is as follows:
string_not_equal
takes two string pointer inputs, edi and esi, compares the two strings, and if they are equal, eax outputs 0.
So the purpose of phase_1
is to compare the input string with the string at 0x402400. Then it uses test
to check if the output is 0; if it is zero, it passes; otherwise, it calls explode_bomb
.
Thus, our answer is the string at 0x404200. Let's check it using GDB.
For information on using GDB, you can refer to this CMU manual: https://csapp.cs.cmu.edu/2e/docs/gdbnotes-x86-64.pdf
Tips: Dr. Evil allows you to use file input:
./bomb <file>
, to avoid the tedious repetitive input every time you try to defuse the bomb.
Let's try to defuse the first bomb. First, write the string we just obtained in answer.txt
.
The first string has been successfully entered!
Phase_2#
The assembly code for phase_2 is as follows:
This function is significantly longer than the previous one and also calls read_six_numbers
, with many branching jump instructions.
400efe: 48 83 ec 28 sub $0x28,%rsp
400f02: 48 89 e6 mov %rsp,%rsi
400f05: e8 52 05 00 00 call 40145c <read_six_numbers>
It allocates 0x28 space on the stack and stores six numbers in the stack.
400f0a: 83 3c 24 01 cmpl $0x1,(%rsp)
400f0e: 74 20 je 400f30 <phase_2+0x34>
400f10: e8 25 05 00 00 call 40143a <explode_bomb>
If the first number is not 1, the bomb explodes.
400f17: 8b 43 fc mov -0x4(%rbx),%eax
400f1a: 01 c0 add %eax,%eax
400f1c: 39 03 cmp %eax,(%rbx)
400f1e: 74 05 je 400f25 <phase_2+0x29>
400f20: e8 15 05 00 00 call 40143a <explode_bomb>
If the next number is not double the current number, the bomb explodes.
400f25: 48 83 c3 04 add $0x4,%rbx
400f29: 48 39 eb cmp %rbp,%rbx
400f2c: 75 e9 jne 400f17 <phase_2+0x1b>
400f2e: eb 0c jmp 400f3c <phase_2+0x40>
rbx
points to the position after the last number. If it has reached this point, it means all six numbers have been processed, and the judgment ends.
From the above analysis, the second string needs to meet the following requirements:
- Composed of 6 numbers
- The first number is 1
- Each number from the second one onwards is double the previous one
So the string is 1 2 4 8 16 32
.
Phase_3#
The assembly code for phase_3 is as follows:
We can see that there are many regular jump statements, and there is an sscanf
at the beginning. The 0x4025cf that appears earlier should store the format being read. Let's print it out using GDB.
We can see that the string we input should be two numbers.
400f6a: 83 7c 24 08 07 cmpl $0x7,0x8(%rsp)
400f6f: 77 3c ja 400fad <phase_3+0x6a>
0x8(%rsp)
is the first number. If it is greater than 7, it jumps to bomb explosion.
400f71: 8b 44 24 08 mov 0x8(%rsp),%eax
400f75: ff 24 c5 70 24 40 00 jmp *0x402470(,%rax,8)
400f7c: b8 cf 00 00 00 mov $0xcf,%eax
400f81: eb 3b jmp 400fbe <phase_3+0x7b>
400f83: b8 c3 02 00 00 mov $0x2c3,%eax
400f88: eb 34 jmp 400fbe <phase_3+0x7b>
400f8a: b8 00 01 00 00 mov $0x100,%eax
400f8f: eb 2d jmp 400fbe <phase_3+0x7b>
400f91: b8 85 01 00 00 mov $0x185,%eax
400f96: eb 26 jmp 400fbe <phase_3+0x7b>
400f98: b8 ce 00 00 00 mov $0xce,%eax
400f9d: eb 1f jmp 400fbe <phase_3+0x7b>
400f9f: b8 aa 02 00 00 mov $0x2aa,%eax
400fa4: eb 18 jmp 400fbe <phase_3+0x7b>
400fa6: b8 47 01 00 00 mov $0x147,%eax
400fab: eb 11 jmp 400fbe <phase_3+0x7b>
Based on the first number, it selects the second number, similar to a switch
statement. Assuming the first number is 0, it jumps to *0x02470. Let's check what is stored there using GDB.
It turns out to jump to 0x400f7c, where %eax=0xcf, which is 207.
400fbe: 3b 44 24 0c cmp 0xc(%rsp),%eax
400fc2: 74 05 je 400fc9 <phase_3+0x86>
400fc4: e8 71 04 00 00 call 40143a <explode_bomb>
It checks whether the second number is equal to %eax. If not, the bomb explodes.
So one form of this string is 0 207
. Depending on the first number, the second number can have other different values.
Phase_4#
The assembly code for phase_4 is as follows:
We can see that besides calling sscanf
, it also calls the function func4
.
As usual, let's first take a look at the format received by sscanf
.
We can see that there are still two numbers.
40102e: 83 7c 24 08 0e cmpl $0xe,0x8(%rsp)
401033: 76 05 jbe 40103a <phase_4+0x2e>
401035: e8 00 04 00 00 call 40143a <explode_bomb>
And the first number must be less than or equal to 14 (0xe).
40103a: ba 0e 00 00 00 mov $0xe,%edx
40103f: be 00 00 00 00 mov $0x0,%esi
401044: 8b 7c 24 08 mov 0x8(%rsp),%edi
401048: e8 81 ff ff ff call 400fce <func4>
Pass 14, 0, and the first number as parameters to func4
. Next, let's analyze what happens in func4
.
There are two branch jumps and two recursive calls. We suspect it is a program similar to binary search. Let's try to write it in C format.
int func4(int i, int j, int target) {
int mid = ((i - j + ((i - j) >> 31)) >> 1) + j;
if (mid == target) {
return 0;
} else if (mid < target) {
return 2 * func4(i, mid + 1, target) + 1;
} else {
return 2 * func4(mid - 1, j, target);
}
}
To make the output 0, we just need to keep searching to the left without ever finding mid
on the right side. The input is i=14
, j=0
, so the possible values for the first number are 7, 3, 1, 0
.
401051: 83 7c 24 0c 00 cmpl $0x0,0xc(%rsp)
401056: 74 05 je 40105d <phase_4+0x51>
401058: e8 dd 03 00 00 call 40143a <explode_bomb>
Finally, it checks whether the second number is zero. So this string can take the form 3 0
.
Phase_5#
The assembly code for phase_5 is as follows:
We see two string constants 0x4024b0
and 0x40245e
. Let's check what they are using GDB.
"maduiersnfotvbylSo you think you can stop the bomb with ctrl-c, do you"
"flyers"
I speculate that the first string is a dictionary since there is an offset 0x4024b0(%rdx)
when used; the second string is the target string, which is compared with the user input string after being translated through the dictionary using <string_not_equal>
.
Based on this assumption, let's carefully observe the translation process through the dictionary:
40108b: 0f b6 0c 03 movzbl (%rbx,%rax,1),%ecx
40108f: 88 0c 24 mov %cl,(%rsp)
401092: 48 8b 14 24 mov (%rsp),%rdx
401096: 83 e2 0f and $0xf,%edx
401099: 0f b6 92 b0 24 40 00 movzbl 0x4024b0(%rdx),%edx
Here, %cl
is the low 8 bits of the %ecx
register, and taking it with 0xf
means taking the low four bits, so when translating, we take the low four bits of the character byte as the offset!
Thus, "flyers" corresponds to the dictionary offsets "9 15 14 5 6 7". We can write a Python script to generate some valid strings.
spell = [9, 15, 14, 5, 6, 7]
for offset in range(8):
result = "".join(chr(num + 16 * offset) for num in spell)
print(f"Offset {offset * 16} : {result}")
The output is:
So the fifth string can be IONEFG. It seems the guess was correct, and we arrived at the answer without carefully reading the assembly code line by line.
Phase_6#
This function is significantly longer than the previous ones. Here, I will present the most important part of the code.
I will briefly explain what this code does without going into detailed analysis. Interested readers can download the source code to check it out.
In memory, there is a linked list.
First, read in 6 numbers, each ranging from 1 to 6.
Then take the complement of these 6 numbers with respect to 7, i.e., i=7-i
.
Using the number i
as an index, find the i
-th node of the linked list and push the pointer to this node onto the stack.
By rewriting each node's nextnode
pointer according to the order in the stack, the linked list nodes' order is rearranged to match the stack order.
Finally, check whether the data
in the linked list nodes is in descending order.
To make the numbers in the nodes sorted in descending order, the order should be 3 4 5 6 1 2
. After taking the complement with respect to 7, it becomes 4 3 2 1 6 5
, which is the string we need.
Bomb Defused Successfully#
Thus, we have obtained all six strings needed to defuse the bomb and written them into answer.txt
.
Running ./bomb answer.txt
, we get:
Although this lab is at a hello world level for true reverse engineering, it familiarized me with x86-64 assembly and basic GDB commands.
Overall, this was a very interesting experiment, and I couldn't stop!