pin-in-CTF

十几个月前，在学习 fuzz 时，我接触了 intel-pin 这个动态插桩工具，当时发现对于一些 CTF reverse 题目，尤其是代码混淆比较严重的题目，可以编写 pintool 统计指令数等信息，多快好省的通过侧信道的方法逐位爆破出 flag，详见 pin-in-ctf: blog, pin-in-ctf: repo。

但这种方法缺点也很多:

动态插桩使得程序的 runtime overhead 成倍增长(~%100)
对于比较复杂的程序，尤其是系统调用较多的程序，内核态指令数会占总指令数的很大一部分，影响用户判断
pintool 编写相对复杂
只能对用户态的程序插桩，对内核态程序插桩需要较大的改动

intel-pt and perf

半年前在做毕设时，我接触了 intel-pt 这个硬件级别的 trace 工具，其同样具有记录程序指令的功能，并且由于直接在硬件层面实现，runtime overhead 可以缩小到惊人的 5%，同时通过用户态的系统分析工具 perf，用户也可以相当方便的调用 intel pt 的某些功能。而 perf 也就是本篇文章要介绍的主角。

前几个月的 oGeek 线上赛中，我有一道 pwn 一道 reverse 的出题任务，本来想打算用 intel-pt 的特性出一道逆向题，但因为时间紧张，构思尚不成熟，不想浪费想法，最终出了另外一道 NSIS script 的题目 King of KoF。在昨天的 ASIS Final 2019 的比赛中，遇到了一道题目 Cursed app，分析之后觉得使用 perf 解决十分合适，于是终于有了分享 perf-in-ctf 这个想法（~~并打理要长草的博客~~）的机会，接下来进入正文。

perf

perf 是 Linux 中分析性能的工具，具有很多强大的功能，详见 linux perf turtorial。本篇文章只对其统计指令数的用法进行介绍。

首先需要安装，以 ubuntu 18.04 为例，直接使用包管理安装：

sudo apt install linux-tools-common

查看功能简介：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20


~ perf --version
perf version 5.0.21
~ tldr perf

  perf

  Framework for linux performance counter measurements.

  - Display basic performance counter stats for a command:
    perf stat gcc hello.c

  - Display system-wide real time performance counter profile:
    sudo perf top

  - Run a command and record its profile into "perf.data":
    sudo perf record command

  - Read "perf.data" (created by perf record) and display the profile:
    sudo perf report

这里只对 perf stat 功能进行介绍，参考 tldr 的例子，对 id 这个命令进行性能分析

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21


~ sudo perf stat id
uid=0(root) gid=0(root) groups=0(root)

 Performance counter stats for 'id':

              1.71 msec task-clock                #    0.710 CPUs utilized          
                 1      context-switches          #    0.584 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               122      page-faults               #    0.071 M/sec                  
         6,244,401      cycles                    #    3.647 GHz                    
         2,353,996      instructions              #    0.38  insn per cycle         
           459,910      branches                  #  268.614 M/sec                  
            18,201      branch-misses             #    3.96% of all branches        

       0.002412965 seconds time elapsed

       0.000000000 seconds user
       0.002342000 seconds sys


~ 

其中有指令数这个结果

1
2
3


...
         2,353,996      instructions              #    0.38  insn per cycle         
...

通过格式化输出简化显示

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


~ sudo perf stat -x : id
uid=0(root) gid=0(root) groups=0(root)
1.17:msec:task-clock:1167180:100.00:0.721:CPUs utilized
0::context-switches:1167180:100.00:0.000:K/sec
0::cpu-migrations:1167180:100.00:0.000:K/sec
121::page-faults:1167180:100.00:0.104:M/sec
4243048::cycles:1208711:100.00:3.635:GHz
2388158::instructions:1208711:100.00:0.56:insn per cycle
461917::branches:1208711:100.00:395.755:M/sec
18229::branch-misses:1208711:100.00:3.95:of all branches
~ 

只保留 instructions 这一条结果

1
2


~ sudo perf stat -x : -e instructions id 1>/dev/null
2359978::instructions:1819500:100.00::

只保留用户态的结果

1
2


~ sudo perf stat -x : -e instructions:u id 1>/dev/null
716210::instructions:u:1532634:100.00::

看到这，看过 pin-in-CTF 的朋友可能已经知道怎么做了，接下来通过 ASIS Final 2019 的 Cursed app 这个题目进行举例。

perf-in-CTF

binary here

64 位 linux 程序，通过读文件验证结果

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22


Desktop check ./cursed_app.elf 
./cursed_app.elf: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l, BuildID[sha1]=7c0031d8c259457ca7f3bab1c5a5f40269167b5f, stripped
[*] '/home/m4x/Desktop/cursed_app.elf'
    Arch:     amd64-64-little
    RELRO:    No RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled
Desktop ./cursed_app.elf 
please locate license file, run ./app license_key
Desktop echo "ASIS{" > flag && ltrace ./cursed_app.elf ./flag
_setjmp(0x7ffc7e160590, 0x7ffc7e160758, 0x7ffc7e160770, 0x559e6789dbb0) = 0
fopen("./flag", "r")                                               = 0x559e68a2d260
fseek(0x559e68a2d260, 0, 2, 0)                                     = 0
ftell(0x559e68a2d260, 0x559e68a2d340, 0, 6)                        = 6
fseek(0x559e68a2d260, 0, 0, 0)                                     = 0
malloc(6)                                                          = 0x559e68a2e4a0
fread(0x559e68a2e4a0, 1, 6, 0x559e68a2d260)                        = 6
fclose(0x559e68a2d260)                                             = 0
puts("Bummer! You got the wrong flag!!"...Bummer! You got the wrong flag!!
)                        = 33
+++ exited (status 0) +++

根据题目描述

1
2
3
4


Description:
If we want to say 100, should we start at 1 and count until 100?

sha256(flag) = dd791346264fc02ddcd2466b9e8b2b2bba8057d61fbbc824d25c7957c127250d

已经可以猜到没必要逆清楚程序的整个逻辑。

使用 IDA 分析程序，发现程序的逻辑很简单

读取文件内容后有多个判断，都通过才会执行到

1
2
3


.text:0000000000001F3B lea     rdi, s          ; "Congratz! You got the correct flag!!"
.text:0000000000001F42 call    _puts
.text:0000000000001F47 mov     [rsp+0F8h+var_F4]

简单调试了几个判断，发现是对文件内容的逐字节判断，因此必然每次输入的正确 flag 长度越长，程序执行的指令数就越多，可以利用这个特性用侧信道的方法得到 flag。

先使用 perf 验证可行性，已经知道 flag 格式为 ASIS{this_is_flag}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24


Desktop echo -n "0" > ./license && perf stat -x : -e instructions:u ./cursed_app.elf ./license 1>/dev/null 
167612::instructions:u:1000300:100.00::
Desktop echo -n "1" > ./license && perf stat -x : -e instructions:u ./cursed_app.elf ./license 1>/dev/null 
167614::instructions:u:1237218:100.00::
Desktop echo -n "2" > ./license && perf stat -x : -e instructions:u ./cursed_app.elf ./license 1>/dev/null 
167614::instructions:u:967846:100.00::
Desktop echo -n "A" > ./license && perf stat -x : -e instructions:u ./cursed_app.elf ./license 1>/dev/null 
167628::instructions:u:934691:100.00::
Desktop echo -n "A0" > ./license && perf stat -x : -e instructions:u ./cursed_app.elf ./license 1>/dev/null 
167631::instructions:u:1034616:100.00::
Desktop echo -n "A1" > ./license && perf stat -x : -e instructions:u ./cursed_app.elf ./license 1>/dev/null 
167629::instructions:u:1019846:100.00::
Desktop echo -n "A2" > ./license && perf stat -x : -e instructions:u ./cursed_app.elf ./license 1>/dev/null 
167630::instructions:u:504064:100.00::
Desktop echo -n "AS" > ./license && perf stat -x : -e instructions:u ./cursed_app.elf ./license 1>/dev/null 
167648::instructions:u:955516:100.00::
Desktop echo -n "AS0" > ./license && perf stat -x : -e instructions:u ./cursed_app.elf ./license 1>/dev/null 
167645::instructions:u:914409:100.00::
Desktop echo -n "AS1" > ./license && perf stat -x : -e instructions:u ./cursed_app.elf ./license 1>/dev/null 
167647::instructions:u:2748495:100.00::
Desktop echo -n "AS2" > ./license && perf stat -x : -e instructions:u ./cursed_app.elf ./license 1>/dev/null 
167646::instructions:u:677381:100.00::
Desktop echo -n "ASI" > ./license && perf stat -x : -e instructions:u ./cursed_app.elf ./license 1>/dev/null 
167664::instructions:u:564774:100.00::

可以看到，当我们猜到下一位正确的 flag 时，指令数会有较为明显的增长（这里只保留用户态的 instructions 提高稳定性）。

把上述过程封装为脚本，运行即可得到 flag

exploit here

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41


#!/usr/bin/env python
# -*- coding: utf-8 -*-

# ASIS{y0u_c4N_s33_7h15_15_34513R_7h4n_Y0u_7h1nk_r16h7?__!!!}
from subprocess import *
import string
import sys

command = "perf stat -x : -e instructions:u ./cursed_app.elf ./license 1>/dev/null"

flag = 'ASIS{'

def write_file(cont):
    with open("./license", "wb") as f:
        f.write(cont)

write_file(flag)
'''
Desktop cat license
ASIS{
Desktop perf stat -x : -e instructions:u ./cursed_app.elf ./license 1>/dev/null
167689::instructions:u:377798:100.00::
'''
ins_count = 167689
while True:
    delta = 0
	count_chr = ''
	for i in string.printable:
        write_file(flag + i)
		target = Popen(command, stdout=PIPE, stdin=PIPE, stderr=STDOUT, shell=True)
		target_output = target.communicate()
        #  import pdb;pdb.set_trace()
        #  print(target_output)
		instructions = int(target_output[0].split(':')[0])
		if ins_count + 20 > instructions > ins_count + 10:
			count_chr = i
            delta = instructions - ins_count
			ins_count = instructions
            break
	flag += count_chr
        print(delta, flag)

整个过程耗时不到 1 分钟，远远快于使用 pintool 的方法。

这里使用了统计用户态总指令数这种相对粗糙的方法，有兴趣的同学还可以试试精确的只统计某种 pt packet（tnt, tip 等）数目来达到更为精准的结果。

summary

pin-in-CTF is dead, I prefer perf.

使用 perf 来解决这类问题有以下优点：

基于硬件，runtime overhead < 5%
可以通过只保留用户态指令或者只保留某种特定的 pt 数据包来达到更精确的效果
工具较为成熟，直接使用即可
intel pt 本身就是内核模块，可以用于内核态

前文简单的展示了如何使用 perf 这个工具解决一部分 CTF reverse challenge，但也仅仅展现了 perf/intel pt 这个强大工具的冰山一角，更多的高级用法和 trick 还需要大家一起学习和脑洞。

Pin in CTF? No, Perf in CTF!

Table of Contents

pin-in-CTF

intel-pt and perf

perf

perf-in-CTF

summary

reference