Tuesday, May 18, 2010

Repost: Who Call Me?

Who Call Me?

徐千洋
timhsu@info.sayya.org
2004/03/30

 


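Who called me? Have you ever run bt (backtrace) on a core file in gdb? A backtrace walks back through the chain of function calls
so that you can see which function called the one that failed. It is an especially useful debugging technique in large, tangled code bases.
This article discusses how to produce a backtrace while the program is still running.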

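Two key problems have to be solved before this technique can be implemented:
1. How to obtain the return address of the current function.
2. How to map a return address back to a function name.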



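For the first point, we need to understand how the stack relates to function calls. The stack is a last-in-first-out (LIFO) data structure. When a function is called, the call instruction pushes the return address onto the stack, and the callee usually saves registers such as EBP there as well. When the function returns, the return address is popped off the stack so that execution resumes at the instruction following the original call. As for the registers involved: EIP is the instruction pointer, which holds the address of the next instruction the CPU will execute, and ESP points to the current top of the stack.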

Let's start with a small program to check that this is feasible.

----------- test.c -----------
void test()
{
}

int main()
{
    test();
}
------------------------------

[tim@localhost whocallme]$ gcc -o test test.c
[tim@localhost whocallme]$ gdb ./test
GNU gdb 5.3-25mdk (Mandrake Linux)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i586-mandrake-linux-gnu"...
(gdb) b test
Breakpoint 1 at 0x804832f
(gdb) r
Starting program: /home/tim/research/whocallme/test

Breakpoint 1, 0x0804832f in test ()
(gdb) info reg
eax 0x0 0
ecx 0x1 1
edx 0x4014fe50 1075117648
ebx 0x4014e9a0 1075112352
esp 0xbffff698 0xbffff698
ebp 0xbffff698 0xbffff698
esi 0x40013880 1073821824
edi 0xbffff6f4 -1073744140
eip 0x804832f 0x804832f

(gdb) disas test
Dump of assembler code for function test:
0x804832c <test+0>: push %ebp
0x804832d <test+1>: mov %esp,%ebp
0x804832f <test+3>: pop %ebp
0x8048330 <test+4>: ret
End of assembler dump.

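The ebp register holds 0xbffff698, which is the stack address at the moment the frame was set up (the prologue's mov %esp,%ebp copied esp into ebp). We know that a call instruction makes the CPU push the return address onto the stack, so the return address we need should be reachable through the address held in ebp: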

(gdb) p/x *0xbffff698
$1 = 0xbffff6a8

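Don't forget that push %ebp was executed on entry to the function, so the stack pointer had already been decremented by 4 and this slot holds the caller's saved ebp. To get the right value we have to add 4 back: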

(gdb) p/x *(0xbffff698+4)
$2 = 0x8048346

This value should be the correct return address of test(). Let's check:

(gdb) disas main
Dump of assembler code for function main:
0x8048331 <main+0>: push %ebp
0x8048332 <main+1>: mov %esp,%ebp
0x8048334 <main+3>: sub $0x8,%esp
0x8048337 <main+6>: and $0xfffffff0,%esp
0x804833a <main+9>: mov $0x0,%eax
0x804833f <main+14>: sub %eax,%esp
0x8048341 <main+16>: call 0x804832c <test>
0x8048346 <main+21>: leave
0x8048347 <main+22>: ret
0x8048348 <main+23>: nop
0x8048349 <main+24>: nop
0x804834a <main+25>: nop
0x804834b <main+26>: nop
0x804834c <main+27>: nop
0x804834d <main+28>: nop
0x804834e <main+29>: nop
0x804834f <main+30>: nop
End of assembler dump.

Sure enough, the instruction right after the call is at 0x8048346, which is the return address of test().
Next we implement this with C and a little inline assembly.

------------- test-1.c ------------------
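#include <stdio.h>

void test()
{
    unsigned long *stack;

    /* Copy the frame pointer (ebp) into stack; this works on 32-bit x86 only */
    asm ("movl %%ebp, %0\n"
         : "=r"(stack));
    /* stack[0] is the saved ebp, stack[1] is the return address */
    printf("ret address = 0x%lx\n", *(stack+1));
}

int main()
{
    test();
}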
-----------------------------------------

[tim@localhost whocallme]$ ./test-1
ret address = 0x8048394
[tim@localhost whocallme]$ gdb ./test-1
(gdb) disas main
Dump of assembler code for function main:
0x804837f <main+0>: push %ebp
0x8048380 <main+1>: mov %esp,%ebp
0x8048382 <main+3>: sub $0x8,%esp
0x8048385 <main+6>: and $0xfffffff0,%esp
0x8048388 <main+9>: mov $0x0,%eax
0x804838d <main+14>: sub %eax,%esp
0x804838f <main+16>: call 0x804835c <test>
0x8048394 <main+21>: leave
0x8048395 <main+22>: ret
0x8048396 <main+23>: nop

The first key point is now solved. Next, how can we find out which function a given memory address belongs to?




First, use objdump -t to look at the executable's symbol table:
[tim@localhost whocallme]$ objdump -t ./test-1 | awk '{print $1" "$3" "$6}'|grep "F"
080482c4 F call_gmon_start
080482f0 F __do_global_dtors_aux
08048330 F frame_dummy
080484ac O __FRAME_END__
08048450 F __do_global_ctors_aux
080483f0 F __libc_csu_fini
08048250 F _init
0804835c F test
080482a0 F _start
080483a0 F __libc_csu_init
0804837f F main
08048474 F _fini
08049598 O _GLOBAL_OFFSET_TABLE_

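Since objdump -t can print the program's function names and memory addresses, let's use the same technique objdump does.
objdump is built on the BFD Library (the Binary File Descriptor Library), and bfd.c below also uses the BFD Library to read the symbol table.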

--------------- bfd.c --------------------

#include <stdio.h>
#include <stdlib.h>
#include <bfd.h>

int main(int argc, char *argv[])
{
    bfd *abfd;
    long storage_needed;
    asymbol **symbol_table;
    long number_of_symbols;
    long i;
    char **matching;
    asection *section;
    const char *symbol_name;
    unsigned long symbol_offset;
    unsigned long section_vma;
    unsigned long symbol_address;

    if (argc < 2)
        return 0;
    printf("Open %s\n", argv[1]);
    bfd_init();
    abfd = bfd_openr(argv[1], NULL);
    if (abfd == NULL)
    {
        bfd_perror("bfd_openr");
        return -1;
    }
    if (!bfd_check_format_matches(abfd, bfd_object, &matching))
    {
        return -1;
    }
    if (!(bfd_get_file_flags(abfd) & HAS_SYMS))
    {
        printf("ERROR flag!\n");
        return -1;
    }
    /* Get an upper bound on the size of the symbol table */
    storage_needed = bfd_get_symtab_upper_bound(abfd);
    if (storage_needed < 0)
        return -1;
    symbol_table = (asymbol **) malloc(storage_needed);
    if (symbol_table == NULL)
        return -1;
    /* Read the symbol table into the allocated memory (symbol_table) and get the number of symbols */
    number_of_symbols = bfd_canonicalize_symtab(abfd, symbol_table);
    if (number_of_symbols < 0)
        return -1;
    for (i = 0; i < number_of_symbols; i++)
    {
        /* Keep symbols flagged as a function or as global */
        if (symbol_table[i]->flags & (BSF_FUNCTION|BSF_GLOBAL))
        {
            /* Find the section this symbol lives in and the section's address (section_vma) */
            section = symbol_table[i]->section;
            section_vma = (unsigned long) section->vma;
            /* Get the symbol's name (symbol_name) and its offset within the section (symbol_offset) */
            symbol_name = symbol_table[i]->name;
            symbol_offset = (unsigned long) symbol_table[i]->value;
            /* Section address plus offset gives the function's run-time memory address (symbol_address) */
            symbol_address = section_vma + symbol_offset;
            /* Print the symbol only if it lives in a code (text) section */
            if (section->flags & SEC_CODE)
            {
                printf("<%s> 0x%lx 0x%lx 0x%lx\n",
                       symbol_name,
                       section_vma,
                       symbol_offset,
                       symbol_address);
            }
        }
    }
    free(symbol_table);
    bfd_close(abfd);
    return 0;
}

-----------------------------------------
Output:
[tim@localhost whocallme]$ ./bfd ./test-1
Open ./test-1
<call_gmon_start> 0x80482a0 0x24 0x80482c4
<__do_global_dtors_aux> 0x80482a0 0x50 0x80482f0
<frame_dummy> 0x80482a0 0x90 0x8048330
<__do_global_ctors_aux> 0x80482a0 0x1b0 0x8048450
<__libc_csu_fini> 0x80482a0 0x150 0x80483f0
<_init> 0x8048250 0x0 0x8048250
<test> 0x80482a0 0xbc 0x804835c
<_start> 0x80482a0 0x0 0x80482a0
<__libc_csu_init> 0x80482a0 0x100 0x80483a0
<main> 0x80482a0 0xdf 0x804837f
<_fini> 0x8048474 0x0 0x8048474

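Now we can build a lookup table of function names and memory addresses, which makes queries easy. One small problem remains: we know each function's start address, but not its end address or its size. To solve this, sort the table when building it so that functions at higher addresses come later,
and treat the start address of the next function as the end address of the current one.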

--------------- bfd_dumpfun.c --------------------
/* bfd_dumpfun.c (GPL)
* gcc -o bfd_dumpfun bfd_dumpfun.c -lbfd
*
* Usage: ./bfd_dumpfun [binary]
* Note: Dump function information of an ELF binary with the BFD Library.
*
* by TimHsu(timhsu@info.sayya.org) 2004/03/31
*
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <bfd.h>

typedef struct function_table FUN_TABLE;

/* A record holding a function's name and address */
struct function_table
{
    char name[80];
    unsigned long addr;
};

static FUN_TABLE *fun_table;
static int table_count = 0; /* number of functions */

/* qsort() comparator: ascending by address */
static int compare_function(const void *a, const void *b)
{
    const FUN_TABLE *aa = (const FUN_TABLE *) a;
    const FUN_TABLE *bb = (const FUN_TABLE *) b;
    if (aa->addr > bb->addr)
        return 1;
    else if (aa->addr < bb->addr)
        return -1;
    else
        return 0;
}

/* Append one function entry to the lookup table */
static void add_function_table(const char *name, unsigned long address)
{
    /* The table is zero-filled, so copying at most 79 bytes keeps the name NUL-terminated */
    strncpy(fun_table[table_count].name, name, 79);
    fun_table[table_count].addr = address;
    table_count++;
}

static void dump_function_table(void)
{
    int i;
    for (i = 0; i < table_count; i++)
    {
        printf("%-30s 0x%lx\n", fun_table[i].name,
               fun_table[i].addr);
    }
}

int main(int argc, char *argv[])
{
    bfd *abfd;
    long storage_needed;
    asymbol **symbol_table;
    long number_of_symbols;
    long i;
    char **matching;
    asection *section;
    const char *symbol_name;
    unsigned long symbol_offset;
    unsigned long section_vma;
    unsigned long symbol_address;

    if (argc < 2)
        return 0;
    printf("Open %s\n", argv[1]);
    bfd_init();
    abfd = bfd_openr(argv[1], NULL);
    if (abfd == NULL)
    {
        bfd_perror("bfd_openr");
        return -1;
    }

    if (!bfd_check_format_matches(abfd, bfd_object, &matching))
    {
        return -1;
    }
    if (!(bfd_get_file_flags(abfd) & HAS_SYMS))
    {
        printf("ERROR flag!\n");
        return -1;
    }
    storage_needed = bfd_get_symtab_upper_bound(abfd);

    if (storage_needed < 0)
        return -1;
    symbol_table = (asymbol **) malloc(storage_needed);
    if (symbol_table == NULL)
        return -1;
    number_of_symbols = bfd_canonicalize_symtab(abfd, symbol_table);
    if (number_of_symbols < 0)
        return -1;
    /* Allocate a zero-filled lookup table, one slot per symbol */
    fun_table = (FUN_TABLE *) calloc(number_of_symbols, sizeof(FUN_TABLE));
    if (fun_table == NULL)
        return -1;

    for (i = 0; i < number_of_symbols; i++)
    {
        if (symbol_table[i]->flags & (BSF_FUNCTION|BSF_GLOBAL))
        {
            section = symbol_table[i]->section;
            section_vma = (unsigned long) section->vma;

            symbol_name = symbol_table[i]->name;
            symbol_offset = (unsigned long) symbol_table[i]->value;
            symbol_address = section_vma + symbol_offset;
            if (section->flags & SEC_CODE)
            {
                add_function_table(symbol_name,
                                   symbol_address);
            }
        }
    }
    free(symbol_table);
    bfd_close(abfd);
    /* Sort the lookup table by address */
    qsort(fun_table, table_count, sizeof(FUN_TABLE), compare_function);
    dump_function_table();
    return 0;
}
-----------------------------------------
Output:
[tim@localhost whocallme]$ ./bfd_dumpfun ./test-1
Open ./test-1
_init 0x8048250
_start 0x80482a0
call_gmon_start 0x80482c4
__do_global_dtors_aux 0x80482f0
frame_dummy 0x8048330
test 0x804835c
main 0x804837f
__libc_csu_init 0x80483a0
__libc_csu_fini 0x80483f0
__do_global_ctors_aux 0x8048450
_fini 0x8048474


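Now that the key technical points have all been handled, the best way to make this practical is to package it as a library that can be called whenever it is needed.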

------------- whocallme.h ------------------------
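#include <stdio.h>

#define FUNCTION_NAME_MAXLEN 80

/* Report which function called the current one; relies on the 32-bit x86 frame pointer (ebp) */
#define who_call_me() \
do { \
    unsigned long *stack; \
    asm ("movl %%ebp, %0\n" \
         : "=r"(stack)); \
    fprintf(stderr, ": function <%s> call me <%s>!\n", \
            find_function_by_addr(*(stack+1)), who_am_i()); \
} while(0)


extern int init_function_table(char *);
extern char *find_function_by_addr(unsigned long);
extern char *who_am_i(void);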
----------------------------------------------------
--------------- whocallme.c ------------------------
/* whocallme.c (GPL)
*
* A runtime backtrace of function.
* http://info.sayya.org/~timhsu/research/whocallme
*
*
* by Timhsu(timhsu@sayya.org) 2004/03/31
*
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <bfd.h>
#include "whocallme.h"

typedef struct function_table FUN_TABLE;
/* A record holding a function's name and address */
struct function_table
{
    char name[FUNCTION_NAME_MAXLEN];
    unsigned long addr;
};

static FUN_TABLE *fun_table;
static int table_count = 0; /* number of functions */

/* qsort() comparator: ascending by address */
static int compare_function(const void *a, const void *b)
{
    const FUN_TABLE *aa = (const FUN_TABLE *) a;
    const FUN_TABLE *bb = (const FUN_TABLE *) b;
    if (aa->addr > bb->addr)
        return 1;
    else if (aa->addr < bb->addr)
        return -1;
    else
        return 0;
}
/* Append one function entry to the lookup table */
static void add_function_table(const char *name, unsigned long address)
{
    /* The table is zero-filled, so leaving the last byte untouched keeps the name NUL-terminated */
    strncpy(fun_table[table_count].name, name, FUNCTION_NAME_MAXLEN - 1);
    fun_table[table_count].addr = address;
    table_count++;
}
/* Print every entry of the lookup table */
static void dump_function_table(void)
{
    int i;
    for (i = 0; i < table_count; i++)
    {
        fprintf(stderr, "%-30s 0x%lx\n",
                fun_table[i].name,
                fun_table[i].addr);
    }
}
/* Look up a function name by address */
char *find_function_by_addr(unsigned long addr)
{
    int i;
    for (i = 0; i < table_count; i++)
    {
        if (addr >= fun_table[i].addr)
        {
            /* The last entry has no successor; otherwise addr must lie below the next start */
            if (i == table_count - 1 || addr < fun_table[i+1].addr)
                return fun_table[i].name;
        }
    }
    /* addr lies below the first function: report a placeholder name */
    return "?";
}
/* Return the name of the function that is currently running (the caller of who_am_i) */
char *who_am_i(void)
{
    unsigned long *stack;
    asm ("movl %%ebp, %0\n"
         : "=r"(stack));
    return find_function_by_addr(*(stack+1));
}
/* Initialize the function lookup table */
int init_function_table(char *file)
{
    bfd *abfd;
    long storage_needed;
    asymbol **symbol_table;
    long number_of_symbols;
    long i;
    char **matching;
    asection *section;
    const char *symbol_name;
    unsigned long symbol_offset;
    unsigned long section_vma;
    unsigned long symbol_address;

    bfd_init();
    abfd = bfd_openr(file, NULL);
    if (abfd == NULL)
    {
        bfd_perror("bfd_openr");
        return -1;
    }
    if (!bfd_check_format_matches(abfd, bfd_object, &matching))
    {
        return -1;
    }
    if (!(bfd_get_file_flags(abfd) & HAS_SYMS))
    {
        printf("ERROR flag!\n");
        return -1;
    }
    /* Get an upper bound on the size of the symbol table */
    storage_needed = bfd_get_symtab_upper_bound(abfd);
    if (storage_needed < 0)
        return -1;
    symbol_table = (asymbol **) malloc(storage_needed);
    if (symbol_table == NULL)
        return -1;
    /* Read the symbol table into the allocated memory (symbol_table) and get the number of symbols */
    number_of_symbols = bfd_canonicalize_symtab(abfd, symbol_table);
    if (number_of_symbols < 0)
        return -1;
    /* Allocate a zero-filled lookup table, one slot per symbol */
    fun_table = (FUN_TABLE *) calloc(number_of_symbols, sizeof(FUN_TABLE));
    if (fun_table == NULL)
        return -1;

    for (i = 0; i < number_of_symbols; i++)
    {
        /* Keep symbols flagged as a function or as global */
        if (symbol_table[i]->flags & (BSF_FUNCTION|BSF_GLOBAL))
        {
            /* Find the section this symbol lives in and the section's address (section_vma) */
            section = symbol_table[i]->section;
            section_vma = (unsigned long) section->vma;
            /* Get the symbol's name (symbol_name) and its offset within the section (symbol_offset) */
            symbol_name = symbol_table[i]->name;
            symbol_offset = (unsigned long) symbol_table[i]->value;
            /* Section address plus offset gives the function's run-time memory address (symbol_address) */
            symbol_address = section_vma + symbol_offset;
            /* Keep the symbol only if it lives in a code (text) section */
            if (section->flags & SEC_CODE)
            {
                /* Add the function's name and address to the lookup table */
                add_function_table(symbol_name,
                                   symbol_address);
            }
        }
    }
    free(symbol_table);
    bfd_close(abfd);
    /* Sort the lookup table by address */
    qsort(fun_table, table_count, sizeof(FUN_TABLE), compare_function);
    return 0;
}

----------------------------------------------
[tim@localhost whocallme]$ gcc -c whocallme.c
[tim@localhost whocallme]$ ar -q libwhocallme.a whocallme.o

Let's write a small test program to try it out:

---------------- test-2.c --------------------------

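#include "whocallme.h"

/* Declared up front because test_a() calls them before their definitions */
void test_b();
void test_c();

void test()
{
    who_call_me();
}
void test_a()
{
    test_b();
    test_c();
}
void test_b()
{
    test();
}
void test_c()
{
    who_call_me();
}
int main(int argc, char *argv[])
{
    /* Load the symbol table of this executable before the first who_call_me() */
    init_function_table(argv[0]);
    test();
    test_a();
    test_b();
    test_c();
    return 0;
}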
----------------------------------------------
Output:
[tim@localhost whocallme]$ gcc -o test-2 test-2.c -L. -lwhocallme -lbfd
[tim@localhost whocallme]$ ./test-2
: function <main> call me <test>!
: function <test_b> call me <test>!
: function <test_a> call me <test_c>!
: function <test_b> call me <test>!
: function <main> call me <test_c>!
