How Buffer Overflow Attacks Work

Category: Web Security Readings - Last Updated: Thu, 08 Aug 2019 - by Netsparker Team

A computer program may be vulnerable to buffer overflow if it handles incoming data incorrectly. Anybody who can provide suitably crafted user input data can cause such a program to crash. Even worse, a vulnerable program may execute arbitrary code provided by an intruder and do something that the author did not intend it to do. Buffer overflow vulnerabilities are caused by programmer mistakes, which are easy to understand but not so easy to avoid or protect against.

How Buffer Overflow Attacks Work

What Causes a Buffer Overflow?

The idea of a buffer overflow vulnerability (also known as a buffer overrun) is simple. The following is the source code of a C program that has a buffer overflow vulnerability:

char greeting[5];
memcpy(greeting, "Hello, world!\n", 15);
printf(greeting);

What do you think will happen when we compile and run this vulnerable program? The answer may be surprising: anything can happen. When this code snippet is executed, it will try to put fifteen bytes into a destination buffer that is only five bytes long. This means that ten bytes will be written to memory addresses outside of the array. What happens later depends on the original content of the overwritten ten bytes of memory. Maybe important variables were stored there and we have just changed their values?

The example above is broken in such an obvious way that no sane programmer would make such a mistake. So, let’s consider another example. Let’s suppose that we need to read an IP address from a file. We can do it using the following C code:

#include <stdio.h>
#define MAX_IP_LENGTH 15
int main(void) {
  char file_name[] = "ip.txt";
  FILE *fp;
  fp = fopen(file_name, "r");
  char ch;
  int counter = 0;
  char buf[MAX_IP_LENGTH];
  while((ch = fgetc(fp)) != EOF) {
    buf[counter++] = ch;
  }
  buf[counter] = '\0';
  printf("%s\n", buf);
  fclose(fp);
  return 0;
}

A mistake in the above example is not so obvious. We assume that the IP address, which we want to read from a file, will never exceed 15 bytes. Proper IP addresses (for example, 255.255.255.255) can’t be longer than 15 bytes. However, a malicious user can prepare a file that contains a very long fake string instead of an IP address (for example, 19222222222.16888888.0.1). This string will cause our program to overflow the destination buffer.

If you think that even this bug is too obvious and that no programmer would make such a mistake, stay tuned. Further on, you will see a real-life example of a buffer overflow bug, which occurred in a serious project and which is not much more sophisticated than the above example.

Stack Buffer Overflow Attack Example

Now that we know that a program can overflow an array and overwrite a fragment of memory that it should not overwrite, let’s see how it can be used to mount a buffer overflow attack. In a typical scenario (called stack buffer overflow), the problem is caused – like many problems with information security – by mixing data (meant to be processed or displayed) with commands that control program execution.

In C, like in most programming languages, programs are built using functions. Functions call each other, pass arguments to each other, and return values. For instance, our code, which reads an IP address from a file, could be part of a function called readIpAddress, which reads an IP address from a file and parses it. This function could be called by some other function, for example, readConfiguration. When readConfiguration calls readIpAddress, it passes a filename to it and then the readIpAddress function returns an IP address as an array of four bytes.

Fig. 1. The arguments and the return value of the readIpAddress function

Fig. 1. The arguments and the return value of the readIpAddress function

During this function call, three different pieces of information are stored side-by-side in computer memory. For each program, the operating system maintains a region of memory which includes a part that is called the stack or call stack (hence the name stack buffer overflow). When a function is called, a fragment of the stack is allocated to it. This piece of the stack (called a frame) is used to:

  • Remember the line of code from which program execution should resume when the function execution is completed (in our case, a particular line in the readConfiguration function)
  • Store the arguments passed to the function by its caller (in our case, for example, /home/someuser/myconfiguration/ip.txt)
  • Store the return value that is returned by the function to its caller (in our case, a four bytes array, for instance (192, 168, 0, 1))
  • Store local variables of the called function while this function is being executed (in our case, the variable char[MAX_IP_LENGTH] buf)

Therefore, if a program has a buffer allocated in the stack frame and tries to place more data in it than would fit, user input data may spill over and overwrite the memory location where the return address is stored.

Fig. 2. Contents of the stack frame when the readIPAddress function is called

Fig. 2. Contents of the stack frame when the readIPAddress function is called

If the problem was caused by random malformed user input data, most probably the new return address will not point to a memory location where any other program is stored, so the original program will simply crash. However, if the data is carefully prepared, it may lead to unwanted code execution.

The first step for the attacker is to prepare data that can be interpreted as executable code and that work for the attacker’s benefit (such data is called the shellcode). The second step is to place the address of this malicious data in the exact location where the return address should be.

Fig. 3. The content of ip.txt overwrites the return address

Fig. 3. The content of ip.txt overwrites the return address

In effect, when the function reads the IP character string and places it into the destination buffer, the return address is replaced by the address of the malicious code. When the function ends, program execution jumps to malicious code.

Can You Prevent Buffer Overflow?

Since the discovery of the stack buffer overflow attack technique, authors of operating systems (Linux, Microsoft Windows, macOS, and others) try to find prevention techniques:

  • The stack can be made non-executable, so even if malicious code is placed in the buffer, it cannot be executed. 
  • The operating system may randomize the memory layout of the address space (memory space). In such a case, when malicious code is placed in a buffer, the attacker cannot predict its address. 
  • Other protection techniques (for example, StackGuard) modify a compiler in such a way that each function calls a piece of code that verifies whether the return address has not changed. 

In practice, even if such protection mechanisms make stack buffer overflow attacks harder, they don’t make them impossible, and some of them affect performance.

Buffer overflow vulnerabilities exist in programming languages which, like C, trade security for efficiency and do not check memory access. In higher-level programming languages (e.g. Python, Java, PHP, JavaScript or Perl), which are often used to build web applications, buffer overflow vulnerabilities cannot exist. In those programming languages, you cannot put excess data into the destination buffer. For example, try to compile and execute the following piece of Java code:

int[] buffer = new int[5];
buffer[100] = 44;

The Java compiler will not warn you, but the runtime Java virtual machine will detect the problem and instead of overwriting random memory, it will interrupt program execution.

Buffer Overflow and the Web

However, even programmers who use high-level languages should know and care about buffer overflow attacks. Their programs are often executed within operating systems that are written in C or use runtime environments written in C, and this C code may be vulnerable to such attacks. In order to see how a buffer overflow vulnerability may affect a programmer using such a high-level programming language, let’s analyze CVE-2015-3329 – a real-life security vulnerability, which was discovered in the PHP standard library in 2015.

A PHP application is a collection of *.php files. In order to make it easier to distribute such an application, it may be packed into a single file archive – as a zip file, a tar file, or using a custom PHP format called phar. A PHP extension called phar contains a class that you can use to work with such archives. With this class, you may parse an archive, list its files, extract the files, etc. Using this class is quite simple, for example, to extract all files from an archive, use the following code:

$phar = new Phar('phar-file.phar');
$phar->extractTo('./directory');

When the Phar class parses an archive (new Phar('phar-file.phar')), it reads all filenames from the archive, concatenates each filename with the archive filename, and then calculates the checksum. For example, for an archive called myarchive.phar that contains files index.php and components/hello.php, the Phar class calculates checksums of two strings: myarchive.pharindex.php and myarchive.pharcomponents/hello.php. The reason why the authors implemented it this way is not important here, what is important is how they implemented it. Until 2015, this operation was done using the following function (see the old PHP source code):

phar_set_inode(phar_entry_info *entry TSRMLS_DC) /* {{{ */
{
        char tmp[MAXPATHLEN];
        int tmp_len;
 
        tmp_len = entry->filename_len + entry->phar->fname_len;
        memcpy(tmp, entry->phar->fname, entry->phar->fname_len);
        memcpy(tmp + entry->phar->fname_len, entry->filename, entry->filename_len);
        entry->inode = (unsigned short)zend_get_hash_value(tmp, tmp_len);
}

As we see, this function creates an array of chars called tmp. First, the name of the phar archive (in our example, myarchive.phar) is copied into this array using the following command:

memcpy(tmp, entry->phar->fname, entry->phar->fname_len);

In this command:

  • The first argument, tmp, is a destination where bytes should be copied.
  • The second argument, entry->phar->fname, is a source from where bytes should be copied – in our case, the filename of the archive (myarchive.phar).
  • The third argument, entry->phar->fname_len, is a number of bytes that should be copied – in our case it is the length (in bytes) of the archive filename.

The function copies the filename (in our example, index.php or components/hello.php) into the tmp char array using the following command:

memcpy(tmp + entry->phar->fname_len, entry->filename, entry->filename_len);

In this command:

  • The first argument, tmp + entry->phar->fname_len, is a destination where bytes should be copied – in our case, it is a location in the tmp array just after the end of the archive filename.
  • The second argument, entry->filename, is a source from where bytes should be copied.
  • The third argument, entry->filename_len, is a number of bytes that should be copied.

Then the zend_get_hash_value function is called to calculate the hashcode.

Notice how the size of the buffer is declared:

char tmp[MAXPATHLEN];

It has a size of MAXPATHLEN, which is a constant defined as the maximum length of a filesystem path on the current platform.

The authors assumed that if they concatenate the filename of the archive with the name of a file inside the archive, they will never exceed the maximum allowed path length. In normal situations, this assumption is met. However, if the attacker prepares an archive with unusually long filenames, a buffer overflow is imminent. The function phar_set_inode will cause an overflow in the tmp array. 

An attacker can use this to crash PHP (causing a Denial of Service) or even make it execute malicious code. The problem is similar to our simple example – the programmer made a simple mistake, trusted the user input too much, and assumed that the data will always fit in a fixed-size buffer. Fortunately, this vulnerability was discovered in 2015 and fixed.

How to Avoid Buffer Overflow

Programmers must avoid buffer overflow attacks by always validating user input length. However, a good general way to avoid buffer overflow vulnerabilities is to stick to using safe functions that include buffer overflow protection (unlike memcpy). Such functions are available on different platforms, for example, strlcpy, strlcat, snprintf (OpenBSD) or strcpy_s, strcat_s, sprintf_s (Windows).

By Piotr Sobolewski

Netsparker

Keep up to date with web security news from Netsparker