What Are Format String Vulnerabilities?

Category: Web Security Readings - Last Updated: Thu, 07 May 2020 - by Zbigniew Banach

Format strings are used in many programming languages to insert values into a text string. In some cases, this mechanism can be abused to perform buffer overflow attacks, extract information or execute arbitrary code. Let’s take a closer look at format string vulnerabilities and see why they exist.


In October, Netsparker security researcher Sven Morgenroth talked to Paul Asadoorian on Paul’s Security Weekly #625, discussing his technical article about string concatenation and format string attacks. Read on for a concise overview of format string vulnerabilities.

How It All Began: Format Strings in C

One of the first functions encountered when learning the C programming language is printf(), or “print formatted”. At its most basic, printf() can be used to simply send an ASCII string to standard output (stdout), but its real strength lies in the use of formatting parameters. To insert values into an output string, you use format specifiers as placeholders in the string and pass the values as additional parameters to the printf function, for example:

#include <stdio.h>
int main(void) {
    const char *dir_name = "Work";
    int no_of_files = 42;
    printf("Directory %s contains %d files\n", dir_name, no_of_files);
}

In this example, %s is a string format specifier and is replaced by the value of the first variable (dir_name). %d denotes a signed decimal integer and is replaced by the value of the second variable (no_of_files). So in this case, printf() prints:

Directory Work contains 42 files

There are many different format specifiers corresponding to various data types and formatting options, such as %f for decimal floating point or %u for unsigned decimal integers. There is also the %n specifier, which works in the opposite direction: instead of reading a value to print, it writes to memory. Its argument must be a pointer to an int, and printf() stores there the number of characters written so far by the current call.
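To make %n concrete, here is a minimal sketch using a hypothetical helper, chars_before_marker(). It formats into a local buffer with snprintf() and a literal format string, passing the address of a local int for %n to fill in. Be aware that some C runtimes restrict %n; Microsoft's CRT, for example, disables it by default.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many characters snprintf() had produced
   when it reached the %n specifier. %n consumes a pointer-to-int argument
   and stores that count through it instead of printing anything. */
int chars_before_marker(const char *dir_name)
{
    char buf[64];
    int count = -1;

    snprintf(buf, sizeof buf, "Directory %s%n contains files", dir_name, &count);
    return count;
}
```

Called with "Work", it returns 14: ten characters for "Directory " plus four for "Work".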

Format functions are extremely useful for generating readable output and can save programmers a lot of work by performing automatic type conversions. However, if used incorrectly, printf() format strings can be vulnerable to a variety of attacks. In fact, printf() is just one member of a whole family of format functions that also includes fprintf(), sprintf(), snprintf(), vprintf(), vfprintf(), vsprintf(), vsnprintf() and many others, all of which are open to the same abuse when an attacker controls the format string. Let’s see why this is possible.
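The v-prefixed members of the family take a va_list instead of a variable argument list, which is how programs typically build their own printf-style helpers. A minimal sketch, assuming a hypothetical log_line() wrapper and a hypothetical PRINTF_LIKE macro: the wrapper forwards its arguments to vsnprintf(), so it inherits exactly the same risk if a caller passes untrusted data as fmt. On GCC and Clang, the format attribute lets the compiler apply its format string checks to calls of the wrapper as well.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical portability macro: on GCC and Clang, the format attribute
   marks argument fmt_idx as a printf-style format string whose values start
   at argument first_arg, so -Wformat also checks calls to the wrapper. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

int log_line(char *buf, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

/* Hypothetical wrapper: collects its variadic arguments into a va_list and
   hands them to vsnprintf(). Whoever controls fmt controls how those
   arguments are interpreted, exactly as with printf() itself. */
int log_line(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    int written = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return written;
}
```

For example, log_line(buf, sizeof buf, "[%s] %d files", "Work", 42) returns 15 and leaves "[Work] 42 files" in buf.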

Exploiting Format String Vulnerabilities in C

Format string functions in C accept a variable number of arguments (one or more), so you can also use them to print Hello World without any format specifiers:

char* user_input = "Hello World";
printf(user_input);

As long as user_input is guaranteed to contain no format specifiers, this is fine. But if that value is controlled by the user, an attacker can exploit format string syntax to trigger a variety of dangerous behaviors. Let’s start with a string that contains no text but lots of format specifiers, in this case %x for hexadecimal values:

printf("%x%x%x%x");

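For each format specifier it encounters in the format string, printf() expects to find a matching argument, but it has no way of checking how many arguments were actually passed. It simply fetches the next value from wherever the calling convention says the next argument should be. In the classic 32-bit x86 convention, arguments are passed on the stack, so for the first %x, printf() reads the stack slot just after the format string pointer, even though the caller put nothing there. (On x86-64, the first few arguments travel in registers, so the first values printed come from leftover register contents before printf() moves on to the stack.) This is repeated for all four %x specifiers, so the example above prints the hex representation of four values the program never meant to disclose. Depending on the program and the execution context, these could include function return addresses, variable values, pointer memory addresses, function parameters, or even user-supplied data.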
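Why can't printf() notice that the arguments are missing? A variadic C function only learns how many arguments to fetch from its own logic. The toy formatter below is a sketch for illustration only: mini_format() is a hypothetical name, it supports just %s, %d, %x and %%, and it is not how any real C library implements printf(). It pulls exactly one value with va_arg() per specifier it sees, and nothing in the language tells it whether the caller actually supplied that value. That is why calling it, or printf(), with "%x%x%x%x" and no arguments is undefined behavior that in practice reads whatever happens to sit where the arguments would have been.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy formatter supporting only %s, %d, %x and %%.
   The format string alone decides how many va_arg() calls are made. */
void mini_format(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t pos = 0;

    if (size == 0)
        return;
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && pos + 1 < size; p++) {
        if (*p != '%') {
            out[pos++] = *p;          /* ordinary character: copy it */
            continue;
        }
        p++;                          /* step to the conversion character */
        if (*p == '\0')
            break;                    /* stray '%' at the end of fmt */

        char tmp[32] = "";
        const char *piece = tmp;
        switch (*p) {
        case 's': piece = va_arg(ap, char *); break;  /* trusts a pointer was passed */
        case 'd': snprintf(tmp, sizeof tmp, "%d", va_arg(ap, int)); break;
        case 'x': snprintf(tmp, sizeof tmp, "%x", va_arg(ap, unsigned int)); break;
        case '%': tmp[0] = '%'; tmp[1] = '\0'; break;
        default:  break;              /* unsupported specifier: emit nothing */
        }
        while (*piece != '\0' && pos + 1 < size)
            out[pos++] = *piece++;
    }
    va_end(ap);
    out[pos] = '\0';
}
```

With matching arguments, mini_format(buf, sizeof buf, "Directory %s contains %d files", "Work", 42) produces the same text as the printf() example above; with missing arguments it would misbehave in the same way printf() does.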

Attackers can go further. When the format string is stored in a buffer on the stack, as is common, an attacker can embed a chosen address in the format string, use enough specifiers to advance printf() to it, and then use %s to read memory at that address. This is already a critical buffer overread vulnerability that can be used to extract information and prepare other attacks, but it gets worse. Remember %n, the only specifier that writes to memory? Combined with field widths, which control how many characters have been printed so far and therefore the value %n stores, it lets attackers write chosen values to chosen memory locations, for example to redirect a function pointer or return address to shellcode, or simply to cause a segmentation fault and crash the application in a denial of service attack.
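The reason attackers pair %n with field widths is that the width lets them choose the number %n writes. The harmless sketch below (width_to_count() is a hypothetical name) uses a literal format string and writes only to a local int. It shows that the value %n stores simply tracks the padded output length, which is the lever an attacker uses to pick the value written to a target address.

```c
#include <stdio.h>

/* Hypothetical demonstration: "%*d" prints 0 right-aligned in a field of
   `width` characters (the * takes the width from the argument list), and
   %n then stores the number of characters produced so far. */
int width_to_count(int width)
{
    char buf[256];
    int count = -1;

    if (width < 1 || width > 200)
        return -1;                /* keep the whole output inside buf */
    snprintf(buf, sizeof buf, "%*d%n", width, 0, &count);
    return count;
}
```

For instance, width_to_count(100) returns 100: 99 spaces of padding plus the digit 0.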

For a detailed technical discussion of format string exploits in C, see the paper Exploiting Format String Vulnerabilities.

How to Avoid These Vulnerabilities

We have seen that careless use of core format string functions in C can open the way to various attacks, up to and including arbitrary code execution. As is so often the case in application security, the best way to eliminate these vulnerabilities is to properly validate user input or, better still, never let user-controlled data become the format string. User input should only ever reach a format function as an argument for a specifier such as %s. You should also never call printf() or its related format functions with a variable as the only argument, even if that variable points to a harmless string literal:

char* greeting = "Hello";
printf(greeting); // This is insecure
printf("%s", greeting); // This is secure

That way, even if the string contains unexpected format specifiers, they are not interpreted but simply printed as regular characters. Source code scanners can be used to make sure that the number of arguments passed to a format function matches the number of format specifiers in the format string. Compilers can check this too: in gcc, -Wformat (enabled by -Wall) checks calls that use literal format strings for mismatched arguments, and adding -Wformat-security also warns about calls like printf(greeting) that pass a non-literal format string with no arguments.
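Both ideas can be illustrated with a short sketch; the function names copy_user_text() and count_conversions() are hypothetical. The first passes attacker-style input as the argument of a constant "%s" format, so snprintf() copies any specifiers verbatim. The second shows, in deliberately simplified form, the kind of counting a format checker starts from: it skips the %% escape and counts every other %. It ignores details a real checker must handle, such as * widths that consume extra arguments, and is no substitute for compiler checks like -Wformat.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies untrusted text into out. The format string is
   the constant "%s", so any '%' characters in user_text are plain data and
   are copied verbatim, never interpreted as conversion specifiers. */
int copy_user_text(char *out, size_t size, const char *user_text)
{
    return snprintf(out, size, "%s", user_text);
}

/* Hypothetical, simplified counter: returns how many conversion specifiers
   a printf-style format string contains, treating "%%" as a literal percent
   sign. It ignores '*' widths and precisions, which consume extra arguments. */
size_t count_conversions(const char *fmt)
{
    size_t count = 0;

    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {        /* "%%" prints '%' and consumes nothing */
            p++;
            continue;
        }
        if (p[1] == '\0')         /* stray '%' at the very end */
            break;
        count++;                  /* start of a conversion specifier */
    }
    return count;
}
```

For example, count_conversions("%x%x%x%x") returns 4, while a call such as printf(user_input) supplies no arguments after the format string at all, which is exactly the mismatch such checks look for.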

Format String Vulnerabilities in Web Applications

Web applications are generally written in higher-level languages than C, so are format string vulnerabilities at all relevant to web application security? Many back-end applications, such as web servers, are written in C/C++, so it is certainly possible for user inputs from a web application to make it through to a vulnerable C program, even if just to crash the web server. But what about typical web application languages, like PHP, Python or JavaScript? In his technical article, Sven takes a detailed look at printf() and similar functions in a variety of languages. His conclusion is that of the popular web application languages, Python provides the greatest potential for abuse, so let’s quickly see why.

Format String Vulnerabilities in Python

Python 2.6 introduced a new string formatting method (specified in PEP 3101), and Python 2.7 added automatic field numbering with empty {} placeholders. These features provide far greater capabilities than the older %-style formatting operator but can also open up interesting attack vectors.

Every Python string has a format() method. A format string that replicates the first example given for C might be:

print("Directory {} contains {} files".format("Work", 42))

This simply replaces each {} placeholder with the corresponding argument to the format() method. 

However, format() can also take an object and access its attributes to complete the format string. This is convenient but can have unexpected consequences. To illustrate this, let’s define the DirData class to use as the information object. Let’s also say there is a confidential value stored in a global variable in the same module:

SECRET_VALUE = "passwd123"

class DirData:
    def __init__(self):
        self.name = "Work"
        self.noOfFiles = 42

print("Directory {dirInfo.name} contains {dirInfo.noOfFiles} files".
    format(dirInfo=DirData()))

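So far, this is just another way to get the same output. But if an attacker can control the format string, this flexibility becomes dangerous, because replacement fields can follow attributes and index into dictionaries. Python objects expose many internal attributes: the __init__ method of DirData, for instance, has a __globals__ attribute that refers to the dictionary of global variables of the module where the class is defined. By chaining these lookups, it is possible to get at the secret value: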

print("The secret is {dirInfo.__init__.__globals__[SECRET_VALUE]}".
    format(dirInfo=DirData()))

And the result?

The secret is passwd123

Again, the surest general way of eliminating such vulnerabilities is to avoid including unvalidated user inputs in format strings wherever possible. And as with so many other vulnerabilities, always sanitize external application inputs before using them.
