Input validation is the first step of checking the type and content of data supplied by a user or application. For web applications, input validation means verifying user inputs provided in web forms, query parameters, uploads, and so on. Improper input validation is a major factor in many web security vulnerabilities, including cross-site scripting (XSS) and SQL injection. Let’s see why proper data validation is so important for application security.
What Is Input Validation?
Any system or application that processes input data needs to ensure that it is valid. This applies both to information provided directly by the user and data received from other systems. Validation can be done on many levels, from simply checking the input types and lengths (syntactic validation) to ensuring that supplied values are valid in the application context (semantic validation).
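To make the two levels concrete, here is a minimal Python sketch, assuming a hypothetical birth date field submitted as a YYYY-MM-DD string. The function name `validate_birth_date` and the rule that a birth date cannot lie in the future are illustrative assumptions, not part of any specific framework.

```python
import re
from datetime import date

# Assumed wire format for this example: YYYY-MM-DD, ASCII digits only.
DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def validate_birth_date(raw: str, today: date) -> date:
    """Return the parsed date, or raise ValueError if validation fails."""
    # Syntactic validation: check the type and the overall shape of the value.
    if not isinstance(raw, str) or DATE_SHAPE.fullmatch(raw) is None:
        raise ValueError("expected a date in YYYY-MM-DD format")

    # Still syntactic: fromisoformat raises ValueError for impossible
    # calendar dates such as 2023-02-30.
    parsed = date.fromisoformat(raw)

    # Semantic validation: the value must make sense in the application
    # context (hypothetical rule: nobody is born in the future).
    if parsed > today:
        raise ValueError("birth date cannot be in the future")
    return parsed
```

Passing `today` as a parameter rather than calling `date.today()` inside the function keeps the semantic rule easy to test.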
In web applications, input validation typically means checking the values of web form input fields to ensure that a date field contains a valid date, an email field contains a valid email address, and so on. This initial client-side validation is performed directly in the browser, but submitted values also need to be checked on the server side.
NOTE: While we usually talk about “user input” or “user-controlled input”, it is good practice to check all inputs to an application and treat them as untrusted until validated. This applies even when data comes from sources that should be trusted, because a business partner or regulator may send dangerous data if their own systems have been compromised.
The Consequences of Improper Input Validation
When reading about web vulnerabilities on this blog, you may have noticed that many of the articles have a very similar ending: “to mitigate this vulnerability, make sure you carefully validate all user inputs.” By preventing malicious users from freely entering attack strings, you can reduce your exposure to many input-driven attacks, from cross-site scripting and SQL injection to buffer overflows and XML external entity attacks (XXE injection). If you look at the definition of CWE-20: Improper Input Validation, you will notice that this weakness can precede many others and lead to all sorts of problems.
While input validation alone is not enough to prevent all attacks, it can reduce the attack surface and minimize the impact of attacks that do succeed. Another reason for data validation is to make sure that applications work correctly and provide maximum benefit to the user. Without the right data in the right place, an application might return incorrect results or even crash. Missing or insufficient input validation can also degrade the user experience on other levels. For example, if a registration page fails to detect an incorrect email or phone number, the user may be unable to confirm the account. If invalid data passes validation in the browser and is only caught during server-side validation, the user may need to wait longer to get a response from the page.
How to Ensure Proper Input Validation in Web Applications
HTML5 Validation Features
The HTML5 spec includes built-in form validation features that let you specify validation constraints directly in HTML. These include input field attributes such as required to indicate a required field, type to specify the data type, maxlength to define a length limit, or pattern to specify a regex pattern for valid values. The spec also defines CSS pseudo-classes such as :invalid that allow you to apply different styles depending on the validation result.
Built-in form validation features in HTML5 are a great place to get started with data validation. With just a few extra attributes in standard HTML elements, you get basic data type and content validation with cross-platform support to save you a lot of work and provide a native user experience. For detailed examples, see the MDN article on client-side form validation.
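Because browser-side constraints can be bypassed, the same rules need to be checked again on the server. The following Python sketch mirrors the required, maxlength, and pattern constraints for a single field. The `FieldRule` class, the `check_field` helper, and the username rule are hypothetical names invented for this illustration, and the length check counts Python characters, which may differ from how a browser counts for some Unicode input.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical server-side mirror of a few HTML5 constraints."""
    required: bool = False
    maxlength: Optional[int] = None
    pattern: Optional[str] = None  # must match the whole value


def check_field(value: Optional[str], rule: FieldRule) -> List[str]:
    """Return a list of error messages (empty if the value passes)."""
    errors: List[str] = []
    if value is None or value == "":
        if rule.required:
            errors.append("value is required")
        # An empty optional field passes; browsers likewise skip the
        # pattern check on empty values.
        return errors
    if rule.maxlength is not None and len(value) > rule.maxlength:
        errors.append("value is too long")
    # fullmatch anchors the regex at both ends of the value.
    if rule.pattern is not None and re.fullmatch(rule.pattern, value) is None:
        errors.append("value does not match the expected pattern")
    return errors


# Assumed rule for illustration: 3 to 20 lowercase letters, digits, or "_".
USERNAME_RULE = FieldRule(required=True, maxlength=20, pattern=r"[a-z0-9_]{3,20}")
```

Keeping the rule in one data structure makes it easier to generate the matching HTML attributes from the same definition, so the client-side and server-side checks are less likely to drift apart.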
Blacklisting vs Whitelisting
A blacklist tries to block known-bad values and lets everything else through, while a whitelist permits only known-good values. For well-defined inputs such as numbers, dates, or postcodes, it’s much easier and safer to use a whitelist. That way, you can clearly specify permitted values and reject everything else. With HTML5 form validation, you get predefined whitelisting logic in the built-in data type definitions, so if you indicate that a field contains an email address, you get email validation out of the box. If only a handful of values are expected, you can use regular expressions to explicitly whitelist them.
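As a rough illustration of whitelisting with regular expressions, the Python sketch below accepts only two sort-order keywords and only US-style ZIP codes. Both the field names and the choice of ZIP code format are assumptions made for the example.

```python
import re

# Whitelist for a field where only a handful of values are expected.
SORT_ORDER = re.compile(r"asc|desc")

# Assumed postcode format for this example: five-digit US ZIP code,
# optionally followed by a four-digit extension (ZIP+4).
ZIP_CODE = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")


def is_allowed_sort_order(value: str) -> bool:
    """True only if the whole value is exactly 'asc' or 'desc'."""
    return SORT_ORDER.fullmatch(value) is not None


def is_valid_zip(value: str) -> bool:
    """True only if the whole value is a ZIP or ZIP+4 code."""
    return ZIP_CODE.fullmatch(value) is not None
```

The sketch uses `fullmatch` rather than `match` with a trailing `$`, because in Python `$` also matches just before a final newline, so a value such as "12345\n" would otherwise slip through.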
Whitelisting gets tricky with free-form text fields, where you need some way to allow the vast majority of available characters, potentially in many different alphabets. Unicode character categories can be useful to allow, for example, only letters and numbers in a variety of international scripts. You should also apply Unicode normalization so that equivalent characters are always represented the same way, and reject input that contains invalid characters.
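One possible way to combine normalization with category-based whitelisting is sketched below using Python's standard `unicodedata` module. The function name `normalize_and_check`, the choice of NFC as the normalization form, and the decision to also allow the plain space character are assumptions for this example.

```python
import unicodedata

# Unicode general categories starting with "L" are letters and those
# starting with "N" are numbers.
ALLOWED_MAJOR_CATEGORIES = ("L", "N")


def normalize_and_check(raw: str) -> str:
    """Normalize to NFC, then allow only letters, numbers, and spaces.

    Returns the normalized text or raises ValueError.
    """
    # NFC composes sequences such as "e" + combining diaeresis into the
    # single character "ë", so equivalent inputs compare equal.
    text = unicodedata.normalize("NFC", raw)
    for ch in text:
        if ch == " ":  # assumption: a plain space is acceptable
            continue
        if unicodedata.category(ch)[0] not in ALLOWED_MAJOR_CATEGORIES:
            raise ValueError("character not allowed: U+%04X" % ord(ch))
    return text
```

Note that some scripts rely on combining marks (categories starting with "M") that have no precomposed form, so a real whitelist for international text would likely need to allow those as well.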
Input Validation Against XSS
The problems with validating free-form text highlight the final point. Despite its importance for web application security, input validation is not and never should be the primary defense against cross-site scripting. The main defense against cross-site scripting is context-aware output encoding. If the application user needs to enter <script> in a text field (maybe because they are writing an article on input validation), the application should encode these characters in a suitable way and ensure that they are processed correctly and safely. Simply filtering inputs is not enough to prevent cross-site scripting, which is why XSS filters have been removed from modern web browsers.
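A minimal sketch of what context-aware encoding can look like, using only Python's standard library: the same user-supplied string is encoded differently depending on whether it ends up in HTML text or in a URL component. The two helper names are hypothetical, and real applications would normally rely on the auto-escaping of their template engine rather than calling these functions by hand.

```python
import html
from urllib.parse import quote


def encode_for_html_text(value: str) -> str:
    """Encode a value for insertion into HTML text or a quoted attribute."""
    # Replaces &, <, > and (with quote=True) both quote characters
    # with HTML character references.
    return html.escape(value, quote=True)


def encode_for_url_component(value: str) -> str:
    """Percent-encode a value for use as a single URL component."""
    # safe="" means even "/" is encoded, so the value cannot alter
    # the structure of the URL.
    return quote(value, safe="")


user_input = "<script>"
print(encode_for_html_text(user_input))      # &lt;script&gt;
print(encode_for_url_component(user_input))  # %3Cscript%3E
```

The input is stored and processed unchanged; only its representation is adjusted at the point of output, which is why the user can still write about <script> tags without triggering script execution.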
For a detailed discussion of input validation in web applications, see the OWASP Input Validation Cheat Sheet.