New Generation Robots.txt: Apple App-Site-Association

Category: Web Security Readings - Last Updated: Wed, 22 May 2019 - by Umran Yildirimkaya

Apple has developed an iOS version of robots.txt, the file that controls the navigation of the search bots on a website. The file, referred to as Apple-app-site-association (AASA), holds the configurations set by web app developers to allow the iOS users who visit their websites to be redirected to the applications or websites designed for their platform. Apple uses what they call Universal Links to achieve these redirections. In this article, we explain what robots.txt, Universal Links, and AASA are and examine how they are used.

What is robots.txt and What Does it Do?

The robots.txt file implements the robots exclusion standard, which was proposed by developer Martijn Koster in 1994. It was designed to protect servers from the Denial of Service (DoS) conditions that uncontrolled crawling by all sorts of web crawlers could cause. The standard allows website owners to specify which pages bots and crawlers may access. Generally, these bots are used by search engines to categorize and classify websites. The website administrator places a robots.txt file in the root directory of the website and adds the configuration in the format shown in the example below.

Although robots.txt supports several directives, two standard ones are accepted by all search engines. User-agent names the bot that the rules apply to, and Disallow contains the path that the bot is not allowed to visit. Each directory requires its own Disallow directive.

User-agent: Googlebot
Disallow: /example-subdirectory/
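To see how a well-behaved crawler might interpret the rule above, here is a minimal sketch using Python's standard urllib.robotparser module. The example.com URLs and the "SomeOtherBot" name are assumptions made for illustration; real crawlers may apply their own matching logic.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, as a crawler would receive them.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /example-subdirectory/
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs on a placeholder domain.
blocked = parser.can_fetch("Googlebot", "https://example.com/example-subdirectory/page.html")
allowed = parser.can_fetch("Googlebot", "https://example.com/index.html")
# A bot that no rule group names is not restricted by this file.
other_bot = parser.can_fetch("SomeOtherBot", "https://example.com/example-subdirectory/page.html")

print(blocked, allowed, other_bot)
```

The check is purely advisory: the parser only tells a cooperative bot what it should skip, and nothing in it prevents a request from being made.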

The crawling process, also known as 'spidering', is the name given to the action of the search engine bots as they navigate websites to collect, index and serve data to users. The first action bots conduct when they visit a website is to find the robots.txt file and continue the navigation within the website, based on the permissions set in this file. In the absence or misconfiguration of a robots.txt file, the bots navigate and index all directories of the website by default. However, there are two important points:

  • Malicious bots ignore the robots.txt file
  • The robots.txt is publicly accessible (so you should never put classified information in this file)

Since website administrators often neglect to take seriously the public availability of this file, attackers, penetration testers, and CTF competitors are regularly able to use it to discover hidden directories, as well as the staging and admin panels of websites. To help minimise web application security risks, Netsparker can easily detect and report a website’s robots.txt file.
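The following sketch illustrates why this matters: anyone who downloads the file can list its Disallow entries in a few lines of code. The function name and the sample paths (/admin/, /staging/) are hypothetical and chosen only for illustration.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow value, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

# Hypothetical robots.txt content; not taken from a real site.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/   # not linked anywhere
Disallow:
"""

print(disallowed_paths(SAMPLE))  # ['/admin/', '/staging/']
```

Every path an administrator hoped to hide by listing it here is exactly what this function returns.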

The extensive use of mobile devices and the popularity of iOS prompted developers at Apple to develop an efficient way to transfer data between mobile applications and websites. In order for this to be an effective data gathering process, the search bots have to identify the contents of a website using a similar technology to the robots.txt file. This data transfer was initially achieved by Custom URIs, which posed several risks. Later, Universal Links led to the invention of Apple’s Apple-App-Site-Association (AASA).

Role of Custom URIs in Mobile Application Data Transfers

Let’s take a quick look at how data transfers worked up until iOS 9, to help us understand Universal Links and the AASA.

Until iOS 9, if a mobile application wanted to browse the data of another application, it used Custom URIs. For example, if an app wanted to view Wikipedia’s profile page in the Facebook application, it would have to use the Custom URI fb://profile/33138223345 to launch Wikipedia’s Facebook page. Though this data transfer method appears rather simple, it introduced some disadvantages, one of which concerned security. Multiple applications could claim the same Custom URI simply by registering the same URI scheme within the application. Another significant problem was that the Custom URI would work only if there was a corresponding application installed on the mobile device. There was no fallback mechanism if the appropriate application was not installed on the device.

The lack of a fallback also meant that applications were able to 'sniff' the existence of the application listed in the Custom URI before sending out the request to the URI. Based on the example above, an application could tell whether the Facebook app was installed on the user's device before sending out the request to Wikipedia’s Facebook page.
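The two weaknesses described above can be modelled with a toy sketch. This is not how iOS resolves schemes internally; the registry, the "LookalikeApp" name, and the otherapp:// scheme are all invented for illustration. Only the fb://profile/33138223345 URI comes from the example above.

```python
from urllib.parse import urlsplit

# Hypothetical registry mapping a custom scheme to every installed app that
# claims it. "LookalikeApp" is an invented name used only for illustration.
INSTALLED_HANDLERS = {"fb": ["Facebook", "LookalikeApp"]}

def handlers_for(uri: str, registry: dict[str, list[str]]) -> list[str]:
    """Return the apps that claim the URI's scheme (empty if none installed)."""
    scheme = urlsplit(uri).scheme.lower()
    return registry.get(scheme, [])

# A Custom URI is just a scheme plus an app-defined payload.
parts = urlsplit("fb://profile/33138223345")
print(parts.scheme, parts.netloc, parts.path)  # fb profile /33138223345

# Two apps claim the same scheme, so ownership is ambiguous.
ambiguous = handlers_for("fb://profile/33138223345", INSTALLED_HANDLERS)
# No app claims this scheme, and there is nothing to fall back to.
missing = handlers_for("otherapp://item/1", INSTALLED_HANDLERS)

print(ambiguous)  # ['Facebook', 'LookalikeApp']
print(missing)    # []
```

Nothing in the scheme itself ties it to a verified owner, and the empty result doubles as the signal an app could use to 'sniff' whether another app is installed.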

The Security Mechanism Invented by Apple-App-Site-Association (AASA)

Universal Links, a concept Apple introduced in iOS 9, use standard web URLs to eliminate the disadvantages of Custom URIs. Apple uses Universal Links to verify the ownership of a URL through the domain itself. If the mobile application is not installed or is unable to handle the request to the URL, users are redirected to the iOS-designed webpage in Safari or the default browser.

This entire mechanism works by checking whether the website associated with the application hosts a file named apple-app-site-association (AASA) in its root directory or its .well-known subdirectory. This JSON-formatted file must hold a list of paths that the application will handle.

Just like the robots.txt file, the AASA file contains lists of URL paths, in this case the paths that the application may and may not handle. Each domain requires individual configuration, and two directives (appID and paths) must be set. Like robots.txt, the AASA file lives at a fixed, well-known location: it must be served over HTTPS, typically at the path /.well-known/apple-app-site-association, which is a file rather than a directory.

In this example, the AASA file is configured for a mobile application called Jolly. Note that the paths that begin with 'NOT' are the counterparts of the Disallow directives in the robots.txt file.

{
 "activitycontinuation": {
   "apps": [
     "W74U47NE8E.com.apple.store.Jolly"
   ]
 },
 "applinks": {
   "apps": [],
   "details": [
     {
       "appID": "W74U47NE8E.com.apple.store.Jolly",
       "paths": [
         "NOT /shop/buy-iphone/*",
         "NOT /us/shop/buy-iphone/*",
         "/xc/*",
         "/shop/buy-*",
         "/shop/product/*",
         "/shop/bag/shared_bag/*",
         "/shop/order/list",
...
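The way these path rules behave can be approximated with a short sketch. It assumes that rules are evaluated in order and that the first match decides the outcome, and it uses Python's fnmatch as a stand-in for Apple's wildcard matching, which is a simplification. Only the paths visible in the truncated excerpt above are included, and the test URLs are hypothetical.

```python
from fnmatch import fnmatchcase

# The path rules visible in the excerpt above (the original listing is truncated).
PATHS = [
    "NOT /shop/buy-iphone/*",
    "NOT /us/shop/buy-iphone/*",
    "/xc/*",
    "/shop/buy-*",
    "/shop/product/*",
    "/shop/bag/shared_bag/*",
    "/shop/order/list",
]

def app_handles(url_path: str, rules: list[str]) -> bool:
    """Return True if the first matching rule is not a 'NOT' rule.

    Assumption: fnmatchcase approximates the wildcard syntax. It also treats
    square brackets as character classes, which may not match Apple's rules.
    """
    for rule in rules:
        excluded = rule.startswith("NOT ")
        pattern = rule[4:] if excluded else rule
        if fnmatchcase(url_path, pattern):
            return not excluded
    return False  # no rule matched, so the link stays in the browser

print(app_handles("/shop/buy-iphone/iphone-xr", PATHS))  # False
print(app_handles("/shop/buy-mac/macbook-air", PATHS))   # True
print(app_handles("/support", PATHS))                    # False
```

Order matters in this model: the 'NOT' rules come first so that they win over the broader /shop/buy-* pattern that follows them.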

The Future and Security of AASA

Just as hackers can exploit the carelessness of website administrators by using the publicly available robots.txt file to discover hidden directories, equally sensitive information could be found in the similar apple-app-site-association file. And, though awareness has increased around the robots.txt file, the Apple equivalent is relatively new and unknown, meaning that admins may inadvertently misconfigure the file they publish at the .well-known/apple-app-site-association path.
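A simple review step along these lines could flag paths that look sensitive before the file is published. The sketch below is only a heuristic: the keyword list, the sample AASA content, and the appID are all invented for illustration, and it handles only the "details" list layout shown in the example earlier.

```python
import json

# Hypothetical keywords that might hint at non-public areas of a site.
SUSPICIOUS = ("admin", "staging", "internal", "debug")

def flag_sensitive_paths(aasa_text: str, keywords=SUSPICIOUS) -> list[str]:
    """Return the applinks path rules that contain any of the keywords."""
    doc = json.loads(aasa_text)
    flagged = []
    for detail in doc.get("applinks", {}).get("details", []):
        for rule in detail.get("paths", []):
            # A 'NOT' rule still reveals the path it excludes.
            path = rule[4:] if rule.startswith("NOT ") else rule
            if any(word in path.lower() for word in keywords):
                flagged.append(rule)
    return flagged

# Invented AASA content; the appID and paths are placeholders.
SAMPLE_AASA = """
{
  "applinks": {
    "apps": [],
    "details": [
      {
        "appID": "ABCDE12345.com.example.app",
        "paths": ["/shop/*", "NOT /admin-panel/*", "/staging/checkout/*"]
      }
    ]
  }
}
"""

print(flag_sensitive_paths(SAMPLE_AASA))
# ['NOT /admin-panel/*', '/staging/checkout/*']
```

Note that excluding a path with 'NOT' does not hide it; the rule itself discloses that the path exists.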

Administrators and developers should be aware that publicly available files should only contain information that’s strictly necessary. If confidential information is included in such files, bots can index the information in search engines, making hidden sources of information freely available to anyone who knows where to look. It’s important to note that this file is likely to be present on most websites, since almost every well-known website has an iOS application or is configured so that it can be visited from an iOS device. Therefore, it’s safe to say that the AASA is the future of the robots.txt file.

Further Information

Read more about the robots exclusion standard proposed by developer Martijn Koster in A Standard for Robot Exclusion.

We also suggest that you read Apple’s documentation, Support Universal Links, to help you implement a proper and efficient utilization of this new generation robots.txt file.

