How to Create Better Wordlists for Forced Browsing with DirBuster

Forced browsing, or finding hidden resources, is one of the crucial parts of any black-box web application security assessment. There are great tools for this task, but our favorite is DirBuster: simple, fast and smart.

DirBuster ships with several wordlists. These were generated by one big crawler that visited tons of websites, collected links and compiled the most common directory and file names on the Internet. This is a really nice approach, and DirBuster’s wordlists worked much better than any other wordlists out there.

However, there is one fundamental problem with these wordlists. While their purpose is to find hidden, unlinked resources, ironically they are generated only from known, linked resources. To address this problem, we came up with the idea of generating wordlists from open source code repositories. A repository exposes every file and directory name in a project, linked or not, which makes it possible to create much more useful wordlists.

We have extracted the directory structure and file names of many projects from Google Code and SourceForge to prepare a good wordlist for discovering hidden files/folders on a targeted web application.

Numbers

We have processed over 5000 projects.
We have more than 400k words in our database.

We have sorted the words by their frequency count and prepared several lists based on this data.
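The frequency ordering can be illustrated with a minimal sketch. The function name `rank_by_frequency` and the sample names are hypothetical, and the alphabetical tie-break is an assumption rather than a description of how the published lists were built.

```python
from collections import Counter


def rank_by_frequency(names):
    """Return the unique names ordered from most to least frequent."""
    counts = Counter(names)
    # Sort by descending count, then alphabetically so ties are deterministic
    # (the tie-break rule is an assumption made for this sketch).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ordered]


# Hypothetical names, as they might be collected from several repositories.
sample = ["index.php", "admin", "index.php", "config.php", "admin", "index.php"]
ranked = rank_by_frequency(sample)
print(ranked)
```

Putting the most frequent names first means a scan that is cut short has still tried the most likely candidates.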

How did we generate the wordlists?

First we needed to find lots of public SVN/CVS repositories. So far we have only used Google Code and SourceForge. We ran filtered searches such as “Only PHP” or “Only ASP” projects, then scraped the results with FSF (Freakin’ Simple Fuzzer); it was a one-liner.
Once we had the list of open source projects, we wrote a couple of simple batch files to fetch the file listing of each project via SVN and CVS clients.
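As a rough illustration of what can be done with such listings, the sketch below splits recursive listing output into directory names and file names. It assumes one path per line with directories ending in `/`, which is the shape `svn list -R` prints; the function name `extract_names` and the sample paths are hypothetical, and this is not the batch files used for the original lists.

```python
from collections import Counter


def extract_names(listing_lines):
    """Count directory names and file names found in recursive listing output."""
    dirs, files = Counter(), Counter()
    for line in listing_lines:
        path = line.strip()
        if not path:
            continue
        # Assumption: directories are listed on their own line with a trailing "/".
        is_dir = path.endswith("/")
        parts = [part for part in path.split("/") if part]
        if not parts:
            continue
        # Only the last path component is counted, so each entry counts once.
        if is_dir:
            dirs[parts[-1]] += 1
        else:
            files[parts[-1]] += 1
    return dirs, files


# Hypothetical listing for a single project.
listing = [
    "trunk/",
    "trunk/admin/",
    "trunk/admin/index.php",
    "trunk/admin/login.php",
    "trunk/index.php",
    "trunk/README",
]
dirs, files = extract_names(listing)
print(dirs.most_common())
print(files.most_common())
```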
When that had finished, we coded a small client to analyse all the repository outputs and load them into an SQL Server database. Later on, we applied many filters with yet another small script and generated all these different wordlists for use in different scenarios.
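The load-and-filter step might look roughly like the following. This sketch swaps in Python's built-in `sqlite3` module for SQL Server so that it is self-contained; the table layout, the function names `load_names` and `wordlist_for_extension`, and the sample data are all hypothetical.

```python
import sqlite3


def load_names(conn, names):
    """Store each name once and count how often it was seen."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS names ("
        "name TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    for name in names:
        # Create the row on first sight, then bump its counter.
        conn.execute("INSERT OR IGNORE INTO names (name, hits) VALUES (?, 0)", (name,))
        conn.execute("UPDATE names SET hits = hits + 1 WHERE name = ?", (name,))
    conn.commit()


def wordlist_for_extension(conn, ext):
    """Return names ending in .<ext>, most frequent first."""
    # Assumption: ext is a plain extension without the LIKE wildcards % or _.
    rows = conn.execute(
        "SELECT name FROM names WHERE name LIKE ? ORDER BY hits DESC, name",
        ("%." + ext,),
    )
    return [row[0] for row in rows]


# Hypothetical file names gathered from several repositories.
seen = ["index.php", "login.php", "index.php", "web.config",
        "index.php", "login.php", "setup.php"]
conn = sqlite3.connect(":memory:")
load_names(conn, seen)
php_list = wordlist_for_extension(conn, "php")
print(php_list)
```

A per-extension query of this kind corresponds to the language-specific lists, while other filters would produce the directory-only and extensionless lists.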

Download

Download Wordlists (GPL) – SVNDigger.zip (~550KB)

all.txt
all-dirs.txt
all-extensionless.txt
context\admin.txt
context\debug.txt
context\error.txt
context\help.txt
context\index.txt
context\install.txt
context\log.txt
context\readme.txt
context\root.txt
context\setup.txt
context\test.txt
cat\Conf\conf.txt
cat\Conf\config.txt
cat\Conf\htaccess.txt
cat\Conf\properties.txt
cat\Database\inc.txt
cat\Database\ini.txt
cat\Database\mdb.txt
cat\Database\mdf.txt
cat\Database\sql.txt
cat\Database\xml.txt
cat\Language\ascx.txt
cat\Language\asp.txt
cat\Language\aspx.txt
cat\Language\c.txt
cat\Language\cfm.txt
cat\Language\cpp.txt
cat\Language\cs.txt
cat\Language\css.txt
cat\Language\html.txt
cat\Language\jar.txt
cat\Language\java.txt
cat\Language\js.txt
cat\Language\jsp.txt
cat\Language\jspf.txt
cat\Language\php.txt
cat\Language\php3.txt
cat\Language\php5.txt
cat\Language\phpt.txt
cat\Language\pl.txt
cat\Language\py.txt
cat\Language\rb.txt
cat\Language\sh.txt
cat\Language\swf.txt
cat\Language\tpl.txt
cat\Language\vb.txt
cat\Language\wsdl.txt
cat\Project\csproj.txt
cat\Project\pdb.txt
cat\Project\resx.txt
cat\Project\sln.txt
cat\Project\suo.txt
cat\Project\vbproj.txt
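To show how one of these lists feeds a forced-browsing run, here is a minimal sketch that turns wordlist entries into candidate URLs. It assumes one entry per line; the function name `candidate_urls`, the base URL and the sample words are hypothetical, and a tool such as DirBuster handles this step (plus the actual requests) on its own.

```python
from urllib.parse import quote


def candidate_urls(base_url, words):
    """Yield one candidate URL per non-empty wordlist entry."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    for word in words:
        word = word.strip()
        if not word:
            continue
        # Percent-encode characters that are not safe in a URL path.
        yield base + quote(word.lstrip("/"))


# Hypothetical target and entries; in practice the words would be read
# from a file such as all.txt.
words = ["admin", "", "index.php", "old backup"]
urls = list(candidate_urls("http://example.test/app", words))
print(urls)
```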

It’s licensed under the GPL; feel free to share it and use it in your own GPL-compatible application.
