DirBuster and dirb are in the toolset of every web application security fan. Both tools are excellent (although I prefer dirb, since it is command line and not Java), but their results obviously depend on how good your wordlist is. I often find myself editing the wordlist file to add directories/files relevant to the site I'm running DirBuster against. So one day I thought: hey, why not automate this as much as I can?
Here comes dict_populator.pl.
The tool will crawl the site given as input, discover its URLs, and output the following:
- A file with all discovered URLs (site.name-urls.txt)
- dict.txt (any directory/file names found while crawling that are not already in the dictionary are appended to this file)
- ext.txt (any extensions found while crawling that are not already in the file are appended)
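The core idea is simple: pull every link out of each page, split the paths into directory/file names and extensions, and append whatever isn't already known. Here is a rough sketch of that logic in Python (the tool itself is Perl, built on LWP::UserAgent and HTML::LinkExtor); the function and class names below are illustrative, not the script's actual internals.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import os


class LinkExtor(HTMLParser):
    """Collect href/src attribute values, roughly like HTML::LinkExtor."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def names_and_exts(html, base_url):
    """Return (path components, extensions) seen in a page's links."""
    parser = LinkExtor()
    parser.feed(html)
    names, exts = set(), set()
    for link in parser.links:
        # Resolve relative links against the page URL, keep only the path.
        path = urlparse(urljoin(base_url, link)).path
        for part in path.split("/"):
            if not part:
                continue
            names.add(part)
            _, ext = os.path.splitext(part)
            if ext:
                exts.add(ext.lstrip("."))
    return names, exts
```

Anything in these sets that is missing from dict.txt or ext.txt would then be appended to the respective file.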
The script also supports cookies, for better crawling of sites that require authentication. The cookies can be set in the first few lines of the script (there is no command-line switch yet).
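In spirit, that amounts to attaching a fixed cookie header to every request the crawler makes. A minimal sketch, again in Python rather than the script's Perl (the cookie values below are made up):

```python
import urllib.request

# Hypothetical session cookies for an authenticated crawl; in the real
# script you would paste yours into the HTTP::Cookies setup near the top.
COOKIES = "PHPSESSID=abc123; auth=letmein"

req = urllib.request.Request("http://example.com/members/")
req.add_header("Cookie", COOKIES)
# urlopen(req) would now send this Cookie header with the request.
```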
As a test, I used the common.txt wordlist that ships with dirb to scan one of my sites. The scan did not include any extensions and discovered 25 directories. I then ran dict_populator against the site, letting it append the discovered names to that wordlist, and ran dirb again. This time 34 directories were discovered.
Requires: LWP::UserAgent, HTTP::Cookies and HTML::LinkExtor.
Updated to version 0.2, which fixes a bug that caused multiple image URLs to appear in the output.