Constructing Custom Word Lists and Automating Dictionary Attacks on Password Protected Zip Files in Linux

Method: Building the Wordlists

Once the disks had been imaged and pre-processed in X-Ways Forensics, the zip files and other artefacts were extracted. This was conducted on forensic workstations running Windows; however, the majority of the work in this case was completed in Linux, which makes manipulating text files easy.

pagefile.sys and hiberfil.sys

Windows uses the pagefile.sys file as virtual memory when it has no random access memory (RAM) left to utilise. When a Windows computer enters hibernate mode, it writes the entire contents of memory to the hard drive as the hiberfil.sys file before powering down. Both files therefore contain snapshots of the computer’s memory at certain times. Passwords can be stored in RAM unencrypted. Unfortunately, no RAM dump from any of the computers was obtained prior to submission.

strings is a Linux tool which extracts sequences of printable characters from files. Its output formed part of the dictionary. The following command was run on a Linux system to extract all printable strings of 5 or more characters from the two files and write them to a text file.

strings -n 5 pagefile.sys hiberfil.sys > pf_hf.txt

Firefox Password Manager

Firefox was the primary browser used by the suspect. He also used the Tor Browser, which is based on Firefox but modified so that passwords cannot be stored. Using virtualisation software, his disk image was booted as a guest virtual machine and Firefox was started. Within Firefox on the guest machine, Settings > Privacy & Security > Saved Logins… reveals all currently saved passwords, of which there were over 40 unique ones. At least one of these matched the password of at least one of the zip files. All these saved passwords were exported to a text file to form part of the dictionary.

Internet History and Interests

Whilst the suspect spoke several languages, much of his interest was in Japanese culture and, from examining his computer, most of his translation work was in Japanese. Many of the passwords seen so far were also Japanese words written in Latin characters.

html2text is a Linux utility which reads HTML documents from the input files, formats each of them into a stream of plain text characters, and writes the result to standard output. Whilst it performs a similar function to strings, it also removes HTML tags, which would otherwise be extracted as printable characters. A bookmarked website and a large sample of the suspect’s recovered internet history were visited and saved.

html2text ./*htm* > webpage.txt

This output was unsatisfactory for word list generation, however: although tags were removed, each line still contained multiple words. It would be easy to split the words by white space in bash, but doing so is laborious for multiple web pages, each of which must be saved first. This could be automated, but a solution already exists.
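For a single saved page, the whitespace split mentioned above could look roughly like the following sketch. The function name words_per_line and the minimum length of 4 are illustrative choices, not part of the original workflow.

```shell
# Hypothetical helper: turn free text on stdin into a one-word-per-line list.
words_per_line() {
  # Squeeze every run of whitespace into a single newline, drop words
  # shorter than 4 characters, then sort and remove duplicates.
  tr -s '[:space:]' '\n' | grep -E '^.{4,}$' | LC_ALL=C sort -u
}
```

It could be used as html2text ./*htm* | words_per_line > webpage_words.txt, where webpage_words.txt is an assumed output name. Punctuation attached to words is left in place, which is one reason a purpose-built tool is more convenient.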

CeWL is a Ruby application which “spiders” a given URL to a specified depth, optionally following external links, and returns a list of words which can then be fed to password crackers. Every bookmarked website and a large sample of the suspect’s recovered internet history were visited with CeWL. The example below is for one such website.

ruby cewl.rb -m 4 -d 2 -w cewl.txt -v http://www.japan-guide.com/

This command writes to cewl.txt, in verbose mode, all words of 4 or more characters (not including HTML) from the website www.japan-guide.com, following links within the same domain to a depth of 2. As some sites, such as this one, are large, this “spidering” can take some time. The task of visiting the sites and building these word lists took one week.

Personal Information

cupp.py (Common User Passwords Profiler) is a Python script which automates word list generation when provided with specific information about an individual. Given details such as first name, surname, date of birth, nickname, partner’s name, pet names, and keywords, it generates multiple combinations of these with random numbers and “leet speak”. It also has many language-specific word lists, including a 115,600-word Japanese list, which was imported for recombination.

python cupp.py -i
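To give a feel for the kind of recombination described above, here is a rough shell illustration. It is not cupp's actual algorithm or substitution table: the function name profile_variants, the five-letter leet mapping, and the single numeric suffix are all assumptions made for the example.

```shell
# Illustrative only: NOT cupp's real algorithm or substitution table.
# Reads base words on stdin and prints each word, a leet variant,
# and both forms with a caller-supplied numeric suffix appended.
profile_variants() {
  suffix=$1
  while IFS= read -r word; do
    # Hypothetical mapping: a->4, e->3, i->1, o->0, s->5
    leet=$(printf '%s' "$word" | tr 'aeios' '43105')
    printf '%s\n%s\n%s%s\n%s%s\n' \
      "$word" "$leet" "$word" "$suffix" "$leet" "$suffix"
  done
}
```

For example, printf 'sakura\n' | profile_variants 1985 prints sakura, 54kur4, sakura1985 and 54kur41985, one per line.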

crunch

crunch creates a word list based on criteria you specify. Its output can be sent to the screen, to a file, or to another program. The usefulness of crunch in this case was limited, as the information analysed up to this point indicated long and complex passwords. crunch was still used, however, to generate a word list of shorter words, should any of the many zip files be using one. The cost of doing so in time and computer resources is low as long as the number of possible combinations is low. Examples of the resulting word list sizes are given below.

The size in bytes of a word list created by crunch is approximately (x ^ y) * (y + 1), where x is the number of characters in the character set and y is the length of the password; the extra byte per word is the newline that ends each line. For example, all 6 character combinations of lower case, upper case and numeric characters produce a file of the following size:
(26 + 26 + 10) ^ 6 x (6 + 1)
56,800,235,584 x 7 = 397,601,649,088 bytes

In reality, however, crunch creates slightly larger files. The following file size estimates show that large word lists generated by crunch quickly become infeasible due to storage limitations:
1 to 4 characters 71 MB,
1 to 5 characters 5 GB,
1 to 6 characters 375 GB,
1 to 7 characters 25 TB.
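The arithmetic behind these estimates can be reproduced with plain shell integer arithmetic. The sketch below sums (x ^ y) * (y + 1) over a range of lengths; the function name crunch_bytes is made up for the example, and it only models the formula above rather than anything crunch itself reports.

```shell
# Estimated crunch output size in bytes for every word of length
# min..max over a character set of the given size. Each word costs
# its length plus one byte for the trailing newline.
crunch_bytes() (
  charset=$1 min=$2 max=$3
  total=0 len=1 count=1
  while [ "$len" -le "$max" ]; do
    count=$((count * charset))          # charset ^ len
    if [ "$len" -ge "$min" ]; then
      total=$((total + count * (len + 1)))
    fi
    len=$((len + 1))
  done
  echo "$total"
)

crunch_bytes 62 6 6   # exactly 6 characters: 397601649088
crunch_bytes 62 1 6   # 1 to 6 characters:   403173292728
```

Dividing the second figure by 1024 three times gives roughly 375, matching the 1 to 6 character estimate in the list.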

Whilst these generated word lists can be piped directly to other commands, meaning storage is not an issue, the sheer number of possibilities means that processing would take a long time and, without (and even with) specialist optimised hardware, guessing the correct result is unlikely. As such, the crunch word list was limited to passwords from 1 to 6 characters in length, covering all upper and lower case alphanumeric permutations, as below.

crunch 1 6 abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 -o crunch.txt

Other Sources

The suspect provided a list of 12 possible passwords to assist. Although none of these worked on the zip files, it was possible that they were similar to the passwords for the zip files, so they were included in a text file.

A small book was recovered from the suspect’s home address that listed various account details and passwords. Although none of these allowed access to the zip files, it was possible that they were similar to the passwords for the zip files, and as such they were also included in a text file for later recombination if required.

Combining All Word Lists

All the word list text files were merged into one dictionary for use with fcrackzip. This made the process simpler and also removed duplicate entries, such as those present in both crunch.txt (all 1 to 6 character alphanumeric words) and the CeWL output (which was set to words of 4 or more characters). Removing duplicates reduces the time needed to process the zip files.

sort ./* | uniq > dictionary.txt
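One thing to watch with a merge like this is that the output file lives in the same directory as the inputs, so a second run would pick up the previous dictionary.txt through the ./* glob. A slightly more defensive variant, offered as a sketch rather than as the command used in the case, writes the result elsewhere and lets sort do the de-duplication. The function name merge_wordlists and the .txt naming convention are assumptions.

```shell
# Hypothetical helper: merge every .txt word list in a directory into
# one de-duplicated dictionary written to a path outside that directory,
# so a re-run never reads its own previous output back in.
merge_wordlists() {
  src_dir=$1
  out_file=$2
  # LC_ALL=C compares entries byte for byte, which is fast and
  # independent of the system locale; -u drops exact duplicates.
  LC_ALL=C sort -u "$src_dir"/*.txt > "$out_file"
}
```

A call such as merge_wordlists ./wordlists ./dictionary.txt would then rebuild the dictionary from scratch each time.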

Method: Building the Wrapper

As previously described, fcrackzip can only process multiple zip files in one run if they all have the same password. As they did not in this case, a bash script was created to automate the process. There were many ways to solve this problem, but this one was mine.

The core of the script would be simple and efficient, invoking fcrackzip to do the extractions of the zip files. Robust error checking and counters would also be implemented for a summary at the conclusion of processing.
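The original script is not reproduced here, so the following is only a minimal sketch of what such a wrapper might look like. It assumes fcrackzip's dictionary mode (-D with -p naming the word list, -u to verify each candidate with unzip) and assumes a successful run prints a line of the form "PASSWORD FOUND!!!!: pw == <password>"; the function name crack_all and the output labels are invented for the example.

```shell
# Hypothetical wrapper sketch: run one dictionary against each zip in turn
# and keep simple counters for a summary at the end.
crack_all() (
  dict=$1; shift
  found=0; failed=0
  for zip in "$@"; do
    # Basic error check: skip anything that cannot be read.
    if [ ! -r "$zip" ]; then
      echo "SKIP  $zip (not readable)"
      failed=$((failed + 1))
      continue
    fi
    # Assumed fcrackzip usage: -u verify with unzip, -D dictionary mode,
    # -p path to the dictionary file.
    result=$(fcrackzip -u -D -p "$dict" "$zip" 2>/dev/null)
    case $result in
      *"PASSWORD FOUND"*)
        # Keep only the text after "pw == " (assumed output format).
        echo "FOUND $zip: ${result##*pw == }"
        found=$((found + 1)) ;;
      *)
        echo "MISS  $zip"
        failed=$((failed + 1)) ;;
    esac
  done
  echo "Summary: $found cracked, $failed not cracked"
)
```

It would be invoked as crack_all dictionary.txt ./*.zip. Running each archive separately is what allows every zip file to have a different password.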

Review

Throughout the process of building the custom word list, I often dismissed crunch. The information I knew about the suspect and the data I had already gathered from his devices led me to believe that it would not generate any useful words. Although it generated only 2, it did generate some. Despite the suspect’s often complex passwords, some were short enough to fall within the 1 to 6 character range that I specified for crunch.

CeWL exceeded my expectations. Although slow to run, especially across large sites, it generated a great word list specific to the suspect. Firefox’s saved passwords yielded more useful passwords for the zip files, but that is almost to be expected: it was, after all, a saved list of passwords, and most users do re-use them.

Low-tech solutions are often overlooked when investigating high-tech crime. The importance of the suspect’s book, which was located next to his computer table, was initially overlooked. Although all 4 useful passwords within it were already saved in the suspect’s Firefox browser, it is easy to imagine a situation where a more security-conscious suspect had not saved any passwords locally.

cupp’s built-in (downloadable) word lists also produced results that neither CeWL nor any other source did. There are a large number of specific word lists available online and, when several are combined, an effective attack can be made. Whilst only one external word list was used in this case, if I had achieved fewer results, I would have tried more.

Time and resources permitting, tailoring a word list for dictionary attacks can have great benefits in gaining access to encrypted files.
