Method: Building the Wordlists
Once the disks had been imaged and pre-processed in X-Ways Forensics, the zip files and other artefacts were extracted. This was conducted on forensic workstations running Windows; however, the majority of the work in this case was completed in Linux, taking advantage of the ease with which it allows text files to be manipulated.
pagefile.sys and hiberfil.sys
Windows uses the pagefile.sys file as virtual memory when it has no random access memory (RAM) left to utilise. When a Windows computer enters hibernate mode, it writes the entire contents of memory out to the hard drive as the hiberfil.sys file before powering down. Both files therefore contain outputs from the computer’s memory at certain times, and passwords can be stored in RAM unencrypted. Unfortunately, no RAM dump from any of the computers was obtained prior to submission.
strings is a Linux tool which extracts sequences of printable characters from files; its output formed part of the dictionary. The following command was used on a Linux system to process the two files, extracting printable sequences of length 5 or more and writing them to a text file.
strings -n 5 pagefile.sys hiberfil.sys > pf_hf.txt
Firefox Password Manager
Firefox was the primary browser used by the suspect. Although he also used the Tor Browser, which is based on Firefox, it is modified so that passwords cannot be stored. Using virtualisation software, his disk image was booted as a guest virtual machine and Firefox was started. Within Firefox on the guest machine, Settings > Privacy & Security > Saved Logins… reveals all currently saved passwords, of which there were over 40 unique entries. At least one password from this list matched at least one of the zip files. All the saved passwords were exported to a text file to form part of the dictionary.
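Recent Firefox versions can export saved logins from the Password Manager as a CSV file (the default export name and the simple three-column layout below are assumptions; real exports contain further columns and may quote fields). Assuming such an export, a minimal sketch for pulling just the password column into a word list file might be:

```shell
# Hypothetical example export; a real Firefox export has more columns
# and may contain quoted fields with embedded commas.
cat > passwords.csv <<'EOF'
url,username,password
https://example.com,alice,Sakura123!
https://example.org,alice,NihonGo#9
EOF

# Skip the header line, then keep only the third (password) field
tail -n +2 passwords.csv | awk -F',' '{ print $3 }' > firefox_pw.txt
```

For exports with quoted or comma-containing values, a proper CSV parser would be needed instead of a naive field split.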
Internet History and Interests
Whilst the suspect spoke several languages, much of his interest was in Japanese culture, and examination of his computer showed that most of his translation work was in Japanese. Many of the passwords seen so far were also Japanese, but written in Latin characters.
html2text is a Linux utility which reads HTML documents from its input files, formats each of them into a stream of plain-text characters, and writes the result to standard output. Whilst it performs a similar function to strings, it removes HTML tags, which are otherwise printable characters. A bookmarked website, along with a large sample of pages recovered from the suspect’s internet history, was visited and saved.
html2text ./*htm* > webpage.txt
This output was unsatisfactory for word list generation, however: although tags were removed from the website, multiple words were still listed per line. Splitting the words on whitespace in bash would be easy, but it is a laborious task across many web pages, each of which must first be saved. This could be automated, but a solution already exists.
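To illustrate the manual approach, one word per line can be obtained by translating runs of whitespace into newlines with tr; this is the step that would otherwise have to be repeated for every saved page (the sample input below stands in for real html2text output):

```shell
# Sample input standing in for the html2text output (webpage.txt)
printf 'Japan  travel guide\nKyoto temples and shrines\n' > webpage.txt

# One word per line: squeeze every run of whitespace into a single
# newline, then drop any blank lines
tr -s '[:space:]' '\n' < webpage.txt | grep -v '^$' > webpage_words.txt
```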
CeWL is a Ruby application which “spiders” a given URL to a specified depth, optionally following external links, and returns a list of words which can then be used with password crackers. Every bookmarked website, along with a large sample of sites recovered from the suspect’s internet history, was visited with CeWL. The example below is for one such website.
ruby cewl.rb -m 4 -d 2 -w cewl.txt -v http://www.japan-guide.com/
This command outputs, in verbose mode, to cewl.txt all words of 4 or more characters (excluding HTML tags) from the website www.japan-guide.com, following links to other pages within the same domain to a depth of 2. As some sites, such as this one, are large, this “spidering” can take some time. The task of visiting the sites and building these word lists took one week.
Personal Information
cupp.py (Common User Passwords Profiler) is a Python script which automates word list generation when provided with specific information about an individual. Given details such as first name, surname, date of birth, nickname, partner’s name, pet names, and keywords, it generates multiple combinations of these with numbers and “leet speak” substitutions. It also has many language-specific word lists, including a 115,600-word Japanese list, which was imported for recombination.
python cupp.py -i
crunch
crunch can create a word list based on criteria you specify, sending its output to the screen, a file, or another program. Its usefulness in this case was limited, as the information analysed so far indicated long and complex passwords. crunch was still used, however, to generate a word list of shorter words in case any of the many zip files used one. The cost of doing so, in time and computer resources, is low as long as the number of possible combinations is low. An example of the completed word list sizes is given below.
The size in bytes of the word list created by crunch is approximately x^y × (y + 1), where x is the number of characters in the character set and y is the length of the password (each word of length y is followed by a newline, hence y + 1 bytes per word). For example, for all 6-character combinations of upper- and lower-case letters and digits, a file of the following size is created:
(26 + 26 + 10)^6 × (6 + 1)
62^6 × 7 = 56,800,235,584 × 7 = 397,601,649,088 bytes
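The estimate can be reproduced with shell arithmetic; note that the figures cover words of exactly length y, each followed by a newline:

```shell
# Estimated crunch output size in bytes: x^y words of (y + 1) bytes each,
# where x is the charset size and y the password length.
x=62   # 26 lower-case + 26 upper-case + 10 digits
y=6
words=1
for i in $(seq "$y"); do words=$((words * x)); done
bytes=$((words * (y + 1)))
echo "$words words, $bytes bytes"
```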
In reality, however, crunch creates slightly larger files. From the following file size estimates it can be seen that large word lists generated with crunch become infeasible due to storage limitations:
1 to 4 characters 71 MB,
1 to 5 characters 5 GB,
1 to 6 characters 375 GB,
1 to 7 characters 25 TB.
Whilst these generated word lists can be piped directly to other commands, meaning storage is not an issue, the sheer number of possibilities means that processing will take a long time, and without (and even with) specialist optimised hardware, guessing the correct result is unlikely. As such, the crunch word list was only included to cover passwords from 1 to 6 characters in length, across all upper- and lower-case alphanumeric permutations, as below.
crunch 1 6 abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 -o crunch.txt
Other Sources
The suspect provided a list of 12 possible passwords to assist. Although none of these worked on the zip files, it was possible that they were similar to the passwords for the zip files, so they were included in a text file.
A small book was recovered from the suspect’s home address that listed various account details and passwords. Although none of these allowed access to the zip files, it was possible that they were similar to the passwords for the zip files, and as such they were also included in a text file for later recombination if required.
Combining All Word Lists
All the word list text files were merged into one dictionary for use with fcrackzip. This made the process simpler and also removed duplicate entries, such as those appearing in both crunch.txt (all 1 to 6 character alphanumeric combinations) and the CeWL output (which was set to words of 4 or more characters). Removing duplicates reduced the time it would take to process the zip files. The output was written outside the working directory so that the glob did not match the dictionary itself:
sort ./* | uniq > ../dictionary.txt
Method: Building the Wrapper
As previously described, fcrackzip can only process multiple zip files if they all have the same password. As they did not in this case, a bash script was created to automate the process. There are many ways to solve this problem; the following was the approach taken.
The core of the script would be simple and efficient, invoking fcrackzip to attack each zip file in turn. Robust error checking and counters would also be implemented to produce a summary at the conclusion of processing.
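A minimal sketch of that core loop, not the final script, is shown below. It assumes fcrackzip’s -D (dictionary mode), -p (word list) and -u (verify candidates with unzip) options, and detects success from the “PASSWORD FOUND” line fcrackzip prints, since its exit status does not reliably distinguish outcomes; the dictionary and file names are illustrative:

```shell
#!/bin/bash
# Sketch of the wrapper core: try one dictionary against each zip file
# given, keeping counters for a closing summary.
crack_all() {
    dict=$1
    shift
    cracked=0
    failed=0
    for z in "$@"; do
        [ -e "$z" ] || continue        # skip if the glob matched nothing
        if fcrackzip -D -u -p "$dict" "$z" 2>/dev/null | grep -q 'PASSWORD FOUND'; then
            cracked=$((cracked + 1))
        else
            failed=$((failed + 1))
            echo "No password found for $z" >&2
        fi
    done
    echo "Summary: $cracked cracked, $failed failed"
}

crack_all dictionary.txt ./*.zip
```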