The queries here are designed to explore a set of nested directories, or an unknown drive, to find out how many and what kind of things they hold.

It's best to scan only the directory you want to investigate, so you don't have to specify a path filter in each query's WHERE clause. The scan will recurse down through the whole directory tree, unless you use the -maxdepth option to change this behaviour.

C:\>  crab  "C:\Users\johnsmith\terraIncognita\"

Count how many files and how many directories

Directories have type 'd', files have type 'f'

SELECT type, count(*) FROM files GROUP BY type;

Total file size in MB

SELECT sum(bytes)/1e6 as MB FROM files;

See how deep the directory tree goes

How many files at each depth. Top level directory scanned is depth 0, children of this directory are depth 1, and so on.

SELECT depth, count(*) FROM files 
WHERE type = 'f' GROUP BY depth ORDER BY depth;

Most common file extensions

The ten most common file extensions, with max and avg size for files of this kind

SELECT extension, count(*), max(bytes)/1e6 as maxSizeMB, avg(bytes)/1e6 avgSizeMB, name
FROM files WHERE type = 'f'
GROUP BY extension
ORDER BY count(*) desc

Count how many executable files

On Windows a file is executable if it has an appropriate extension such as .exe. You can identify which files are executable from the mode field in Crab's files table, which stores this information in a Linux-like format, 'x' indicates executable.

SELECT count(*) FROM files 
WHERE type = 'f' and mode like '%x%';

Count how many files per directory

SELECT avg(filecount), max(filecount)
FROM (SELECT count(*) filecount FROM files WHERE type = 'f' GROUP BY parentpath);

Most common filenames

The five most common filenames

SELECT name, count(*)
FROM files
HAVING count(*) > 1
ORDER BY count(*) desc

Tip: Soft links aren't followed when scanning

Soft links are files which act as a placeholder for a file or directory elsewhere on the disk. This allows the same directory to appear in multiple places without taking up extra disk space.

When Crab scans a soft link, it records the soft link but not the contents of the target.

You can identify soft links from the mode field in the files table . For directories the mode starts with a lowercase L, 'l', directories have a mode that starts with a 'd', and files have a mode that starts with a hyphen '-'.

Tip: If you do this often, make a Crab script

If you have a set of queries you run often, put them together into a Crab script and run it with the %read command.

© 2017 Etia UK