The queries here are designed to explore a set of nested directories, or an unknown drive, to find out how many and what kind of things they hold.

It's best to scan only the directory you want to investigate, so you don't have to specify a path filter in each query's WHERE clause. The scan will recurse down through the whole directory tree, unless you use the
-maxdepth option to change this behaviour.


$ crab "/Users/johnsmith/terraIncognita/"

Count how many files and how many directories


Directories have type 'd', files have type 'f'


SELECT type, count(*) FROM files GROUP BY type;

Total file size in MB


SELECT sum(bytes)/1e6 as MB FROM files;

See how deep the directory tree goes


How many files at each depth


SELECT depth, count(*) FROM files
WHERE type = 'f' GROUP BY depth ORDER BY depth;

Most common file extensions


The ten most common file extensions, with max and avg size for files of this kind


SELECT extension, count(*), max(bytes)/1e6 as maxSizeMB, avg(bytes)/1e6 avgSizeMB, name
FROM files WHERE type = 'f'
GROUP BY extension
ORDER BY count(*) desc
LIMIT 10;

Count how many executable files


On macOS a file is executable if it has a appropriate permissions. You can identify which files are executable from the mode field in Crab's files table, which stores this information in a Linux-like format, 'x' indicates executable.

SELECT count(*) FROM files 
WHERE type = 'f' and mode like '%x%';

Count how many files per directory


SELECT avg(filecount), max(filecount)
FROM (SELECT count(*) filecount FROM files WHERE type = 'f' GROUP BY parentpath);

Most common filenames


The five most common filenames


SELECT name, count(*)
FROM files
GROUP BY name
HAVING count(*) > 1
ORDER BY count(*) desc
LIMIT 5;

Tip: Soft links aren't followed when scanning

Soft links are files which act as a placeholder for a file or directory elsewhere on the disk. This allows the same directory to appear in multiple places without taking up extra disk space.

When Crab scans a soft link, it records the soft link but not the contents of the target.

You can identify soft links from the
mode field in the files table . For directories the mode starts with a lowercase L, 'l', directories have a mode that starts with a 'd', and files have a mode that starts with a hyphen '-'.



Tip: Remote filesystems are not scanned by default

A recursive scan will not include remote filesystems by default. To scan these too, use the ‑mount option when launching Crab.


Tip: If you do this often, make a Crab script

If you have a set of queries you run often, put them together into a Crab script and run it with the %read command.

© 2017 Etia UK