First scan your whole drive. This might take up to half an hour, depending on the size and type of drive. The following command does that, saving the scan data to a file called wholedisk.crdb.

$ crab -db wholedisk.crdb /

Next time you start Crab, you can query this same data without scanning again; use the same -db option, and no scan path, e.g.

$ crab -db wholedisk.crdb

The crab index file, a .crdb file, can be quite large, typically about 1% of the size of the scanned drive.

N.B. Changes you make to the filesystem, such as files or directories you delete, won't be reflected in Crab data until the next scan. If working with old scan data, use the
pathexists() function to check whether a file or directory is present at query run time.
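
For example, this query (a sketch; the 1GB threshold is illustrative) lists large files from the scan that still exist on disk at the moment the query runs:

SELECT fullpath, bytes/1E9 as GB FROM files
WHERE bytes > 1e9 and pathexists(fullpath);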

Which files are using most disk space?

Find the ten biggest files

SELECT fullpath, bytes/1E9 as GB FROM files
ORDER by bytes DESC LIMIT 10;

If paths are long you may want to use the %mode command to put each query result field on its own line

%mode line

N.B. You can change back to the previous output layout, dictionary mode, like this:

%mode dict

Find duplicate files bigger than 100MB

SELECT f1.bytes/1e6 as MB, f1.fullpath, f2.fullpath
FROM files f1 join files f2 on f1.fileid > f2.fileid
and f1.bytes = f2.bytes
and f1.bytes > 100e6
and sha1(f1.fullpath) = sha1(f2.fullpath);

Don't delete both of them!

Find which filetypes are using most space

SELECT extension, sum(bytes)/1e9 as GB, max(bytes)/1e9, fullpath
FROM files
GROUP BY extension
ORDER BY GB DESC;

We're summing file size by extension, and reporting the biggest file for each.

For each extension we return total file size in GB, maximum file size in GB and the
fullpath that gives the max(bytes) function its value.

Which directories are using most disk space?

Find biggest directories by total file size

SELECT parentpath, sum(bytes)/1e9 as GB FROM files
GROUP BY parentpath ORDER BY sum(bytes) DESC LIMIT 5;
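
Note that grouping on parentpath totals only the files directly inside each directory. To total an entire subtree instead, you can filter on a path prefix (the path here is illustrative):

SELECT sum(bytes)/1e9 as GB FROM files
WHERE fullpath LIKE '/Users/johnsmith/MyProject/%';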

Find likely-duplicate directories

This query finds candidate duplicate directories by looking for directories that contain the same total file size, and same number of files.

SELECT p1.pp, p2.pp, p1.size/1e9 as GB
FROM (SELECT parentpath as pp, sum(bytes) ||':'|| count(*) as sig, sum(bytes) as size FROM files
GROUP BY parentpath) AS p1
JOIN (SELECT parentpath as pp, sum(bytes) ||':'|| count(*) as sig FROM files
GROUP BY parentpath) AS p2
ON p1.sig = p2.sig and p1.pp < p2.pp
ORDER BY p1.size DESC LIMIT 10;

The sig field is the 'signature' for each directory, calculated from its total file size and number of files.

By restricting the query results to the 10 largest directories we reduce the chance of an accidental match, but you should still compare the candidates file-by-file before deletion.

Delete manually

You can run shell commands to delete or move objects, create directories etc without leaving Crab. Just put an exclamation mark at the start of the line and Crab will send the rest of the line to the shell.

Delete files manually

You can use the 'rm' command together with the fullpath of a file you want to delete, but the path and filename must be put in quotes. Without the quotes you'll have problems with paths that contain spaces.

To save typing, you can copy a fullpath from your query results by highlighting it with the mouse and typing Cmd+C, Cmd+V.

Be careful: rm will delete files immediately, without putting them in the Trash, and there is no undo.

CRAB> !rm "/Users/johnsmith/Assorted Files/File_which_I_am_100_percent_sure_I_dont_need"

Remember that files you delete won't be removed from query results until you scan again.

Delete directory trees manually

You absolutely must put quotes around the fullpath of the directory you want to delete, or a typing mistake could delete everything on your filesystem.

This example deletes the 'MyProject' directory, and every file and directory inside it.

CRAB> !rm -r "/Users/johnsmith/MyProject/"

The exclamation mark tells Crab to send the whole line to the shell, and the -r option tells the 'rm' command to delete recursively.

This will prompt you for each file that doesn't have appropriate permissions. That can be tedious when deleting a large batch of files, each one needing your approval.

There is a more dangerous form of the command that deletes files irrespective of their permissions, without asking you for confirmation. This uses the -f option in addition to the -r option

CRAB> !rm -rf "/Users/johnsmith/MyProject/"

Delete using exec()

Delete multiple files using exec()

Crab's exec() function runs OS commands on files you specify. If you know what you're doing and don't want to copy files before deleting them, use the 'rm' command with its -f option. This causes 'rm' to delete files without confirmation, whatever their permissions. Test that the query logic is correct before using it for deletion.

WARNING: If a delete query has no WHERE clause it will delete every file that was scanned.

E.g. Recursively delete all files larger than 100MB beneath the 'MyProject' directory

SELECT exec('rm', '-f', fullpath) FROM files
WHERE fullpath LIKE '/Users/johnsmith/MyProject/%' and type = 'f' and bytes > 100e6;
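
One way to test the logic is to run the same WHERE clause as a plain SELECT first, so you can review exactly which files would be deleted before swapping in the exec() call:

SELECT fullpath, bytes/1e6 as MB FROM files
WHERE fullpath LIKE '/Users/johnsmith/MyProject/%' and type = 'f' and bytes > 100e6;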

The -f option is necessary because exec() doesn't play nicely with commands that issue run-time prompts: it displays the output of each command only after the command has completed, so you would never see the prompt.

Delete multiple empty directories using exec()

Use the 'rmdir' command to delete empty directories.

E.g. Recursively delete all empty directories beneath the 'MyProject' directory

SELECT exec('rmdir',fullpath) FROM files
WHERE fullpath LIKE '/Users/johnsmith/MyProject/%' and type = 'd'
and fullpath not in (SELECT parentpath FROM files);
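
Since 'rmdir' refuses to delete non-empty directories, you can preview which directories the query will remove by running the same conditions as a plain SELECT first (the path prefix is illustrative):

SELECT fullpath FROM files
WHERE fullpath LIKE '/Users/johnsmith/MyProject/%' and type = 'd'
and fullpath not in (SELECT parentpath FROM files);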

Tip: Suppress exec() echo to screen

By default exec() outputs every command executed to the screen. If you are running an exec() function hundreds of thousands or millions of times this will be slow, and the screen will be a mess.

To discard the output use the following command before running the query:

%output /dev/null

Any error messages will still go to the Terminal window, as will subsequent CRAB> prompts.

To switch output back to the screen do this:

%output stdout

If you want to keep a log of the output rather than discard it, use a filename in place of /dev/null e.g.

%output '/somepath/MyOutputLog.txt'

Tip: Memory constraints

The total number of commands that can be executed by a query is limited by memory constraints.

On macOS we've tested with hundreds of millions of rows without any problems. On Windows the limit is currently lower, a few million rows.


© 2019 Etia UK