First, scan your whole drive. This might take up to half an hour, depending on the drive's size and type. The following command does that, saving the scan data to a file called 'wholedisk.crdb':


C:> crab -db wholedisk.crdb C:\


Next time you start Crab, you can query this same data without scanning again: use the same -db option with no scan path, e.g.


C:> crab -db wholedisk.crdb


The Crab index file (a .crdb file) is typically about 1% of the size of the scanned drive.

N.B. Changes you make to the filesystem, such as deleting files or directories, won't be reflected in Crab data until the next scan. If you are working with old scan data, use the pathexists() function to check whether a file or directory is present at query run time.

Which files are using most disk space?

Find the ten biggest files


SELECT fullpath, bytes/1e9 AS GB FROM files
ORDER BY bytes DESC LIMIT 10;

If paths are long, you may want to use the %mode command to put each query result field on its own line:

%mode line

You can change back to the default output layout, dictionary mode, like this:

%mode dict

Find duplicate files bigger than 100MB


SELECT f1.bytes/1e6 AS MB, f1.fullpath, f2.fullpath
FROM files f1 JOIN files f2 ON f1.fileid > f2.fileid
AND MB > 100
AND f1.name = f2.name AND sha1(f1.fullpath) = sha1(f2.fullpath);

Each result row is a pair of identical files. Don't delete both files of a pair!

Find which filetypes are using most space


SELECT extension, sum(bytes)/1e9 AS GB, max(bytes)/1e9, fullpath
FROM files
GROUP BY extension
ORDER BY GB DESC
LIMIT 10;

We're summing file size by extension and reporting the biggest file for each. For each extension we get the total file size in GB, the maximum file size in GB, and the fullpath of the file that gives the max(bytes) function its value.

Which directories are using most disk space?

Find biggest directories by total file size


SELECT parentpath, sum(bytes)/1e9 AS GB FROM files
GROUP BY parentpath ORDER BY sum(bytes) DESC LIMIT 5;

Find likely-duplicate directories


This query finds candidate duplicate directories by looking for directories that have the same total file size and the same number of files.


SELECT p1.pp, p2.pp, p1.size/1e9
FROM
(SELECT parentpath AS pp, sum(bytes) ||':'|| count(*) AS sig, sum(bytes) AS size FROM files
GROUP BY parentpath ) AS p1
JOIN
(SELECT parentpath AS pp, sum(bytes) ||':'|| count(*) AS sig FROM files
GROUP BY parentpath ) AS p2
ON p1.sig = p2.sig AND p1.pp < p2.pp
ORDER BY p1.size DESC
LIMIT 10;

The sig field is the 'signature' for each directory, calculated from its total file size and number of files.

By restricting the query results to the 10 largest directories we reduce the chance of an accidental match, but you should still compare the candidates file-by-file before deletion.
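
One way to start that file-by-file comparison is with the scan data itself. The sketch below lists files in the first directory that have no same-named file of the same size in the second. Both directory paths are hypothetical placeholders for a pair returned by the query above, and the sketch assumes that parentpath values end with a backslash.

```sql
-- Hypothetical paths: substitute one candidate pair from the previous query.
-- Assumption: parentpath values end with a backslash.
SELECT a.name, a.bytes AS bytes_in_first, b.bytes AS bytes_in_second
FROM files a
LEFT JOIN files b
  ON b.parentpath = 'C:\Users\johnsmith\MyProject_copy\'
  AND b.name = a.name
WHERE a.parentpath = 'C:\Users\johnsmith\MyProject\'
  AND (b.name IS NULL OR a.bytes <> b.bytes);
```

An empty result means every file in the first directory has a same-named, same-sized counterpart in the second; swap the two paths to check the other direction. Matching names and sizes still doesn't prove that the contents are identical, so compare hashes or open the files if it matters.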

Delete manually

You can run Windows commands to delete or move objects, create directories, etc. without leaving Crab. Just put an exclamation mark at the start of the line and Crab will send the rest of the line to the shell.

Delete files manually


You can use the 'del' command together with the fullpath of a file you want to delete, but the path and filename must be put in quotes. Without the quotes you'll have problems with paths that contain spaces.

Before Windows 10 it was difficult to copy and paste multi-line text, such as long paths, because newline characters were copied for wrapped lines. You could get around this by holding down Shift when right-clicking to copy text, which removes all newlines from the copied text.

Be careful: del deletes files immediately, without moving them to the Recycle Bin. There is no undo.

    CRAB> !del /f "C:\somepath\File_which_I_am_100_percent_sure_I_dont_need"

The /f option forces deletion of read-only files without asking for confirmation. This matters because Crab doesn't display confirmation prompts.

Remember that files you delete won't be removed from query results until you scan again.
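
Between scans, the pathexists() function mentioned earlier can keep already-deleted files out of your results. The query below is a sketch of that idea applied to the ten-biggest-files query. The call form is an assumption: it supposes pathexists() takes a path such as fullpath and returns a true (non-zero) value when the path is present, so check the function reference before relying on it.

```sql
-- Assumption: pathexists(path) is true (non-zero) when the path is present at query run time
SELECT fullpath, bytes/1e9 AS GB FROM files
WHERE pathexists(fullpath)
ORDER BY bytes DESC LIMIT 10;
```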

Delete directory trees manually


You absolutely must put quotes around the fullpath of the directory you want to delete, or a typing mistake could delete everything on your filesystem.

This example deletes the 'MyProject' directory, and every file and directory inside it.

    CRAB> !rmdir /s "C:\Users\johnsmith\MyProject\"

The exclamation mark tells Crab to send the rest of the line to the shell, and the /s option tells the 'rmdir' command to delete folders and files recursively.

You will be prompted Y/N for each folder that isn't empty. This can be tedious if deleting a lot of folders where each one needs your approval.

There is a still more dangerous form of the command that deletes contents irrespective of their permissions, without asking you for confirmation. It uses the /q option (for "quiet") in addition to the /s option:

    CRAB> !rmdir /q /s "C:\Users\johnsmith\MyProject\"

Delete using exec()

Delete multiple files using exec()

Crab's exec() function runs OS commands on files you specify. If you know what you're doing and don't want to copy files before deleting them, use the 'del' command with its /f option. This causes 'del' to delete files without confirmation, whatever their permissions. Test that the query logic is correct before using it for deletion.

WARNING: If a delete query has no WHERE clause it will delete every file that was scanned.

E.g. Recursively delete all files bigger than 100MB beneath the 'MyProject' directory

SELECT exec('del', '/f', fullpath) FROM files
WHERE parentpath LIKE 'C:\Users\johnsmith\MyProject\%' AND type = 'f' AND bytes > 100e6;

The /f option is necessary because exec() doesn't play nicely with commands that give run-time prompts: it displays the output of each command only after the command has completed, so you won't see the prompt.
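
A cautious way to test the query logic is a dry run: keep the WHERE clause of the delete query above, but select the paths instead of calling exec(). This is only a sketch of that habit, not a separate Crab feature; nothing is deleted because no command is run.

```sql
-- Dry run: same WHERE clause as the delete query, but no exec(), so nothing is removed
SELECT fullpath, bytes/1e6 AS MB FROM files
WHERE parentpath LIKE 'C:\Users\johnsmith\MyProject\%' AND type = 'f' AND bytes > 100e6
ORDER BY bytes DESC;
```

If the list contains only files you expect, replace the selected columns with the exec() call and run the query again.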

Irreversibly delete multiple directories using exec()

Use the 'rmdir' command to delete empty directories.

E.g. Recursively delete all empty directories beneath the 'MyProject' directory

SELECT exec('rmdir', fullpath) FROM files
WHERE fullpath LIKE 'C:\Users\johnsmith\MyProject\%' AND type = 'd'
AND fullpath NOT IN (SELECT parentpath FROM files);

Tip: Suppress exec() echo to screen

By default exec() outputs every command executed to your screen. If you are running an exec() function hundreds of thousands or millions of times this will be slow, and the screen will be a mess.

To discard the output use the following command before running the query:

    %output NUL

Any error messages will still go to the Command Line window, as will subsequent CRAB> prompts.

To switch output back to the screen do this:

    %output stdout


If you want to keep a log of the output rather than discard it, use a filename in place of NUL, e.g.

    %output "C:\somepath\MyOutputLog.txt"




Tip: Memory constraints

The total number of commands that can be executed by a query is limited by memory constraints.

On macOS we've tested with hundreds of millions of rows without any problems. On Windows the limit is currently lower, a few million rows.

© 2017 Etia UK