Crab for macOS Cheat Sheet

Scan

Specify the files or directories to scan. Don't use wildcards. Scans are recursive. Put single quotes around paths containing spaces.

$  crab ~/somedirectory '/otherdirectory/sub dir'

$  crab  data1.csv

To query data from the previous scan, start Crab without any scan path

$  crab 

To scan only a specific directory and no deeper

$  crab -maxdepth 1  /some/directory

To save scan results in a named .crdb file use -db

$  crab -db databasefile.crdb  /some/directory

To query scan results from a .crdb file use -db and no scan path

$  crab -db databasefile.crdb  
  • Cancel a scan with Ctrl + Break
  • Check path and date of current scan data with %show
  • When Crab scans a soft link, it records the soft link object not the contents of the target.
  • When Crab scans a hard link (not common except in backups), it is recorded as a file.
    Multiple hard links to the same file are treated as different files

Find files and directories

FILES TABLE   One entry for each file and directory scanned

Field       Type  Description
fileid      int   (primary key) Row number, unique for each item scanned, e.g. 42
name        text  Item name, e.g. 'Hei.ttf'
bytes       int   Item size in bytes, e.g. 7502752. Divide by 1e3 for KB, 1e6 for MB, 1e9 for GB, 1e12 for TB
depth       int   How far the scan recursed to find the item: 0 for the scan directory, 1 for the directory's children
accessed    text  Datetime item was accessed, e.g. '2016-09-18T09:28:07'
modified    text  Datetime item was modified, e.g. '2015-10-27T22:17:41'
basename    text  Item name without path or extension, e.g. 'Hei'
extension   text  Item extension including the dot, e.g. '.ttf'
type        text  Item type, 'f' for file or 'd' for directory
mode        text  Further type info and permissions, e.g. 'drwxr-xr-x'
parentpath  text  Absolute path of directory containing the item, e.g. '/Library/Fonts/'
fullpath    text  (unique) Parentpath of the item concatenated with its name, e.g. '/Library/Fonts/Hei.ttf'. Directory paths end with a forward slash, file paths do not.
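
The size conversion for the bytes field can be tried outside Crab. This is a minimal sketch, assuming Python's built-in sqlite3 module and a hand-built stand-in for the files table. The rows are hypothetical and only three of the fields are modelled; it is not Crab itself.

```python
import sqlite3

# Stand-in for Crab's files table (hypothetical rows, three fields only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (fullpath TEXT, bytes INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("/Library/Fonts/", 0, "d"),
        ("/Library/Fonts/Hei.ttf", 7502752, "f"),
        ("/Library/Fonts/Small.ttf", 120000, "f"),
    ],
)

# Largest files first, with the size converted from bytes to MB (bytes / 1e6).
rows = conn.execute(
    "SELECT fullpath, round(bytes / 1e6, 2) AS mb FROM files "
    "WHERE type = 'f' ORDER BY bytes DESC LIMIT 10"
).fetchall()

for fullpath, mb in rows:
    print(fullpath, mb, "MB")
```

The SELECT statement uses only fields listed in the table above, so it should paste into Crab with a trailing semicolon, though that has not been verified here.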

Pattern match text fields using like, which is not case sensitive. % matches any text, including zero-length text.

LIMIT clause restricts results to first n rows.

SELECT fullpath FROM files
WHERE fullpath like '%tps%report%' and type = 'f'
  and modified > '2009-02-19' and bytes < 10e6 
LIMIT 20;
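
To see how each condition in the query above narrows the results, here is a minimal sketch that runs the same WHERE clause against a stand-in table using Python's built-in sqlite3 module. The rows are hypothetical, and ORDER BY is added only to make the output order predictable. It assumes Crab's like behaves as in plain SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (fullpath TEXT, type TEXT, modified TEXT, bytes INTEGER)"
)
conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", [
    ("/docs/TPS Report 2010.doc", "f", "2010-03-01T10:00:00", 52000),     # kept: case is ignored
    ("/docs/tpsreport.txt", "f", "2012-06-30T08:15:00", 900),             # kept: % may match nothing
    ("/docs/tps report old.doc", "f", "2008-01-05T09:00:00", 41000),      # dropped: modified too early
    ("/docs/tps report big.mov", "f", "2015-01-05T09:00:00", 50000000),   # dropped: 10e6 bytes or more
    ("/docs/tps reports/", "d", "2015-01-05T09:00:00", 0),                # dropped: a directory
])

matches = [row[0] for row in conn.execute(
    "SELECT fullpath FROM files "
    "WHERE fullpath like '%tps%report%' and type = 'f' "
    "  and modified > '2009-02-19' and bytes < 10e6 "
    "ORDER BY fullpath LIMIT 20"
)]
print(matches)
```

The date comparison works on plain text because ISO-style timestamps sort in chronological order.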

For regex patterns use match, which is case sensitive

SELECT fullpath FROM files
WHERE name match '^[a-fA-F0-9]+$'  and length(name)=50
  and parentpath like '%/csvfiles/';
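
The pattern in the query above can be checked before running it. This is a minimal sketch using Python's re module as a stand-in, on the assumption that Crab's match follows common regex syntax; the source does not say which regex flavour it uses. The sample names are hypothetical.

```python
import re

# Pattern from the query above: one or more hex digits and nothing else.
HEX_NAME = re.compile(r"^[a-fA-F0-9]+$")

names = ["deadBEEF42", "DEADBEEF", "deadbeef.csv", "ghij"]
hex_names = [n for n in names if HEX_NAME.search(n)]
print(hex_names)

# Regex matching is case sensitive: a lower-case-only class rejects upper case.
lower_only = re.compile(r"^[a-f0-9]+$")
upper_matches = bool(lower_only.search("DEADBEEF"))
print(upper_matches)
```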

Find contents of a specific directory

... WHERE parentpath = '/somepath/somedirectory/'

Find contents of a directory and all its subdirectories

... WHERE parentpath like '/somepath/%' 

Find directories by their contents

        By individual properties of files contained

SELECT parentpath FROM files
WHERE extension in ('.py', '.c');

        By collective properties of files contained

SELECT parentpath, count(*) FROM files
WHERE extension = '.xlsx' 
GROUP BY parentpath
HAVING count(*) > 10;
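
In the query above, WHERE filters individual rows before grouping and HAVING filters whole groups afterwards. This is a minimal sketch of that behaviour, assuming Python's built-in sqlite3 module and a stand-in files table with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (parentpath TEXT, name TEXT, extension TEXT)")

rows = [("/reports/", f"q{i}.xlsx", ".xlsx") for i in range(12)]   # 12 spreadsheets
rows += [("/misc/", f"s{i}.xlsx", ".xlsx") for i in range(3)]      # only 3 spreadsheets
rows += [("/misc/", "notes.txt", ".txt")]                          # removed by WHERE
conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)

busy_dirs = conn.execute(
    "SELECT parentpath, count(*) FROM files "
    "WHERE extension = '.xlsx' "
    "GROUP BY parentpath "
    "HAVING count(*) > 10"
).fetchall()
print(busy_dirs)
```

Only /reports/ is returned, because /misc/ has just 3 matching files after the WHERE filter.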
  • End each query with a semicolon ;
  • It's often easier to write queries in a text editor, then copy and paste them to the Crab command line. You can paste multi-line queries in one go.
  • Queries run faster if you avoid wildcards on the left side of search patterns.
  • Cancel a query with Ctrl + C. If this drops you to the macOS command line, press up-arrow to recall the command that launched Crab, then start Crab without a scan path to continue querying the same data.
  • To change layout of query results         
    %mode line One output field per line
    %mode list Comma separated output fields
    Toggle header row with   %header on   and   %header off
    %mode dict Dictionary style output (the default)

Process files

exec() runs a command for each query result row.

Use exec() to run programs and commands on files using their fullpath

SELECT exec(<command string>)
FROM files
WHERE <criteria for files to process>

E.g.


SELECT exec('mv', '-n', fullpath, parentpath || replace(name,'V1','V2'))
FROM files
WHERE name like '%V1%' and parentpath like '/oldproject/%';

Test query logic with 'echo'

SELECT exec('echo', 'mv', '-n', fullpath, parentpath || replace(name,'V1','V2'))
FROM files
WHERE name like '%V1%' and parentpath like '/oldproject/%';
  • An exec() query with no WHERE clause will process every file and directory scanned
  • Don't use exec() with commands that show run-time prompts, e.g. for overwrite confirmation. Crab doesn't show them.
  • If query results include duplicate rows use GROUP BY to remove them, to avoid duplicate calls to exec()
  • Suppress output, to avoid exec() or writeln() cluttering the screen    %output NUL
    Switch output on again    %output stdout
    Send output to file    %output '/somepath/somefile.txt'
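
The arguments that the rename queries above pass to mv can be previewed as ordinary query columns. This is a minimal sketch, assuming Python's built-in sqlite3 module and a stand-in files table with hypothetical rows. It does not call exec() and moves nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, parentpath TEXT, fullpath TEXT)")
conn.executemany("INSERT INTO files VALUES (?, ?, ?)", [
    ("specV1.doc", "/oldproject/", "/oldproject/specV1.doc"),
    ("planV1.txt", "/oldproject/docs/", "/oldproject/docs/planV1.txt"),
    ("draftv1.txt", "/oldproject/", "/oldproject/draftv1.txt"),   # lower-case v1
    ("readme.txt", "/oldproject/", "/oldproject/readme.txt"),     # no V1 in the name
    ("demoV1.doc", "/newproject/", "/newproject/demoV1.doc"),     # outside /oldproject/
])

# Same expressions as the exec() example, selected as plain columns.
planned = conn.execute(
    "SELECT fullpath, parentpath || replace(name, 'V1', 'V2') FROM files "
    "WHERE name like '%V1%' and parentpath like '/oldproject/%' "
    "ORDER BY fullpath"
).fetchall()
for source, destination in planned:
    print("mv -n", source, destination)

# Rows where the destination equals the source would not be renamed at all.
unchanged = [source for source, destination in planned if source == destination]
print(unchanged)
```

In plain SQLite, like ignores case but replace() does not, so draftv1.txt is selected yet keeps its name. Whether Crab behaves the same way is an assumption worth checking with the echo test before a real run.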

Move or copy files without renaming them

Substitute 'mv' or 'cp' as required

Overwrite allowed:
Use '-f' to overwrite without prompting

SELECT exec('mv', '-f', fullpath, '/somepath/DestinationDirectory/') 
FROM files
WHERE <criteria for files to move>

E.g.


SELECT exec('cp', '-f', fullpath, '/PSD/Dropbox/pdflibrary/')
FROM files
WHERE extension = '.pdf'
  and parentpath like '/PSD/Documentation/%';

Don't overwrite:
Use '-n' to prevent moving files when filename already exists at destination

SELECT exec('mv', '-n', fullpath, '/somepath/DestinationDirectory/') 
FROM files
WHERE <criteria for files to move>

Move or copy files and rename them

Concatenate the new name to the destination path

Overwrite allowed:
Use '-f' to overwrite without prompting

SELECT exec('mv', '-f', fullpath, '/somepath/DestinationDirectory/' || NewName) 
FROM files
WHERE <criteria for files to process>

Don't overwrite:
Use '-n'

SELECT exec('mv', '-n', fullpath, '/somepath/DestinationDirectory/' || NewName) 
FROM files
WHERE <criteria for files to process> 

E.g.


SELECT exec('cp', '-n', fullpath, '/pythonBackups/' || name || '.bak') 
FROM files
WHERE extension = '.py'
  and modified >= date('now','start of day');
  • This example queries the 'modified' field: the file modification date as stored in Crab's index. If you need to identify recently modified files be sure to query recent scan data.
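
The date filter in the example above relies on two things: date(..., 'start of day') returns a 'YYYY-MM-DD' string, and ISO-style timestamps compare correctly as text. This is a minimal sketch of both, assuming Python's built-in sqlite3 module. A fixed timestamp replaces 'now' so the result is repeatable, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# date() with 'start of day' yields a plain 'YYYY-MM-DD' string.
(midnight,) = conn.execute(
    "SELECT date('2016-09-18T09:28:07', 'start of day')"
).fetchone()
print(midnight)

# The modified field is text, so >= is a string comparison.
conn.execute("CREATE TABLE files (name TEXT, modified TEXT)")
conn.executemany("INSERT INTO files VALUES (?, ?)", [
    ("today.py", "2016-09-18T09:28:07"),
    ("yesterday.py", "2016-09-17T23:59:59"),
])
recent = [row[0] for row in conn.execute(
    "SELECT name FROM files "
    "WHERE modified >= date('2016-09-18T09:28:07', 'start of day')"
)]
print(recent)
```

In plain SQLite, 'now' is evaluated in UTC unless the 'localtime' modifier is added. The source does not state which time zone Crab uses for stored timestamps, so results near midnight are worth checking.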

Delete files

Use '-f' to delete without confirmation prompts, because Crab doesn't show them

SELECT exec('rm', '-f', fullpath)
FROM files
WHERE type = 'f'
  and <criteria for files to delete> 

Rename files or directories in place

Append the new name to the current parentpath

Don't overwrite:
Use '-n'

SELECT exec('mv', '-n', fullpath, parentpath || replace(name, ' ', '_'))
FROM files
WHERE name like '% %' and type = 'f';
  • If renaming directories use  ORDER BY depth DESC  to rename child items before renaming their parents.
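
The reason for renaming deepest items first is that each row stores the path recorded at scan time, and renaming a parent first would invalidate the stored paths of its children. This is a minimal sketch of the same ordering in Python, working on a throwaway temporary directory. The function name and the sample tree are hypothetical, and it does not use Crab.

```python
import os
import tempfile

def rename_deepest_first(root):
    """Replace spaces with underscores in every name below root, deepest items first."""
    items = []
    for parent, dirs, files in os.walk(root):
        for name in dirs + files:
            relative = os.path.relpath(os.path.join(parent, name), root)
            depth = relative.count(os.sep) + 1   # 1 for direct children of root
            items.append((depth, parent, name))

    # Equivalent of ORDER BY depth DESC: children are handled before their
    # parents, so every recorded parent path is still valid when it is used.
    for depth, parent, name in sorted(items, key=lambda item: item[0], reverse=True):
        if " " in name:
            os.rename(os.path.join(parent, name),
                      os.path.join(parent, name.replace(" ", "_")))

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "my dir", "sub dir"))
    open(os.path.join(root, "my dir", "sub dir", "a file.txt"), "w").close()
    rename_deepest_first(root)
    renamed_ok = os.path.exists(
        os.path.join(root, "my_dir", "sub_dir", "a_file.txt")
    )
    print(renamed_ok)
```

Sorting shallowest first would rename "my dir" before its children, and the later renames would then fail because their recorded parent path no longer exists.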

Search file contents

FILESLINES TABLE   One entry for each line in each file scanned
Fields in both the fileslines table  and  the files table:
fileid, name, bytes, depth, accessed, modified, basename, extension, type, mode, parentpath, fullpath
Fields unique to the fileslines table:
linenumber Line number within the file, starting at 1 for each file int
data Text content of line text

Find all the lines that match a pattern in a specific file

SELECT fullpath, linenumber, data FROM fileslines
WHERE  fullpath = '/somepath/somedirectory/somefile.txt' 
  and data like '%TODO%';

Find lines that match criteria in a set of files

SELECT fullpath, linenumber, data FROM fileslines
WHERE extension = '.csv' and linenumber = 1 and data match '[A-Za-z]';

Count lines that match per file

SELECT fullpath, count(*) FROM fileslines
WHERE extension = '.c' and data like '%TODO%'
GROUP BY fullpath;

Count lines that match per directory

SELECT parentpath, count(*) FROM fileslines
WHERE extension = '.c' and data like '%TODO%'
GROUP BY parentpath;
  • File contents are not indexed
  • Querying file contents causes the files matched by the WHERE clause to be opened and read at query runtime.
  • To query files that contain non-UTF8 characters change the encoding setting. Some examples:
    %encoding UTF8:replace Substitutes a square for any non-UTF8 characters
    %encoding cp1252:replace Code page 1252
    %encoding utf8:skipfile (Default) Skips files that contain non-UTF8 characters, to avoid returning binary file contents
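
The difference between counting matches per file and per directory comes down to the GROUP BY field. This is a minimal sketch, assuming Python's built-in sqlite3 module and a stand-in fileslines table with hypothetical rows. In Crab the lines are read from the files at query time, which this sketch does not reproduce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fileslines "
    "(fullpath TEXT, parentpath TEXT, extension TEXT, linenumber INTEGER, data TEXT)"
)
conn.executemany("INSERT INTO fileslines VALUES (?, ?, ?, ?, ?)", [
    ("/src/a.c", "/src/", ".c", 1, "// TODO: tidy up"),
    ("/src/a.c", "/src/", ".c", 2, "int main(void) { return 0; }"),
    ("/src/a.c", "/src/", ".c", 3, "/* todo: add tests */"),   # like ignores case
    ("/src/b.c", "/src/", ".c", 1, "// TODO: remove"),
    ("/src/lib/c.c", "/src/lib/", ".c", 1, "// TODO: document"),
])

where = "WHERE extension = '.c' and data like '%TODO%' "

# One row per file that has at least one matching line.
per_file = conn.execute(
    "SELECT fullpath, count(*) FROM fileslines " + where +
    "GROUP BY fullpath ORDER BY fullpath"
).fetchall()

# One row per directory, summing matches across the files directly inside it.
per_dir = conn.execute(
    "SELECT parentpath, count(*) FROM fileslines " + where +
    "GROUP BY parentpath ORDER BY parentpath"
).fetchall()

print(per_file)
print(per_dir)
```

Grouping by parentpath counts only the files directly inside each directory, so /src/lib/ is reported separately from /src/.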

Batch scans and queries

Use the -batch option to scan or run queries without an interactive Crab command prompt.
Use it at the bash command line and in shell scripts.

Scan the root directory to a named .crdb file in batch mode
(scan and then exit crab)

$ crab -batch -db cdrive.crdb / 

To query in batch mode, enter SQL in double quotes at the end of the command line

$ crab -batch -maxdepth 1 . "SELECT fullpath FROM files WHERE type = 'f'"

Scans the current directory (no deeper) and lists the fullpath of every file

Use batch mode to call Crab in shell scripts

crab -batch ~/Downloads "SELECT name FROM files WHERE type = 'f' and mode like '%x%'"

Line from shell script that scans Downloads directory and subdirectories, and lists all executable files
  • Batch queries return results in list mode layout. Use -dict to change to dict mode, or -separator "|" to use a pipe character to separate fields in place of a comma.
  • A scan run from a shell script without a specified .crdb file will store the scan data in default.crdb

© 2017 Etia UK