Download Crab

Crab is free for evaluation and . The unlicensed version sends a no-data ping so that we can count active users. Licensing deactivates the ping and gets you next-business-day support. Use is covered by the End User License Agreement.

Crab for macOS is compatible with OSX Mavericks (10.8) and higher.

Getting started

Setup

Crab doesn't need to be installed, only downloaded and given an alias.

  1. Click the Download button to get 'CrabHome.zip'

  2. You may now need to unzip it by double clicking in Finder, depending on which browser you use

  3. From your download directory copy/paste or drag the 'CrabHome' folder into your home directory. For user johnsmith that would be '/Users/johnsmith'

  4. Now you can run Crab from Terminal by typing its path   ~/CrabHome/CrabExe/crab   but to save typing and get full functionality it's strongly recommended to use a text editor to copy/paste one of the aliases below into the file called '.bash_profile' in your home directory.

    N.B. '.bash_profile' is normally hidden in Open/Save dialogs, as are all filenames that start with a dot. Press Cmd + Shift + .   to reveal them.

If you use Terminal with a light colored background

alias crab='~/CrabHome/CrabExe/rlwrap -m$'\''\n'\'' -b '\'' .'\'' -e '\'''\'' -i -f ~/CrabHome/CrabExe/crab_completions -b '\'' .(,'\'' -H ~/CrabHome/CrabData/crab_history --pass-sigint-as-sigterm ~/CrabHome/CrabExe/crab -wrapped'

If you use Terminal with a dark colored background

alias crab='~/CrabHome/CrabExe/rlwrap -m$'\''\n'\'' -b '\'' .'\'' -e '\'''\'' -i -f ~/CrabHome/CrabExe/crab_completions -b '\'' .(,'\'' -H ~/CrabHome/CrabData/crab_history --pass-sigint-as-sigterm ~/CrabHome/CrabExe/crab -wrapped -color darkmode'


  5. From the command line, apply the updated bash profile:

$ source ~/.bash_profile

You're good to go!

First scan


At the command line give Crab the paths of one or more directories you want scanned; their subdirectories will be scanned too. It's best to start by scanning a project directory rather than your whole disk, to get the hang of things. E.g. to scan a directory called myproject:

$  crab ~/myproject

If the path contains spaces, wrap it in single quotes:

$  crab '/Users/johnsmith/my project'

N.B. Ctrl + C will quit Crab or cancel a scan

When the scan is finished you'll get a count of files and directories scanned, and a CRAB> prompt where you can type SQL to report on files, and to process them.

First query


Crab uses the SQLite flavor of SQL, which is mostly ANSI standard. If you come from SQL Server, remember to end each query with a semicolon.

Crab stores every directory path with a slash on the end, so you can tell by looking at a path whether it is a file or a directory. E.g. '/Users/johnsmith/something' is a file, and '/Users/johnsmith/somethingElse/' is a directory.

To match the path for a directory you need to include the trailing slash. E.g. to list everything in the 'Backups' directory:

SELECT fullpath FROM files WHERE parentpath = '/Users/johnsmith/MyProject/Backups/';

To search recursively in a directory, use LIKE with a wildcard pattern. E.g. this query reports all .docx files anywhere in the 'Documentation' directory or its subdirectories:

SELECT fullpath FROM files WHERE parentpath like '/Users/johnsmith/MyProject/Documentation/%' and extension = '.docx';
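The difference between an exact match on parentpath and a LIKE prefix match can be tried outside Crab. The following is a minimal sketch using Python's built-in sqlite3 module and a hand-made stand-in for the files table. The table contents, the ROOT constant and the helper name direct_and_recursive are illustrative assumptions, not Crab's actual schema or data.

```python
import sqlite3

ROOT = "/Users/johnsmith/MyProject/"  # illustrative path only

def direct_and_recursive(directory):
    """Return (direct children, everything below) for a directory path
    that ends in a slash, using a tiny stand-in for a files table."""
    con = sqlite3.connect(":memory:")
    # Stand-in table with only the two columns this example needs (assumed).
    con.execute("CREATE TABLE files (fullpath TEXT, parentpath TEXT)")
    con.executemany(
        "INSERT INTO files VALUES (?, ?)",
        [
            (ROOT + "Docs/", ROOT),
            (ROOT + "Docs/a.docx", ROOT + "Docs/"),
            (ROOT + "Docs/old/", ROOT + "Docs/"),
            (ROOT + "Docs/old/b.docx", ROOT + "Docs/old/"),
            (ROOT + "DocsExtra/", ROOT),
            (ROOT + "DocsExtra/c.docx", ROOT + "DocsExtra/"),
        ],
    )
    # Exact match: only entries whose parent is this directory.
    direct = [r[0] for r in con.execute(
        "SELECT fullpath FROM files WHERE parentpath = ? ORDER BY fullpath",
        (directory,))]
    # Prefix match: % also matches zero characters, so direct children
    # are included along with everything in subdirectories.
    recursive = [r[0] for r in con.execute(
        "SELECT fullpath FROM files WHERE parentpath LIKE ? ORDER BY fullpath",
        (directory + "%",))]
    con.close()
    return direct, recursive
```

Calling direct_and_recursive(ROOT + "Docs/") returns the direct children first and the whole subtree second. Because the pattern keeps the trailing slash, the sibling directory 'DocsExtra' is not picked up by the prefix match.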

In the next section you'll learn about editing multi-line queries, and see some basic query examples.

Longer queries


Typing long commands at the command line can be awkward, whether it's Crab SQL, Bash, or another shell language.

If you type each query on one line you can press up-arrow on the keyboard to step through your query history. But it's usually clearer to lay out each query over multiple lines, like the examples on this website. To do this, write your query in a text editor such as TextEdit, then copy and paste it to the Crab command line. You can paste multi-line queries in one go; you don't need to do it line by line.


This query lists all files with a .zip extension that have "proposal" or "rfp" in their name:

SELECT fullpath FROM files 
WHERE extension = '.zip' and (name like '%proposal%' or name like '%rfp%') ;

If you get too many results add a LIMIT clause to your query, to restrict results to a specified number of rows.

This query lists the five biggest files scanned:

SELECT fullpath, bytes FROM files 
ORDER BY bytes DESC LIMIT 5;

This query sums file sizes and groups the total by the directory they're in, to give the five biggest directories:

SELECT parentpath, SUM(bytes) FROM files 
GROUP BY parentpath ORDER BY SUM(bytes) DESC LIMIT 5;

This query counts files by the year and month they were modified:

SELECT strftime('%Y-%m',modified) yrmon, count(*) FROM files 
WHERE type = 'f'
GROUP BY yrmon ORDER BY yrmon DESC;


Continue down this page to get a taste of the different ways you can use Crab, or

  • To see what data is available see Crab tables documentation

  • To get an introduction to the SQLite flavor of SQL see SQLite SELECT (offsite)

  • To see example Crab queries for lots of different scenarios, browse the Use Cases

Query tips

  • Ctrl + C will interrupt a running query

  • Path abbreviations such as . and .. aren't useful for matching paths because we're simply matching text. Instead use SQL string pattern matching with the LIKE keyword and wildcards % and _

  • Use the  %mode  command to change layout of query results. The default display of results is dict format, but it's often convenient to use  %mode line (each output field on a separate line) or  %mode list (comma separated output, with field names on header row)

  • A couple of offsite links with useful info:
        SQLite Core functions
        Advanced SQLite SELECT statements

  • If you want to pattern match with regex, use the match operator in place of like

  • SQL keywords aren't case sensitive in SQLite; queries here have some keywords written in upper case to make them more readable.
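The tips above on LIKE wildcards and on case can be checked with plain SQLite. This is a minimal sketch using Python's built-in sqlite3 module; the one-column files table and the helper name like_matches are assumptions made for the example, not part of Crab.

```python
import sqlite3

def like_matches(pattern, names):
    """Return the names that SQLite's LIKE operator matches against pattern."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE files (name TEXT)")  # stand-in table (assumed)
    con.executemany("INSERT INTO files VALUES (?)", [(n,) for n in names])
    # Keywords are deliberately lower case here: SQLite accepts either.
    rows = con.execute(
        "select name from files where name like ? order by name", (pattern,)
    ).fetchall()
    con.close()
    return [r[0] for r in rows]

names = ["report1.txt", "report12.txt", "Report2.TXT", "notes.txt"]
any_suffix = like_matches("report%", names)      # % matches any run of characters
one_char = like_matches("report_.txt", names)    # _ matches exactly one character
```

With SQLite's default settings LIKE ignores ASCII case, so 'Report2.TXT' matches both patterns, while 'report12.txt' matches only the first because _ stands for a single character.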

Exit Crab


At the CRAB> prompt just type

%quit

General tips


  • When editing text at the macOS command line, or at the CRAB> prompt, you can reuse earlier commands by pressing up-arrow on the keyboard. Undo is Ctrl + _ (underscore). To move the cursor word‑left and word‑right, use Alt + left and Alt + right.

  • If you start Crab without giving it any path to scan, you can continue querying the previous scan results

  • You can press Tab after the first two letters of most keywords to complete the keyword

  • You can write a query in a text editor like TextEdit and paste it into the Crab command line in one chunk, even if it's a multi-line query

  • Check out the "Documentation" menu at the top of this page. It has thorough reference info on Crab tables, launch options, commands and functions

  • To avoid your machine sync'ing to the cloud every time you scan, turn off sync of the CrabHome directory to Dropbox or other cloud services.

  • Use the %show command at the CRAB> prompt to see current Crab settings, including the date and path of the current scan

  • Use the %help command at the CRAB> prompt to get reference information on Crab tables and functions

Instant SQL and Bash scripts


Use the -batch option to run Crab without an interactive CRAB> prompt, either to scan directories or to execute SQL, from the macOS command line or from inside a Bash script.

In a Bash script or a scheduled job you'll have to use the full path for Crab, or add directory 'CrabHome/CrabExe/' to the PATH, because aliases only work at the macOS command line.


Unattended jobs

Use the -batch option for unattended jobs, such as a scheduled scan.

E.g. Here's a command which will scan your whole drive to a database file called 'wholedrive.crdb' then exit Crab.


crab -db wholedrive.crdb -batch /

To open an interactive query session on this scan data, start Crab with the same database file, but without a scan path

crab -db wholedrive.crdb



Crab SQL at the macOS Command Line

You can use the -batch option to execute a query at the macOS command line; we call this "Instant SQL". Results are returned to the macOS command line, allowing you to pipe output to other programs. Write the SQL inside quotes, after any Crab options and the scan path. Double quotes, as in the example below, let the SQL contain single-quoted strings.

E.g. This example will recursively scan the current directory (denoted by the dot) and return a list of all text files. Output is piped to the Bash 'more' command which shows the output one screenful at a time.

crab -batch . "SELECT fullpath FROM files WHERE extension = '.txt' " | more


Crab SQL in Bash scripts

You can also use the -batch option to execute Crab SQL in a Bash script. Typically you'd do this to include a query as part of a larger job, such as a data import process; or to save as script files any SQL queries you use frequently at the macOS command line.

E.g. The following line in a Bash script is part of a data import: It scans the 'Import' directory and copies any .csv files, removing the header row from each.

crab -db importScan.crdb -batch /somepath/Import/ "SELECT writeln(parentpath || basename || '-headless' || extension, data) FROM fileslines WHERE extension = '.csv' and linenumber > 1"


Note the use of a named scan database file, importScan.crdb, to avoid conflict with scan data from other processes.
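Aliases aren't available inside scripts, and that applies to scripts in other languages too. As a rough sketch, here is how a Python script might build the same kind of command using Crab's full path. The helper names crab_batch_args and run_crab_batch are hypothetical, and the argument order simply follows the examples on this page (options, then scan path, then SQL).

```python
import os
import subprocess

# Full path to the executable, because shell aliases don't apply in scripts.
CRAB = os.path.expanduser("~/CrabHome/CrabExe/crab")

def crab_batch_args(scan_path=None, sql=None, db=None):
    """Build the argument list for a non-interactive Crab run (hypothetical helper)."""
    args = [CRAB]
    if db:
        args += ["-db", db]      # named scan database file
    args.append("-batch")        # no interactive CRAB> prompt
    if scan_path:
        args.append(scan_path)
    if sql:
        args.append(sql)         # passed as one argument, so no shell quoting needed
    return args

def run_crab_batch(**kwargs):
    """Run Crab and return whatever it writes to standard output (requires Crab)."""
    result = subprocess.run(
        crab_batch_args(**kwargs), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing the SQL as a single list element avoids the nested quoting needed at the shell, since no shell is involved in the call.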

If your SQL is too complex to fit on one line you can save it as a Crab query script and load it with the -init option, e.g.

crab -db importScan.crdb -batch -init processImport.crab /somepath/Import/


Here is an example of a simple query that is useful at the macOS command line. It scans the current directory, and returns the fullpath of every object in it (it doesn't scan subdirectories because of -maxdepth 1).

crab -batch -maxdepth 1 . "SELECT fullpath FROM files"


If you use this frequently you can save it in a script file somewhere on your path. Then you can just type the name of the script whenever you want a list of full paths for the contents of your current directory.

Instant SQL tips


  • Crab's -maxdepth parameter specifies how deep you want to scan: -maxdepth 1 means the specified directory only, and -maxdepth 2 includes children one level down. The default is to scan the whole tree.

  • In batch mode Crab defaults to comma-separated output (list mode). You can change this with startup options such as -dict and -column. See "Documentation" menu, "Launching Crab" for details.

Querying file contents


You can search the contents of files, by querying the fileslines table. The fileslines table has the same fields as the files table, plus two extra fields: data and linenumber. The data field represents the text of all files scanned, one row per line. The linenumber field has the value 1 for the first line of each file.

fileslines is a 'virtual table': it doesn't index file contents, it reads the content of files at the time you query them. Filters on name, fullpath, extension and so on are applied before reading file contents, so files will only be read if they match the criteria.

E.g. This query shows all lines containing 'TODO' or 'FIXME' in .c files anywhere below the 'myproject' directory:


SELECT fullpath, linenumber, data FROM fileslines
WHERE parentpath like '/Users/johnsmith/myproject/%' and extension = '.c'
and (data like '%TODO%' or data like '%FIXME%');

Crab's default settings are configured for UTF-8 and ASCII text files. Any byte sequence that isn't valid UTF-8 causes your query to skip the rest of that file, so that the contents of binary files are excluded from query results.

To search files that have occasional exotic characters, change the %encoding setting from utf8:skipfile to utf8:ignore (filters out invalid characters) or utf8:replace (replaces invalid characters with a placeholder character).
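To get a feel for what the three encoding modes mean, Python's own UTF-8 decoder offers error handlers with similar names. This is only an analogy: the sketch below shows Python's behavior, and Crab's handling may differ in detail (for example in which placeholder character it uses).

```python
# Bytes that are valid UTF-8 except for one stray Latin-1 byte (0xE9).
raw = b"caf\xe9 menu"

def decode(data, errors):
    """Decode bytes as UTF-8 with the given error handler, or None on failure."""
    try:
        return data.decode("utf-8", errors=errors)
    except UnicodeDecodeError:
        return None  # comparable in spirit to giving up on the file

strict = decode(raw, "strict")     # fails on the invalid byte
ignored = decode(raw, "ignore")    # invalid byte is dropped
replaced = decode(raw, "replace")  # invalid byte becomes U+FFFD
```

Here the strict decode fails, "ignore" silently loses the character, and "replace" keeps a visible marker where the invalid byte was, which makes the damage easier to spot in query results.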

Processing files


Use Crab's exec() function to run an operating system command once for each query result row. It's typically used to run a command on the files returned by a query, using fullpath as one of the arguments.

exec's first argument should be the command name, or the full path of a program; followed by the command's arguments. Arguments are automatically wrapped in quotes, so you don't need to worry about paths that contain spaces.

E.g. This query copies files to the 'python backups' directory by running the bash 'cp' command with the -f option on every Python file modified since midnight yesterday.


SELECT exec('cp', '-f', fullpath, '/Users/johnsmith/python backups/')
FROM files
WHERE extension = '.py'
and modified >= date('now', 'start of day', '-1 day');


N.B. File modification dates are picked up at scan time, so you'll need a recent scan for this.
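The date expression in the query above can be checked in plain SQLite with a fixed timestamp in place of 'now'. This is a minimal sketch using Python's built-in sqlite3 module. The helper names are made up for the example, and the comparison assumes timestamps written as ISO 8601 text such as '2017-06-14 09:00:00'; this page doesn't say how Crab stores the modified field.

```python
import sqlite3

MODIFIERS = "'start of day', '-1 day'"

def midnight_yesterday(now):
    """Evaluate date(now, 'start of day', '-1 day') for a fixed timestamp."""
    con = sqlite3.connect(":memory:")
    (value,) = con.execute("SELECT date(?, " + MODIFIERS + ")", (now,)).fetchone()
    con.close()
    return value

def modified_since(modified, now):
    """True if an ISO 8601 text timestamp is on or after midnight yesterday."""
    con = sqlite3.connect(":memory:")
    (flag,) = con.execute(
        "SELECT ? >= date(?, " + MODIFIERS + ")", (modified, now)
    ).fetchone()
    con.close()
    return bool(flag)
```

For example, with now set to '2017-06-15 10:30:00' the cut-off is '2017-06-14', so a file modified at '2017-06-14 09:00:00' qualifies and one modified at '2017-06-13 23:59:59' does not.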

Don't use exec() with commands that give run-time prompts, because exec() doesn't display them. E.g. when copying or moving files, use the -f option (which means always overwrite) or the -n option (which means never overwrite).

E.g. To move all scanned spreadsheets to a directory called 'AllMySpreadsheets' without overwriting anything:

SELECT exec('mv', '-n', fullpath, '/somepath/AllMySpreadsheets/')
FROM files
WHERE extension like '.xls%';



Be aware: A query without a WHERE clause would process every file scanned.

Always try a dummy run: Use the Bash 'echo' command to check the logic is correct for any queries that process files:

SELECT exec('echo', 'cp', '-f', fullpath, '/Users/johnsmith/python backups/')
FROM files
WHERE extension = '.py'
and modified >= date('now', 'start of day', '-1 day');

Tips for processing files


  • Remember that after you move or delete files, Crab's scan data will be out of date; you'll need to scan again before processing the file in its new location.

  • If files may have been moved or deleted since the last scan, you can avoid error messages by using the pathexists() function to check that the file or directory is still there at the time the query is run. E.g. to move a bunch of files when some have been deleted since the last scan:

    SELECT exec('mv', '-f', fullpath, '/somepath2/DestinationDirectory/')
    FROM files
    WHERE fullpath like '/somepath/%' and extension = '.tmp' and pathexists(fullpath);


  • exec() writes every command that is executed to your screen, together with its output. If you are running many commands this will be slow, and the screen will be cluttered.

    You can avoid this by throwing away the output; use the following command before running the query:

    %output /dev/null

    Any error messages will still go to the screen, as will subsequent CRAB> prompts

    To restore output to the screen do this:

    %output stdout

Whole disk scans


Each time you scan with Crab the previous scan data is overwritten. To avoid overwriting a scan of the whole disk, you should give it a name so it won't be overwritten by the next quick scan of some project directory.

You do this with Crab's -db option. It stores scan results in a Crab database file you specify (by default scan results are stored in 'CrabHome/CrabData/default.crdb')

This command will scan your whole disk and store the scan data in file 'wholedisk.crdb'


crab -batch -db ~/CrabHome/CrabData/wholedisk.crdb /

The -batch option scans without giving a CRAB> prompt afterwards; use it if you want to run the scan as a scheduled job.

To query this scan data from an interactive CRAB> prompt, specify the database file with -db without giving any path to scan

crab -db ~/CrabHome/CrabData/wholedisk.crdb

Whole disk query tips


  • By default Crab won't scan mounted drives it meets during a scan. Use the -mount option when launching Crab to change this.

  • If you have millions of entries in the scan data, you can make queries faster by avoiding wildcards on the left side of search strings, so Crab can use indexes

    E.g. this is slow

    SELECT fullpath FROM files
    WHERE fullpath like '%/myproject/%Budget%.xls';

    This is fast

    SELECT fullpath FROM files
    WHERE fullpath like '/Users/johnsmith/myproject/%Budget%.xls';
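The effect of a leading wildcard can be seen in SQLite's own query planner. The sketch below uses Python's built-in sqlite3 module with a stand-in table. The schema, the index name and the helper plan_for are assumptions made for the example, so it shows how plain SQLite behaves rather than how Crab's database is actually indexed.

```python
import sqlite3

def plan_for(where_clause):
    """Return SQLite's query-plan text for a query on a stand-in files table."""
    con = sqlite3.connect(":memory:")
    # Assumed schema: NOCASE collation lets SQLite's default (case-insensitive)
    # LIKE use the index when the pattern starts with fixed text.
    con.execute("CREATE TABLE files (fullpath TEXT COLLATE NOCASE)")
    con.execute("CREATE INDEX files_fullpath ON files (fullpath)")
    rows = con.execute(
        "EXPLAIN QUERY PLAN SELECT fullpath FROM files WHERE " + where_clause
    ).fetchall()
    con.close()
    # The human-readable plan step is the fourth column of each row.
    return " | ".join(row[3] for row in rows)

slow = plan_for("fullpath LIKE '%/myproject/%Budget%.xls'")
fast = plan_for("fullpath LIKE '/Users/johnsmith/myproject/%Budget%.xls'")
```

In this setup the plan for the leading-wildcard pattern begins with SCAN (every row is examined), while the plan for the fixed-prefix pattern begins with SEARCH (the index narrows the range first).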

Comparing contents


SQL's set based logic is well suited to comparing contents of directories and files.

E.g. Here is a query that lists any files in directory A that aren't in directory B


SELECT a.name FROM files a
WHERE a.parentpath = '/Users/johnsmith/A/'
and a.name not in (SELECT b.name FROM files b
WHERE b.parentpath = '/Users/johnsmith/B/');

For more examples like this, see the "Compare Directories" page on this website, under the "Use Case" menu.

This query lists any lines in fileA.txt that aren't in fileB.txt

SELECT a.data FROM fileslines a
WHERE a.fullpath = '/Users/johnsmith/fileA.txt'
and a.data not in (SELECT b.data FROM fileslines b
WHERE b.fullpath = '/Users/johnsmith/fileB.txt');

© 2017 Etia UK