Download Crab

Crab is free for evaluation. The unlicensed version sends a no-data ping so that we can count active users. Licensing deactivates this and gets you Next Business Day support. Use is covered by the End User License Agreement.

Crab for Windows is compatible with Windows 7, 8 and 10.

Getting started

Unzip

Crab doesn't need to be installed, only unzipped.

  1. Click the Download button to get 'crab.zip'

  2. Open 'crab.zip' and copy/paste or drag the 'CrabHome' folder to where you want it. Most people put it in their home directory; for user johnsmith that would be 'C:\Users\johnsmith\CrabHome'. It's best to choose somewhere that isn't synced to cloud services like Dropbox, to avoid syncing scan data.

You're good to go!

First scan


Start the command line, e.g. from the Windows 7 Start button enter cmd in the "Search programs and files" box. If you prefer PowerShell, you can use that instead.

To start Crab you'll need to use its full path, or add Crab to your system path (see General tips). You'll need to give Crab the paths of one or more directories you want scanned; their subdirectories will be scanned too. It's best to start by scanning a project directory rather than your whole disk, to get the hang of things.

The Crab executable is inside the CrabExe subdirectory of CrabHome. To scan a directory called 'MyProject', user johnsmith would type this at the Windows command line:

C:\>  C:\Users\johnsmith\CrabHome\CrabExe\crab    C:\Users\johnsmith\MyProject

If the path contains spaces, wrap it in double quotes:

C:\>  C:\Users\johnsmith\CrabHome\CrabExe\crab    "C:\Users\johnsmith\My Project"

N.B. Ctrl + Break will quit Crab or cancel a scan.

When the scan is finished you'll get a count of files and directories scanned, and a CRAB> prompt where you can type SQL to report on files, and to process them.

First query


Crab uses the SQLite flavor of SQL, which is mostly ANSI standard. If you come from SQL Server, remember to end each query with a semicolon.

Crab stores every directory path with a backslash on the end. You can tell by looking at a path whether it is a file or a directory. E.g. 'C:\Users\johnsmith\something' is a file, and 'C:\Users\johnsmith\somethingElse\' is a directory.

To match the path for a directory you need to include the backslash. E.g. to list everything in the 'Backups' directory:

SELECT fullpath FROM files WHERE parentpath = 'C:\Users\johnsmith\MyProject\Backups\';

To search recursively in a directory, use LIKE with a wildcard pattern. E.g. this query reports all .docx files anywhere in the 'Documentation' directory or its subdirectories:

SELECT fullpath FROM files WHERE parentpath like 'C:\Users\johnsmith\MyProject\Documentation\%' and extension = '.docx';
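
If you'd like to see the difference between an exact match and a recursive LIKE match without running a scan, here is a minimal sketch using Python's built-in sqlite3 module. The table is a made-up stand-in with only the columns used above, and the rows are invented; it is not Crab's real schema or data.

```python
import sqlite3

# Stand-in for Crab's files table: only columns mentioned on this page,
# filled with invented rows. Real scan data comes from Crab itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (fullpath TEXT, parentpath TEXT, extension TEXT)")
root = "C:\\Users\\johnsmith\\MyProject\\Documentation\\"
conn.executemany("INSERT INTO files VALUES (?, ?, ?)", [
    (root + "intro.docx", root, ".docx"),
    (root + "api\\", root, ""),  # a directory: stored with a trailing backslash
    (root + "api\\reference.docx", root + "api\\", ".docx"),
    (root + "api\\notes.txt", root + "api\\", ".txt"),
])

# Exact match on parentpath: direct children of the directory only.
direct = conn.execute(
    "SELECT fullpath FROM files WHERE parentpath = ?", (root,)
).fetchall()

# LIKE with a trailing % also matches every deeper parentpath.
recursive = conn.execute(
    "SELECT fullpath FROM files WHERE parentpath LIKE ? AND extension = '.docx'",
    (root + "%",),
).fetchall()

print("direct children:", [row[0] for row in direct])
print("recursive .docx:", [row[0] for row in recursive])
```

The exact match returns the file and subdirectory directly inside 'Documentation', while the LIKE pattern also reaches the .docx file one level down.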

In the next section you'll learn about editing multi-line queries, and see some basic query examples.

Longer queries


Typing long commands at the command line can be awkward, whether it's Crab SQL, PowerShell, or another shell language.

If you type each query on one line you can press up-arrow on the keyboard to step through your query history. But it's usually clearer to lay out each query over multiple lines, like the examples on this website. To do this, write your query in a text editor such as Notepad, then copy and paste it to the Crab command line. You can paste multi-line queries in one go; you don't need to do it line-by-line.

Tip: To copy and paste in the command line, Windows 7 and 8 require you to use the mouse and the right click menu. Windows 10 doesn't have this problem. If you're on Windows 7 or 8 we recommend setting Windows "QuickEdit Mode", see General tips.


This query lists all files with a .zip extension that have "proposal" or "rfp" in their name:

SELECT fullpath FROM files 
WHERE extension = '.zip' and (name like '%proposal%' or name like '%rfp%') ;

If you get too many results add a LIMIT clause to your query, to restrict results to a specified number of rows.

This query lists the five biggest files scanned:

SELECT fullpath, bytes FROM files 
ORDER BY bytes DESC LIMIT 5;

This query sums file sizes and groups the total by the directory they're in, to give the five biggest directories:

SELECT parentpath, SUM(bytes) FROM files 
GROUP BY parentpath ORDER BY SUM(bytes) DESC LIMIT 5;

This query counts files by the year and month they were modified:

SELECT strftime('%Y-%m',modified) yrmon, count(*) FROM files 
WHERE type = 'f'
GROUP BY yrmon ORDER BY yrmon DESC;
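
To try the grouping queries on a small scale, here is a sketch that runs them with Python's sqlite3 module against an invented table. It assumes the modified field holds ISO-style date text and that directories have type 'd'; both are assumptions for the illustration, so check the Crab tables documentation for the real details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented stand-in table; column names are taken from the queries above.
conn.execute(
    "CREATE TABLE files (fullpath TEXT, parentpath TEXT, bytes INTEGER, "
    "modified TEXT, type TEXT)"
)
conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", [
    ("C:\\p\\a.txt", "C:\\p\\", 100, "2017-03-01 10:00:00", "f"),
    ("C:\\p\\b.txt", "C:\\p\\", 300, "2017-03-15 09:30:00", "f"),
    ("C:\\p\\sub\\", "C:\\p\\", 0, "2017-04-02 08:00:00", "d"),  # assumed type
    ("C:\\p\\sub\\c.txt", "C:\\p\\sub\\", 250, "2017-04-02 08:00:00", "f"),
])

# Total bytes per directory, biggest first.
biggest_dirs = conn.execute(
    "SELECT parentpath, SUM(bytes) FROM files "
    "GROUP BY parentpath ORDER BY SUM(bytes) DESC LIMIT 5"
).fetchall()

# File counts per year and month of modification, newest first.
by_month = conn.execute(
    "SELECT strftime('%Y-%m', modified) yrmon, count(*) FROM files "
    "WHERE type = 'f' GROUP BY yrmon ORDER BY yrmon DESC"
).fetchall()

print(biggest_dirs)
print(by_month)
```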


Continue down this page to get a taste of the different ways you can use Crab, or

  • To see what data is available see Crab tables documentation

  • To get an introduction to the SQLite flavor of SQL see SQLite SELECT (offsite)

  • To see example Crab queries for lots of different scenarios, browse the Use Cases

Query tips

  • Ctrl + C will interrupt a running query

  • Path abbreviations such as . and .. aren't useful for matching paths because we're simply matching text. Instead use SQL string pattern matching with the LIKE keyword and wildcards % and _

  • Use the %mode command to change the layout of query results. The default display of results is dict format, but it's often convenient to use %mode line (each output field on a separate line) or %mode list (comma separated output, with field names on a header row)

  • A couple of offsite links with useful info:
        SQLite Core functions
        Advanced SQLite SELECT statements

  • If you want to pattern match with regex, use the match operator in place of like

  • SQL keywords aren't case sensitive in SQLite; queries here have some keywords written in upper case to make them more readable.
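
The wildcard tips above can be checked with a few one-line comparisons. This sketch uses Python's sqlite3 module, so it shows how stock SQLite behaves; it assumes Crab hasn't changed LIKE's default settings. The matches() helper is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def matches(value, pattern):
    """Return True if value LIKE pattern holds in stock SQLite."""
    return bool(conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()[0])

one_char = matches("report1.txt", "report_.txt")    # _ matches exactly one character
two_chars = matches("report10.txt", "report_.txt")  # two characters, so no match
any_run = matches("C:\\a\\b\\report.txt", "C:\\a\\%")  # % matches any run of characters
dotdot = matches("C:\\a\\..\\report.txt", "C:\\report.txt")  # .. is treated as plain text
like_case = matches("README.TXT", "readme.txt")     # LIKE ignores ASCII case by default
equals_case = bool(
    conn.execute("SELECT ? = ?", ("README.TXT", "readme.txt")).fetchone()[0]
)  # = compares text exactly

print(one_char, two_chars, any_run, dotdot, like_case, equals_case)
```

Note the last two results: in stock SQLite, LIKE ignores the case of ASCII letters by default, but = compares text exactly.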

Exit Crab


At the CRAB> prompt just type

%quit

General tips


  • When editing text at the Windows command line, or at the CRAB> prompt, you can reuse earlier commands by pressing up-arrow on the keyboard

  • If you start Crab without giving it any path to scan, you can continue querying the previous scan results

  • You can press Tab after the first two letters of most keywords to complete the keyword

  • You can write a query in a text editor like Notepad and paste it into the Crab command line in one chunk, even if it's a multi-line query

  • To save having to type the whole path for crab, just add the CrabExe directory to your system PATH. You can do this from Windows Control Panel: Edit environment variables. For example if John Smith put CrabHome in his home directory he'd add this to his system PATH:


    C:\Users\johnsmith\CrabHome\CrabExe\

  • Check out the "Documentation" menu at the top of this page. It has thorough reference info on Crab tables, launch options, commands and functions

  • If non-ASCII characters are not displaying properly in your command line, use the Windows chcp command to change the code page, e.g. to work with utf8 characters use the command chcp 65001 before starting Crab.

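  • To copy and paste in the Windows command line, there are some settings that make life easier. On Windows 7 or 8, turn on "QuickEdit Mode": right-click the command window's title bar, choose Properties, and tick QuickEdit Mode on the Options tab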


  • Use the %show command at the CRAB> prompt to see current Crab settings, including the date and path of the current scan

  • Use the %help command at the CRAB> prompt to get reference information on Crab tables and functions

Instant SQL and Windows batch files


Use the -batch option to run Crab without an interactive Crab prompt. This lets you scan directories or execute SQL from the Windows command line or from inside a Windows batch file.

If you're not running the batch job under your own account, you may need to use a full path for the crab executable, or add directory 'CrabHome\CrabExe\' to the PATH.


Unattended jobs

Use the -batch option to create unattended jobs, such as a scheduled scan.

E.g. Here's a command which will scan the whole C: drive to a database file called 'cdrive.crdb' then exit Crab.


crab -db cdrive.crdb -batch C:\

To open an interactive query session on this scan data, start Crab with the same database file, but without a scan path:

crab -db cdrive.crdb



Crab SQL at the Windows Command Line

You can use the -batch option to execute a query at the Windows command line. We call this "Instant SQL". Results are returned to the Windows command line, allowing you to pipe output to other programs. Write the SQL inside double quotes, after any crab options and the scan path.

E.g. This example will recursively scan the current directory (denoted by the dot) and return a list of all text files. Output is piped to the Windows 'more' command which shows the output one screenful at a time.

crab -batch . "SELECT fullpath FROM files WHERE extension = '.txt' " | more
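
If you'd rather call Instant SQL from a script than pipe it to 'more', the same command can be assembled programmatically. The sketch below is in Python and rests on assumptions: the helper names are made up, the sample output is invented, and the parser assumes batch output is plain comma separated text with a header row, as this page describes for %mode list. How Crab quotes values that contain commas isn't covered here, so treat the parsing as a starting point.

```python
import csv
import io
import subprocess

def instant_sql_args(sql, scan_path=".", db=None):
    """Build the argument list: crab options, then scan path, then the SQL."""
    args = ["crab"]
    if db is not None:
        args += ["-db", db]
    args += ["-batch", scan_path, sql]
    return args

def parse_list_output(text):
    """Parse comma separated output with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

args = instant_sql_args("SELECT fullpath FROM files WHERE extension = '.txt'")

# list2cmdline shows the quoting the Windows command line expects.
print(subprocess.list2cmdline(args))

# Invented output, for illustration only. A real run would look like:
#   subprocess.run(args, capture_output=True, text=True).stdout
sample = "fullpath\nC:\\work\\notes.txt\nC:\\work\\todo.txt\n"
print(parse_list_output(sample))
```

Passing the arguments as a list means Python adds the double quotes around the SQL for you.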


Crab SQL in Windows batch files

You can also use the -batch option to execute Crab SQL in a Windows batch file. Typically you'd do this to include a query as part of a larger job, such as a data import process; or to save as batch files any SQL queries you use frequently at the Windows command line.

E.g. The following line in a Windows batch file is part of a data import: It scans the 'Import' directory and copies any .csv files, removing the header row from each.

crab -db importScan.crdb -batch C:\somepath\Import\ "SELECT writeln(parentpath || basename || '-headless' || extension, data) FROM fileslines WHERE extension = '.csv' and linenumber > 1"


Note the use of a named scan database file, importScan.crdb, to avoid conflict with scan data from other processes.

If your SQL is too complex to fit on one line you can save it as a Crab query script and load it with the -init option, e.g.

crab -db importScan.crdb -batch -init processImport.crab C:\somepath\Import\


Here is an example of a simple query that is useful at the Windows command line. It scans the current directory, and returns the fullpath of every object in it (it's not recursive because of -maxdepth 1).

crab -batch -maxdepth 1 . "SELECT fullpath FROM files"


If you use this frequently you can save it in a batch file somewhere on your path. Then you can just type the name of the batch file whenever you want a list of full paths for the contents of your current directory.

Instant SQL tips

  • The % character has a special meaning in Windows batch files. If you use Instant SQL in a Windows batch file you must write each % as %%, even inside quotes, e.g.

        crab -batch . "select fullpath from files where name like '%%June%%' "

  • Crab's -maxdepth parameter specifies how deep you want to scan: -maxdepth 1 means the specified directory only, and -maxdepth 2 includes children one level down. The default is to scan the whole tree.

  • In batch mode Crab defaults to comma separated output (%mode list). You can change this with startup options such as -dict and -column. See the "Documentation" menu, "Launching Crab" for details.
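
If you generate batch files from a script, the % doubling described in the first tip is easy to automate. Here is a tiny sketch in Python; the function name is made up for the illustration.

```python
def escape_for_batch_file(sql):
    """Double every % so a Windows batch file passes a single % on to Crab."""
    return sql.replace("%", "%%")

query = "select fullpath from files where name like '%June%'"
line = 'crab -batch . "{}"'.format(escape_for_batch_file(query))
print(line)
```

Only apply this to text that will be saved in a batch file. At the interactive command line a single % is used, as in the earlier examples.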

Querying file contents


You can search the contents of files, by querying the fileslines table. The fileslines table has the same fields as the files table, plus two extra fields: data and linenumber. The data field represents the text of all files scanned, one row per line. The linenumber field has the value 1 for the first line of each file.

fileslines is a 'virtual table': it doesn't index file contents, it reads the content of files at the time you query them. Filters on name, fullpath, extension and so on are applied before reading file contents, so files will only be read if they match the criteria.

E.g. This query shows all lines containing 'TODO' or 'FIXME' in .c files anywhere below the 'MyProject' directory:


SELECT fullpath, linenumber, data FROM fileslines
WHERE parentpath like 'C:\Users\johnsmith\MyProject\%' and extension = '.c'
and (data like '%TODO%' or data like '%FIXME%');
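
One way to picture a virtual table like fileslines is as a generator that filters on file attributes first and opens a file only if it passes. The Python sketch below is a conceptual model with invented file names, not Crab's implementation. Note too that Python's "in" test is case sensitive, unlike LIKE.

```python
import os
import tempfile

def fileslines(paths, extension):
    """Yield (fullpath, linenumber, data), reading only files that pass the filter."""
    for path in paths:
        if os.path.splitext(path)[1] != extension:
            continue  # filtered out before the file is ever opened
        with open(path, encoding="utf-8") as handle:
            for linenumber, line in enumerate(handle, start=1):
                yield path, linenumber, line.rstrip("\n")

# Invented files in a temporary directory, for illustration only.
tmp = tempfile.mkdtemp()
c_file = os.path.join(tmp, "main.c")
txt_file = os.path.join(tmp, "notes.txt")
with open(c_file, "w", encoding="utf-8") as f:
    f.write("int main(void) {\n    /* TODO: parse args */\n    return 0;\n}\n")
with open(txt_file, "w", encoding="utf-8") as f:
    f.write("TODO: buy milk\n")

hits = [
    (os.path.basename(path), linenumber, data)
    for path, linenumber, data in fileslines([c_file, txt_file], ".c")
    if "TODO" in data or "FIXME" in data
]
print(hits)
```

The 'notes.txt' file contains 'TODO' as well, but it is never read because it fails the extension filter.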


Crab's default settings are configured for UTF-8 and ASCII text files. Any byte sequence that isn't valid UTF-8 will cause your query to skip the rest of the file, so as to exclude the contents of binary files from query results.

To search files that have occasional exotic characters, change the %encoding setting from utf8:skipfile to utf8:ignore (filters out invalid characters) or utf8:replace (replaces invalid characters with a replacement character).
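
To get a feel for the three settings, here is how the equivalent choices look when decoding bytes in Python. This is an analogy only: the setting names resemble Python's decode error handlers, but whether Crab uses them internally is an assumption, and the decode_line() helper is made up.

```python
raw = b"caf\xc3\xa9 \xff menu"  # valid UTF-8 "café", then one stray invalid byte

def decode_line(data, mode):
    """Mimic the three settings with Python's UTF-8 decode error handlers."""
    if mode == "skipfile":
        try:
            return data.decode("utf-8")
        except UnicodeDecodeError:
            return None  # the caller would stop reading this file
    return data.decode("utf-8", errors=mode)  # "ignore" or "replace"

skipped = decode_line(raw, "skipfile")
ignored = decode_line(raw, "ignore")
replaced = decode_line(raw, "replace")

print(skipped)
print(repr(ignored))
print(repr(replaced))
```

In this sketch "ignore" silently drops the bad byte, while "replace" leaves a visible marker where it was.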

Processing files


Use Crab's exec() function to run an operating system command once for each query result row. It's typically used to run a command on the files returned by a query, using fullpath as one of the arguments.

exec's first argument should be the command name or the full path of a program, followed by the command's arguments. Arguments are automatically wrapped in quotes, so you don't need to worry about paths that contain spaces.

E.g. This query copies files to the 'python backups' directory by running the Windows 'copy' command with the /y option on every Python file modified since midnight yesterday.

SELECT exec('copy', '/y', fullpath, 'C:\somepath\python backups\')
FROM files WHERE extension = '.py'
and modified >= date('now', 'start of day', '-1 day');


N.B. File modification dates are picked up at scan time, so you'll need a recent scan for this.

Don't use exec() with commands that give run-time prompts, because exec() doesn't display them. For example, when copying or moving files use the '/y' option (meaning yes, go ahead and overwrite without prompting), or use Crab's pathexists() function to avoid name collisions altogether.

E.g. To move all scanned spreadsheets to a directory called 'AllMySpreadsheets' without overwriting anything:

SELECT exec('move', fullpath, 'C:\somepath\AllMySpreadsheets\')
FROM files
WHERE extension like '.xls%'
and not pathexists('C:\somepath\AllMySpreadsheets\' || name);


Here we are using SQLite string concatenation, ||, to join the file name to the target directory, then using Crab's pathexists() function to see if a file with that name already exists at query run time. The 'move' is only executed if no such file exists.
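
To see how a per-row function and a run-time existence check combine, here is a dry-run sketch in Python. It registers stand-in exec() and pathexists() functions on an ordinary SQLite connection. These stand-ins are assumptions written for the illustration, not Crab's own functions: the fake exec() only records the command it would run, and the table and file names are invented.

```python
import os
import sqlite3
import tempfile

commands = []  # dry run: record commands instead of running them

def fake_exec(*args):
    """Stand-in for exec(): record the command, one call per result row."""
    commands.append(args)
    return 0

conn = sqlite3.connect(":memory:")
conn.create_function("exec", -1, fake_exec)  # -1 accepts any number of arguments
conn.create_function("pathexists", 1, lambda p: int(os.path.exists(p)))
conn.execute("CREATE TABLE files (fullpath TEXT, name TEXT, extension TEXT)")

# Temporary destination directory that already holds one clashing file.
dest = tempfile.mkdtemp() + os.sep
open(dest + "budget.xls", "w").close()

conn.executemany("INSERT INTO files VALUES (?, ?, ?)", [
    ("C:\\in\\budget.xls", "budget.xls", ".xls"),
    ("C:\\in\\sales.xlsx", "sales.xlsx", ".xlsx"),
    ("C:\\in\\notes.txt", "notes.txt", ".txt"),
])

conn.execute(
    "SELECT exec('move', fullpath, ?) FROM files "
    "WHERE extension LIKE '.xls%' AND NOT pathexists(? || name)",
    (dest, dest),
).fetchall()

print(commands)
```

Only 'sales.xlsx' is handed to the fake exec(): 'budget.xls' already exists at the destination and 'notes.txt' fails the extension filter.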

Be aware: A query without a WHERE clause would process every file scanned.

Always try a dummy run: Use the Windows 'echo' command to check the logic is correct for any queries that process files:

SELECT exec('echo', 'copy', '/y', fullpath, 'C:\somepath\python backups\')
FROM files WHERE extension = '.py'
and modified >= date('now', 'start of day', '-1 day');

Tips for processing files

  • Remember that after you move or delete files, Crab's scan data will be out of date; you'll need to scan again before processing the file in its new location.

  • Crab scans hidden files too, but many Windows commands won't operate on them. You can use WHERE mode not like '%H%' to exclude them from processing, or exec('attrib','-s','-h', fullpath) to unhide them. See Documentation: Tables to read about the mode field.

  • If files may have been moved or deleted since the last scan, you can avoid error messages by using the pathexists() function to check that the file or directory is still there at the time the query is run. E.g. to move a bunch of files when some have been deleted since the last scan:

    SELECT exec('move', '/y', fullpath, 'C:\somepath2\DestinationDirectory\')
    FROM files
    WHERE fullpath like 'C:\somepath\%' and extension = '.tmp' and pathexists(fullpath);


  • exec() writes every command that is executed to your screen, together with its output. If you are running many commands this will be slow, and the screen will be cluttered.

    You can avoid this by throwing away the output; use the following command before running the query:

    %output NUL

    Any error messages will still go to the screen, as will subsequent CRAB> prompts.

    To restore output to the screen:

    %output stdout

  • Crab successfully scans and reports files that have very long paths or control characters in their names. However, some Windows commands cannot handle them. Crab provides a shortpath() function that converts any path to 8.3 format, allowing you to get around these restrictions. E.g. to delete a file with a very long path:

    SELECT exec('del', shortpath(fullpath))
    FROM files
    WHERE fullpath like '%master\node_modules\%\ansi-regex\package.json';

    Always test which files match the WHERE clause before running commands on them.

Whole disk scans


Each time you scan with Crab, the previous scan data is overwritten. To keep a whole disk scan from being overwritten by the next quick scan of some project directory, store it in its own named database file.

You do this with Crab's -db option, which stores scan results in a Crab database file you specify. By default, scan results are stored in 'CrabHome\CrabData\default.crdb'.

So run your whole disk scan like this:

    crab -batch -db C:\Users\johnsmith\CrabHome\CrabData\wholedisk.crdb  c:\

The -batch option scans without giving a CRAB> prompt afterwards, as required for a scheduled job.

To query this scan data from an interactive CRAB> prompt, just use -db to specify the database file, without giving any path to scan:

crab -db C:\Users\johnsmith\CrabHome\CrabData\wholedisk.crdb

Whole disk query tips


  • By default Crab won't scan mounted drives it meets during a scan. Use the -mount option when launching Crab to change this.

  • If you have millions of entries in the scan data, you can make queries faster by avoiding wildcards on the left side of search strings, so Crab can use indexes.

    E.g. this is slow

    SELECT fullpath FROM files
    WHERE fullpath like '%\myproject\%Budget%.xls';


    This is fast

    SELECT fullpath FROM files
    WHERE fullpath like 'C:\Users\johnsmith\myproject\%Budget%.xls';
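
You can see why a leading wildcard matters by asking stock SQLite for its query plan. The Python sketch below uses a made-up table and index. How Crab indexes its own database isn't documented on this page, so the NOCASE index here is an assumption chosen because it lets stock SQLite use an index for LIKE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: LIKE ignores case by default, so stock SQLite only uses
# an index for it when the indexed column has NOCASE collation.
conn.execute("CREATE TABLE files (fullpath TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX idx_fullpath ON files (fullpath)")

def plan(pattern):
    """Return SQLite's query plan text for a LIKE filter on fullpath."""
    # The pattern is inlined as a literal so the planner can inspect it.
    sql = (
        "EXPLAIN QUERY PLAN SELECT fullpath FROM files "
        "WHERE fullpath LIKE '{}'".format(pattern)
    )
    return " ".join(row[-1] for row in conn.execute(sql))

slow = plan("%\\myproject\\%Budget%.xls")
fast = plan("C:\\Users\\johnsmith\\myproject\\%Budget%.xls")

print("slow:", slow)
print("fast:", fast)
```

A plan that starts with SCAN visits every entry, while SEARCH means SQLite can jump straight to the matching range of the index.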


Comparing contents


SQL's set based logic is well suited to comparing contents of directories and files.

E.g. Here is a query that lists any files in directory A that aren't in directory B:


SELECT a.name FROM files a
WHERE a.parentpath = 'C:\Users\johnsmith\A\'
and a.name not in (SELECT b.name FROM files b
WHERE b.parentpath = 'C:\Users\johnsmith\B\') ;

For more examples like this, see the "Compare Directories" page on this website, under the "Use Case" menu.
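
Here is a small sketch of the directory comparison, run with Python's sqlite3 module against an invented stand-in for the files table. The directory contents are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (parentpath TEXT, name TEXT)")
dir_a = "C:\\Users\\johnsmith\\A\\"
dir_b = "C:\\Users\\johnsmith\\B\\"
conn.executemany("INSERT INTO files VALUES (?, ?)", [
    (dir_a, "plan.docx"), (dir_a, "budget.xls"), (dir_a, "old.txt"),
    (dir_b, "plan.docx"), (dir_b, "budget.xls"), (dir_b, "new.txt"),
])

# Names present in A that have no counterpart in B.
only_in_a = [row[0] for row in conn.execute(
    "SELECT a.name FROM files a WHERE a.parentpath = ? "
    "AND a.name NOT IN (SELECT b.name FROM files b WHERE b.parentpath = ?)",
    (dir_a, dir_b),
)]
print(only_in_a)
```

Swapping the two directory arguments gives the files that are only in B.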


This query lists any lines in fileA.txt that aren't in fileB.txt:

SELECT a.data FROM fileslines a
WHERE a.fullpath = 'C:\Users\johnsmith\fileA.txt'
and a.data not in (SELECT b.data FROM fileslines b
WHERE b.fullpath = 'C:\Users\johnsmith\fileB.txt') ;

© 2017 Etia UK