Data Tables

FILES TABLE


Details of every item scanned, file or directory

COLUMN DESCRIPTION
fileid  (primary key) files table row number, a unique id for each item
name item name e.g. 'Hei.ttf'
bytes item size in bytes e.g. 7502752
depth how far scan recursed to find the item, starts at 0
accessed datetime item was accessed
modified datetime item was modified
basename item name without path or extension, e.g. 'Hei'
extension item extension including the dot, e.g. '.ttf'
type item type, 'f' for file or 'd' for directory
mode further type info and permissions, e.g. 'drwxr-xr-x'
parentpath absolute path of directory containing the item, e.g. '/Library/Fonts/'
fullpath  (unique) parentpath of the item concatenated with its name, e.g. '/Library/Fonts/Hei.ttf'

All directory paths end with a slash, e.g.   /somedirectory/  
File paths do not e.g.   /somefile.txt

Just to remind you, to convert from bytes: KB = bytes/1e3, MB = bytes/1e6, GB = bytes/1e9, TB = bytes/1e12

FILESLINES TABLE


A virtual table giving searchable access to current contents of files Crab has scanned, one row for each line of data.

COLUMN DESCRIPTION
linenumber line number within file, restarts count from 1 at the first line of each file
data data content of the files, one entry for each line in the file

fileslines also includes columns from the files table: fileid, name, bytes, depth, accessed, modified, basename, extension, type, mode, parentpath, and fullpath. This lets you restrict files to be searched, without needing to join tables.

ITERATOR TABLE


A virtual table with one column, i, that takes the values 1,2,3,..,1000000000.

COLUMN DESCRIPTION
i 1, 2, 3, ...


On the first row i has the value 1, on the second row i has the value 2, and so on: 1, 2, 3, ...

By joining to the iterator table, and using i in the select clause, you turn each value of i into a separate row in your result set. Use it to turn columns in your data, into rows in your result set. E.g.

 

SELECT substr('The sun was shining',i,1)
FROM iterator
WHERE i<=length('The sun was shining');

CRAB> select substr('The sun was shining',i,1) from iterator where i<=length('The sun was shining');
substr('The sun was shining',i,1)
T
h
e

s
u
n

w
a
s

s
h
i
n
i
n
g

If you do this for every line of text in a file, you can group and count the results to make a character frequency count


SELECT substr(data,i,1), count(*)
FROM fileslines JOIN iterator ON i <= length(data)
WHERE name = '25021.txt' AND length(data)>0
GROUP by substr(data,i,1) ORDER BY count(*) DESC LIMIT 10;

CRAB> select substr(data,i,1), count(*) from fileslines join iterator on i <= length(data) where name = '25021.txt' and length(data)>0 group by substr(data,i,1) order by count(*) desc limit 10;
substr(data,i,1) count(*)
                  3449      
e                 2019      
t                 1643      
o                 1454      
r                 1240      
i                 1173      
n                 1166      
a                 1125      
s                 814       
h                 661  

© 2017 Etia UK