Data Tables

FILES TABLE


Details of every item scanned, file or directory

COLUMN DESCRIPTION
fileid  (primary key) files table row number, a unique id for each item
name item name e.g. 'simhei.ttf'
bytes item size in bytes e.g. 9751960
depth how far scan recursed to find the item, starts at 0
accessed datetime item was accessed
modified datetime item was modified
basename item name without path or extension, e.g. 'simhei'
extension item extension including the dot, e.g. '.ttf'
type item type, 'f' for file or 'd' for directory
mode further type info and permissions, e.g. 'drwxr-xr-x' On windows there is a second part to this field which shows file attributes such as hidden, archive, etc see below
parentpath absolute path of directory containing the item, e.g. 'C:\Windows\Fonts\'
fullpath  (unique) parentpath of the item concatenated with its name, e.g. 'C:\Windows\Fonts\simhei.ttf'

All directory paths end with a slash, e.g.   C:\tmp\   File paths do not e.g.   C:\tmp\somefile.txt

parentpath and fullpath don't support abbreviations such as . or .. They're just strings.

All columns are indexed except
mode.


mode field:

1st char object type (l is a link, d is a dir, - is a file)
chars 2 - 10 read, write, execute permissions: rwx (repeated three times, to match linux format)
char 11 colon ':'
chars 12+ file attributes



file attributes:

AARCHIVE
SSYSTEM
HHIDDEN
RREADONLY
INOT_CONTENT_INDEXED
VINTEGRITY_STREAM
XDEVICE
DDIRECTORY
NNORMAL
TTEMPORARY
PSPARSE_FILE
LREPARSE_POINT
CCOMPRESSED
OOFFLINE
EENCRYPTED
-VIRTUAL

FILESLINES TABLE


A virtual table giving searchable access to current contents of files Crab has scanned, one row for each line of data.

COLUMN DESCRIPTION
linenumber line number within file, restarts count from 1 at the first line of each file
data data content of the files, one entry for each line in the file

fileslines also includes columns from the files table: fileid, name, bytes, depth, accessed, modified, basename, extension, type, mode, parentpath, and fullpath. This lets you restrict files to be searched, without needing to join tables.

ITERATOR TABLE


A virtual table with one column, i, that takes the values 1,2,3,..,1000000000.

COLUMN DESCRIPTION
i 1, 2, 3, ...


On the first row i has the value 1, on the second row i has the value 2, and so on: 1, 2, 3, ...

By joining to the iterator table, and using i in the select clause, you turn each value of i into a separate row in your result set. Use it to turn columns in your data, into rows in your result set. E.g.

 

SELECT substr('The sun was shining',i,1)
FROM iterator
WHERE i<=length('The sun was shining');

CRAB> select substr('The sun was shining',i,1) from iterator where i<=length('The sun was shining');
substr('The sun was shining',i,1)
T
h
e

s
u
n

w
a
s

s
h
i
n
i
n
g

If you do this for every line of text in a file, you can group and count the results to make a character frequency count


SELECT substr(data,i,1), count(*)
FROM fileslines JOIN iterator ON i <= length(data)
WHERE name = '25021.txt' AND length(data)>0
GROUP by substr(data,i,1) ORDER BY count(*) DESC LIMIT 10;

CRAB> select substr(data,i,1), count(*) from fileslines join iterator on i <= length(data) where name = '25021.txt' and length(data)>0 group by substr(data,i,1) order by count(*) desc limit 10;
substr(data,i,1) count(*)
                  3449      
e                 2019      
t                 1643      
o                 1454      
r                 1240      
i                 1173      
n                 1166      
a                 1125      
s                 814       
h                 661  

© 2017 Etia UK