Details of every item scanned, file or directory
|fileid (primary key)||files table row number, a unique id for each item|
|name||item name e.g. 'simhei.ttf'|
|bytes||item size in bytes e.g. 9751960|
|depth||how far scan recursed to find the item, starts at 0|
|accessed||datetime item was accessed|
|modified||datetime item was modified|
|basename||item name without path or extension, e.g. 'simhei'|
|extension||item extension including the dot, e.g. '.ttf'|
|type||item type, 'f' for file or 'd' for directory|
|mode||further type info and permissions, e.g. 'drwxr-xr-x' On windows there is a second part to this field which shows file attributes such as hidden, archive, etc see below|
|parentpath||absolute path of directory containing the item, e.g. 'C:\Windows\Fonts\'|
|fullpath (unique)||parentpath of the item concatenated with its name, e.g. 'C:\Windows\Fonts\simhei.ttf'|
All directory paths end with a slash, e.g. C:\tmp\ File paths do not e.g. C:\tmp\somefile.txt
parentpath and fullpath don't support abbreviations such as . or .. They're just strings.
All columns are indexed except mode.
|1st char||object type (l is a link, d is a dir, - is a file)|
|chars 2 - 10||read, write, execute permissions: rwx (repeated three times, to match linux format)|
|char 11||colon ':'|
|chars 12+||file attributes|
A virtual table giving searchable access to current contents of files Crab has scanned, one row for each line of data.
|linenumber||line number within file, restarts count from 1 at the first line of each file|
|data||data content of the files, one entry for each line in the file|
fileslines also includes columns from the files table: fileid, name, bytes, depth, accessed, modified, basename, extension, type, mode, parentpath, and fullpath. This lets you restrict files to be searched, without needing to join tables.
A virtual table with one column, i, that takes the values 1,2,3,..,1000000000.
|i||1, 2, 3, ...|
On the first row i has the value 1, on the second row i has the value 2, and so on: 1, 2, 3, ...
By joining to the iterator table, and using i in the select clause, you turn each value of i into a separate row in your result set. Use it to turn columns in your data, into rows in your result set. E.g.
SELECT substr('The sun was shining',i,1)
WHERE i<=length('The sun was shining');
CRAB> select substr('The sun was shining',i,1) from iterator where i<=length('The sun was shining');
substr('The sun was shining',i,1)
If you do this for every line of text in a file, you can group and count the results to make a character frequency count
SELECT substr(data,i,1), count(*)
FROM fileslines JOIN iterator ON i <= length(data)
WHERE name = '25021.txt' AND length(data)>0
GROUP by substr(data,i,1) ORDER BY count(*) DESC LIMIT 10;
CRAB> select substr(data,i,1), count(*) from fileslines join iterator on i <= length(data) where name = '25021.txt' and length(data)>0 group by substr(data,i,1) order by count(*) desc limit 10;