Introduction
PyFSDB is a Python implementation of the (Perl) FSDB flat-file streaming database. See also my C implementation (under development).
Installation
Using pip (or pipx):
pip3 install pyfsdb
Or manually:
git clone git@github.com:gawseed/pyfsdb.git
cd pyfsdb
python3 setup.py build
python3 setup.py install
Example Usage
The FSDB file format contains headers and footers that supplement the data within a file. The most common field separator is a tab, but the format can also wrap CSVs and other data types (see the FSDB documentation for full details). The footers trace all the piped commands that were used to create a file, thus documenting the history of its creation within the file's own metadata.
Example FSDB file
The following is an example FSDB-formatted file that is tab-separated
(-F t) with three columns: col1 (a long/integer), two (a
string), and andthree (a double/float).
#fsdb -F t col1:l two:s andthree:d
1 key1 42.0
2 key2 123.0
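To make the header line concrete, here is a minimal standard-library sketch that splits it into a separator code, column names, and per-column converters. The `parse_header` function and the `TYPE_MAP` table are hypothetical helpers written for this illustration; they are not part of pyfsdb, and they assume only the `l` and `d` type codes need converting, with every other column left as a string.

```python
# Hypothetical helper, NOT part of pyfsdb: parse an FSDB header line by hand.
TYPE_MAP = {"l": int, "d": float}  # assumption: only the codes used above


def parse_header(line):
    """Return (separator_code, column_names, converters) for a header line."""
    parts = line.split()
    if not parts or parts[0] != "#fsdb":
        raise ValueError("not an FSDB header: %r" % line)
    separator_code = None
    columns = []
    converters = {}
    i = 1
    while i < len(parts):
        if parts[i] == "-F":
            # The separator flag consumes the following token (e.g. "t").
            separator_code = parts[i + 1]
            i += 2
            continue
        # Column specs look like "name" or "name:type".
        name, _, type_code = parts[i].partition(":")
        columns.append(name)
        if type_code in TYPE_MAP:
            converters[name] = TYPE_MAP[type_code]
        i += 1
    return separator_code, columns, converters


sep, columns, converters = parse_header("#fsdb -F t col1:l two:s andthree:d")
print(sep)
print(columns)
print(converters)
```

The sketch deliberately ignores every header flag other than `-F`; the FSDB documentation remains the reference for the full format.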
Example code for reading an FSDB file
The following code reads the above file (stored in myfile.fsdb) and
prints the column names it finds, the converters derived from the
type specifiers (:l, :s, and :d), and each row (in array format).
Code:
import pyfsdb
db = pyfsdb.Fsdb("myfile.fsdb")
print(db.column_names)
print(db.converters)
for row in db:
    print(row)
Output:
['col1', 'two', 'andthree']
{'col1': <class 'int'>, 'andthree': <class 'float'>}
[1, 'key1', 42.0]
[2, 'key2', 123.0]
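Because each row arrives as a plain list, pairing it with the column names is a natural next step. The sketch below hard-codes the values printed above so that it runs without the input file; in real use they would presumably come from `db.column_names` and from iterating over `db`. The `records` name is just an illustrative choice.

```python
# Values copied from the output above so the example is self-contained.
column_names = ['col1', 'two', 'andthree']
rows = [[1, 'key1', 42.0], [2, 'key2', 123.0]]

# Pair every row with the column names to get one dictionary per record.
records = [dict(zip(column_names, row)) for row in rows]

for record in records:
    print(record['two'], record['andthree'])
```

Looking fields up by name rather than by position keeps the code working if the column order of the input file changes.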
Example of writing an FSDB-formatted file
Code:
import pyfsdb
db = pyfsdb.Fsdb(out_file="myfile.fsdb")
db.out_column_names=('one', 'two')
db.append([4, 'hello world'])
db.close()
Output:
#fsdb -F t one:l two
4 hello world
A larger example
The real power of FSDB comes from the suites of tools that all exchange FSDB-formatted files. This allows chaining multiple commands together to achieve a goal. Though the original base set of tools is written in Perl, you don’t need to know Perl to use most of them.
Let’s create a ./mydemo.py script:
import sys, pyfsdb
db = pyfsdb.Fsdb(file_handle=sys.stdin, out_file_handle=sys.stdout)
value_column = db.get_column_number('value')
for row in db: # reads a row from the input stream
    row[value_column] = float(row[value_column]) * 2
    db.append(row) # sends the row to the output stream
db.close()
And then feed it this file (saved as test.fsdb):
#fsdb -F t col1 value
1 42.0
2 123.0
We can run it like this:
# cat test.fsdb | ./mydemo.py
#fsdb -F t col1 value
1 84.0
2 246.0
# | ./mydemo.py
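For readers who want to see what such a filter does to the stream itself, here is a rough standard-library approximation of the script's behavior. It is not how pyfsdb is implemented: `double_value_column` is a hypothetical function, it assumes the header starts with exactly `#fsdb -F t`, it assumes tab-separated data rows that follow the header, and it writes the trailing command-history line by hand to mimic the output shown above.

```python
import io


def double_value_column(in_stream, out_stream, column="value"):
    """Copy an FSDB stream, doubling one column (tab-separated rows assumed)."""
    index = None
    for line in in_stream:
        line = line.rstrip("\n")
        if line.startswith("#fsdb"):
            # Assumption: header is "#fsdb -F t <columns...>", so the
            # column specs start at the fourth whitespace-separated token.
            names = [spec.split(":")[0] for spec in line.split()[3:]]
            index = names.index(column)
            out_stream.write(line + "\n")
        elif line.startswith("#") or not line:
            # Pass comment and blank lines through untouched.
            out_stream.write(line + "\n")
        else:
            fields = line.split("\t")
            fields[index] = str(float(fields[index]) * 2)
            out_stream.write("\t".join(fields) + "\n")
    # Assumption: imitate the command-history trailer seen in the output.
    out_stream.write("# | ./mydemo.py\n")


source = io.StringIO("#fsdb -F t col1 value\n1\t42.0\n2\t123.0\n")
sink = io.StringIO()
double_value_column(source, sink)
print(sink.getvalue(), end="")
```

Using in-memory streams keeps the example runnable anywhere; passing `sys.stdin` and `sys.stdout` instead would turn it into a pipe filter.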
Or chain it together with multiple FSDB commands. Note that the details of the chain are recorded at the bottom of the output file.
# cat test.fsdb | ./mydemo.py | dbcolstats value | dbcol mean stddev sum min max | dbfilealter -R C
#fsdb -R C mean stddev sum min max
mean: 165
stddev: 114.55
sum: 330
min: 84
max: 246
# | ./mydemo.py
# | dbcolstats value
# | dbcol mean stddev sum min max
# | dbfilealter -R C
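The statistics in this output can be checked by hand with Python's standard `statistics` module. This sketch assumes that the reported stddev is the sample standard deviation (dividing by n - 1); that assumption is consistent with the 114.55 shown above, whereas the population form would give 81.

```python
import statistics

# The doubled 'value' column produced by ./mydemo.py in the earlier run.
values = [84.0, 246.0]

mean = statistics.mean(values)
stddev = statistics.stdev(values)    # sample standard deviation (n - 1)
pstddev = statistics.pstdev(values)  # population form, for comparison

print("mean:", mean)
print("stddev:", round(stddev, 2))
print("sum:", sum(values))
print("min:", min(values), "max:", max(values))
```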