pdbfgrep - join rows from one FSDB files into another

pdbfgrep provides a mechanism for doing a multi-match grep from two FSDB files, where the first is the stream to read and grep from (search through) and the second is a file containing a list of values from keys to match against. Similar to pdbaugment, pdbfgrep is designed to read a single file entirely into memory and use it search for rows in a second one that is read in a streaming style. In general, the smaller file should be used as the augment_file argument, and the larger as the stream_file when possible.

Example input file 1 (mygreptest.fsdb):

#fsdb -F t col1 two andthree
1   key1    42.0
2   key2    123.0
3   key3    90.2

Example input file 2 (grep-values.fsdb):

#fsdb -F t two additional_column
key1    blue
key3    brown

Example command usage

$ pdbfgrep -k two -- mygreptest.fsdb grep-values.fsdb

Example output

#fsdb -F t col1:a two:a andthree:a
1    key1    42.0
3    key3    90.2
#   | pdbfgrep --k two -- mygreptest.fsdb grep-values.fsdb

Example command usage – inverted grep

$ pdbfgrep -v -k two -- mygreptest.fsdb grep-values.fsdb

Example output

#fsdb -F t col1:a two:a andthree:a
2   key2    123.0
#   | pdbfgrep -v -k two -- mygreptest.fsdb grep-values.fsdb

Command Line Arguments

pdbfgrep - CLI interface

This script expects to grep from a FSDB stream (or file) using additional information found in another FSDB file. The second file is loaded entirely into memory in order to accomplish this. Any row in the stream file that has exactly matching keys in the second file will be output, otherwise dropped. This duplicates dbjoin/pdbrow or pdbaugment/pdbrow to a large extent, but pdbgrep should faster when one side is small because it avoids sorting.

pdbfgrep [-h] [-k KEYS [KEYS ...]] [-v] [stream_file] [augment_file] [output_file]

pdbfgrep positional arguments

  • stream_file (default: <_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>)

  • augment_file (default: <_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>)

  • output_file (default: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)

pdbfgrep options

  • -h, --help - show this help message and exit

  • -k KEYS, --keys KEYS - Keys to match against (default: ['timestamp', 'key'])

  • -v, --invert-match - Output the non-matching rows instead (similar to grep -v)