FSDB Class Description

class pyfsdb.Fsdb(filename=None, file_handle=None, return_type=1, out_file=None, out_file_handle=None, pass_comments='y', out_command_line='____INTERNAL____', write_nones_as_blanks=True, column_names=None, converters=None, save_command_history=True, out_column_names=None, handle_compressed=True, no_auto_conversion=False)

Reads and writes FSDB-formatted files, as produced by the Perl FSDB module.

(see the fsdb module documentation for full details)

append(row=None)

Writes a passed in row (or the one previously read) to the output file.

close(copy_comments_from=None)

Writes final processing command comment to the output file and closes it.

property column_names

An array of column names for the file being read

comment(line)

Adds a comment to an output FSDB file.

Whether the comment is added, and where it is placed, depends on the value of pass_comments.

property comments

Returns a list of comments seen in the document

convert_separator_token(separator_token=None)

Converts a separator ("\t") into a separator_token ("t").

property converters

The list of conversion routines.

It may be an array, with a converter per column, or a dict with a converter per named column.

This must be set before the file is opened/read.

Useful converters may include int, float, etc.

Note: if a converter throws an exception, a value of None will be placed into the returned row instead.
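
The behaviour described above can be illustrated with a small standalone sketch. It does not use pyfsdb and is not its implementation; the helper name apply_converters and the sample column names are hypothetical, chosen only to show the list form, the dict form, and the None-on-exception rule.

```python
def apply_converters(row, converters, column_names=None):
    """Return a converted copy of row; failed conversions become None."""
    converted = []
    for index, value in enumerate(row):
        if isinstance(converters, dict):
            # Dict form: look the converter up by column name.
            converter = converters.get(column_names[index])
        else:
            # List form: one converter per column position.
            converter = converters[index] if index < len(converters) else None
        if converter is None:
            # No converter for this column: keep the raw value.
            converted.append(value)
            continue
        try:
            converted.append(converter(value))
        except Exception:
            # Mirrors the documented rule: a failing converter yields None.
            converted.append(None)
    return converted


row = ["1", "2.5", "oops"]
as_list = apply_converters(row, [int, float, int])
as_dict = apply_converters(
    row, {"ratio": float}, column_names=["count", "ratio", "note"]
)
```

In the list form the third value cannot be parsed by int, so it becomes None; in the dict form only the named column is converted and the others pass through unchanged.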

create_header_line(columns=None, separator_token=None, init_row=None)

Returns a header string for the stored column_names and separator/separator_token.

extend(rows=None)

Writes multiple rows to the output FSDB

property file_handle

The input file handle being read.

filter(fn, args=[])

Applies a function to the input stream rows, and saves the output results back to the output fsdb handle.

foreach(fn, return_results=True, args=[])

Applies a function fn to each row, returning an aggregate list of results if desired.
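
As a rough illustration of this pattern, the standalone sketch below applies a function to each row and optionally collects the results. It is not pyfsdb's code: the calling convention fn(row, *args) and the None return value when return_results is false are assumptions made for the example.

```python
def foreach(rows, fn, return_results=True, args=()):
    """Call fn on every row, optionally collecting what it returns."""
    results = []
    for row in rows:
        # Assumption: extra arguments are passed positionally after the row.
        result = fn(row, *args)
        if return_results:
            results.append(result)
    return results if return_results else None


def scaled_sum(row, factor):
    return sum(row) * factor


rows = [[1, 2], [3, 4]]
totals = foreach(rows, scaled_sum, args=[10])
nothing = foreach(rows, scaled_sum, return_results=False, args=[10])
```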

get_all()

Read all the data into memory and return it as an array of rows.

get_column_name(column_number)

Given an integer column number, returns its column name.

get_column_number(column_name)

Given a column_name, returns its integer index into an array of values.

get_column_numbers(column_names)

Given a list of column_names, returns a list of integer indexes into an array of values.
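
The relationship between column names and column numbers can be sketched without pyfsdb as a pair of lookups. The column names and row values below are made up for the example, and zero-based numbering is assumed because the numbers are described as indexes into an array of values.

```python
# Hypothetical column names; in pyfsdb these would come from the file header.
column_names = ["timestamp", "host", "bytes"]

# Build the name -> number mapping once.
name_to_number = {name: number for number, name in enumerate(column_names)}


def get_column_number(column_name):
    return name_to_number[column_name]


def get_column_name(column_number):
    return column_names[column_number]


def get_column_numbers(names):
    return [name_to_number[name] for name in names]


row = ["1700000000", "example.org", "512"]
host_index = get_column_number("host")
# Pick values out of a row in an arbitrary column order.
selected = [row[i] for i in get_column_numbers(["bytes", "timestamp"])]
```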

get_pandas(usecols=None, comment='#', data_has_comment_chars=False, **kwargs)

Returns a pandas dataframe for the given data. Warning: this cannot preserve comments in the files; FSDB comments are stripped from the output. Any other args will be passed to pandas.read_csv()

guess_converters(example_row)

Returns a best-guess list of converters after determining whether floats/ints exist in the dataset.

property header_line

The top #fsdb header line in the file being read.

property headers

Headers for the file handle being read.

maybe_open_filehandle(mode='r')

Internal

next_as_array()

Generator function to return a row as an array.

Using a generator is faster than using the Fsdb object as an iterator.

next_as_dict()

Generator function to return a row as a dict.

Using a generator is faster than using the Fsdb object as an iterator.
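
To make the idea of a row generator concrete, here is a minimal standalone sketch that yields each data line as a dict keyed by column name. It is an illustration only, not pyfsdb's implementation; the function name rows_as_dicts is hypothetical, and it assumes tab-separated data in which every line starting with '#' is a header or comment.

```python
def rows_as_dicts(lines, column_names, separator="\t"):
    """Yield one dict per data line, skipping '#' lines."""
    for line in lines:
        if line.startswith("#"):
            # Header line and comments carry no row data.
            continue
        values = line.rstrip("\n").split(separator)
        yield dict(zip(column_names, values))


sample = [
    "#fsdb -F t a b\n",
    "1\t2\n",
    "3\t4\n",
    "# | example command\n",
]
rows = list(rows_as_dicts(sample, ["a", "b"]))
```

Because the function is a generator, rows are produced one at a time and the whole file never has to be held in memory unless the caller collects it, as list() does here.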

property out_column_names

An array of column names for the output file being written.

This must be set prior to writing the first row, and modifies the internal output_header_line too.

property out_command_line

The output trailing command to print as the last line.

The out_command_line is printed with a '# | ' prefix to preserve the history of the command run. It defaults to " ".join(sys.argv). Set it to None if you wish to suppress printing of the line entirely.

property out_file

The output file being written to (if one is being written)

property out_file_handle

The output file handle being written to (if one is being written)

property out_header_line

The top #fsdb header line to write to the output.

property out_separator

The separator for the output.

Changing this will also change the stored out_separator_token value.

This should not be changed after the header has already been written.

property out_separator_token

The separator token for the output.

Changing this will also change the stored out_separator value.

This should not be changed after the header has already been written.

parse_commands()

Parses the list of stored comments for any saved commands.

Note: Assumes saved commands will be prefixed with ‘# |’ per convention

Returns a list of strings when commands can be found.

Returns None when we don’t have the information yet, such as when we have a non-seekable stream input.
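
The '# |' convention can be illustrated with a short standalone sketch. It is not pyfsdb's code: passing None to stand in for "comments not available yet" and stripping the prefix from the returned strings are both assumptions made for the example.

```python
def parse_commands(comments):
    """Extract saved commands from a list of comment lines."""
    if comments is None:
        # Stand-in for the case where the comments have not been read yet.
        return None
    prefix = "# |"
    commands = []
    for comment in comments:
        if comment.startswith(prefix):
            # Assumption: return the command text without the prefix.
            commands.append(comment[len(prefix):].strip())
    return commands


comments = [
    "# an ordinary comment\n",
    "# | dbcol host bytes\n",
    "# | dbsort -n bytes\n",
]
commands = parse_commands(comments)
```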

parse_separator(separator=None)

Converts a separator token ("t") into the separator it stands for ("\t").

put_all(rows)

Reads a list of rows and appends them to the FSDB file.

put_pandas(df)

Saves a pandas dataframe to the output file.

read_commands_ahead()

Reads the command list at the bottom of the input stream if the input stream can seek.

Returns a list of commands found in comments in the input stream, or None when the input is not seekable.

read_header(line=None)

Internal

Returns a dict of header -> column numbers.

The header line should be in the form:

#fsdb -option value column1(separator)column2…

Returns:

    [0, {
        names: { colname: colnum, … },
        numbers: { colnum: colname, … },
        header: { separator: separator_string }
    }] on success

    [-1, "error description"] on failure
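
The shape of this return value can be illustrated with a simplified standalone parser. It is a sketch under stated assumptions, not pyfsdb's implementation: it only understands the -F option, it assumes every option takes exactly one value, it assumes the column names in the header are whitespace-delimited, and it only translates the "t" token (tab) because that is the one mapping this documentation confirms.

```python
def read_header(line):
    """Parse a '#fsdb ...' header into the [status, info] shape shown above."""
    tokens = line.split()
    if not tokens or tokens[0] != "#fsdb":
        return [-1, "header line does not start with #fsdb"]

    separator_token = None
    position = 1
    # Consume '-option value' pairs that precede the column names.
    while position < len(tokens) and tokens[position].startswith("-"):
        if position + 1 >= len(tokens):
            return [-1, "option %s is missing a value" % tokens[position]]
        if tokens[position] == "-F":
            separator_token = tokens[position + 1]
        position += 2

    columns = tokens[position:]
    if not columns:
        return [-1, "no column names found"]

    # Only the tab token is translated; anything else is passed through.
    separator = {"t": "\t"}.get(separator_token, separator_token)
    names = {name: number for number, name in enumerate(columns)}
    numbers = {number: name for number, name in enumerate(columns)}
    return [0, {"names": names, "numbers": numbers,
                "header": {"separator": separator}}]


status, info = read_header("#fsdb -F t host bytes")
failure = read_header("host bytes")
```

Returning a status code alongside the payload lets the caller tell a malformed header apart from a valid one without catching exceptions.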

row_as_string(row=None)

Converts an array row to an FSDB output line.
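
A minimal standalone sketch of this conversion follows. It is illustrative rather than pyfsdb's actual code: the trailing newline and the tab default are assumptions, and the write_nones_as_blanks flag is modelled on the constructor argument of the same name.

```python
def row_as_string(row, separator="\t", write_nones_as_blanks=True):
    """Join the values of a row into one output line."""
    fields = []
    for value in row:
        if value is None and write_nones_as_blanks:
            # Emit an empty field rather than the text 'None'.
            fields.append("")
        else:
            fields.append(str(value))
    # Assumption: each output line ends with a newline.
    return separator.join(fields) + "\n"


blank_line = row_as_string(["a", None, 3])
literal_line = row_as_string(["a", None, 3], write_nones_as_blanks=False)
```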

property separator

The 'separator_token' is the argument that comes after -F in the fsdb header, and the separator is its translation; e.g., for a tab-based separator the separator_token would be the 't' character and the separator would be '\t'.

Changing this will also change the stored separator_token value.

property separator_token

The separator_token character for the file being read or written.

The 'separator_token' is the argument that comes after -F in the fsdb header; e.g., for a tab-based separator the separator_token would be the 't' character and the separator would be '\t'.

Changing this will also change the stored separator value.

set_iterator_function()

XXX: change this to a property