FSDB Class Description¶
- class pyfsdb.Fsdb(filename=None, file_handle=None, return_type=1, out_file=None, out_file_handle=None, pass_comments='y', out_command_line='____INTERNAL____', write_nones_as_blanks=True, column_names=None, converters=None, save_command_history=True, out_column_names=None, handle_compressed=True, no_auto_conversion=False)¶
Reads FSDB files from the Perl FSDB module.
(see the fsdb module documentation for full details)
- append(row=None)¶
Writes a passed-in row (or the one previously read) to the output file.
- close(copy_comments_from=None)¶
Writes final processing command comment to the output file and closes it.
- property column_names¶
An array of column names for the file being read
- comment(line)¶
Adds a comment to an output FSDB file.
Whether the comment is added, and where it is placed, depends on the value of pass_comments.
- property comments¶
Returns a list of comments seen in the document
- convert_separator_token(separator_token=None)¶
Converts a separator (a tab character, "\t") into a separator_token ("t")
- property converters¶
The list of conversion routines.
It may be an array, with a converter per column, or a dict with a converter per named column.
This must be set before the file is opened/read.
Useful converters include int, float, etc.
Note: if a converter throws an exception, a value of None will be placed into the returned row instead.
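The list and dict forms described above, together with the None-on-exception rule, can be sketched in plain Python. This is an illustration of the documented behaviour only, not pyfsdb's implementation; `apply_converters` and its `column_names` parameter are hypothetical names introduced for the example.

```python
def apply_converters(row, converters, column_names=None):
    """Return a copy of row with converters applied; failures become None.

    Hypothetical helper for illustration; not part of pyfsdb.
    """
    converted = list(row)
    if isinstance(converters, dict):
        # dict form: column name -> converter (column_names is required here)
        indexed = {column_names.index(name): fn for name, fn in converters.items()}
    else:
        # list form: one converter per column, matched by position
        indexed = dict(enumerate(converters))
    for index, fn in indexed.items():
        try:
            converted[index] = fn(row[index])
        except Exception:
            # mirror the documented rule: a failing converter yields None
            converted[index] = None
    return converted


# list form: the third value cannot be parsed as an int
print(apply_converters(["1", "2.5", "oops"], [int, float, int]))
# dict form: only the named column is converted
print(apply_converters(["a", "7"], {"count": int}, ["name", "count"]))
```

Columns without a converter are left untouched in this sketch, so a dict only needs entries for the columns that should change type.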
- create_header_line(columns=None, separator_token=None, init_row=None)¶
Returns a header string for the stored column_names and separator/separator_token.
- extend(rows=None)¶
Writes multiple rows to the output FSDB
- property file_handle¶
The input file handle being read.
- filter(fn, args=[])¶
Applies a function to the input stream rows, and saves the output results back to the output fsdb handle.
- foreach(fn, return_results=True, args=[])¶
Applies a function fn to each row, returning an aggregate list of results if desired.
- get_all()¶
Read all the data into memory and return it as an array of rows.
- get_column_name(column_number)¶
Given an integer column number, returns its column name.
- get_column_number(column_name)¶
Given a column_name, returns its integer index into an array of values.
- get_column_numbers(column_names)¶
Given a list of column_names, returns a list of integer indexes into an array of values.
- get_pandas(usecols=None, comment='#', data_has_comment_chars=False, **kwargs)¶
Returns a pandas dataframe for the given data. Warning: this cannot preserve comments in the files; FSDB comments are stripped from the output. Any other args will be passed to pandas.read_csv()
- guess_converters(example_row)¶
Returns a best-guess list of converters after determining whether floats/ints exist in the dataset.
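One plausible way to make such a guess is to try int first, then float, for each value in the example row. The sketch below assumes that approach and assumes `str` as the fallback for non-numeric values; neither detail is confirmed by this page, and `guess_converters_sketch` is a hypothetical name.

```python
def guess_converters_sketch(example_row):
    """Guess a converter per column by trial parsing (illustrative only)."""
    converters = []
    for value in example_row:
        for candidate in (int, float):
            try:
                candidate(value)
            except (TypeError, ValueError):
                continue  # this type does not fit; try the next one
            converters.append(candidate)
            break
        else:
            # assumption: anything that is neither int nor float stays a string
            converters.append(str)
    return converters


print(guess_converters_sketch(["3", "2.5", "abc"]))
```

Trying int before float matters: every integer string also parses as a float, so the narrower type has to be tested first.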
- property header_line¶
The top #fsdb header line in the file being read.
- property headers¶
Headers for the file handle being read.
- maybe_open_filehandle(mode='r')¶
Internal
- next_as_array()¶
Generator function to return a row as an array.
Using a generator is faster than using the Fsdb object as an iterator.
- next_as_dict()¶
Generator function to return a row as a dict.
Using a generator is faster than using the Fsdb object as an iterator.
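The difference between an array row and a dict row can be shown with two small generators over raw lines. This is a standalone sketch, not pyfsdb code: `rows_as_arrays` and `rows_as_dicts` are hypothetical names, and the sketch assumes a tab separator and that every line starting with "#" is a header or comment.

```python
def rows_as_arrays(lines, separator="\t"):
    """Yield each data line as a list of string values."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip the #fsdb header and comment lines
        yield line.rstrip("\n").split(separator)


def rows_as_dicts(lines, column_names, separator="\t"):
    """Yield each data line as a dict keyed by column name."""
    for row in rows_as_arrays(lines, separator):
        yield dict(zip(column_names, row))


lines = ["#fsdb -F t name count\n", "alice\t3\n", "# | some command\n"]
print(list(rows_as_arrays(lines)))
print(list(rows_as_dicts(lines, ["name", "count"])))
```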
- property out_column_names¶
An array of column names for the output file being written.
This must be set prior to writing the first row, and modifies the internal output_header_line too.
- property out_command_line¶
The output trailing command to print as the last line.
The out_command_line is printed with a "# | " prefix to preserve the history of the commands run. It defaults to " ".join(sys.argv). Set it to None to suppress printing of the line entirely.
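The three cases described above (default, explicit string, None) can be sketched as follows. This only mirrors the documented behaviour; `trailing_comment` and the `_DEFAULT` sentinel are hypothetical names, and the library itself uses a different internal sentinel value.

```python
import sys

_DEFAULT = object()  # hypothetical stand-in for "caller did not set a value"


def trailing_comment(out_command_line=_DEFAULT):
    """Build the final '# | ...' comment line, or None when suppressed."""
    if out_command_line is None:
        return None  # printing of the line is suppressed
    if out_command_line is _DEFAULT:
        out_command_line = " ".join(sys.argv)  # documented default
    return "# | " + out_command_line


print(trailing_comment("myscript.py -k name input.fsdb"))
print(trailing_comment(None))
```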
- property out_file¶
The output file being written to (if one is being written)
- property out_file_handle¶
The output file handle being written to (if one is being written)
- property out_header_line¶
The top #fsdb header line to write to the output.
- property out_separator¶
The separator for the output.
Changing this will also change the stored out_separator_token value.
This should not be changed after the header has already been written.
- property out_separator_token¶
The separator token for the output.
Changing this will also change the stored out_separator value.
This should not be changed after the header has already been written.
- parse_commands()¶
Parses the list of stored comments for any saved commands.
Note: Assumes saved commands will be prefixed with ‘# |’ per convention
Returns a list of strings when commands can be found.
Returns None when we don’t have the information yet, such as when we have a non-seekable stream input.
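The "# |" convention can be illustrated with a small filter over a list of comment lines. This sketch is not the library's code: `extract_commands` is a hypothetical name, and stripping the prefix from the returned strings is an assumption, since this page does not say whether the prefix is kept.

```python
def extract_commands(comments):
    """Return the command text from comments that start with '# |'."""
    prefix = "# |"
    commands = []
    for comment in comments:
        text = comment.strip()
        if text.startswith(prefix):
            # assumption: the prefix is dropped from the returned command
            commands.append(text[len(prefix):].strip())
    return commands


comments = ["# just a note\n", "# | dbcol a b\n", "# | dbsort a\n"]
print(extract_commands(comments))
```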
- parse_separator(separator=None)¶
Converts a separator_token ("t") into a separator (a tab character, "\t")
- put_all(rows)¶
Reads a list of rows and appends them to the FSDB file.
- put_pandas(df)¶
Saves a pandas dataframe to the output file.
- read_commands_ahead()¶
Reads the command list at the bottom of the input stream if the input stream can seek.
Returns a list of commands found in comments in the input stream. Returns None when the input is not seekable.
- read_header(line=None)¶
Internal
Returns a dict of header -> column numbers.
The header line should be in the form:
#fsdb -option value column1(separator)column2…
- Returns:
- [0, {names: {colname: colnum, …}, numbers: {colnum: colname, …}, header: {separator: separator_string}}] on success
- [-1, "error description"] on failure
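The return shape described above can be sketched with a simplified parser. This is an illustration only, not the library's code: `parse_header_sketch` is a hypothetical name, the sketch understands only a `-F t` header, and it assumes column names contain no whitespace.

```python
def parse_header_sketch(line):
    """Parse '#fsdb -F t col1 col2' into the [status, info] shape above."""
    parts = line.split()
    if not parts or parts[0] != "#fsdb":
        return [-1, "line does not start with #fsdb"]
    if len(parts) < 3 or parts[1] != "-F" or parts[2] != "t":
        # simplification: real headers may carry other options and tokens
        return [-1, "this sketch only understands '-F t' headers"]
    columns = parts[3:]
    return [0, {
        "names": {name: number for number, name in enumerate(columns)},
        "numbers": dict(enumerate(columns)),
        "header": {"separator": "\t"},  # 't' token translated to a tab
    }]


status, info = parse_header_sketch("#fsdb -F t name count")
print(status, info["names"], info["numbers"])
print(parse_header_sketch("name count"))
```

Returning a status code alongside the payload lets a caller distinguish a malformed header from an empty one without relying on exceptions.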
- row_as_string(row=None)¶
Converts an array row to an FSDB output line.
- property separator¶
The separator_token is the argument that comes after -F in the fsdb header, and the separator is its translation; e.g., for a tab-based separator the separator_token would be the 't' character and the separator would be a tab character ("\t").
Changing this will also change the stored separator_token value.
- property separator_token¶
The separator_token character for the file being read or written.
The separator_token is the argument that comes after -F in the fsdb header; e.g., for a tab-based separator the separator_token would be the 't' character and the separator would be a tab character ("\t").
Changing this will also change the stored separator value.
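The relationship between the two properties amounts to a two-way lookup. The sketch below covers only the tab case named above; the table and function names are hypothetical, and the other tokens FSDB defines are deliberately left out.

```python
# Only the tab mapping mentioned in the text; other tokens are omitted.
TOKEN_TO_SEPARATOR = {"t": "\t"}
SEPARATOR_TO_TOKEN = {sep: tok for tok, sep in TOKEN_TO_SEPARATOR.items()}


def token_to_separator(token):
    """Translate a header token (after -F) into the actual separator."""
    return TOKEN_TO_SEPARATOR[token]


def separator_to_token(separator):
    """Translate an actual separator back into its header token."""
    return SEPARATOR_TO_TOKEN[separator]


separator = token_to_separator("t")
print("alice\t3".split(separator))
print(separator_to_token("\t"))
```

Keeping both directions derived from one table is what makes it safe for setting either property to update the other.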
- set_iterator_function()¶
XXX: change this to a property