seabirdfilehandler.file_collection module

Inheritance diagram of seabirdfilehandler.file_collection
seabirdfilehandler.file_collection.get_collection(path_to_files, file_suffix='cnv', only_metadata=False, pattern='', sorting_key=None)

Factory that creates an instance of FileCollection, or of a suitable subclass, depending on the input file type.

Parameters:
  • path_to_files (Path | str) – The path to the directory to search for files.

  • file_suffix (str) – The suffix to search for. (Default value = "cnv")

  • only_metadata (bool) – Whether to read only metadata. (Default value = False)

  • pattern (str) – A filter for file selection. (Default value = '')

  • sorting_key (Callable | None) – A callable that returns the part of the filename to use for sorting the collection. (Default value = None)

Return type:

An instance of FileCollection or one of its subclasses.
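The dispatch that get_collection performs can be pictured with the minimal sketch below. It is not the library's code: the stand-in classes only store their arguments, and the suffix-to-class mapping (cnv to CnvCollection, hex to HexCollection, anything else to FileCollection) is an assumption based on the subclasses documented on this page.

```python
from collections import UserList
from pathlib import Path


# Hypothetical stand-ins for the real classes; they only record their arguments.
class FileCollection(UserList):
    def __init__(self, path_to_files, file_suffix, only_metadata=False,
                 pattern="", sorting_key=None):
        super().__init__()
        self.path_to_files = Path(path_to_files)
        self.file_suffix = file_suffix
        self.only_metadata = only_metadata
        self.pattern = pattern
        self.sorting_key = sorting_key


class CnvCollection(FileCollection):
    pass


class HexCollection(FileCollection):
    pass


# Assumed mapping from file suffix to collection subclass.
_COLLECTIONS = {"cnv": CnvCollection, "hex": HexCollection}


def get_collection(path_to_files, file_suffix="cnv", only_metadata=False,
                   pattern="", sorting_key=None):
    """Pick the collection class from the suffix, falling back to FileCollection."""
    suffix = file_suffix.lower().lstrip(".")
    cls = _COLLECTIONS.get(suffix, FileCollection)
    return cls(path_to_files, file_suffix, only_metadata, pattern, sorting_key)
```

A lookup table keeps such a factory open for new file types: supporting another suffix only requires one more dictionary entry.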

class seabirdfilehandler.file_collection.FileCollection(path_to_files, file_suffix, only_metadata=False, pattern='', sorting_key=None)

Bases: UserList

A representation of multiple files of the same kind. These files share the same suffix and are otherwise closely related to each other. A common use case is a collection of .cnv files, which simplifies their processing or the integration of field calibration measurements.

Parameters:
  • path_to_files (Path | str) – The path to the directory to search for files.

  • file_suffix (str) – The suffix to search for.

  • only_metadata (bool) – Whether to read only metadata. (Default value = False)

  • pattern (str) – A filter for file selection. (Default value = '')

  • sorting_key (Callable | None) – A callable that returns the part of the filename to use for sorting the collection. (Default value = None)

extract_file_type(suffix)

Determines the file type using the input suffix.

Parameters:

suffix (str) – The file suffix.

Return type:

An object corresponding to the given suffix.

collect_files(pattern='', sorting_key=lambda file: int(file.stem.split('_')[3]))

Recursively collects the target files from the given directory. They can be sorted with the sorting_key parameter, a Callable that returns the part of the filename to use for sorting.

Parameters:
  • pattern (str) – A filter for file selection, passed to rglob. (Default value = '')

  • sorting_key (Callable | None) – A callable that returns the part of the filename to use for sorting. (Default value = lambda file: int(file.stem.split("_")[3]), i.e. the fourth underscore-separated part of the file stem, converted to an integer)

Return type:

A list of all paths found.
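A minimal sketch of the recursive search and of the documented default sorting key, using only pathlib. The function signature is hypothetical, and how the real method combines pattern and the suffix into the rglob argument is not documented here; embedding the pattern as `*{pattern}*.{suffix}` is an assumption.

```python
from pathlib import Path


def default_sorting_key(file: Path) -> int:
    # Same rule as the documented default: the 4th "_"-separated part of the stem, as int.
    return int(file.stem.split("_")[3])


def collect_files(path_to_files, file_suffix, pattern="",
                  sorting_key=default_sorting_key):
    """Sketch: recursively find *.<file_suffix> files whose names contain pattern."""
    # Assumption: the pattern is embedded in the glob in front of the suffix.
    glob = f"*{pattern}*.{file_suffix}" if pattern else f"*.{file_suffix}"
    files = Path(path_to_files).rglob(glob)
    if sorting_key is None:
        return sorted(files)  # fall back to plain path order
    return sorted(files, key=sorting_key)
```

The default key only works for filenames whose stem has at least four underscore-separated parts with an integer in the fourth position, such as `cruise_a_st_10.cnv`; other names raise IndexError or ValueError, so a different naming scheme needs a custom sorting_key.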

load_files(only_metadata=False)

Creates a Python instance for each file.

Parameters:

only_metadata (bool) – Whether to load only file metadata. (Default value = False)

Return type:

A list of all instances.

get_dataframes(event_log=False, coordinates=False, time_correction=False, cast_identifier=False)

Collects all individual DataFrames and can optionally add extra columns to them.

Parameters:
  • event_log (bool) – (Default value = False)

  • coordinates (bool) – (Default value = False)

  • time_correction (bool) – (Default value = False)

  • cast_identifier (bool) – (Default value = False)

Return type:

A list of the individual pandas DataFrames.

get_collection_dataframe(list_of_dfs=None)

Concatenates the individual DataFrames into a single DataFrame.

Parameters:

list_of_dfs (list[pd.DataFrame] | None) – A list of the individual DataFrames. (Default value = None)

Return type:

A pandas DataFrame representing the whole dataset.
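The concatenation step can be sketched with pandas as follows. This is an illustration, not the library's implementation: the helper takes an explicit list and does not cover what the real method does when list_of_dfs is None, and the column names in the test data are only examples.

```python
import pandas as pd


def get_collection_dataframe(list_of_dfs: list[pd.DataFrame]) -> pd.DataFrame:
    """Sketch: stack the per-file DataFrames row-wise into one DataFrame."""
    # ignore_index=True gives the combined frame a fresh 0..n-1 index instead of
    # repeating each file's own index.
    return pd.concat(list_of_dfs, ignore_index=True)
```

Columns that exist in only some of the inputs are filled with NaN by pd.concat, which is one reason the collection checks that all files share the same data description.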

tidy_collection_dataframe(df)

Applies the different DataFrame edits to the given DataFrame.

Parameters:

df (pd.DataFrame) – A DataFrame to edit.

Return type:

The tidied dataframe.

use_bad_flag_for_nan(df)

Replaces all NaN values with the bad-flag value defined inside the files.

Parameters:

df (pd.DataFrame) – The DataFrame to edit.

Return type:

The edited DataFrame.
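A minimal pandas sketch of the replacement. The real method reads the bad-flag value from the files; here it is passed in, and the default of -9.990e-29 is an assumption based on the `bad_flag` value commonly written in Sea-Bird .cnv headers.

```python
import numpy as np
import pandas as pd

# Assumption: typical Sea-Bird bad_flag value; the real method takes it from the file headers.
BAD_FLAG = -9.990e-29


def use_bad_flag_for_nan(df: pd.DataFrame, bad_flag: float = BAD_FLAG) -> pd.DataFrame:
    """Sketch: return a copy of df in which every NaN is replaced by bad_flag."""
    return df.fillna(bad_flag)


example = pd.DataFrame({"prDM": [1.0, np.nan], "t090C": [np.nan, 4.2]})
flagged = use_bad_flag_for_nan(example)
```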

set_dtype_to_float(df)

Casts all DataFrame columns to the float dtype.

Parameters:

df (pd.DataFrame) – The DataFrame to edit.

Return type:

The edited DataFrame.
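As a sketch, the cast amounts to a single astype call in pandas; whether the real method uses exactly this call is not stated here. The example data is hypothetical and shows that numeric text columns are parsed as well.

```python
import pandas as pd


def set_dtype_to_float(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch: cast every column to float64."""
    # astype raises ValueError if a column holds text that cannot be parsed as a number.
    return df.astype(float)


example = pd.DataFrame({"Scan": [1, 2], "prDM": ["1.5", "2.0"]})
as_float = set_dtype_to_float(example)
```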

select_real_scan_data(df)

Drops data rows that have no 'Scan' value, if that column exists.

Parameters:

df (pd.DataFrame) – The DataFrame to edit.

Return type:

The edited DataFrame.
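A minimal sketch of this filter, assuming that "no 'Scan' value" means a missing (NaN or None) entry in that column, as is the case for rows introduced by concatenation or alignment.

```python
import pandas as pd


def select_real_scan_data(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch: drop rows without a 'Scan' value; leave df untouched if there is no such column."""
    if "Scan" not in df.columns:
        return df
    # Assumption: a missing Scan value is represented as NaN/None.
    return df.dropna(subset=["Scan"])
```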

to_csv(file_name)

Writes a CSV file with the given file name.

Parameters:

file_name – The name of the new CSV file.

class seabirdfilehandler.file_collection.CnvCollection(*args, **kwargs)

Bases: FileCollection

Specific methods to work with collections of .cnv files.

get_dataframes(event_log=False, coordinates=False, time_correction=False, cast_identifier=False)

Collects all individual DataFrames and can optionally add extra columns to them.

Parameters:
  • event_log (bool) – (Default value = False)

  • coordinates (bool) – (Default value = False)

  • time_correction (bool) – (Default value = False)

  • cast_identifier (bool) – (Default value = False)

Return type:

A list of the individual pandas DataFrames.

get_data_table_meta_info()

Checks that all input .cnv files share the same data description and returns it. This acts as an early warning when working with different kinds of files, which cannot be concatenated.

Return type:

A list of dictionaries that represent the data column information.
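The consistency check can be sketched in plain Python. The function name ensure_same_meta_info and the dictionary keys used in the test are hypothetical, and the sketch assumes that a mismatch should raise ValueError, whereas the documentation only says that the method acts as an early warning.

```python
def ensure_same_meta_info(meta_infos: list[list[dict]]) -> list[dict]:
    """Sketch: return the shared column description, or raise if any file differs."""
    if not meta_infos:
        raise ValueError("no files in the collection")
    first = meta_infos[0]
    for index, info in enumerate(meta_infos[1:], start=1):
        if info != first:
            # Fail early: files with different columns cannot be concatenated.
            raise ValueError(f"file {index} has a different data description")
    return first
```

Comparing against the first file keeps the check linear in the number of files, and the index in the error message points at the first offending file.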

get_array()

Combines the arrays of all individual files into one collection array.

Return type:

A NumPy array representing the data of all input files.
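A minimal NumPy sketch of combining per-file arrays, assuming each file array is 2-D with one row per scan and one column per variable; the real method's array layout is not documented here.

```python
import numpy as np


def get_array(file_arrays: list[np.ndarray]) -> np.ndarray:
    """Sketch: stack per-file 2-D arrays (rows = scans, columns = variables)."""
    # np.concatenate along axis 0 requires every array to have the same number of columns.
    return np.concatenate(file_arrays, axis=0)
```

The column-count requirement is the array counterpart of the data-description check performed by get_data_table_meta_info.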

get_processing_steps()

Checks the processing steps of the different files for consistency and returns the steps of the first file, which should match those of all other files.

Return type:

A list of ProcessingSteps.

class seabirdfilehandler.file_collection.HexCollection(*args, xmlcon_pattern='', path_to_xmlcons='', **kwargs)

Bases: FileCollection

Specific methods to work with collections of .hex files.

Mainly concerned with detecting the corresponding .XMLCON files.

get_xmlcons()

Returns all .xmlcon files found inside the root directory and its subdirectories that match a given pattern.

Also uses the collection's sorting_key to try to sort the .xmlcon files in the same order as the collection. This is meant to enable a more specific hex-xmlcon matching in the future.

Return type:

A list of the found xmlcon filenames.
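A minimal sketch of the search and the sorting attempt, using only pathlib. The function signature is hypothetical; matching the suffix case-insensitively (so that both .xmlcon and .XMLCON are found) and falling back to path order when the sorting key does not fit a filename are assumptions, not documented behavior.

```python
from pathlib import Path


def get_xmlcons(root, pattern="", sorting_key=None) -> list[str]:
    """Sketch: find .xmlcon files below root, matching the suffix case-insensitively."""
    glob = f"*{pattern}*" if pattern else "*"
    matches = [
        path
        for path in Path(root).rglob(glob)
        if path.is_file() and path.suffix.lower() == ".xmlcon"
    ]
    # Try the collection's sorting key; fall back to path order if a name does not fit it.
    try:
        matches.sort(key=sorting_key)
    except (IndexError, ValueError, TypeError):
        matches.sort()
    return [str(path) for path in matches]
```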