seabirdfilehandler.file_collection module

- seabirdfilehandler.file_collection.get_collection(path_to_files, file_suffix='cnv', only_metadata=False, pattern='', sorting_key=None)[source]
Factory to create instances of FileCollection, depending on input type.
- Parameters:
path_to_files (Path | str) – The path to the directory to search for files.
file_suffix (str) – The suffix to search for. (Default value = 'cnv')
only_metadata (bool) – Whether to read only metadata. (Default value = False)
pattern (str) – A filter for file selection. (Default value = '')
sorting_key (Callable | None) – A callable that returns the part of the filename used to sort the collection. (Default value = None)
- Return type:
An instance of FileCollection or one of its children.
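The factory behaviour can be pictured with a minimal sketch. It assumes that "input type" refers to the file suffix and that the mapping from suffix to class is a plain dictionary; neither detail is confirmed by this page. The three classes below are hypothetical stand-ins that only record their arguments, not the library's implementations.

```python
from collections import UserList
from pathlib import Path


class FileCollection(UserList):
    """Hypothetical stand-in: only records its arguments."""

    def __init__(self, path_to_files, file_suffix, **kwargs):
        super().__init__()
        self.path_to_files = Path(path_to_files)
        self.file_suffix = file_suffix.strip(".").lower()


class CnvCollection(FileCollection):
    """Stand-in for the .cnv-specific collection."""


class HexCollection(FileCollection):
    """Stand-in for the .hex-specific collection."""


# Assumed mapping from suffix to specialised class.
SPECIALISED = {"cnv": CnvCollection, "hex": HexCollection}


def get_collection(path_to_files, file_suffix="cnv", **kwargs):
    """Return the most specific collection class for the suffix."""
    key = file_suffix.strip(".").lower()
    collection_cls = SPECIALISED.get(key, FileCollection)
    return collection_cls(path_to_files, file_suffix, **kwargs)


cnv = get_collection("data", "cnv")
hexes = get_collection("data", ".HEX")
other = get_collection("data", "btl")
print(type(cnv).__name__, type(hexes).__name__, type(other).__name__)
```

In this sketch, suffixes without a specialised class fall back to the generic base class, which matches the documented return type of "FileCollection or one of its children".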
- class seabirdfilehandler.file_collection.FileCollection(path_to_files, file_suffix, only_metadata=False, pattern='', sorting_key=None)[source]
Bases: UserList
A representation of multiple files of the same kind. These files share the same suffix and are otherwise closely connected to each other. A common use case would be the collection of CNVs to allow for easier processing or integration of field calibration measurements.
- Parameters:
path_to_files (Path | str) – The path to the directory to search for files.
file_suffix (str) – The suffix to search for.
only_metadata (bool) – Whether to read only metadata. (Default value = False)
pattern (str) – A filter for file selection. (Default value = '')
sorting_key (Callable | None) – A callable that returns the part of the filename used to sort the collection. (Default value = None)
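Because the class derives from UserList, a collection can be handled like an ordinary list. The sketch below illustrates this with a hypothetical stand-in class, MiniCollection, that stores plain file names instead of parsed file objects; it is not the library's implementation.

```python
from collections import UserList


class MiniCollection(UserList):
    """Hypothetical stand-in for FileCollection holding file names."""

    def __init__(self, names, file_suffix):
        # Keep only the names that carry the requested suffix.
        super().__init__(n for n in names if n.endswith("." + file_suffix))
        self.file_suffix = file_suffix


collection = MiniCollection(["a_001.cnv", "a_002.cnv", "notes.txt"], "cnv")

count = len(collection)                # list-style length
first = collection[0]                  # list-style indexing
names = [name for name in collection]  # list-style iteration
```

UserList keeps the items in its data attribute, so list operations work without being reimplemented in the subclass.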
- extract_file_type(suffix)[source]
Determines the file type using the input suffix.
- Parameters:
suffix (str) – The file suffix.
- Return type:
An object corresponding to the given suffix.
- collect_files(pattern='', sorting_key=<function FileCollection.<lambda>>)[source]
Recursively creates a list of target files from the given directory. The list can be sorted via the sorting_key parameter, a Callable that returns the part of the filename to sort by.
- Parameters:
pattern (str) – A filter for file selection, passed on to rglob. (Default value = '')
sorting_key (Callable | None) – A callable that returns the part of the filename to use in sorting. (Default value = lambda file: int(file.stem.split("_")[3]))
- Return type:
A list of all paths found.
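The collection step can be sketched with pathlib alone. The default key below is the one quoted in the parameter description; how the pattern is combined with the suffix into a glob expression is an assumption, and the function is a standalone illustration rather than the method itself. The file names are made up.

```python
import tempfile
from pathlib import Path


def fourth_field(file):
    """Default key from the description: the fourth '_'-separated part."""
    return int(file.stem.split("_")[3])


def collect_files(root, suffix="cnv", pattern="", sorting_key=fourth_field):
    """Recursively find files below root and sort them."""
    # Assumption: the pattern is treated as a substring of the file name.
    glob = f"*{pattern}*.{suffix}" if pattern else f"*.{suffix}"
    return sorted(Path(root).rglob(glob), key=sorting_key)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "leg2").mkdir()
    for relative in (
        "cruise_ctd_cast_0010.cnv",
        "leg2/cruise_ctd_cast_0002.cnv",
        "cruise_ctd_cast_0007.hex",
    ):
        (root / relative).touch()

    # Default key: numeric sort on the fourth filename part.
    found = [path.name for path in collect_files(root)]
    # Custom key and pattern: sort by full name, keep names containing "0010".
    by_name = [
        path.name
        for path in collect_files(
            root, pattern="0010", sorting_key=lambda file: file.name
        )
    ]
```

The default key raises an IndexError for file names with fewer than four underscore-separated parts and a ValueError when the fourth part is not an integer, so collections with a different naming scheme need their own sorting_key.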
- load_files(only_metadata=False)[source]
Creates a Python instance for each file.
- Parameters:
only_metadata (bool) – Whether to load only file metadata. (Default value = False)
- Return type:
A list of all instances.
- get_dataframes(event_log=False, coordinates=False, time_correction=False, cast_identifier=False)[source]
Collects all individual dataframes and allows additional column creation.
- Parameters:
event_log (bool) – (Default value = False)
coordinates (bool) – (Default value = False)
time_correction (bool) – (Default value = False)
cast_identifier (bool) – (Default value = False)
- Return type:
A list of the individual pandas DataFrames.
- get_collection_dataframe(list_of_dfs=None)[source]
Creates one DataFrame from the individual ones, by concatenation.
- Parameters:
list_of_dfs (list[pd.DataFrame] | None) – A list of the individual DataFrames. (Default value = None)
- Return type:
A pandas DataFrame representing the whole dataset.
- tidy_collection_dataframe(df)[source]
Apply the different dataframe edits to the given dataframe.
- Parameters:
df (pd.DataFrame) – A DataFrame to edit.
- Return type:
The tidied dataframe.
- use_bad_flag_for_nan(df)[source]
Replace all NaN values with the bad flag value defined inside the files.
- Parameters:
df (pd.DataFrame) – The dataframe to edit.
- Return type:
The edited DataFrame.
- set_dtype_to_float(df)[source]
Use the float-dtype for all DataFrame columns.
- Parameters:
df (pd.DataFrame) – The dataframe to edit.
- Return type:
The edited DataFrame.
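The DataFrame steps described above (concatenation, NaN replacement, float conversion) can be approximated with plain pandas calls. This is a sketch under assumptions: the standalone functions only mimic what the method descriptions say, the column names are illustrative, and the bad flag value is an example of the kind of value found in Sea-Bird headers, passed in by hand instead of being read from a file.

```python
import numpy as np
import pandas as pd

# Illustrative bad flag; in practice the value is defined inside the files.
BAD_FLAG = -9.990e-29


def use_bad_flag_for_nan(df, bad_flag=BAD_FLAG):
    """Replace every NaN with the bad flag value."""
    return df.fillna(bad_flag)


def set_dtype_to_float(df):
    """Convert all columns to float."""
    return df.astype(float)


# Two made-up casts with identical columns.
casts = [
    pd.DataFrame({"prDM": [1, 2], "t090C": [10.5, np.nan]}),
    pd.DataFrame({"prDM": [1], "t090C": [9.8]}),
]

# One table for the whole dataset; resetting the index is an assumption.
combined = pd.concat(casts, ignore_index=True)
tidy = set_dtype_to_float(use_bad_flag_for_nan(combined))
```

Both helpers return new DataFrames and leave their input unchanged, so the combined table still contains the original NaN after the tidy step.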
- class seabirdfilehandler.file_collection.CnvCollection(*args, **kwargs)[source]
Bases: FileCollection
Specific methods to work with collections of .cnv files.
- get_dataframes(event_log=False, coordinates=False, time_correction=False, cast_identifier=False)[source]
Collects all individual dataframes and allows additional column creation.
- Parameters:
event_log (bool) – (Default value = False)
coordinates (bool) – (Default value = False)
time_correction (bool) – (Default value = False)
cast_identifier (bool) – (Default value = False)
- Return type:
A list of the individual pandas DataFrames.
- get_data_table_meta_info()[source]
Checks that all input cnv files share the same data description and returns it. This acts as an early warning when the collection mixes different kinds of files, which cannot be concatenated.
- Return type:
A list of dictionaries that represent the data column information.
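The consistency check can be illustrated with a small standalone function. The dictionary keys ("shortname", "unit") and the choice to raise a ValueError on a mismatch are assumptions made for this sketch; the page only states that the description must be the same in all files.

```python
def common_meta_info(per_file_meta):
    """Return the shared column description or raise on a mismatch."""
    if not per_file_meta:
        return []
    reference = per_file_meta[0]
    for index, meta in enumerate(per_file_meta[1:], start=1):
        if meta != reference:
            raise ValueError(f"file {index} describes different data columns")
    return reference


# Made-up column descriptions for three casts.
cast_a = [
    {"shortname": "prDM", "unit": "db"},
    {"shortname": "t090C", "unit": "ITS-90, deg C"},
]
cast_b = [dict(column) for column in cast_a]  # same description
cast_c = [{"shortname": "prDM", "unit": "db"}]  # one column missing

shared = common_meta_info([cast_a, cast_b])

try:
    common_meta_info([cast_a, cast_c])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Running such a check before concatenation surfaces mixed file types as one clear error instead of a misaligned table later on.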
- class seabirdfilehandler.file_collection.HexCollection(*args, xmlcon_pattern='', path_to_xmlcons='', **kwargs)[source]
Bases: FileCollection
Specific methods to work with collections of .hex files.
Especially concerned with the detection of corresponding .XMLCON files.
- get_xmlcons()[source]
Returns all .xmlcon files found inside the root directory and its children that match a given pattern.
Uses the global sorting_key to try to sort the xmlcons in the same way as the collection. This is meant to support a more specific hex-xmlcon matching in the future.
- Return type:
A list of the found xmlcon filenames.
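One way to picture the shared sorting is to apply the same key to both file lists and pair them by position. Everything in this sketch is hypothetical: the key function, the case-insensitive suffix comparison, the file names, and the positional pairing, which only works when every hex file has exactly one xmlcon. It is not the library's matching logic.

```python
import tempfile
from pathlib import Path


def cast_number(file):
    """Hypothetical sorting key: the trailing number of the file stem."""
    return int(file.stem.split("_")[-1])


def find_sorted(root, suffix, sorting_key):
    """Recursively find files by suffix, ignoring case, and sort them."""
    matches = [
        path
        for path in Path(root).rglob("*")
        if path.suffix.lower() == f".{suffix}"
    ]
    return sorted(matches, key=sorting_key)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in (
        "ctd_0002.hex",
        "ctd_0001.hex",
        "ctd_0001.XMLCON",
        "ctd_0002.xmlcon",
    ):
        (root / name).touch()

    hexes = find_sorted(root, "hex", cast_number)
    xmlcons = find_sorted(root, "xmlcon", cast_number)
    # Same key on both lists, so matching files end up at the same index.
    pairs = [(h.name, x.name) for h, x in zip(hexes, xmlcons)]
```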