datasafe subpackage
datasafe.datasafe module
Datasafe module for the labinform package.
The datasafe is a key feature of labinform which serves to safely store data. Functionality includes directory generation and checksum creation.
- class labinform.datasafe.datasafe.Datasafe[source]
Bases:
object
Data handler for moving data in the context of a datasafe.
The operations performed include generation of a directory structure, storing data in and retrieving data from these directories, as well as verifying the integrity of and providing general information about the data stored.
- static add_directory(path)[source]
Create a directory at a specified path.
- Parameters:
path (str) – path of the directory that should be created
- compare_checksum(loi='', with_meta=False)[source]
Create local checksum and compare with checksum file.
- static dir_is_empty(path='')[source]
Check whether a directory is empty.
- Parameters:
path (str) – path of the directory which should be checked
- static find_highest(path='')[source]
Find a numbered directory with the highest number.
For a given path, find the numbered directory (i.e. directory with an integer as name) with the highest number. If the directory that the path leads to doesn’t exist, if it is empty, or if the subdirectories are not ‘numbered’, an error is raised.
- Parameters:
path (str) – path of the directory that should be searched
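The lookup described above can be illustrated with a minimal standalone sketch. It is not the labinform implementation: the function name `find_highest_numbered` and the exception types raised are assumptions chosen for illustration.

```python
import os


def find_highest_numbered(path):
    """Return the largest integer-named subdirectory of ``path``.

    Hypothetical helper, not part of labinform. The exception types are
    assumptions; the documented method only states that an error is raised.
    """
    if not os.path.isdir(path):
        raise FileNotFoundError(f"Directory does not exist: {path}")
    # Keep only subdirectories whose name consists of decimal digits.
    numbers = [
        int(entry.name)
        for entry in os.scandir(path)
        if entry.is_dir() and entry.name.isdecimal()
    ]
    if not numbers:
        # Covers both an empty directory and one without numbered entries.
        raise ValueError(f"No numbered subdirectories in: {path}")
    return max(numbers)
```

Comparing the names as integers rather than as strings matters here: a directory named `10` ranks above `2`, which a plain string sort would get wrong.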
- generate(experiment='', sample_id='')[source]
Generate directory structure and return identifier.
Verify to what extent the relevant directory structure is present and create directories as required. In this context, the measurement number for a given sample is automatically increased in the case of consecutive measurements.
Return a unique identifier for the respective measurement and sample, including the directory path.
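The automatic numbering of consecutive measurements might look roughly like the sketch below. The directory layout (`root/experiment/sample_id/number`), the function name `next_measurement_dir`, and the decision to return only the path rather than a full identifier are all assumptions made for illustration; the actual layout and LOI format used by labinform are not stated here.

```python
import os


def next_measurement_dir(root, experiment, sample_id):
    """Create and return the next numbered measurement directory.

    Hypothetical helper: the layout root/experiment/sample_id/<number>
    is an assumption, not the documented labinform structure.
    """
    sample_dir = os.path.join(root, experiment, sample_id)
    # Create any missing parent directories; existing ones are left alone.
    os.makedirs(sample_dir, exist_ok=True)
    existing = [
        int(name)
        for name in os.listdir(sample_dir)
        if name.isdecimal() and os.path.isdir(os.path.join(sample_dir, name))
    ]
    # Start at 1 for a new sample, otherwise continue after the highest number.
    number = max(existing, default=0) + 1
    target = os.path.join(sample_dir, str(number))
    os.mkdir(target)
    return target
```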
- static has_dir(path='')[source]
Check whether a directory exists.
- Parameters:
path (str) – path of the directory which should be checked
- static increment(number=0)[source]
Increment an integer by one.
- Parameters:
number (int) – integer that should be incremented
- index(loi='')[source]
Retrieve meta information about a dataset from the datasafe.
Retrieves meta information (Manifest.yaml file) for a dataset in the datasafe, if present at the target directory (as specified in the LOI); raises an exception otherwise.
- Parameters:
loi (str) – unique identifier for the dataset for which the meta information should be retrieved.
- Returns:
manifest_dict – retrieved meta information (Manifest.yaml) as ordered dict
- Return type:
- loi_to_path(loi='')[source]
Retrieve a file’s datasafe directory path from the data’s LOI.
Retrieves the data’s path (including the datasafe’s root path) which is included in the LOI. If the LOI is not correctly formatted, an exception is raised.
- make_both_checksum_files(path='', ignore_control_files=True)[source]
Create files containing hashes for files in target directory.
Wrapper method: creates two checksums for files, if present at the target directory, and writes each to its own checksum file; raises an exception otherwise. One checksum includes metadata, the other does not.
- make_checksum_file(path='', with_meta=False, ignore_control_files=True)[source]
Create a file containing a hash for files in target directory.
Creates a checksum for files, if present at the target directory, and writes it to a checksum file; raises an exception otherwise.
- Parameters:
- Returns:
checksum – checksum (currently MD5)
- Return type:
- static make_checksum_for_file(path='')[source]
Create a hash (currently MD5) for a file at a given path.
- make_checksum_for_files(path='', with_meta=False, ignore_control_files=True)[source]
Create a cryptographic hash (currently MD5) for multiple files.
All files in the directory are sorted and included in the checksum with the option to exclude control files, i.e. the manifest file and checksum files.
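A single checksum over several files, as described above, can be sketched as follows. This is an illustration under stated assumptions rather than the labinform code: the function name `checksum_for_files` and the `ignore` parameter (a collection of file names to skip, standing in for the control-file option) are hypothetical.

```python
import hashlib
import os


def checksum_for_files(path, ignore=()):
    """Return one MD5 hex digest over all regular files in ``path``.

    Hypothetical helper; ``ignore`` lists file names to leave out,
    e.g. a manifest or checksum file.
    """
    digest = hashlib.md5()
    # Sorting makes the result independent of the file system's listing order.
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if name in ignore or not os.path.isfile(full_path):
            continue
        with open(full_path, "rb") as handle:
            # Read in chunks so large data files need not fit into memory.
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

Note that MD5 is adequate for detecting accidental corruption during transfer, but it is not collision-resistant and should not be relied on against deliberate tampering.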
- make_tgz(path='')[source]
Pack directory content to *.tgz file.
Pack all files in the directory into a *.tgz file, without including the folder itself.
- Parameters:
path (str) – path of the directory containing the files
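Packing the content of a directory without the enclosing folder can be done with the standard library's `tarfile` module, as in the sketch below. The function name `pack_directory` and the separate `archive_path` argument are assumptions for illustration; where labinform actually places the archive is not stated here.

```python
import os
import tarfile


def pack_directory(path, archive_path):
    """Pack the entries of ``path`` into a gzip-compressed tar archive.

    Hypothetical helper. ``archive_path`` should lie outside ``path``,
    otherwise the archive would try to include itself.
    """
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in sorted(os.listdir(path)):
            # arcname drops the leading directories, so entries are stored
            # relative to ``path`` rather than together with the folder.
            archive.add(os.path.join(path, name), arcname=name)
    return archive_path
```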
- moveto(data='', experiment='', sample_id='')[source]
Prepare directory in datasafe and move data there.
This is a wrapper function which calls generate() to generate a directory structure if necessary and creates a local checksum of the file to be moved. It then moves the file to the datasafe and creates another checksum. The two checksums are compared and the result of the comparison is returned.
- Parameters:
- Returns:
results – list containing the generated LOI and the result of the checksum comparison
- Return type:
- multi_push(path='', loi='')[source]
Move data (all files in one directory) into the datasafe.
Wrapper around push() for moving all files in a single directory. The files are packed into a tgz archive before moving and unpacked afterwards. The data’s checksums are compared before packing and after unpacking.
- property path
Get or set the path of the datasafe’s top level directory.
The directory is checked for existence and set as path only if it exists.
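A property that validates its value on assignment could be written as in the following sketch. The class name `DataStore` is hypothetical, and silently keeping the previous value for a non-existent directory is an assumption; the real setter may behave differently in that case, for example by raising an exception.

```python
import os


class DataStore:
    """Hypothetical stand-in illustrating a validated ``path`` property."""

    def __init__(self):
        self._path = ""

    @property
    def path(self):
        """Return the currently configured top level directory."""
        return self._path

    @path.setter
    def path(self, value):
        # Assumption: a non-existent directory leaves the old value unchanged.
        if os.path.isdir(value):
            self._path = value
```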
- pull(loi='', target='')[source]
Retrieve data from the datasafe.
Retrieves data from the datasafe, if present at the target directory (as specified in the LOI), and moves it to another target directory; raises an exception otherwise.
- push(data='', loi='', check_empty=True)[source]
Move data (one file) into the datasafe.
Before moving, the existence of the target directory (as specified in the LOI) as well as its emptiness are verified. Before and after moving, the data’s checksums are compared.
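The sequence of checks around a move (target exists, target is empty, checksum before equals checksum after) is sketched below. The names `md5_of_file` and `push_file` are hypothetical, the target is given directly as a directory instead of being derived from an LOI, and the built-in exception types stand in for the module's own exceptions.

```python
import hashlib
import os
import shutil


def md5_of_file(path):
    """Return the MD5 hex digest of a single file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def push_file(source, target_dir):
    """Move ``source`` into ``target_dir`` and report whether it is intact.

    Hypothetical helper, not the labinform method.
    """
    if not os.path.isdir(target_dir):
        raise FileNotFoundError(f"Target directory missing: {target_dir}")
    if os.listdir(target_dir):
        raise OSError(f"Target directory not empty: {target_dir}")
    checksum_before = md5_of_file(source)
    destination = shutil.move(
        source, os.path.join(target_dir, os.path.basename(source))
    )
    # True if the file arrived unchanged, False otherwise.
    return checksum_before == md5_of_file(destination)
```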
- retrieve_checksum(loi='', with_meta=False)[source]
Return checksum from checksum file for a given LOI.
- exception labinform.datasafe.datasafe.DirectoryNotEmptyError[source]
Bases:
Error
Raised when an attempt is made to push data to a non-empty directory.
- exception labinform.datasafe.datasafe.Error[source]
Bases:
Exception
Base class for exceptions in this module.
- exception labinform.datasafe.datasafe.IncorrectLoiError[source]
Bases:
Error
Raised when an incorrect LOI is provided.
- exception labinform.datasafe.datasafe.NoChecksumFilePresentError[source]
Bases:
Error
Raised when the checksum file cannot be retrieved because it does not exist.
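Because all three specific exceptions derive from the module's Error base class, callers can handle them individually or together. The sketch below redefines stand-in classes with the same names and inheritance purely so that it runs on its own; in real code they would be imported from labinform.datasafe.datasafe. The `describe` function is hypothetical.

```python
class Error(Exception):
    """Stand-in for the module's base exception."""


class DirectoryNotEmptyError(Error):
    """Stand-in: pushing to a non-empty directory."""


class IncorrectLoiError(Error):
    """Stand-in: an incorrect LOI was provided."""


class NoChecksumFilePresentError(Error):
    """Stand-in: the checksum file does not exist."""


def describe(exception):
    """Return a short label, handling specific cases before the base class."""
    try:
        raise exception
    except IncorrectLoiError:
        return "bad LOI"
    except Error:
        # Catches every other exception derived from the base class.
        return "datasafe error"
```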
datasafe.manifest module
Module used for creation of manifest files.
- class labinform.datasafe.manifest.ManifestWriter[source]
Bases:
object
Tool for automated creation of manifest files.
- manifest_dict
Ordered dict that is filled with information and finally saved as a manifest file.
- Type:
- complete
Indication whether the data of the dataset are complete.
Sometimes, measurements get cancelled, but the data measured so far are still useful. However, in such cases some of the metadata may not match the actual dimensions of the numerical data.
- Type:
- extension_catalogue
File extensions and their descriptions to be included in the manifest.
- Type: