datasafe subpackage

datasafe.datasafe module

Datasafe module for the labinform package.

The datasafe is a key feature of labinform which serves to safely store data. Functionality includes directory generation and checksum creation.

class labinform.datasafe.datasafe.Datasafe[source]

Bases: object

Data handler for moving data in the context of a datasafe.

The operations performed include generating a directory structure, storing data in and retrieving data from these directories, as well as verifying the integrity of the stored data and providing general information about it.

_add_directory_for_generation(path, directory)[source]
static add_directory(path)[source]

Create a directory at a specified path.

Parameters

path – path of the directory that should be created

property basic_loi
compare_checksum(loi='')[source]

Create local checksum and compare with checksum file.

Parameters

loi – unique identifier pointing to a datasafe directory, where the data is stored for which the checksums should be compared.

static dir_empty(path='')[source]

Check whether a directory is empty.

Parameters

path – path of the directory which should be checked

static find_highest(path='')[source]

Find a numbered directory with the highest number.

For a given path, find the numbered directory (i.e. directory with an integer as name) with the highest number. If the directory that the path leads to doesn’t exist, if it is empty, or if the subdirectories are not ‘numbered’, an error is raised.

Todo

What happens when there are ‘numbered’ files in the directory? Check each entry returned by os.listdir for being a directory.

Todo

Find a solution for the case where there are only “non-numbered” directories.

Parameters

path – path of the directory that should be searched
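The search described above can be sketched as follows. This is a minimal stand-in, not the package's implementation: the function name `find_highest_numbered_dir` is hypothetical, and built-in exceptions are used in place of the module's own error classes. It also checks that each entry is a directory, which is the behaviour the first todo item asks about.

```python
import os


def find_highest_numbered_dir(path):
    """Return the highest integer-named subdirectory of ``path``.

    Hypothetical stand-in for Datasafe.find_highest; built-in exceptions
    replace the module's own error classes here.
    """
    if not os.path.isdir(path):
        raise FileNotFoundError(f"No such directory: {path}")
    numbers = [
        int(name)
        for name in os.listdir(path)
        # Keep only entries that are directories with a purely numeric name.
        if name.isdecimal() and os.path.isdir(os.path.join(path, name))
    ]
    if not numbers:
        raise ValueError(f"No numbered directories in: {path}")
    return max(numbers)
```

Comparing the names as integers rather than as strings matters here: a directory named 10 ranks above one named 2.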

generate(experiment='', sample_id='')[source]

Generate directory structure and return identifier.

Verify to what extent the relevant directory structure is present and create directories as required. For consecutive measurements on a given sample, the measurement number is increased automatically.

Return a unique identifier for the respective measurement and sample, including the directory path.

Parameters
  • experiment (str) – type of experiment performed, e.g. ‘cwepr’

  • sample_id (str) – unique identifier for the sample measured

Returns

loi – unique loi including the information provided

Return type

str
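A rough sketch of how such a directory structure might be generated is given below. The layout `<root>/<experiment>/<sample_id>/<measurement number>`, the function name `generate_measurement_dir`, and the choice to start numbering at 1 are all assumptions made for illustration; the documentation does not specify them, and the real method returns a loi rather than a bare relative path.

```python
import os


def generate_measurement_dir(root, experiment, sample_id):
    """Create the next numbered measurement directory; return its relative path.

    Assumed layout (not confirmed by the documentation):
    <root>/<experiment>/<sample_id>/<measurement number>
    """
    sample_dir = os.path.join(root, experiment, sample_id)
    # Create the experiment and sample directories only if they are missing.
    os.makedirs(sample_dir, exist_ok=True)
    existing = [
        int(name)
        for name in os.listdir(sample_dir)
        if name.isdecimal() and os.path.isdir(os.path.join(sample_dir, name))
    ]
    # Assumption: the first measurement is number 1, later ones count up.
    number = max(existing, default=0) + 1
    measurement_dir = os.path.join(sample_dir, str(number))
    os.mkdir(measurement_dir)
    return os.path.relpath(measurement_dir, root)
```

Calling the function twice with the same experiment and sample yields two consecutive measurement directories.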

static has_dir(path='')[source]

Check whether a directory exists.

Parameters

path – path of the directory which should be checked

static increment(number=0)[source]

Increment an integer by one.

Parameters

number – integer that should be incremented

index(loi='')[source]

Retrieve background information from the datasafe.

Retrieves background information (the manifest.yaml file) from the datasafe if it is present at the target directory (as specified in the loi); raises an exception otherwise.

Parameters

loi – unique identifier for the data for which the background information should be retrieved.

Returns

xxxxxx – retrieved background information (manifest.yaml) as dict

Return type

dict

loi_to_path(loi='')[source]

Retrieve a file’s datasafe directory path from the data’s loi.

Retrieves the data’s path (relative to the datasafe’s root path) which is included in the loi. If the loi is not correctly formatted, an exception is raised.

Parameters

loi – loi from which the path should be retrieved

Returns

path – path retrieved from the loi

Return type

str

make_checksum_file(loi='', ignore_control_files=True)[source]

Create a cryptographic hash (MD5) for a file in the datasafe.

Creates a checksum for a file in the datasafe if it is present at the target directory (as specified in the loi); raises an exception otherwise.

Todo

Split into multiple methods: make_checksum_file (put the checksum in a file and the file in the datasafe), retrieve_checksum (get data from the checksum file) and compare_checksum (compare the local checksum to the checksum from the file), each with a boolean parameter ‘with_metadata’. The manifest contains a listing of all files, including information on whether each is metadata or data. Checksums don’t include manifest and checksum files. Names (for the manifest and both kinds of checksum files) are provided as class properties.

Parameters
  • loi – unique identifier for the data (file) for which a checksum should be created

  • ignore_control_files – whether to ignore manifest and checksum files for checksum creation.

Returns

checksum – checksum (MD5)

Return type

str

static make_checksum_for_file(path='')[source]

Create a cryptographic hash (MD5) for a file at a given path.

Parameters

path – path of file for which a checksum should be created.

Returns

checksum – checksum (MD5)

Return type

str
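An MD5 checksum for a single file can be computed with the standard library's hashlib module, as sketched below. The function name `md5_for_file` and the `chunk_size` parameter are illustrative assumptions, not part of the documented API; reading in chunks simply avoids loading large data files into memory at once.

```python
import hashlib


def md5_for_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at ``path``.

    Hypothetical stand-in for Datasafe.make_checksum_for_file.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as file:
        # Feed the file to the hash in fixed-size chunks until it is exhausted.
        for chunk in iter(lambda: file.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```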

make_checksum_for_files(path='', ignore_control_files=True)[source]

Create a cryptographic hash (MD5) for multiple files.

All files in the directory are sorted and included in the checksum with the option to exclude control files, i.e. the manifest file and checksum files.

Parameters
  • path – path of directory which contains the files.

  • ignore_control_files – whether to ignore manifest and checksum files for checksum creation.

Returns

checksum – checksum (MD5)

Return type

str
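The following sketch shows one way a single checksum over several files could be built: the file names are sorted so that the result does not depend on listing order, and control files are skipped on request. The control file names in `CONTROL_FILES` are placeholders (the actual names are available through the manifest_name, md5_name and md5_with_meta_name properties), and hashing the concatenated file contents is an assumption about how the files are combined.

```python
import hashlib
import os

# Placeholder names for illustration only; the real names are exposed by the
# manifest_name, md5_name and md5_with_meta_name properties.
CONTROL_FILES = ("manifest.yaml", "checksum.md5", "checksum_with_meta.md5")


def md5_for_files(path, ignore_control_files=True, control_files=CONTROL_FILES):
    """Return one MD5 digest covering the files directly inside ``path``.

    Hypothetical stand-in for Datasafe.make_checksum_for_files.
    """
    md5 = hashlib.md5()
    # Sorting makes the digest independent of the order os.listdir returns.
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if not os.path.isfile(full_path):
            continue
        if ignore_control_files and name in control_files:
            continue
        with open(full_path, "rb") as file:
            # Assumption: file contents are hashed one after another.
            md5.update(file.read())
    return md5.hexdigest()
```

Because control files are excluded by default, writing a checksum file into the directory afterwards does not change the checksum of the data itself.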

property manifest_name
property md5_name
property md5_with_meta_name
moveto(data='', experiment='', sample_id='')[source]

Prepare directory and move data there.

This is a wrapper function which calls generate() to generate a directory structure if necessary and creates a local checksum of the file to be moved. It then moves the file to the datasafe and creates another checksum. The two checksums are compared and the result of the comparison is returned.

Parameters
  • data – data (file) that should be moved inside the datasafe.

  • experiment – type of experiment performed, e.g. ‘cwepr’

  • sample_id – unique identifier for the sample measured

Returns

xxxxx – result of the checksum comparison

Return type

bool
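The checksum, move, checksum, compare sequence can be illustrated with a small standalone sketch. The names `move_and_verify` and `_md5` are hypothetical, and the target directory is passed in directly instead of being derived through generate().

```python
import hashlib
import os
import shutil


def _md5(path):
    """Return the hexadecimal MD5 digest of a file."""
    with open(path, "rb") as file:
        return hashlib.md5(file.read()).hexdigest()


def move_and_verify(source, target_dir):
    """Move ``source`` into ``target_dir`` and report whether it arrived intact.

    Hypothetical stand-in for the steps Datasafe.moveto performs.
    """
    checksum_before = _md5(source)
    os.makedirs(target_dir, exist_ok=True)
    # shutil.move returns the final path of the moved file.
    target = shutil.move(source, target_dir)
    checksum_after = _md5(target)
    return checksum_before == checksum_after
```

A return value of False would indicate that the file content changed during the move.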

multipush(path='', loi='')[source]
property path
pull(loi='')[source]

Retrieve data from the datasafe.

Retrieves data from the datasafe if it is present at the target directory (as specified in the loi); raises an exception otherwise.

Parameters

loi – unique identifier for the data to be retrieved

Returns

xxxxxx – retrieved data

Return type

str

push(data='', loi='', check_empty=True)[source]

Move data inside the datasafe.

Before the data is moved, the target directory (as specified in the loi) is checked for existence and, if check_empty is set, for being empty.

Parameters
  • data – data (file) to be moved

  • loi – unique identifier providing a directory path

  • check_empty – whether an error should be raised if the loi points to a non-empty directory.
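The checks performed before moving can be sketched as below. The function `push_file` is a hypothetical stand-in that takes a directory path where the real method takes a loi, and it raises built-in exceptions where the module would presumably raise its own error classes.

```python
import os
import shutil


def push_file(source, target_dir, check_empty=True):
    """Move ``source`` into ``target_dir`` after checking the target.

    Hypothetical stand-in for Datasafe.push; built-in exceptions replace
    the module's own error classes.
    """
    if not os.path.isdir(target_dir):
        raise FileNotFoundError(f"No such directory: {target_dir}")
    # An empty os.listdir result means the directory holds no entries.
    if check_empty and os.listdir(target_dir):
        raise FileExistsError(f"Directory not empty: {target_dir}")
    return shutil.move(source, target_dir)
```

Passing `check_empty=False` skips the emptiness check, which allows further files to be added to a directory that already holds data.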

retrieve_checksum(loi='')[source]

Return checksum from checksum file for a given loi.

Parameters

loi – unique identifier pointing to a datasafe directory, where the dataset is located for which the checksum should be read.

verify_own_path()[source]

Verify if the path set as instance attribute is a correct path.

Wrapper around verify_path() specifically for checking the root path of the datasafe.

Returns

path_okay – result of the path check

Return type

bool

static verify_path(path='')[source]

Verify if a path is correct.

Static method which works for any path, not just the datasafe root path.

Parameters

path – path that should be checked

Returns

path_okay – result of the path check

Return type

bool

exception labinform.datasafe.datasafe.DirNamesAreNotIntsError[source]

Bases: labinform.datasafe.datasafe.Error

Raised when an attempt is made to increment non-numeric directory names.

exception labinform.datasafe.datasafe.DirNotEmptyError[source]

Bases: labinform.datasafe.datasafe.Error

Raised when an attempt is made to push data to a non-empty directory.

exception labinform.datasafe.datasafe.Error[source]

Bases: Exception

Base class for exceptions in this module.

exception labinform.datasafe.datasafe.IncorrectLoiError[source]

Bases: labinform.datasafe.datasafe.Error

Raised when an incorrect loi is provided.

exception labinform.datasafe.datasafe.NoChecksumFilePresentError[source]

Bases: labinform.datasafe.datasafe.Error

Raised when the checksum file cannot be retrieved because it does not exist.

exception labinform.datasafe.datasafe.NoPathForThisLoiError[source]

Bases: labinform.datasafe.datasafe.Error

Raised when the path corresponding to a given loi doesn’t exist.

exception labinform.datasafe.datasafe.NoSuchDirectoryError[source]

Bases: labinform.datasafe.datasafe.Error

Raised when an invalid path is set.