datasafe subpackage

datasafe.datasafe module

Datasafe module for the labinform package.

The datasafe is a key feature of labinform which serves to safely store data. Functionality includes directory generation and checksum creation.

class labinform.datasafe.datasafe.Datasafe

Bases: object

Data handler for moving data in the context of a datasafe.

The operations performed include generating a directory structure, storing data in and retrieving data from these directories, as well as verifying the integrity of and providing general information about the data stored.

checksum_name

Name used for checksum files covering all files of a dataset

Type:

str

checksum_data_name

Name used for checksum files covering only the data of a dataset

Type:

str

manifest_name

Name of the manifest file

Type:

str

basic_loi

First part for LOIs, i.e. something like “42.xxxx/”

Type:

str

data_movement_name

The name that should be used for the *.tgz archive for moving data.

Type:

str

metadata_extensions

Extensions of metadata files

Type:

list

static add_directory(path)

Create a directory at a specified path

Parameters:

path (str) – path of the directory that should be created

compare_checksum(loi='', with_meta=False)

Create local checksum and compare with checksum file.

Parameters:
  • loi (str) – unique identifier pointing to a datasafe directory, where the data is stored for which the checksums should be compared.

  • with_meta (bool) – whether to compare the checksums that include metadata.

Returns:

comparison – result of the checksum comparison.

Return type:

bool

static dir_is_empty(path='')

Check whether a directory is empty.

Parameters:

path (str) – path of the directory which should be checked

static find_highest(path='')

Find a numbered directory with the highest number.

For a given path, find the numbered directory (i.e. directory with an integer as name) with the highest number. If the directory that the path leads to doesn’t exist, if it is empty, or if its subdirectories are not ‘numbered’, an error is raised.

Parameters:

path (str) – path of the directory that should be searched
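The lookup described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the package's implementation: the function name find_highest_numbered is hypothetical, built-in exceptions stand in for whatever the package raises, and entries that are not numbered directories are simply ignored.

```python
import os


def find_highest_numbered(path):
    """Return the largest integer-named subdirectory of ``path``.

    Hypothetical helper for illustration only. Raises FileNotFoundError
    if ``path`` is not a directory and ValueError if it contains no
    subdirectory with an integer name (this covers the empty case).
    """
    if not os.path.isdir(path):
        raise FileNotFoundError(f"No such directory: {path}")
    numbers = [
        int(entry.name)
        for entry in os.scandir(path)
        if entry.is_dir() and entry.name.isdecimal()
    ]
    if not numbers:
        raise ValueError(f"No numbered subdirectories in: {path}")
    return max(numbers)
```

Comparing the names as integers rather than as strings matters here: a directory named 10 ranks above one named 9.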

generate(experiment='', sample_id='')

Generate directory structure and return identifier.

Verify to what extent the relevant directory structure is present and create directories as required. For consecutive measurements on a given sample, the measurement number is increased automatically.

Return a unique identifier for the respective measurement and sample, including the directory path.

Parameters:
  • experiment (str) – type of experiment performed, e.g. ‘cwepr’

  • sample_id (str) – unique identifier for the sample measured

Returns:

loi – unique LOI including the information provided

Return type:

str
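The automatic numbering of consecutive measurements might look roughly like the following sketch. The directory layout root/experiment/sample_id/number, the start value of 1, and the function name next_measurement_dir are all assumptions made for illustration; the sketch returns a plain path, whereas generate() returns an LOI.

```python
import os


def next_measurement_dir(root, experiment, sample_id):
    """Create and return the next numbered measurement directory.

    Assumed (hypothetical) layout: <root>/<experiment>/<sample_id>/<n>,
    where <n> counts up from 1 for each sample.
    """
    sample_dir = os.path.join(root, experiment, sample_id)
    # Create the experiment and sample levels only if they are missing.
    os.makedirs(sample_dir, exist_ok=True)
    existing = [
        int(name)
        for name in os.listdir(sample_dir)
        if name.isdecimal() and os.path.isdir(os.path.join(sample_dir, name))
    ]
    number = max(existing, default=0) + 1
    target = os.path.join(sample_dir, str(number))
    os.mkdir(target)
    return target
```

Calling the function twice for the same experiment and sample yields two sibling directories whose names differ by one.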

static has_dir(path='')

Check whether a directory exists.

Parameters:

path (str) – path of the directory which should be checked

static increment(number=0)

Increment an integer by one.

Parameters:

number (int) – integer that should be incremented

index(loi='')

Retrieve meta information about a dataset from the datasafe.

Retrieves meta information (Manifest.yaml file) for a dataset in the datasafe if present at the target directory (as specified in the LOI); raises an exception otherwise.

Parameters:

loi (str) – unique identifier for the dataset for which the meta information should be retrieved.

Returns:

manifest_dict – retrieved meta information (Manifest.yaml) as ordered dict

Return type:

collections.OrderedDict

loi_to_path(loi='')

Retrieve a file’s datasafe directory path from the data’s LOI.

Retrieves the data’s path (including the datasafe’s root path) which is included in the LOI. If the LOI is not correctly formatted, an exception is raised.

Parameters:

loi (str) – LOI from which the path should be retrieved

Returns:

path – path retrieved from the LOI

Return type:

str
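As a rough sketch of the idea, the function below maps an LOI to a directory. The exact LOI format is not specified here, so the sketch assumes that an LOI consists of the basic LOI prefix followed by a slash-separated relative path. The name loi_to_path_sketch, the explicit root and basic_loi parameters, and the use of ValueError for malformed input are hypothetical.

```python
import os


def loi_to_path_sketch(loi, root, basic_loi):
    """Map an LOI to a directory below ``root``.

    Assumes (hypothetically) that an LOI is ``basic_loi`` followed by a
    slash-separated relative path. ValueError stands in for the
    package's own exception for malformed LOIs.
    """
    if not loi.startswith(basic_loi):
        raise ValueError(f"LOI does not start with {basic_loi!r}: {loi!r}")
    # Drop empty segments caused by doubled or trailing slashes.
    parts = [part for part in loi[len(basic_loi):].split("/") if part]
    if not parts:
        raise ValueError(f"LOI contains no path: {loi!r}")
    return os.path.join(root, *parts)
```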

make_both_checksum_files(path='', ignore_control_files=True)

Create files containing hashes for files in target directory.

Wrapper method: Creates two checksums for files if present at the target directory and writes each to its own checksum file; raises an exception otherwise. One checksum includes metadata, the other doesn’t.

Parameters:
  • path (str) – path to the data (files) for which a checksum should be created

  • ignore_control_files (bool) – whether to ignore manifest and checksum files for checksum creation.

make_checksum_file(path='', with_meta=False, ignore_control_files=True)

Create a file containing a hash for files in target directory.

Creates a checksum for files if present at the target directory and writes it to a checksum file; raises an exception otherwise.

Parameters:
  • path (str) – path to the data (files) for which a checksum should be created

  • with_meta (bool) – whether to include metadata for checksum creation.

  • ignore_control_files (bool) – whether to ignore manifest and checksum files for checksum creation.

Returns:

checksum – checksum (currently MD5)

Return type:

str

static make_checksum_for_file(path='')

Create a hash (currently MD5) for a file at a given path.

Parameters:

path (str) – path of file for which a checksum should be created.

Returns:

checksum – checksum (currently MD5)

Return type:

str
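For orientation, an MD5 hash of a single file can be computed with Python's hashlib as sketched below. The function name md5_of_file and the chunk size are assumptions for illustration; the package's own code may differ.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of the file at ``path``.

    The file is read in chunks so that large data files do not have to
    fit into memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in binary mode is deliberate: text mode could translate line endings and thereby change the checksum between platforms.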

make_checksum_for_files(path='', with_meta=False, ignore_control_files=True)

Create a cryptographic hash (currently MD5) for multiple files.

All files in the directory are sorted and included in the checksum with the option to exclude control files, i.e. the manifest file and checksum files.

Parameters:
  • path (str) – path of directory which contains the files.

  • with_meta (bool) – whether to include metadata for checksum creation.

  • ignore_control_files (bool) – whether to ignore manifest and checksum files for checksum creation.

Returns:

checksum – checksum (currently MD5)

Return type:

str
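One way to obtain a single checksum for several files is sketched below. How the per-file hashes are actually combined is not stated here, so hashing the concatenated per-file digests is an assumption, as are the name md5_of_directory and the exclude parameter (a stand-in for skipping control files).

```python
import hashlib
import os


def md5_of_directory(path, exclude=()):
    """Return one MD5 hex digest covering the regular files in ``path``.

    File names are sorted so the result does not depend on the order in
    which the operating system lists them. Names in ``exclude`` (for
    example manifest or checksum files) are skipped. Feeding the
    per-file digests into a second MD5 is an assumption of this sketch.
    """
    combined = hashlib.md5()
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if name in exclude or not os.path.isfile(full_path):
            continue
        with open(full_path, "rb") as handle:
            file_digest = hashlib.md5(handle.read()).hexdigest()
        combined.update(file_digest.encode("ascii"))
    return combined.hexdigest()
```

Excluding the control files keeps the checksum stable when a manifest or checksum file is written into the same directory afterwards.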

make_tgz(path='')

Pack directory content to *.tgz file.

Pack all files in directory to a *.tgz file without the folder itself.

Parameters:

path (str) – path of the directory containing the files
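Packing the content of a directory without the folder itself can be done with the standard tarfile module, as in this sketch. The function name pack_directory_content and the explicit archive_path parameter are hypothetical; the sketch also assumes that archive_path lies outside the directory being packed.

```python
import os
import tarfile


def pack_directory_content(path, archive_path):
    """Write the entries of ``path`` to a gzip-compressed tar archive.

    Each entry is stored under its own name (``arcname``), so the
    archive does not contain the enclosing folder.
    """
    names = sorted(os.listdir(path))
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in names:
            archive.add(os.path.join(path, name), arcname=name)
    return archive_path
```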

moveto(data='', experiment='', sample_id='')

Prepare directory in datasafe and move data there.

This is a wrapper function which calls generate() to generate a directory structure if necessary and creates a local checksum of the file to be moved. It then moves the file to the datasafe and creates another checksum. The two checksums are compared and the result of the comparison is returned.

Parameters:
  • data (str) – data (file) that should be moved inside the datasafe.

  • experiment (str) – type of experiment performed, e.g. ‘cwepr’

  • sample_id (str) – unique identifier for the sample measured

Returns:

results – list containing the generated LOI and the result of the checksum comparison

Return type:

list

multi_push(path='', loi='')

Move data (all files in one directory) into the datasafe.

Wrapper around push() for moving all files in any one directory. The files are packed to a tgz archive before moving and unpacked after. The data’s checksum taken before packing is compared with the one taken after unpacking.

Parameters:
  • path (str) – path of the directory which contains the files to be moved.

  • loi (str) – unique identifier providing a directory path.

Returns:

comparison – Is the checksum identical before and after pushing?

Return type:

bool
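The pack, move, unpack, and compare sequence can be sketched as follows. This is an illustration only: the names directory_digest and push_directory, the archive name data.tgz, and the way the digest is formed are assumptions. Unlike a real move, the sketch leaves the source directory in place, and it expects an existing, empty target directory.

```python
import hashlib
import os
import shutil
import tarfile
import tempfile


def directory_digest(path):
    """MD5 over the sorted names and contents of the files in ``path``."""
    digest = hashlib.md5()
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path):
            digest.update(name.encode("utf-8"))
            with open(full_path, "rb") as handle:
                digest.update(handle.read())
    return digest.hexdigest()


def push_directory(source_dir, target_dir, archive_name="data.tgz"):
    """Pack ``source_dir``, move the archive, unpack it, compare digests."""
    before = directory_digest(source_dir)
    with tempfile.TemporaryDirectory() as staging:
        archive_path = os.path.join(staging, archive_name)
        with tarfile.open(archive_path, "w:gz") as archive:
            for name in sorted(os.listdir(source_dir)):
                archive.add(os.path.join(source_dir, name), arcname=name)
        # shutil.move returns the final location inside target_dir.
        moved = shutil.move(archive_path, target_dir)
    # The archive was created just above, so its content is trusted.
    with tarfile.open(moved, "r:gz") as archive:
        archive.extractall(target_dir)
    os.remove(moved)
    return before == directory_digest(target_dir)
```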

property path

Get or set the path of the datasafe’s top level directory.

The directory is checked for existence and set as path only in case it exists.

pull(loi='', target='')

Retrieve data from the datasafe.

Retrieves data from the datasafe if present at the target directory (as specified in the LOI) and moves it to another target directory; raises an exception otherwise.

Parameters:
  • loi (str) – unique identifier for the data to be retrieved

  • target (str) – directory where the data should be deposited.

Returns:

path – directory where the data was deposited.

Return type:

str

push(data='', loi='', check_empty=True)

Move data (one file) into the datasafe.

Before moving, the existence of the target directory (as specified in the LOI) and its emptiness are verified. The data’s checksum taken before moving is compared with the one taken after moving.

Parameters:
  • data (str) – data (file) to be moved

  • loi (str) – unique identifier providing a directory path

  • check_empty (bool) – whether an error should be raised if the LOI points to a non-empty directory.

Returns:

comparison – Is the checksum identical before and after pushing?

Return type:

bool
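The checks around a single-file move can be sketched like this. The sketch works on plain directory paths rather than LOIs, the name move_and_verify is hypothetical, and built-in exceptions stand in for the package's own error classes.

```python
import hashlib
import os
import shutil


def _md5(path):
    """Return the MD5 hex digest of a (small) file read in one go."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def move_and_verify(source, target_dir, check_empty=True):
    """Move ``source`` into ``target_dir`` and compare checksums.

    Returns True if the checksum after the move equals the one taken
    before it.
    """
    if not os.path.isdir(target_dir):
        raise FileNotFoundError(f"No such directory: {target_dir}")
    if check_empty and os.listdir(target_dir):
        raise OSError(f"Target directory is not empty: {target_dir}")
    before = _md5(source)
    # shutil.move returns the new location of the file.
    moved = shutil.move(source, target_dir)
    return before == _md5(moved)
```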

retrieve_checksum(loi='', with_meta=False)

Return checksum from checksum file for a given LOI.

Parameters:
  • loi (str) – unique identifier pointing to a datasafe directory, where the dataset is located for which the checksum should be read.

  • with_meta (bool) – whether to return the checksum that includes metadata.

Returns:

checksum – checksum from file

Return type:

str

verify_own_path()

Verify if the path set as instance attribute is a correct path.

Wrapper around verify_path() specifically for checking the root path of the datasafe.

Returns:

path_okay – result of the path check

Return type:

bool

static verify_path(path='')

Verify if a path is correct.

Static method which works for any path, not just the datasafe root path.

Parameters:

path (str) – path that should be checked

Returns:

path_okay – result of the path check

Return type:

bool

exception labinform.datasafe.datasafe.DirectoryNotEmptyError

Bases: Error

Raised when an attempt is made to push data to a non-empty directory.

exception labinform.datasafe.datasafe.Error

Bases: Exception

Base class for exceptions in this module.

exception labinform.datasafe.datasafe.IncorrectLoiError

Bases: Error

Raised when an incorrect LOI is provided.

exception labinform.datasafe.datasafe.NoChecksumFilePresentError

Bases: Error

Raised when a checksum file cannot be retrieved because it does not exist.

exception labinform.datasafe.datasafe.NoPathForThisLoiError

Bases: Error

Raised when the path corresponding to a given LOI doesn’t exist.

exception labinform.datasafe.datasafe.NoSuchDirectoryError

Bases: Error

Raised when an invalid path is set.
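Because all of these exceptions derive from the module's Error base class, callers can catch the base class to handle any datasafe-specific failure. The sketch below re-declares two of the classes locally so that it runs without the package installed; the helper describe_failure is hypothetical.

```python
class Error(Exception):
    """Base class for exceptions (local stand-in for the module's Error)."""


class DirectoryNotEmptyError(Error):
    """Stand-in: pushing data to a non-empty directory."""


class IncorrectLoiError(Error):
    """Stand-in: an incorrect LOI was provided."""


def describe_failure(action):
    """Run ``action`` and report which datasafe error occurred, if any."""
    try:
        action()
    except Error as exc:
        # Any subclass of Error ends up here; other exceptions propagate.
        return type(exc).__name__
    return "ok"
```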

datasafe.manifest module

Module used for creation of manifest files.

class labinform.datasafe.manifest.ManifestWriter

Bases: object

Tool for automated creation of manifest files.

manifest_dict

Ordered dict that is filled with information and finally saved as a manifest file.

Type:

collections.OrderedDict

version

Version number of the manifest file format shown in the manifest file

Type:

str

type

File type of the manifest file as displayed in the manifest file

Type:

str

file_format

File format of the dataset data displayed in the manifest file

Type:

str

complete

Indication whether the data of the dataset are complete.

Sometimes, measurements get cancelled, but the data measured so far are still useful. However, in such cases some of the metadata may not fit to the actual dimensions of the numerical data.

Type:

bool

metadata_extensions

Extensions of metadata files

Type:

list

checksum_name

Name used for checksum files covering all files of a dataset

Type:

str

checksum_data_name

Name used for checksum files covering only the data of a dataset

Type:

str

manifest_name

Name of the manifest file

Type:

str

extension_catalogue

File extensions and their description to be included in the manifest

Type:

dict

set_properties(datasafe=None)

Apply properties from a datasafe object.

Parameters:

datasafe – datasafe object from which to get properties.

write(path='', loi='')

Create manifest file in target directory.

Parameters:
  • path – path used for listing relevant files and for saving the manifest file

  • loi – LOI that is inserted in the file