datasafe subpackage

datasafe.datasafe module

Datasafe module for the labinform package.

The datasafe is a key feature of labinform which serves to safely store data. Functionality includes directory generation and checksum creation.

class labinform.datasafe.datasafe.Datasafe

Bases: object

Data handler for moving data in the context of a datasafe.

The operations performed include generating a directory structure, storing data in and retrieving data from these directories, as well as verifying the integrity of the stored data and providing general information about it.

checksum_name

Name used for checksum files covering all files of a dataset

Type

str

checksum_data_name

Name used for checksum files covering only the data of a dataset

Type

str

manifest_name

Name of the manifest file

Type

str

basic_loi

First part for LOIs, i.e. something like “42.xxxx/”

Type

str

data_movement_name

The name that should be used for the *.tgz archive for moving data.

Type

str

metadata_extensions

Extensions of metadata files

Type

list

static add_directory(path)

Create a directory at a specified path

Parameters

path (str) – path of the directory that should be created

compare_checksum(loi='', with_meta=False)

Create local checksum and compare with checksum file.

Parameters
  • loi (str) – unique identifier pointing to a datasafe directory, where the data is stored for which the checksums should be compared.

  • with_meta (bool) – whether to compare the checksums that include metadata.

Returns

comparison – result of the checksum comparison.

Return type

bool

static dir_is_empty(path='')

Check whether a directory is empty.

Parameters

path (str) – path of the directory which should be checked

static find_highest(path='')

Find a numbered directory with the highest number.

For a given path, find the numbered directory (i.e. a directory with an integer as its name) with the highest number. An error is raised if the directory the path points to does not exist, if it is empty, or if its subdirectories are not ‘numbered’.

Parameters

path (str) – path of the directory that should be searched
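A minimal sketch of the lookup described above, written as a free function rather than the actual labinform method. Raising NoSuchDirectoryError for a missing directory and ValueError otherwise is an assumption; the documentation only says that an error is raised. Comparing names as integers matters, because a plain string comparison would rank "2" above "10".

```python
import os
import tempfile


class NoSuchDirectoryError(Exception):
    """Stand-in for the datasafe exception of the same name."""


def find_highest(path: str) -> int:
    """Return the largest integer used as a subdirectory name in *path*."""
    if not os.path.isdir(path):
        raise NoSuchDirectoryError(f"no such directory: {path!r}")
    numbers = [
        int(name)
        for name in os.listdir(path)
        # only subdirectories whose whole name is a decimal integer count
        if name.isdecimal() and os.path.isdir(os.path.join(path, name))
    ]
    if not numbers:
        # covers both an empty directory and one without numbered subdirectories
        raise ValueError(f"no numbered subdirectories in {path!r}")
    return max(numbers)


with tempfile.TemporaryDirectory() as root:
    for name in ("1", "2", "10", "notes"):
        os.mkdir(os.path.join(root, name))
    print(find_highest(root))  # -> 10 (a string comparison would pick "2")
```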

generate(experiment='', sample_id='')

Generate directory structure and return identifier.

Verify to what extent the relevant directory structure is present and create directories as required. For consecutive measurements of a given sample, the measurement number is increased automatically.

Return a unique identifier for the respective measurement and sample, including the directory path.

Parameters
  • experiment (str) – type of experiment performed, e.g. ‘cwepr’

  • sample_id (str) – unique identifier for the sample measured

Returns

loi – unique LOI including the information provided

Return type

str
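The numbering step of generate() can be sketched as follows. The directory layout <root>/<experiment>/<sample_id>/<measurement number> and the LOI built by appending that relative path to the basic_loi placeholder "42.xxxx/" are assumptions made for illustration; the function is a hypothetical stand-in, not labinform's implementation.

```python
import os
import tempfile

BASIC_LOI = "42.xxxx/"  # placeholder prefix, as in the basic_loi description


def generate(root: str, experiment: str, sample_id: str) -> str:
    """Create <root>/<experiment>/<sample_id>/<n> and return a matching LOI."""
    sample_dir = os.path.join(root, experiment, sample_id)
    os.makedirs(sample_dir, exist_ok=True)  # create only the missing levels
    numbers = [int(name) for name in os.listdir(sample_dir) if name.isdecimal()]
    number = max(numbers, default=0) + 1  # first measurement of a sample is 1
    os.mkdir(os.path.join(sample_dir, str(number)))
    return BASIC_LOI + "/".join([experiment, sample_id, str(number)])


with tempfile.TemporaryDirectory() as root:
    print(generate(root, "cwepr", "sa42"))  # -> 42.xxxx/cwepr/sa42/1
    print(generate(root, "cwepr", "sa42"))  # -> 42.xxxx/cwepr/sa42/2
```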

static has_dir(path='')

Check whether a directory exists.

Parameters

path (str) – path of the directory which should be checked

static increment(number=0)

Increment an integer by one.

Parameters

number (int) – integer that should be incremented

index(loi='')

Retrieve meta information about a dataset from the datasafe.

Retrieves meta information (Manifest.yaml file) for a dataset in the datasafe if it is present at the target directory (as specified in the LOI); raises an exception otherwise.

Parameters

loi (str) – unique identifier for the dataset for which the meta information should be retrieved.

Returns

manifest_dict – retrieved meta information (Manifest.yaml) as ordered dict

Return type

collections.OrderedDict

loi_to_path(loi='')

Retrieve a file’s datasafe directory path from the data’s LOI.

Retrieves the data’s path (including the datasafe’s root path) which is included in the LOI. If the LOI is not correctly formatted, an exception is raised.

Parameters

loi (str) – LOI from which the path should be retrieved

Returns

path – path retrieved from the LOI

Return type

str
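A hypothetical sketch of the LOI-to-path mapping, assuming an LOI consists of the basic_loi prefix followed by a "/"-separated path relative to the datasafe root (the exact LOI format is not specified here). The explicit root argument replaces the instance's path property.

```python
import os


class IncorrectLoiError(Exception):
    """Stand-in for the datasafe exception of the same name."""


def loi_to_path(loi: str, root: str, basic_loi: str = "42.xxxx/") -> str:
    """Strip the LOI prefix and join the remainder onto the datasafe root."""
    relative = loi[len(basic_loi):].strip("/") if loi.startswith(basic_loi) else ""
    if not relative:
        raise IncorrectLoiError(f"malformed LOI: {loi!r}")
    # split on "/" so the result uses the platform's own path separator
    return os.path.join(root, *relative.split("/"))


print(loi_to_path("42.xxxx/cwepr/sa42/1", "/srv/datasafe"))
# on POSIX: /srv/datasafe/cwepr/sa42/1
```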

make_both_checksum_files(path='', ignore_control_files=True)

Create files containing hashes for the files in the target directory.

Wrapper method: creates two checksums for the files at the target directory, if present, and writes each of them to its own checksum file; raises an exception otherwise. One checksum includes metadata, the other doesn’t.

Parameters
  • path (str) – path to the data (files) for which a checksum should be created

  • ignore_control_files (bool) – whether to ignore manifest and checksum files for checksum creation.

make_checksum_file(path='', with_meta=False, ignore_control_files=True)

Create a file containing a hash for the files in the target directory.

Creates a checksum for the files at the target directory, if present, and writes it to a checksum file; raises an exception otherwise.

Parameters
  • path (str) – path to the data (files) for which a checksum should be created

  • with_meta (bool) – whether to include metadata for checksum creation.

  • ignore_control_files (bool) – whether to ignore manifest and checksum files for checksum creation.

Returns

checksum – checksum (currently MD5)

Return type

str

static make_checksum_for_file(path='')

Create a hash (currently MD5) for a file at a given path.

Parameters

path (str) – path of file for which a checksum should be created.

Returns

checksum – checksum (currently MD5)

Return type

str
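Chunked hashing with the standard library's hashlib is one straightforward way to compute such a per-file MD5; this is a sketch, not the labinform source. MD5 is adequate for detecting accidental corruption during transfer, but it is not collision-resistant against deliberate tampering.

```python
import hashlib
import os
import tempfile


def make_checksum_for_file(path: str) -> str:
    """Return the hex MD5 digest of a file, reading it in 64 KiB chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as file:
        # chunked reads keep memory use flat even for large data files
        for chunk in iter(lambda: file.read(65536), b""):
            md5.update(chunk)
    return md5.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.dat")
    with open(sample, "wb") as file:
        file.write(b"hello")
    print(make_checksum_for_file(sample))  # -> 5d41402abc4b2a76b9719d911017c592
```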

make_checksum_for_files(path='', with_meta=False, ignore_control_files=True)

Create a cryptographic hash (currently MD5) for multiple files.

All files in the directory are sorted and included in the checksum with the option to exclude control files, i.e. the manifest file and checksum files.

Parameters
  • path (str) – path of directory which contains the files.

  • with_meta (bool) – whether to include metadata for checksum creation.

  • ignore_control_files (bool) – whether to ignore manifest and checksum files for checksum creation.

Returns

checksum – checksum (currently MD5)

Return type

str
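The documentation does not say how the per-file hashes are combined. This sketch assumes each file's MD5 digest is fed, in sorted file-name order, into a second MD5; the control-file names and the metadata extension below are hypothetical placeholders for the checksum_name, checksum_data_name, manifest_name and metadata_extensions attributes.

```python
import hashlib
import os
import tempfile

# Hypothetical names; the real values live in the Datasafe attributes.
CONTROL_FILES = {"Manifest.yaml", "CHECKSUM", "CHECKSUM_data"}
METADATA_EXTENSIONS = {".info"}


def make_checksum_for_files(path: str, with_meta: bool = False,
                            ignore_control_files: bool = True) -> str:
    """Combine per-file MD5 digests, in sorted file-name order, into one MD5."""
    combined = hashlib.md5()
    for name in sorted(os.listdir(path)):  # sorting makes the result reproducible
        full_path = os.path.join(path, name)
        if not os.path.isfile(full_path):
            continue
        if ignore_control_files and name in CONTROL_FILES:
            continue
        if not with_meta and os.path.splitext(name)[1] in METADATA_EXTENSIONS:
            continue
        with open(full_path, "rb") as file:
            combined.update(hashlib.md5(file.read()).digest())
    return combined.hexdigest()
```

With this design, editing a metadata file changes only the with_meta=True checksum, and editing a control file changes neither.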

make_tgz(path='')

Pack directory content into a *.tgz file.

Pack all files in the directory into a *.tgz file, without including the directory itself.

Parameters

path (str) – path of the directory containing the files
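Packing "without the folder itself" corresponds to setting arcname on each member, as in this standard-library sketch. Passing the archive path explicitly is an assumption; the real method derives the archive name from data_movement_name, and its location is not documented here.

```python
import os
import tarfile
import tempfile


def make_tgz(path: str, archive: str) -> None:
    """Pack the entries of *path* into a gzip-compressed tar file at *archive*."""
    with tarfile.open(archive, "w:gz") as tar:
        for name in sorted(os.listdir(path)):
            # arcname=name stores "a.dat" instead of "<path>/a.dat",
            # so the enclosing folder is not part of the archive
            tar.add(os.path.join(path, name), arcname=name)


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
    for name in ("a.dat", "b.info"):
        with open(os.path.join(src, name), "w") as file:
            file.write(name)
    archive = os.path.join(out, "data.tgz")
    make_tgz(src, archive)
    with tarfile.open(archive) as tar:
        print(tar.getnames())  # -> ['a.dat', 'b.info']
```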

moveto(data='', experiment='', sample_id='')

Prepare directory in datasafe and move data there.

This is a wrapper function which calls generate() to create the directory structure if necessary and creates a local checksum of the file to be moved. It then moves the file to the datasafe and creates another checksum. Finally, the two checksums are compared, and the generated LOI is returned together with the result of the comparison.

Parameters
  • data (str) – data (file) that should be moved inside the datasafe.

  • experiment (str) – type of experiment performed, e.g. ‘cwepr’

  • sample_id (str) – unique identifier for the sample measured

Returns

results – list containing the generated LOI and the result of the checksum comparison

Return type

list

multi_push(path='', loi='')

Move data (all files in one directory) into the datasafe.

Wrapper around push() for moving all files in a single directory. The files are packed into a *.tgz archive before moving and unpacked afterwards. The data’s checksums are compared before packing and after unpacking.

Parameters
  • path (str) – path of the directory which contains the files to be moved.

  • loi (str) – unique identifier providing a directory path.

Returns

comparison – whether the checksum is identical before and after pushing

Return type

bool
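The multi_push() workflow (checksum, pack, move, unpack, checksum again) can be sketched with the standard library alone. This hypothetical version takes a target directory instead of an LOI, uses a simplified directory checksum, assumes the target starts empty, and leaves the source files in place; whether labinform removes them is not stated.

```python
import hashlib
import os
import shutil
import tarfile
import tempfile


def files_md5(path: str) -> str:
    """MD5 over the sorted regular files of *path* (simplified checksum)."""
    combined = hashlib.md5()
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path):
            with open(full_path, "rb") as file:
                combined.update(hashlib.md5(file.read()).digest())
    return combined.hexdigest()


def multi_push(source: str, target: str) -> bool:
    """Pack *source*, move the archive to *target*, unpack, compare checksums."""
    before = files_md5(source)
    staging = tempfile.mkdtemp()
    try:
        archive = os.path.join(staging, "data.tgz")  # hypothetical archive name
        with tarfile.open(archive, "w:gz") as tar:
            for name in sorted(os.listdir(source)):
                tar.add(os.path.join(source, name), arcname=name)
        moved = shutil.move(archive, target)
        # use the safe "data" extraction filter where this Python provides it
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(moved) as tar:
            tar.extractall(target, **extra)
        os.remove(moved)  # the archive is only a transport container
    finally:
        shutil.rmtree(staging)
    return before == files_md5(target)
```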

property path

Get or set the path of the datasafe’s top level directory.

The directory is checked for existence and set as path only if it exists.
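A property with a validating setter is the usual Python idiom for "set only if it exists". The class below is hypothetical, and raising NoSuchDirectoryError on an invalid path is an assumption based on that exception's description.

```python
import os
import tempfile


class NoSuchDirectoryError(Exception):
    """Stand-in for the datasafe exception of the same name."""


class DatasafeRoot:
    """Hypothetical holder of a datasafe root path."""

    def __init__(self) -> None:
        self._path = ""

    @property
    def path(self) -> str:
        return self._path

    @path.setter
    def path(self, path: str) -> None:
        # reject the new value and keep the old one if the directory is missing
        if not os.path.isdir(path):
            raise NoSuchDirectoryError(path)
        self._path = path


safe = DatasafeRoot()
with tempfile.TemporaryDirectory() as existing:
    safe.path = existing  # accepted
    try:
        safe.path = os.path.join(existing, "missing")
    except NoSuchDirectoryError as exc:
        print("rejected:", exc)
```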

pull(loi='', target='')

Retrieve data from the datasafe.

Retrieves data from the datasafe if it is present at the directory specified in the LOI and moves it to the given target directory; raises an exception otherwise.

Parameters
  • loi (str) – unique identifier for the data to be retrieved

  • target (str) – directory where the data should be deposited.

Returns

path – directory where the data was deposited.

Return type

str

push(data='', loi='', check_empty=True)

Move data (one file) into the datasafe.

Before moving, the target directory (as specified in the LOI) is checked for existence and emptiness. The data’s checksums before and after moving are compared.

Parameters
  • data (str) – data (file) to be moved

  • loi (str) – unique identifier providing a directory path

  • check_empty (bool) – whether an error should be raised if the LOI points to a non-empty directory.

Returns

comparison – whether the checksum is identical before and after pushing

Return type

bool
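The checks and the before/after comparison performed by push() can be sketched as below. The function takes a target directory instead of an LOI, and raising FileNotFoundError for a missing directory is a stand-in assumption; the real method would resolve the LOI and use the datasafe's own exceptions.

```python
import hashlib
import os
import shutil
import tempfile


class DirectoryNotEmptyError(Exception):
    """Stand-in for the datasafe exception of the same name."""


def md5_of(path: str) -> str:
    with open(path, "rb") as file:
        return hashlib.md5(file.read()).hexdigest()


def push(data: str, target_dir: str, check_empty: bool = True) -> bool:
    """Move one file into *target_dir* and report whether its MD5 survived."""
    if not os.path.isdir(target_dir):
        raise FileNotFoundError(target_dir)  # cf. NoPathForThisLoiError
    if check_empty and os.listdir(target_dir):
        raise DirectoryNotEmptyError(target_dir)
    before = md5_of(data)
    moved = shutil.move(data, target_dir)  # returns the file's new path
    return before == md5_of(moved)
```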

retrieve_checksum(loi='', with_meta=False)

Return checksum from checksum file for a given LOI.

Parameters
  • loi (str) – unique identifier pointing to a datasafe directory, where the dataset is located for which the checksum should be read.

  • with_meta (bool) – whether to return the checksum that includes metadata.

Returns

checksum – checksum from file

Return type

str
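Writing, reading and comparing a stored checksum can be sketched as follows. The checksum file name, its content (just the hex digest on one line) and the directory argument in place of an LOI are all assumptions; the with_meta variants are omitted for brevity.

```python
import hashlib
import os
import tempfile

CHECKSUM_NAME = "CHECKSUM"  # hypothetical; see the checksum_name attribute


class NoChecksumFilePresentError(Exception):
    """Stand-in for the datasafe exception of the same name."""


def files_md5(path: str) -> str:
    """Simplified directory checksum that skips the checksum file itself."""
    combined = hashlib.md5()
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if name != CHECKSUM_NAME and os.path.isfile(full_path):
            with open(full_path, "rb") as file:
                combined.update(hashlib.md5(file.read()).digest())
    return combined.hexdigest()


def make_checksum_file(path: str) -> str:
    checksum = files_md5(path)
    with open(os.path.join(path, CHECKSUM_NAME), "w") as file:
        file.write(checksum + "\n")
    return checksum


def retrieve_checksum(path: str) -> str:
    checksum_file = os.path.join(path, CHECKSUM_NAME)
    if not os.path.isfile(checksum_file):
        raise NoChecksumFilePresentError(checksum_file)
    with open(checksum_file) as file:
        return file.read().strip()


def compare_checksum(path: str) -> bool:
    # recompute locally and compare with the stored value
    return files_md5(path) == retrieve_checksum(path)
```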

verify_own_path()

Verify whether the path set as instance attribute is a correct path.

Wrapper around verify_path() specifically for checking the root path of the datasafe.

Returns

path_okay – result of the path check

Return type

bool

static verify_path(path='')

Verify whether a path is correct.

Static method which works for any path, not just the datasafe root path.

Parameters

path (str) – path that should be checked

Returns

path_okay – result of the path check

Return type

bool

exception labinform.datasafe.datasafe.DirectoryNotEmptyError

Bases: labinform.datasafe.datasafe.Error

Raised when attempting to push data to a non-empty directory.

exception labinform.datasafe.datasafe.Error

Bases: Exception

Base class for exceptions in this module.

exception labinform.datasafe.datasafe.IncorrectLoiError

Bases: labinform.datasafe.datasafe.Error

Raised when an incorrect LOI is provided.

exception labinform.datasafe.datasafe.NoChecksumFilePresentError

Bases: labinform.datasafe.datasafe.Error

Raised when a checksum file cannot be retrieved because it does not exist.

exception labinform.datasafe.datasafe.NoPathForThisLoiError

Bases: labinform.datasafe.datasafe.Error

Raised when the path corresponding to a given LOI doesn’t exist.

exception labinform.datasafe.datasafe.NoSuchDirectoryError

Bases: labinform.datasafe.datasafe.Error

Raised when an invalid path is set.
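All module-specific exceptions derive from Error, so callers can handle every datasafe failure with a single except clause. The hierarchy below mirrors the documented bases; check_loi() is a hypothetical helper used only to trigger an exception.

```python
class Error(Exception):
    """Base class for exceptions in this module."""


class DirectoryNotEmptyError(Error):
    """Raised when attempting to push data to a non-empty directory."""


class IncorrectLoiError(Error):
    """Raised when an incorrect LOI is provided."""


class NoChecksumFilePresentError(Error):
    """Raised when a checksum file does not exist."""


class NoPathForThisLoiError(Error):
    """Raised when the path for a given LOI does not exist."""


class NoSuchDirectoryError(Error):
    """Raised when an invalid path is set."""


def check_loi(loi: str) -> None:
    """Hypothetical validator used only to trigger an exception."""
    if not loi.startswith("42."):
        raise IncorrectLoiError(f"not an LOI: {loi!r}")


try:
    check_loi("cwepr/sa42/1")
except Error as exc:  # one handler covers every datasafe-specific error
    print(type(exc).__name__)  # -> IncorrectLoiError
```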

datasafe.manifest module

Module used for creation of manifest files.

class labinform.datasafe.manifest.ManifestWriter

Bases: object

Tool for automated creation of manifest files.

manifest_dict

Ordered dict that is filled with information and finally saved as a manifest file.

Type

collections.OrderedDict

version

Version number of the manifest file format shown in the manifest file

Type

str

type

File type of the manifest file as displayed in the manifest file

Type

str

file_format

File format of the dataset data displayed in the manifest file

Type

str

complete

Indication whether the data of the dataset are complete.

Sometimes, measurements get cancelled, but the data measured so far are still useful. However, in such cases some of the metadata may not match the actual dimensions of the numerical data.

Type

bool

metadata_extensions

Extensions of metadata files

Type

list

checksum_name

Name used for checksum files covering all files of a dataset

Type

str

checksum_data_name

Name used for checksum files covering only the data of a dataset

Type

str

manifest_name

Name of the manifest file

Type

str

extension_catalogue

File extensions and their description to be included in the manifest

Type

dict

set_properties(datasafe=None)

Apply properties from a datasafe object.

Parameters

datasafe – datasafe object from which to get properties.

write(path='', loi='')

Create manifest file in target directory.

Parameters
  • path – path used for listing relevant files and for saving the manifest file

  • loi – LOI that is inserted into the file
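How write() lays out the manifest is not documented here. As an illustration only, the hypothetical helper below shows one way an extension catalogue could map file names to descriptions in an ordered dict before serialization. The keys, the catalogue entries and the choice to skip unknown extensions are assumptions, and writing YAML (which would need a third-party library such as PyYAML) is omitted.

```python
import collections
import os
import tempfile

# Hypothetical catalogue; the real one is the extension_catalogue attribute.
EXTENSION_CATALOGUE = {".dat": "numerical data", ".info": "metadata"}


def build_manifest(path: str, loi: str, complete: bool = True) -> collections.OrderedDict:
    """Assemble manifest content for the files in *path* (illustrative keys)."""
    manifest = collections.OrderedDict()
    manifest["dataset"] = collections.OrderedDict([("loi", loi), ("complete", complete)])
    files = collections.OrderedDict()
    for name in sorted(os.listdir(path)):
        description = EXTENSION_CATALOGUE.get(os.path.splitext(name)[1])
        if description is not None:  # files with unknown extensions are skipped
            files[name] = description
    manifest["files"] = files
    return manifest


with tempfile.TemporaryDirectory() as d:
    for name in ("b.info", "a.dat", "notes.txt"):
        open(os.path.join(d, name), "w").close()
    print(dict(build_manifest(d, "42.xxxx/cwepr/sa42/1")["files"]))
    # -> {'a.dat': 'numerical data', 'b.info': 'metadata'}
```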