Data Management System

Note

See the INTERSECT Scientific Data Layer documentation for details.

The main purpose of the INTERSECT federated ecosystem is to enable scientific breakthroughs with autonomous experiments, self-driving laboratories, smart manufacturing, and AI-driven design, discovery, and evaluation. The Data Management System (DMS) is responsible for collecting, tracking, transferring, storing, curating, and archiving the corresponding scientific data. The services of the DMS are organized in the following four tiers (Fig. 29):

  • Tier 0 (infrastructure) abstracts access to different storage backends, such as file systems, databases, and object stores. Its services provide a create, read, update, and delete (CRUD)-like interface to abstract objects. The objects represent assets of a particular class (e.g., a file, a metadata entry, or a catalog item).

  • Tier I provides data transport and storage management services.

  • Tier II provides data registration and deployment services.

  • Tier III provides data repository and data catalog services.
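The CRUD-like interface of Tier 0 can be illustrated with a minimal sketch. The names StorageBackend and InMemoryBackend, the method signatures, and the use of string keys are assumptions made for illustration only; they are not taken from the INTERSECT specification. The in-memory class stands in for a real backend such as a file system, database, or object store.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StorageBackend(ABC):
    """Hypothetical Tier 0 interface: CRUD-like access to abstract objects."""

    @abstractmethod
    def create(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...

    @abstractmethod
    def update(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class InMemoryBackend(StorageBackend):
    """Illustrative stand-in for a file system, database, or object store."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def create(self, key: str, data: bytes) -> None:
        # Refuse to overwrite: creation and update are distinct operations.
        if key in self._objects:
            raise KeyError(f"object already exists: {key}")
        self._objects[key] = data

    def read(self, key: str) -> bytes:
        # Raises KeyError if the object does not exist.
        return self._objects[key]

    def update(self, key: str, data: bytes) -> None:
        if key not in self._objects:
            raise KeyError(f"no such object: {key}")
        self._objects[key] = data

    def delete(self, key: str) -> None:
        # Raises KeyError if the object does not exist.
        del self._objects[key]
```

In a design along these lines, higher tiers would depend only on the abstract interface, so a backend could be swapped without changing the transport, registration, or catalog services.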

Fig. 29 The Data Management System tiers.

The DMS has the following services and microservice capabilities (mapping the System-of-Systems Architecture to the Microservices Architecture):

Minimum requirement

There must be one and only one DMS in an INTERSECT federated ecosystem, because the DMS spans the infrastructure systems within that ecosystem. Individual services of the DMS may be distributed across infrastructure systems as needed, although some services may exist only once.

Optional requirement

Optionally, multiple INTERSECT federated ecosystems may exist that either operate completely independently of each other or collaborate with each other, but each INTERSECT federated ecosystem has only one DMS (its own).

Note

Asset classes are loosely defined concepts here. In general, an asset class is a Binary Large Object (BLOB). However, in the context in which they are used, i.e., on a higher abstraction layer, these BLOBs are well defined. Asset classes can also be defined based on other constraints, such as object size or frequency of access. A data asset can be used as an abstraction of domain-specific data, and it has a unique identifier.
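As a rough sketch of this idea, a data asset can be modeled as an opaque payload paired with a unique identifier and a class label. The DataAsset type, its field names, the UUID-based identifier, and the size threshold below are all hypothetical choices for illustration; the source does not prescribe how identifiers are generated or how size-based classes are delimited.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataAsset:
    """Hypothetical data asset: an opaque BLOB plus a unique identifier."""

    payload: bytes    # the BLOB; its structure is only defined at a higher layer
    asset_class: str  # illustrative label, e.g. "file" or "metadata"
    # Assumption: a random UUID serves as the unique identifier.
    asset_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def size_class(asset: DataAsset, threshold: int = 1024) -> str:
    """Classify an asset by object size; the threshold is an arbitrary assumption."""
    return "large" if len(asset.payload) > threshold else "small"
```

Here the lower layer treats the payload as uninterpreted bytes, while the class label and a constraint such as size give higher layers a way to decide how the asset should be handled.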