FairMQ/docs/SDK.md
2019-10-21 15:52:47 +02:00

8. Controller SDK

The FairMQ Controller Software Development Kit (-DBUILD_SDK=ON) contains a set of C++ APIs, still experimental as of today, that provide essential functionality to the implementer of a global controller.

The FairMQ core library provides only two local controllers - static (a fixed sequence of state transitions) and interactive (a read-eval-print-loop which reads keyboard commands from standard input). A local controller only knows how to steer a single FairMQ device - in fact, it runs in a thread within the device process.
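To illustrate what "a fixed sequence of state transitions" means, the following is a minimal sketch of a static controller driving a toy state machine. The states, transitions, and the names Apply and RunStaticController are hypothetical and heavily simplified; they are not the FairMQ types or APIs. The sketch assumes C++17.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Simplified, hypothetical device state machine (NOT the FairMQ types).
enum class State { Idle, Initialized, Ready, Running, Exiting };
enum class Transition { Init, Prepare, Run, Stop, End };

// Returns the state reached by applying `t` in state `s`,
// or std::nullopt if the transition is not allowed there.
inline std::optional<State> Apply(State s, Transition t)
{
    static const std::map<std::pair<State, Transition>, State> table = {
        {{State::Idle, Transition::Init}, State::Initialized},
        {{State::Initialized, Transition::Prepare}, State::Ready},
        {{State::Ready, Transition::Run}, State::Running},
        {{State::Running, Transition::Stop}, State::Ready},
        {{State::Idle, Transition::End}, State::Exiting},
        {{State::Ready, Transition::End}, State::Exiting},
    };
    const auto it = table.find(std::make_pair(s, t));
    if (it == table.end()) {
        return std::nullopt;
    }
    return it->second;
}

// A "static" controller: walks a fixed sequence of transitions and stops
// at the first one the state machine rejects. Returns the final state.
inline State RunStaticController(State start, const std::vector<Transition>& sequence)
{
    State current = start;
    for (const Transition t : sequence) {
        const auto next = Apply(current, t);
        if (!next) {
            break;
        }
        current = *next;
    }
    return current;
}

// Usage example: one full pass through the toy lifecycle.
inline void StaticControllerExample()
{
    const std::vector<Transition> sequence = {Transition::Init,
                                              Transition::Prepare,
                                              Transition::Run,
                                              Transition::Stop,
                                              Transition::End};
    assert(RunStaticController(State::Idle, sequence) == State::Exiting);
    // Run is not allowed from Idle, so the controller stops immediately.
    assert(RunStaticController(State::Idle, {Transition::Run}) == State::Idle);
}
```

An interactive controller would differ only in where the transitions come from: it would read them from standard input one at a time instead of iterating over a predefined vector.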

A global controller has knowledge about the full topology of connected FairMQ devices. Its responsibility is to facilitate the lifecycle of a distributed FairMQ-based application (executing a topology), which includes tasks such as

  • allocating/releasing compute resources from a resource management system,
  • launching/setting up the run-time environment and the FairMQ devices,
  • driving the device state machines in lock-step across the full topology,
  • pushing the device configuration,
  • monitoring (some aspects of the application's) operation,
  • and handling/reporting (some) error cases.

The low-level hook to integrate FairMQ devices with such a global controller is the plugin mechanism in the FairMQ core library. The FairMQ Controller SDK provides C++ APIs that communicate with the endpoints exposed by such a FairMQ plugin.

At the moment, the Controller SDK only supports DDS as resource manager and run-time environment. A second implementation based on PMIx (targeting its implementation in Slurm and OpenRTE) is in development.

The following sections give a short overview of the APIs provided.

RMS and run-time environment

The classes fair::mq::sdk::DDSEnvironment, fair::mq::sdk::DDSSession, and fair::mq::sdk::DDSTopology are thin wrappers around most of the synchronous APIs exposed by DDS (dds::tools_api and dds::topology_api). For example, they allow a C++ program to start a DDS session, allocate resources, and launch a topology.

Driving the global state machine

The class fair::mq::sdk::Topology adds a FairMQ-specific view on an existing DDS session that is executing a topology of FairMQ devices. One can e.g. initiate a state transition on all devices in the topology simultaneously. This topology transition completes once a topology-wide barrier is passed (all devices completed the transition). This effectively exposes the device state machine as a topology state machine. The implementation is based on remote procedure calls over the DDS intercom service between the controller and the DDS plugin shipped with FairMQ (-DBUILD_DDS_PLUGIN=ON).
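The topology-wide barrier described above can be sketched as a simple bookkeeping object that counts confirmations from individual devices. This is only a model of the idea under stated assumptions: the class ToyTopologyTransition, its methods, and the string state names are hypothetical and are not the fair::mq::sdk::Topology API. The DDS intercom transport is replaced by direct method calls.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one topology-wide transition
// (NOT the fair::mq::sdk::Topology API).
class ToyTopologyTransition
{
  public:
    using Callback = std::function<void()>;

    // `onComplete` is invoked once, when the last device confirms `target`.
    ToyTopologyTransition(std::size_t deviceCount, std::string target, Callback onComplete)
        : fDone(deviceCount, false)
        , fPending(deviceCount)
        , fTarget(std::move(target))
        , fOnComplete(std::move(onComplete))
    {}

    // Called when device `id` reports that it reached `state`.
    // Reports may arrive in any order; duplicates, unknown ids, and
    // states other than the target are ignored.
    void OnDeviceStateChange(std::size_t id, const std::string& state)
    {
        if (id >= fDone.size() || fDone[id] || state != fTarget) {
            return;
        }
        fDone[id] = true;
        --fPending;
        if (fPending == 0 && fOnComplete) {
            fOnComplete();   // barrier passed: all devices completed the transition
        }
    }

    bool Completed() const { return fPending == 0; }

  private:
    std::vector<bool> fDone;   // per-device confirmation flag
    std::size_t fPending;      // devices that have not confirmed yet
    std::string fTarget;       // state every device must reach
    Callback fOnComplete;
};

// Usage example: three devices confirm out of order, one of them twice.
inline bool BarrierExample()
{
    bool completed = false;
    ToyTopologyTransition transition(3, "READY", [&completed] { completed = true; });
    transition.OnDeviceStateChange(2, "READY");
    transition.OnDeviceStateChange(0, "READY");
    transition.OnDeviceStateChange(0, "READY");   // duplicate, ignored
    assert(!completed);                           // device 1 is still pending
    transition.OnDeviceStateChange(1, "READY");
    return completed;
}
```

Counting confirmations per device, rather than counting messages, keeps the barrier robust against repeated reports from the same device. A real controller would additionally need timeouts and error handling for devices that never confirm.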

For future versions of the SDK, new APIs are planned that inspect and modify the device configurations and that operate on subsets of a given topology.
