API description
EasyVVUQ-QCGPJ Executor
The Executor is the main object responsible for steering the configuration and parallel execution of selected EasyVVUQ tasks with QCG-PilotJob. The object needs to be tied to an already prepared instance of an EasyVVUQ campaign, and therefore it takes the campaign as the mandatory campaign parameter of its constructor.
The second (optional) parameter of the Executor's constructor is config_file, which can be used to initialise the environment of tasks started by QCG-PilotJob. More information on this topic is presented in the section Passing the execution environment to QCG-PilotJob tasks.
The next parameter is resume. By default it is set to True, which means that EQI (EasyVVUQ-QCGPJ) will try to resume an incomplete workflow of tasks submitted to QCG-PilotJob Manager. More on this topic is discussed in the section Resume mechanism.
The last (optional) parameter is log_level, which sets a specific logging level for the EasyVVUQ-QCGPJ part of the processing only.
QCG-PilotJob Manager initialisation
The EasyVVUQ-QCGPJ Executor needs to be configured to use an instance of the QCG-PilotJob Manager service. There are two ways to do this.
The first and simpler option is to use the create_manager() method, which creates a QCG-PilotJob Manager in a basic configuration. The method takes four optional parameters:
- dir: customises the working directory of the manager (by default, the current directory).
- resources: specifies the resources that should be assigned to the Pilot Job. If the parameter is not specified, all available resources will be assigned to the Pilot Job; when the Pilot Job runs inside a queuing system, this means that the whole allocation will be used. If the parameter is provided, its specification should be consistent with the format supported by the Local mode of QCG-PilotJob Manager, i.e. [NODE_NAME]:CORES[,[NODE_NAME]:CORES]...
- reserve_core: specifies whether the manager service should run on a separate, reserved core (by default False, which means that the manager's core will be shared with the executed tasks).
- log_level: sets the logging level for the QCG-PilotJob Manager service and client parts.
The second and more advanced option is to use the set_manager() method. This method takes a single parameter: an externally created QCG-PilotJob Manager instance. Don't use this method unless you have very specific needs. For reference, see the QCG-PilotJob documentation.
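Before passing a resources string to create_manager(), it can help to check its shape locally. The helper below is a hypothetical sketch: parse_resources and total_cores are not part of EasyVVUQ-QCGPJ or QCG-PilotJob. It only validates the Local-mode format quoted above and does not reproduce how QCG-PilotJob itself handles an omitted node name (here it is simply reported as None).

```python
from typing import List, Optional, Tuple

def parse_resources(spec: str) -> List[Tuple[Optional[str], int]]:
    """Split '[NODE_NAME]:CORES[,[NODE_NAME]:CORES]...' into (node_name, cores) pairs.

    Hypothetical helper; node_name is None when it is omitted (e.g. ':4').
    """
    nodes = []
    for entry in spec.split(","):
        entry = entry.strip()
        if ":" not in entry:
            raise ValueError(f"entry {entry!r} lacks the ':' separator")
        # rpartition keeps everything before the last ':' as the node name
        name, _, cores = entry.rpartition(":")
        if not cores.isdigit() or int(cores) == 0:
            raise ValueError(f"entry {entry!r} needs a positive core count")
        nodes.append((name or None, int(cores)))
    return nodes

def total_cores(spec: str) -> int:
    """Sum of cores over all nodes listed in the specification."""
    return sum(cores for _, cores in parse_resources(spec))
```

For example, parse_resources("n1:4,n2:8") yields [("n1", 4), ("n2", 8)], so total_cores gives 12 for the same string.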
Task types
EasyVVUQ-QCGPJ supports the following types of Tasks that may be executed by QCG-PilotJob Manager:
- ENCODING: this Task is used for encoding a single sample.
- EXECUTION: this Task is used for executing an application for a single sample. The constructor of this Task requires the application parameter, whose value defines the command that runs the application. The EXECUTION Task for a given sample depends on the ENCODING Task for the same sample.
- ENCODING_AND_EXECUTION: this Task is used for running both encoding and execution for a single sample. As with the EXECUTION Task, the constructor of this Task requires the application parameter, whose value defines the command that runs the application.
Adding a Task to the Executor does not guarantee that it will be used: whether the Task is actually used depends on the processing scheme selected for execution in the run() method of the Executor. To keep the environment consistent, only a single Task of a given type should be added to the Executor.
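The dependency rule stated for the EXECUTION Task can be pictured as a small dependency graph. The sketch below models it with Python's standard graphlib module (Python 3.9+); task types are plain strings and build_task_graph is a hypothetical helper, so this illustrates the ordering constraint only and says nothing about how EQI actually submits tasks to QCG-PilotJob.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def build_task_graph(samples, task_types):
    """Map each (task_type, sample) node to the set of nodes it must wait for.

    Hypothetical model; task types are plain strings, not EQI classes.
    """
    graph = {}
    for s in samples:
        if "ENCODING" in task_types:
            graph[("ENCODING", s)] = set()
        if "EXECUTION" in task_types:
            # EXECUTION of a sample waits for ENCODING of the same sample,
            # unless encoding happens outside QCG-PilotJob (no ENCODING task).
            needs = {("ENCODING", s)} if "ENCODING" in task_types else set()
            graph[("EXECUTION", s)] = needs
        if "ENCODING_AND_EXECUTION" in task_types:
            graph[("ENCODING_AND_EXECUTION", s)] = set()
    return graph

graph = build_task_graph(["s1", "s2"], {"ENCODING", "EXECUTION"})
# static_order() yields every node after all of its predecessors
order = list(TopologicalSorter(graph).static_order())
```

Any order produced this way places ENCODING of a sample before its EXECUTION, which is the constraint the processing schemes respect.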
Task requirements
Tasks defined for execution by the QCG-PilotJob system need to define their resource requirements. In EasyVVUQ-QCGPJ, the resource requirements of a Task are specified directly via the Task's constructor, in particular by its second parameter, TaskRequirements. This object may be initialised with a combination of two parameters: nodes and cores. If cores is the only parameter specified, the Task will run on the specified number of cores regardless of their physical location (the cores can be distributed across many nodes). If both nodes and cores are specified, the Task will use the number of cores requested by the cores parameter on each of the nodes requested by the nodes parameter. Therefore, for good efficiency of multicore Tasks, it is advisable to specify both nodes and cores (even if only one node is needed).
Both the nodes and cores parameters may be of int type or of Resources type. When a parameter is of int type, the provided value is simply mapped to the exact number of required resources. Parameters of Resources type offer much more flexibility in specifying requirements, which is available through the following keyword combinations:
- exact: the exact number of resources should be used,
- min and max: the number of resources should be greater than min and lower than max,
- min and split-into: all available resources should be divided into split-into chunks, but no chunk can be smaller than min.
Example TaskRequirements specifications:
- Use exactly 4 cores, regardless of their location
TaskRequirements(cores=4)
- Use 4 cores on a single node
TaskRequirements(nodes=1,cores=4)
- Use from 4 to 6 cores on each of 2 nodes
TaskRequirements(nodes=2,cores=Resources(min=4,max=6))
The algorithm used to define Task requirements in EasyVVUQ-QCGPJ is inherited from the QCG-PilotJob system. Further instructions can be found in the QCG-PilotJob documentation.
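To make the nodes/cores rules above concrete, here is a simplified, hypothetical model of how a requirement could be turned into a core count on identical free nodes. It is not QCG-PilotJob's scheduling algorithm: Res is a stand-in for Resources that covers only exact and min/max (split-into is left out), the min/max bounds are treated as inclusive, and taking the largest allowed count is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Res:
    """Hypothetical stand-in for Resources, covering only exact and min/max."""
    exact: Optional[int] = None
    min: Optional[int] = None
    max: Optional[int] = None

def resolve(req: Union[int, Res], available: int) -> int:
    """Choose how many units (nodes or cores) to take out of `available`."""
    if isinstance(req, int):  # a plain int behaves like exact=<value>
        req = Res(exact=req)
    if req.exact is not None:
        if req.exact > available:
            raise ValueError("requirement cannot be satisfied")
        return req.exact
    low = req.min if req.min is not None else 1
    high = req.max if req.max is not None else available
    if low > high or low > available:
        raise ValueError("requirement cannot be satisfied")
    return min(high, available)  # assumption: take as many as allowed

def planned_cores(nodes, cores, free_nodes: int, cores_per_node: int) -> int:
    """Total cores a Task would get on `free_nodes` identical nodes."""
    if nodes is None:  # cores only: location does not matter
        return resolve(cores, free_nodes * cores_per_node)
    n = resolve(nodes, free_nodes)
    c = resolve(cores, cores_per_node)  # with nodes given, cores are per node
    return n * c
```

Under these assumptions, the third example above, nodes=2 with cores between 4 and 6, yields 12 cores on two 8-core nodes and 10 cores on two 5-core nodes.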
Task execution models
Another optional parameter of the Task constructor is model. It adjusts how a task will be started by QCG-PilotJob Manager in a parallel environment. At the time of writing, the following models are available: threads, openmpi, intelmpi, srunmpi and default.
Since this option comes directly from QCG-PilotJob, a detailed description of the particular models is available in the QCG-PilotJob documentation.
Processing schemes
EasyVVUQ-QCGPJ can process tasks in a few predefined schemes, which differ both in the scope of covered EasyVVUQ steps and in the order of submission and the way tasks are processed by QCG-PilotJob.
Below we briefly describe the seven currently supported schemes, using a simple visual notation. First, let's assume that we have a set of EasyVVUQ samples marked as s1, s2, …, sN. Then:
- STEP_ORIENTED: in this scheme tasks are submitted with priority given to the STEP; we want to complete the encoding step for all samples and then go to the execution step for all samples. The scheme is as follows:
encoding(s1)->encoding(s2)->...->encoding(sN)->execution(s1)->execution(s2)->...->execution(sN)
- STEP_ORIENTED_ITERATIVE: this scheme is similar to STEP_ORIENTED in the sense that tasks are submitted with priority given to the STEP, but here we use iterative tasks of QCG-PilotJob to execute all operations within a STEP in a single iterative task (internally consisting of many iterations). The scheme can be expressed as follows:
encoding_iterative(s1, s2, ..., sN)->execution_iterative(s1, s2, ..., sN)
- SAMPLE_ORIENTED: in this scheme tasks are submitted with priority given to the SAMPLE; in other words, we want to complete the whole processing (encoding and execution) of a given sample as soon as possible and then go to the next sample. The scheme can be written as follows:
encoding(s1)->execution(s1)->encoding(s2)->execution(s2)->...->encoding(sN)->execution(sN)
- SAMPLE_ORIENTED_CONDENSED: a scheme similar to SAMPLE_ORIENTED, but encoding and execution are condensed into a single PJ task. It can be expressed as:
encoding&execution(s1)->encoding&execution(s2)->...->encoding&execution(sN)
- SAMPLE_ORIENTED_CONDENSED_ITERATIVE: this scheme employs iterative tasks to run condensed encoding and execution. It is similar to SAMPLE_ORIENTED_CONDENSED, but here the encoding&execution tasks are part of an iterative task. It can be expressed as:
encoding&execution_iterative(s1, s2, ..., sN)
- EXECUTION_ONLY: submits only the EXECUTION tasks, assuming that the encoding step is executed outside QCG-PilotJob. It can be written as follows:
execution(s1)->execution(s2)->...->execution(sN)
- EXECUTION_ONLY_ITERATIVE: a variation of the scheme that submits only the EXECUTION tasks, but in contrast to the EXECUTION_ONLY scheme, here a single iterative QCG-PilotJob task is used to run all tasks. It can be written as follows:
execution_iterative(s1, s2, ..., sN)
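The orderings listed above can be generated mechanically, which is a convenient way to check the notation for a given list of samples. The function below is a hypothetical helper that only produces the strings used in this section; it submits nothing and is not part of EasyVVUQ-QCGPJ, where the scheme is selected through the run() method.

```python
from typing import List

def submission_order(scheme: str, samples: List[str]) -> List[str]:
    """Return the task notation of a processing scheme for the given samples.

    Hypothetical helper mirroring the notation of this section only.
    """
    joined = ", ".join(samples)
    if scheme == "STEP_ORIENTED":
        # all encodings first, then all executions
        return [f"encoding({s})" for s in samples] + [f"execution({s})" for s in samples]
    if scheme == "STEP_ORIENTED_ITERATIVE":
        return [f"encoding_iterative({joined})", f"execution_iterative({joined})"]
    if scheme == "SAMPLE_ORIENTED":
        # finish each sample before moving to the next one
        return [t for s in samples for t in (f"encoding({s})", f"execution({s})")]
    if scheme == "SAMPLE_ORIENTED_CONDENSED":
        return [f"encoding&execution({s})" for s in samples]
    if scheme == "SAMPLE_ORIENTED_CONDENSED_ITERATIVE":
        return [f"encoding&execution_iterative({joined})"]
    if scheme == "EXECUTION_ONLY":
        return [f"execution({s})" for s in samples]
    if scheme == "EXECUTION_ONLY_ITERATIVE":
        return [f"execution_iterative({joined})"]
    raise ValueError(f"unknown scheme: {scheme}")
```

Joining the result with "->" reproduces the chains shown above, e.g. "->".join(submission_order("SAMPLE_ORIENTED", ["s1", "s2"])).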
The schemes use different task types, which need to be added to the Executor to allow processing:
- The SAMPLE_ORIENTED, STEP_ORIENTED and STEP_ORIENTED_ITERATIVE schemes require the ENCODING and EXECUTION tasks.
- The EXECUTION_ONLY and EXECUTION_ONLY_ITERATIVE schemes require the EXECUTION task.
- The SAMPLE_ORIENTED_CONDENSED and SAMPLE_ORIENTED_CONDENSED_ITERATIVE schemes require the ENCODING_AND_EXECUTION task.
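The mapping above lends itself to a quick pre-flight check before calling run(). The table and function below are a hypothetical sketch (REQUIRED_TASK_TYPES and missing_task_types are not EasyVVUQ-QCGPJ names); they simply encode the three rules listed above using plain strings.

```python
from typing import Iterable, List

# Task types each scheme needs, transcribed from the list above.
REQUIRED_TASK_TYPES = {
    "STEP_ORIENTED": {"ENCODING", "EXECUTION"},
    "STEP_ORIENTED_ITERATIVE": {"ENCODING", "EXECUTION"},
    "SAMPLE_ORIENTED": {"ENCODING", "EXECUTION"},
    "SAMPLE_ORIENTED_CONDENSED": {"ENCODING_AND_EXECUTION"},
    "SAMPLE_ORIENTED_CONDENSED_ITERATIVE": {"ENCODING_AND_EXECUTION"},
    "EXECUTION_ONLY": {"EXECUTION"},
    "EXECUTION_ONLY_ITERATIVE": {"EXECUTION"},
}

def missing_task_types(scheme: str, added: Iterable[str]) -> List[str]:
    """Sorted list of task types the scheme needs but that were not added."""
    return sorted(REQUIRED_TASK_TYPES[scheme] - set(added))
```

An empty result means the added tasks cover the scheme; extra task types are ignored, in line with the remark that added Tasks are not necessarily used.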
The efficiency of the schemes may differ significantly depending on the use case and on the resource requirements defined for both the whole Pilot Job and the individual task types. In many scenarios the iterative schemes may perform somewhat better, but there is no general rule, so we encourage you to test different schemes when efficiency is a priority.
Passing the execution environment to QCG-PilotJob tasks
Since every QCG-PilotJob task is started in a separate process, it needs to be properly configured to run in an environment consistent with the requirements of the parent script. On the one hand, EasyVVUQ makes it easy to recover information about the campaign from the database; on the other hand, some environment settings, such as the required environment modules or a virtual environment, have to be passed in a different way. To this end, EasyVVUQ-QCGPJ delivers a simple mechanism based on a bash script that is sourced by each task prior to its actual execution. The path to this file can be provided in the EQI_CONFIG environment variable. If this environment variable is set in the master script, it is also automatically passed to QCG-PilotJob tasks.
The structure of the script provided in EQI_CONFIG is largely up to the user. In this script a user can load modules, set further environment variables or even do simple calculations; it can contain anything a Task needs prior to its actual execution. A very basic example of the EQI_CONFIG file may look as follows:
#!/bin/bash
module load openmpi/4.0
Note
An alternative way to provide the configuration file is to specify its location with the config_file parameter of the Executor's constructor.
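The sourcing idea can be illustrated with a small launcher. This is a hypothetical sketch of the mechanism described above, not EasyVVUQ-QCGPJ's internal code: run_with_eqi_config sources the file named by EQI_CONFIG and then runs the task command in the same bash process, so that exported variables and loaded modules are visible to the command.

```python
import os
import subprocess
from typing import Mapping, Optional

def run_with_eqi_config(command: str,
                        env: Optional[Mapping[str, str]] = None) -> subprocess.CompletedProcess:
    """Run `command` in bash after sourcing the script named by EQI_CONFIG (if set).

    Hypothetical launcher; requires bash on PATH.
    """
    env = dict(os.environ if env is None else env)
    if env.get("EQI_CONFIG"):
        # Sourcing and the command share one bash process; "$EQI_CONFIG"
        # is expanded by bash from the environment passed below.
        script = f'source "$EQI_CONFIG" && {command}'
    else:
        script = command
    return subprocess.run(["bash", "-c", script], env=env,
                          capture_output=True, text=True, check=True)
```

If sourcing fails, the && prevents the command from running and check=True turns the non-zero exit status into a CalledProcessError.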
Resume mechanism
EQI is able to resume an incomplete workflow of tasks submitted to QCG-PilotJob Manager (for example, one terminated because the walltime was exceeded).
By default, the resume mechanism is activated automatically when the Executor is initialised with a campaign for which EQI processing was already started (the working directory exists) but is not yet completed. If this behaviour is not intended, the resume mechanism can be disabled by passing the resume=False parameter to the Executor's constructor.
The resumed workflow will start in the working directory of the previous, incomplete execution. This is expected behaviour, but since partially generated output or intermediate files may exist, they need to be handled carefully. EQI helps here by providing mechanisms for the automatic recovery of individual tasks.
How far this automatic recovery may interfere with the resume logic depends on the use case, and therefore EQI provides a few ResumeLevels of automatic recovery. The level can be set in the Task's constructor with the resume_level parameter. The following options are available:
- DISABLED: automatic resume is fully disabled for a task.
- BASIC: for the task types that create run directories (ENCODING, ENCODING_AND_EXECUTION), the resume checks whether an unfinished task created a run directory. If such a directory exists, it is recursively removed before the resumed task starts.
- MODERATE: this level performs all operations offered by the BASIC level and adds the following feature. At the beginning of a task's execution, the list of directories and files in the run directory is generated and stored. The resumed task checks for differences and removes new files and directories in order to restore the initial state.
Please note that this functionality may not be sufficient for more advanced scenarios (for example, if input files are updated during execution) or for scenarios in which the overhead of the built-in mechanism is not acceptable. In such cases, a more suitable resume logic may need to be implemented in the actual code of the task.
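The MODERATE recovery described above boils down to a snapshot-and-diff of the run directory. The functions below are a hypothetical sketch of that idea, not EQI's implementation; the JSON state file and its location are assumptions (it must live outside the run directory, otherwise it would itself count as a new entry). As the note above warns, files edited in place are not reverted.

```python
import json
import os
import shutil
from typing import Set

def snapshot(run_dir: str) -> Set[str]:
    """Relative paths of every file and directory currently under run_dir."""
    entries = set()
    for root, dirs, files in os.walk(run_dir):
        for name in dirs + files:
            entries.add(os.path.relpath(os.path.join(root, name), run_dir))
    return entries

def save_snapshot(run_dir: str, state_file: str) -> None:
    """Store the initial listing; keep state_file outside run_dir."""
    with open(state_file, "w") as f:
        json.dump(sorted(snapshot(run_dir)), f)

def restore_initial_state(run_dir: str, state_file: str) -> None:
    """Remove entries created after save_snapshot(); edited files are NOT reverted."""
    with open(state_file) as f:
        initial = set(json.load(f))
    new_entries = snapshot(run_dir) - initial
    # Shallowest paths first: removing a new directory also removes its
    # children, which are then skipped because they no longer exist.
    for rel in sorted(new_entries, key=lambda p: p.count(os.sep)):
        path = os.path.join(run_dir, rel)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        elif os.path.lexists(path):
            os.remove(path)
```

A task would call save_snapshot at its start and, when resumed, restore_initial_state before doing any work.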
External Encoders
EasyVVUQ allows defining custom encoders for specific use cases. This works without any issues as long as everything runs in a single process. However, if we want to execute the encoding in separate processes, these processes need to be told about the encoder. This information is partially available in the Campaign itself and can be recovered, but we still need to instruct the EasyVVUQ-QCGPJ code to import the Python modules required by the encoder. To this end, we once again make use of an environment variable, this time ENCODER_MODULES. The value of this variable should be a semicolon-separated list of the names of the modules required by the custom encoder. The modules will be dynamically loaded before the encoder is recovered, which resolves the problem. We suggest defining the ENCODER_MODULES variable in the EQI_CONFIG file.
An example EQI_CONFIG file that includes a specification of custom ENCODER_MODULES may look as follows (for the full test case, please look in tests/custom_encoder):
#!/bin/bash
# WORKS ONLY IN BASH - SHOULD BE CHANGED (EG. TO GLOBAL PATHS) IN CASE OF OTHER INTERPRETERS
this_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
this_file=$(basename "${BASH_SOURCE[0]}")
PYTHONPATH="${PYTHONPATH}:${this_dir}"
ENCODER_MODULES="custom_encoder"
export PYTHONPATH
export ENCODER_MODULES
export EQI_CONFIG=$this_dir/$this_file
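On the Python side, honouring ENCODER_MODULES amounts to splitting the variable on semicolons and importing each name before the encoder is recovered. The function below is a hypothetical sketch of that step (load_encoder_modules is not an EasyVVUQ-QCGPJ function); it relies on the listed modules being importable, which is why the script above extends PYTHONPATH.

```python
import importlib
import os
from types import ModuleType
from typing import List

def load_encoder_modules(var: str = "ENCODER_MODULES") -> List[ModuleType]:
    """Import every module named in the semicolon-separated variable `var`.

    Hypothetical helper; empty entries (e.g. a trailing ';') are ignored.
    """
    value = os.environ.get(var, "")
    names = [name.strip() for name in value.split(";") if name.strip()]
    return [importlib.import_module(name) for name in names]
```

With ENCODER_MODULES="custom_encoder" as in the script above, the call would import custom_encoder; an unset variable yields an empty list.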