3.5. Resource Management

Let’s start with a basic example: we launch a VM on Amazon EC2 using the SAGA resource API, submit a job to the newly instantiated VM using the SAGA job API, and finally shut the VM down.

Note

In order to run this example, you need an account with Amazon EC2. You also need your Amazon EC2 access key ID and secret access key, plus an SSH keypair that is used to access the VM.
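The script reads its credentials with `os.environ[...]`, so a missing variable only surfaces as a `KeyError` once `main()` runs. A small pre-flight check can list what is missing before any EC2 call is made. This is a minimal sketch that assumes you export exactly the four variable names the script reads; the helper name `missing_ec2_vars` is hypothetical and not part of RADICAL-SAGA.

```python
import os

# Names of the environment variables the example script reads via os.environ.
REQUIRED_EC2_VARS = (
    "EC2_ACCESS_KEY",   # EC2 access key ID
    "EC2_SECRET_KEY",   # EC2 secret access key
    "EC2_KEYPAIR_ID",   # name of the ssh keypair within EC2
    "EC2_KEYPAIR",      # local ssh key used to access the VM
)


def missing_ec2_vars(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_EC2_VARS if not environ.get(name)]
```

At the top of `main()`, one could call `missing_ec2_vars()` and exit with a readable message if the returned list is non-empty.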

#!/usr/bin/env python

__author__    = "Andre Merzky, Ole Weidner"
__copyright__ = "Copyright 2012-2013, The SAGA Project"
__license__   = "MIT"


""" This is an example which shows how to access Amazon EC2 clouds via the SAGA
    resource package.

    In order to run this example, you need to set the following environment
    variables:

    * EC2_ACCESS_KEY:     your Amazon EC2 ID
    * EC2_SECRET_KEY:     your Amazon EC2 KEY
    * EC2_KEYPAIR_ID:     name of ssh keypair within EC2
    * EC2_KEYPAIR:        your ssh keypair to use to access the VM, e.g.,
                          /home/username/.ssh/id_rsa_ec2
"""


import os
import sys
import time

import radical.saga as rs


# ------------------------------------------------------------------------------
#
def main():

    # In order to connect to EC2, we need an EC2 ID and KEY. We read those
    # from the environment.
    ec2_ctx = rs.Context('EC2')
    ec2_ctx.user_id  = os.environ['EC2_ACCESS_KEY']
    ec2_ctx.user_key = os.environ['EC2_SECRET_KEY']

    # The SSH keypair we want to use to access the EC2 VM. If the keypair is
    # not yet registered on EC2, SAGA will register it automatically. This
    # context specifies the key for VM startup, i.e. the VM will be configured
    # to accept this key.
    ec2keypair_ctx = rs.Context('EC2_KEYPAIR')
    ec2keypair_ctx.token    = os.environ['EC2_KEYPAIR_ID']
    ec2keypair_ctx.user_key = os.environ['EC2_KEYPAIR']
    ec2keypair_ctx.user_id  = 'root'  # the user id on the target VM

    # We specify the *same* ssh key for ssh access to the VM. That should now
    # work if the VM got configured correctly per the 'EC2_KEYPAIR' context
    # above.
    ssh_ctx = rs.Context('SSH')
    ssh_ctx.user_id  = 'root'
    ssh_ctx.user_key = os.environ['EC2_KEYPAIR']

    session = rs.Session(False)  # FALSE: don't use other (default) contexts
    session.contexts.append(ec2_ctx)
    session.contexts.append(ec2keypair_ctx)
    session.contexts.append(ssh_ctx)

    cr  = None  # compute resource handle
    rid = None  # compute resource ID
    try:

        # ----------------------------------------------------------------------
        #
        # reconnect to VM (ID given in ARGV[1])
        #
        if len(sys.argv) > 1:
            
            rid = sys.argv[1]

            # reconnect to the given resource
            print('reconnecting to %s' % rid)
            cr = rs.resource.Compute(id=rid, session=session)
            print('reconnected  to %s' % rid)
            print("  state : %s (%s)" % (cr.state, cr.state_detail))


        # ----------------------------------------------------------------------
        #
        # start a new VM
        #
        else:

            # start a VM if needed
            # in our session, connect to the EC2 resource manager
            rm = rs.resource.Manager("ec2://aws.amazon.com/", session=session)

            # Create a resource description with an image (software) and a
            # template (hardware). We pick a small VM and a plain Ubuntu image...
            cd = rs.resource.ComputeDescription()
            cd.image    = 'ami-0256b16b'    # plain ubuntu
            cd.template = 'Small Instance'

            # Create a VM instance from that description.
            cr  = rm.acquire(cd)
            rid = cr.id

            print("\nWaiting for VM to become active...")


        # ----------------------------------------------------------------------
        #
        # use the VM
        #
        # Wait for the VM to 'boot up', i.e., become 'ACTIVE'
        cr.wait(rs.resource.ACTIVE)

        # Query some information about the newly created VM
        print("Created VM: %s"      %  cr.id)
        print("  state   : %s (%s)" % (cr.state, cr.state_detail))
        print("  access  : %s"      %  cr.access)

        # give the VM some time to start up completely, otherwise the subsequent
        # job submission might end up failing...
        time.sleep(60)

        # create a job service which uses the VM's access URL (cr.access)
        js = rs.job.Service(cr.access, session=session)

        jd = rs.job.Description()
        jd.executable = '/bin/sleep'
        jd.arguments = ['30']

        job = js.create_job(jd)
        job.run()

        print("\nRunning Job: %s" % job.id)
        print("  state : %s" % job.state)
        job.wait()
        print("  state : %s" % job.state)


    except rs.SagaException as ex:
        # Catch all saga exceptions
        print("An exception occurred: (%s) %s " % (ex.type, (str(ex))))
        raise


    except Exception as e:
        # Catch all other exceptions
        print("An Exception occurred: %s " % e)
        raise


    finally:

        # ----------------------------------------------------------------------
        #
        # shut VM down (if one was acquired or reconnected to)
        if cr and rid:
            cr.destroy()
            print("\nDestroyed VM: %s" % cr.id)
            print("  state : %s (%s)" % (cr.state, cr.state_detail))


# ------------------------------------------------------------------------------
#
if __name__ == "__main__":
    
    sys.exit(main())
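The fixed `time.sleep(60)` in the script is a guess at how long the VM needs after reaching ACTIVE. One alternative is to retry the first operation that depends on the VM a few times, with a pause between attempts. The helper below is a generic sketch, not part of RADICAL-SAGA, and it assumes that an early attempt fails with an exception rather than hanging.

```python
import time


def retry(action, attempts=5, delay=10.0, exceptions=(Exception,)):
    """Call action() until it returns without raising one of `exceptions`.

    Waits `delay` seconds between failed attempts and re-raises the last
    error once all `attempts` have failed. Exceptions not listed in
    `exceptions` propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: propagate the last error
            time.sleep(delay)
```

In the script, something like `js = retry(lambda: rs.job.Service(cr.access, session=session), exceptions=(rs.SagaException,))` could take the place of the fixed sleep; the attempt count and delay are placeholders to tune for your images.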


3.5.1. Resource Manager – radical.saga.resource.Manager

class radical.saga.resource.Manager(url=None, session=None, _adaptor=None, _adaptor_state={}, _ttype=None)

Bases: Base, Async

In the context of RADICAL-SAGA, a ResourceManager is a service which asserts control over a set of resources. That manager can, on request, render control over subsets of those resources (resource slices) to an application.

This Manager class represents the contact point to such ResourceManager instances – the application can thus acquire compute, data or network resources, according to some resource specification, for a bounded or unbounded amount of time.

__init__(url)

Create a new Manager instance. Connect to a remote resource management endpoint.

Parameters:

url (saga.Url) – resource management endpoint

acquire(desc)

Create a new saga.resource.Resource handle for a resource specified by the description.

Parameters:

spec (Description or Url) – specifies the resource

Depending on the RTYPE attribute in the description, the returned resource may be a saga.resource.Compute, saga.resource.Storage or saga.resource.Network instance.

If the spec parameter is

destroy(rid)

Destroy / release a resource.

Parameters:

rid (str) – identifies the resource to be released

get_description(rid)

Get the resource Description for the specified resource.

Parameters:

rid (str) – identifies the resource to be described.

get_image(name)

Get a description string for the specified image.

Parameters:

name (str) – specifies the image name

get_template(name)

Get a Description for the specified template.

Parameters:

name (str) – specifies the name of the template

The returned resource description instance may not have all attributes filled, and may in fact not be sufficiently complete to allow for successful resource acquisition. The only guaranteed attribute in the returned description is TEMPLATE, containing the very template id specified in the call parameters.

list(rtype=None)

List known resource instances (which can be acquired). Returns a list of IDs.

Parameters:

rtype (None or enum (COMPUTE | STORAGE | NETWORK)) – filter for one or more resource types

list_images(rtype=None)

List image names available for the specified resource type(s). Returns a list of strings.

Parameters:

rtype (None or enum (COMPUTE | STORAGE | NETWORK)) – filter for one or more resource types

list_templates(rtype=None)

List template names available for the specified resource type(s). Returns a list of strings.

Parameters:

rtype (None or enum (COMPUTE | STORAGE | NETWORK)) – filter for one or more resource types
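To make the Manager semantics concrete without an EC2 account, the following in-memory stand-in mimics how acquire(), list(), list_templates(), get_template() and destroy() relate to each other. It is a hypothetical model, not the RADICAL-SAGA implementation: `ToyManager` and `ToySlice` are invented names, acquire() takes a template name instead of a Description, the rtype filter accepts a single type rather than one or more, and the template name "1TB Volume" in the usage is made up.

```python
import itertools
from dataclasses import dataclass
from enum import Enum


class RType(Enum):
    COMPUTE = "COMPUTE"
    STORAGE = "STORAGE"
    NETWORK = "NETWORK"


@dataclass
class ToySlice:
    """A resource slice handed to the application (hypothetical)."""
    rid: str
    rtype: RType
    template: str
    state: str = "NEW"


class ToyManager:
    """Hypothetical in-memory stand-in for a resource manager backend."""

    def __init__(self, templates):
        # templates: mapping of template name -> RType it provides
        self._templates = dict(templates)
        self._slices = {}
        self._ids = itertools.count(1)

    def list_templates(self, rtype=None):
        """Template names, optionally filtered by a single resource type."""
        return sorted(name for name, t in self._templates.items()
                      if rtype is None or t is rtype)

    def get_template(self, name):
        """Only the template name is guaranteed to be filled in."""
        if name not in self._templates:
            raise KeyError(name)
        return {"TEMPLATE": name}

    def acquire(self, template):
        """Hand a new slice of the template's type to the application."""
        rtype = self._templates[template]  # KeyError for unknown templates
        rid = "toy-%d" % next(self._ids)
        self._slices[rid] = ToySlice(rid, rtype, template)
        return self._slices[rid]

    def list(self, rtype=None):
        """IDs of slices currently held, optionally filtered by type."""
        return sorted(rid for rid, s in self._slices.items()
                      if rtype is None or s.rtype is rtype)

    def destroy(self, rid):
        """Release a slice; it is no longer listed afterwards."""
        self._slices.pop(rid).state = "CANCELED"
```

The returned slice's type follows from what was requested, mirroring how acquire() may return a Compute, Storage or Network instance depending on RTYPE.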

3.5.2. Resource Description – radical.saga.resource.Description

class radical.saga.resource.Description(d=None)

The resource description class.

Resource descriptions are used for two purposes:

  • an application can pass a description instance to a saga.resource.Manager instance, to request control over the resource slice described in the description;

  • an application can request a resource’s description for inspection of resource properties.

There are three specific types of descriptions:

  • saga.resource.ComputeDescription for the description of resources with compute capabilities;

  • saga.resource.StorageDescription for the description of resources with data storage capabilities;

  • saga.resource.NetworkDescription for the description of resources with communication capabilities.

There is at this point no notion of resources which combine different capabilities.

For all these capabilities, the following attributes are supported:

  • RType : Enum, describing the capabilities of the resource (COMPUTE, STORAGE or NETWORK).

  • Template : String, a backend-specific resource class with some pre-defined hardware properties to apply to the resource.

  • Image : String, a backend-specific resource class with some pre-defined software properties to apply to the resource.

  • Dynamic : Boolean, if True, signifies that the resource may dynamically change its properties at runtime.

  • Start : Integer (seconds) since epoch when the resource is expected to enter / when the resource entered ACTIVE state.

  • End : Integer (seconds) since epoch when the resource is expected to enter / when the resource entered a FINAL state.

  • Duration : Integer, seconds for which the resource is expected to remain / the resource remained in ACTIVE state.

  • MachineOS : String, for COMPUTE resources, specifies the operating system type running on that resource.

  • MachineArch : String, for COMPUTE resources, specifies the machine architecture of that resource.

  • Size : Integer, for COMPUTE resources, specifies the number of process slots provided; for STORAGE resources, specifies the number of bytes of the resource.

  • Memory : Integer, for COMPUTE resources, specifies the number of bytes provided as memory.

  • Access : String, usually a URL, which specifies the contact point for the resource capability interface / service interface.
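To show how these attributes fit together, here is a plain-Python mirror of a description. It is a hypothetical sketch, not RADICAL-SAGA's Description class: the snake_case field names, the validation rules, and the `active_seconds()` helper (which assumes a resource stays ACTIVE until it enters a FINAL state, so that Duration equals End minus Start) are illustrations only.

```python
from dataclasses import dataclass
from typing import Optional

VALID_RTYPES = ("COMPUTE", "STORAGE", "NETWORK")


@dataclass
class DescriptionSketch:
    rtype: str = "COMPUTE"
    template: Optional[str] = None      # backend-specific hardware class
    image: Optional[str] = None         # backend-specific software class
    dynamic: bool = False               # may change properties at runtime
    start: Optional[int] = None         # seconds since epoch, entering ACTIVE
    end: Optional[int] = None           # seconds since epoch, entering FINAL
    duration: Optional[int] = None      # seconds spent in ACTIVE
    machine_os: Optional[str] = None    # COMPUTE only
    machine_arch: Optional[str] = None  # COMPUTE only
    size: Optional[int] = None          # process slots (COMPUTE) or bytes (STORAGE)
    memory: Optional[int] = None        # bytes of memory (COMPUTE)
    access: Optional[str] = None        # contact point, usually a URL

    def __post_init__(self):
        if self.rtype not in VALID_RTYPES:
            raise ValueError("unknown rtype: %r" % (self.rtype,))
        if self.start is not None and self.end is not None and self.end < self.start:
            raise ValueError("end must not precede start")

    def active_seconds(self):
        """Duration if set, else end - start; None if neither is known."""
        if self.duration is not None:
            return self.duration
        if self.start is not None and self.end is not None:
            return self.end - self.start
        return None
```

For example, `DescriptionSketch(template="Small Instance", image="ami-0256b16b")` mirrors the two attributes set on the ComputeDescription in the script above.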

__init__()

Create a new Description instance.

clone()

Implements deep copy.

Unlike the default Python assignment (which copies the object reference), a deep copy will create a new object instance with the same state – after a deep copy, a change on one instance will not affect the other.
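The difference between plain assignment and a deep copy can be illustrated with Python's standard `copy.deepcopy` on a stand-in object. This only mirrors the behaviour described for clone(); the `Spec` class is hypothetical, and the sketch makes no claim about how clone() itself is implemented.

```python
import copy


class Spec:
    """Hypothetical stand-in for a description with a mutable attribute."""

    def __init__(self, image, tags):
        self.image = image
        self.tags = tags


original = Spec("ami-0256b16b", ["ubuntu"])
alias = original                 # plain assignment: both names share one object
clone = copy.deepcopy(original)  # deep copy: independent object and list

alias.image = "other-image"      # visible through `original` as well
clone.tags.append("small")       # does not touch original.tags
```

After these two changes, `original.image` is "other-image" while `original.tags` is still ["ubuntu"], and the clone keeps its own image and tag list.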

3.5.3. Resource – radical.saga.resource.Resource

class radical.saga.resource.Resource(id=None, session=None, _adaptor=None, _adaptor_state={}, _ttype=None)

A Resource class instance represents a specific slice of resource which is, if in ACTIVE state, under the application’s control and ready to serve usage requests. The type of accepted usage requests depends on the specific resource type (job execution for saga.resource.Compute, data storage for saga.resource.Storage, and network connectivity for saga.resource.Network). The exact mechanism by which those usage requests are communicated is not part of the resource’s class interface; instead, they are served by other RADICAL-SAGA classes, typically saga.job.Service for Compute resources and saga.filesystem.Directory for Storage resources (Network resources provide implicit connectivity, but do not have explicit, public entry points to request usage).

The process of resource acquisition is performed by a ResourceManager, represented by a saga.resource.Manager instance. Acquisition is defined as the act of moving a slice (subset) of the resources managed by the resource manager under the control of the requesting application (i.e. under user control), to use as needed. The type and properties of the resource slice to be acquired, and the time and duration over which the resource will be made available to the application, are specified in a saga.resource.Description, to be supplied when acquiring a resource.

The exact backend semantics of how a resource slice is provisioned to the application are up to the resource manager backend – this can be as simple as providing a job submission endpoint to a classic HPC resource, and as complex as instantiating a pilot job or pilot data container, reserving a network fiber on demand, or instantiating a virtual machine. The result will, from the application’s perspective, be indistinguishable: a resource slice is made available for the execution of usage requests (tasks, workload, jobs, …).

Resources are stateful: when acquired from a resource manager, they are typically in NEW state, and will become ACTIVE once they are provisioned to the application and can serve usage requests. Some resources may go through an intermediate PENDING state when they are about to become active; usage requests can already be submitted in that state, but will not be executed until the resource enters the ACTIVE state. A resource can be released from application control in three different ways: it can be actively destroyed by the application, and will then enter the CANCELED state; it can internally cease to function and become unable to serve usage requests, represented by the FAILED state; or the resource manager can retract control from the application because the agreed time duration has passed, represented by the EXPIRED state.
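The state model described above can be sketched as a small transition table. This is a hypothetical model written for illustration, not RADICAL-SAGA code: the `ToyResource` class, its `advance()` and `submit()` methods, and the assumption that any non-final state may be cancelled, fail or expire are all invented here. It shows how requests submitted while PENDING are held until the resource becomes ACTIVE.

```python
from enum import Enum


class State(Enum):
    NEW = "NEW"
    PENDING = "PENDING"
    ACTIVE = "ACTIVE"
    DONE = "DONE"
    CANCELED = "CANCELED"
    FAILED = "FAILED"
    EXPIRED = "EXPIRED"


FINAL_STATES = frozenset({State.DONE, State.CANCELED, State.FAILED, State.EXPIRED})

# Transitions named in the text. DONE is final, but the text does not say how
# it is reached, so no transition into it is modelled here.
RELEASE = {State.CANCELED, State.FAILED, State.EXPIRED}
ALLOWED = {
    State.NEW: {State.PENDING, State.ACTIVE} | RELEASE,
    State.PENDING: {State.ACTIVE} | RELEASE,
    State.ACTIVE: set(RELEASE),
}


class ToyResource:
    """Hypothetical model: requests queue while PENDING, run once ACTIVE."""

    def __init__(self):
        self.state = State.NEW
        self.queued = []

    def advance(self, new_state):
        """Move to new_state; return the requests that start executing."""
        if self.state in FINAL_STATES or new_state not in ALLOWED[self.state]:
            raise ValueError("illegal transition %s -> %s"
                             % (self.state.name, new_state.name))
        self.state = new_state
        return self._drain()

    def submit(self, request):
        """Accept a usage request; return the requests that start executing."""
        if self.state not in (State.PENDING, State.ACTIVE):
            raise ValueError("cannot submit in state %s" % self.state.name)
        self.queued.append(request)
        return self._drain()

    def _drain(self):
        # Only an ACTIVE resource executes requests; otherwise keep them queued.
        if self.state is not State.ACTIVE:
            return []
        started, self.queued = self.queued, []
        return started
```

Once a resource reaches a final state, every further transition is rejected, which reflects that it has left application control.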

destroy()

The semantics of this method is equivalent to the semantics of the destroy() call on the saga.resource.Manager class.

get_access()

Return the resource access Url.

get_description()

Return the description that was used to acquire this resource.

get_id()

Return the resource ID.

get_manager()

Return the manager instance that was used to acquire this resource.

get_rtype()

Return the resource type.

get_state()

Return the state of the resource.

get_state_detail()

Return the state details (backend specific) of the resource.

reconfig(descr)

A resource is acquired according to a resource description, i.e. to a specific set of attributes. At some point in time, while the resource is running, the application requirements on the resource may have changed – in that case, the application can request to change the resource’s configuration on the fly.

This method cannot be used to change the type of the resource. Backends may or may not support this operation – if not, a saga.NotImplemented exception is raised. If the method is supported, then the semantics of the method is equivalent to the semantics of the acquire() call on the saga.resource.Manager class.
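Because reconfig() may raise saga.NotImplemented, an application that needs a differently configured resource has to plan for both outcomes. One possible approach is to fall back to acquiring a new resource and releasing the old one. The sketch below uses hypothetical stand-ins rather than RADICAL-SAGA objects: `NotSupported` plays the role of saga.NotImplemented, and `reconfigure_or_replace` is an invented helper.

```python
class NotSupported(Exception):
    """Stand-in for saga.NotImplemented in this sketch."""


def reconfigure_or_replace(resource, new_desc, acquire):
    """Reconfigure `resource` in place if the backend supports it.

    Otherwise acquire a replacement from `new_desc` first and only then
    destroy the old resource, so the application is never left without one.
    Returns the resource to use from now on.
    """
    try:
        resource.reconfig(new_desc)
        return resource
    except NotSupported:
        replacement = acquire(new_desc)  # if this raises, the old one is kept
        resource.destroy()
        return replacement
```

Acquiring before destroying means two resources exist briefly; the trade-off is that a failed acquisition leaves the original resource untouched instead of leaving the application with nothing.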

wait(state=FINAL, timeout=None)

Wait for a resource to enter a specific state.

Parameters:
  • state – resource state to wait for (UNKNOWN, NEW, PENDING, ACTIVE, DONE, FAILED, EXPIRED, CANCELED, FINAL)

  • timeout (float) – time to block while waiting.

This method will block until the resource enters the specified state, or until timeout seconds have passed, whichever occurs earlier. If the resource is in a final state, the call will raise a saga.IncorrectState exception when asked to wait for any non-final state.

A negative timeout value represents an indefinite timeout.
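The blocking semantics of wait() can be modelled as a polling loop. The sketch below is a hypothetical illustration with string states and a caller-supplied `get_state` function; `wait_for` and `IncorrectState` are invented stand-ins, not how RADICAL-SAGA implements wait(). Returning the last observed state, and returning immediately when a different final state is reached while waiting for a final target, are assumptions of this sketch.

```python
import time


class IncorrectState(Exception):
    """Stand-in for saga.IncorrectState in this sketch."""


FINAL_STATES = {"DONE", "CANCELED", "FAILED", "EXPIRED"}
FINAL = "FINAL"  # pseudo-state matching any final state


def wait_for(get_state, target=FINAL, timeout=None, interval=0.01):
    """Poll get_state() until it reports `target` or `timeout` seconds pass.

    None or a negative timeout waits indefinitely. Returns the last state seen.
    """
    target_is_final = target == FINAL or target in FINAL_STATES
    deadline = None
    if timeout is not None and timeout >= 0:
        deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state == target or (target == FINAL and state in FINAL_STATES):
            return state
        if state in FINAL_STATES:
            if not target_is_final:
                raise IncorrectState("%s is final, cannot reach %s" % (state, target))
            return state  # assumption: a final state never changes again
        if deadline is not None and time.monotonic() >= deadline:
            return state
        time.sleep(interval)
```

Using `time.monotonic()` for the deadline keeps the timeout unaffected by wall-clock adjustments while waiting.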

3.5.4. Compute Resource – radical.saga.resource.Compute

class radical.saga.resource.Compute(id=None, session=None, _adaptor=None, _adaptor_state={}, _ttype=None)

Bases: Resource

A Compute resource is a resource which provides compute capabilities, i.e. which can execute compute jobs. As such, the ‘Access’ attribute of the compute resource (a URL) can be used to create a saga.job.Service instance to submit jobs to.

3.5.5. Storage Resource – radical.saga.resource.Storage

class radical.saga.resource.Storage(id=None, session=None, _adaptor=None, _adaptor_state={}, _ttype=None)

Bases: Resource

A Storage resource is a resource which has storage capabilities, i.e. the ability to persistently store, organize and retrieve data. As such, the ‘Access’ attribute of the storage resource (a URL) can be used to create a saga.filesystem.Directory instance to manage the resource’s data space.

3.5.6. Network Resource – radical.saga.resource.Network

class radical.saga.resource.Network(id=None, session=None, _adaptor=None, _adaptor_state={}, _ttype=None)

Bases: Resource

A Network resource is a resource which has network capabilities.