3.5. Resource Management
Let’s start with a basic example. We start a VM on Amazon EC2 using the SAGA resource API, submit a job to the newly instantiated VM using the SAGA job API and finally shut down the VM.
Note
In order to run this example, you need an account with Amazon EC2. You also need your Amazon EC2 id and key.
#!/usr/bin/env python

__author__    = "Andre Merzky, Ole Weidner"
__copyright__ = "Copyright 2012-2013, The SAGA Project"
__license__   = "MIT"

""" This is an example which shows how to access Amazon EC2 clouds via the SAGA
    resource package.

    In order to run this example, you need to set the following environment
    variables:

    * EC2_ACCESS_KEY: your Amazon EC2 ID
    * EC2_SECRET_KEY: your Amazon EC2 KEY
    * EC2_KEYPAIR_ID: name of ssh keypair within EC2
    * EC2_KEYPAIR:    your ssh keypair to use to access the VM, e.g.,
                      /home/username/.ssh/id_rsa_ec2
"""

import os
import sys
import time

import radical.saga as rs


# ------------------------------------------------------------------------------
#
def main():

    # In order to connect to EC2, we need an EC2 ID and KEY. We read those
    # from the environment.
    ec2_ctx = rs.Context('EC2')
    ec2_ctx.user_id  = os.environ['EC2_ACCESS_KEY']
    ec2_ctx.user_key = os.environ['EC2_SECRET_KEY']

    # The SSH keypair we want to use to access the EC2 VM. If the keypair is
    # not yet registered on EC2, saga will register it automatically. This
    # context specifies the key for VM startup, i.e. the VM will be configured
    # to accept this key.
    ec2keypair_ctx = rs.Context('EC2_KEYPAIR')
    ec2keypair_ctx.token    = os.environ['EC2_KEYPAIR_ID']
    ec2keypair_ctx.user_key = os.environ['EC2_KEYPAIR']
    ec2keypair_ctx.user_id  = 'root'  # the user id on the target VM

    # We specify the *same* ssh key for ssh access to the VM. That should now
    # work if the VM got configured correctly per the 'EC2_KEYPAIR' context
    # above.
    ssh_ctx = rs.Context('SSH')
    ssh_ctx.user_id  = 'root'
    ssh_ctx.user_key = os.environ['EC2_KEYPAIR']

    session = rs.Session(False)  # False: don't use other (default) contexts
    session.contexts.append(ec2_ctx)
    session.contexts.append(ec2keypair_ctx)
    session.contexts.append(ssh_ctx)

    cr  = None  # compute resource handle
    rid = None  # compute resource ID

    try:

        # ----------------------------------------------------------------------
        #
        # reconnect to VM (ID given in ARGV[1])
        #
        if len(sys.argv) > 1:

            rid = sys.argv[1]

            # reconnect to the given resource
            print('reconnecting to %s' % rid)
            cr = rs.resource.Compute(id=rid, session=session)
            print('reconnected to %s' % rid)
            print(" state : %s (%s)" % (cr.state, cr.state_detail))

        # ----------------------------------------------------------------------
        #
        # start a new VM
        #
        else:

            # in our session, connect to the EC2 resource manager
            rm = rs.resource.Manager("ec2://aws.amazon.com/", session=session)

            # Create a resource description with an image and an OS template.
            # We pick a small VM and a plain Ubuntu image...
            cd = rs.resource.ComputeDescription()
            cd.image    = 'ami-0256b16b'  # plain ubuntu
            cd.template = 'Small Instance'

            # Create a VM instance from that description.
            cr  = rm.acquire(cd)
            rid = cr.id

            print("\nWaiting for VM to become active...")

        # ----------------------------------------------------------------------
        #
        # use the VM
        #
        # Wait for the VM to 'boot up', i.e., become 'ACTIVE'
        cr.wait(rs.resource.ACTIVE)

        # Query some information about the newly created VM
        print("Created VM: %s" % cr.id)
        print(" state : %s (%s)" % (cr.state, cr.state_detail))
        print(" access : %s" % cr.access)

        # give the VM some time to start up completely, otherwise the
        # subsequent job submission might end up failing...
        time.sleep(60)

        # create a job service which uses the VM's access URL (cr.access)
        js = rs.job.Service(cr.access, session=session)

        jd = rs.job.Description()
        jd.executable = '/bin/sleep'
        jd.arguments  = ['30']

        job = js.create_job(jd)
        job.run()

        print("\nRunning Job: %s" % job.id)
        print(" state : %s" % job.state)
        job.wait()
        print(" state : %s" % job.state)

    except rs.SagaException as ex:
        # Catch all saga exceptions
        print("An exception occurred: (%s) %s " % (ex.type, (str(ex))))
        raise

    except Exception as e:
        # Catch all other exceptions
        print("An exception occurred: %s " % e)
        raise

    finally:

        # ----------------------------------------------------------------------
        #
        # shut the VM down, whether it was started here or reconnected to
        if cr and rid:
            cr.destroy()
            print("\nDestroyed VM: %s" % cr.id)
            print(" state : %s (%s)" % (cr.state, cr.state_detail))


# ------------------------------------------------------------------------------
#
if __name__ == "__main__":

    sys.exit(main())
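The example reads its credentials with os.environ[...], so an unset variable surfaces as a bare KeyError part-way through the context setup. A small pre-flight check can report every missing variable at once. The following is a minimal sketch and not part of RADICAL-SAGA: the helper name missing_ec2_vars is hypothetical, and the variable names are the ones the script above reads.

```python
import os

# Variable names as read by the example script via os.environ[...].
REQUIRED_VARS = ("EC2_ACCESS_KEY", "EC2_SECRET_KEY",
                 "EC2_KEYPAIR_ID", "EC2_KEYPAIR")


def missing_ec2_vars(env=None):
    """Return the required variables that are unset or empty.

    Hypothetical helper: `env` defaults to os.environ, but any mapping
    can be passed in, which keeps the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_ec2_vars()
    if missing:
        print("missing environment variables: %s" % ", ".join(missing))
    else:
        print("all EC2 variables are set")
```

Calling such a check at the top of main() would let the script stop with a readable message before any context is created.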
3.5.1. Resource Manager – radical.saga.resource.Manager
- class radical.saga.resource.Manager(url=None, session=None, _adaptor=None, _adaptor_state={}, _ttype=None)
Bases: Base, Async
In the context of RADICAL-SAGA, a ResourceManager is a service which asserts control over a set of resources. That manager can, on request, render control over subsets of those resources (resource slices) to an application.
This Manager class represents the contact point to such ResourceManager instances – the application can thus acquire compute, data or network resources, according to some resource specification, for a bounded or unbounded amount of time.
- __init__(url)
Create a new Manager instance. Connect to a remote resource management endpoint.
- Parameters:
url (saga.Url) – resource management endpoint
- acquire(desc)
Create a new saga.resource.Resource handle for a resource specified by the description.
- Parameters:
spec (Description or Url) – specifies the resource
Depending on the RTYPE attribute in the description, the returned resource may be a saga.resource.Compute, saga.resource.Storage or saga.resource.Network instance.
If the spec parameter is
- destroy(rid)
Destroy / release a resource.
- Parameters:
rid (str) – identifies the resource to be released
- get_description(rid)
Get the resource Description for the specified resource.
- Parameters:
rid (str) – identifies the resource to be described.
- get_image(name)
Get a description string for the specified image.
- Parameters:
name (str) – specifies the image name
- get_template(name)
Get a Description for the specified template.
- Parameters:
name (str) – specifies the name of the template
The returned resource description instance may not have all attributes filled, and may in fact not be sufficiently complete to allow for successful resource acquisition. The only guaranteed attribute in the returned description is TEMPLATE, containing the very template id specified in the call parameters.
- list(rtype=None)
List known resource instances (which can be acquired). Returns a list of IDs.
- Parameters:
rtype (None or enum (COMPUTE | STORAGE | NETWORK)) – filter for one or more resource types
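The list() and get_description() calls combine naturally into an inventory of what a manager knows about. The sketch below assumes that every ID returned by list() is accepted by get_description(), which the text above suggests but does not state outright. The function describe_all, the FakeManager stand-in and the resource ID "vm-1" are hypothetical and exist only so the snippet runs without a cloud account.

```python
def describe_all(manager, rtype=None):
    """Map each resource ID known to the manager to its description.

    `manager` is any object offering list(rtype) and get_description(rid),
    the two method names documented above.
    """
    return {rid: manager.get_description(rid) for rid in manager.list(rtype)}


class FakeManager:
    """Illustrative stand-in exposing the same two method names."""

    def __init__(self, resources):
        self._resources = dict(resources)

    def list(self, rtype=None):
        # The stand-in ignores the rtype filter and returns every known ID.
        return list(self._resources)

    def get_description(self, rid):
        return self._resources[rid]


fake = FakeManager({"vm-1": {"Template": "Small Instance"}})
print(describe_all(fake))
```

With the library installed, the manager created in the example at the top of this section (rs.resource.Manager("ec2://aws.amazon.com/", session=session)) would take the place of the stand-in.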
3.5.2. Resource Description – radical.saga.resource.Description
- class radical.saga.resource.Description(d=None)
The resource description class.
Resource descriptions are used for two purposes:
- an application can pass a description instance to a saga.resource.Manager instance, to request control over the resource slice described in the description;
- an application can request a resource’s description for inspection of resource properties.
There are three specific types of descriptions:
- saga.resource.ComputeDescription for the description of resources with compute capabilities;
- saga.resource.StorageDescription for the description of resources with data storage capabilities;
- saga.resource.NetworkDescription for the description of resources with communication capabilities.
There is at this point no notion of resources which combine different capabilities.
For all these capabilities, the following attributes are supported:
- RType: Enum, describing the capabilities of the resource (COMPUTE, STORAGE or NETWORK).
- Template: String, a backend specific resource class with some pre-defined hardware properties to apply to the resource.
- Image: String, a backend specific resource class with some pre-defined software properties to apply to the resource.
- Dynamic: Boolean, if True signifies that the resource may dynamically change its properties at runtime.
- Start: Integer, seconds since epoch when the resource is expected to enter / when the resource entered ACTIVE state.
- End: Integer, seconds since epoch when the resource is expected to enter / when the resource entered a FINAL state.
- Duration: Integer, seconds for which the resource is expected to remain / the resource remained in ACTIVE state.
- MachineOS: String, for COMPUTE resources, specifies the operating system type running on that resource.
- MachineArch: String, for COMPUTE resources, specifies the machine architecture of that resource.
- Size: Integer, for COMPUTE resources, specifies the number of process slots provided; for STORAGE resources, specifies the number of bytes of the resource.
- Memory: Integer, for COMPUTE resources, specifies the number of bytes provided as memory.
- Access: String, usually a URL, which specifies the contact point for the resource capability interface / service interface.
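Start and End are expressed in seconds since the epoch and Duration in plain seconds, so values for them are easy to derive from Python datetime objects. The sketch below produces a plain dictionary keyed by the attribute names listed above. It does not touch the library's Description class, the helper name time_window is hypothetical, and it assumes that the resource enters a final state exactly when its ACTIVE period ends (End = Start + Duration), which the text does not guarantee.

```python
from datetime import datetime, timedelta, timezone


def time_window(start, duration):
    """Return Start, End and Duration values in the units used above.

    `start` should be a timezone-aware datetime; a naive one would be
    interpreted in the local timezone by datetime.timestamp().
    """
    start_s = int(start.timestamp())            # seconds since epoch
    duration_s = int(duration.total_seconds())  # seconds
    # Assumption: the resource turns final right when its ACTIVE period ends.
    return {"Start": start_s,
            "End": start_s + duration_s,
            "Duration": duration_s}


window = time_window(datetime(2030, 1, 1, tzinfo=timezone.utc),
                     timedelta(hours=2))
print(window)
```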
3.5.3. Resource – radical.saga.resource.Resource
- class radical.saga.resource.Resource(id=None, session=None, _adaptor=None, _adaptor_state={}, _ttype=None)
A Resource class instance represents a specific slice of resource which is, if in ACTIVE state, under the application’s control and ready to serve usage requests. The type of accepted usage requests depends on the specific resource type (job execution for saga.resource.Compute, data storage for saga.resource.Storage, and network connectivity for saga.resource.Network). The exact mechanisms by which those usage requests are communicated are not part of the resource’s class interface, but are instead served by other RADICAL-SAGA classes – typically those are saga.job.Service for Compute resources, and saga.filesystem.Directory for Storage resources (Network resources provide implicit connectivity, but do not have explicit, public entry points to request usage).
The process of resource acquisition is performed by a ResourceManager, represented by a saga.resource.Manager instance. The semantics of the acquisition process is defined as the act of moving a slice (subset) of the resources managed by the resource manager under the control of the requesting application (i.e. under user control), to use as needed. The type and properties of the resource slice to be acquired and the time and duration over which the resource will be made available to the application are specified in a saga.resource.Description, to be supplied when acquiring a resource.
The exact backend semantics on how a resource slice is provisioned to the application is up to the resource manager backend – this can be as simple as providing a job submission endpoint to a classic HPC resource, and as complex as instantiating a pilot job or pilot data container, or reserving a network fiber on demand, or instantiating a virtual machine – the result will be, from the application’s perspective, indistinguishable: a resource slice is made available for the execution of usage requests (tasks, workload, jobs, …).
Resources are stateful: when acquired from a resource manager, they are typically in NEW state, and will become ACTIVE once they are provisioned to the application and can serve usage requests. Some resources may go through an intermediate state, PENDING, when they are about to become active at some point, and usage requests can already be submitted – those usage requests will not be executed until the resource enters the ACTIVE state. The resource can be released from application control in three different ways: it can be actively destroyed by the application, and will then enter the CANCELED state; it can internally cease to function and become unable to serve usage requests, represented by the FAILED state; or the resource manager can retract control from the application because the agreed time duration has passed, represented by the EXPIRED state.
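The state model described above can be written down as a small transition table. This is an illustrative sketch only, not the library's implementation: the State enum and the helpers can_transition and executes_requests are hypothetical, only the six states named in the paragraph above are modelled, and the table assumes that any non-final state may move directly to any of the three final states.

```python
from enum import Enum


class State(Enum):
    NEW = "NEW"
    PENDING = "PENDING"
    ACTIVE = "ACTIVE"
    CANCELED = "CANCELED"   # destroyed by the application
    FAILED = "FAILED"       # resource ceased to function
    EXPIRED = "EXPIRED"     # agreed duration has passed


FINAL_STATES = frozenset({State.CANCELED, State.FAILED, State.EXPIRED})

# Assumption: every non-final state may move directly to any final state.
TRANSITIONS = {
    State.NEW: frozenset({State.PENDING, State.ACTIVE}) | FINAL_STATES,
    State.PENDING: frozenset({State.ACTIVE}) | FINAL_STATES,
    State.ACTIVE: FINAL_STATES,
}


def can_transition(current, target):
    """True if the sketch's table allows moving from current to target."""
    return target in TRANSITIONS.get(current, frozenset())


def executes_requests(state):
    """Only ACTIVE resources execute usage requests; PENDING ones queue them."""
    return state is State.ACTIVE


print(can_transition(State.NEW, State.ACTIVE))
print(can_transition(State.EXPIRED, State.ACTIVE))
```

Final states have no entry in the table, so nothing leads out of them, which matches the idea that a released resource does not come back.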
- destroy()
The semantics of this method is equivalent to the semantics of the destroy() call on the saga.resource.Manager class.
- reconfig(descr)
A resource is acquired according to a resource description, i.e. to a specific set of attributes. At some point in time, while the resource is running, the application requirements on the resource may have changed – in that case, the application can request to change the resource’s configuration on the fly.
This method cannot be used to change the type of the resource. Backends may or may not support this operation – if not, a saga.NotImplemented exception is raised. If the method is supported, then the semantics of the method is equivalent to the semantics of the acquire() call on the saga.resource.Manager class.
- wait(state=FINAL, timeout=None)
Wait for a resource to enter a specific state.
- Parameters:
state (enum) – resource state to wait for (UNKNOWN, NEW, PENDING, ACTIVE, DONE, FAILED, EXPIRED, CANCELED, FINAL)
timeout (float) – time to block while waiting.
This method will block until the resource has entered the specified state, or until timeout seconds have passed – whichever occurs earlier. If the resource is in a final state, the call will raise a saga.IncorrectState exception when asked to wait for any non-final state.
A negative timeout value represents an indefinite timeout.
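The blocking rules of wait() can be illustrated with a small polling loop. The sketch below only mirrors the control flow described above and is not how the library implements it: wait_for, the string state names and the local IncorrectState class are hypothetical stand-ins. Three choices are assumptions made for the sketch: a timeout of None is treated like a negative one (wait indefinitely), CANCELED, FAILED and EXPIRED are taken as the final states, and the last observed state is returned.

```python
import time

# Assumption: the final states are the three named in the class description.
FINAL_STATES = frozenset({"CANCELED", "FAILED", "EXPIRED"})


class IncorrectState(Exception):
    """Local stand-in for saga.IncorrectState."""


def wait_for(get_state, targets, timeout=None, poll=0.5):
    """Poll get_state() until it returns one of `targets` or time runs out."""
    targets = frozenset(targets)
    # Negative timeout: wait indefinitely. Treating None alike is an assumption.
    if timeout is None or timeout < 0:
        deadline = None
    else:
        deadline = time.monotonic() + timeout

    while True:
        state = get_state()
        if state in targets:
            return state
        if state in FINAL_STATES:
            if not targets & FINAL_STATES:
                # A final resource can never reach a non-final target.
                raise IncorrectState("resource is %s" % state)
            return state  # a different final state; nothing will change
        if deadline is not None and time.monotonic() >= deadline:
            return state  # timed out before reaching a target
        time.sleep(poll)


states = iter(["NEW", "PENDING", "ACTIVE"])
print(wait_for(lambda: next(states), {"ACTIVE"}, timeout=-1, poll=0))
```

In real code the library call itself does this work, as in cr.wait(rs.resource.ACTIVE) from the example at the top of this section.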
3.5.4. Compute Resource – radical.saga.resource.Compute
- class radical.saga.resource.Compute(id=None, session=None, _adaptor=None, _adaptor_state={}, _ttype=None)
Bases: Resource
A Compute resource is a resource which provides compute capabilities, i.e. which can execute compute jobs. As such, the ‘Access’ attribute of the compute resource (a URL) can be used to create a saga.job.Service instance to submit jobs to.
3.5.5. Storage Resource – radical.saga.resource.Storage
- class radical.saga.resource.Storage(id=None, session=None, _adaptor=None, _adaptor_state={}, _ttype=None)
Bases: Resource
A Storage resource is a resource which has storage capabilities, i.e. the ability to persistently store, organize and retrieve data. As such, the ‘Access’ attribute of the storage resource (a URL) can be used to create a saga.filesystem.Directory instance to manage the resource’s data space.