Aurora is a multi-tenant system that can run jobs for multiple clients/tenants. Beyond resource isolation on an individual host, it is crucial to prevent those jobs from stepping on each other's toes.
The namespace for jobs in Aurora follows a hierarchical structure, which makes it easier to differentiate between jobs. A job key consists of four parts, in this order:
<cluster>/<role>/<environment>/<jobname>
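As an illustration of the four-part structure, here is a minimal Python sketch that splits a job key into its components. The `JobKey` type, the `parse_job_key` helper, and the sample key are hypothetical names chosen for this example; they are not part of Aurora's client API.

```python
from typing import NamedTuple


class JobKey(NamedTuple):
    """The four components of a job key, in the order described above."""
    cluster: str
    role: str
    environment: str
    name: str

    def __str__(self) -> str:
        # Re-assemble the key in <cluster>/<role>/<environment>/<jobname> form.
        return "/".join(self)


def parse_job_key(key: str) -> JobKey:
    """Split a 'cluster/role/environment/jobname' string into a JobKey.

    Hypothetical helper: rejects keys that do not have exactly four
    non-empty, slash-separated parts.
    """
    parts = key.split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(
            f"expected <cluster>/<role>/<environment>/<jobname>, got {key!r}"
        )
    return JobKey(*parts)


# Sample key invented for this example.
key = parse_job_key("devcluster/www-data/devel/hello_world")
print(key.role)         # www-data
print(key.environment)  # devel
```

Because the role and environment are separate components of the key, two jobs with the same name can coexist as long as they differ in any of the other three parts.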
Role names correspond to user accounts. They are used for authentication, as the Linux user used to run jobs, and for the assignment of quota. If you don’t know what accounts are available, contact your sysadmin.
The environment component of the job key serves as a namespace. The scheduler validates environment values and, by default, allows devel, test, production, and any value matching the regular expression staging[0-9]*. This validation can be changed to allow any arbitrary regular expression by setting the scheduler option allowed_job_environments.
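The validation described above can be approximated with a short Python sketch. The pattern below is assembled from the defaults listed in the text and is an assumption for illustration; it is not copied from the scheduler, whose actual default pattern and matching semantics may differ. The `is_allowed_environment` helper is hypothetical.

```python
import re

# Assumption: a pattern built from the defaults named in the text
# (devel, test, production, staging followed by optional digits).
DEFAULT_ENVIRONMENTS = re.compile(r"devel|test|production|staging[0-9]*")


def is_allowed_environment(env: str,
                           pattern: re.Pattern = DEFAULT_ENVIRONMENTS) -> bool:
    """Return True if the whole environment name matches the pattern."""
    return pattern.fullmatch(env) is not None


for env in ("devel", "staging", "staging42", "qa"):
    print(env, is_allowed_environment(env))

# A cluster that overrides the allowed environments would correspond to
# passing a different pattern (hypothetical example value).
custom = re.compile(r"qa[0-9]+")
print(is_allowed_environment("qa7", custom))
```

Using `fullmatch` keeps a name such as `staging-old` from passing just because it starts with an allowed value.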
None of the values imply any difference in scheduling behavior. Conventionally, the environment is set to indicate the level of stability of the job's behavior, reflecting how much testing the application code has undergone. For example, releases of a typical job may progress through the following phases in order of increasing stability: devel, test, staging, production.
A tier is a predefined bundle of task configuration options. Aurora schedules tasks and assigns them resources based on their tier assignment. The default scheduler tier configuration allows for 3 tiers:
revocable: The revocable tier requires the task to run with revocable resources.
preemptible: Setting the task’s tier to preemptible allows for the possibility of that task being preempted by other tasks when the cluster is running low on resources.
preferred: The preferred tier prevents the task from using revocable resources and from being preempted.
Since it is possible that a cluster is configured with a custom tier configuration, users should consult their cluster administrator to be informed of the tiers supported by the cluster. Attempts to schedule jobs with an unsupported tier will be rejected by the scheduler.
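The three default tiers can be modelled as a small lookup table. This is a sketch only: the `Tier` class, the `DEFAULT_TIERS` mapping, and `resolve_tier` are hypothetical names, and the two boolean flags are a simplification of what a tier bundles. Marking the revocable tier as preemptible is an inference from the preemption rules in this document, which list revocable tasks as possible victims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    revocable: bool    # task runs on revocable resources
    preemptible: bool  # task may be preempted by other tasks


# Assumption: a simplified model of the three default tiers described above.
DEFAULT_TIERS = {
    "revocable": Tier(revocable=True, preemptible=True),
    "preemptible": Tier(revocable=False, preemptible=True),
    "preferred": Tier(revocable=False, preemptible=False),
}


def resolve_tier(name: str, tiers: dict = DEFAULT_TIERS) -> Tier:
    """Look up a tier by name, rejecting tiers the cluster does not support."""
    try:
        return tiers[name]
    except KeyError:
        supported = ", ".join(sorted(tiers))
        raise ValueError(
            f"unsupported tier {name!r}; supported tiers: {supported}"
        ) from None


print(resolve_tier("preferred"))
```

A cluster with a custom tier configuration would correspond to passing a different mapping, which is why an unknown name is treated as an error rather than silently defaulted.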
To guarantee that important production jobs are always running, Aurora supports preemption.
Consider a pending job that is a candidate for scheduling but cannot be placed because of a resource shortage. An active task can become the victim of preemption if it is a preemptible or revocable tier task and the candidate is a preferred tier task.
In other words, tasks from preferred tier jobs may preempt tasks from any preemptible or revocable job. However, a preferred task may only be preempted by tasks from preferred jobs in the same role with higher priority.
Aurora requires resource quotas for production non-dedicated jobs. Quota is enforced at the job role level and, when set, defines a non-preemptible pool of compute resources within that role. All job types (service, adhoc, or cron) require role resource quota unless the job has a dedicated constraint set.
To grant quota to a particular role in production, an operator can use the command aurora_admin set_quota.
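To make the quota rule concrete, the following Python sketch models a role-level quota check. It is a hypothetical illustration, not the scheduler's implementation: the `Resources` class, its fields and units, and the `requires_quota` and `admit` helpers are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu: float
    ram_mb: int
    disk_mb: int

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(
            self.cpu + other.cpu,
            self.ram_mb + other.ram_mb,
            self.disk_mb + other.disk_mb,
        )

    def fits_within(self, limit: "Resources") -> bool:
        """True if every dimension is within the given limit."""
        return (
            self.cpu <= limit.cpu
            and self.ram_mb <= limit.ram_mb
            and self.disk_mb <= limit.disk_mb
        )


def requires_quota(production: bool, dedicated: bool) -> bool:
    """Production jobs need role quota unless they are dedicated."""
    return production and not dedicated


def admit(quota: Resources, in_use: Resources, requested: Resources,
          production: bool, dedicated: bool) -> bool:
    """Check a job's request against what remains of its role's quota."""
    if not requires_quota(production, dedicated):
        return True
    return (in_use + requested).fits_within(quota)


# Example values invented for illustration.
quota = Resources(cpu=8.0, ram_mb=16384, disk_mb=65536)
in_use = Resources(cpu=6.0, ram_mb=8192, disk_mb=10240)
request = Resources(cpu=4.0, ram_mb=2048, disk_mb=1024)
print(admit(quota, in_use, request, production=True, dedicated=False))   # False
print(admit(quota, in_use, request, production=False, dedicated=False))  # True
```

The check is per role rather than per job, which mirrors the statement that quota defines a pool of resources shared by the production jobs of that role.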