When setting up your cluster, you will install the scheduler on a small number (usually 3 or 5) of machines. This guide helps you get the scheduler set up and troubleshoot some common hurdles.
The Aurora scheduler is a standalone Java server. As part of the build process it creates a bundle of all its dependencies, with the notable exceptions of the JVM and libmesos. Each target server should have a JVM (Java 7 or higher) and libmesos (0.20.1) installed.
To create a distribution for installation you will need build tools installed. On Ubuntu this can be done with `sudo apt-get install build-essential default-jdk`.
git clone http://gitbox.apache.org/repos/asf/incubator-aurora.git
cd incubator-aurora
./gradlew distZip
Copy the generated `dist/distributions/aurora-scheduler-*.zip` to each node that will run a scheduler.
Extract the aurora-scheduler zip file. The example configurations assume it is extracted to `/usr/local/aurora-scheduler`.
sudo unzip dist/distributions/aurora-scheduler-*.zip -d /usr/local
sudo ln -nfs "$(ls -dt /usr/local/aurora-scheduler-* | head -1)" /usr/local/aurora-scheduler
Like Mesos, Aurora uses command-line flags for runtime configuration. As such, the Aurora “configuration file” is typically a `scheduler.sh` shell script of the form:
#!/bin/bash
AURORA_HOME=/usr/local/aurora-scheduler
# Flags controlling the JVM.
JAVA_OPTS=(
-Xmx2g
-Xms2g
# GC tuning, etc.
)
# Flags controlling the scheduler.
AURORA_FLAGS=(
-http_port=8081
# Log configuration, etc.
)
# Environment variables controlling libmesos
export JAVA_HOME=...
export GLOG_v=1
export LIBPROCESS_PORT=8083
JAVA_OPTS="${JAVA_OPTS[*]}" exec "$AURORA_HOME/bin/aurora-scheduler" "${AURORA_FLAGS[@]}"
That way Aurora’s current flags are visible in `ps` and in the `/vars` admin endpoint.
Examples are available under `examples/scheduler/`. For a list of available Aurora flags and their documentation, run:
/usr/local/aurora-scheduler/bin/aurora-scheduler -help
All Aurora state is persisted to a replicated log. This includes all jobs Aurora is running including where in the cluster they are being run and the configuration for running them, as well as other information such as metadata needed to reconnect to the Mesos master, resource quotas, and any other locks in place.
Aurora schedulers use ZooKeeper to discover log replicas and elect a leader. Only one scheduler is leader at a given time - the other schedulers follow log writes and prepare to take over as leader but do not communicate with the Mesos master. Either 3 or 5 schedulers are recommended in a production deployment depending on failure tolerance and they must have persistent storage.
In a cluster with N schedulers, the flag `-native_log_quorum_size` should be set to `floor(N/2) + 1`. So in a cluster with 1 scheduler it should be set to `1`, in a cluster with 3 it should be set to `2`, and in a cluster of 5 it should be set to `3`.
Number of schedulers (N) | `-native_log_quorum_size` setting (`floor(N/2) + 1`) |
---|---|
1 | 1 |
3 | 2 |
5 | 3 |
7 | 4 |
Incorrectly setting this flag will cause data corruption to occur!
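The quorum rule above can be computed rather than looked up. The following is a minimal sketch; `quorum_size` is a hypothetical helper name, not part of Aurora, and it relies on shell integer division to take the floor.

```shell
# Hypothetical helper (not part of Aurora): prints floor(N/2) + 1
# for a cluster of N schedulers. Integer division truncates, which
# gives the floor for positive N.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

quorum_size 3   # prints 2
quorum_size 5   # prints 3
```

The result could then be passed to the scheduler as `-native_log_quorum_size="$(quorum_size 3)"` inside `AURORA_FLAGS`.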
See this document for more replicated log and storage configuration options.
Before you start Aurora you will also need to initialize the log on the first master.
mesos-log initialize --path="$AURORA_HOME/scheduler/db"
Failing to do this will result in the following message when you try to start the scheduler:
Replica in EMPTY status received a broadcasted recover request
See this document for scheduler storage performance considerations.
The Aurora scheduler listens on 2 ports - an HTTP port used for client RPCs and a web UI, and a libprocess (HTTP+Protobuf) port used to communicate with the Mesos master and for the log replication protocol. These can be left unconfigured (the scheduler publishes all selected ports to ZooKeeper) or explicitly set in the startup script as follows:
# ...
AURORA_FLAGS=(
# ...
-http_port=8081
# ...
)
# ...
export LIBPROCESS_PORT=8083
# ...
Configure a supervisor like Monit or supervisord to run the created `scheduler.sh` file and restart it whenever it fails. Aurora expects to be restarted by an external process when it fails. Aurora supports an active health checking protocol on its admin HTTP interface: if a `GET /health` times out or returns anything other than `200 OK`, the scheduler process is unhealthy and should be restarted.
For example, monit can be configured with
if failed port 8081 send "GET /health HTTP/1.0\r\n" expect "OK\n" with timeout 2 seconds for 10 cycles then restart
assuming you set `-http_port=8081`.
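If you want to probe the health endpoint by hand or from a custom supervisor script, the same rule (a timeout or any status other than 200 means unhealthy) might be sketched as follows. `scheduler_healthy` is a hypothetical helper name; the sketch assumes bash, that `curl` is installed, and that the scheduler listens on the port given by `-http_port`.

```shell
# Hypothetical helper (not part of Aurora): succeeds only if
# GET /health answers with HTTP status 200 within the timeout.
# Arguments: host (default localhost), port (default 8081),
# timeout in seconds (default 2).
scheduler_healthy() {
  local host="${1:-localhost}" port="${2:-8081}" timeout="${3:-2}"
  local status
  # curl exits non-zero on connection failure or timeout.
  status="$(curl --silent --output /dev/null --max-time "$timeout" \
    --write-out '%{http_code}' "http://${host}:${port}/health")" || return 1
  [ "$status" = "200" ]
}
```

A supervisor wrapper could call `scheduler_healthy localhost 8081 2` periodically and restart the scheduler when it returns non-zero.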
Please see our dedicated monitoring guide for in-depth discussion on monitoring.
Aurora is best suited to run stateless applications, but it also accommodates stateful services like databases, or services that otherwise need to always run on the same machines.
The Mesos slave has the `--attributes` command line argument, which can be used to mark a slave with static attributes (not to be confused with `--resources`, which are dynamic and accounted).
Aurora makes these attributes available for matching with scheduling constraints. Most of these constraints are arbitrary and available for custom use. There is one exception, though: the `dedicated` attribute. Aurora treats this attribute specially: only matching jobs are allowed to run on machines that carry it, and matching jobs are scheduled only on those machines.
The dedicated attribute has semantic meaning. The format is `$role(/.*)?`. When a job is created, the scheduler requires that the `$role` component matches the `role` field in the job configuration, and will reject the job creation otherwise. The remainder of the attribute is free-form. We’ve developed the idiom of formatting this attribute as `$role/$job`, but do not enforce this.
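To make the `$role(/.*)?` format concrete, here is a small sketch of the matching rule as described above. It is only an illustration, not the scheduler's actual implementation, and `dedicated_matches_role` is a hypothetical name.

```shell
# Hypothetical helper (illustration only): succeeds when a dedicated
# attribute value is either exactly ROLE or ROLE followed by "/" and
# any free-form suffix, i.e. it fits the format $role(/.*)?
# Usage: dedicated_matches_role VALUE ROLE
dedicated_matches_role() {
  case "$1" in
    "$2"|"$2"/*) return 0 ;;
    *) return 1 ;;
  esac
}

dedicated_matches_role "db_team/redis" "db_team" && echo "accepted"
dedicated_matches_role "web_team/redis" "db_team" || echo "rejected"
```

Quoting `"$2"` inside the `case` pattern keeps the role literal, so only the trailing `/*` acts as a wildcard.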
Consider the following slave command line:
mesos-slave --attributes="host:$HOST;rack:$RACK;dedicated:db_team/redis" ...
And this job configuration:
Service(
name = 'redis',
role = 'db_team',
constraints = {
'dedicated': 'db_team/redis'
}
...
)
The job configuration is indicating that it should only be scheduled on slaves with the attribute `dedicated:db_team/redis`. Additionally, Aurora will prevent any tasks that do not have that constraint from running on those slaves.
So you’ve started your first cluster and are running into some issues? We’ve collected some common stumbling blocks and solutions here to help get you moving.
Storage is not READY
I1016 16:12:27.234133 26081 replica.cpp:638] Replica in EMPTY status
received a broadcasted recover request
I1016 16:12:27.234256 26084 recover.cpp:188] Received a recover response
from a replica in EMPTY status
When you create a new cluster, you need to inform a quorum of schedulers that they are safe to consider their database to be empty by initializing the replicated log. This is done to prevent the scheduler from modifying the cluster state in the event of multiple simultaneous disk failures or, more likely, misconfiguration of the replicated log path.
Scheduler log contains
Framework has not been registered within the tolerated delay.
Double-check that the scheduler is configured correctly to reach the master. If you are registering the master in ZooKeeper, make sure the command line argument to the master:
--zk=zk://$ZK_HOST:2181/mesos/master
is the same as the one on the scheduler:
-mesos_master_address=zk://$ZK_HOST:2181/mesos/master
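When the two flags live in different startup scripts, a quick comparison can catch a mismatch. The sketch below assumes you have copied both flags into strings yourself; `flag_value` and `same_zk_path` are hypothetical helper names, and `zk1` is a placeholder host.

```shell
# Hypothetical helpers (not part of Aurora or Mesos).
# flag_value prints everything after the first "=" of a flag.
flag_value() {
  printf '%s\n' "${1#*=}"
}

# same_zk_path succeeds when both flags carry the same value.
same_zk_path() {
  [ "$(flag_value "$1")" = "$(flag_value "$2")" ]
}

same_zk_path "--zk=zk://zk1:2181/mesos/master" \
             "-mesos_master_address=zk://zk1:2181/mesos/master" \
  && echo "paths match"
```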
PENDING forever
The scheduler is registered, and [receiving offers](docs/monitoring.md#schedulerresourceoffers), but tasks are perpetually shown as `PENDING - Constraint not satisfied: host`.
Check that your slaves are configured with `host` and `rack` attributes. Aurora requires that slaves are tagged with these two common failure domains to ensure that it can safely place tasks such that jobs are resilient to failure.
See our vagrant example for details.
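As a quick sanity check, you can inspect the string passed to `--attributes` for the two required names. This is a minimal sketch that assumes the `name:value;name:value` form shown in the slave command line earlier; `has_attribute` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of Mesos or Aurora): succeeds when
# the semicolon-separated attribute string contains an entry whose
# name is exactly NAME.
# Usage: has_attribute ATTRIBUTES NAME
has_attribute() {
  case ";$1;" in
    *";$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

attrs="host:node1;rack:r7;dedicated:db_team/redis"
for name in host rack; do
  has_attribute "$attrs" "$name" || echo "missing attribute: $name"
done
```

Here `node1` and `r7` are placeholder values; the loop prints nothing when both attributes are present.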