The .aurora
file format is just Python. However, Job
, Task
,
Process
, and other classes are defined by a templating library called
Pystachio, a powerful tool for configuration specification and reuse.
Aurora Configuration Reference has a full reference of all Aurora/Thermos defined Pystachio objects.
When writing your .aurora
file, you may use any Pystachio datatypes, as
well as any objects shown in the Aurora+Thermos Configuration
Reference without import
statements - the Aurora config loader
injects them automatically. Other than that the .aurora
format
works like any other Python script.
Pystachio uses the visually distinctive {{}} to indicate template variables. These are often called “mustache variables” after the similarly appearing variables in the Mustache templating system and because the curly braces resemble mustaches.
If you are familiar with the Mustache system, templates in Pystachio have significant differences. They have no nesting, joining, or inheritance semantics. On the other hand, when evaluated, templates are evaluated iteratively, so this affords some level of indirection.
Let’s start with the simplest template; text with one
variable, in this case name
;
Hello {{name}}
If we evaluate this as is, we’d get back:
Hello
If a template variable doesn’t have a value, when evaluated it’s replaced with nothing. If we add a binding to give it a value:
{ "name" : "Tom" }
We’d get back:
Hello Tom
Every Pystachio object has an associated .bind
method that can bind
values to {{}} variables. Bindings are not immediately evaluated.
Instead, they are evaluated only when the interpolated value of the
object is necessary, e.g. for performing equality or serializing a
message over the wire.
Objects with and without mustache templated variables behave differently:
>>> Float(1.5)
Float(1.5)
>>> Float('{{x}}.5')
Float({{x}}.5)
>>> Float('{{x}}.5').bind(x = 1)
Float(1.5)
>>> Float('{{x}}.5').bind(x = 1) == Float(1.5)
True
>>> contextual_object = String('{{metavar{{number}}}}').bind(
... metavar1 = "first", metavar2 = "second")
>>> contextual_object
String({{metavar{{number}}}})
>>> contextual_object.bind(number = 1)
String(first)
>>> contextual_object.bind(number = 2)
String(second)
You usually bind simple key to value pairs, but you can also bind three other objects: lists, dictionaries, and structurals. These will be described in detail later.
Most Aurora/Thermos users don’t ever (knowingly) interact with String
,
Float
, or Integer
Pystashio objects directly. Instead they interact
with derived structural (Struct
) objects that are collections of
fundamental and structural objects. The structural object components are
called attributes. Aurora’s most used structural objects are Job
,
Task
, and Process
:
class Process(Struct):
cmdline = Required(String)
name = Required(String)
max_failures = Default(Integer, 1)
daemon = Default(Boolean, False)
ephemeral = Default(Boolean, False)
min_duration = Default(Integer, 5)
final = Default(Boolean, False)
Construct default objects by following the object’s type with (). If you want an attribute to have a value different from its default, include the attribute name and value inside the parentheses.
>>> Process()
Process(daemon=False, max_failures=1, ephemeral=False,
min_duration=5, final=False)
Attribute values can be template variables, which then receive specific values when creating the object.
>>> Process(cmdline = 'echo {{message}}')
Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5,
cmdline=echo {{message}}, final=False)
>>> Process(cmdline = 'echo {{message}}').bind(message = 'hello world')
Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5,
cmdline=echo hello world, final=False)
A powerful binding property is that all of an object’s children inherit its bindings:
>>> List(Process)([
... Process(name = '{{prefix}}_one'),
... Process(name = '{{prefix}}_two')
... ]).bind(prefix = 'hello')
ProcessList(
Process(daemon=False, name=hello_one, max_failures=1, ephemeral=False, min_duration=5, final=False),
Process(daemon=False, name=hello_two, max_failures=1, ephemeral=False, min_duration=5, final=False)
)
Remember that an Aurora Job contains Tasks which contain Processes. A Job level binding is inherited by its Tasks and all their Processes. Similarly a Task level binding is available to that Task and its Processes but is not visible at the Job level (inheritance is a one-way street.)
When you define a Struct
schema, one powerful, but confusing, feature
is that all of that structure’s attributes are Mustache variables within
the enclosing scope once they have been populated.
For example, when Process
is defined above, all its attributes such as
{{name
}}, {{cmdline
}}, {{max_failures
}} etc., are all immediately
defined as Mustache variables, implicitly bound into the Process
, and
inherit all child objects once they are defined.
Thus, you can do the following:
>>> Process(name = "installer", cmdline = "echo {{name}} is running")
Process(daemon=False, name=installer, max_failures=1, ephemeral=False, min_duration=5,
cmdline=echo installer is running, final=False)
WARNING: This binding only takes place in one direction. For example,
the following does NOT work and does not set the Process
name
attribute’s value.
>>> Process().bind(name = "installer")
Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5, final=False)
The following is also not possible and results in an infinite loop that
attempts to resolve Process.name
.
>>> Process(name = '{{name}}').bind(name = 'installer')
Do not confuse Structural attributes with bound Mustache variables. Attributes are implicitly converted to Mustache variables but not vice versa.
A second templating method is both as powerful as the aforementioned and often confused with it. This method is due to automatic conversion of Struct attributes to Mustache variables as described above.
Suppose you create a Process object:
>>> p = Process(name = "process_one", cmdline = "echo hello world")
>>> p
Process(daemon=False, name=process_one, max_failures=1, ephemeral=False, min_duration=5,
cmdline=echo hello world, final=False)
This Process
object, “p
”, can be used wherever a Process
object is
needed. It can also be reused by changing the value(s) of its
attribute(s). Here we change its name
attribute from process_one
to
process_two
.
>>> p(name = "process_two")
Process(daemon=False, name=process_two, max_failures=1, ephemeral=False, min_duration=5,
cmdline=echo hello world, final=False)
Template creation is a common use for this technique:
>>> Daemon = Process(daemon = True)
>>> logrotate = Daemon(name = 'logrotate', cmdline = './logrotate conf/logrotate.conf')
>>> mysql = Daemon(name = 'mysql', cmdline = 'bin/mysqld --safe-mode')
As described above, .bind()
binds simple strings or numbers to
Mustache variables. In addition to Structural types formed by combining
atomic types, Pystachio has two container types; List
and Map
which
can also be bound via .bind()
.
The bind()
function can take Python dictionaries or kwargs
interchangeably (when “kwargs
” is in a function definition, kwargs
receives a Python dictionary containing all keyword arguments after the
formal parameter list).
>>> String('{{foo}}').bind(foo = 'bar') == String('{{foo}}').bind({'foo': 'bar'})
True
Bindings done “closer” to the object in question take precedence:
>>> p = Process(name = '{{context}}_process')
>>> t = Task().bind(context = 'global')
>>> t(processes = [p, p.bind(context = 'local')])
Task(processes=ProcessList(
Process(daemon=False, name=global_process, max_failures=1, ephemeral=False, final=False,
min_duration=5),
Process(daemon=False, name=local_process, max_failures=1, ephemeral=False, final=False,
min_duration=5)
))
>>> fibonacci = List(Integer)([1, 1, 2, 3, 5, 8, 13])
>>> String('{{fib[4]}}').bind(fib = fibonacci)
String(5)
>>> first_names = Map(String, String)({'Kent': 'Clark', 'Wayne': 'Bruce', 'Prince': 'Diana'})
>>> String('{{first[Kent]}}').bind(first = first_names)
String(Clark)
>>> String('{{p.cmdline}}').bind(p = Process(cmdline = "echo hello world"))
String(echo hello world)
Use structural templates when binding more than two or three individual values at the Job or Task level. For fewer than two or three, standard key to string binding is sufficient.
Structural binding is a very powerful pattern and is most useful in
Aurora/Thermos for doing Structural configuration. For example, you can
define a job profile. The following profile uses HDFS
, the Hadoop
Distributed File System, to designate a file’s location. HDFS
does
not come with Aurora, so you’ll need to either install it separately
or change the way the dataset is designated.
class Profile(Struct):
version = Required(String)
environment = Required(String)
dataset = Default(String, hdfs://home/aurora/data/{{environment}}')
PRODUCTION = Profile(version = 'live', environment = 'prod')
DEVEL = Profile(version = 'latest',
environment = 'devel',
dataset = 'hdfs://home/aurora/data/test')
TEST = Profile(version = 'latest', environment = 'test')
JOB_TEMPLATE = Job(
name = 'application',
role = 'myteam',
cluster = 'cluster1',
environment = '{{profile.environment}}',
task = SequentialTask(
name = 'task',
resources = Resources(cpu = 2, ram = 4*GB, disk = 8*GB),
processes = [
Process(name = 'main', cmdline = 'java -jar application.jar -hdfsPath
{{profile.dataset}}')
]
)
)
jobs = [
JOB_TEMPLATE(instances = 100).bind(profile = PRODUCTION),
JOB_TEMPLATE.bind(profile = DEVEL),
JOB_TEMPLATE.bind(profile = TEST),
]
In this case, a custom structural “Profile” is created to self-document
the configuration to some degree. This also allows some schema
“type-checking”, and for default self-substitution, e.g. in
Profile.dataset
above.
So rather than a .bind()
with a half-dozen substituted variables, you
can bind a single object that has sensible defaults stored in a single
place.