This project has retired. For details please refer to its Attic page.
Apache Aurora

Aurora Configuration Templating

The .aurora file format is just Python. However, Job, Task, Process, and other classes are defined by a templating library called Pystachio, a powerful tool for configuration specification and reuse.

Aurora Configuration Reference has a full reference of all Aurora/Thermos defined Pystachio objects.

When writing your .aurora file, you may use any Pystachio datatypes, as well as any objects shown in the Aurora+Thermos Configuration Reference without import statements - the Aurora config loader injects them automatically. Other than that the .aurora format works like any other Python script.

Templating 1: Binding in Pystachio

Pystachio uses the visually distinctive {{}} to indicate template variables. These are often called “mustache variables” after the similarly appearing variables in the Mustache templating system and because the curly braces resemble mustaches.

If you are familiar with the Mustache system, templates in Pystachio have significant differences. They have no nesting, joining, or inheritance semantics. On the other hand, when evaluated, templates are evaluated iteratively, so this affords some level of indirection.

Let’s start with the simplest template; text with one variable, in this case name;

Hello {{name}}

If we evaluate this as is, we’d get back:

Hello

If a template variable doesn’t have a value, when evaluated it’s replaced with nothing. If we add a binding to give it a value:

{ "name" : "Tom" }

We’d get back:

Hello Tom

Every Pystachio object has an associated .bind method that can bind values to {{}} variables. Bindings are not immediately evaluated. Instead, they are evaluated only when the interpolated value of the object is necessary, e.g. for performing equality or serializing a message over the wire.

Objects with and without mustache templated variables behave differently:

>>> Float(1.5)
Float(1.5)

>>> Float('{{x}}.5')
Float({{x}}.5)

>>> Float('{{x}}.5').bind(x = 1)
Float(1.5)

>>> Float('{{x}}.5').bind(x = 1) == Float(1.5)
True

>>> contextual_object = String('{{metavar{{number}}}}').bind(
... metavar1 = "first", metavar2 = "second")

>>> contextual_object
String({{metavar{{number}}}})

>>> contextual_object.bind(number = 1)
String(first)

>>> contextual_object.bind(number = 2)
String(second)

You usually bind simple key to value pairs, but you can also bind three other objects: lists, dictionaries, and structurals. These will be described in detail later.

Structurals in Pystachio / Aurora

Most Aurora/Thermos users don’t ever (knowingly) interact with String, Float, or Integer Pystashio objects directly. Instead they interact with derived structural (Struct) objects that are collections of fundamental and structural objects. The structural object components are called attributes. Aurora’s most used structural objects are Job, Task, and Process:

class Process(Struct):
  cmdline = Required(String)
  name = Required(String)
  max_failures = Default(Integer, 1)
  daemon = Default(Boolean, False)
  ephemeral = Default(Boolean, False)
  min_duration = Default(Integer, 5)
  final = Default(Boolean, False)

Construct default objects by following the object’s type with (). If you want an attribute to have a value different from its default, include the attribute name and value inside the parentheses.

>>> Process()
Process(daemon=False, max_failures=1, ephemeral=False,
  min_duration=5, final=False)

Attribute values can be template variables, which then receive specific values when creating the object.

>>> Process(cmdline = 'echo {{message}}')
Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5,
        cmdline=echo {{message}}, final=False)

>>> Process(cmdline = 'echo {{message}}').bind(message = 'hello world')
Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5,
        cmdline=echo hello world, final=False)

A powerful binding property is that all of an object’s children inherit its bindings:

>>> List(Process)([
... Process(name = '{{prefix}}_one'),
... Process(name = '{{prefix}}_two')
... ]).bind(prefix = 'hello')
ProcessList(
  Process(daemon=False, name=hello_one, max_failures=1, ephemeral=False, min_duration=5, final=False),
  Process(daemon=False, name=hello_two, max_failures=1, ephemeral=False, min_duration=5, final=False)
  )

Remember that an Aurora Job contains Tasks which contain Processes. A Job level binding is inherited by its Tasks and all their Processes. Similarly a Task level binding is available to that Task and its Processes but is not visible at the Job level (inheritance is a one-way street.)

Mustaches Within Structurals

When you define a Struct schema, one powerful, but confusing, feature is that all of that structure’s attributes are Mustache variables within the enclosing scope once they have been populated.

For example, when Process is defined above, all its attributes such as {{name}}, {{cmdline}}, {{max_failures}} etc., are all immediately defined as Mustache variables, implicitly bound into the Process, and inherit all child objects once they are defined.

Thus, you can do the following:

>>> Process(name = "installer", cmdline = "echo {{name}} is running")
Process(daemon=False, name=installer, max_failures=1, ephemeral=False, min_duration=5,
        cmdline=echo installer is running, final=False)

WARNING: This binding only takes place in one direction. For example, the following does NOT work and does not set the Process name attribute’s value.

>>> Process().bind(name = "installer")
Process(daemon=False, max_failures=1, ephemeral=False, min_duration=5, final=False)

The following is also not possible and results in an infinite loop that attempts to resolve Process.name.

>>> Process(name = '{{name}}').bind(name = 'installer')

Do not confuse Structural attributes with bound Mustache variables. Attributes are implicitly converted to Mustache variables but not vice versa.

Templating 2: Structurals Are Factories

A Second Way of Templating

A second templating method is both as powerful as the aforementioned and often confused with it. This method is due to automatic conversion of Struct attributes to Mustache variables as described above.

Suppose you create a Process object:

>>> p = Process(name = "process_one", cmdline = "echo hello world")

>>> p
Process(daemon=False, name=process_one, max_failures=1, ephemeral=False, min_duration=5,
        cmdline=echo hello world, final=False)

This Process object, “p”, can be used wherever a Process object is needed. It can also be reused by changing the value(s) of its attribute(s). Here we change its name attribute from process_one to process_two.

>>> p(name = "process_two")
Process(daemon=False, name=process_two, max_failures=1, ephemeral=False, min_duration=5,
        cmdline=echo hello world, final=False)

Template creation is a common use for this technique:

>>> Daemon = Process(daemon = True)
>>> logrotate = Daemon(name = 'logrotate', cmdline = './logrotate conf/logrotate.conf')
>>> mysql = Daemon(name = 'mysql', cmdline = 'bin/mysqld --safe-mode')

Advanced Binding

As described above, .bind() binds simple strings or numbers to Mustache variables. In addition to Structural types formed by combining atomic types, Pystachio has two container types; List and Map which can also be bound via .bind().

Bind Syntax

The bind() function can take Python dictionaries or kwargs interchangeably (when “kwargs” is in a function definition, kwargs receives a Python dictionary containing all keyword arguments after the formal parameter list).

>>> String('{{foo}}').bind(foo = 'bar') == String('{{foo}}').bind({'foo': 'bar'})
True

Bindings done “closer” to the object in question take precedence:

>>> p = Process(name = '{{context}}_process')
>>> t = Task().bind(context = 'global')
>>> t(processes = [p, p.bind(context = 'local')])
Task(processes=ProcessList(
  Process(daemon=False, name=global_process, max_failures=1, ephemeral=False, final=False,
          min_duration=5),
  Process(daemon=False, name=local_process, max_failures=1, ephemeral=False, final=False,
          min_duration=5)
))

Binding Complex Objects

Lists
>>> fibonacci = List(Integer)([1, 1, 2, 3, 5, 8, 13])
>>> String('{{fib[4]}}').bind(fib = fibonacci)
String(5)
Maps
>>> first_names = Map(String, String)({'Kent': 'Clark', 'Wayne': 'Bruce', 'Prince': 'Diana'})
>>> String('{{first[Kent]}}').bind(first = first_names)
String(Clark)
Structurals
>>> String('{{p.cmdline}}').bind(p = Process(cmdline = "echo hello world"))
String(echo hello world)

Structural Binding

Use structural templates when binding more than two or three individual values at the Job or Task level. For fewer than two or three, standard key to string binding is sufficient.

Structural binding is a very powerful pattern and is most useful in Aurora/Thermos for doing Structural configuration. For example, you can define a job profile. The following profile uses HDFS, the Hadoop Distributed File System, to designate a file’s location. HDFS does not come with Aurora, so you’ll need to either install it separately or change the way the dataset is designated.

class Profile(Struct):
  version = Required(String)
  environment = Required(String)
  dataset = Default(String, hdfs://home/aurora/data/{{environment}}')

PRODUCTION = Profile(version = 'live', environment = 'prod')
DEVEL = Profile(version = 'latest',
                environment = 'devel',
                dataset = 'hdfs://home/aurora/data/test')
TEST = Profile(version = 'latest', environment = 'test')

JOB_TEMPLATE = Job(
  name = 'application',
  role = 'myteam',
  cluster = 'cluster1',
  environment = '{{profile.environment}}',
  task = SequentialTask(
    name = 'task',
    resources = Resources(cpu = 2, ram = 4*GB, disk = 8*GB),
    processes = [
  Process(name = 'main', cmdline = 'java -jar application.jar -hdfsPath
             {{profile.dataset}}')
    ]
   )
 )

jobs = [
  JOB_TEMPLATE(instances = 100).bind(profile = PRODUCTION),
  JOB_TEMPLATE.bind(profile = DEVEL),
  JOB_TEMPLATE.bind(profile = TEST),
 ]

In this case, a custom structural “Profile” is created to self-document the configuration to some degree. This also allows some schema “type-checking”, and for default self-substitution, e.g. in Profile.dataset above.

So rather than a .bind() with a half-dozen substituted variables, you can bind a single object that has sensible defaults stored in a single place.