The Python components of Aurora are built using Pants.
The Python code is laid out according to the following conventions:
* One `BUILD` file per 3rd-level directory. For a list of current top-level packages, run:

  ```
  % find src/main/python -maxdepth 3 -mindepth 3 -type d |\
  while read dname; do echo $dname |\
  sed 's@src/main/python/\(.*\)/\(.*\)/\(.*\).*@\1.\2.\3@'; done
  ```

* Each `BUILD` file exports one `python_library`, named the same as the directory it is in so that it can be referenced without a `:` character. That library provides a `setup_py` containing each `python_binary` in the `BUILD` file. The `sources` field in the `python_library` will almost always be `rglobs('*.py')`.
* Other `BUILD` files may only depend on this single public `python_library` target. Any other target is considered a private implementation detail and should be prefixed with an `_`.
* `python_binary` targets are always named the same as the exported console script.
* `python_binary` targets must have identical dependencies to the `python_library` exported by the package and must use `entry_point`.

This means a PEX file generated by Pants will contain exactly the same files that will be available on the `PYTHONPATH` in the case of `pip install` of the corresponding library target. This will help our migration off of Pants in the future.
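The `find`/`sed` pipeline from the first convention can also be expressed in Python when scripting against the layout. The sketch below is hypothetical (`package_name` and `list_packages` are illustrative names, not part of the Aurora tree); it assumes it is run from the repository root and, unlike `find`, returns the names sorted.

```python
import os

# Root of the Python source tree, as used in the find command above.
PYTHON_ROOT = "src/main/python"


def package_name(path, root=PYTHON_ROOT):
    """Convert a directory exactly three levels below root into its dotted package name.

    e.g. src/main/python/apache/thermos/runner -> apache.thermos.runner
    """
    parts = os.path.relpath(path, root).split(os.sep)
    if len(parts) != 3:
        raise ValueError("%s is not exactly three levels below %s" % (path, root))
    return ".".join(parts)


def list_packages(root=PYTHON_ROOT):
    """Return the sorted dotted names of every directory exactly three levels below root."""
    packages = []
    for dirpath, dirnames, _ in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else len(rel.split(os.sep))
        if depth == 3:
            packages.append(package_name(dirpath, root))
            dirnames[:] = []  # prune: nothing deeper than the 3rd level is listed
    return sorted(packages)
```

Each name returned by `list_packages()` should correspond to one `BUILD` file and one public `python_library` under these conventions.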
```
% find src/main/python/apache/thermos/runner
src/main/python/apache/thermos/runner
src/main/python/apache/thermos/runner/__init__.py
src/main/python/apache/thermos/runner/thermos_runner.py
src/main/python/apache/thermos/runner/BUILD
% cat src/main/python/apache/thermos/runner/BUILD
# License boilerplate omitted
import os


# Private target so that a setup_py can exist without a circular dependency. Only targets within
# this file should depend on this.
python_library(
  name = '_runner',
  # The target covers every Python file under this directory and its subdirectories.
  sources = rglobs('*.py'),
  dependencies = [
    '3rdparty/python:twitter.common.app',
    '3rdparty/python:twitter.common.log',
    # Source dependencies are always referenced without a ':'.
    'src/main/python/apache/thermos/common',
    'src/main/python/apache/thermos/config',
    'src/main/python/apache/thermos/core',
  ],
)

# Binary target for thermos_runner.pex. Nothing should depend on this; it is only used as an
# argument to ./pants binary.
python_binary(
  name = 'thermos_runner',
  # Use entry_point, not source, so the files used here are the same ones tests see.
  # The module must live in this directory (thermos_runner.py above) so that ':_runner' covers it.
  entry_point = 'apache.thermos.runner.thermos_runner',
  dependencies = [
    # Notice that we depend only on the single private target from this BUILD file here.
    ':_runner',
  ],
)

# The public library that everyone importing the runner symbols uses.
# The test targets and any other dependent source code should depend on this.
python_library(
  name = 'runner',
  dependencies = [
    # Again, notice that we depend only on the single private target from this BUILD file here.
    ':_runner',
  ],
  # We always provide a setup_py. This will cause any dependee libraries to automatically
  # reference this library in their requirements.txt rather than copy the source files into their
  # sdist.
  provides = setup_py(
    # Conventionally named and versioned.
    name = 'apache.thermos.runner',
    version = open(os.path.join(get_buildroot(), '.auroraversion')).read().strip().upper(),
  ).with_binaries({
    # Every binary in this file should also be repeated here.
    # Always use the dict-form of .with_binaries so that commands with dashes in their names are
    # supported.
    # The console script name is always the same as the PEX with .pex stripped.
    'thermos_runner': ':thermos_runner',
  }),
)
```
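The dependency rules illustrated by this `BUILD` file lend themselves to a mechanical check. The following is a minimal, hypothetical lint sketch, not an existing Aurora or Pants tool. It assumes addresses use the `dir:name` form shown above, treats `':name'` as a reference within the same `BUILD` file, and treats any address under `src/` as a source dependency.

```python
def parse_address(address, build_dir):
    """Split a Pants-style address into (directory, target name).

    ':_runner' -> (build_dir, '_runner')
    '3rdparty/python:twitter.common.log' -> ('3rdparty/python', 'twitter.common.log')
    'src/main/python/apache/thermos/core' -> ('src/main/python/apache/thermos/core', 'core')
    """
    if ":" in address:
        directory, _, name = address.partition(":")
        return (directory or build_dir), name
    # No ':' means the target named after its directory (the public library).
    return address, address.rstrip("/").rsplit("/", 1)[-1]


def convention_violations(build_dir, address):
    """Return human-readable problems with a dependency declared in build_dir/BUILD."""
    directory, name = parse_address(address, build_dir)
    same_file = directory == build_dir
    problems = []
    # Rule: '_'-prefixed targets are private to their own BUILD file.
    if not same_file and name.startswith("_"):
        problems.append("%s: private target referenced from another BUILD file" % address)
    # Rule (assumed to apply to everything under src/): reference source deps without ':'.
    if not same_file and directory.startswith("src/") and ":" in address:
        problems.append("%s: source dependencies should omit ':'" % address)
    return problems
```

Run against the dependencies in the runner `BUILD` file, every entry should produce an empty list.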
The Aurora source repository and distributions contain several binary files to verify the backwards compatibility of thermos with checkpoint data. Since thermos persists state to disk (to be read by the thermos observer), it is important that we have tests that prevent regressions affecting the ability to parse previously-written data.
The files included represent persisted checkpoints that exercise different features of thermos. The existing files should not be modified unless we are accepting backwards incompatibility, such as with a major release.
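One way to enforce that rule in a test is to record a digest of each checkpoint file when it is added and fail if any recorded file later changes or disappears. This is a sketch of the idea only; the fixture directory and the `expected` manifest are assumptions, not files known to exist in the repository.

```python
import hashlib
import os


def fingerprint(path):
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_fixtures(fixture_dir, expected):
    """Return the names of checkpoint fixtures that were modified or deleted.

    expected maps file name -> SHA-256 digest recorded when the fixture was added.
    Files not listed in expected are ignored, so adding new coverage never fails the check.
    """
    changed = []
    for name, digest in sorted(expected.items()):
        path = os.path.join(fixture_dir, name)
        if not os.path.isfile(path) or fingerprint(path) != digest:
            changed.append(name)
    return changed
```

A deliberate, accepted incompatibility (such as at a major release) would then show up as an explicit update to the manifest rather than a silent edit to a binary file.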
It is not practical to write source code to generate these files on the fly, as source would be vulnerable to drift (e.g. due to refactoring) in ways that would undermine the goal of ensuring backwards compatibility.
The most common reason to add a new checkpoint file would be to provide coverage for new thermos features that alter the data format. This is accomplished by writing and running a job configuration that exercises the feature, and copying the checkpoint file from the sandbox directory (by default, `/var/run/thermos/checkpoints/<aurora task id>`).
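The copy step can be scripted. This hypothetical helper (Python 3; `capture_checkpoint` and the destination directory are illustrative, not part of Aurora) assumes the default checkpoint root quoted above. It copies whatever exists at `<root>/<task id>`, whether a single file or a directory, so it does not depend on the exact on-disk layout.

```python
import os
import shutil

# Default location quoted above; override if thermos was configured with a different root.
CHECKPOINT_ROOT = "/var/run/thermos/checkpoints"


def capture_checkpoint(task_id, dest_dir, checkpoint_root=CHECKPOINT_ROOT):
    """Copy the checkpoint for task_id into dest_dir and return the copied path.

    Raises FileNotFoundError before touching dest_dir if no checkpoint exists.
    """
    src = os.path.join(checkpoint_root, task_id)
    if not os.path.exists(src):
        raise FileNotFoundError(src)
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, task_id)
    if os.path.isdir(src):
        # copytree refuses to overwrite an existing destination, protecting old fixtures.
        shutil.copytree(src, dest)
    else:
        shutil.copy2(src, dest)
    return dest
```

Refusing to overwrite an existing copy fits the rule that previously added checkpoint files should not be modified.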