Usage

Installation

dlb is written in Python and requires at least Python 3.7.

The canonical way to install dlb is from the Python Package Index (PyPI):

$ python3 -m pip install dlb

If you prefer not to install to the Python system location or do not have privileges to do so, you can add a flag to install to a location specific to your own account:

$ python3 -m pip install --user dlb

After the successful installation, the dlb modules are ready for import by a Python 3 interpreter:

>>> import dlb
>>> dlb.__version__
'1.2.3'

Check also the installed command-line utility [1]:

$ dlb --help

This shows you the location of all installed files:

$ python3 -m pip show -f dlb

It is also possible to “install” dlb into a project as a ZIP archive. See here for details.

Update and uninstall

Update a dlb installation with:

$ python3 -m pip install --upgrade [ --user ] dlb

Uninstall it with:

$ python3 -m pip uninstall [ --user ] dlb

A simple project

We assume that you want to build some software from a Git repository with dlb and a POSIX compliant shell on a GNU/Linux system.

Let’s call the project hello.

Create the Git working directory

First, create the repository:

$ mkdir hello
$ cd hello
$ git init

dlb requires a .dlbroot directory as a marker for the root of its working tree, similar to .git of Git:

$ mkdir .dlbroot

Now, the directory is ready for use by dlb as a working tree. dlb does not require or assume anything about existing files or directories outside .dlbroot (see here for details on the directory layout). We will use a dlb script called build.py to build our project, so let’s start with a polite one:

$ echo 'print("hello there!")' > build.py

Now, we can use dlb to run build.py:

$ dlb build
hello there!

We could also have used python3 "${PWD}"/build.py instead of dlb build. dlb comes in handy when you are working in a subdirectory of the working tree or when you need modules from ZIP archives (e.g. dlb itself):

$ mkdir src
$ cd src
$ dlb
using arguments of last successful run: 'build.py'
hello there!
$ cd ..

See dlb --help for a detailed description.

Run a custom tool in an execution context

Replace the content of build.py with this:

import dlb.ex

class Replacer(dlb.ex.Tool):
    PATTERN = 'xxx'
    REPLACEMENT = 'hello'

    template_file = dlb.ex.input.RegularFile()
    output_file = dlb.ex.output.RegularFile()

    async def redo(self, result, context):
        with open(self.template_file.native, 'r') as i:
            c = i.read()  # read input
        with context.temporary() as t:
            with open(t.native, 'w') as o:
                o.write(c.replace(self.PATTERN, self.REPLACEMENT))  # write transformed 'c' to temporary
            context.replace_output(result.output_file, t)  # atomically replace output_file by temporary

t = Replacer(template_file='src/main.c.tmpl', output_file='build/out/main.c')  # create a tool instance
with dlb.ex.Context():  # an execution context
    t.start()  # start running the tool instance in the active execution context

dlb.di.inform('finished successfully')

This defines a tool called Replacer with an input dependency role template_file and an output dependency role output_file. The class attributes PATTERN and REPLACEMENT are execution parameters of the tool. The method redo() is eventually called by t.start() if a redo is necessary.

Create a file src/main.c.tmpl with this content:

// xxx
#include <stdio.h>

int main() {
    printf("xxx\n");
    return 0;
}

When you run dlb now, you get something like this:

$ dlb build
D check redo necessity for tool instance 1... [+0.000000s]
  D explicit output dependencies... [+0.000161s]
    I redo necessary because of filesystem object: 'build/out/main.c'
      | reason: [Errno 2] No such file or directory: '/.../hello/build/out/main.c'
    D done. [+0.000264s]
  D done. [+0.000331s]
I start redo for tool instance 1 [+0.014796s]
I replaced regular file with different one: 'build/out/main.c'
I finished successfully

It informs you that a redo was necessary for the tool instance because the output dependency build/out/main.c did not exist. It was created by the redo and now contains:

// hello
#include <stdio.h>

int main() {
    printf("hello\n");
    return 0;
}

Now run dlb again:

$ dlb build
I finished successfully

Nothing happens because the output existed and the input (including the tool definition in build.py) did not change. After a modification of the input dependency, dlb again causes a redo:

$ touch src/main.c.tmpl
$ dlb build
D check redo necessity for tool instance 1... [+0.000000s]
  D compare input dependencies with state before last successful redo... [+0.000287s]
    I redo necessary because of filesystem object: 'src/main.c.tmpl'
      | reason: mtime has changed
    D done. [+0.000375s]
  D done. [+0.000385s]
I start redo for tool instance 1 [+0.014572s]
I replaced regular file with different one: 'build/out/main.c'
I finished successfully

Control the diagnostic message verbosity

dlb is configured by configuration parameters in dlb.cf.

Do you want to know exactly how dlb calls the external tools, and would you like some summary output after each run? Add the following lines to build.py (before the line with dlb.ex.Context():):

import dlb.di
import dlb.cf

dlb.cf.level.helper_execution = dlb.di.INFO
dlb.cf.latest_run_summary_max_count = 5

This instructs dlb to use the log level dlb.di.INFO for all future diagnostic messages of the category dlb.cf.level.helper_execution and to output a summary after each run that compares the run with the previous ones.

It is good practice to output some summary of a successful build even if no redo was necessary. This can be relevant information about the most important build product (e.g. the code size of an application) or just the line dlb.di.inform('finished successfully') at the end of the dlb script.
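For example, a minimal sketch, assuming the build produced a file build/out/application earlier in the same script (the path is hypothetical, used only for illustration):

import os
import dlb.di

# report the size of the most important build product
size = os.path.getsize('build/out/application')  # hypothetical build product
dlb.di.inform(f'finished successfully; application size: {size} bytes')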

In case you find the standard Python traceback (output on uncaught exceptions) too verbose or cluttered, you can replace it with the one provided by dlb_contrib.exctrace.
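A minimal sketch (the function name enable_compact_with_cwd() is an assumption here; consult the documentation of dlb_contrib.exctrace for the actual API):

import dlb_contrib.exctrace

# assumed API: install a compact traceback handler for uncaught exceptions
dlb_contrib.exctrace.enable_compact_with_cwd()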

Commit the changes

Git does not track empty directories. If we want Git to create .dlbroot as part of the repository, a file must be added. We can use an empty file .dlbroot/z to that end:

$ touch .dlbroot/z
$ echo /.dlbroot/ > .gitignore
$ git add .gitignore
$ git add -f .dlbroot/z
$ git add ...
$ git commit

Understand redo necessity

Everything related to dependency checking and redos is centered around tool instances; only tool instances can have dependencies.

This line creates a tool instance t that assigns the concrete input dependency dlb.fs.Path('src/main.c.tmpl') to the input dependency role template_file and the concrete output dependency dlb.fs.Path('build/out/main.c') to the output dependency role output_file:

t = Replacer(template_file='src/main.c.tmpl', output_file='build/out/main.c')

t.start() performs a redo when

  1. one is explicitly requested by t.start(force_redo=True) or
  2. a redo is considered necessary (see here for general conditions and the documentation of the dependency classes for the specific ones).
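For example, to request a redo of t regardless of the state of its dependencies:

with dlb.ex.Context():
    t.start(force_redo=True)  # performs a redo even if none is considered necessary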

Note

In contrast to what someone used to the appearance of SCons scripts might expect, the constructor of a tool instance does not run it. Make sure you call start() on a tool instance when you want it to perform its actual task.
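For example:

t = Replacer(template_file='src/main.c.tmpl', output_file='build/out/main.c')
# nothing has been checked or executed yet; this only creates the tool instance

with dlb.ex.Context():
    t.start()  # only now is a redo considered and, if necessary, performed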

After the successful completion of a redo of a tool instance t, the run-database contains the depended-upon state of its explicit and non-explicit input dependencies before the start of the redo, as well as the non-explicit input dependencies themselves.

A redo of t from above is considered necessary if at least one of the following conditions is true:

  • A redo was never performed successfully before for t (same class and fingerprint) according to the run-database.
  • build/out/main.c does not exist as a regular file.
  • The mtime, size, UID, GID, or set of filesystem permissions of src/main.c.tmpl has changed since the start of the last known successful redo for t (because it is an input dependency of t).
  • The value of PATTERN or REPLACEMENT has changed since the last known successful redo for t.
  • The mtime, size, UID, GID, or set of filesystem permissions of build.py has changed since the last known successful redo of t (because build.py is a definition file for t in the managed tree).

Note

You may have noticed that an mtime modification of build/out/main.c does not lead to a redo. A modification of an output dependency is always treated as purposeful. This allows for modification of output dependencies after they were generated (e.g. for source code formatting or for small fixes in a huge set of generated HTML documents). [2]

Tool instances are identified by their class (file path and line number of definition) and their fingerprint. The fingerprint includes the concrete dependencies of the tool instance which are defined by arguments of the constructor matching class attributes, and its execution parameters. Consider the following tool instances:

t = Replacer(template_file='src/main.c.tmpl', output_file='build/out/main.c')  # from above
t2 = Replacer(template_file=dlb.fs.Path('src/main.c.tmpl'), output_file='build/out/main.c')
t3 = Replacer(template_file='src/MAIN.C.TMPL', output_file='build/out/main.c')

t2 and t have the same class and fingerprint and are therefore indistinguishable with respect to dependencies; the statements t.start() and t2.start() have the same effect under all circumstances. t3, on the other hand, has a different fingerprint; t3.start() does not affect the last known successful redo for t.
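Execution parameters are part of the fingerprint too, and a subclass is a different class. A sketch (PoliteReplacer is a hypothetical name used for illustration):

class PoliteReplacer(Replacer):
    REPLACEMENT = 'hello there'  # different execution parameter

t4 = PoliteReplacer(template_file='src/main.c.tmpl', output_file='build/out/main.c')
# t4 differs from t in class and fingerprint;
# t4.start() does not affect the last known successful redo for t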

Note

dlb never stores the state of filesystem objects outside the working tree in its run-database. The modification of such a filesystem object does not lead to a redo. [15]

Understand redo concurrency

When t.start() “performs a redo” it schedules the eventual (asynchronous) execution of redo() and then returns immediately. The completion of the pending redo is left to asyncio.

So, redos are parallel by default. The maximum number of pending redos at a time is given by max_parallel_redo_count of the active context.
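For example, a sketch, assuming max_parallel_redo_count can be given as a constructor argument of the (execution) context as described above:

with dlb.ex.Context(max_parallel_redo_count=4):
    ...  # at most 4 redos are pending at the same time within this context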

In contrast to GNU Make or Ninja, for example, filesystem paths used in multiple tool instances do not form an implicit mutual exclusion mechanism. Synchronization and ordering of events are explicit in dlb. Redos can be synchronized

  1. globally for all pending redos by the means of (execution) contexts or
  2. selectively for a specific redo by accessing the result (proxy) object returned by dlb.ex.Tool.start().

See dlb.ex.Tool.start() for details.

As a rule, paths should not be repeated literally, like build/out/main.c in this snippet (which may execute the redos of Replacer(...) and CCompiler(...) in parallel):

Replacer(template_file='src/main.c.tmpl', output_file='build/out/main.c').start()
CCompiler(source_files=['build/out/main.c'], object_files=['build/out/main.c.o']).start()

Better use a variable whose name expresses the meaning of the filesystem object or cascade tool instances with their result objects. Write this, for example, if you want to express that one tool instance depends on the result of another one:

r = Replacer(template_file='src/main.c.tmpl', output_file='build/out/main.c').start()
CCompiler(source_files=[r.output_file], object_files=['build/out/main.c.o']).start()
# waits for pending redo with result r to complete before CCompiler(...).start()

This mechanism is used in example/c-minimal/.

To wait for the completion of a specific redo without referring to specific dependencies, you can use complete() instead:

r = Replacer(...).start().complete()
assert r.iscomplete
# note: the missing '_' makes clear that 'complete' and 'iscomplete'
# are not names of dependencies

Alternatively, you could wait for all pending redos to complete before CCompiler(...).start() if you prefer to split the build into sequential phases like this:

# code generation phase
Replacer(template_file='src/main.c.tmpl', output_file='build/out/main.c').start()

# compilation phase
with dlb.ex.Context():  # waits for all pending redos to complete
    CCompiler(source_files=['build/out/main.c'], object_files=['build/out/main.c.o']).start()

This mechanism is used in example/c-gtk-doxygen/.

Real stuff

There are more meaningful tasks than replacing text in a text file.

For example, building a C program with GCC looks like this: example/c-minimal/build-all.py.

The package dlb_contrib provides tools and utilities to build upon.

Self-contained projects: dlb as part of the repository

ZIP archives in .dlbroot/u/ are automatically added to the module search path of the Python interpreter by dlb. Placing the dlb package as a version controlled ZIP archive there — say, .dlbroot/u/dlb-1.2.3.zip — allows you to keep a certain version of dlb in your project’s repository independent of a system-wide installed version.
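For example (a sketch; it assumes a suitable archive dlb-1.2.3.zip has been downloaded to the working tree’s root, and -f overrides the .gitignore entry for /.dlbroot/ created above):

$ mkdir -p .dlbroot/u
$ cp dlb-1.2.3.zip .dlbroot/u/
$ git add -f .dlbroot/u/dlb-1.2.3.zip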

If you do not need the command-line utility dlb, dlb does not even have to be installed (globally) to build your project.

Redirection of diagnostic messages

Diagnostic messages are output to sys.stderr by default. To unambiguously separate them from output of executed tools (e.g. compiler warnings) you can always set a destination with dlb.di.set_output_file():

import dlb.di
dlb.di.set_output_file(open('run.log', 'w'))
# any object with a file-like write() method can be used as output file
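Since only a write() method is required, a small adapter can duplicate the messages to several destinations. A sketch (_Tee is a hypothetical helper, not part of dlb):

import sys
import dlb.di

class _Tee:
    def __init__(self, *files):
        self.files = files

    def write(self, text):  # the only method required by dlb.di
        for f in self.files:
            f.write(text)

# show diagnostic messages on sys.stderr and collect them in 'run.log'
dlb.di.set_output_file(_Tee(sys.stderr, open('run.log', 'w')))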

The following snippet “exposes” the destination of diagnostic messages to the parent process and therefore allows for its manipulation by shell redirection:

try:
    dlb.di.set_output_file(open(3, 'w', buffering=1))
except OSError:  # e.g. because file descriptor 3 not opened by parent process
    pass

Possible applications (on a typical GNU/Linux system):

$ dlb 3>run.log             # write to file
$ dlb 3>/dev/pts/0          # show in specific pseudo terminal
$ dlb 3>&1 1>&2 | gedit -   # (incrementally) show in GEdit window

$ mkfifo dlb.fifo
$ tilix -e cat dlb.fifo && dlb 3>dlb.fifo  # show in new terminal emulator window

If you mostly work with a terminal emulator that is (at least partially) compliant with ISO/IEC 6429, colored output might be useful; it can easily be achieved with MessageColorator from dlb_contrib.iso6429.
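A sketch, assuming MessageColorator wraps a file-like object and colors each message according to its log level (check dlb_contrib.iso6429 for the actual constructor signature):

import sys
import dlb.di
import dlb_contrib.iso6429

# assumed usage: wrap the destination of the diagnostic messages
dlb.di.set_output_file(dlb_contrib.iso6429.MessageColorator(sys.stderr))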

PyCharm integration

If you use PyCharm to edit (and/or run and debug) your dlb scripts, you can take advantage of the integrated referral to external HTML documentation: place the caret in the editor on a dlb object (anything except a module) — e.g. between the P and the a of dlb.fs.Path — and press Shift+F1 or Ctrl+Q to show the HTML documentation in your web browser.

Configuration (as of PyCharm 2020.1): Add the following documentation URLs in the settings page Tools ‣ External Documentation:

Module Name  URL/Path Pattern
dlb          https://dlb.readthedocs.io/en/<which>/reference.html#element.qname
dlb_contrib  https://dlb.readthedocs.io/en/<which>/reference.html#element.qname

Replace <which> with a specific version like v0.3.0, or with stable for the latest version.

Recommendations for efficiency and reliability

These recommendations describe the typical use case. Use them as a starting point for the most efficient and reliable operation. [3]

Setup a working tree

  • Place the entire working tree on the same file system with a decently fine effective mtime resolution (no coarser than 100 ms). XFS or Ext4 are fine. Avoid FAT32. [4]

    Make sure the filesystem is mounted with “normal” (immediate) update of mtime (e.g. without lazytime for Ext4). [5]

  • Place all input files (that are only read by tool instances) in a filesystem tree in the working tree that is not modified by tool instances.

    This is not required but good practice. It also enables you to use operating system specific features to protect the build against accidental changes of input files. For example, protect the input files from change with a transparent read-only filesystem mounted on top of them during the build.

  • Do not use symbolic links in the managed tree to filesystem objects not in the same managed tree.

Run dlb

  • Do not modify the management tree unless told to do so by dlb. [6]

  • Do not modify the mtime of filesystem objects in the working tree manually while dlb is running. [13]

  • Do not modify the content of filesystem objects in the managed tree on purpose while dlb is running, if they are used as input dependencies or output dependencies of a tool instance.

    Yes, I know: it happens a lot by mistake when editing source files.

    dlb itself is designed to be relatively robust to such modifications. As long as the size of the modified regular file changes or the working tree time is monotonic, there is no redo miss in the current or in any future run of dlb. [7] [8]

    However, many external tools cannot guarantee proper behaviour if some of their input files are changed while they are being executed (e.g. a compiler working on multiple input files).

  • Avoid mv to replace regular files; it does not update its target’s mtime.

    Use cp instead.

  • Be careful when you modify a file that is an input dependency of a tool instance via mmap. [14]

  • Do not set back the system time that serves as the working tree’s system time on purpose while dlb is running or while you are modifying the managed tree. [11]

Write scripts and tools

  • It is safe to modify the managed tree immediately after a run of dlb has completed (e.g. in the same script) without risking a redo miss. [9]

  • Do not use (explicit) multithreading. Use asyncio instead.

  • Do not use multiple hierarchical scripts (where one calls another). This would be error-prone and inefficient. Use scripts only at the top level.

  • Split large scripts into small modules that are imported by the script (like this: example/c-gtk-doxygen/). You can place these modules in the directory they control.

  • Use only one root context and nest all other contexts inside (even in modules imported inside this context). [10]

    Do:

    import dlb.ex
    ...
    with dlb.ex.Context():
        with dlb.ex.Context():
            ...
        with dlb.ex.Context():
            ...
        import build_subsystem_a  # contains 'with dlb.ex.Context(): ... '
    

    Don’t:

    import dlb.ex
    ...
    
    with dlb.ex.Context():
       ...  # context manager exit is artificially delayed as necessary according to the
            # filesystem's effective mtime resolution
    
    with dlb.ex.Context():
       ...  # context manager exit is artificially delayed as necessary according to the
            # filesystem's effective mtime resolution (again)
    
  • Do not modify the managed tree in a script – e.g. by calling shutil.rmtree() directly – unless you are sure no redo is pending that accesses the affected filesystem objects. [7]

Footnotes

[1] When installed with python3 -m pip install --user dlb, the command-line utility is created below the directory reported by python3 -m site --user-base, according to PEP 370. Make sure this directory is part of the search paths for executables.
[2] It is impossible to reliably detect an mtime modification of a (POSIX) filesystem object after its generation without requiring monotonic system time and real-time guarantees. Without such (unrealistic) requirements, the probability of correct detection can be made arbitrarily small by pausing the involved processes at the right moments.
[3] Although they are not formally specified, Make has by design much stricter requirements and much looser guarantees.
[4] A-F1, A-T3
[5] A-F2, A-F3, A-F4
[6] A-A1
[7] A-A2, G-D1, G-D2, G-D3
[8] Make is very vulnerable to this. Even with a monotonically increasing working tree time, the inputs (sources of a rule) must not be changed from the moment its recipe’s execution is started until the next increase of the working tree time after the recipe’s execution is completed. Otherwise, there is a redo miss in every future run until an input is changed again in a way that does not cause a redo miss.
[9] This is not the case with Make.
[10] G-T2
[11] A-T2, G-D1, G-D3
[12] G-T1
[13] A-A3
[14] A-F3
[15] This is a deliberate design decision. It avoids complicated assumptions related to the mtimes of different filesystems, helps to promote a clean structure of project files, and makes it possible to move an entire working tree without changing the meaning of the run-database in an unpredictable manner.