Usage

Installation

dlb is written in Python and requires at least Python 3.7.

The canonical way to install dlb is from the Python Package Index (PyPI):

$ python3 -m pip install dlb

If you prefer not to install to the Python system location, or do not have privileges to do so, you can add a flag to install to a location specific to your own account:

$ python3 -m pip install --user dlb

After a successful installation, the dlb modules are ready for import by a Python 3 interpreter:

>>> import dlb
>>> dlb.__version__
'1.2.3'

Also check the installed command-line utility [1]:

$ dlb --help

This shows you the location of all installed files:

$ python3 -m pip show -f dlb

It is also possible to “install” dlb into a project as a ZIP archive. See here for details.

Update and uninstall

Update a dlb installation with:

$ python3 -m pip install --upgrade [ --user ] dlb

Uninstall it with:

$ python3 -m pip uninstall dlb

A simple project

We assume that you want to build some software from a Git repository with dlb on a GNU/Linux system with a POSIX compliant shell. Let’s call the project hello.

Create the Git working directory

First, create the repository:

$ mkdir hello
$ cd hello
$ git init

dlb requires a .dlbroot directory as a marker for the root of its working tree, similar to .git of Git:

$ mkdir .dlbroot

Now, the directory is ready for use by dlb as a working tree. dlb does not require or assume anything about existing files or directories outside .dlbroot (see here for details on the directory layout). We will use a dlb script called build.py to build our project, so let’s start with a polite one:

$ echo 'print("hello there!")' > build.py

Now, we can use dlb to run build.py:

$ dlb build
hello there!

Instead of dlb build we could also have used python3 "${PWD}"/build.py. dlb comes in handy when you are working in a subdirectory of the working tree or when you need modules from ZIP archives (e.g. dlb itself):

$ mkdir src
$ cd src
$ dlb
using arguments of last successful run: 'build.py'
hello there!
$ cd ..

See dlb --help for a detailed description.
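The lookup behind this convenience can be sketched in plain Python (a hypothetical helper, not dlb’s actual implementation): walk upwards from the current directory until a directory containing .dlbroot is found.

```python
import pathlib

def find_working_tree_root(start='.'):
    # Walk upwards from 'start' until a directory containing the
    # '.dlbroot' marker directory is found; that is the working tree root.
    p = pathlib.Path(start).resolve()
    for candidate in (p, *p.parents):
        if (candidate / '.dlbroot').is_dir():
            return candidate
    raise FileNotFoundError('not inside a dlb working tree')
```

With such a helper, a launcher started in hello/src can run the build.py stored in hello.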

Run a custom tool in an execution context

Replace the content of build.py by this:

import dlb.ex

class Replacer(dlb.ex.Tool):
    PATTERN = 'xxx'
    REPLACEMENT = 'hello'

    template = dlb.ex.Tool.Input.RegularFile()
    output = dlb.ex.Tool.Output.RegularFile()

    async def redo(self, result, context):
        with open(self.template.native, 'r') as i:
            c = i.read()  # read input
        with context.temporary() as t:
            with open(t.native, 'w') as o:
                o.write(c.replace(self.PATTERN, self.REPLACEMENT))  # write transformed 'c' to temporary
            context.replace_output(result.output, t)  # atomically replace output by temporary

with dlb.ex.Context():  # an execution context
    Replacer(template='src/main.c.tmpl', output='build/out/main.c').run()  # create a tool instance and run it

This defines a tool called Replacer with an input dependency template and an output dependency output. The class attributes PATTERN and REPLACEMENT are execution parameters of the tool. The method redo() is called by Replacer(...).run() if a redo is necessary.
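Stripped of the dlb machinery, the redo boils down to the following plain Python (a sketch; replace_in_file is a hypothetical helper, and dlb adds the dependency tracking that decides whether this work is necessary at all):

```python
import os
import pathlib
import tempfile

def replace_in_file(template, output, pattern, replacement):
    # Read the template, substitute the pattern, write the result to a
    # temporary file, and replace the output atomically (like
    # context.replace_output() in the tool above).
    output = pathlib.Path(output)
    output.parent.mkdir(parents=True, exist_ok=True)
    content = pathlib.Path(template).read_text()
    fd, tmp = tempfile.mkstemp(dir=output.parent)
    with os.fdopen(fd, 'w') as f:
        f.write(content.replace(pattern, replacement))
    os.replace(tmp, output)  # atomic replacement on POSIX filesystems

# replace_in_file('src/main.c.tmpl', 'build/out/main.c', 'xxx', 'hello')
```

The atomic replacement guarantees that the output file never exists in a half-written state, even if the build is interrupted.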

Create a file src/main.c.tmpl with this content:

// xxx
#include <stdio.h>

int main() {
    printf("xxx\n");
    return 0;
}

When you run dlb now, you get something like:

$ dlb build
D check redo necessity for tool instance 1... [+0.000000s]
  D explicit output dependencies... [+0.000161s]
    I redo necessary because of filesystem object that is an output dependency: 'build/out/main.c'
      | reason: [Errno 2] No such file or directory: '/.../hello/build/out/main.c'
    D done. [+0.000264s]
  D done. [+0.000331s]
I start redo for tool instance 1 [+0.014796s]

It informs you that a redo was necessary for the tool instance because the output dependency build/out/main.c did not exist. It was created by the redo and now contains:

// hello
#include <stdio.h>

int main() {
    printf("hello\n");
    return 0;
}

Now run dlb again:

$ dlb build

Nothing happens because the output existed and the input (including the tool definition in build.py) did not change. After a modification of the input dependency, dlb again causes a redo:

$ touch src/main.c.tmpl
$ dlb build
D check redo necessity for tool instance 1... [+0.000000s]
  D compare input dependencies with state before last successful redo... [+0.000287s]
    I redo necessary because of filesystem object: 'src/main.c.tmpl'
      | reason: mtime has changed
    D done. [+0.000375s]
  D done. [+0.000385s]
I start redo for tool instance 1 [+0.014572s]

Real stuff

There are more meaningful tasks than replacing text in a text file.

For example, building a C program with GCC looks like this.

The package dlb_contrib provides tools and utilities to build upon.

Commit the changes

Git does not track empty directories. If we want Git to create .dlbroot as part of the repository, a file must be added. We can use the file .dlbroot/o created by the root context of a previous run of dlb to that end:

$ git add .dlbroot/o
$ git commit
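To avoid committing dlb’s other management files, a .gitignore along these lines could be used (a hypothetical sketch; adjust the patterns to your layout, keeping .dlbroot/u/ so that version-controlled ZIP archives survive):

```text
# ignore dlb's management files except the marker file .dlbroot/o
# and version-controlled ZIP archives in .dlbroot/u/
.dlbroot/*
!.dlbroot/o
!.dlbroot/u/
```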

Self-contained project: add dlb to the repository

ZIP archives in .dlbroot/u/ are automatically added to the module search path of the Python interpreter by dlb. Placing the dlb package as a version controlled ZIP archive there — say, .dlbroot/u/dlb-1.2.3.zip — allows you to keep a certain version of dlb independent of a system-wide installed version.
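The underlying mechanism can be demonstrated with the standard library alone (a sketch; examplelib is a made-up stand-in for the dlb package): the Python interpreter can import packages directly from a ZIP archive on its module search path.

```python
import pathlib
import sys
import tempfile
import zipfile

# Build a small ZIP archive containing a package, like .dlbroot/u/dlb-1.2.3.zip.
tmp = pathlib.Path(tempfile.mkdtemp())
archive = tmp / 'examplelib-1.0.0.zip'
with zipfile.ZipFile(archive, 'w') as z:
    z.writestr('examplelib/__init__.py', "__version__ = '1.0.0'\n")

# Adding the archive itself to sys.path makes its packages importable;
# dlb does the equivalent for every ZIP archive in .dlbroot/u/.
sys.path.insert(0, str(archive))
import examplelib
print(examplelib.__version__)  # -> 1.0.0
```

Because the archive is an ordinary file in the repository, every checkout of the project builds with exactly the dlb version committed alongside it.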

Recommendations for efficiency and reliability

These recommendations describe the typical use case. Use them as a starting point for most efficient and reliable operation. [2]

Set up a working tree

  • Place the entire working tree on the same file system with a decently fine effective mtime resolution (no coarser than 100 ms). XFS or Ext4 are fine. Avoid FAT32. [3]

    Make sure the filesystem is mounted with “normal” (immediate) update of mtime (e.g. without lazytime for Ext4). [4]

  • Place all input files (that are only read by tool instances) in a filesystem tree in the working tree that is not modified by tool instances.

    This is not required but good practice. It also enables you to use operating system specific features to protect the build against accidental changes of input files. For example: Protect the input files from change by a transparent read-only filesystem mounted on top of it during the build.

  • Do not use symbolic links in the managed tree to filesystem objects not in the same managed tree.

Run dlb

  • Do not modify the management tree unless told to do so by dlb. [5]

  • Do not modify the mtime of filesystem objects in the working tree manually while dlb is running. [12]

  • Do not modify the content of filesystem objects in the managed tree on purpose while dlb is running, if they are used as input dependencies or output dependencies of a tool instance.

    Yes, I know: it happens a lot by mistake when editing source files.

    dlb itself is designed to be relatively robust to such modifications. As long as the size of a modified regular file changes or the working tree time is monotonic, there is no redo miss in the current or in any future run of dlb. [6] [7]

    However, many external tools cannot guarantee proper behaviour if some of their input files are changed while they are being executed (e.g. a compiler working on multiple input files).

  • Avoid mv to replace regular files; it does not update its target’s mtime.

    Use cp instead.

  • Be careful when you modify a file via mmap that is an input dependency of a tool instance. [13]

  • Do not deliberately set back the system time that serves as the working tree’s system time while dlb is running or while you are modifying the managed tree. [10]

Write scripts and tools

  • Do not modify the managed tree in a script inside a root context, e.g. by calling shutil.rmtree() directly. [6]

    Use tool instances instead.

  • It is safe to modify the managed tree immediately after a run of dlb has completed (e.g. in the same script) without risking a redo miss. [8]

  • Do not use (explicit) multithreading. Use asyncio instead.

  • Do not use multiple hierarchical scripts (where one calls another). This would be error-prone and inefficient. Use scripts only at the top level.

  • Split large scripts into small modules that are imported by the script. You can place these modules in the directory they control.

  • Use only one root context and nest all other contexts inside (even in modules imported inside this context). [9]

    Do:

    import dlb.ex
    ...
    with dlb.ex.Context():
        with dlb.ex.Context():
            ...
        with dlb.ex.Context():
            ...
        import build_subsystem_a  # contains 'with dlb.ex.Context(): ... '
    

    Don’t:

    import dlb.ex
    ...
    
    with dlb.ex.Context():
       ...  # context manager exit is artificially delayed as necessary according to the
            # filesystem's effective mtime resolution
    
    with dlb.ex.Context():
       ...  # context manager exit is artificially delayed as necessary according to the
            # filesystem's effective mtime resolution (again)
    
  • Use contexts to serialize groups of running tool instances, even when running in parallel [11]:

    with dlb.ex.Context(max_parallel_redo_count=4):
        ...
    
    ...  #  all running tool instances are completed here
    
    with dlb.ex.Context():
        ...
    

Footnotes

[1]When installed with python3 -m pip install --user dlb, the command-line utility is created below the directory reported by python3 -m site --user-base, according to PEP 370. Make sure this directory is part of the search paths for executables.
[2]Although they are not formally specified, Make has by design much stricter requirements and much looser guarantees.
[3]A-F1, A-T3
[4]A-F2, A-F3, A-F4
[5]A-A1
[6](1, 2) A-A2, G-D1, G-D2, G-D3
[7]Make is very vulnerable to this. Even with a monotonically increasing working tree time, the inputs (sources of a rule) must not be changed from the moment its recipe’s execution is started until the next increase of the working tree time after the recipe’s execution is completed. Otherwise, there is a redo miss in every future run until the input is changed again in a way that does not cause a redo miss.
[8]This is not the case with Make.
[9]G-T2
[10]A-T2 G-D1, G-D3
[11]G-T1
[12]A-A3
[13]A-F3