Usage¶
Installation¶
dlb is written in Python and requires at least Python 3.7.
The canonical way to install dlb is from the Python Package Index (PyPI):
$ python3 -m pip install dlb
If you prefer not to install to the Python system location or do not have privileges to do so, you can add a flag to install to a location specific to your own account:
$ python3 -m pip install --user dlb
After the successful installation, the dlb modules are ready for import by a Python 3 interpreter:
>>> import dlb
>>> dlb.__version__
'1.2.3'
Check also the installed command-line utility [1]:
$ dlb --help
This shows you the location of all installed files:
$ python3 -m pip show -f dlb
It is also possible to “install” dlb into a project as a ZIP archive. See here for details.
Update and uninstall¶
Update a dlb installation with:
$ python3 -m pip install --upgrade [ --user ] dlb
Uninstall it with:
$ python3 -m pip uninstall [ --user ] dlb
A simple project¶
We assume that you want to build some software from a Git repository with dlb and a POSIX compliant shell on a GNU/Linux system.
Let’s call the project hello.
Create the Git working directory¶
First, create the repository:
$ mkdir hello
$ cd hello
$ git init
dlb requires a .dlbroot
directory as a marker for the root of its working tree, similar to .git
of Git:
$ mkdir .dlbroot
Now, the directory is ready for use by dlb as a working tree. dlb does not require or assume anything about existing
files or directories outside .dlbroot
(see here for details on the
directory layout).
We will use a dlb script called build.py
to build our project, so let’s start with a
polite one:
$ echo 'print("hello there!")' > build.py
Now, we can use dlb
to run build.py
:
$ dlb build
hello there!
We could also have used python3 "${PWD}"/build.py
instead of dlb build
. dlb
comes in handy when you are
working in a subdirectory of the working tree or when you need modules from ZIP archives
(e.g. dlb itself):
$ mkdir src
$ cd src
$ dlb
using arguments of last successful run: 'build.py'
hello there!
$ cd ..
See dlb --help
for a detailed description.
Run a custom tool in an execution context¶
Replace the content of build.py
with this:
import dlb.di
import dlb.ex

class Replacer(dlb.ex.Tool):
    PATTERN = 'xxx'
    REPLACEMENT = 'hello'

    template_file = dlb.ex.input.RegularFile()
    output_file = dlb.ex.output.RegularFile()

    async def redo(self, result, context):
        with open(self.template_file.native, 'r') as i:
            c = i.read()  # read input
        with context.temporary() as t:
            with open(t.native, 'w') as o:
                o.write(c.replace(self.PATTERN, self.REPLACEMENT))  # write transformed 'c' to temporary
            context.replace_output(result.output_file, t)  # atomically replace output_file by temporary

t = Replacer(template_file='src/main.c.tmpl', output_file='build/out/main.c')  # create a tool instance

with dlb.ex.Context():  # an execution context
    t.start()  # start running the tool instance in the active execution context

dlb.di.inform('finished successfully')
This defines a tool called Replacer
with an input dependency role template_file
and an output
dependency role output_file
. The class attributes PATTERN
and REPLACEMENT
are execution parameters of the
tool. The method redo()
is called by t.start()
eventually if a redo is necessary.
Create a file src/main.c.tmpl
with this content:
// xxx
#include <stdio.h>

int main() {
    printf("xxx\n");
    return 0;
}
When you run dlb
now, you get something like this:
$ dlb build
D check redo necessity for tool instance 1... [+0.000000s]
D explicit output dependencies... [+0.000161s]
I redo necessary because of filesystem object: 'build/out/main.c'
| reason: [Errno 2] No such file or directory: '/.../hello/build/out/main.c'
D done. [+0.000264s]
D done. [+0.000331s]
I start redo for tool instance 1 [+0.014796s]
I replaced regular file with different one: 'build/out/main.c'
I finished successfully
It informs you that a redo was necessary for the tool instance because the output dependency
build/out/main.c
did not exist.
It was created by the redo and now contains:
// hello
#include <stdio.h>

int main() {
    printf("hello\n");
    return 0;
}
Now run dlb again:
$ dlb build
I finished successfully
Nothing happens because the output existed and the input (including the tool definition in build.py
)
did not change. After a modification of the input dependency, dlb again causes a redo:
$ touch src/main.c.tmpl
$ dlb build
D check redo necessity for tool instance 1... [+0.000000s]
D compare input dependencies with state before last successful redo... [+0.000287s]
I redo necessary because of filesystem object: 'src/main.c.tmpl'
| reason: mtime has changed
D done. [+0.000375s]
D done. [+0.000385s]
I start redo for tool instance 1 [+0.014572s]
I replaced regular file with different one: 'build/out/main.c'
I finished successfully
Control the diagnostic message verbosity¶
dlb is configured by configuration parameters in dlb.cf
.
Would you like to know exactly how dlb calls the external tools and get some summary output after each run?
Add the following lines to build.py
(before the line with dlb.ex.Context():
):
import dlb.di
import dlb.cf
dlb.cf.level.helper_execution = dlb.di.INFO
dlb.cf.latest_run_summary_max_count = 5
This instructs dlb to use the log level dlb.di.INFO
for all future diagnostic messages of the category
dlb.cf.level.helper_execution
and to output a summary after each run that compares the run with the
previous ones.
It is good practice to output some summary of a successful build even if no redo was necessary.
This can be relevant information on the most important build product (e.g. code size of an application)
or just the line dlb.di.inform('finished successfully')
at the end of the dlb script.
In case you find the standard Python traceback (output on uncaught exceptions) too verbose or cluttered,
you can replace it by the one provided by dlb_contrib.exctrace
.
Commit the changes¶
Git does not track empty directories. If we want Git to create .dlbroot
as part of the repository, a file
must be added. We can use an empty file .dlbroot/z
to that end:
$ touch .dlbroot/z
$ echo /.dlbroot/ > .gitignore
$ git add .gitignore
$ git add -f .dlbroot/z
$ git add ...
$ git commit
Understand redo necessity¶
Everything related to dependency checking and redos is centered around tool instances; only tool instances can have dependencies.
The following line creates a tool instance t that assigns the concrete input dependency dlb.fs.Path('src/main.c.tmpl') to
the input dependency role template_file
and the concrete output dependency dlb.fs.Path('build/out/main.c') to
the output dependency role output_file
:
t = Replacer(template_file='src/main.c.tmpl', output_file='build/out/main.c')
t.start()
performs a redo when

- one is explicitly requested by t.start(force_redo=True) or
- a redo is considered necessary (see here for general conditions and the documentation of the dependency classes for the specific ones).
Note
In contrast to what someone used to the appearance of SCons scripts might expect, the constructor of a tool instance
does not run it. Make sure you call start()
on a tool instance when you want it to perform its actual task.
After the successful completion of a redo of a tool instance t, the run-database contains the depended-upon state of its (explicit and non-explicit) input dependencies as it was before the start of the redo, as well as the set of its non-explicit input dependencies.
A redo of t from above is considered necessary if at least one of the following conditions is true:

- A redo was never performed successfully before for t (same class and fingerprint) according to the run-database.
- build/out/main.c does not exist as a regular file.
- The mtime, size, UID, GID, or set of filesystem permissions of src/main.c.tmpl has changed since the start of the last known successful redo for t (because it is an input dependency of t).
- The value of PATTERN or REPLACEMENT has changed since the last known successful redo for t.
- The mtime, size, UID, GID, or set of filesystem permissions of build.py has changed since the last known successful redo of t (because build.py is a definition file for t in the managed tree).
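The filesystem-state comparison behind these conditions can be sketched in plain Python. This is an illustration of the idea only, not dlb's actual implementation; the names FsState, fs_state, and redo_necessary are made up for this sketch:

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FsState:
    # the attributes compared for an input dependency (per the list above)
    mtime_ns: int
    size: int
    uid: int
    gid: int
    mode: int

def fs_state(path: str) -> FsState:
    s = os.stat(path)
    return FsState(s.st_mtime_ns, s.st_size, s.st_uid, s.st_gid, s.st_mode)

def redo_necessary(input_path: str, remembered: Optional[FsState]) -> bool:
    # no successful redo recorded, or any attribute changed -> redo
    return remembered is None or fs_state(input_path) != remembered
```

Note that the comparison is against the state recorded in the run-database, not against the mtime of the output, which is what distinguishes this scheme from Make's "output older than input" rule.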
Note
You may have noticed that an mtime modification of build/out/main.c
does not lead to a redo.
A modification of an output dependency is always treated as purposeful.
This allows for modification of output dependencies after they were generated (e.g. for source code formatting
or for small fixes in a huge set of generated HTML documents). [2]
Tool instances are identified by their class (file path and line number of definition) and their fingerprint. The fingerprint includes the concrete dependencies of the tool instance which are defined by arguments of the constructor matching class attributes, and its execution parameters. Consider the following tool instances:
t = Replacer(template_file='src/main.c.tmpl', output_file='build/out/main.c') # from above
t2 = Replacer(template_file=dlb.fs.Path('src/main.c.tmpl'), output_file='build/out/main.c')
t3 = Replacer(template_file='src/MAIN.C.TMPL', output_file='build/out/main.c')
t2 and t have the same class and fingerprint and are therefore indistinguishable with respect to dependencies;
the statements t.start()
and t2.start()
have the same effect under all circumstances.
t3 on the other hand has a different fingerprint; t3.start()
does not affect the last known successful redo
for t.
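Conceptually, path arguments are normalized before they enter the fingerprint, which is why t and t2 are equal. Here is a simplified sketch of that idea using only pathlib (it stands in for dlb.fs.Path and is not dlb's actual fingerprint code):

```python
import pathlib

def fingerprint(template_file, output_file, pattern='xxx', replacement='hello'):
    # dependencies are normalized (str or path object -> one canonical form),
    # so equal paths yield equal fingerprints; execution parameters
    # (pattern, replacement) are part of the fingerprint too
    return (pathlib.PurePosixPath(template_file),
            pathlib.PurePosixPath(output_file),
            pattern, replacement)

f1 = fingerprint('src/main.c.tmpl', 'build/out/main.c')
f2 = fingerprint(pathlib.PurePosixPath('src/main.c.tmpl'), 'build/out/main.c')
f3 = fingerprint('src/MAIN.C.TMPL', 'build/out/main.c')
assert f1 == f2   # like t and t2: indistinguishable
assert f1 != f3   # like t3: a different fingerprint
```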
Note
dlb never stores the state of filesystem objects outside the working tree in its run-database. The modification of such a filesystem object does not lead to a redo. [15]
Understand redo concurrency¶
When t.start()
“performs a redo” it schedules the eventual (asynchronous) execution of
redo()
and then returns immediately. The completion of the pending redo is left to
asyncio
.
So, redos are parallel by default. The maximum number of pending redos at a time is given by
max_parallel_redo_count
of the active context.
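Since dlb builds on asyncio, starting several tool instances resembles scheduling coroutines and awaiting them later. The following is a plain-asyncio sketch of that scheduling idea, not dlb code; redo and build are hypothetical names:

```python
import asyncio

async def redo(name: str) -> str:
    # stands in for a tool instance's redo(); runs concurrently with others
    await asyncio.sleep(0.01)
    return f'{name} done'

async def build():
    # t.start() is analogous to creating a task: it returns immediately
    # while the redo completes asynchronously
    pending = [asyncio.create_task(redo(n)) for n in ('a', 'b', 'c')]
    # leaving an execution context is analogous to awaiting all pending redos
    return await asyncio.gather(*pending)

results = asyncio.run(build())  # the three redos overlap in time
```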
In contrast to GNU Make or Ninja, for example, filesystem paths used in multiple tool instances do not form an implicit mutual exclusion mechanism. Synchronization and ordering of events are explicit in dlb. Redos can be synchronized
- globally for all pending redos by means of (execution) contexts or
- selectively for a specific redo by accessing the result (proxy) object returned by dlb.ex.Tool.start().
See dlb.ex.Tool.start()
for details.
As a rule, paths should not be repeated like
build/out/main.c
in this snippet (which may execute the redos of Replacer(...)
and CCompiler(...)
in parallel):
Replacer(template_file='src/main.c.tmpl', output_file='build/out/main.c').start()
CCompiler(source_files=['build/out/main.c'], object_files=['build/out/main.c.o']).start()
Better use a variable whose name expresses the meaning of the filesystem object or cascade tool instances with their result objects. Write this, for example, if you want to express that one tool instance depends on the result of another one:
r = Replacer(template_file='src/main.c.tmpl', output_file='build/out/main.c').start()
CCompiler(source_files=[r.output_file], object_files=['build/out/main.c.o']).start()
# waits for pending redo with result r to complete before CCompiler(...).start()
This mechanism is used in example/c-minimal/.
To wait for the completion of a specific redo without referring to specific dependencies, you can use
complete()
instead:
r = Replacer(...).start().complete()
assert r.iscomplete
# note: the missing '_' makes clear that 'complete' and 'iscomplete'
# are not names of dependencies
Alternatively, you could wait for all pending redos to complete before CCompiler(...).start()
if you prefer
to split the build into sequential phases like this:
# code generation phase
Replacer(template_file='src/main.c.tmpl', output_file='build/out/main.c').start()
# compilation phase
with dlb.ex.Context(): # waits for all pending redos to complete
CCompiler(source_files=['build/out/main.c'], object_files=['build/out/main.c.o']).start()
This mechanism is used in example/c-gtk-doxygen/.
Real stuff¶
There are more meaningful tasks than replacing text in a text file.
For example, building a C program with GCC looks like this: example/c-minimal/build-all.py.
The package dlb_contrib
provides tools and utilities to build upon.
Self-contained projects: dlb as part of the repository¶
ZIP archives in .dlbroot/u/
are automatically added to the module search path of the Python interpreter
by dlb. Placing the dlb
package as a version controlled ZIP archive there
— say, .dlbroot/u/dlb-1.2.3.zip
— allows you to keep a certain version of dlb in your project’s repository
independent of a system-wide installed version.
If you do not need the command-line utility dlb, dlb does not even have to be installed (globally) to build your project.
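Python's standard zipimport machinery is what makes this work: a ZIP archive on the module search path behaves like a directory of modules. A minimal self-contained sketch (the package name mypkg is hypothetical, standing in for the real dlb archive):

```python
import os
import sys
import tempfile
import zipfile

# build a small ZIP archive containing a package -- similar in spirit to
# placing dlb-1.2.3.zip under .dlbroot/u/
d = tempfile.mkdtemp()
archive = os.path.join(d, 'mypkg-1.0.zip')
with zipfile.ZipFile(archive, 'w') as z:
    z.writestr('mypkg/__init__.py', "VERSION = '1.0'\n")

# dlb does the equivalent of this for every ZIP archive in .dlbroot/u/
sys.path.insert(0, archive)
import mypkg
```

After the insertion, `import mypkg` loads the package directly from the archive without unpacking it.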
Redirection of diagnostic messages¶
Diagnostic messages are output to sys.stderr
by default.
To unambiguously separate them from output of executed tools (e.g. compiler warnings) you can always set a destination
with dlb.di.set_output_file()
:
import dlb.di
dlb.di.set_output_file(open('run.log', 'w'))
# any object with a file-like write() method can be used as output file
The following snippet “exposes” the destination of diagnostic messages to the parent process and therefore allows for its manipulation by shell redirection:
try:
dlb.di.set_output_file(open(3, 'w', buffering=1))
except OSError: # e.g. because file descriptor 3 not opened by parent process
pass
Possible applications (on a typical GNU/Linux system):
$ dlb 3>run.log # write to file
$ dlb 3>/dev/pts/0 # show in specific pseudo terminal
$ dlb 3>&1 1>&2 | gedit - # (incrementally) show in GEdit window
$ mkfifo dlb.fifo
$ tilix -e cat dlb.fifo && dlb 3>dlb.fifo # show in new terminal emulator window
If you mostly work with a terminal emulator that is (at least partially) compliant with ISO/IEC 6429, colored output
might be useful; this can easily be achieved with MessageColorator
from dlb_contrib.iso6429
.
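For illustration, this is the underlying ISO/IEC 6429 (ANSI escape sequence) mechanism that such coloring builds on. It is not the API of dlb_contrib.iso6429; the function name colored is made up:

```python
# minimal ISO/IEC 6429 coloring: wrap text in a "select graphic rendition"
# sequence and reset afterwards
def colored(text: str, sgr_code: int) -> str:
    return f'\x1b[{sgr_code}m{text}\x1b[0m'

warning = colored('W redo necessary', 33)  # 33 selects a yellow foreground
```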
PyCharm integration¶
If you use PyCharm to edit (and/or run and debug) your dlb scripts, you can take advantage
of the integrated referral to external HTML documentation: place the caret in the editor on a dlb object
(anything except a module) — e.g. between the P
and the a
of dlb.fs.Path
—
and press Shift+F1 or Ctrl+Q to show the HTML documentation in your web browser.
Configuration (as of PyCharm 2020.1): Add the following documentation URLs in the settings page:

Module Name | URL/Path Pattern
---|---
dlb | https://dlb.readthedocs.io/en/<which>/reference.html#element.qname
dlb_contrib | https://dlb.readthedocs.io/en/<which>/reference.html#element.qname
Replace <which> by a specific version like v0.3.0
or stable
for the latest version.
Recommendations for efficiency and reliability¶
These recommendations describe the typical use case. Use them as a starting point for most efficient and reliable operation. [3]
Setup a working tree¶
- Place the entire working tree on the same file system with a decently fine effective mtime resolution (no coarser than 100 ms). XFS or Ext4 are fine. Avoid FAT32. [4]
- Make sure the filesystem is mounted with “normal” (immediate) update of mtime (e.g. without lazytime for Ext4). [5]
- Place all input files (that are only read by tool instances) in a filesystem tree in the working tree that is not modified by tool instances. This is not required but good practice. It also enables you to use operating-system-specific features to protect the build against accidental changes of input files. For example: protect the input files from change by a transparent read-only filesystem mounted on top of them during the build.
- Do not use symbolic links in the managed tree to filesystem objects not in the same managed tree.
Run dlb¶
- Do not modify the management tree unless told to do so by dlb. [6]
- Do not manually modify the mtime of filesystem objects in the working tree while dlb is running. [13]
- Do not modify the content of filesystem objects in the managed tree on purpose while dlb is running if they are used as input or output dependencies of a tool instance. Yes, I know: it happens a lot by mistake when editing source files. dlb itself is designed to be relatively robust to such modifications. As long as the size of the modified regular file changes or the working tree time is monotonic, there is no redo miss in the current or in any future run of dlb. [7] [8] However, many external tools cannot guarantee proper behaviour if some of their input files are changed while they are being executed (e.g. a compiler working on multiple input files).
- Avoid mv to replace regular files; it does not update its target’s mtime. Use cp instead.
- Be careful when you modify a file that is an input dependency of a tool instance via mmap. [14]
- Do not set the system time used as the working tree’s system time back on purpose while dlb is running or while you are modifying the managed tree. [11]
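The mv pitfall is easy to observe from Python: a rename (which is what mv does) keeps the moved file’s existing mtime instead of setting a fresh one, so mtime-based redo checks may not notice the replacement. A small demonstration using os.replace, which renames like mv:

```python
import os
import tempfile
import time

d = tempfile.mkdtemp()
src = os.path.join(d, 'source')
dst = os.path.join(d, 'target')
with open(src, 'w') as f:
    f.write('new content')
src_mtime = os.stat(src).st_mtime_ns

time.sleep(0.05)
os.replace(src, dst)  # renames like 'mv': no data is written
# the target's mtime is the source's old mtime, not the time of the move
assert os.stat(dst).st_mtime_ns == src_mtime
```

With cp, by contrast, the target’s content is rewritten, which updates its mtime to the time of the copy.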
Write scripts and tools¶
- It is safe to modify the managed tree immediately after a run of dlb is completed (e.g. in the same script) without risking a redo miss. [9]
- Do not use (explicit) multithreading. Use asyncio instead.
- Do not use multiple hierarchical scripts (where one calls another). This would be error-prone and inefficient. Use scripts only on the top level. Split large scripts into small modules that are imported by the script (like this: example/c-gtk-doxygen/). You can place these modules in the directory they control.
- Use only one root context and nest all other contexts inside (even in modules imported inside this context). [10]
Do:

import dlb.ex

...

with dlb.ex.Context():
    with dlb.ex.Context():
        ...
    with dlb.ex.Context():
        ...
    import build_subsystem_a  # contains 'with dlb.ex.Context(): ...'
Don’t:

import dlb.ex

...

with dlb.ex.Context():
    ...
# context manager exit is artificially delayed as necessary according to the
# filesystem's effective mtime resolution

with dlb.ex.Context():
    ...
# context manager exit is artificially delayed as necessary according to the
# filesystem's effective mtime resolution (again)
Do not modify the managed tree in a script – e.g. by calling
shutil.rmtree()
directly – unless you are sure no redo is pending that accesses the affected filesystem objects. [7]
Footnotes
[1] | When installed with python3 -m pip install --user dlb , the command-line utility is created below
the directory reported by python3 -m site --user-base according to PEP 370.
Make sure this directory is part of the search path for executables. |
[2] | It is impossible to reliably detect an mtime modification of a (POSIX) filesystem object after its generation without the requirement of monotonic system time and real-time guarantees. Without such (unrealistic) requirements, the probability of correct detection can be made arbitrarily small by pausing the involved processes at the right moments. |
[3] | Although they are not formally specified, Make has by design much stricter requirements and much looser guarantees. |
[4] | A-F1, A-T3 |
[5] | A-F2, A-F3, A-F4 |
[6] | A-A1 |
[7] | (1, 2) A-A2, G-D1, G-D2, G-D3 |
[8] | Make is very vulnerable to this. Even with a monotonically increasing working tree time, the inputs (sources of a rule) must not be changed from the moment its recipe’s execution is started until the next increase of the working tree time after the recipe’s execution is completed. Otherwise, there is a redo miss in every future run — until an input is changed again in a way that does not cause a redo miss. |
[9] | This is not the case with Make. |
[10] | G-T2 |
[11] | A-T2 G-D1, G-D3 |
[12] | G-T1 |
[13] | A-A3 |
[14] | A-F3 |
[15] | This is a deliberate design decision. It avoids complicated assumptions related to the mtimes of different filesystems, helps to promote a clean structure of project files and makes it possible to move an entire working tree without changing the meaning of the run-database in an unpredictable manner. |