off the stack

builder for docker images

2014-03-21 | docker, python

Docker makes it easy to create images from a recipe-like "Dockerfile". Once an image has been built with the docker build command, it can be run as a container and distributed through the public docker registry or a private one.

While the build command is easy enough to fit into a manual workflow, automating it is handy and opens up new possibilities (such as the trusted builds feature available on docker.io).

Of course, several open source projects already provide some kind of automation on top of the build command itself (stackbrew, docker-build-worker ...).

There are also projects which take a base image ("stack") and use it to create a new image or to provide heroku-like "slugs" in order to reduce image file size (buildstep, slugbuilder). An automated build facility is then created around that abstraction (e.g. dokku, flynn).

Last but not least, one could also use a CI/build server such as jenkins or buildbot to reduce manual labor (and obviously reap other benefits as well).

However, implementing a builder myself will give me more insight into docker, and I think it will be fun.

first shot

What follows is a "small" script which reads a YAML configuration file containing image definitions, builds and tags the images, and keeps track of each image's source files or repository so that a new image is built only when something has changed.

#!/usr/bin/env python2
"""
A simple `docker build` wrapper, reading definitions from a configuration
file and using file and repository information to avoid builds in case
nothing changed.

Note:
- implementation is quite naive
- no special error handling/display ... using just plain exceptions
- not separated enough (side-effects vs pure)
- more advanced implementation could use distinct services
  which regularly check repositories/directories and distribute
  work to a set of workers/builders using a queuing solution
  (celery e.g.)
- git repo check is now based on prefix check for "git://" or
  "github.com" of the url (as docker does too inside utils/utils.go
  isGIT)
- using a branch is currently not supported within docker
  (see api.go) .... so we skip it here too (local clones
  are out of scope here even though they provide benefits
  https://github.com/dotcloud/docker/issues/3556#issuecomment-32624330)
- more advanced implementation could also provide tracking
  of branches/particular revisions (see pip.vcs functionality)
- no optimizations have been implemented to reduce overhead in
  building image from git repo (transfer to docker daemon)
- this was using the docker-py library at first ... but that had
  some issues on its own so now the docker cli is used
- using the docker cli on the other hand means less moving parts
"""
import logging
import sys
import re
import hashlib

import argh
from argh.decorators import arg
from sqlalchemy import create_engine, Column, String, Integer
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
import yaml
import sh



Base = declarative_base()
class ImageState(Base):
    __tablename__ = 'image_state'
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)
    digest = Column(String)
    image_id = Column(String)



def build(config_file, state_dir, clean=False):
    """
    Build docker images based on the definitions in `config_file`, keeping
    track of source changes in order to avoid rebuilds if nothing has changed.

    Args:
        config_file: the path of the yaml configuration file to read
        state_dir: the path of the directory to use for change tracking state
        clean: whether the state db should be cleaned before build
    Returns:
        None
    """
    logger = logging.getLogger(__name__)

    logger.info('Checking command line tools')
    git = sh.git
    ls = sh.ls
    docker = sh.docker
    test = sh.test

    logger.info('Load configuration file {}'.format(config_file))
    with open(config_file) as f:
        config = yaml.safe_load(f)
    logger.debug('Configuration: {!r}'.format(config))

    db_url = 'sqlite:///{}/.builder.state.db'.format(state_dir)
    logger.info('Prepare state db connection {}'.format(db_url))
    db_engine = create_engine(db_url)

    logger.info('Create tables in state db if needed')
    Base.metadata.create_all(bind=db_engine)

    logger.info('Setup the db session')
    db = sessionmaker(bind=db_engine)()

    if clean:
        logger.info('Cleaning state db as requested')
        db.query(ImageState).delete()
        db.commit()

    logger.info('Looping through image definitions')
    for image in config['builder']['images']:
        logger.info('Working on {name}'.format(**image))
        logger.debug('Definition data: {!r}'.format(image))

        logger.info('Determine hash digest')
        digest = None
        if image['from'].startswith(('git://', 'github.com')):
            logger.info('GIT repo ... using master head rev.')
            repo = image['from']
            if image['from'].startswith('github.com'):
                repo = 'https://{from}'.format(**image)
            digest = git('ls-remote', repo, 'refs/heads/master').split()[0]
        else:
            logger.info('No GIT repo ... assuming directory')
            test('-e', image['from'])
            digest = hashlib.sha1(str(ls('-lARL', image['from'], _ok_code=[0,1]))).hexdigest()
        logger.debug('digest = {}'.format(digest))

        logger.info('Determine if image needs to be build')
        state = db.query(ImageState).filter_by(name=image['name']).first()
        needs_build = not state or state.digest != digest
        logger.debug('State {!r}'.format(state.__dict__ if state else state))
        logger.info('Build? {!r}'.format(needs_build))

        image_id = None
        if needs_build:
            logger.info('Building ...')
            running_command = docker('build', '--rm', '-t', image['name'], image['from'], _iter=True)
            build_output = ""
            for line in running_command:
                build_output += line
                logger.info(line.strip())
            # re.search returns None when nothing matches, so check before using it
            match = re.search(r'Successfully built ([a-z0-9]+)', build_output)
            if not match:
                raise Exception('Build failed: {}'.format(build_output))
            image_id = match.group(1)
            logger.info('Building done')
            logger.debug('Image id = {}'.format(image_id))

        if image_id:
            logger.info('Updating state with build info')
            if not state:
                state = ImageState(name=image['name'])
            state.digest = digest
            state.image_id = image_id
            logger.debug('State {!r}'.format(state.__dict__))
            db.add(state)
            db.commit()

    logger.info('Clean up stale images in state')
    image_names = [x['name'] for x in config['builder']['images']]
    (db.query(ImageState)
     .filter(~ImageState.name.in_(image_names))
     .delete(synchronize_session='fetch'))
    db.commit()



@arg('config_file', help='The (YAML) configuration file to use')
@arg('-c', '--clean', dest='clean', help='Clean the state db')
@arg('-s', '--state-dir', dest='state_dir', help='Where the state file will go')
@arg('-v', '--verbose', dest='verbose', action='count', help='How verbose the output should be; repeat for increased verbosity')
def main(config_file, state_dir='.', verbose=0, clean=False):
    """
    A simple `docker build` wrapper, reading definitions from a configuration
    file and using file and repository information to avoid builds in case
    nothing changed.
    """
    levels = { 0: logging.WARNING, 1: logging.INFO, 2: logging.DEBUG }
    verbose = min(verbose or 0, max(levels))  # clamp, so that e.g. -vvv still means DEBUG
    format = '%(asctime)s %(name)s %(levelname)6s %(message)s'
    logging.basicConfig(level=levels[verbose], format=format)
    logging.getLogger('sqlalchemy.engine').setLevel(levels[verbose - 1 if verbose > 0 else 0])
    build(config_file, state_dir, clean)



if __name__ == '__main__':
    argh.dispatch_command(main)
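
For directories, the script hashes the output of `ls -lARL`, so the digest reflects file metadata such as names, sizes and modification times rather than file contents. A content-based digest could look roughly like the sketch below. The `directory_digest` helper is hypothetical and not part of the script; it is written for Python 3, does not follow symlinked directories, and reads every file, which may be slow for large build contexts.

```python
import hashlib
import os


def directory_digest(root):
    """Return a SHA-1 hex digest over relative file paths and file contents."""
    sha = hashlib.sha1()
    for dirpath, dirnames, filenames in os.walk(root):
        # Sort in place so that os.walk descends in a stable order.
        dirnames.sort()
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            relative = os.path.relpath(path, root)
            # Hash the relative path so renames change the digest too.
            sha.update(relative.encode('utf-8'))
            sha.update(b'\0')
            with open(path, 'rb') as f:
                # Read in chunks to keep memory usage flat for big files.
                for chunk in iter(lambda: f.read(65536), b''):
                    sha.update(chunk)
            sha.update(b'\0')
    return sha.hexdigest()
```

With this approach, touching a file without changing it would not trigger a rebuild, while an edit that keeps size and timestamp intact would.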

Using the following YAML configuration:

---
builder:
  images:
  - name: test/busybox
    from: github.com/dotcloud/docker-busybox
  - name: test/tmp
    from: test

the script yields the following results:

vagrant@precise64:/vagrant$ ./builder.py dingen.yml -v
2014-03-21 22:02:01,988 __main__   INFO Checking command line tools
2014-03-21 22:02:01,990 __main__   INFO Load configuration file dingen.yml
2014-03-21 22:02:01,995 __main__   INFO Prepare state db connection sqlite:///./.builder.state.db
2014-03-21 22:02:02,002 __main__   INFO Create tables in state db if needed
2014-03-21 22:02:02,011 __main__   INFO Setup the db session
2014-03-21 22:02:02,012 __main__   INFO Looping through image definitions
2014-03-21 22:02:02,012 __main__   INFO Working on test/busybox
2014-03-21 22:02:02,013 __main__   INFO Determine hash digest
2014-03-21 22:02:02,013 __main__   INFO GIT repo ... using master head rev.
2014-03-21 22:02:02,932 __main__   INFO Determine if image needs to be build
2014-03-21 22:02:02,938 __main__   INFO Build? True
2014-03-21 22:02:02,938 __main__   INFO Building ...
2014-03-21 22:02:20,757 __main__   INFO Step 0 : from scratch
2014-03-21 22:02:20,757 __main__   INFO ---> 511136ea3c5a
2014-03-21 22:02:20,758 __main__   INFO Step 1 : add busybox.tar.bz2 /
2014-03-21 22:02:21,391 __main__   INFO ---> 29903600c7c5
2014-03-21 22:02:21,391 __main__   INFO Step 2 : maintainer Jerome Petazzoni <jerome@dotcloud.com>
2014-03-21 22:02:21,409 __main__   INFO ---> Running in a1031a9239e1
2014-03-21 22:02:21,420 __main__   INFO ---> 437a86f8a2d9
2014-03-21 22:02:21,421 __main__   INFO Successfully built 437a86f8a2d9
2014-03-21 22:02:21,436 __main__   INFO Removing intermediate container 36f438ec0a8c
2014-03-21 22:02:21,443 __main__   INFO Removing intermediate container a1031a9239e1
2014-03-21 22:02:21,461 __main__   INFO Building done
2014-03-21 22:02:21,462 __main__   INFO Updating state with build info
2014-03-21 22:02:21,470 __main__   INFO Working on test/tmp
2014-03-21 22:02:21,470 __main__   INFO Determine hash digest
2014-03-21 22:02:21,471 __main__   INFO No GIT repo ... assuming directory
2014-03-21 22:02:21,505 __main__   INFO Determine if image needs to be build
2014-03-21 22:02:21,514 __main__   INFO Build? True
2014-03-21 22:02:21,514 __main__   INFO Building ...
2014-03-21 22:02:21,546 __main__   INFO Step 0 : FROM ubuntu:12.04
2014-03-21 22:02:21,546 __main__   INFO ---> 9cd978db300e
2014-03-21 22:02:21,547 __main__   INFO Step 1 : RUN apt-get update && apt-get -q -y install apache2 && apt-get clean && rm -rf /var/lib/apt/lists/*
2014-03-21 22:02:21,550 __main__   INFO ---> Using cache
2014-03-21 22:02:21,551 __main__   INFO ---> 5ee6c8eb7c4c
2014-03-21 22:02:21,551 __main__   INFO Step 2 : ENV APACHE_RUN_USER www-data
2014-03-21 22:02:21,556 __main__   INFO ---> Using cache
2014-03-21 22:02:21,557 __main__   INFO ---> 1ca385555c50
2014-03-21 22:02:21,557 __main__   INFO Step 3 : ENV APACHE_RUN_GROUP www-data
2014-03-21 22:02:21,564 __main__   INFO ---> Using cache
2014-03-21 22:02:21,564 __main__   INFO ---> 981121ac107e
2014-03-21 22:02:21,565 __main__   INFO Step 4 : ENV APACHE_LOG_DIR /var/log/apache2
2014-03-21 22:02:21,569 __main__   INFO ---> Using cache
2014-03-21 22:02:21,569 __main__   INFO ---> 49a9468dbec1
2014-03-21 22:02:21,570 __main__   INFO Step 5 : EXPOSE 80
2014-03-21 22:02:21,587 __main__   INFO ---> Running in ca3ea1f08ac0
2014-03-21 22:02:21,602 __main__   INFO ---> 8e2d89709317
2014-03-21 22:02:21,602 __main__   INFO Step 6 : CMD ["/usr/sbin/apache2", "-D", "FOREGROUND"]
2014-03-21 22:02:21,630 __main__   INFO ---> Running in 88ce4b522434
2014-03-21 22:02:21,648 __main__   INFO ---> 8e8691401ff3
2014-03-21 22:02:21,649 __main__   INFO Successfully built 8e8691401ff3
2014-03-21 22:02:21,660 __main__   INFO Removing intermediate container ca3ea1f08ac0
2014-03-21 22:02:21,670 __main__   INFO Removing intermediate container 88ce4b522434
2014-03-21 22:02:21,688 __main__   INFO Building done
2014-03-21 22:02:21,688 __main__   INFO Updating state with build info
2014-03-21 22:02:21,699 __main__   INFO Clean up stale images in state

vagrant@precise64:/vagrant$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
test/busybox        latest              437a86f8a2d9        5 seconds ago       3.229 MB
test/tmp            latest              8e8691401ff3        5 seconds ago       258.5 MB
<none>              <none>              cc2a40a7e31e        6 minutes ago       258.5 MB
<none>              <none>              f6e2afd65a0c        6 minutes ago       3.229 MB
ubuntu              12.04               9cd978db300e        6 weeks ago         204.4 MB
scratch             latest              511136ea3c5a        9 months ago        0 B

vagrant@precise64:/vagrant$ ./builder.py dingen.yml -v
2014-03-21 22:02:33,577 __main__   INFO Checking command line tools
2014-03-21 22:02:33,577 __main__   INFO Load configuration file dingen.yml
2014-03-21 22:02:33,581 __main__   INFO Prepare state db connection sqlite:///./.builder.state.db
2014-03-21 22:02:33,586 __main__   INFO Create tables in state db if needed
2014-03-21 22:02:33,591 __main__   INFO Setup the db session
2014-03-21 22:02:33,592 __main__   INFO Looping through image definitions
2014-03-21 22:02:33,592 __main__   INFO Working on test/busybox
2014-03-21 22:02:33,592 __main__   INFO Determine hash digest
2014-03-21 22:02:33,592 __main__   INFO GIT repo ... using master head rev.
2014-03-21 22:02:34,408 __main__   INFO Determine if image needs to be build
2014-03-21 22:02:34,415 __main__   INFO Build? False
2014-03-21 22:02:34,415 __main__   INFO Working on test/tmp
2014-03-21 22:02:34,415 __main__   INFO Determine hash digest
2014-03-21 22:02:34,415 __main__   INFO No GIT repo ... assuming directory
2014-03-21 22:02:34,454 __main__   INFO Determine if image needs to be build
2014-03-21 22:02:34,458 __main__   INFO Build? False
2014-03-21 22:02:34,458 __main__   INFO Clean up stale images in state

The script does quite a few things, probably too many in retrospect. Error handling could be improved, and builds could be run in parallel, among other things.
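
To illustrate the points about error handling and about separating pure logic from side effects, here is a minimal sketch of two small functions that could be pulled out of `build`. The names `needs_build`, `parse_image_id` and `BuildError` are made up for this example and do not appear in the script above.

```python
import re


class BuildError(Exception):
    """Raised when no image id can be found in the `docker build` output."""


def needs_build(stored_digest, current_digest):
    """Rebuild when there is no stored state yet or the source has changed."""
    return stored_digest is None or stored_digest != current_digest


def parse_image_id(build_output):
    """Return the id from the last 'Successfully built <id>' line."""
    ids = re.findall(r'Successfully built ([a-z0-9]+)', build_output)
    if not ids:
        raise BuildError('No image id found in build output')
    return ids[-1]


output = "Step 0 : from scratch\n ---> 511136ea3c5a\nSuccessfully built 437a86f8a2d9\n"
print(parse_image_id(output))  # prints 437a86f8a2d9
```

Neither function touches the database, the file system or docker, so both can be tested without any setup, and a failed build surfaces as a dedicated exception instead of a generic one.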