Docker makes it easy to create images from a recipe-like "Dockerfile".
Once an image has been built with the docker build command, it can be
run as a container and distributed through the public docker registry
or a private one.
While using the build command is easy enough to fit into a manual workflow, automating things can be quite handy and open up new possibilities (such as the trusted builds feature available on docker.io).
Of course there are also several open source projects out there already providing some kind of automation on top of the build command itself (stackbrew, docker-build-worker ...).
There are also projects which use a base image ("stack") and use it to create a new image or provide heroku-like "slugs" in order to reduce image file size (buildstep, slugbuilder). An automated build facility is then created around that abstraction (e.g. dokku, flynn).
Last but not least one could also use a CI/build server such as jenkins or buildbot to reduce manual labor (and obviously reap other benefits as well).
However, implementing a builder myself will give me more insight into docker, and I think it will be fun.
What follows is a "small" script which reads a YAML configuration file containing image definitions, builds and tags the images, and keeps track of each image's source files or repository so that a new image is built only when something has changed.
#!/usr/bin/env python2
"""
A simple `docker build` wrapper, reading definitions from a configuration
file and using file and repository information to avoid builds in case
nothing changed.

Note:
    - implementation is quite naive
    - no special error handling/display ... using just plain exceptions
    - not separated enough (side-effects vs pure)
    - more advanced implementation could use distinct services
      which regularly check repositories/directories and distribute
      work to a set of workers/builders using a queuing solution
      (celery e.g.)
    - git repo check is now based on prefix check for "git://" or
      "github.com" of the url (as docker does too inside utils/utils.go
      isGIT)
    - using a branch is currently not supported within docker
      (see api.go) .... so we skip it here too (local clones
      are out of scope here even though they provide benefits
      https://github.com/dotcloud/docker/issues/3556#issuecomment-32624330)
    - more advanced implementation could also provide tracking
      of branches/particular revisions (see pip.vcs functionality)
    - no optimizations have been implemented to reduce overhead in
      building image from git repo (transfer to docker daemon)
    - this was using the docker-py library at first ... but that had
      some issues on its own so now the docker cli is used
    - using the docker cli on the other hand means less moving parts
"""
import logging
import sys
import re
import hashlib

import argh
from argh.decorators import arg
from sqlalchemy import create_engine, Table, Column, String, Integer
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
import yaml
import sh

Base = declarative_base()


class ImageState(Base):
    """Source digest and image id recorded at the last build of an image."""
    __tablename__ = 'image_state'

    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)
    digest = Column(String)
    image_id = Column(String)


def build(config_file, state_dir, clean=False):
    """
    Build docker images based on the definitions in `config_file`, keeping
    track of source changes in order to avoid rebuilds if nothing has changed.

    Args:
        config_file: the path of the yaml configuration file to read
        state_dir: the path of the directory to use for change tracking state
        clean: whether the state db should be cleaned before build

    Returns:
        None
    """
    logger = logging.getLogger(__name__)

    logger.info('Checking command line tools')
    # attribute access on `sh` raises sh.CommandNotFound if a tool is missing
    git = sh.git
    ls = sh.ls
    docker = sh.docker
    test = sh.test

    logger.info('Load configuration file {}'.format(config_file))
    with open(config_file) as f:
        # safe_load only builds plain YAML types, never arbitrary objects
        config = yaml.safe_load(f)
    logger.debug('Configuration: {!r}'.format(config))

    db_url = 'sqlite:///{}/.builder.state.db'.format(state_dir)
    logger.info('Prepare state db connection {}'.format(db_url))
    db_engine = create_engine(db_url)
    logger.info('Create tables in state db if needed')
    Base.metadata.create_all(bind=db_engine)
    logger.info('Setup the db session')
    db = sessionmaker(bind=db_engine)()

    if clean:
        logger.info('Cleaning state db as requested')
        db.query(ImageState).delete()
        db.commit()

    logger.info('Looping through image definitions')
    for image in config['builder']['images']:
        logger.info('Working on {name}'.format(**image))
        logger.debug('Definition data: {!r}'.format(image))

        logger.info('Determine hash digest')
        digest = None
        if image['from'].startswith(('git://', 'github.com')):
            logger.info('GIT repo ... using master head rev.')
            repo = image['from']
            if image['from'].startswith('github.com'):
                repo = 'https://{from}'.format(**image)
            # the first column of `git ls-remote` output is the commit hash
            digest = git('ls-remote', repo, 'refs/heads/master').split()[0]
        else:
            logger.info('No GIT repo ... assuming directory')
            # `test -e` exits non-zero (so sh raises) if the path is missing
            test('-e', image['from'])
            # hash the recursive long listing (names, sizes, mtimes, ...)
            listing = ls('-lARL', image['from'], _ok_code=[0, 1])
            digest = hashlib.sha1(str(listing)).hexdigest()
        logger.debug('digest = {}'.format(digest))

        logger.info('Determine if image needs to be build')
        state = db.query(ImageState).filter_by(name=image['name']).first()
        # renamed from `build`, which shadowed this function's own name
        needs_build = not state or state.digest != digest
        logger.debug('State {!r}'.format(state.__dict__ if state else state))
        logger.info('Build? {!r}'.format(needs_build))

        image_id = None
        if needs_build:
            logger.info('Building ...')
            # `--rm` is the GNU-style spelling of the original `-rm` flag
            running_command = docker('build', '--rm', '-t', image['name'],
                                     image['from'], _iter=True)
            build_output = ""
            for line in running_command:
                build_output += line
                logger.info(line.strip())
            # re.search returns None if the success line is missing, so
            # check the match itself before reading the captured group
            match = re.search('Successfully built ([a-z0-9]+)', build_output)
            if not match:
                raise Exception('Build failed: {}'.format(build_output))
            image_id = match.group(1)
            logger.info('Building done')
            logger.debug('Image id = {}'.format(image_id))

        if image_id:
            logger.info('Updating state with build info')
            if not state:
                state = ImageState(name=image['name'])
            state.digest = digest
            state.image_id = image_id
            logger.debug('State {!r}'.format(state.__dict__))
            db.add(state)
            db.commit()

    logger.info('Clean up stale images in state')
    image_names = [x['name'] for x in config['builder']['images']]
    (db.query(ImageState)
       .filter(~ImageState.name.in_(image_names))
       .delete(synchronize_session='fetch'))
    db.commit()


@arg('config_file', help='The (YAML) configuration file to use')
@arg('-c', '--clean', dest='clean', help='Clean the state db')
@arg('-s', '--state-dir', dest='state_dir', help='Where the state file will go')
@arg('-v', '--verbose', dest='verbose', action='count',
     help='How verbose the output should be; repeat for increased verbosity')
def main(config_file, state_dir='.', verbose=0, clean=False):
    """
    A simple `docker build` wrapper, reading definitions from a configuration
    file and using file and repository information to avoid builds in case
    nothing changed.
    """
    levels = {0: logging.WARNING, 1: logging.INFO, 2: logging.DEBUG}
    # clamp so that e.g. -vvv means DEBUG instead of falling back to WARNING
    verbose = min(verbose or 0, max(levels))
    log_format = '%(asctime)s %(name)s %(levelname)6s %(message)s'
    logging.basicConfig(level=levels[verbose], format=log_format)
    # keep sqlalchemy one level quieter than the builder itself
    logging.getLogger('sqlalchemy.engine').setLevel(
        levels[verbose - 1 if verbose > 0 else 0])
    build(config_file, state_dir, clean)


if __name__ == '__main__':
    argh.dispatch_command(main)
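The directory digest in the script hashes the output of `ls -lARL`, so it also reacts to timestamps and ownership. As an alternative, a digest based only on file names and contents could be computed in pure Python. This is a minimal sketch, assuming Python 3 (the script itself targets Python 2) and a hypothetical helper called `directory_digest` that is not part of the script:

```python
import hashlib
import os


def directory_digest(path):
    """Return a SHA-1 hex digest over relative file names and file contents.

    Hypothetical helper for illustration only; the script above hashes
    the output of `ls -lARL` instead.
    """
    sha = hashlib.sha1()
    for root, dirs, files in os.walk(path):
        dirs.sort()  # sort in place so os.walk descends in a stable order
        for name in sorted(files):
            full = os.path.join(root, name)
            if not os.path.isfile(full):
                continue  # skip broken symlinks, sockets and the like
            rel = os.path.relpath(full, path)
            sha.update(os.fsencode(rel))
            sha.update(b'\0')  # separator between file name and content
            with open(full, 'rb') as f:
                for chunk in iter(lambda: f.read(65536), b''):
                    sha.update(chunk)
    return sha.hexdigest()
```

With such a digest, touching a file without changing its content would not trigger a rebuild, whereas the `ls -l` based digest changes because the listing contains the modification time. The price is that every file has to be read on each run.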
Using the following YAML configuration:
---
builder:
  images:
    - name: test/busybox
      from: github.com/dotcloud/docker-busybox
    - name: test/tmp
      from: test
it yields the following results:
vagrant@precise64:/vagrant$ ./builder.py dingen.yml -v
2014-03-21 22:02:01,988 __main__ INFO Checking command line tools
2014-03-21 22:02:01,990 __main__ INFO Load configuration file dingen.yml
2014-03-21 22:02:01,995 __main__ INFO Prepare state db connection sqlite:///./.builder.state.db
2014-03-21 22:02:02,002 __main__ INFO Create tables in state db if needed
2014-03-21 22:02:02,011 __main__ INFO Setup the db session
2014-03-21 22:02:02,012 __main__ INFO Looping through image definitions
2014-03-21 22:02:02,012 __main__ INFO Working on test/busybox
2014-03-21 22:02:02,013 __main__ INFO Determine hash digest
2014-03-21 22:02:02,013 __main__ INFO GIT repo ... using master head rev.
2014-03-21 22:02:02,932 __main__ INFO Determine if image needs to be build
2014-03-21 22:02:02,938 __main__ INFO Build? True
2014-03-21 22:02:02,938 __main__ INFO Building ...
2014-03-21 22:02:20,757 __main__ INFO Step 0 : from scratch
2014-03-21 22:02:20,757 __main__ INFO ---> 511136ea3c5a
2014-03-21 22:02:20,758 __main__ INFO Step 1 : add busybox.tar.bz2 /
2014-03-21 22:02:21,391 __main__ INFO ---> 29903600c7c5
2014-03-21 22:02:21,391 __main__ INFO Step 2 : maintainer Jerome Petazzoni <jerome@dotcloud.com>
2014-03-21 22:02:21,409 __main__ INFO ---> Running in a1031a9239e1
2014-03-21 22:02:21,420 __main__ INFO ---> 437a86f8a2d9
2014-03-21 22:02:21,421 __main__ INFO Successfully built 437a86f8a2d9
2014-03-21 22:02:21,436 __main__ INFO Removing intermediate container 36f438ec0a8c
2014-03-21 22:02:21,443 __main__ INFO Removing intermediate container a1031a9239e1
2014-03-21 22:02:21,461 __main__ INFO Building done
2014-03-21 22:02:21,462 __main__ INFO Updating state with build info
2014-03-21 22:02:21,470 __main__ INFO Working on test/tmp
2014-03-21 22:02:21,470 __main__ INFO Determine hash digest
2014-03-21 22:02:21,471 __main__ INFO No GIT repo ... assuming directory
2014-03-21 22:02:21,505 __main__ INFO Determine if image needs to be build
2014-03-21 22:02:21,514 __main__ INFO Build? True
2014-03-21 22:02:21,514 __main__ INFO Building ...
2014-03-21 22:02:21,546 __main__ INFO Step 0 : FROM ubuntu:12.04
2014-03-21 22:02:21,546 __main__ INFO ---> 9cd978db300e
2014-03-21 22:02:21,547 __main__ INFO Step 1 : RUN apt-get update && apt-get -q -y install apache2 && apt-get clean && rm -rf /var/lib/apt/lists/*
2014-03-21 22:02:21,550 __main__ INFO ---> Using cache
2014-03-21 22:02:21,551 __main__ INFO ---> 5ee6c8eb7c4c
2014-03-21 22:02:21,551 __main__ INFO Step 2 : ENV APACHE_RUN_USER www-data
2014-03-21 22:02:21,556 __main__ INFO ---> Using cache
2014-03-21 22:02:21,557 __main__ INFO ---> 1ca385555c50
2014-03-21 22:02:21,557 __main__ INFO Step 3 : ENV APACHE_RUN_GROUP www-data
2014-03-21 22:02:21,564 __main__ INFO ---> Using cache
2014-03-21 22:02:21,564 __main__ INFO ---> 981121ac107e
2014-03-21 22:02:21,565 __main__ INFO Step 4 : ENV APACHE_LOG_DIR /var/log/apache2
2014-03-21 22:02:21,569 __main__ INFO ---> Using cache
2014-03-21 22:02:21,569 __main__ INFO ---> 49a9468dbec1
2014-03-21 22:02:21,570 __main__ INFO Step 5 : EXPOSE 80
2014-03-21 22:02:21,587 __main__ INFO ---> Running in ca3ea1f08ac0
2014-03-21 22:02:21,602 __main__ INFO ---> 8e2d89709317
2014-03-21 22:02:21,602 __main__ INFO Step 6 : CMD ["/usr/sbin/apache2", "-D", "FOREGROUND"]
2014-03-21 22:02:21,630 __main__ INFO ---> Running in 88ce4b522434
2014-03-21 22:02:21,648 __main__ INFO ---> 8e8691401ff3
2014-03-21 22:02:21,649 __main__ INFO Successfully built 8e8691401ff3
2014-03-21 22:02:21,660 __main__ INFO Removing intermediate container ca3ea1f08ac0
2014-03-21 22:02:21,670 __main__ INFO Removing intermediate container 88ce4b522434
2014-03-21 22:02:21,688 __main__ INFO Building done
2014-03-21 22:02:21,688 __main__ INFO Updating state with build info
2014-03-21 22:02:21,699 __main__ INFO Clean up stale images in state
vagrant@precise64:/vagrant$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
test/busybox        latest              437a86f8a2d9        5 seconds ago       3.229 MB
test/tmp            latest              8e8691401ff3        5 seconds ago       258.5 MB
<none>              <none>              cc2a40a7e31e        6 minutes ago       258.5 MB
<none>              <none>              f6e2afd65a0c        6 minutes ago       3.229 MB
ubuntu              12.04               9cd978db300e        6 weeks ago         204.4 MB
scratch             latest              511136ea3c5a        9 months ago        0 B
vagrant@precise64:/vagrant$ ./builder.py dingen.yml -v
2014-03-21 22:02:33,577 __main__ INFO Checking command line tools
2014-03-21 22:02:33,577 __main__ INFO Load configuration file dingen.yml
2014-03-21 22:02:33,581 __main__ INFO Prepare state db connection sqlite:///./.builder.state.db
2014-03-21 22:02:33,586 __main__ INFO Create tables in state db if needed
2014-03-21 22:02:33,591 __main__ INFO Setup the db session
2014-03-21 22:02:33,592 __main__ INFO Looping through image definitions
2014-03-21 22:02:33,592 __main__ INFO Working on test/busybox
2014-03-21 22:02:33,592 __main__ INFO Determine hash digest
2014-03-21 22:02:33,592 __main__ INFO GIT repo ... using master head rev.
2014-03-21 22:02:34,408 __main__ INFO Determine if image needs to be build
2014-03-21 22:02:34,415 __main__ INFO Build? False
2014-03-21 22:02:34,415 __main__ INFO Working on test/tmp
2014-03-21 22:02:34,415 __main__ INFO Determine hash digest
2014-03-21 22:02:34,415 __main__ INFO No GIT repo ... assuming directory
2014-03-21 22:02:34,454 __main__ INFO Determine if image needs to be build
2014-03-21 22:02:34,458 __main__ INFO Build? False
2014-03-21 22:02:34,458 __main__ INFO Clean up stale images in state
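In the second run both builds are skipped because the digest stored for each image still matches the freshly computed one. The decision itself can be reduced to a small pure function. This is a minimal sketch, assuming a plain dict as a stand-in for the `image_state` table; the name `needs_build` and the digest values are made up for illustration:

```python
def needs_build(stored_state, name, digest):
    """Return True if `name` has no recorded digest or the digest changed.

    `stored_state` maps image names to the digest recorded at the last
    successful build (a stand-in for the `image_state` table).
    """
    return name not in stored_state or stored_state[name] != digest


state = {}
# first run: nothing recorded yet, so the image is built and recorded
first = needs_build(state, 'test/tmp', 'abc123')
state['test/tmp'] = 'abc123'
# second run: same digest, so the build is skipped
second = needs_build(state, 'test/tmp', 'abc123')
# later the sources change, the digest differs, and a build is due again
third = needs_build(state, 'test/tmp', 'def456')
print(first, second, third)  # prints: True False True
```

Keeping this check free of side effects would also make it easy to test without docker or a database.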
The script does quite a few things, probably too many in retrospect. Error handling could also be improved, and builds could be run in parallel, among other things.
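One possible direction for both points is to run the per-image work in a small thread pool and to collect failures per image instead of aborting on the first exception. This is a minimal sketch, assuming Python 3 (where `concurrent.futures` is part of the standard library); `build_all`, `fake_build` and the `test/broken` entry are hypothetical and only stand in for the real digest check and `docker build` call:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def build_all(images, build_image, max_workers=2):
    """Run `build_image(image)` for every definition, a few at a time.

    Returns a pair `(results, errors)` of dicts keyed by image name, so
    that one failing build does not abort the others.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(build_image, image): image['name']
                   for image in images}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                errors[name] = exc
    return results, errors


def fake_build(image):
    # placeholder for the real work (digest check, `docker build`, ...)
    if image['from'] == 'missing':
        raise ValueError('source not found: {from}'.format(**image))
    return 'built-' + image['name']


images = [
    {'name': 'test/busybox', 'from': 'github.com/dotcloud/docker-busybox'},
    {'name': 'test/tmp', 'from': 'test'},
    {'name': 'test/broken', 'from': 'missing'},
]
results, errors = build_all(images, fake_build)
```

Two caveats apply to this sketch. A SQLAlchemy session is not safe to share between threads, so the state updates would have to stay in the main thread (for example by applying them from `results` afterwards). Images that build on each other would also need to be ordered rather than built concurrently.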