Continuously building our container image chain

Container images can be layered on top of each other, so you do not always need to rebuild every layer from scratch.
This helps you to a) standardize your container images and b) not repeat yourself all over the place.
As an example, we have a couple of ruby applications, and they all require ruby, bundler and some common dependencies, since every ruby app eventually bundles nokogiri and thus requires a bunch of libxml and libxslt libraries. Simplified, this means we have the following chain:

base -> ruby -> app

At immerda we build our own base images from scratch so we can fully control what is in them. E.g. we add our repository, our CA certificate and so on. Then we have a common ruby image, that can be used to develop and test ruby applications. And in the end we might package the application as an image, like our WKD service.

We update our systems on a regular and unattended basis to keep them up to date with the latest security releases and bug fixes. When it comes to container images, we also pull the images of running containers on a regular basis and restart the containers / pods whenever there was an update to the image tag the container refers to. This way we not only keep the applications packaged in containers up to date, but also make sure that we get security fixes for lower-level libraries, e.g. openssl (remember heartbleed?), in a timely manner.
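The restart part boils down to checking whether a pull changed the image a unit is running. A minimal sketch of that idea (the image and unit names are illustrative, and this is not our actual tooling):

# pull the image and restart the systemd unit only if the image ID changed
image="registry.example.org/apps/wkd:latest"
unit="wkd-container.service"
old_id="$(podman image inspect --format '{{.Id}}' "$image" 2>/dev/null || true)"
podman pull -q "$image"
new_id="$(podman image inspect --format '{{.Id}}' "$image")"
if [ "$old_id" != "$new_id" ]; then
    systemctl restart "$unit"
fi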

Since container images are intended to be immutable (and we actually run nearly all of our containers with --read-only=true), you have to rebuild them to keep them updated.

Now when it comes to pulling and using images from random projects or registries (looking at you, docker-hub), we try to vet, at the time we start using them, that they are also continuously built. The official library images, like the official postgres images, are good examples of that.
For our own images, we are doing that through gitlab-ci.

However, since images are layered on each other, you need to rebuild them in the right order, so that your ruby image builds on top of your latest base image and your app image picks up the latest ruby updates.
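Done by hand, that would look roughly like the following sketch (image names and build directories are illustrative; gitlab-deps automates exactly this ordering through CI pipelines):

# rebuild the chain in dependency order
podman build -t registry.example.org/base:latest base/
podman build -t registry.example.org/ruby:latest ruby/   # FROM registry.example.org/base:latest
podman build -t registry.example.org/app:latest app/     # FROM registry.example.org/ruby:latest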

Gitlab-ci has limited capabilities (especially in the free core version) when it comes to modelling pipelines across projects in the right order. However, with building blocks such as webhook events for pipeline runs, we can orchestrate that pretty well.

This orchestration is done using a project from our friends at Autistici/Inventati called gitlab-deps. This tool keeps track of the dependency chain of your images and triggers the pipelines to build these images in the right order.

Central to gitlab-deps is a small webhook receiver that receives events from our gitlab-ci pipelines and, based on the dependency list of your images, triggers the next pipelines. We run the webhook receiver in a container; it scans the immerda group in gitlab for projects, deriving dependencies either from a Containerfile/Dockerfile FROM statement or from a list of projects in a file called .gitlab-deps. The latter is much more accurate and likely the better way to describe the dependencies of your project. As an example, see the gitlab-deps container project itself.

gitlab-deps itself requires a gitlab API token that is able to trigger pipeline runs as well as add webhooks to the projects. For now this seems to require a token from a user with the Maintainer role.

We also trigger a systemd timer that runs a script in the container to update all the dependencies as well as add webhooks to all the projects we know of.

For our base images, we defined a scheduled pipeline run to kick off building the lowest layer of our images, which – if successful – will trigger the next layer, and then the next one, and so on.

If one of the pipelines fails, gitlab-deps won't trigger anything further down the chain. Gitlab will notify you (e.g. via email) about the failed pipeline run, you can investigate, and once you manage to re-trigger a successful pipeline run, the chain continues to be built.

While we are using this for our image build chain for now, gitlab-deps can also be used to rebuild/update projects that depend on a library and so on. As we move more of our work into gitlab and hook it into CI, we might also start using these dependency triggers for other kinds of pipeline chains.

GitLab CI with podman

We have known GitLab CI with docker runners for quite a while now, but what about GitLab CI with podman? Podman is the next-generation container tool under Linux; it can start containers in user space, no root privileges required. With RHEL 8 there is no docker runtime available at the moment, but Red Hat supports podman. But how can we integrate that with GitLab CI? The GitLab CI runner has native support (called executors) for docker, shell, …, but there is no native support for podman. There are two possibilities: using the shell executor or using the custom executor. With the shell executor, you have to ensure that every project starts podman, and only podman. So let's try the custom executor.

GitLab CI runner with custom executor

Let's build a GitLab CI custom executor with podman on RHEL/CentOS 7 or 8 with a really basic container. First, install the gitlab-runner Go binary and create a user with a home directory under which gitlab-runner will run later.
For this example we assume there is a unix user called gitlab-runner with the home directory /home/gitlab-runner. This user is able to run podman. Let's try that:

sudo -u gitlab-runner podman run -it --rm \
    registry.code.immerda.ch/immerda/container-images/base/fedora:30 \
    bash
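If that last command fails, rootless podman is usually missing subordinate UID/GID ranges for the user or a lingering session for long-running services. A hedged sketch of that setup (the ID ranges are illustrative; on older systems you may have to edit /etc/subuid and /etc/subgid directly):

sudo usermod --add-subuids 100000-165535 --add-subgids 100000-165535 gitlab-runner
sudo loginctl enable-linger gitlab-runner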

Next, let’s make a systemd service for the GitLab runner (/etc/systemd/system/gitlab-runner.service):

[Unit]
Description=GitLab Runner
After=syslog.target network.target
ConditionFileIsExecutable=/usr/local/bin/gitlab-runner

[Service]
User=gitlab-runner
Group=gitlab-runner
StartLimitInterval=5
StartLimitBurst=10
ExecStart=/usr/local/bin/gitlab-runner run --working-directory /home/gitlab-runner
Restart=always
RestartSec=120

[Install]
WantedBy=multi-user.target
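With the unit file in place, reload systemd and enable the service; starting it can also wait until after the registration below:

sudo systemctl daemon-reload
sudo systemctl enable gitlab-runner.service
sudo systemctl start gitlab-runner.service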

Now, let’s register a runner to a GitLab instance.

sudo -u gitlab-runner gitlab-runner register \
    --url https://code.immerda.ch/ \
    --registration-token $GITLAB_REGISTRATION_TOKEN \
    --name "Podman fedora runner" \
    --executor custom \
    --builds-dir /home/user \
    --cache-dir /home/user/cache \
    --custom-prepare-exec "/home/gitlab-runner/fedora/prepare.sh" \
    --custom-run-exec "/home/gitlab-runner/fedora/run.sh" \
    --custom-cleanup-exec "/home/gitlab-runner/fedora/cleanup.sh"

  • --builds-dir: The build directory within the container.
  • --cache-dir: The cache directory within the container.
  • --custom-prepare-exec: Prepare the container before each job.
  • --custom-run-exec: Pass the .gitlab-ci.yml script items to the container.
  • --custom-cleanup-exec: Cleanup all left-overs after each job.

There are three scripts referenced at this point. These scripts will be executed for each job (a CI/CD pipeline can contain multiple jobs, e.g. build, test, deploy). The whole magic happens within those scripts. The output of those scripts is always shown in the GitLab job, so for debugging purposes it's possible to add a set -x.

Scripts

Every job will start all the referenced scripts. First, have a look at some variables we need in all scripts. Let's create a file /home/gitlab-runner/fedora/base.sh:

CONTAINER_ID="runner-$CUSTOM_ENV_CI_RUNNER_ID-project-$CUSTOM_ENV_CI_PROJECT_ID-concurrent-$CUSTOM_ENV_CI_CONCURRENT_PROJECT_ID-$CUSTOM_ENV_CI_JOB_ID"
IMAGE="registry.code.immerda.ch/immerda/container-images/base/fedora:30"
CACHE_DIR="$(dirname "${BASH_SOURCE[0]}")/../_cache/runner-$CUSTOM_ENV_CI_RUNNER_ID-project-$CUSTOM_ENV_CI_PROJECT_ID-concurrent-$CUSTOM_ENV_CI_CONCURRENT_PROJECT_ID-pipeline-$CUSTOM_ENV_CI_PIPELINE_ID"
  • CONTAINER_ID: Name of the container.
  • IMAGE: Image to use for the container.
  • CACHE_DIR: The cache directory on the host system.

Prepare script

The prepare executable (/home/gitlab-runner/fedora/prepare.sh) will

  • pull the image from the registry
  • start a container
  • install the dependencies (curl, git, gitlab-runner)
#!/usr/bin/env bash

currentDir="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
source "${currentDir}/base.sh"

set -eo pipefail

# trap any error, and mark it as a system failure.
trap "exit $SYSTEM_FAILURE_EXIT_CODE" ERR

start_container() {
    if podman inspect "$CONTAINER_ID" >/dev/null 2>&1; then
        echo 'Found old container, deleting'
        podman kill "$CONTAINER_ID"
        podman rm "$CONTAINER_ID"
    fi

    # Container image is hardcoded at the moment, since the custom executor
    # does not provide the value of `image`. See
    # https://gitlab.com/gitlab-org/gitlab-runner/issues/4357 for
    # details.
    mkdir -p "$CACHE_DIR"
    podman pull "$IMAGE"
    podman run \
        --detach \
        --interactive \
        --tty \
        --name "$CONTAINER_ID" \
        --volume "$CACHE_DIR":"/home/user/cache" \
        "$IMAGE"
}

install_dependencies() {
    podman exec -u 0 "$CONTAINER_ID" sh -c "dnf install -y git curl"

    # Install the gitlab-runner binary since we need it for cache/artifacts.
    podman exec -u 0 "$CONTAINER_ID" sh -c "curl -L --output /usr/local/bin/gitlab-runner https://gitlab-runner-downloads.s3.amazonaws.com/latest/binaries/gitlab-runner-linux-amd64"
    podman exec -u 0 "$CONTAINER_ID" sh -c "chmod +x /usr/local/bin/gitlab-runner"
}

echo "Running in $CONTAINER_ID"

start_container
install_dependencies

Run script

The run executable (/home/gitlab-runner/fedora/run.sh) runs the commands defined in the .gitlab-ci.yml within the container:

#!/usr/bin/env bash

currentDir="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
source "${currentDir}/base.sh"
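# The custom executor invokes this script with the path to the job script
# generated from the .gitlab-ci.yml as $1 (and the stage name as $2); we
# feed that script to bash inside the container.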

podman exec "$CONTAINER_ID" /bin/bash < "$1"
if [ $? -ne 0 ]; then
    # Exit using the variable, to mark the build as a failure in GitLab
    # CI.
    exit $BUILD_FAILURE_EXIT_CODE
fi

Cleanup script

And finally, the cleanup executable (/home/gitlab-runner/fedora/cleanup.sh) will clean up after every job.

#!/usr/bin/env bash

currentDir="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
source "${currentDir}/base.sh"

echo "Deleting container $CONTAINER_ID"

podman kill "$CONTAINER_ID"
podman rm "$CONTAINER_ID"
exit 0

The script above doesn't clean up the cache. The reason is that we might need the cache during the next job or the next pipeline. So an additional cleanup on the host system is needed to purge the cache after a while.
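One possible host-side cleanup, assuming the _cache layout from base.sh above, run e.g. from a cron job or systemd timer:

# purge pipeline cache directories that have not been touched for a week
find /home/gitlab-runner/_cache -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +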

Last but not least

This is just a quick howto. If you want to implement this, there is a lot of room for improvement; it should just explain how the custom executor can be used and how to use podman for the GitLab CI runner. At the moment there is no support for the image keyword of .gitlab-ci.yml (see #4357).

How we configure services: ibox types

As previously mentioned, we are using the ibox project as a way to refactor, modernize and share our automation setup with other interested folks. As we look back at around 10 years of automating our services using puppet, there is more than one place where it's time for such a refactoring. So this whole project is a slow but steady process to make our plans happen: that we – internally, but also others – are able to replicate parts of our infrastructure in a local environment to easily renew, improve, debug or extend it.

Since our last post we migrated quite a bunch of services to the ibox structure, so it might be a good time to share what you can do with it now. It's also a good opportunity for us to get things into a shape we can share (read: document).

ibox types

ibox types are roughly our puppet roles, and within our infrastructure in general we try to isolate things where it makes sense. This means we might run different ibox types together on a certain node type, while some ibox types are purely isolated within their own VM. The same is possible with the ibox, allowing quite nice setups, as you will see later.

Though something should be mentioned: we never remove types from our nodes, and so this is also not something we want to enable within the ibox. Removing cruft while respecting all kinds of different combinations can become very cumbersome, and given that the ibox's idea is to easily replicate something within a local development environment, we don't support that here either.

You should be able to easily get an overview of the different types by looking at the ibox::types:: space within the ibox module. Also we try to share certain examples within the vagrant.yaml.example file.

As you might see, there is a broad range of our services available, and more will come in the future. To give you an idea of how to start using that concept, here is an example of how to use a few of the available types.

webhosting & webservices

Besides what most people are using from us – email & other communication services – we also offer webhosting for your static or php-based websites. Most of the things driving the setup of such a webhosting are part of the webhosting and apache modules. You can easily use them to, for example, replicate the configuration of your webhosting with us locally on your machine, using the webhosting type and the following hiera data example.

In short what this does is:

  1. Configure a few defaults for a webhosting host, especially with regards to SFTP access.
  2. Set up the databases required for the hostings.
  3. Configure a simple php hosting with the php security checker from SektionEins installed automatically (browse to http://php56.ibox-one.local/phpconfigcheck.php to verify the configuration). We activated only one, as it should show what the others would do as well.
  4. Automatically install a full WordPress installation that is set up and ready to serve your blogposts. You can retrieve wordpress' admin password using trocla get wordpress_adminuser_wp.ibox-one.local plain
  5. Automatically install a full Mediawiki installation that is set up and ready to organize your content.
  6. And last but not least: a SimpleMachines Forum installation ready to go through the web installer. Something we didn't yet automate to the end, but we would take merge requests for that! 🙂

All these installations match what and how we install things on our webhostings. So if you ever have a problem debugging something, this might be a good opportunity to get a real shell for your webhosting 🙂

# a webhosting type with mysql
# to illustrate we also install a bunch of hostings
ibox::types: ['webhosting','dbserver']
# we don't need postgres here
ibox::types::dbserver::is_postgres_server: false
# let's get some space for the webhostings
ib_disks::datavgs::www::size_data: 12G
# let's make sure our hostings can login
# over ssh but only using sftp and are chrooted
sshd::sftp_subsystem: 'internal-sftp'
sshd::allowed_groups: 'root sftponly'
sshd::use_pam: 'yes'
sshd::hardened: '+sha1'
sshd::tail_additional_options: |
  Match Group root
         PasswordAuthentication no

  Match Group sftponly
         PasswordAuthentication yes
         ChrootDirectory %h
         ForceCommand internal-sftp
         AllowTcpForwarding no
# the databases we need
ib_mysql::server::default_databases:
  'wp_test': {}
  'wiki_test': {}
  'smf_test': {}
ib_webhosting::hostings::php:
#  'php.ibox-one.local':
#    git_repo: 'https://github.com/sektioneins/pcc'
#  'php54.ibox-one.local':
#    git_repo: 'https://github.com/sektioneins/pcc'
#    php_installation: "scl54"
#  'php55.ibox-one.local':
#    git_repo: 'https://github.com/sektioneins/pcc'
#    php_installation: "scl55"
  'php56.ibox-one.local':
    git_repo: 'https://github.com/sektioneins/pcc'
    php_installation: "scl56"
# setup a wordpress hosting fully automatic
# mind the database above
ib_webhosting::hostings::wordpress:
  'wp.ibox-one.local':
    blog_options:
      dbname: 'wp_test'
      adminemail: 'admin@ibox.local'
# setup a mediawiki fully automatic
# mind again the database above
ib_webhosting::hostings::mediawiki:
  'mw.ibox-one.local':
    db_name: 'wiki_test'
    contact: 'admin@ibox.local'
    sitename: 'mw'
    db_server: '127.0.0.1'
# install a smf hosting, ready to be clicked
# through the webinstaller
ib_webhosting::hostings::simplemachine:
  'smf.ibox-one.local': {}
# setup all different kinds of php hostings, either using
# the system php installation or scl installations

Webservices are managed services that we fully run and provide to our users. Such a service is, for example, coquelicot, powering our dl.immerda.ch service. And you can actually set up your own by adding the following hieradata:

# webservices
ibox::types: ['webservices',]
#ib_apache::services::webhtpasswd::htpasswd_name: 'ht.ibox-one.local'
# get a coquelicot up and running
ib_apache::services::coquelicot::instances:
  'dl.ibox-one.local': {}

And as an example, another of our services, htpasswd.immerda.ch, is included there as well (commented out above).

tor onion services

Since last week we have also integrated into ibox the way we set up onion services. So you can actually make your ibox available as an onion service as well:

ibox::types: ['onion_service']
ib_tor::onion::services:
  "%{hostname}":
    '22': {}

This will create an onion service called ibox-one (you can get the onion address at /var/lib/tor/ibox-one/hostname), which makes ssh accessible.

You can combine that and make for example a wordpress that you installed available over an onion service:

ibox::types: ['webhosting','dbserver','onion_service']
# we don't need postgres here
ibox::types::dbserver::is_postgres_server: false
# let's get some space for the webhostings
ib_disks::datavgs::www::size_data: 12G
# the databases we need
ib_mysql::server::default_databases:
  'wp_test': {}
# setup a wordpress hosting fully automatic
# mind the database above
ib_webhosting::hostings::wordpress:
  'wp.ibox-one.local':
    domainalias: ['*.onion']
    blog_options:
      dbname: 'wp_test'
      adminemail: 'admin@ibox.local'
ib_tor::onion::services:
  "wp_os":
    '80': {}

For simplicity's sake we serve any .onion address for the wordpress host, so we don't need to read out the generated onion address and rerun puppet, but you get the idea. Please note that some components might need additional tuning to be safe when used through a local onion service, as lots of services assume localhost to be a safe place.

And so you get your wordpress served through an onion service:

$ torsocks curl -I "$(vagrant ssh -c 'cat /var/lib/tor/wp_os/hostname' | cut -c 1-22)/"
HTTP/1.1 200 OK
Date: Fri, 18 Nov 2016 15:12:01 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_fcgid/2.3.9
Link: <http://wp.ibox-one.local/?rest_route=/>; rel="https://api.w.org/"
Content-Type: text/html; charset=UTF-8

Validate certificates with perl-Net-Imap

To authenticate users on our SMTP server (exim), we use the embedded perl interpreter to do a simple login with the passed credentials against our imap server. So in the end dovecot does the whole authentication dance.

We used this solution for years. However, while deploying newer EL7-based relay smtp servers and testing authentication, it became apparent from the logs that there was an issue with validating the certificate of the IMAP server:

 *******************************************************************
 Using the default of SSL_verify_mode of SSL_VERIFY_NONE for client
 is deprecated! Please set SSL_verify_mode to SSL_VERIFY_PEER
 together with SSL_ca_file|SSL_ca_path for verification.
 If you really don't want to verify the certificate and keep the
 connection open to Man-In-The-Middle attacks please set
 SSL_verify_mode explicitly to SSL_VERIFY_NONE in your application.
*******************************************************************
  at /usr/share/perl5/vendor_perl/Net/IMAP/Simple.pm line 135.

So it seems that newer perl-IO-Socket-SSL versions finally do proper validation, whereas the old ones never did. Actually, the state of ssl in perl is/was quite worrisome, but things got fixed for the better in the last few years.

So all we had to do was to adapt our authentication-code (Part of the exim_imap_auth module) so that it now enforces validation of certificates. This required us to also package the latest version of perl-Net-IMAP-Simple for our distributions, as none of them have picked up the new versions so far.

varnish host stats – what is your varnish busy with

Every website or http application hosted in our environment is fronted by a varnish instance (and an nginx-to-varnish pair if we talk https). We use it to route traffic to the right backend, but also to introduce some easy – but very effective – caching to boost websites.

Sometimes, when dealing with lots of requests to a certain domain, you get interested in what your varnish is busy with, so you can identify that traffic further. Because of the proxy/cache cascade, it's usually not very effective to do this on the backend, and hence you need to analyze what varnish is busy with.

This is where varnishHostStat comes in handy: it lets you see which websites, or even which parts of a website, cause which amount of traffic. See the README for a detailed description of what it can do.

To make it easier to distribute varnishHostStat onto our systems, we created a simple SPEC file for EL6 & 7 to package it up, and packages can also be found in our yum repository.

Building the common ground – a basic iBox

In the previous post we introduced the building blocks for the iBox, which can also be used for local development. As an example we brought up the CentOS 7 integration into our environment. So how does this work?

Background

We run a couple of services, and they all run on either one or multiple systems. So if we talk about systems in our setup, we talk about many virtual machines, all fulfilling some kind of role or – as we call it – type. VMs give us, for example, a way of separation and isolation, as we don't want to keep mailboxes alongside user-maintained CMSs that tend to be outdated and hacked/defaced/… from time to time. But it also eases management, as VMs are much easier to handle than baremetal systems – even if it's just to avoid the usually long BIOS roundtrip time.

Among all these different types, you usually have a set of configurations that needs to be applied to all systems. Things like: sshd configuration, firewall, ntp, repositories and so on. In the end this means that a VM consists of two parts: a base system – equal among all systems – and the additional software/configuration – representing the specific type of workload.

And this common ground is often the first part that we need to adapt, and hence implement, for a new major release of a distribution, as the new version might use newer or other software, tends to solve problems differently, and sometimes even lets us drop a few workarounds that we put in place for earlier versions.

Getting a box with the basic OS configuration

If we look at how our configuration management is set up, this means that if we start a box without a type, but still run our configuration management on top of it, we will simply get our common ground. And getting such a box is as simple as described in the README of the ibox_base repository.

  1. Install vagrant
  2. Import a stemcell from our stemcell repository
  3. Check out the repository and its submodules.
  4. Run vagrant up (see the command sketch below)
  5. Profit
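Steps 3 and 4 boil down to something like the following; the clone URL is a placeholder, use the one from the ibox_base README:

git clone --recursive https://git.example.org/ibox/ibox_base.git
cd ibox_base
vagrant up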

After running vagrant up, vagrant will start a VM and, once it has access to it, start applying our base configuration that every system should have. We haven't yet ported every detail of our base configuration to the ibox modules, but the most important things are there; the rest will be added as we move on.

Hiera

Nearly all aspects of our setup can be tuned using hiera, and the same applies to the local development environment you get with iBox.

There is a basic configuration that defines a sane hierarchy for a development environment (our production environment has a few more levels, but the idea should be clear), which is used while running puppet. The existing default.yaml (in manifests/hieradata/default.yaml) should contain everything which is our default configuration of the puppet modules we use.

Furthermore, we ship a sample vagrant.yaml file (located in manifests/hieradata/vagrant.yaml.sample), which should show you the common configuration options that we have so far ported to iBox. Simply make a copy of it as vagrant.yaml and start editing whatever you'd like to try out.
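In other words, assuming a checked-out ibox working copy and the paths above:

cp manifests/hieradata/vagrant.yaml.sample manifests/hieradata/vagrant.yaml
$EDITOR manifests/hieradata/vagrant.yaml
vagrant provision   # re-apply puppet with the new hiera data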

Types

A basic configuration of the operating system is usually nothing spectacular. More interesting are usually the applications that you run on top of it.

As we move further with our adoption of CentOS 7, we have to verify the configuration of our types on the new major release, which usually means it is a good point to also release our so far secret – but still shareable – parts as iBox building blocks.

As an example we can take the dbserver type, representing database server nodes running (either or both) MariaDB and PostgreSQL.

# enables the type dbserver
ibox::types: ['dbserver']

# configure a postgres admin user called test2 (superuser role)
ib_postgres::server::admin_users:
  'test2': {}
# configure a postgres database called test and
# a role test owning that database
ib_postgres::server::default_databases:
  'test': {}

# setup a mysql user having all privileges on all databases
ib_mysql::server::admin_users:
  'user1': {}
# setup 2 MariaDB databases and for each of them a user
# with the same name and with full privileges on the specific database
ib_mysql::server::default_databases:
  'testdb1': {}
  'testdb2': {}

Having the above in the manifests/hieradata/vagrant.yaml file, a simple vagrant up will give you an iBox with two fully configured database servers (e.g. incl. backup) and a few resources (like databases) configured.

We will talk about types and further ports to the iBox as we move forward with releasing the parts.

Introducing iBox – stemcells

As a lot of people might know: we have been happy puppet users @ immerda for ages. Since the very beginning, we have tried to publish as much as possible of our puppet work and to collaborate with others on improving these puppet modules, making them available and (most important!) useful for others as well.

Background

As (for obvious reasons) we can't publish the whole configuration of our infrastructure, we went with a dual-module setup: public and private (site-specific) modules. While this was fine in the beginning and let us at least publish the common ground of our infrastructure, it was very hard to see how all these public modules build up our infrastructure in the end. Within the last 6 years puppet and its ecosystem developed and improved heavily and made a lot of things easier. Drastic changes within the language (e.g. 2.7 to 3.0) made it hard to keep things updated and to stay on top of the current development while still running a platform in production. But once you made the big step, a lot of things became way easier. Especially, some of these changes made it easier to share more of our infrastructure and in particular to publish how we use our public modules. Furthermore, it became more and more important for us to be able to quickly try something new, to develop and integrate a new feature into our infrastructure, without necessarily being too disruptive. In the end our users – and also we – use our infrastructure daily for our digital communication, and we don't want to be too disruptive to it, which would just bury us in (unnecessary) work.

Additionally, more and more people became interested in how we set things up and would like to help out and contribute to our infrastructure in one way or another. So we had a new goal: make more of our internal (so far secret) sauce publicly available, so that people can inspect it, fix it, contribute to it or even just use it to run their very own version of our infrastructure.

iBox

This led us to kick off (another) new internal project, in which we refactor our current puppet code base in such a way that we can publish more parts of our infrastructure. In the end this could mean that anyone could use our codebase to run her very own instance of our infrastructure. And the basis for that is something that we named: iBox.

It is a very ambitious initiative and we are far away from the final goal. However, we see it more as a project that will accompany us over the next years while we improve the infrastructure further. We don't plan to dedicate too much concentrated effort to it; rather it should grow as we grow and move on. So in the past few months we started working on it and are now in a state where we are able to publish the first, but still very basic, parts of what we envision. And given that CentOS 7 was recently published, the integration of the new version of our most-used distribution was also a good starting point for our effort. We would have had to do the integration of the new version into our infrastructure anyway, so why not kick off the iBox initiative and use it to develop the CentOS 7 integration into our infrastructure. Eat your own dog food! 😉

This and a few more upcoming blog posts will introduce you (and also some members of our collective) to what we have developed so far, how we built it and how you can use it. It is far from being useful, but it is a start and we expect to grow and integrate more and more things as we move on with our infrastructure. We don't have a concrete timeline nor roadmap (we luckily never have!), but don't expect it to just die after the few things that we publish.

stemcells

If we look at all the prerequisites one needs to be able to run the code for our infrastructure, it starts with a very simple thing: a server or at least a virtual machine (VM). You need an operating system where you can run puppet to set up all the parts that build an infrastructure. Luckily, with today's availability of virtualization it is very easy to run VMs on a local desktop computer, and in recent years a lot of nice tools appeared addressing all the boring and cumbersome steps. One of them is Vagrant, which makes it easy to set up and run development VMs on your local desktop. More about that later.

Vagrant is based on the idea that you have a set of base images, which you use as a starting point to apply further modifications using a configuration management tool or good old simple scripts. There are plenty of boxes available, but projects usually establish their own process and guidelines for how a box should look (e.g. filesystem-wise). So did we, and this is what we refer to as a stemcell. Once you have such a stemcell for different virtualization providers, you can publish them somewhere and people can consume them for their vagrant setups without redoing the OS installation on their own. Building such a stemcell is no magic and there are different tools available that make this process very easy: e.g. Veewee or Packer.

We first went with Veewee, but given that packer is more or less the successor and makes a few things much easier, we ported things over to packer. We already have our own internal kickstart server (called iBoot), which makes it very easy for us to kickstart (or preseed) new VMs or physical machines based on such a file, so that each installed machine looks the same. Hence, it was easy to just adapt these existing kickstart files for a setup with packer. This is what the ibox-stemcells repository contains.

packer

Packer works in such a way that you write a template which describes how your images (in our terms: stemcells) should be built on different virtualization platforms. For each of them, you describe a builder. Packer also comes with an internal http server that can be used to serve the kickstart files or anything else that should be available at installation time. Packer also downloads your (net-)install images, boots the new VM using them, enters the right boot parameters, kicks off the installation and so on.
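Building a stemcell then comes down to a single packer invocation; the template file name and builder name below are placeholders, check the ibox-stemcells repository for the real ones:

# build only the kvm/qemu stemcell defined in the template
packer build -only=qemu centos7.json
# or build the stemcells for all builders defined in the template
packer build centos7.json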

Once the installation is done, packer applies a set of defined provisioners that can do further modification of the installed system and could even kick off a configuration management tool. However, as we just want a very basic image that matches what we have after a very basic OS installation, we only apply a few things through a provisioner as a postinstall script. The basis should be as clean as possible, so that we can develop our puppet modules on top of it using Vagrant. This process will be part of a next blog post. Using packer, we now have two CentOS 7 stemcells available: one for kvm (qemu) and another one for VirtualBox, which should make it very easy for most people to use them.

There is also already a builder for a VMware stemcell in place. However, the author failed to get a usable VMware installation running in an appropriate timeframe and left it as an exercise for the interested reader.

Also, it should be very easy to create stemcells for Debian, previous CentOS versions or any other interesting distribution. This should make it really easy not only to develop our manifests on CentOS 7, but also to add support for other distributions.

trocla – get (hashed) passwords out of puppet manifests

Background

At immerda.ch, we try to automate every aspect of our infrastructure, so we can work on more interesting things and let the repetitive and boring work be done by puppet. This means that we also manage a lot of users required by the different services with puppet, whether these are plain system users, mysql users or any other kind of users. For some of them, e.g. the SFTP users for the webhostings, we are also managing the passwords.

Up to now, we generated and hashed the passwords by hand and put them in our manifests, which means that the password hashes also ended up being version controlled by git. Managing the users and their passwords with puppet works very well and has proven to be a very stable solution. However, it has the disadvantage that a lot of (mainly hashed) passwords are lying around in different places in our infrastructure:

  1. The host on which they are used: In the actual backend (shadow, mysql, …) and the puppet catalog.
  2. On the puppet master: In the manifests that are checked out from git.
  3. In the git repository: This means a) on our internal git server, but b) also checked out on different systems of immerda admins.

Points 1 and 2 are obvious and can't be changed, given that we want local authentication (no central ldap) for most parts of our infrastructure and given that we want to run puppet in master/agent mode as our source of truth for various reasons. Although we take the protection of the data we handle seriously, and no immerda admin should ever work with any content from our systems on disks without strong encryption, we think that it is better to not spread data more than necessary. So point 3 was in our eyes always a bit annoying.

Meet trocla

To address this issue and also to make password generation a bit more comfortable, we use trocla. Trocla has 2 main parts:

  1. A gem that provides all the logic and a little cli to work with the data
  2. A puppet function to query trocla while compiling the manifests, which fetches the passwords from trocla (and thus generates them if they don't exist yet).

So instead of generating and hashing the passwords ourselves and keeping them in our puppet manifests – read: in our git repository – we simply use a puppet function that does all of these steps for us and keeps our git repositories password-free.

How trocla works

Trocla is a wrapper around a key/value storage. Actually, it was built so that you can use any kind of key/value storage that is supported by a newer, not yet released version of the moneta gem. By default trocla uses a yaml backend, which should be sufficient for most use cases. The keys are used in the manifests to look up the passwords from trocla, and the value is the stored password. That's more or less the big picture.

However, with only that feature set we could also simply stick with something like extlookup or hiera (or hiera-gpg) and just put our values in a storage file that is not in our git repository. But let's take a step back and look at what we actually deal with when we set a password somewhere:

  1. The plaintext password (which a user can later use to log in)
  2. The password in the format of the actual service. For example, for local users we use salted SHA-512 passwords, MySQL passwords are stored using a simple SHA1, etc.

Trocla extends this simple key/value lookup with a third argument named format. This argument refers to the format of the password that we are interested in and that is used by the service/user we are managing. The format option actually refers to the algorithm that has been used to hash the password. And to automate things a little further: trocla will generate a random password if it does not yet find a password for the key.

In short, we can describe trocla's workflow as follows (a cli sketch follows the list):

  1. Do I have the key/format tuple stored? Yes? -> Return the stored value.
  2. Do I have the value for the key stored in the plain format? Yes? -> Generate the requested format, store it (for later lookups) and go to 1.
  3. Otherwise: Generate a new random password, store it as plain format for that key and go to 2.
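For illustration, here is that lookup done by hand with the trocla cli; the key name is made up, and the cli actions are the ones described in trocla's README (the puppet function does the same internally):

trocla create mysql_user1 sha512crypt   # nothing stored yet: generates a random plain
                                        # password, derives the hash and stores both
trocla create mysql_user1 sha512crypt   # second lookup: returns the stored hash unchanged
trocla get mysql_user1 plain            # the plaintext is stored as well, until deleted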

We need to store the hashed passwords, as we always want to return the same password hash for a certain key over multiple runs, so we don't challenge puppet's requirement for idempotency. Also, as mentioned above, at some point (mainly in the beginning) you are usually also interested in the plain password, hence we store that one as well.

Workflow

Now, by using trocla, we are able to get rid of any passwords in our manifests and replace them with calls to the trocla puppet function, and puppet will retrieve the passwords during catalog compilation in the right format. This means that the passwords are now only stored in 2 places:

  1. On the host itself: In the compiled catalog and the backend.
  2. On the puppet master: As hashed version and as plaintext password.

So the only place where the plaintext password is stored is the puppetmaster, which is anyway our source of truth and the central point to manage all our systems. However, if we don't need the plaintext password on the target host itself, it is not really necessary to keep it on the puppetmaster either. Still, our users should get the plaintext password, so they can actually log in and use the service. Would be nice, no? 😉

If we keep trocla's lookup in mind: once the hashed password is generated and the plaintext password is not used anywhere in the puppet manifests, there is no need to keep the plaintext password on the puppetmaster. As mentioned in the beginning, the trocla gem also comes with a little cli tool to work with its storage. All the different actions of that cli are explained in the README file, and the one we are interested in is delete:

$ trocla delete user1 plain
# This will delete the plain password of the key user1 and return it.

The last part of how that command works is the most interesting: this action will not only delete the value of the supplied format, but will also return (read: display) it! So we can get the plaintext password while removing it from the puppetmaster. Two important things to remember at this point:

  1. In the manifests, we usually only query the hashed format.
  2. If the hashed password is once stored, trocla will directly return that stored format.

So to wrap up: our workflow for generating passwords for our users now works the following way:

  1. Add the new user to the puppet manifests and use the trocla function to query the passwords.
  2. Let puppet run on the target host, so puppet manages the user, hence queries trocla for the password, which will generate the passwords in the first run, but subsequently directly return the hashed password.
  3. Log in on the puppet master and query the plaintext password by deleting it. This has the advantage that you not only get the password, but that it is also no longer stored on the puppetmaster.

Note: Beware that you always delete only the plain format, not a hashed format or no format at all. The latter deletes and returns all stored formats for that key, which is the same as a password reset; deleting a hashed format is only interesting if the format uses a salt and you'd like to re-salt the hashed password while keeping the plaintext format. For both cases, there are other actions provided by the trocla cli.

Supported hashes and more

Trocla currently only supports a few hashes:

  • bcrypt: bcrypt-hashed passwords
  • md5crypt: salted MD5-shadow passwords
  • mysql: SHA1-Hashes for MySQL-Users
  • pgsql: MD5 hashed passwords for PostgreSQL, that are salted with the username, which you need to pass as an option
  • sha256crypt: salted SHA256-shadow passwords
  • sha512crypt: salted SHA512-shadow passwords

However, trocla is built with extensibility in mind, and if you look at the various formats you should quickly get an idea of how to extend trocla with further formats. Git pull requests are always welcome!

And to finish, a few examples of how trocla is used in our manifests:

# common usage:
webhosting::static{'www.immerda.ch':
  ...
  password => trocla('webhosting_www.immerda.ch','sha512crypt');
}
# format requires an option:
postgres::role{'some_user':
  ...
  password => trocla('postgres_some_user','pgsql','username: some_user');
}

But we took that part even a step further and integrated the usage of trocla in a completely transparent manner into our manifests. Examples can be found in the mysql module or the webhosting module.

Future

Trocla now gives us an automated integration of password storage and generation into puppet manifests. If you take the steps taken so far a little bit further, we see plenty of options to automate various things and probably also to integrate them with other interfaces (to users?), so that various configuration parts of webhostings could be done by the users themselves, while still being managed by puppet.

Storing mail credentials using bcrypt

We had wanted to migrate the hashes of our mail user database for some time. We couldn't sleep at night anymore since there were still md5 passwords in there. This database is mainly used by exim and dovecot in our setup.

At first our plan was to migrate to salted sha512 as described in the dovecot wiki. But this migration approach has a huge problem: all passwords are sent to the sql server in plaintext – just to be able to refer to them in a post-login script. This is rather insane, since there is really no technical reason why the sql server needs the plaintext passwords.

So we went off and wrote our own authentication script that implements the dovecot checkpassword specification. Now we can migrate our hashes (or do whatever else we want) while checking the password.

And most importantly: inspired by this little post we also decided that it would make sense to use a more sane hash function to store the passwords – namely bcrypt instead of sha.

If you want to check out our solution, skip down to the dovecot howto. Please just write us if you need help with this. It is still quite alpha and not so well documented.

Other services

Our MTA (Exim) now uses imap to authenticate smtp users. The solution is originally from here.

For the users to change their passwords we use horde-passwd, which does not support bcrypt. We fixed this by extending the sql backend with a custom driver that assumes passwords are in bcrypt. We agree that this is a hack…

Future Plans

The whole approach should be integrated into the tools we use. In the long run it would definitely pay off to directly extend dovecot, exim and horde to support this authentication. If we ever have time we should write patches for them.

Since we now have our custom authentication solution it would be cool to do even more stuff with it.

For example, a long-standing plan would be to use encfs to encrypt the maildirs. On login we would decrypt the maildir of the user and copy in all mails he got since the last login. When he's not logged in, we couldn't access his mailbox anymore!

Dovecot Howto

If you’re interested in using our checkpassword script, it is available in our git repo. To get it running you need to:

  • set CONFIG_DIR in checkpasswd-bcrypt.incl.rb
  • adapt checkpasswd-bcrypt.conf.rb to your needs by providing a db access and the names of your db columns
  • adapt the sql queries in checkpasswd-bcrypt.sql.conf.rb to your setup
    (There is also a query to store the month of the last login. You can disable this in the config with KeepLastlogin = false)
  • set the dovecot config to use the checkpasswd-bcrypt.rb script

Arver – distributed LUKS key management

We recently developed Arver for LUKS – a distributed key manager with benefits!

If you want to use it too, check out arver at codecoop.

It is not just another tool to conveniently store passwords for LUKS. No, it is a shiny monkeywrench for all sorts of challenges you face when administrating more than one server with encrypted harddrives. Plus, it even enhances LUKS security in several ways.

But let me prove this by giving some examples:

shared passwords no more

Shared passwords are arguably one of the worst threats to your environment. They are hard to change, hard to revoke, hard to keep safe and tend to be simpler than advised. With arver every admin has their own gpg-key that is used to grant them access to the LUKS disks. Moreover, access can be granted on a per-device basis!

Let's assume I created a new LUKS partition and want to grant bob access to it: 'arver --add-user bob a_server/a_disk' will assign a new passphrase to a_disk on a_server and store it, encrypted with bob's public gpg-key, as an arver-key. Bob can then use this arver-key to open a_disk. No need to communicate any password in plaintext!

mind the rubberhose

Well, what would Alice do if Bob made her aware that he might be under pressure to release any internal data? She would just execute 'arver --del-user bob ALL'.

And even if she didn't do this, Bob could always claim that he doesn't have access to a particular disk, since his arver-key doesn't reveal which disks it is for.

more uptime

Arver lets you automate all tasks surrounding LUKS management. It has script hooks for pre-/post-open/close. Imagine you had a power outage in a_colo. With the right setup it should be enough to run: 'arver --open a_colo'.

This will loop over all hosts at a_colo, e.g. first executing pre-open scripts on a per-disk basis to create a loop device, then post-open scripts on a per-host basis to start all virtual servers that were waiting for a LUKS disk to be opened.

interested?

If you'd like to know more about Arver, we recommend reading the man page, looking at this confusing diagram or downloading arver directly as a gem.