Continuously building our container image chain

Container images can be layered on top of each other, so that you do not always need to rebuild every layer from scratch.
This helps you to a) standardize your container images and b) not repeat yourself all over the place.
As an example, we have a couple of ruby applications, and they all require ruby, bundler and some common dependencies, since every ruby app eventually bundles nokogiri and thus requires a bunch of libxml and libxslt libraries. Simplified, this means we have the following chain:

base -> ruby -> app

At immerda we build our own base images from scratch so we can fully control what is in them. E.g. we add our repository, our CA certificate and so on. Then we have a common ruby image that can be used to develop and test ruby applications. And in the end we might package the application as an image, like our WKD service.
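
As a simplified sketch (registry, image names and package list are made up for illustration and are not our actual Containerfiles), the FROM statements are what tie this chain together:

# ruby/Containerfile: the shared ruby layer, built on top of our base image
FROM registry.example.org/base:latest
RUN dnf install -y ruby rubygem-bundler libxml2 libxslt && dnf clean all

# app/Containerfile: the application layer, built on top of the ruby image
FROM registry.example.org/ruby:latest
COPY . /app
RUN cd /app && bundle install --deployment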

We update our systems on a regular and unattended basis to keep them up to date with the latest security releases and bug fixes. When it comes to container images, we additionally pull the images of running containers on a regular basis and restart the containers / pods whenever there was an update to the image tag the container refers to. This way we not only keep the applications packaged in containers up to date, but also make sure that we get security fixes for lower level libraries, like e.g. openssl (remember heartbleed?), in a timely manner.
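
Simplified, that regular refresh boils down to something like the following sketch (image and unit names are placeholders, not our actual tooling):

# pull each image and restart the corresponding unit if the digest changed
for image in registry.example.org/base:latest registry.example.org/app:latest; do
  before=$(podman image inspect --format '{{.Digest}}' "$image")
  podman pull -q "$image"
  after=$(podman image inspect --format '{{.Digest}}' "$image")
  if [ "$before" != "$after" ]; then
    systemctl restart container-app.service   # hypothetical unit name
  fi
done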

Since container images are intended to be immutable (and we actually run nearly all of our containers with --read-only=true), you want to and must rebuild them to keep them updated.
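
For illustration, running a container with an immutable root filesystem looks roughly like this (the image name is a placeholder); anything that has to stay writable gets an explicit tmpfs or volume:

podman run --rm --read-only --tmpfs /tmp registry.example.org/app:latest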

Now when it comes to pulling and using images from random projects or registries (looking at you, docker-hub), we try to vet, at the time we start using an image, that it is also continuously built. The official library images, like the official postgres image, are good examples of that.
For our own images, we are doing that through gitlab-ci.

However, since images are layered on each other, you need to rebuild your images in the right order, so that your ruby image builds on top of your latest base image and your app image picks up the latest ruby updates.

Gitlab-ci has limited capabilities (especially in the free core version) when it comes to modelling pipelines across projects in the right order. However, with building blocks such as webhook events for pipeline runs, we can orchestrate that pretty well.

This orchestration is done using a project from our friends at Autistici/Inventati called gitlab-deps. This tool keeps track of the dependency chain of your images and triggers the pipelines to build these images in the right order.

Central to gitlab-deps is a small webhook receiver that receives events from our gitlab-ci pipelines and, based on the dependency list of your images, triggers the next pipelines. We are running the webhook receiver in a container, and it scans the immerda group in gitlab for projects, deriving the dependencies either from the FROM statement of a Containerfile/Dockerfile or from a list of depending projects in a file called .gitlab-deps. The latter is much more accurate and likely the better way to describe the dependencies of your project. As an example, see the gitlab-deps container project itself.

gitlab-deps itself requires a gitlab API token that is able to trigger pipeline runs as well as to add webhooks to the projects. For now this seems to require a token from a user with the Maintainer role.
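
Under the hood these are plain gitlab API calls, roughly like the following (the gitlab URL, project ID and webhook receiver URL are placeholders):

# trigger a pipeline for project 42 on the main branch
curl --request POST --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
  "https://gitlab.example.org/api/v4/projects/42/pipeline?ref=main"

# register the webhook receiver for pipeline events on project 42
curl --request POST --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
  --data "url=https://deps.example.org/hook" --data "pipeline_events=true" \
  "https://gitlab.example.org/api/v4/projects/42/hooks"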

We then also have a systemd timer that runs a script in the container to update all the dependencies as well as to add webhooks to all the projects we know of.

For our base images, we defined a scheduled pipeline run to kick off building the initial, lowest layer of our images, which, if successful, will trigger the next layer, and then the next, and so on.

If one of the pipelines fails, gitlab-deps won't trigger anything further down the chain. Gitlab will notify you (e.g. via email) about the failed pipeline run, you can investigate, and once you manage to re-trigger a successful pipeline run, the rest of the chain continues to be built.

While we are using this only for our image build chain so far, gitlab-deps can also be used to rebuild/update projects that depend on a library and so on. As we move more of our work into gitlab and hook it into CI, we might start using these dependency triggers for other kinds of pipeline chains as well.

lvm cache create + resize

LVM allows you to have a caching layer, where your actual LV resides on (slow) spinning disks and a secondary LV on fast disks caches some of your most frequent reads and the writes. From an end-user perspective the details are transparent: you still see one single blockdevice. For a good overview and introduction see the following blog post: Using LVM cache for storage tiering

In our case mainly the writeback cache is interesting, and we add a RAID 1 SSD/NVMe PV to the VG of the spinning disks (usually RAID 10).
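
Assuming the SSD/NVMe mirror is /dev/md3 and the VG is called storage (as in the example below), adding the fast PV looks like this:

pvcreate /dev/md3           # initialize the fast raid 1 as a physical volume
vgextend storage /dev/md3   # add it to the VG that holds the slow LVs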

Create

Once some fast PV has been added to the VG, we can start caching individual LVs within that VG:

lvcreate --type cache --cachemode writeback --name slowlv_cache --size 100G storage/slowlv /dev/md3

Where:

  • slowlv_cache: the name of the caching LV
  • 100G: the size of the cache
  • storage/slowlv: the VG and the slow LV to cache
  • /dev/md3: the SSD/NVMe PV in the VG storage
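
To verify that the LV is now cached and to see which devices back it, something like the following helps (the exact columns depend on your LVM version):

lvs -a -o +devices storage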

Resize

You cannot directly resize a cached LV; instead you need to uncache, resize and then add the caching LV again. When uncaching, the data that has not yet been written through gets synced down to the slow disks. This might take a while, depending on your cache size and the speed of the slow disks.


lvconvert --uncache /dev/storage/slowlv
lvresize -L +200G /dev/storage/slowlv
lvcreate --type cache --cachemode writeback --name slowlv_cache --size 100G storage/slowlv /dev/md3

The last command is exactly the same one as we initially used to create the cache.

Intercept traffic sent over a socket

What is the easiest way to intercept traffic sent over a UNIX Socket?

In general a socket is just a file, so you can use strace on any program and capture what it writes there. BUT the data you capture that way won't be easily processable by something like wireshark.

The other way is to set up a socat chain, which proxies the data through a local TCP port where you can then capture it. You do that by pointing either the clients or the server to another socket to write to or read from.

Let's assume clients usually connect to /tmp/socket and that they are the side we can change more easily.

(terminal 1)$ socat -t100 -d -d -x -v UNIX-LISTEN:/tmp/socket.proxy,mode=777,reuseaddr,fork \
TCP-Connect:127.0.0.1:9000
(terminal 2)$ socat -t100 -d -d -x -v TCP-LISTEN:9000,fork,reuseaddr  UNIX-CONNECT:/tmp/socket
(terminal 3)$ tcpdump -w /tmp/data.pcap -i lo -nn port 9000

Now reconnect the client to /tmp/socket.proxy and tcpdump will record the traffic flowing over the socket as normal tcp packets.
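
If you cannot easily repoint the clients, the same trick works the other way around: move the real socket aside and let socat listen on the original path (port and capture stay the same as above):

(terminal 1)$ mv /tmp/socket /tmp/socket.real
(terminal 1)$ socat -t100 -d -d -x -v UNIX-LISTEN:/tmp/socket,mode=777,reuseaddr,fork \
TCP-Connect:127.0.0.1:9000
(terminal 2)$ socat -t100 -d -d -x -v TCP-LISTEN:9000,fork,reuseaddr UNIX-CONNECT:/tmp/socket.real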

Another pygrub adventure

So the bug from the article Get Centos 7 DomU guests booting on Xen 4.1 hit us again after a while, when we wanted to reboot a few guests.

Everything still seemed to be fine within the guest; nevertheless pygrub on the host wasn't able to find a kernel to boot.

Revisiting one of the old xen bugs that we had already dug through during our previous experience showed us that we also need to patch pygrub on the host:

-            title_match = re.match('^menuentry ["\'](.*)["\'] (.*){', l)
+            title_match = re.match('^menuentry ["\'](.*?)["\'] (.*){', l)

in /usr/lib/xen-4.1/bin/pygrub and your guests boot again.

Validate certificates with perl-Net-Imap

To authenticate users on our SMTP Server (exim), we are using the embedded perl interpreter to do a simple login with the passed credentials against our imap server. So in the end dovecot will do the whole authentication dance.

We had used this solution for years; however, while deploying newer EL7 based relay smtp servers and testing authentication, it became apparent from the logs that there is an issue with validating the certificate of the IMAP server:

 *******************************************************************
 Using the default of SSL_verify_mode of SSL_VERIFY_NONE for client
 is deprecated! Please set SSL_verify_mode to SSL_VERIFY_PEER
 together with SSL_ca_file|SSL_ca_path for verification.
 If you really don't want to verify the certificate and keep the
 connection open to Man-In-The-Middle attacks please set
 SSL_verify_mode explicitly to SSL_VERIFY_NONE in your application.
*******************************************************************
  at /usr/share/perl5/vendor_perl/Net/IMAP/Simple.pm line 135.

So it seems that newer perl-IO-Socket-SSL versions finally do some proper validation, whereas the old ones never did. Actually, the state of ssl in perl is/was quite worrisome, but things have been fixed for the better over the last few years.

So all we had to do was to adapt our authentication code (part of the exim_imap_auth module) so that it now enforces validation of certificates. This also required us to package the latest version of perl-Net-IMAP-Simple for our distributions, as none of them have picked up the new versions so far.
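
The gist of the change is to pass proper verification options down to IO::Socket::SSL. A minimal, stand-alone sketch of what that looks like (host and CA path are examples; this is not the actual exim_imap_auth code):

use strict;
use warnings;
use IO::Socket::SSL qw(SSL_VERIFY_PEER);

# connect to the IMAPS port and verify the server certificate against the
# system CA store instead of silently accepting any certificate
my $sock = IO::Socket::SSL->new(
    PeerHost        => 'imap.example.org',
    PeerPort        => 993,
    SSL_verify_mode => SSL_VERIFY_PEER,
    SSL_ca_path     => '/etc/pki/tls/certs',
) or die "connect/verify failed: $!, $IO::Socket::SSL::SSL_ERROR";
print "certificate verified\n";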

Get Centos 7 DomU guests booting on Xen 4.1

All newer physical servers of our infrastructure are KVM hypervisors. However, given that we also have a bunch of older hardware around that does not have the CPU extensions required for hardware virtualization, we are still running some Xen based hypervisors. As RedHat (and so CentOS, in an initial phase) discontinued support for xen Dom0 (since EL6 only Xen guests are supported), we're running these hypervisors on Debian stable, which at the moment is wheezy, running Xen 4.1.

So far we had rolled out new hosts (or updated existing ones) only on KVM based hypervisors, and this worked fine. Yesterday we wanted to update an existing EL6 guest on a Xen based hypervisor to an EL7 guest, by simply re-installing it and reapplying our automation.

The installation went fine, however afterwards we were not able to boot the installed system. We are using pygrub as a bootloader to extract the kernel from within the guest for xen, and it looked like pygrub wasn't able to correctly extract the kernel:

# virsh start guest
error: Failed to start domain guest
error: POST operation failed: xend_post: error from xen daemon: (xend.err "Boot loader didn't return any data!")
# /usr/lib/xen-default/bin/pygrub /dev/mapper/guest_rootfs 
Traceback (most recent call last):
  File "/usr/lib/xen-default/bin/pygrub", line 704, in <module>
    chosencfg = run_grub(file, entry, fs, incfg["args"])
  File "/usr/lib/xen-default/bin/pygrub", line 551, in run_grub
    g = Grub(file, fs)
  File "/usr/lib/xen-default/bin/pygrub", line 206, in __init__
    self.read_config(file, fs)
  File "/usr/lib/xen-default/bin/pygrub", line 410, in read_config
    raise RuntimeError, "couldn't find bootloader config file in the image provided."
RuntimeError: couldn't find bootloader config file in the image provided.

Searching for the error didn't help much, but it became clear that Debian stable's pygrub version is not (yet) able to read the newer grub2 config file within the EL7 guest. Locally patching pygrub didn't really help, nor is there a more current version (supporting the newer grub2 config file) available.

Luckily someone already had the same issue with Fedora 20, which is the basis for EL7. And meanwhile he also had a proper solution in a kickstart %post section for EL7.
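
We are not reproducing the referenced post here, but the idea is roughly the following %post snippet, which writes an additional legacy-style grub.conf that the old pygrub can parse (kernel paths, root device and partition layout are illustrative and need to match the guest):

%post
# pick the installed kernel version (assumes a single kernel package)
KVER=$(rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' | tail -n 1)
mkdir -p /boot/grub
# paths below are relative to the separate /boot partition in this example
cat > /boot/grub/grub.conf <<EOF
default=0
timeout=5
title CentOS 7 ($KVER)
        root (hd0,0)
        kernel /vmlinuz-$KVER ro root=/dev/mapper/vg_guest-root
        initrd /initramfs-$KVER.img
EOF
%end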

Integrating this into our kickstart process made the re-installed guests boot again on Debian stable's Xen 4.1.