Automatic certificate renewal with Let’s Encrypt and DnsMadeEasy on Kubernetes

These days, it is ill-advised to run a website (such as this one) over plain HTTP, even when there is no real security risk. Browsers show a warning in the address bar for HTTP sites, and many users will simply turn away. This is especially painful for a website that blogs about devops and security.

Currently, I am migrating everything I run locally from VMs to a more secure and future-proof setup on kubernetes. The approach I decided to take is to start with the front-facing services and move everything else step by step. The first step is therefore to move the reverse proxy from a VM to kubernetes. Since I don’t want to pay for certificates, I will be using Let’s Encrypt. Unfortunately, Let’s Encrypt only issues 90-day certificates, which means a lot of certificate renewals.

The ACME protocol

To make this manageable, Let’s Encrypt supports the ACME protocol, which automates the complete certificate renewal process. A renewal looks like this:

  • create a new private key
  • create a certificate signing request using the private key
  • request a certificate from the CA (Certificate Authority)
  • the CA asks for verification of ownership of the domain, either using an HTTP challenge for a single domain or a DNS challenge for a wildcard domain. The HTTP challenge amounts to putting a file with the requested content (the challenge) in the URL space of the domain. The DNS challenge typically asks for a TXT record with the requested content to be created in the DNS zone of the domain. If someone is able to do that, then ownership is confirmed (the DNS variant is illustrated below the list)
  • respond to the HTTP or DNS challenge by creating the appropriate file or DNS record respectively
  • notify the CA that the challenge was answered
  • the CA checks the challenge
  • the CA issues the certificate after the challenge is verified.
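
For the DNS challenge, you can watch the record appear while issuance is in progress. The _acme-challenge prefix is the convention ACME uses for the TXT record; the domain is my own as an example:

dig +short TXT _acme-challenge.brakkee.org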

Automatic renewal: cert-manager

There are various tools for automatic certificate renewal based on the ACME protocol, but most of them are aimed at VMs and have options for directly updating apache or nginx configuration files. Imagine the horror if such a script corrupted your special home-grown configuration. On kubernetes, things are a lot cleaner using cert-manager.

Cert-manager introduces a number of concepts:

  • Certificate: A custom resource describing a certificate. This defines the names (DNS alternative names) that will appear on the certificate, as well as the name of the TLS secret that will be created, which is then used by ingress.
  • Issuer: The object responsible for using the ACME protocol to contact Let’s Encrypt. It uses a webhook for the specific DNS provider; the task of the webhook is to create and remove the challenge records as required by Let’s Encrypt. In my setup I am using the DNS challenge since I want wildcard certificates.
  • Webhook: A custom webhook that is used by an issuer to respond to the challenges from the CA.

Installation

Install cert-manager by following the instructions on the website. In my case, I did the following:

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
  cert-manager jetstack/cert-manager \
    --namespace cert-manager \
    --create-namespace \
    --version v1.8.0 \
    --set installCRDs=true

After this, I used cmctl to verify the installation:

OS=$(go env GOOS); ARCH=$(go env GOARCH)
curl -sSL -o cmctl.tar.gz https://github.com/cert-manager/cert-manager/releases/download/v1.7.2/cmctl-$OS-$ARCH.tar.gz
tar xvf cmctl.tar.gz
cmctl check api

The cert-manager website contains more instructions for verifying the installation using a self-signed certificate. It is recommended to follow these instructions to make sure that everything works.
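
For reference, that test boils down to creating a self-signed issuer and certificate in a throwaway namespace, roughly like this:

# throwaway namespace for the test
apiVersion: v1
kind: Namespace
metadata:
  name: cert-manager-test
---
# an issuer that signs certificates with a self-signed CA
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: test-selfsigned
  namespace: cert-manager-test
spec:
  selfSigned: {}
---
# a certificate to be issued by the self-signed issuer
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: selfsigned-cert
  namespace: cert-manager-test
spec:
  dnsNames:
    - example.com
  secretName: selfsigned-cert-tls
  issuerRef:
    name: test-selfsigned

After applying this, a kubectl describe certificate -n cert-manager-test should show the certificate becoming Ready; delete the namespace afterwards to clean up.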

Next up is the installation of the DnsMadeEasy webhook.

helm repo add k8s-at-home https://k8s-at-home.com/charts/
helm repo update
helm install dnschallenger k8s-at-home/dnsmadeeasy-webhook \
  --namespace cert-manager -f values-dnsmadeeasychallenger.yaml

The values-dnsmadeeasychallenger.yaml configuration file is used to define the groupName which is used by the issuer to identify the webhook:

values-dnsmadeeasychallenger.yaml

# This name will need to be referenced in each Issuer's `webhook` stanza to
# inform cert-manager of where to send ChallengePayload resources in order to
# solve the DNS01 challenge.
# This group name should be **unique**, hence using your own company's domain
# here is recommended.
groupName: dnsmadeeasy.challenger

Note that I installed the webhook in the same namespace as cert-manager, since it is an integral part of certificate management and belongs there.

Configuration

We need to define some essential information for everything to work:

  • the DNS alternative names of the certificate. These are configured in the Certificate resource.
  • the name of the TLS secret that will contain the generated certificates. This is configured in the Certificate resource.
  • the API key and secret to be able to use the DnsMadeEasy API. This is a separate Secret resource.

Examples for brakkee.org are as follows:

brakkee-org-certificate.yaml

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-brakkee-org
  namespace: exposure
spec:
  dnsNames:
  - "*.brakkee.org"
  - "brakkee.org"
  issuerRef:
    name: dnsmadeeasy-issuer
    kind: Issuer
  secretName: brakkee-org-tls-secret

Note that I am defining the root domain brakkee.org as a separate name. This is because the wildcard domain *.brakkee.org only covers subdomains of brakkee.org and not the root domain itself.

dnsmadeeasy-apikey.yaml

apiVersion: v1
kind: Secret
metadata:
  name: dnsmadeeasy-apikey
  namespace: cert-manager
type: Opaque
stringData:
  key: your_key_here
  secret: your_secret_here

Apart from this, we need to configure the issuer to use the webhook, passing the DnsMadeEasy API key and secret on to the webhook so it can respond to the challenge:

dnsmadeeasy-issuer.yaml

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: dnsmadeeasy-issuer
  namespace: exposure
spec:
  acme:
    #server: https://acme-staging-v02.api.letsencrypt.org/directory
    server: https://acme-v02.api.letsencrypt.org/directory
    email: info@brakkee.org
    privateKeySecretRef:
      name: hosting-key-secret
    solvers:
    - dns01:
        webhook:
          groupName: dnsmadeeasy.challenger
          solverName: dnsmadeeasy
          config:
            apiKeyRef:
              name: dnsmadeeasy-apikey
              key: key
            apiSecretRef:
              name: dnsmadeeasy-apikey
              key: secret

In the above file, note the server attribute: you can use the staging URL of Let’s Encrypt if you want to test. This is useful because of the strict rate limits on the production Let’s Encrypt API. Note the groupName, which links back to the webhook we installed earlier, and the references to the API key secret that pass configuration values on to the webhook. The private key of the ACME account is stored in a separate secret (hosting-key-secret), which is created by the Issuer.

After applying the above resources, a temporary secret is first created in the exposure namespace. When issuance is finished, you should have a new TLS secret brakkee-org-tls-secret in the exposure namespace that can be used in an ingress rule. During the process you can do a kubectl describe on the certificate resource to see the progress.
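
For example, with the resource names used above:

# watch overall progress of the certificate
kubectl describe certificate wildcard-brakkee-org -n exposure
# drill down into the intermediate ACME resources created by cert-manager
kubectl get certificaterequests,orders,challenges -n exposure
# verify the resulting TLS secret
kubectl get secret brakkee-org-tls-secret -n exposure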

Note that I am creating all these resources in a separate exposure namespace. This is the namespace where I will do all SSL termination using ingress. Backend applications will typically run in different namespaces. This will turn out to be important in a future post, where I will go into setting up network policies to improve security.

Using the TLS secret in an ingress rule

Using the TLS secret in an ingress rule is straightforward. In my case I am forwarding all traffic for brakkee.org and *.brakkee.org to the same HTTPD backend service. For example:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-brakkee-org
  namespace: exposure
spec: 
  tls: 
  - hosts:
    - "*.brakkee.org"
    - "brakkee.org"
    secretName: brakkee-org-tls-secret
  rules:
  - host: "brakkee.org"
    http: &proxy_rules
      paths:
      - path: /
        pathType: Prefix 
        backend:
          service:
            name: httpd-reverse-proxy-brakkee-org
            port: 
              number: 80
  - host: "*.brakkee.org"
    http: *proxy_rules

A nice trick here is the use of a YAML anchor (&proxy_rules) and alias (*proxy_rules), which avoids duplicating the same rules for brakkee.org and *.brakkee.org. The backend service httpd-reverse-proxy-brakkee-org is not shown in this post, but it should not be hard to adapt the example to your own backend service. More domains with their own certificates can be added simply by adding hosts entries to the tls section and adding rules.

Certificate renewal

I have tested certificate renewal by using the spec.renewBefore field in the Certificate to force earlier renewal. For example:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-brakkee-org
  namespace: exposure
spec:
  renewBefore: 2136h
  ...

It is not possible to request a shorter certificate duration, since Let’s Encrypt always issues 90-day certificates. Just choose renewBefore by counting back from the 90-day expiry time of the certificate.
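
Alternatively, assuming a recent cmctl such as the one installed earlier, a renewal can also be triggered directly, without editing the Certificate:

cmctl renew wildcard-brakkee-org --namespace exposure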

Force regeneration of the certificate

To force regeneration of the certificate, you can simply delete the generated TLS secret; it will be created again. The same applies to the ACME account key secret (hosting-key-secret).
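
In commands, using the names from the resources above:

# delete the generated certificate secret; it will be issued again
kubectl delete secret brakkee-org-tls-secret -n exposure
# same for the ACME account key created by the issuer
kubectl delete secret hosting-key-secret -n exposure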

Rate limits

If you run into a rate limit at Let’s Encrypt, note that the limit applies to the exact set of requested DNS alternative names (*.brakkee.org and brakkee.org in my case). The current limit is 5 certificates per week for each unique combination of names. If you hit this limit, temporarily add another DNS alternative name such as x.y.brakkee.org to the list; this name does not fall under *.brakkee.org, so the combination of names is new and the limit no longer applies. It is of course always better to use the staging URL of Let’s Encrypt before resorting to tricks like this.
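
In the Certificate, this amounts to temporarily extending the dnsNames list:

spec:
  dnsNames:
  - "*.brakkee.org"
  - "brakkee.org"
  - "x.y.brakkee.org"  # temporary: makes the name combination unique again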

Costs

To use the DnsMadeEasy API I had to upgrade my membership to a business membership, which costs 75 USD per year. But at least I am no longer manually updating certificates, nor paying a similar amount per domain.

Posted in Devops/Linux

Hosting services on Google Kubernetes Engine

This post explains how to host services on Google Kubernetes Engine; parts of it apply to regular (non-GKE) Kubernetes clusters as well. This post will cover:

  • allowing multiple services to be deployed in different namespaces
  • allowing multiple (sub)domains to be deployed
  • making a service available through a public and fixed IP accessible using a host name
  • making it accessible over HTTPS using a certificate
  • HTTP redirect to HTTPS

The treatment of the subject is somewhat high level, focusing on which resources and fields are relevant and on how the different Kubernetes resources relate to each other. Some snippets of YAML configuration are given as well. This should be sufficient to ‘roll your own’.

Deploying services in different namespaces

The basic architecture we start off with is as follows:

hide circle
class Ingress {
  namespace
}
class Service {
  namespace
}
class Deployment {
  namespace
}

Ingress --> "*" Service
Service --> "1" Deployment

In this basic setup, there is one Ingress object in a certain namespace, and each Service that it exposes is in a separate namespace together with its Deployment. There is a limitation in Ingress that prevents it from being used when Services are in a different namespace than the Ingress resource. This limitation is expected to be lifted in a future version of Kubernetes through the Gateway API.

One workaround is to deploy a separate Ingress in every Service namespace. However, this leads to other problems, since separate IPs must then be used for each service.
A better solution is to use a separate ingress namespace in which to deploy a reverse proxy (httpd in this example, but nginx is of course also possible) and from there set up proxying using the cluster-local DNS name of each service. Given that a service x is deployed in namespace y, it can be accessed using the DNS name x.y.svc.cluster.local.

On apache, this can be a configuration such as this:

ProxyPass /hello http://x.y.svc.cluster.local/hello
ProxyPassReverse /hello http://x.y.svc.cluster.local/hello

The architecture for two services, x in namespace y and x2 in namespace y2, then becomes:

package "namespace: exposure" {
  object "expose:Ingress" as expose
  object "httpd:Service" as httpdservice
  object "httpd:Deployment" as httpddeploy
  object "httpd-config:ConfigMap" as httpdconfig
}
package "namespace: y" {
  object "x:Service" as xservice
  object "x:Deployment" as xdeploy
}
package "namespace: y2" {
  object "x2:Service" as x2service
  object "x2:Deployment" as x2deploy
}
expose --> httpdservice
httpdservice -> httpddeploy
httpddeploy --> xservice : x.y.svc.cluster.local
httpddeploy -> httpdconfig
xservice -> xdeploy
httpddeploy --> x2service : x2.y2.svc.cluster.local
x2service -> x2deploy

This architecture is quite flexible: it allows SSL termination using a single Ingress rule and provides complete flexibility in allocating services to namespaces.

The apache deployment is configured using a ConfigMap that contains the httpd.conf file. The ConfigMap is mounted into the pod as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
  namespace: exposure
  ...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: httpd
          image: httpd:2.4
          ...
          volumeMounts:
            - name: httpd-conf
              mountPath: /usr/local/apache2/conf/httpd.conf
              subPath: httpd.conf
      volumes:
        - name: httpd-conf
          configMap: 
            name: httpd-config

Note the use of the subPath configuration in the deployment spec to mount only a single file from the httpd-conf volume; this keeps all other files in the /usr/local/apache2/conf/ directory intact. The config map can be created from an existing httpd.conf file as follows:

kubectl create configmap --namespace exposure \
  httpd-config --from-file=httpd.conf

The httpd.conf used is an adaptation of the file that comes standard with the container. Since we are terminating SSL at the Ingress, the httpd config contains no SSL configuration or references to certificate files.
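
A minimal sketch of what the proxy part of such a config can look like (the module paths match the httpd:2.4 image; host and service names are illustrative):

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

<VirtualHost *:80>
    ServerName example.com
    ProxyPreserveHost On
    # route each path to the cluster-local service that serves it
    ProxyPass        /hello http://x.y.svc.cluster.local/hello
    ProxyPassReverse /hello http://x.y.svc.cluster.local/hello
</VirtualHost>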

Exposing the service externally

To expose the httpd service externally (and thereby all services for which it is a reverse proxy), we need to introduce the FrontendConfig resource. FrontendConfig is a custom resource that, as far as I know, is only available on GKE and is used to configure Ingress features. The SSL Policy is a GCP object that defines SSL features for the HTTPS connection.

package "Kubernetes" {
  object "expose:Ingress" as expose
  object "frontendconfig:FrontendConfig" as frontend
}
package "GCP" {
  object "gke-ingress-ssl-policy:SSLPolicy" as policy
  object "example-ip:IpAddress" as ipaddress
  object "example-com:PreSharedCertificate" as certificate
}

expose -right-> frontend
frontend -down-> policy
expose -down-> ipaddress
expose -down-> certificate
Using this setup, we can configure:

  • the certificate used
  • HTTP to HTTPS redirect
  • the external IP address

The easiest starting point is the Ingress configuration:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: httpd-ingress
  annotations:
    #kubernetes.io/ingress.allow-http: "false"
    kubernetes.io/ingress.global-static-ip-name: example-ip
    ingress.gcp.kubernetes.io/pre-shared-cert: example-com
    networking.gke.io/v1beta1.FrontendConfig: frontendconfig
  namespace: exposure
spec:
  # tls config not needed since a pre-shared certificate
  # is used. 
  #tls:
  #  - hosts:
  #      - example.com
  #    secretName: example-com-certificates

  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: httpd
                port:
                  number: 80

First of all, we see that all traffic is forwarded to httpd. This is fine since httpd distributes the traffic over the backend services; as far as Ingress is concerned, it just routes everything to httpd. Next, we see several other features.

The annotation kubernetes.io/ingress.allow-http can be used to disable HTTP, but it is commented out here since we allow HTTP and redirect it to HTTPS using the FrontendConfig instead.

Next, the annotation kubernetes.io/ingress.global-static-ip-name gives the name of a previously created static IP address in GCP; this is the fixed public IP under which the services are exposed.
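
Such an address can be created up front with gcloud (the address and project names are examples); the GKE Ingress requires a global address:

gcloud compute addresses create example-ip \
  --project example-project \
  --global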

The annotation ingress.gcp.kubernetes.io/pre-shared-cert defines the name of the pre-shared certificate that is used.

A pre-shared certificate is a certificate registered at GCP. Based on the domain name, GCP automatically chooses the right certificate. This means that explicit configuration of the certificate using the tls section is not needed, so that part is commented out. It also allows the same certificate to be used by multiple GKE clusters, which is lower maintenance than the alternative setup of defining a certificate secret per GKE cluster.

To create the pre-shared certificate, concatenate the crt files containing the certificate and the certificate chain into a single file as follows:

  cat example.com.crt example.com.2022.crt > full.crt

Next, create the pre-shared certificate from this file and the private key file:

gcloud compute ssl-certificates create example-com \
  --project example-project \
  --global \
  --certificate=full.crt \
  --private-key=example_2022.key

The final part is the networking.gke.io/v1beta1.FrontendConfig annotation which links the ingress resource to the frontend config.

The FrontendConfig, finally, is as follows:

apiVersion: networking.gke.io/v1beta1
kind: FrontendConfig
metadata:
  name: frontendconfig
  namespace: exposure
spec:
  redirectToHttps:
    enabled: true
  sslPolicy: gke-ingress-ssl-policy

In this configuration we see that the redirect from HTTP to HTTPS is enabled. There is also a reference to an SSL policy, which defines SSL features for the connection. For instance:

gcloud compute ssl-policies create gke-ingress-ssl-policy \
    --profile MODERN \
    --min-tls-version 1.2

Posted in Devops/Linux

Optimal payment scheme for a Dutch Bank Savings Mortgage, using a Mixed Integer Linear Program

There is a special type of mortgage in the Netherlands (Bank Savings Mortgage, or bankspaarhypotheek in Dutch) which, I presume, does not exist anywhere else in the world. The construct is that you have a mortgage on your house for a certain amount, say 200000 €, which is split up into two parts:

  • one part with a loan (in this case 200000), where you pay only interest
  • a second part which is a savings account.

Usually both of these accounts have the same interest rate, and you pay a fixed amount per month so that in the end the savings account can be used to pay off the loan. For the bank, this construct is exactly identical to having just one account with a loan that is paid off like an annuity mortgage with a fixed amount per month.

The difference is in the way this type of mortgage is taxed. On the loan part the interest is tax deductible, and the savings account is not taxed at all. Therefore, in comparison with an annuity mortgage, where the loan decreases over time (before taxes the two are equivalent), there is more tax deduction. And this is what makes it an attractive type of mortgage.

With this type of mortgage it is possible to pay off part of the loan or to add extra money to the savings account. When given the choice, adding a given amount to the savings account is a lot more attractive than using it to pay off the loan: even though the two are equivalent for the bank, putting the money into the savings account leads to a higher tax deduction.
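
To see why, compare the tax deduction for an extra amount A (a minimal sketch, with τ the marginal tax rate, i the interest rate, and L the outstanding loan). A deposit into the savings account leaves the loan, and hence the deductible interest, untouched:

deduction after depositing A in savings = τ·i·L  >  τ·i·(L−A) = deduction after repaying A on the loan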

But of course, the government has put forward rules to prevent people from profiting too much, and these make it challenging to determine an optimal payment scheme that minimizes the total payments made. For this purpose, this post examines a simple Mixed Integer Linear Program (MILP). Continue reading

Posted in Data Science

Setting up a deep learning box

After doing a number of courses on machine learning, I now have some overview of what is available and how it all works. So now it is time to start doing some work from start to finish myself. For some of the more interesting things you definitely need access to a system with a good GPU, and the systems I have at home are not really suitable for this:

  • My work laptop has an Nvidia GT Quadro 1000M which has compute capability 2.1
  • My private laptop has a GPU (Nvidia GT 330M) which has compute capability 1.2
  • My server which does not have a GPU and so has compute capability 0

On the other hand, popular frameworks like Tensorflow require, as of this writing, compute capability 3.0. This effectively rules out the use of my private and work laptops.

As alternatives, I considered starting off in the cloud using the Google or Amazon GPU offerings. But the workflow there is always to first set up some work at home and then do the same in the cloud, and costs can add up quite quickly that way. Another alternative is to get a new laptop or PC with a fast GPU. That seems nice since it also opens up opportunities for gaming, but I am not really a gamer, and it feels like a bit of a shame to get another laptop/PC when my current one is still working fine (a Sony Vaio F11 laptop with a 1.6GHz CPU). It runs linux most of the time and really still performs quite well.

Then I started looking at another possibility: adding a GPU to my server. This turns out to be possible, since my server has a free PCIe 2.0 x16 slot, and from what I read, PCIe 3.0 cards should work without problems in PCIe 2.0 slots. Also, there is a way to set up a VM on KVM so that GPU accelerated computing can be used inside the VM (see for instance server world); this is preferred over running natively. The idea is to do a lot of (long running) experiments locally and, if I really want to do something big, ‘rent’ some capacity in the cloud.

To do all this, I first had to upgrade my server, which was running Centos 6 on the virtual host, to Centos 7. Well, “upgrade” is a big word: it involved installing Centos 7 side by side with Centos 6 and getting everything to work again. That part is now done. The next step is to get a nice graphics card (e.g. an Nvidia GTX 1080 or 1070) and set it up in the server. This will be interesting.

Posted in Data Science, Devops/Linux, Fun, Software

Why finalizers are really bad

It is more or less common knowledge that using finalize functions in java is bad. For one, you are depending on garbage collection for cleanup, and there is no guarantee when the finalizer will be called. Furthermore, there is no guarantee it will ever get called at all, even if the application terminates nicely using System.exit().

However, there is a far more important reason why finalizers are bad. Continue reading

Posted in Java

Encrypting an existing Centos install (2)

In a previous post, I described how to encrypt an existing Centos install. That approach was based on finding out how LUKS worked and then creating an encrypted storage logical volume, with logical volumes on top of that to contain the original data. The main disadvantage of that approach was that it could not encrypt the root partition, thus still potentially leaking confidential data.

Therefore, I looked at how a standard fully encrypted Centos install works, and basically it is quite simple. The basic setup is a simple partitioning scheme with one small physical partition (e.g. /dev/sda1) containing /boot (typically ext4), and a second partition /dev/sda2 which is encrypted. On top of the encrypted /dev/sda2 device (e.g. /dev/mapper/luks), the previous logical volumes are based. This approach requires no power management hacks nor special mount options in /etc/fstab.
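
As a rough sketch of the building blocks (device names and the volume group name are illustrative, and these commands destroy existing data on /dev/sda2):

# encrypt the second partition and open it under /dev/mapper/luks
cryptsetup luksFormat /dev/sda2
cryptsetup luksOpen /dev/sda2 luks
# build the usual LVM stack on top of the encrypted device
pvcreate /dev/mapper/luks
vgcreate vg_system /dev/mapper/luks
lvcreate -L 20g -n root vg_system
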
Continue reading

Posted in Uncategorized

Encrypting an existing Centos install

Edit: Meanwhile I have found a better way to migrate an existing unencrypted Centos install to a fully encrypted install with /boot as the only unencrypted disk space. That solution is much preferred over the one described in this post. The new approach is here.

Inspired by yet another news story of a laptop with sensitive information getting stolen, you start imagining what would happen if someone got hold of your laptop: how much sensitive data would be on it, and what the consequences would be. A small investigation revealed that the consequences could be quite big. There are various personal documents stored, version control systems and IDEs with insecure password storage, and of course various browser history files and cookies. That made me a bit nervous.

Therefore, I set out to investigate how to make this laptop more secure. The setup is an extension of setups I found on the internet, where typically LUKS over LVM or LVM over LUKS is used. The current setup will in effect be LVM over LUKS over LVM.
Continue reading

Posted in Devops/Linux, Java, Software

Creating a USB install for Centos 6.4

The days of rotating disks for storing information, and in particular for installing OSes, are nearing their end. Why rely on something with rotating parts for storing data in the 21st century? Unfortunately, not every software vendor has caught up with this, so in some cases special measures must be taken to install an OS from a USB disk. One example is Centos/RHEL, which does not come with a USB install by default. There is a procedure from Red Hat, but it is limited to starting an installation when you already have the installation media available somewhere (e.g. on a hard drive).

One common method to create such a USB install is to use the livecd-iso-to-disk script. Unfortunately, that did not appear to work, even though I tried many times. After reading an interesting discussion on unix.stackexchange.com, I decided to give it another shot, and this time it worked.

What I did was the following on a laptop running Centos 6.4:

  • Insert the USB stick: Find out the device name (e.g. using dmesg). Make sure the stick is unmounted as it could be automounted.
  • Partitioning: Make sure the disk is partitioned to contain one single primary partition (e.g. /dev/sdb1) using for example cfdisk. For now I will assume that /dev/sdb is the USB stick. Make sure to substitute this for the correct device in the next instructions.
  • File system: Create an ext3 filesystem on /dev/sdb1
    mkfs.ext3 /dev/sdb1

    I did not try ext2 and ext4 but these could also work. You can also optionally do a

    tune2fs -m0 /dev/sdb1

    to increase the available space by removing reserved blocks for the kernel (these are not needed anyway).

  • Install livecd tools: Install using yum:
    yum install livecd-tools
  • Transfer the ISO to the USB stick: Transfer disk 1 of the Centos 6.4 installation to the USB stick:
    livecd-iso-to-disk  CentOS-6.4-x86_64-bin-DVD1.iso  /dev/sdb1

    Note that it is important to specify /dev/sdb1 here and not /dev/sdb.

Testing

After this step, the USB stick can be tested locally using qemu-kvm.

To simply verify that the USB stick is found and the boot menu is recognized, boot up a virtual machine with only the USB disk:

/usr/libexec/qemu-kvm -hda /dev/sdb -m 256 -vga std

And use a VNC viewer (e.g. vncviewer from tigervnc) to view the VM. This should show a boot menu and allow you to start the installation up to the point where the installation procedure cannot continue anymore.

If you want to test a full installation, create a disk using logical volume management

lvcreate -L 10g -n bladibla vg_mylaptop

where vg_mylaptop is a volume group with at least 10GB of space left, and start qemu-kvm with the created logical volume as disk hdb, giving it a bit more memory:

/usr/libexec/qemu-kvm -boot c -hda /dev/sdb -hdb /dev/vg_mylaptop/bladibla -m 2048 -vga std

After the install is completed, start the VM again without the USB stick:

/usr/libexec/qemu-kvm -boot c  -hda /dev/vg_mylaptop/bladibla -m 2048 -vga std

The VM should now start up successfully. The USB boot stick is also recognized natively by my laptop, and it looks like I can do a full OS install there as well (at least the upgrade, which of course did nothing in my case, completed successfully).

Disclaimer: As mentioned in the discussion linked above, the whole procedure might give different results depending on the USB stick used. I tested this procedure on a Dell Latitude M4700 laptop using a Kingston GT160 8GB memory stick.

Posted in Devops/Linux, Software

Java from the trenches: improving reliability

Java and the JVM are great things. In contrast to writing native code, making a mistake in your Java code will not (or should not) crash the virtual machine. However, in my new position at a SAAS company I have been closer to production systems than ever before, and in the short time I have been there I have already gained a lot of experience with the JVM. Crashes and hangs do occur, but there is something we can do about them. These experiences are based on running Java 1.6 update 29 on Centos 6.2 and RHEL 6, as well as on Windows Server 2003.

Java Service Wrapper

To start off with, I would like to recommend the Java Service Wrapper. This is a great little piece of software which allows you to run Java as a service, with both a Windows and a Linux implementation. The service wrapper monitors your java process and restarts it when it crashes, or restarts it explicitly when it appears hung. The documentation is excellent and it works as advertised. It has given us no problems at all, apart from tweaking the timeout after which a java process is considered hung.

The service wrapper writes its own log file, but we found that it also contained every log statement written by the application. The cause turned out to be the ConsoleHandler of java.util.logging, which was still enabled. This problem was easily solved by setting the handlers property empty in jre/lib/logging.properties:

handlers=
#handlers= java.util.logging.ConsoleHandler

This also solved a performance problem whereby, due to a bug in the application, excessive logging was being done and the java service wrapper simply could not keep up anymore.

With a default JRE logging configuration, the logging output can also be disabled by setting the following properties in the wrapper.conf file:

wrapper.syslog.loglevel=NONE
wrapper.console.loglevel=NONE
wrapper.logfile.loglevel=STATUS
wrapper.java.command.loglevel=STATUS

Of course, with the console logging turned off, it should be possible to remove the wrapper.console.loglevel setting (not tried yet).

Garbage collection

Since we would like to achieve low response time and minimize server freezes due to garbage collection, we settled on the CMS (Concurrent Mark and Sweep) garbage collector.

Using the CMS collector we found one important issue: on windows the server would run perfectly, but on linux it would become unresponsive after just a couple of hours of traffic. The cause was quickly found to be permgen space. It turns out that garbage collection behavior on windows differed from linux; in particular, garbage collection of the permgen space was being done on windows but not on linux. After hours and hours of searching, we found the option that fixed this behavior:

-XX:+CMSClassUnloadingEnabled

The full list of options we use for garbage collection is now as follows:

-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+CMSClassUnloadingEnabled
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-verbose:gc
-Xloggc:/var/log/gc.log

The last four options are for garbage collection logging which is useful for troubleshooting potential garbage collection issues after the fact.

One of the issues with the above configuration is that upon restart of the JVM, the garbage collection log file is overwritten instead of appended to, thereby losing information when the JVM crashes. This can be worked around with a ‘tail -F gc.log > gc.log.all’ command, but that solution is not nice as it creates very large log files. An optimal solution would be for the JVM to cooperate with standard facilities on linux such as logrotate: similar to how, for instance, apache handles logging, the JVM could simply close the gc.log file when it receives a signal and then reopen it again. That would be sufficient for logrotate to work. Unfortunately, this is not yet implemented in the JVM as far as I can tell.
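
In the meantime, logrotate’s copytruncate option can serve as a partial workaround: it copies the log and then truncates the original in place, so the JVM keeps writing to the same file descriptor, at the cost of possibly losing a few lines during rotation. A sketch of such a logrotate config:

/var/log/gc.log {
    daily
    rotate 7
    compress
    missingok
    # copy the log and truncate the original; the JVM does not need to reopen it
    copytruncate
}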

Crashes in libzip.so or zip.dll

It turns out that this problem can occur when a zip file is overwritten while it is being read. The cause of this could be in the application of course, but still the JVM should not crash on it. It appears to be a known issue which was fixed in 6u21-rev-b09, but the fix is disabled by default.

If you set the system property

-Dsun.zip.disableMemoryMapping=true

then memory mapped IO will no longer be used for zip files, which solves this issue. This system property only works on linux and solaris, not on windows. Luckily, a colleague found this solution; the setting is very difficult to find on the internet, which is full of stories about crashes in the zip library, even if you know what you are looking for.

Crashes in networking libraries/general native code

Another issue we ran into was occasional crashes, mostly in the native code of networking libraries. This also appears to be a known issue with 64 bit JVMs. The cause is that insufficient stack space is left for native code to execute.

How it works is as follows. First of all, the java virtual machine uses a fixed size for the stack of a thread; this size can be specified with the -Xss option if needed. While executing java code, the JVM can figure out whether there is enough space to execute a call and throw a StackOverflowError if there is not. With native code, however, the JVM cannot do that, so instead it checks whether a minimum amount of space is left for the native code. This minimum is configured using the StackShadowPages option. It turns out that by default this value is too low on older 64 bit JVMs, causing crashes in for instance socket libraries (e.g. when database access is being done). See for instance here. In particular, on JDK 1.6 update 29 the default value is 6, while on JDK 1.7 update 5 it is 20.

Therefore, a good setting of this flag is to use 20:

-XX:StackShadowPages=20

The size of one page is 4096 bytes, so increasing the stack shadow pages from 6 to 20 means that about 56KB of additional stack size is needed per thread. The page size can be verified by running java with a low stack size and passing different values for StackShadowPages, like this:

erik@pelican> java -Xss128k -XX:StackShadowPages=19 -version
The stack size specified is too small, Specify at least 156

The stack size per thread may be important on memory constrained systems. For instance, with a stack size of 512KB, a thousand threads would consume about 500MB of memory. This may matter on smaller systems (especially 32 bit, if those are still around), but it is no issue at all for a modern server.

Debug JVM options

To find out what the final (internal) settings are for the JVM, execute:

java -XX:+PrintFlagsFinal <myadditionalflags> -version

Logging

If your environment still uses log4j for some reason, then be aware that log4j synchronizes your entire application. We found an issue where an exception with a huge message string and stack trace was being logged; the toString() method of the exception took about one minute, during which time the entire application froze. To reduce these synchronization issues of log4j, use AsyncAppender and specify a larger buffer size (128 is the default) and set blocking to false. The async appender may have some overhead in single-threaded scenarios, but for a server application it is certainly recommended.
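
A sketch for log4j 1.x, which requires the XML configurator for AsyncAppender; “FILE” stands for an existing appender in your configuration:

<appender name="ASYNC" class="org.apache.log4j.AsyncAppender">
  <!-- larger buffer and non-blocking behavior to avoid application freezes -->
  <param name="BufferSize" value="512"/>
  <param name="Blocking" value="false"/>
  <appender-ref ref="FILE"/>
</appender>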

Posted in Devops/Linux, Java, Software

Why do developers write instead of reuse?

I am frequently amazed at the amount of software that is written instead of simply looking around and reusing what is already available. In practice I have seen many reasons for this:

  • Our problems are unique: The misconception that “our problems are unique”. I can’t recall how many times I have seen this; it really occurs a lot.
  • Not looking for similar solutions: Simply forgetting to look for similar solutions on the internet to see what’s available (if only as an inspiration on how to best solve the problem). This is often also a side effect of thinking that this is a unique problem.
  • Underestimation of the problem: The misconception that it’s easy to write it yourself. In most cases, it is easy to come up with a first (half) working version that does approximately what you need. However, the work involved in making the same solution maintainable and with the correct feature set will make it much more expensive (the 80-20% rule).
  • Limited scope: A developer specialized in platform X (e.g. X = java) will typically only look for solutions in that area, whereas looking broader will reveal more solutions.
  • Coolness factor: It is cool to develop it yourself. Perhaps it involves an opportunity to do something cool with clustering or another chance to use one of your favorite frameworks. Perhaps you could use one of those cloud databases?
  • Overestimation of oneself: The idea that we can do something better in a few weeks time than what the industry or open source community has come up with using man years of development.
  • The desire for fame by writing reusable software: Paradoxically, the desire for reusable software can stimulate to roll your own. The problem is that writing reusable software (or calling it reusable) provides you with fame (even if it’s only in your local department). The reality is however that reuse can only exist through the willingness of people to use other people’s software. If there is one developer writing a reusable piece of software and 20 others using it, then clearly the willingness to use other’s software far outweighs writing it yourself.

I have seen these problems in companies of all sizes.

Posted in Software