Kubernetes Network Policy Tester

As mentioned in my previous post, I would follow up with a tool for testing network policies. That tool is finally here and is available on GitHub. It is written in Python and can be installed from PyPI.

The idea behind the tool is to ‘instrument’ pods by adding a debug container to them and then to run network checks from this debug container. Then ‘all’ that remains is creating an input configuration file, parsing and validating it, running the tests, printing test results, and so on.

All in all, most of the work went into figuring out how to add a debug container to a pod using the Kubernetes Python API. This resulted in an issue for which I found a workaround myself.

After that, a lot of work went into validating the input using the Python Cerberus package, together with my own validation checks on top. The rest was actually quite straightforward.


Securing network communication on kubernetes using network policies

An often overlooked topic in kubernetes is network security. This is probably because people just assume it is secure by default, both because it is new and because of the Service concept. The standard way to expose a service running in a pod is to create a Service for it and then reference the service from the consuming pod using its name. The advantage of this is that the service has a DNS name and also provides a fixed IP towards consumers, regardless of how many pods are running for this service or whether pods get deleted and created. However, direct communication between pods is possible by default.

This is a serious security issue if network access is not restricted, because it allows lateral movement by an attacker from one pod to the next since all ports are open. In particular, access to the kubernetes API server is not restricted either, and pods by default (!) contain the mounted secrets for their service account.

There is even a standard feature in kubernetes to query all services and their ports in a cluster. Just start a container that has dig and do the appropriate SRV dns request:

> kubectl run --rm -it --image tutum/dnsutils -- bash
If you don't see a command prompt, try pressing enter.
root@bash:/# dig -t SRV any.any.svc.cluster.local

; <<>> DiG 9.9.5-3ubuntu0.2-Ubuntu <<>> -t SRV any.any.svc.cluster.local
...snip...

;; ANSWER SECTION:
any.any.svc.cluster.local. 30   IN      SRV     0 4 9402 cert-manager.cert-manager.svc.cluster.local.
any.any.svc.cluster.local. 30   IN      SRV     0 4 80 httpd-reverse-proxy-mountainhoppers-nl.exposure.svc.cluster.local.
any.any.svc.cluster.local. 30   IN      SRV     0 4 80 kubernetes-dashboard.kubernetes-dashboard.svc.cluster.local.
any.any.svc.cluster.local. 30   IN      SRV     0 4 80 nginx-nginx-ingress.nginx.svc.cluster.local.
...snip...

Now, isn’t this a warm welcome for hackers? A full list of all running services, namespaces, and their ports: the ideal starting point for trying to hack a system.

 

Testing network communication within the cluster

Before we do anything, it is good to look around in an unsecured cluster. Let us try to contact the nginx pod from another pod. First, we check which ports are open in the nginx containers:

kubectl get pods -n nginx -o yaml

and we see that ports 80, 443, 9113, and 8081 are all open. We also need to get the IP address of one of the nginx pods:

kubectl get pods -n nginx -o wide

Next, we can use an nmap command to check for open TCP ports:

nmap "$SERVERIP" -p "$PORT" -Pn

A similar check can be done to check for UDP ports:

nmap "$SERVERIP" -p "$PORT" -sU

If the output contains the state open for the port, then we are sure that it is open.

Next, all we need is a temporary container to run the nmap command in, and we have a simple automated check that runs from within the kubernetes cluster.

kubectl run --rm -it -n $NAMESPACE --labels=app=porttesting --image instrumentisto/nmap --restart=Never nmapclient --command -- \
  nmap "$SERVERIP" -p "$PORT" -Pn |
  grep "$PORT.*open " # note the space after 'open'

The above command starts a container in the given namespace with the label app=porttesting, using a docker image that has nmap. The label can be used to delete the pod in case it somehow keeps on running (kubectl delete pod -l app=porttesting -A). The --restart=Never option ensures that kubernetes does not continuously restart the container, because it should run this command only once. The grep at the end executes on the local host, not in the container, and returns exit status 0 if the port is open and non-zero if it is not. This makes for a nice automated test. Substituting various namespaces, we see that the nginx port can be accessed from everywhere. See also the remark on debug containers at the end to support testing in already running pods.

Network policies: short intro

Network policies can be used to restrict network access within a cluster. The following rules determine how network policies work:

  • By default network traffic is unrestricted. If at least one network policy matches a pod for egress, then the network policies for egress that match the pod determine what egress traffic is allowed; the same holds for ingress.
  • There are only allow rules, and no deny rules. Basically, network policies add up. Each network policy describes a form of allowed traffic. If there are multiple network policies that match a pod then all the traffic described in these network policies is allowed. This means that the order in which network policies are applied is irrelevant.
  • Network policies only apply to pods. If there is a Service that forwards port 80 to port 8080 of a pod, then, to allow traffic to service port 80, traffic to port 8080 of the pod must be allowed (see the example below).
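
To make the last point concrete, here is a minimal sketch (with hypothetical names and labels) of a Service forwarding port 80 to pod port 8080, together with a network policy allowing that traffic. The policy references the pod port 8080, not the service port 80.

apiVersion: v1
kind: Service
metadata:
  name: example
  namespace: example-ns
spec:
  selector:
    app: example
  ports:
    - port: 80          # service port
      targetPort: 8080  # pod port
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-example
  namespace: example-ns
spec:
  podSelector:
    matchLabels:
      app: example
  ingress:
    - ports:
        - port: 8080    # must be the pod port, not the service port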

Basically, we want to achieve microsegmentation where only the required traffic is allowed.

Network policies can define ingress rules, egress rules, or both. They select the pods they apply to using pod selectors based on labels, and specify peers using CIDRs, pod labels of pods in the same namespace, or namespace selectors describing all pods in another namespace. For the latter, namespaces must be labeled because they are selected by label. An ingress rule defines which sources may access the selected pods, and an egress rule defines which destinations the selected pods may access. See network policy recipes for examples of network policies for specific cases.

The way to deny access is to define a default rule that allows nothing.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-allow-nothing
  namespace: wamblee-org
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

In the rule above, both egress and ingress traffic are covered through the policyTypes configuration, and since no ingress or egress rules are defined, this network policy does not allow any traffic. Because the empty podSelector selects all pods in the namespace, no ingress or egress traffic is allowed for any pod in the namespace. This includes ingress from outside the cluster and from pods in other namespaces, as well as egress to outside the cluster and to pods in other namespaces.

This is a nice failsafe: when refining network policies, it may happen that more specific pod selectors no longer match some pods, which would inadvertently open up access to those pods. This default rule prevents that.

This ‘allow nothing’ rule must be defined for every namespace whose network access must be restricted, since network policies are namespaced resources. Note that there is a policyTypes element. In general, a network policy will work when you leave out policyTypes, because the policy types are inferred. However, the inference rules are a bit odd, as explained by the docs:

kubectl explain networkpolicy.spec.policyTypes

Without policyTypes, the policy types always include Ingress, and Egress is only included if there are one or more egress rules. For network policies that contain only egress rules, this is equivalent to policyTypes with only Egress, provided that ingress traffic is already denied by default. This means you have to be careful when defining an egress rule for a pod for which you do not want any ingress rules. The behavior is odd because Ingress is always inferred as a policy type, even if no ingress rule is specified. Things would be a lot clearer if the Ingress policy type were only inferred when an ingress rule is defined.
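
As an illustration (hypothetical names), the following policy only lists egress rules and omits policyTypes. The inferred policy types are then both Ingress and Egress, so besides allowing the DNS egress it also denies all ingress to the selected pods, which may not be what you intended:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: egress-dns-only
  namespace: example-ns
spec:
  podSelector:
    matchLabels:
      app: example
  # policyTypes is omitted: Ingress is inferred anyway and, since there
  # are no ingress rules, all ingress to the selected pods is denied.
  egress:
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP

Adding an explicit policyTypes with only Egress would restrict the policy to egress.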

The only case where you cannot leave out policyTypes is when you want to specify a default egress rule that allows no traffic. In that case there are no egress rules (or, equivalently, an empty array egress: []), so the inferred policy types would only contain Ingress. This is why the policy types are specified explicitly for the default rule.

Network policies: application

To apply network policies, we apply the default ‘allow nothing’ rule to every namespace, so that we start from a principle of least privilege where nothing is allowed unless it is explicitly allowed.

Next we need to look at the deployment picture below to see what network traffic is required:

PlantUML source of the deployment diagram:

allow_mixing

cloud "internet" as internet

package "namespace: kube-system" {
  object "kube-apiserver:Pod" as x
}

package "namespace: nginx" {
  object "nginx:Pod" as nginx
}

package "namespace: exposure" {
  object "brakkee-org-httpd:Pod" as brakkeepod
  object "wamblee-org-httpd:Pod" as wambleepod
}

package "namespace: brakkee-org" {
  object "backend1:Pod" as backend1
  object "backend2:Pod" as backend2
}

package "namespace: wamblee-org" {
  object "nexus:Pod" as backend3
  object "some_other:Pod" as backend4
}

internet -> nginx
backend3 -> internet

nginx -down-> brakkeepod
nginx -down-> wambleepod

brakkeepod -down-> backend1
brakkeepod -down-> backend2

wambleepod -down-> backend3
wambleepod -down-> backend4

The deployment picture above is simplified; the Service and Ingress resources are omitted since network policies revolve around pods only. The ingress pods proxy traffic to the apache servers for the two domains running in the exposure namespace. These apache servers in turn proxy to pods running in the namespaces for their respective domains. For details on this setup, see my earlier post.

From the picture we obtain the following network rules:

  • internet access
    • traffic from the internet is allowed to nginx
    • nexus is allowed to access the internet. This is required for proxying remote repositories, although more detailed rules could be defined here for each remote repository
  • nginx to exposure: Nginx may access the two pods in the exposure namespace. This can be realized by allowing access from all pods in the nginx namespace with the label app: nginx-nginx-ingress.
  • nginx to the api-server: Nginx requires access to the API server to successfully startup.
  • wamblee-org pods may be accessed from the wamblee-org-httpd pod. This is done based on the label app: httpd-wamblee-org
  • brakkee-org pods may be accessed from the brakkee-org-httpd pod.
  • system namespaces, such as kube-system, may not be accessed from any of the shown pods. Since we started with the default rule, this is already the case: if we do not allow this access explicitly, no access is possible.

Since we have the default ‘allow-nothing’ rule, we must allow both ingress and egress traffic for every allowed network communication. To simplify the setup we define the global communication topology in two steps:

  • define Egress rules for access from one namespace to another to define the global communication topology
  • define Ingress rules for each workload in a namespace using podSelector and (optionally) namespaceSelector that precisely define the allowed communication.

Internet access

Internet access to nginx is allowed using the following rule:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-http-https
  namespace: nginx
spec:
  podSelector: {}
  ingress:
    - ports:
        - port: 80
        - port: 443
      from:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16

This defines a single ingress rule that limits internet traffic to ports 80 and 443 from all public IP addresses, by excluding the private IP ranges. Note that the above network policy has a single rule; if you accidentally add a dash before the from keyword, you get two rules instead, together allowing traffic to ports 80 and 443 from anywhere, as well as traffic to any port from any public IP address.
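
For illustration, this is the unintended variant with the extra dash (the policy name is made up). The ports and from entries now belong to two separate rules: the first allows ports 80 and 443 from anywhere, the second allows any port from any public IP address:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-http-https-broken
  namespace: nginx
spec:
  podSelector: {}
  ingress:
    - ports:
        - port: 80
        - port: 443
    - from:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16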

Similarly for outgoing traffic from nexus, but here we also need to allow DNS queries so that external repositories can be accessed by name:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: external-internet-from-nexus
  namespace: wamblee-org
spec:
  podSelector:
    matchLabels:
      app: nexus-server
  egress:
    - ports:
        - port: 80
        - port: 443
      to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP

API server access

Some pods may require access to the API server. Examples are the nginx pods, which require this access during startup. There is no standardized way to allow this access, however, so we have to configure the IP CIDR of the API server (it listens on a port on the controller node) as well as the port. Usually, you know the address and port because you defined the kubernetes cluster yourself, but you can also get them quickly using the following command, which prints the livenessProbe details of the API server pods:

> kubectl get pods -l component=kube-apiserver -o json | jq .items[].spec.containers[].livenessProbe.httpGet
{
  "host": "192.168.178.123",
  "path": "/livez",
  "port": 6443,
  "scheme": "HTTPS"
}

Next, we use an egress rule to allow access from nginx to the API server:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: nginx-allow-api-server-access
  namespace: nginx
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
      - ipBlock:
          cidr: 192.168.178.123/32
      ports: 
        - port: 6443

Note that in my home setup, I am using a single controller node. Hence the /32 netmask.

Cross namespace access

For cross-namespace network policies we need to label the namespaces, because namespaces are selected by label. To select namespaces we define a purpose label with the following values:

  • web: nginx namespace
  • exposure: the exposure namespaces
  • domain: the namespaces related to the brakkee.org and wamblee.org domains
  • system: any system namespace of kubernetes such as metallb-system, tigera-operator, calico-system, cert-manager, kube-node-lease, kube-system.

Simply use kubectl label --overwrite ns NAMESPACE purpose=VALUE to label a namespace.
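
For the namespaces used in this post, the labeling could look like this (a sketch; adjust the namespace list to your cluster):

kubectl label --overwrite ns nginx purpose=web
kubectl label --overwrite ns exposure purpose=exposure
kubectl label --overwrite ns brakkee-org purpose=domain
kubectl label --overwrite ns wamblee-org purpose=domain
kubectl label --overwrite ns kube-system purpose=system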

We now have the following cross-namespace rules for the exposure namespace, allowing incoming traffic from nginx and outgoing traffic to the domain namespaces.

---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-egress-to-exposure
  namespace: nginx
spec:
  podSelector: {}
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              purpose: exposure
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-http-https
  namespace: exposure
spec:
  podSelector: {}
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: nginx-nginx-ingress
          namespaceSelector:
            matchLabels:
              purpose: web
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-access-to-backends
  namespace: exposure
spec:
  podSelector: {}
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              purpose: domain

Next, we allow ingress access to Nexus on ports 8081 and 8082:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-nexus
  namespace: wamblee-org
spec:
  podSelector:
    matchLabels:
      app: nexus-server
  ingress:
    - ports:
        - port: 8081
        - port: 8082
      from:
        - podSelector:
            matchLabels:
              app: httpd-wamblee-org
          namespaceSelector:
            matchLabels:
              purpose: exposure

With this setup, we have achieved microsegmentation where precisely the traffic that is required is allowed.

Service access from the exposure namespace

To allow service access using service names we must allow DNS queries like so:

---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-access-to-dns
  namespace: exposure
spec:
  podSelector: {}
  egress:
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP

Other access

A final rule is required to allow access to a number of VMs, since I am still migrating things to kubernetes and the migration is not complete. Therefore, I am sometimes proxying traffic to specific VMs:

---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-access-to-vms
  namespace: exposure
spec:
  podSelector: {}
  egress:
    - ports:
        - port: 80
        - port: 443
      to:
        - ipBlock:
            cidr: 192.168.178.123/32 # VM 1
        - ipBlock:
            cidr: 192.168.178.124/32 # VM 2

Final thoughts

Kubernetes by default allows unrestricted network communication between pods. Traffic is only restricted when there is a network policy that matches a pod for ingress or for egress. This is unfortunate since it violates the principle of least privilege, and it is probably a leftover from earlier kubernetes versions before network policies existed; the current behavior is backwards compatible with those versions. I would like to see future kubernetes versions where default ‘allow nothing’ network policies are active for new namespaces, with additional network policies for the system namespaces.

There is certainly some redundancy in the rules configuring similar things such as DNS access and internet access. This is screaming for a templating solution such as helm or jsonnet, or other tools. Also, there are tools available for enforcing certain policies in a kubernetes cluster (such as default ‘allow nothing’ rules).

In my experience, working with network policies is reliable and fast with the Calico network provider (I did not try any others). I refrained from using Calico-specific configurations, even where these are better (e.g. cluster-wide policies). This way I remain flexible with respect to the network provider I use. Also, some cloud environments limit your choices or do not allow explicitly configuring a custom network provider, so anything I use remains portable. Hopefully, and no doubt, functionality from Calico and other network providers will trickle down into the standard APIs.

Finally, there is new functionality in kubernetes that allows containers to be added to a running pod (ephemeral containers). This functionality allows for testing network communication for already running pods. What I am aiming at here is a tool that allows testing of network policies using these debug containers, based on a compactly defined set of test rules. Such a tool would also allow the development of network policies. This is what I will be writing on in a future post.


Basic kubernetes infrastructure: RPM and container repo

As part of migrating all the stuff I have from virtual machines to a kubernetes infrastructure, some important pieces of infrastructure are needed. These are:

  • RPM repository: I use custom RPM repositories for setting up virtual machines. These same RPMs are used for building container images that are required by kubernetes
  • Docker repository: Custom docker images may be required for kubernetes

Since I want to run everything at home and make minimal use of internet services for my setup, I need to deploy solutions for this on my kubernetes cluster. Currently, I already use an RPM repository based on Nexus 2. In the meantime, a lot has happened. For instance, Nexus 3 now natively supports RPM repositories and it also supports docker repositories. Therefore, as part of the setup, I need to run Nexus 3 and move all my RPM artifacts over from Nexus 2 to Nexus 3.

Deploying nexus

Nexus will be running in the wamblee-org namespace on kubernetes and an apache reverse proxy in the exposure namespace will be used for exposing it. See my earlier post for details about this setup. Note that that post is specifically about GKE but the main ideas apply here as well.

For deploying Nexus 3, the instructions for the Nexus 3 docker image can be used. The deployment requires a Service and a StatefulSet. A StatefulSet is more appropriate here than a Deployment since Nexus is a stateful service; each pod of the StatefulSet gets its own unique storage. I will use a replica count of 1 since I am running Nexus 3 OSS and high availability is not a concern for my home kubernetes setup.

First of all, let’s look at the StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nexus
  namespace: wamblee-org
spec:
  serviceName: nexus
  replicas: 1
  selector:
    matchLabels:
      app: nexus-server
  template:
    metadata:
      labels:
        app: nexus-server
    spec:
      containers:
        - name: nexus
          image: sonatype/nexus3:3.40.1
          resources:
            limits:
              memory: "4Gi"
              cpu: "10000m"
            requests:
              memory: "2Gi"
              cpu: "500m"
          ports:
            - containerPort: 8081
            - containerPort: 8082
          volumeMounts:
            - name: nexus-data
              mountPath: /nexus-data
  volumeClaimTemplates:
    - metadata:
        name: nexus-data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi

There are several things to note:

  • Replica count is 1. This is not a highly available Nexus 3 deployment.
  • It uses the standard Nexus 3 image with a fixed version so that we are guaranteed not to get surprise upgrades.
  • Two container ports are exposed: 8081 and 8082. The first port is used for the web interface and for maven artifacts and RPMs. The second port is specifically for Docker. Each hosted docker repository must have its own unique port; this is required by nexus. When exposing these repos externally, different host names will be used for port 8081 and port 8082 respectively.
  • I am running in the wamblee-org namespace which is the namespace where I am hosting everything for the wamblee.org domain.
  • A separate PersistentVolumeClaim is used for Nexus 3. This is better than an emptyDir because it will allow us to delete and reinstall Nexus 3 without losing data.

Volumes

The principle I am trying to follow here is to know exactly where my data is, so that I can lose a kubernetes cluster due to (my own) error but never lose the data, and can always set everything up from scratch again. I even go so far as not to use storage classes and provisioners in my setup; in practice I use labeled nodes and host path volumes, tying the storage explicitly to a specific directory on a specific node.

First of all I am labeling one node where I want the volume to be:

kubectl label node weasel wamblee/type=production

Next, I am using the following volume definitions:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nexus-data
  labels:
    type: local
    app: nexus
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: wamblee/type
          operator: In
          values:
          - production
  claimRef:
    name: nexus-data-nexus-0
    namespace: wamblee-org
  persistentVolumeReclaimPolicy: Retain
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/nexus"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nexus-data-nexus-0
  namespace: wamblee-org
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Through the nodeAffinity construct I am tying the volume to a specific node. Note the name of the PersistentVolumeClaim: it is determined by the volumeClaimTemplate name and the StatefulSet name, together with the ordinal of the pod in the stateful set, giving nexus-data-nexus-0 for the first pod.

Services

To expose Nexus 3 we need two services:

apiVersion: v1
kind: Service
metadata:
  name: nexus
  namespace: wamblee-org
spec:
  selector: 
    app: nexus-server
  type: ClusterIP
  ports:
    - port: 8081
      targetPort: 8081
---
apiVersion: v1
kind: Service
metadata:
  name: nexus-docker
  namespace: wamblee-org
spec:
  selector: 
    app: nexus-server
  type: ClusterIP
  ports:
    - port: 8082
      targetPort: 8082

Here I am using one service for every port. The Service is of type ClusterIP to avoid direct access from outside the cluster.

Ingress

The service is finally exposed through an Apache server running in the exposure namespace using the following apache config:

<VirtualHost *:80>
  ServerName mynexus.wamblee.org # actual host name is different

  ProxyRequests off
  ProxyPreserveHost on
  AllowEncodedSlashes on

  ProxyPass / http://nexus.wamblee-org.svc.cluster.local:8081/ disablereuse=On
  ProxyPassReverse / http://nexus.wamblee-org.svc.cluster.local:8081/
</VirtualHost>

<VirtualHost *:80>
  ServerName mydockerrepo.wamblee.org # actual host name is different 

  ProxyRequests off
  ProxyPreserveHost on
  AllowEncodedSlashes on

  ProxyPass / http://nexus-docker.wamblee-org.svc.cluster.local:8082/ disablereuse=On
  ProxyPassReverse / http://nexus-docker.wamblee-org.svc.cluster.local:8082/
</VirtualHost>

Note that the apache configuration uses the DNS local service name of Nexus. The Ingress rule is already defined, see earlier post, and provides SSL termination with automatic certificate management.

One thing that is important to know is that if you delete the Nexus services and deploy them again, the backend services will get a new IP address. Meanwhile, Apache by default caches the DNS lookup of the services for the lifetime of the worker. As a result, Apache may never pick up the changes. One quick fix is to kubectl exec into the httpd container and do an apachectl graceful, forcing a reload of the workers. Another option is to set disablereuse=On, which disables caching of connections to the backend services; that way, changes are picked up immediately. I wouldn’t use that in any serious production setup, but for home use it is ok. For production, a different setting like MaxConnectionsPerChild 100 would be better, forcing the recycling of workers after a small number of requests, or triggering the apache reload.
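
A sketch of the manual reload, assuming the httpd pods in the exposure namespace carry the label app=httpd-wamblee-org used elsewhere in this post (adjust the selector to your own deployment):

kubectl exec -n exposure \
  $(kubectl get pods -n exposure -l app=httpd-wamblee-org -o name | head -n 1) \
  -- apachectl -k graceful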

Nexus docker setup

There are some details for setting up a hosted docker registry on Nexus that are important to configure in the Nexus admin interface:

  • In Security/Realms, enable the “Docker bearer token realm”. Without this you can never authenticate.
  • For configuring read/write access for a user, you must create a user with the nx-repository-view-*-* role.
  • A separate port (in this example 8082) must be configured for the hosted docker repository where it will listen on. Each hosted docker repository on Nexus must have its unique port.

After this setup, you can check the installation using docker login, tagging an image with your docker repo hostname, and pushing it. Also verify pulling an image on Kubernetes by configuring a registry credentials secret and using it in a pod definition to run a pod.
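
A minimal sketch of such a check, using the mydockerrepo.wamblee.org host name from the apache configuration above; the user name, password, and secret name are made up:

# push an image to the hosted docker repository
docker login mydockerrepo.wamblee.org
docker tag alpine:3.16 mydockerrepo.wamblee.org/alpine:3.16
docker push mydockerrepo.wamblee.org/alpine:3.16

# create a pull secret on kubernetes for use in a pod definition
kubectl create secret docker-registry nexus-pull-secret \
  --docker-server=mydockerrepo.wamblee.org \
  --docker-username=deployer --docker-password='secret' \
  -n wamblee-org

Pods can then pull from the registry by listing nexus-pull-secret under imagePullSecrets in their pod spec.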

Nexus RPM setup

To set up RPM repositories, create a hosted Yum repository. I am using repodata depth 0, layout policy permissive, and strict content type validation off. This way, I can continue to use the rpm-maven-plugin to build RPMs, together with the maven-release-plugin to publish RPMs to Nexus. Deployed RPMs become available for use within minutes after deployment. With a repodata depth of 0, I have no restrictions on path names and will effectively host a single YUM repository in the hosted YUM repository.

To migrate RPM artifacts from the old Nexus 2 to Nexus 3, I simply change directory to the RPM repository on the webserver where it is currently hosted and push each RPM to Nexus 3 using a simple script, executed from the directory of the RPM repo:

#!/bin/bash

nexushost=mynexus.wamblee.org
repo=rpms
subdir=nexus2
userpwd="USER:PWD"

while read rpm
do
  echo $rpm
  name="$( basename "$rpm" .rpm )"
  echo "  $name"
  echo "Deploying $rpm"
  curl -v --user "$userpwd" --upload-file "$rpm" https://$nexushost/repository/$repo/$subdir/$name.rpm
done < <( find . -name '*.rpm' )

I remember using a more advanced migration from Nexus 2 to Nexus 3 at work, where we had a lot more data, downtime requirements, and also a lot of java artifacts in Nexus 2. That procedure was quite complex. In the current case however, where the amount of data is small and downtime requirements non-existent, this simple approach is the way to go.

For client side yum repo configuration, username and password can be encoded in the repo file as follows:

baseurl=https://USER:PWD@mynexus.wamblee.org/repository/rpms

or on more modern systems using

username=USER
password=PWD
baseurl=https://mynexus.wamblee.org/repository/rpms

Automatic certificate renewal with Let’s Encrypt and DnsMadeEasy on Kubernetes

These days, it is ill-advised to run a website (such as this one), over HTTP, even if there is no security risk at all. When hosting your website on HTTP, users will see a warning triangle in the address bar and many users will simply turn away. This is especially painful if the website is blogging on devops and security related things.

Currently, I am migrating everything I have locally to a more secure and future proof setup from VMs to kubernetes. The approach I decided to take is to start with the front-facing services and move everything from there step by step. Therefore, the first step is to move the reverse proxy from a VM to kubernetes. Since I don’t want to pay for certificates, I will be using Let’s Encrypt. Unfortunately, Let’s Encrypt only supports 90 day certificates so this means a lot of certificate renewals.

The ACME protocol

To support this, Let’s Encrypt implements the ACME protocol, which allows for automatic certificate renewal and automates the complete process. A certificate renewal looks like this:

  • create a new private key
  • create a certificate signing request using the private key
  • request a certificate from the CA (Certificate Authority)
  • the CA asks for verification of ownership of the domain, either using an HTTP challenge for a single domain or a DNS challenge for a wildcard domain. The HTTP challenge amounts to putting a file with the requested content (the challenge) in the URL space of the domain. The DNS challenge typically asks for a TXT record with the requested content to be created in the DNS for the domain. If someone is able to do that, then ownership is confirmed (see the dig example after this list)
  • comply with the HTTP or DNS challenge by creating the appropriate file or DNS record respectively
  • notify the CA that the challenge was answered
  • the CA checks the challenge
  • the CA issues the certificate after the challenge is verified.
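
For the DNS challenge, you can check whether the challenge record is visible with a query like the one below; Let’s Encrypt looks up the record _acme-challenge.<domain>, shown here with brakkee.org as example domain:

dig -t TXT _acme-challenge.brakkee.org +short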

Automatic renewal: cert-manager

There are various tools for automatic certificate renewal based on the ACME protocol, but most of them are intended for VMs and have options for directly updating apache or nginx configuration files. Imagine what kind of horror it could be if such a script corrupted your special home-grown configuration. On kubernetes, things are a lot cleaner using cert-manager.

Cert-manager introduces a number of concepts:

  • Certificate: A custom resource describing a certificate. This defines the names (DNS alternative names) that will appear on the certificate as well as the name of the TLS secret that will be created and which is used by ingress.
  • Issuer: The object that is responsible for using the ACME protocol to contact Let’s Encrypt. It uses a webhook for the specific DNS provider. The task of the web hook is to create and remove the challenge records as required by Let’s Encrypt. In my setup I am using the DNS challenge since I want to get wildcard certificates.
  • Webhook: A custom webhook that is used by an issuer to comply to the challenges from the CA.

Installation

Install cert-manager by following the instructions on the website. In my case, I did the following:

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
  cert-manager jetstack/cert-manager \
    --namespace cert-manager \
    --create-namespace \
    --version v1.8.0 \
    --set installCRDs=true

After this, I used cmctl to verify the installation:

OS=$(go env GOOS); ARCH=$(go env GOARCH); curl -sSL -o cmctl.tar.gz https://github.com/cert-manager/cert-manager/releases/download/v1.7.2/cmctl-$OS-$ARCH.tar.gz
tar xvf cmctl.tar.gz
cmctl check api

The cert-manager website contains more instructions to verify the installation using a self-signed certificate. It is recommended to follow these instructions to make sure that everything works.

Next up is the installation of the DnsMadeEasy webhook:

helm repo add k8s-at-home https://k8s-at-home.com/charts/
helm repo update
helm install dnschallenger k8s-at-home/dnsmadeeasy-webhook --namespace cert-manager -f values-dnsmadeeasychallenger.yaml

The values-dnsmadeeasychallenger.yaml configuration file is used to define the groupName which is used by the issuer to identify the webhook:

values-dnsmadeeasychallenger.yaml

# This name will need to be referenced in each Issuer's `webhook` stanza to
# inform cert-manager of where to send ChallengePayload resources in order to
# solve the DNS01 challenge.
# This group name should be **unique**, hence using your own company's domain
# here is recommended.
groupName: dnsmadeeasy.challenger

 

Note that I installed the webhook in the same namespace as cert-manager since it is an integral part of certificate management and belongs in the same namespace.

Configuration

We need to define some essential information for everything to work:

  • the DNS alternative names of the certificate. These are configured in the Certificate resource.
  • the name of the TLS secret that will contain the generated certificates. This is configured in the Certificate resource.
  • the API key and secret to be able to use the DnsMadeEasy API. This is a separate Secret resource.

Examples for brakkee.org are as follows:

brakkee-org-certificate.yaml

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-brakkee-org
  namespace: exposure
spec:
  dnsNames:
  - "*.brakkee.org"
  - "brakkee.org"
  issuerRef:
    name: dnsmadeeasy-issuer
    kind: Issuer
  secretName: brakkee-org-tls-secret

Note that I am defining the root domain brakkee.org as a separate name. This is because the wildcard domain *.brakkee.org only covers subdomains of brakkee.org, not the root domain brakkee.org itself.

dnsmadeeasy-apikey.yaml

apiVersion: v1
kind: Secret
metadata:
  name: dnsmadeeasy-apikey
  namespace: cert-manager
type: Opaque
stringData:
  key: your_key_here
  secret: your_secret_here

Apart from this, we need to configure the issuer to use the webhook and to pass the DnsMadeEasy API key and secret on to the webhook to comply to the challenge:

dnsmadeeasy-issuer.yaml

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: dnsmadeeasy-issuer
  namespace: exposure
spec:
  acme:
    #server: https://acme-staging-v02.api.letsencrypt.org/directory
    server: https://acme-v02.api.letsencrypt.org/directory
    email: info@brakkee.org
    privateKeySecretRef:
      name: hosting-key-secret
    solvers:
    - dns01:
        webhook:
          groupName: dnsmadeeasy.challenger
          solverName: dnsmadeeasy
          config:
            apiKeyRef:
              name: dnsmadeeasy-apikey
              key: key
            apiSecretRef:
              name: dnsmadeeasy-apikey
              key: secret

In the above file, note the server attribute where you can use the staging URL of Let’s Encrypt if you want to test. This is useful because of the strict rate limits on the Let’s Encrypt API. Note the groupName, which links back to the webhook we installed earlier. Also note the references to the API key secret, which pass configuration values on to the webhook. The private key is stored in a separate secret (hosting-key-secret); this secret is created by the Issuer.

After applying the above resources, first a temporary secret will be created in the exposure namespace. When the process finishes, you should have a new TLS secret brakkee-org-tls-secret in the exposure namespace that can be used in an ingress rule. During the process, you can do a kubectl describe on the certificate resource to see the progress.
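
For example, with the resources defined above you can follow the progress and check the result like this:

kubectl describe certificate wildcard-brakkee-org -n exposure
kubectl get secret brakkee-org-tls-secret -n exposure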

Note that I am creating all these resources in a separate exposure namespace. This is the namespace where I will have all SSL termination using ingress. Backend applications will typically be running in different namespaces. This will turn out to be important in a future post where I will go into setting up network policies to improve security.

Using the TLS secret in an ingress rule

Using the TLS secret in an ingress rule is straightforward. In my case I am forwarding all traffic for brakkee.org and *.brakkee.org to the same HTTPD backend service. For example:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-brakkee-org
  namespace: exposure
spec: 
  tls: 
  - hosts:
    - "*.brakkee.org"
    - "brakkee.org"
    secretName: brakkee-org-tls-secret
  rules:
  - host: "brakkee.org"
    http: &proxy_rules
      paths:
      - path: /
        pathType: Prefix 
        backend:
          service:
            name: httpd-reverse-proxy-brakkee-org
            port: 
              number: 80
  - host: "*.brakkee.org"
    http: *proxy_rules

A nice trick here is to use a YAML anchor (&proxy_rules). This allows me to avoid duplicating the same rules for brakkee.org and *.brakkee.org. The backend service httpd-reverse-proxy-brakkee-org is not shown in this post but it is not too hard to adapt the example to use your own backend service. It is easy to add more domains with their own certificates simply by adding hosts entries to the tls section and by adding rules.

Certificate renewal

I have tested certificate renewal by using the spec.renewBefore field in the Certificate to force earlier renewal. For example:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-brakkee-org
  namespace: exposure
spec:
  renewBefore: 2136h
  ...

It is not possible to request a shorter certificate duration since Let’s Encrypt always issues 90 day certificates. Just choose renewBefore counting back from the 90 day expiry time of the certificate: 90 days is 2160 hours, so the value 2136h above causes renewal roughly one day after the certificate is issued.

Force regeneration of the certificate

To force regeneration of the certificate you can simply delete the generated secret; it will be generated again. The same applies to the hosting key secret.
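
For example, with the names used in this post:

kubectl delete secret brakkee-org-tls-secret -n exposure
# the same works for the private key secret created by the issuer:
kubectl delete secret hosting-key-secret -n exposure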

Rate limits

If you run into a rate limit at Let’s Encrypt, note that the rate limit applies to the requested set of DNS alternative names (*.brakkee.org and brakkee.org in my case). The current rate limit is 5 per week for each unique combination of DNS alternative names. If you run into this limit, temporarily add another DNS alternative name such as x.y.brakkee.org to the list; this name does not fall under *.brakkee.org, so the combination of names is different and the limit no longer applies. It is of course always better to use the staging URL of Let’s Encrypt before resorting to tricks like this.
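
A sketch of this workaround in the Certificate resource; the extra name only serves to make the combination of names unique:

spec:
  dnsNames:
  - "*.brakkee.org"
  - "brakkee.org"
  - "x.y.brakkee.org"   # temporary extra name to avoid the rate limit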

Costs

To use the DnsMadeEasy APIs I had to upgrade my membership to a business membership which costs 75 USD per year. But at least I am not manually updating certificates and not paying a similar amount per domain.


Hosting services on Google Kubernetes Engine

This post explains how to host services on Google Kubernetes Engine; parts of this are applicable to regular (non-GKE) Kubernetes clusters as well. This post will cover:

  • allowing multiple services to be deployed in different namespaces
  • allowing multiple (sub)domains to be deployed
  • making a service available through a public and fixed IP accessible using a host name
  • making it accessible over HTTPS using a certificate
  • HTTP redirect to HTTPS

The treatment of the subject is a bit high level with focus on what resources and fields are relevant and on the way different Kubernetes resources relate to each other. Also some snippets of YAML configuration are given. This should be sufficient to ‘roll your own’.

Deploying services in different namespaces

The basic architecture we start off with is as follows:

PlantUML source of the diagram:

hide circle
class Ingress {
  namespace
}
class Service {
  namespace
}
class Deployment {
  namespace
}

Ingress --> "*" Service
Service --> "1" Deployment

In this basic setup, there is one Ingress object in a certain namespace, and each Service that it exposes is in a separate namespace together with its deployment. There is a limitation in Ingress that does not allow it to be used when Services are in a different namespace than the Ingress resource. This limitation is expected to be lifted in a future version of Kubernetes using the Gateway API.

One workaround is to deploy a separate Ingress in every Service namespace. However, this leads to other problems since separate IPs must then be used for each service.
A better solution is to use a separate ingress namespace in which to deploy a reverse proxy (in the example httpd is used, but nginx is of course also possible) and from there set up a reverse proxy using the cluster local DNS name for each service. Given that a service x is deployed in namespace y, it can be accessed using the DNS name x.y.svc.cluster.local.

On apache, this can be a configuration such as this:

ProxyPass /hello http://X.Y.svc.cluster.local/hello
ProxyPassReverse /hello http://X.Y.svc.cluster.local/hello

The architecture for two services, x in namespace y and x2 in namespace y2, then becomes:

PlantUML source of the diagram:

package "namespace: exposure" {
  object "expose:Ingress" as expose
  object "httpd:Service" as httpdservice
  object "httpd:Deployment" as httpddeploy
  object "httpd-config:ConfigMap" as httpdconfig
}
package "namespace: y" {
  object "x:Service" as xservice
  object "x:Deployment" as xdeploy
}
package "namespace: y2" {
  object "x2:Service" as x2service
  object "x2:Deployment" as x2deploy
}
expose --> httpdservice
httpdservice -> httpddeploy
httpddeploy "\nx.y.svc.cluster.local" --> xservice
httpddeploy -> httpdconfig
xservice -> xdeploy
httpddeploy "\nx2.y2.svc.cluster.local" --> x2service
x2service -> x2deploy

This architecture is quite flexible. It allows SSL termination using a single Ingress rule and provides complete flexibility in allocating services to namespaces.

The apache deployment is configured using a config map that contains the httpd.conf file in a ConfigMap resource. The ConfigMap is mounted into the pod as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
  namespace: exposure
  ...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: httpd
          image: httpd:2.4
          ...
          volumeMounts:
            - name: httpd-conf
              mountPath: /usr/local/apache2/conf/httpd.conf
              subPath: httpd.conf
      volumes:
        - name: httpd-conf
          configMap: 
            name: httpd-config

Note the use of the subPath configuration in the deployment spec to mount only a single file from the httpd-conf volume, which keeps all other files in the /usr/local/apache2/conf/ directory intact. The config map can be created from an existing httpd.conf file as follows:

kubectl create configmap --namespace exposure \
  httpd-config --from-file=httpd.conf

The httpd.conf used is an adaptation of the file that comes standard with the container. Since we are terminating SSL using Ingress, the httpd config is without any SSL configuration or reference to certificate files.

Exposing the service externally

To expose the httpd service externally (and thereby all services for which it is a reverse proxy), we need to introduce the FrontendConfig resource. The FrontendConfig is a custom resource that, as far as I know, is only available on GKE and is used to configure Ingress features. The SSL Policy is a GCP object that defines SSL features for the HTTPS connection.

PlantUML source of the diagram:

package "Kubernetes" {
  object "expose:Ingress" as expose
  object "frontendconfig:FrontendConfig" as frontend
}
package "GCP" {
  object "gke-ingress-ssl-policy:SSLPolicy" as policy
  object "example-ip:IpAddress" as ipaddress
  object "example-com:PreSharedCertificate" as certificate
}

expose -right-> frontend
frontend -down-> policy
expose -down-> ipaddress
expose -down-> certificate

Using this setup, we can configure:

  • the certificate used
  • HTTP to HTTPS redirect
  • the external IP address

It is easiest to look at the Ingress configuration first:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: httpd-ingress
  annotations:
    #kubernetes.io/ingress.allow-http: "false"
    kubernetes.io/ingress.global-static-ip-name: example-ip
    ingress.gcp.kubernetes.io/pre-shared-cert: example-com
    networking.gke.io/v1beta1.FrontendConfig: frontendconfig
  namespace: exposure
spec:
  # tls config not needed since a pre-shared certificate
  # is used. 
  #tls:
  #  - hosts:
  #      - example.com
  #    secretName: example-com-certificates

  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: httpd
                port:
                  number: 80

First of all, we see that all traffic is forwarded to httpd, which is fine since httpd distributes the traffic over the backend services; as far as Ingress is concerned, it just routes to httpd. Next, we see several other features.

The annotation kubernetes.io/ingress.allow-http can be used to disable http but this is commented out since we allow http and will do a redirect to https instead using the FrontendConfig.

Next, the annotation kubernetes.io/ingress.global-static-ip-name defines the name of a previously created static IP address in GCP which is the static IP address under which the services are exposed.

The annotation ingress.gcp.kubernetes.io/pre-shared-cert defines the name of the pre-shared certificate that is used.

A pre-shared certificate is a certificate registered at GCP. Based on the domain name, GCP automatically chooses the certificate. This means that explicit configuration of the certificate using the tls section is not needed and this part is therefore commented out. It also allows the same certificate to be used by multiple GKE clusters and is thus lower maintenance than using the alternative setup by defining a secret per GKE cluster for the certificates.

To create the pre-shared certificate, concatenate the crt files containing the certificate and the certificate chain into a single file as follows:

  cat example.com.crt example.com.2022.crt > full.crt

Next, create the pre-shared certificate from this file and the private key file:

gcloud compute ssl-certificates create example-com \
  --project example-project \
  --global \
  --certificate=full.crt \
  --private-key=example_2022.key

The final part is the networking.gke.io/v1beta1.FrontendConfig annotation which links the ingress resource to the frontend config.

The FrontendConfig, finally, is as follows:

apiVersion: networking.gke.io/v1beta1
kind: FrontendConfig
metadata:
  name: frontendconfig
  namespace: exposure
spec:
  redirectToHttps:
    enabled: true
  sslPolicy: gke-ingress-ssl-policy

In this configuration we see that the redirect from http to https is configured. Also, there is a reference to an SSL policy, which defines SSL features and can be created, for instance, as follows:

gcloud compute ssl-policies create gke-ingress-ssl-policy \
    --profile MODERN \
    --min-tls-version 1.2

Optimal payment scheme for a Dutch Bank Savings Mortgage, using a Mixed Integer Linear Program

There is a special type of mortgage in the Netherlands (Bank Savings Mortgage or bankspaarhypotheek in Dutch) which, I presume, does not exist anywhere else in the world. The construct is that you have a mortgage on your house for a certain amount, say 200000 €, and split it up into two parts:

  • one part with a loan (in this case 200000), where you pay only interest
  • a second part which is a savings account.

Usually both of these accounts have the same interest rate and you pay a fixed amount per year so that in the end the savings account can be used to pay off the loan. For the bank, this construct is exactly identical to having just one account with a loan that is paid off like an annuity mortgage with a fixed amount per month.

The difference is in the way this type of mortgage is taxed. On the loan part, the interest is tax deductible, and the savings account is not taxed at all. Therefore, in comparison with an annuity mortgage, where the loan decreases over time (the two are equivalent before taxes), there is more tax deduction. For example, assuming a 3% interest rate and a 40% marginal tax rate, once an annuity loan has been paid down to 150000 € the yearly deduction is 0.4 × 0.03 × 150000 = 1800 €, whereas with the bank savings construct it remains 0.4 × 0.03 × 200000 = 2400 €. And this is what makes it an attractive type of mortgage.

With this type of mortgage it is possible to pay off parts of the loan or to add extra money to the savings account. When given the choice, adding a given amount to the savings account is a lot more attractive than using it to pay off the loan. Even though it is equivalent for the bank, putting it into the savings account leads to a higher tax deduction.

But of course, the government has put forward some rules to prevent people from profiting too much, and these make it challenging to determine an optimal scheme to pay off the mortgage such that total payments are minimized. For this purpose, this post examines a simple Mixed Integer Linear Program (MILP). Continue reading


Setting up a deep learning box

After doing a number of courses on machine learning I now have some overview of what is available and how it all works. So now it is getting time to start doing some work from start to finish myself. To do some of the more interesting things you definitely need access to a system with a good GPU and the systems I have at home are not really suitable for this:

  • My work laptop has an Nvidia GT Quadro 1000M which has compute capability 2.1
  • My private laptop has a GPU (Nvidia GT 330M) which has compute capability 1.2
  • My server which does not have a GPU and so has compute capability 0

On the other hand popular frameworks like Tensorflow require, as of this writing, compute capability 3.0. This effectively rules out the use of my private and work laptops.

As alternatives, I considered starting off in the cloud using Google or Amazon GPU offerings. But the workflow there is always to first set up some work at home and then do the same in the cloud. Also, costs can add up quite quickly if you go that way. Another alternative is to get a new laptop or a new PC with a fast GPU. That seems nice since it also opens up opportunities for gaming, but then I am not really a gamer, and it also feels like a bit of a shame to get another laptop/PC when my current one is still working fine (a Sony Vaio F11 laptop with a 1.6GHz CPU). My current laptop is running linux most of the time and really still performs quite well.

Then I started looking at another possibility, which is to add a GPU to my server. In fact, this turns out to be possible since my server has a free PCIe 2.0 x16 slot. Looking on the internet, it seems that PCIe 3.0 cards should work without problems in PCIe 2.0 slots, so that makes it possible. Also, there is a way to set up a VM on KVM so that GPU accelerated computing can be used inside a VM, see for instance server world. This is preferred over running natively. The idea will be to do a lot of (long running) experiments locally and, if I really want to do something big, ‘rent’ some capacity in the cloud.

To do all this, I first had to upgrade my server, which was running Centos 6 on the virtual host, to Centos 7. Well, “upgrade” is a big word: it involved installing Centos 7 side by side with Centos 6 and getting everything to work again. Now that part is done. The next step is to get a nice graphics card (e.g. Nvidia GTX 1080 or 1070) and set that up in the server. This will be interesting.


Why finalizers are really bad

It is more or less common knowledge that using finalize functions in java is bad. For one, you are depending on garbage collection for cleanup and there is no guarantee when the finalizer will be called. Furthermore, there is no guarantee it will ever get called, even if the application terminates nicely using System.exit().

However, there is a far more important reason why finalizers are bad. Continue reading


Encrypting an existing Centos install (2)

In a previous post, I described how to encrypt an existing Centos install. That approach was based on finding out how LUKS worked and then creating an encrypted storage logical volume, with logical volumes on top of that to contain the original data. The main disadvantage of that approach was that it was not possible to encrypt the root partition, thus still potentially leaking confidential data.

Therefore, I looked at how a standard fully encrypted Centos install works, and basically it is quite simple. The basic setup of an encrypted Centos install is a simple partitioning scheme with one small physical partition (e.g. /dev/sda1) with /boot on it (typically using ext4), and a second partition /dev/sda2 which is encrypted. On top of the encrypted /dev/sda2 device (e.g. /dev/mapper/luks), the previous logical volumes are created. This approach requires no power management hacks nor special mount options in /etc/fstab.
Continue reading


Encrypting an existing Centos install

Edit: Meanwhile I have found a better way to migrate an existing unencrypted Centos install to a fully encrypted install with /boot as the only unencrypted disk space. That solution is much preferred over the one described in this post. The new approach is here.

Inspired by yet another incident in the news of a laptop with sensitive information getting stolen, you start imagining what would happen if someone got hold of your laptop: how much sensitive data would be on it and what would the consequences be? A small investigation revealed that the consequences could be quite big. There are various personal documents stored, version control systems and IDEs with insecure password storage, and of course various browser history files and cookies. That made me a bit nervous.

Therefore, I set out to investigate how to make this laptop more secure. The setup is an extension of setups I found on the internet, where typically LUKS over LVM or LVM over LUKS is used. The current setup will in effect be LVM over LUKS over LVM.
Continue reading
