In April 2024 I successfully passed the CKS certification exam, and compared to the Certified Kubernetes Application Developer and Certified Kubernetes Administrator it was the toughest one yet. Not because the questions are particularly hard: they were all in closed form, I guess so that automatic grading is possible. The main difficulties are that:
- there are many new topics such as runtime security with Falco, Seccomp, Apparmor, and gvisor, as well as security scanning tools such as trivy and kube-bench.
- the time pressure during the exam is high. I experienced that first hand by not being able to finish one of the questions. To go fast you need to prepare things very well. In particular, I set up my own single-node kubernetes cluster using Vagrant so I could test out many topics on a standard kubeadm setup.
- the proctoring during the exam was the worst experience yet. The proctor took more than 30 minutes to release the exam so I could start, and interrupted me twice for, as far as I could tell, no good reason, which badly disrupted my flow.
Why I took the exam
I already knew that, as a result of studying for the CKAD and CKA exams, I had become much quicker at making optimal use of the command line and the online docs, which to this day helps me get things done more quickly. Also, if something needs to be configured, like roles, role bindings, and service accounts, it just seems easy now. In a way, it shifts boundaries: having a good overview of what kubernetes offers allows me to make better design decisions, and things that seemed daunting before have become easy.
With the CKA in particular I got a much better understanding of how the different components of kubernetes work together. As a result it became a lot easier to rescue my home cluster in case of problems, and I became a lot more confident and successful in fixing things. For both CKAD and CKA there was time pressure, but not as much as for CKS. With the CKA for instance, I was finished in 1.5 hours, leaving 30 minutes for troubleshooting and fixing questions where I had doubts.
After finishing the CKA exam I was so happy that I immediately bought the CKS exam, especially after getting a huge discount that brought it down to just 150 USD. Then nothing happened: I did not study for it at all, until I met some people at a CNCF kubernetes day in December and talked about the CKS. That reminded me to take the exam, so over the course of January I slowly started studying for it. The kodekloud course wasn't that good in my opinion, so I watched another course on youtube to get a more complete picture. After that came a lot of practice using questions from kodekloud, the killer shell exam preparation, and some exercises I defined myself. All in all, a lot of preparation went into this.
Also, the CKS exam was my goal from the start because of the subjects it covers, and the CKA is a prerequisite for the CKS. I am happy I have now achieved this goal.
Tips and tricks
Below I will list my own tips and tricks. Many of them are about validation, plus some essential checks so you don't get blocked right at the beginning. There are also some speed tips that can be really useful. The tips and tricks focus on the tools, not on the questions you may get on the exam.
Basic approach
My standard settings in .bashrc were:
export DRYRUN="--dry-run=client -o yaml"
alias kls='kubectl config get-contexts'
alias kns='kubectl config set-context --current --namespace'
alias kctx='kubectl config use-context'
I used the kns alias a lot in particular.
I created a separate directory for every question, named q1, q2, etc. If there was a question I needed to get back to, I simply touched a file such as ~/q1.checkfinalresult (or something more specific). Make sure to back up input files so you can always go back if needed.
Be really quick in the use of the command line. Use kubectl as much as possible. Make use of kubectl api-resources and kubectl explain where needed. This is always faster than the online docs.
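For example, a quick way to recall the seccomp field names without leaving the terminal (the field paths below are just an illustration):
kubectl explain pod.spec.securityContext.seccompProfile
kubectl explain pod.spec.containers.securityContext --recursive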
Copy snippets from the kubernetes documentation website to a local templates directory so you can reuse them. This is faster than looking them up a second time.
Use the killer shell practice exams provided with the CKS
Use these exams to get to know the exam environment. In particular, cut and paste is important. In my case it meant selecting text in the question with the left mouse button (weird and unnatural), then pasting in a terminal with right-click paste. Copying and pasting from firefox running in the virtual desktop works like standard linux, using the middle mouse button to paste. Also, it cannot hurt to use firefox while preparing, so that you are used to it at the exam.
Also check how you can reduce the font size because the default size is too big.
Use yamllint
Use yamllint to check yaml after modifying files, especially in /etc/kubernetes/manifests. Install it using apt install yamllint -y. All yamllint errors about spaces can be ignored, but yamllint reporting duplicate keys is a real problem, since the second key silently overrides the first. I am sure this cost me some points at the CKA and CKAD exams.
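For example, a fragment like the following (image names are just illustrations) is what yamllint flags as a duplicate key, and the second image value silently wins when the file is parsed:
  containers:
  - name: app
    image: nginx:1.25
    image: nginx:1.19
Run it on the file you just edited, for instance yamllint /etc/kubernetes/manifests/kube-apiserver.yaml.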
Quick restarts
Use kubectl delete pod <pod> --now or kubectl delete pod <pod> --force --grace-period=0 to quickly delete a pod. No sense waiting too long.
To restart the apiserver after a config change, you can either wait until it is restarted automatically or kill the apiserver process from the master and do a systemctl restart kubelet. Guess which one is (a lot) faster?
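A rough sketch of the fast route on the control-plane node (exact container and process names can differ slightly):
crictl ps | grep kube-apiserver
pkill -f kube-apiserver
systemctl restart kubelet
crictl ps | grep kube-apiserver
The second crictl ps is just to confirm that a new apiserver container has come up.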
Investigating container processes on the host
To find out the pod name of a container process from the host, use either
nsenter -t <pid> -u hostname
or
cat /proc/<pid>/environ | strings | grep HOSTNAME
This identifies the hostname which is (usually) identical to the pod name.
Given a container, use
crictl inspect <containerid> | grep pid
to identify the pid of the main container process on the host. This is the first pid in the output.
The entire file system as a process sees it is available at /proc/<pid>/root. This can be used to quickly check whether a given volume mount is already working, i.e. whether my config file is already visible at the correct location for the process that needs it.
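For example, assuming the main container process has pid 12345 and the config file should end up at /etc/app/config.yaml (both hypothetical):
ls -l /proc/12345/root/etc/app/config.yaml
cat /proc/12345/root/etc/app/config.yaml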
Apiserver troubleshooting. Use crictl ps -a | grep apiserver to get the container id of the failed process. Use crictl logs <containerid> to get the logs of the failed startup.
Falco
The first thing to find out with falco is how it is running. In most cases it will be running as a systemd service named falco, at least in all the courses I have seen. However, installing falco yourself on a single-node cluster reveals that there are many ways to run falco using different services.
Tip 1: Identify what systemd service falco is using.
systemctl list-unit-files | grep falco
And identify the command line used to run falco.
This will identify the falco service that is actually running. Now use systemctl status falco-bpf (or whatever service was used) to find the path to the service file. From that service file, get the command used to run falco, which will be useful later.
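As a sketch, assuming the unit found above is falco-bpf:
systemctl status falco-bpf | head -n 5
systemctl cat falco-bpf | grep ExecStart
The status output shows the path to the unit file, and the ExecStart line contains the command used to run falco.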
Tip 2: If you adapt rules use falco -V rulesfile.yaml to validate rules.
Here, the rules file can be any of the rules files in /etc/falco or /etc/falco/rules.d. Systemd somehow does not show errors at falco startup in a consistent way.
Tip 3: If you are asked to quickly identify containers, pods, or kubernetes namespaces add the -pk flag to the falco startup.
This allows you to use %container.info in output formatting, which prints a lot of information about a container, including the kubernetes namespace and pod.
Tip 4: Use falco --list and don't use the online documentation at falco.org/docs
This is easy, and using the command line makes it fast. Also remember some of the important categories such as evt, proc, and k8s. Use falco --list | grep '^proc' for instance to see all process formatting options.
Tip 5: When you need output in a certain format, add -p ":RULE %evt.time,%proc.cmdline,..." to the options
The -p option appends the given output to every rule's output. This is fast since it avoids editing rule files. The best approach for production would be to identify the rules that require modification, copy them into falco_rules.local.yaml, and then edit their output fields, but that is slow. An advantage of prefixing the additional output with ":RULE " is that it allows you to quickly filter out the existing rule text when finally saving the required output to a file. With tip 5, tip 2 is of course no longer needed.
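A sketch of such an invocation, assuming the default config file location and some typical fields (adapt the field list to what the question asks for):
falco -c /etc/falco/falco.yaml -pk -p ':RULE %evt.time,%k8s.ns.name,%k8s.pod.name,%proc.cmdline'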
Tip 6: Run falco in the foreground instead of as a service.
Stopping the falco service and running it by hand, based on the command line identified in tip 1 and extended as in tip 5, has many advantages: troubleshooting is quick since errors are logged to the terminal together with the rule output, and it becomes easy to just let it run for a given amount of time, after which you copy/paste the output into a file and filter it to remove the original rule text. Note that you can also script running falco for some time, but then output buffering can be an issue and you need stdbuf -oL to force line buffering.
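Putting tips 5 and 6 together, a minimal sketch, assuming the command line from tip 1 was /usr/bin/falco -c /etc/falco/falco.yaml and the service is falco-bpf (duration and file names are examples):
systemctl stop falco-bpf
timeout 120 stdbuf -oL /usr/bin/falco -c /etc/falco/falco.yaml -pk -p ':RULE %evt.time,%proc.cmdline' > /tmp/falco.out
grep -o ':RULE .*' /tmp/falco.out | sed 's/^:RULE //' > answer.txt
The grep drops the original rule text and the sed strips the marker, leaving only the fields you added.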
All in all, these tips can save you a lot of time with this task. I went from 18 minutes down to around 5 when using tips 5 and 6.
Seccomp
Seccomp is relatively easy. The syntax for seccomp in a pod yaml is simple, just memorize it. Also memorize the base path of the kubelet which is /var/lib/kubelet/seccomp.
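For reference, a minimal pod sketch with a Localhost profile (the profile file name is an example and is resolved relative to /var/lib/kubelet/seccomp on the node):
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-demo
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/audit.json
  containers:
  - name: app
    image: nginx
The same seccompProfile block can also be set per container in the container's securityContext.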
In addition, verify that seccomp is being used by a process using grep -i seccomp /proc/<pid>/status. The Seccomp field shows 2 (filter mode) when a process is configured with a specific json profile.
Even better, use crictl inspect <container> | jq '.. | objects | .seccomp // empty' to show the actual json profile in use by the container. Or use crictl inspect <container> | jq '.. | objects | .seccomp' and ignore the null values. This provides a deeper validation than just using /proc/<pid>/status.
When looking at the logs issued by seccomp using journalctl -x | grep -i seccomp, map between system call codes and names using ausyscall.
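For example (syscall numbers are architecture dependent, the ones below assume x86_64):
ausyscall 59
ausyscall --dump | grep -w openat
The first prints execve, the second finds the number for openat.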
Apparmor
Remember: for Seccomp we use the securityContext to configure it, and for apparmor we use annotations. Apparmor is not that hard, so memorize the annotation. I memorized it in parts: container.apparmor.security, followed by beta.kubernetes.io, followed by /<CONTAINER>, with a value of either localhost/<PROFILE>, runtime/default, or unconfined. Note that <PROFILE> is the name of the profile as defined inside the apparmor file, not the name of the file.
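Put together it looks like this, where app is the container name and k8s-deny-write is a made-up profile name:
metadata:
  annotations:
    container.apparmor.security.beta.kubernetes.io/app: localhost/k8s-deny-write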
To check your work use ps auxZ, or ps auxZ --forest for a process tree. The Z flag causes the apparmor profile to be listed. Also know your tools, such as apparmor_parser to load profiles and the various aa-* commands.
If all else fails, consult the documentation (at work I usually do it the other way around). Practice this a number of times in your own environment.
Image policy webhook
For the image policy webhook, test out the error behavior of kubernetes when you make mistakes in the configuration. That way you know that, when the apiserver comes back up, a number of things are already ok.
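For reference, the admission configuration that the apiserver loads via --admission-control-config-file looks roughly like this (paths and TTL values are examples; defaultAllow decides what happens when the webhook cannot be reached):
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: ImagePolicyWebhook
  configuration:
    imagePolicy:
      kubeConfigFile: /etc/kubernetes/admission/kubeconfig.yaml
      allowTTL: 50
      denyTTL: 50
      retryBackoff: 500
      defaultAllow: false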
Here is the error behavior I found in kubernetes 1.29 when the ImagePolicyWebhook is added to the enabled admission plugins:
- image policy webhook enabled but not configured: apiserver does not start and logs an error
- imagePolicy section absent in the admission config file: same
- no kube config file: same
- no URL in the kube config: same
- host not found in the kube config: kubectl error 'no such host' when defaultAllow is false, silent otherwise
- wrong URL to an existing host: kubectl error 'the server could not find the requested resource'
So what do you know when the apiserver runs? The final thing to check is whether the apiserver logs that the image policy webhook is enabled.
For troubleshooting, add the --v=8 flag to the apiserver. Then restart the apiserver and grep the logs (using crictl as before) with egrep -i 'imagepolicy|<HOST>', where <HOST> is the hostname specified in the kube config used by the webhook. You should see the admission review, the URL of the image policy webhook, and the admission review response in the logs.
Use your own single-node kubeadm cluster with vagrant
Using vagrant it is easy to set up a cluster. A single-node kubeadm cluster allows you to try out anything you want.
See here for the vagrant setup. I used this mostly on linux with libvirt but also on windows with virtualbox. I used vagrant snapshot save base to create a snapshot after the cluster is running and restore it using vagrant snapshot restore base.
The setup uses ubuntu 20.04, similar to the exam, but I also used it with debian 12.
Final thoughts
I hope these tips will help someone during the exam. Because of the time pressure, I recommend doing validation only if it can be done quickly, and otherwise moving on to the next question. This is particularly an issue with network policies, which can require more time to validate depending on the circumstances. If you are confident, move on and validate later. Some people recommend doing the questions with the most points first, which can also work, but I opted for just doing them one by one, since figuring out which questions to do first also takes time.
The exam experience itself was horrible: the long intake procedure, the interruptions by the proctor during the exam, and the time pressure made this definitely the worst exam experience yet. I am not going to renew these kubernetes certifications in the future, since I am working full time in this area now, the renewal costs are just as high as the initial certification, and the exam experience was so bad.
However, I still think doing all these certifications was very useful and I learned a lot preparing for them. In particular, I think I have become more security aware as a result of the CKS, and I have a better overview of the types of security measures that can be taken. It will definitely help me in the future.