Setting up a deep learning box

After doing a number of courses on machine learning I now have some overview of what is available and how it all works. So now it is getting time to start doing some work from start to finish myself. To do some of the more interesting things you definitely need access to a system with a good GPU and the systems I have at home are not really suitable for this:

  • My work laptop has an Nvidia Quadro 1000M, which has compute capability 2.1
  • My private laptop has an Nvidia GT 330M, which has compute capability 1.2
  • My server does not have a GPU at all, so it effectively has compute capability 0

On the other hand, popular frameworks like TensorFlow require, as of this writing, at least compute capability 3.0. This effectively rules out the use of my private and work laptops.

As alternatives, I considered starting off in the cloud using the Google or Amazon GPU offerings. But the workflow there is always to first set up some work at home and then do the same in the cloud. Also, costs can add up quite quickly if you go that way. Another alternative is to get a new laptop or a new PC with a fast GPU. That seems nice since it opens up some more opportunities for gaming as well, but I am not really a gamer, and it also feels like a bit of a shame to get another laptop/PC when my current one is still working fine (a Sony Vaio F11 laptop with a 1.6GHz CPU). My current laptop is running Linux most of the time and really still performs quite well.

Then I started looking at another possibility, which is to add a GPU to my server. This turns out to be possible since my server has a free PCIe 2.0 x16 slot. Looking on the internet, it seems that PCIe 3.0 cards should work without problems in a PCIe 2.0 slot. Also, there is a way to set up a VM on KVM so that GPU accelerated computing can be used inside the VM, see for instance server world. I prefer this over running natively on the host. The idea is to do a lot of (long running) experiments locally and, if I really want to do something big, ‘rent’ some capacity in the cloud.

To do all this, I first had to upgrade my server, which was running Centos 6 on the virtual host, to Centos 7. Well, “upgrade” is a big word. It involved installing Centos 7 side by side with Centos 6 and getting everything to work again. That part is now done. The next step is to get a nice graphics card (e.g. Nvidia GTX 1080 or 1070) and set that up in the server. This will be interesting.

Posted in Fun, Server/LAN, Software | Leave a comment

Why finalizers are really bad

It is more or less common knowledge that using finalize methods in Java is bad. For one, you are depending on garbage collection for cleanup and there is no guarantee when the finalizer will be called. Further, there is no guarantee that it will ever get called, even if the application terminates nicely using System.exit().

However there is a far more important reason why finalizers are bad. Continue reading

Posted in Uncategorized | Leave a comment

Encrypting an existing Centos install (2)

In a previous post, I described how to encrypt an existing Centos install. That approach was based on finding out how LUKS worked and then creating an encrypted storage logical volume, with further logical volumes on top of it to contain the original data. The main disadvantage of that approach was that it was not possible to encrypt the root partition, so confidential data could still leak.

Therefore, I looked at how a standard fully encrypted Centos install works, and basically that is quite simple. The basic setup of an encrypted Centos install is a simple partitioning scheme with one small physical partition (e.g. /dev/sda1) with /boot on it (typically using ext4), and a second partition /dev/sda2 which is encrypted. The previous logical volumes are then built on top of the decrypted device for /dev/sda2 (e.g. /dev/mapper/luks). This approach requires neither power management hacks nor special mount options in /etc/fstab.
Continue reading

Posted in Uncategorized | Leave a comment

Encrypting an existing Centos install

Edit: Meanwhile I have found a better way to migrate an existing unencrypted Centos install to a fully encrypted install with /boot as the only unencrypted disk space. That solution is much preferred over the one described in this post. The new approach is here.

Inspired by yet another incident in the news of a laptop with sensitive information getting stolen, you start imagining what would happen if someone got hold of your laptop. How much sensitive data would be on it and what would the consequences be? A small investigation revealed that the consequences could be quite big. There are various personal documents stored on it, version control systems and IDEs with insecure password storage, and of course various browser history files and cookies. That made me a bit nervous.

Therefore, I set out to investigate how to make this laptop more secure. The setup is an extension of setups I found on the internet where typically LUKS over LVM or LVM over LUKS is used. The current setup will in effect be LVM over LUKS over LVM.
Continue reading

Posted in Java, Server/LAN, Software | 1 Comment

Creating a USB install for Centos 6.4

The days of rotating disks for storing information and in particular for installing OSes are nearing their end. Why rely on something with rotating parts for storing data in the 21st century? Unfortunately, not every software vendor has caught up with this, so in some cases special measures must be taken for installing an OS from a USB disk. One example of this is Centos/RHEL, which does not come with a USB install by default. There is a procedure from Red Hat that can be used, but that procedure is limited to starting an installation when you already have the installation media available somewhere (e.g. on a hard drive).

One common method to create such a USB install is to use the livecd-iso-to-disk script. Unfortunately, that did not work for me, even though I tried it many times. After reading the interesting discussion on unix.stackexchange.com, I gave it another shot and this time it worked.

What I did was the following on a laptop running Centos 6.4:

  • Insert the USB stick: Find out the device name (e.g. using dmesg). Make sure the stick is unmounted as it could be automounted.
  • Partitioning: Make sure the disk is partitioned to contain one single primary partition (e.g. /dev/sdb1) using for example cfdisk. For now I will assume that /dev/sdb is the USB stick. Make sure to substitute the correct device in the next instructions.
  • File system: Create an ext3 filesystem on /dev/sdb1
    mkfs.ext3 /dev/sdb1

    I did not try ext2 and ext4 but these could also work. You can also optionally do a

    tune2fs -m0 /dev/sdb1

    to increase the available space by removing the blocks that are reserved for the superuser (these are not needed on an install stick anyway).

  • Install livecd tools: Install using yum:
    yum install livecd-tools
  • Transfer the ISO to the USB stick: Transfer disk 1 of the Centos 6.4 installation to the USB stick:
    livecd-iso-to-disk  CentOS-6.4-x86_64-bin-DVD1.iso  /dev/sdb1

    Note that it is important to specify /dev/sdb1 here and not /dev/sdb.
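The steps above can be collected into a small dry-run script. This is only a sketch under assumptions: the function name usb_install_plan is hypothetical (it is not part of livecd-tools), the partition check is a simple heuristic, and the script prints the commands instead of running them so they can be reviewed before anything touches a disk.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands from the steps above, runs nothing.
# usb_install_plan is a hypothetical helper name, not part of livecd-tools.
usb_install_plan() {
    part="$1"   # partition on the USB stick, e.g. /dev/sdb1
    iso="$2"    # path to the DVD1 ISO image

    # Simple heuristic: a name without a trailing digit (e.g. /dev/sdb)
    # is treated as a whole disk and refused.
    case "$part" in
        *[0-9]) ;;
        *) echo "error: $part looks like a whole disk, not a partition" >&2
           return 1 ;;
    esac

    echo "umount $part"                   # in case the stick was automounted
    echo "mkfs.ext3 $part"
    echo "tune2fs -m0 $part"              # optional: free the reserved blocks
    echo "yum install livecd-tools"
    echo "livecd-iso-to-disk $iso $part"
}

usb_install_plan /dev/sdb1 CentOS-6.4-x86_64-bin-DVD1.iso
```

Printing the commands first is a deliberate choice here: a wrong device name in mkfs.ext3 destroys data, so it is worth reading the plan before executing it by hand.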

     

Testing

After this step, the USB stick can be tested locally using qemu-kvm.

To simply verify that the USB stick is found and the boot menu is recognized, boot up a virtual machine with only the USB disk:

/usr/libexec/qemu-kvm -hda /dev/sdb -m 256 -vga std

And use a VNC viewer (e.g. vncviewer from tigervnc) to view the VM. This should show a boot menu and should allow you to start the installation until the point that the installation procedure cannot continue anymore.

If you want to test a full installation, create a disk using logical volume management

lvcreate -L 10g -n bladibla vg_mylaptop

where vg_mylaptop is a volume group where you have at least 10GB of space left, and start qemu-kvm with the created logical volume as disk hdb and give it a bit more memory:

/usr/libexec/qemu-kvm -boot c -hda /dev/sdb -hdb /dev/vg_mylaptop/bladibla -m 2048 -vga std

After the install is completed, start the VM again without the USB stick

/usr/libexec/qemu-kvm -boot c  -hda /dev/vg_mylaptop/bladibla -m 2048 -vga std

The VM should now start up successfully. The USB boot stick is also recognized natively by my laptop, and it looks like I can install a full OS there as well (at least the upgrade, which did nothing of course in my case, worked completely).

Disclaimer: As mentioned in the discussion at the link above, the whole procedure might give different results based on the USB stick you might use. I tested this procedure on a Dell Latitude M4700 laptop using a Kingston GT160 8GB memory stick.

Posted in Server/LAN, Software | 4 Comments

Java from the trenches: improving reliability

Java and the JVM are great things. In contrast to writing native code, making a mistake in your Java code will not (or should not) crash the virtual machine. However, in my new position working for a SaaS company I have been closer to production systems than ever before, and in the short time I have been there I have already gained a lot of experience with the JVM. Crashes and hangs do occur, but there is something we can do about them. These experiences are based on running Java 1.6 update 29 on Centos 6.2 and RHEL 6 as well as Windows Server 2003.

Java Service Wrapper

To start off with, I would like to recommend the Java Service Wrapper. This is a great little piece of software which allows you to run Java as a service, with both a Windows and a Linux implementation. The service wrapper monitors your Java process and restarts it when it crashes, or restarts it explicitly when it appears hung. The documentation is excellent and it works as advertised. It has given us no problems at all apart from tweaking the timeout after which a Java process is considered hung.

The service wrapper writes its own log file, but we found that it also contained every log statement written by the application. The cause of this turned out to be the ConsoleHandler of java.util.logging, which was still enabled. This problem was easily solved by setting the handlers property empty in jre/lib/logging.properties

handlers=
#handlers= java.util.logging.ConsoleHandler

This also solved a performance problem whereby, due to a bug in the application, excessive logging was being done and the Java Service Wrapper simply could not keep up anymore.

With a default JRE logging configuration, the logging output can also be disabled by setting the following properties in the wrapper.conf file:

wrapper.syslog.loglevel=NONE
wrapper.console.loglevel=NONE
wrapper.logfile.loglevel=STATUS
wrapper.java.command.loglevel=STATUS

Of course, with the console logging turned off, it should be possible to remove the wrapper.console.loglevel setting (not tried yet).

Garbage collection

Since we would like to achieve low response times and minimize server freezes due to garbage collection, we settled on the CMS (Concurrent Mark Sweep) garbage collector.

Using the CMS collector we found one important issue: on Windows the server would run perfectly, but on Linux it would become unresponsive after just a couple of hours of traffic. The cause was quickly found to be permgen space. It turns out that garbage collection behavior on Windows differed from Linux. In particular, garbage collection of the permgen space was being done on Windows but not on Linux. After hours and hours of searching, we found this option that fixed this behavior:

-XX:+CMSClassUnloadingEnabled

The full list of options we use for garbage collection is now as follows:

-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+CMSClassUnloadingEnabled
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-verbose:gc
-Xloggc:/var/log/gc.log

The last four options are for garbage collection logging which is useful for troubleshooting potential garbage collection issues after the fact.

One of the issues with the above configuration is that upon restart of the JVM, the garbage collection log file is overwritten instead of being appended to, thereby losing information when the JVM crashes. This problem can be worked around by using a ‘tail -F gc.log > gc.log.all’ command, but this solution is not nice as it will create very large log files. A better solution would be for the JVM to cooperate with standard facilities on Linux such as logrotate. Similar to how, for instance, Apache handles logging, the JVM could simply close the gc.log file when it receives a signal and then reopen it again. That would be sufficient for logrotate to work. Unfortunately, this is not yet implemented in the JVM as far as I can tell.
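As an alternative to the tail workaround, each JVM start could be given its own log file from the startup script, so that a restart never overwrites the previous log. The following is a minimal sketch under assumptions rather than a tested setup: it assumes the JVM is launched from a shell script, and the function name gc_log_opts, the directory and the file name pattern are illustrative choices.

```shell
#!/bin/sh
# Sketch: build the GC logging options with a per-start log file name so
# that a JVM restart does not overwrite the previous log.
# gc_log_opts and the file name pattern are illustrative assumptions.
gc_log_opts() {
    log_dir="${1:-/var/log}"
    stamp="$(date +%Y%m%d-%H%M%S)"   # one timestamp per JVM start
    echo "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:${log_dir}/gc-${stamp}.log"
}

# Print the options; a start script could pass them on as: java $(gc_log_opts) ...
gc_log_opts /var/log
```

Old log files still have to be cleaned up separately, for instance by age.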

Crashes in libzip.so or zip.dll

It turns out that this problem can occur when a zip file is being overwritten while it is being read. The causes of this could be in the application of course, but still the JVM should not crash on this. It appears to be a known issue which was fixed in 6u21-rev-b09, but the solution for this is disabled by default.

If you set the system property

-Dsun.zip.disableMemoryMapping=true

then memory mapped IO will no longer be used for zip files, which solves this issue. This system property only works on Linux and Solaris, and not on Windows. Luckily a colleague found this solution. It is very difficult to find this setting on the internet, which is full of stories about crashes in the zip library, even if you know what you are looking for.

Crashes in networking libraries/general native code

Another issue we ran into was occasional crashes, mostly in the native code of networking libraries. This also appears to be a known issue with 64 bit JVMs. The cause is that there is insufficient stack space left for native code to execute.

How it works is as follows. First of all, the java virtual machine uses a fixed size for the stack of a thread. This size can be specified with the -Xss option if needed. While executing java code, the JVM can figure out whether there is enough space to execute the call and throw a StackOverflowError if there’s not. However, with native code, the JVM cannot do that so in that case it checks whether a minimum space is left for the native code. The minimum space is configured using the StackShadowPages option. It turns out that by default, this space is configured too low on older 64 bit JVMs, causing crashes in for instance socket libraries (e.g. when database access is being done). See for instance here. In particular, on JDK 1.6 update 29, the default value is 6 and on JDK 1.7 update 5 it is 20.

Therefore, a good setting for this flag is 20:

-XX:StackShadowPages=20

The size of 1 page is 4096 bytes so increasing the stack shadow pages from 6 to 20 would mean that you need 56KB additional stack size. This page size can be verified by running java with a low stack size and passing different values for stack shadow pages like this:

erik@pelican> java -Xss128k -XX:StackShadowPages=19 -version
The stack size specified is too small, Specify at least 156

The stack size per thread may be important on memory constrained systems. For instance, with a stack size of 512KB, 1000 threads would consume about 500MB of memory. This may be important for smaller systems (especially 32 bit ones, if these are still around), but is no issue at all for a modern server.
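The numbers above can be checked with a bit of shell arithmetic. This is just a worked calculation; the function names extra_shadow_kb and total_stack_mb are made up for the example, and the 4096 byte page size is the one stated above.

```shell
#!/bin/sh
# Worked arithmetic for the stack numbers above, using shell integer math.
page_bytes=4096   # page size as stated in the text

# Extra stack per thread, in KB, when raising the shadow pages from $1 to $2.
extra_shadow_kb() {
    echo $(( ($2 - $1) * page_bytes / 1024 ))
}

# Total stack memory in MB for $1 threads with a stack size of $2 KB each.
total_stack_mb() {
    echo $(( $1 * $2 / 1024 ))
}

echo "extra stack per thread: $(extra_shadow_kb 6 20) KB"      # 56 KB
echo "1000 threads at 512KB:  $(total_stack_mb 1000 512) MB"   # 500 MB
```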

Debug JVM options

To find out what the final (internal) settings are for the JVM, execute:

java -XX:+PrintFlagsFinal <myadditionalflags> -version

Logging

If your environment still uses log4j for some reason then be aware that log4j synchronizes your entire application. We found an issue where an exception with a huge message string and stack trace was being logged. The toString() method of the exception in this case took about one minute during which time the entire application froze. To reduce these synchronization issues of log4j use AsyncAppender and specify a larger buffer size (128 is default) and set blocking to false. The async appender may have some overhead in single-threaded scenarios, but for a server application it is certainly recommended.

Posted in Java, Server/LAN, Software | 1 Comment

Why do developers write instead of reuse?

I am frequently amazed at the amount of software that is being written instead of simply looking around and reusing what’s already available. In practice I have seen a lot of reasons for this:

  • Our problems are unique: The misconception that “our problems are unique”. I really can’t recall how many times I have seen this but this is really occurring a lot.
  • Not looking for similar solutions: Simply forgetting to look for similar solutions on the internet to see what’s available (if only as an inspiration on how to best solve the problem). This is often also a side effect of thinking that this is a unique problem.
  • Underestimation of the problem: The misconception that it’s easy to write it yourself. In most cases, it is easy to come up with a first (half) working version that does approximately what you need. However, the work involved in making the same solution maintainable and giving it the correct feature set will make it much more expensive (the 80-20 rule).
  • Limited scope: A developer specialized in platform X (e.g. X = java) will typically only look for solutions in that area, whereas looking broader will reveal more solutions.
  • Coolness factor: It is cool to develop it yourself. Perhaps it involves an opportunity to do something cool with clustering or another chance to use one of your favorite frameworks. Perhaps you could use one of those cloud databases?
  • Overestimation of oneself: The idea that we can do something better in a few weeks’ time than what the industry or open source community has come up with using man years of development.
  • The desire for fame by writing reusable software: Paradoxically, the desire for reusable software can stimulate people to roll their own. The problem is that writing reusable software (or calling it reusable) provides you with fame (even if it’s only in your local department). The reality is however that reuse can only exist through the willingness of people to use other people’s software. If there is one developer writing a reusable piece of software and 20 others using it, then clearly the willingness to use others’ software has to far outweigh the desire to write it yourself.

I have seen these problems in companies of all sizes.

Posted in Uncategorized | 2 Comments

Moving countdown


Yes folks! The countdown timer has been started again. This time it is counting down to the time when the move really starts and the first boxes will be loaded onto a truck towards my new home.


Really looking forward to it…

Posted in Uncategorized | Leave a comment

Moving

The last time I moved to a different city was 13 years ago. Before that time I had been moving every two years or so. So when I finally settled in 1998, I decided that I was going to stay in one place for a much longer time. It is time now, however, to move again. I got a new job in a new location and it makes a lot of sense to move. For one, I will have a much better house (buying a house in the middle of the credit crunch), with a very nice garden, and it will reduce my traveling time to and from work considerably. Also, the environment is quite nice because my favorite mountainbiking locations are closer and there are also many more opportunities for mountainbiking close by.

One of the most important things when moving is of course…. my server. Of course, I am depending a lot on it. For one it is running my mail server and it also handles a number of mailing lists. It runs 4 web sites, and it is also my VCR (mythtv).

Therefore, it is important to me to minimize downtime of the server during the move. Luckily, I am already prepared for this since I am running the server as a virtual machine already. So as part of the move I will run this virtual machine on my laptop, which gives me plenty of time to disassemble the server rack and set it all up again at my new location. In fact as I am writing this, I am already running the server from my laptop. It is easy for me to do this because my regular server backups are bootable, see here.

Because of this setup, I can minimize the total down time of my web sites to the order of minutes and keep mail down time to possibly less than one day in total (but no-one will notice that because mail servers retry sending mail).

Interestingly, I had quite a fight today to get things working again with my TVIX M-6500, which allows me to play movies hosted on the server (through NFS) on my TV. As it turns out, there are subtle issues with network bridges on Linux dropping UDP packets in some cases, see here. The TVIX uses UDP for NFS, which can give problems with bridged network interfaces on virtual machines in some cases. Luckily, I managed to solve this by replacing the virtio network model on the machine by device emulation of an RTL8139 chipset. Anyway, all is good now. The server VM is fully functional again and I can watch movies, send/receive mail and all my websites are up. The only thing I cannot do at this time is record, but ok, this is only for the next 10 days or so. On the 16th of February I hope to be able to start the server again at its new location.

Posted in Uncategorized | 1 Comment

Nested Logical Volume Management for VMs

As I blogged earlier, I have replaced the server setup that I originally had with a virtualized server setup. This introduces the concept of “hardware independent server” and makes it easy to run the server on any hardware without modification. More concretely, it allows me to run until the hardware fails. Previously I used to replace the server hardware before it really broke, but in this setup I can run it until it breaks. Should I have a serious hardware failure I can simply run the server(s) from any other hardware such as a laptop. This is because I have “bootable backups”. I.e. if the server breaks, I can either run a replacement server based on the same data or simply use a laptop and run the backup in a virtualized manner.

As part of the original migration from running native to virtualized I used the identical setup, which meant passing physical hardware partitions to the virtual machine. The virtual machine then used Linux Logical Volume Management based on these hardware partitions. For new virtual machines I used another approach, which was allocating a “disk” logical volume on the host, then partitioning this on the guest and using LVM again to manage storage within the guest. This in fact results in nested logical volume management and, as I have seen from one of the new virtual machines, it works like a charm. It provides a nice separation of concerns where the host simply assigns storage to guests and the guests decide how to use this storage.

However, there was still one virtual machine (the original hardware based server) that was still being passed physical disk partitions. This introduced the problem of both the host and virtual machine seeing the same logical volumes and thus the chances for administrative error and data corruption when multiple OSes would concurrently access the same logical volumes.

To remedy this, I used the following procedure:

  • Allocate a physical volume on the host and a “disk” logical volume on it big enough to contain all logical volumes from the VM
  • Stop the VM
  • Add this virtual disk to the VM.
  • Start the VM
  • Partition the new disk on the VM and extend existing volume groups to use physical partitions on this disk.
  • Use pvmove to move data to the disk and remove the old unused physical partitions from the volume groups afterwards.
  • Stop the VM
  • Remove old physical partitions from the VM, leaving only the new “disk” logical volume
  • Start the VM
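The guest-side part of this procedure (extending the volume group, moving the data and dropping the old partitions) can be sketched as a dry-run script. This is a sketch under assumptions, not the exact commands I ran: the function name guest_migration_plan and all device and volume group names are hypothetical, it handles a single volume group, and it only prints the LVM commands.

```shell
#!/bin/sh
# Dry-run sketch of the guest-side steps: prints LVM commands, runs nothing.
# guest_migration_plan and all device/volume group names are hypothetical.
guest_migration_plan() {
    vg="$1"; new_pv="$2"; shift 2   # remaining arguments: old physical partitions

    echo "pvcreate $new_pv"         # partition on the new virtual disk
    echo "vgextend $vg $new_pv"
    for old_pv in "$@"; do
        # Move all extents off the old partition, then drop it from the group.
        echo "pvmove $old_pv $new_pv"
        echo "vgreduce $vg $old_pv"
        echo "pvremove $old_pv"
    done
}

guest_migration_plan vg_server /dev/vdb1 /dev/sda5 /dev/sda6
```

Since a physical volume can only belong to one volume group, a guest with several volume groups would need one partition on the new disk per volume group and one such plan for each.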

In executing this procedure I ran into the basic problem that I did not have enough storage. To solve this I used a separate disk that was connected temporarily to the server. After executing this procedure, all physical storage on the RAID array that had held the existing logical volumes was unused, so I extended the “disk” logical volume on the host with the space from the RAID array. Then I again used pvmove to move the data from the temporary disk to the RAID array, and afterwards removed the unused physical volumes on the temporary disk from the volume group. Of course, this was all done while the virtual machine was up and running (no-one likes downtime).

The new setup reduces the chance of administrative error considerably and allows me to move storage for virtual machines to other locations without even having to shut down a virtual machine. It also nicely separates the allocation of storage to VMs on the host from how each VM uses its allocated storage.

Posted in Server/LAN | Leave a comment