Last week you might have read that I have been working on a new homelab, based on vSphere 6.7u1 including Horizon 7.7, NVIDIA vGPU, VMware vSAN, and 10 GbE, all running on hardware with low power consumption. The Xeon D architecture on Supermicro motherboards was the ideal match for my homelab requirements. There was one more challenge to overcome though: making it even quieter.
My lab has three hosts:
- Management host: Supermicro E300-8D
  - This host runs all of my management VMs including the vSAN witness.
- vSAN node 1: A Supermicro X10SDV-4C-TLN4F running a Tesla P4, and two flash devices.
  - Half of my VDI and Deep Learning workloads are running on this host.
- vSAN node 2: A Supermicro X10SDV-4C-TLN4F running a Tesla P4, and two flash devices.
  - The other half of my VDI and Deep Learning workloads are running on this host.
In my quest towards the least noise possible, I already replaced the two standard fans with three Noctua fans in the Supermicro E300-8D. With the fan mode in the IPMI set to Full Speed, the CPU stays nice and cool while the fans hardly make any noise. During heavy usage, the CPU stays at around 70 degrees Celsius, which is perfect. Xeon D CPUs are built to run in embedded systems without active cooling and have a temperature limit of 110 to 120 degrees, so that’s well above my own limit.
The challenge I hadn’t solved yet was finding the perfect airflow for my vGPU nodes. This is because of a couple of things:
- The nodes have three different fans, each with its own purpose:
  - One to keep the CPU cool
  - One to move air out of the case
  - One to pull air into the GPU
- In the Supermicro IPMI you can control the speed of the fans, but not individually. You can do that from the IPMI command line, but I wanted something dynamic.
- There are temperature sensors on the CPU, the 10GbE NICs and the motherboard, but none on the GPU that a fan speed controller could use.
- The Tesla P4 GPU doesn’t have an option to attach a fan.
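The command-line fan control mentioned in the list above could look roughly like the sketch below. It is not taken from my own setup notes: the raw bytes (`0x30 0x70 0x66 0x01 <zone> <duty>`) are the ones commonly reported by the community for Supermicro X10/X11 boards, they address fan zones rather than single fan headers, and Supermicro doesn’t document them, so treat them as an assumption and verify them on your own board. The function name `fan_duty_command` is hypothetical.

```python
def fan_duty_command(zone: int, duty_percent: int) -> list[str]:
    """Build an ipmitool call that sets one fan zone to a fixed duty cycle.

    ASSUMPTION: the raw bytes are community-reported for Supermicro
    X10/X11 boards and are not confirmed by this article.
    """
    if zone not in (0, 1):
        raise ValueError("zone must be 0 or 1")
    if not 0 <= duty_percent <= 100:
        raise ValueError("duty_percent must be between 0 and 100")
    return [
        "ipmitool", "raw",
        "0x30", "0x70", "0x66", "0x01",  # assumed "set fan duty" request
        f"0x{zone:02x}",                 # fan zone
        f"0x{duty_percent:02x}",         # duty cycle in percent, as hex
    ]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run() to send it.
    print(" ".join(fan_duty_command(zone=0, duty_percent=50)))
```

A command like this sets a fixed speed and leaves it there, which is exactly why it doesn’t give me the dynamic, temperature-driven behavior I was after.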
The best way to cool the GPU and reduce the noise (outside of peak usage) was to find a temperature-based fan controller that could also fit in the housing. Most controllers are built to fit in a 5.25 inch drive mount, and since the Supermicro CSE-721TQ-250B doesn’t have one, I had to find something else. A bit of googling led me to the following controller:
It seemed to be the perfect solution, and at just under $10, I ordered it.
The package contains the controller, the temperature probe, and an external speaker that warns you if a fan isn’t working properly. Since the 12 V power cable wasn’t included, I made one myself. After attaching everything, the controller looks like this:
I mounted the controller in the case with a bit of foam between the board and the case, and everything fits perfectly. I used a bit of sticky tape to attach the temperature probe to the GPU.
There are 5 DIP switches on the controller that allow you to set the thresholds. The GPU sits at about 45-50 degrees when it’s idle, so I set the lower threshold to 50 degrees. The high threshold is set to 70 degrees. This means that during moderate utilization, the fan runs at “medium” speed. As soon as the GPU is fully utilized and the temperature reaches the high threshold, the fan runs at full speed.
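The two-threshold behavior can be sketched in a few lines of Python. This is only an illustration of the logic, not the controller’s firmware: the name `fan_level` is hypothetical, and I’m assuming that the fan runs at a “low” speed below the lower threshold and that both thresholds are inclusive, neither of which I have verified on the board.

```python
LOW_THRESHOLD_C = 50   # lower threshold set with the DIP switches
HIGH_THRESHOLD_C = 70  # high threshold set with the DIP switches


def fan_level(temp_c: float,
              low: float = LOW_THRESHOLD_C,
              high: float = HIGH_THRESHOLD_C) -> str:
    """Map a probe temperature to a fan speed tier."""
    if temp_c >= high:
        return "full"    # GPU fully utilized
    if temp_c >= low:
        return "medium"  # moderate utilization
    return "low"         # ASSUMPTION: behavior below the lower threshold


if __name__ == "__main__":
    for temp in (45, 60, 70):
        print(temp, fan_level(temp))
```

With these settings an idle GPU at 45 degrees stays in the quiet tier, and the fan only goes to full speed once the probe reads 70 degrees or more.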
To fully utilize the GPU, I have a couple of tools that can be freely downloaded from the NVIDIA site, but my favorite is the big head:
This is a demo that both renders the images and uses CUDA cores to process the calculations. Directly after starting the demo, nvidia-smi shows the GPU utilization is at nearly 100%:
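If you would rather read the same numbers from a script, a minimal sketch could look like the one below. It assumes `nvidia-smi` is on the PATH of whatever machine runs the script; the helper names `parse_gpu_stats` and `read_gpu_stats` are hypothetical, and the sample line in the demo is made up for illustration, not a measurement from my lab.

```python
import subprocess

# Query utilization (%) and temperature (deg C) as plain CSV without header/units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text: str) -> list[tuple[int, int]]:
    """Turn nvidia-smi CSV output into (utilization, temperature) pairs, one per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, temp = (field.strip() for field in line.split(","))
        stats.append((int(util), int(temp)))
    return stats


def read_gpu_stats() -> list[tuple[int, int]]:
    """Run nvidia-smi and parse its output (requires the NVIDIA driver tools)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


if __name__ == "__main__":
    # Made-up sample line so the parser can be tried without a GPU.
    print(parse_gpu_stats("99, 71\n"))
```

Calling `read_gpu_stats()` in a loop while the demo runs is an easy way to watch the temperature approach the high threshold.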
At full speed the fan makes a bit of noise, which is noticeable only because at lower speeds you can’t hear it at all. This means I’m able to work next to the lab without having to wear noise-canceling headphones.