What you’ll learn:
- Why experts can still mess up something simple.
- How to track disk temperature using Centreon.
- Why documentation is important.
It was an interesting weekend that eventually ended successfully. We had a new air conditioning (AC) system put in that was working rather nicely. Making the switch over to heating highlighted an installation problem, though, which required a bit of debugging of the Honeywell RTH9580WF thermostat. Documentation helped a lot.
About the same time, I was upgrading my servers, including the Centreon monitoring system. Why would anyone want to use a high-end system to monitor a home network?
Bill’s Systems Aren’t Typical
Well, my network is a bit different than most. I have three physical servers running Rocky Linux (Fig. 1). The primary has five RAID groups with a total of 48 TB and a Broadcom MegaRAID controller, the secondary has four RAID groups with a total of 36 TB, and the small server has software RAID with 20 TB.
The main server runs KVM with 7 to 10 virtual servers depending on what I’m doing. This includes a Minecraft server for the grandkids, ZoneMinder for my security cameras, MythTV to record television shows, and a host of web servers for projects like the family’s Drupal/Webtrees genealogy server. Essentially, there are a lot of things going on that need to be tracked using Centreon and managed using Cockpit. Of course, I don’t let my Verizon router handle everything, so there’s another server running IPFire. I still lose data on occasion due to my own stupidity, but the rest tends to be safe and secure.
This level of complexity is also found in the home heating system. It’s actually oil-based, hot water and forced air. There are three hot-water zones plus the hot-water system for general use. The added level of complexity was due to additions where hot-water, baseboard heating is employed.
Fun stuff.
Diagnosing a Home Heating System
I was already in the midst of my bi-yearly upgrades when we turned on the heating system and it stayed on, and on, and on. It was set to 69 degrees, but when it hit 84, it was time to call for help. Besides, the AC system had been installed only a month or so before.
It turns out that the people who installed the AC system decided that setting the thermostat up to think we had a heat pump system was a good idea. Turns out it wasn’t.
One thing to keep in mind is that wiring up a heating-system thermostat should not be too difficult. It’s typically half a dozen wires with a heat and AC system (Fig. 2). Though the wire colors and connector labeling tend to be standard, you want to have the documentation on hand to make sure everything lines up, even when using a smart thermostat that walks you through multiple setup selections.
Of course, that’s where the initial AC installation went wrong. The thermostat should not have been configured as a heat pump because that setting changes how the heating system is powered. It sort of worked when they tested it after the AC installation, but once we switched over to heat full time, the heat stayed on.
We had the company come back to address the problem, and the technician this time around was more experienced. It took a while to track down the cause and correct it, and the thermostat manuals were invaluable in doing so. Luckily, no rewiring was needed, although one of the control units for hot-water control needed to be replaced as well. It would have been much harder to figure out without the documentation.
Getting Centreon to Play Nicely with Disk-Drive Temps
As noted, I have a few hard disks spinning all of the time. I’ve had issues in the past that I was lucky to catch when a fan stopped and disk-drive temps rose. It would be better to track the drive temperature, which turns out to be something you can do via Centreon with a little tweaking.
First, a little note about why I have so much hardware in my basement. I’m paranoid. I don’t like losing data, and I have decades of my past work and my family’s info. In the past, I replicated my two main servers using DRBD, but this turned out to be a bad idea with only two servers and no specialized hardware.
I’ve since gone with a simpler nightly rsync replication alongside a more conventional Bacula-based backup system. I also have cloud and local removable drives as backup targets, so there are usually four copies of most data.
Centreon is useful in making sure all components are working, not just whether rsync or Bacula is running.
So, my yearly or bi-yearly update typically moves all my Linux servers to the next version. You can usually jump two major versions of most Linux distributions, like Fedora and Ubuntu, without a problem. Unfortunately, the upgrade of the Centreon virtual machine ran into issues just as the heating system mess cropped up. The upgrade worked, but the import of the prior configuration caused the Centreon poller to fail.
Delving into the documentation didn’t help. It took a lot of debugging to figure out that importing the old poll settings was the problem. Likewise, there wasn’t an easy way to remove the polling details from the old settings. The trick was to export the imported, non-working configuration without the poll configuration, reinstall from scratch, and then use the newly exported configuration. This was much easier than trying to redefine the 90+ services that it’s tracking.
Once things were back up and running, I was able to add the disk-drive temperature support to Centreon (Fig. 3). This wound up being a mix of standard and custom configurations. Centreon has a nice tool to check commands that work with hddtemp. This is an open-source project that grabs ATA disk temperatures. Unfortunately, the two main servers utilize SAS controllers that hide the drives, so I had to resort to custom command scripts that use the SAS controller command-line programs to get the information.
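For readers who want to try something similar, here is a minimal sketch of what a custom temperature check might look like. It is not my actual script. It assumes the temperature tool prints a line shaped like "/dev/sda: Some Drive Model: 38°C", and the 45 and 55 degree thresholds and all function names are made up for illustration. SAS controller utilities format their output differently, so the parsing would need to change. The one firm piece is the plugin convention that Centreon shares with Nagios: exit code 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN, with optional performance data after a "|" character.

```python
import re

# Nagios-style plugin exit codes, which Centreon pollers also use.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed input format (hypothetical sample): "/dev/sda: Some Drive Model: 38°C"
TEMP_PATTERN = re.compile(r"^(?P<device>/dev/\S+):.*?(?P<temp>-?\d+)\s*°?C\s*$")


def parse_temperature(line):
    """Return (device, temperature_c) from one line, or None if it does not match."""
    match = TEMP_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group("device"), int(match.group("temp"))


def evaluate(device, temp_c, warn_c=45, crit_c=55):
    """Map a temperature to an exit code and a one-line status message."""
    if temp_c >= crit_c:
        state, code = "CRITICAL", CRITICAL
    elif temp_c >= warn_c:
        state, code = "WARNING", WARNING
    else:
        state, code = "OK", OK
    # Text before "|" is the status message; text after it is performance data.
    message = f"{state}: {device} is {temp_c}C | temp={temp_c};{warn_c};{crit_c}"
    return code, message


def check_line(line, warn_c=45, crit_c=55):
    """Parse one line of tool output and evaluate it against the thresholds."""
    parsed = parse_temperature(line)
    if parsed is None:
        # Unparseable output (a sleeping drive, an error) is reported as UNKNOWN.
        return UNKNOWN, f"UNKNOWN: could not parse temperature from {line!r}"
    device, temp_c = parsed
    return evaluate(device, temp_c, warn_c, crit_c)
```

A wrapper script would read the tool’s output, call check_line(), print the message, and exit with the returned code so the poller can pick up the state. Treating unparseable output as UNKNOWN rather than OK is deliberate, since a silent failure is exactly what monitoring is supposed to catch.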
So, All Is Well in the Wong Network
It’s been a week and the house is now warm, not toasty, and the network is humming away doing backups every evening. I rarely look at Centreon and Bacula, as they send out emails with their status when things go wrong. I’m hoping I have most of the checks in place so that I get warnings before the hardware goes south.
I helped diagnose and solve the heat/AC problem and handled the network on my own. Both required access to some decent documentation, although the latter required a bit more debugging where better documentation would have helped. Of the two, the network was more of a challenge because there were more materials to analyze.
With the thermostat, there are only six wires and just a few current loops. With Centreon, a dozen logs needed to be examined, and they had wonderfully cryptic log entries that weren’t really documented. I’m rather determined, though—I’ve dealt with this type of debugging situation having worked with dozens of Linux systems and applications. At least I was successful in the end.
Please document your projects and make diagnostics part of the mix. I’m onto the next project.