A solid IoT solution encompasses a lot of working parts, and at the front end of it all is hardware development. Marketing needs a firm understanding of functional and environmental requirements, and engineering must be inventive and thorough enough to incorporate all of them into the product. But the real fun begins once the first prototypes have been built and the product-development test labs start putting them through their paces.
Successful products in this space aren’t invented through an “armchair quarterback” approach, but rather ground into existence through the “elbow grease” iterations of create, test, break, fix, and repeat. Some interesting myths have developed over the years around these ruggedization and testing processes, and it’s high time that they be debunked.
1. All thermal chambers are created equal.
Visit almost any electronic hardware testing laboratory and chances are you will encounter several large thermal chambers noisily humming along. They work much like your air conditioning at home—a large compressor and furnace in the bottom and several fans that swirl the heated or cooled air around in the chamber area on top, where units to be tested are placed.
The purpose of the fans is to swirl the air around inside the chamber to ensure a uniform temperature throughout. These chambers work great for testing hardware that's actively cooled (i.e., has fans or blowers that move air through a heat exchanger), but they don't work well for passively cooled items (no fans, just large finned heat sinks that rely on natural convection to move heat away from the hardware).
Why don’t they work well for passively cooled hardware? The answer lies in all of that swirling air from those fans. For actively cooled hardware, the effect of the swirling air on thermal transfer is negligible compared to the internal fan/blower ramming air through a heat exchanger. For passively cooled hardware, the effect is too large to ignore. And worse, the data that you get is not conservative, meaning you could have the hardware pass the test in your thermal chamber but fail in the real world. Not a good situation!
Fortunately, there's a good solution: natural convection chambers. These chambers heat the test volume with infrared heaters and are designed to vent so that there's no moving air. When the unit under test passes at 70°C (as an example) in one of these chambers, you can feel confident that it will perform as intended, even in an installation closet of dead-still air.
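To see why the stirred air matters, consider Newton's law of cooling for the heat-sink surface:

```latex
Q = h A \,(T_s - T_\infty)
\qquad\Longrightarrow\qquad
\Delta T = T_s - T_\infty = \frac{Q}{hA}
```

The convective coefficient h for forced air is typically several times higher than for still air; representative textbook ranges for air are roughly 5 to 25 W/(m²·K) for natural convection and 25 to 250 W/(m²·K) for forced convection. As a purely illustrative example (not measurements from any particular product), a passive heat sink dissipating 20 W over 0.05 m² of fin area would run about 40°C above ambient in dead-still air (h = 10) but only about 8°C above ambient in a fan-stirred chamber (h = 50). That's exactly the kind of optimistic data you don't want coming out of your test lab.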
2. Testing to the specification limit is good enough.
In most test labs, there's a lot of focus on passing the specification. Emphasis is placed on having a "statistically significant" sample size and calculating confidence levels. All of this is absolute goodness, but in my opinion it doesn't replace the need to explore how much margin is in your product. All the reliability in the world at the stated spec limit won't save your bacon the day your enclosure fan goes out, or another piece of equipment decides to overheat and drive up the ambient temperature for everything else in the cabinet, or… well, you get the idea.
Another important reason to explore margin is that it gives insight into your weakest link. And oftentimes, it might be quite cost-effective to make a change that dramatically improves the margin of your product. If this sounds like an echo of the Highly Accelerated Life Testing (HALT) advocated by Dr. Gregg Hobbs back in the late 1980s and 1990s, you're exactly right. It's not a new idea, but it's still a powerful one.
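A simple way to explore that margin is a step-stress run: start at the spec limit and keep stepping the stress up until something misbehaves. The sketch below shows the idea in Python; the chamber and device interfaces (set_temperature, soak, is_functional) are hypothetical placeholders for whatever chamber driver and health check you actually have.

```python
# Minimal sketch of a HALT-style temperature step-stress. The chamber and DUT
# objects are hypothetical placeholders, not a real chamber API.
SPEC_LIMIT_C = 70      # example published operating limit
STEP_C = 5             # step size above the spec limit
CEILING_C = 110        # stop before damaging the chamber or fixtures

def find_thermal_margin(chamber, dut, soak_minutes=30):
    """Step the chamber past the spec limit until the DUT misbehaves,
    and report how much margin exists above the published rating."""
    temp = SPEC_LIMIT_C
    last_good = None
    while temp <= CEILING_C:
        chamber.set_temperature(temp)          # hypothetical chamber driver call
        chamber.soak(minutes=soak_minutes)     # hold long enough to stabilize
        if not dut.is_functional():            # hypothetical health/built-in-test check
            break
        last_good = temp
        temp += STEP_C
    if last_good is None:
        return 0
    return last_good - SPEC_LIMIT_C            # margin in degrees C above spec
```

The number that comes back, the margin above the published rating, is the one that tells you how close your weakest link sits to the spec.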
3. Dust testing isn’t important unless you’re deployed in the desert.
Dust isn’t restricted to the desert. It turns out it is all around us most of the time, and it can definitely affect the performance of your electronics by lowering its ability to get rid of heat (Fig. 1).
1. This IoT gateway inside a dust chamber is being tested per the IEC 60529 IP5X dust test.
But what about dust that contains a high level of pollution? Now you need to worry about how well your electronics can resist corrosion. What if you’re deploying the product in a machine shop or anywhere else where there’s a lot of conductive particulate in the air? Now you should be concerned about electrical shorting on the printed wiring boards of your electronics. Dust comes in several flavors, and you need to be aware of how your product is going to operate when it’s deployed in those environments.
4. One size vibration profile fits all.
When we talk about where IoT devices can be deployed, I always think about that old Steve Martin movie Planes, Trains and Automobiles. It probably won't take much effort to convince most people that the vibration profile on a train is quite different from that on a plane or in an automobile. Shipping over water brings in yet another mode. But it's much more complicated than just sorting into four or five groups.
Take automotive as an example. You're going to get a very different profile under the passenger seat in the cab of an 18-wheeler than when mounted on the wall of the trailer it's pulling. What if the product is deployed somewhere under the hood in the engine compartment? There are lots of profiles to examine for your IoT product, and you need to be testing the ones relevant to your customers.
5. It doesn’t matter if you exercise the hardware during your environmental testing.
What if I wanted to sell you a car and told you that the engine didn’t overheat as long as you left it in park? Still interested? IoT hardware is deployed to perform some function or even multiple functions. It only makes common sense that you would put the hardware through its functional paces while doing the environmental testing to see how it handles the workload.
As an example, a gateway may run great at 70°C as long as it isn’t processing any sensor signals, but throttle significantly when it has to process a large number of them. An important part of IoT testing is to characterize the hardware responses to these kinds of stimuli and test for worst-case workloads.
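One lightweight way to do this is to run a representative workload while logging temperature and clock speed during the thermal soak. The sketch below is a minimal Python example for a Linux-based gateway; the sysfs paths are typical but vary from board to board, and the synthetic workload is just a stand-in for your real sensor-processing code.

```python
# Minimal sketch: exercise a synthetic sensor-processing workload while logging
# SoC temperature and CPU clock, so throttling under load shows up during a
# thermal soak. The sysfs paths below are typical for Linux boards but vary by
# platform -- verify them on your hardware before trusting the data.
import json
import time

THERMAL = "/sys/class/thermal/thermal_zone0/temp"                   # millidegrees C
CPUFREQ = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"   # kHz

def read_int(path):
    with open(path) as f:
        return int(f.read().strip())

def process_samples(n=50_000):
    """Stand-in for real sensor processing: arbitrary arithmetic to load the CPU."""
    acc = 0.0
    for i in range(n):
        acc += ((i * 31) % 97) ** 0.5
    return acc

def soak(minutes=30, log_path="soak_log.jsonl"):
    end = time.time() + minutes * 60
    with open(log_path, "w") as log:
        while time.time() < end:
            t0 = time.time()
            process_samples()
            log.write(json.dumps({
                "elapsed_s": round(time.time() - t0, 3),   # grows if the CPU throttles
                "soc_mdegC": read_int(THERMAL),
                "cpu_khz": read_int(CPUFREQ),
            }) + "\n")
            log.flush()

if __name__ == "__main__":
    soak(minutes=1)   # short smoke test; extend for a real soak
```

Plotting elapsed time per batch against temperature makes the throttling knee obvious, and that's the curve you want in hand before a customer loads up the gateway with sensors.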
6. All benchmarks give the same basic data.
Benchmarks are wonderful in that they allow us to compare different things, but you must be careful with those comparisons. For instance, you wouldn’t take a Windows-based benchmark and then assume that it holds for a Linux system. Yet the pressure is there to do just that since there aren’t a lot of Linux-based benchmark tools available.
You also need to know what parameters are being exercised by the benchmark. It won’t do you much good to buy an IoT product that has a great benchmark score if it turns out the benchmark doesn’t exercise heavy I/O traffic and you want to use it to process a large array of sensors.
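The contrast is easy to demonstrate. The two toy micro-benchmarks below, one compute-bound and one storage-I/O-bound, exercise completely different parameters; a device can post a great number on one and a mediocre number on the other. These are illustrative sketches, not substitutes for a proper benchmark suite.

```python
# Minimal sketch: two micro-benchmarks that stress very different parts of a
# device. A single score means little unless you know which of these (or
# neither) the benchmark actually resembles.
import os
import tempfile
import time

def compute_bound(iterations=2_000_000):
    """CPU-bound: integer math only, no I/O."""
    start = time.perf_counter()
    acc = 0
    for i in range(iterations):
        acc = (acc + i * i) % 1_000_003
    return time.perf_counter() - start

def io_bound(mib=64, block=4096):
    """Storage-I/O-bound: write, sync, then read back a scratch file in small blocks."""
    start = time.perf_counter()
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        for _ in range((mib * 1024 * 1024) // block):
            f.write(os.urandom(block))
        f.flush()
        os.fsync(f.fileno())
    with open(path, "rb") as f:
        while f.read(block):
            pass
    os.unlink(path)
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"compute-bound: {compute_bound():.2f} s")
    print(f"I/O-bound:     {io_bound():.2f} s")
```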
7. Discovering product performance margin isn’t important.
Starting to get an idea that I’m a big fan of understanding product margin? Again, it’s important to understand the limits of the product under investigation so that you can have intelligent conversations around solving someone’s challenges.
What happens when someone has a special installation (aren't they all?) and the discussion turns to the obvious question: Will the hardware handle it? Or what if they want to double the number of sensors they're monitoring, but only go to 60°C instead of 70°C? Of course, you may not have time to explore every combination of temperature, I/O load, computational load, etc. However, you want to try to book-end several of these so that you can have intelligent conversations about how your hardware will respond to the myriad potential customer scenarios.
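Book-ending doesn't have to be elaborate. Simply enumerating the corner cases of the variables you care about covers a lot of ground; the variable names and levels in the sketch below are illustrative, not any kind of standard.

```python
# Minimal sketch of "book-ending" a test matrix: rather than sweeping every
# possible combination, run the corner cases of a few key variables. The
# variables and levels here are illustrative assumptions.
from itertools import product

corners = {
    "ambient_C":    [0, 60, 70],      # cold end, customer ask, spec limit
    "sensor_count": [16, 64, 128],    # light, nominal, doubled load
    "cpu_load_pct": [10, 50, 95],     # near idle, typical, worst case
}

test_matrix = [dict(zip(corners, combo)) for combo in product(*corners.values())]

for case in test_matrix:
    print(case)    # feed each corner case to your soak/benchmark harness

print(f"{len(test_matrix)} corner cases instead of a full parametric sweep")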
8. Components aren’t as important as overall design.
What is that old wise saying: You are only as strong as your weakest link? This especially holds true for electronics. All of those capacitors, resistors, and inductors are put into the design for a reason, and you can bet that when one or more of them dies off, you will experience performance degradation.
The worst case is when they fail under thermal load but perform adequately at room temperature while the tech is trying to troubleshoot them. This can waste a lot of time as the hardware cycles back and forth between failing in the field and being labeled CND (Could Not Duplicate) in the failure-analysis lab, only to be sent out to the field again to start the cycle anew. The message is that you want the hardware designed and stuffed with high-quality components, not the cheapest parts that could be found to save a few cents here and there.
9. Everything can be fixed with a software update.
When ruggedizing IoT hardware, you spend a significant amount of time worrying about thermal and vibration performance. You're also taking a hard look at functionality and how it stands up in different environments. It's an iterative process, and many improvements get put into the product during this testing: upgraded TIM (thermal interface material), more heat pipes, additional seals, certain components glued down to the printed wiring board to make it more robust in a vibration environment, and so on. Reviewing that list, how many of them do you think you'll be able to fix with a software upgrade? You need to get the hardware right in design, manufacturing, and test before it ever reaches a customer's hands, so that you don't end up trying to sell the "software update" story.
10. Lab testing is the end of product evaluation.
Testing in the lab is a very important element of developing a high-quality product, but it isn’t the final chapter. A lot of thought, research, and data collection goes into this testing, but it’s still a proxy for what you really want to know—how will it perform in the real world (Fig. 2).
2. Here, an IoT gateway is mounted in a thermally and vibrationally challenging environment.
To that end, it's important to build some form of early-evaluation program into your product development. You want people who are actually trying to solve their own problems exercising your hardware, using it in the manner that works and makes sense for them. Inevitably, they will find something that you didn't discover in the lab, and it's much better and cheaper to find out about it before you mass-deploy.
11. I don’t have to worry about quality as long as I have a good warranty.
This myth is a very old one, and pre-dates modern electronics. The theory is that if I can buy two of product X for the cost of one product Y, I come out ahead the minute my second product X lasts longer than the aforementioned Y, and it’s all gravy after that point. The fallacy is that you aren’t factoring in the cost of recovering from the failure of product X, as well as the cost of deploying the second product X. These costs aren’t trivial, especially in some IoT solutions.
If your sensor or gateway is located on a pipeline smack in the middle of nowhere, 100 or more miles from the nearest small town and twice that far from the closest service technician, the cost of failure can far outstrip the purchase price of the product itself. (The Service Council estimates an average truck roll costs $286, plus $79/hr for labor.) Not to mention the thousands of gallons of oil you might be on the hook to clean up because your sensor didn't catch the break.
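A quick back-of-the-envelope calculation shows how fast those costs stack up. The truck-roll figures come from the estimate above; the drive time, repair time, and hardware prices below are made-up illustrative assumptions.

```python
# Back-of-the-envelope cost of one remote field failure. The truck-roll and
# labor figures are the estimates quoted in the article; the drive hours,
# repair hours, and hardware prices are illustrative assumptions.
TRUCK_ROLL = 286          # $ per dispatch (Service Council estimate)
LABOR_RATE = 79           # $ per hour

def failure_cost(drive_hours, repair_hours, replacement_unit):
    labor = (2 * drive_hours + repair_hours) * LABOR_RATE   # round trip plus on-site work
    return TRUCK_ROLL + labor + replacement_unit

cheap_unit, rugged_unit = 400, 900          # illustrative purchase prices

# One remote failure on the cheap unit: 4 hours out, 4 back, 2 on site
one_failure = failure_cost(drive_hours=4, repair_hours=2, replacement_unit=cheap_unit)

print(f"cheap unit + one remote failure: ${cheap_unit + one_failure:,}")
print(f"rugged unit, no failure:         ${rugged_unit:,}")
```

In this illustration, the "cheap" option ends up costing roughly twice the rugged one after a single dispatch, before counting downtime or cleanup.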
The pipeline scenario is an extreme example, but the point is a solid one. You need a good understanding of the situation and environment your solution is being deployed into in order to decide where you want to fall on the cost-quality continuum.