An Internet of Things (IoT) platform is an incredibly complex piece of software, because it has to solve a number of hard problems:
- Wireless: radio communication is brittle and must conform to strict regulations
- Large scale: networks can have hundreds to thousands of devices
- Low power: devices run on tiny batteries and must be extremely power efficient
- Integration: everything must work together, from tiny ARM Cortex wireless microcontrollers to NodeJS-driven database backends
And all of this has to work, all the time.
How do we at Thingsquare ensure that our IoT platform keeps working?
The answer is automated testing: every single change that we make to the code needs to go through rigorous automated testing before we accept it into the code base.
Our automated testing framework is integrated into our git workflow: opening a pull request triggers a set of tests that exercise the entire system.
We run 27 parallel test runs for each change. Every test run focuses on a different aspect of the system. The most basic tests make sure that the code compiles without warnings. Some test the backend database logic. Some test the device code, such as making sure that the firmware update mechanism is always 100% stable. Others test the regulatory compliance of the wireless protocols.
Each test run takes anywhere from a few minutes to an hour. A change is considered valid only if it passes all tests. The output of a test run can be seen below:
The wireless network simulator
The secret sauce that lets us run automated testing is our wireless network simulator.
The simulator can simulate both the wireless conditions and the nodes that make up the network. We can emulate the code running on each device at such a low level that we can see the power consumption of individual wireless packets. We can also simulate the code at a higher level, which allows us to run simulations with hundreds of nodes in a short amount of time.
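To give a feel for what a per-packet power number means, here is a back-of-the-envelope sketch of the energy one transmission costs: time on air multiplied by transmit current and supply voltage. All radio parameters (50 kbit/s data rate, 25 mA transmit current, 3 V supply, 12 bytes of preamble and header) are illustrative assumptions, not figures from Thingsquare hardware, and only the transmit phase is counted.

```python
from dataclasses import dataclass


@dataclass
class RadioProfile:
    """Hypothetical radio parameters; real values come from the radio datasheet."""
    bitrate_bps: float   # over-the-air data rate
    tx_current_a: float  # current drawn while transmitting
    supply_v: float      # supply voltage
    overhead_bytes: int  # preamble + sync word + PHY header (assumed)


def packet_airtime_s(payload_bytes: int, radio: RadioProfile) -> float:
    """Time on air for one packet, including the assumed PHY overhead."""
    total_bits = (payload_bytes + radio.overhead_bytes) * 8
    return total_bits / radio.bitrate_bps


def packet_energy_j(payload_bytes: int, radio: RadioProfile) -> float:
    """Transmit energy = voltage * current * time on air."""
    return radio.supply_v * radio.tx_current_a * packet_airtime_s(payload_bytes, radio)


radio = RadioProfile(bitrate_bps=50_000, tx_current_a=0.025, supply_v=3.0, overhead_bytes=12)
airtime = packet_airtime_s(88, radio)
energy = packet_energy_j(88, radio)
print(f"airtime {airtime * 1000:.1f} ms, energy {energy * 1e6:.0f} uJ")
# prints: airtime 16.0 ms, energy 1200 uJ
```

A low-level emulator can produce this kind of number for every packet, because it knows exactly when the radio is switched on and off.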
The simulator uses JavaScript to set up test cases and to check that the test output matches what is expected.
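The simulator's test scripts are written in JavaScript; the Python sketch below only illustrates the general pattern such a check can follow: scan the log lines produced by the simulated nodes, wait until every node has reported an expected message, and fail if simulated time runs past a timeout. The log format (`<time_ms> <node_id> <message>`) and the `DATA received` message are hypothetical.

```python
import re
from typing import Iterable, Set

# Hypothetical log format: "<time_ms> <node_id> <message>"
LOG_LINE = re.compile(r"^(\d+) (\d+) (.*)$")


def check_all_nodes_report(log: Iterable[str], node_ids: Set[int],
                           expected: str, timeout_ms: int) -> bool:
    """Pass if every node logs `expected` before `timeout_ms` of simulated time."""
    pending = set(node_ids)
    if not pending:
        return True
    for line in log:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # ignore lines that do not follow the format
        time_ms, node_id, message = int(match[1]), int(match[2]), match[3]
        if time_ms > timeout_ms:
            return False  # simulated time ran out before all nodes reported
        if expected in message:
            pending.discard(node_id)
            if not pending:
                return True
    return False  # log ended with some nodes still silent
```

Checking against simulated time rather than wall-clock time keeps the verdict independent of how fast the test machine happens to be.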
In total, a full test run in the current version of the Thingsquare platform uses 2000+ simulated nodes.
Power consumption
The power consumption of the devices in the system is tested in two ways:
- The simulator measures the power consumed by each device.
- The devices themselves estimate their own power consumption.
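A common way for firmware to estimate its own consumption is to record how long the CPU and radio spend in each power state and multiply each duration by that state's current draw (Contiki's Energest module works this way). The sketch below assumes that approach; the state names and current values are hypothetical, not measured Thingsquare figures.

```python
from collections import defaultdict

# Hypothetical current draw per power state, in amperes.
STATE_CURRENT_A = {
    "cpu_active": 0.004,
    "cpu_sleep": 0.000002,
    "radio_rx": 0.018,
    "radio_tx": 0.025,
}


class EnergyEstimator:
    """Accumulates time spent in each power state and converts it to charge."""

    def __init__(self) -> None:
        self.time_s = defaultdict(float)

    def record(self, state: str, duration_s: float) -> None:
        """Add time spent in `state`; firmware would call this on every state change."""
        if state not in STATE_CURRENT_A:
            raise KeyError(f"unknown power state: {state}")
        self.time_s[state] += duration_s

    def charge_as(self) -> float:
        """Total charge in ampere-seconds: sum of current * time over all states."""
        return sum(STATE_CURRENT_A[s] * t for s, t in self.time_s.items())

    def charge_mah(self) -> float:
        """Total charge in mAh (1 mAh = 3.6 A*s)."""
        return self.charge_as() / 3.6

    def average_current_a(self, elapsed_s: float) -> float:
        """Average current over the observation period."""
        return self.charge_as() / elapsed_s
```

The CPU and radio are tracked as separate components, so their contributions are simply summed.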
The numbers from both the simulator and the devices are compared with pre-set goals that they must meet. These goals are set so that the devices can run for years on coin cell batteries.
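Turning a lifetime goal into a pass/fail check is simple arithmetic: battery capacity divided by average current gives the expected lifetime. The sketch below assumes a nominal 225 mAh CR2032 coin cell and a three-year target (both illustrative assumptions) and applies the same current budget to the simulator's measurement and to the device's own estimate. It ignores self-discharge and temperature effects.

```python
HOURS_PER_YEAR = 24 * 365


def lifetime_years(capacity_mah: float, avg_current_ua: float) -> float:
    """Idealized battery lifetime: capacity / average current (no self-discharge)."""
    avg_current_ma = avg_current_ua / 1000
    return capacity_mah / avg_current_ma / HOURS_PER_YEAR


def current_budget_ua(capacity_mah: float, target_years: float) -> float:
    """Largest average current, in uA, that still meets the lifetime target."""
    return capacity_mah / (target_years * HOURS_PER_YEAR) * 1000


def check_power(simulated_ua: float, estimated_ua: float,
                capacity_mah: float = 225.0, target_years: float = 3.0) -> bool:
    """Both the simulator's number and the device's own estimate must meet the budget."""
    budget = current_budget_ua(capacity_mah, target_years)
    return simulated_ua <= budget and estimated_ua <= budget
```

With these assumed numbers the budget is about 8.6 uA on average; a node averaging 10 uA would last roughly 2.6 years and fail the check.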
Wireless regulations
The sub-GHz wireless spectrum is regulated by authorities around the world, such as the FCC in the US and ETSI in Europe. Products that use the sub-GHz bands must follow these regulations.
The Thingsquare IoT platform is designed to comply with these regulations and the automated testing ensures that the regulations are always followed.
To check that the regulations are followed, the simulator keeps track of the time that each simulated node spends on each frequency. To comply with both the FCC and ETSI regulations, devices may not stay on any individual channel for more than 400 ms before switching to a different frequency.
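A minimal sketch of such a check, assuming the simulator can export, per node, a time-ordered list of `(time_ms, channel)` events recorded whenever the radio changes channel. It applies the 400 ms rule as stated above: every continuous visit to a channel must end within 400 ms. The event format and the `end_ms` parameter are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

MAX_DWELL_MS = 400  # per-channel limit as stated in the text


def dwell_violations(hops: Sequence[Tuple[int, int]], end_ms: int,
                     max_dwell_ms: int = MAX_DWELL_MS) -> List[Tuple[int, int, int]]:
    """Return (channel, start_ms, dwell_ms) for every visit longer than the limit.

    `hops` is a time-ordered list of (time_ms, channel) events, one per channel
    switch. The final visit is assumed to last until `end_ms`.
    """
    violations: List[Tuple[int, int, int]] = []
    visit_start = 0
    visit_channel: Optional[int] = None
    # A sentinel event with channel None closes the last visit at end_ms.
    for time_ms, channel in list(hops) + [(end_ms, None)]:
        if channel == visit_channel:
            continue  # re-tuning to the same channel does not end the visit
        if visit_channel is not None:
            dwell_ms = time_ms - visit_start
            if dwell_ms > max_dwell_ms:
                violations.append((visit_channel, visit_start, dwell_ms))
        visit_start, visit_channel = time_ms, channel
    return violations
```

Merging consecutive events on the same channel matters: a node that re-tunes to the channel it is already on has not actually left it, so the dwell time keeps running.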
Performance
The simulator checks that the performance of the system always conforms to pre-set goals. For example, a 100-hop network should be able to set itself up and send a specified number of messages without problems.
The setup process for new networks is also tested to make sure that it does not take longer than intended.
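Both performance goals can be expressed as simple assertions over data the simulator records. The sketch below assumes a hypothetical result structure holding each node's join time and the message counts; the thresholds (all nodes joined within 10 minutes of simulated time, at least 99% of messages delivered) are placeholders, not Thingsquare's actual targets.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RunResult:
    """Hypothetical summary of one simulation run."""
    join_time_s: Dict[int, Optional[float]]  # node id -> join time, None if never joined
    messages_sent: int
    messages_received: int


def formation_time_s(result: RunResult) -> Optional[float]:
    """Time until the last node joined, or None if some node never joined."""
    times = list(result.join_time_s.values())
    if not times or any(t is None for t in times):
        return None
    return max(times)


def check_performance(result: RunResult, max_formation_s: float = 600.0,
                      min_delivery_ratio: float = 0.99) -> bool:
    """Pass only if the network formed in time and delivered enough messages."""
    formed = formation_time_s(result)
    if formed is None or formed > max_formation_s:
        return False  # network did not form, or formed too slowly
    if result.messages_sent == 0:
        return False  # nothing was sent, so delivery cannot be assessed
    delivery = result.messages_received / result.messages_sent
    return delivery >= min_delivery_ratio
```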
Over-the-air firmware updates
One of the most crucial pieces of functionality in the system is remote over-the-air (OTA) firmware updates. Without firmware updates, it is impossible to change the functionality of deployed devices.
We test firmware updates in two scenarios:
- individual firmware updates for specific devices
- network-wide firmware updates for a group of devices
In both cases, each device must receive the complete firmware binary intact. This is checked with a standard public key signature mechanism (PKCS #1).
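PKCS #1 defines RSA-based signatures. Below is a minimal sketch of RSASSA-PKCS1-v1_5 signature verification, written with Python's built-in integers so that each step is visible: raise the signature to the public exponent and compare the result with the expected padded encoding of the image's hash. SHA-256 as the hash and the function names are assumptions; a real device would rely on a vetted cryptographic library rather than hand-rolled code like this.

```python
import hashlib

# DER-encoded DigestInfo prefix for SHA-256 (RFC 8017, EMSA-PKCS1-v1_5).
SHA256_DIGEST_INFO = bytes.fromhex("3031300d060960864801650304020105000420")


def emsa_pkcs1_v15_encode(message: bytes, em_len: int) -> bytes:
    """Build 0x00 0x01 FF..FF 0x00 || DigestInfo || SHA-256(message)."""
    t = SHA256_DIGEST_INFO + hashlib.sha256(message).digest()
    if em_len < len(t) + 11:
        raise ValueError("RSA modulus too short for SHA-256 PKCS #1 v1.5")
    padding = b"\xff" * (em_len - len(t) - 3)
    return b"\x00\x01" + padding + b"\x00" + t


def verify_firmware(image: bytes, signature: bytes, n: int, e: int) -> bool:
    """Return True if `signature` is a valid PKCS #1 v1.5 signature of `image`."""
    k = (n.bit_length() + 7) // 8  # modulus length in bytes
    if len(signature) != k:
        return False
    s = int.from_bytes(signature, "big")
    if s >= n:
        return False
    em = pow(s, e, n).to_bytes(k, "big")  # RSA public-key operation
    return em == emsa_pkcs1_v15_encode(image, k)
```

Re-encoding the expected value and comparing whole byte strings, instead of parsing the decrypted padding, avoids a class of padding-parsing mistakes.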
Once the new binary has been received over the network, it is booted to ensure that it works.
Conclusions
An Internet of Things (IoT) platform is a complex piece of software. We need to ensure that the Thingsquare IoT platform always works and performs as expected. We use automated testing on every change that we make to the system.