Power IoT Protocol "Hodgepodge": DL/T 645, Modbus, MQTT, IEC 104… One Industrial Router to Rule Them All — And Still Survive EMC Testing in Chemical Plants
1,000 devices. 68% online rate.
Terminals are fine. Platform is fine. SIM cards are fine. You've spent three weeks troubleshooting and can't find the root cause. Your boss wants a report tomorrow. You need an answer tonight.
I've spent five years doing power IoT work in chemical plants. I've lived through too many nights like this. Today I'm not going to talk frameworks — just three real-world disaster stories I've personally walked through. Each one crashed and burned on protocol and EMC — two issues that look minor on the surface but are absolutely lethal underneath.
Last year I took on a power monitoring project for a chemical industrial park. The client's requirement was clear: pull meter data, temperature control data, and fire safety data from 12 substations into a single unified platform.
Sounds like a standard ask. In reality, the 12 substations ran four completely different protocol stacks:
| Substations | Device | Protocol | Data Characteristics |
|---|---|---|---|
| 3 | Smart meters | DL/T 645-2007 | Periodic report, every 60 seconds |
| 2 | Temp sensors | Modbus RTU | Polling-based, response < 200ms |
| 4 | Power distribution terminals | IEC 60870-5-104 | Event-driven + periodic upload |
| 3 | Fire systems | MQTT | Publish/Subscribe, QoS=1 |
The project team did what most teams do: bought four gateways, one per protocol, and wrote four parsing scripts on the platform side — stitched together with brute force.
Week one: it worked. Week two: the data started "fighting."
Three layers of problems:
First, time bases weren't aligned. The timestamps on the DL/T 645 meter readings and the Modbus RTU temperature data drifted by 8 seconds. When the platform tried to correlate voltage and temperature at the same moment, they didn't match, and it threw errors.
Second, keep-alive mechanisms interfered with each other. IEC 104 sends heartbeats every 30 seconds. MQTT Keep Alive fires every 60 seconds. The two mechanisms fought for CPU resources on the gateway side. Under high concurrency, the gateway CPU sat above 90% permanently, and data started dropping.
Third, data semantics couldn't be aligned. Four protocols, four completely different data formats. Four parsing scripts, each working in isolation. The same alarm had different field names, different units, and different priority levels across protocols.
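To make the third problem concrete, here is a minimal sketch of what "aligning semantics" can look like: every protocol-specific alarm gets mapped into one common record. The field names, scale factors, and priority codes below are all hypothetical, invented for illustration. None of them come from the actual project or from the protocol specs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    source: str     # protocol the alarm arrived on
    quantity: str   # unified quantity name
    value: float    # value in the unified unit
    priority: int   # unified priority: 1 = critical, 2 = warning, 3 = info


# Hypothetical mapping: (protocol, raw field) -> (unified name, scale to unified unit)
FIELD_MAP = {
    ("dlt645", "wd"): ("temperature_c", 0.1),           # reported in tenths of a degree
    ("modbus", "reg_0x0010"): ("temperature_c", 0.01),  # reported in hundredths
    ("iec104", "ioa_2001"): ("temperature_c", 1.0),
    ("mqtt", "temp"): ("temperature_c", 1.0),
}

# Hypothetical mapping: each protocol's own priority codes -> unified 1..3 scale
PRIORITY_MAP = {
    "dlt645": {"A": 1, "B": 2, "C": 3},
    "modbus": {2: 1, 1: 2, 0: 3},
    "iec104": {"high": 1, "low": 3},
    "mqtt": {"critical": 1, "warning": 2, "info": 3},
}


def normalize(source, field, raw_value, raw_priority):
    """Translate one protocol-specific alarm into the unified record."""
    quantity, scale = FIELD_MAP[(source, field)]
    priority = PRIORITY_MAP[source][raw_priority]
    return Alarm(source, quantity, round(raw_value * scale, 3), priority)


# The same physical alarm, as three different protocols might report it:
from_meter = normalize("dlt645", "wd", 855, "A")
from_sensor = normalize("modbus", "reg_0x0010", 8550, 2)
from_fire = normalize("mqtt", "temp", 85.5, "critical")
```

After normalization all three records carry the same quantity name, the same unit, and the same priority, so the platform only needs one parser instead of four.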
The most ironic result: all four gateways showed "online." The platform said everything was normal. But the data was a complete mess.
This is the most easily overlooked problem in power IoT: you solved the "connection" layer, but you didn't touch the "semantic" layer at all. Protocols can connect. That doesn't mean the data is usable.
How did we fix it? Replaced the four gateways with a single industrial router that supports multi-protocol parallel conversion. Unified protocol translation, time synchronization (NTP/PTP), and data cleansing happened on the device side. The platform received one standardized data stream. Problem solved the same day.
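Why the time base matters so much is easy to show. The sketch below pairs voltage and temperature samples by nearest timestamp within a tolerance. The sample values, the 1-second tolerance, and the constant-offset correction are my assumptions for illustration. On a real device the clocks are disciplined by NTP/PTP rather than patched with a fixed offset.

```python
from bisect import bisect_left


def pair_by_time(a, b, tolerance_s=1.0):
    """Pair each (ts, value) in `a` with the nearest-in-time sample from `b`.

    Both lists must be sorted by timestamp. Samples further apart than
    `tolerance_s` are left unpaired rather than matched to the wrong moment.
    """
    b_times = [ts for ts, _ in b]
    pairs = []
    for ts, value in a:
        i = bisect_left(b_times, ts)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(b)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(b_times[k] - ts))
        if abs(b_times[j] - ts) <= tolerance_s:
            pairs.append((ts, value, b[j][1]))
    return pairs


def shift(samples, offset_s):
    """Re-stamp samples by a constant clock offset (illustrative correction)."""
    return [(ts + offset_s, v) for ts, v in samples]


# Made-up samples: meter voltage every 60 s, temperature from a clock running 8 s ahead.
voltage = [(0.0, 220.1), (60.0, 219.8), (120.0, 220.4)]
temperature_drifted = [(8.0, 41.0), (68.0, 41.5), (128.0, 42.1)]

unaligned = pair_by_time(voltage, temperature_drifted)                # nothing pairs up
aligned = pair_by_time(voltage, shift(temperature_drifted, -8.0))     # all three pair up
```

With the 8-second drift left in place, not a single sample finds a partner. Once both streams sit on the same clock, every voltage reading lines up with its temperature reading.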
Bottom line: it's not that there are too many protocols. It's that the gateway only knows how to "relay messages," not "translate meaning."
This second case is even more of a wake-up call.
A power inspection project. Equipment selected entirely to "industrial-grade" specs: IP67, wide temp -40~70°C, M12 connectors. The spec sheet looked bulletproof.
Sent it to the chemical plant for EMC pre-compliance testing. Failed the first round.
Failed on what? Radiated Emissions (RE) — over the limit.
A spike appeared near 300 MHz, 6 dB above the limit.
The project manager didn't understand: "It's a metal enclosure. How can radiated emissions be over the limit?"
The answer: a metal enclosure blocks external interference from getting in. It does not stop internal noise from getting out.
Opened it up. The problems were textbook:
No matter how thick the enclosure is, it can't stop noise leaking from inside the PCB.
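To put that 6 dB overshoot in perspective: radiated emission limits are expressed as field strength, so the decibel figure converts with a factor of 20, not 10. A quick sketch of the arithmetic (the function names are mine):

```python
def db_to_field_ratio(db):
    """Convert a level difference in dB to a field-strength (amplitude) ratio."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Convert a level difference in dB to a power ratio."""
    return 10 ** (db / 10)


over_limit_db = 6.0
field_ratio = db_to_field_ratio(over_limit_db)   # roughly 2x the allowed field strength
power_ratio = db_to_power_ratio(over_limit_db)   # roughly 4x in power terms
```

In other words, "6 dB over" means the device was radiating about twice the permitted field strength at that frequency. That is not a margin you fix by tightening a few screws on the enclosure.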
Even more fatal: immunity (EMS).
You think picking "industrial-grade" means you can handle a chemical plant? The electromagnetic environment in a chemical plant is a completely different level of challenge.
The immunity tests in the EN 61000-4 series (ESD, EFT, Surge) and the emission limits for Radiated Emissions (RE) and Conducted Emissions (CE), which are set by separate emission standards, are not suggestions. They are mandatory gates. If any one of the five fails, no matter how expensive the device is, it doesn't get into the plant.
Back to the opening question.
1,000 devices, 68% online. Not the terminals. Not the platform. The gateway quietly died 300 times in the chemical plant's electromagnetic environment, and you never noticed once.
Why didn't you notice? Because most gateways have zero "self-rescue" capability.
The network hiccups — it goes down. No breakpoint resume, no heartbeat keep-alive, no auto-reconnect. When it's "online," it's zoning out. When it's "offline," it's playing dead. The only status the platform sees: gray.
So you start troubleshooting terminals. Three weeks later, you finally find it: the gateway firmware hadn't been updated in three months. The protocol stack had a known vulnerability that caused memory overflow under high concurrency.
This isn't an outlier. This is the most common "slow death" in power IoT projects: the device isn't broken. It's just as good as dead.
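One way to catch this kind of slow death earlier is to stop trusting the connection flag and look at when real data last arrived. A minimal platform-side sketch follows. The function name, the status labels, and the 180-second threshold (three missed 60-second reports) are my assumptions, not something any particular platform ships with.

```python
def classify_gateway(last_data_ts, connected, now, max_silence_s=180.0):
    """Classify a gateway by data freshness instead of link state alone.

    last_data_ts  -- time (seconds) the last valid reading arrived
    connected     -- whether the transport session is currently up
    now           -- current time in the same time base
    max_silence_s -- hypothetical threshold for "too quiet to be healthy"
    """
    if not connected:
        return "offline"
    if now - last_data_ts > max_silence_s:
        return "stale"      # link is up, but nothing useful is coming through
    return "healthy"


# A gateway that still holds its session but has been silent for 200 s:
status = classify_gateway(last_data_ts=0.0, connected=True, now=200.0)
```

A "stale" status points straight at the gateway, which is a lot cheaper than spending three weeks on the terminals.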
DL/T 645, Modbus RTU/TCP, MQTT, IEC 104 — if a device can do protocol conversion, time sync (NTP/PTP), and data cleansing all in one box, that's a gateway you can actually use. Otherwise you'll spend forever patching holes.
Radiated Emissions, Conducted Emissions, ESD ±6kV, EFT ±2kV, Surge ±2kV — all five must pass to earn your chemical plant entry ticket. PCB star grounding, analog/digital isolation, interface TVS protection, power-entry common-mode choke + X/Y capacitors — these aren't bonus points, they're basics. Wait until the test fails to fix it? The rework cost is three times the cost of getting it right the first time.
Hardware + software watchdog, breakpoint resume, auto-reconnect, remote config/upgrade/diagnostics — you can't send someone into a chemical plant every day to press the reset button. The device must recover from faults without human intervention.
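For a rough idea of what breakpoint resume and auto-reconnect mean in code, here is a minimal sketch: readings are buffered while the uplink is down and flushed in order once it returns, with reconnect delays that back off exponentially. The class, the buffer size, and the delay values are hypothetical. Real gateway firmware would also persist the buffer to flash and tie into the watchdog.

```python
from collections import deque


def backoff_delays(base_s=1.0, cap_s=60.0):
    """Yield reconnect delays that double on each attempt, up to a cap."""
    delay = base_s
    while True:
        yield delay
        delay = min(delay * 2, cap_s)


class StoreAndForward:
    """Buffer readings while the uplink is down; flush in order on recovery."""

    def __init__(self, send, max_buffered=10_000):
        self._send = send                           # callable that raises ConnectionError when the link is down
        self._buffer = deque(maxlen=max_buffered)   # when full, the oldest reading is dropped

    def publish(self, reading):
        self._buffer.append(reading)
        return self.flush()

    def flush(self):
        """Try to send everything buffered; stop at the first failure."""
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                return False            # keep the reading and retry later
            self._buffer.popleft()      # remove only after a successful send
        return True

    @property
    def pending(self):
        return len(self._buffer)


# Usage with a fake uplink that can be switched on and off:
link = {"up": False}
delivered = []

def fake_send(reading):
    if not link["up"]:
        raise ConnectionError("uplink down")
    delivered.append(reading)

queue = StoreAndForward(fake_send)
queue.publish("reading-1")    # link down: stays buffered
queue.publish("reading-2")
link["up"] = True
queue.publish("reading-3")    # link back: all three go out in order
```

The point of the design is that a network hiccup costs you latency, not data, and nobody has to walk into the plant to press reset.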
Speaking of which, there are solutions that have actually been proven in these scenarios. USR's G806w from Someone Technology, for example, offers multi-protocol parallel conversion, wide temp -40~75°C, built-in eSIM with three-network auto-switching, and industrial-grade EMC certification, and it's been validated in chemical plant and power inspection deployments quite a few times. There's no single right answer for selection, but if your direction is right, you won't have to keep reworking down the road.
For reference, the immunity (EMS) test results from the second case:
| Test | Requirement | Actual Result | Consequence |
|---|---|---|---|
| ESD (Electrostatic Discharge) | ±4kV no crash, ±6kV no damage | ±4kV reset, ±6kV MCU destroyed | Human body static in dry chemical plant seasons can reach 8kV |
| EFT (Electrical Fast Transient) | ±2kV no reset | Reset at ±1.5kV | Inevitably triggered during VFD start/stop |
| Surge | ±2kV no damage | Serial chip destroyed at ±1.8kV | Lightning-induced surges can happen at any time |
The most expensive cost in power IoT is never the equipment. It's rework.
Protocols don't connect — rework. EMC doesn't pass — rework. Online rate drops — rework.
And the starting point of all rework is almost always that gateway you thought was "good enough."
A chemical plant won't go easy on you just because you're on a tight schedule. The electromagnetic environment won't lower its standards because your budget is tight. The online rate of 1,000 devices ultimately depends on whether you got the protocol and EMC right on the very first gateway.
Don't wait until 3 AM to start thinking about this.