Electronics Design AU
Zigbee · Solved

Battery Zigbee sensor keeps 'disappearing' from Zigbee2MQTT and won't come back without a manual re-pair — is this a routing issue?

5 min read · 3 replies
Original Question

Asked by stale_biscuit_03

Running a Zigbee2MQTT network with a Sonoff ZBDongle-E coordinator and about 15 devices — mostly mains-powered smart plugs acting as routers, plus a handful of battery temperature/humidity sensors as end devices. One specific sensor (furthest from the coordinator, routes through a smart plug two rooms away) keeps going "unavailable" in Zigbee2MQTT after anywhere from a few hours to a couple of days. Battery is fine, tested at 2.9V.

Once it drops off, it stays offline — no data, no LQI updates, nothing — until I physically walk over, take the battery out, and put it back in (or hit the reset button). Then it rejoins instantly and works fine again for a while.

Other end devices on the network are rock solid. This is the only one that's marginal on signal (LQI hovers around 60-80 in the map view when it is connected, versus 150+ for everything else). Is this just a range problem, or is there something about how end devices are supposed to recover from a bad link that isn't happening here?


3 Replies

ble_mesh_maven
Accepted Answer

This is a routing/rejoin problem more than a raw range problem, even though the low LQI is what's making it show up on this device specifically. Worth separating two things that are easy to conflate: whether the device can reach its parent, and whether the network still remembers it as a child.

Zigbee end devices, especially sleepy ones, don't maintain mesh routing tables themselves. They associate with a single parent router and rely entirely on it to buffer messages while they sleep and to forward their traffic when they wake. The Zigbee PRO stack includes an End Device Timeout mechanism: when an end device joins, it (or its parent, depending on stack implementation) negotiates a timeout value, and the parent is expected to age the child out of its child/neighbor table if it doesn't hear a keep-alive poll within that window. Once the parent has purged the child, any future poll from that end device gets no response. From the device's perspective it "should" still be joined, but the network no longer has a path back to it. With most Zigbee stacks it won't spontaneously recover from that state; recovery requires an explicit rejoin procedure, which the device firmware may or may not trigger automatically.

With a marginal link (your LQI 60-80 reading matches this pattern well), the end device's periodic poll or the parent's response to it is failing intermittently — not every time, just often enough that eventually a poll cycle is missed for long enough to exceed the timeout. Once that happens, you're not looking at "weak signal, slow recovery" — you're looking at "the device has been silently dropped and is waiting for something to trigger a full rejoin," which explains why it stays dead until you force one with the battery pull.
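To make the aging-out logic concrete, here is a minimal Python sketch that replays a sequence of poll outcomes against a parent's child timeout. It is a simplified model, not how any particular Zigbee stack implements it: the function name, the fixed poll interval, and the example numbers are all assumptions chosen for illustration (real timeouts are usually far longer than a few minutes).

```python
from typing import Iterable, Optional


def seconds_until_aged_out(
    poll_heard: Iterable[bool],
    poll_interval_s: int,
    child_timeout_s: int,
) -> Optional[int]:
    """Replay poll outcomes against a parent's child timeout (simplified model).

    The i-th entry of poll_heard (counting from 1) is True when the parent
    heard the poll sent at t = i * poll_interval_s. Returns the time, in
    seconds since join, at which the parent would purge the child, or None
    if the child survives the whole sequence.
    """
    last_heard_s = 0  # the join itself counts as the first contact
    for i, heard in enumerate(poll_heard, start=1):
        now_s = i * poll_interval_s
        if now_s - last_heard_s > child_timeout_s:
            # The timeout expired before this poll was sent, so the child
            # is already gone and this poll cannot revive it.
            return last_heard_s + child_timeout_s
        if heard:
            last_heard_s = now_s
    return None


# Hypothetical numbers: poll every 60 s, parent timeout of 180 s.
# Two isolated misses are survivable; three in a row are not, and the
# final successful poll arrives too late to matter.
print(seconds_until_aged_out([True, False, True, False, False, False, True], 60, 180))
```

The last `True` in the example is the interesting part: in this model the link recovered, but the parent had already dropped the child, which matches the "stays dead until a forced rejoin" symptom.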

What to check:

  1. Move the marginal end device's parent assignment. In Zigbee2MQTT, check which router it's actually associated with (not just physically nearest). A weak link to a distant router when a closer one is available usually means the device joined before that closer router existed, or it never got a chance to re-evaluate its parent. Forcing a rejoin near a stronger router (walk it over, then relocate) often fixes this permanently.
  2. Check whether your specific end device firmware implements automatic rejoin on missed communication, versus requiring a manual trigger (button press, power cycle) — this varies significantly between vendors and isn't something Zigbee2MQTT/the coordinator can fix on its own if the end device's firmware simply doesn't attempt one.
  3. Add a router between the coordinator and this device's location if physically possible — see the Zigbee coordinator setup guide for the mesh density and router-placement considerations that prevent single marginal-link devices like this from existing in the first place.
kettledrum47

Seconding the "it's a rejoin problem, not a range problem" read — I see the exact same failure signature on cellular IoT deployments, just with different terminology. The device thinks it's still registered, the network has already forgotten about it, and nothing recovers until something forces a fresh handshake.

One diagnostic that'll confirm this quickly without touching the sensor: pull the Zigbee2MQTT debug log (log_level: debug under the advanced section of configuration.yaml) and watch what happens right when the device would normally poll. If the device's IEEE address simply stops appearing in the log around the time it goes "unavailable", with no failed-poll entries at all, that's consistent with the parent having already purged it silently rather than actively rejecting a poll. If instead you see repeated failed delivery attempts before it goes quiet, that points more toward the link genuinely degrading below usable at that specific time (which could be a router going into a bad RF state, not just distance).
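A small script can do the log-watching for you. The sketch below is only an illustration: it assumes each log line starts with a `[YYYY-MM-DD HH:MM:SS]` timestamp, and because the text that marks a failed delivery varies by Zigbee2MQTT version and adapter, the marker is passed in as a parameter instead of being hard-coded. The function name `last_seen` is hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Assumed line prefix: "[YYYY-MM-DD HH:MM:SS] ..."; adjust to your log format.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")


def last_seen(
    lines: Iterable[str], ieee: str, failure_marker: str
) -> Tuple[Optional[datetime], int]:
    """Summarise log lines that mention one device.

    Returns (timestamp of the last line mentioning `ieee`, number of lines
    mentioning `ieee` that also contain `failure_marker`). Matching is
    case-insensitive. The timestamp is None if no matching line had a
    parseable timestamp.
    """
    last: Optional[datetime] = None
    failures = 0
    needle = ieee.lower()
    marker = failure_marker.lower()
    for line in lines:
        lowered = line.lower()
        if needle not in lowered:
            continue  # line is about some other device
        match = TIMESTAMP.match(line)
        if match:
            last = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if marker in lowered:
            failures += 1
    return last, failures
```

Run it over the log file with the sensor's IEEE address and whatever failure text your log actually uses. A last-seen time close to the moment the device went "unavailable" with zero failures fits the silent-purge reading; a cluster of failures just before it fits the degrading-link reading.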

Also worth ruling out mundane causes before chasing the protocol-level explanation: confirm the "router" it's parented to is actually operating as a Zigbee router and not as an end device itself. Some smart plugs need a specific firmware or pairing mode to act as routers, and a few ship as end-device-only out of the box until reflashed or explicitly configured.

zephyr_devotee

One more angle worth raising, since it's what actually fixed this exact symptom for me on a Thread network (different mesh stack, same parent/child timeout pattern): check whether the sensor's own poll interval is longer than whatever timeout its parent router is enforcing. If the end device is configured to sleep for, say, 60 seconds between polls to save battery, but the router's child-timeout policy is shorter than that, the router will legitimately age it out on a schedule even with a perfect link. It'll just happen to correlate with your worst-signal device because that's the one already living closest to the failure boundary. Not saying that's definitely it over ble_mesh_maven's read. It depends heavily on your specific sensor's firmware and whether it exposes the poll interval at all; some cheaper Zigbee sensors don't. But it's a five-minute check if the device's manufacturer app or Zigbee2MQTT exposes a reporting/poll interval setting, and it's a different fix (make the sensor poll more often) than "move it closer to a router."
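The comparison itself is simple arithmetic. As a rough sketch (the function name is hypothetical, and it assumes a fixed poll interval and a timeout that restarts on every poll the parent hears), this computes how many polls in a row can be lost before the parent gives up:

```python
def consecutive_misses_tolerated(poll_interval_s: int, child_timeout_s: int) -> int:
    """How many polls in a row can be lost before the parent's timeout expires.

    After a heard poll, the next polls go out at 1, 2, 3, ... intervals.
    The child survives as long as one of the polls sent within the timeout
    window is heard, so the tolerated miss count is the number of polls that
    fit in the window minus the one that must get through.

    A result of -1 means no poll fits inside the window at all: the parent
    ages the child out even on a perfect link.
    """
    if poll_interval_s <= 0 or child_timeout_s <= 0:
        raise ValueError("interval and timeout must be positive")
    return child_timeout_s // poll_interval_s - 1


# Hypothetical values: sensor sleeps 60 s, router enforces a 30 s timeout.
print(consecutive_misses_tolerated(60, 30))   # misconfigured case
# Same sensor against a 180 s timeout.
print(consecutive_misses_tolerated(60, 180))
```

A result of -1 is the misconfiguration described above, where link quality is irrelevant. A small non-negative result combined with a marginal link is the scenario from the accepted answer: the margin exists, but a short run of bad polls is enough to use it up.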