ESP32 IoT device stops sending data after 6–8 hours but shows as connected on the AP — DHCP lease expired?
Asked by stale_biscuit_03
Deployed an ESP32-S3 sensor node at a customer's warehouse last week. It reads temperature and humidity every 5 minutes and POSTs the data to our cloud endpoint. Tested it on my home network for a week with no issues.
First deployment at the customer site — worked fine for about 6 hours, then went completely silent. Came in the next morning to check, and the AP's client list still showed the device with its IP address. Wi-Fi RSSI was fine. I power cycled it and it immediately reconnected and resumed sending data.
The customer's AP assigns DHCP leases. I didn't think to check the lease time before leaving. My home router gives out 24-hour leases and I never had this issue in testing — could the customer's AP be using a short lease time and the device not renewing it? Is that even a thing that can happen silently while staying "connected" to the AP?
Using ESP-IDF v5.2, standard Wi-Fi STA mode, LwIP for networking. The device isn't using any deep sleep between measurements, just polling with vTaskDelay.
3 Replies
Yes, DHCP lease expiry fits this pattern well. The symptoms you're describing are the tell: the AP still shows the device in its client list and the Wi-Fi link is up at the radio level, but all outbound traffic silently fails. The AP and the device both think they're still associated because the 802.11 link layer is fine. The lease problem sits entirely at the IP layer above it.
How DHCP lease renewal works (and fails)
DHCP leases have two renewal timers defined in RFC 2131:
- T1 (renewal timer): Fires at 50% of the lease lifetime. The client sends a unicast DHCPREQUEST directly to the server asking to renew the same IP address.
- T2 (rebinding timer): Fires at 87.5% of the lease lifetime. If T1 renewal failed (server didn't respond), the client broadcasts a DHCPREQUEST to any available DHCP server.
- Lease expiry: If neither renewal succeeded, the client must immediately stop using the IP and restart the full DISCOVER → OFFER → REQUEST → ACK process.
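A failed T1 renewal doesn't cut the device off by itself. The address stays valid through the T2 rebind attempts and only becomes unusable when the lease runs out. So the silence should line up with lease expiry, not with T1. If the customer's AP hands out short leases (say around 6 hours), that expiry lands right in your 6–8 hour window. LwIP (the TCP/IP stack underneath ESP-IDF) does implement these timers, but renewal can still fail silently. The DHCPREQUEST might go out while the AP is busy, the server might be temporarily unavailable, or the reply might get dropped. If the T2 rebind also fails and the lease runs out, LwIP drops the address.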
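To line the timers up against your outage window, here is a small host-side C sketch of RFC 2131's default timer arithmetic: T1 at 50% and T2 at 87.5% of the lease. The lease lengths you feed it are examples, not the customer's actual setting. A server that sends explicit renewal/rebinding times (DHCP options 58/59) overrides these defaults.

```c
#include <stdint.h>
#include <stdio.h>

/* RFC 2131 default timers for a lease of lease_s seconds.
 * If the server sends options 58/59, the client uses those values instead. */
typedef struct {
    uint32_t t1_s;     /* renewal: unicast DHCPREQUEST to the leasing server (50%) */
    uint32_t t2_s;     /* rebinding: broadcast DHCPREQUEST to any server (87.5%) */
    uint32_t expiry_s; /* lease ends: stop using the address, restart DISCOVER */
} dhcp_timers_t;

static dhcp_timers_t dhcp_default_timers(uint32_t lease_s)
{
    dhcp_timers_t t;
    t.t1_s = lease_s / 2u;
    /* 87.5% = 7/8; widen to 64 bits so the multiply cannot overflow */
    t.t2_s = (uint32_t)(((uint64_t)lease_s * 7u) / 8u);
    t.expiry_s = lease_s;
    return t;
}

/* Print the timers in hours for a lease given in whole hours. */
static void print_dhcp_timers(uint32_t lease_h)
{
    dhcp_timers_t t = dhcp_default_timers(lease_h * 3600u);
    printf("lease %2u h: T1 %5.2f h, T2 %5.2f h, expiry %5.2f h\n",
           (unsigned)lease_h, t.t1_s / 3600.0, t.t2_s / 3600.0,
           t.expiry_s / 3600.0);
}
```

For a 6-hour lease this gives T1 at 3 h, T2 at 5.25 h and expiry at 6 h. A device that fails both renewal attempts goes quiet right at the 6-hour mark.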
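The wrinkle: an address that still looks valid in esp_netif_get_ip_info() doesn't prove the lease is good on both ends. The DHCP server or AP may have dropped the binding while the device's own lease timer hasn't run out yet, for example after an AP reboot, a cleared lease table, or on an AP that filters clients without a current DHCP binding. In that case the application layer sees a configured IP, but the network discards its traffic. The radio link stays up throughout, which is consistent with your device appearing "connected" but producing no data.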
Diagnosing it
Log the IP info at application startup and periodically:
esp_netif_ip_info_t ip_info;
esp_netif_get_ip_info(esp_netif_get_default_netif(), &ip_info);
ESP_LOGI(TAG, "IP: " IPSTR " GW: " IPSTR,
         IP2STR(&ip_info.ip), IP2STR(&ip_info.gw));
If the IP comes back as 0.0.0.0 while the device is silent, LwIP has most likely lost the lease and released the address. If the IP looks valid but the HTTP POST still fails, ping the gateway first. If the gateway ping fails while the IP looks fine, the AP has probably dropped the binding on its end.
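The decision in the paragraph above can be written as a small pure function, which keeps the branching testable off-device. The enum and function names are hypothetical. The gateway-ping result is an input you produce yourself on the device, for example with ESP-IDF's ping component (esp_ping_new_session()). This sketch only encodes the branching. It treats "gateway answers but POST fails" as pointing upstream of the device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical triage result; these are not ESP-IDF types. */
typedef enum {
    NET_OK,           /* POST succeeded, nothing to do */
    NET_NO_IP,        /* address is 0.0.0.0: lease lost, re-acquire */
    NET_GATEWAY_DOWN, /* IP looks valid but the gateway doesn't answer */
    NET_UPSTREAM,     /* gateway answers, POST still fails: DNS/WAN/cloud */
} net_triage_t;

/* ip_addr: ip_info.ip.addr as read from esp_netif_get_ip_info().
 * gateway_ping_ok: result of pinging ip_info.gw (however you implement it).
 * post_ok: whether the HTTP POST succeeded. */
static net_triage_t triage_network(uint32_t ip_addr, bool gateway_ping_ok,
                                   bool post_ok)
{
    if (post_ok) {
        return NET_OK;
    }
    if (ip_addr == 0) {
        return NET_NO_IP;
    }
    if (!gateway_ping_ok) {
        return NET_GATEWAY_DOWN;
    }
    return NET_UPSTREAM;
}
```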
Also check the DHCP lease time on the customer's AP before the next visit. Managed enterprise networks (Ubiquiti, Cisco Meraki, Ruckus and the like) are often configured with leases well under 24 hours, either by default or by admin choice. Consumer routers typically default to 24 hours, which is a likely reason your home test never surfaced this.
Fix 1 — Force DHCP re-acquire in firmware (short-term)
Add a watchdog check to your application loop. When a POST fails, check whether the problem is at the IP layer and, if the address is gone, force a DHCP re-acquire:
#include "esp_log.h"
#include "esp_netif.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

static void check_and_renew_dhcp(void)
{
    esp_netif_ip_info_t ip_info;
    esp_netif_t *netif = esp_netif_get_default_netif();
    esp_netif_get_ip_info(netif, &ip_info);
    if (ip_info.ip.addr == 0) {
        ESP_LOGW(TAG, "IP expired, forcing DHCP re-acquire");
        esp_netif_dhcpc_stop(netif);
        esp_netif_dhcpc_start(netif);
        vTaskDelay(pdMS_TO_TICKS(3000)); /* allow time for DISCOVER/OFFER/REQUEST/ACK */
    }
}
Call this before each POST attempt. It won't prevent the outage window while the lease is expired, but it will self-recover within seconds rather than requiring a power cycle.
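Self-recovery is easier to reason about when the escalation is explicit. Here is a minimal sketch, assuming hypothetical thresholds. It counts consecutive POST failures, forces a DHCP re-acquire first, then a full Wi-Fi reconnect, then a reboot. On the device, ACT_RENEW_DHCP would map to check_and_renew_dhcp(), ACT_RECONNECT_WIFI to esp_wifi_disconnect() followed by esp_wifi_connect(), and ACT_REBOOT to esp_restart(). The policy itself is plain C, so it can be tested on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical escalation thresholds; tune them for your reporting interval. */
#define RENEW_AFTER_FAILS     1u /* force a DHCP re-acquire on the first failure */
#define RECONNECT_AFTER_FAILS 3u /* then a full Wi-Fi reconnect */
#define REBOOT_AFTER_FAILS    6u /* last resort */

typedef enum {
    ACT_NONE,
    ACT_RENEW_DHCP,
    ACT_RECONNECT_WIFI,
    ACT_REBOOT,
} recovery_action_t;

typedef struct {
    uint32_t consecutive_fails;
} recovery_state_t;

/* Feed the result of each POST; returns what the caller should do next.
 * Any successful POST resets the failure count. */
static recovery_action_t recovery_on_post_result(recovery_state_t *s, bool post_ok)
{
    if (post_ok) {
        s->consecutive_fails = 0;
        return ACT_NONE;
    }
    s->consecutive_fails++;
    if (s->consecutive_fails >= REBOOT_AFTER_FAILS) {
        return ACT_REBOOT;
    }
    if (s->consecutive_fails >= RECONNECT_AFTER_FAILS) {
        return ACT_RECONNECT_WIFI;
    }
    if (s->consecutive_fails >= RENEW_AFTER_FAILS) {
        return ACT_RENEW_DHCP;
    }
    return ACT_NONE;
}
```

With your 5-minute reporting interval, these example thresholds only reboot after six consecutive failed reports, roughly half an hour of silence.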
Fix 2 — Static IP with DHCP reservation (robust long-term)
For IoT devices in managed networks, a DHCP reservation (or static IP assignment) is usually the better architecture. Ask the customer's IT team to create a reservation on the DHCP server (the AP or router) tied to the device's MAC address, so it always hands out the same IP. A reservation still issues a lease that the device has to renew. What it buys you is that any re-acquire gets the same address back. For a network you control, you can also configure a static IP in firmware, which takes DHCP out of the picture entirely:
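esp_netif_dhcpc_stop(netif); /* stop DHCP client first */
esp_netif_ip_info_t static_ip = {
    .ip.addr      = ESP_IP4TOADDR(192, 168, 1, 50),  /* fields are esp_ip4_addr_t, so set .addr */
    .netmask.addr = ESP_IP4TOADDR(255, 255, 255, 0),
    .gw.addr      = ESP_IP4TOADDR(192, 168, 1, 1),
};
esp_netif_set_ip_info(netif, &static_ip);
/* DHCP no longer supplies DNS, so set one yourself (example address; use the customer's DNS server) */
esp_netif_dns_info_t dns = {
    .ip.u_addr.ip4.addr = ESP_IP4TOADDR(192, 168, 1, 1),
    .ip.type = ESP_IPADDR_TYPE_V4,
};
esp_netif_set_dns_info(netif, ESP_NETIF_DNS_MAIN, &dns);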
Static IP removes the lease dependency entirely. Just confirm the address is outside the AP's dynamic DHCP pool range to avoid a conflict when another device is assigned the same address by DHCP.
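Checking the static address against the network's settings before flashing is cheap. Below is a host-side sketch, assuming you get the subnet mask, gateway and the DHCP pool's start and end addresses from the customer's IT team. The 192.168.1.x values in the test are examples. The sketch uses host-order integers so range comparisons are simple. That is deliberately different from the network-order value ESP_IP4TOADDR produces for esp_netif.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a host-order IPv4 value, e.g. ipv4(192, 168, 1, 50).
 * NOT the network-order value ESP_IP4TOADDR produces. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* True if a static address can coexist with the DHCP pool: same subnet as
 * the gateway, not the network or broadcast address, not the gateway
 * itself, and outside [pool_start, pool_end]. All inputs are host order. */
static bool static_ip_ok(uint32_t ip, uint32_t netmask, uint32_t gw,
                         uint32_t pool_start, uint32_t pool_end)
{
    if ((ip & netmask) != (gw & netmask)) {
        return false; /* different subnet from the gateway */
    }
    if ((ip & ~netmask) == 0u) {
        return false; /* host bits all zero: network address */
    }
    if ((ip | netmask) == 0xFFFFFFFFu) {
        return false; /* host bits all one: broadcast address */
    }
    if (ip == gw) {
        return false;
    }
    if (ip >= pool_start && ip <= pool_end) {
        return false; /* DHCP could hand this address to another client */
    }
    return true;
}
```

It can't see addresses that other devices were given statically outside the pool. It complements asking IT which addresses are free rather than replacing that.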
See the Wi-Fi provisioning guide for how to store and retrieve IP configuration from NVS — the same approach applies whether you're storing Wi-Fi credentials or a static IP configuration.
One thing to add on the ESP-IDF side: the esp_netif_dhcpc_stop() + esp_netif_dhcpc_start() approach wifi_watchdog described is the right move, but you need to handle the race between DHCP completion and your first POST. Don't just use a vTaskDelay(3000); wait for the IP_EVENT_STA_GOT_IP event properly:
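#include "esp_event.h"
#include "esp_log.h"
#include "esp_netif.h"
#include "freertos/FreeRTOS.h"
#include "freertos/event_groups.h"

static EventGroupHandle_t s_ip_event_group;
#define IP_ACQUIRED_BIT BIT0

static void on_ip_event(void *arg, esp_event_base_t base,
                        int32_t id, void *data)
{
    if (id == IP_EVENT_STA_GOT_IP) {
        ip_event_got_ip_t *ev = (ip_event_got_ip_t *)data;
        ESP_LOGI(TAG, "Got IP: " IPSTR, IP2STR(&ev->ip_info.ip));
        xEventGroupSetBits(s_ip_event_group, IP_ACQUIRED_BIT);
    }
}

/* Example init: run once after esp_event_loop_create_default().
 * The event group must exist before any of the calls below use it. */
static void ip_events_init(void)
{
    s_ip_event_group = xEventGroupCreate();
    ESP_ERROR_CHECK(esp_event_handler_register(IP_EVENT, IP_EVENT_STA_GOT_IP,
                                               &on_ip_event, NULL));
}

static void force_dhcp_renew(void)
{
    esp_netif_t *netif = esp_netif_get_default_netif();
    xEventGroupClearBits(s_ip_event_group, IP_ACQUIRED_BIT);
    esp_netif_dhcpc_stop(netif);
    esp_netif_dhcpc_start(netif);
    EventBits_t bits = xEventGroupWaitBits(s_ip_event_group,
                                           IP_ACQUIRED_BIT,
                                           pdFALSE, pdFALSE,
                                           pdMS_TO_TICKS(10000));
    if (!(bits & IP_ACQUIRED_BIT)) {
        ESP_LOGE(TAG, "DHCP renew timed out after 10s");
        /* trigger a full reconnect or reboot */
    }
}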
You're registering on_ip_event with esp_event_handler_register() at init anyway for the initial IP acquisition, so you can reuse the same event handler here. The event fires reliably when the DHCP ACK comes in and LwIP applies the new lease.
Also worth noting: the LwIP DHCP renewal timer runs in the LwIP task (tcpip_thread). If you've ever called vTaskSuspend() on that task, or done anything that blocked the tcpip_thread for an extended period around T1 time, the renewal can be delayed or missed. The tcpip_thread priority in ESP-IDF is 18 by default. If any of your tasks run at priority 19 or above for long periods, profile that before deploying again.
Had almost this exact scenario on a deployed Ethernet device (not Wi-Fi, but same LwIP stack behaviour) and the diagnosis path is slightly different from what you'd expect, so worth spelling out.
The confusing part is that you can't easily tell from the device's perspective whether the problem is "lease expired and never renewed" or "gateway / DNS / cloud endpoint issue." All of them look like failed HTTP POSTs. Here's how I'd triage remotely:
- Log the raw IP every N minutes to a local circular buffer (UART log, SD card, or a local HTTP endpoint if you have one). You want to see the exact timestamp when the IP first showed as 0.0.0.0 or when the first POST failed. That gives you the interval to compare against the DHCP lease time.
- Try pinging the gateway before each POST. If the ping to the gateway fails but the IP looks valid, the AP has most likely dropped the binding. If the gateway pings fine but the cloud endpoint doesn't respond, the problem is upstream of your device (DNS, WAN, cloud outage), which calls for a completely different fix.
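- Log the HTTP error code, not just "POST failed." ESP_ERR_HTTP_CONNECT from esp_http_client, and the underlying socket errno (ECONNREFUSED vs EHOSTUNREACH), point to different layers of the stack. A refused connection means you reached the remote host. An unreachable host usually means the traffic never had a route out of your side of the network.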
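The first and last points above can be sketched as a fixed-size ring buffer of samples plus a rough errno-to-layer hint. Everything here is hypothetical: the record fields, the buffer size and the hint strings are assumptions. lwIP also doesn't necessarily map every failure to the same errno a desktop OS would. On the device, ip_addr could come from esp_netif_get_ip_info(), and post_errno from esp_http_client_get_errno() if your ESP-IDF version provides it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One triage sample; log whatever fields you actually have. */
typedef struct {
    uint32_t uptime_s;   /* seconds since boot when the sample was taken */
    uint32_t ip_addr;    /* ip_info.ip.addr; 0 means no address */
    bool     gw_ping_ok; /* result of pinging the gateway */
    int      post_errno; /* socket errno of a failed POST, 0 on success */
} net_sample_t;

#define NET_LOG_LEN 256u /* 256 samples at 5-minute intervals covers about 21 hours */

typedef struct {
    net_sample_t buf[NET_LOG_LEN];
    size_t head;  /* next slot to write */
    size_t count; /* valid samples, saturates at NET_LOG_LEN */
} net_log_t;

/* Append a sample, overwriting the oldest one when the buffer is full. */
static void net_log_push(net_log_t *log, net_sample_t s)
{
    log->buf[log->head] = s;
    log->head = (log->head + 1u) % NET_LOG_LEN;
    if (log->count < NET_LOG_LEN) {
        log->count++;
    }
}

/* i = 0 is the oldest retained sample; the caller keeps i < log->count. */
static const net_sample_t *net_log_get(const net_log_t *log, size_t i)
{
    size_t oldest = (log->head + NET_LOG_LEN - log->count) % NET_LOG_LEN;
    return &log->buf[(oldest + i) % NET_LOG_LEN];
}

/* Rough mapping from a socket errno to the layer it points at. */
static const char *errno_layer_hint(int err)
{
    switch (err) {
    case 0:
        return "ok";
    case EHOSTUNREACH:
    case ENETUNREACH:
        return "local side: no route to the next hop";
    case ECONNREFUSED:
        return "remote host reached, nothing accepting on that port";
    case ETIMEDOUT:
        return "timeout: packets lost somewhere on the path";
    default:
        return "other: check the raw errno";
    }
}
```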
On the customer site visit, also ask them what brand of AP they're using. Enterprise deployments, Meraki among them, often run leases as short as an hour on guest networks and sometimes on corporate SSIDs too. With a lease that short the device renews many times before the 6-hour mark. In that case the failure time tells you when a renewal cycle failed, not how long the lease is. Ubiquiti UniFi defaults to 24 hours, but IT admins sometimes lower it. Check the AP's DHCP settings before assuming the firmware is the problem.
The static IP or DHCP reservation fix is cleaner than the retry loop for production, but the watchdog retry is a good safety net to add regardless. DHCP isn't the only thing that can cause a device to lose its IP, and self-recovery is always better than waiting for someone to power cycle it.