Electronics Design AU
ESP32 · Solved

ESP32 keeps dropping Wi-Fi after 20–30 minutes in deployed location — reconnect loop doesn't always recover

Original Question

Asked by stale_biscuit_03

Having a frustrating one. Built an ESP32 environmental monitor (SHT40 temp/humidity, reports to an MQTT broker every 5 minutes). Works flawlessly on my desk and at home. Installed it in a shed about 150m from the house router a week ago and it keeps dropping off after 20–30 minutes. Sometimes reconnects, sometimes just hangs until I power-cycle it.

Using ESP-IDF v5.1. My reconnection code in the event handler is:

case WIFI_EVENT_STA_DISCONNECTED:
    ESP_LOGI(TAG, "disconnected, reconnecting...");
    esp_wifi_connect();
    break;

Signal's around -65 to -70 dBm measured at the install location, which I thought was decent enough. I checked the Wi-Fi setup article but it focuses on the initial connection setup — I can't figure out why stable lab testing doesn't translate once it's deployed.

Does this pattern (works in testing, dies in the field) point to anything obvious?

From the knowledge base: How Do You Set Up Wi-Fi and Provision an ESP32 Device?

3 Replies

wifi_watchdog
Accepted Answer

Classic production deployment pattern. Three things to look at, roughly in probability order for your description.

1. Power save mode — most likely cause

ESP-IDF's default Wi-Fi power save mode is WIFI_PS_MIN_MODEM. In this mode, the Wi-Fi modem sleeps between DTIM beacon intervals to save power. If the ESP32 misses enough consecutive beacons — because of timing jitter, RF interference, or the AP's DTIM interval being longer than expected — the driver declares a beacon timeout and drops the connection. Your home/lab AP probably has a 100ms beacon interval with DTIM 1; the router at the shed might have different settings, or the greater distance makes beacon reception less reliable.

For a sensor sending every 5 minutes, power save on the radio is almost certainly not something you need. Disable it:

/* call this after esp_wifi_start(), before or after connecting */
esp_wifi_set_ps(WIFI_PS_NONE);

This is the single most common fix for intermittent production disconnections on ESP32 devices that aren't battery-constrained.

2. Log the disconnect reason

Before changing anything, add the reason code to your log:

case WIFI_EVENT_STA_DISCONNECTED: {
    wifi_event_sta_disconnected_t *evt =
        (wifi_event_sta_disconnected_t *)event_data;
    ESP_LOGW(TAG, "disconnected, reason=%d", evt->reason);
    esp_wifi_connect();
    break;
}

Key reason codes: 200 (WIFI_REASON_BEACON_TIMEOUT) means missed beacons, which points at power save or signal. Code 2 (WIFI_REASON_AUTH_EXPIRE) means the AP's authentication timer expired, an AP-side keepalive issue. Code 8 (WIFI_REASON_ASSOC_LEAVE) means the station is leaving the BSS; on the ESP32 it usually shows up when your own code called esp_wifi_disconnect() or esp_wifi_stop(), so it is rarely the AP kicking you. Knowing which one you're hitting tells you where to focus.
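
If you want the log to point you in a direction rather than just print a number, a small lookup helper can bucket the codes. This is only a sketch: the enum and function names are made up for illustration, and the numeric values are copied from esp_wifi_types.h so the snippet compiles off-target without the ESP-IDF headers. Code 4 (WIFI_REASON_ASSOC_EXPIRE, disassociated due to inactivity) is included alongside code 2 because both point at AP-side timers.

```c
/* Numeric values as defined in ESP-IDF's esp_wifi_types.h; redefined here
 * under hypothetical names only so the sketch builds without ESP-IDF. */
enum {
    REASON_AUTH_EXPIRE    = 2,
    REASON_ASSOC_EXPIRE   = 4,
    REASON_BEACON_TIMEOUT = 200,
    REASON_NO_AP_FOUND    = 201
};

/* Coarse "where to look first" buckets; hypothetical, not an ESP-IDF type. */
typedef enum {
    LOOK_AT_AP_TIMERS,      /* AP-side expiry or idle timeout */
    LOOK_AT_RADIO_LINK,     /* missed beacons: power save or signal */
    LOOK_AT_AP_VISIBILITY,  /* scan did not find the SSID */
    LOOK_AT_LOGS            /* anything else: read the raw code */
} disconnect_hint_t;

/* Map a disconnect reason code to the area worth investigating first. */
disconnect_hint_t classify_disconnect(int reason)
{
    switch (reason) {
    case REASON_AUTH_EXPIRE:
    case REASON_ASSOC_EXPIRE:
        return LOOK_AT_AP_TIMERS;
    case REASON_BEACON_TIMEOUT:
        return LOOK_AT_RADIO_LINK;
    case REASON_NO_AP_FOUND:
        return LOOK_AT_AP_VISIBILITY;
    default:
        return LOOK_AT_LOGS;
    }
}
```

In the handler you would pass evt->reason to it and log the bucket next to the raw code, so the raw number is never lost.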

3. Reconnection robustness

Calling esp_wifi_connect() directly in the disconnect handler is fine for basic cases, but it needs backoff and a retry limit. Without backoff, failed reconnect attempts can pile up and some APs will rate-limit or temporarily block a client that keeps hammering auth requests.

More robust pattern: use a retry counter and, after a few consecutive failures, do a full driver reset:

static int s_retry = 0;

case WIFI_EVENT_STA_DISCONNECTED:
    if (s_retry < MAX_RETRY) {
        esp_wifi_connect();
        s_retry++;
    } else {
        /* full stack reset clears driver state more thoroughly */
        esp_wifi_stop();
        esp_wifi_start();
        /* assumes your WIFI_EVENT_STA_START handler calls esp_wifi_connect() */
        s_retry = 0;
    }
    break;

case WIFI_EVENT_STA_CONNECTED:
    s_retry = 0;
    break;

Start with esp_wifi_set_ps(WIFI_PS_NONE) — that alone fixes this for most people. If it doesn't, the reason code will tell you the next step.

esp_idf_enjoyer

To expand on the reason codes: the numeric values come from two sources. Codes below 200 are the standard deauthentication/disassociation reason codes defined in the IEEE 802.11 spec. Codes 200 and up are ESP-IDF additions, with WIFI_REASON_BEACON_TIMEOUT (200) and WIFI_REASON_NO_AP_FOUND (201) being the ones you'll see most often in practice. They're all defined in esp_wifi_types.h in the ESP-IDF headers if you want the full list.

One thing worth knowing about calling esp_wifi_connect() inside the event handler: with the default event loop, the handler runs in the event loop task (sys_evt), not in your own task, so the call is fine in simple cases. Anything that blocks there delays delivery of every other event, though, so if you have complex reconnection logic (exponential backoff, MQTT reconnection sequencing, NVS reads) it's cleaner to use a FreeRTOS event group and handle reconnection from your main task or a dedicated network management task. Something like:

/* in event handler */
case WIFI_EVENT_STA_DISCONNECTED:
    xEventGroupClearBits(s_wifi_event_group, WIFI_CONNECTED_BIT);
    xEventGroupSetBits(s_wifi_event_group, WIFI_DISCONNECTED_BIT);
    break;

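/* in your main/network task */
for (;;) {
    /* block until the handler signals a disconnect; clear the bit on exit */
    EventBits_t bits = xEventGroupWaitBits(s_wifi_event_group,
                                            WIFI_DISCONNECTED_BIT,
                                            pdTRUE, pdFALSE,
                                            portMAX_DELAY);
    if (bits & WIFI_DISCONNECTED_BIT) {
        vTaskDelay(pdMS_TO_TICKS(backoff_ms));  /* backoff_ms: your current retry delay */
        esp_wifi_connect();
    }
}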

This separates the event detection from the reconnect logic and makes the backoff timing straightforward. The ESP-IDF wifi_station example in examples/wifi/getting_started/station/ uses the event group pattern if you want a reference.
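
For the backoff_ms value itself, capped exponential backoff is the usual choice. A minimal sketch follows, assuming a 1 s starting delay and a 60 s ceiling; both constants and the function name are placeholders, not anything from ESP-IDF.

```c
#include <stdint.h>

#define BACKOFF_BASE_MS 1000u   /* first retry delay (assumed value) */
#define BACKOFF_MAX_MS  60000u  /* ceiling, so retries slow down but never stop (assumed value) */

/* Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
 * The loop stops doubling once the cap is reached, so it cannot overflow. */
uint32_t backoff_ms_for_attempt(uint32_t attempt)
{
    uint32_t delay = BACKOFF_BASE_MS;
    for (uint32_t i = 0; i < attempt && delay < BACKOFF_MAX_MS; i++) {
        delay *= 2u;
    }
    return delay > BACKOFF_MAX_MS ? BACKOFF_MAX_MS : delay;
}
```

The network task would keep an attempt counter, pass it through this function before each vTaskDelay(), and reset the counter to zero once the connection is back. Adding a little random jitter on top is a common refinement if several devices share the AP.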

Also: esp_wifi_set_ps(WIFI_PS_NONE) should definitely be your first test. In ESP-IDF 5.x it can be set before esp_wifi_start() if you call it after esp_wifi_init() and esp_wifi_set_mode().

midnight_debugger

One more angle if the power save fix doesn't resolve it: the AP itself.

Some consumer routers have a client idle timeout — they deauthenticate stations that haven't exchanged data for a period (anything from 5 minutes to 30 minutes depending on firmware). In the lab on your home AP this might not kick in because you're doing other network activity that keeps the AP's client table warm. In the shed it's the only device on that AP, sending only every 5 minutes. If the AP's idle timer is shorter than your reporting interval, it'll disconnect you between transmissions.

You can distinguish this from power save issues by looking at the reason code: AP-side deauth for idle timeout usually shows up as reason 2 (WIFI_REASON_AUTH_EXPIRE) or reason 4 (WIFI_REASON_ASSOC_EXPIRE, disassociated due to inactivity), not 200 (beacon timeout).

Quick test to confirm: swap the router for a mobile hotspot temporarily. Phone hotspots generally don't have aggressive idle timeouts, so if the device stays connected for hours on the hotspot but not the router, you're hitting the AP's keepalive behaviour rather than a power save or signal problem. From there you can look at the router's settings (if it exposes them) or lower your MQTT keepalive interval to below the AP's idle threshold to keep the association alive.
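
To put rough numbers on the keepalive idea: if the AP drops idle clients after 5 minutes (300 s) and you report every 5 minutes, you are right on the edge. A simple rule of thumb is to keep traffic flowing at half the idle timeout. The sketch below just does that arithmetic; the halving rule and the function name are assumptions for illustration, and you'd have to find the AP's real timeout by experiment or from its settings.

```c
#include <stdint.h>

/* Pick a keepalive period (seconds) at half the AP's idle timeout, which
 * leaves room for one lost keepalive before the AP gives up on the client.
 * Never returns 0, since a zero keepalive commonly means "disabled". */
uint32_t keepalive_for_idle_timeout(uint32_t ap_idle_timeout_s)
{
    uint32_t keepalive_s = ap_idle_timeout_s / 2u;
    return keepalive_s == 0u ? 1u : keepalive_s;
}
```

With esp-mqtt the result would go into the client config's keepalive setting (session.keepalive in the ESP-IDF 5.x config struct, as far as I know), in seconds.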

At -65 to -70 dBm the link is usable rather than strong, but you're generally above the level where beacons start getting missed regularly, so signal being the primary cause is less likely than power save or AP behaviour at that RSSI. Still worth calling esp_wifi_sta_get_ap_info() periodically and logging the RSSI, just to confirm it's not worse than expected at certain times of day.
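
For the RSSI logging, it helps to keep a running min/max/mean between reports instead of a single reading, since a one-off sample can easily miss a bad patch. A minimal sketch, with hypothetical type and function names; the samples would come from the rssi field of the wifi_ap_record_t that esp_wifi_sta_get_ap_info() fills in.

```c
#include <stdint.h>

/* Running min/max/mean of RSSI samples in dBm. Hypothetical helper type. */
typedef struct {
    int8_t   min_dbm;
    int8_t   max_dbm;
    int32_t  sum_dbm;
    uint32_t count;
} rssi_stats_t;

/* Start a new measurement window. */
void rssi_stats_reset(rssi_stats_t *s)
{
    s->min_dbm = INT8_MAX;
    s->max_dbm = INT8_MIN;
    s->sum_dbm = 0;
    s->count = 0;
}

/* Fold one RSSI sample into the window. */
void rssi_stats_add(rssi_stats_t *s, int8_t rssi_dbm)
{
    if (rssi_dbm < s->min_dbm) s->min_dbm = rssi_dbm;
    if (rssi_dbm > s->max_dbm) s->max_dbm = rssi_dbm;
    s->sum_dbm += rssi_dbm;
    s->count++;
}

/* Arithmetic mean of the dBm values (not a true power average, but good
 * enough for spotting trends); returns 0 when no samples were added. */
int32_t rssi_stats_mean(const rssi_stats_t *s)
{
    return s->count ? s->sum_dbm / (int32_t)s->count : 0;
}
```

Sample every 10 seconds or so, publish min/max/mean with each 5-minute report, then reset. The minimum is the number to watch: a mean of -68 dBm with dips into the -80s tells a different story from a steady -68.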

The deep sleep and power management article has a useful breakdown of the five ESP32 power modes and their modem behaviour, which is worth reading alongside this to understand the WIFI_PS_NONE vs WIFI_PS_MIN_MODEM vs WIFI_PS_MAX_MODEM trade-offs.
