How Do You Debug FreeRTOS Stack Overflows and Deadlocks?
Last updated 2 July 2026 · 7 min read
Direct Answer
FreeRTOS bugs fall into two categories that need different tools. Stack overflows are diagnosed with configCHECK_FOR_STACK_OVERFLOW (method 2, pattern-fill checking) combined with uxTaskGetStackHighWaterMark() to find which task is close to exhausting its allocation, since an overflow silently corrupts adjacent memory rather than faulting immediately. Deadlocks and priority inversion are diagnosed with vTaskList() for a static snapshot of every task's state, or SEGGER SystemView for a live timeline showing exactly when each task blocked, which resource it was waiting on, and what the scheduler did next — the timeline view is usually the fastest way to see a deadlock that a static snapshot can only hint at.
Detailed Explanation
FreeRTOS firmware bugs that are difficult to reproduce almost always fall into one of two categories: a stack overflow that silently corrupts memory, or a deadlock/priority-inversion scenario where tasks stop making progress without an obvious crash. Both require different diagnostic tools than a standard breakpoint-based debug session, because by the time the symptom is visible, the root cause has already happened somewhere else in time or memory. For background on task creation and stack sizing, see how do you create and schedule tasks in FreeRTOS?; for the communication primitives involved in most deadlocks, see how do FreeRTOS queues, semaphores, and mutexes work?.
Diagnosing Stack Overflows
A stack overflow occurs when a task's actual stack usage — the sum of local variables, function call nesting depth, and any interrupt nesting that runs on that task's stack — exceeds the size given to xTaskCreate(). Unlike a null pointer dereference, an overflow does not fault at the point of the overrun; it silently overwrites whatever memory sits immediately after the stack, which is often another task's TCB, a neighbouring stack, or heap metadata. The resulting corruption then manifests later, in unrelated code, making the fault location misleading.
Enable stack checking during development:
/* FreeRTOSConfig.h */
#define configCHECK_FOR_STACK_OVERFLOW 2
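Method 2 relies on the kernel filling each task's stack with a known pattern at creation. Each time the task is switched out, the kernel checks that the last bytes of the valid stack range (16 in current kernel sources; some older releases checked 20) still hold that pattern. It is more reliable than method 1, which only checks whether the stack pointer itself is outside its valid range at the moment of the context switch and can therefore miss an overflow that occurs and unwinds between checks. Neither method is guaranteed to catch every overflow: a large local buffer can jump past the guard bytes without writing to them.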
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
{
    ( void ) xTask;
    /* pcTaskName identifies the overflowing task by name */
    configASSERT( 0 ); /* halt immediately, before further corruption spreads */
}
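To make the pattern-fill idea concrete, here is a small host-side model in plain C. It is not FreeRTOS code: the function names, the 16-byte guard size, and the use of zero bytes to represent "used" stack are all assumptions chosen for illustration. The stack is modelled as growing downward, so index 0 is the overflow end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE   0xA5u  /* pattern byte (FreeRTOS fills stacks with 0xa5) */
#define GUARD_BYTES 16u    /* illustrative size of the checked region */

/* Fill the whole stack with the pattern, as happens at task creation. */
void model_stack_init(uint8_t *stack, size_t size)
{
    assert(stack != NULL && size >= GUARD_BYTES);
    memset(stack, FILL_BYTE, size);
}

/* Simulate the task using `depth` bytes, growing down from the top. */
void model_stack_use(uint8_t *stack, size_t size, size_t depth)
{
    assert(depth <= size);
    memset(stack + (size - depth), 0x00, depth);
}

/* Method-2-style check: is the guard region at the low end untouched? */
int model_guard_intact(const uint8_t *stack)
{
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (stack[i] != FILL_BYTE) {
            return 0;
        }
    }
    return 1;
}

/* High-water mark: count pattern bytes that were never overwritten. */
size_t model_high_water_mark(const uint8_t *stack, size_t size)
{
    size_t free_bytes = 0;
    while (free_bytes < size && stack[free_bytes] == FILL_BYTE) {
        free_bytes++;
    }
    return free_bytes;
}
```

The same scan explains why the high-water mark is a minimum: once a byte has been overwritten it stays overwritten, even after the task's stack pointer unwinds. In a real system, data that happens to equal the fill byte can make the reported headroom slightly optimistic.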
Measure actual usage, don't guess. uxTaskGetStackHighWaterMark() returns the minimum amount of stack that has remained unused since the task started — the inverse of peak usage. A high-water mark close to zero means the task has come close to overflowing even if it hasn't yet:
UBaseType_t uxHighWaterMark = uxTaskGetStackHighWaterMark( NULL ); /* NULL = calling task */
/* uxHighWaterMark is in words, not bytes — multiply by sizeof(StackType_t) (4 on Cortex-M) */
Call this from within each task after it has run through its worst-case code path at least once — a task that has only executed its idle loop will report a misleadingly high water mark that doesn't reflect the stack depth its error-handling or interrupt-nested paths actually require.
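The word-to-byte conversion and a peak-usage percentage can be wrapped in two small helpers. This is a sketch only: stack_word_t is a stand-in for StackType_t on a 32-bit port, and both function names are hypothetical. It assumes the allocation is expressed in words, as the depth passed to xTaskCreate() is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for FreeRTOS's StackType_t on a 32-bit Cortex-M port (assumption). */
typedef uint32_t stack_word_t;

/* Convert a high-water mark reported in words into bytes of headroom. */
size_t stack_headroom_bytes(size_t high_water_words)
{
    return high_water_words * sizeof(stack_word_t);
}

/* Peak usage as a percentage of the allocation; both arguments in words. */
unsigned stack_peak_usage_percent(size_t allocated_words, size_t high_water_words)
{
    assert(allocated_words > 0 && high_water_words <= allocated_words);
    size_t used_words = allocated_words - high_water_words;
    return (unsigned)((used_words * 100u) / allocated_words);
}
```

On target, the value passed as high_water_words would come from uxTaskGetStackHighWaterMark(). A task allocated 256 words that reports 40 words free has 160 bytes of headroom and has peaked at 84% of its stack.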
Common stack overflow triggers: deeply nested function calls (especially recursive parsing or printf-style formatting with many arguments), large local arrays or structs allocated on the stack instead of statically or via pvPortMalloc(), and ISRs that run nested on the interrupted task's stack rather than a dedicated interrupt stack (architecture-dependent — check your port's interrupt stack configuration).
Diagnosing Deadlocks and Priority Inversion
A deadlock or severe priority inversion presents as tasks that stop making progress without any fault — the system appears "stuck" rather than crashed, which makes a standard breakpoint debug session unhelpful, since halting execution to inspect state doesn't show how the system arrived there.
Static snapshot: vTaskList(). For a quick look at what every task is doing right now:
/* static keeps the buffer off the calling task's stack; allow roughly 40 bytes per task */
static char pcWriteBuffer[ 512 ];
vTaskList( pcWriteBuffer );
/* Format: Name State Priority StackRemaining TaskNumber */
/* State codes: X = Running, R = Ready, B = Blocked, S = Suspended, D = Deleted */
A task shown as B (Blocked) that never transitions to R or X across repeated calls is the starting point for a deadlock investigation. vTaskList() does not report what a task is blocked on, so identify the semaphore, queue, or mutex from the task's code or an RTOS-aware debugger view, then check which task holds that resource and why it isn't releasing it. vTaskList() requires configUSE_TRACE_FACILITY and configUSE_STATS_FORMATTING_FUNCTIONS both set to 1, and it calls vTaskSuspendAll() internally while it gathers the task data. That is acceptable for a development debug console, not for a production hot path (see the FAQ below).
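Comparing snapshots by eye gets tedious, so one option is to parse the buffer. The sketch below is plain C with hypothetical function names; it assumes the whitespace-separated column layout described above and task names that contain no spaces and are shorter than 32 characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the state letter for task_name in a vTaskList()-style buffer,
   or '\0' if the task does not appear. */
char task_state_in_snapshot(const char *snapshot, const char *task_name)
{
    assert(snapshot != NULL && task_name != NULL);
    const char *line = snapshot;
    while (*line != '\0') {
        char name[32];
        char state;
        /* First two columns: task name, then the single-letter state. */
        if (sscanf(line, "%31s %c", name, &state) == 2 &&
            strcmp(name, task_name) == 0) {
            return state;
        }
        const char *eol = strchr(line, '\n');
        if (eol == NULL) {
            break;
        }
        line = eol + 1;
    }
    return '\0';
}

/* 1 if the task is Blocked in both snapshots, otherwise 0. */
int task_stuck_blocked(const char *earlier, const char *later,
                       const char *task_name)
{
    return task_state_in_snapshot(earlier, task_name) == 'B' &&
           task_state_in_snapshot(later, task_name) == 'B';
}
```

Two Blocked samples are only a hint: a task that legitimately wakes and blocks again between snapshots looks identical. Treat a positive result as a reason to look closer, not as proof of a deadlock.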
Live timeline: SEGGER SystemView. A static snapshot shows that a task is blocked; it doesn't show when it blocked, in what order, or what the scheduler did in response. SEGGER SystemView (free with a J-Link debug probe) instruments the FreeRTOS kernel's trace hooks to record every context switch, ISR entry/exit, and API call with microsecond timestamps, then renders it as a scrollable timeline. This turns "task X is stuck" into a visible sequence: task X called xSemaphoreTake(), blocked, and the timeline shows exactly which lower-priority task was holding the mutex and what it was doing instead of releasing it — the priority inversion pattern is visually obvious once you can see task ordering over time, in a way that reading source code or a single static snapshot rarely reveals.
To enable SystemView, add the FreeRTOS trace macro hooks (traceTASK_SWITCHED_IN, traceTASK_SWITCHED_OUT, and related macros) to FreeRTOSConfig.h — SEGGER provides a ready-made SEGGER_SYSVIEW_FreeRTOS.h header that wires these up automatically. No target-side UART or extra wiring is required beyond the existing J-Link SWD connection; capture happens over the same debug probe used for programming.
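Going by SEGGER's published integration steps, a typical setup looks roughly like the fragment below. Treat the two INCLUDE_ settings and the placement of the SEGGER_SYSVIEW_Conf() call as assumptions to verify against the SystemView and FreeRTOS versions in use; some kernel versions also need a patch supplied with the SystemView package.

```c
/* FreeRTOSConfig.h: place these at the very end of the file */
#define INCLUDE_xTaskGetIdleTaskHandle  1
#define INCLUDE_pxTaskGetStackStart     1
#include "SEGGER_SYSVIEW_FreeRTOS.h"   /* maps the trace macros to SystemView */

/* main.c: after clock and hardware init, before vTaskStartScheduler() */
#include "SEGGER_SYSVIEW.h"

void app_start_tracing( void )         /* hypothetical helper name */
{
    SEGGER_SYSVIEW_Conf();             /* configure SystemView and its RTT channel */
}
```

The include goes last in FreeRTOSConfig.h so that the SEGGER header's trace macro definitions are not overridden by earlier configuration lines.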
The most common root cause: binary semaphore used where a mutex is needed. A binary semaphore has no concept of ownership, so FreeRTOS cannot apply priority inheritance to it. A low-priority task holding a binary semaphore that a high-priority task is waiting on can be preempted indefinitely by a medium-priority task, and the high-priority task never runs. Using xSemaphoreCreateMutex() instead of xSemaphoreCreateBinary() for any resource that represents exclusive ownership (not just a signal) allows FreeRTOS's built-in priority inheritance to temporarily boost the low-priority holder's priority until it releases the resource. Inheritance does not eliminate the inversion, but it bounds it to the time the holder actually needs the resource, because medium-priority tasks can no longer preempt the holder. See the forum discussion on this exact failure mode for a worked bring-up example.
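The scheduling difference can be shown with a small host-side model. This is not FreeRTOS code and not how the kernel is implemented: the sim_task type and both functions are hypothetical, and the model only captures the rule "a lock holder runs at the priority of its highest-priority waiter".

```c
#include <assert.h>
#include <stddef.h>

typedef struct sim_task {
    const char *name;
    int base_priority;             /* higher number = higher priority */
    int ready;                     /* 1 = wants the CPU, 0 = blocked */
    const struct sim_task *waiter; /* task blocked on a lock this task holds, or NULL */
} sim_task;

/* With inheritance enabled, a holder is boosted to its waiter's priority. */
int effective_priority(const sim_task *t, int inheritance)
{
    assert(t != NULL);
    if (inheritance && t->waiter != NULL &&
        t->waiter->base_priority > t->base_priority) {
        return t->waiter->base_priority;
    }
    return t->base_priority;
}

/* Pick the ready task with the highest effective priority. */
const sim_task *pick_next(const sim_task *const tasks[], size_t count,
                          int inheritance)
{
    const sim_task *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (!tasks[i]->ready) {
            continue;
        }
        if (best == NULL ||
            effective_priority(tasks[i], inheritance) >
                effective_priority(best, inheritance)) {
            best = tasks[i];
        }
    }
    return best;
}
```

With a high-priority task blocked on a lock held by a low-priority task, and a medium-priority task ready, the model picks the medium task when inheritance is off (the binary semaphore case) and the low-priority holder when it is on (the mutex case). The holder therefore gets to finish and release the lock.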
Design Considerations
- Enable configCHECK_FOR_STACK_OVERFLOW (method 2) from the start of development, not after the first mysterious hard fault. Retrofitting it after a corruption-based bug has already happened doesn't help diagnose that specific occurrence.
- Budget stack size from measured high-water marks, not guesses. Copy-pasted stack depths from example projects rarely match your actual worst-case call depth, especially once error handling and printf-style logging are added later in development.
- Reserve vTaskList() for a debug build or console, not a production code path: the internal vTaskSuspendAll() call is too disruptive for a system with real-time deadlines. Zeus Design builds FreeRTOS diagnostic tooling, including SystemView instrumentation, into embedded firmware from the start of a project rather than retrofitting it after a field issue.
- Use mutexes (not binary semaphores) for anything representing exclusive ownership of a resource. This single distinction prevents the most common class of FreeRTOS priority inversion.
- Capture a SystemView trace of the actual failure, not just normal operation. Deadlocks and inversions are usually load- or timing-dependent; a trace of the system running normally rarely shows anything useful. Deliberately stress the suspect resource (e.g. hold a lock artificially longer) to force the failure into a captured window.
Common Mistakes
- Debugging a hard fault by looking only at the fault's PC/LR registers, when the actual defect is a stack overflow that corrupted memory long before the fault triggered. Check uxTaskGetStackHighWaterMark() on all tasks before spending time on the fault location itself.
- Calling vTaskList() from a real-time-sensitive task or ISR context. It must be called from a normal task context and will disrupt scheduling for its duration; never call it from time-critical code.
- Assuming a single static vTaskList() snapshot is enough to diagnose a deadlock. A snapshot shows one instant; if the deadlock is intermittent or timing-dependent, repeated snapshots or a SystemView timeline capture are needed to catch it in the act.
- Using a binary semaphore for mutual exclusion instead of a mutex. This is the single most common root cause of FreeRTOS priority inversion, and it is invisible in code review unless the reviewer specifically checks whether each semaphore represents a signal (binary semaphore, correct) or ownership (mutex required).
- Not instrumenting SystemView (or equivalent tracing) until after a production field failure. Adding trace hooks retrospectively to reproduce an intermittent issue is far slower than having the instrumentation already in place when the issue first appears during development or QA.
Frequently Asked Questions
- Why does my FreeRTOS firmware hard-fault in seemingly unrelated code after adding a new task?
- This is the classic signature of a stack overflow. When a task's stack grows past its allocated region, it silently corrupts whatever memory sits adjacent to it — often another task's TCB, a global variable, or heap metadata. The fault then appears later, in unrelated code that happens to read the corrupted memory, making it look like the bug is somewhere it isn't. Enable configCHECK_FOR_STACK_OVERFLOW (method 2) immediately after adding any new task or increasing an existing task's workload, and check uxTaskGetStackHighWaterMark() on every task rather than assuming the fault location is the actual defect location.
- Is vTaskList() safe to call in production firmware?
- Not as-is. vTaskList() calls vTaskSuspendAll() internally while it collects the state of every task, so the scheduler is suspended for a period that grows with the number of tasks, and the string formatting that follows consumes further CPU time. On a system with many tasks the total can be long enough to cause missed real-time deadlines or watchdog trips in a production system. Use vTaskList() freely during development and bring-up, gated behind a debug build flag or a UART/USB debug console. For production diagnostics, prefer the FreeRTOS trace hooks feeding SEGGER SystemView or a lighter-weight per-task runtime counter (configGENERATE_RUN_TIME_STATS) that doesn't require halting the scheduler.
- Can SystemView show a deadlock that hasn't happened yet?
- No — SystemView records what the scheduler actually did, so it shows a deadlock (or a near-deadlock, like severe priority inversion) only once it occurs during a captured session. Its value is in showing exactly when and why the system entered the bad state, which a static snapshot from vTaskList() alone cannot do. Combine SystemView captures with stress-testing the specific interaction that reproduces the failure — running the full application under normal load rarely triggers a deadlock, but deliberately forcing contention on the suspect resource usually does.
Related Questions
How Do You Create and Schedule Tasks in FreeRTOS?
Learn how to create FreeRTOS tasks with xTaskCreate, configure task priorities, size stacks safely, and start the scheduler on ARM Cortex-M MCUs.
How Do FreeRTOS Queues, Semaphores, and Mutexes Work?
How to use FreeRTOS queues, semaphores, and mutexes for inter-task communication — including ISR-safe variants, task notifications, and event groups.
How Does FreeRTOS Heap Memory Management Work?
FreeRTOS heap_4 is the recommended allocator for most embedded projects. Covers pvPortMalloc, runtime monitoring, fragmentation, and static allocation.
What Is an RTOS (Real-Time Operating System)?
An RTOS is a lightweight operating system that gives embedded firmware deterministic task scheduling. Learn how RTOSes work and when you actually need one.
How Do You Debug Embedded Firmware?
Covers JTAG/SWD hardware debugging, printf over UART or SWO trace, and logic analyser use for embedded firmware on STM32, ESP32, and other MCU platforms.
What Is a Watchdog Timer and How Do You Use It?
A watchdog timer resets an MCU when firmware hangs. Covers IWDG vs WWDG on STM32, prescaler setup, kick strategy, and window mode for fault detection.