How Do You Debug Embedded Firmware?
Last updated 30 June 2026 · 9 min read
Direct Answer
Embedded firmware is debugged using: (1) a hardware debug probe (J-Link, ST-Link, CMSIS-DAP) connecting via JTAG or SWD to set breakpoints and inspect variables in real time; (2) trace output — printf statements sent over UART or SWO ITM (for Cortex-M targets) to print variable values and execution paths without stopping the program; and (3) a logic analyser or oscilloscope to observe the hardware behaviour that firmware is driving. The JTAG/SWD debugger is the primary tool; printf debugging is the quickest for simple problems; the logic analyser is essential when firmware and hardware disagree about what a peripheral is doing.
Detailed Explanation
Debugging embedded firmware is fundamentally different from debugging application software. The program runs on hardware that has no standard output, may have timing constraints that debugging breaks, and often fails in ways that only appear under specific hardware conditions. Effective embedded debugging uses multiple techniques together.
The Three-Layer Model
Embedded bugs live in three distinct layers, each requiring different tools:
- Hardware layer — the PCB, power supplies, peripheral connections, and signal integrity. Debugged with oscilloscopes, multimeters, and logic analysers.
- Firmware logic layer — the C/C++ program's state machine, variable values, and execution flow. Debugged with a hardware debugger (JTAG/SWD) or trace output.
- Hardware-firmware interface — the configuration of peripherals (timers, UART, SPI, I2C, DMA) by the firmware and what the hardware actually does in response. Debugged with logic analysers and oscilloscopes while firmware runs.
Most bugs are in the interface layer: the firmware thinks it configured a peripheral correctly, but a register was written wrong, a clock divider is unexpected, or an interrupt is mapped to the wrong priority. A logic analyser observing the actual pins while a debugger inspects the register values resolves this class of bug faster than any amount of code review.
Technique 1 — JTAG / SWD Hardware Debugger
A hardware debug probe (ST-Link for STM32, CMSIS-DAP for general ARM, J-Link for a wide range of ARM and other cores) connects to the MCU's JTAG or SWD port and gives you:
- Breakpoints: Stop execution at a specific line of code and inspect all variable values, register contents, and memory at that instant. Most Cortex-M cores support 4–8 hardware breakpoints in flash; more are available as software (halt) breakpoints in RAM.
- Single stepping: Execute one line (or one instruction) at a time.
- Variable and register watch: Observe variable values in real time (requires "live watch" or continuous refresh — available in most IDEs with a GDB backend).
- Memory inspection: Read and write arbitrary memory locations while the target is halted or, on some cores, while running (live memory view).
- Call stack: See the function call chain that led to the current execution point.
For STM32: STM32CubeIDE or VS Code + Cortex-Debug + OpenOCD. The ST-Link V2 (built into Nucleo/Discovery boards, or as a standalone dongle) is the standard probe.
For ESP32: No native JTAG debug on most boards; ESP-IDF supports JTAG via an FT2232H-based adapter or the built-in USB-JTAG on ESP32-S3/C3. Many ESP32 developers rely more on printf-over-UART than on a JTAG debugger.
Limitation: Breakpoints halt the CPU — all hardware keeps running (timers count, DMA transfers continue), but the firmware is frozen. In time-sensitive code (interrupt handlers, communication protocols), halting mid-execution can change the system state enough to mask the original bug.
Technique 2 — Printf / UART Trace
The simplest firmware debug technique: insert printf calls (or direct UART transmit) at key points in the code to print variable values and execution paths. These appear in a serial terminal (PuTTY, CoolTerm, VS Code serial monitor).
Why it works: Printf is low friction — add it anywhere, read immediate results, no hardware setup beyond a UART connection (TX only needed). For simple bugs (wrong calculation, unexpected branch, variable not initialised), printf typically finds the problem in minutes.
Limitations:
- Adds execution time (UART transmission is slow: a single printf("x=%d\n", x) can take 10–1000 µs depending on UART baud rate and buffer implementation)
- Can mask timing-sensitive bugs (see Heisenbug FAQ)
- Does not stop execution — you see log output but cannot inspect the full system state at a specific moment
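The cost of a blocking printf can be estimated from the baud rate alone. The sketch below assumes 8N1 framing (10 bit times per byte) and ignores formatting time; the function name is illustrative.

```c
#include <stdint.h>

/* Time for a blocking UART transmit, assuming 8N1 framing:
 * 1 start + 8 data + 1 stop = 10 bit times per byte. */
uint32_t uart_tx_time_us(uint32_t num_bytes, uint32_t baud)
{
    const uint64_t bits = (uint64_t)num_bytes * 10u;
    return (uint32_t)((bits * 1000000u) / baud);
}
```

For example, a 6-byte string such as "x=123\n" at 115200 baud occupies the wire for roughly 520 µs, which is why a blocking print inside a tight loop or ISR can change the behaviour being investigated.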
Printf performance tip: Use HAL_UART_Transmit_DMA() or a ring-buffer UART TX implementation so printf is non-blocking: the firmware returns immediately after queuing the string, not after the full transmission. With DMA-backed printf at 115200 baud, the queuing overhead is typically under 5 µs per call, although the string formatting itself still costs CPU time.
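A minimal sketch of the ring-buffer approach follows. The names (log_write, log_pop) and the buffer size are hypothetical, and the code assumes a single producer (the main code) and a single consumer (a UART TX-empty or DMA-complete interrupt) on a single-core MCU.

```c
#include <stddef.h>
#include <stdint.h>

#define LOG_BUF_SIZE 256u   /* hypothetical size; must be a power of two */

static uint8_t log_buf[LOG_BUF_SIZE];
static volatile uint32_t log_head;  /* advanced only by log_write (producer) */
static volatile uint32_t log_tail;  /* advanced only by log_pop (consumer)   */

/* Queue bytes for transmission and return immediately.
 * Returns the number of bytes accepted; excess bytes are dropped when full. */
size_t log_write(const char *data, size_t len)
{
    size_t n = 0;
    while (n < len && (log_head - log_tail) < LOG_BUF_SIZE) {
        log_buf[log_head & (LOG_BUF_SIZE - 1u)] = (uint8_t)data[n++];
        log_head++;
    }
    return n;
}

/* Called from the (hypothetical) TX interrupt to fetch the next byte.
 * Returns 1 and stores the byte in *out, or 0 if the buffer is empty. */
int log_pop(uint8_t *out)
{
    if (log_head == log_tail) {
        return 0;
    }
    *out = log_buf[log_tail & (LOG_BUF_SIZE - 1u)];
    log_tail++;
    return 1;
}
```

Dropping bytes when the buffer is full is a deliberate choice in this sketch: a debug log that blocks would reintroduce the timing disturbance the buffer is meant to avoid.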
Technique 3 — SWO / ITM Trace (Cortex-M)
ARM Cortex-M processors have a built-in trace subsystem — ITM (Instrumentation Trace Macrocell) with output on the SWO pin. ITM_SendChar() or equivalent functions send debug data over SWO at speeds up to several Mbps without blocking the CPU, and without the timing impact of UART printf.
SWO requires: a debug probe that supports trace (ST-Link V2+, J-Link), the SWO pin connected to the probe, and IDE configuration for the trace clock. STM32CubeIDE's SWV ITM Data Console and STM32CubeProgrammer's SWV viewer both display ITM trace output.
For timing-critical code where UART printf causes Heisenbugs, ITM is the upgrade: non-blocking, high-speed, and observable in the IDE without a separate terminal window.
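As a rough sketch, sending a buffer over ITM only needs a loop around the CMSIS function ITM_SendChar(). The USE_CMSIS_ITM macro, the device header name, and swo_write are assumptions for illustration; the host stand-in exists only so the sketch compiles off-target. Note that the CMSIS implementation of ITM_SendChar() waits briefly if the stimulus port is still busy, so the overhead is small but not zero.

```c
#include <stdint.h>

#ifdef USE_CMSIS_ITM
/* On target: include your device header (the name here is an assumption),
 * which pulls in the CMSIS core header that defines ITM_SendChar(). */
#include "stm32f4xx.h"
#else
/* Host stand-in so the sketch compiles off-target: captures bytes in a buffer. */
static char itm_capture[64];
static uint32_t itm_count;
static uint32_t ITM_SendChar(uint32_t ch)
{
    if (itm_count < sizeof itm_capture) {
        itm_capture[itm_count++] = (char)ch;
    }
    return ch;
}
#endif

/* Send a buffer over ITM stimulus port 0, one byte at a time. */
int swo_write(const char *ptr, int len)
{
    for (int i = 0; i < len; i++) {
        ITM_SendChar((uint32_t)(uint8_t)ptr[i]);
    }
    return len;
}
```

On newlib-based toolchains, a common pattern is to have the _write() syscall stub forward to a function like this so that printf output appears in the SWV console; whether that applies depends on the project's C library setup.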
Technique 4 — GPIO Toggle + Oscilloscope or Logic Analyser
The fastest, least intrusive timing measurement: toggle a GPIO pin at the start and end of a code section, and observe the pulse width on an oscilloscope. A direct register write to toggle the pin costs 2–4 machine cycles (20–40 ns on a 100 MHz Cortex-M), which is negligible in almost all contexts; a HAL call such as the one below adds some function-call overhead on top.
// Measure how long a function takes
HAL_GPIO_WritePin(GPIOA, GPIO_PIN_0, GPIO_PIN_SET); // start
do_the_thing();
HAL_GPIO_WritePin(GPIOA, GPIO_PIN_0, GPIO_PIN_RESET); // stop
Use cases:
- ISR latency measurement (toggle at ISR entry, observe time from triggering event to ISR response)
- DMA completion timing (toggle in DMA complete callback)
- Task timing in RTOS (toggle at task start and end to observe task switch overhead)
- Function execution time
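When reading a pulse width off the scope, it helps to convert between time and CPU cycles. The helpers below are an illustrative sketch; the function names are not from any library, and the core clock must be the real running frequency.

```c
#include <stdint.h>

/* Convert a CPU cycle count to nanoseconds for a given core clock in Hz. */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t core_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000u) / core_hz);
}

/* Convert a measured pulse width in nanoseconds back to CPU cycles. */
uint32_t ns_to_cycles(uint32_t ns, uint32_t core_hz)
{
    return (uint32_t)(((uint64_t)ns * core_hz) / 1000000000u);
}
```

At 100 MHz one cycle is 10 ns, so a 1.5 µs pulse on the scope corresponds to about 150 cycles of execution between the two pin writes.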
Technique 5 — Logic Analyser for Protocol Inspection
A logic analyser captures many digital channels simultaneously and decodes protocols (I2C, SPI, UART, CAN, PWM, 1-Wire). This is invaluable for debugging peripheral configuration issues — the firmware configures an I2C peripheral, but does the actual SDA/SCL waveform show the correct address and data? Only the logic analyser can answer this.
The logic analyser vs oscilloscope page covers when to use each tool. In brief: use the logic analyser when debugging protocol transactions (wrong device address, unexpected NACK, SPI clock polarity mismatch, UART framing error); use the oscilloscope when inspecting signal integrity (overshoot, noise, rise time, power supply behaviour).
Debugging Common Embedded Problems
GPIO interrupt not firing: See stm32-gpio-interrupt-not-firing-nvic-exti. Check: NVIC enabled, EXTI line configured, GPIO interrupt mode, interrupt priority. Set a breakpoint inside the ISR; if it never hits, the interrupt is not reaching the CPU. Toggle a GPIO in the ISR to observe with an oscilloscope — if the oscilloscope shows the toggle but the debugger doesn't stop, check the callback function name or the HAL weak function override.
Peripheral not working (SPI/I2C/UART): Logic analyser first. Does the clock appear on SCK/SCL? Does CS assert? Is the data byte correct? Compare with the expected waveform from the component datasheet. If the waveform is correct but the firmware receives wrong data, look at the DMA buffer address, interrupt priority, or a missing volatile qualifier on a shared variable. The STM32 peripheral configuration page covers the common HAL/CubeMX pitfalls. For a concrete walkthrough of using a debugger to inspect baud-rate registers after a clock-tree change, see this UART garbage output forum thread.
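The baud-rate case can be checked with simple arithmetic once the divider has been read out in the debugger. This sketch assumes an STM32-style USART with 16x oversampling, where the divider register holds the peripheral clock divided by the baud rate; other families use different formulas, and the function names are illustrative.

```c
#include <stdint.h>

/* Divider the firmware would program for a target baud rate, assuming 16x
 * oversampling where the register value equals f_clk / baud (rounded). */
uint32_t uart_divider(uint32_t periph_clk_hz, uint32_t baud)
{
    return (periph_clk_hz + baud / 2u) / baud;
}

/* Baud rate the hardware actually produces from a divider and the real clock. */
uint32_t uart_actual_baud(uint32_t periph_clk_hz, uint32_t divider)
{
    return periph_clk_hz / divider;
}
```

If the divider was computed for an 84 MHz peripheral clock (729 for 115200 baud) but the clock tree now delivers 42 MHz, the line runs at about 57613 baud, which a logic analyser will show as a bit time twice as long as expected.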
RTOS task not running: Check task priority and stack size. A task that never runs has either a lower priority than a task that never blocks, or a stack overflow that corrupted the TCB. uxTaskGetStackHighWaterMark() reports the minimum free stack words for any task — call it periodically during development to validate stack sizes before shipping. See bare-metal vs RTOS for general stack sizing guidance.
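The high-water-mark idea rests on stack painting: fill the stack with a known byte before the task starts, then count how much of the pattern is still intact. The sketch below illustrates the technique on a plain buffer; it is not FreeRTOS's own code, the function names are hypothetical, and it assumes a stack that grows downward as on Cortex-M.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u   /* the fill value FreeRTOS uses for task stacks */

/* Paint an entire stack region before the task starts. */
void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/* Count untouched bytes from the low-address end, assuming the stack grows
 * downward. The result is the minimum free stack seen so far. */
size_t stack_high_water_mark(const uint8_t *stack, size_t size)
{
    size_t free_bytes = 0;
    while (free_bytes < size && stack[free_bytes] == STACK_FILL_BYTE) {
        free_bytes++;
    }
    return free_bytes;
}
```

This sketch counts bytes, whereas uxTaskGetStackHighWaterMark() reports words. A result close to zero means the task has come close to overflowing at some point, even if it is not overflowing right now.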
MCU resetting unexpectedly: Check the reset cause register (RCC_CSR on STM32, RSTCTL on others) immediately in the startup code. The register holds the cause of the last reset (watchdog, brownout, software, external). If it's a watchdog reset, the firmware is taking too long somewhere; add GPIO toggles to identify the bottleneck. See why does STM32 keep resetting for a systematic checklist.
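Decoding the reset flags might look like the sketch below. The bit positions follow the STM32F4 RCC_CSR layout and are an assumption for any other family (check the reference manual); the macro and function names are illustrative, and on target the value passed in would be read from RCC->CSR.

```c
#include <stdint.h>

/* Flag positions as in the STM32F4 RCC_CSR register (assumption: other
 * families place these bits differently). */
#define CSR_BORRSTF  (1u << 25)
#define CSR_PINRSTF  (1u << 26)
#define CSR_PORRSTF  (1u << 27)
#define CSR_SFTRSTF  (1u << 28)
#define CSR_IWDGRSTF (1u << 29)
#define CSR_WWDGRSTF (1u << 30)
#define CSR_LPWRRSTF (1u << 31)

/* Return a short description of the most specific reset cause found. */
const char *reset_cause(uint32_t csr)
{
    /* Check specific causes first: the pin flag is also set by internal resets. */
    if (csr & CSR_LPWRRSTF) return "low-power";
    if (csr & CSR_WWDGRSTF) return "window watchdog";
    if (csr & CSR_IWDGRSTF) return "independent watchdog";
    if (csr & CSR_SFTRSTF)  return "software";
    if (csr & CSR_PORRSTF)  return "power-on/power-down";
    if (csr & CSR_BORRSTF)  return "brownout";
    if (csr & CSR_PINRSTF)  return "external pin (NRST)";
    return "unknown";
}
```

The flags accumulate across resets until cleared, so a typical approach is to read the register once at startup, store or log the result, and then clear the flags (on STM32F4, by setting the RMVF bit) so the next reset is reported cleanly.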
Debugging RTOS Firmware
RTOS firmware adds concurrency — multiple tasks running interleaved, shared resources, and non-deterministic timing. Additional debugging techniques:
- Task-aware debugging: STM32CubeIDE and Ozone (J-Link IDE) both show FreeRTOS task lists, states, and stack usage in the debugger, without needing explicit uxTaskGetStackHighWaterMark() calls in the firmware.
- Trace tools: SEGGER SystemView, Percepio Tracealyzer, and FreeRTOS Trace capture task switch events and ISR entry/exit in real time. See the FreeRTOS priority inversion forum thread for a real-world example of using vTaskList() and SystemView to diagnose a blocked high-priority task.
- Volatile keyword: Shared variables between tasks or between a task and an ISR must be declared volatile in C. Without it, the compiler may cache the variable in a register and the ISR write is never seen by the task. See what are interrupts in embedded systems.
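The volatile point can be illustrated with a flag shared between an ISR and the main loop. In this sketch adc_isr and poll_sample are hypothetical names, and the ISR is written as an ordinary function so the pattern can be exercised off-target; on real hardware it would be the handler in the vector table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between ISR and main loop: volatile forces a fresh read each time. */
static volatile bool data_ready;
static volatile uint16_t latest_sample;

/* Hypothetical ISR body; on target this would be the vector-table handler. */
void adc_isr(uint16_t sample)
{
    latest_sample = sample;
    data_ready = true;
}

/* Poll from the main loop. Returns true and copies the sample if one arrived. */
bool poll_sample(uint16_t *out)
{
    if (!data_ready) {
        return false;
    }
    data_ready = false;   /* clear first so a new interrupt is not lost */
    *out = latest_sample;
    return true;
}
```

Without volatile, an optimising compiler may hoist the read of data_ready out of a polling loop, and the loop never exits. Volatile only guarantees that the access happens; it does not make multi-word updates atomic, so larger shared structures still need a critical section or an RTOS primitive.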
Design Considerations
- Design in debug access from the start: A SWD header (4 pins), a UART debug port (at minimum TX to a test point), and accessible GPIO toggle pins reduce debugging time from hours to minutes on first board bring-up.
- Reserve a UART for debug output: On designs where all UARTs are assigned to production functions, reserve at least one UART TX line — even if only exposed as a test point — for debug output. Removing it in production firmware is trivial; adding it to a design that lacks the connection requires a board respin.
Common Mistakes
- Not checking the reset cause register when debugging unexpected resets — the register usually tells you exactly why the MCU reset, eliminating half the diagnostic tree immediately.
- Using printf debugging in an ISR or time-critical section, creating a Heisenbug that appears only when printf is removed.
- Forgetting the volatile keyword on shared variables between tasks or between ISR and task, producing a variable the task never sees updated.
- Assuming a logic analyser shows real-time state during a JTAG halt: the logic analyser continues capturing during a breakpoint halt, but the firmware is frozen, so the peripheral activity observed is not representative of the bug condition.
Frequently Asked Questions
- Why does my bug disappear when I add a printf or breakpoint?
- This is a Heisenbug — a bug that changes or disappears when observed. Printf statements add execution time; breakpoints stop the processor entirely. If the bug is timing-dependent (a race condition, an ISR that must respond within a specific window, or a peripheral interaction with a timing constraint), adding any delay — even a single printf — can mask it. Solutions: use ITM/SWO trace (non-invasive, minimal latency) instead of UART printf; use a GPIO toggled at the event of interest rather than a print (visible on a scope in <100 ns); check for race conditions by inspecting shared variables (protect with volatile keyword and critical sections); or use a logic analyser to observe the hardware timing directly without disturbing firmware execution.
- What is the difference between JTAG and SWD debugging?
- JTAG (Joint Test Action Group) is a 4–5 wire debug interface (TCK, TMS, TDI, TDO, optional TRST). It supports full debug and boundary-scan testing but needs more pins. SWD (Serial Wire Debug) is ARM's 2-wire alternative (SWDIO, SWCLK) that offers equivalent debug capability for Cortex-M processors while using fewer pins. Most STM32, nRF, and other Cortex-M targets support SWD; JTAG is more common on older or non-ARM platforms, including the ESP32 family, which is not Cortex-M and is debugged over JTAG. For hardware design: always break out at least SWDIO, SWCLK, and GND to a header, which is the minimum needed for full debug access. Adding SWO (serial wire output) enables ITM trace output without using a UART.
- How do I debug firmware on a board with no UART or debug header?
- Options in order of preference: (1) Use SWD if the MCU supports it — the SWDIO/SWCLK pins are often accessible on unexposed pads or unpopulated footprints. (2) Use a GPIO output toggled at strategic points, observed with an oscilloscope or logic analyser — a GPIO toggle costs 1–4 machine cycles and does not disturb timing the way printf does. (3) Use onboard LEDs if available — simple blink patterns can communicate state. (4) Desolder a pad and solder directly to the MCU pins — last resort, but works when nothing else is accessible. For next board revision: always design in a SWD header (4 pins: 3.3V, SWDIO, SWCLK, GND) and at least one UART TX connection to a test point.