How Do You Configure STM32 HAL DMA for UART, SPI, and ADC?
Last updated 29 June 2026 · 14 min read
Direct Answer
STM32 DMA transfers data between peripherals and memory without CPU involvement. HAL provides DMA-mode variants for each peripheral: HAL_UART_Transmit_DMA(), HAL_SPI_Transmit_DMA(), HAL_ADC_Start_DMA(). Configure DMA in CubeMX by enabling the DMA request on the peripheral's DMA Settings tab — CubeMX selects the correct stream/channel and generates HAL_DMA_Init(). Use Normal mode for one-shot transfers (fixed byte count, fires TxCpltCallback on completion); use Circular mode for continuous streaming — ADC scan, UART receive ring buffers — which fires HalfCplt and Cplt callbacks on each pass. On STM32H7 and STM32F7, the Cortex-M7 D-cache is not synchronised with DMA memory accesses: call SCB_CleanDCache_by_Addr() before starting a DMA read from a CPU-written TX buffer, and SCB_InvalidateDCache_by_Addr() after DMA completes an RX write before the CPU reads it. DMA buffers must be 32-byte aligned on these devices.
Detailed Explanation
DMA (Direct Memory Access) is the mechanism by which an STM32 peripheral transfers data to or from memory without the CPU executing each byte transfer. The DMA controller is a separate bus master on the AHB bus matrix: it arbitrates with the CPU for bus access and performs memory or peripheral read/write operations independently.
Without DMA, transmitting 1000 bytes over UART at 115200 baud via HAL_UART_Transmit() blocks the CPU for approximately 87 ms. With DMA, the same transfer runs in hardware while the CPU executes other code, and a callback fires on completion. At high-speed SPI (driving a display or reading an SD card at 10+ Mbit/s), DMA is not optional — the CPU cannot sustain byte-at-a-time transfers without starving all other tasks.
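The 87 ms figure follows from the frame format. Assuming the common 8N1 framing (1 start bit, 8 data bits, 1 stop bit), each byte occupies 10 bit times on the wire. A minimal sketch of the arithmetic, using a hypothetical helper name that is not part of the HAL:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: time on the wire for `bytes` UART frames, in
 * microseconds. bits_per_frame is 10 for 8N1 (start + 8 data + stop),
 * which is an assumption about the UART configuration. */
static uint64_t uart_wire_time_us(uint32_t bytes, uint32_t baud,
                                  uint32_t bits_per_frame)
{
    assert(baud > 0);
    uint64_t total_bits = (uint64_t)bytes * bits_per_frame;
    return (total_bits * 1000000ULL) / baud;   /* integer division rounds down */
}
```

Calling uart_wire_time_us(1000, 115200, 10) gives 86805 µs, which is the roughly 87 ms the blocking call would hold the CPU.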
Understanding the STM32 clock tree is a prerequisite: the AHB clock that drives the DMA controller and peripheral buses must be configured before enabling DMA transfers.
STM32 DMA Architecture
STM32F4 and F7 — stream and channel model:
STM32F4 and F7 devices have two DMA controllers — DMA1 and DMA2 — each with eight numbered streams (0–7). Each stream is an independent hardware transfer engine with its own source address, destination address, transfer count, and FIFO buffer. Each stream is configured to one request channel (0–7), which determines which peripheral can trigger a transfer on that stream.
A peripheral's DMA request maps to specific stream and channel combinations. For example, USART2 TX on the STM32F407 is available only on DMA1 Stream6 Channel4 (DMA1 Stream3 Channel4 carries USART3 TX instead). Using the wrong stream or channel means the peripheral's request is never acknowledged and no transfer occurs. The mapping table is in the 'DMA Request Mapping' section of the Reference Manual for the specific series (RM0090 for STM32F405/407). CubeMX reads this table automatically and assigns a valid combination when you enable a DMA request on a peripheral.
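When a stream is assigned by hand, one way to catch a mismatch early is to keep the relevant rows of the mapping table in code and check against them. The sketch below is illustrative only: the type and function names are hypothetical, and the table holds just the two STM32F407 mappings quoted in this article, not the full Reference Manual table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One row of a request-mapping table (hypothetical type). */
typedef struct {
    const char *request;   /* peripheral request name */
    unsigned controller;   /* 1 = DMA1, 2 = DMA2 */
    unsigned stream;       /* 0..7 */
    unsigned channel;      /* 0..7 */
} dma_map_entry;

/* Illustrative subset: only the mappings mentioned in this article. */
static const dma_map_entry dma_map[] = {
    { "USART2_TX", 1, 6, 4 },
    { "SPI1_TX",   2, 3, 3 },
};

/* Returns true if the combination appears in the table above. */
static bool dma_mapping_is_listed(const char *request, unsigned controller,
                                  unsigned stream, unsigned channel)
{
    assert(request != NULL);
    for (size_t i = 0; i < sizeof dma_map / sizeof dma_map[0]; i++) {
        const dma_map_entry *e = &dma_map[i];
        if (strcmp(e->request, request) == 0 && e->controller == controller &&
            e->stream == stream && e->channel == channel) {
            return true;
        }
    }
    return false;
}
```

A real project would fill the table from the Reference Manual for its exact part before relying on such a check.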
On STM32F4, memory-to-memory transfers are supported on DMA2 only.
STM32H7, G0, G4, and L4 — DMAMUX model:
Newer STM32 series use a DMAMUX block that provides flexible routing: any peripheral request can be directed to any DMA channel, eliminating the fixed stream/channel constraint. (Within the L4 line this applies to the L4+ parts; the original STM32L4 devices instead select the request for each channel through the DMA_CSELR register.) The STM32H7 also includes a BDMA (Basic DMA) with 8 channels, intended for peripherals in the D3 power domain. For most peripherals, DMA1 and DMA2 are used as on the F4. CubeMX handles DMAMUX configuration automatically. This architectural split between the stream/channel and DMAMUX models is one of the main reasons DMA code does not port cleanly between STM32 families; see Which STM32 Family Should You Use? for a comparison of DMA architecture and other key hardware differences across the major series.
Key parameters per stream:
- Transfer direction — peripheral-to-memory (P2M), memory-to-peripheral (M2P), or memory-to-memory (M2M)
- Data width — byte (8-bit), halfword (16-bit), or word (32-bit) for both source and destination independently
- Memory increment — increment the memory address after each item (normal case for a buffer); peripheral address stays fixed (always points to the peripheral data register)
- Transfer mode — Normal or Circular (see below)
- Priority — Low, Medium, High, or Very High; when two streams contend for the bus simultaneously, the higher-priority stream wins
DMA Mode Selection: Normal vs Circular
Normal mode:
The DMA transfers exactly the count specified, then stops. The transfer-complete flag is set, the DMA interrupt fires, and the HAL calls the associated callback (HAL_UART_TxCpltCallback(), HAL_SPI_TxCpltCallback(), HAL_ADC_ConvCpltCallback(), etc.). The DMA stream returns to idle. To perform another transfer, call the HAL DMA function again.
Normal mode suits:
- One-shot transmissions of a known byte count — sending a command over SPI, transmitting a telemetry packet over UART
- Single ADC conversion sequences
- Memory copy operations
Circular mode:
When the DMA reaches the end of the buffer, it wraps to the start and continues indefinitely. Two callbacks fire on each full buffer cycle:
- Half-transfer callback (HalfCplt): fires when the DMA has completed the first half of the buffer
- Transfer-complete callback (CpltCallback): fires when the DMA reaches the end of the full buffer and wraps
void HAL_UART_RxHalfCpltCallback(UART_HandleTypeDef *huart) {
    // DMA is now writing the second half, so the first half is safe to process
}
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart) {
    // DMA has wrapped back to the start, so the second half is safe to process
}
This double-buffer pattern — processing one half while the DMA fills the other — is the standard approach for continuous data streams. It is used for:
- ADC continuous scan — the most common circular DMA use case; constant-rate conversion results streamed into a buffer for averaging, filtering, or threshold detection
- UART receive ring buffers — continuous reception where frame boundaries are unknown in advance
- I2S and SAI audio — streaming audio samples at a fixed sample rate
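The half-buffer bookkeeping is the same for every peripheral, so it can be factored into one small helper. The following is a minimal sketch under stated assumptions: the event enum, the span type, and the function name are all hypothetical, and the buffer is assumed to hold 16-bit items with an even length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which DMA event just occurred (hypothetical enum, not a HAL type). */
typedef enum { DMA_EVT_HALF, DMA_EVT_FULL } dma_evt;

/* A read-only view of part of the buffer. */
typedef struct {
    const uint16_t *data;
    size_t count;
} dma_span;

/* Returns the half of a circular buffer that the CPU may read after `evt`:
 * the first half after a half-transfer event, the second half after a
 * transfer-complete event. buf_len is in items and must be even. */
static dma_span dma_ready_half(const uint16_t *buf, size_t buf_len, dma_evt evt)
{
    assert(buf != NULL && buf_len >= 2 && buf_len % 2 == 0);
    size_t half = buf_len / 2;
    dma_span s = { (evt == DMA_EVT_HALF) ? buf : buf + half, half };
    return s;
}
```

Each callback then passes its own event value and hands the returned span to the processing code, which must finish before the DMA comes back around to that half.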
Configuring DMA in STM32CubeMX
- Open the peripheral configuration panel (e.g. Connectivity → USART2).
- Select the DMA Settings tab.
- Click Add and select the request direction (Tx, Rx, or both for bidirectional peripherals).
- CubeMX assigns a valid DMA stream/channel automatically. Change Mode to Circular for continuous reception or ADC streaming; leave it Normal for one-shot transfers.
- In NVIC Settings, verify the DMA stream interrupt is enabled — this is how the HAL callback fires on completion.
- Generate code. CubeMX generates MX_DMA_Init(), called before MX_xxx_Init() in main(). MX_DMA_Init() enables the DMA controller clock, which must be running before the peripheral init configures its DMA stream.
For the broader CubeMX workflow — pin assignment, clock dividers, SPI mode, and I2C timing — see how to configure STM32 peripherals with HAL and CubeMX.
DMA with UART
Transmit — Normal mode (typical case):
uint8_t tx_buf[] = "Telemetry frame\r\n";
// Returns immediately — DMA handles the byte-by-byte transfer
HAL_UART_Transmit_DMA(&huart2, tx_buf, sizeof(tx_buf) - 1);
void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart) {
    if (huart->Instance == USART2) {
        // Transfer complete: tx_buf can be reused or the next transfer started
    }
}
Receive — variable-length frames with Idle Line Detection:
Fixed-length DMA receive (HAL_UART_Receive_DMA() with a byte count) fires the callback only when exactly that count arrives; it does not trigger mid-frame. For variable-length protocols, use HAL_UARTEx_ReceiveToIdle_DMA(), which starts a DMA receive and fires HAL_UARTEx_RxEventCallback() whenever the UART line is detected idle after the last byte. The function uses whichever mode the Rx stream is configured for; the example below assumes Normal mode, where reception stops at the idle event and must be restarted:
uint8_t rx_buf[256];
// Start DMA reception; frames are delimited by the UART idle line
HAL_UARTEx_ReceiveToIdle_DMA(&huart2, rx_buf, sizeof(rx_buf));
// The event callback also fires at the half-transfer point; masking that
// interrupt keeps it from reporting a partial frame
__HAL_DMA_DISABLE_IT(huart2.hdmarx, DMA_IT_HT);
void HAL_UARTEx_RxEventCallback(UART_HandleTypeDef *huart, uint16_t Size) {
    if (huart->Instance == USART2) {
        // Size = actual bytes received in this frame
        process_frame(rx_buf, Size);
        // Restart for the next frame
        HAL_UARTEx_ReceiveToIdle_DMA(&huart2, rx_buf, sizeof(rx_buf));
        __HAL_DMA_DISABLE_IT(huart2.hdmarx, DMA_IT_HT);
    }
}
This pattern handles Modbus RTU, NMEA sentences, custom binary framing, and any protocol where frame length is unknown in advance. For UART framing fundamentals, see what is UART?.
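If the Rx stream is configured as Circular instead, reception does not need restarting, but Size is then understood to report the current write position in the buffer rather than a frame length (worth verifying against the HAL version in use). The application has to remember where it last stopped reading and handle wrap-around. A minimal sketch of that bookkeeping, with hypothetical type and function names and the assumption that the output buffer is at least as large as the ring:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tracker for a circular RX buffer. last_pos is the index
 * up to which data has already been consumed. */
typedef struct {
    const uint8_t *buf;
    size_t len;
    size_t last_pos;
} rx_ring;

/* Copies the bytes received since the previous call into `out` and returns
 * how many were copied. `pos` is the current write position (0..len), as
 * reported by the Size argument of the event callback. */
static size_t rx_ring_drain(rx_ring *r, size_t pos, uint8_t *out, size_t out_cap)
{
    assert(r != NULL && out != NULL && pos <= r->len && out_cap >= r->len);
    size_t n = 0;
    if (pos == r->last_pos) {
        return 0;                              /* nothing new */
    }
    if (pos > r->last_pos) {                   /* contiguous region */
        n = pos - r->last_pos;
        memcpy(out, r->buf + r->last_pos, n);
    } else {                                   /* wrapped past the end */
        size_t tail = r->len - r->last_pos;
        memcpy(out, r->buf + r->last_pos, tail);
        memcpy(out + tail, r->buf, pos);
        n = tail + pos;
    }
    r->last_pos = (pos == r->len) ? 0 : pos;   /* end of buffer maps to index 0 */
    return n;
}
```

The copy keeps the example simple; a production design might parse in place or hand the two regions to the protocol layer without copying.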
DMA with SPI
Full-duplex SPI DMA transfers TX and RX data simultaneously, as SPI clocks data in both directions on every cycle. Both a Tx DMA stream and an Rx DMA stream must be configured in CubeMX:
uint8_t spi_tx[8] = {0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
uint8_t spi_rx[8] = {0};
HAL_GPIO_WritePin(CS_GPIO_Port, CS_Pin, GPIO_PIN_RESET); // assert CS
HAL_SPI_TransmitReceive_DMA(&hspi1, spi_tx, spi_rx, 8);
void HAL_SPI_TxRxCpltCallback(SPI_HandleTypeDef *hspi) {
    HAL_GPIO_WritePin(CS_GPIO_Port, CS_Pin, GPIO_PIN_SET); // deassert CS
    // spi_rx now contains the peripheral's response
}
For write-only SPI (display controllers, DACs), use HAL_SPI_Transmit_DMA(). The SPI receiver still captures MISO data during the transfer; because nothing reads it, the OVR (overrun) flag is set on STM32F4/H7 once the receive data register (or the Rx FIFO on STM32H7) is full. Clear it with __HAL_SPI_CLEAR_OVRFLAG(&hspiX) in the TxCpltCallback, or configure a dummy Rx DMA channel to drain the received data continuously.
For SPI protocol fundamentals and CPOL/CPHA mode selection, see what is SPI?. For a real-world diagnosis of SPI mode mismatch causing all-0xFF reads — a failure DMA makes harder to spot — see the SPI CPOL/CPHA mismatch forum discussion.
DMA with ADC
Circular-mode DMA is the standard approach for continuous ADC acquisition, streaming conversion results into a buffer for filtering or threshold monitoring:
// Buffer must be 32-byte aligned on STM32H7/F7
__attribute__((aligned(32))) uint16_t adc_buf[32];
// CubeMX: ADC → Continuous Conversion Mode = Enabled
// ADC → DMA Settings: Add Rx, Mode = Circular, Data Width = Half Word
HAL_ADC_Start_DMA(&hadc1, (uint32_t*)adc_buf, 32);
void HAL_ADC_ConvHalfCpltCallback(ADC_HandleTypeDef *hadc) {
    // Process adc_buf[0..15] while DMA fills adc_buf[16..31]
}
void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc) {
    // Process adc_buf[16..31] while DMA fills adc_buf[0..15]
}
In CubeMX, enabling this pattern requires ADC Parameter Settings with Continuous Conversion Mode enabled, Scan Conversion Mode enabled (if using multiple channels in a sequence), and a Circular DMA request with data width set to Half Word (to match the 12-bit result in a uint16_t).
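What the callbacks do with each half is application-specific, but averaging is a typical first step. The sketch below assumes scan mode stores results interleaved in rank order (ch0, ch1, ..., then ch0 again), 12-bit right-aligned data, and a reference voltage supplied by the caller; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Averages one channel out of an interleaved scan buffer.
 * Assumed layout: ch0, ch1, ..., ch(N-1), ch0, ch1, ...
 * `count` is the number of items in the half being processed. */
static uint16_t adc_channel_mean(const uint16_t *samples, size_t count,
                                 size_t num_channels, size_t channel)
{
    assert(samples != NULL && num_channels > 0 && channel < num_channels);
    assert(count >= num_channels && count % num_channels == 0);
    uint32_t sum = 0;
    size_t n = 0;
    for (size_t i = channel; i < count; i += num_channels) {
        sum += samples[i];
        n++;
    }
    return (uint16_t)(sum / n);
}

/* Converts a 12-bit right-aligned result to millivolts. vref_mv is an
 * assumption about the board (for example 3300 for a 3.3 V reference). */
static uint32_t adc_to_millivolts(uint16_t raw, uint32_t vref_mv)
{
    assert(raw <= 4095u);
    return ((uint32_t)raw * vref_mv) / 4095u;
}
```

With the 32-item buffer above and two channels in the sequence, each callback would pass its 16-item half and get the mean of 8 samples per channel.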
Interrupt-Driven Completion
HAL DMA functions return immediately after starting the transfer. Completion is signalled through the DMA interrupt → HAL ISR handler → callback chain. The DMA interrupt must be enabled in NVIC — CubeMX generates the HAL_NVIC_SetPriority() and HAL_NVIC_EnableIRQ() calls for each enabled DMA stream interrupt.
Polling for completion via HAL_DMA_PollForTransfer() blocks the CPU until the transfer finishes, defeating the purpose of DMA. It is only appropriate for bare-metal contexts where blocking is intentional for a short, infrequent transfer.
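DMA interrupt priorities interact with FreeRTOS: any DMA callback that calls a FreeRTOS FromISR API must run at a numeric interrupt priority equal to or greater than configMAX_SYSCALL_INTERRUPT_PRIORITY (compare against its unshifted form, conventionally configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY in CubeMX-generated FreeRTOSConfig.h, since that is the scale HAL_NVIC_SetPriority() uses). An ISR with a numerically lower value (higher hardware priority) than this threshold is not masked by the kernel's critical sections; calling a FromISR() function from it trips a configASSERT() in the Cortex-M port when assertions are enabled, and can otherwise corrupt kernel state and crash later. See how to configure STM32 NVIC interrupt priorities for the full priority model, grouping configuration, and FreeRTOS constraint.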
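The rule is easy to get backwards because lower numbers mean higher urgency on Cortex-M. A minimal sketch of the comparison and of how an unshifted priority maps to the 8-bit register value, assuming the 4 implemented priority bits found on Cortex-M3/M4/M7 STM32 parts; the helper names are hypothetical and the threshold value 5 in the usage note is the CubeMX default mentioned in this article:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of implemented priority bits (assumed 4; the device header
 * defines the real value as __NVIC_PRIO_BITS). */
#define PRIO_BITS 4u

/* Converts an unshifted priority (0..15) to the 8-bit register encoding,
 * where the implemented bits occupy the top of the byte. */
static uint8_t nvic_encode_priority(uint8_t prio)
{
    assert(prio < (1u << PRIO_BITS));
    return (uint8_t)(prio << (8u - PRIO_BITS));
}

/* True if an ISR at isr_prio may call FreeRTOS FromISR functions, given the
 * unshifted maximum syscall priority. Numerically equal or greater is safe. */
static bool isr_may_use_from_isr_api(uint8_t isr_prio, uint8_t max_syscall_prio)
{
    return isr_prio >= max_syscall_prio;
}
```

With a threshold of 5, a DMA interrupt at priority 5 or 6 passes the check and one at priority 4 fails it; nvic_encode_priority(5) yields 0x50, the shifted form the kernel compares against.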
Cache Coherency on STM32H7 and STM32F7
The Cortex-M7 core (STM32H7 and STM32F7) includes a D-cache whose size depends on the device (check the datasheet for your part). Once the cache is enabled (SCB_EnableDCache(), which CubeMX generates when CPU DCache is enabled in the Cortex-M7 settings), CPU accesses to cacheable SRAM go through it. DMA is a bus master that accesses SRAM directly, bypassing the CPU cache entirely. This creates two coherency hazards:
1. CPU writes TX data — DMA reads stale SRAM:
The CPU writes bytes into a TX buffer; those writes may remain in D-cache dirty lines, not yet flushed to SRAM. The DMA reads SRAM and sends the pre-write stale data.
Fix — flush dirty cache lines to SRAM before starting the DMA transfer:
__attribute__((aligned(32))) uint8_t tx_buf[128];
// Write TX data into tx_buf ...
// Flush D-cache dirty lines for this buffer to SRAM
SCB_CleanDCache_by_Addr((uint32_t*)tx_buf, sizeof(tx_buf));
// Now safe to start DMA — SRAM contains the correct data
HAL_SPI_Transmit_DMA(&hspi1, tx_buf, sizeof(tx_buf));
2. DMA writes RX data to SRAM — CPU reads cached stale data:
The DMA writes received bytes to SRAM. The CPU's D-cache still holds the previous contents of the RX buffer. The CPU reads stale cached data rather than the new DMA-written result.
Fix — invalidate the cache region after DMA completes, before the CPU reads:
__attribute__((aligned(32))) uint8_t rx_buf[128];
void HAL_SPI_TxRxCpltCallback(SPI_HandleTypeDef *hspi) {
    // Discard stale D-cache lines so the next read fetches fresh data from SRAM
    SCB_InvalidateDCache_by_Addr((uint32_t*)rx_buf, sizeof(rx_buf));
    process_response(rx_buf);
}
Alignment requirement: Cache maintenance operates on 32-byte cache line granularity. A DMA buffer that does not start at a 32-byte boundary causes SCB_CleanDCache_by_Addr() or SCB_InvalidateDCache_by_Addr() to start at the previous aligned address, potentially cleaning or invalidating the adjacent variable in memory. Always declare DMA buffers with __attribute__((aligned(32))) and round their sizes up to the nearest multiple of 32 bytes.
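The rounding involved can be made concrete with two small helpers. This is a sketch with hypothetical names that only computes which bytes a by-address cache operation would cover; it performs no cache maintenance itself and assumes the 32-byte line size stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u   /* Cortex-M7 D-cache line size in bytes */

/* Rounds a buffer size up to the next multiple of the cache line size. */
static size_t cache_round_up(size_t n)
{
    return (n + CACHE_LINE - 1u) & ~(size_t)(CACHE_LINE - 1u);
}

typedef struct {
    uintptr_t start;   /* first byte covered, line-aligned */
    size_t length;     /* bytes covered, a multiple of CACHE_LINE */
} cache_range;

/* Whole cache lines covering [addr, addr + size). Anything else that lives
 * inside this range is cleaned or invalidated along with the buffer. */
static cache_range cache_lines_touched(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t start = addr & ~(uintptr_t)(CACHE_LINE - 1u);
    uintptr_t end = (addr + size + CACHE_LINE - 1u) & ~(uintptr_t)(CACHE_LINE - 1u);
    cache_range r = { start, (size_t)(end - start) };
    return r;
}
```

A 16-byte buffer at address 0x20000010 shares its single 32-byte line with whatever occupies 0x20000000 to 0x2000000F, which is exactly the neighbour an invalidate would clobber. A 100-byte buffer should be declared as cache_round_up(100), that is 128 bytes.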
An alternative is to place DMA buffers in a non-cacheable memory region via the MPU. On STM32H7, configuring an MPU region over the buffer range with non-cacheable, non-bufferable attributes eliminates all per-transfer cache maintenance calls. This approach trades simpler transfer code for a one-time MPU configuration, and is worth considering when many peripherals use DMA on the same device.
Double Buffering
STM32F4 and H7 DMA streams support a hardware double-buffer mode (the DBM bit in DMA_SxCR), in which the DMA alternates between two memory addresses (M0AR and M1AR) each time a transfer completes. While the DMA fills the M1AR buffer, the CPU processes the M0AR buffer; when the DMA switches back, a callback fires and the CPU moves on to the M1AR buffer.
The peripheral HAL drivers (UART, SPI, ADC) do not use this mode. The DMA extension driver exposes it at stream level through HAL_DMAEx_MultiBufferStart_IT() and HAL_DMAEx_ChangeMemory(), but the application must then enable the peripheral's DMA request itself, typically with direct register access. For most embedded applications, circular mode with half-transfer and transfer-complete callbacks provides sufficient double-buffering behaviour without register-level programming. Double-buffer mode is warranted only for very high-bandwidth continuous streams (I2S/SAI audio, DCMI camera, SDIO) where the processing latency of circular mode's half-buffer pattern is not acceptable.
Design Considerations
- Declare DMA buffers as global or static, never as stack locals. A local variable is valid only while the function is executing. If the function returns before the DMA transfer completes, the DMA keeps reading or writing memory that is now being used by other stack frames. Declare DMA buffers at file scope or as static local variables.
- STM32H7/F7 DMA buffers require 32-byte alignment. Use __attribute__((aligned(32))) and round sizes up to 32-byte multiples. Misaligned buffers cause cache maintenance to operate on cache lines shared with adjacent variables, silently corrupting them.
- Ensure the DMA stream interrupt is enabled in NVIC. Without it, the HAL callback never fires and the peripheral appears to hang indefinitely. CubeMX generates the correct NVIC enable; manually written or restructured init code often misses HAL_NVIC_EnableIRQ(DMAx_Streamx_IRQn).
- Keep MX_DMA_Init() before all peripheral init calls. CubeMX generates this order correctly, but it is commonly broken when developers reorder init calls or add manual peripheral init. MX_DMA_Init() enables the DMA controller clock; if MX_USARTx_UART_Init() or MX_SPIx_Init() runs first, the HAL_DMA_Init() call inside the peripheral's MspInit writes to a controller that is still unclocked and the stream configuration is lost.
- On STM32F4, verify the stream/channel assignment against the Reference Manual. Each peripheral maps to specific stream/channel combinations. CubeMX selects a valid combination; if a stream is reassigned manually, re-verify it against the DMA request mapping table.
Common Mistakes
- Re-entering a DMA transfer while the previous one is active. Calling HAL_UART_Transmit_DMA() while a DMA UART transfer is in progress returns HAL_BUSY. The application must wait for HAL_UART_TxCpltCallback() before starting the next transfer. Use a flag (dma_tx_busy) cleared in the callback, or a binary semaphore in FreeRTOS designs.
- STM32H7/F7: missing SCB_CleanDCache_by_Addr() before DMA TX. The symptom is correct data on the first transfer, then stale or all-zero data on subsequent transfers (after D-cache has cached the buffer region). Add the clean call immediately before every HAL_SPI_Transmit_DMA() or HAL_UART_Transmit_DMA() call on Cortex-M7 devices.
- STM32H7: DMA buffer in ITCM or DTCM. ITCM (instruction TCM) and DTCM (data TCM) are tightly coupled memories on the CPU's private interface; on STM32H7, DMA1, DMA2, and BDMA cannot reach them (only the MDMA can). Placing DMA buffers in DTCM (which may happen if the linker script maps .data there for performance) results in a DMA transfer error interrupt (TEIF) and a failed transfer. Use AXI SRAM or SRAM1, SRAM2, or SRAM3 for DMA1/DMA2 buffers on STM32H7.
- Using Circular mode for UART TX. Circular mode causes the DMA to re-transmit the buffer continuously as fast as the UART can accept data. For UART TX, this is almost never the intent. Use Normal mode for UART TX; reserve Circular mode for UART RX and ADC continuous acquisition.
- Not clearing the SPI OVR flag between DMA TX-only transfers. In SPI TX-only DMA mode, the SPI receiver still captures clocked-in MISO data. On STM32F4/H7, this sets the OVR flag once the receive data register (or the Rx FIFO on STM32H7) is full. A following HAL_SPI_Transmit_DMA() call can then return HAL_ERROR because the SPI is in an error state. Clear the flag with __HAL_SPI_CLEAR_OVRFLAG(&hspiX) in the TxCpltCallback, or configure a dummy Rx DMA channel to drain the received data.
- Wrong DMA init order when bypassing CubeMX. Calling HAL_DMA_Init() before the DMA controller clock is enabled, or never linking the DMA handle to the peripheral handle with __HAL_LINKDMA(), leaves the stream unconfigured or the peripheral's DMA pointer unset. HAL_UART_Transmit_DMA() then fails or the transfer never completes; it does not fall back to blocking mode. Enable the DMA clock, initialise and link the DMA handle, and only then start transfers.
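The busy-flag approach from the first item can be sketched with a C11 atomic flag, which avoids a race between the task that starts a transfer and the interrupt that completes it. The type and function names here are hypothetical; only the idea of a flag set before the transfer and cleared in the completion callback comes from the text above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical guard: one per DMA-driven peripheral direction. */
typedef struct {
    atomic_flag busy;
} dma_tx_guard;

/* Puts the guard into the idle state. Call once before first use. */
static void dma_tx_guard_init(dma_tx_guard *g)
{
    assert(g != NULL);
    atomic_flag_clear(&g->busy);
}

/* Returns true if the caller now owns the channel and may start a transfer.
 * Returns false if a previous transfer is still in flight. */
static bool dma_tx_guard_try_acquire(dma_tx_guard *g)
{
    assert(g != NULL);
    return !atomic_flag_test_and_set(&g->busy);
}

/* Marks the channel idle again. Intended to be called from the
 * transfer-complete callback, or by the caller if starting the transfer failed. */
static void dma_tx_guard_release(dma_tx_guard *g)
{
    assert(g != NULL);
    atomic_flag_clear(&g->busy);
}
```

In use, the sender would call dma_tx_guard_try_acquire() and start HAL_UART_Transmit_DMA() only when it returns true, and HAL_UART_TxCpltCallback() would call dma_tx_guard_release(). If the HAL call itself returns an error, the sender should release the guard so the channel does not stay locked.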
Frequently Asked Questions
- How do I find the right DMA stream and channel for my STM32 peripheral?
- On STM32F4, each peripheral DMA request maps to specific stream/channel combinations. For example, USART2 TX is available on DMA1 Stream6 Channel4, while SPI1 TX is on DMA2 Stream3 Channel3. The complete mapping table is in the 'DMA Request Mapping' section of the Reference Manual for your specific series (Table 27 in RM0090 for STM32F405/407). CubeMX reads this table and selects a valid stream/channel automatically when you enable a DMA request on a peripheral, which eliminates the most common source of DMA misconfiguration. On STM32H7, G0, G4, and L4+, DMAMUX provides flexible routing so any peripheral request can target any DMA channel; CubeMX still handles the DMAMUX configuration.
- Why does my STM32H7 DMA receive buffer contain garbage or stale data?
- This is a D-cache coherency problem specific to Cortex-M7 devices (STM32H7, STM32F7). DMA bypasses the D-cache and writes directly to SRAM; the CPU reads its cached copy and sees pre-DMA stale data. Fix: call SCB_InvalidateDCache_by_Addr((uint32_t*)rx_buf, sizeof(rx_buf)) in the DMA RxCpltCallback before reading the buffer. The buffer must be declared with __attribute__((aligned(32))) and its size rounded up to a multiple of 32 bytes — cache maintenance operates on 32-byte cache lines, and a misaligned buffer causes the operation to target the wrong line, potentially corrupting adjacent memory. Alternatively, configure the MPU to mark the DMA buffer region as non-cacheable, eliminating the need for per-transfer cache maintenance.
- How do I receive UART frames of unknown length using DMA?
- Fixed-length DMA reception via HAL_UART_Receive_DMA() only fires the completion callback when the exact configured byte count arrives; it will not trigger mid-frame. For variable-length protocols, use UART Idle Line Detection: call HAL_UARTEx_ReceiveToIdle_DMA(&huartX, rx_buf, sizeof(rx_buf)). This starts a DMA reception (in whichever mode the Rx stream is configured) and fires HAL_UARTEx_RxEventCallback() whenever the UART line is detected idle after the last byte, with the Size parameter reporting the actual byte count received. In Normal mode, restart reception from the callback. This pattern covers Modbus RTU, custom binary framing, NMEA sentences, and any protocol where frame length is not known in advance.
- Can I use STM32 HAL DMA inside a FreeRTOS task?
- Yes. Start the DMA transfer from a FreeRTOS task as normal; the HAL callback fires from the DMA interrupt. Synchronise using a binary semaphore: give it from the callback (xSemaphoreGiveFromISR()), and take it in the task (xSemaphoreTake() with a timeout). DMA interrupt priorities must have a numeric value equal to or greater than configMAX_SYSCALL_INTERRUPT_PRIORITY; an ISR with a numerically lower value that calls a FreeRTOS FromISR() function trips a configASSERT() in the Cortex-M port when assertions are enabled, and can otherwise corrupt kernel state. CubeMX sets DMA interrupt priorities to 5 by default for FreeRTOS-enabled projects; verify this in the NVIC configuration panel (System Core → NVIC).
References
- STMicroelectronics AN4031 — Using the STM32F2, STM32F4 and STM32F7 Series DMA Controller
- STMicroelectronics RM0090 — STM32F405/407/415/417 Reference Manual (DMA controller chapter)
- STMicroelectronics RM0433 — STM32H7 Reference Manual (DMA controller and DMAMUX)
- STMicroelectronics UM1725 — Description of STM32F4 HAL and Low-Layer Drivers
Related Questions
Which STM32 Family Should You Use?
Compare STM32 families for new designs: G0, G4, F4, H7, L4, U5, WB, and WL — performance tiers, power profiles, peripheral sets, and which to choose.
How Do You Configure STM32 Peripherals with HAL and CubeMX?
STM32CubeMX generates HAL initialisation code for UART, SPI, and I2C from a GUI. This guide explains key settings and how generated code maps to the hardware.
How Do You Configure STM32 NVIC Interrupt Priorities?
Learn how to configure STM32 NVIC interrupt priorities using HAL, priority grouping, and the FreeRTOS configMAX_SYSCALL_INTERRUPT_PRIORITY constraint.
How Does the STM32 Clock Tree Work?
The STM32 clock tree routes HSE or HSI through a PLL to generate SYSCLK, then divides it across AHB and APB buses. Learn how it works and how to configure it.
What Is SPI (Serial Peripheral Interface)?
SPI is a synchronous full-duplex serial bus for connecting microcontrollers to peripherals at high speed. Learn how SCLK, MOSI, MISO, and CS work.
What Is UART (Universal Asynchronous Receiver-Transmitter)?
UART sends serial data asynchronously over TX and RX with no shared clock. Learn how framing, baud rate, RS-232 voltage levels, and common UART pitfalls work.