Electronics Design AU

What Is the RP2040/RP2350 PIO (Programmable I/O) Peripheral?

Last updated 4 July 2026 · 8 min read

Direct Answer

PIO (Programmable I/O) is a peripheral unique to the Raspberry Pi RP2040 and RP2350 microcontrollers: a set of small, independent state machines that execute a tiny, purpose-built instruction set to generate or sample GPIO signals with cycle-accurate timing, entirely independent of the CPU. Each state machine runs a program of up to 32 instructions, has its own programmable clock divider, and moves data to and from the CPU through FIFOs that can be serviced by DMA — letting firmware implement a custom serial protocol (WS2812 LED timing, a nonstandard sensor bus, software-defined UART variants) in hardware, with the deterministic timing of a peripheral rather than the jitter of bit-banging.

Detailed Explanation

Every general-purpose microcontroller offers a fixed menu of hardware peripherals — UART, SPI, I2C, PWM timers, ADC — each implementing one specific protocol in dedicated silicon. When a design needs something the menu doesn't offer (see the RP2040 vs STM32 co-processor comparison, where PIO is named as the RP2040's distinguishing feature), the usual fallback is bit banging: implementing the protocol's timing in software on the CPU. PIO is Raspberry Pi Silicon's answer to this gap — a small, programmable peripheral purpose-built to generate and sample custom digital waveforms without spending CPU cycles once it's running.

Architecture: State Machines, Not a CPU

Each PIO state machine is not a general-purpose processor. It is a minimal execution unit built around a 9-instruction instruction set, deliberately restricted so that every instruction completes in one state-machine clock cycle, plus an optional programmable delay. The only exceptions are instructions designed to wait (WAIT, a blocking PULL or PUSH, or an OUT/IN held up by autopull or autopush), and those resume as soon as their condition is met. This is what makes PIO timing deterministic: unlike a CPU core, there is no cache, no branch prediction, and no interrupt preemption to introduce jitter.

Each state machine has:

  • Its own program counter, fetching from the PIO block's shared 32-instruction memory (all four state machines in a block share this memory; they can run the same program or different programs, as long as everything loaded fits in the 32 slots).
  • Its own clock divider: a 16.8 fixed-point fractional divider (16-bit integer part, 8-bit fractional part) from the system clock, letting each state machine run its instructions at a different effective rate independent of the others.
  • Two FIFOs (transmit and receive, 4 words deep each), the interface between the state machine and the CPU or DMA. A state machine that only moves data in one direction can join them into a single 8-word FIFO.
  • Configurable GPIO mapping: any state machine can be assigned to (almost) any GPIO pins at configuration time, with each pin group defined as a contiguous run starting at a configurable base pin. This includes a "side-set" pin group that toggles alongside the main instruction stream (used, for example, to generate a clock signal alongside data).
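
Because the fractional part has 8 bits, the divider moves in steps of 1/256, so most target rates are approximated rather than hit exactly. The Python below is host-side arithmetic, not SDK code: `clkdiv_int_frac` is a hypothetical helper that assumes a 16-bit integer field and an 8-bit fractional field (the RP2040 SMx_CLKDIV layout) and a 125 MHz system clock, and it reports the rate a state machine would actually run at.

```python
from __future__ import annotations


def clkdiv_int_frac(sys_hz: float, target_hz: float,
                    int_bits: int = 16, frac_bits: int = 8) -> tuple[int, int, float]:
    """Quantise sys_hz / target_hz to a fixed-point divider.

    Returns (div_int, div_frac, actual_hz). Assumes a 16-bit integer field,
    an 8-bit fractional field, and a divider of at least 1.
    """
    scale = 1 << frac_bits                      # 256 fractional steps per whole unit
    steps = round(sys_hz * scale / target_hz)   # nearest representable divider
    max_steps = (1 << (int_bits + frac_bits)) - 1
    if not scale <= steps <= max_steps:
        raise ValueError(f"divider {sys_hz / target_hz:.4f} is out of range")
    div_int, div_frac = divmod(steps, scale)
    actual_hz = sys_hz * scale / steps          # rate the state machine really runs at
    return div_int, div_frac, actual_hz


if __name__ == "__main__":
    for target in (8_000_000, 3_000_000):
        i, f, hz = clkdiv_int_frac(125_000_000, target)
        ppm = (hz - target) / target * 1e6
        print(f"{target} Hz -> {i} + {f}/256, actual {hz:.2f} Hz ({ppm:+.1f} ppm)")
```

Under these assumptions, 8 MHz (the WS2812 case) divides 125 MHz exactly as 15 + 160/256, while 3 MHz rounds to 41 + 171/256 and lands about 31 ppm low. In firmware, the Pico SDK's `sm_config_set_clkdiv()` takes the divider as a float and does the conversion itself; a sketch like this is only for checking the error budget up front.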

The Instruction Set

PIO's instruction set has exactly nine instructions:

Instruction   Purpose
JMP           Conditional or unconditional jump, including looping constructs
WAIT          Stall until a GPIO pin or an IRQ flag (which another state machine may set) reaches a specified state
IN            Shift bits from GPIO pins (or another source) into the Input Shift Register
OUT           Shift bits from the Output Shift Register to GPIO pins (or another destination)
PUSH          Move the Input Shift Register's contents to the RX FIFO
PULL          Move data from the TX FIFO into the Output Shift Register
MOV           Copy data between registers (or to and from pins), optionally inverting or bit-reversing it
IRQ           Set or clear an IRQ flag, optionally waiting for it to be cleared; used to synchronise state machines or signal the CPU
SET           Write an immediate value (up to 5 bits) to pins, pin directions, or the X/Y scratch registers

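Every instruction can carry an optional delay of extra state-machine cycles, stored in a 5-bit field it shares with side-set: up to 31 cycles when the program uses no side-set, fewer once side-set claims some of those bits (with .side_set 1, as in the WS2812 program below, the maximum is 15). This is how sub-microsecond timing (a specific number of state-machine clock cycles between transitions) is encoded directly into the program rather than computed at runtime. Simple protocols fit in a handful of instructions: the WS2812 program below, essentially the Pico SDK's own example, is four.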
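
To make the shared delay/side-set field concrete, here is a minimal Python sketch of the 16-bit instruction encoding for a subset of instructions (JMP, OUT, SET, and the `nop` pseudo-instruction). The layout (3-bit opcode, 5-bit delay/side-set field, 8 operand bits) follows the RP2040 datasheet's PIO encoding tables; the helper names are hypothetical, not pioasm or SDK functions, and only non-optional side-set (`.side_set N` without `opt`) is handled.

```python
from __future__ import annotations

OP_JMP, OP_OUT, OP_MOV, OP_SET = 0b000, 0b011, 0b101, 0b111  # bits 15:13

JMP_COND = {"always": 0b000, "!x": 0b001, "x--": 0b010, "!y": 0b011,
            "y--": 0b100, "x!=y": 0b101, "pin": 0b110, "!osre": 0b111}
OUT_DEST = {"pins": 0b000, "x": 0b001, "y": 0b010, "null": 0b011,
            "pindirs": 0b100, "pc": 0b101, "isr": 0b110, "exec": 0b111}
SET_DEST = {"pins": 0b000, "x": 0b001, "y": 0b010, "pindirs": 0b100}


def _word(opcode: int, operands: int, delay: int = 0,
          side: int | None = None, sideset_bits: int = 0) -> int:
    """Assemble one word: opcode | side-set + delay (5 bits) | operands (8 bits)."""
    delay_bits = 5 - sideset_bits          # side-set claims the top bits of the field
    if not 0 <= delay < (1 << delay_bits):
        raise ValueError(f"delay {delay} does not fit in {delay_bits} bits")
    field = delay
    if sideset_bits:                       # non-optional side-set: every instruction has one
        if side is None or not 0 <= side < (1 << sideset_bits):
            raise ValueError("a side-set value is required and must fit the side-set width")
        field |= side << delay_bits
    return (opcode << 13) | (field << 8) | operands


def jmp(cond: str, addr: int, **kw) -> int:
    if not 0 <= addr < 32:
        raise ValueError("JMP targets are 5-bit instruction addresses")
    return _word(OP_JMP, (JMP_COND[cond] << 5) | addr, **kw)


def out(dest: str, count: int, **kw) -> int:
    if not 1 <= count <= 32:
        raise ValueError("OUT shifts 1..32 bits")
    return _word(OP_OUT, (OUT_DEST[dest] << 5) | (count % 32), **kw)  # 32 is encoded as 0


def set_(dest: str, value: int, **kw) -> int:
    if not 0 <= value < 32:
        raise ValueError("SET takes a 5-bit immediate")
    return _word(OP_SET, (SET_DEST[dest] << 5) | value, **kw)


def nop(**kw) -> int:
    # `nop` is assembled as `mov y, y`: dest Y (010), op none (00), source Y (010).
    return _word(OP_MOV, (0b010 << 5) | (0b00 << 3) | 0b010, **kw)


ws = {"sideset_bits": 1}
program = [
    out("x", 1, side=0, delay=2, **ws),       # bitloop
    jmp("!x", 3, side=1, delay=1, **ws),      # to do_zero at address 3
    jmp("always", 0, side=1, delay=4, **ws),  # do_one
    nop(side=0, delay=4, **ws),               # do_zero
]
print([f"{w:#06x}" for w in program])  # ['0x6221', '0x1123', '0x1400', '0xa442']
```

These four words correspond to the WS2812 program later in this article, assembled for offset 0. JMP targets are program-relative, and the SDK's `pio_add_program()` relocates them when a program is loaded at a different offset. With `.side_set 1` only four delay bits remain, so a `[16]` delay is rejected.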

FIFOs and DMA: Removing the CPU From the Data Path

A PIO state machine exchanges data with the CPU (or DMA) only through its FIFOs. With autopull and autopush configured, the state machine automatically refills the Output Shift Register from the TX FIFO once a configured number of bits (the pull threshold) has been shifted out, and pushes the Input Shift Register to the RX FIFO once it has collected its push threshold, without spending a program-memory slot on an explicit PULL or PUSH and without CPU intervention.
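
A rough Python model of what autopull does on the transmit side follows. `OutputShiftRegister` is a hypothetical, non-cycle-accurate illustration that assumes the shift counter starts "empty" and refills at the next OUT once the pull threshold has been shifted out. It also shows why shift direction matters: with a left-shifting OSR and a 24-bit threshold, only the top 24 bits of each 32-bit FIFO word are ever sent.

```python
from collections import deque


class OutputShiftRegister:
    """Toy model of OUT with autopull enabled (not cycle-accurate).

    Assumptions: the shift counter starts at 32 ("empty"), a refill happens at
    the next OUT once `threshold` bits have been shifted out, and an empty TX
    FIFO raises an exception where real hardware would stall.
    """

    def __init__(self, tx_words, threshold: int = 32, shift_right: bool = True):
        self.fifo = deque(tx_words)
        self.threshold = threshold
        self.shift_right = shift_right
        self.osr = 0
        self.shifted = 32

    def out(self, n: int) -> int:
        if self.shifted >= self.threshold:          # autopull condition
            if not self.fifo:
                raise RuntimeError("TX FIFO empty: the state machine would stall here")
            self.osr = self.fifo.popleft()
            self.shifted = 0
        if self.shift_right:                        # LSB first
            value = self.osr & ((1 << n) - 1)
            self.osr >>= n
        else:                                       # MSB first
            value = self.osr >> (32 - n)
            self.osr = (self.osr << n) & 0xFFFF_FFFF
        self.shifted += n
        return value


osr = OutputShiftRegister([0x12345600], threshold=24, shift_right=False)
bits = [osr.out(1) for _ in range(24)]
print(hex(int("".join(map(str, bits)), 2)))  # 0x123456
```

Pushing 0x00123456 instead under the same configuration would send 0x001234 and drop the last byte, a data-formatting bug that can look like a timing fault on the wire.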

Because the FIFOs present as standard DMA request sources, a DMA channel can stream an entire buffer (an LED strip's colour data, a block of samples to transmit) into or out of a state machine's FIFO with zero CPU involvement after the transfer is configured — the CPU only receives a completion interrupt once the whole buffer has moved. This is the mechanism behind driving a long WS2812 LED strip or generating an extended waveform without the CPU touching every individual sample.
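
For the LED-strip case, the DMA arithmetic is simple enough to sketch on the host. The Python below assumes one 32-bit FIFO word per pixel, the nominal 1.25 µs WS2812 bit period, and a 50 µs latch gap between frames (`RESET_US` is an assumption; check your LED's datasheet, as some variants need considerably longer). `frame_plan` is a hypothetical helper.

```python
BIT_PERIOD_US = 1.25   # WS2812 bit period at 800 kbit/s
BITS_PER_PIXEL = 24    # GRB, 8 bits per channel
RESET_US = 50.0        # assumed latch gap; some WS2812 variants need much longer


def frame_plan(num_pixels: int) -> dict:
    """Back-of-envelope numbers for streaming one LED frame to PIO via DMA."""
    wire_us = num_pixels * BITS_PER_PIXEL * BIT_PERIOD_US
    return {
        "dma_transfers": num_pixels,       # one 32-bit word per pixel
        "buffer_bytes": num_pixels * 4,
        "wire_time_us": wire_us,
        "max_frames_per_s": 1e6 / (wire_us + RESET_US),
    }


print(frame_plan(300))
```

A 300-pixel strip, for example, is 300 DMA transfers and 9 ms on the wire, covered by a single completion interrupt when DMA streams the buffer, versus 300 individual FIFO writes if the CPU feeds it directly.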

Practical Example: Driving WS2812 (NeoPixel) LEDs

WS2812-style addressable LEDs encode each bit as a specific high/low pulse-width ratio (a 0 bit and a 1 bit differ only in how long the line is held high within a fixed ~1.25 µs period), with no separate clock line — timing must be accurate to within roughly ±150 ns. This is a canonical PIO use case because it's a protocol no standard peripheral produces and bit-banging struggles to hold accurately under any interrupt load.

A minimal PIO program for this:

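```
.program ws2812
.side_set 1

; Each bit takes 10 state-machine cycles: T3 = 3 (low), T1 = 2 (high),
; then T2 = 5 cycles that stay high for a '1' or go low for a '0'.
.wrap_target
bitloop:
    out x, 1       side 0 [2] ; T3: shift one bit into X while driving low (side-set still applies if OUT stalls waiting for data)
    jmp !x do_zero side 1 [1] ; T1: drive high, then branch on the bit value
do_one:
    jmp bitloop    side 1 [4] ; T2: a '1' bit stays high for the long pulse
do_zero:
    nop            side 0 [4] ; T2: a '0' bit drops low early for the short pulse
.wrap
```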

The side 0/side 1 annotations use side-set to drive the data pin in lock-step with the main instruction flow, and the [n] delay values set the length of each high and low phase in state-machine cycles. Each bit takes 10 cycles, so the state machine is clocked at 10 × 800 kHz = 8 MHz to produce the 1.25 µs bit period the WS2812 datasheet requires. Firmware computes each pixel's 24-bit GRB colour value and writes it to the TX FIFO left-aligned in a 32-bit word (the SDK example shifts it up by 8 bits, because the OSR is configured to shift left with a 24-bit autopull threshold), or lets DMA stream an entire frame buffer of such words. The state machine then handles every bit's exact timing autonomously, freeing the CPU to compute the next frame's colours, run other peripherals, or sleep.
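
The delays in the program translate directly into pulse widths. This Python sketch reads the per-phase cycle counts off the four instructions (one cycle per instruction plus its [n] delay), derives the state-machine clock needed for a 1.25 µs bit, and packs a 24-bit GRB colour into a FIFO word. `pulse_widths_ns` and `pack_grb` are hypothetical helpers, and the packing assumes a left-shifting OSR with a 24-bit autopull threshold.

```python
# Cycles per phase, read off the program: 1 cycle per instruction plus its [delay].
T3 = 1 + 2   # out x, 1       side 0 [2]   (low)
T1 = 1 + 1   # jmp !x do_zero side 1 [1]   (high)
T2 = 1 + 4   # jmp bitloop side 1 [4]  or  nop side 0 [4]


def pulse_widths_ns(sm_hz: float) -> dict:
    """(high, low) durations in ns for each bit value at a given state-machine clock."""
    cycle_ns = 1e9 / sm_hz
    return {
        "one": ((T1 + T2) * cycle_ns, T3 * cycle_ns),
        "zero": (T1 * cycle_ns, (T2 + T3) * cycle_ns),
        "bit_period": (T1 + T2 + T3) * cycle_ns,
    }


def pack_grb(r: int, g: int, b: int) -> int:
    """24-bit GRB value, left-aligned in a 32-bit FIFO word for a shift-left OSR."""
    for c in (r, g, b):
        if not 0 <= c <= 255:
            raise ValueError("colour components must be 0-255")
    return ((g << 16) | (r << 8) | b) << 8


sm_hz = 800_000 * (T1 + T2 + T3)   # 10 cycles per 1.25 us bit -> 8 MHz
print(pulse_widths_ns(sm_hz))
print(hex(pack_grb(255, 0, 0)))     # 0xff0000 (pure red)
```

At 8 MHz this gives 875 ns high / 375 ns low for a 1 bit and 250 ns high / 1000 ns low for a 0 bit. Compare those against your LED's datasheet tolerances, and confirm the real waveform on a logic analyser.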

Design Considerations

  • Choose PIO over bit banging when timing precision or CPU availability matters. If a protocol's timing tolerance is loose (a slow, non-critical sensor poll) and only used occasionally, bit banging remains simpler to write and debug. PIO earns its complexity when timing must hold under interrupt load, when the CPU needs to be doing other work simultaneously, or when the required bit rate exceeds what bit banging can reliably sustain.
  • Budget instruction memory before committing to a design. Each PIO block has only 32 instruction slots shared by its four state machines, so a complex custom protocol loaded alongside other programs in the same block can run out of space. Check early whether the design needs to spread programs across both (RP2040) or all three (RP2350) PIO blocks.
  • Compute the clock divider from the protocol's actual bit timing, not a round number. The fractional divider lets a state machine run at almost any rate up to the system clock, but only in steps of 1/256, and a non-integer divider reaches its average rate by alternating between adjacent whole numbers of system-clock cycles, which adds up to one system-clock period of jitter to individual state-machine cycles. For tight tolerances like WS2812's, verify the resulting waveform on a logic analyser rather than trusting the divider arithmetic alone.
  • Use DMA for any sustained data stream. Configuring a state machine's FIFO to trigger DMA transfers removes the CPU from the data path for buffer-sized transfers, which is the difference between "flicker-free LED strip updates while other firmware runs" and "polling loop that steals cycles from everything else."
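
Budgeting instruction memory can be done on paper, but a small script helps when several programs compete for space. This is a hedged Python sketch using first-fit-decreasing placement (a simple heuristic, not guaranteed optimal); `place_programs` is hypothetical, the program names and sizes are made up, and it ignores programs that must load at a fixed offset.

```python
from __future__ import annotations

SLOTS_PER_BLOCK = 32
SMS_PER_BLOCK = 4


def place_programs(programs: dict[str, tuple[int, int]], num_blocks: int) -> dict[str, int]:
    """First-fit-decreasing placement of PIO programs into PIO blocks.

    programs maps a name to (instruction count, state machines that will run it).
    Returns name -> block index, or raises ValueError if something does not fit.
    """
    free_slots = [SLOTS_PER_BLOCK] * num_blocks
    free_sms = [SMS_PER_BLOCK] * num_blocks
    placement = {}
    # Place the largest programs first; they are the hardest to fit.
    for name, (size, sms) in sorted(programs.items(), key=lambda kv: kv[1][0], reverse=True):
        for block in range(num_blocks):
            if size <= free_slots[block] and sms <= free_sms[block]:
                free_slots[block] -= size
                free_sms[block] -= sms
                placement[name] = block
                break
        else:
            raise ValueError(f"{name!r} ({size} instructions, {sms} SMs) does not fit")
    return placement


demo = {"led_driver": (4, 1), "sensor_bus": (20, 1), "capture": (10, 2)}  # hypothetical
print(place_programs(demo, num_blocks=2))  # RP2040 has two PIO blocks
```

With two blocks, the 20- and 10-instruction programs share block 0 and the 4-instruction one moves to block 1; with a single block the same set raises an error, since 34 instructions cannot fit in 32 slots.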

Common Mistakes

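  • Misplacing .wrap_target/.wrap, or not applying the program's wrap settings. PIO has no halt-at-end: after the instruction marked .wrap, the program counter jumps back to .wrap_target (by default, the end and start of the program). When adapting an example it is easy to move these directives so the wrong instructions loop, or to build a state-machine configuration by hand without setting the wrap points relative to the offset the program was loaded at, so the state machine runs on into whatever else is in instruction memory. Separately, a program longer than 32 instructions is rejected by the assembler, and one that doesn't fit in the block's remaining free slots cannot be loaded.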
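  • FIFO underrun or overrun going unnoticed. If the CPU or DMA doesn't refill the TX FIFO fast enough, a blocking PULL or autopull stalls the state machine mid-stream (for WS2812 the line sits low, and a long enough gap latches a partial frame), while a non-blocking pull noblock instead reloads the OSR from X and repeats stale data. On the receive side, a full RX FIFO stalls a blocking PUSH, or silently loses data with push noblock. These symptoms look like corrupted or frozen output but are data-supply timing problems, not PIO program bugs; the PIO block's FDEBUG register holds sticky stall and overflow flags that help confirm which one is happening.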
  • Assuming PIO pins are interchangeable with any GPIO without checking constraints. While PIO can map to nearly any GPIO, each pin group (OUT, IN, SET, side-set) is a contiguous run of pins starting at a configurable base, with its own count limit (SET covers at most five pins, and side-set at most five bits including the optional enable bit), and the JMP pin is a single pin. A program that side-sets a clock and a chip-select together needs those signals on adjacent GPIOs; it will not reconfigure itself to match your board's wiring.
  • Debugging PIO timing without a logic analyser. Because PIO's entire value proposition is cycle-accurate timing, an incorrect delay value or divider calculation often "mostly works" (an LED strip that's slightly wrong-coloured, a protocol that decodes most of the time) rather than failing outright — capture the actual waveform rather than assuming the program logic is correct because it compiles and runs.
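
Pin-group constraints can be checked before a board layout is frozen. In this Python sketch, `pin_group` and `sideset_base_for` are hypothetical helpers; they assume each group is `count` consecutive GPIOs from a base pin, wrapping modulo 32 as the RP2040's PIO pin mapping does, and a five-bit side-set limit (four pins if one bit is the `opt` enable flag).

```python
def pin_group(base: int, count: int, width: int = 32) -> list[int]:
    """GPIO numbers covered by a PIO pin group: `count` pins from `base`, wrapping at `width`."""
    return [(base + i) % width for i in range(count)]


def sideset_base_for(signals: list[int], max_pins: int = 5) -> int:
    """Return the side-set base GPIO if `signals` (in side-set bit order) forms a valid group.

    Use max_pins=4 for `.side_set N opt`, where one of the five bits is the enable flag.
    """
    if not 1 <= len(signals) <= max_pins:
        raise ValueError(f"side-set supports 1..{max_pins} pins, got {len(signals)}")
    base = signals[0]
    if pin_group(base, len(signals)) != list(signals):
        raise ValueError(f"GPIOs {signals} are not consecutive from GPIO {base}")
    return base


print(sideset_base_for([10, 11]))   # clock on GPIO10, chip-select on GPIO11
try:
    sideset_base_for([10, 14])      # not adjacent: needs a board change
except ValueError as err:
    print(err)
```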

Frequently Asked Questions

How many PIO state machines does the RP2040 have, and how many does the RP2350 have?
The RP2040 has two PIO blocks, each with four state machines, for a total of eight. The RP2350 increases this to three PIO blocks of four state machines each, for a total of twelve. Within a single PIO block, all four state machines share the same instruction memory (32 instruction slots) and can run either independent programs (if they collectively fit in 32 instructions) or the same program with different configuration — different pins, clock dividers, or starting offsets.
Is PIO the same thing as bit banging?
No, though they solve similar problems. Bit banging (see What Is Bit Banging?) executes GPIO timing entirely as CPU instructions, competing with every other interrupt and task for CPU cycles and losing accuracy under any scheduling jitter. A PIO state machine runs autonomously in dedicated hardware once started: it keeps generating or sampling its waveform with cycle-accurate timing regardless of what the CPU cores are doing, and only needs attention, through an interrupt to the CPU or a data request to DMA, when its FIFO needs servicing. This is the key advantage: PIO gets bit banging's flexibility (any protocol, nearly any pin) without bit banging's CPU-time cost or jitter sensitivity.
Can PIO replace a hardware UART, SPI, or I2C peripheral?
Yes. Raspberry Pi's pico-examples repository includes working PIO implementations of UART, SPI and I2C, useful when the dedicated hardware peripherals are already in use or a nonstandard variant is needed (e.g. a UART baud rate or frame format the hardware UART can't produce). It's rarely worth using PIO to reimplement a standard protocol when a dedicated peripheral is free and sufficient; PIO's real value is in nonstandard, high-precision, or entirely custom protocols the fixed-function peripherals can't produce at all, such as WS2812 LED timing or a proprietary sensor's timing-critical bus.
