How Do You Use Zynq / SoC-FPGA Devices?
Last updated 3 July 2026 · 9 min read
Direct Answer
A Zynq-class SoC-FPGA (AMD/Xilinx Zynq-7000, Zynq UltraScale+, or Intel Cyclone V SoC) integrates a hard ARM processor — the Processing System (PS) — alongside FPGA fabric — the Programmable Logic (PL) — on a single die, connected by an on-chip AXI interconnect. The PS boots first and independently of the PL: a BootROM loads a First Stage Boot Loader (FSBL), which can optionally load the PL bitstream before handing off to U-Boot and then either an embedded Linux distribution (typically built with PetaLinux/Yocto) or bare-metal/RTOS firmware. The PS communicates with custom PL logic through AXI ports of three types — low-bandwidth General Purpose (GP) ports for register control, high-bandwidth High Performance (HP) ports for DMA-style bulk data transfer to DDR, and, on Zynq-7000, a cache-coherent Accelerator Coherency Port (ACP). Getting a Zynq design working means treating the PS software stack and the PL hardware design as two genuinely separate development flows that meet at a well-defined AXI and interrupt boundary, not as one combined build.
Detailed Explanation
A Zynq-class device (AMD/Xilinx Zynq-7000, Zynq UltraScale+, or the equivalent Intel Cyclone V/Agilex SoC) is architecturally two devices on one die. The first is a hard Processing System (PS): one or more ARM Cortex-A cores with peripherals such as UART, SPI, I2C, USB, and Ethernet controllers built in silicon. The second is the Programmable Logic (PL), the FPGA fabric. The two are connected by an on-chip AXI interconnect. In the FPGA + MCU dual-chip approach, the processor and FPGA are physically separate parts communicating over a board-level bus (typically SPI). Here the PS and PL share a die and a wide, high-bandwidth on-chip interconnect, which removes chip-to-chip latency and frees board-level I/O for other uses. See the FPGA topic for the full set of FPGA implementation guides.
The PS-PL Split
The PS is not a soft processor implemented in the FPGA fabric — it is dedicated silicon, running independently of whatever (if anything) is loaded into the PL. This has a critical practical consequence: the PS can boot and run without any PL bitstream loaded at all. A design can bring up Linux or bare-metal firmware on the PS first, debug it in isolation, and add the PL bitstream once the software side is stable — a much easier bring-up sequence than trying to debug both halves simultaneously.
- Zynq-7000: dual-core ARM Cortex-A9 PS (single-core on Zynq-7000S parts) alongside 7-series PL fabric; its PS-PL ports use AXI3.
- Zynq UltraScale+ MPSoC: a quad-core ARM Cortex-A53 application processing unit (dual-core on CG devices) plus a dual-core ARM Cortex-R5 real-time processing unit. This gives a genuine asymmetric multiprocessing split: Linux on the A53 cores, hard-real-time control loops on the R5 cores, both alongside the PL fabric.
AXI Interconnect: GP, HP, and ACP Ports
The PS exposes several AXI port types to the PL, each suited to a different traffic pattern:
- AXI GP (General Purpose): 32-bit, lower throughput, intended for register-style control such as writing configuration into a custom IP block or polling a status register. Both PS-to-PL (M_AXI_GP) and PL-to-PS (S_AXI_GP) directions exist.
- AXI HP (High Performance): wider ports (up to 64-bit on Zynq-7000, up to 128-bit on Zynq UltraScale+) intended for DMA-style bulk data movement between PL logic and DDR memory without CPU involvement in every transfer. This is the port a custom PL peripheral streaming sensor or video data would use to write directly into a DDR buffer that the PS software then reads.
- AXI ACP (Accelerator Coherency Port): gives PL logic cache-coherent access to PS memory through the CPU cache hierarchy, so PL and PS can share data structures without explicit software cache-flush/invalidate management. On Zynq-7000 it is the only coherent PL port; Zynq UltraScale+ also offers coherent access, through its own ACP, ACE, and HPC ports. It comes at a throughput cost relative to HP ports and is best reserved for genuinely latency-sensitive coherent sharing rather than bulk streaming.
Choosing the wrong port type for the traffic pattern is a common source of disappointing PL-PS bandwidth — a GP port used for bulk streaming data will bottleneck badly compared to an HP port doing the same job.
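As a rough illustration of why the port choice matters, the theoretical ceiling of an AXI data channel is just bus width multiplied by clock rate. The sketch below assumes one data beat per clock with no stalls, and the clock frequencies are illustrative values rather than figures from any particular design. Real throughput is lower, and a GP port driven by individual CPU load/store instructions typically falls well short of even this ceiling.

```python
def peak_axi_bytes_per_second(data_width_bits: int, clock_hz: float) -> float:
    """Upper bound on one AXI data channel: one beat per clock, no stalls."""
    if data_width_bits % 8 != 0:
        raise ValueError("AXI data width must be a whole number of bytes")
    return (data_width_bits // 8) * clock_hz


# Illustrative figures only (assumed clocks, not taken from a real design).
gp_peak = peak_axi_bytes_per_second(32, 100e6)  # 32-bit GP port at 100 MHz
hp_peak = peak_axi_bytes_per_second(64, 150e6)  # 64-bit HP port at 150 MHz

print(f"GP ceiling: {gp_peak / 1e6:.0f} MB/s")
print(f"HP ceiling: {hp_peak / 1e6:.0f} MB/s")
```

The gap in practice is usually wider than these ceilings suggest, because HP ports are fed by burst-capable DMA masters while GP accesses are often single-beat.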
Boot Flow
The Zynq boot sequence runs through several stages before application code (Linux or bare-metal) executes:
- BootROM — fixed, on-chip, unmodifiable silicon code that reads the boot mode pins (JTAG, QSPI, SD, NAND depending on part and board strapping) and loads the next stage.
- FSBL (First Stage Boot Loader) — generated by the Xilinx tools from the specific hardware configuration; initialises DDR, clocks, and PS peripherals, and can optionally load the PL bitstream at this stage (bitstream-before-Linux) before handing off.
- U-Boot — the second-stage bootloader that loads the Linux kernel, device tree, and root filesystem (or, in a bare-metal flow, this stage may be skipped entirely).
- Linux kernel / bare-metal application — the final application, either a PetaLinux-built embedded Linux image or a bare-metal executable running directly after the FSBL (or after U-Boot, for a bare-metal image loaded as U-Boot's "kernel").
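Zynq UltraScale+ adds further early stages ahead of U-Boot: Platform Management Unit (PMU) firmware and Arm Trusted Firmware, which runs at the highest exception level (EL3) and is part of the standard Linux boot flow rather than something only secure-boot designs use. These extra stages reflect its more complex multi-core PS.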
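The stage ordering above is reflected in the boot image description (BIF) that AMD/Xilinx's bootgen tool consumes when packaging a boot image. The following is a minimal sketch for a Zynq-7000 image only; the helper name and the file names (fsbl.elf, design.bit, u-boot.elf) are hypothetical placeholders, and a Zynq UltraScale+ image needs additional partitions and attributes that are not shown here.

```python
from typing import Optional


def make_zynq7000_bif(fsbl: str, app: str, bitstream: Optional[str] = None,
                      image_name: str = "the_ROM_image") -> str:
    """Return bootgen BIF text listing partitions in boot order."""
    lines = [f"{image_name}:", "{", f"    [bootloader] {fsbl}"]
    if bitstream is not None:
        # Listed after the FSBL so the FSBL configures the PL before handoff.
        lines.append(f"    {bitstream}")
    # Second-stage loader (U-Boot) or a bare-metal application comes last.
    lines.append(f"    {app}")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical file names, for illustration only.
print(make_zynq7000_bif("fsbl.elf", "u-boot.elf", bitstream="design.bit"))
```

Leaving the bitstream argument out produces an image in which the PS boots with no PL configuration, which suits the independent PS bring-up approach.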
Software Stack: PetaLinux vs Bare-Metal
The PS software choice is an independent decision from the PL hardware design, made per-application:
- PetaLinux (Yocto-based) — AMD/Xilinx's toolchain for building a custom embedded Linux distribution (kernel, U-Boot, root filesystem, device tree) targeted at a specific Zynq hardware configuration. The right choice when the application needs a networking stack, file system, package ecosystem, or any software complexity that would be painful to reimplement bare-metal. Boot time is measured in seconds, and the image size and build complexity are correspondingly larger.
- Bare-metal (standalone BSP) — a lightweight board support package generated directly from the Vivado hardware description, built with Vitis (or the older Xilinx SDK). The right choice when the PS's job is narrowly to configure and monitor PL logic with no networking or file system requirement — boot time is milliseconds, not seconds, and the resulting image is a fraction of the size of a Linux build.
- FreeRTOS or Zephyr on the PS — a middle ground offering multitasking and standard RTOS primitives without full Linux's boot time and resource footprint; commonly used on the Cortex-R5 real-time cores in a Zynq UltraScale+ design running control loops alongside Linux on the A53 cores.
Vivado Block Design Workflow
Building the PL side of a Zynq design uses Vivado's IP Integrator (block design) tool rather than writing the PS interconnect by hand:
- Add the Zynq PS7 (or PS8, for UltraScale+) IP block to a new block design and run the PS configuration wizard, which presents the specific PS peripherals, clocks, and DDR configuration available on the target part and board — this generates the correct PS initialisation settings the FSBL will later use.
- Enable and configure the AXI ports the design needs (GP, HP, ACP) in the PS configuration wizard; unused ports should stay disabled to reduce resource usage and configuration complexity.
- Add custom PL IP blocks (your own HDL, wrapped as IP, or Xilinx-provided IP like AXI DMA or AXI GPIO) and connect them to the PS through Vivado's AXI Interconnect IP — the tool auto-generates the necessary address decoding and routing when you use the "Run Connection Automation" feature, though verifying the resulting address map against what your PS software expects is still essential.
- Generate the bitstream and export the hardware description (an .xsa file for current Vivado versions). This file is the handoff artifact that PetaLinux or Vitis consumes to build software matched to the exact PL configuration, including the AXI address map and any custom IP driver stubs.
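Verifying the address map against what the software expects can be partly automated. The sketch below is one possible approach, not a feature of Vivado or Vitis: it assumes you have already copied the peripheral base addresses and sizes out of the Vivado Address Editor into a dictionary, and it checks them against the Zynq-7000 M_AXI_GP0 window (0x4000_0000 to 0x7FFF_FFFF). The function name, peripheral names, and addresses are illustrative.

```python
from typing import Dict, List, Tuple

# M_AXI_GP0 address window on Zynq-7000.
GP0_WINDOW = (0x4000_0000, 0x7FFF_FFFF)


def check_address_map(hw_map: Dict[str, Tuple[int, int]],
                      sw_expected: Dict[str, int],
                      window: Tuple[int, int] = GP0_WINDOW) -> List[str]:
    """Return a list of human-readable problems; empty means consistent.

    hw_map: peripheral name -> (base address, size in bytes) from Vivado.
    sw_expected: peripheral name -> base address the software uses.
    """
    problems = []
    for name, (base, size) in hw_map.items():
        if base < window[0] or base + size - 1 > window[1]:
            problems.append(f"{name}: outside port window")
    ordered = sorted(hw_map.items(), key=lambda item: item[1][0])
    for (name_a, (base_a, size_a)), (name_b, (base_b, _)) in zip(ordered, ordered[1:]):
        if base_a + size_a > base_b:
            problems.append(f"{name_a} overlaps {name_b}")
    for name, base in sw_expected.items():
        if name not in hw_map:
            problems.append(f"{name}: expected by software, missing in hardware")
        elif hw_map[name][0] != base:
            problems.append(
                f"{name}: software uses {base:#x}, hardware has {hw_map[name][0]:#x}")
    return problems


# Example values only; the software side has a deliberate mismatch.
hw = {"axi_gpio_0": (0x4120_0000, 0x1_0000), "axi_dma_0": (0x4040_0000, 0x1_0000)}
sw = {"axi_gpio_0": 0x4120_0000, "axi_dma_0": 0x4041_0000}
for problem in check_address_map(hw, sw):
    print(problem)
```

Running a check like this after every hardware export catches a stale base address before it shows up as a driver reading the wrong registers.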
Loading the PL Bitstream at Runtime
On a Linux-based design, the PL bitstream doesn't have to be loaded only by the FSBL at boot — Linux's FPGA Manager framework (/sys/class/fpga_manager) allows loading or reloading a PL bitstream at runtime, either from userspace or via a device tree overlay that binds a bitstream to newly-appearing hardware. This is the mechanism behind partial reconfiguration workflows and designs where the PL configuration needs to change without a full system reboot — the PS keeps running Linux throughout.
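From userspace, the FPGA Manager framework exposes a state attribute for each manager instance, which makes it possible to confirm that the PL is configured before software touches any PL registers. The sketch below assumes the instance is named fpga0, which is common on single-FPGA systems but not guaranteed. The sysfs_root parameter is an addition for this example so the function can be exercised off-target.

```python
from pathlib import Path


def fpga_manager_state(manager: str = "fpga0",
                       sysfs_root: str = "/sys/class/fpga_manager") -> str:
    """Read the FPGA Manager state string (for example "operating")."""
    return (Path(sysfs_root) / manager / "state").read_text().strip()


def pl_is_configured(manager: str = "fpga0",
                     sysfs_root: str = "/sys/class/fpga_manager") -> bool:
    """True when the framework reports the PL as programmed and running."""
    return fpga_manager_state(manager, sysfs_root) == "operating"
```

On a target board, calling pl_is_configured() before mapping PL registers gives an early, readable failure instead of a bus fault when the bitstream has not been loaded yet.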
Design Considerations
- Decide the PS software stack before finalising the PL address map. Bare-metal and Linux BSPs consume the same .xsa hardware description, but a Linux device tree needs every AXI-addressed PL peripheral described correctly. Get the address map right in Vivado before generating downstream software artifacts, since regenerating them after a PL change is a real but avoidable rebuild cost.
- Match the AXI port type to the actual traffic pattern. Use GP for control-plane register access and HP for bulk data movement; using a GP port for a job that needs HP bandwidth is a common and easily avoided performance mistake.
- Bring up the PS and PL independently before integrating them. Verify Linux or bare-metal firmware boots cleanly on the PS with a minimal or no PL bitstream first, and verify the PL logic in simulation before connecting it to the PS — debugging both halves simultaneously on first bring-up multiplies the search space for any fault.
- Plan the boot-time PL bitstream loading strategy deliberately. Loading the bitstream in the FSBL (before Linux) is required if Linux itself needs PL hardware (a PL-based Ethernet MAC, for instance) to boot; loading it later via FPGA Manager is preferable when the PL configuration might change at runtime or isn't needed until later in the boot sequence.
For SoC-FPGA hardware and firmware design, including PS-PL integration, PetaLinux or bare-metal software, and custom PL IP, Zeus Design's firmware team develops embedded systems on Zynq and equivalent SoC-FPGA platforms.
Common Mistakes
- Treating the PS and PL as one combined build instead of two coordinated flows. The hardware (Vivado block design → bitstream → .xsa) and software (PetaLinux or Vitis, consuming the .xsa) toolchains are genuinely separate, and confusing the handoff points between them is a frequent source of "it worked before" build breakage after a PL change.
- Choosing the wrong AXI port type for the traffic pattern, then attributing the resulting bandwidth shortfall to the FPGA fabric rather than the port choice. A GP port used for bulk streaming will never match HP port throughput regardless of PL logic optimisation.
- Forgetting that the device tree must match the PL hardware configuration. On a Linux build, a device tree that doesn't accurately describe the PL's AXI-addressed peripherals (wrong base address, missing interrupt mapping) produces drivers that fail to probe or access the wrong registers. Regenerate the device tree source from the current .xsa after any PL address map change rather than hand-patching an old one.
- Skipping independent PS bring-up. Attempting to bring up Linux and custom PL logic simultaneously on first power-on makes it far harder to isolate whether a fault is a PS software issue, a PL hardware issue, or an interconnect/address-map mismatch between the two.
- Underestimating PetaLinux build and iteration time. A full PetaLinux rebuild after a hardware description change can take significantly longer than a typical firmware rebuild; plan development iteration around this, and use the bare-metal flow for early PL-focused debugging where a fast rebuild-flash-test loop matters more than the final software stack.
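The interrupt mapping mentioned above is an easy place to go wrong when reviewing a device tree by hand. On Zynq-7000, the sixteen PL-to-PS interrupt inputs IRQ_F2P[15:0] land on two separate ranges of GIC interrupt IDs, and the device tree uses the shared peripheral interrupt (SPI) number, which is the GIC ID minus 32. The sketch below covers Zynq-7000 only (Zynq UltraScale+ uses different IDs), and the function names are illustrative.

```python
def f2p_irq_to_dt_spi(f2p_index: int) -> int:
    """Map a Zynq-7000 IRQ_F2P[n] input to the device tree SPI number.

    IRQ_F2P[7:0] use GIC interrupt IDs 61-68 and IRQ_F2P[15:8] use 84-91.
    Device tree SPI numbers are the GIC ID minus 32.
    """
    if 0 <= f2p_index <= 7:
        gic_id = 61 + f2p_index
    elif 8 <= f2p_index <= 15:
        gic_id = 84 + (f2p_index - 8)
    else:
        raise ValueError("Zynq-7000 has IRQ_F2P[15:0] only")
    return gic_id - 32


def dt_interrupts_property(f2p_index: int, trigger: int = 4) -> str:
    """Format an `interrupts` property; trigger 4 = level high, 1 = rising edge."""
    return f"interrupts = <0 {f2p_irq_to_dt_spi(f2p_index)} {trigger}>;"


print(dt_interrupts_property(0))  # first PL interrupt input
```

A generated device tree should already contain the right numbers; a helper like this is mainly useful for cross-checking a hand-edited node against the block design.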
For the higher-level decision between an FPGA, a microcontroller, and a Zynq-style SoC-FPGA hybrid, see FPGA vs Microcontroller vs ASIC. For the FPGA-side toolchain (synthesis, place-and-route, timing closure) that produces the PL bitstream itself, see FPGA development flow.
Frequently Asked Questions
- What is a Zynq SoC and when should I use one instead of a separate FPGA and MCU?
- A Zynq SoC combines a hard ARM processor and FPGA fabric on one die, eliminating the chip-to-chip communication overhead of a dual-chip FPGA+MCU design. It makes sense when an application genuinely needs both a real-time or high-throughput data path in the PL and a rich software environment on the PS (networking stacks, file systems, machine learning inference, a display stack) that would be complex to implement in HDL alone. It is unnecessary complexity when only one capability is needed: a design that's purely control-plane software fits a standard MCU better, and a design that's purely a fixed high-throughput data path with no software requirement fits a standalone FPGA better.
- Do I have to use Linux on the PS, or can I run bare-metal firmware?
- Bare-metal firmware is fully supported and often the right choice when the PS side just needs to configure and monitor the PL logic, with no networking stack, file system, or complex software ecosystem required. AMD/Xilinx ships a standalone bare-metal BSP (generated from the hardware description) alongside the Linux/PetaLinux path, built with the Vitis (or older SDK) toolchain. A bare-metal build boots dramatically faster than Linux (milliseconds vs seconds) and has a much smaller attack surface and resource footprint, at the cost of writing your own drivers for anything beyond what the BSP provides.
- How does the PS access data produced by custom logic in the PL?
- Through the AXI interconnect, using one of three port types depending on the access pattern. AXI GP (General Purpose) ports are 32-bit and intended for low-bandwidth register-style control: writing configuration values into a custom IP block's control registers and reading status. AXI HP (High Performance) ports are wider (up to 64-bit on Zynq-7000, up to 128-bit on Zynq UltraScale+) and intended for high-bandwidth DMA-style transfers, typically moving streaming data from a PL peripheral directly into DDR memory without CPU involvement. The AXI ACP (Accelerator Coherency Port) gives PL logic cache-coherent access through the PS cache hierarchy, which is useful when PL and PS need to share data structures without explicit cache-flush management in software.
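For the register-style GP access described above, a quick way to poke at a PL peripheral from Linux userspace during bring-up is to map its address range from /dev/mem. The sketch below is a debugging aid under several assumptions: it needs root, the kernel must permit /dev/mem access to that range, and the struct-based read does not strictly guarantee a single 32-bit bus access. A production design would more likely use a UIO or dedicated kernel driver. The function name is illustrative, and the mem_path parameter exists only so the code can be tried against an ordinary file.

```python
import mmap
import os
import struct


def read_reg32(base_addr: int, offset: int = 0, mem_path: str = "/dev/mem") -> int:
    """Read one little-endian 32-bit register at base_addr + offset."""
    addr = base_addr + offset
    if addr % 4:
        raise ValueError("register address must be 32-bit aligned")
    page = mmap.ALLOCATIONGRANULARITY
    map_base = addr - (addr % page)  # mmap offsets must be page aligned
    fd = os.open(mem_path, os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, page, access=mmap.ACCESS_READ, offset=map_base) as window:
            return struct.unpack_from("<I", window, addr - map_base)[0]
    finally:
        os.close(fd)
```

On a target, a call such as read_reg32(0x4120_0000) would read the first register of a peripheral placed at that base address; the address here is an example value and must match the map exported from Vivado.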