What Is an FPGA and How Does It Work?

Last updated 28 June 2026 · 10 min read

Direct Answer

An FPGA (Field-Programmable Gate Array) is an integrated circuit containing a large array of configurable logic blocks, programmable interconnect, and specialised hard IP, which can be reprogrammed after manufacture to implement any digital circuit that fits within its resources. The core logic element is the LUT (Look-Up Table): a small RAM that stores the output of an N-input Boolean function for every possible input combination. A 4-input LUT can implement any logic function of up to 4 inputs. Each LUT is paired with a flip-flop, forming a logic cell (called a Logic Element by Intel/Altera; Xilinx groups several cells into a Slice within a Configurable Logic Block). FPGAs also contain block RAM (BRAM) for on-chip memory and DSP blocks for multiply-accumulate operations. FPGAs are programmed using an HDL (Hardware Description Language), usually Verilog or VHDL, which describes the hardware structure, not a sequence of instructions. Use an FPGA when you need custom parallel hardware, high-speed data processing, or hardware that must be reconfigurable; use a microcontroller for sequential, interrupt-driven embedded firmware.

Detailed Explanation

An FPGA sits in a unique position in the digital design landscape: it is reconfigurable hardware, a physical chip whose internal logic structure is set by a configuration generated from an HDL description and can be rewritten as many times as needed. This makes it the usual prototyping vehicle for ASIC designs and a production platform for applications where custom parallel hardware delivers throughput or latency that a microcontroller or DSP cannot.

How an FPGA Is Built

The Look-Up Table (LUT)

The fundamental logic primitive inside an FPGA is the LUT: a small SRAM that maps every possible combination of N binary inputs to an output value. A 4-input LUT has 4 address lines, which select one of 2⁴ = 16 stored bits. Those 16 bits can be set to implement any Boolean function of 4 variables: AND, OR, XOR, the sum or carry output of a full adder, part of a multiplexer, or any arbitrary truth table.

                  4-input LUT
                ┌──────────────┐
Input A ───────►│              │
Input B ───────►│  16-bit SRAM ├────► Output
Input C ───────►│  (truth tbl) │
Input D ───────►│              │
                └──────────────┘
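
As a rough illustration of the truth-table idea, a 4-input LUT can be modelled in Verilog as a 16-bit constant indexed by its inputs. The module and parameter names (lut4_model, lut4_demo, INIT) are invented for this sketch and are not vendor primitives; in a real design the synthesis tool chooses the LUT contents.

```verilog
// Behavioural model of a 4-input LUT (illustrative only, not a vendor primitive).
module lut4_model #(
    parameter [15:0] INIT = 16'h0000   // truth table: bit i is the output when {d,c,b,a} == i
) (
    input  wire a,
    input  wire b,
    input  wire c,
    input  wire d,
    output wire y
);
    wire [15:0] table_bits = INIT;

    // The four inputs act as a 4-bit address into the 16 stored bits.
    assign y = table_bits[{d, c, b, a}];
endmodule

// Two different logic functions built from the same structure.
module lut4_demo (
    input  wire a,
    input  wire b,
    input  wire c,
    input  wire d,
    output wire and4,
    output wire xor4
);
    // 16'h8000: only entry 15 (all inputs high) stores a 1, giving a 4-input AND.
    lut4_model #(.INIT(16'h8000)) u_and (.a(a), .b(b), .c(c), .d(d), .y(and4));

    // 16'h6996: entries whose address has odd parity store a 1, giving a 4-input XOR.
    lut4_model #(.INIT(16'h6996)) u_xor (.a(a), .b(b), .c(c), .d(d), .y(xor4));
endmodule
```

The two instances are structurally identical; only the 16 stored bits differ.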

Modern FPGAs use 4-input or 6-input LUTs (6-input is common in current Xilinx/Intel devices). A 6-input LUT stores 2⁶ = 64 bits and can implement any 6-variable Boolean function; in practice one LUT replaces roughly 3–4 standard logic gates on average, though the figure varies with the design.

Each LUT is paired with a D flip-flop, forming a logic cell. Xilinx groups several LUTs and flip-flops into a Slice, and Slices into a CLB (Configurable Logic Block); Intel/Altera calls the basic cell an LE (Logic Element):

Input signals ──► [LUT] ──► Mux ──► Q (flip-flop or combinational output)
                                     ▲
                              Clock ─┘

The LUT output can pass directly as a combinational output, or be registered through the flip-flop for synchronous logic. Thousands (or millions) of these cells, connected by a programmable routing fabric, implement the user's design.
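
The logic cell described above can be sketched as a LUT, a flip-flop, and an output mux. This is a simplified model with hypothetical names (logic_cell_model, INIT, REGISTERED); real cells add carry chains, extra multiplexers, and clock-enable and reset inputs.

```verilog
// Illustrative model of one logic cell: a 4-input LUT, a D flip-flop,
// and a mux that selects the combinational or the registered output.
module logic_cell_model #(
    parameter [15:0] INIT       = 16'h0000, // LUT truth table
    parameter        REGISTERED = 1         // 1: output from flip-flop, 0: straight from LUT
) (
    input  wire       clk,
    input  wire [3:0] in,
    output wire       out
);
    wire [15:0] table_bits = INIT;
    wire        lut_out    = table_bits[in]; // combinational LUT output

    reg ff_q = 1'b0;
    always @(posedge clk)
        ff_q <= lut_out;                     // registered copy, updated on each rising edge

    // Output mux: its setting is fixed by the configuration, like the LUT contents.
    assign out = REGISTERED ? ff_q : lut_out;
endmodule
```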

Block RAM (BRAM)

BRAM is on-chip SRAM embedded throughout the FPGA fabric, available in discrete blocks (18 Kb or 36 Kb in Xilinx 7-series devices). It is true dual-port: two independent read/write ports, each with its own clock and, optionally, a different data width. Used for:

  • FIFOs between clock domains
  • Look-up tables (large coefficient tables, sin/cos ROM)
  • Packet buffers in communication IP
  • Small embedded memory systems

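A Xilinx Artix-7 XC7A35T has 50 × 36 Kb BRAMs = 1,800 Kb of on-chip RAM (each 36 Kb block can also be used as two independent 18 Kb blocks). A larger Artix-7 XC7A200T has 365 × 36 Kb BRAMs = 13,140 Kb (about 13 Mb). This is far less than external DDR, but access is deterministic: a read returns its data one clock cycle after the address is presented.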
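
Block RAM is usually inferred from an HDL coding pattern, not instantiated by hand. The sketch below is a simple dual-port memory (one write port, one read port, separate clocks) of 1024 × 18 bits, which is 18 Kb. The module name and parameters are assumptions for illustration, and whether a given tool maps this to BRAM should be confirmed in its utilisation report.

```verilog
// Simple dual-port RAM: one write port and one read port on independent clocks.
// A synchronous (registered) read is what typically lets tools map this to block RAM;
// an asynchronous read tends to be built from LUTs instead.
module sdp_ram #(
    parameter DATA_WIDTH = 18,
    parameter ADDR_WIDTH = 10              // 2^10 = 1024 words x 18 bits = 18 Kb
) (
    // Write port
    input  wire                  wr_clk,
    input  wire                  wr_en,
    input  wire [ADDR_WIDTH-1:0] wr_addr,
    input  wire [DATA_WIDTH-1:0] wr_data,
    // Read port
    input  wire                  rd_clk,
    input  wire [ADDR_WIDTH-1:0] rd_addr,
    output reg  [DATA_WIDTH-1:0] rd_data
);
    reg [DATA_WIDTH-1:0] mem [0:(1 << ADDR_WIDTH) - 1];

    always @(posedge wr_clk)
        if (wr_en)
            mem[wr_addr] <= wr_data;

    // Data appears one rd_clk cycle after the address is presented.
    always @(posedge rd_clk)
        rd_data <= mem[rd_addr];
endmodule
```

A memory like this is the storage element inside a clock-domain-crossing FIFO; the pointer logic and its synchronisation are a separate problem and are not shown here.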

DSP Blocks

DSP blocks are hard multiply-accumulate (MAC) circuits embedded in the FPGA fabric. A typical DSP48 block (Xilinx) implements P = A × B + C and, with its optional pipeline registers enabled, produces one result per clock cycle at 400–500 MHz. This is much faster and more area-efficient than building a multiplier from LUTs. Used for:

  • Digital filters (FIR, IIR)
  • FFT implementations
  • Motor control PID loops
  • Image and signal processing

A DSP block can also implement wide additions, accumulations, and pattern detectors without using LUT resources.
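
A minimal sketch of the P = A × B + C operation, written so that a synthesis tool can map it to a DSP block. The operand widths (25 × 18 multiply, 48-bit add) are chosen to match the Xilinx 7-series DSP48E1; the module and signal names are hypothetical, and whether the registers are absorbed into the DSP block depends on the tool and its settings.

```verilog
// Pipelined multiply-add: p = a * b + c, with a latency of two clock cycles.
// One new result is produced every clock once the pipeline is full.
module mac_pipelined (
    input  wire               clk,
    input  wire signed [24:0] a,
    input  wire signed [17:0] b,
    input  wire signed [47:0] c,
    output reg  signed [47:0] p
);
    reg signed [42:0] m_r;   // registered product (25 + 18 = 43 bits)
    reg signed [47:0] c_r;   // c delayed by one cycle to stay aligned with the product

    always @(posedge clk) begin
        m_r <= a * b;        // stage 1: multiply
        c_r <= c;
        p   <= m_r + c_r;    // stage 2: add (sign-extended to 48 bits)
    end
endmodule
```

Feeding p back in place of c turns the same structure into an accumulator, which is the core of a FIR filter tap.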

Hard IP and Other Resources

Modern FPGAs include additional hard (non-reprogrammable) blocks:

  • I/O buffers — configurable for LVDS, LVCMOS 3.3 V/1.8 V/1.2 V, SSTL (DDR), etc.
  • High-speed serial transceivers (SERDES) — PCIe, Gigabit Ethernet (GigE), USB 3.0 at 1–28 Gbps.
  • PLLs / MMCMs — on-chip clock multiplication, division, and phase shifting.
  • Hard processor blocks — larger FPGAs include hard ARM Cortex-A9/A53 cores (Xilinx Zynq, Intel Cyclone V SoC) or RISC-V cores, creating a hybrid FPGA+CPU system on one device.
  • PCIe hard blocks — PCIe Gen1/2/3 endpoint, eliminates the need to implement PCIe in fabric.

How an FPGA Is Programmed

Most FPGAs use SRAM-based configuration: millions of SRAM cells hold the configuration data (which LUT bit is 0 or 1, which routing switch is connected). Because SRAM is volatile, the configuration is loaded on every power-up from an external SPI flash (or over JTAG). Once loaded, the FPGA's logic is live.

The design flow:

  1. HDL authoring — write Verilog or VHDL describing the hardware.
  2. Synthesis — the synthesis tool (Vivado, Quartus, Yosys) converts HDL to a netlist of LUTs, flip-flops, and primitives.
  3. Place and route (PnR) — the PnR tool (part of Vivado/Quartus) assigns each netlist element to a physical LUT/flip-flop location and routes connections through the switching fabric.
  4. Timing analysis — static timing analysis confirms setup and hold time margins across all paths. The critical path determines maximum clock frequency.
  5. Bitstream generation — the PnR tool generates the SRAM configuration bitstream.
  6. Programming — loaded to FPGA via JTAG (volatile, lost on power cycle) or to SPI flash (non-volatile).

For a detailed guide to each of these steps — synthesis settings and report analysis, XDC timing constraints, reading WNS/TNS timing reports, timing closure strategies, and JTAG vs SPI flash programming — see FPGA Development Flow: From HDL to Working Hardware.

Verilog vs VHDL

Both describe hardware at the Register Transfer Level (RTL) — the behaviour of the design cycle by cycle. Neither is a sequential programming language.

Verilog:

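// Free-running WIDTH-bit counter with an asynchronous, active-low reset.
module counter #(parameter WIDTH = 8) (
    input  wire             clk,
    input  wire             rst_n,   // active-low asynchronous reset
    output reg [WIDTH-1:0]  count
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= {WIDTH{1'b0}};  // clear on reset
        else
            count <= count + 1'b1;   // increment each rising edge; wraps at 2^WIDTH
    end
endmodule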

VHDL equivalent:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity counter is
    generic (WIDTH : natural := 8);
    port (
        clk   : in  std_logic;
        rst_n : in  std_logic;
        count : out std_logic_vector(WIDTH-1 downto 0)
    );
end entity;

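architecture rtl of counter is
    -- Internal unsigned copy of the count: numeric_std defines "+" on unsigned,
    -- not on std_logic_vector, and an out port cannot be read back before VHDL-2008.
    signal count_r : unsigned(WIDTH-1 downto 0);
begin
    process(clk, rst_n) begin
        if rst_n = '0' then
            count_r <= (others => '0');   -- asynchronous active-low reset
        elsif rising_edge(clk) then
            count_r <= count_r + 1;       -- wraps at 2**WIDTH
        end if;
    end process;
    count <= std_logic_vector(count_r);   -- convert back to the port type
end architecture;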

Both describe the same hardware. Verilog is more concise; VHDL is more explicit about types, which catches bugs at compile time but adds verbosity. For new projects in a mixed-discipline team, Verilog (or SystemVerilog for larger projects) is the more common choice. For a practical guide to writing synthesisable HDL — modules, always blocks, blocking vs non-blocking assignments, latch inference, and test benches — see How Do You Write Verilog and VHDL for an FPGA?.
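
To illustrate that HDL describes structure and not a sequence of steps, consider how a for loop behaves in synthesisable Verilog. The sketch below (module name popcount8 is hypothetical) counts the 1 bits in a byte. The loop does not run over eight clock cycles: there is no clock at all, and synthesis unrolls it into a chain or tree of adders that all exist at once.

```verilog
// Combinational population count of an 8-bit value.
// The for loop is unrolled at synthesis time into eight additions in hardware;
// the result settles after the propagation delay of that logic.
module popcount8 (
    input  wire [7:0] data,
    output reg  [3:0] ones
);
    integer i;

    always @* begin
        ones = 4'd0;                   // default value, so every path assigns 'ones'
        for (i = 0; i < 8; i = i + 1)
            ones = ones + data[i];     // each iteration becomes its own adder stage
    end
endmodule
```

Doubling the loop bound doubles the amount of logic and lengthens the combinational path; it does not make the circuit "take longer to run" in the software sense.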

FPGA vs MCU vs ASIC

For a detailed decision guide — including NRE economics, volume break-even calculations, hybrid FPGA+MCU architectures, and a full comparison table — see FPGA vs Microcontroller vs ASIC: Which Should You Use?. The summary below introduces the key differences.

Attribute | MCU | FPGA | ASIC
Execution model | Sequential instructions | Parallel custom hardware | Parallel custom hardware
Performance (parallel) | Low | High | Highest
Development cost | Low | Medium | Very high (NRE: $100K–$10M+)
Unit cost (low volume) | Low | Medium–High | Medium–Low
Unit cost (high volume) | Low | Medium–High | Low
Reconfigurability | Firmware update | Full hardware reprogramming | Fixed at manufacture
Power efficiency | Moderate | Moderate–Good | Best
Time to market | Fast | Medium | Slow (12–24 months)
Development tools | C/C++, GDB | Vivado, Quartus, OpenFPGA | Cadence, Synopsys
Best for | Embedded control, comms protocol stacks, UI, general embedded | DSP, parallel IO, custom interfaces, algorithms too fast for MCU, ASIC prototyping | High-volume products (>100k units), performance/power critical

When to choose an FPGA:

  • Data rates too high for an MCU (>100 Mbps digital interfaces).
  • Algorithms that benefit from massive parallelism (radar DSP, image processing, machine learning inference).
  • Custom digital interfaces not available in a standard MCU (e.g. MIPI CSI-2, PCIe endpoint, 100GbE).
  • Prototyping hardware that will eventually become an ASIC.
  • Low-volume or mixed-volume products where ASIC NRE is not justified.

Major Vendors

  • AMD/Xilinx: Artix-7 (cost-optimised), Kintex-7 (balanced), Virtex-7 (high-performance), Zynq-7000 (ARM Cortex-A9 + fabric), Zynq UltraScale+ (Cortex-A53 + fabric + GPU). Most common in commercial and industrial products globally.
  • Intel/Altera: Cyclone V (cost-optimised, SoC version with Cortex-A9), Arria 10 (mid-range), Stratix 10 (high-end). Dominant in networking and telecom.
  • Lattice Semiconductor: iCE40 (ultra-low power, mobile, wearable), ECP5 (mid-range). Popular with hobbyists and in low-power applications, partly because both families are supported by open-source toolchains (Yosys, nextpnr, Project IceStorm).
  • Microchip (Microsemi): PolarFire (mid-range, power-efficient, anti-tamper, radiation-tolerant variants). Used in defence, space, and functional-safety applications.

Design Considerations

  • Synthesis ≠ simulation. A Verilog design that passes behavioural simulation may fail in synthesis or post-route timing. Always run synthesis and timing analysis, not just simulation, before considering a design correct. The synthesis tool infers hardware from your HDL — and inferred latches (from incomplete if/case coverage) or combinational loops are common synthesis mistakes that behavioural simulation does not catch.
  • FPGA power is not trivially low. FPGA dynamic power scales with clock frequency and logic utilisation. A moderately utilised Artix-7 at 100 MHz can draw 200–500 mW from the core supply. Use vendor power estimator tools (Xilinx Power Estimator, Intel Power Estimator) early in the design cycle, not after choosing the power supply.
  • Timing closure is the hard part. Achieving timing closure — meeting all setup and hold time constraints at the target clock frequency — often requires floorplanning (constraining where logic is placed), pipelining long paths, or partitioning the design into clock domains. Add 20–30% margin to target clock frequencies during early RTL development.
  • Configuration time matters for production. Loading an FPGA configuration from SPI flash takes 10 ms to several seconds depending on bitstream size and SPI frequency. For systems with fast boot requirements (automotive wake-up, for example), this must be planned for. Some FPGAs support partial reconfiguration (updating one region of the fabric while the rest runs).
  • PCB signal integrity at high speeds. FPGAs using SERDES transceivers (1–28 Gbps), DDR memory interfaces, or high-frequency I/O produce signals with sub-nanosecond edge rates. At those speeds, PCB trace length, characteristic impedance, and reference-plane continuity directly determine whether signals arrive cleanly or with reflections and ringing. See signal integrity in PCB design for the transmission-line effects, termination strategies, and return-path rules that apply when routing a high-speed FPGA board.
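
The inferred-latch problem mentioned in the first consideration above is easiest to see side by side. In this sketch (module and signal names are hypothetical), both always blocks look like a 3-way multiplexer, but only the second is purely combinational. Synthesis tools typically report a latch warning for the first.

```verilog
// Latch inference from incomplete case coverage, and the usual fix.
module latch_demo (
    input  wire [1:0] sel,
    input  wire       a,
    input  wire       b,
    input  wire       c,
    output reg        y_latch,
    output reg        y_comb
);
    // Incomplete: nothing is assigned when sel == 2'b11, so the output must
    // hold its previous value. That memory is implemented as a latch.
    always @* begin
        case (sel)
            2'b00: y_latch = a;
            2'b01: y_latch = b;
            2'b10: y_latch = c;
        endcase
    end

    // Complete: a default assignment covers every path, so no storage is implied.
    always @* begin
        y_comb = 1'b0;
        case (sel)
            2'b00: y_comb = a;
            2'b01: y_comb = b;
            2'b10: y_comb = c;
        endcase
    end
endmodule
```

Assigning a default at the top of every combinational block is a simple habit that removes this class of bug.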

Common Mistakes

  • Treating FPGAs like a processor. Writing HDL as if it were sequential software, for example using a for loop and expecting it to execute one iteration per clock cycle, produces either unintended hardware (synthesis fully unrolls a for loop into parallel logic) or synthesis errors. Hardware description requires thinking about what the circuit is, not what it does step by step.
  • Ignoring metastability at clock domain crossings. Asynchronous signals entering the FPGA fabric from external sources or from a different clock domain must be synchronised; failing to do so causes intermittent failures that are extremely difficult to debug. See combinational vs sequential logic for the dual-flop synchroniser pattern.
  • Underestimating resource usage. FPGA resource estimates based on logic function analysis are often optimistic: the implemented design also includes control logic and handshaking, and a nearly full device leaves the router little freedom. Treat 60–70% LUT utilisation as the target maximum for the first implementation; above 80%, timing closure becomes significantly harder.
  • Choosing the wrong vendor or family too early. FPGA toolchains are vendor-specific and not interchangeable. Switching from a Xilinx Artix-7 to an Intel Cyclone V midway through a project means relearning the toolchain, re-running timing analysis, and potentially re-targeting hard IP (SERDES, PLL constraints). Choose the family after confirming the device has the required transceivers, IO banks, DSPs, and BRAM — and that the vendor's tools and support meet your team's needs.
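
For reference, the dual-flop synchroniser named in the metastability item above can be sketched as follows. The module and signal names are hypothetical. The ASYNC_REG attribute is a Vivado-specific hint that keeps the two flip-flops close together; other tools have their own equivalents or ignore it.

```verilog
// Two-flip-flop synchroniser for a single-bit asynchronous input.
// The first flip-flop may go metastable; the second gives it a full
// clock period to settle before the rest of the design sees the value.
module sync_2ff (
    input  wire clk,        // destination clock domain
    input  wire async_in,   // signal from outside this clock domain
    output wire sync_out    // safe to use in the clk domain
);
    (* ASYNC_REG = "TRUE" *) reg [1:0] sync_r = 2'b00;

    always @(posedge clk)
        sync_r <= {sync_r[0], async_in};   // shift: async_in -> stage 0 -> stage 1

    assign sync_out = sync_r[1];
endmodule
```

This pattern is only suitable for single-bit signals. Multi-bit buses need a handshake or an asynchronous FIFO with Gray-coded pointers, because each bit would otherwise be captured on a different cycle.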

Frequently Asked Questions

Is an FPGA faster than a microcontroller?
It depends on the task. For sequential, instruction-driven tasks, a modern MCU (200+ MHz Cortex-M7) running optimised C is competitive with an FPGA running the same algorithm, and far simpler to develop. The FPGA wins decisively when the task is inherently parallel: signal processing, protocol handling, multi-channel data acquisition, or any function that benefits from many independent operations happening simultaneously in hardware rather than sequentially in software. An FPGA can process 16 ADC channels simultaneously each clock cycle; an MCU must process them in a loop. An FPGA running at 100 MHz with 100 independent logic paths completes up to 10 billion operations per second, about 25 times what a 400 MHz MCU manages at one operation per cycle, and the gap widens as more parallel paths are added. The comparison is not clock speed against clock speed but architecture against architecture.
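
A minimal Verilog sketch of the multi-channel point: sixteen threshold comparators, one per ADC channel, all evaluated in the same clock cycle. The module name, the 12-bit sample width, and the flattened input bus are assumptions made for illustration.

```verilog
// Compare CHANNELS ADC samples against a threshold in parallel.
// The generate loop builds one comparator per channel; all of them
// operate simultaneously, and the results are registered together.
module adc_threshold_bank #(
    parameter CHANNELS = 16,
    parameter WIDTH    = 12
) (
    input  wire                      clk,
    input  wire [CHANNELS*WIDTH-1:0] samples,    // channel ch occupies bits [ch*WIDTH +: WIDTH]
    input  wire [WIDTH-1:0]          threshold,
    output reg  [CHANNELS-1:0]       over        // over[ch] = 1 when channel ch exceeds threshold
);
    wire [CHANNELS-1:0] over_next;

    genvar ch;
    generate
        for (ch = 0; ch < CHANNELS; ch = ch + 1) begin : g_chan
            assign over_next[ch] = samples[ch*WIDTH +: WIDTH] > threshold;
        end
    endgenerate

    always @(posedge clk)
        over <= over_next;
endmodule
```

Raising CHANNELS adds comparators but does not add clock cycles, which is the architectural difference from a software loop.
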
What is the difference between an FPGA and a CPLD?
A CPLD (Complex Programmable Logic Device) is the smaller, simpler predecessor to the FPGA. CPLDs consist of a set of logic macrocells (AND-OR arrays) with a programmable interconnect matrix between them. They have non-volatile configuration (the design is retained when powered off without a separate configuration flash), small logic capacity (hundreds to a few thousand gates), deterministic propagation delay, and low cost. FPGAs have orders of magnitude more logic capacity (up to millions of logic elements), usually SRAM-based configuration (reloaded on power-up from an external flash), block RAM, DSP blocks, high-speed transceivers, and embedded processor blocks (hard ARM cores in Xilinx Zynq or Intel SoC FPGAs). CPLDs are used for simple glue logic, interface bridging, and power sequencing. FPGAs handle complex digital systems that exceed what a CPLD or MCU can implement.
What programming language is used for FPGAs?
FPGAs are programmed (more precisely: described) using Hardware Description Languages (HDL). The two industry-standard HDLs are Verilog (C-like syntax, dominant in North America and Asia) and VHDL (Ada-based syntax, verbose but strongly typed, more common in Europe and defence/aerospace). Both describe hardware behaviour and structure — they do not describe software instructions. A modern alternative is SystemVerilog, which extends Verilog with object-oriented constructs and assertion-based verification features. High-level synthesis (HLS) tools such as Xilinx Vitis HLS allow C/C++ functions to be compiled to RTL (Register Transfer Level) hardware, but the generated logic often requires tuning. For new projects, Verilog or SystemVerilog is the practical starting point with the widest ecosystem support.
