Electronics Design AU

How Do You Write Verilog and VHDL for an FPGA?

Last updated 30 June 2026

Direct Answer

RTL (Register Transfer Level) HDL describes hardware structure, not software instructions. In Verilog, a module declares input/output ports and internal logic: combinational logic uses assign statements or always @(*) blocks with blocking assignments (=); sequential (clocked) logic uses always @(posedge clk) blocks with non-blocking assignments (<=). In VHDL, the equivalent is an entity with port declarations and an architecture body containing concurrent signal assignments and process statements. Three fundamental rules for synthesisable HDL: in Verilog, always use non-blocking assignments (<=) inside clocked blocks; assign a default value to every output of a combinational block so the tool does not infer unintended latches; and simulate your design before synthesising, because synthesis infers hardware structure from your HDL and reports structural problems, not logic bugs.

Detailed Explanation

HDL (Hardware Description Language) is fundamentally different from software programming. A C function describes a sequence of instructions executed one at a time on a CPU. A Verilog module or VHDL entity describes hardware structure — wires connecting logic elements and registers that all exist simultaneously and operate in parallel. The synthesis tool reads your HDL and infers what physical hardware to build from it: which LUTs to configure, which flip-flops to use, how to route signals between them. Understanding this distinction — that you are describing hardware, not writing software — is the prerequisite for writing correct, synthesisable HDL.

For background on how FPGAs implement logic (LUTs, BRAM, DSP blocks) and the full synthesis-to-bitstream flow, see What Is an FPGA and How Does It Work?. This page covers the practical HDL writing skills that come after that foundation. After writing and simulating your HDL, the next step is running the toolchain — for synthesis, place-and-route, timing constraints, and bitstream generation, see FPGA Development Flow: From HDL to Working Hardware.

The Verilog Module

A Verilog module is the fundamental building block. Every module has a port list (its interface) and a body (its implementation):

module mux4to1 #(parameter WIDTH = 8) (
    input  wire [1:0]       sel,
    input  wire [WIDTH-1:0] d0,
    input  wire [WIDTH-1:0] d1,
    input  wire [WIDTH-1:0] d2,
    input  wire [WIDTH-1:0] d3,
    output reg  [WIDTH-1:0] y
);
    // implementation here
endmodule

Key elements:

  • module / endmodule delimit the module.
  • #(parameter WIDTH = 8) declares a parameterised width with a default of 8 — the same module can be instantiated at different widths without duplicating code.
  • Port directions are input, output, or inout.
  • wire is a combinational net driven by assign statements or module ports. reg is a signal that can be assigned inside an always block — in a combinational block it synthesises to a wire; in a clocked block it synthesises to a flip-flop.
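  • In SystemVerilog (supported by Vivado and Quartus), logic can replace both wire and reg in most code; the tool infers the hardware from how the signal is driven. Because logic is a variable, driving it from more than one continuous assignment or port is a compile error rather than a silently resolved conflict. Nets that genuinely need multiple drivers, such as tri-state buses and inout ports, still use wire.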

The VHDL Entity and Architecture

VHDL separates the interface (entity) from the implementation (architecture):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mux4to1 is
    generic (WIDTH : natural := 8);
    port (
        sel : in  std_logic_vector(1 downto 0);
        d0  : in  std_logic_vector(WIDTH-1 downto 0);
        d1  : in  std_logic_vector(WIDTH-1 downto 0);
        d2  : in  std_logic_vector(WIDTH-1 downto 0);
        d3  : in  std_logic_vector(WIDTH-1 downto 0);
        y   : out std_logic_vector(WIDTH-1 downto 0)
    );
end entity;

architecture rtl of mux4to1 is
    -- internal signal declarations here
begin
    -- implementation here
end architecture;

Key elements:

  • library ieee; use ieee.std_logic_1164.all; — always required; defines std_logic (the standard 9-value logic type including high-impedance 'Z' and undefined 'X').
  • use ieee.numeric_std.all; — required for arithmetic. Use unsigned or signed for arithmetic vectors, not std_logic_vector directly. Avoid the deprecated std_logic_arith and std_logic_unsigned packages.
  • generic is the VHDL equivalent of Verilog's parameter.
  • Internal signals are declared between architecture rtl of mux4to1 is and begin, as signal name : type;.
  • The architecture name (rtl) is arbitrary; rtl is the conventional name for a synthesisable implementation.

Describing Combinational Logic

Combinational logic outputs depend only on current inputs — no clock, no state.

Verilog — assign statement (suitable for simple expressions):

// 4-to-1 mux with a conditional expression
assign y = (sel == 2'b00) ? d0 :
           (sel == 2'b01) ? d1 :
           (sel == 2'b10) ? d2 : d3;

assign drives a wire (or logic). The right-hand side is re-evaluated whenever any signal on the right changes.

Verilog — always @(*) block (recommended for complex combinational logic):

always @(*) begin
    case (sel)
        2'b00:   y = d0;
        2'b01:   y = d1;
        2'b10:   y = d2;
        default: y = d3;
    endcase
end

always @(*) re-evaluates whenever any signal read inside the block changes. In SystemVerilog, always_comb serves the same purpose and is preferred, because tools check that the block describes purely combinational logic and flag it if a latch would be inferred. Use blocking assignments (=) inside combinational blocks: each assignment takes effect immediately, so later statements in the block see the updated value.

Always include a default: branch in every case statement (see Latch Inference below).
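Combining the parameterised port list from the module section with the case-based body above gives a complete, synthesisable mux4to1. This is a sketch of one reasonable way to assemble the pieces; the assign-based version works equally well if y is declared as output wire instead of output reg, since a continuous assignment cannot drive a reg.

```verilog
// 4-to-1 multiplexer: parameterised port list plus a combinational case body.
module mux4to1 #(parameter WIDTH = 8) (
    input  wire [1:0]       sel,
    input  wire [WIDTH-1:0] d0,
    input  wire [WIDTH-1:0] d1,
    input  wire [WIDTH-1:0] d2,
    input  wire [WIDTH-1:0] d3,
    output reg  [WIDTH-1:0] y      // reg because it is assigned in an always block
);
    always @(*) begin
        case (sel)
            2'b00:   y = d0;       // blocking (=) in combinational logic
            2'b01:   y = d1;
            2'b10:   y = d2;
            default: y = d3;       // covers 2'b11 (and X/Z sel in simulation)
        endcase
    end
endmodule
```

Because WIDTH is a parameter, the same file can be instantiated as mux4to1 #(.WIDTH(16)) for a 16-bit datapath without any edits.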

VHDL — concurrent signal assignment:

y <= d0 when sel = "00" else
     d1 when sel = "01" else
     d2 when sel = "10" else
     d3;

VHDL — combinational process:

process(all)   -- VHDL 2008: sensitivity list is automatic
begin
    case sel is
        when "00"   => y <= d0;
        when "01"   => y <= d1;
        when "10"   => y <= d2;
        when others => y <= d3;
    end case;
end process;

process(all) (VHDL 2008 and later, supported by Vivado and Quartus) automatically includes all signals read inside the process — equivalent to Verilog's always @(*). In VHDL 1993, list all inputs explicitly: process(sel, d0, d1, d2, d3). Omitting an input from the sensitivity list causes a simulation/synthesis mismatch — the simulation will not re-evaluate when the omitted signal changes, but the synthesised hardware will.

Describing Sequential (Clocked) Logic

Sequential logic updates on a clock edge. The synthesised hardware is a flip-flop that holds its output value between clock edges.

Verilog — D flip-flop with synchronous reset:

module dff #(parameter WIDTH = 8) (
    input  wire             clk,
    input  wire             rst_n,   // active-low synchronous reset
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    always @(posedge clk) begin
        if (!rst_n)
            q <= {WIDTH{1'b0}};   // reset to zero
        else
            q <= d;               // capture input on rising edge
    end
endmodule

always @(posedge clk) triggers only on the rising edge of clk. Inside a clocked block, use non-blocking assignments (<=). This is the most important rule in synthesisable Verilog.

Synchronous vs asynchronous reset: The example above uses a synchronous reset, which only takes effect at the next rising clock edge. An asynchronous reset uses always @(posedge clk or negedge rst_n) and takes effect as soon as rst_n goes low, without waiting for a clock edge. Recommendations differ between vendors and device families, so follow the guidance in your device's synthesis guide.
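For comparison, here is a sketch of the same register with an active-low asynchronous reset. The module name dff_async is hypothetical; the only change from the synchronous version is the negedge rst_n term in the sensitivity list, which makes q clear as soon as rst_n falls, with no clock edge required.

```verilog
// D flip-flop with active-low ASYNCHRONOUS reset (module name is illustrative).
module dff_async #(parameter WIDTH = 8) (
    input  wire             clk,
    input  wire             rst_n,   // active-low asynchronous reset
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    // The reset edge is in the sensitivity list, so the block also runs
    // when rst_n falls, independently of clk.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= {WIDTH{1'b0}};   // clear immediately
        else
            q <= d;               // normal capture on the rising clock edge
    end
endmodule
```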

VHDL — D flip-flop with synchronous reset:

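library ieee;
use ieee.std_logic_1164.all;

entity dff is
    generic (WIDTH : natural := 8);
    port (
        clk   : in  std_logic;
        rst_n : in  std_logic;   -- active-low synchronous reset
        d     : in  std_logic_vector(WIDTH-1 downto 0);
        q     : out std_logic_vector(WIDTH-1 downto 0)
    );
end entity;

architecture rtl of dff is
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if rst_n = '0' then
                q <= (others => '0');   -- reset to zero
            else
                q <= d;                 -- capture input on rising edge
            end if;
        end if;
    end process;
end architecture;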

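rising_edge(clk) is the VHDL idiom for a positive clock edge, equivalent to posedge clk in Verilog. Signal assignments (<=) inside a process behave like Verilog non-blocking assignments: they take effect when the process suspends, not immediately. VHDL variables, assigned with :=, do update immediately, much like Verilog blocking assignments, but they are local to the process. Because registers are normally described with signals, the blocking-vs-non-blocking pitfall rarely arises in VHDL.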

Blocking vs Non-Blocking Assignments (Verilog)

This is the most common source of correctness bugs for Verilog beginners.

  • = (blocking) — the assignment executes and completes before the next statement executes. Use in combinational always blocks.
  • <= (non-blocking) — the right-hand side is evaluated immediately, but the assignment does not take effect until the end of the current time step, after all non-blocking right-hand sides have been evaluated. Use in clocked always blocks.

Why this matters — the two-register shift register:

// CORRECT: two flip-flops in series
always @(posedge clk) begin
    b <= a;   // non-blocking: reads a (current value), schedules b = a
    c <= b;   // non-blocking: reads b (OLD value, not yet updated), schedules c = b
end
// Result: b captures a; c captures the old value of b. Two-stage shift register. Correct.

// WRONG: collapses to one flip-flop
always @(posedge clk) begin
    b = a;    // blocking: b is updated immediately to a's value
    c = b;    // blocking: reads the b just written, so c also gets a's value
end
// Result: both b and c receive a's current value. The intermediate register is lost.

Rule: use <= unconditionally inside clocked always blocks. Use = only in combinational blocks.
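To see the difference in simulation rather than take the rule on trust, the hypothetical shift_demo module below describes the same two-register chain twice: once with non-blocking assignments and once (deliberately wrong, for comparison only) with blocking assignments. In a test bench that changes a away from the rising edge, c_nb lags a by two clock cycles while c_bl lags by only one.

```verilog
// Simulation demo only: the same two-register chain written two ways.
// The blocking version is deliberately wrong, for comparison.
module shift_demo (
    input  wire       clk,
    input  wire [7:0] a,
    output reg  [7:0] b_nb, c_nb,   // non-blocking version
    output reg  [7:0] b_bl, c_bl    // blocking version
);
    // CORRECT: c_nb lags a by two clock cycles (two flip-flops in series).
    always @(posedge clk) begin
        b_nb <= a;
        c_nb <= b_nb;   // reads the OLD b_nb
    end

    // WRONG: c_bl lags a by one cycle, exactly like b_bl.
    always @(posedge clk) begin
        b_bl = a;
        c_bl = b_bl;    // reads the b_bl just written
    end
endmodule
```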

How to Avoid Inferring Latches

A latch is a level-sensitive storage element: its output follows its input whenever it is enabled, rather than changing only on a clock edge. In synchronous FPGA design, latches are almost always unintentional bugs. They complicate static timing analysis and can create combinational feedback paths that are extremely difficult to debug. Synthesis tools warn when a latch is inferred; treat every such warning as a bug.

A latch is inferred in a combinational block whenever a signal is not assigned in every branch of the logic:

// BAD: latch inferred for y — sel = 2'b10 and 2'b11 have no assignment
always @(*) begin
    case (sel)
        2'b00: y = d0;
        2'b01: y = d1;
    endcase
end

Fix — assign a default at the top of the block:

// GOOD: no latch — y is defined in every code path
always @(*) begin
    y = d3;          // default assignment before any branching
    case (sel)
        2'b00: y = d0;
        2'b01: y = d1;
        2'b10: y = d2;
        // sel = 2'b11 falls through to the default d3 above
    endcase
end

The same rule applies in VHDL: assign every output signal at the top of a combinational process before any if/case statement.

As combinational vs sequential logic explains, latches are level-sensitive and lack the clock edge requirement that allows static timing analysis to calculate setup and hold margins. This is why synthesis tools treat them as warnings — they are almost always bugs in a synchronous design.
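The same default-first fix covers an if without an else. In the hypothetical 2-to-4 decoder below, assigning y = 4'b0000 before the if gives y a defined value when en is low, so no latch is inferred. Without that line, y would have to hold its previous value whenever en = 0.

```verilog
// 2-to-4 decoder with enable (module name is illustrative).
module decoder2to4 (
    input  wire       en,
    input  wire [1:0] a,
    output reg  [3:0] y
);
    always @(*) begin
        y = 4'b0000;      // default: defines y when en = 0, so no latch
        if (en)
            y[a] = 1'b1;  // no else needed; the default covers it
    end
endmodule
```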

Parameterised Designs

Parameterisation makes modules reusable at different widths or depths without code duplication.

Verilog parameter and localparam:

module fifo #(
    parameter DATA_WIDTH = 8,
    parameter DEPTH      = 16
) (
    input  wire                   clk, rst_n, wr_en, rd_en,
    input  wire [DATA_WIDTH-1:0]  wr_data,
    output reg  [DATA_WIDTH-1:0]  rd_data,
    output wire                   full, empty
);
    localparam ADDR_WIDTH = $clog2(DEPTH);  // address bits needed

    reg [DATA_WIDTH-1:0] mem [0:DEPTH-1];
    reg [ADDR_WIDTH:0]   wr_ptr, rd_ptr;   // extra bit for full/empty detection
    // ...
endmodule

localparam is a constant derived from other parameters — it cannot be overridden at instantiation time. $clog2() computes the ceiling of log₂, useful for deriving address widths from depth parameters.
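One way the elided body of the fifo module could be filled in is sketched below, keeping its name, parameters, and ports. It is a minimal synchronous FIFO, not a production design: it assumes DEPTH is a power of two (so the extra pointer bit works as a wrap flag), silently ignores writes when full and reads when empty, and registers rd_data, so read data appears on the clock edge after a read is accepted.

```verilog
// Minimal synchronous FIFO sketch. Assumes DEPTH is a power of two.
module fifo #(
    parameter DATA_WIDTH = 8,
    parameter DEPTH      = 16
) (
    input  wire                   clk, rst_n, wr_en, rd_en,
    input  wire [DATA_WIDTH-1:0]  wr_data,
    output reg  [DATA_WIDTH-1:0]  rd_data,
    output wire                   full, empty
);
    localparam ADDR_WIDTH = $clog2(DEPTH);  // address bits needed

    reg [DATA_WIDTH-1:0] mem [0:DEPTH-1];
    reg [ADDR_WIDTH:0]   wr_ptr, rd_ptr;   // MSB is the wrap bit

    // Empty: pointers identical. Full: same address, opposite wrap bits.
    assign empty = (wr_ptr == rd_ptr);
    assign full  = (wr_ptr[ADDR_WIDTH] != rd_ptr[ADDR_WIDTH]) &&
                   (wr_ptr[ADDR_WIDTH-1:0] == rd_ptr[ADDR_WIDTH-1:0]);

    always @(posedge clk) begin
        if (!rst_n) begin
            wr_ptr <= 0;
            rd_ptr <= 0;
        end else begin
            if (wr_en && !full) begin
                mem[wr_ptr[ADDR_WIDTH-1:0]] <= wr_data;
                wr_ptr <= wr_ptr + 1'b1;
            end
            if (rd_en && !empty) begin
                rd_data <= mem[rd_ptr[ADDR_WIDTH-1:0]];  // registered read
                rd_ptr  <= rd_ptr + 1'b1;
            end
        end
    end
endmodule
```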

Important: synthesisable for loops are fully unrolled. A for loop in an always @(*) or always @(posedge clk) block generates parallel hardware — the loop body becomes N independent logic paths. An 8-iteration loop does not execute sequentially in hardware; all 8 iterations exist simultaneously as logic. Use loops deliberately for replication, not for sequential operations.
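As an illustration, the hypothetical popcount module below counts the set bits of a WIDTH-bit input. The loop describes WIDTH additions that all exist in hardware at the same time; synthesis is free to restructure them (for example into an adder tree), but nothing iterates at run time. Increasing WIDTH adds logic and combinational delay; it does not add clock cycles.

```verilog
// Population count: number of 1 bits in din (module name is illustrative).
module popcount #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0]           din,
    output reg  [$clog2(WIDTH+1)-1:0] count   // wide enough to hold WIDTH
);
    integer i;

    always @(*) begin
        count = 0;                            // default: no latch
        // Unrolled by synthesis into WIDTH additions that all exist at once.
        for (i = 0; i < WIDTH; i = i + 1)
            count = count + din[i];           // blocking: each step sees the previous sum
    end
endmodule
```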

Writing a Simulation Test Bench

A test bench is a non-synthesisable file that drives inputs to the design under test (DUT) and observes its outputs. It uses behavioural constructs — delays, initial blocks, file I/O — that the synthesis tool ignores.

Verilog test bench for the 4-to-1 MUX:

`timescale 1ns / 1ps   // time unit / time precision

module mux4to1_tb;

    // DUT interface signals
    reg  [1:0] sel;
    reg  [7:0] d0, d1, d2, d3;
    wire [7:0] y;

    // Instantiate the DUT
    mux4to1 #(.WIDTH(8)) dut (
        .sel(sel), .d0(d0), .d1(d1), .d2(d2), .d3(d3), .y(y)
    );

    // Stimulus — not synthesisable
    initial begin
        $dumpfile("mux4to1_tb.vcd");    // waveform dump (GTKWave)
        $dumpvars(0, mux4to1_tb);

        d0 = 8'hAA; d1 = 8'hBB; d2 = 8'hCC; d3 = 8'hDD;

        sel = 2'b00; #10;  // expect y = 0xAA
        sel = 2'b01; #10;  // expect y = 0xBB
        sel = 2'b10; #10;  // expect y = 0xCC
        sel = 2'b11; #10;  // expect y = 0xDD

        $display("Simulation complete");
        $finish;
    end

endmodule

Key behavioural-only constructs (do not use in synthesisable RTL):

  • `timescale — sets simulation time unit and precision.
  • initial begin...end — runs once from simulation time 0.
  • #10 — waits 10 time units (10 ns with a 1ns timescale).
  • $dumpfile / $dumpvars — write signal values to a VCD file viewable in GTKWave.
  • $display / $finish — print to the console and end simulation.
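The mux test bench records the expected values only as comments, and the mux has no clock. For clocked designs, a common pattern is to generate a clock with an always block, change inputs on the falling edge so they are stable at the rising edge, and compare outputs with !== so that X or Z results also count as failures. The sketch below applies this to the dff module from the sequential-logic section, repeated here so the file compiles on its own; the test bench name, task name, and values are illustrative.

```verilog
`timescale 1ns / 1ps

// DUT: the synchronous-reset D flip-flop from the sequential-logic section.
module dff #(parameter WIDTH = 8) (
    input  wire             clk,
    input  wire             rst_n,
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    always @(posedge clk) begin
        if (!rst_n) q <= {WIDTH{1'b0}};
        else        q <= d;
    end
endmodule

// Self-checking test bench (behavioural, not synthesisable).
module dff_tb;
    reg        clk = 1'b0;
    reg        rst_n;
    reg  [7:0] d;
    wire [7:0] q;
    integer    errors = 0;

    dff #(.WIDTH(8)) dut (.clk(clk), .rst_n(rst_n), .d(d), .q(q));

    always #5 clk = ~clk;             // 10 ns period (100 MHz)

    // Compare q with an expected value; !== also flags X/Z.
    task check(input [7:0] expected);
        begin
            if (q !== expected) begin
                $display("FAIL at %0t ns: q = %h, expected %h", $time, q, expected);
                errors = errors + 1;
            end
        end
    endtask

    initial begin
        rst_n = 1'b0; d = 8'h00;
        @(negedge clk); check(8'h00); // reset applied on the previous rising edge
        rst_n = 1'b1; d = 8'h3C;      // inputs change on the falling edge
        @(negedge clk); check(8'h3C);
        d = 8'hF0;
        @(negedge clk); check(8'hF0);
        if (errors == 0) $display("PASS");
        else             $fatal(1, "%0d check(s) failed", errors);
        $finish;
    end
endmodule
```

Counting errors rather than stopping at the first one lets a single run report every failing check.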

VHDL test bench:

library ieee;
use ieee.std_logic_1164.all;

entity mux4to1_tb is
end entity;

architecture sim of mux4to1_tb is
    signal sel            : std_logic_vector(1 downto 0);
    signal d0, d1, d2, d3 : std_logic_vector(7 downto 0);
    signal y              : std_logic_vector(7 downto 0);
begin
    dut : entity work.mux4to1
        generic map (WIDTH => 8)
        port map (sel => sel, d0 => d0, d1 => d1,
                  d2 => d2, d3 => d3, y => y);

    process
    begin
        d0 <= x"AA"; d1 <= x"BB"; d2 <= x"CC"; d3 <= x"DD";
        sel <= "00"; wait for 10 ns;
        sel <= "01"; wait for 10 ns;
        sel <= "10"; wait for 10 ns;
        sel <= "11"; wait for 10 ns;
        wait;   -- halt simulation
    end process;
end architecture;

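In VHDL test benches, wait for 10 ns; replaces Verilog's #10. A bare wait; suspends the process forever; because this test bench has no clock or other pending events, the simulation then stops on its own without an equivalent of $finish. In VHDL-2008, calling std.env.finish ends the simulation explicitly.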

Simulation tools that accept Verilog: Icarus Verilog (open-source), Verilator (open-source, compiles RTL to C++), Vivado Simulator, ModelSim/Questa, VCS. VHDL simulators: GHDL (open-source), ModelSim/Questa, Vivado Simulator.

How Synthesis Turns HDL Into Hardware

Understanding how the synthesis tool interprets HDL prevents common mistakes.

  1. Synthesis infers hardware from patterns in your HDL. An always @(posedge clk) block infers flip-flops. An always @(*) block and assign statements infer combinational logic. The tool does not execute your HDL; it analyses its structure.

  2. Behavioural simulation does not guarantee synthesis correctness. Blocking vs non-blocking assignment misuse, latch inference, and incomplete sensitivity lists can produce code that simulates correctly in the behavioural model but synthesises to the wrong hardware. Run synthesis and review all warnings before treating the design as correct.

  3. Post-synthesis and post-route simulation run your test bench against the synthesised netlist rather than RTL. Post-synthesis simulation catches mismatches between RTL behaviour and the gate-level netlist. Post-route simulation adds propagation delays and catches timing violations.

  4. Timing analysis determines maximum clock frequency. The synthesis and place-and-route tools run Static Timing Analysis (STA) to measure the longest combinational path between any two flip-flops — the critical path. STA determines whether the design meets the target clock frequency with adequate setup-time margin. Simulation does not show timing violations; STA does.

  5. Synthesis warnings are design bugs. A warning about an inferred latch, a combinational loop, an undriven output, or a width mismatch describes hardware that does not match your intent. Address every synthesis warning before moving to place-and-route or hardware testing. For debugging on hardware, JTAG and embedded logic analysers (Xilinx ILA, Intel SignalTap) provide post-synthesis hardware observability.
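The setup check that STA performs on each register-to-register path can be written as a slack equation. This is a simplified form that ignores clock skew, jitter, and the other uncertainty terms real tools include, and the delay values in the worked example are made up for illustration:

$$
t_{\text{slack}} = T_{\text{clk}} - \left( t_{\text{clk}\rightarrow Q} + t_{\text{logic}} + t_{\text{route}} + t_{\text{setup}} \right) \geq 0
$$

$$
T_{\text{clk}} = 10\ \text{ns}\ (100\ \text{MHz}):\quad t_{\text{slack}} = 10 - (0.5 + 5.0 + 3.5 + 0.1) = 0.9\ \text{ns},\qquad f_{\max} \approx \frac{1}{9.1\ \text{ns}} \approx 110\ \text{MHz}
$$

A negative slack on any path means the design fails timing at that clock frequency, which behavioural simulation alone will not reveal.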

Design Considerations

  • Choose one language and use it consistently. Mixed-language projects (some files Verilog, some VHDL) are supported by Vivado and Quartus but complicate simulation, peer review, and IP integration. Choose one language per project and apply it uniformly.
  • Simulate exhaustively before synthesising. Simulation is the most efficient environment for catching logic bugs: you can inspect every signal at every cycle, insert $display assertions, and replay failing scenarios instantly. Debugging on hardware using JTAG and an embedded logic analyser is slower and catches different classes of problems (timing, glitches, board noise). Do not use hardware bring-up to find logic bugs that simulation would have caught in minutes.
  • Write synthesisable RTL from the start in module files. Synthesis tools support a well-defined synthesisable subset of Verilog/VHDL. Avoid constructs that simulate but do not synthesise, such as delays (#10), time-type signals, and $display, in any file that will be passed to the synthesiser. FPGA synthesis tools do accept simple initial blocks for setting register and memory power-up values, but not as a way to describe behaviour over time. Reserve behavioural constructs for test bench files.
  • for loops in RTL generate parallel hardware, not sequential iterations. A for (i = 0; i < 16; i = i + 1) loop in a combinational block generates 16 parallel logic paths. At 32 bits wide, that is 512 bits of logic operating simultaneously. Use parameterised loops deliberately and verify the resource cost in the synthesis utilisation report.
  • Clock domain crossing (CDC) is not caught by synthesis or simulation alone. Signals crossing between independent clock domains without a synchroniser cause metastability failures — intermittent, temperature-sensitive, and almost impossible to reproduce in simulation. Use a dual-flop synchroniser for single-bit control signals and an asynchronous FIFO for multi-bit data transfers. Review all CDC paths manually or with a dedicated CDC analysis tool. The FPGA clock domain crossing forum thread illustrates how these failures manifest in practice.
  • State machine implementation: For Verilog state machine coding style — Moore vs Mealy output encoding, one-hot vs binary state register, always-block separation between state register and next-state logic — see How Do You Design a Finite State Machine?, which includes a worked Verilog FSM implementation.
  • For FPGA RTL development, timing closure, and hardware bring-up, Zeus Design's engineering team provides HDL design services from first RTL through to production-ready FPGA integration.
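The dual-flop synchroniser recommended in the clock domain crossing point above can be sketched as follows. It is only suitable for a single-bit, level-type signal that stays stable for several destination clock cycles; it does not make a multi-bit bus safe. The module name is hypothetical, and the ASYNC_REG attribute is AMD/Xilinx-specific (it marks the flip-flops as a synchroniser for Vivado); other tools use their own attributes or constraints.

```verilog
// Two-flip-flop synchroniser for a single-bit signal entering clk_dst's domain.
module sync_2ff (
    input  wire clk_dst,     // destination-domain clock
    input  wire async_in,    // bit from another clock domain (or a pin)
    output wire sync_out     // safe to use in the clk_dst domain
);
    // ASYNC_REG is a Vivado attribute; other tools ignore it or use their own.
    (* ASYNC_REG = "TRUE" *) reg meta;
    (* ASYNC_REG = "TRUE" *) reg stable;

    always @(posedge clk_dst) begin
        meta   <= async_in;  // may go metastable; never used directly
        stable <= meta;      // meta has most of a clock period to settle first
    end

    assign sync_out = stable;
endmodule
```

The cost is two destination-clock cycles of latency, which is why this structure suits control flags rather than data streams.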

Common Mistakes

  • Using blocking assignments (=) in clocked always blocks. In simple single-register designs this produces correct simulation, masking the bug. In multi-register designs (shift registers, pipelines, state machines with update logic), blocking assignments collapse sequential stages into a single cycle and produce wrong hardware. Use <= unconditionally inside every clocked block.
  • Incomplete sensitivity lists in Verilog 2001. An always @(a, b) block that omits signal c used inside it will not re-evaluate when c changes in simulation — but the synthesised hardware will respond to c immediately, creating a simulation/synthesis mismatch. Use always @(*) (or always_comb in SystemVerilog) to avoid managing sensitivity lists manually.
  • Ignoring synthesis latch warnings. Every synthesis warning about an inferred latch is a design bug: an output has a code path where it is not assigned, and the tool inserted a latch to hold the last value. Add a default assignment at the top of every combinational block. This is not a stylistic preference — it directly determines whether the synthesised hardware matches the intended logic.
  • Treating HDL as software. A for loop in synthesisable RTL is fully unrolled parallel hardware. A function call in Verilog is inlined. There is no instruction stream, no stack, no heap, and no branch predictor. Every statement describes hardware that exists simultaneously with all other hardware in the module. Engineers coming from software development need to build a parallel hardware mental model before their HDL is reliable.
  • Skipping simulation before targeting hardware. Loading a bitstream is not the first step — it is the last. Write a test bench, run simulation (Icarus Verilog, GHDL, Vivado Simulator), and verify cycle-by-cycle behaviour before synthesising. Hardware bring-up catches different problems (PCB noise, configuration sequencing, timing across physical paths); it should not be where you find logic bugs.
  • Mismatched types in VHDL arithmetic. VHDL's type system requires explicit casts between std_logic_vector, unsigned, and signed. Adding a std_logic_vector to another without casting produces a type error. Use ieee.numeric_std exclusively — convert to unsigned or signed for arithmetic, then convert back to std_logic_vector for port assignments. The deprecated std_logic_arith and std_logic_unsigned packages from Synopsys produce inconsistent results across tools and should not be used in new designs.

Frequently Asked Questions

Should I learn Verilog or VHDL for FPGA design?
For most new projects, start with Verilog (or SystemVerilog, its superset). Verilog has C-like syntax, is dominant in North American and Asian industry, and has the widest open-source toolchain support — Icarus Verilog, Verilator, Yosys, and most vendor example code are Verilog-first. VHDL has more syntax overhead (explicit type conversions, library declarations, verbose concurrent statements) but is common in European organisations, aerospace, and defence, where its strong typing catches bugs at compile time. If your target employer, IP ecosystem, or existing project mandates VHDL, learn VHDL. Otherwise Verilog is the practical starting point with the shortest path to a working FPGA design.
What is the difference between RTL and behavioural HDL?
RTL (Register Transfer Level) describes your design cycle-by-cycle as registers and the combinational logic that computes their next values. RTL is the standard abstraction for synthesisable FPGA and ASIC design — synthesis tools map RTL directly to gates and flip-flops. Behavioural HDL uses constructs (delays, timing statements, initial blocks, file I/O) that describe what the circuit does rather than what hardware implements it. Behavioural constructs are used in simulation test benches and do not synthesise. The practical rule: synthesisable module files must contain only RTL constructs; test bench files can use behavioural constructs freely.
Why does my synthesis tool warn about a latch where I didn't intend one?
A latch is inferred in a combinational always @(*) block or VHDL process when a signal is not assigned in all branches of the logic — the synthesis tool must hold the output for the unassigned case, which requires a latch. The most common causes are an if statement without an else clause, a case statement without a default, or a signal assigned in some branch combinations but not all. Fix by adding a default assignment at the top of the always block before the if/case logic — this ensures the synthesis tool sees a defined value for every output in every path. In VHDL, assign every output signal at the start of the combinational process before any if/case statement.
