How Do You Write Verilog and VHDL for an FPGA?
Last updated 30 June 2026 · 16 min read
Direct Answer
RTL (Register Transfer Level) HDL describes hardware structure, not software instructions. In Verilog, a module declares input/output ports and internal logic: combinational logic uses assign statements or always @(*) blocks with blocking assignments (=); sequential (clocked) logic uses always @(posedge clk) blocks with non-blocking assignments (<=). In VHDL, the equivalent is an entity with port declarations and an architecture body with concurrent signal assignments and process statements. Three fundamental rules for synthesisable HDL: always use non-blocking assignments (<=) inside clocked blocks; assign a default to every output in combinational blocks to avoid inferring unintentional latches; and simulate your design before synthesising — synthesis tools infer hardware structure from HDL and cannot catch logic bugs, only structural errors.
Detailed Explanation
HDL (Hardware Description Language) is fundamentally different from software programming. A C function describes a sequence of instructions executed one at a time on a CPU. A Verilog module or VHDL entity describes hardware structure — wires connecting logic elements and registers that all exist simultaneously and operate in parallel. The synthesis tool reads your HDL and infers what physical hardware to build from it: which LUTs to configure, which flip-flops to use, how to route signals between them. Understanding this distinction — that you are describing hardware, not writing software — is the prerequisite for writing correct, synthesisable HDL.
For background on how FPGAs implement logic (LUTs, BRAM, DSP blocks) and the full synthesis-to-bitstream flow, see What Is an FPGA and How Does It Work?. This page covers the practical HDL writing skills that come after that foundation. After writing and simulating your HDL, the next step is running the toolchain — for synthesis, place-and-route, timing constraints, and bitstream generation, see FPGA Development Flow: From HDL to Working Hardware.
The Verilog Module
A Verilog module is the fundamental building block. Every module has a port list (its interface) and a body (its implementation):
module mux4to1 #(parameter WIDTH = 8) (
    input  wire [1:0]       sel,
    input  wire [WIDTH-1:0] d0,
    input  wire [WIDTH-1:0] d1,
    input  wire [WIDTH-1:0] d2,
    input  wire [WIDTH-1:0] d3,
    output reg  [WIDTH-1:0] y
);
    // implementation here
endmodule
Key elements:
- module/endmodule delimit the module.
- #(parameter WIDTH = 8) declares a parameterised width with a default of 8, so the same module can be instantiated at different widths without duplicating code.
- Port directions are input, output, or inout.
- wire is a net driven by assign statements or module ports.
- reg is a signal that can be assigned inside an always block. In a combinational block it synthesises to a wire; in a clocked block it synthesises to a flip-flop.
- In SystemVerilog (supported by Vivado and Quartus), logic can replace both wire and reg for single-driver signals. The tool infers the correct hardware from context, and multi-driver conflicts are a compiler error rather than a silent bug.
The VHDL Entity and Architecture
VHDL separates the interface (entity) from the implementation (architecture):
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mux4to1 is
    generic (WIDTH : natural := 8);
    port (
        sel : in  std_logic_vector(1 downto 0);
        d0  : in  std_logic_vector(WIDTH-1 downto 0);
        d1  : in  std_logic_vector(WIDTH-1 downto 0);
        d2  : in  std_logic_vector(WIDTH-1 downto 0);
        d3  : in  std_logic_vector(WIDTH-1 downto 0);
        y   : out std_logic_vector(WIDTH-1 downto 0)
    );
end entity;

architecture rtl of mux4to1 is
    -- internal signal declarations here
begin
    -- implementation here
end architecture;
Key elements:
- library ieee; use ieee.std_logic_1164.all; is always required; it defines std_logic (the standard 9-value logic type, including high-impedance 'Z' and unknown 'X').
- use ieee.numeric_std.all; is required for arithmetic. Use unsigned or signed for arithmetic vectors, not std_logic_vector directly. Avoid the non-standard std_logic_arith and std_logic_unsigned packages.
- generic is the VHDL equivalent of Verilog's parameter.
- Internal signals are declared between architecture rtl of mux4to1 is and begin, as signal name : type;.
- The architecture name (rtl) is arbitrary; rtl is the conventional name for a synthesisable implementation.
Describing Combinational Logic
Combinational logic outputs depend only on current inputs — no clock, no state.
Verilog — assign statement (suitable for simple expressions):
// 4-to-1 mux with a conditional expression
assign y = (sel == 2'b00) ? d0 :
           (sel == 2'b01) ? d1 :
           (sel == 2'b10) ? d2 : d3;
assign drives a wire (or logic), so with this form y must be declared as output wire rather than output reg. The right-hand side is re-evaluated whenever any signal on the right changes.
Verilog — always @(*) block (recommended for complex combinational logic):
always @(*) begin
    case (sel)
        2'b00:   y = d0;
        2'b01:   y = d1;
        2'b10:   y = d2;
        default: y = d3;
    endcase
end
always @(*) re-evaluates whenever any signal read inside the block changes. In SystemVerilog, always_comb serves the same purpose and is preferred because tools can check that the block describes purely combinational logic and report unintended latches. Use blocking assignments (=) inside combinational blocks: the assignment takes effect immediately within the block.
Always include a default: branch in every case statement (see How to Avoid Inferring Latches below).
VHDL — concurrent signal assignment:
y <= d0 when sel = "00" else
     d1 when sel = "01" else
     d2 when sel = "10" else
     d3;
VHDL — combinational process:
process(all)  -- VHDL 2008: sensitivity list is automatic
begin
    case sel is
        when "00"   => y <= d0;
        when "01"   => y <= d1;
        when "10"   => y <= d2;
        when others => y <= d3;
    end case;
end process;
process(all) (VHDL 2008 and later, supported by Vivado and Quartus) automatically includes all signals read inside the process — equivalent to Verilog's always @(*). In VHDL 1993, list all inputs explicitly: process(sel, d0, d1, d2, d3). Omitting an input from the sensitivity list causes a simulation/synthesis mismatch — the simulation will not re-evaluate when the omitted signal changes, but the synthesised hardware will.
Describing Sequential (Clocked) Logic
Sequential logic updates on a clock edge. The synthesised hardware is a flip-flop that holds its output value between clock edges.
Verilog — D flip-flop with synchronous reset:
module dff #(parameter WIDTH = 8) (
    input  wire             clk,
    input  wire             rst_n,  // active-low synchronous reset
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    always @(posedge clk) begin
        if (!rst_n)
            q <= {WIDTH{1'b0}};  // reset to zero
        else
            q <= d;              // capture input on rising edge
    end
endmodule
always @(posedge clk) triggers only on the rising edge of clk. Inside a clocked block, use non-blocking assignments (<=). This is the most important rule in synthesisable Verilog.
Synchronous vs asynchronous reset: The example above uses a synchronous reset — the reset only takes effect at the next rising clock edge. An asynchronous reset uses always @(posedge clk or negedge rst_n) and takes effect immediately when rst_n goes low without waiting for a clock edge. Most modern FPGA synthesis targets prefer synchronous reset unless the device's flip-flop primitives include a dedicated asynchronous-reset pin. Follow your device's synthesis guide recommendation.
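For comparison with the synchronous version above, an asynchronous-reset flip-flop might be sketched as follows. The module name dff_arst is an illustrative assumption and is not used elsewhere on this page; the only functional change is the extra edge in the sensitivity list.

```verilog
// Sketch: D flip-flop with active-low asynchronous reset.
// The module name dff_arst is hypothetical, chosen for illustration.
module dff_arst #(parameter WIDTH = 8) (
    input  wire             clk,
    input  wire             rst_n,  // active-low asynchronous reset
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    // The falling edge of rst_n is in the sensitivity list, so q clears
    // as soon as rst_n goes low, without waiting for a clock edge.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= {WIDTH{1'b0}};  // reset to zero
        else
            q <= d;              // capture input on rising edge
    end
endmodule
```

The reset test must come first inside the block, as here, so that the synthesis tool recognises rst_n as the asynchronous control rather than as a second clock.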
VHDL — D flip-flop with synchronous reset:
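library ieee;
use ieee.std_logic_1164.all;

entity dff is
    generic (WIDTH : natural := 8);
    port (
        clk, rst_n : in  std_logic;  -- rst_n: active-low synchronous reset
        d          : in  std_logic_vector(WIDTH-1 downto 0);
        q          : out std_logic_vector(WIDTH-1 downto 0)
    );
end entity;

architecture rtl of dff is
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if rst_n = '0' then
                q <= (others => '0');  -- reset to zero
            else
                q <= d;                -- capture input on rising edge
            end if;
        end if;
    end process;
end architecture;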
rising_edge(clk) is the VHDL idiom for a positive clock edge, equivalent to posedge clk in Verilog. Signal assignments (<=) inside a VHDL process behave like Verilog non-blocking assignments: they take effect after the process suspends, not immediately. Variable assignments (:=) do update immediately, but as long as registers are described with signals there is no blocking-vs-non-blocking choice to manage.
Blocking vs Non-Blocking Assignments (Verilog)
This is the most common source of correctness bugs for Verilog beginners.
- = (blocking): the assignment executes and completes before the next statement executes. Use in combinational always blocks.
- <= (non-blocking): the right-hand side is evaluated immediately, but the assignment does not take effect until the end of the current time step, after all non-blocking right-hand sides have been evaluated. Use in clocked always blocks.
Why this matters — the two-register shift register:
// CORRECT: two flip-flops in series
always @(posedge clk) begin
    b <= a;  // non-blocking: reads a (current value), schedules b = a
    c <= b;  // non-blocking: reads b (OLD value, not yet updated), schedules c = b
end
// Result: b captures a; c captures the old value of b. Two-stage shift register. Correct.

// WRONG: collapses to one flip-flop
always @(posedge clk) begin
    b = a;   // blocking: b is updated immediately to a's value
    c = b;   // blocking: b already holds a's new value
end
// Result: both b and c receive a's current value. The intermediate register is lost.
Rule: use <= unconditionally inside clocked always blocks. Use = only in combinational blocks.
How to Avoid Inferring Latches
A latch is a level-sensitive storage element — its output follows its input whenever enabled, rather than only on a clock edge. In synchronous FPGA design, latches are almost always unintentional bugs. They are opaque to timing analysis and create combinational feedback paths that are extremely difficult to debug. Synthesis tools warn when a latch is inferred; treat every such warning as a bug.
A latch is inferred in a combinational block whenever a signal is not assigned in every branch of the logic:
// BAD: latch inferred for y; sel = 2'b10 and 2'b11 have no assignment
always @(*) begin
    case (sel)
        2'b00: y = d0;
        2'b01: y = d1;
    endcase
end
Fix — assign a default at the top of the block:
// GOOD: no latch; y is defined in every code path
always @(*) begin
    y = d3;  // default assignment before any branching
    case (sel)
        2'b00: y = d0;
        2'b01: y = d1;
        2'b10: y = d2;
        // sel = 2'b11 falls through to the default d3 above
    endcase
end
The same rule applies in VHDL: assign every output signal at the top of a combinational process before any if/case statement.
As the page on combinational vs sequential logic explains, latches are level-sensitive and lack the clock-edge requirement that lets static timing analysis calculate setup and hold margins. This is why synthesis tools warn about them: in a synchronous design they are almost always bugs.
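The same default-assignment fix applies to an if statement without an else. The sketch below shows both forms side by side; the module name latch_demo and the signals en, d, q_bad, and q_good are hypothetical names chosen for illustration.

```verilog
// Sketch: an if without else infers a latch; a default assignment removes it.
// Module and signal names are hypothetical.
module latch_demo (
    input  wire en,
    input  wire d,
    output reg  q_bad,
    output reg  q_good
);
    // BAD: when en is 0, q_bad is not assigned, so it must hold its
    // previous value and the tool infers a latch.
    always @(*) begin
        if (en)
            q_bad = d;
    end

    // GOOD: q_good is assigned on every path, so this is plain
    // combinational logic (equivalent to en & d).
    always @(*) begin
        q_good = 1'b0;  // default assignment before any branching
        if (en)
            q_good = d;
    end
endmodule
```

Running synthesis on a module like this should produce a latch warning for q_bad only, which is a quick way to see how a given tool words the message.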
Parameterised Designs
Parameterisation makes modules reusable at different widths or depths without code duplication.
Verilog parameter and localparam:
module fifo #(
    parameter DATA_WIDTH = 8,
    parameter DEPTH      = 16
) (
    input  wire                  clk, rst_n, wr_en, rd_en,
    input  wire [DATA_WIDTH-1:0] wr_data,
    output reg  [DATA_WIDTH-1:0] rd_data,
    output wire                  full, empty
);
    localparam ADDR_WIDTH = $clog2(DEPTH);  // address bits needed

    reg [DATA_WIDTH-1:0] mem [0:DEPTH-1];
    reg [ADDR_WIDTH:0]   wr_ptr, rd_ptr;    // extra bit for full/empty detection
    // ...
endmodule
localparam is a constant derived from other parameters — it cannot be overridden at instantiation time. $clog2() computes the ceiling of log₂, useful for deriving address widths from depth parameters.
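To show a parameter being overridden at instantiation, here is a minimal sketch using a small wrap-around counter instead of the FIFO. The modules mod_counter and counter_top, and the parameter MAX, are hypothetical and exist only for this example; the sketch assumes MAX is at least 2.

```verilog
// Sketch: overriding a parameter at instantiation. All names are hypothetical.
module mod_counter #(
    parameter MAX = 10                     // counts 0 .. MAX-1, then wraps
) (
    input  wire                   clk,
    input  wire                   rst_n,   // active-low synchronous reset
    output reg  [$clog2(MAX)-1:0] count
);
    localparam WIDTH = $clog2(MAX);        // derived; cannot be overridden

    always @(posedge clk) begin
        if (!rst_n)
            count <= {WIDTH{1'b0}};
        else if (count == MAX - 1)
            count <= {WIDTH{1'b0}};        // wrap back to zero
        else
            count <= count + 1'b1;
    end
endmodule

module counter_top (
    input  wire       clk,
    input  wire       rst_n,
    output wire [3:0] count_a,   // default MAX = 10 needs 4 bits
    output wire [6:0] count_b    // MAX = 100 needs 7 bits
);
    // Uses the default parameter value
    mod_counter              u_a (.clk(clk), .rst_n(rst_n), .count(count_a));
    // Overrides MAX by name; WIDTH inside the instance follows automatically
    mod_counter #(.MAX(100)) u_b (.clk(clk), .rst_n(rst_n), .count(count_b));
endmodule
```

Named overrides such as #(.MAX(100)) keep working if parameters are later added or reordered, which positional overrides do not.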
Important: synthesisable for loops are fully unrolled. A for loop in an always @(*) or always @(posedge clk) block generates parallel hardware — the loop body becomes N independent logic paths. An 8-iteration loop does not execute sequentially in hardware; all 8 iterations exist simultaneously as logic. Use loops deliberately for replication, not for sequential operations.
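As a small illustration of loop unrolling, the sketch below counts the set bits in a byte. The module name popcount8 and its port names are assumptions made for this example.

```verilog
// Sketch: a for loop in combinational RTL is unrolled into parallel hardware.
// Module and signal names are hypothetical.
module popcount8 (
    input  wire [7:0] data,
    output reg  [3:0] ones    // number of 1 bits in data (0 to 8)
);
    integer i;

    always @(*) begin
        ones = 4'd0;                  // default assignment, avoids a latch
        for (i = 0; i < 8; i = i + 1)
            ones = ones + data[i];    // eight additions, all present in hardware at once
    end
endmodule
```

The loop bound must be a constant known at synthesis time so the tool can unroll it; the synthesiser is then free to restructure the eight additions, for example into an adder tree.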
Writing a Simulation Test Bench
A test bench is a non-synthesisable file that drives inputs to the design under test (DUT) and observes its outputs. It uses behavioural constructs (delays, initial blocks, file I/O) that synthesis tools ignore or reject.
Verilog test bench for the 4-to-1 MUX:
`timescale 1ns / 1ps  // time unit / time precision

module mux4to1_tb;
    // DUT interface signals
    reg  [1:0] sel;
    reg  [7:0] d0, d1, d2, d3;
    wire [7:0] y;

    // Instantiate the DUT
    mux4to1 #(.WIDTH(8)) dut (
        .sel(sel), .d0(d0), .d1(d1), .d2(d2), .d3(d3), .y(y)
    );

    // Stimulus (not synthesisable)
    initial begin
        $dumpfile("mux4to1_tb.vcd");  // waveform dump (GTKWave)
        $dumpvars(0, mux4to1_tb);
        d0 = 8'hAA; d1 = 8'hBB; d2 = 8'hCC; d3 = 8'hDD;
        sel = 2'b00; #10;  // expect y = 0xAA
        sel = 2'b01; #10;  // expect y = 0xBB
        sel = 2'b10; #10;  // expect y = 0xCC
        sel = 2'b11; #10;  // expect y = 0xDD
        $display("Simulation complete");
        $finish;
    end
endmodule
Key behavioural-only constructs (do not use in synthesisable RTL):
- `timescale: sets the simulation time unit and precision.
- initial begin ... end: runs once from simulation time 0.
- #10: waits 10 time units (10 ns with a 1 ns time unit).
- $dumpfile/$dumpvars: write signal values to a VCD file viewable in GTKWave.
- $display/$finish: print to the console and end simulation.
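The test bench above relies on reading the waveform by eye. One possible extension is a self-checking version that compares y against an expected value after each stimulus. In the sketch below, the check task, the errors counter, and the test bench name mux4to1_check_tb are illustrative assumptions; the mux is repeated so the example is self-contained.

```verilog
`timescale 1ns / 1ps

// DUT: the 4-to-1 mux described on this page, repeated here so the
// sketch is self-contained.
module mux4to1 #(parameter WIDTH = 8) (
    input  wire [1:0]       sel,
    input  wire [WIDTH-1:0] d0, d1, d2, d3,
    output reg  [WIDTH-1:0] y
);
    always @(*) begin
        case (sel)
            2'b00:   y = d0;
            2'b01:   y = d1;
            2'b10:   y = d2;
            default: y = d3;
        endcase
    end
endmodule

// Self-checking test bench. The check task and errors counter are
// hypothetical helpers added for illustration.
module mux4to1_check_tb;
    reg  [1:0] sel;
    reg  [7:0] d0, d1, d2, d3;
    wire [7:0] y;
    integer    errors;

    mux4to1 #(.WIDTH(8)) dut (
        .sel(sel), .d0(d0), .d1(d1), .d2(d2), .d3(d3), .y(y)
    );

    // Apply one select value, wait for y to settle, compare with expected.
    task check;
        input [1:0] s;
        input [7:0] expected;
        begin
            sel = s;
            #10;
            if (y !== expected) begin   // !== also flags X and Z
                errors = errors + 1;
                $display("FAIL: sel=%b y=%h expected=%h", s, y, expected);
            end
        end
    endtask

    initial begin
        errors = 0;
        d0 = 8'hAA; d1 = 8'hBB; d2 = 8'hCC; d3 = 8'hDD;
        check(2'b00, 8'hAA);
        check(2'b01, 8'hBB);
        check(2'b10, 8'hCC);
        check(2'b11, 8'hDD);
        if (errors == 0)
            $display("PASS: all checks passed");
        else
            $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

If the mux already lives in its own source file, drop the copy here and compile both files together to avoid a duplicate module definition.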
VHDL test bench:
library ieee;
use ieee.std_logic_1164.all;

entity mux4to1_tb is
end entity;

architecture sim of mux4to1_tb is
    signal sel            : std_logic_vector(1 downto 0);
    signal d0, d1, d2, d3 : std_logic_vector(7 downto 0);
    signal y              : std_logic_vector(7 downto 0);
begin
    dut : entity work.mux4to1
        generic map (WIDTH => 8)
        port map (sel => sel, d0 => d0, d1 => d1,
                  d2 => d2, d3 => d3, y => y);

    process
    begin
        d0 <= x"AA"; d1 <= x"BB"; d2 <= x"CC"; d3 <= x"DD";
        sel <= "00"; wait for 10 ns;
        sel <= "01"; wait for 10 ns;
        sel <= "10"; wait for 10 ns;
        sel <= "11"; wait for 10 ns;
        wait;  -- suspend this process permanently
    end process;
end architecture;
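In VHDL test benches, wait for 10 ns; replaces Verilog's #10. A bare wait; suspends the stimulus process permanently, so the simulation stops once no further events are scheduled and no equivalent of $finish is needed.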
Simulation tools that accept Verilog: Icarus Verilog (open-source), Verilator (open-source, compiles RTL to C++), Vivado Simulator, ModelSim/Questa, VCS. VHDL simulators: GHDL (open-source), ModelSim/Questa, Vivado Simulator.
How Synthesis Turns HDL Into Hardware
Understanding how the synthesis tool interprets HDL prevents common mistakes.
- Synthesis infers hardware from patterns in your HDL. An always @(posedge clk) block infers flip-flops. An always @(*) block and assign statements infer combinational logic. The tool does not execute your HDL; it analyses its structure.
- Behavioural simulation does not guarantee synthesis correctness. Blocking vs non-blocking assignment misuse, latch inference, and incomplete sensitivity lists can produce code that simulates correctly in the behavioural model but synthesises to the wrong hardware. Run synthesis and review all warnings before treating the design as correct.
- Post-synthesis and post-route simulation run your test bench against the synthesised netlist rather than RTL. Post-synthesis simulation catches mismatches between RTL behaviour and the gate-level netlist. Post-route simulation adds propagation delays and catches timing violations.
- Timing analysis determines maximum clock frequency. The synthesis and place-and-route tools run Static Timing Analysis (STA) to measure the longest combinational path between any two flip-flops, known as the critical path. STA determines whether the design meets the target clock frequency with adequate setup-time margin. Simulation does not show timing violations; STA does.
- Synthesis warnings are design bugs. A warning about an inferred latch, a combinational loop, an undriven output, or a width mismatch describes hardware that does not match your intent. Address every synthesis warning before moving to place-and-route or hardware testing. For debugging on hardware, JTAG and embedded logic analysers (Xilinx ILA, Intel SignalTap) provide post-synthesis hardware observability.
Design Considerations
- Choose one language and use it consistently. Mixed-language projects (some files Verilog, some VHDL) are supported by Vivado and Quartus but complicate simulation, peer review, and IP integration.
- Simulate exhaustively before synthesising. Simulation is the most efficient environment for catching logic bugs: you can inspect every signal at every cycle, insert $display checks, and replay failing scenarios instantly. Debugging on hardware using JTAG and an embedded logic analyser is slower and catches different classes of problems (timing, glitches, board noise). Do not use hardware bring-up to find logic bugs that simulation would have caught in minutes.
- Write synthesisable RTL from the start in module files. Synthesis tools support a well-defined synthesisable subset of Verilog/VHDL. Avoid constructs that simulate but do not synthesise (delays such as #10, initial blocks, time-type signals, $display) in any file that will be passed to the synthesiser. Reserve behavioural constructs for test bench files.
- for loops in RTL generate parallel hardware, not sequential iterations. A for (i = 0; i < 16; i = i + 1) loop in a combinational block generates 16 parallel logic paths. At 32 bits wide, that is 512 bits of logic operating simultaneously. Use parameterised loops deliberately and verify the resource cost in the synthesis utilisation report.
- Clock domain crossing (CDC) is not caught by synthesis or simulation alone. Signals crossing between independent clock domains without a synchroniser cause metastability failures that are intermittent, temperature-sensitive, and almost impossible to reproduce in simulation. Use a dual-flop synchroniser for single-bit control signals and an asynchronous FIFO for multi-bit data transfers. Review all CDC paths manually or with a dedicated CDC analysis tool. The FPGA clock domain crossing forum thread illustrates how these failures manifest in practice.
- State machine implementation: For Verilog state machine coding style — Moore vs Mealy output encoding, one-hot vs binary state register, always-block separation between state register and next-state logic — see How Do You Design a Finite State Machine?, which includes a worked Verilog FSM implementation.
- For FPGA RTL development, timing closure, and hardware bring-up, Zeus Design's engineering team provides HDL design services from first RTL through to production-ready FPGA integration.
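The dual-flop synchroniser recommended in the clock domain crossing point above can be sketched in a few lines. The module name sync_2ff and its port names are hypothetical, and the sketch assumes a single-bit, slowly changing level signal.

```verilog
// Sketch: two-flop synchroniser for a single-bit signal entering the
// clk_dst domain. Module and signal names are hypothetical.
module sync_2ff (
    input  wire clk_dst,    // destination-domain clock
    input  wire async_in,   // single-bit signal from another clock domain
    output wire sync_out    // copy of async_in that is safe to use in clk_dst
);
    reg meta;    // first stage: may go metastable when async_in changes near the edge
    reg stable;  // second stage: gives the first stage a full cycle to settle

    always @(posedge clk_dst) begin
        meta   <= async_in;
        stable <= meta;
    end

    assign sync_out = stable;
endmodule
```

This structure is only suitable for one bit at a time. Passing each bit of a multi-bit bus through its own synchroniser can deliver bits on different cycles, which is why the asynchronous FIFO is recommended for data.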
Common Mistakes
- Using blocking assignments (=) in clocked always blocks. In simple single-register designs this produces correct simulation, masking the bug. In multi-register designs (shift registers, pipelines, state machines with update logic), blocking assignments collapse sequential stages into a single cycle and produce wrong hardware. Use <= unconditionally inside every clocked block.
- Incomplete sensitivity lists in Verilog 2001. An always @(a, b) block that omits signal c used inside it will not re-evaluate when c changes in simulation, but the synthesised hardware will respond to c immediately, creating a simulation/synthesis mismatch. Use always @(*) (or always_comb in SystemVerilog) to avoid managing sensitivity lists manually.
- Ignoring synthesis latch warnings. Every synthesis warning about an inferred latch is a design bug: an output has a code path where it is not assigned, and the tool inserted a latch to hold the last value. Add a default assignment at the top of every combinational block. This is not a stylistic preference; it directly determines whether the synthesised hardware matches the intended logic.
- Treating HDL as software. A for loop in synthesisable RTL is fully unrolled parallel hardware. A function call in Verilog is inlined. There is no instruction stream, no stack, no heap, and no branch predictor. Every statement describes hardware that exists simultaneously with all other hardware in the module. Engineers coming from software development need to build a parallel hardware mental model before their HDL is reliable.
- Skipping simulation before targeting hardware. Loading a bitstream is not the first step; it is the last. Write a test bench, run simulation (Icarus Verilog, GHDL, Vivado Simulator), and verify cycle-by-cycle behaviour before synthesising. Hardware bring-up catches different problems (PCB noise, configuration sequencing, timing across physical paths); it should not be where you find logic bugs.
- Mismatched types in VHDL arithmetic. VHDL's type system requires explicit casts between std_logic_vector, unsigned, and signed. Adding a std_logic_vector to another without casting produces a type error. Use ieee.numeric_std exclusively: convert to unsigned or signed for arithmetic, then convert back to std_logic_vector for port assignments. The deprecated std_logic_arith and std_logic_unsigned packages from Synopsys produce inconsistent results across tools and should not be used in new designs.
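The incomplete sensitivity list mistake described above can be reproduced in a few lines. The module name sens_demo and the output names y_bad and y_good are hypothetical, chosen for illustration.

```verilog
// Sketch: incomplete vs complete sensitivity list. Names are hypothetical.
module sens_demo (
    input  wire a, b, c,
    output reg  y_bad,
    output reg  y_good
);
    // BAD: c is read but not listed. In simulation y_bad is not
    // re-evaluated when only c changes; synthesis still builds
    // (a & b) | c, so simulation and hardware disagree.
    always @(a, b) begin
        y_bad = (a & b) | c;
    end

    // GOOD: @(*) includes every signal read in the block.
    always @(*) begin
        y_good = (a & b) | c;
    end
endmodule
```

Toggling only c in a test bench should show y_good following the change while y_bad stays stale until a or b next changes.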
Frequently Asked Questions
- Should I learn Verilog or VHDL for FPGA design?
- For most new projects, start with Verilog (or SystemVerilog, its superset). Verilog has C-like syntax, is dominant in North American and Asian industry, and has the widest open-source toolchain support — Icarus Verilog, Verilator, Yosys, and most vendor example code are Verilog-first. VHDL has more syntax overhead (explicit type conversions, library declarations, verbose concurrent statements) but is common in European organisations, aerospace, and defence, where its strong typing catches bugs at compile time. If your target employer, IP ecosystem, or existing project mandates VHDL, learn VHDL. Otherwise Verilog is the practical starting point with the shortest path to a working FPGA design.
- What is the difference between RTL and behavioural HDL?
- RTL (Register Transfer Level) describes your design cycle-by-cycle as registers and the combinational logic that computes their next values. RTL is the standard abstraction for synthesisable FPGA and ASIC design — synthesis tools map RTL directly to gates and flip-flops. Behavioural HDL uses constructs (delays, timing statements, initial blocks, file I/O) that describe what the circuit does rather than what hardware implements it. Behavioural constructs are used in simulation test benches and do not synthesise. The practical rule: synthesisable module files must contain only RTL constructs; test bench files can use behavioural constructs freely.
- Why does my synthesis tool warn about a latch where I didn't intend one?
- A latch is inferred in a combinational always @(*) block or VHDL process when a signal is not assigned in all branches of the logic — the synthesis tool must hold the output for the unassigned case, which requires a latch. The most common causes are an if statement without an else clause, a case statement without a default, or a signal assigned in some branch combinations but not all. Fix by adding a default assignment at the top of the always block before the if/case logic — this ensures the synthesis tool sees a defined value for every output in every path. In VHDL, assign every output signal at the start of the combinational process before any if/case statement.
Related Questions
What Is an FPGA and How Does It Work?
What is an FPGA, how do LUTs implement any logic function, when to choose FPGA vs MCU vs ASIC, and the basics of Verilog and VHDL for digital design.
FPGA vs Microcontroller vs ASIC: Which Should You Use?
Learn when to choose an FPGA, microcontroller, or ASIC — covering parallel vs sequential workloads, volume economics, NRE costs, and hybrid approaches.
FPGA Development Flow: From HDL to Working Hardware
Learn the complete FPGA development flow: synthesis, place-and-route, timing constraints, timing closure, and bitstream generation in Vivado and Quartus.
What Is the Difference Between Combinational and Sequential Logic?
How flip-flops differ from logic gates: sequential vs combinational logic, D flip-flop setup and hold time, registers, and clock domain crossing synchronisers.
How Do You Design a Finite State Machine?
FSM design: Moore vs Mealy machines, state diagrams, one-hot vs binary encoding, implementation in C switch statements and FPGA, and common design pitfalls.
What Are Logic Gates and How Do They Work?
Logic gates are the building blocks of digital circuits. AND, OR, NOT, NAND, NOR, XOR: truth tables, Boolean algebra, CMOS implementation, and universal gates.