Electronics Design AU

What Does Embedded Startup Code Do Before main()?

Last updated 28 June 2026 · 9 min read

Direct Answer

Embedded startup code runs between MCU reset and the first line of main(). On a Cortex-M (using the STM32 vendor startup file as the example): (1) the hardware loads the initial stack pointer and reset handler address from the vector table; (2) the reset handler calls SystemInit() to configure the clock tree; (3) startup code copies the .data section from flash to RAM using linker symbols (_sidata, _sdata, _edata); (4) it zeros the .bss section from _sbss to _ebss; (5) for C++ projects, __libc_init_array() runs static constructors; (6) main() is called. If any step is absent or incorrect, globals with initialisers contain garbage, zero-init variables are non-zero, or clocks run at the wrong frequency.

Detailed Explanation

When you write int main() { ... } in firmware, you assume the stack is valid, global variables have their initial values, and the MCU is running at the correct clock speed. None of those assumptions are true immediately after a hardware reset. The startup file — a small assembly program that runs before main() — makes them true.

Every Cortex-M firmware project contains a startup file. In STM32CubeMX projects it is named something like startup_stm32f405rgtx.s. Most engineers open it once, close it immediately, and never look again. Understanding what it does explains a significant class of initialisation bugs.

Step 0: Hardware Reset (Automatic)

Before a single instruction from the startup file executes, the Cortex-M hardware does two things automatically:

  1. Reads the initial stack pointer from address 0x00000000 (the first word of the vector table, remapped to flash via the boot alias) and loads it into the SP register.
  2. Reads the reset handler address from address 0x00000004 (the second word of the vector table) and jumps to it.

These two values are placed at the very start of the flash binary by the linker script's KEEP(*(.isr_vector)) directive. If either is wrong, the MCU faults before executing a single line of startup code.
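
The two reset words can be checked in a built image before it is flashed. The following is a minimal host-side sketch, not part of any ST tooling: the type and function names are hypothetical, and the flash base and size used in the usage note are illustrative values for a 1 MB part.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two words the core fetches at reset. */
typedef struct {
    uint32_t initial_sp;   /* vector table word 0 */
    uint32_t reset_entry;  /* vector table word 1, bit 0 is the Thumb flag */
} reset_vectors_t;

/* Read the first two words of a flash image (hypothetical helper). */
static reset_vectors_t read_reset_vectors(const uint32_t *image_words)
{
    reset_vectors_t v = { image_words[0], image_words[1] };
    return v;
}

/* The entry must have bit 0 set (Cortex-M executes Thumb code only) and,
 * with that bit masked off, must point inside the flash region. */
static bool reset_entry_plausible(uint32_t entry, uint32_t flash_base, uint32_t flash_size)
{
    uint32_t address = entry & ~UINT32_C(1);
    return (entry & 1u) != 0u &&
           address >= flash_base &&
           address - flash_base < flash_size;
}
```

For an image whose second word is 0x08000199, reset_entry_plausible(0x08000199, 0x08000000, 0x00100000) holds; the same address with bit 0 cleared is rejected.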

Step 1: Reset Handler Entry

The reset handler is the first code that runs. A condensed view of a typical STM32 startup file:

g_pfnVectors:
  .word  _estack          /* Initial SP: top of RAM */
  .word  Reset_Handler    /* First instruction after reset */
  /* ... remaining interrupt vectors ... */

Reset_Handler:
  ldr   sp, =_estack      /* Reload SP (safety: handles software reset entry) */
  bl    SystemInit         /* Configure clock tree */
  /* Copy .data and zero .bss (see below) */
  bl    __libc_init_array  /* C++ global constructors */
  bl    main               /* Enter application */
  b     .                  /* Infinite loop if main() returns */

_estack is the linker-generated symbol for the top of RAM — the highest valid stack address on this device. Setting SP to this value means the stack grows downward from the top of RAM, which is the Cortex-M convention.
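
As a rough illustration of what "top of RAM" means numerically, the sketch below computes the address and checks whether a given _estack value fits a device. The helper names are hypothetical, and the RAM sizes in the usage note (128 KB and 64 KB of main SRAM at 0x20000000) are assumptions to confirm against the datasheet of the actual part.

```c
#include <stdbool.h>
#include <stdint.h>

/* Top of RAM is one past the last valid byte: the core decrements SP
 * before each push, so the first push lands on the last word of RAM. */
static uint32_t ram_top(uint32_t ram_base, uint32_t ram_size)
{
    return ram_base + ram_size;
}

/* True if an initial SP value is usable on a device with this RAM. */
static bool estack_fits(uint32_t estack, uint32_t ram_base, uint32_t ram_size)
{
    return estack > ram_base &&
           estack <= ram_top(ram_base, ram_size) &&
           (estack % 8u) == 0u;   /* the AAPCS expects 8-byte stack alignment */
}
```

With these assumed sizes, ram_top(0x20000000, 128 * 1024) gives 0x20020000, and that value does not fit a device with only 64 KB at the same base.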

Step 2: SystemInit() — Clock Configuration

SystemInit() is a vendor-supplied C function in system_stm32f4xx.c. It runs at the default post-reset clock frequency (typically the internal HSI RC oscillator, commonly 16 MHz on STM32F4 — confirm in the specific device's reference manual) and brings the MCU to its target operating frequency.

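On a typical STM32F405 running at 168 MHz, a SystemInit() that performs the full clock setup does the following (STM32CubeMX-generated projects usually leave the PLL steps to SystemClock_Config(), called from main()):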

  1. Resets the RCC registers to their reset defaults.
  2. Enables HSE (the external crystal oscillator) and waits for it to lock.
  3. Configures the PLL with the correct M/N/P dividers to generate 168 MHz from HSE.
  4. Sets Flash wait states to match the new CPU frequency (5 wait states are typical at 168 MHz — the STM32F4 reference manual RM0090 specifies the required wait states per frequency range).
  5. Switches SYSCLK to PLL.
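  6. Leaves the global SystemCoreClock variable reading 168000000. Because this startup file runs SystemInit() before the .data copy, a value written here would be overwritten by the copy loop; the value therefore comes from the variable's initialiser or from a later SystemCoreClockUpdate() call.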
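
The divider arithmetic behind steps 3 and 4 can be sketched as follows. The function names are hypothetical. The 8 MHz crystal and the M/N/P values in the usage note are assumed example values, and the rule of one wait state per 30 MHz is assumed for a 2.7 V to 3.6 V supply; RM0090 remains the authority for both.

```c
#include <stdint.h>

/* STM32F4 main PLL: VCO input = source / M, VCO output = VCO input * N,
 * SYSCLK = VCO output / P. */
static uint32_t pll_sysclk_hz(uint32_t src_hz, uint32_t m, uint32_t n, uint32_t p)
{
    uint64_t vco_hz = (uint64_t)src_hz * n / m;
    return (uint32_t)(vco_hz / p);
}

/* Flash wait states, ASSUMING one extra wait state per 30 MHz of HCLK
 * (the 2.7 V to 3.6 V row of the latency table; check RM0090). */
static uint32_t flash_wait_states(uint32_t hclk_hz)
{
    if (hclk_hz == 0u) {
        return 0u;
    }
    return (hclk_hz - 1u) / 30000000u;
}
```

With an assumed 8 MHz HSE and M = 8, N = 336, P = 2, pll_sysclk_hz() returns 168000000, and flash_wait_states(168000000) returns 5, matching the figure quoted in step 4.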

If SystemInit() is absent or misconfigured, every subsequent timing calculation is wrong. HAL delays use SystemCoreClock to calibrate SysTick: if the MCU is running at 16 MHz but SystemCoreClock says 168 MHz, the SysTick reload value is 10.5 times too large, so HAL_Delay(1) takes roughly 10.5 ms instead of 1 ms. UART baud rate generators use the peripheral clock, which is derived from SYSCLK, so a wrong SYSCLK produces wrong baud rates on every peripheral simultaneously. For the full breakdown of the clock sources (HSE/HSI/PLL), AHB/APB bus dividers, and flash wait-state requirements that SystemInit() configures, see how the STM32 clock tree works.
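
The size of the SysTick error follows directly from the two clock values. A minimal sketch of the arithmetic, using a hypothetical helper name and assuming the reload value is simply the believed clock divided by 1000:

```c
#include <stdint.h>

/* The reload value for a nominal 1 ms tick is computed from the clock the
 * software believes it has; the counter then runs at the real clock.
 * Returns the real duration of one tick in microseconds. */
static uint32_t actual_tick_us(uint32_t assumed_hz, uint32_t actual_hz)
{
    uint32_t reload_ticks = assumed_hz / 1000u;   /* ticks per nominal 1 ms */
    return (uint32_t)((uint64_t)reload_ticks * 1000000u / actual_hz);
}
```

actual_tick_us(168000000, 16000000) returns 10500, so each nominal millisecond lasts 10.5 ms when the core is still running from a 16 MHz oscillator.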

Step 3: Copy .data from Flash to RAM

After clock configuration, startup code copies every explicitly-initialised global and static variable from flash to RAM.

In C (the startup file implements this in assembly):

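extern uint32_t _sidata, _sdata, _edata;   /* defined by the linker script */

uint32_t *src = &_sidata;   /* LMA: initial values stored in flash binary */
uint32_t *dst = &_sdata;    /* VMA: where variables live at runtime in RAM */
while (dst < &_edata) {     /* word-wise copy; assumes 4-byte-aligned bounds */
    *dst++ = *src++;
}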

_sidata, _sdata, and _edata are linker-generated symbols. _sidata is the flash address where the .data section's initial values are stored (set by LOADADDR(.data) in the linker script). _sdata and _edata mark the start and end of the RAM range where those variables live at runtime.

Example: int retry_count = 3; compiles to the value 3 stored in flash at _sidata + some_offset. After the copy loop, RAM address _sdata + some_offset also contains 3, and all subsequent reads and writes of retry_count use the RAM address.

Without this step, every global variable with an explicit initialiser contains whatever power-on garbage was in RAM. The code compiles and links without error; the bug is invisible until runtime. Typical symptom: unpredictable behaviour that changes between power cycles.
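
The copy loop can be exercised on a host machine by standing in ordinary arrays for the linker-defined regions. This is only a model of the behaviour: the array names and the fill pattern representing power-on garbage are made up for the illustration.

```c
#include <stdint.h>

/* Word-wise copy of initial values from the load address to the run
 * address, mirroring the loop the startup file implements in assembly. */
static void copy_data(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Host-side stand-ins for the flash image and the RAM region. */
static const uint32_t fake_flash_init[3] = { 3u, 0x1234u, 0xFFFFFFFFu };
static uint32_t fake_ram_data[3] = { 0xCAFEF00Du, 0xCAFEF00Du, 0xCAFEF00Du };

/* Run the copy over the stand-in regions. */
static void run_fake_data_copy(void)
{
    copy_data(fake_flash_init, fake_ram_data, fake_ram_data + 3);
}
```

Before run_fake_data_copy() the first RAM word holds the garbage pattern; afterwards it holds 3, the value a variable such as retry_count would be expected to have.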

Step 4: Zero-Initialise .bss

extern uint32_t _sbss, _ebss;   /* defined by the linker script */

uint32_t *p = &_sbss;
while (p < &_ebss) {
    *p++ = 0;
}

The .bss section holds global and static variables declared without an explicit initialiser: int counter;, static uint8_t rx_buffer[512];. The C standard requires these to be zero at program start. RAM is not guaranteed to contain zero after power-on. The startup code fulfils the C standard's guarantee by writing zero across the entire [_sbss, _ebss) range.

Without this step, uninitialised globals contain random RAM contents. Bugs are intermittent: on a cold power cycle, RAM content is somewhat random; on a warm reset, RAM retains the previous session's values. A check like if (first_run_flag == 0) may or may not detect first boot correctly depending on the previous RAM state.
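
The warm-reset case can be modelled on a host with an array standing in for the .bss region. This is a sketch under that assumption; fake_bss and is_first_run() are hypothetical names, with word 0 playing the role of first_run_flag.

```c
#include <stdint.h>

/* Zero every word in [start, end), as the startup code does for .bss. */
static void zero_bss(uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = 0u;
    }
}

/* Simulated .bss region; word 0 plays the role of first_run_flag. */
static uint32_t fake_bss[4];

/* The check from the text: a zero flag is taken to mean "first boot". */
static int is_first_run(void)
{
    return fake_bss[0] == 0u;
}
```

If fake_bss[0] still holds 1 from a previous session, is_first_run() reports false until zero_bss(fake_bss, fake_bss + 4) has run, which is exactly the dependence on prior RAM state described above.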

Step 5: __libc_init_array() — C++ Constructors

__libc_init_array();

If the firmware uses C++ with global objects, each object's constructor must run before main(). The linker collects constructor function pointers into the .init_array section; __libc_init_array() walks that table and calls each constructor in order.

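For pure-C firmware, this call is usually a no-op: the .init_array section is empty unless the code or a library registers functions with __attribute__((constructor)). It is safe to include when linking against the C library and is present in all ST-supplied startup files.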
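
The table walk itself is simple. The sketch below shows the idea with a locally defined table instead of the real .init_array section; newlib's actual __libc_init_array() also processes .preinit_array and calls _init(), which is omitted here. All names are hypothetical.

```c
#include <stddef.h>

typedef void (*init_fn_t)(void);

/* Call every entry in [start, end) in table order. */
static void run_init_array(const init_fn_t *start, const init_fn_t *end)
{
    for (const init_fn_t *fn = start; fn < end; ++fn) {
        (*fn)();
    }
}

/* Stand-in constructors that record the order in which they ran. */
static char ctor_log[8];
static size_t ctor_count;

static void ctor_a(void) { ctor_log[ctor_count++] = 'A'; }
static void ctor_b(void) { ctor_log[ctor_count++] = 'B'; }

/* Stand-in for the table the linker would build from .init_array. */
static const init_fn_t fake_init_array[] = { ctor_a, ctor_b };
```

Calling run_init_array(fake_init_array, fake_init_array + 2) runs ctor_a and then ctor_b; passing an empty range calls nothing, which is the pure-C case.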

The order matters: the .data copy and the .bss zero loop are independent of each other and can run in either order, but both must complete before __libc_init_array(). A C++ object whose constructor reads a zero-initialised global must find that global already zeroed, not still containing RAM garbage.

Step 6: main()

After all initialisation is complete, the startup code branches to main():

bl    main     /* call application entry point */
b     .        /* spin if main() returns (should not happen) */

At this point, main() can rely on:

  • A valid stack pointer.
  • All global variables having correct initial values.
  • The MCU running at its target frequency.
  • C++ global objects being fully constructed.

See what are interrupts in embedded systems for how the interrupt vector table, which the linker script places at the start of flash, connects hardware interrupt events to C handler functions once the application is running.

Inspecting Startup Completion in the Debugger

A practical technique: set a breakpoint at the opening brace of main() and inspect global variable values in the debugger's watch window before any application code runs. If the .data copy worked correctly, explicitly-initialised globals already contain their initialiser values. If they contain anything else (values left over from a previous run, or random-looking power-on content), the copy loop was skipped or ran with the wrong source or destination addresses.

To check .bss zeroing: inspect the raw memory view at _sbss. All bytes should be 0x00. Non-zero values indicate the zero-init loop was skipped or used the wrong symbol range.
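
The same check can be made from code when no debugger is attached. One possible approach, sketched here with hypothetical names and an arbitrary marker value, is to place one sentinel in .data and one in .bss and test both at the top of main():

```c
#include <stdbool.h>
#include <stdint.h>

#define DATA_CANARY_VALUE 0x5AFEC0DEu   /* arbitrary marker value */

static volatile uint32_t data_canary = DATA_CANARY_VALUE; /* lands in .data */
static volatile uint32_t bss_canary;                      /* lands in .bss  */

/* True if the .data copy and the .bss zero loop both appear to have run. */
static bool startup_sections_ok(void)
{
    return data_canary == DATA_CANARY_VALUE && bss_canary == 0u;
}
```

A firmware build might call startup_sections_ok() as the first statement of main() and halt or signal an error if it returns false. A passing result only shows that these two words were initialised; it does not prove the loops covered the full ranges.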


Design Considerations

  • Do not use global variables in SystemInit(). SystemInit() runs before .data is copied to RAM — any global variable it reads still contains uninitialised RAM content. SystemInit() should use only local variables and direct register writes.
  • Verify the FPU is enabled before any floating-point operations. On Cortex-M4F and M7 devices, the floating-point unit is disabled at reset. STM32's SystemInit() enables it via the CPACR register. If a global C++ object's constructor executes a floating-point instruction before the FPU is enabled, the core raises a UsageFault (no coprocessor), which escalates to HardFault unless the UsageFault handler has been enabled. Ensure SystemInit() runs before __libc_init_array().
  • Semihosting in startup code will hang without a connected probe. ARM semihosting routes printf and file I/O through the debugger: the target executes a BKPT instruction and the debug host services the request. On a standalone board nothing services it, so a semihosting call in startup code stalls or faults the MCU. Replace with UART or SWO ITM output for field firmware.
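
The FPU enable mentioned in the list above is a single read-modify-write of CPACR. The sketch below shows only the bit arithmetic so that it can run on a host; the macro and function names are hypothetical, and on target the value would be applied to the real register.

```c
#include <stdint.h>

/* The CP10 and CP11 access fields occupy CPACR bits [23:20]; writing 0b11
 * to each field grants full access, which enables the FPU. */
#define CPACR_CP10_CP11_FULL ((3u << 20) | (3u << 22))

/* Return the CPACR value with the FPU enabled, other bits unchanged. */
static uint32_t cpacr_with_fpu_enabled(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_FULL;
}

/* True if both coprocessor fields already grant full access. */
static int cpacr_fpu_enabled(uint32_t cpacr)
{
    return (cpacr & CPACR_CP10_CP11_FULL) == CPACR_CP10_CP11_FULL;
}
```

On a CMSIS-based target the equivalent write would be SCB->CPACR = cpacr_with_fpu_enabled(SCB->CPACR); followed by __DSB() and __ISB() so the change takes effect before the next floating-point instruction.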

Common Mistakes

  • Calling application code from SystemInit(). Functions called from SystemInit() execute before .data and .bss are initialised. Reading any initialised global variable in this window produces garbage.
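  • Mismatched linker symbol names. If the startup assembly references _start_data but the linker script exports _sdata, the link normally fails with an undefined-reference error, which is the safe outcome. The silent failure is a name that resolves to the wrong address: a symbol declared weak resolves to 0 when undefined, and a stale name left behind in an edited linker script can still point at an old region. The .data copy then reads from or writes to the wrong location, and the MCU boots silently with corrupted global state.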
  • Forgetting __libc_init_array() in a C++ project. Without this call, C++ global objects are not constructed. Their member functions will operate on uninitialised member variables. The bug is often intermittent and dependent on RAM power-on content.
  • Using the wrong startup file or linker script for the MCU variant. The startup file is device-specific because the vector table length and the set of interrupt handlers differ between variants; _estack (the RAM top address) is device-specific too, and is defined by the matching linker script. Building an STM32F401 project (which has less RAM) with STM32F405 files results in a stack pointer pointing past the end of physical RAM.
  • Assuming RAM contents survive a warm reset. A warm reset (NVIC_SystemReset() or a watchdog reset) re-executes the startup file from the reset handler. The SRAM itself usually retains its contents across the reset, but the startup code then overwrites .data and zeroes .bss again, which is correct. RAM used as a communication channel between the old and new firmware (e.g. a boot flag in a specific RAM location) must therefore live in a section that is excluded from both the .data copy and the .bss zero loop.
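
A boot flag that survives a warm reset also has to be distinguishable from power-on garbage. One common pattern, sketched here with hypothetical names and an arbitrary magic value, pairs the request with a marker and a complement check. On target the object would carry a section attribute placing it in a region the startup loops skip; in this host-runnable version it is an ordinary static.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAILBOX_MAGIC 0xB007F1A6u   /* arbitrary marker value */

typedef struct {
    uint32_t magic;    /* MAILBOX_MAGIC when the contents are valid */
    uint32_t request;  /* application-defined request code */
    uint32_t check;    /* bitwise complement of request */
} boot_mailbox_t;

/* On target: place in a section excluded from .data copy and .bss zero. */
static boot_mailbox_t mailbox;

/* Written by the old firmware just before it triggers the reset. */
static void mailbox_post(boot_mailbox_t *m, uint32_t request)
{
    m->request = request;
    m->check = ~request;
    m->magic = MAILBOX_MAGIC;
}

/* Read by the new firmware. Returns true if a valid request was pending,
 * and always invalidates the mailbox so the request is consumed once. */
static bool mailbox_take(boot_mailbox_t *m, uint32_t *request_out)
{
    bool valid = m->magic == MAILBOX_MAGIC && m->check == (uint32_t)~m->request;
    if (valid) {
        *request_out = m->request;
    }
    m->magic = 0u;
    return valid;
}
```

The complement check makes it unlikely that random cold-boot RAM content is mistaken for a request; it is a sketch of the idea rather than a substitute for a CRC where stronger guarantees are needed.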

Frequently Asked Questions

Can I skip SystemInit() and configure clocks in main() instead?
Technically yes, but there is a window of risk. Between reset and the clock configuration call in main(), the MCU runs at its default post-reset oscillator frequency, typically the internal RC oscillator at 4–16 MHz depending on the device (check the specific MCU reference manual). Any code that runs in this window (global constructors, early peripheral init) executes at the default speed. On STM32, HAL_Init() expects SysTick to be calibrated to the actual CPU clock; if SysTick is configured before the clock is set, HAL timeout calculations are wrong. ST places SystemInit() in the startup file to eliminate this window.

What happens if main() returns in bare-metal firmware?
There is no OS to return to. ST's startup file for the arm-none-eabi GCC toolchain handles this with an infinite loop immediately after the 'bl main' instruction, so the program counter spins in place rather than running on into whatever follows in flash, which would eventually cause a hard fault. Some projects replace this with NVIC_SystemReset() for an emergency restart, or deliberately let the [watchdog timer](/questions/what-is-a-watchdog-timer) time out to trigger a controlled reset.

Why is the startup file written in assembly rather than C?
The earliest startup operations run before the C runtime is valid: .data has not been copied and .bss has not been zeroed, so globals cannot be trusted. Assembly gives precise control over those first instructions and guarantees that nothing touches a global or relies on library state before the environment exists. On Cortex-M the hardware has already loaded SP from the vector table, so a reset handler written in C is possible, but it must be written carefully to avoid such accesses. Once the minimum environment is established, all remaining work is done via C function calls.
