What Does Embedded Startup Code Do Before main()?
Last updated 28 June 2026 · 9 min read
Direct Answer
Embedded startup code runs between MCU reset and the first line of main(). On a Cortex-M (using the STM32 vendor startup file as the example): (1) the hardware loads the initial stack pointer and reset handler address from the vector table; (2) the reset handler calls SystemInit() to configure the clock tree; (3) startup code copies the .data section from flash to RAM using linker symbols (_sidata, _sdata, _edata); (4) it zeros the .bss section from _sbss to _ebss; (5) for C++ projects, __libc_init_array() runs static constructors; (6) main() is called. If any step is absent or incorrect, globals with initialisers contain garbage, zero-init variables are non-zero, or clocks run at the wrong frequency.
Detailed Explanation
When you write int main() { ... } in firmware, you assume the stack is valid, global variables have their initial values, and the MCU is running at the correct clock speed. None of those assumptions are true immediately after a hardware reset. The startup file — a small assembly program that runs before main() — makes them true.
Every Cortex-M firmware project contains a startup file. In STM32CubeMX projects it is named something like startup_stm32f405rgtx.s. Most engineers open it once, close it immediately, and never look again. Understanding what it does explains a significant class of initialisation bugs.
Step 0: Hardware Reset (Automatic)
Before a single instruction from the startup file executes, the Cortex-M hardware does two things automatically:
- Reads the initial stack pointer from address 0x00000000 (the first word of the vector table, remapped to flash via the boot alias) and loads it into the SP register.
- Reads the reset handler address from address 0x00000004 (the second word of the vector table) and jumps to it.
These two values are placed at the very start of the flash binary by the linker script's KEEP(*(.isr_vector)) directive. If either is wrong, the MCU faults before executing a single line of startup code.
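The two words the hardware reads can be sanity-checked on a built image before flashing. The following is a minimal host-side sketch, not part of any vendor tooling: the memory sizes and the function name `vector_table_looks_valid` are hypothetical, and the real ranges should come from the MEMORY block of your linker script. It relies on two facts: the initial SP must point into RAM, and a Cortex-M reset vector must have bit 0 set because the core only executes Thumb code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory layout, for illustration only.
 * Take the real values from the MEMORY block of your linker script. */
#define EX_FLASH_BASE 0x08000000u
#define EX_FLASH_SIZE (1024u * 1024u)
#define EX_RAM_BASE   0x20000000u
#define EX_RAM_SIZE   (128u * 1024u)

/* Read one little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns true if the first two vector table words look plausible. */
bool vector_table_looks_valid(const uint8_t *image, size_t len)
{
    assert(image != NULL);
    if (len < 8u) {
        return false; /* too short to hold SP and the reset vector */
    }

    uint32_t initial_sp = read_le32(image);     /* word 0: initial SP   */
    uint32_t reset_vec  = read_le32(image + 4); /* word 1: reset vector */

    /* SP must be word aligned and lie inside RAM (top of RAM is allowed). */
    bool sp_ok = (initial_sp % 4u == 0u) &&
                 (initial_sp > EX_RAM_BASE) &&
                 (initial_sp <= EX_RAM_BASE + EX_RAM_SIZE);

    /* Bit 0 of the vector selects Thumb state and must be set. */
    bool thumb_ok = (reset_vec & 1u) != 0u;

    /* The handler itself must live inside flash. */
    uint32_t target = reset_vec & ~1u;
    bool addr_ok = (target >= EX_FLASH_BASE) &&
                   (target < EX_FLASH_BASE + EX_FLASH_SIZE);

    return sp_ok && thumb_ok && addr_ok;
}
```

Feeding the first eight bytes of a .bin file to this function catches a missing or misplaced .isr_vector section before the board is involved.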
Step 1: Reset Handler Entry
The reset handler is the first code that runs. A condensed view of a typical STM32 startup file:
g_pfnVectors:
.word _estack /* Initial SP: top of RAM */
.word Reset_Handler /* First instruction after reset */
/* ... remaining interrupt vectors ... */
Reset_Handler:
ldr sp, =_estack /* Reload SP (safety: handles software reset entry) */
bl SystemInit /* Configure clock tree */
/* Copy .data and zero .bss (see below) */
bl __libc_init_array /* C++ global constructors */
bl main /* Enter application */
b . /* Infinite loop if main() returns */
_estack is the linker-script symbol for the top of RAM: the address just past the last byte of RAM. Cortex-M uses a full-descending stack, so the first push stores immediately below this address and the stack grows downward from the top of RAM.
Step 2: SystemInit() — Clock Configuration
SystemInit() is a vendor-supplied C function in system_stm32f4xx.c. It runs at the default post-reset clock frequency (typically the internal HSI RC oscillator, commonly 16 MHz on STM32F4 — confirm in the specific device's reference manual) and brings the MCU to its target operating frequency.
On a typical STM32F405 running at 168 MHz, SystemInit():
- Resets the RCC registers to their reset defaults.
- Enables HSE (the external crystal oscillator) and waits for it to lock.
- Configures the PLL with the correct M/N/P dividers to generate 168 MHz from HSE.
- Sets Flash wait states to match the new CPU frequency (5 wait states are typical at 168 MHz — the STM32F4 reference manual RM0090 specifies the required wait states per frequency range).
- Switches SYSCLK to PLL.
- Updates the global SystemCoreClock variable to 168000000.
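The divider arithmetic behind the PLL step can be sketched in a few lines of C. This is an illustration under stated assumptions, not ST's code: the 8 MHz crystal and the values M = 8, N = 336, P = 2 are a commonly used combination chosen here as an example, and the type and function names are hypothetical. The real values depend on the crystal fitted to your board.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three dividers named in the text. */
typedef struct {
    uint32_t m; /* input divider: source clock / M feeds the VCO */
    uint32_t n; /* VCO multiplier                                */
    uint32_t p; /* output divider for SYSCLK                     */
} pll_config_t;

/* VCO output frequency: (source / M) * N. */
uint32_t pll_vco_hz(uint32_t src_hz, pll_config_t cfg)
{
    assert(cfg.m != 0u);
    return (src_hz / cfg.m) * cfg.n;
}

/* SYSCLK when the PLL is selected as the system clock: VCO / P. */
uint32_t pll_sysclk_hz(uint32_t src_hz, pll_config_t cfg)
{
    assert(cfg.p != 0u);
    return pll_vco_hz(src_hz, cfg) / cfg.p;
}
```

With an assumed 8 MHz HSE, `pll_sysclk_hz(8000000u, (pll_config_t){8u, 336u, 2u})` evaluates to 168000000, which is the value SystemCoreClock should hold after the switch.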
If SystemInit() is absent or misconfigured, every subsequent timing calculation is wrong. HAL delays use SystemCoreClock to calibrate SysTick: if the MCU is running at 16 MHz but SystemCoreClock says 168 MHz, the SysTick reload value is about 10.5× too large for the real clock, so each nominal 1 ms tick lasts about 10.5 ms and HAL_Delay() runs roughly 10× too long. UART baud rate generators use the peripheral clock, which is derived from SYSCLK, so a wrong SYSCLK produces wrong baud rates on every peripheral simultaneously. For the full breakdown of the clock sources (HSE/HSI/PLL), AHB/APB bus dividers, and flash wait-state requirements that SystemInit() configures, see how the STM32 clock tree works.
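The baud-rate effect is easy to quantify. The sketch below uses a simplified model (baud = peripheral clock / divisor, as with 16× oversampling) and hypothetical function names. The 42 MHz peripheral clock is an assumption for a 168 MHz system with an APB divider of 4, and 16 MHz stands for the case where the MCU is still on its reset-default oscillator.

```c
#include <assert.h>
#include <stdint.h>

/* Divisor a driver would program, rounded to nearest, based on the
 * peripheral clock it believes it has. */
uint32_t uart_divisor(uint32_t assumed_pclk_hz, uint32_t baud)
{
    assert(baud != 0u);
    return (assumed_pclk_hz + baud / 2u) / baud;
}

/* Baud rate that divisor really produces on the clock actually running. */
uint32_t uart_actual_baud(uint32_t actual_pclk_hz, uint32_t divisor)
{
    assert(divisor != 0u);
    return actual_pclk_hz / divisor;
}
```

A divisor computed for 115200 baud at an assumed 42 MHz is 365. On a real 16 MHz clock that divisor yields about 43835 baud, far outside the tolerance of any receiver expecting 115200.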
Step 3: Copy .data from Flash to RAM
After clock configuration, startup code copies every explicitly-initialised global and static variable from flash to RAM.
In C (the startup file implements this in assembly):
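extern uint32_t _sidata, _sdata, _edata; /* symbols exported by the linker script */

const uint32_t *src = &_sidata; /* LMA: initial values stored in flash binary */
uint32_t *dst = &_sdata; /* VMA: where variables live at runtime in RAM */
while (dst < &_edata) {
    *dst++ = *src++; /* copy one 32-bit word per iteration */
}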
_sidata, _sdata, and _edata are linker-generated symbols. _sidata is the flash address where the .data section's initial values are stored (set by LOADADDR(.data) in the linker script). _sdata and _edata mark the start and end of the RAM range where those variables live at runtime.
Example: int retry_count = 3; compiles to the value 3 stored in flash at _sidata + some_offset. After the copy loop, RAM address _sdata + some_offset also contains 3, and all subsequent reads and writes of retry_count use the RAM address.
Without this step, every global variable with an explicit initialiser contains whatever power-on garbage was in RAM. The code compiles and links without error; the bug is invisible until runtime. Typical symptom: unpredictable behaviour that changes between power cycles.
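The copy can be tried on a desktop machine by letting ordinary arrays stand in for flash and RAM. This is a sketch for experimentation only: `copy_data_section` and the demo function are hypothetical names, and on target the three arguments would be `&_sidata`, `&_sdata`, and `&_edata`.

```c
#include <assert.h>
#include <stdint.h>

/* Copy initial values word by word, as the startup loop does. */
void copy_data_section(const uint32_t *src, uint32_t *dst,
                       const uint32_t *dst_end)
{
    assert(src != 0 && dst != 0 && dst_end != 0);
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Host-side demonstration: arrays play the roles of flash and RAM. */
uint32_t demo_retry_count_after_copy(void)
{
    /* "Flash": the load image holding the initialiser values. */
    static const uint32_t fake_flash[2] = { 3u, 0x12345678u };
    /* "RAM": arbitrary power-on contents before startup runs. */
    uint32_t fake_ram[2] = { 0xA5A5A5A5u, 0x5A5A5A5Au };

    copy_data_section(fake_flash, fake_ram, fake_ram + 2);
    return fake_ram[0]; /* plays the role of retry_count */
}
```

Before the call, the slot that represents retry_count holds the garbage pattern; after it, the slot holds 3.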
Step 4: Zero-Initialise .bss
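extern uint32_t _sbss, _ebss; /* symbols exported by the linker script */

uint32_t *p = &_sbss;
while (p < &_ebss) {
    *p++ = 0; /* clear one 32-bit word per iteration */
}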
The .bss section holds global and static variables declared without an explicit initialiser: int counter;, static uint8_t rx_buffer[512];. The C standard requires these to be zero at program start. RAM is not guaranteed to contain zero after power-on. The startup code fulfils the C standard's guarantee by writing zero across the entire [_sbss, _ebss) range.
Without this step, uninitialised globals contain random RAM contents. Bugs are intermittent: on a cold power cycle, RAM content is somewhat random; on a warm reset, RAM retains the previous session's values. A check like if (first_run_flag == 0) may or may not detect first boot correctly depending on the previous RAM state.
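The warm-reset failure can be reproduced on a host by treating an array as leftover RAM. This is a hedged sketch with hypothetical names: `first_boot_detected` models the `first_run_flag == 0` check from the text, with the flag assumed to sit in the first word of the region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Zero every word in [start, end), as the startup loop does. */
void zero_bss_section(uint32_t *start, const uint32_t *end)
{
    assert(start != 0 && end != 0);
    while (start < end) {
        *start++ = 0u;
    }
}

/* Simulates a first-boot check against RAM left over from a previous run.
 * fake_bss[0] plays the role of first_run_flag. */
bool first_boot_detected(uint32_t *fake_bss, unsigned words,
                         bool run_zero_loop)
{
    assert(fake_bss != 0 && words > 0u);
    if (run_zero_loop) {
        zero_bss_section(fake_bss, fake_bss + words);
    }
    return fake_bss[0] == 0u;
}
```

With stale contents and the zero loop skipped, the check reports "not first boot" even though the firmware has only just started. With the loop enabled the result is deterministic.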
Step 5: __libc_init_array() — C++ Constructors
__libc_init_array();
If the firmware uses C++ with global objects, each object's constructor must run before main(). The linker collects constructor function pointers into the .init_array section; __libc_init_array() walks that table and calls each constructor in order.
For pure-C firmware, this call is a no-op — the .init_array section is empty. It is always safe to include and is present in all ST-supplied startup files.
The order matters: both the .data copy and the .bss zero must complete before __libc_init_array() runs. The two loops touch separate RAM regions, so their order relative to each other does not matter. A C++ object whose constructor reads a zero-initialised global must find that global already zeroed, not still containing RAM garbage.
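The table walk itself is simple. The sketch below is a simplified stand-in for what __libc_init_array() does with .init_array, not the library's actual source: the table here is an ordinary array and all names are hypothetical, whereas on target the bounds would be linker-provided symbols.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*init_fn_t)(void);

/* Call every entry in [start, end) in table order. */
void run_init_array(const init_fn_t *start, const init_fn_t *end)
{
    assert(start != NULL && end != NULL);
    for (const init_fn_t *fn = start; fn < end; ++fn) {
        (*fn)();
    }
}

/* Demonstration "constructors" that record the order they ran in. */
int g_init_log[2];
int g_init_count;

void ctor_a(void) { g_init_log[g_init_count++] = 1; }
void ctor_b(void) { g_init_log[g_init_count++] = 2; }

/* Stand-in for the function pointer table the linker would collect. */
const init_fn_t demo_init_table[] = { ctor_a, ctor_b };
```

Note that g_init_count is a zero-initialised global: if this walk ran before .bss was cleared, the first constructor would index g_init_log with whatever value RAM happened to hold.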
Step 6: main()
After all initialisation is complete, the startup code branches to main():
bl main /* call application entry point */
b . /* spin if main() returns (should not happen) */
At this point, main() can rely on:
- A valid stack pointer.
- All global variables having correct initial values.
- The MCU running at its target frequency.
- C++ global objects being fully constructed.
See what are interrupts in embedded systems for how the interrupt vector table, which the linker script places at the start of flash, connects hardware interrupt events to C handler functions once the application is running.
Inspecting Startup Completion in the Debugger
A practical technique: set a breakpoint at the opening brace of main() and inspect global variable values in the debugger's variable watch before any application code runs. If .data copy worked correctly, explicitly-initialised globals should already contain their initialiser values. If they instead contain leftover patterns such as 0xCCCCCCCC or 0xDEADBEEF, or any other unexpected values, the most likely causes are that the copy loop did not run or that it used the wrong linker symbols.
To check .bss zeroing: inspect the raw memory view at _sbss. All bytes should be 0x00. Non-zero values indicate the zero-init loop was skipped or used the wrong symbol range.
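The same two checks can be made from code when no debugger is attached. This is a minimal sketch with hypothetical names: one canary variable is expected to land in .data and one in .bss, and both are declared volatile so the compiler cannot fold the comparison away at build time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical canaries: the first lands in .data, the second in .bss. */
static volatile uint32_t data_canary = 0xC0FFEE01u;
static volatile uint32_t bss_canary;

/* True if startup copied .data and zeroed .bss for these two variables. */
bool startup_canaries_ok(void)
{
    return (data_canary == 0xC0FFEE01u) && (bss_canary == 0u);
}

/* True if every word in [start, end) is zero. On target this could be
 * called with &_sbss and &_ebss as the first statement of main(). */
bool region_is_zero(const uint32_t *start, const uint32_t *end)
{
    assert(start != 0 && end != 0);
    for (const uint32_t *p = start; p < end; ++p) {
        if (*p != 0u) {
            return false;
        }
    }
    return true;
}
```

In a C++ project, global constructors may legitimately have written to .bss objects before main() starts, so the whole-region check is only meaningful for pure-C firmware or when run before __libc_init_array().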
Design Considerations
- Do not use global variables in SystemInit(). SystemInit() runs before .data is copied to RAM, so any global variable it reads still contains uninitialised RAM content. SystemInit() should use only local variables and direct register writes.
- Verify the FPU is enabled before any floating-point operations. On Cortex-M4F and M7 devices, the floating-point unit is disabled at reset. STM32's SystemInit() enables it via the CPACR register. If a global C++ object's constructor performs floating-point arithmetic before SystemInit() enables the FPU, it generates a fault. Ensure SystemInit() runs before __libc_init_array().
- Semihosting in startup code will hang without a connected probe. ARM semihosting (printf routed through the debugger before any UART is configured) relies on an attached debug probe to intercept each request. On a standalone board, a semihosting call in startup code halts or faults the MCU and never returns. Replace it with UART or SWO ITM output for field firmware.
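The FPU enable mentioned above comes down to setting four bits. The sketch below takes the register as a pointer so it can be exercised on a host. On a Cortex-M4F or M7 target the argument would be the CPACR register at address 0xE000ED88, and the write is normally followed by DSB and ISB barrier instructions. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 20-23 of CPACR grant full access to coprocessors CP10 and CP11,
 * which together make up the FPU. */
#define CPACR_CP10_CP11_FULL (0xFu << 20)

/* Grant full access to the FPU. */
void fpu_enable(volatile uint32_t *cpacr)
{
    assert(cpacr != 0);
    *cpacr |= CPACR_CP10_CP11_FULL;
}

/* Returns non-zero if both coprocessor fields are set to full access. */
int fpu_is_enabled(const volatile uint32_t *cpacr)
{
    assert(cpacr != 0);
    return (*cpacr & CPACR_CP10_CP11_FULL) == CPACR_CP10_CP11_FULL;
}
```

Calling `fpu_is_enabled()` at the top of a constructor that uses floating point is a cheap way to confirm the ordering requirement during bring-up.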
Common Mistakes
- Calling application code from SystemInit(). Functions called from SystemInit() execute before .data and .bss are initialised. Reading any initialised global variable in this window produces garbage.
- Mismatched linker symbol names. If the startup assembly references _start_data but the linker script exports _sdata, the link normally fails with an undefined-reference error. If the symbol is declared weak, or a stale definition of the same name exists elsewhere, the link can still succeed with the address resolved to zero or to the wrong location, and the .data copy then reads or writes the wrong memory. The MCU boots silently with corrupted global state.
- Forgetting __libc_init_array() in a C++ project. Without this call, C++ global objects are not constructed. Their member functions will operate on uninitialised member variables. The bug is often intermittent and dependent on RAM power-on content.
- Using the wrong startup file for the MCU variant. Each STM32 device has a specific startup file because _estack (the RAM top address), the vector table length, and FPU initialisation requirements differ between variants. Using the STM32F405 startup file for an STM32F401 (which has less RAM) results in a stack pointer pointing past the end of physical RAM.
- Misjudging what a warm reset preserves. On most embedded systems, a warm reset (NVIC_SystemReset() or a watchdog reset) re-executes the startup file from the reset handler. RAM keeps its contents across the reset, but the startup code zeroes .bss and recopies .data again, which is correct for ordinary variables. RAM used as a communication channel between the old and new firmware (e.g. a boot flag in a specific RAM location) must be in a section explicitly excluded from the .bss zero loop and the .data copy.
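A boot flag that survives a warm reset can be sketched as follows. Several things here are assumptions rather than anything the toolchain provides by default: the section name `.noinit` must be defined in your linker script outside the [_sbss, _ebss) range (typically marked NOLOAD), and the magic value, struct, and function names are hypothetical. Because RAM is random after a cold power-on, the mailbox carries a magic word and an inverted copy so that garbage is not mistaken for a request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Place the mailbox in a section the startup loops do not touch.
 * On a host build the attribute is dropped so the sketch still compiles. */
#if defined(__arm__) && defined(__GNUC__)
#define NOINIT __attribute__((section(".noinit")))
#else
#define NOINIT
#endif

#define BOOT_MAGIC 0xB007F1A6u /* arbitrary marker value */

typedef struct {
    uint32_t magic;       /* BOOT_MAGIC when the mailbox is valid */
    uint32_t request;     /* payload passed across the reset      */
    uint32_t request_inv; /* bitwise inverse of request           */
} boot_mailbox_t;

NOINIT boot_mailbox_t g_boot_mailbox;

/* Called by the old firmware just before triggering a reset. */
void boot_mailbox_write(boot_mailbox_t *mb, uint32_t request)
{
    assert(mb != 0);
    mb->magic = BOOT_MAGIC;
    mb->request = request;
    mb->request_inv = ~request;
}

/* Called early after reset. Returns true only for an intact mailbox,
 * and always invalidates it so a request is acted on once. */
bool boot_mailbox_take(boot_mailbox_t *mb, uint32_t *request_out)
{
    assert(mb != 0 && request_out != 0);
    bool valid = (mb->magic == BOOT_MAGIC) &&
                 (mb->request == (uint32_t)~mb->request_inv);
    if (valid) {
        *request_out = mb->request;
    }
    mb->magic = 0u;
    return valid;
}
```

Typical use is `boot_mailbox_write(&g_boot_mailbox, request)` followed by NVIC_SystemReset() in the old firmware, and `boot_mailbox_take(&g_boot_mailbox, &request)` near the top of main() after the reset.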
Frequently Asked Questions
- Can I skip SystemInit() and configure clocks in main() instead?
- Technically yes, but there is a window of risk. Between reset and the clock configuration call in main(), the MCU runs at its default post-reset oscillator frequency — typically the internal RC oscillator at 4–16 MHz depending on the device (check the specific MCU reference manual). Any code that runs in this window — global constructors, early peripheral init — executes at the default speed. On STM32, HAL_Init() expects SysTick to be calibrated to the actual CPU clock; if SysTick is configured before the clock is set, HAL timeout calculations are wrong. ST places SystemInit() in the startup file to eliminate this window.
- What happens if main() returns in bare-metal firmware?
- There is no OS to return to. The startup code shown above handles this with an infinite loop immediately after the 'bl main' instruction, so the program counter spins indefinitely rather than running off into unmapped memory or past the end of flash, which would cause a hard fault. Some projects replace this with NVIC_SystemReset() for an emergency restart, or deliberately let the [watchdog timer](/questions/what-is-a-watchdog-timer) time out to trigger a controlled reset.
- Why is the startup file written in assembly rather than C?
- The earliest startup operations must run before the C runtime is valid — before the stack pointer is set and before globals can be used. Assembly gives precise control over those first instructions. Setting SP and branching to SystemInit() must happen in a specific order that a C compiler with optimisations enabled might not preserve. Once the stack is valid and the minimum environment is established, all remaining work is done via C function calls.
Related Questions
What Is a Linker Script and What Does It Do?
A linker script controls where firmware code and data land in flash and RAM. Covers MEMORY regions, SECTIONS, LMA/VMA, and the startup symbols it exports.
How Does the Memory Map Work in an Embedded Microcontroller?
The Cortex-M memory map assigns flash, RAM, and peripherals to fixed address regions. Covers STM32 layout, volatile keyword, and how linker scripts map to it.
How Does the STM32 Clock Tree Work?
The STM32 clock tree routes HSE or HSI through a PLL to generate SYSCLK, then divides it across AHB and APB buses. Learn how it works and how to configure it.
What Is a Watchdog Timer and How Do You Use It?
A watchdog timer resets an MCU when firmware hangs. Covers IWDG vs WWDG on STM32, prescaler setup, kick strategy, and window mode for fault detection.
What Are Interrupts in Embedded Systems and How Do They Work?
Interrupts let a microcontroller respond to hardware events instantly without polling. Learn how ISRs, NVIC priority, and interrupt latency work.
What Is a Microcontroller (MCU)?
A microcontroller (MCU) combines a CPU, flash, RAM, and peripherals on one chip. Learn how MCUs work and how they differ from microprocessors and FPGAs.