General
Software is generally loaded from files at runtime, either by the user typing the name of the program into a command line interpreter, or programmatically.
Compared to the monolithic microcontroller software approach, this makes it easy to allocate a separate protected memory block for each running program. Access errors are then easily caught, forcing the offending program to terminate while the rest of the system remains unaffected. The stability of the system is improved, and many hard-to-detect bugs are avoided completely.
The added flexibility makes it possible to unload and reload software as needed. A running system can then be serviced with ease: configuration files can be changed and executables updated on the fly, even across a network.
Details
When a program is loaded, it has to know where functions such as file open, read and write are located. One solution is to resolve the addresses of these functions in the program as it is loaded. This can be done in much the same way as relocation, but it is not what we want in an operating system. One reason is that kernel code would then be executed as if it were part of the user program, which makes it very hard to control access rights, enforce security and keep the system stable.
The solution is system calls. A system call uses an interrupt routine that is triggered by software instead of hardware. The user program sets up a few registers according to its arguments and executes a special instruction. This instruction puts the processor in a privileged handler mode and jumps to a fixed location in memory. The handler's job is then to figure out what the user program wanted to do based on its registers (open, read, write etc.) and perform that operation. Once it is done, it returns from the interrupt handler and the processor jumps back to the user code with the user program's access rights restored.
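The sequence above can be sketched in portable C. This is only a hosted simulation under stated assumptions, not the actual kernel code: the trap instruction is replaced by an ordinary function call, and the names struct regs, syscall_handler, user_write and the call numbers are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical call numbers; the text does not assign any. */
enum { SYS_WRITE = 1, SYS_GETPID = 2 };

/* The registers the user program fills in before trapping. */
struct regs {
    uintptr_t num;    /* which operation is requested */
    uintptr_t arg[3]; /* its arguments */
    intptr_t ret;     /* result, written by the handler */
};

/* Stand-in for a resource that only the kernel may touch. */
static char console[64];
static size_t console_len;

static intptr_t kernel_write(const char *buf, size_t len)
{
    if (len > sizeof console - console_len)
        return -1; /* no room left */
    memcpy(console + console_len, buf, len);
    console_len += len;
    return (intptr_t)len;
}

/* On real hardware this runs in privileged mode at a fixed address. */
void syscall_handler(struct regs *r)
{
    assert(r != NULL);
    switch (r->num) {
    case SYS_WRITE:
        r->ret = kernel_write((const char *)r->arg[0], (size_t)r->arg[1]);
        break;
    case SYS_GETPID:
        r->ret = 1; /* a single simulated process */
        break;
    default:
        r->ret = -1; /* unknown call number */
        break;
    }
}

/* User-side wrapper: set the registers, trap, read the result. */
long user_write(const char *buf, size_t len)
{
    struct regs r = { SYS_WRITE, { (uintptr_t)buf, len, 0 }, 0 };
    syscall_handler(&r); /* stands in for the special trap instruction */
    return (long)r.ret;
}
```

Note that user_write never refers to kernel_write or console directly. The only thing shared between the two sides is the register layout and the call number, which is the point of the mechanism.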
The effect of this is that the user program only has to know how to set its registers. It does not need access to anything in the kernel, nor does it have to know where the kernel is located. The system call mechanism is also the way x86 computers do it, and is essentially the standard way to separate kernel space from user space.
The embedded OS uses system calls in exactly the same manner. The number identifying the system call is encoded in the instruction as a byte, while the arguments are placed in registers r0 to r2. The last register, r3, is set as a pointer to errno for most calls so that error values are properly set. Because these registers are stacked and unstacked with the interrupt, they may also be modified from the kernel. As return values are set in r0, the kernel is able to set the return value of the system call as well.