Note that, in general, everything except what the IDE does is part of the normal compilation process. It is pretty much the same in a hosted environment as on an embedded processor, except that on the embedded processor you are using a cross compiler: a compiler that runs on the host but generates a binary that runs on a different platform (or is in a format that download tools can load onto the platform).
In terms of the compilation, logically you have the preprocessor step (where #defines, #ifdef, #include, etc. are handled), then the main compilation step that produces object files. I say logically, because most compilers now do preprocessing as part of the compilation step. The object files are then gathered together by the linker to make the executable. In a hosted environment, the executable is marked with the shared libraries that must be loaded, and a runtime loader loads the main program and the shared libraries into the address space and starts execution.

In an embedded environment you typically convert the executable into a format that the download process uses (S-records, plain ELF files, etc.). Some download agent writes the code to flash memory and arranges for the processor to start executing there when it powers up. Typically the first thing the executable does is copy some values from flash memory to read/write memory (the initial values of initialized variables), clear other parts of read/write memory (zero-initialized variables), set up the stack, and jump to the main program.
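To make that last step concrete, here is a minimal sketch of that kind of startup code, written as a bare-metal reset handler in C. The symbol names (_sidata, _sdata, _edata, _sbss, _ebss) are my assumptions; real projects get them from the linker script, and on many parts the stack pointer and vector table are set up by hardware or earlier code before this runs.

    /* Sketch of a bare-metal reset handler.  The extern symbols are assumed
     * to be defined by the linker script; names vary between toolchains. */
    #include <stdint.h>

    extern uint32_t _sidata;   /* image of initialized data, in flash      */
    extern uint32_t _sdata;    /* start of .data in read/write memory      */
    extern uint32_t _edata;    /* end of .data in read/write memory        */
    extern uint32_t _sbss;     /* start of .bss (zero-initialized) in RAM  */
    extern uint32_t _ebss;     /* end of .bss in RAM                       */

    extern int main(void);

    void reset_handler(void)
    {
        uint32_t *src = &_sidata;
        uint32_t *dst = &_sdata;

        /* Copy the initial values of initialized variables from flash to RAM. */
        while (dst < &_edata)
            *dst++ = *src++;

        /* Clear the zero-initialized region. */
        for (dst = &_sbss; dst < &_ebss; dst++)
            *dst = 0;

        /* Jump to the application. */
        main();

        /* main() should never return on a bare-metal target. */
        for (;;)
            ;
    }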
This is greatly simplified, of course, and at each step of the way there are differences. For example, GCC, in creating the object code, writes out a textual assembly file, and the assembler then reads that text and emits the binary representation of the functions, while other compilers produce the object code directly.
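You can watch these stages on any small program with standard GCC options; the file below is just a throwaway example of mine to feed to them.

    /* add.c -- a tiny program for looking at GCC's separate stages:
     *
     *   gcc -E add.c                  stop after the preprocessor (expanded source)
     *   gcc -S add.c                  stop after compilation proper (add.s, assembly text)
     *   gcc -c add.c                  also run the assembler (add.o, an object file)
     *   gcc add.c -o add              run the linker too, producing an executable
     *   gcc -save-temps add.c -o add  keep the intermediate .i and .s files
     */
    #include <stdio.h>

    #define SCALE 3                 /* handled by the preprocessor */

    static int add(int a, int b)
    {
        return SCALE * (a + b);     /* SCALE is gone before the compiler proper sees it */
    }

    int main(void)
    {
        printf("%d\n", add(1, 2));
        return 0;
    }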
In general, there are multiple ABIs (application binary interfaces) that come into play. ABIs, for instance, govern what the calling sequence is (which registers arguments are passed in and returned in, what the stack looks like, what sizes and representations the basic types have). Other ABIs govern what the object format looks like, etc. Sometimes there is an official group that decides on the contents of an ABI; sometimes it is more ad hoc.
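One ABI-governed detail you can easily observe is the size of the basic types. This small program (my own example, not tied to any particular ABI document) prints different numbers on different targets: long is typically 8 bytes under the x86-64 SysV ABI but 4 bytes on most 32-bit targets, and int is only 2 bytes on AVR.

    /* Print the sizes of a few basic types; the results are dictated by the
     * target's ABI, not by the C language itself. */
    #include <stdio.h>

    int main(void)
    {
        printf("int    : %zu bytes\n", sizeof(int));
        printf("long   : %zu bytes\n", sizeof(long));
        printf("pointer: %zu bytes\n", sizeof(void *));
        printf("double : %zu bytes\n", sizeof(double));
        return 0;
    }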
Within the compiler there are many passes, and depending on the compiler, the code goes through many transformations. Some of these transformations use the same internal format and add optimizations or flesh out details. Sometimes the transformation is from a more general, high-level representation to a more detailed, lower-level one.
For example, GCC's passes are listed at https://gcc.gnu.org/onlinedocs/gccint/Passes.html. Roughly, they are the parser/front end that produces the tree data structure; then the code goes through gimplification, then SSA (static single assignment) form, and then RTL, which is a representation close to the machine language. Finally the back end takes the RTL and emits assembly language.
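If you want to look at those intermediate representations yourself, GCC can dump them with its standard -fdump options; the function below is just a throwaway example.

    /* square.c -- compile with, for example,
     *
     *   gcc -c -fdump-tree-gimple -fdump-tree-ssa -fdump-rtl-expand square.c
     *
     * and GCC writes dump files next to the object file showing the GIMPLE
     * form, the SSA form, and the first RTL produced by the expand pass.
     * (The exact dump file names vary with the GCC version.) */
    int square(int x)
    {
        return x * x;
    }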
I would imagine any intro compilers class will go into more detail. I took mine in 1975 - 1979, and the texts I used back then are long out of date. I currently work on the PowerPC back end of GCC (I have worked on the x86 and MIPS back ends in the past, and several others). I have never worked on the ARM or AVR back ends, and for these hobby projects where I play with microprocessors, I don't go delving into the details. However, the recent AVR compilers use the infrastructure I previously worked on for the SPU/CELL processor for dealing with multiple address spaces. I have plenty to do in my day job without chasing other work. For my purposes, the compilers work, and I don't have to worry about it.