The part of the code in the ATmega core that calls setup() and loop() is as follows:
```cpp
#include <Arduino.h>

int main(void)
{
    init();

#if defined(USBCON)
    USBDevice.attach();
#endif

    setup();

    for (;;) {
        loop();
        if (serialEventRun) serialEventRun();
    }

    return 0;
}
```
Pretty simple, but the serialEventRun() check adds overhead on every pass through the loop.
Let's compare two simple sketches:
```cpp
void setup()
{
}

volatile uint8_t x;

void loop()
{
    x = 1;
}
```
and
```cpp
void setup()
{
}

volatile uint8_t x;

void loop()
{
    while (true)
    {
        x = 1;
    }
}
```
The volatile variable x is only there to ensure the assignment isn't optimised out.
The assembly produced for the two sketches differs:

You can see that the while(true) version just performs an rjmp (relative jump) back a few instructions, whereas returning from loop() means the for (;;) in main() has to perform a subtraction, a comparison and a call around each loop() invocation. That is 4 instructions versus 1.
To generate an assembly listing like this, use a tool called avr-objdump, which is included with the avr-gcc toolchain. Its location varies by OS, so the easiest way to find it is to search for it by name.
avr-objdump can operate on .hex files, but these lack the original source and comments. If you have just built your code, you will also have a .elf file that does contain this information. Again, the location of these files varies by OS; the easiest way to find them is to enable verbose output during compilation in Preferences and look at where the output files are written.
Run the command as follows:
```
avr-objdump -S output.elf > asm.txt
```
The -S option disassembles the code and interleaves it with the source lines it came from. Then examine asm.txt in a text editor.