Programmers Stack Exchange is a question and answer site for professional programmers interested in conceptual questions about software development.

I was taking the course CMU 18-447, Computer Architecture, at Carnegie Mellon to brush up on my knowledge and concepts.

They say that most of the machine-level details and implementation are taken care of at the Instruction Set Architecture (ISA) level and abstracted away there.

Some Intel processors even have hardware-level translation layers that take the front-end ISA exposed to the programmer and translate it further, closer to the machine.

Given that such power is provided by the ISA/processor itself, why do compilers generate machine code directly? Or is that a black box, and internally they use assemblers to convert their output into machine code?

I hear that the JVM takes bytecode and translates it directly to machine code (an executable). Is this true, or is my understanding wrong here?


3 Answers


I find this definition pretty clear:

The Instruction Set Architecture (ISA) is the part of the processor that is visible to the programmer or compiler writer. The ISA serves as the boundary between software and hardware.

The ISA provides an abstraction of the actual microarchitecture, which is the implementation of the instruction set by the processor.

So when your compiler is said to produce machine code, it produces instructions for the configured instruction set, not the actual processor, so any processor implementing that instruction set can run that machine code.

A virtual machine such as the JVM or CLR runs code that has been compiled to bytecode specific to that virtual machine's architecture, Java bytecode and CIL in this case.

The runtime of such a virtual machine, which in turn is built for and runs on a specific processor architecture and OS, translates the bytecode to the machine code and API calls for the platform it runs on, and usually does so just-in-time.

    
@Greedy correct. The JVM executes .jars for example, which contain Java bytecode. The Java source code has already been compiled to this bytecode; the JVM will transform/compile the bytecode into machine code. See Machine code (Wiki) and Assembly code vs Machine code vs Object code? (SO). –  CodeCaster Dec 17 '13 at 11:15
    
Hmm, makes some sense now. Traditional compilers take that route, while bytecode/IL gets compiled directly to machine code. So the runtimes for the JVM/CLR are optimized to produce efficient machine code rather than optimized assembly? –  Greedy Coder Dec 17 '13 at 11:18
    
@Greedy please see the links in my comment, I think "Assembly code is plain-text and (somewhat) human read-able source code that has a mostly direct 1:1 analog with machine instructions" is relevant to your question. –  CodeCaster Dec 17 '13 at 11:19
I was editing my comment when you posted, sorry for that. Thanks for clearing this up, I now have a better picture of things :) –  Greedy Coder Dec 17 '13 at 11:21

Some Intel processors even have hardware-level translation layers that take the front-end ISA exposed to the programmer and translate it further, closer to the machine.

That concept is called microprogramming and has been around in theory since Charles Babbage and in implementation since the 1960s. Some processors aren't microprogrammed at all and use hardware to execute the instructions, others are entirely microprogrammed and some take a hybrid approach depending on what instruction is being executed.

Given such power is provided by the ISA/Processor itself...

That power exists in the processor but isn't provided. The API for a processor is its instruction set. The microprogramming, if there is any, is an implementation detail that you're not supposed to know or care about.

... why do compilers generate machine code directly, or is it just a black box that internally uses assemblers to convert the output into machine code?

I'm sure there are a few compilers that directly generate object code, but most generate assembly and then assemble it behind the scenes. Assembly is just a human-readable longhand for the corresponding object code that's a lot easier to generate and use for debugging or just understanding what the compiler produces. Odds are very good that if you're writing a compiler for a target, you'll already have a perfectly serviceable assembler (or cross-assembler) that's preferable to use over re-inventing that wheel yourself.

I hear that the JVM takes bytecode and translates it directly to machine code (an executable). Is this true, or is my understanding wrong here?

Early versions of the JVM were, essentially, emulators that would examine each bytecode instruction and carry out the operations needed. This is very similar to what microcode does, just implemented in software on a general-purpose computer. Early JVMs were also very slow, which led to the use of things like just-in-time compilation of bytecode to native instructions.

None of this matters, though, because the lingua franca for Java is Java bytecode. This is no different than the instruction set on a real CPU in that you don't care how it's implemented under the covers. There are, in fact, a number of CPUs called "Java processors" that can run Java bytecode directly, a variant of the ARM9 being one of them. Java software won't know or care whether it's being run on a Java processor or a JVM.


Compilers may output assembly code, linkable object code, or ready-to-run machine code; I've worked with examples of all three types. Although compilers which output assembly code may be more adaptable to different platforms than those which output object code, outputting assembly code is often slower than outputting machine code, and running the program will require that the assembly code undergo further processing steps. It really isn't much harder for a compiler to generate machine code than to generate assembly code; in some cases it may be easier.

As for whether to generate linkable or ready-to-run code, it's often faster and easier for a compiler to produce the latter; the only advantage of producing the former is that building a program will require recompiling only the parts that have changed, and then linking everything, versus having to feed everything through the compiler. If projects will generally be large, and the portion that changes on any given build small, then being able to selectively compile the parts of the code that have changed may be a 'win'. If, however, one is apt to want to rebuild most of the program on each build, producing runnable machine code may be faster than going through an intermediate link stage.

An additional benefit of producing runnable machine code is that a compiler may be able to take advantage of things it knows about the locations of objects in ways that would not be possible when generating relocatable code. For example, if foo and bar are declared in separate modules, a statement foo=bar; would require something like:

    ldr r0,[pc+(_addr_of_bar-*)]
    ldr r1,[pc+(_addr_of_foo-*)]
    ldr r2,[r0]
    str r2,[r1]
 ...somewhere within 2K of the above:
 _addr_of_bar: dw _bar
 _addr_of_foo: dw _foo

If variables foo and bar happen to be at known addresses which are within 4K of each other [e.g. 1600 bytes apart], the code could be simplified to something like:

    ldr r0,[pc+(_addr_of_foo_bar-*)]
    ldr r1,[r0+600]
    str r1,[r0-1000]
 ...somewhere within 2K of the above:
 _addr_of_foo_bar: dw _foo+1000

Such optimizations are only possible if the compiler knows how things are going to be placed.

