Programmers Stack Exchange is a question and answer site for professional programmers interested in conceptual questions about software development.

I am studying compilers and interpreters intensively. I want to check whether my basic understanding is right, so let's assume the following:

I have a language called "Foobish" whose only statement has the form

OUTPUT 'TEXT', <Number_of_Repeats>;

So if I want to print Hello World to the console 10 times, I would write the following in a file called Hello World.foobish:

OUTPUT 'Hello World', 10;

Now I write an interpreter in the language of my choice (C# in this case):

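using System;
using System.IO;
using System.Text.RegularExpressions;

namespace FoobishInterpreter
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            // Read the script file, e.g. "Hello World.foobish", given on the command line.
            string source = File.ReadAllText(args[0]);

            // Analyze/tokenize the statement: OUTPUT 'TEXT', <Number_of_Repeats>;
            Match statement = Regex.Match(source, @"^\s*OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;\s*$");
            if (!statement.Success)
            {
                Console.Error.WriteLine("Syntax error: expected OUTPUT 'TEXT', <Number_of_Repeats>;");
                return;
            }

            string outputString = statement.Groups[1].Value;     // token 0: the text
            int repeats = int.Parse(statement.Groups[2].Value);  // token 1: the repeat count

            // Execute the statement directly; no target program is produced.
            for (var i = 0; i < repeats; i++)
            {
                Console.WriteLine(outputString);
            }
        }
    }
}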

At a very simple level, then, the interpreter analyzes the script file and executes the Foobish program in whatever way the interpreter implements it.

A compiler would create machine language which runs on the physical hardware directly?

So an interpreter doesn't produce machine language but a compiler does it for its input?

Do I misunderstand anything about the basic way compilers and interpreters work?

What do you think the C# "compiler" does? As a hint, it doesn't produce machine code. – Philip Kendall yesterday
A Java compiler produces code for the JVM. So the target machine of a compiler can be a virtual machine that is not executed directly by the hardware. The main difference between interpreter and compiler is that a compiler first checks and translates the whole source code into a target machine language. This compiled code is then executed by the machine it was meant for. On the other hand, an interpreter will translate and execute chunks of your program on the fly. – Giorgio yesterday
    
@Giorgio: You mean, like a JIT? – Robert Harvey yesterday
@RobertHarvey: I meant the Java Compiler (javac): as far as I know it produces bytecode for the JVM. And, again AFAIK, the JIT later (at runtime) compiles some bytecode that is used very often into native machine language. – Giorgio yesterday
A compiler means translating. It can emit all kinds of languages: C, assembly, JavaScript, machine code. – Esben Skov Pedersen yesterday

6 Answers

The terms "interpreter" and "compiler" are much fuzzier than they used to be. Many years ago it was more common for compilers to produce machine code to be executed later, while interpreters more or less "executed" the source code directly. So those two terms were well understood back then.

But today there are many variations on the use of "compiler" and "interpreter." For example, VB6 "compiles" to byte code (a form of Intermediate Language), which is then "interpreted" by the VB Runtime. A similar process takes place in C#: the compiler produces CIL, which a Just-In-Time compiler (JIT) then translates to machine code while the program runs; in the old days, that runtime step would have been thought of as an interpreter. You can "freeze-dry" the output of the JIT into an actual binary executable by using NGen.exe, the product of which would have been the result of a compiler in the old days.

So the answer to your question is not nearly as straightforward as it once was.

Further Reading
Compilers vs. Interpreters on Wikipedia

@Giorgio: Most interpreters nowadays don't actually execute the source code, but rather the output of an AST or something similar. Compilers have a similar process. The distinction is not nearly as clear-cut as you think it is. – Robert Harvey yesterday
So when I run gcc -S hello_world.c, the content of the resulting file hello_world.s is the result of interpreting hello_world.c? – Giorgio yesterday
"You can "freeze-dry" the output of the JIT into an actual binary executable by using NGen.exe, the product of which would have been the result of a compiler in the old days.": But it is still today the result of a compiler (namely, the just-in-time compiler). It does not matter when the compiler is run, but what it does. A compiler takes as input a representation of a piece of code and outputs a new representation. An interpreter will output the result of executing that piece of code. These are two different processes, no matter how you mix them and when you execute what. – Giorgio yesterday
"Compiler" is simply the term they've chosen to attach to GCC. They chose not to call NGen a compiler, even though it produces machine code, preferring instead to attach that term to the previous step, which could arguably be called an interpreter, even though it produces machine code (some interpreters do that as well). My point is that nowadays there is no binding principle that you can invoke to definitively call something a compiler or interpreter, other than "that's what they've always called it." – Robert Harvey yesterday
As my very limited understanding goes, these days x86 CPUs are halfway to being hardware-based JIT engines anyway, with the assembly bearing an ever-fading relation to what exactly gets executed. – Leushenko 22 hours ago

The summary I give below is based on "Compilers: Principles, Techniques, & Tools", Aho, Lam, Sethi, Ullman (Pearson International Edition, 2007), pages 1, 2, with the addition of some ideas of my own.

The two basic mechanisms for processing a program are compilation and interpretation.

Compilation takes as input a source program in a given language and outputs a target program in a target language.

source program --> | compiler | --> target program

If the target language is machine code, it can be executed directly on some processor:

input --> | target program | --> output

Compilation involves scanning and translating the entire input program (or module) and does not involve executing it.
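To make this concrete for the Foobish example from the question, here is a minimal C# sketch of a compiler. It assumes a Foobish file may contain several OUTPUT statements, and it picks C# source text as the target language; the class name FoobishCompiler and that choice of target are illustrative assumptions. The sketch reads the whole program and emits an equivalent target program; nothing is printed while it runs.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical Foobish-to-C# compiler: translates the whole program into
// equivalent C# source text. Nothing is executed during compilation.
public static class FoobishCompiler
{
    // One Foobish statement: OUTPUT '<text>', <repeats>;
    private static readonly Regex Statement =
        new Regex(@"OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;");

    public static string Compile(string foobishSource)
    {
        var target = new StringBuilder();
        target.AppendLine("using System;");
        target.AppendLine("internal static class CompiledFoobish");
        target.AppendLine("{");
        target.AppendLine("    private static void Main()");
        target.AppendLine("    {");
        foreach (Match m in Statement.Matches(foobishSource))
        {
            string text = Escape(m.Groups[1].Value);
            int repeats = int.Parse(m.Groups[2].Value);
            // Emit a loop in the target language instead of running one now.
            target.AppendLine($"        for (var i = 0; i < {repeats}; i++)");
            target.AppendLine($"            Console.WriteLine(\"{text}\");");
        }
        target.AppendLine("    }");
        target.AppendLine("}");
        return target.ToString();
    }

    // Make the text safe to embed in a C# string literal.
    private static string Escape(string s) =>
        s.Replace("\\", "\\\\").Replace("\"", "\\\"").Replace("\r", "\\r").Replace("\n", "\\n");
}
```

Compiling OUTPUT 'Hello World', 10; yields a C# program containing a ten-iteration loop. That target program still has to be compiled by the C# compiler and run before anything appears on the console.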

Interpretation takes as input the source program and its input, and produces the source program's output:

source program, input --> | interpreter | --> output

Interpretation usually involves processing (analyzing and executing) the program one statement at a time.
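For contrast, a minimal sketch of a Foobish interpreter in C#, again assuming a file may hold several OUTPUT statements (the class name FoobishInterpreter and the TextWriter parameter are illustrative choices). It analyzes one statement, executes it, and only then looks at the next; the \G anchor makes each regex match start exactly where the previous statement ended.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical Foobish interpreter: analyzes one statement, executes it,
// then moves on to the next. No target program is produced.
public static class FoobishInterpreter
{
    // \G anchors each match at the position where the previous statement ended.
    private static readonly Regex Statement =
        new Regex(@"\G\s*OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;");

    public static void Run(string source, TextWriter output)
    {
        int pos = 0;
        while (!string.IsNullOrWhiteSpace(source.Substring(pos)))
        {
            Match m = Statement.Match(source, pos);
            if (!m.Success)
                throw new FormatException($"Syntax error at offset {pos}");

            // Execute this statement immediately.
            int repeats = int.Parse(m.Groups[2].Value);
            for (var i = 0; i < repeats; i++)
                output.WriteLine(m.Groups[1].Value);

            pos = m.Index + m.Length;
        }
    }
}
```

Because each statement runs before the next one is analyzed, a program like OUTPUT 'A', 1; PRINT 'B'; writes A and only then reports the syntax error.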

In practice, many language processors use a mix of the two approaches. E.g., Java programs are first translated (compiled) into an intermediate program (byte code):

source program --> | translator | --> intermediate program

The output of this step is then executed (interpreted) by a virtual machine:

intermediate program + input --> | virtual machine | --> output
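The same two-stage split can be sketched for Foobish. The instruction set below (SetCounter, JumpIfZero, Print, DecJumpNotZero) and all type names are invented for illustration and are not JVM bytecode; the point is only the separation between a translator that produces an intermediate program and a virtual machine that fetches, decodes and executes it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical toy bytecode, loosely in the spirit of javac + JVM.
public enum Op { SetCounter, JumpIfZero, Print, DecJumpNotZero }

public record Instr(Op Opcode, int Arg);

public static class FoobishVm
{
    private static readonly Regex Statement =
        new Regex(@"OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;");

    // Stage 1: translator. Produces bytecode plus a constant pool of strings.
    public static (List<Instr> Code, List<string> Constants) Translate(string source)
    {
        var code = new List<Instr>();
        var constants = new List<string>();
        foreach (Match m in Statement.Matches(source))
        {
            constants.Add(m.Groups[1].Value);
            int start = code.Count;
            code.Add(new Instr(Op.SetCounter, int.Parse(m.Groups[2].Value)));
            code.Add(new Instr(Op.JumpIfZero, start + 4));        // skip the loop when the count is 0
            code.Add(new Instr(Op.Print, constants.Count - 1));   // loop body
            code.Add(new Instr(Op.DecJumpNotZero, start + 2));    // counter--, repeat body if nonzero
        }
        return (code, constants);
    }

    // Stage 2: virtual machine. Fetch, decode, execute until the code runs out.
    public static void Execute(List<Instr> code, List<string> constants, TextWriter output)
    {
        int pc = 0, counter = 0;
        while (pc < code.Count)
        {
            Instr ins = code[pc++];
            switch (ins.Opcode)
            {
                case Op.SetCounter: counter = ins.Arg; break;
                case Op.JumpIfZero: if (counter == 0) pc = ins.Arg; break;
                case Op.Print: output.WriteLine(constants[ins.Arg]); break;
                case Op.DecJumpNotZero: if (--counter != 0) pc = ins.Arg; break;
            }
        }
    }
}
```

The translated program can be stored and executed many times without re-reading the Foobish source, which is the practical payoff of the intermediate step.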

To complicate things even further, the JVM can perform just-in-time compilation at runtime to convert byte code into another format, which is then executed.
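.NET offers a convenient way to see compilation happening while a program runs: System.Linq.Expressions. The sketch below builds an expression tree from a Foobish program at run time and calls Compile() to obtain a delegate. On runtimes that support dynamic code generation, Compile() emits IL that the CLR's JIT then turns into machine code; this is an analogy to the JVM's JIT, not a description of how the JVM works. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq.Expressions;
using System.Text.RegularExpressions;

// Hypothetical runtime compiler: Foobish -> expression tree -> delegate.
public static class FoobishJit
{
    private static readonly Regex Statement =
        new Regex(@"OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;");

    public static Action<TextWriter> CompileAtRuntime(string source)
    {
        ParameterExpression writer = Expression.Parameter(typeof(TextWriter), "writer");
        var writeLine = typeof(TextWriter).GetMethod("WriteLine", new[] { typeof(string) });
        var variables = new List<ParameterExpression>();
        var body = new List<Expression>();

        foreach (Match m in Statement.Matches(source))
        {
            // Build: i = 0; loop { if (i < repeats) { writer.WriteLine(text); i++; } else break; }
            ParameterExpression i = Expression.Variable(typeof(int), "i");
            LabelTarget done = Expression.Label();
            variables.Add(i);
            body.Add(Expression.Assign(i, Expression.Constant(0)));
            body.Add(Expression.Loop(
                Expression.IfThenElse(
                    Expression.LessThan(i, Expression.Constant(int.Parse(m.Groups[2].Value))),
                    Expression.Block(
                        Expression.Call(writer, writeLine, Expression.Constant(m.Groups[1].Value)),
                        Expression.PostIncrementAssign(i)),
                    Expression.Break(done)),
                done));
        }

        body.Add(Expression.Empty()); // a block needs at least one expression
        return Expression.Lambda<Action<TextWriter>>(Expression.Block(variables, body), writer).Compile();
    }
}
```

The returned delegate runs the translated code directly; the Foobish text is not consulted again after CompileAtRuntime returns.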

Also, even when you compile to machine language, your binary file is run by an interpreter: the one implemented by the underlying processor. Therefore, even in this case you are using a hybrid of compilation + interpretation.

So real systems use a mix of the two, and it is difficult to say whether a given language processor is a compiler or an interpreter, because it will probably use both mechanisms at different stages of its processing. In such cases it would probably be more appropriate to use another, more neutral term.

Nevertheless, compilation and interpretation are two distinct kinds of processing, as described in the diagrams above.

To answer the original questions:

A compiler would create machine language which runs on the physical hardware directly?

Not necessarily. A compiler translates a program written for a machine M1 into an equivalent program written for a machine M2. The target machine can be implemented in hardware or be a virtual machine. Conceptually there is no difference. The important point is that a compiler looks at a piece of code and translates it to another language without executing it.

So an interpreter doesn't produce machine language but a compiler does it for its input?

If by "produce" you mean the output, then a compiler produces a target program, which may be in machine language; an interpreter does not.

In other words: an interpreter takes a program P and produces its output O, a compiler takes P and produces a program P′ that outputs O; interpreters often include components that are compilers (e.g., to a bytecode, an intermediate representation, or JIT machine instructions) and likewise a compiler may include an interpreter (e.g., for evaluating compile-time computations). – Jon Purdy yesterday
    
"a compiler may include an interpreter (e.g., for evaluating compile-time computations)": Good point. I guess Lisp macros and C++ templates might be pre-processed in this way. – Giorgio yesterday
    
Even simpler, the C preprocessor compiles C source code with CPP directives into plain C, and includes an interpreter for boolean expressions such as defined A && !defined B. – Jon Purdy yesterday
    
@JonPurdy I would agree with that, but I would also add a class, "traditional interpreters", that don't make use of intermediate representations beyond perhaps a tokenized version of the source. Examples would be shells, many BASICs, classic Lisp, Tcl prior to 8.0, and bc. – hobbs 22 hours ago
@naxa - see Lawrence's answer and Paul Draper's comments on types of compiler. An assembler is a special kind of compiler where (1) the output language is intended for direct execution by a machine or virtual machine and (2) there is a very simple one-to-one correspondence between input statements and output instructions. – Jules 12 hours ago
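Jon Purdy's point that a compiler may contain an interpreter can be illustrated with the kind of condition he mentions, defined A && !defined B. Below is a minimal recursive-descent evaluator sketch in C#; the class name and grammar are assumptions, only defined, !, &&, || and parentheses are supported, and characters the tokenizer does not recognize are silently skipped. Its result is a value, not a translated program, which is what makes it an interpreter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical evaluator for preprocessor-style conditions such as
// "defined A && !defined B". It interprets the condition directly.
public sealed class ConditionEvaluator
{
    private readonly List<string> _tokens;
    private readonly ISet<string> _macros;
    private int _pos;

    private ConditionEvaluator(string text, ISet<string> macros)
    {
        _tokens = Regex.Matches(text, @"[A-Za-z_]\w*|&&|\|\||[!()]")
                       .Cast<Match>().Select(m => m.Value).ToList();
        _macros = macros;
    }

    public static bool Evaluate(string text, ISet<string> definedMacros)
    {
        var e = new ConditionEvaluator(text, definedMacros);
        bool result = e.Or();
        if (e._pos != e._tokens.Count) throw new FormatException("Unexpected trailing tokens");
        return result;
    }

    private bool Peek(string t) => _pos < _tokens.Count && _tokens[_pos] == t;

    private string Next() =>
        _pos < _tokens.Count ? _tokens[_pos++] : throw new FormatException("Unexpected end");

    // or := and ('||' and)*   (the right operand is always parsed, so no short-circuit here)
    private bool Or()
    {
        bool v = And();
        while (Peek("||")) { _pos++; v = And() || v; }
        return v;
    }

    // and := unary ('&&' unary)*
    private bool And()
    {
        bool v = Unary();
        while (Peek("&&")) { _pos++; v = Unary() && v; }
        return v;
    }

    // unary := '!' unary | '(' or ')' | 'defined' NAME | 'defined' '(' NAME ')'
    private bool Unary()
    {
        string t = Next();
        if (t == "!") return !Unary();
        if (t == "(")
        {
            bool v = Or();
            if (Next() != ")") throw new FormatException("Expected ')'");
            return v;
        }
        if (t == "defined")
        {
            bool paren = Peek("(");
            if (paren) _pos++;
            bool isDefined = _macros.Contains(Next());
            if (paren && Next() != ")") throw new FormatException("Expected ')'");
            return isDefined;
        }
        throw new FormatException($"Unexpected token '{t}'");
    }
}
```

A preprocessor built around such an evaluator would still be a compiler overall (it outputs plain C), with this small interpreter used internally to decide which lines to keep.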

A compiler would create machine language

No. A compiler is simply a program which takes as its input a program written in language A and produces as its output a semantically equivalent program in language B. Language B can be anything; it doesn't have to be machine language.

A compiler can compile from:

  • a high-level language to another high-level language (e.g. GWT, which compiles Java to ECMAScript)
  • a high-level language to a low-level language (e.g. Gambit, which compiles Scheme to C)
  • a high-level language to machine code (e.g. GCJ, which compiles Java to native code)
  • a low-level language to a high-level language (e.g. Clue, which compiles C to Java, Lua, Perl, ECMAScript and Common Lisp)
  • a low-level language to another low-level language (e.g. the Android SDK, which compiles JVML bytecode to Dalvik bytecode)
  • a low-level language to machine code (e.g. the C1X compiler which is part of HotSpot, which compiles JVML bytecode to machine code)
  • machine code to a high-level language (any so-called "decompiler", also Emscripten, which compiles LLVM machine code to ECMAScript)
  • machine code to a low-level language (e.g. the JIT compiler in JPC, which compiles x86 native code to JVML bytecode)
  • native code to native code (e.g. the JIT compiler in PearPC, which compiles PowerPC native code to x86 native code)

Note also that "machine code" is a really fuzzy term for several reasons. For example, there are CPUs which natively execute JVM byte code, and there are software interpreters for x86 machine code. So, what makes one "native machine code" but not the other? Also, every language is code for an abstract machine for that language.

There are many specialized names for compilers that perform special functions. Despite the fact that these are specialized names, all of these are still compilers, just special kinds of compilers:

  • if language A is perceived to be at roughly the same level of abstraction as language B, the compiler might be called a transpiler (e.g. a Ruby-to-ECMAScript-transpiler or an ECMAScript2015-to-ECMAScript5-transpiler)
  • if language A is perceived to be at a lower level of abstraction than language B, the compiler might be called a decompiler (e.g. an x86-machine-code-to-C-decompiler)
  • if language A == language B, the compiler might be called an optimizer, obfuscator, or minifier (depending on the particular function of the compiler)
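A compiler whose input and output language are the same can be sketched with Foobish. This hypothetical optimizer (name and rules are assumptions, not from the answer) drops statements that repeat zero times and fuses consecutive statements with the same text; both rewrites preserve what the program prints, and the output is Foobish again.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical Foobish-to-Foobish optimizer: source and target language are the same.
public static class FoobishOptimizer
{
    private static readonly Regex Statement =
        new Regex(@"OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;");

    public static string Optimize(string source)
    {
        var kept = new List<(string Text, int Repeats)>();
        foreach (Match m in Statement.Matches(source))
        {
            string text = m.Groups[1].Value;
            int repeats = int.Parse(m.Groups[2].Value);
            if (repeats == 0) continue;            // prints nothing, so drop it
            int last = kept.Count - 1;
            if (last >= 0 && kept[last].Text == text)
                kept[last] = (text, kept[last].Repeats + repeats);  // fuse with previous (overflow ignored)
            else
                kept.Add((text, repeats));
        }
        // Emit Foobish again, one statement per line.
        return string.Join("\n", kept.Select(s => $"OUTPUT '{s.Text}', {s.Repeats};"));
    }
}
```

For example, OUTPUT 'Hi', 2; OUTPUT 'Hi', 3; OUTPUT 'No', 0; becomes the single statement OUTPUT 'Hi', 5;.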

which runs on the physical hardware directly?

Not necessarily. It could be run in an interpreter or in a VM. It could be further compiled to a different language.

So an interpreter doesn't produce machine language but a compiler does it for its input?

An interpreter doesn't produce anything. It just runs the program.

A compiler produces something, but it doesn't necessarily have to be machine language, it can be any language. It can even be the same language as the input language! For example, Supercompilers, LLC has a compiler that takes Java as its input and produces optimized Java as its output. There are many ECMAScript compilers which take ECMAScript as their inputs and produce optimized, minified, and obfuscated ECMAScript as their output.




I think you should drop the notion of "compiler versus interpreter" entirely. They are different tools, built for different purposes.

  • A compiler is a transformer: It transforms a computer program written in a source language and outputs an equivalent in a target language. Usually, the source language is higher-level than the target language; if it's the other way around, we often call that kind of transformer a decompiler.
  • An interpreter is an execution engine. It executes a computer program written in one language, according to the specification of that language. We mostly use the term for software (but in a way, a classical CPU can be viewed as a "hardware interpreter" for its machine code).

Programming languages are abstract things, described in abstract terms. To make them useful in the real world, we create implementations of them.

In the past, a programming language implementation often consisted of just a compiler (and the CPU it generated code for) or just an interpreter, so it may have looked like these two kinds of tools are mutually exclusive. Today, you can clearly see that this isn't the case (and it never was to begin with). Taking a sophisticated programming language implementation and attempting to attach the name "compiler" or "interpreter" to it will often lead you to inconclusive results.

A single programming language implementation can involve any number of compilers and interpreters, often in multiple forms (standalone, on-the-fly), any number of other tools, like static analyzers and optimizers, and any number of steps. It can even include entire implementations of any number of intermediate languages (that may be unrelated to the one being implemented).

Examples of implementation schemes include:

  • A C compiler that transforms C to x86 machine code, and an x86 CPU that executes that code.
  • A C compiler that transforms C to LLVM IR, an LLVM backend compiler that transforms LLVM IR to x86 machine code, and an x86 CPU that executes that code.
  • A C compiler that transforms C to LLVM IR, and an LLVM interpreter that executes LLVM IR.
  • A Java compiler that transforms Java to JVM bytecode, and a JRE with an interpreter that executes that code.
  • A Java compiler that transforms Java to JVM bytecode, and a JRE with both an interpreter that executes some parts of that code and a compiler that transforms other parts of that code to x86 machine code, and an x86 CPU that executes that code.
  • A Java compiler that transforms Java to JVM bytecode, and an ARM CPU that executes that code.
  • A C# compiler that transforms C# to CIL, a CLR with a compiler that transforms CIL to x86 machine code, and an x86 CPU that executes that code.
  • A Ruby interpreter that executes Ruby.
  • A Ruby environment with both an interpreter that executes Ruby and a compiler that transforms Ruby to x86 machine code, and an x86 CPU that executes that code.

...and so on.


Here's a simple conceptual disambiguation between compilers and interpreters.

Consider 3 languages: programming language, P (what the program is written in); domain language, D (for what goes on with the running program); and target language, T (some third language).

Conceptually,

  • a compiler translates P to T so that you can evaluate T(D); whereas

  • an interpreter evaluates P(D) directly.
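This notation can be made concrete with a toy language that, unlike Foobish, takes an input. Everything below is hypothetical: a program P is a list of instructions such as add 3; mul 2 applied to an integer input D, and the target T is a .NET delegate built from closures (closure compilation). The interpreter computes P(D) directly; the compiler produces T once, which can then be evaluated as T(D) for any D.

```csharp
using System;
using System.Linq;

// Hypothetical toy language P: instructions "add n" / "mul n" applied to an integer input D.
public static class TwoWays
{
    // Interpreter: evaluates P(D) directly, re-reading the program text on every run.
    public static int Interpret(string program, int input)
    {
        int acc = input;
        foreach (var (op, arg) in Parse(program))
            acc = op == "add" ? acc + arg : acc * arg;
        return acc;
    }

    // Compiler: translates P once into a target T (a delegate), later applied as T(D).
    public static Func<int, int> Compile(string program)
    {
        Func<int, int> target = x => x;
        foreach (var (op, arg) in Parse(program))
        {
            Func<int, int> prev = target;   // capture the function built so far
            target = op == "add"
                ? (Func<int, int>)(x => prev(x) + arg)
                : x => prev(x) * arg;
        }
        return target;
    }

    private static (string Op, int Arg)[] Parse(string program) =>
        program.Split(';')
               .Select(s => s.Trim())
               .Where(s => s.Length > 0)
               .Select(s => s.Split(' ', StringSplitOptions.RemoveEmptyEntries))
               .Select(p => (p[0] == "add" || p[0] == "mul") && p.Length == 2
                   ? (p[0], int.Parse(p[1]))
                   : throw new FormatException($"Bad instruction: {string.Join(" ", p)}"))
               .ToArray();
}
```

For every program and input, Interpret(P, D) and Compile(P)(D) should agree; the difference is only whether the program text is processed on each run or once up front.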

Most modern interpreters don't actually evaluate the source language directly, but rather some intermediate representation of the source language. – Robert Harvey yesterday
@RobertHarvey That doesn't change the conceptual distinction between the terms. – Lawrence yesterday
So what you're really referring to as the interpreter is the part that evaluates the intermediate representation. The part that creates the intermediate representation is a compiler, by your definition. – Robert Harvey yesterday
@RobertHarvey Not really. The terms are dependent on the level of abstraction you're working at. If you look underneath, the tool could be doing anything. By analogy, say you go to a foreign country and bring a bilingual friend Bob along. If you communicate with the locals by talking to Bob who in turn talks to the locals, Bob acts as an interpreter to you (even if he scribbles in their language before talking). If you ask Bob for phrases and Bob writes them in the foreign language, and you communicate with the locals by referring to those writings (not Bob) Bob acts as a compiler for you. – Lawrence yesterday
Excellent answer. Worth noting: Nowadays you may hear "transpiler". That's a compiler where P and T are at similar levels of abstraction, for some definition of similar. (E.g. an ES5-to-ES6 transpiler.) – Paul Draper yesterday

While the lines between compilers and interpreters have gotten fuzzy over time, one can still draw a line between them by looking at the semantics of what the program should do and what the compiler/interpreter does.

A compiler will generate another program (typically in a lower-level language like machine code) which, if that program is run, will do what your program should do.

An interpreter will do what your program should do.

With these definitions, the places where it gets fuzzy are the cases where your compiler/interpreter can be thought of as doing different things depending on how you look at it. For example, Python takes your Python code and compiles it into Python bytecode. If this bytecode is run through a Python bytecode interpreter, it does what your program was supposed to do. In most situations, however, Python developers think of both of those steps as one big step, so they think of the CPython interpreter as interpreting their source code, and the fact that it got compiled along the way is considered an implementation detail. In this way, it's all a matter of perspective.

