Question:

"Certain properties of a programming language may require that the only way to get the code written in it be executed is by interpretation. In other words, compilation to a native machine code of a traditional CPU is not possible. What are these properties?"

Compilers: Principles and Practice by Parag H. Dave and Himanshu B. Dave (May 2, 2012)

The book gives no clue about the answer. I tried to find the answer in Concepts of Programming Languages (Sebesta), but to no avail. Web searches were of little help too. Do you have any clue?

Famously, Perl can't even be parsed. Other than that, the claim seems to be trivially wrong without further assumptions: if there is an interpreter, I can always bundle interpreter and code in one executable, voila. –  Raphael yesterday
@Raphael: Nice idea, but ... 1) You're assuming the code is available prior to being executed. That doesn't hold for interactive use. Sure, you can use just-in-time compilation to native code on bash statements or PostScript stack contents, but it's a pretty crazy idea. 2) Your idea doesn't actually compile the code: the bundle isn't a compiled version of the code, but still an interpreter for the code. –  reinierpost 19 hours ago
In the good old days I had self-editing GW-BASIC programs (GW-BASIC stores BASIC programs in a kind of bytecode). I currently can't think of a sane way to compile those to native machine code while retaining their ability to edit themselves. –  PlasmaHH 16 hours ago
@PlasmaHH: Self Modifying Code goes back to 1948. The first compiler was written in 1952. The concept of self-modifying code was invented in native machine code. –  Mooing Duck 14 hours ago
@reinierpost Raphael is taking a theoretical stand on this issue. It has the merit of showing the conceptual limitations of the question. Compiling is translation from language S to language T. Language T could be an extension of S to which interpreting code in some other language can be added. So bundling S and its interpreter is a program in language T. It seems absurd to an engineer, but it shows that it is not easy to formulate the question meaningfully. How do you distinguish an acceptable compiling process from an unacceptable one (such as Raphael's) from an engineering point of view? –  babou 13 hours ago

5 Answers

The distinction between interpreted and compiled code is probably a fiction, as underlined by Raphael's comment:

the claim seems to be trivially wrong without further assumptions: if there is
an interpreter, I can always bundle interpreter and code in one executable ...

The fact is that code is always interpreted, by software, by hardware or a combination of both, and the compiling process cannot tell which it will be.

What you perceive as compilation is a translation process from one language $S$ (for source) to another language $T$ (for target). And, the interpreter for $S$ is usually different from the interpreter for $T$.

The compiled program is translated from one syntactic form $P_S$ to another syntactic form $P_T$, such that, given the intended semantics of the languages $S$ and $T$, $P_S$ and $P_T$ have the same computational behavior, up to a few things that you are usually trying to change, possibly to optimize, such as complexity or raw efficiency (time, space, surface, energy consumption). I am trying not to talk of functional equivalence, as it would require precise definitions.
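To make the translation view concrete, here is a minimal sketch (in Python, with illustrative names of my own choosing): arithmetic expressions play the role of the source language $S$, a small stack machine plays the role of the target language $T$, and a separate interpreter for $T$ runs the result.

    import ast

    def compile_expr(src):
        """Translate an arithmetic expression (language S) into
        stack-machine instructions (language T)."""
        ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
        code = []
        def emit(node):
            if isinstance(node, ast.Constant):
                code.append(("PUSH", node.value))
            elif isinstance(node, ast.BinOp):
                emit(node.left)
                emit(node.right)
                code.append((ops[type(node.op)],))
        emit(ast.parse(src, mode="eval").body)
        return code

    def run(code):
        """An interpreter for the target language T."""
        stack = []
        for instr in code:
            if instr[0] == "PUSH":
                stack.append(instr[1])
            else:
                b, a = stack.pop(), stack.pop()
                stack.append(a + b if instr[0] == "ADD"
                             else a - b if instr[0] == "SUB"
                             else a * b)
        return stack[0]

    print(run(compile_expr("2 * (3 + 4)")))  # 14

Note that $P_T$ (the instruction list) has the same behavior as $P_S$, and that executing it still requires an interpreter, here run, only now it is an interpreter for $T$ rather than for $S$.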

Some compilers have actually been used simply to reduce the size of the code, not to "improve" execution. This was the case for a language used in the PLATO system (though they did not call it compiling).

You may consider your code fully compiled if, after the compiling process, you no longer need the interpreter for $S$. At least, that is the only way I can read your question, as an engineering rather than theoretical question (since, theoretically, I can always rebuild the interpreter).

One thing that may raise problems, afaik, is meta-circularity. That is when a program manipulates syntactic structures in its own source language $S$, creating program fragments that are then interpreted as if they had been part of the original program. Since you can produce arbitrary program fragments in the language $S$ as the result of arbitrary computation manipulating meaningless syntactic fragments, I would guess you can make it nearly impossible (from an engineering point of view) to compile the program into the language $T$, so that it now generates fragments of $T$. Hence the interpreter for $S$ will be needed, or at least the compiler from $S$ to $T$ for on-the-fly compiling of generated fragments in $S$ (see also this document).
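As a hedged illustration of the difficulty (a Python sketch with made-up names): the fragment of $S$ below is assembled by arbitrary computation at run time, so a compiler to $T$ has no way to translate it ahead of time.

    import random

    def make_fragment():
        # A fragment of the source language, produced by arbitrary
        # computation over meaningless pieces of syntax:
        op = random.choice(["+", "-", "*"])
        return "def f(x):\n    return x " + op + " " + str(random.randint(1, 9))

    namespace = {}
    exec(make_fragment(), namespace)  # an interpreter/compiler for S is needed here
    print(namespace["f"](10))         # the behavior is decided only at run time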

But I am not sure how this can be formalized properly (and do not have time right now for it). And impossible is a big word for an issue that is not formalized.

"impossible is a big word" A very very big word. =) –  Brian S 17 hours ago
If one defines "compilation" to refer to a sequence of steps which take place entirely before an executing program receives its first input, and interpretation as being the process of having data control program flow via means which are not part of the program's abstract machine model, then for a language to be compiled it must be possible for the compiler to identify, before execution begins, every possible meaning a language construct could have. In languages where a language construct could have an unbounded number of meanings, compilation won't work. –  supercat 17 hours ago
If one defines compilation in those terms one lives in the 1960's. –  Andrej Bauer 5 hours ago
@AndrejBauer I assume you are answering the previous comment by supercat. Or does it include my answer? –  babou 3 hours ago
I am commenting on the answer. –  Andrej Bauer 2 hours ago

I think the authors are assuming that compilation means

  • the source program doesn't need to be present at run-time, and
  • no compiler or interpreter needs to be present at run-time.

Here are some sample features that would make it problematic if not "impossible" for such a scheme:

  1. If you can interrogate the value of a variable at run-time, by referring to the variable by its name (which is a string), then you will need the variable names to be around at run time.

  2. If you can call a function/procedure at run-time, by referring to it by its name (which is a string), then you will need the function/procedure names at run-time.

  3. If you can construct a piece of program at run-time (as a string), say by running another program, or by reading it from a network connection etc., then you will need either an interpreter or a compiler at run-time to run this piece of program.

Lisp has all three features. So, Lisp systems always have an interpreter loaded at run-time. The languages Java and C# have function names available at run time, and tables to look up what they mean. Probably languages like Basic and Python also have variable names at run time. (I am not 100% sure about that.)
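For concreteness, here is a small Python sketch of all three features (the names x, greet, etc. are illustrative, not from the answer):

    x = 42

    def greet(name):
        return "hello, " + name

    # 1. Interrogate a variable by its name (a string):
    print(globals()["x"])                  # 42

    # 2. Call a function by its name (a string):
    print(globals()["greet"]("world"))     # hello, world

    # 3. Construct and run a piece of program at run time:
    src = "y = x + 1"                      # could equally arrive over a network
    exec(src, globals())
    print(y)                               # 43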

What if the "interpreter" is compiled into the code? For example, using dispatch tables to call virtual methods, are these an example of interpretation or compilation? –  Erwin Bolwidt 9 hours ago
"no compiler or interpreter needs to be present at run-time", eh? Well if that's true, then in a deep sense, C can't be "compiled" on most platforms either. The C runtime doesn't have very much to do: startup, to set up stacks and so forth, and shutdown for atexit processing. But it still has to be there. –  Pseudonym 7 hours ago
"Lisp systems always have an interpreter loaded at run-time." – Not necessarily. Many Lisp systems have a compiler at runtime. Some don't even haven an interpreter at all. –  Jörg W Mittag 1 hour ago

The question is not actually about compilation being impossible. If a language can be interpreted¹, then it can be compiled in a trivial way, by bundling the interpreter with the source code. The question is asking what language features make this essentially the only way.

An interpreter is a program that takes source code as input and behaves as specified by the semantics of that source code. If an interpreter is required as part of the language's runtime environment, this means the language includes a way to interpret its own source code. This feature is called eval: either eval exists as a primitive, or it can be encoded in some way. Languages known as scripting languages usually include an eval feature, as do most Lisp dialects.

Just because a language includes eval doesn't mean that the bulk of it can't be compiled to native code. For example, there are optimizing Lisp compilers that generate good native code and that nonetheless support eval; eval'ed code may be interpreted, or may be compiled on the fly.
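CPython, for instance, illustrates the "compiled on the fly" option: code handed to eval is itself first compiled to bytecode at run time. A minimal sketch:

    src = "sum(i * i for i in range(10))"
    code_obj = compile(src, "<runtime>", "eval")  # S -> bytecode, at run time
    print(eval(code_obj))                         # 285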

eval is the ultimate needs-an-interpreter feature, but there are other features that require something short of an interpreter. Consider some typical phases of a compiler:

  1. Parsing
  2. Type checking
  3. Code generation
  4. Linking

eval means that all these phases have to be performed at runtime. There are other features that make native compilation difficult. Taking it from the bottom, some languages encourage late linking by providing ways in which functions (methods, procedures, etc.) and variables (objects, references, etc.) can depend on non-local code changes. This makes it difficult (but not impossible) to generate efficient native code: it's easier to keep object references as calls in a virtual machine, and let the VM engine handle the bindings on the fly.
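A sketch of such late binding in Python (illustrative names): the same call site ends up invoking different code after a non-local change, so the binding cannot be fixed once and for all at compile time.

    class Greeter:
        def greet(self):
            return "hello"

    g = Greeter()
    print(g.greet())                        # hello

    # A non-local code change rebinds the method after the fact:
    Greeter.greet = lambda self: "bonjour"
    print(g.greet())                        # bonjour -- same call site, new target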

Generally speaking, reflection tends to make languages difficult to compile to native code. An eval primitive is an extreme case of reflection; many languages don't go that far, but nonetheless have a semantics defined in terms of a virtual machine, allowing for example code to retrieve a class by name, inspect its inheritance, list its methods, call a method, etc. Java with JVM and C# with .NET are two famous examples. The most straightforward way to implement these languages is by compiling them to bytecode, but there are nonetheless native compilers (many just-in-time) that compile at least program fragments that don't use advanced reflection facilities.
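The reflection facilities mentioned above look like this in Python, for instance (using only standard-library calls; fractions.Fraction is just a convenient stand-in):

    import importlib, inspect

    cls = getattr(importlib.import_module("fractions"), "Fraction")  # class by name
    print(cls.__mro__)                                # inspect its inheritance
    names = [n for n, _ in inspect.getmembers(cls, callable)]
    print("limit_denominator" in names)               # list its methods -> True
    print(getattr(cls(1, 2), "limit_denominator")())  # call a method by name -> 1/2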

Type checking determines whether a program is valid. Different languages have different standards for how much analysis is performed at compile time vs. run time: a language is known as “statically typed” if it performs many checks before starting to run the code, and “dynamically typed” if it doesn't. Some languages include a dynamic cast feature or an unmarshall-and-typecheck feature; these features require embedding a typechecker in the runtime environment. This is orthogonal to the requirement of including a code generator or an interpreter in the runtime environment.
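An unmarshall-and-typecheck feature might look like the following Python sketch (the wire format and the load_point helper are hypothetical): the data arrives only at run time, so the check has to live in the runtime.

    import json

    def load_point(blob):
        obj = json.loads(blob)                # unmarshal
        if (not isinstance(obj, list) or len(obj) != 2
                or not all(isinstance(v, (int, float)) for v in obj)):
            raise TypeError("expected a pair of numbers")
        return (float(obj[0]), float(obj[1]))

    print(load_point("[1, 2.5]"))             # (1.0, 2.5)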

¹ Exercise: define a language that cannot be interpreted.


It is possible the current replies are "overthinking" the statement. Possibly what the authors are referring to is the following phenomenon. Many languages have an "eval"-like command; e.g., see JavaScript's eval, and its behavior is commonly studied as a special part of CS theory (e.g., in Lisp). The function of this command is to evaluate a string in the context of the language definition. Therefore it has, in effect, a similarity to a "built-in compiler". The compiler cannot know the contents of the string until runtime; therefore compiling the eval result into machine code is not possible at compile time.
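In miniature (a Python sketch, since the same point holds there): the string handed to eval is assembled at run time, so no ahead-of-time compiler could have translated it.

    import sys

    expr = " + ".join(sys.argv[1:]) or "1 + 2"  # contents depend on the invocation
    print(eval(expr))                           # e.g. `python demo.py 3 4` -> 7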

Other answers point out that the distinction between interpreted and compiled languages can blur significantly in many cases, especially with more modern languages, like Java with its "just-in-time compiler" aka "HotSpot" (JavaScript engines, e.g. V8, increasingly use this same technique). "Eval-like" functionality is certainly one of the features that blurs it.

V8 is a good example. It is a pure compiler, there is never any interpretation going on. Yet it still supports the full semantics of ECMAScript, including unrestricted eval. –  Jörg W Mittag 1 hour ago

I feel the original question is not well formed. The authors of the question may have intended to ask a somewhat different question: what properties of a programming language facilitate writing a compiler for it?

For example, it's easier to write a compiler for a context-free language than a context-sensitive language. The grammar which defines a language can also have issues that make it challenging to compile, such as ambiguities. Such issues can be resolved but require extra effort. Similarly, languages defined by unrestricted grammars are harder to parse than context-sensitive languages (see Chomsky Hierarchy). To my knowledge most widely used procedural programming languages are close to context-free, but have a few context-sensitive elements, making them relatively easy to compile.
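One of those context-sensitive elements can be sketched as follows (a hypothetical Python toy modeled on the C "lexer hack": whether a statement parses as a declaration or an expression depends on a symbol table built from earlier context, which a pure context-free parser cannot consult):

    # In C, "a * b;" is a declaration if 'a' names a type (via typedef),
    # and a multiplication expression otherwise.
    typedef_names = {"a"}   # assume 'a' was declared with typedef earlier

    def classify(stmt):
        first = stmt.split()[0]
        return "declaration" if first in typedef_names else "expression"

    print(classify("a * b;"))  # declaration: b is a pointer to a
    print(classify("x * y;"))  # expression: multiplication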

The question is clearly intending to oppose/compare compilers and interpreters. While they may work differently, and usually do except for @Raphael's limit case above, they have exactly the same problems regarding syntax analysis and ambiguity. So syntax cannot be the issue. I also believe that syntactic problems are not usually the major concern in compiler writing nowadays, though they have been in the past. I am not the downvoter: I prefer commenting. –  babou 3 hours ago
