Sign up ×
Computer Science Stack Exchange is a question and answer site for students, researchers and practitioners of computer science. It's 100% free, no registration required.

Per recommendation I am reposting this from Stack Overflow.

Recently I have been thinking about following issue.

Consider a code for a standard "Hello world!" program:

main()
{
    printf("Hello World");

}

Now almost any change in this code will make it completely useless, in fact almost every change will prevent the code from compiling. For example:

main(5
{
    printf("Hello World");

}

Now to the actual question. Is there a programming language where every possible combination of symbols, that is every expression, makes sense? I tried thinking about some sort of solution and came up with two:

  1. Post fix with limited number of variables. Essentially all variables are already defined before you write any code and you have to work just with them. Theoretically you can than perform arbitrary number of operations by forming a chain of many simple programs, each one of them feeding results to others. Code could be written as series of characters in postfix notation;

  2. "Postfix" with stack of variables. Variables are stored in stack, every operation takes two variables from top and puts the result on their place. Program ends when it reaches last operation all variable.

Personally I hate both of these. Not only they are limited they are inelegant. They are not even real solutions, more like workarounds, essentially "offshoring" some work to external process.

Does anyone have any other idea how to solve this problem?

share|cite|improve this question
36  
Given a compiler $C$, create a new compiler $C'$ which works as follows: given source $s$, pass it to $C$. If $C$ is happy with it and produces an executable, then that is that, but if $C$ complains then output an executable which prints out You are a bimbo. The compiler $C'$ accepts every string as a valid program. – Andrej Bauer 2 days ago
    
in a way its not a "problem"... except a contrived one. nothing in the post indicates its a "problem". – vzn 2 days ago
1  
BF needs a matching [ ] commands (According to the Wiki Page). My thought was to look at the CPU opcodes. But even then, some patterns may yield a problem (e.g., if an opcode is 3 bits, but your program is only 2 bits.) Except for this issue of possibly padding with some extra 0 bits, one can think on any CPU with a complete opcode set that will satisfy the claim "every string is a valid program". Maybe meaningless, but still valid. – Ran G. 2 days ago
1  
Let your hardware be a Z-80 CPU with 64k of RAM. Write a compiler that simply copies the ASCII-coded source code into the 64k memory (truncating or zero-padding if necessary). This compiler never gives a syntax error. – Ben Crowell 2 days ago
2  
Without further conditions, this question is boring: Andrej's comment and David's answer give "trivial" answers. You have to nail down more precisely what you want. – Raphael yesterday

9 Answers 9

up vote 24 down vote accepted

Redcode, the assembly language behind codewars, was explicitly written to have very few halting instructions, because the code often gets mangled before it finally gives out, and the more opportunities it has to halt, the less interesting the game is.

You see very few such languages in practice because we don't just want a program to run, we want it to run in the way we expect. If you can make a typo and change the way the program ran, it must be acceptably close to the original expected behavior, or the programmers with seethe in frustration.

There is some precedence for such things by using natural languages rather than formal languages, but it's not what I would call a large field when you compare it to the use of formal languages. If you're interested in such programming languages, the natural language processing community is where I'd look.

Another field you could look at is genetics. There are remarkably few genetic sequences which are simply invalid. Plenty of them that aren't very effective at reproductions, but very few invalid ones.

share|cite|improve this answer
1  
Genetics doesn't seem like a good example. In terms of valid or invalid, are you just talking about replication? Because of course every string will be a valid program for a language in which the only possible instruction is replicate this string. It's not really a meaningful programming language, though, as it's nowhere near Turing Complete. – tel 2 days ago
1  
@tel: Cort is probably talking about protein synthesis via mRNA, rather than replication. Pretty much any genetic sequence can be transcribed and then put into the protein synthesis machinery: whether the protein that comes out is sufficiently stable that it hasn't already degraded by the time it's done being built, and if so whether it does anything useful to the organism, is another matter... – Steve Jessop yesterday
2  
The genetic code isn't a code to reproduce itself. It's (generally) a code for a protein. Whether the protein is useful is often a different question. Of course it get's more interesting. Some bits of "code" in a genetic sequence end up being more like an instruction along the lines of "that code a few lines farther down - you should sometimes just ignore it." There's all sorts of cool "programs" cells and viruses have written fighting one another. – Joel yesterday
    
TECO is another real-world example. – cjm yesterday
1  
@cjm wow. "An API is finished not when you have finished adding everything in, but when you have finished taking everything out." Unless you're TECO, then you're finished when you run out of characters to assign meaning to. – Cort Ammon yesterday

The idea of a universal Turing machine uses just such a "programming language": a coding of Turing machines as natural numbers, represented for example in binary, such that every natural number denotes a Turing machine, i.e., program. In this language, every string of zeroes and ones is a program.

If you're worried that some numbers might code invalid programs, this can be side-stepped as follows. Imagine writing out all strings in the character set of your programming language (say, Java), in lexicographic order, starting with strings of length one, then two, then three, ... Then, make a new programming language by letting the number $n$ stand for the $n$th string in the list that's a valid Java program. In the new programming language, programs are just natural numbers and every natural number is a valid program.

I'm sure there are also esoteric programming languages where every string is a program; however, if you're just asking for a list of those, I think your question is off-topic here.

share|cite|improve this answer

Extending a programming language so that every expression makes sense is always possible, but not interesting. For example, you can just assign the meaning “do nothing” to any expression that the original language rejects.

Designing a programming language where every expression makes sense in a “you can execute it” way is not particularly useful. A good programming language is not just one where a monkey can type on a keyboard and write a valid program, but one where a programmer can easily write the program that they intended to write. Writing valid programs is not the difficult part of programming: the difficult part is writing a program that performs what was expected of it. Rejecting obviously incorrect programs is very helpful in this respect.

Another way to tackle this is to fully define the semantics of all possible inputs, including specifying what compile-time, load-time or run-time error should be generated for each input if any. That is, “abort the program after printing Syntax error at line 42 on the standard error stream” is part of the defined semantics of the language. Every expression “makes sense” in that it has a defined meaning. Is that a useful meaning? Maybe — after all, if the program is obviously wrong, rejecting it is useful.

share|cite|improve this answer

Check out Jot, a Turing-complete language based on combinatory logic, where every sequence of 0s and 1s (including an empty sequence) is a valid program.

share|cite|improve this answer
    
This is not a computer science answer. – Raphael yesterday
2  
@Abdulrhman It's straight forward to define a bijection between binary strings and natural numbers. So you can encode any program as a natural number if you want to. – CodesInChaos yesterday
1  
@Raphael Please elaborate, or suggest an improvement of the answer, I'll be happy to improve it if you provide grounds for your critique. – Petr Pudlák yesterday
    
w** , who removed my comment ... iam putting some effort into writing a comment then suddenly someone just removes it ! – Abdulrhman yesterday

One nice example is whitespace. In the language proper, any combination of operators are valid. The operators are space, tab and newline (specifically "\n"). All other characters are considered comments.

This answer, and indeed your question (as well as this entire web page) are examples of valid whitespace programs (though they may not do anything particularly interesting).

share|cite|improve this answer
    
I was just thinking about this after posting my brainfuck answer (yours is better since it's correct), but I'm wondering -- is an empty program still a program? (i.e. if those three characters are missing from the whole file stream). -- Like, if my car was missing all the things that made it a car, would it still be a car? – BrainSlugs83 yesterday
    
This is not a computer science answer. (Also, "every whitespace string" != "every string".) – Raphael yesterday
    
@Raphael: But every possible string (including ones that contain no whitespace) are valid whitespace programs - note that any character that's not a whitespace are simply comments in the whitespace programming language – slebetman yesterday
1  
@slebetman You were interpreting my brackets comment too literally. I was talking about unpaired loop tokens. Some similar problems in whitespace could be: Does returning without a preceding call work? (encoded as [LF][Tab][LF]) What happens if you pop an empty stack? What happens if you jump to an undefined label? What happens if you define duplicate labels? – CodesInChaos yesterday

I would like to address the idea that many posters have given, that such a language would be "useless". Perhaps it would be useless for humans to write, manually, with the intention of solving some particular task. However, despite being a majority use-case for programming languages, that's certainly not the only use-case. Several use-cases come to mind where such a language is useful, and we can look to those fields for examples of such languages.

Firstly Cort Ammon's allusion to genetics is spot on: the program transformation in the question (substituting ) for 5) can be seen as a mutation. This kind of manipulation is common in the field of evolutionary computation; in particular genetic algorithms perform such transformations on strings, whilst genetic programming transforms programs. In either case, we usually want to assign meaning to every possibility, since that will produce the most compact search space.

Genetic algorithms rely on some sort of evaluation function for strings; if we use a programming language interpreter as our evaluation function, then we have a scenario where a programming language which assigns meaning to all possible strings is useful. In genetic programming, it is assumed that our evaluation function is a programming language interpreter, but we may choose various representations for our programs; for example, many systems operate on abstract syntax trees. If we choose strings as our representation, then we recover the same scenario as with genetic algorithms.

Another situation where we may want every string to be a valid program is when enumerating programs. This is related to the bijection mentioned by CodesInChaos, but we may prefer to operate on strings rather than Natural numbers for several reasons:

  • If there is some structure in the language, eg. we can assign meaning to sub-strings, this may be lost when translating to Natural numbers. In this case we may prefer to use strings, in order to reason about and transform sub-strings locally, rather than representing the whole-program as a number. This is analogous to how we might prefer to use bitwise operations on an int rather than arithmetic expressions, when each bit has an individual meaning. This is basically a generalisation of the evolutionary scenario.
  • We may want to generate the programs on demand; for example, we might begin executing a program which is completely undetermined, and only generate (eg. randomly) the individual instructions (eg. characters) when/if the instruction pointer reaches them. This is common in algorithmic information theory, where the program is a Turing machine tape, and the aim is to characterise the behaviour of randomly-generated programs. For example, we can formulate the Solomonoff prior over arbitrary strings as the probability that a universal Turing machine with a random tape will output that string.

In terms of example languages, many evolutionary computation systems are based on stack languages like the Push family. These tend to allow arbitrary streams of tokens (which we could represent as individual characters). Sometimes (like with BrainSlugs83's Brainfuck example) there are restrictions on balancing parentheses; however, we can relate this to self-delimiting programs, in that a string like [ may not be a valid program, but it is a valid program prefix. If we imagine a compiler/interpreter reading source code from stdin, then it won't reject a string like [, it will simply wait for more input before continuing.

Languages like Binary Combinatory Logic and Binary Lambda Calculus arose directly out of work on algorithmic information theory, eg. from http://tromp.github.io/cl/cl.html

This design of a minimalistic universal computer was motivated by my desire to come up with a concrete definition of Kolmogorov Complexity, which studies randomness of individual objects.

share|cite|improve this answer

code-golfing languages like CJam (Java/C), Pyth (Python-based), Golfscript (C), etc which use about one character per instruction are very good at this. The program will run but almost certainly not with the previously anticipated result.

share|cite|improve this answer

Real programming languages are to convey meaning to people, not computers. As plenty of fun texts with almost randomly shuffled letters floating around the 'ńet show, people can read gibberish and make sense out of it, even without overtly noticing the mangling. Just think back how hard it is to find typos and other such errors in texts.

A programming language like what you ask for would make people understand what they want to read, not what is written down. Debugging in languages where there are a limited set of legal statements, where there is not much ambiguity possible, is already hard enough. Good languages reduce possible interpretations due to e.g. transposed symbols or typos. Natural languages are also notorious for their redundancy, for the same sort of reason.

share|cite|improve this answer

In the Brainfuck Programming Language, almost every possible binary expression can be interpreted as a program. -- That is, you could take a completely good program, type a bunch of garbage into it, and, it would still be compilable / interpretable without any issues.

(Edit: It turns out you have to match opening and closing square brackets, but otherwise, the above is true.)

It achieves this via these two simple methods:

  1. All of the commands it understands are a single ASCII character (and they are very limited -- all of them are punctuation symbols).

  2. All of the characters it does not understand, it discards as comments.

The programming language is Turing complete (meaning, it can do anything that any other language can do).

share|cite|improve this answer
1  
Not only does this not answer the question, it is not a computer science answer. – Raphael yesterday
    
You could redefine the language to handle those in some sensible way. e.g. by inserting enough openening brackets at the start of the program and closing brackets at the end of the program to make it balanced. It's straight forward to write an interpreter that handles programs as if those brackets existed without actually rewriting the program. Of course starting a brainfuck program with an opening bracket is rather useless, since it'll ignore everything up to the matching closing bracket. – CodesInChaos yesterday

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.