I'm trying to determine the technical details of why software produced for certain operating systems only works with those operating systems.

It is my understanding that binaries are specific to certain processors because of the processor-specific machine language they understand and the differing instruction sets between processors. But where does the operating system specificity come from? I used to assume it was the APIs provided by the OS, but then I saw this diagram in a book: [Diagram]

Operating Systems: Internals and Design Principles, 7th ed., W. Stallings (Pearson, 2012)

As you can see, APIs are not indicated as a part of the operating system.

If for example I build a simple program in C using the following code:

#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}

Is the compiler doing anything OS-specific when compiling this?

Do you print to a window? Or a console? Or to graphics memory? How do you put the data there? Looking at printf for the Apple ][+ would be quite different than for Mac OS 7, and again quite different than for Mac OS X (just sticking with one 'line' of computers). –  MichaelT yesterday
Because if you wrote that code for Mac OS 7, it would show up as text in a new window. If you did it on an Apple ][+, it would write to some segment of memory directly. On Mac OS X, it writes to a console. That's three different ways of handling the same code based on the execution environment, all handled by the library layer. –  MichaelT yesterday
@StevenBurnap yep - en.wikipedia.org/wiki/Aztec_C –  MichaelT yesterday
Your FFT function will happily run under Windows or Linux (on the same CPU), without even recompiling. But then how are you going to display the result? Using an operating system API, of course. (printf from msvcr90.dll is not the same as printf from libc.so.6) –  immibis 23 hours ago
Even if APIs are "not part of the operating system", they are still different if you go from one OS to the other. (Which, of course, raises the question of what the phrase "not part of the operating system" really means, according to the diagram.) –  Theodoros Chatzigiannakis 20 hours ago

8 Answers

As you can see, APIs are not indicated as a part of the operating system.

I think you are reading too much into the diagram. Yes, an OS will specify a binary interface for how operating system functions are called, and it will define a file format for executables, but it will also provide an API, in the sense of a catalog of functions that an application can call to invoke OS services.

I think the diagram is just trying to emphasize that operating system functions are usually invoked through a different mechanism than a simple library call. Most common operating systems use processor interrupts to access OS functions. Typical modern operating systems are not going to let a user program directly access any hardware. If you want to write a character to the console, you are going to have to ask the OS to do it for you. The system call used to write to the console varies from OS to OS, so right there is one example of why software is OS-specific.

printf is a function from the C run-time library, and in a typical implementation it is a fairly complex function. If you search, you can find the source for several versions online. See this page for a guided tour of one. Down in the grass, though, it ends up making one or more system calls, and each of those system calls is specific to the host operating system.
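To make that concrete, here is a minimal sketch of what happens if you bypass printf and call the OS-provided write interfaces directly. The POSIX write() and Win32 GetStdHandle()/WriteFile() calls are real APIs; the rest of the program is illustrative:

#ifdef _WIN32
#include <windows.h>

int main(void)
{
    /* Win32: ask the OS for the console handle, then ask it to write. */
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written;
    WriteFile(out, "Hello World\n", 12, &written, NULL);
    return 0;
}
#else
#include <unistd.h>

int main(void)
{
    /* POSIX: file descriptor 1 is standard output; write() traps into the kernel. */
    write(1, "Hello World\n", 12);
    return 0;
}
#endif

The two halves do the same job, but each compiles into a request aimed at a different kernel interface, which is exactly why the resulting binaries are not interchangeable.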

What if all the program did was add two numbers, with no input or output? Would that program still be OS-specific? –  Paul 23 hours ago
@Paul No, but, for example, all graphical work is OS-specific. Furthermore, some OSes are POSIX-compliant and others are not; filesystems are OS-dependent too: you can't open /home/user/file on a Windows OS, and you can't open C:\ProgramData\file on a Unix OS. –  DrakaSAN 22 hours ago
OSes are meant to put most hardware-specific stuff behind an abstraction layer. However, the OS itself (the abstraction) can differ from implementation to implementation. There's POSIX, which some OSes (more or less) adhere to, and maybe some others, but overall OSes simply differ too much in the "visible" part of their abstraction. As said before: you can't open /home/user on Windows, and you can't access HKEY_LOCAL_MACHINE\... on a *N*X system. You can write virtualization ("emulation") software to help bring these systems closer together, but that will always be "3rd party" (from the OS's point of view). –  RobIII 22 hours ago
@Paul Yes. In particular, the way it is packaged into an executable would be OS-specific. –  OrangeDog 21 hours ago
@TimSeguine I disagree with your example of XP vs 7. Much work is done by Microsoft to ensure the same API exists in 7 as existed in XP. Clearly what has happened here is that the program was designed to run against a certain API or contract, and the new OS adhered to the same API/contract. In the case of Windows, however, that API is very much proprietary, which is why no other OS vendor supports it. Even then, there are a heck of a lot of examples of programs that do NOT run on 7. –  ArTs 12 hours ago

You ask why, if code is specific to a CPU, it must also be specific to an OS. This is actually a more interesting question than many of the answers here have assumed.

CPU Security Model

The first program run on most CPU architectures runs inside what is called the inner ring, or ring 0. How a specific CPU architecture implements rings varies, but nearly every modern CPU has at least two modes of operation: one is privileged and runs "bare metal" code that can perform any legal operation the CPU supports; the other is untrusted and runs protected code that can perform only a defined safe set of capabilities. Some CPUs have far finer granularity, and in order to use virtual machines securely at least one or two extra rings are needed (often labelled with negative numbers), but this is beyond the scope of this answer.

Where the OS comes in

Early single tasking OSes

In very early DOS and other early single-tasking systems, all code ran in the inner ring. Every program you ran had full power over the whole computer and could do literally anything if it misbehaved, including erasing all your data, or even causing hardware damage in a few extreme cases, such as setting invalid display modes on very old monitors. Worse, this could be caused by merely buggy code, with no malice whatsoever.

This code was in fact largely OS-agnostic: as long as you had a loader capable of loading the program into memory (pretty simple for early binary formats) and the code did not rely on any drivers, implementing all hardware access itself, it would run under any OS, so long as it ran in ring 0. (A very simple OS like this, used only to run other programs while offering no additional functionality, is usually called a monitor.)

Modern multi tasking OSes

More modern operating systems, including UNIX, versions of Windows starting with NT, and various other now-obscure OSes, decided to improve on this situation. Users wanted additional features such as multitasking, so they could run more than one application at once, and protection, so that a bug (or malicious code) in an application could no longer cause unlimited damage to the machine and its data.

This was done using the rings mentioned above: the OS took the sole place in ring 0, and applications ran in the outer, untrusted rings, able to perform only a restricted set of operations that the OS allowed.

However, this increased utility and protection came at a cost. Programs now had to work with the OS to perform tasks they were not allowed to do themselves. They could no longer, for example, take direct control of the hard disk by accessing its memory and changing arbitrary data; instead, they had to ask the OS to perform these tasks for them, so that it could check that they were allowed to perform the operation, that they were not changing files that did not belong to them, and that the operation was valid and would not leave the hardware in an undefined state.

Each OS decided on a different implementation for these protections, based partly on the architecture the OS was designed for and partly on the design principles of the OS in question. UNIX, for example, focused on making machines good for multi-user use, while Windows was designed to be simpler and to run on slower hardware with a single user. The way user-space programs talk to the OS is also completely different on x86 than it is on ARM or MIPS, forcing a multi-platform OS to make decisions based around the need to work on the hardware it targets.

These OS-specific interactions are usually called "system calls", and they encompass how a user-space program interacts with the hardware through the OS. They differ fundamentally from one OS to another, so a program that does its work through system calls needs to be OS-specific.
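As an illustration of how low-level and OS-bound these are, here is a hedged sketch of issuing the write system call directly on x86-64 Linux, with no library at all. The syscall number (1 for write) and the register convention are Linux-specific; the same instruction sequence would mean something entirely different, or nothing at all, to another kernel:

/* x86-64 Linux only: issue the write(2) system call directly.
   The syscall number goes in rax (1 = write); arguments go in rdi, rsi, rdx. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1), "D"(fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
}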

The Program Loader

In addition to system calls, each OS provides a different method of loading a program from the secondary storage medium into memory. In order to be loadable by a specific OS, the program must contain a special header that describes to the OS how it may be loaded and run.

This header used to be simple enough that writing a loader for a different format was almost trivial. However, with modern formats such as ELF, which support advanced features such as dynamic linking and weak declarations, it is now nearly impossible for an OS to load binaries that were not designed for it. This means that, even setting aside the system call incompatibilities, it is immensely difficult even to place a program in RAM in a way in which it can be run.
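You can see the format difference in the first few bytes of any executable. This minimal sketch inspects only the magic numbers, which are real, documented values (0x7F followed by 'ELF' for ELF; 'MZ' for the DOS stub that begins every Windows PE file):

#include <stdio.h>
#include <string.h>

/* Peek at an executable's magic bytes to guess its format. */
int main(int argc, char **argv)
{
    unsigned char magic[4] = {0};
    FILE *f;

    if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }
    f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }
    fread(magic, 1, 4, f);
    fclose(f);

    if (memcmp(magic, "\x7f" "ELF", 4) == 0)
        printf("ELF binary (Linux, BSD, ...)\n");
    else if (magic[0] == 'M' && magic[1] == 'Z')
        printf("PE/MZ binary (Windows, DOS)\n");
    else
        printf("unknown format\n");
    return 0;
}

A loader, of course, must understand far more than the magic number, which is exactly why each OS can generally load only its own format.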

Libraries

Programs rarely use system calls directly, however; they almost exclusively gain their functionality through libraries, which wrap the system calls in a slightly friendlier format for the programming language. For example, C has the C standard library, provided by glibc under Linux and similar systems and by the Win32 libraries under Windows NT and above, and most other programming languages have similar libraries that wrap system functionality in an appropriate way.

These libraries can, to some degree, even overcome the cross-platform issues described above. There is a range of libraries, such as SDL, designed to provide a uniform platform to applications while internally managing calls to a wide range of OSes. This means that although programs cannot be binary-compatible, programs that use these libraries can share common source between platforms, making porting as simple as recompiling.
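Even without a full library like SDL, a program can hide OS differences behind its own thin wrapper at the source level. Here is a hedged sketch of a hypothetical sleep_ms() helper; the Sleep() and nanosleep() calls are real APIs, but the wrapper itself is illustrative. The same source compiles on either OS, yet yields a different, OS-specific binary each time:

/* sleep_ms: a hypothetical portability wrapper; one source file,
   but the OS-specific half is selected at compile time. */
#ifdef _WIN32
#include <windows.h>

void sleep_ms(unsigned ms)
{
    Sleep(ms);              /* Win32 API call */
}
#else
#include <time.h>

void sleep_ms(unsigned ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);   /* POSIX call that traps into the kernel */
}
#endif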

Exceptions to the Above

Despite all I have said here, there have been attempts to overcome the limitation of programs not being able to run on more than one operating system. Some good examples are the Wine project, which has successfully emulated the Win32 program loader, binary format, and system libraries, allowing Windows programs to run on various UNIXes; the compatibility layer that allows several BSD UNIX operating systems to run Linux software; and, of course, Apple's own shim allowing old Mac OS software to run under Mac OS X.

However, these projects work only through enormous amounts of manual development effort. Depending on how different the two OSes are, the difficulty ranges from a fairly small shim to near-complete emulation of the other OS, which is often more complex than writing an operating system in itself, so this is the exception and not the rule.

+1 "Why is software OS specific?" Because history. –  Paul Draper 3 hours ago

Is the compiler doing anything OS-specific when compiling this?

Probably. At some point during the compiling and linking process, your code is turned into an OS-specific binary format and linked with any required libraries. Your program has to be saved in a format that the operating system expects, so that the OS can load the program and start executing it. Furthermore, you're calling the standard library function printf(), which at some level is implemented in terms of the services the operating system provides.

Libraries provide an interface -- a layer of abstraction from the operating system and the hardware -- and that is what makes it possible to recompile your program for a different operating system or different hardware. But that abstraction exists at the source level; once the program is compiled and linked, it is bound to a specific implementation of that interface for a specific OS.


There are a number of reasons, but one very important one is that the operating system has to know how to read the series of bytes that make up your program into memory, find the libraries that go with that program and load them into memory, and then start executing your program code. In order to do this, the creators of the OS define a particular format for that series of bytes so that the OS code knows where to look for the various parts of your program's structure. Because the major operating systems have different authors, these formats often have little to do with each other. In particular, the Windows executable format has little in common with the ELF format most Unix variants use. So all of this loading, dynamic linking, and executing code has to be OS-specific.

Next, each OS provides a different set of libraries for talking to the hardware layer. These are the APIs you mention, and they are generally libraries that present a simpler interface to the developer while translating it into more complex, more specific calls into the depths of the OS itself, these calls often being undocumented or secured. This layer is often quite grey, with newer "OS" APIs built partially or entirely on top of older APIs. For example, in Windows, many of the newer APIs Microsoft has created over the years are essentially layers on top of the original Win32 APIs.

An issue that does not arise in your example, but that is one of the bigger ones developers face, is the interface with the window manager needed to present a graphical UI. Whether the window manager is part of the "OS" sometimes depends on your point of view, as well as on the OS itself: the GUI in Windows is integrated with the OS at a deeper level, while the GUIs on Linux and OS X are more clearly separated. This matters because today what people typically call "the operating system" is a much bigger beast than what textbooks tend to describe, as it includes many, many application-level components.

Finally, not strictly an OS issue but an important one in executable file generation: different machines have different assembly language targets, so the actual generated object code must differ. This isn't, strictly speaking, an "OS" issue but rather a hardware issue, but it does mean that you will need different builds for different hardware platforms.

It may be worthwhile to note that simpler executable formats can be loaded using only a tiny amount of RAM (if any) beyond that required to hold the loaded code, while more complex formats may require a much larger RAM footprint during, and in some cases even after, loading. MS-DOS would load COM files up to 63.75K by simply reading sequential bytes into RAM starting at offset 0x100 of an arbitrary segment, loading CX with the ending address, and jumping to that. Single-pass compilation could be accomplished without back-patching (useful with floppies) by... –  supercat 15 hours ago
...having the compiler include with each routine a list of all patch-points, each of which would include the address of the previous such list, and putting the address of the last list at the end of the code. The OS would just load the code as raw bytes, but a small routine within the code could apply all necessary address patches before running the main portion of the code. –  supercat 15 hours ago

Software is not always OS-specific. Both Java and the earlier p-code system (and even ScummVM) allow for software that is portable across operating systems. Infocom (makers of Zork and the Z-machine) also had a relational database based on another virtual machine. However, at some level, something has to translate even those abstractions into actual instructions to be executed on a computer.
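To make the idea concrete, here is a toy sketch of a bytecode interpreter -- nothing like the real JVM or Z-machine, just an illustration. The bytecode array is CPU- and OS-neutral, but the interpreter that executes it must itself be compiled separately for every platform:

#include <stdio.h>

/* A toy stack machine: the bytecode is portable; this interpreter is not. */
enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

int main(void)
{
    unsigned char code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    int stack[16], sp = 0, pc = 0;

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++];          break;  /* push immediate */
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp];  break;  /* pop two, push sum */
        case OP_PRINT: printf("%d\n", stack[sp - 1]);     break;  /* prints 5 here */
        case OP_HALT:  return 0;
        }
    }
}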

Java runs on a virtual machine, though, and the virtual machine isn't cross-OS. You have to use a different JVM binary for each OS. –  Izkata yesterday
@Izkata True, but you don't recompile the software (just the JVM). Also, see my last sentence. But I will point out Sun did have a micro-processor which could directly execute byte-code. –  Elliott Frisch yesterday
Java is an OS, although it's not usually thought of as one. Java software is specific to the Java OS, and there are Java OS emulators for most "real" OSes. But you could do the same thing with any host and target OS - like running Windows software on Linux using WINE. –  immibis 23 hours ago
@immibis I would be more specific. The Java Foundation Classes (JFC, Java's standard library) is a framework. Java itself is a language. The JVM is similar to an OS: it has "Virtual Machine" in its name, and performs similar functions to an OS from the perspective of the code running in it. –  John Gaughan 11 hours ago

The diagram has the "application" layer (mostly) separated from the "operating system" layer by the "libraries", and that implies that "application" and "OS" don't need to know about each other. That is a simplification in the diagram, but it's not quite true.

The problem is that the "library" actually has three parts to it: the implementation, the interface to the application, and the interface to the OS. In principle, the first two can be made "universal" as far as the OS is concerned (it depends on where you slice it), but the third part, the interface to the OS, generally cannot. The interface to the OS will necessarily depend on the OS, the APIs it provides, the packaging mechanism (e.g. the file format used by Windows DLLs), etc.

Because the "library" is generally made available as a single package, it means that once the program picks a "library" to use, it commits to a specific OS. This happens one of two ways: a) the programmer picks completely in advance, and then the binding between the library and the application can be universal, but the library itself is bound to the OS; or b) the programmer sets things up so the library is selected when you run the program, but then the binding mechanism itself, between the program and the library, is OS-dependent (e.g. The DLL mechanism in Windows). Each has it's advantages and disadvantages, but either way you have to make a choice in advance.

Now, this doesn't mean that it's impossible to do, but you have to be very clever. To overcome the problem, you would have to go the route of picking the library at run time, and you would have to come up with a universal binding mechanism that doesn't depend on the OS (so you are responsible for maintaining it, which is a lot more work). Sometimes it's worth it.

You don't have to, but if you are going to put in the effort to do that, there is a good chance you don't want to be tied to a specific processor either, so you will write a virtual machine and compile your program to a processor-neutral code format.

By now you should have noticed where I'm going. Language platforms like Java do exactly that. The Java runtime (library) defines the OS-neutral binding between your Java program and the library (how the Java runtime opens and runs your program), and it provides an implementation specific to the current OS. .NET does the same thing to an extent, except that Microsoft doesn't provide a runtime for anything but Windows (but others do; see Mono). And Flash actually does the same thing too, although it's more limited in scope to the browser.

Finally, there are ways to do the same thing without a custom binding mechanism. You could use conventional tools but defer the binding step to the library until the user picks the OS. That's exactly what happens when you distribute the source code: the user takes your program and binds it to the processor (compiles it) and to the OS (links it) when the user is ready to run it.

It all depends on how you slice the layers. At the end of the day, you always have a computing device made with specific hardware running specific machine code. The layers are there largely as a conceptual framework.


You say

software produced for certain operating systems only works with those operating systems

But the program you give as an example will work on many operating systems, and even in some bare-metal environments.

The important thing here is the distinction between the source code and the compiled binary. The C programming language is specifically designed to be OS-independent in source form. It does this by leaving the interpretation of things like "print to the console" up to the implementer. But C may be compiled to something which is OS-specific (see other answers for the reasons), for example the PE or ELF executable formats.

It seems quite clear that the OP is asking about binaries, not source code. –  Caleb 16 hours ago

From another answer of mine:

Consider early DOS machines, and what Microsoft's real contribution to the world was:

Autocad had to write drivers for each printer it could print to. So did Lotus 1-2-3. In fact, if you wanted your software to print, you had to write your own drivers. If there were 10 printers and 10 programs, then 100 different pieces of essentially the same code had to be written separately and independently.

What Windows 3.1 tried to accomplish (along with GEM and so many other abstraction layers) was to make it so the printer manufacturer wrote one driver for their printer, and the programmer wrote one driver for the Windows printer class.

Now, with 10 programs and 10 printers, only 20 pieces of code had to be written, and since the Microsoft side of the code was the same for everyone, examples from MS meant that you had very little work to do.

A program was no longer restricted to just the 10 printers it chose to support, but could use all the printers whose manufacturers provided drivers for Windows.

So the OS provides services to the applications so the applications don't have to do work that is redundant.

Your example C program uses printf, which sends characters to stdout, an OS-specific resource that will display the characters on a user interface. The program doesn't need to know where the user interface is: it could be in DOS, it could be in a graphical window, or it could be piped to another program and used as input to another process.
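On POSIX systems a program can even ask where its stdout currently points. This minimal sketch uses the real isatty() call; the same binary behaves appropriately whether it writes to a terminal or a pipe, precisely because the OS, not the program, wires up the destination:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* The program doesn't choose the destination; the OS sets up fd 1. */
    if (isatty(STDOUT_FILENO))
        printf("stdout is a terminal\n");
    else
        printf("stdout is a pipe or a file\n");
    return 0;
}

Run it directly and then as ./a.out | cat to see the two paths.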

Because the OS provides these resources, programmers can accomplish much more with little work.

However, even starting a program is complicated. The OS expects an executable file to have certain information at the beginning that tells the OS how the program should be started, and in some cases (more advanced environments like Android or iOS) what resources it will require that need approval, since they touch resources outside the "sandbox", a security measure that helps protect users and other apps from misbehaving programs.

So even if the executable machine code is the same, and there are no OS resources required, a program compiled for Windows won't run on an OS X operating system without an additional emulation or translation layer, even on the same exact hardware.

Early DOS-style operating systems could often share programs because they implemented the same API in hardware (the BIOS) and the OS hooked into the hardware to provide services. So if you wrote and compiled a COM program, which is just a memory image of a series of processor instructions, you could run it on CP/M, MS-DOS, and several other operating systems. In fact, you can still run COM programs on modern Windows machines. Other operating systems don't use the same BIOS API hooks, so COM programs won't run on them without, again, an emulation or translation layer. EXE programs follow a structure that includes much more than mere processor instructions, so along with the API issues, they won't run on a machine that doesn't understand how to load them into memory and execute them.
