Is there an easy way to determine which programming languages were used in a project assuming you have access to the source code?

I have been programming for quite some time, so I can recognize most of the major programming languages by syntax or file extension, but I realized that I could not answer this fairly simple question (one I often ask myself when dealing with unfamiliar open source codebases).

I understand that you could examine each file extension and Google it to determine whether it belongs to a particular programming language. However, this seems tedious (examining every file extension for a large project) and I am guessing there might be a better way.


closed as too broad by MichaelT, GlenH7, gnat, Bart van Ingen Schenau, MetaFight Jan 21 at 16:33

There are either too many possible answers, or good answers would be too long for this format. Please add details to narrow the answer set or to isolate an issue that can be answered in a few paragraphs. If this question can be reworded to fit the rules in the help center, please edit the question.

    
What does the build script say? –  MichaelT Jun 18 '14 at 3:15
If you have access to the source and project files, and can recognize most languages and file extensions... then just search on those you don't recognize. –  GrandmasterB Jun 18 '14 at 4:07
    
I guess most open source projects leave enough information in the docs, makefile or file extensions that you can identify the language(s) easily. And those which don't are mostly uninteresting. –  Doc Brown Jun 18 '14 at 19:43

3 Answers

https://github.com/github/linguist does pretty much exactly what you want:

Linguist defines a list of all languages known to GitHub in a yaml file. In order for a file to be highlighted, a language and a lexer must be defined there.

Most languages are detected by their file extension. For disambiguating between files with common extensions, we first apply some common-sense heuristics to pick out obvious languages. After that, we use a statistical classifier. This process can help us tell the difference between, for example, .h files which could be either C, C++, or Obj-C.
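To make the two-stage idea concrete, here is a minimal sketch: look up the extension first, and only inspect the file contents when the extension is ambiguous. This is not Linguist's actual code or data. The table `CANDIDATES_BY_EXTENSION`, the two regular expressions and the function `guess_language` are all made up for illustration, and the heuristics only try to handle the `.h` case mentioned above.

```python
import os
import re

# Hypothetical, deliberately tiny table; real tools ship far larger ones.
CANDIDATES_BY_EXTENSION = {
    ".py": ["Python"],
    ".rb": ["Ruby"],
    ".c": ["C"],
    ".h": ["C", "C++", "Objective-C"],  # ambiguous: needs a look at the content
}

# Rough content hints for header files (assumptions, not an exhaustive rule set).
OBJC_HINT = re.compile(r"^\s*(@interface|@protocol|@end|#import)\b", re.MULTILINE)
CPP_HINT = re.compile(r"^\s*(class|namespace|template)\b|\bstd::", re.MULTILINE)


def guess_language(filename, text):
    """Return a best-guess language name, or None if the extension is unknown."""
    ext = os.path.splitext(filename)[1].lower()
    candidates = CANDIDATES_BY_EXTENSION.get(ext, [])
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    # Ambiguous extension: fall back to simple content heuristics.
    if OBJC_HINT.search(text):
        return "Objective-C"
    if CPP_HINT.search(text):
        return "C++"
    return candidates[0]  # nothing distinctive found, assume plain C
```

Heuristics like these are easy to fool (a C header can contain the word "class" in a comment at the start of a line), which is presumably why a statistical classifier is used as the last step.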


Look at the Makefile and find out which compiler is used. If it is gcc, look at its rules for guessing the language from the file name suffix (these can be overridden with the -x option).

For scripting languages, look at the first line, the one that starts with #!. If it is not present, look at the other scripts to find out how the one you are examining is invoked.
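The #! check is easy to automate. Below is a rough sketch that pulls the interpreter name out of a first line; the function names are my own, and the handling of `env` (skipping options and `VAR=value` assignments) is a simplifying assumption rather than a full reimplementation of how the kernel and `env` parse the line.

```python
import posixpath


def interpreter_from_shebang(first_line):
    """Return the interpreter named by a '#!' line, or None if there is none."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    name = posixpath.basename(parts[0])
    if name == "env":
        # "#!/usr/bin/env python3": the real interpreter is an argument to env.
        # Skip options such as -S and VAR=value assignments (simplified).
        args = [p for p in parts[1:] if not p.startswith("-") and "=" not in p]
        return posixpath.basename(args[0]) if args else None
    return name


def interpreter_of_file(path):
    """Read only the first line of a file and inspect it for a shebang."""
    with open(path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace")
    return interpreter_from_shebang(first_line)
```

For example, `interpreter_from_shebang("#!/usr/bin/env python3")` gives `"python3"`, and a line that does not start with #! gives `None`, in which case you are back to checking how the script is invoked.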

Pay attention to how the $PATH variable is set.


This is a much more difficult question than it first appears. There was a time when major projects would use only a handful of languages and tools, but no longer. For example, a typical project these days could easily use 10 or more languages, once you include browsers, code, scripting, make, configuration and so on, and there are literally hundreds of languages with at least some adherents.

One project I just checked has over 50 file extensions and 8 languages and tools. At least 3 of them use XML. I may have missed some.

If your purpose is to find the major implementation language(s) then a simple cross-reference of lines of code against file extension will tell you which files to concentrate on.
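A cross-reference like that takes only a few lines of script. The sketch below is one possible way to do it, not a polished tool: the function name `lines_by_extension` is arbitrary, hidden directories such as `.git` are skipped by assumption, and "lines" simply means newline-delimited chunks, so binary files will contribute meaningless counts.

```python
import os
from collections import Counter


def lines_by_extension(root):
    """Walk a source tree and total up line counts per file extension."""
    totals = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories (.git, .svn, ...) in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            # Files without an extension (Makefile, README) share one bucket.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    totals[ext] += sum(1 for _ in handle)
            except OSError:
                continue  # unreadable file: ignore it and keep going
    return totals
```

Calling `lines_by_extension(".").most_common(10)` lists the ten extensions with the most lines, which is usually enough to see where the bulk of the implementation lives.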

If you can identify a specific IDE such as NetBeans, Eclipse or Visual Studio this can help a lot. However, even something as obvious as Ruby on Rails may include a range of additional languages and tools for specific purposes.

If your purpose is to make sure you have all the tools you need to build the project then search out the build scripts. That won't always help -- many IDEs do not play by the rules.

If you have a number of files in an unfamiliar programming language, or an unfamiliar dialect, it can be challenging to find out what it belongs to. I have resorted to web searches of distinctive keywords.

Ultimately it's detective work and a bunch of guesses, with scientific data gathering to narrow the possibilities.

