Programmers Stack Exchange is a question and answer site for professional programmers interested in conceptual questions about software development.

Is there an easy way to determine which programming languages were used in a project assuming you have access to the source code?

I have been programming for quite some time, so I can recognize most of the major programming languages by syntax or file extension, but I realized that I could not answer this fairly simple question (a problem I often run into when dealing with unfamiliar open source codebases).

I understand that you could examine each file extension and Google it to determine whether it belongs to a particular programming language. However, this seems tedious (examining every file extension for a large project) and I am guessing there might be a better way.

What does the build script say? –  MichaelT Jun 18 at 3:15

If you have access to the source and project files, and can recognize most languages and file extensions... then just search on those you don't recognize. –  GrandmasterB Jun 18 at 4:07
    
I guess most open source projects leave enough information in the docs, makefile or file extensions that you can identify the language(s) easily. And those which don't are mostly uninteresting. –  Doc Brown Jun 18 at 19:43

3 Answers

Look at the Makefile and find out which compiler is used. If it is gcc, look at its rules for guessing the language from the file suffix.
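To make the gcc point concrete, here is a small Python sketch of a subset of the suffix table from GCC's documentation. The table is abbreviated and the function name is just for illustration; check the GCC manual for the full list. Note that an explicit `-x` option in the Makefile overrides the suffix rules.

```python
import os

# Abbreviated subset of GCC's documented suffix rules. Matching is
# case-sensitive: GCC treats "x.c" as C but "x.C" as C++.
# ".h" is deliberately absent: GCC may treat it as a C, C++ or
# Objective-C header, so the suffix alone does not decide the language.
GCC_SUFFIXES = {
    ".c": "C",
    ".i": "C (already preprocessed)",
    ".cc": "C++", ".cp": "C++", ".cxx": "C++", ".cpp": "C++",
    ".CPP": "C++", ".c++": "C++", ".C": "C++",
    ".ii": "C++ (already preprocessed)",
    ".m": "Objective-C",
    ".mm": "Objective-C++", ".M": "Objective-C++",
    ".f": "Fortran", ".F": "Fortran", ".f90": "Fortran", ".f95": "Fortran",
    ".go": "Go",
    ".d": "D",
    ".ads": "Ada", ".adb": "Ada",
    ".s": "Assembler", ".S": "Assembler", ".sx": "Assembler",
}


def guess_gcc_language(filename):
    """Return the language GCC would assume from the suffix, or None."""
    _, ext = os.path.splitext(filename)
    return GCC_SUFFIXES.get(ext)
```

Files GCC does not recognize by suffix are handed to the linker as object files, which is why the function returns `None` for them rather than guessing.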

For scripting languages, look at the first line, the one that starts with #! (the "shebang" line). If it is not present, look at how other scripts invoke the one you are examining.
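Reading the shebang line is easy to automate. A minimal Python sketch (the function names are hypothetical) that handles both a direct path such as `#!/bin/bash` and the common `#!/usr/bin/env python3` form:

```python
import os


def interpreter_from_shebang(first_line):
    """Return the interpreter named on a '#!' line, or None if there is none."""
    if not first_line.startswith("#!"):
        return None
    words = first_line[2:].split()
    if not words:
        return None
    program = os.path.basename(words[0])
    if program != "env":
        return program
    # With env, skip its options (e.g. -S) and VAR=value settings; the first
    # remaining word is the interpreter, which env looks up via $PATH.
    rest = [w for w in words[1:] if not w.startswith("-") and "=" not in w]
    return os.path.basename(rest[0]) if rest else None


def interpreter_of_file(path):
    """Read only the first line of path and apply interpreter_from_shebang."""
    with open(path, "rb") as f:
        first_line = f.readline().decode("utf-8", errors="replace")
    return interpreter_from_shebang(first_line)
```

This only tells you the interpreter's name; with `env`, which binary actually runs depends on `$PATH`.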

Pay attention to how the $PATH variable is set, since it decides which interpreter or compiler a bare command name resolves to.


https://github.com/github/linguist does pretty much exactly what you want:

Linguist defines a list of all languages known to GitHub in a yaml file. In order for a file to be highlighted, a language and a lexer must be defined there.

Most languages are detected by their file extension. For disambiguating between files with common extensions, we first apply some common-sense heuristics to pick out obvious languages. After that, we use a statistical classifier. This process can help us tell the difference between, for example, .h files which could be either C, C++, or Obj-C.
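To give a feel for the "common-sense heuristics" step, here is a toy Python version of the .h disambiguation. This is not Linguist's code (Linguist is written in Ruby, and its real heuristics are far more thorough); the patterns below are my own illustrative assumptions.

```python
import re

# Illustrative patterns only, not Linguist's actual rules.
OBJC_PATTERN = re.compile(
    r"^\s*(@interface|@protocol|@end|#import)\b", re.MULTILINE)
CPP_PATTERN = re.compile(
    r"^\s*(namespace\b|template\s*<|class\s+\w+\s*[:{]"
    r"|#include\s*<(iostream|string|vector|memory)>)",
    re.MULTILINE)


def classify_header(source):
    """Guess whether the text of a .h file is Objective-C, C++, or plain C."""
    if OBJC_PATTERN.search(source):
        return "Objective-C"
    if CPP_PATTERN.search(source):
        return "C++"
    # Nothing distinctive found: fall back to the most general reading.
    return "C"
```

Files that no rule settles are where a statistical classifier earns its keep; a few regexes like these only catch the obvious cases.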


This is a much more difficult question than it first appears. There was a time when major projects would use only a handful of languages and tools, but no longer. For example, a typical project these days could easily use 10 or more languages once you count browser-side code, application code, scripting, make, configuration and so on, and there are literally hundreds of languages with at least some adherents.

One project I just checked has over 50 file extensions and 8 languages and tools. At least 3 of them use XML. I may have missed some.

If your purpose is to find the major implementation language(s), then a simple tally of lines of code per file extension will tell you which files to concentrate on.
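That tally takes only a few lines of Python. A rough sketch, assuming you want to skip version-control and vendored directories (the skip list is my own choice; adjust it):

```python
import os
from collections import Counter

# Directories that rarely contain the project's own code.
SKIP_DIRS = {".git", ".hg", ".svn", "node_modules"}


def lines_by_extension(root):
    """Return (file_count, line_count) Counters keyed by file extension."""
    files, lines = Counter(), Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            # Extension-less files such as "Makefile" are keyed by full name.
            ext = os.path.splitext(name)[1] or name
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    n = sum(1 for _ in f)
            except OSError:
                continue
            files[ext] += 1
            lines[ext] += n
    return files, lines
```

For example, `files, lines = lines_by_extension("path/to/project")` followed by `lines.most_common(10)` lists the extensions holding the most code. Dedicated tools such as cloc do the same job with real per-language rules.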

If you can identify a specific IDE such as NetBeans, Eclipse or Visual Studio this can help a lot. However, even something as obvious as Ruby on Rails may include a range of additional languages and tools for specific purposes.

If your purpose is to make sure you have all the tools you need to build the project then search out the build scripts. That won't always help -- many IDEs do not play by the rules.
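Both IDE project files and build scripts tend to leave recognizable marker files, so a quick scan for them is a reasonable first pass. A Python sketch; the marker table is my own, deliberately incomplete, and only says what each file *usually* implies:

```python
import os

# Hypothetical, incomplete marker table: file or directory name -> usual meaning.
MARKERS = {
    "Makefile": "make",
    "CMakeLists.txt": "CMake (usually C/C++)",
    "configure.ac": "GNU Autotools",
    "pom.xml": "Maven (Java/JVM)",
    "build.gradle": "Gradle (Java/Kotlin/JVM)",
    "build.xml": "Ant (Java)",
    "package.json": "npm (JavaScript/TypeScript)",
    "Cargo.toml": "Cargo (Rust)",
    "go.mod": "Go modules",
    "setup.py": "setuptools (Python)",
    "pyproject.toml": "Python packaging",
    "Gemfile": "Bundler (Ruby)",
    "Rakefile": "Rake (Ruby)",
    ".project": "Eclipse project",
    "nbproject": "NetBeans project",
}
# Markers recognized by extension rather than by exact name.
MARKER_SUFFIXES = {
    ".sln": "Visual Studio solution",
    ".csproj": "Visual Studio C# project",
    ".vcxproj": "Visual Studio C++ project",
}


def find_build_markers(root):
    """Return sorted (relative path, meaning) pairs for recognized markers."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]
        # Check directories too: NetBeans keeps its settings in "nbproject/".
        for name in filenames + dirnames:
            meaning = (MARKERS.get(name)
                       or MARKER_SUFFIXES.get(os.path.splitext(name)[1]))
            if meaning:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.append((rel, meaning))
    return sorted(found)
```

A hit only tells you where to start reading; the build script itself still has the final word on which tools are needed.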

If you have a number of files in an unfamiliar programming language, or an unfamiliar dialect, it can be challenging to find out what it belongs to. I have resorted to web searches of distinctive keywords.
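One way to pick the keywords to search for, assuming keywords recur in every file while identifiers vary from file to file: keep the words that appear in all of your sample files and rank them by frequency. A rough Python sketch (the function name and the approach are my own; comments and strings are not excluded):

```python
import re
from collections import Counter

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def candidate_keywords(sources, min_files=None, top=10):
    """Rank words found in at least min_files of the sample sources.

    sources: list of file contents written in the unfamiliar language.
    By default a word must appear in every sample to qualify.
    """
    if min_files is None:
        min_files = len(sources)
    doc_freq = Counter()  # number of files each word appears in
    total = Counter()     # total occurrences across all files
    for text in sources:
        words = WORD.findall(text)
        total.update(words)
        doc_freq.update(set(words))
    common = [w for w in total if doc_freq[w] >= min_files]
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(common, key=lambda w: (-total[w], w))[:top]
```

Pasting the top few results into a web search together with the file extension is usually enough to narrow things down.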

Ultimately it's detective work and a bunch of guesses, with scientific data gathering to narrow the possibilities.

