I'm making an app that needs to clone Git repositories from a given URL and analyze the codebase.

Cloning to disk with the git clone command wouldn't scale well. Is there any way to clone to memory, or at least to get a stream of file contents, and so avoid the intermediary disk I/O?

I hear we have "disks" which are essentially big buckets of memory with really good performance. –  Snowman Jun 15 at 17:46

2 Answers

Accepted answer (score: 4)

It is operating system specific.

If your application runs on Linux (and so perhaps also on Android), you can use a memory-based file system such as tmpfs. git clone (or git pull, etc.) would then write into that file system, which lives in virtual memory, so it will run quite fast. However, the bottleneck is probably the network (unless your application runs in a datacenter with plenty of bandwidth).

BTW, on Linux, the page cache is quite effective, so even with an ordinary (disk-based) file system, performance can be quite good in practice.

Your application would then access these files through the usual file-related syscalls or library functions (e.g. <stdio.h> in C). That would use only memory (without any real disk I/O) and should scale quite well.

Of course, you'll need to have enough RAM for that to work well.
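
As a rough illustration of the tmpfs approach, here is a minimal Python sketch. It assumes that /dev/shm is a tmpfs mount (common on Linux, but worth checking on your system) and that the git executable is on the PATH. The function names and the `analyze` callback are hypothetical, chosen only for this example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Assumption: /dev/shm is a tmpfs mount on this machine.
TMPFS_ROOT = Path("/dev/shm")


def build_clone_command(url, dest):
    """Return the argv list that clones `url` into `dest`."""
    return ["git", "clone", "--quiet", url, str(dest)]


def analyze_in_tmpfs(url, analyze, root=TMPFS_ROOT):
    """Clone `url` under `root`, call `analyze(path)`, then delete the clone."""
    workdir = Path(tempfile.mkdtemp(prefix="clone-", dir=root))
    try:
        repo_path = workdir / "repo"
        subprocess.run(build_clone_command(url, repo_path), check=True)
        return analyze(repo_path)
    finally:
        # Release the memory held by the clone as soon as analysis is done.
        shutil.rmtree(workdir, ignore_errors=True)
```

Your analysis code only sees an ordinary directory path, so pointing `root` at a regular disk directory instead is a one-argument change if RAM becomes tight.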

I've updated the question to match that the OP is asking about git clone, not git pull. –  MichaelT Jun 15 at 18:54
I wouldn't bother with tmpfs. Basically, if you just clone normally to the disk, everything will stay in cache, unless you run out of RAM, and only then will it need to hit the disk. If you clone to a tmpfs, everything will stay in RAM, unless you run out of RAM, then the tmpfs will be swapped out to disk. IOW: it's more or less the same either way. After all, tmpfs is basically backed by the page cache. Linux disk I/O really is quite good. –  Jörg W Mittag Jun 15 at 19:31

There is a fundamental conflict between the git model and what you are trying to do here. git clone makes a full copy of the repository on your local machine. The idea of git is to keep a complete copy of the repository locally, so most commands involve almost no communication with the server. The only times there is communication are when you git fetch (pull down all branch changes from the server), git push (push all local branch changes to the remote repository), or of course make a complete copy with git clone.

So really, I think you are stuck. What I gather is that you want to pull down lots of different projects, run analysis on each, then delete the results. That's not something git was designed for as it is fundamentally about keeping and modifying a local copy of a repository for a long period of time.

I do question your assumption that pulling to memory would even make a difference. Network operations are a couple of orders of magnitude slower than disk operations. A decent non-SSD drive is going to give you around 150 MB/s, and an SSD is going to triple that. Unless you have a much better connection than I do, pulling to memory would not speed things up at all, as your OS would spend nearly all its time waiting for network responses from the git server.
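
To put rough numbers on this, here is a small back-of-the-envelope calculation in Python. The disk figures come from the paragraph above; the 500 MB repository size and the 100 Mbit/s link are assumptions picked purely for illustration.

```python
def transfer_seconds(size_mb, throughput_mb_per_s):
    """Time to move `size_mb` megabytes at a sustained throughput."""
    return size_mb / throughput_mb_per_s


REPO_SIZE_MB = 500  # hypothetical repository size

RATES_MB_PER_S = {
    "hard disk": 150,         # figure quoted above
    "SSD": 450,               # "triple that"
    "100 Mbit/s link": 12.5,  # assumption: 100 Mbit/s divided by 8 bits per byte
}

for name, rate in RATES_MB_PER_S.items():
    print(f"{name}: {transfer_seconds(REPO_SIZE_MB, rate):.1f} s")
```

Under these assumptions the download takes about 40 seconds while writing the same data to a hard disk takes about 3, so the disk is not the step worth optimizing. A faster link narrows the gap, so it is worth plugging in your own numbers.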

If you are working with GitHub, you may be better off with the "Download ZIP" method on each project page. This downloads a single branch without all the extraneous branch and history information. That should be faster than a full git clone in cases where you only need the latest version of one branch.
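
Since a ZIP archive can be read entirely from memory, this route also gives you the "stream of file contents" the question asks for. The sketch below uses only the Python standard library. The URL pattern in `github_zip_url` is an assumption about what the "Download ZIP" button links to and may change; the function names are made up for this example.

```python
import io
import urllib.request
import zipfile


def github_zip_url(owner, repo, branch="master"):
    # Assumed URL pattern behind GitHub's "Download ZIP" button.
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"


def iter_zip_files(data):
    """Yield (path, content_bytes) for each regular file in an in-memory ZIP."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if not info.is_dir():
                yield info.filename, archive.read(info)


def fetch_branch_files(owner, repo, branch="master"):
    """Download one branch as a ZIP and iterate over its files without using disk."""
    with urllib.request.urlopen(github_zip_url(owner, repo, branch)) as response:
        data = response.read()
    yield from iter_zip_files(data)
```

Note that this holds the whole archive in RAM at once, so very large repositories may still need a size check or a fallback to disk.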

Yes. That's my mistake. I was more thinking about git clone. –  Krzysztof Wende Jun 15 at 17:53
    
Do you think GitHub has any limits on repo cloning per IP address? –  Krzysztof Wende Jun 15 at 17:54
    
git clone pulls down all remote tracking branches. If you are just interested in analyzing the latest version of master, then this is far more than you want. –  Steven Burnap Jun 15 at 17:56
    
I have no clue what rate limiting GitHub does. –  Steven Burnap Jun 15 at 17:56
@StevenBurnap I've updated the question to match that the OP is asking about git clone, not git pull. –  MichaelT Jun 15 at 18:54
