Git clone to memory

Question

I'm making an app that needs to clone git repositories from a given link and analyze their codebases.

Cloning to disk with the git clone command wouldn't scale well. Is there any way to clone to memory, or at least to get a stream of file contents, without intermediary disk I/O?

I hear we have "disks" which are essentially big buckets of memory with really good performance. — Snowman, Jun 15 at 17:46

Basile Starynkevitch · Accepted Answer · 2015-06-16 07:15:58Z


It is operating system specific.

If your application runs on Linux (so perhaps also on Android), you can use a memory-based file system like tmpfs. git pull (or git clone, etc.) would then write into that file system, which sits in virtual memory, and it will run quite fast. However, the bottleneck is probably the network (unless your application is running in a datacenter with plenty of bandwidth).
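As a rough sketch of the tmpfs approach, the Python snippet below clones into a temporary directory under /dev/shm. That /dev/shm is a tmpfs mount is an assumption that holds on most Linux distributions but should be checked (for example with `df -T /dev/shm`). The helper names `pick_scratch_dir`, `build_clone_command` and `clone_to_memory` are hypothetical, and the shallow `--depth 1` default is an optional extra, not something the approach requires.

```python
import os
import subprocess
import tempfile


def pick_scratch_dir(candidate="/dev/shm"):
    """Return candidate if it is a writable directory, else the default temp dir."""
    if os.path.isdir(candidate) and os.access(candidate, os.W_OK):
        return candidate
    return tempfile.gettempdir()


def build_clone_command(url, dest, depth=None):
    """Build the git command line; depth=None means a full clone."""
    cmd = ["git", "clone", "--quiet"]
    if depth is not None:
        # Shallow clone: only fetch the most recent `depth` commits.
        cmd += ["--depth", str(depth)]
    cmd += ["--", url, dest]
    return cmd


def clone_to_memory(url, depth=1):
    """Clone url into a fresh directory under the scratch dir.

    Returns a tempfile.TemporaryDirectory; the clone is deleted when
    that object is cleaned up.
    """
    tmp = tempfile.TemporaryDirectory(prefix="clone-", dir=pick_scratch_dir())
    try:
        subprocess.run(build_clone_command(url, tmp.name, depth), check=True)
    except Exception:
        tmp.cleanup()
        raise
    return tmp
```

Used as `with clone_to_memory(url) as path: ...`, the clone lives only for the duration of the block and is removed afterwards.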

BTW, on Linux, the page cache is quite effective. So even with an ordinary (disk-based) file system, performance can be quite good in practice.

Then your application would access these files through the usual file-related syscalls or library functions (e.g. <stdio.h> in C), and that would only use memory (without any real disk I/O) and should scale quite well.

Of course, you'll need to have enough RAM for that to work well.
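To illustrate that nothing special is needed on the reading side, here is a minimal Python sketch that walks a cloned tree with ordinary file calls and counts lines per file extension. The function name `count_lines_by_extension` and the choice of metric are made up for the example; the same code works whether the directory sits on tmpfs or on a regular disk.

```python
import os
from collections import Counter


def count_lines_by_extension(root):
    """Walk root and count lines per file extension, skipping .git."""
    totals = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and other non-regular entries
            ext = os.path.splitext(name)[1] or "(none)"
            with open(path, "rb") as fh:
                totals[ext] += sum(1 for _ in fh)
    return totals
```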

edited Jun 16 at 7:15

answered Jun 15 at 17:40


I've updated the question to match that the OP is asking about git clone, not git pull. – MichaelT Jun 15 at 18:54


I wouldn't bother with tmpfs. Basically, if you just clone normally to the disk, everything will stay in cache, unless you run out of RAM, and only then will it need to hit the disk. If you clone to a tmpfs, everything will stay in RAM, unless you run out of RAM, then the tmpfs will be swapped out to disk. IOW: it's more or less the same either way. After all, tmpfs is basically backed by the page cache. Linux disk I/O really is quite good. – Jörg W Mittag Jun 15 at 19:31


Steven Burnap · Answer 2 · 2015-06-16 00:15:34Z

There is a fundamental conflict between the git model and what you are trying to do here. git clone makes a full copy of the repository on your local machine. The idea of git is to keep a complete copy of the repository locally. Most commands involve almost no communication with the server. The only time there is communication is when you git fetch (pull down all branch changes from the server), git push (push all local branch changes to the remote repository), or of course when you make a complete copy with git clone.

So really, I think you are stuck. What I gather is that you want to pull down lots of different projects, run analysis on each, then delete the results. That's not something git was designed for as it is fundamentally about keeping and modifying a local copy of a repository for a long period of time.

I do question your assumption that pulling to memory would even make a difference. Network operations are a couple of orders of magnitude slower than disk operations. A decent non-SSD drive is going to give you around 150 MB/s. An SSD is going to triple that. Unless you have a much better connection than I do, pulling to memory would not speed things up at all, as your OS is spending all its time waiting for network requests to the git server.
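A back-of-the-envelope calculation makes the point concrete. The disk figures below are the ones quoted above (150 MB/s for a hard disk, three times that for an SSD); the 500 MB repository size and the 100 Mbit/s connection (12.5 MB/s) are assumptions chosen purely for illustration.

```python
def transfer_seconds(size_mb, rate_mb_per_s):
    """Time to move size_mb megabytes at a sustained rate, in seconds."""
    return size_mb / rate_mb_per_s


REPO_MB = 500          # hypothetical repository size
NETWORK_MB_S = 12.5    # assumed 100 Mbit/s link
HDD_MB_S = 150         # figure quoted in the answer
SSD_MB_S = 3 * HDD_MB_S

if __name__ == "__main__":
    for label, rate in [("network", NETWORK_MB_S), ("HDD", HDD_MB_S), ("SSD", SSD_MB_S)]:
        print(f"{label:8s} {transfer_seconds(REPO_MB, rate):6.1f} s")
```

With these numbers the download takes 40 s, against roughly 3.3 s to write the same data to the hard disk and about 1.1 s to the SSD, so removing the disk write entirely would save only a small fraction of the total time.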

If you are working with GitHub, you may be better off with the "Download ZIP" method on each project page. This will download a branch without all the extraneous branch/history information. That should be faster than doing a git pull for cases where you only need the latest version of one branch.
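The ZIP route also gets close to what the question asks for, since an archive can be downloaded and read entirely in memory. The following Python sketch uses only the standard library; the function names `fetch_zip_bytes` and `iter_zip_files` are hypothetical. The URL pattern `https://github.com/OWNER/REPO/archive/refs/heads/BRANCH.zip` is an assumption about how the "Download ZIP" link is formed and should be verified against the actual link on the project page.

```python
import io
import urllib.request
import zipfile


def fetch_zip_bytes(url, timeout=30):
    """Download the archive at url and return its raw bytes (held in RAM)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def iter_zip_files(zip_bytes):
    """Yield (path, content_bytes) for every regular file in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            yield info.filename, archive.read(info)
```

A caller would pass the result of `fetch_zip_bytes(url)` to `iter_zip_files` and analyze each file as it is yielded. Note that the whole archive is kept in memory at once, so very large repositories still need enough RAM.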

Yes. That's my mistake. I was more thinking about git clone. — Krzysztof Wende, Jun 15 at 17:53
Do you think GitHub has any limits on repo cloning per IP address? — Krzysztof Wende, Jun 15 at 17:54
git clone pulls down all remote tracking branches. If you are just interested in analyzing the latest version of master, then this is far more than you want. — Steven Burnap, Jun 15 at 17:56
@StevenBurnap I've updated the question to match that the OP is asking about git clone, not git pull. — MichaelT, Jun 15 at 18:54

