Our project is about 11 GB, roughly 10 GB of which is binary data (.png images). Consequently, git diff and git status operations take more than a minute. Fortunately, all the data files are separated into a folder with the wonderful name data. The assignment is "Avoid compressing, diffing and other costly operations on binary files."

  • We considered splitting the project into two repos, so that data would become an external repo checked out by the main source-code repo. We decided that the overhead of keeping the repos in sync would be too much, especially for the artists, who work with the data files.

  • Explicitly telling git that those files are binary, and excluding them from diffs, was also considered, but both seem like only partial solutions.

I feel that gitattributes is the solution, but how? Or is there a better architecture than a monolithic repo?
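If it helps, my current guess at what that second idea would look like in .gitattributes is the following (untested; the data/ pattern simply mirrors our layout):

    # Treat everything under data/ as binary: no text conversion, no diff, no merge
    data/** binary
    # Skip delta compression when packing these files
    data/** -delta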

The first big question here is how important those data files are. Does your program need all of the images available in order to do anything useful, or can it get away with a small subset during typical development/testing? – Ixrec Mar 11 at 17:02

@Ixrec, the images are actually more important than the source code. All of them must be present, and the .png checksums are always verified to catch corrupt files. – Vorac Mar 14 at 9:56

Why isn't this question on Stack Overflow? It seems exactly suited to it. – spirc Mar 17 at 0:41

@spirc this question straddles the line between "help with a software tool", which is on-topic at SO, and "version control strategy", which is on-topic here. Since it is not asking which git command to execute, it is not clearly on the SO side of the line, so I voted to leave it open here. – Snowman Mar 18 at 21:49

@Snowman thanks for the response. Which item of the on-topic list does that fit into? programmers.stackexchange.com/help/on-topic – spirc Mar 21 at 0:06
Accepted answer (16 votes)

You can use git-lfs or a similar tool (git-fat, git-annex, etc.). These tools basically replace the binary files in your repo with small text files containing hashes, and store the actual binary data outside of git, for example on a network share.

That makes diffs and other operations very fast, since only the hashes are compared, and it is (at least for git-lfs) transparent to the user after a one-time installation.

As far as I know, git-lfs is supported by GitHub, GitLab and Visual Studio, and it is open source.
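For illustration, the setup would look roughly like this; the data/**/*.png pattern is only a guess at your layout:

    # one-time setup per machine
    git lfs install

    # tell git-lfs which paths to manage (this records the rule in .gitattributes)
    git lfs track "data/**/*.png"

    git add .gitattributes
    git commit -m "Track png assets with git-lfs"

After that, git add and git commit on the tracked files behave as usual; only the small pointer files end up in the git history.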

Have you tried using git-lfs on a project with many gigabytes of assets and a mixed developer/artist team? I'm interested to know whether people are using git-lfs for projects such as games and animation, since it's still fairly new at the time of writing. From my own experience the barrier to entry for git is already very high for less technical users, so an extra layer of file management on top of it may be difficult for people to use unless they're already comfortable with git. – ideasman42 Mar 15 at 3:19

Only for up to around 1 GB of data, sorry. But git-lfs should add no additional steps for end users; it should be completely transparent. – kat0r Mar 15 at 8:40

This seems to be the correct answer; if any problems arise during the integration I will report back here. So the installation procedure only needs to be completed once, on the server, and not on each client machine? – Vorac Mar 15 at 12:06

As far as I know you need to install a small client add-in too; check the GitHub page. But that should be easy to roll out with a group policy, and simpler than any alternative. – kat0r Mar 15 at 12:12

Use both Git and SVN repos

If the binary files can be separated logically from the source, you might consider using git for the text files and a non-DVCS such as Subversion for the binary files.

A project I work on does this, since we have many GB of pre-compiled libraries (OSX/Win32 dependencies) which we need to keep versioned.


On the other hand, if you have non-technical users, using two version control systems may be problematic. However, if the artists aren't working on code, you could provide a script to perform the update, and they can use Subversion to commit binary assets.
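As a rough sketch, such an update script could be as small as this (assuming the SVN working copy for the assets lives in a data/ directory inside the git checkout; both paths are hypothetical):

    #!/bin/sh
    # Update the source code from git, then the binary assets from SVN.
    set -e
    git pull --rebase
    svn update data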

Use SVN (with git svn)

While this trade-off isn't always so nice for developers who are used to regular git, you could use SVN for the main repository, and developers can use the git svn tools.

This does make it a little more work for developers using git, but it means that everyone who isn't familiar with DVCS (or VCS in general) can use SVN's simple model without having to juggle multiple, complex version control systems.
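On the developer side, the git svn round trip is roughly the following (the repository URL is made up):

    # one-time: clone the SVN repository as a local git repository
    git svn clone https://svn.example.com/project/trunk project

    # fetch new SVN revisions and rebase local commits on top of them
    git svn rebase

    # push local git commits back to SVN, one revision per commit
    git svn dcommit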


git-lfs is an option too, but I haven't used it, so I can't speak to how well it works.
