Programmers Stack Exchange is a question and answer site for professional programmers interested in conceptual questions about software development. It's 100% free.

I have several Python projects that I currently version with Git. I also have input and output data that changes gradually over the course of each project. I need to be able to re-run my pipeline later exactly as it was executed at a given point in time.

What would be the best way to do this without committing the source data to the Git repository?

I'd keep the data versioned separately.

I don't know what your workflow for gradually changing the data looks like. You could use a separate version control repository or just named directories, with or without some sort of de-duplication. VCSs are usually a poor choice for large binary data.

This way you can always check out the data independently of code, and check in code independently of the data.
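As a minimal sketch of the named-directories option, assuming each data snapshot lives in its own directory: the helpers below hash every file in a snapshot and write a small JSON manifest. The manifest is small enough to commit next to the code even when the data is not, so a given commit records which data it was run against. The function names and the manifest layout are hypothetical, made up for illustration rather than taken from an existing tool.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its SHA-256 digest."""
    root = Path(data_dir)
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def write_manifest(data_dir, manifest_path):
    """Write the manifest as sorted JSON so that diffs stay readable."""
    manifest = build_manifest(data_dir)
    text = json.dumps(manifest, indent=2, sort_keys=True) + "\n"
    Path(manifest_path).write_text(text)
    return manifest


def verify_manifest(data_dir, manifest_path):
    """Return the sorted paths that were added, removed, or changed."""
    expected = json.loads(Path(manifest_path).read_text())
    actual = build_manifest(data_dir)
    return sorted(
        name
        for name in set(expected) | set(actual)
        if expected.get(name) != actual.get(name)
    )
```

Keep the manifest file outside the data directory (for example in the code repository), otherwise it would end up hashing itself. Before re-running an old version of the pipeline, an empty list from `verify_manifest` indicates that the data directory matches what that commit recorded.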

I was wondering if there is a way to couple the references in my code to external file pointers, so that I don't have to modify the file names throughout my code. – Andrei Kucharavy Aug 12 '15 at 14:45
I'd rather change the hard-coded file names throughout the code so that they refer to names specified on the command line or in a config file. On the other hand, if you only name the versioned directories differently and the files inside all use the same naming structure, you can just pick a directory and copy or symlink it to your 'usual data files' directory, which is the one your code refers to. (I hope you don't mix data and code? If you do, try not to.) – 9000 Aug 12 '15 at 15:17
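A rough sketch of the command-line variant, using only the standard library. The `--data-dir` option, the `data/current` default (standing in for the 'usual data files' directory) and the file name `measurements.csv` are all hypothetical placeholders, not anything from the question:

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Read the data directory from the command line instead of hard-coding it."""
    parser = argparse.ArgumentParser(
        description="Run the pipeline on a chosen data snapshot."
    )
    parser.add_argument(
        "--data-dir",
        type=Path,
        default=Path("data/current"),  # assumed default; could be a symlink
        help="directory holding one versioned snapshot of the data",
    )
    return parser.parse_args(argv)


def input_path(data_dir, name):
    """Build a file path inside the chosen snapshot directory."""
    return Path(data_dir) / name


def main(argv=None):
    args = parse_args(argv)
    # Every file reference goes through input_path, so switching snapshots
    # only requires a different --data-dir value.
    return input_path(args.data_dir, "measurements.csv")
```

Calling `main(["--data-dir", "data/2015-08-12"])` resolves the file inside that snapshot, while calling `main([])` falls back to `data/current`. The rest of the code never needs to know which snapshot is in use.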
    
If you keep your code and data under version control in two different repositories, there is always the problem of knowing which version of the data works with which version of the code. To resolve this, the two can be linked using sub-repositories, which also resolves the issue of external references. Git supports sub-repositories in the form of submodules. – NZD Aug 14 '15 at 8:32
