GitPython's clone_from() throws exceptions while cloning a repo when called through a Flask API #1020

Closed
ShetGanesh opened this issue Jun 11, 2020 · 5 comments

Comments


@ShetGanesh ShetGanesh commented Jun 11, 2020

I have a Python snippet that clones a GitHub repo over HTTPS using an access token, creates a branch, modifies the content, and pushes it back. The code below works fine when run directly with the Python interpreter on the command line. My use case, however, exposes this functionality as an API through a Flask server, and the Flask app is served by gunicorn, a Python WSGI HTTP server. When I call the API in that setup, cloning the repo fails. When I run only the Flask server, the same API call completes without raising exceptions; the error appears only when the Flask app runs under gunicorn.

Not sure how to get rid of that error.

Code Snippet:

    import json
    import os

    import git
    .........
    .........
    .........
    try:
        # Clone over HTTPS, authenticating with the access token.
        _repo = git.Repo.clone_from(
            f"https://{_token}:x-oauth-basic@{self.git_url}", _repo_dir
        )
        # Create the working branch and point HEAD at it.
        _new_branch = _repo.create_head(_branch)
        _repo.head.set_reference(_new_branch)
    except Exception as e:
        return False, str(e)
    _update_path = os.path.join(_repo_dir, "repo1/config")
    # makedirs also creates missing parent directories and tolerates an existing one.
    os.makedirs(_update_path, exist_ok=True)
    with open(os.path.join(_update_path, "users.json"), "w") as _fd:
        json.dump(_users_json, _fd, indent=4)
    # Stage all changes, commit them, and push the new branch.
    _repo.git.add(A=True)
    _repo.git.commit(m=_title)
    _repo.git.push("origin", _branch)

ERROR:

2020-06-11 17:18:16,714 DEBUG: Popen(['git', 'clone', '-v', 'https://<ACCESS_TOKEN>:x-oauth-basic@github.com/ganesh/repo1.git', '/tmp/folder-1234_1591895896.185123'], cwd=/opt/ganesh, universal_newlines=True, shell=None, istream=None)
2020-06-11 17:18:16,739 DEBUG: Cmd(['git', 'clone', '-v', 'https://<ACCESS_TOKEN>:x-oauth-basic@github.com/ganesh/repo1.git', '/tmp/folder-1234_1591895896.185123'])'s unused stdout: Cloning into '/tmp/folder-1234_1591895896.185123'...

2020-06-11 17:18:16,740 DEBUG: AutoInterrupt wait stderr: b'Error reading command stream\n'

It looks as if GitPython writes output to a stream during the clone, nothing reads from that stream, and an exception is raised as a result. The problem does not occur when I clone the repo directly with the command shown in the debug log, nor when I run the script directly with the Python interpreter. I believe this happens because gunicorn is a WSGI HTTP server, in which case the spawned processes are handed over to the Unix init process (PID 1).

How can this exception be disabled for applications running behind gunicorn, or anything else that behaves like an HTTP server? Or can this be better addressed in a later version?

Member

@Byron Byron commented Jun 12, 2020

Thanks for posting.
The error provided here looks more like DEBUG output, so it's unclear to me what the actual error surfaced in the program is.
Personally, I have no experience with what happens when a spawned process is taken over by another entity. Apparently, though, it has repercussions for the process that wants to read from the spawned one.

Could you provide more details on what the exception looks like? Maybe that gives a hint for solving this issue. For now I am absolutely puzzled and wouldn't know what GitPython should do in an environment like this.

Author

@ShetGanesh ShetGanesh commented Jun 12, 2020

@Byron Thanks for helping out. Here is the ERROR I'm getting in the output of the API call.

{"data":"Cmd('git') failed due to: exit code(-13)\n cmdline: git clone -v https://<ACCESS_TOKEN>:x-oauth-basic@github.com/ganesh/repo1.git  /tmp/DSPE-1234_1591895896.185123\n stderr: 'Error reading command stream\n'","message":"error","status":1}

Here is the sample output of a successful call:

{'data': 'https://github.com/ganesh/repo1/pull/22', 'message': 'success', 'status': 0}
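
A note on the `exit code(-13)` in the error output above: Python's `subprocess` module reports a child that was killed by a signal as the negated signal number. Assuming GitPython passes that return code through unchanged (not confirmed in this thread), a small helper like the following sketch can translate it. The name `describe_exit_status` is made up for illustration.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Render a subprocess-style exit status in readable form."""
    if status < 0:
        # On POSIX, subprocess reports death by signal as the negated signal number.
        try:
            name = signal.Signals(-status).name
        except ValueError:
            name = f"signal {-status}"
        return f"terminated by {name}"
    return f"exited with code {status}"


print(describe_exit_status(-13))
```

On Linux, signal 13 is SIGPIPE, which would fit a process writing to, or depending on, a stream whose other end is gone.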
Author

@ShetGanesh ShetGanesh commented Jun 12, 2020

This part of the code is erroring out.

GitPython/git/repo/base.py

Lines 953 to 960 in 24cd6da

    proc = git.clone(multi, Git.polish_url(url), clone_path, with_extended_output=True, as_process=True,
                     v=True, universal_newlines=True, **add_progress(kwargs, git, progress))
    if progress:
        handle_process_output(proc, None, progress.new_message_handler(), finalize_process, decode_streams=False)
    else:
        (stdout, stderr) = proc.communicate()
        log.debug("Cmd(%s)'s unused stdout: %s", getattr(proc, 'args', ''), stdout)
        finalize_process(proc, stderr=stderr)

Author

@ShetGanesh ShetGanesh commented Jun 12, 2020

This issue is resolved now.
To describe the solution: our app runs in daemonized mode via gunicorn. Judging from GitPython's code for the clone functionality, it expects its output stream to be read by the calling process. That was not happening under the daemonized process, whereas an API call made to a shell-bound process went through.
I have enabled the following gunicorn options to provide this output listener for GitPython's clone call.

--enable-stdio-inheritance \
--error-logfile /tmp/gunicorn_error.log \
--capture-output \

This did the trick and the code now works in a daemonized environment as well.
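
For anyone who prefers a config file over command-line flags, the same three options can probably be expressed as gunicorn settings. This is a sketch and not the setup used above: the file name `gunicorn.conf.py` and where it lives are assumptions, and the setting names are the config-file counterparts of the flags as I understand them.

```python
# gunicorn.conf.py (assumed file name; passed to gunicorn with -c)

# Counterpart of --enable-stdio-inheritance
enable_stdio_inheritance = True

# Counterpart of --error-logfile /tmp/gunicorn_error.log
errorlog = "/tmp/gunicorn_error.log"

# Counterpart of --capture-output: send stdout/stderr to the error log
capture_output = True
```

The file would be loaded with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a placeholder for the actual Flask module and object.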

What I do want to understand is why the git clone functionality expects some handler to listen to its output and raises exceptions based on it. Why not execute the command and return the output without checking for listeners?

Member

@Byron Byron commented Jun 13, 2020

I am glad there is a way!

A handler is required only if the caller wants live progress; after all, initial clones can take a very long time. The alternative branch uses proc.communicate() and blocks while waiting for the output to become available.
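
To illustrate the two branches with nothing but the standard library, here is a minimal sketch. It is not GitPython's implementation: `run`, `on_progress`, and the stand-in `CHILD` command are invented for this example, and the child is a small Python one-liner that writes progress lines to stderr instead of a real `git clone`.

```python
import subprocess
import sys
import threading

# Stand-in child that writes progress lines to stderr and a result to stdout.
CHILD = [sys.executable, "-c",
         "import sys; [print(f'step {i}', file=sys.stderr) for i in range(3)]; print('done')"]


def run(cmd, on_progress=None):
    """Run cmd with piped output; stream stderr lines to on_progress if given."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True) as proc:
        if on_progress is not None:
            # Live branch: a reader thread hands each stderr line to the handler
            # while the main thread drains stdout, so neither pipe can fill up.
            def pump():
                for line in proc.stderr:
                    on_progress(line.rstrip("\n"))

            reader = threading.Thread(target=pump)
            reader.start()
            stdout = proc.stdout.read()
            reader.join()
        else:
            # Blocking branch: wait for the process and collect everything at once.
            stdout, _stderr = proc.communicate()
    return proc.returncode, stdout


if __name__ == "__main__":
    print(run(CHILD, on_progress=lambda line: print("progress:", line)))
    print(run(CHILD))
```

Either way the parent owns the read end of both pipes, which is why the child's output always has somewhere to go as long as the pipes themselves can be set up.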

Lastly, GitPython doesn't do anything special: it merely connects pipes to stdout and stderr to be able to intercept them. If that doesn't work because a parent process changes the way spawning works, GitPython cannot work.

Please let me know if you come across a way to spawn processes in a way that would work naturally in such a context. Without such a capability, I believe there is nothing else that can be done here.
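
One avenue that might be worth trying, offered as an untested sketch: if the failure comes from the daemonized process having standard descriptors (0, 1, 2) that are closed or not inheritable by child processes, the application could repair them itself before spawning git. That cause is an assumption and is not confirmed anywhere in this thread; `ensure_std_fds` is a made-up helper name.

```python
import os


def ensure_std_fds():
    """Make sure fds 0, 1 and 2 are open and inheritable by child processes.

    Closed descriptors are pointed at os.devnull. Returns the descriptors
    that had to be repaired.
    """
    repaired = []
    for fd in (0, 1, 2):
        try:
            usable = os.get_inheritable(fd)  # raises OSError if fd is closed
        except OSError:
            devnull = os.open(os.devnull, os.O_RDWR)
            if devnull != fd:
                os.dup2(devnull, fd)
                os.close(devnull)
            usable = False
        if not usable:
            # Descriptors created by os.open() are non-inheritable by default.
            os.set_inheritable(fd, True)
            repaired.append(fd)
    return repaired
```

It would be called once at application startup, before the first clone. Whether this actually removes the need for the gunicorn flags above would have to be verified in the affected environment.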

@Byron Byron closed this Jun 13, 2020