Intermittent E2E test failure: ERROR: 1 setup jobs failed. Check logs above for details #3796

Open
wallrj opened this issue Mar 24, 2021 · 3 comments

@wallrj (Member) commented Mar 24, 2021

The E2E tests sometimes fail during setup with the message:

 ERROR: 1 setup jobs failed. Check logs above for details. 

-- https://prow.build-infra.jetstack.net/view/gcs/jetstack-logs/pr-logs/pull/jetstack_cert-manager/3647/pull-cert-manager-e2e-v1-20/1359494849920765955#1:build-log.txt%3A668 (thanks @irbekrm)

The setup jobs are launched as background shell jobs, but their logs are interleaved, so it is difficult to tell which job failed and what went wrong.

The simplest solution would be to disable the parallelism and run the setup jobs in series.
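A minimal sketch of the serial approach, assuming each setup job can be wrapped in a shell function. The names `setup_a`, `setup_b`, `setup_c`, `run_in_series` and `FAILED_STEP` are hypothetical and do not come from the cert-manager scripts:

```shell
#!/usr/bin/env bash
# Hypothetical setup steps; the real setup jobs are not shown in this issue.
setup_a() { echo "setting up a"; }
setup_b() { echo "setting up b" >&2; return 1; }  # simulates a failing job
setup_c() { echo "setting up c"; }

# Run each step in order and stop at the first failure, so the output of
# the failing step is the last thing in the log.
run_in_series() {
    local step
    FAILED_STEP=""
    for step in "$@"; do
        echo "==> ${step}"
        if ! "${step}"; then
            echo "ERROR: setup job '${step}' failed" >&2
            FAILED_STEP="${step}"
            return 1
        fi
    done
}

run_in_series setup_a setup_b setup_c || echo "setup failed at: ${FAILED_STEP}"
```

The trade-off is that the total setup time becomes the sum of all the steps rather than the duration of the slowest one.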

Alternatively we might try and keep track of exactly which jobs failed and show the logs of those jobs on failure.

And that should allow us to find out which of the jobs keeps failing.
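One way this could look, as a rough sketch rather than the project's actual script: each background job writes to its own log file, the PID of each job is recorded next to its name, and only the logs of failed jobs are printed. The function names, `run_in_parallel` and `FAILED_JOBS` are all hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical setup steps standing in for the real setup jobs.
setup_a() { echo "a: ok"; }
setup_b() { echo "b: something went wrong"; return 1; }  # simulated failure
setup_c() { echo "c: ok"; }

run_in_parallel() {
    local log_dir started step entry name pid
    log_dir="$(mktemp -d)"
    started=""
    FAILED_JOBS=""

    # Start every job in the background, each writing to its own log file,
    # and remember which PID belongs to which job name.
    for step in "$@"; do
        "${step}" >"${log_dir}/${step}.log" 2>&1 &
        started="${started} ${step}:$!"
    done

    # Wait for each job by PID; `wait PID` returns that job's exit status.
    for entry in ${started}; do
        name="${entry%%:*}"
        pid="${entry##*:}"
        if ! wait "${pid}"; then
            FAILED_JOBS="${FAILED_JOBS:+${FAILED_JOBS} }${name}"
            echo "---- logs for failed job: ${name} ----" >&2
            cat "${log_dir}/${name}.log" >&2
        fi
    done

    rm -rf "${log_dir}"
    if [ -n "${FAILED_JOBS}" ]; then
        echo "ERROR: setup jobs failed: ${FAILED_JOBS}" >&2
        return 1
    fi
}

run_in_parallel setup_a setup_b setup_c || echo "setup failed"
```

This version lets every job run to completion, so all failures in a single run are reported together, and the logs of successful jobs stay out of the output.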

/kind flake
/cc @RinkiyaKeDad

@RinkiyaKeDad (Contributor) commented Mar 24, 2021

/assign

@RinkiyaKeDad (Contributor) commented Mar 26, 2021

Hi @wallrj! I am relatively new to writing bash scripts so just wanted to confirm a few things:

  1. Here, wait $job returns a boolean value, right? And if that is false, then EXIT is incremented. So the simplest solution you were referring to would look like this pseudocode:

         for job in $(jobs -p); do
             # wait returns the job's exit status; non-zero means the job failed
             if ! wait "$job"; then
                 echo "{logs}"
                 break
             fi
         done

  2. You mentioned:

     "Alternatively we might try and keep track of exactly which jobs failed and show the logs of those jobs on failure."

     Would this be recommended over the first approach?

     The benefit I can see is that we would be able to see all the jobs that fail in a single test run.

     The downside is that once one job has failed, we know the entire test will fail, so running all the other jobs would use resources unnecessarily.

  3. The current message just says to check the logs for details. Could you please point me to how I would get access to these logs in the shell script?

Thanks!
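On the first question, for reference: in bash, `wait PID` does not return a boolean as such. It returns the exit status of the job it waited for, where 0 means success and any non-zero value means failure, and `if` treats a zero status as true. A small illustration, with made-up jobs and variable names:

```shell
#!/usr/bin/env bash
# Start a background job that exits with status 3.
sh -c 'exit 3' &
pid=$!

# Capture the job's exit status from `wait`.
status=0
wait "${pid}" || status=$?
echo "job ${pid} exited with status ${status}"

# In a condition, `! wait PID` is true when the job failed.
job_failed=no
false &
if ! wait "$!"; then
    job_failed=yes
fi
echo "second job failed: ${job_failed}"
```

Note that `wait` with no arguments waits for all background jobs and returns 0 regardless of how they exited, so the per-PID form is needed to detect failures.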

@irbekrm (Collaborator) commented Apr 15, 2021

/priority important-soon
