[IMP] runbot_merge: precisely filter stagings before validating them

Before this, we would "roughly" select stagings by looking at stagings
whose heads matched a specific sha then validating them all. This
could perform extra validations on stagings once in a while but this
was assumed not to be much an issue, at least originally.

However two changes later on have contributed to this likely being the
cause of #429 (stagings never timing out):

* heads of the staging branches are uniquifier commits stored in the
  heads map, but the actual heads of the stagings are also stored
  there, some of which are no-ops (hence the uniquifiers) so assuming
  repos A and B, if a staging contains PRs touching A then the head of
  B actual will also be a head of B
* when a staging is validated, if it *contains* any pending result the
  timeout limit gets bumped back

The issue here is that if a success / failure status is lost (which
would be the most common reason for timeouts) *and* someone has forked
and is regularly rebuilding a branch-head used as-is by a staging,
each of those rebuilds will trigger a validation of the staging, which
will find that one of the statuses is still pending (because we missed
the success / failure), which will bump up the timeout limit,
continuing until the branch stops getting rebuilt.

This is probably one of the reasons why some stagings last for *way*
more than 2h, though it is far from explaining all of them: 90% of the
stagings lasting more than *3*h end up succeeding. Tho it's always
possible that this is because someone notices and resends a success
for the missing status it seems somewhat doubtful. Oh well.

Also fix the incorrect log call on `update_timeout_limit` triggering.
This commit is contained in:
Xavier Morel 2021-01-20 09:22:11 +01:00
parent b0576d1d41
commit 08b8da08df

View File

@ -1539,9 +1539,14 @@ class Commit(models.Model):
pr = PRs.search([('head', '=', c.sha)])
if pr:
pr._validate(st)
# heads is a json-encoded mapping of reponame:head, so chances
# are if a sha matches a heads it's matching one of the shas
stagings = Stagings.search([('heads', 'ilike', c.sha)])
stagings = Stagings.search([('heads', 'ilike', c.sha)]).filtered(
lambda s, h=c.sha: any(
head == h
for repo, head in json.loads(s.heads).items()
if not repo.endswith('^')
)
)
if stagings:
stagings._validate()
except Exception:
@ -1670,7 +1675,7 @@ class Stagings(models.Model):
vals = {'state': st}
if update_timeout_limit:
vals['timeout_limit'] = fields.Datetime.to_string(datetime.datetime.now() + datetime.timedelta(minutes=s.target.project_id.ci_timeout))
_logger.debug("staging %s: got pending status, bumping timeout to %s (%s)", vals['timeout_limit'], cmap)
_logger.debug("%s got pending status, bumping timeout to %s (%s)", self, vals['timeout_limit'], cmap)
s.write(vals)
def action_cancel(self):