From 7348e4d7a4d3565433423cc2d91307d40fd46d35 Mon Sep 17 00:00:00 2001
From: Xavier Morel
Date: Fri, 11 Aug 2023 11:50:38 +0200
Subject: [PATCH] [IMP] runbot_merge: ensure at least 1s between mutating GH calls

Mostly a temporary safety feature after the events of 07-31: it's still
not clear whether that was a one-off issue or a change in policy (I was
not able to reproduce locally even doing several set_refs a second) and
the gh support is not super talkative, but it probably doesn't hurt to
commit the workaround until #247 gets implemented.

On 2023-07-31, around 08:30 UTC, `set_ref` started failing, a lot
(although oddly enough not continuously), with the unhelpful message
that

> 422: Reference cannot be updated

This basically broke all stagings, until a workaround was implemented
by adding a 1s sleep before `set_ref` to ensure no more than one
`set_ref` per second, which kinda sorta has been the github
recommendation forever but had never been an issue before. Contributing
to this suspicion is that in late 2022, the documentation of error 422
on `PATCH git/refs/{ref}` was updated to:

> Validation failed, or the endpoint has been spammed.

Still would be nice if GH was clear about it and sent a 429 instead.

Technically the recommendation is:

> If you're making a large number of POST, PATCH, PUT, or DELETE
> requests for a single user or client ID, wait at least one second
> between each request.

So... actually implement that. On a per-worker basis, as these are for
the most part serial processes (e.g. crons); we can still get above the
rate limit due to concurrent crons, but it should be less likely.

Also take `Retry-After` into account, can't hurt, though we're supposed
to retry just the request rather than abort the entire thing. Maybe a
future update can improve this handling.

Would also be nice to take `X-RateLimit` into account, although that's
supposed to apply to *all* requests so we'd need a second separate
timestamp to track it. Technically that's probably also the case for
`Retry-After`.

And fixing #247 should cut down drastically on the API call traffic as
staging is a very API-intensive process, especially with the sanity
checks we had to add; these days we might be at 4 calls per commit per
PR, and up to 80 PRs/staging (5 repositories and 16 batches per
staging), with 13 live branches (though realistically only 6-7 have
significant traffic, and only 1~2 get close to filling their staging
slots).
---
 runbot_merge/github.py | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/runbot_merge/github.py b/runbot_merge/github.py
index 1461dc82..ed3950b0 100644
--- a/runbot_merge/github.py
+++ b/runbot_merge/github.py
@@ -6,6 +6,7 @@ import logging.handlers
 import os
 import pathlib
 import pprint
+import time
 import unicodedata
 
 import requests
@@ -57,6 +58,7 @@ class GH(object):
     def __init__(self, token, repo):
         self._url = 'https://api.github.com'
         self._repo = repo
+        self._last_update = 0
         session = self._session = requests.Session()
         session.headers['Authorization'] = 'token {}'.format(token)
         session.headers['Accept'] = 'application/vnd.github.symmetra-preview+json'
@@ -103,8 +105,16 @@ class GH(object):
         """
         :type check: bool | dict[int:Exception]
         """
+        if method.casefold() != 'get':
+            to_sleep = 1. - (time.time() - self._last_update)
+            if to_sleep > 0:
+                time.sleep(to_sleep)
+
         path = f'/repos/{self._repo}/{path}'
         r = self._session.request(method, self._url + path, params=params, json=json)
+        if method.casefold() != 'get':
+            self._last_update = time.time() + int(r.headers.get('Retry-After', 0))
+
         self._log_gh(_gh, r)
         if check:
             try:
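The throttling logic the patch adds can be sketched as a standalone helper. This is a minimal illustration, not the actual `GH` class: the `Throttle` name, the `before_request`/`after_response` split, and the injectable `clock`/`sleep` parameters are all hypothetical; only the core arithmetic (1s floor between mutating calls, `Retry-After` pushing the reference timestamp into the future) mirrors the diff.

```python
import time


class Throttle:
    """Per-worker rate limiter: keep at least `min_interval` seconds
    between mutating (non-GET) requests, mirroring the patch's logic."""

    def __init__(self, min_interval=1.0, clock=time.time, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so tests don't need real time
        self._sleep = sleep
        self._last_update = 0.0  # timestamp of the last mutating call

    def before_request(self, method):
        # GETs are never throttled, matching the patch.
        if method.casefold() == 'get':
            return
        to_sleep = self.min_interval - (self._clock() - self._last_update)
        if to_sleep > 0:
            self._sleep(to_sleep)

    def after_response(self, method, headers):
        if method.casefold() == 'get':
            return
        # Honoring Retry-After by pushing the reference timestamp into
        # the future delays the *next* mutating call accordingly.
        self._last_update = self._clock() + int(headers.get('Retry-After', 0))
```

Keeping the state per instance matches the patch's per-worker scope: separate workers (separate `GH` instances) throttle independently, so concurrent crons can still exceed the global limit, as the commit message notes.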