[IMP] runbot_merge: ensure at least 1s between mutating GH calls

Mostly a temporary safety feature after the events of 07-31: it's
still not clear whether that was a one-off issue or a change in
policy (I was not able to reproduce locally even doing several
`set_ref` calls per second) and GitHub support is not very talkative,
but it probably doesn't hurt to commit the workaround until #247 gets
implemented.

On 2023-07-31, around 08:30 UTC, `set_ref` started failing, a lot
(although oddly enough not continuously), with the unhelpful message
that

> 422: Reference cannot be updated

This basically broke all stagings until a workaround was implemented:
adding a 1s sleep before `set_ref`, to ensure no more than one
`set_ref` per second. That has more or less been the GitHub
recommendation forever, but had never been an issue before.
Reinforcing the suspicion that this is rate-limiting related, in late
2022 the documentation of error 422 on `PATCH git/refs/{ref}` was
updated to:

> Validation failed, or the endpoint has been spammed.

It would still be nice if GitHub were clear about it and sent a 429 instead.

Technically the recommendation is:

> If you're making a large number of POST, PATCH, PUT, or DELETE
> requests for a single user or client ID, wait at least one second
> between each request.

So... actually implement that, on a per-worker basis, since these are
for the most part serial processes (e.g. crons). We can still exceed
the rate limit due to concurrent crons, but it should be less likely.
A sketch of the idea follows.
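
As a minimal sketch of the idea (a hypothetical standalone helper, not
the actual `GH` class; the real change is in the diff below):

```python
import time


class MutationThrottle:
    """Ensure at least ``min_interval`` seconds between mutating calls made
    through this instance; one instance per worker, so the spacing is
    per-process rather than global."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last_mutation = 0.0

    def before_request(self):
        # sleep for whatever remains of the minimum interval since the
        # previous mutating call
        to_sleep = self.min_interval - (time.time() - self._last_mutation)
        if to_sleep > 0:
            time.sleep(to_sleep)

    def after_request(self):
        # record when this mutating call went out
        self._last_mutation = time.time()
```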

Also take `Retry-After` into account; it can't hurt, though strictly
speaking we're supposed to retry just the offending request rather
than abort the entire operation. Maybe a future update can improve
this handling, along the lines sketched below.
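
What retrying the individual request might look like, as a possible
future improvement rather than what this commit does (purely
illustrative; it assumes `Retry-After` is given in seconds, same as
the handling in the diff below):

```python
import time

import requests


def request_with_retry(session: requests.Session, method, url, retries=3, **kw):
    # Retry a single rate-limited request after the server-specified delay,
    # instead of giving up on the whole operation (illustrative sketch only).
    for _ in range(retries):
        r = session.request(method, url, **kw)
        retry_after = r.headers.get('Retry-After')
        if r.ok or retry_after is None:
            return r
        # honour the server-provided delay, then retry just this request
        time.sleep(int(retry_after))
    return r
```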

It would also be nice to take `X-RateLimit` into account, although
that's supposed to apply to *all* requests, so we'd need a second,
separate timestamp to track it (technically that's probably also the
case for `Retry-After`); a rough sketch of what such tracking could
look like follows. And fixing #247 should cut down drastically on API
traffic, as staging is a very API-intensive process, especially with
the sanity checks we had to add: these days we might be at 4 calls
per commit per PR, and up to 80 PRs per staging (5 repositories and
16 batches per staging), with 13 live branches (though realistically
only 6-7 see significant traffic, and only 1-2 get close to filling
their staging slots).
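
A rough sketch of what that tracking could look like, with a second
timestamp kept separately from the mutation throttle (the header names
are GitHub's documented `X-RateLimit-Remaining` / `X-RateLimit-Reset`;
the class and its shape are purely illustrative):

```python
import time


class RateLimitTracker:
    """Track the X-RateLimit headers across *all* requests, not just
    mutations, and pause until the reset time once the quota runs out."""

    def __init__(self):
        self._remaining = None
        self._reset = 0  # UTC epoch seconds at which the quota resets

    def update(self, response):
        remaining = response.headers.get('X-RateLimit-Remaining')
        reset = response.headers.get('X-RateLimit-Reset')
        if remaining is not None:
            self._remaining = int(remaining)
        if reset is not None:
            self._reset = int(reset)

    def wait_if_exhausted(self):
        if self._remaining == 0:
            to_sleep = self._reset - time.time()
            if to_sleep > 0:
                time.sleep(to_sleep)
```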
Xavier Morel 2023-08-11 11:50:38 +02:00
parent 85a74a9e32
commit 7348e4d7a4


@@ -6,6 +6,7 @@ import logging.handlers
 import os
 import pathlib
 import pprint
+import time
 import unicodedata
 
 import requests
@@ -57,6 +58,7 @@ class GH(object):
     def __init__(self, token, repo):
         self._url = 'https://api.github.com'
         self._repo = repo
+        self._last_update = 0
         session = self._session = requests.Session()
         session.headers['Authorization'] = 'token {}'.format(token)
         session.headers['Accept'] = 'application/vnd.github.symmetra-preview+json'
@@ -103,8 +105,16 @@ class GH(object):
         """
         :type check: bool | dict[int:Exception]
         """
+        if method.casefold() != 'get':
+            to_sleep = 1. - (time.time() - self._last_update)
+            if to_sleep > 0:
+                time.sleep(to_sleep)
+
         path = f'/repos/{self._repo}/{path}'
         r = self._session.request(method, self._url + path, params=params, json=json)
+        if method.casefold() != 'get':
+            self._last_update = time.time() + int(r.headers.get('Retry-After', 0))
+
         self._log_gh(_gh, r)
         if check:
             try: