[FIX] runbot_merge: concurrency error in freeze wizard (hopefully)

During the 16.3 freeze, an issue was noticed with the concurrency safety of the freeze wizard (because it blew up, which caused a few issues): cancelling an active staging to the master branch can fail, which makes the mergebot side of the freeze fail while the GitHub side is completed, leaving the whole thing in a less than ideal state.

This is compounded by the branch inserter having its own concurrency issue (which maybe I should fix): if there are branches *being* forward-ported across the new branch, it is unable to see them, and thus cannot create the now-missing PRs.

Try to make the freeze wizard more resilient:

1. Take a lock on the master staging (if any) early on. This means that if we can acquire it we should be able to cancel it, and the cancellation won't suffer a concurrency error.
2. Add the `process_updated_commits` cron to the set of locked crons. Reading the log timeline, the issue appears to have been commit statuses on that staging being updated after the action had started: REPEATABLE READ meant the freeze's transaction was unable to see the update from the commit statuses, so cancelling the staging created a diverging update, which postgres then reported as a serialization error.

I'd like to relax the locking of the cron (to just FOR SHARE), but I don't think it would work, per postgres:

> SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as
> SELECT in terms of searching for target rows: they will only find
> target rows that were committed as of the transaction start
> time. However, such a target row might have already been updated (or
> deleted or locked) by another concurrent transaction by the time it
> is found. In this case, the repeatable read transaction will wait
> for the first updating transaction to commit or roll back (if it is
> still in progress). If the first updater rolls back, then its
> effects are negated and the repeatable read transaction can proceed
> with updating the originally found row. But if the first updater
> commits (and actually updated or deleted the row, not just locked
> it) then the repeatable read transaction will be rolled back with
> the message
>
>     ERROR:  could not serialize access due to concurrent update

This means it would be possible to lock the cron and then get a transaction error because the cron modified one of the records we're going to hit while it was running: as far as the above is concerned, the cron's worker had "just locked" the cron's row, so it's fine to continue.

However, this makes it increasingly likely that an error will be hit when trying to freeze (harmlessly, but still). We'll have to see how that ends up.

Fixes #766 maybe
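The first-updater-wins rule quoted above, and the failure it produced here, can be replayed with a toy in-memory model. This is a minimal sketch for illustration only, not how Postgres implements REPEATABLE READ: there is no blocking or row locking, and conflicts are detected simply by comparing row versions. `Table`, `RepeatableReadTx` and `SerializationFailure` are hypothetical names. The freeze transaction takes its snapshot, a cron then commits an update to the staging, and the freeze's later attempt to cancel that staging is rejected.

```python
class SerializationFailure(Exception):
    """Stand-in for Postgres' 'could not serialize access due to concurrent update'."""


class Table:
    def __init__(self, rows):
        # row id -> (value, version); the version only changes on committed updates
        self.rows = {rid: (value, 0) for rid, value in rows.items()}


class RepeatableReadTx:
    def __init__(self, table):
        self.table = table
        # snapshot of the committed rows as of transaction start
        self.snapshot = dict(table.rows)
        self.writes = {}

    def read(self, rid):
        # reads see the snapshot (plus our own writes), never later commits
        if rid in self.writes:
            return self.writes[rid]
        return self.snapshot[rid][0]

    def _check(self, rid):
        # first updater wins: fail if another transaction committed an
        # update to this row after our snapshot was taken
        if self.table.rows[rid][1] != self.snapshot[rid][1]:
            raise SerializationFailure(f"concurrent update on row {rid!r}")

    def update(self, rid, value):
        self._check(rid)
        self.writes[rid] = value

    def commit(self):
        for rid in self.writes:
            self._check(rid)
        for rid, value in self.writes.items():
            self.table.rows[rid] = (value, self.table.rows[rid][1] + 1)
        self.writes = {}


table = Table({'staging': 'pending'})
freeze = RepeatableReadTx(table)      # the freeze wizard's transaction starts

cron = RepeatableReadTx(table)        # a cron updates the staging and commits
cron.update('staging', 'success')
cron.commit()

print(freeze.read('staging'))         # the snapshot still shows the old value
try:
    freeze.update('staging', 'cancelled')
except SerializationFailure as e:
    print('freeze failed:', e)
```

A concurrent transaction that only reads the row commits no update, so in this model (as in the quoted rule) it does not cause a conflict.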
2025-03-15 23:45:44 +07:00 · 2023-06-15 15:25:17 +02:00 · 2023-06-15 15:25:17 +02:00 · 9260384284
commit 9260384284
parent ed0fd88854
1 changed file with 9 additions and 1 deletion
--- a/runbot_merge/models/project_freeze/__init__.py
+++ b/runbot_merge/models/project_freeze/__init__.py
@@ -177,7 +177,9 @@ class FreezeWizard(models.Model):
        if self.errors:
            return self.action_open()

-        conflict_crons = self.env.ref('runbot_merge.merge_cron') | self.env.ref('runbot_merge.staging_cron')
+        conflict_crons = self.env.ref('runbot_merge.merge_cron')\
+                       | self.env.ref('runbot_merge.staging_cron')\
+                       | self.env.ref('runbot_merge.process_updated_commits')
        # we don't want to run concurrently to the crons above, though we
        # don't need to prevent read access to them
        self.env.cr.execute(
@@ -190,6 +192,12 @@ class FreezeWizard(models.Model):
        # everything so the new branch is the second one, just after the branch
        # it "forks"
        master, rest = project_id.branch_ids[0], project_id.branch_ids[1:]
+        if self.bump_pr_ids and master.active_staging_id:
+            self.env.cr.execute(
+                'SELECT * FROM runbot_merge_stagings WHERE id = %s FOR UPDATE NOWAIT',
+                [master.active_staging_id.id]
+            )
+
        seq = itertools.count(start=1) # start reseq at 1
        commands = [
            (1, master.id, {'sequence': next(seq)}),
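The staging lock added in the second hunk relies on `FOR UPDATE NOWAIT`: if another transaction already holds a lock on the row, Postgres raises an error immediately (SQLSTATE 55P03, `lock_not_available`) instead of waiting. Below is a minimal sketch of that fail-fast behaviour, assuming a DB-API style cursor; `lock_active_staging`, `FakeCursor` and `LockNotAvailable` are hypothetical stand-ins, not runbot or psycopg2 code. The sketch binds the staging as a plain integer id, which any DB-API driver can pass as a query parameter.

```python
class LockNotAvailable(Exception):
    """Stand-in for Postgres' lock_not_available error (SQLSTATE 55P03)."""


class FakeCursor:
    """Hypothetical test double for a DB-API cursor.

    Row ids listed in ``held`` are treated as locked by another transaction,
    so a NOWAIT lock on them fails immediately instead of blocking.
    """

    def __init__(self, held=()):
        self.held = set(held)
        self.executed = []

    def execute(self, query, params=()):
        params = tuple(params)
        self.executed.append((query, params))
        if 'NOWAIT' in query and params and params[0] in self.held:
            raise LockNotAvailable(f'could not obtain lock on row {params[0]}')


def lock_active_staging(cr, staging_id):
    """Lock the staging row up front, or fail fast if someone else holds it."""
    if not staging_id:
        return False  # no active staging, nothing to lock
    cr.execute(
        'SELECT * FROM runbot_merge_stagings WHERE id = %s FOR UPDATE NOWAIT',
        [staging_id],
    )
    return True


cr = FakeCursor()
print(lock_active_staging(cr, 42))    # lock acquired
try:
    lock_active_staging(FakeCursor(held={42}), 42)
except LockNotAvailable as e:
    print('freeze aborted:', e)       # row busy: error right away, no waiting
```

In the commit's reasoning, succeeding at this point means the wizard should later be able to cancel the staging without a concurrency error, while a busy staging surfaces as an immediate error rather than a wait.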