runbot/runbot_merge
Xavier Morel d4fa1fd353 [CHG] *: rewrite commands set, rework status management
This commit revisits the commands set in order to make it more
regular, and limit inconsistent command-sets, although it includes
pseudo-command aliases for common tasks now removed from the core set.

Hard Errors
===========

The previous iteration of the commands set would ignore any
non-command term in a command line. This has been changed to a hard
error (the entire line is ignored) if any command is unknown or
invalid.

This fixes inconsistent / unexpected interpretations where a user
sends a command, then writes a novel on the same line some words of
which happen to *also* be commands, leading to merge states they did
not expect. They should now be told to fuck off.
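
As a rough illustration of the "hard error" behaviour (not the actual parser), a minimal sketch could reject the whole line as soon as one term is not a known command. The command subset, the ``CommandError`` class and the ``parse_command_line`` helper are all hypothetical names chosen for this example:

```python
from __future__ import annotations

# Illustrative subset only; the real command set is larger (assumption).
KNOWN_COMMANDS = {"r+", "r-", "retry", "default", "priority", "alone",
                  "skipchecks", "cancel"}


class CommandError(Exception):
    """Raised when a command line contains an unknown or invalid term."""


def parse_command_line(line: str, botname: str = "robodoo") -> list[str]:
    """Return the commands found on `line`, or raise if any term is unknown."""
    words = line.split()
    if not words or words[0].lstrip("@") != botname:
        return []  # not addressed to the bot: nothing to do
    commands = words[1:]
    unknown = [w for w in commands if w not in KNOWN_COMMANDS]
    if unknown:
        # Hard error: the whole line is rejected, no command is applied.
        raise CommandError(f"unknown command(s): {', '.join(unknown)}")
    return commands
```

With this sketch, ``robodoo r+ priority`` yields both commands, while ``robodoo r+ and please retry later`` raises instead of silently applying ``r+`` and ``retry``.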

Priority Restructuring
----------------------

The numerical priority system was pretty messy in that it confused
"staging priority" (in ways which were not entirely straightforward)
with overrides to other concerns.

This has now been split along all the axes, with separate command
subsets for:

- staging prioritisation, now separated between `default`, `priority`,
  and `alone`,

  - `default` means PRs are picked by an unspecified order when
    creating a staging, if nothing better is available
  - `priority` means PRs are picked first when staging, however if
    `priority` PRs don't fill the staging the rest will be filled with
    `default`, this mode did not previously exist
  - `alone` means the PRs are picked first, before splits, and only
    `alone` PRs can be part of the staging (which usually matches the
    modename)
- `skipchecks` overrides both statuses and approval checks, for the
  batch, something previously implied in `p=0`, but now
  independent. Setting `skipchecks` basically makes the entire batch
  `ready`.

  For consistency this also sets the reviewer implicitly: since
  skipchecks overrides both statuses *and approval*, whoever enables
  this mode is essentially the reviewer.
- `cancel` cancels any ongoing staging when the marked PR becomes
  ready again, previously this was also implied (in a more restricted
  form) by setting `p=0`
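
The three staging modes described above can be sketched as a small selection function. This is only an illustration under stated assumptions: the ``StagingPriority`` enum, the ``select_for_staging`` helper and the way ``limit`` is applied are hypothetical, and splits are left out entirely:

```python
from enum import IntEnum


class StagingPriority(IntEnum):
    # Names mirror the commands; the numeric values are an assumption.
    DEFAULT = 0
    PRIORITY = 1
    ALONE = 2


def select_for_staging(ready, limit=8):
    """Pick PR names for a staging.

    `ready` is a list of (name, StagingPriority) pairs in candidate order.
    """
    alone = [name for name, prio in ready if prio == StagingPriority.ALONE]
    if alone:
        # Only `alone` PRs may be part of the staging.
        return alone[:limit]
    priority = [name for name, prio in ready if prio == StagingPriority.PRIORITY]
    default = [name for name, prio in ready if prio == StagingPriority.DEFAULT]
    # `priority` PRs go first, `default` PRs fill whatever room is left.
    return (priority + default)[:limit]
```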

FWBot removal
=============

While the "forwardport bot" still exists at an API level (to segregate
access rights between tokens) it has been removed as an interaction
point, as part of the modules merge plan. As a result,

fwbot stops responding
----------------------

Feedback messages are now always sent by the mergebot, the
forward-porting bot should not send any message or notification
anymore.

commands moved to the merge bot
-------------------------------

- `ignore`/`up to` simply changes bot
- `close` as well
- `skipci` is now a choice / flag of an `fw` command, which denotes
  the forward-port policy,

  - `fw=default` is the old `ci` and resets the policy to default,
    that is wait for the PR to be merged to create forward ports, and
    for the required statuses on each forward port to be received
    before creating the next
  - `fw=skipci` is the old `skipci`, it waits for the merge of the
    base PR but then creates all the forward ports immediately (unless
    it gets a conflict)
  - `fw=skipmerge` immediately creates all the forward ports, without
    even waiting for the PR to be merged

    This is a completely new mode, and may be rather broken as until
    now the 'bot has always assumed the source PR had been merged.
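
The three ``fw`` policies boil down to two questions: does forward-porting wait for the merge, and does it wait for CI on each port? A minimal sketch follows, where ``FwPolicy``, ``parse_fw`` and the two predicates are hypothetical names, not the module's API:

```python
from enum import Enum


class FwPolicy(Enum):
    DEFAULT = "default"      # wait for merge, then for CI on each port
    SKIPCI = "skipci"        # wait for merge, create all ports at once
    SKIPMERGE = "skipmerge"  # create all ports without waiting for merge


def parse_fw(command: str) -> FwPolicy:
    """Parse a `fw=<policy>` command; raises ValueError if invalid."""
    key, sep, value = command.partition("=")
    if key != "fw" or not sep:
        raise ValueError(f"not a fw command: {command!r}")
    return FwPolicy(value)  # Enum lookup raises ValueError on unknown values


def waits_for_merge(policy: FwPolicy) -> bool:
    return policy is not FwPolicy.SKIPMERGE


def waits_for_ci(policy: FwPolicy) -> bool:
    return policy is FwPolicy.DEFAULT
```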

approval rework
---------------

Because of the previous section, there is no distinguishing feature
between `mergebot r+` = "merge this PR" and `forwardbot r+` = "merge
this PR and all its parents with different access rights".

As a result, the two have been merged under a single `mergebot r+`
with heuristics attempting to provide the best experience:

- if approving a non-forward port, the behavior does not change
- else, with review rights on the source, all ancestors are approved
- else, as author of the original, approves all ancestors which descend
  from a merged PR
- else, approves all ancestors up to and including the oldest ancestor
  to which we have review rights

Most notably, the source's author is not delegated on the source or
any of its descendants anymore. This might need to be revisited if it
proves too restrictive.

For the very specialized need of approving a forward-port *and none of
its ancestors*, `review=` can now take a comma (`,`) separated list of
pull request numbers (github numbers, not mergebot ids).

Computed State
==============

The `state` field of pull requests is now computed. Hopefully this
makes the status more consistent and predictable in the long run, and
importantly makes status management more reliable (because updates to
the reference data naturally flow to the state).

For now however it makes things more complicated as some of the states
have to be separately signaled or updated:

- `closed` and `error` are now separate flags
- `merge_date` is pulled down from forwardport and becomes the
  transition signal for ready -> merged
- `reviewed_by` becomes the transition signal for approval (might be a
  good idea to rename it...)
- `status` is computed from the head's statuses and overrides, and
  *that* becomes the validation state
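
To make the idea concrete, here is a plain-Python sketch (not the ORM code) of a state derived from those signals. The precedence order, the ``opened`` / ``approved`` / ``validated`` state names and the ``"success"`` status value are assumptions made for the example:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullRequest:
    closed: bool = False
    error: bool = False
    merge_date: Optional[str] = None   # set once the PR is merged
    reviewed_by: Optional[str] = None  # set once the PR is approved
    status: str = "pending"            # computed from statuses and overrides
    skipchecks: bool = False


def compute_state(pr: PullRequest) -> str:
    """Derive the state from the flags; the precedence is an assumption."""
    if pr.merge_date:
        return "merged"
    if pr.closed:
        return "closed"
    if pr.error:
        return "error"
    if pr.skipchecks:
        return "ready"  # skipchecks overrides statuses and approval
    approved = pr.reviewed_by is not None
    validated = pr.status == "success"
    if approved and validated:
        return "ready"
    if approved:
        return "approved"
    if validated:
        return "validated"
    return "opened"
```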

Ideally, batch-level flags like `skipchecks` should be on, well, the
batch, and `state` should have a dependency on the batch. However
currently the batch is not a durable / permanent member of the system,
so it's a PR-level flag and a messy pile.

One notable change is that *forcing* the state to `ready` now does that
but also sets the reviewer, `skipchecks`, and overrides to ensure the
API-mediated readying does not get rolled back by e.g. the runbot
sending a status.

This is useful for a few types of automated / programmatic PRs
e.g. translation exports, where we set the state programmatically to
limit noise.

recursive dependency hack
-------------------------

Given a sequence of PRs with an override of the source, if one of the
PRs is updated its descendants should not have the override
anymore. However if the updated PR gets overridden, its descendants
should have *that* override.

This requires some unholy manipulations via an override of `modified`,
as the ORM supports recursive fields but not recursive
dependencies (on a different field).

unconditional followup scheduling
---------------------------------

Previously scheduling forward-port followup was contingent on the FW
policy, but that's not actually correct if the new PR is *immediately*
validated (which can happen now that the field is computed, if there
are no required statuses *or* all of the required statuses are
overridden by an ancestor) as nothing will trigger the state change
and thus the scheduling of the fp followup.

The followup function checks all the properties of the batch to port,
so this should not result in incorrect ports, although it's a bit more
expensive, and will lead to more spam.

Previously this would not happen because on creation of a PR the
validation task (commit -> PR) would still have to execute.

Misc Changes
============

- If a PR is marked as overriding / canceling stagings, it now does
  so on retry not just when setting initially.

  This was not handled at all previously, so a PR in p=0 going into
  error due to e.g. a non-deterministic bug would be retried and still
  be p=0, but a current staging would not get cancelled. The same
  applied when a PR in p=0 went into error because something failed,
  then was updated with a fix.
- Add tracking to a bunch of relevant PR fields.

  Post-mortem analysis currently generally requires going through the
  text logs to see what happened, which is annoying.

  There is a nondeterminism / inconsistency in the tracking which
  sometimes leads the admin user to trigger tracking before the bot
  does, leading to the staging tracking being attributed to them
  during tests. This is shoved under the carpet by ignoring the user
  to whom that tracking is attributed.

  When multiple users update tracked fields in the same transaction
  all the changes are attributed to the first one having triggered
  tracking (?), I couldn't find why the admin sometimes takes over.
- added and leveraged support for enum-backed selection fields
- moved various fields from forwardport to runbot_merge
- fix a migration which had never worked and which never ran (because
  I forgot to bump the version on the module)
- remove some unnecessary intermediate de/serialisation

fixes #673, fixes #309, fixes #792, fixes #846 (probably)
2024-05-23 07:58:46 +02:00
..
changelog [CHG] *: rewrite commands set, rework status management 2024-05-23 07:58:46 +02:00
controllers [CHG] *: rewrite commands set, rework status management 2024-05-23 07:58:46 +02:00
data [CHG] *: rewrite commands set, rework status management 2024-05-23 07:58:46 +02:00
migrations [CHG] *: rewrite commands set, rework status management 2024-05-23 07:58:46 +02:00
models [CHG] *: rewrite commands set, rework status management 2024-05-23 07:58:46 +02:00
security [IMP] runbot_merge: split staging heads out to join tables 2023-08-10 14:04:59 +02:00
static [IMP] rewrite /forwardport/outstanding 2023-07-10 15:23:31 +02:00
tests [CHG] *: rewrite commands set, rework status management 2024-05-23 07:58:46 +02:00
views [CHG] *: rewrite commands set, rework status management 2024-05-23 07:58:46 +02:00
__init__.py [IMP] runbot_merge: add sentry filtering, rework some error messages 2023-06-15 08:21:20 +02:00
__manifest__.py [CHG] *: rewrite commands set, rework status management 2024-05-23 07:58:46 +02:00
exceptions.py [FIX] *: prevent merging conflicts commits with loss of authorship 2021-08-24 15:39:47 +02:00
git.py [FIX] runbot_merge: error in maintenance, and tracking 2024-02-23 13:58:31 +01:00
github.py [IMP] runbot_merge: prevent merging empty commits 2023-11-30 12:45:39 +01:00
README.rst [IMP] readme: add the Odoo workflow 2018-09-10 09:50:17 +02:00
sentry.py [FIX] runbot_merge: sentry issue via monkeypatch 2023-08-10 15:27:20 +02:00
utils.py [FIX] runbot_merge: a few issues with updated staging check 2023-02-14 13:45:28 +01:00

Merge Bot
=========

Odoo workflow
-------------

The sticky branches are protected on the github odoo project to restrict
pushes to the Merge Bot (MB) only.

The MB only works with PRs, using the github API.

1. When a PR is created, github notifies the MB. The MB labels the PR
   as 'seen 🙂' on github [#]_.

2. Once the PR github statuses are green [#]_ , the MB labels the PR as
   'CI 🤖'.

3. When a reviewer, known by the MB, approves the PR, the MB labels that
   PR as 'r+ 👌'.

4. At this moment, MB tries to merge the PR and labels the PR with
   'merging 👷'.

5. If the merge is successful, MB labels it 'merged 🎉', removes the
   label 'merging 👷' and closes the PR. A message from MB gives a link
   to the merge's commit [#]_.

If an error occurs during step 4, MB labels the PR with 'error 🙅'
and adds a message in the conversation stating what kind of error
occurred, for example 'Unable to stage PR (merge conflict)'.

If a new commit is pushed to the PR, the process starts again from the
beginning.

It's possible to interact with the MB by the way of github messages
containing `Commands`_. The message must start with the MB name (for
instance 'robodoo').

.. [#] Any activity on a PR the MB hasn't seen yet will bring it to the
   MB's attention, e.g. a comment on a PR.

.. [#] At this moment the statuses are: Runbot build is green and CLA is
   signed if needed.  The expected statuses may change in the future.

.. [#] If a PR contains only one commit, the PR is rebased and the
   commit is fast forwarded. With more than one commit, the PR is
   rebased and the commits are merged with a merge commit. When one
   wants to avoid the rebase, 'rebase-' command should be used.

Setup
-----

* Setup a project with relevant repositories and branches the bot
  should manage (e.g. odoo/odoo and 10.0).
* Set up reviewers (github_login + boolean flag on partners).
* Add "Issue comments", "Pull request reviews", "Pull requests" and
  "Statuses" webhooks to managed repositories.
* If applicable, add "Statuses" webhook to the *source* repositories.

  Github does not seem to send statuses cross-repository when commits
  get transmigrated so if a user creates a branch in odoo-dev/odoo,
  waits for CI to run then creates a PR targeted to odoo/odoo the PR
  will never get status-checked (unless we modify runbot to re-send
  statuses on pull_request webhook).

Working Principles
------------------

Useful information (new PRs, CI, comments, ...) is pushed to the MB
via webhooks. Most of the staging work is performed via a cron job:

1. for each active staging, check if it is done

   1. if successful

      * ``push --ff`` to target branches
      * close PRs

   2. if only one batch, mark as failed

      for batches of multiple PRs, the MB attempts to infer which
      specific PR failed

   3. otherwise split staging in 2 (bisection search of problematic
      batch)

2. for each branch with no active staging

   * if there are inactive stagings, stage one of them
   * otherwise look for batches targeted to that branch (PRs grouped by
     label with branch as target)
   * attempt staging

     1. reset temp branches (one per repo) to corresponding targets
     2. merge each batch's PR into the relevant temp branch

        * on merge failure, mark PRs as failed

     3. once no more batch or limit reached, reset staging branches to
        tmp
     4. mark staging as active
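
The outcome handling of step 1 can be sketched as follows. This is only an illustration: ``handle_finished_staging`` and its return convention are hypothetical, and the inference of which PR failed within a multi-PR batch is not modelled:

```python
def handle_finished_staging(batches, success):
    """Decide what to do with a finished staging.

    `batches` is the list of batches in the staging. Returns an
    (action, groups) pair where `groups` is a list of batch lists.
    """
    if success:
        # Fast-forward the targets and close the PRs.
        return ("merge", [batches])
    if len(batches) == 1:
        # Nothing left to bisect: the single batch is the culprit.
        return ("fail", [batches])
    # Bisect: split into two inactive stagings to be staged later.
    mid = len(batches) // 2
    return ("split", [batches[:mid], batches[mid:]])
```

Repeated splitting of failed stagings narrows the failure down to a single batch in a logarithmic number of rounds, at the cost of re-running CI on each half.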

Commands
--------

A command string is a line starting with the mergebot's name and
followed by various commands. Self-reviewers count as reviewers for
the purpose of their own PRs, but delegate reviewers don't.

retry
  resets a PR in error mode to ready for staging

  can be used by a reviewer or the PR author to re-stage the PR after
  it's been updated or the target has been updated & fixed.

r(eview)+
  approves a PR, can be used by a reviewer or delegate reviewer

  submitting an "approve" review implicitly r+'s the PR

r(eview)-
  removes approval from a PR, allows un-reviewing a PR in error (staging
  failed) so it can be updated and re-submitted

.. squash+/squash-
..   marks the PR as squash or merge, can override squash inference or a
..   previous squash command, can only be used by reviewers

delegate+/delegate=<users>
  adds either PR author or the specified (github) users as authorised
  reviewers for this PR. ``<users>`` is a comma-separated list of
  github usernames (no @), can be used by reviewers

p(riority)=2|1|0
  sets the priority to normal (2), pressing (1) or urgent (0),
  PRs with a lower priority number are selected first and batched
  together, can be used by reviewers

rebase-
  the default merge mode is to rebase and merge the PR into the
  target, however for some situations this is not suitable and
  a regular merge is necessary; this command toggles rebasing
  mode off (and thus back to a regular merge)

Structure
---------

A *project* is used to manage multiple *repositories* across many
*branches*.

Each *PR* targets a specific branch in a specific repository.

A *batch* is a number of co-dependent PRs, PRs which are assumed to
depend on one another (the exact relationship is irrelevant) and thus
always need to be batched together. Batches are normally created on
the fly during staging.

A *staging* is a number of batches (up to 8 by default) which will be
tested together, and split if CI fails. Each staging applies to a
single *branch* (the target) across all managed repositories. Stagings
can be active (currently live on the various staging branches) or
inactive (to be staged later, generally as a result of splitting a
failed staging).

Notes
-----

* When looking for stageable batches, priority is taken into account and
  is isolating, e.g. if there's a single high-priority PR, low-priority
  PRs are ignored completely and only that PR will be staged, on its own.
* Reviewers are set up on partners so we can e.g. have author-tracking
  & delegate reviewers without needing to create proper users for
  every contributor.
* MB collates statuses on commits independently from other objects, so
  a commit getting CI'd in odoo-dev/odoo then made into a PR on
  odoo/odoo should be correctly interpreted assuming odoo-dev/odoo
  sent its statuses to the MB.
* Github does not support transactional sequences of API calls, so
  it's possible that "intermediate" staging states are visible & have
  to be rolled back, e.g. a staging succeeds in a 2-repo scenario,
  A.{target} is ff-d to A.{staging}, then B.{target}'s ff to
  B.{staging} fails, we have to roll back A.{target}.
* Co-dependence is currently inferred through *labels*, which is a
  pair of ``{repo}:{branchname}`` e.g. odoo-dev:11.0-pr-flanker-jke.
  If this label is present in a PR to A and a PR to B, these two
  PRs will be collected into a single batch to ensure they always
  get batched (and failed) together.
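
The label-based grouping from the last note can be sketched as below. The tuple layout and the ``group_into_batches`` helper are assumptions made for the example, not the module's data model:

```python
from collections import defaultdict


def group_into_batches(prs):
    """Group PRs into batches by (target branch, label).

    `prs` is an iterable of (repository, number, target, label) tuples,
    where `label` looks like "odoo-dev:11.0-pr-flanker-jke".
    """
    batches = defaultdict(list)
    for repo, number, target, label in prs:
        # PRs sharing a label and a target are assumed co-dependent.
        batches[(target, label)].append((repo, number))
    return dict(batches)
```

For instance, a PR to repository A and a PR to repository B which both carry the label ``odoo-dev:11.0-pr-flanker-jke`` and target ``11.0`` end up in the same batch, while a PR with a different label gets a batch of its own.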

Previous Work
-------------

bors-ng
~~~~~~~

* r+: accept (only for trusted reviewers)
* r-: unaccept
* r=users...: accept on behalf of users
* delegate+: allows author to self-review
* delegate=users...: allow non-reviewers users to review
* try: stage build (to separate branch) but don't merge on success

Why not bors-ng
###############

* no concurrent staging (can only stage one target at a time)
* can't do co-dependent repositories/multi-repo staging
* cancels/forgets r+'d branches on FF failure (emergency pushes)
  instead of re-staging

homu
~~~~

Additionally to bors-ng's:

* SHA option on r+/r=, guards
* p=NUMBER: set priority (unclear if best = low/high)
* rollup/rollup-: should be default
* retry: re-attempt PR (flaky?)
* delegate-: remove delegate+/delegate=
* force: ???
* clean: ???