Commit Graph

8 Commits

Author SHA1 Message Date
Xavier Morel
3ee3e9cc81 [IMP] *: trigger-ify staging cron
The staging cron turns out to be pretty reasonable to trigger, as we
already have a handler on the transition of a batch to `not blocked`,
which is exactly when we want to create a staging (that and the
completion of the previous staging).

The batch transition is in a compute which is not awesome, but on the
flip side we also cancel active stagings in that exact scenario (if it
applies), so that matches.

The only real finesse is that one of the tests wants to observe the
instant between the end of a staging (and creation of splits) and the
start of the next one, which because the staging cron is triggered by
the failure of the previous staging is now "atomic", requiring
disabling the staging cron, which means the trigger is skipped
entirely. So this requires triggering the staging cron by hand.
2024-08-02 15:14:50 +02:00
Xavier Morel
f367a64481 [IMP] *: trigger-ify merge cron
The merge cron is the one in charge of checking staging state and
either integrating the staging into the reference branch (if
successful) or cancelling the staging (if failed).

The most obvious trigger for the merge cron is a change in staging
state from the states computation (transition from pending to either
success or failure). Explicitly cancelling / failing a staging marks
it as inactive so the merge cron isn't actually needed.

However an other major trigger is *timeout*, which doesn't have a
trivial signal. Instead, it needs to be hooked on the `timeout_limit`,
and has to be re-triggered at every update to the `timeout_limit`,
which in normal operations is mostly from "pending" statuses bumping
the timeout limit. In that case, `_trigger` to the `timeout_limit` as
that's where / when we expect a status change.

Worst case scenario with this is we have parasitic wakeups of this
cron, but having half a dozen wakeups unnecessary wakeups in an hour
is still probably better than having a wakeup every minute.
2024-08-02 15:14:50 +02:00
Xavier Morel
57a82235d9 [IMP] *: rely only on triggers to run statuses propagation
The cron had been converted to using mostly triggers in
7cd9afe7f2 but I forgot to update
the *tests* to avoid explicitly triggering it.
2024-08-02 12:12:20 +02:00
Xavier Morel
dd17730f4c [IMP] *: crons tests running for better triggered compatibility
Mergebot / forwardport crons need to run in a specific ordering in
order to flow into one another correctly. The default ordering being
unspecified, it was not possible to use the normal cron
runner (instead of the external driver running crons in sequence one
at a time). This can be fixed by setting *sequences* on crons, as the
cron runner (`_process_jobs`) will use that order to acquire and run
crons.

Also override `_process_jobs` however: the built-in cron runner
fetches a static list of ready crons, then runs that.

This is fine for normal situation where the cron runner runs in a loop
anyway but it's any issue for the tests, as we expect that cron A can
trigger cron B, and we want cron B to run *right now* even if it
hadn't been triggered before cron A ran.

We can replace `_process_job` with a cut down version which does
that (cut down because we don't need most of the error handling /
resilience, there's no concurrent workers, there's no module being
installed, versions must match, ...). This allows e.g. the cron
propagating commit statuses to trigger the staging cron, and both will
run within the same `run_crons` session.

Something I didn't touch is that `_process_jobs` internally creates
completely new environments so there is no way to pass context into
the cron jobs anymore (whereas it works for `method_direct_trigger`),
this means the context values have to be shunted elsewhere for that
purpose which is gross. But even though I'm replacing `_process_jobs`,
this seems a bit too much of a change in cron execution semantics. So
left it out.

While at it tho, silence the spammy `py.warnings` stuff I can't do
much about.
2024-08-02 09:00:34 +02:00
Xavier Morel
fec3d39d19 [ADD] *: per-repository webhook secret
Currently webhook secrets are configured per *project* which is an
issue both because different repositories may have different
administrators and thus creates safety concerns, and because multiple
repositories can feed into different projects (e.g. on mergebot,
odoo-dev/odoo is both an ancillary repository to the main RD project,
and the main repository to the minor / legacy master-wowl
project). This means it can be necessary to have multiple projects
share the same secret as well, this then mandates the secret for more
repositories per (1).

This is a pain in the ass, so just detach secrets from projects and
link them *only* to repositories, it's cleaner and easier to manage
and set up progressively.

This requires a lot of changes to the tests, as they all need to
correctly configure the signaling.

For `runbot_merge` there was *some* setup sharing already via the
module-level `repo` fixtures`, those were merged into a conftest-level
fixture which could handle the signaling setup. A few tests which
unnecessarily set up repositories ad-hoc were also moved to the
fixture. But for most of the ad-hoc setup in `runbot_merge`, as well
as `forwardport` where it's all ad-hoc, events sources setup was just
appended as is. This should probably be cleaned up at one point, with
the various requirements collected and organised into a small set of
fixtures doing the job more uniformly.

Fixes #887
2024-06-06 11:07:57 +02:00
Xavier Morel
c140701975 [ADD] runbot_merge: support staging ready PRs over splits
Not sure it's going to be useful but it's hard to know if we can't
test it. The intent is mostly the ability to prioritize throughput (or
attempt to) during high-load events, if we can favour staging N
new batches over a split's N/2 we might be able to merge more crap.

But maybe not, we'll see, either way now it's here and seems to more
or less work.

Fixes #798
2024-05-23 07:58:58 +02:00
Xavier Morel
9f54e6f209 [ADD] runbot_merge: option to disable staging without cron
Because the mergebot crons are on such a tight scheduling, and just
them finding out they have nothing to do can take a while, disabling
them can be a chore. Disabling staging via the project is much less
likely to cause issues as the projects don't normally (or ever?) get
exclusively locked, so they can generally be written to at any moment.

Furthermore, if we ever get in a situation where we have multiple
active projects (not really the case currently, we have multiple
projects but only one is really active) it's less disruptive to
disable stagings on a single specific project.

Fixes #860
2024-05-23 07:58:58 +02:00
Xavier Morel
d4fa1fd353 [CHG] *: rewrite commands set, rework status management
This commit revisits the commands set in order to make it more
regular, and limit inconsistent command-sets, although it includes
pseudo-command aliases for common tasks now removed from the core set.

Hard Errors
===========

The previous iteration of the commands set would ignore any
non-command term in a command line. This has been changed to hard
error (and ignoring the entire thing) if any command is unknown or
invalid.

This fixes inconsistent / unexpected interpretations where a user
sends a command, then writes a novel on the same line some words of
which happen to *also* be commands, leading to merge states they did
not expect. They should now be told to fuck off.

Priority Restructuring
----------------------

The numerical priority system was pretty messy in that it confused
"staging priority" (in ways which were not entirely straightforward)
with overrides to other concerns.

This has now being split along all the axis, with separate command
subsets for:

- staging prioritisation, now separated between `default`, `priority`,
  and `alone`,

  - `default` means PRs are picked by an unspecified order when
    creating a staging, if nothing better is available
  - `priority` means PRs are picked first when staging, however if
    `priority` PRs don't fill the staging the rest will be filled with
    `default`, this mode did not previously exist
  - `alone` means the PRs are picked first, before splits, and only
    `alone` PRs can be part of the staging (which usually matches the
    modename)
- `skipchecks` overrides both statuses and approval checks, for the
  batch, something previously implied in `p=0`, but now
  independent. Setting `skipchecks` basically makes the entire batch
  `ready`.

  For consistency this also sets the reviewer implicitly: since
  skipchecks overrides both statuses *and approval*, whoever enables
  this mode is essentially the reviewer.
- `cancel` cancels any ongoing staging when the marked PR becomes
  ready again, previously this was also implied (in a more restricted
  form) by setting `p=0`

FWBot removal
=============

While the "forwardport bot" still exists as an API level (to segregate
access rights between tokens) it has been removed as an interaction
point, as part of the modules merge plan. As a result,

fwbot stops responding
----------------------

Feedback messages are now always sent by the mergebot, the
forward-porting bot should not send any message or notification
anymore.

commands moved to the merge bot
-------------------------------

- `ignore`/`up to` simply changes bot
- `close` as well
- `skipci` is now a choice / flag of an `fw` command, which denotes
  the forward-port policy,

  - `fw=default` is the old `ci` and resets the policy to default,
    that is wait for the PR to be merged to create forward ports, and
    for the required statuses on each forward port to be received
    before creating the next
  - `fw=skipci` is the old `skipci`, it waits for the merge of the
    base PR but then creates all the forward ports immediately (unless
    it gets a conflict)
  - `fw=skipmerge` immediately creates all the forward ports, without
    even waiting for the PR to be merged

    This is a completely new mode, and may be rather broken as until
    now the 'bot has always assumed the source PR had been merged.

approval rework
---------------

Because of the previous section, there is no distinguishing feature
between `mergebot r+` = "merge this PR" and `forwardbot r+` = "merge
this PR and all its parent with different access rights".

As a result, the two have been merged under a single `mergebot r+`
with heuristics attempting to provide the best experience:

- if approving a non-forward port, the behavior does not change
- else, with review rights on the source, all ancestors are approved
- else, as author of the original, approves all ancestors which descend
  from a merged PR
- else, approves all ancestors up to and including the oldest ancestor
  to which we have review rights

Most notably, the source's author is not delegated on the source or
any of its descendants anymore. This might need to be revisited if it
provides too restrictive.

For the very specialized need of approving a forward-port *and none of
its ancestors*, `review=` can now take a comma (`,`) separated list of
pull request numbers (github numbers, not mergebot ids).

Computed State
==============

The `state` field of pull requests is now computed. Hopefully this
makes the status more consistent and predictable in the long run, and
importantly makes status management more reliable (because reference
datum get updated naturally flowing to the state).

For now however it makes things more complicated as some of the states
have to be separately signaled or updated:

- `closed` and `error` are now separate flags
- `merge_date` is pulled down from forwardport and becomes the
  transition signal for ready -> merged
- `reviewed_by` becomes the transition signal for approval (might be a
  good idea to rename it...)
- `status` is computed from the head's statuses and overrides, and
  *that* becomes the validation state

Ideally, batch-level flags like `skipchecks` should be on, well, the
batch, and `state` should have a dependency on the batch. However
currently the batch is not a durable / permanent member of the system,
so it's a PR-level flag and a messy pile.

On notable change is that *forcing* the state to `ready` now does that
but also sets the reviewer, `skipchecks`, and overrides to ensure the
API-mediated readying does not get rolled back by e.g. the runbot
sending a status.

This is useful for a few types of automated / programmatic PRs
e.g. translation exports, where we set the state programmatically to
limit noise.

recursive dependency hack
-------------------------

Given a sequence of PRs with an override of the source, if one of the
PRs is updated its descendants should not have the override
anymore. However if the updated PR gets overridden, its descendants
should have *that* override.

This requires some unholy manipulations via an override of `modified`,
as the ORM supports recursive fields but not recursive
dependencies (on a different field).

unconditional followup scheduling
---------------------------------

Previously scheduling forward-port followup was contigent on the FW
policy, but it's not actually correct if the new PR is *immediately*
validated (which can happen now that the field is computed, if there
are no required statuses *or* all of the required statuses are
overridden by an ancestor) as nothing will trigger the state change
and thus scheduling of the fp followup.

The followup function checks all the properties of the batch to port,
so this should not result on incorrect ports. Although it's a bit more
expensive, and will lead to more spam.

Previously this would not happen because on creation of a PR the
validation task (commit -> PR) would still have to execute.

Misc Changes
============

- If a PR is marked as overriding / canceling stagings, it now does
  so on retry not just when setting initially.

  This was not handled at all previously, so a PR in P0 going into
  error due to e.g. a non-deterministic bug would be retried and still
  p=0, but a current staging would not get cancelled. Same when a PR
  in p=0 goes into error because something was failed, then is updated
  with a fix.
- Add tracking to a bunch of relevant PR fields.

  Post-mortem analysis currently generally requires going through the
  text logs to see what happened, which is annoying.

  There is a nondeterminism / inconsistency in the tracking which
  sometimes leads the admin user to trigger tracking before the bot
  does, leading to the staging tracking being attributed to them
  during tests, shove under the carpet by ignoring the user to whom
  that tracking is attributed.

  When multiple users update tracked fields in the same transaction
  all the changes are attributed to the first one having triggered
  tracking (?), I couldn't find why the admin sometimes takes over.
- added and leveraged support for enum-backed selection fields
- moved variuous fields from forwardport to runbot_merge
- fix a migration which had never worked and which never run (because
  I forgot to bump the version on the module)
- remove some unnecessary intermediate de/serialisation

fixes #673, fixes #309, fixes #792, fixes #846 (probably)
2024-05-23 07:58:46 +02:00