Commit Graph

25 Commits

Xavier Morel
1099fc5119 [IMP] runbot_merge: check squash on every PR event
All `pull_request` events seem to provide the `commits` count
property. As such we can use them all to check the `squash` state even
if we don't otherwise care for the event.
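
A rough sketch of what this enables (handler and field names are
assumptions, not the actual implementation):

```python
# Minimal sketch: every `pull_request` payload carries a `commits` count, so
# the squash flag can be refreshed before any action-specific handling runs.
def handle_pr(env, event):
    payload = event['pull_request']
    pr = env['runbot_merge.pull_requests'].search([
        ('repository.name', '=', event['repository']['full_name']),
        ('number', '=', payload['number']),
    ])
    if pr:
        # single-commit PRs are candidates for a squash merge
        pr.write({'squash': payload['commits'] == 1})
    # ... action-specific handling (opened, synchronize, edited, ...) follows
```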

Also split out notifying an approved pull request about its missing
merge method into a separate cron from the one notifying a PR that its
siblings are ready.

Fixes #1036
2025-01-30 11:53:44 +01:00
Xavier Morel
62fbda52a8 [IMP] runbot_merge: a PR can't be reopened if its batch is merged
In that case, ignore the reopen, close the PR, and tell the idiot to
fuck off.
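
A hedged sketch of that guard (model and field names are assumed):

```python
# Hypothetical sketch: on a `reopened` event, if the PR's batch was already
# merged, queue a feedback comment and re-close the PR instead of reopening.
def handle_reopen(env, pr):
    if pr.batch_id.merge_date:
        env['runbot_merge.pull_requests.feedback'].create({
            'repository': pr.repository.id,
            'pull_request': pr.number,
            'message': "This PR is part of a merged batch and cannot be reopened.",
            'close': True,  # have the feedback cron re-close the PR on GitHub
        })
        return
    # ... otherwise fall through to the normal reopen handling
```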

Also the case where a PR is reopened while its batch is staged was
already handled, but probably not tested: it was implicit in
forcefully updating the HEAD of the PR, which triggers an unstage
since c8a06601a7.

Now that scenario is tested, which should lower the odds of breaking
it in the future.

Fixes #965
2024-12-09 16:02:28 +01:00
Xavier Morel
fbfb96be53 [IMP] runbot_merge: ping commenter when fetching PR due to comment
If a comment causes an unknown PR to be fetched, it's a bit odd to
ping the author (and possibly reviewer) anyway as they're not super
concerned (and technically we could be ignoring the purported /
attempted reviewer).

So if a fetch job was created because of a comment, remember the
comment author and ping *them* instead of using the default ping
policy.
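
A small sketch of the mechanism (the `commenter` field name is an
assumption):

```python
# Hypothetical sketch: when a comment references an unknown PR, record who
# asked, so the eventual reply pings them rather than the author / reviewer.
def enqueue_fetch(env, repo, number, comment):
    env['runbot_merge.fetch_job'].create({
        'repository': repo.id,
        'number': number,
        'commenter': comment['user']['login'],  # ping this user once fetched
    })
```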

Fixes #981
2024-11-18 13:52:27 +01:00
Xavier Morel
632763d390 [CHG] runbot_merge: move labels cron to triggered
Missed it during the previous pass, probably because it's in the
middle of `pull_requests.py`. It's a classic template for triggered
crons since the model is just a queue of actions for the cron.
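
The "classic template" in question looks roughly like this (model and
xmlid names are illustrative):

```python
from odoo import api, models

class Tagging(models.Model):
    _name = 'runbot_merge.pull_requests.tagging'

    @api.model_create_multi
    def create(self, vals_list):
        # wake the labels cron as soon as work is enqueued
        self.env.ref('runbot_merge.labels_cron')._trigger()
        return super().create(vals_list)
```
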
2024-10-22 10:50:09 +02:00
Xavier Morel
3ee3e9cc81 [IMP] *: trigger-ify staging cron
The staging cron turns out to be pretty reasonable to trigger, as we
already have a handler on the transition of a batch to `not blocked`,
which is exactly when we want to create a staging (that and the
completion of the previous staging).

The batch transition is in a compute which is not awesome, but on the
flip side we also cancel active stagings in that exact scenario (if it
applies), so that matches.
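
Roughly, the hook looks like this (model, field, helper and xmlid
names are assumptions):

```python
from odoo import models

class Batch(models.Model):
    _inherit = 'runbot_merge.batch'

    def _compute_blocked(self):
        for batch in self:
            was_blocked = bool(batch.blocked)
            batch.blocked = batch._get_blocking_reason()  # falsy when stageable
            if was_blocked and not batch.blocked:
                # the batch just became stageable: wake the staging cron
                self.env.ref('runbot_merge.staging_cron')._trigger()
```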

The only real finesse is that one of the tests wants to observe the
instant between the end of a staging (and the creation of splits) and
the start of the next one. Because the staging cron is triggered by
the failure of the previous staging, that sequence is now "atomic",
so the test has to disable the staging cron, which means the trigger
is skipped entirely and the staging cron has to be triggered by hand.
2024-08-02 15:14:50 +02:00
Xavier Morel
f367a64481 [IMP] *: trigger-ify merge cron
The merge cron is the one in charge of checking staging state and
either integrating the staging into the reference branch (if
successful) or cancelling the staging (if failed).

The most obvious trigger for the merge cron is a change in staging
state from the states computation (transition from pending to either
success or failure). Explicitly cancelling / failing a staging marks
it as inactive so the merge cron isn't actually needed.

However, another major trigger is *timeout*, which doesn't have a
trivial signal. Instead, it needs to be hooked on the `timeout_limit`,
and has to be re-triggered at every update to the `timeout_limit`,
which in normal operations mostly comes from "pending" statuses
bumping the timeout limit. In that case, `_trigger` at the
`timeout_limit`, as that's where / when we expect a status change.
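
Concretely, the timeout hook is along these lines (a sketch; model,
field and xmlid names are assumptions):

```python
from odoo import fields, models

class Staging(models.Model):
    _inherit = 'runbot_merge.stagings'

    def write(self, vals):
        res = super().write(vals)
        if 'timeout_limit' in vals:
            # schedule a merge-cron wakeup at the (new) timeout, so a staging
            # that never receives its statuses still gets cancelled on time
            self.env.ref('runbot_merge.merge_cron')._trigger(
                fields.Datetime.to_datetime(vals['timeout_limit']))
        return res
```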

The worst case scenario with this is parasitic wakeups of this cron,
but having half a dozen unnecessary wakeups in an hour is still
probably better than having a wakeup every minute.
2024-08-02 15:14:50 +02:00
Xavier Morel
029957dbeb [IMP] *: trigger-ify task queue type crons
These are pretty simple to convert: an item is added to a work
queue (table), then a cron regularly scans through the table,
executing the items and deleting them.

That means the cron trigger can just be added on `create` and things
should work out fine.

There are just two wrinkles in the port_forward cron:

- It can be requeued in the future, so it needs conditional
  triggering in `write`.
- It is disabled during freezes (maybe something to change); as a
  result triggers don't enqueue at all, so the cron needs to be
  triggered immediately after the freeze re-enables it.
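
Sketched out (model, field and xmlid names are assumptions), the
pattern plus the `port_forward` wrinkle:

```python
from odoo import api, fields, models

class ForwardPortJob(models.Model):
    _inherit = 'forwardport.batches'

    @api.model_create_multi
    def create(self, vals_list):
        # wake the cron as soon as an item lands in the work queue
        self.env.ref('forwardport.port_forward')._trigger()
        return super().create(vals_list)

    def write(self, vals):
        # requeued into the future: schedule a wakeup at that date instead
        if vals.get('retry_after'):
            self.env.ref('forwardport.port_forward')._trigger(
                fields.Datetime.to_datetime(vals['retry_after']))
        return super().write(vals)
```
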
2024-08-02 15:14:50 +02:00
Xavier Morel
57a82235d9 [IMP] *: rely only on triggers to run statuses propagation
The cron had been converted to using mostly triggers in
7cd9afe7f2 but I forgot to update
the *tests* to avoid explicitly triggering it.
2024-08-02 12:12:20 +02:00
Xavier Morel
dd17730f4c [IMP] *: crons tests running for better triggered compatibility
Mergebot / forwardport crons need to run in a specific order to flow
into one another correctly. With the default ordering unspecified, it
was not possible to use the normal cron runner (instead of the
external driver running crons in sequence one at a time). This can be
fixed by setting *sequences* on crons, as the cron
runner (`_process_jobs`) will use that order to acquire and run them.

`_process_jobs` also has to be overridden, however: the built-in cron
runner fetches a static list of ready crons, then runs through it.

This is fine for the normal situation, where the cron runner runs in
a loop anyway, but it's an issue for the tests, as we expect that
cron A can trigger cron B, and we want cron B to run *right now* even
if it hadn't been triggered before cron A ran.

We can replace `_process_job` with a cut-down version which does
that (cut down because we don't need most of the error handling /
resilience: there are no concurrent workers, no module being
installed, versions must match, ...). This allows e.g. the cron
propagating commit statuses to trigger the staging cron, and both
will run within the same `run_crons` session.
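
Heavily simplified, the replacement's idea looks like this (not the
real `_process_job`; it assumes the `sequence` ordering mentioned
above and a known list of cron xmlids):

```python
def run_crons(env, cron_xmlids):
    # run the mergebot / forwardport crons in their declared order within a
    # single pass, so a cron triggered by an earlier cron still executes in
    # the same `run_crons` session
    crons = env['ir.cron'].sudo().browse()
    for xmlid in cron_xmlids:
        crons |= env.ref(xmlid).sudo()
    for cron in crons.sorted('sequence'):
        cron.method_direct_trigger()  # execute the cron's action immediately
```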

Something I didn't touch is that `_process_jobs` internally creates
completely new environments, so there is no way to pass context into
the cron jobs anymore (whereas it works for `method_direct_trigger`);
this means the context values have to be shunted elsewhere for that
purpose, which is gross. But even though I'm replacing
`_process_jobs`, that seemed like too much of a change in cron
execution semantics, so I left it out.

While at it though, silence the spammy `py.warnings` stuff I can't do
much about.
2024-08-02 09:00:34 +02:00
Xavier Morel
7cd9afe7f2 [IMP] runbot_merge: trigger commits cron
The commit cron needs to be triggered any time we:

- create a new commit
- update a commit to set its `to_check`

So do that in `create` and `write`, as well as in the SQL query in
the webhook handler.
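
For reference, the `write` side of that looks something like this (a
sketch; model and cron xmlid names are assumptions):

```python
from odoo import models

class Commit(models.Model):
    _inherit = 'runbot_merge.commit'

    def write(self, vals):
        # any commit flagged for (re)checking wakes the statuses cron
        if vals.get('to_check'):
            self.env.ref('runbot_merge.process_updated_commits')._trigger()
        return super().write(vals)
```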

This should mean we don't need the periodic cron anymore, but for
safety's sake keep running it every 30 minutes for now.

To be fair, even if we miss triggers, the next `status` webhook to
hit will check all the relevant commits anyway...
2024-06-21 11:02:50 +02:00
Xavier Morel
98aaa9107f [CHG] forwardport: notify the outstanding forwardports rather than source
I have been convinced that this might be an improvement to the affairs
of the people: originally the message was sent to the source PR so we
wouldn't have to ping the author & reviewer and to limit the amount of
spam, *however*:

- we ended up adding pings anyway
- it also pings the followers of the source PR
- it increases the size of the original discussion (especially if it
  was originally long)
- it adds steps to fixing the issue as you need to bounce from the
  source to the forward ports

Note that this might still notify a lot of people, as they might be
made followers of the forward ports automatically, and it increases
the messaging load of the forwardbot significantly. But we'll see how
things go. Worst case scenario, we can revert.

Fixes #836
2024-06-04 08:56:51 +02:00
Xavier Morel
232aa271b0 [ADD] runbot_merge: PR dashboard V2
Displays the entire batch set as a table, along both the repository
axis (linked PRs) and the branch axis (forward ports). Should provide
a much more complete overview.

Adds a copy of the dashboard as a raster render, to link from the PR:
as usual SVG is shit, content-based viewboxes are hell and having to
duplicate the entire CSS because `<img/>`-linked CSS can't run is
gross. And there's no payoff since the image is not interactible
anyway.

Performing manual ad-hoc table rendering via Pillow is not
significantly worse: it works fine, and it's possible to do *really*
good conditional request handling (hopefully) because I've basically
got all the information I need right here.

In fact it might make sense to upgrade the regular HTML page with
similar conditional request handling, at least for the last-update
bit if not the etag.
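
A sketch of what that conditional handling could look like (the
route, the `_dashboard_data` / `_render_dashboard_png` helpers and
their behaviour are hypothetical):

```python
import hashlib

import werkzeug.wrappers

from odoo.http import Controller, request, route

class Dashboard(Controller):
    @route('/runbot_merge/dashboard/<int:pr_id>.png', type='http', auth='public')
    def pr_dashboard_png(self, pr_id):
        pr = request.env['runbot_merge.pull_requests'].sudo().browse(pr_id)
        # derive an ETag from the data backing the render, answer 304 on match
        etag = hashlib.sha256(repr(pr._dashboard_data()).encode()).hexdigest()
        if etag in request.httprequest.if_none_match:
            return werkzeug.wrappers.Response(status=304)
        png = pr._render_dashboard_png()  # Pillow render of the batch table
        return request.make_response(png, headers=[
            ('Content-Type', 'image/png'),
            ('ETag', etag),
        ])
```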

Fixes #771, fixes #770
2024-05-29 07:55:07 +02:00
Xavier Morel
a4a067e7e9 [CHG] *: move forward-porting over to batches
Thank god I have a bunch of tests because once again I forgot / missed
a bunch of edge cases in doing the conversion, which the tests
caught (sadly that means I almost certainly broke a few untested edge
cases).

Important notes:

Handling of parent links
------------------------

Unlike PRs, batches never lose their parent info; the link is
permanent, which is convenient for trawling through a forward port
chain (currently implemented very inefficiently, maybe we'll optimise
that in the future).

However, this means the batch having a parent and the batch's PRs
having parents are slightly different pieces of information. One of
the edge cases I missed is that of conflicting PRs, which are
deparented and have to be merged by hand before being forward-ported
further: I had originally replaced the checks on a PR and its
siblings having parents with checks on just the batch.

Batches & targets
-----------------

Batches were originally conceived as being fixed to a target, with
PRs having that target; retargeting a PR would move it from one batch
to another.

As it turns out this does not work in the case where people retarget
forward-port PRs, which I know they do because of #551
(2337bd8518). I could not think of a good way to handle this issue
as-is, so I scrapped the moving-PRs idea. Instead, one of the
coherence checks for a batch being ready is that all its PRs have the
same target, and a batch only has a target if all its PRs have the
same target.

It's possible for somewhat odd effects to arise: notably, if a PR is
closed (removed from its batch), the other PRs are retargeted, and
the closed PR is then reopened, it will now be on a separate batch
even if it also gets retargeted. This is weird. I don't quite know
how I should handle it; maybe batches could merge if they have the
same target and label? However batches don't currently have a label,
so...

Improve limits
--------------

Keep limits on the PRs rather than lifting them onto the batch: if we
can add/remove PRs from batches, then having different limits on
different PRs of the same batch is reasonable.

Also leave the limit unset by default: previously, the limit was
eagerly set to the tip (accessible) branch. That doesn't really seem
necessary, so stop doing that.

Also remove the completely unnecessary `max` when trying to find a
PR's next target: `root` is either `self` or `self.source_id`, so it
should not be possible for it to have a later target.

And for now, ensure the limits are consistent per batch: a PR
defaults to the limit of its batch-mates if it doesn't have one, and
if a limit is set via command it's set on all PRs of the batch.
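
In sketch form (field names are assumptions), the consistency rule
amounts to:

```python
# hypothetical helpers illustrating the rule above
def effective_limit(pr):
    # a PR without an explicit limit falls back to a batch-mate's limit
    return pr.limit_id or pr.batch_id.prs.limit_id[:1]

def set_limit(batch, branch):
    # a limit set via command applies to every PR of the batch
    batch.prs.write({'limit_id': branch.id})
```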

This commit does not allow differential limits via commands; they are
allowed via the backend but not really tested. The issue is mostly
that it's not clear what the UX should look like to keep interactions
clear and not overly error-prone. So punt on it for now, and
hopefully there's no hole I missed which would create inconsistent
batches.
2024-05-24 09:08:56 +02:00
Xavier Morel
d4fa1fd353 [CHG] *: rewrite commands set, rework status management
This commit revisits the commands set in order to make it more
regular, and limit inconsistent command-sets, although it includes
pseudo-command aliases for common tasks now removed from the core set.

Hard Errors
===========

The previous iteration of the commands set would ignore any
non-command term in a command line. This has been changed to a hard
error (ignoring the entire line) if any command is unknown or
invalid.

This fixes inconsistent / unexpected interpretations where a user
sends a command, then writes a novel on the same line, some words of
which happen to *also* be commands, leading to merge states they did
not expect. They should now be told to fuck off.
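
In pseudo-Python, the new rule is simply (names are hypothetical):

```python
class CommandError(Exception):
    pass

def parse_commands(line, known_commands):
    # any unknown or invalid token now aborts the whole command line,
    # instead of being silently skipped as before
    commands = []
    for token in line.split():
        name = token.split('=', 1)[0]
        if name not in known_commands:
            raise CommandError(f"unknown command {token!r}, ignoring entire line")
        commands.append(known_commands[name].parse(token))
    return commands
```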

Priority Restructuring
----------------------

The numerical priority system was pretty messy in that it confused
"staging priority" (in ways which were not entirely straightforward)
with overrides to other concerns.

This has now been split along all the axes, with separate command
subsets for:

- staging prioritisation, now separated between `default`, `priority`,
  and `alone`,

  - `default` means PRs are picked by an unspecified order when
    creating a staging, if nothing better is available
  - `priority` means PRs are picked first when staging, however if
    `priority` PRs don't fill the staging the rest will be filled with
    `default`, this mode did not previously exist
  - `alone` means the PRs are picked first, before splits, and only
    `alone` PRs can be part of the staging (which usually matches the
    mode name)
- `skipchecks` overrides both statuses and approval checks, for the
  batch, something previously implied in `p=0`, but now
  independent. Setting `skipchecks` basically makes the entire batch
  `ready`.

  For consistency this also sets the reviewer implicitly: since
  skipchecks overrides both statuses *and approval*, whoever enables
  this mode is essentially the reviewer.
- `cancel` cancels any ongoing staging when the marked PR becomes
  ready again; previously this was also implied (in a more restricted
  form) by setting `p=0`

FWBot removal
=============

While the "forwardport bot" still exists as an API level (to segregate
access rights between tokens) it has been removed as an interaction
point, as part of the modules merge plan. As a result,

fwbot stops responding
----------------------

Feedback messages are now always sent by the mergebot; the
forward-porting bot should not send any message or notification
anymore.

commands moved to the merge bot
-------------------------------

- `ignore`/`up to` simply changes bot
- `close` as well
- `skipci` is now a choice / flag of an `fw` command, which denotes
  the forward-port policy,

  - `fw=default` is the old `ci` and resets the policy to default,
    that is wait for the PR to be merged to create forward ports, and
    for the required statuses on each forward port to be received
    before creating the next
  - `fw=skipci` is the old `skipci`, it waits for the merge of the
    base PR but then creates all the forward ports immediately (unless
    it gets a conflict)
  - `fw=skipmerge` immediately creates all the forward ports, without
    even waiting for the PR to be merged

    This is a completely new mode, and may be rather broken as until
    now the 'bot has always assumed the source PR had been merged.

approval rework
---------------

Because of the previous section, there is no distinguishing feature
between `mergebot r+` = "merge this PR" and `forwardbot r+` = "merge
this PR and all its parents with different access rights".

As a result, the two have been merged under a single `mergebot r+`
with heuristics attempting to provide the best experience:

- if approving a non-forward port, the behavior does not change
- else, with review rights on the source, all ancestors are approved
- else, as author of the original, approves all ancestors which descend
  from a merged PR
- else, approves all ancestors up to and including the oldest ancestor
  to which we have review rights

Most notably, the source's author is not delegated on the source or
any of its descendants anymore. This might need to be revisited if it
proves too restrictive.

For the very specialized need of approving a forward-port *and none of
its ancestors*, `review=` can now take a comma (`,`) separated list of
pull request numbers (github numbers, not mergebot ids).

Computed State
==============

The `state` field of pull requests is now computed. Hopefully this
makes the status more consistent and predictable in the long run, and
importantly makes status management more reliable (because the
reference data gets updated naturally and flows into the state).

For now however it makes things more complicated as some of the states
have to be separately signaled or updated:

- `closed` and `error` are now separate flags
- `merge_date` is pulled down from forwardport and becomes the
  transition signal for ready -> merged
- `reviewed_by` becomes the transition signal for approval (might be a
  good idea to rename it...)
- `status` is computed from the head's statuses and overrides, and
  *that* becomes the validation state
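
As a rough sketch (using the fields from the list above; the exact
value set and precedence are simplifications):

```python
from odoo import api, fields, models

class PullRequest(models.Model):
    _inherit = 'runbot_merge.pull_requests'

    state = fields.Selection([
        ('opened', 'Opened'), ('approved', 'Approved'), ('ready', 'Ready'),
        ('merged', 'Merged'), ('error', 'Error'), ('closed', 'Closed'),
    ], compute='_compute_state', store=True)

    @api.depends('closed', 'error', 'merge_date', 'reviewed_by', 'status')
    def _compute_state(self):
        for pr in self:
            if pr.closed:
                pr.state = 'closed'
            elif pr.merge_date:
                pr.state = 'merged'
            elif pr.error:
                pr.state = 'error'
            elif pr.reviewed_by and pr.status == 'success':
                pr.state = 'ready'
            elif pr.reviewed_by:
                pr.state = 'approved'
            else:
                pr.state = 'opened'
```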

Ideally, batch-level flags like `skipchecks` should be on, well, the
batch, and `state` should have a dependency on the batch. However
currently the batch is not a durable / permanent member of the system,
so it's a PR-level flag and a messy pile.

One notable change is that *forcing* the state to `ready` now does
that but also sets the reviewer, `skipchecks`, and overrides, to
ensure the API-mediated readying does not get rolled back by e.g. the
runbot sending a status.

This is useful for a few types of automated / programmatic PRs
e.g. translation exports, where we set the state programmatically to
limit noise.

recursive dependency hack
-------------------------

Given a sequence of PRs with an override of the source, if one of the
PRs is updated its descendants should not have the override
anymore. However if the updated PR gets overridden, its descendants
should have *that* override.

This requires some unholy manipulations via an override of `modified`,
as the ORM supports recursive fields but not recursive
dependencies (on a different field).
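
Very roughly, the shape of that hack (illustrative only, not the
actual override):

```python
from odoo import models

class PullRequest(models.Model):
    _inherit = 'runbot_merge.pull_requests'

    def modified(self, fnames, *args, **kwargs):
        super().modified(fnames, *args, **kwargs)
        if 'overrides' in fnames:
            # cascade to descendants still following their parent, since
            # @api.depends can't express a recursive dependency on a
            # different field
            children = self.search([('parent_id', 'in', self.ids)])
            if children:
                children.modified(['overrides'], *args, **kwargs)
```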

unconditional followup scheduling
---------------------------------

Previously, scheduling the forward-port followup was contingent on
the FW policy, but that's not actually correct if the new PR is
*immediately* validated (which can happen now that the field is
computed, if there are no required statuses *or* all of the required
statuses are overridden by an ancestor), as nothing will trigger the
state change and thus the scheduling of the fp followup.

The followup function checks all the properties of the batch to port,
so this should not result in incorrect ports. It is a bit more
expensive though, and will lead to more spam.

Previously this would not happen because on creation of a PR the
validation task (commit -> PR) would still have to execute.

Misc Changes
============

- If a PR is marked as overriding / cancelling stagings, it now does
  so on retry, not just when initially set.

  This was not handled at all previously, so a PR in p=0 going into
  error due to e.g. a non-deterministic bug would be retried and
  still be p=0, but the current staging would not get cancelled. Same
  when a PR in p=0 goes into error because something failed, then is
  updated with a fix.
- Add tracking to a bunch of relevant PR fields.

  Post-mortem analysis currently generally requires going through the
  text logs to see what happened, which is annoying.

  There is a nondeterminism / inconsistency in the tracking which
  sometimes leads the admin user to trigger tracking before the bot
  does, leading to the staging tracking being attributed to them
  during tests; this is shoved under the carpet by ignoring the user
  to whom that tracking is attributed.

  When multiple users update tracked fields in the same transaction,
  all the changes are attributed to the first one having triggered
  tracking (?); I couldn't find out why the admin sometimes takes
  over.
- added and leveraged support for enum-backed selection fields
- moved various fields from forwardport to runbot_merge
- fixed a migration which had never worked and never ran (because I
  forgot to bump the version on the module)
- removed some unnecessary intermediate de/serialisation

fixes #673, fixes #309, fixes #792, fixes #846 (probably)
2024-05-23 07:58:46 +02:00
Xavier Morel
65ed7c51bc [IMP] *: note to merge using mergebot in conflict message
The message has a lot of info, but left the merging bit
unwritten. Correct this issue.

Fixes #765
2023-08-30 12:10:46 +02:00
Xavier Morel
302fd42cae [ADD] forwardport: message on parent of detached PR
Currently a user is not notified that the parent of a detached PR
needs to be independently approved and may miss that information. Add
a notification to *that* PR as well.

Fixes #788
2023-08-29 15:59:05 +02:00
Xavier Morel
270dfdd495 [REF] *: move most feedback messages to pseudo-templates
The current system makes it hard to iterate on feedback messages and
make them clearer; this should improve things a touch.

Use a bespoke model to avoid concerns with qweb rendering
complexity (we just want GFM output and should not need logic).
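
The bespoke model is essentially just this (a sketch; model and field
names are assumptions):

```python
from odoo import fields, models

class FeedbackTemplate(models.Model):
    _name = 'runbot_merge.pull_requests.feedback.template'
    _description = 'pseudo-templates for feedback messages'

    template = fields.Text()

    def _format(self, **kwargs):
        # plain str.format substitution into GFM text, no qweb logic needed
        return self.template.format(**kwargs)
```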

Also update the fwbot test setup to always configure an fwbot name,
in order to avoid ping messages closing the PRs they're talking
about; that took a while to debug, and given the old message I assume
I'd already hit it and just been too lazy to fix. This requires
updating a bunch of tests, as fwbot pings are sent *to*
`fp_github_name` but sent *from* the reference user (because that's
the key we set).

Note: noupdate on CSV files doesn't seem to work anymore, which isn't
great. But instead set tracking on the template's templates; it's not
quite as good but should be sufficient.

Fixes #769
2023-06-14 16:01:45 +02:00
Xavier Morel
f13c60a018 [IMP] runbot_merge: small reorg of main models file
To prep for the addition of the freeze wizard:

* move projects out of `pull_requests.py`
* then realize half the methods there have no relation to projects and
  move them to more relevant places in `pull_requests.py`
* update corresponding crons (and tests using those crons) as the
  methods have changed model, and the cron definitions thus need to be
  updated
* split update to labels out of sending feedback comments while at it:
  labels are not used much during tests so their manipulation can be
  avoided; and labels are not as urgent as feedback so the crons can
  be quite a bit slower
* move the project view out of `mergebot.xml` as well
2021-11-10 13:13:34 +01:00
Xavier Morel
9d661480fc [IMP] runbot_merge: split staging cron in two
The staging cron was already essentially split between "check if one
of the stagings is successful (and merge it)" and "check if we should
create a staging" as these were two separate loops in the cron.

But it might be useful to disable these two operations separately,
e.g. we might want to stop the creation of new stagings but let the
existing stagings complete.

The actual splitting is easy but it turns out a bunch of tests were
"optimised" to only run the merge cron. Most of them didn't blow up
but it seems more prudent to fix them all.

fixes odoo/runbot#310
2020-02-11 08:07:57 +01:00
Xavier Morel
5aa9f5a567 [IMP] runbot_merge: extract commit validation to cron
Before this, impacting a commit's statuses on the relevant PR or
staging would be performed immediately / inline with its
consumption. This, however, is problematic if we want to implement
additional processing like #87 (and possibly though probably not
#52): webhook handlers should be kept short and fast, and feeding
back into GitHub would not be acceptable.

- flag commits as needing processing instead of processing them
  immediately; this uses a partial index, as that looks like the
  recommended / proper way to index a boolean column in which one of
  the values is searched much more than the other (todo: eventually
  check whether that actually does anything); see the sketch after
  this list
- add a new cron for commits processing
- alter tests so they use this new cron (mostly by migrating them to
  `run_crons` though not solely as some still need more detailed
  management to properly check intermediate steps)
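
A sketch of what that partial index looks like (table and column
names are assumptions):

```python
from odoo import models

class Commit(models.Model):
    _inherit = 'runbot_merge.commit'

    def init(self):
        # only index the rows the cron actually searches for (to_check is
        # true); the overwhelmingly common false value is not worth indexing
        self.env.cr.execute("""
            CREATE INDEX IF NOT EXISTS runbot_merge_commit_to_check
            ON runbot_merge_commit (to_check) WHERE to_check IS TRUE
        """)
```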

Fix an issue with closing a staged PR while at it (the "merging" tag
would potentially never be removed).
2019-03-05 08:07:19 +01:00
Xavier Morel
fe8a4588d3 [FIX] runbot_merge: try to fix race condition again
The original issue (staging would get cancelled just as it was being
merged) was not really fixed but traded for a new one: serialization
errors which can lock up the mergebot for a long time, stopping the
handling of all incoming signals (possibly/probably because all of
them try to write on the PR, which is locked?).

Splitting the tagging cron out should already improve things
significantly, as the status update cron should be much
shorter (and thus hold its locks for a smaller amount of time). This
should also avoid the "close" handler waiting on the extant
transaction, and make the "pr update" transaction much shorter, as
each staging gets its own transaction.
2018-10-16 15:10:43 +02:00
Xavier Morel
0b629a32bc [ADD] runbot_merge: warning that a ready PR is linked to a non-ready one
People get surprised/worried that their ready PR never gets picked up,
but it's because there is a non-ready (either unreviewed or failing CI)
pull request linked to it. This aims to at least warn them of the issue.
2018-10-15 16:19:29 +02:00
Xavier Morel
5ec2c12454 [IMP] runbot_merge: split tagging cron out of main progress cron
They're completely independent (or should be), and there's no reason
for the tagging cron to extend the "lifetime" of the main cron's
transaction (and thus increase the odds of racy behaviour).
2018-10-12 15:54:14 +02:00
Xavier Morel
cab683ae0f [IMP] runbot_merge: remove "sync PRs", fetch unknown PRs
The old "sync pr" thing is turning out to be a bust, while it
originally worked fine these days it's a catastrophe as the v4 API
performances seem to have significantly degraded, to the point that
fetching all 15k PRs by pages of 100 simply blows up after a few
hundreds/thousands.

Instead, add a table of PRs to sync: if we get notified of a
"compatible" PR (enabled repo & target) which we don't know of,
create an entry in a "fetch jobs" table; a cron will then handle
fetching the PR and fetching/applying all relevant
metadata (statuses, review comments and reviews).

Also change indexation of Commit(sha) and PR(head) to hash, as btree
indexes are not really sensible for such content (the ordering is
unhelpful and the index locality is awful by design/definition).
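
The index change amounts to something like this (table and column
names are assumptions):

```python
from odoo import models

class Commit(models.Model):
    _inherit = 'runbot_merge.commit'

    def init(self):
        # sha is only ever looked up by equality, so a hash index is a
        # better fit than the default btree
        self.env.cr.execute(
            "CREATE INDEX IF NOT EXISTS runbot_merge_commit_sha_idx "
            "ON runbot_merge_commit USING hash (sha)"
        )
```
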
2018-09-03 13:16:36 +02:00
Xavier Morel
e52d08ecdf [ADD] runbot_merge: a merge bot
No actual dependency on runbot, should be usable completely
independently.
2018-09-03 13:16:36 +02:00