Commit Graph

1849 Commits

Author SHA1 Message Date
Xavier Morel
ef6a002ea7 [CHG] runbot_merge: move staging readiness to batch
Staging readiness is a batch-level concerns, and many of the markers
are already there though a few need to be aggregated from the PRs. As
such, staging has no reason to be performed in terms of PRs anymore,
it should be performed via batches directly.

There is a bit of a mess in order not to completely fuck up when
retargeting PRs (implicitly via freeze wizard, or explicitely) as for
now we're moving PRs between batches in order to keep the
batches *mostly* target-bound.

Some of the side-effects in managing the coherence of the targeting
and moving PRs between batches is... not great. This might need to be
revisited and cleaned up with those scenarios better considered.
2024-05-23 07:58:58 +02:00
Xavier Morel
9ddf017768 [CHG] *: move fw_policy from PR to batch 2024-05-23 07:58:58 +02:00
Xavier Morel
21b5dd439b [CHG] runbot_merge: move merge_date to batch, remove active
- `merge_date` should be common to an entire batch, so move it there
- remove `Batch.active` which should probably have been removed when
  batches were made persistent (can eventually re-add as a proxy for
  `merge_date` being set maybe, but for now removing it seems a better
  way to catch mistakes)
- update various sites to use `Batch.merge_date` instead of
  `Batch.active`
2024-05-23 07:58:58 +02:00
Xavier Morel
e910b8e857 [IMP] runbot_merge: move cross-pr properties to batch 2024-05-23 07:58:58 +02:00
Xavier Morel
473f89f87d [CHG] *: persistent batches
This probably has latent bugs, and is only the start of the road to v2
(#789): PR batches are now created up-front (alongside the PR), with
PRs attached and detached as needed, hopefully such that things are
not broken (tests pass but...), this required a fair number of
ajustments to code not taking batches into account, or creating
batches on the fly.

`PullRequests.blocked` has also been updated to rely on the batch to
get its batch-mates, such that it can now be a stored field with the
right dependencies.

The next step is to better leverage this change:

- move cross-PR state up to the batch (e.g. skipchecks, priority, ...)
- add fw info to the batch, perform forward-ports batchwise in order
  to avoid redundant batch-selection work, and allow altering batches
  during fw (e.g. adding or removing PRs)
- use batches to select stagings
- maybe expose staging history of a batch?
2024-05-23 07:58:58 +02:00
Xavier Morel
c140701975 [ADD] runbot_merge: support staging ready PRs over splits
Not sure it's going to be useful but it's hard to know if we can't
test it. The intent is mostly the ability to prioritize throughput (or
attempt to) during high-load events, if we can favour staging N
new batches over a split's N/2 we might be able to merge more crap.

But maybe not, we'll see, either way now it's here and seems to more
or less work.

Fixes #798
2024-05-23 07:58:58 +02:00
Xavier Morel
9f54e6f209 [ADD] runbot_merge: option to disable staging without cron
Because the mergebot crons are on such a tight scheduling, and just
them finding out they have nothing to do can take a while, disabling
them can be a chore. Disabling staging via the project is much less
likely to cause issues as the projects don't normally (or ever?) get
exclusively locked, so they can generally be written to at any moment.

Furthermore, if we ever get in a situation where we have multiple
active projects (not really the case currently, we have multiple
projects but only one is really active) it's less disruptive to
disable stagings on a single specific project.

Fixes #860
2024-05-23 07:58:58 +02:00
Xavier Morel
d4fa1fd353 [CHG] *: rewrite commands set, rework status management
This commit revisits the commands set in order to make it more
regular, and limit inconsistent command-sets, although it includes
pseudo-command aliases for common tasks now removed from the core set.

Hard Errors
===========

The previous iteration of the commands set would ignore any
non-command term in a command line. This has been changed to hard
error (and ignoring the entire thing) if any command is unknown or
invalid.

This fixes inconsistent / unexpected interpretations where a user
sends a command, then writes a novel on the same line some words of
which happen to *also* be commands, leading to merge states they did
not expect. They should now be told to fuck off.

Priority Restructuring
----------------------

The numerical priority system was pretty messy in that it confused
"staging priority" (in ways which were not entirely straightforward)
with overrides to other concerns.

This has now being split along all the axis, with separate command
subsets for:

- staging prioritisation, now separated between `default`, `priority`,
  and `alone`,

  - `default` means PRs are picked by an unspecified order when
    creating a staging, if nothing better is available
  - `priority` means PRs are picked first when staging, however if
    `priority` PRs don't fill the staging the rest will be filled with
    `default`, this mode did not previously exist
  - `alone` means the PRs are picked first, before splits, and only
    `alone` PRs can be part of the staging (which usually matches the
    modename)
- `skipchecks` overrides both statuses and approval checks, for the
  batch, something previously implied in `p=0`, but now
  independent. Setting `skipchecks` basically makes the entire batch
  `ready`.

  For consistency this also sets the reviewer implicitly: since
  skipchecks overrides both statuses *and approval*, whoever enables
  this mode is essentially the reviewer.
- `cancel` cancels any ongoing staging when the marked PR becomes
  ready again, previously this was also implied (in a more restricted
  form) by setting `p=0`

FWBot removal
=============

While the "forwardport bot" still exists as an API level (to segregate
access rights between tokens) it has been removed as an interaction
point, as part of the modules merge plan. As a result,

fwbot stops responding
----------------------

Feedback messages are now always sent by the mergebot, the
forward-porting bot should not send any message or notification
anymore.

commands moved to the merge bot
-------------------------------

- `ignore`/`up to` simply changes bot
- `close` as well
- `skipci` is now a choice / flag of an `fw` command, which denotes
  the forward-port policy,

  - `fw=default` is the old `ci` and resets the policy to default,
    that is wait for the PR to be merged to create forward ports, and
    for the required statuses on each forward port to be received
    before creating the next
  - `fw=skipci` is the old `skipci`, it waits for the merge of the
    base PR but then creates all the forward ports immediately (unless
    it gets a conflict)
  - `fw=skipmerge` immediately creates all the forward ports, without
    even waiting for the PR to be merged

    This is a completely new mode, and may be rather broken as until
    now the 'bot has always assumed the source PR had been merged.

approval rework
---------------

Because of the previous section, there is no distinguishing feature
between `mergebot r+` = "merge this PR" and `forwardbot r+` = "merge
this PR and all its parent with different access rights".

As a result, the two have been merged under a single `mergebot r+`
with heuristics attempting to provide the best experience:

- if approving a non-forward port, the behavior does not change
- else, with review rights on the source, all ancestors are approved
- else, as author of the original, approves all ancestors which descend
  from a merged PR
- else, approves all ancestors up to and including the oldest ancestor
  to which we have review rights

Most notably, the source's author is not delegated on the source or
any of its descendants anymore. This might need to be revisited if it
provides too restrictive.

For the very specialized need of approving a forward-port *and none of
its ancestors*, `review=` can now take a comma (`,`) separated list of
pull request numbers (github numbers, not mergebot ids).

Computed State
==============

The `state` field of pull requests is now computed. Hopefully this
makes the status more consistent and predictable in the long run, and
importantly makes status management more reliable (because reference
datum get updated naturally flowing to the state).

For now however it makes things more complicated as some of the states
have to be separately signaled or updated:

- `closed` and `error` are now separate flags
- `merge_date` is pulled down from forwardport and becomes the
  transition signal for ready -> merged
- `reviewed_by` becomes the transition signal for approval (might be a
  good idea to rename it...)
- `status` is computed from the head's statuses and overrides, and
  *that* becomes the validation state

Ideally, batch-level flags like `skipchecks` should be on, well, the
batch, and `state` should have a dependency on the batch. However
currently the batch is not a durable / permanent member of the system,
so it's a PR-level flag and a messy pile.

On notable change is that *forcing* the state to `ready` now does that
but also sets the reviewer, `skipchecks`, and overrides to ensure the
API-mediated readying does not get rolled back by e.g. the runbot
sending a status.

This is useful for a few types of automated / programmatic PRs
e.g. translation exports, where we set the state programmatically to
limit noise.

recursive dependency hack
-------------------------

Given a sequence of PRs with an override of the source, if one of the
PRs is updated its descendants should not have the override
anymore. However if the updated PR gets overridden, its descendants
should have *that* override.

This requires some unholy manipulations via an override of `modified`,
as the ORM supports recursive fields but not recursive
dependencies (on a different field).

unconditional followup scheduling
---------------------------------

Previously scheduling forward-port followup was contigent on the FW
policy, but it's not actually correct if the new PR is *immediately*
validated (which can happen now that the field is computed, if there
are no required statuses *or* all of the required statuses are
overridden by an ancestor) as nothing will trigger the state change
and thus scheduling of the fp followup.

The followup function checks all the properties of the batch to port,
so this should not result on incorrect ports. Although it's a bit more
expensive, and will lead to more spam.

Previously this would not happen because on creation of a PR the
validation task (commit -> PR) would still have to execute.

Misc Changes
============

- If a PR is marked as overriding / canceling stagings, it now does
  so on retry not just when setting initially.

  This was not handled at all previously, so a PR in P0 going into
  error due to e.g. a non-deterministic bug would be retried and still
  p=0, but a current staging would not get cancelled. Same when a PR
  in p=0 goes into error because something was failed, then is updated
  with a fix.
- Add tracking to a bunch of relevant PR fields.

  Post-mortem analysis currently generally requires going through the
  text logs to see what happened, which is annoying.

  There is a nondeterminism / inconsistency in the tracking which
  sometimes leads the admin user to trigger tracking before the bot
  does, leading to the staging tracking being attributed to them
  during tests, shove under the carpet by ignoring the user to whom
  that tracking is attributed.

  When multiple users update tracked fields in the same transaction
  all the changes are attributed to the first one having triggered
  tracking (?), I couldn't find why the admin sometimes takes over.
- added and leveraged support for enum-backed selection fields
- moved variuous fields from forwardport to runbot_merge
- fix a migration which had never worked and which never run (because
  I forgot to bump the version on the module)
- remove some unnecessary intermediate de/serialisation

fixes #673, fixes #309, fixes #792, fixes #846 (probably)
2024-05-23 07:58:46 +02:00
Xavier Morel
75f29f9315 [IMP] conftest: avoid parsing json it if may not be
When asserting a status fails, it's possible that this is a 400 or
500-type error which does not yield JSON data at all (e.g. forgot to
start the proxy or dummy), in which case trying to dump the json data
is actively harmful as it triggers cascading failures. And it's not
like the output message is formatted, so a re-structured assertion
message is not helpful.
2024-05-16 10:37:50 +02:00
Xavier Morel
955a61a1e8 [CHG] runbot_merge, forwardbot: merge commands parser
- move all commands parsing to runbot_merge as part of the long-term
  unification effort (#789)
- set up an actual parser-ish structure to parse the commands to
  something approaching a sum type (fixes #507)
- this is mostly prep for reworking the commands set (#673), although
  *strict command parsing* has been implemented (cf update to
  `test_unknown_commands`)
2024-05-16 10:37:50 +02:00
Xavier Morel
0e71f85802 [ADD] forwardport: some typing to conftest
It's not amazing, because cross-typing between the different conftests
and utility modules doesn't really work smoothly. So not entirely
convinced it's great.

While at it, inline a few `x.json()` local aliases when the jsons are
used in exclusive locations e.g. usually an assertion message on one
side and a result / process on the other.
2024-05-16 10:37:50 +02:00
Xavier-Do
8722aba511 [FIX] runbot: report access error on view 2024-05-16 10:04:54 +02:00
Xavier Morel
f4889ec8cf [ADD] runbot_merge: ad-hoc ACL tracking to res.partner
Sadly m2ms don't support tracking, so add a bunch of ad-hoc tracking
to the override rights in order to know who, what, when at least.

Do the same for the review rights although maybe tracking works for
those.
2024-05-16 09:32:03 +02:00
Xavier Morel
b177361f20 [IMP] forwardport: locking during creation of fw batch
Not sure why I didn't think about it previously (in #357), but it
would make sense for the entire batch creation to be atomic and thus
under the same lock.

The commit during forward porting also makes a lot less sense: it was
a failed early attempt at resolving the problem by hoping we'd win the
race with the webhook (commit before the webhook hit). By locking the
PRs table to update, we actually resolved it.

But since all that happens then is a few updates and then a commit by
the cron itself (it commits per batch), it's probably good enough to
leave the entire thing under the same lock. This means we lock out
other interactions a bit longer, but since the span is still just the
forward port of a single batch it should not be too much of an issue
outside of post-freeze recovery thingie.
2024-05-16 09:32:03 +02:00
Xavier Morel
a8e4d6dfee [IMP] runbot_merge: don't select content when locking rows
It might not be a huge amount of extra work since we're never actually
retrieving the rows, but it still seems completely unnecessary.

Sadly we can't do something cleaner like an aggregation, because
aggregating requires moving the locking query to a subquery, and
experimentally that seems slower than just ignoring / discarding the
result set.
2024-05-16 09:32:03 +02:00
Xavier-Do
2a49ac4f8c [FIX] runbot: no result for subcommand 2024-05-15 12:11:24 +02:00
Xavier-Do
d5114c0062 [IMP] runbot: improve make_result
The "Error or traceback found in logs" message is sometimes confusing
since we don't know what is the cause of the issue.

The first idea is to split the two concept, error and traceback to have
a better idea of the cause of the issue

The second one will also log the content of the line in the error
message. For traceback, tries to get the complete traceback, getting all
indentend lines and one last non idented one.

While working on it, cleaning slighty to partially get rid of
returning a dict, artefact from odoo < 13.0
2024-05-14 16:21:15 +02:00
Xavier-Do
507f4e37e3 [IMP] runbot: custom trigger view
Somme trigger may have an important depth and nightly result can be long
to check.

A custom view was already done for upgrade nightly, but this is hidden
and the same logic could be applied to the distro builds, ...

This commit adds a custom view on the trigger and related controller to
display a custom view for a trigger.
2024-05-14 15:26:24 +02:00
Xavier-Do
2a18cd7f3d [FIX] *: remove invalid escape sequences 2024-04-30 16:05:21 +02:00
Xavier-Do
e7d1cc9c57 [FIX] runbot: small fix following BS 5.3 2024-04-23 11:53:46 +02:00
Xavier-Do
816af5a058 [IMP] runbot: better log listing
The current logic to define if a build has logs or not is based on
is_docker_step. This related logic is not always valid, since it is
based on step type and step content, but there are many way to have a
result that could be a docker start, one of them being to call another
step. The main issue is that we don't always now if this other step is a
docker step or not before runtime. Moreover the logic becamore more and
more complex adding more conditions like commands, _run_, docker_params,
...

Moreover, the log list is completed before the build start, meaning that
if the build is killed before completion, some logs may be missing but
will be listed. This is also an issue when trying to access logs before
the correspondong step ran.

This commit changes the logic by adding the log file to the list (and
start log) when starting a docker, wich should give a more consistent
result.
2024-04-23 10:17:34 +02:00
Xavier-Do
a3c85d87d0 [IMP] runbot: improve log message layout 2024-04-23 10:08:54 +02:00
Xavier-Do
c577b72a66 [IMP] runbot: rework preference menu 2024-04-23 10:08:54 +02:00
Xavier-Do
2c5f129969 [IMP] runbot: add dark theme 2024-04-23 10:08:54 +02:00
Xavier-Do
6bafea7c36 [IMP] runbot: adapt to bootstrap 5.3
The bootstrap version was freezed during a previous migration to avoid
loosing to much time adapting the style again to fit the previous
look and feel.

Anyway, the  latest version of bootsrap offers more flexibility about
themes, and it could be a good oportunity to modernise a little the
runbot interface and answer to long lasting requests.

The main part of the adaptation is to tweak colors to match the
previous style, and adapt some class in xml views.

Some css rules are also tweaked to keep the same looks without the need
to rewrite xml views too much, this could be done in a future commit.
2024-04-23 10:08:54 +02:00
Xavier-Do
0eae784a1e [FIX] runbot: don't kill build if build may be used.
The current stratey to check if a build can be killed is to ensure that
no slot unskipped slots points to the same build. Since the check is
only done after a prepare, it should ensure that the new slot have the
build linked before checking if the previous batch can kill it's build.
Since the slot are now filled latter on (after the minimal check) it is
possible that a build is considered killable an killed before being
linked again.

To fix this, we just need to consier params instead of builds to define
if a build is killable or not. If the params of a builds are linked
elsewere, don't kill them.
2024-04-17 16:21:21 +02:00
Xavier-Do
e241f03321 [FIX] runbot: only start triggers based on alive branches 2024-04-17 14:28:53 +02:00
Xavier-Do
9eb7f0d577 [IMP] runbot: add trigger cross dependency 2024-04-16 11:09:50 +02:00
Xavier-Do
86b88d86c8 [FIX] runbot: adapt tests 2024-04-16 08:39:51 +02:00
Xavier-Do
7277d402fb [FIX] runbot: fix typo in view 2024-04-15 23:10:27 +02:00
Xavier-Do
7bc26219c2 [FIX] runbot: add missing tree hash
Previous commit introduced commit tree hash, but only when calling
get_commit_infos. This fixes the get_ref and find_new_commit to have
the same infos.
2024-04-15 16:44:44 +02:00
Xavier-Do
c70aa57acb [FIX] runbot: incorrect label 2024-04-15 14:57:06 +02:00
Xavier-Do
c067384b8b [IMP] runbot: add tree hash to views 2024-04-15 14:41:02 +02:00
Xavier-Do
e521149d93 [IMP] runbot: add container_cpus to step views 2024-04-15 14:32:54 +02:00
Xavier-Do
83acc43a05 [IMP] runbot: save reference batches on base batches 2024-04-15 11:42:28 +02:00
Xavier-Do
c9e14a86ca [IMP] runbot: propagate batch to reference builds decision 2024-04-15 11:42:28 +02:00
Xavier-Do
082ecaafe9 [IMP] runbot: add tree hash info to commit
Right now the commit is considered as a part of the build uniquifier.

If it makes sence for a check on the commit metadata, it is not the case
for execution tests where only the content matters.

This will allow to mark a trigger as non depending on commit but tree
hash, and avoid rebuild when only fixing a commit message.
2024-04-15 10:53:48 +02:00
Xavier-Do
f191e8cc48 [IMP] runbot: optionnal container_name in python steps 2024-04-15 10:44:32 +02:00
Xavier-Do
7e848b8073 [FIX] runbot: fix stats.js 2024-04-15 10:35:20 +02:00
Xavier-Do
9ef850220b [IMP] runbot: add a way to add custom pre/post
Mainly usefull for custom triggers
2024-04-15 10:08:56 +02:00
Christophe Monniez
412baa3fad [REF] runbot: centralize cpu_limit in _run_step
In order to keep it coherent, the cpu_limit computation can be done in
one place instead of defining it in all _run* methods.

In python steps, it can still be overridden when returning
docker_params.
2024-04-15 10:06:28 +02:00
Christophe Monniez
d0a96faf84 [IMP] runbot: add cpus parameter
When build eats all the CPU's resources, it may interfere with other
builds and cause collateral damages.

With this commit, a new settings parameter `Containers CPUs` is added in
order to limit the usage of available CPU's on runbot instances.

If left to 0, no limts are applied.

Otherwise, the cpu_quota docker parameter is computed as Containers
CPU's * (logical cpu's count / nb parallel builds) * cpu period which defaults to 100000.

e.g.:
- on a host with 16 logical CPU's
- with 8 parallel builds allowed
- with Containers CPUs set to 1.5
- with the default cpu_period
cpu_quota will be:
    (16/8) * 1.5 * 100000 = 300000

This system parameter can be overridden by the `container_cpus` field on
steps.
2024-04-15 10:06:28 +02:00
Xavier-Do
0fc1daeac9 [IMP] runbot: enable user to toggle no_build 2024-04-10 16:23:50 +02:00
Xavier-Do
70f4fe23a5 [IMP] runbot: alow autorebase for external pr 2024-04-02 11:57:47 +02:00
Xavier-Do
9c4983f5b7 [IMP] runbot: always display autorebase for external pr 2024-04-02 08:33:36 -01:00
Xavier-Do
70532df2d6 [FIX] runbot: fix build error unlink 2024-03-20 09:50:03 +01:00
Xavier-Do
fd7f49aff8 [FIX] runbot: fix staging creation 2024-03-20 09:47:51 +01:00
Xavier Morel
9f22305903 [IMP] runbot_merge: view warnings around ACLs
Eventually we might want to add a proper "sensitive" flag on overrides
and compute the flag based on that. For now just check for
`ci/security`.
2024-03-19 12:54:20 +01:00
Xavier Morel
5024c2e27b [FIX] forwardport: suppress warning when closing unmanaged PR
Continuation of 327500bc83 for an other
edge case of closing a PR to a detached branch with a merged
descendant. The mergebot would:

- warn on the parent about it being detached due to being closed
- then warn on the child about it being detached due to the parent
  being closed (despite it being merged already)
- then warn the parent *again* due to the child being detached

At least some of those messages were still produced by the test case,
stop them.

Issue was noticed on odoo/odoo#145969 and odoo/odoo#145984 due to 16.2
being deactivated.
2024-03-19 11:46:36 +01:00
Xavier Morel
953bf86044 [FIX] forwardport: don't notify of detached child on merged parents
The notification is both noise and confusing: we're telling the
author (and reviewer, and anyone else subscribed) that they need to
merge a merged PR.

Fixes #855
2024-03-12 14:57:50 +01:00