This commit revisits the command set in order to make it more regular
and to limit inconsistencies, although it includes pseudo-command
aliases for common tasks now removed from the core set.
Hard Errors
===========
The previous iteration of the command set would ignore any
non-command term on a command line. This has been changed to a hard
error (rejecting the entire line) if any command is unknown or
invalid.
This fixes inconsistent / unexpected interpretations where a user
sends a command, then writes a novel on the same line, some words of
which happen to *also* be commands, leading to merge states they did
not expect. They are now told the entire line was rejected.
Priority Restructuring
----------------------
The numerical priority system was fairly messy in that it conflated
"staging priority" (in ways which were not entirely straightforward)
with overrides to other concerns.
This has now been split along each axis, with separate command
subsets (see the sketch after the list) for:
- staging prioritisation, now separated between `default`, `priority`,
  and `alone`:
  - `default` means PRs are picked in an unspecified order when
    creating a staging, if nothing better is available
  - `priority` means PRs are picked first when staging; however if
    `priority` PRs don't fill the staging, the rest is filled with
    `default` (this mode did not previously exist)
  - `alone` means the PRs are picked first, before splits, and only
    `alone` PRs can be part of the staging (which usually matches the
    mode name)
- `skipchecks` overrides both statuses and approval checks for the
  batch; this was previously implied by `p=0` but is now
  independent. Setting `skipchecks` basically makes the entire batch
  `ready`.
  For consistency, this also sets the reviewer implicitly: since
  `skipchecks` overrides both statuses *and approval*, whoever
  enables this mode is essentially the reviewer.
- `cancel` cancels any ongoing staging when the marked PR becomes
  ready again; previously this was also implied (in a more restricted
  form) by setting `p=0`
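
To make the new split concrete, here is a minimal sketch (plain
Python, with a hypothetical enum and `priority` attribute standing in
for the actual runbot_merge fields) of how the three prioritisation
modes interact when picking PRs for a staging:

```python
import enum

class StagingPriority(enum.IntEnum):
    """Hypothetical mirror of the three staging priorities described
    above; names and values are illustrative, not the real fields."""
    DEFAULT = 0   # picked in unspecified order if nothing better
    PRIORITY = 1  # picked first, remainder topped up with DEFAULT
    ALONE = 2     # picked first, before splits, staged exclusively

def pick_for_staging(prs):
    """Sketch of how the split could drive PR selection."""
    alone = [p for p in prs if p.priority is StagingPriority.ALONE]
    if alone:
        return alone  # only ALONE PRs may be part of this staging
    # PRIORITY PRs first; if they don't fill the staging the rest is
    # filled with DEFAULT ones
    return sorted(prs, key=lambda p: p.priority, reverse=True)
```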
FWBot removal
=============
While the "forwardport bot" still exists as an API level (to segregate
access rights between tokens) it has been removed as an interaction
point, as part of the modules merge plan. As a result,
fwbot stops responding
----------------------
Feedback messages are now always sent by the mergebot; the
forward-porting bot should not send any messages or notifications
anymore.
commands moved to the merge bot
-------------------------------
- `ignore`/`up to` simply change bots
- `close` as well
- `skipci` is now a choice / flag of an `fw` command, which denotes
  the forward-port policy:
  - `fw=default` is the old `ci`; it resets the policy to the
    default, that is wait for the PR to be merged to create forward
    ports, and for the required statuses on each forward port to be
    received before creating the next
  - `fw=skipci` is the old `skipci`; it waits for the merge of the
    base PR but then creates all the forward ports immediately
    (unless it hits a conflict)
  - `fw=skipmerge` immediately creates all the forward ports, without
    even waiting for the PR to be merged
    This is a completely new mode, and may be rather broken, as until
    now the 'bot has always assumed the source PR had been merged.
approval rework
---------------
Because of the previous section, there is no distinguishing feature
between `mergebot r+` = "merge this PR" and `forwardbot r+` = "merge
this PR and all its parents", with different access rights.
As a result, the two have been merged under a single `mergebot r+`
with heuristics attempting to provide the best experience (a sketch
follows the list):
- if approving a non-forward port, the behavior does not change
- else, with review rights on the source, all ancestors are approved
- else, as author of the original, approves all ancestors which descend
from a merged PR
- else, approves all ancestors up to and including the oldest ancestor
to which we have review rights
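
A rough sketch of that cascade; the helpers (`is_fw_port`,
`ancestors`, `has_review_rights`, ...) are made up for illustration
and do not correspond to the actual implementation:

```python
def approval_targets(pr, user):
    """Which PRs a single r+ on `pr` should approve (sketch of the
    heuristics above; all helper names are hypothetical)."""
    if not pr.is_fw_port:
        return [pr]  # non-forward-port: behaviour unchanged
    chain = pr.ancestors() + [pr]  # oldest first, ending at `pr`
    if user.has_review_rights(pr.source):
        return chain  # review rights on the source: approve them all
    if user == pr.source.author:
        # author of the original: only ancestors which descend from a
        # merged PR
        return [p for p in chain if p.parent and p.parent.merged]
    for i, p in enumerate(chain):
        if user.has_review_rights(p):
            # up to and including the oldest reviewable ancestor
            return chain[i:]
    return []
```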
Most notably, the source's author is no longer delegated on the
source or any of its descendants. This might need to be revisited if
it proves too restrictive.
For the very specialized need of approving a forward port *and none of
its ancestors*, `review=` can now take a comma-separated (`,`) list of
pull request numbers (github numbers, not mergebot ids).
Computed State
==============
The `state` field of pull requests is now computed. Hopefully this
makes the status more consistent and predictable in the long run, and
importantly makes status management more reliable (because the
reference data gets updated and the state flows from it naturally).
For now however it makes things more complicated, as some of the
states have to be separately signaled or updated (a sketch follows
the list):
- `closed` and `error` are now separate flags
- `merge_date` is pulled down from forwardport and becomes the
transition signal for ready -> merged
- `reviewed_by` becomes the transition signal for approval (might be a
good idea to rename it...)
- `status` is computed from the head's statuses and overrides, and
*that* becomes the validation state
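
For illustration only, a heavily simplified sketch of what the
computed state could look like in the ORM; the field names follow the
list above, but the real model is considerably more involved:

```python
from odoo import api, fields, models

class PullRequest(models.Model):
    _name = 'runbot_merge.pull_requests'  # hypothetical model name

    closed = fields.Boolean()
    error = fields.Boolean()
    skipchecks = fields.Boolean()
    merge_date = fields.Datetime()
    reviewed_by = fields.Many2one('res.users')
    # computed elsewhere from the head's statuses and overrides
    status = fields.Selection([
        ('pending', 'Pending'), ('failure', 'Failure'),
        ('success', 'Success'),
    ], default='pending')

    state = fields.Selection([
        ('opened', 'Opened'), ('approved', 'Approved'),
        ('validated', 'Validated'), ('ready', 'Ready'),
        ('merged', 'Merged'), ('error', 'Error'), ('closed', 'Closed'),
    ], compute='_compute_state', store=True)

    @api.depends('closed', 'error', 'merge_date',
                 'reviewed_by', 'status', 'skipchecks')
    def _compute_state(self):
        for pr in self:
            if pr.closed:
                pr.state = 'closed'
            elif pr.merge_date:  # transition signal for ready -> merged
                pr.state = 'merged'
            elif pr.error:
                pr.state = 'error'
            elif pr.skipchecks:  # makes the batch ready outright
                pr.state = 'ready'
            elif pr.reviewed_by and pr.status == 'success':
                pr.state = 'ready'
            elif pr.status == 'success':
                pr.state = 'validated'
            elif pr.reviewed_by:  # transition signal for approval
                pr.state = 'approved'
            else:
                pr.state = 'opened'
```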
Ideally, batch-level flags like `skipchecks` should live on, well, the
batch, and `state` should have a dependency on the batch. However the
batch is currently not a durable / permanent member of the system, so
they remain PR-level flags in a messy pile.
One notable change is that *forcing* the state to `ready` now does
that but also sets the reviewer, `skipchecks`, and overrides, to
ensure the API-mediated readying does not get rolled back by e.g. the
runbot sending a status.
This is useful for a few types of automated / programmatic PRs
e.g. translation exports, where we set the state programmatically to
limit noise.
recursive dependency hack
-------------------------
Given a sequence of PRs with an override of the source, if one of the
PRs is updated, its descendants should not have the override anymore.
However if the updated PR gets overridden, its descendants should
have *that* override.
This requires some unholy manipulations via an override of `modified`,
as the ORM supports recursive fields but not recursive
dependencies (on a different field).
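
Very roughly, the shape of the hack is something like the following
(a sketch only: the model and field names are hypothetical, and the
actual override is more careful):

```python
from odoo import models

class PullRequests(models.Model):
    _inherit = 'runbot_merge.pull_requests'  # hypothetical model name

    def modified(self, fnames, create=False, before=False):
        # the ORM can express recursive *fields* but not a recursive
        # dependency on a different field, so cascade by hand: when
        # `overrides` changes, flag it modified on all descendants too
        # (`child_of` follows `parent_id` by default)
        records = self
        if 'overrides' in fnames and self:
            records = self.search([('id', 'child_of', self.ids)])
        return super(PullRequests, records).modified(
            fnames, create=create, before=before)
```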
unconditional followup scheduling
---------------------------------
Previously, scheduling the forward-port followup was contingent on
the FW policy, but that is not actually correct if the new PR is
*immediately* validated (which can happen now that the field is
computed, if there are no required statuses *or* all of the required
statuses are overridden by an ancestor), as nothing will trigger the
state change and thus the scheduling of the fp followup.
The followup function checks all the properties of the batch to port,
so this should not result in incorrect ports, although it is a bit
more expensive and will lead to more spam.
Previously this would not happen because on creation of a PR the
validation task (commit -> PR) would still have to execute.
Misc Changes
============
- If a PR is marked as overriding / cancelling stagings, it now does
  so on retry, not just when initially set.
  This was not handled at all previously, so a PR in P0 going into
  error due to e.g. a non-deterministic bug would be retried while
  still at p=0, but the current staging would not get cancelled. Same
  when a PR at p=0 goes into error because something failed, then is
  updated with a fix.
- Add tracking to a bunch of relevant PR fields.
  Post-mortem analysis currently generally requires going through the
  text logs to see what happened, which is annoying.
  There is a nondeterminism / inconsistency in the tracking which
  sometimes leads the admin user to trigger tracking before the bot
  does, leading to the staging tracking being attributed to them
  during tests; this is shoved under the carpet by ignoring the user
  to whom that tracking is attributed.
  When multiple users update tracked fields in the same transaction,
  all the changes are attributed to the first one having triggered
  the tracking (?); I could not find why the admin sometimes takes
  over.
- added and leveraged support for enum-backed selection fields
- moved various fields from forwardport to runbot_merge
- fixed a migration which had never worked and never ran (because I
  forgot to bump the version on the module)
- removed some unnecessary intermediate de/serialisation
fixes #673, fixes #309, fixes #792, fixes #846 (probably)
When asserting a status fails, it's possible that this is a 400- or
500-type error which does not yield JSON data at all (e.g. forgot to
start the proxy or dummy), in which case trying to dump the JSON data
is actively harmful as it triggers cascading failures. And it's not
like the output message is formatted, so a re-structured assertion
message is not helpful.
- move all commands parsing to runbot_merge as part of the long-term
  unification effort (#789)
- set up an actual parser-ish structure to parse the commands into
  something approaching a sum type (fixes #507, a sketch follows the
  list)
- this is mostly prep for reworking the commands set (#673), although
  *strict command parsing* has been implemented (cf the update to
  `test_unknown_commands`)
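
By way of illustration, here is a miniature of such a parser-ish
structure (hypothetical and far smaller than the real one): each
command becomes its own record type, and any unknown token is a hard
error rather than being silently skipped:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Approve:
    ids: tuple = ()  # review=1,2: explicit github PR numbers

@dataclass
class FW:
    policy: str  # 'default' | 'skipci' | 'skipmerge'

@dataclass
class Close:
    pass

Command = Union[Approve, FW, Close]

def parse(token: str) -> Command:
    """Strict parsing: reject the token (and thus the whole line)
    instead of ignoring what we don't understand."""
    name, _, arg = token.partition('=')
    if name == 'r+':
        return Approve()
    if name == 'review' and arg:
        return Approve(tuple(int(i) for i in arg.split(',')))
    if name == 'fw':
        if arg not in ('default', 'skipci', 'skipmerge'):
            raise ValueError(f"invalid fw policy {arg!r}")
        return FW(arg)
    if name == 'close':
        return Close()
    raise ValueError(f"unknown command {token!r}")  # hard error
```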
It's not amazing, because cross-typing between the different conftests
and utility modules doesn't really work smoothly. So not entirely
convinced it's great.
While at it, inline a few `x.json()` local aliases when the jsons are
used in exclusive locations e.g. usually an assertion message on one
side and a result / process on the other.
Sadly m2ms don't support tracking, so add a bunch of ad-hoc tracking
to the override rights in order to know at least who, what, and when.
Do the same for the review rights, although tracking may actually
work for those.
Not sure why I didn't think about it previously (in #357), but it
would make sense for the entire batch creation to be atomic and thus
under the same lock.
The commit during forward porting also makes a lot less sense: it was
a failed early attempt at resolving the problem by hoping we'd win
the race with the webhook (commit before the webhook hit). By locking
the PRs table for update, we actually resolved it.
But since all that happens afterwards is a few updates and then a
commit by the cron itself (it commits per batch), it's probably good
enough to leave the entire thing under the same lock. This means we
lock out other interactions a bit longer, but since the span is still
just the forward port of a single batch it should not be too much of
an issue outside of the post-freeze recovery thingie.
It might not be a huge amount of extra work since we're never actually
retrieving the rows, but it still seems completely unnecessary.
Sadly we can't do something cleaner like an aggregation, because
aggregating requires moving the locking query to a subquery, and
experimentally that seems slower than just ignoring / discarding the
result set.
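
The shape of the locking query, for reference (a sketch; the table
name is illustrative, and the empty select list is the point, nothing
is fetched):

```python
def lock_batch_prs(cr, pr_ids):
    """Take row locks on the batch's PRs without retrieving any
    columns; the (empty) result set is simply discarded."""
    cr.execute(
        "SELECT FROM runbot_merge_pull_requests"
        " WHERE id = any(%s) FOR UPDATE",
        [list(pr_ids)],
    )
```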
The "Error or traceback found in logs" message is sometimes confusing
since we don't know what is the cause of the issue.
The first idea is to split the two concept, error and traceback to have
a better idea of the cause of the issue
The second one will also log the content of the line in the error
message. For traceback, tries to get the complete traceback, getting all
indentend lines and one last non idented one.
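
A sketch of that traceback-gathering rule (the function and parameter
names are made up for illustration):

```python
def gather_traceback(lines, start):
    """From the line starting the traceback, take all indented
    continuation lines plus the final non-indented exception line."""
    block = [lines[start]]
    i = start + 1
    while i < len(lines) and lines[i].startswith((' ', '\t')):
        block.append(lines[i])
        i += 1
    if i < len(lines):
        block.append(lines[i])  # e.g. `ValueError: ...`
    return '\n'.join(block)
```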
While working on it, clean up slightly to partially get rid of
returning a dict, an artefact from odoo < 13.0.
Some triggers may have an important depth, and nightly results can
take long to check.
A custom view was already done for the upgrade nightly, but it is
hidden, and the same logic could be applied to the distro builds, ...
This commit adds a custom view on the trigger, and a related
controller to display that view for a trigger.
The current logic to decide whether a build has logs is based on
is_docker_step. That logic is not always valid: it is based on the
step type and step content, but there are many ways to end up with a
docker start, one of them being to call another step. The main issue
is that we don't always know whether that other step is a docker step
before runtime. Moreover, the logic became more and more complex as
conditions were added for commands, _run_, docker_params, ...
Moreover, the log list is completed before the build starts, meaning
that if the build is killed before completion, some logs may be
missing yet still listed. This is also an issue when trying to access
logs before the corresponding step ran.
This commit changes the logic by adding the log file to the list (and
the start log) when starting a docker, which should give a more
consistent result.
The bootstrap version was frozen during a previous migration to avoid
losing too much time adapting the style again to fit the previous
look and feel.
That said, the latest version of bootstrap offers more flexibility
around themes, and this is a good opportunity to modernise the runbot
interface a little and answer long-standing requests.
The main part of the adaptation is tweaking colors to match the
previous style, and adapting some classes in xml views.
Some css rules are also tweaked to keep the same looks without
needing to rewrite xml views too much; this could be done in a future
commit.
The current strategy to check whether a build can be killed is to
ensure that no unskipped slots point to the same build. Since the
check is only done after a prepare, it should ensure that the new
slot has the build linked before checking whether the previous batch
can kill its build.
Since the slots are now filled later on (after the minimal check), it
is possible that a build is considered killable, and killed, before
being linked again.
To fix this, we just need to consider params instead of builds to
decide whether a build is killable or not: if the params of a build
are linked elsewhere, don't kill it.
A previous commit introduced the commit tree hash, but only when
calling get_commit_infos. This fixes get_ref and find_new_commit to
return the same infos.
Right now the commit is considered part of the build uniquifier.
While that makes sense for a check on the commit metadata, it does
not for execution tests where only the content matters.
This will allow marking a trigger as depending not on the commit but
on the tree hash, avoiding rebuilds when only a commit message was
fixed.
In order to keep things coherent, the cpu_limit computation can be
done in one place instead of being defined in every _run* method.
In python steps, it can still be overridden when returning
docker_params.
When a build eats all the CPU resources, it may interfere with other
builds and cause collateral damage.
With this commit, a new settings parameter `Containers CPUs` is added
in order to limit the usage of available CPUs on runbot instances.
If left at 0, no limits are applied.
Otherwise, the cpu_quota docker parameter is computed as Containers
CPUs * (logical CPU count / nb parallel builds) * cpu_period, where
cpu_period defaults to 100000.
e.g.:
- on a host with 16 logical CPUs
- with 8 parallel builds allowed
- with Containers CPUs set to 1.5
- with the default cpu_period
cpu_quota will be:
(16/8) * 1.5 * 100000 = 300000
This system parameter can be overridden by the `container_cpus` field on
steps.
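
In code, the computation amounts to something like this (a sketch;
the function and constant names are illustrative, not the actual
implementation):

```python
import multiprocessing

CPU_PERIOD = 100_000  # docker's default cpu_period, in microseconds

def compute_cpu_quota(containers_cpus, nb_parallel_builds,
                      cpu_period=CPU_PERIOD):
    """Containers CPUs * (logical CPUs / parallel builds) * cpu_period;
    0 means no limit is applied."""
    if not containers_cpus:
        return 0
    logical = multiprocessing.cpu_count()
    return int(containers_cpus * (logical / nb_parallel_builds)
               * cpu_period)

# with 16 logical CPUs, 8 parallel builds, Containers CPUs = 1.5 and
# the default cpu_period: 1.5 * (16 / 8) * 100_000 == 300_000
```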
Continuation of 327500bc83 for another edge case of closing a PR to a
detached branch with a merged descendant. The mergebot would:
- warn on the parent about it being detached due to being closed
- then warn on the child about it being detached due to the parent
  being closed (despite it being merged already)
- then warn the parent *again* due to the child being detached
At least some of those messages were still produced by the test case;
stop them.
Issue was noticed on odoo/odoo#145969 and odoo/odoo#145984 due to 16.2
being deactivated.
The notification is both noise and confusing: we're telling the
author (and reviewer, and anyone else subscribed) that they need to
merge a merged PR.
Fixes #855
If an untracked PR is closed, especially on an inactive or untracked
branch, the closer (or author) almost certainly doesn't care to
receive 3 different notifications on the subject.
The fix requires a schema change in order to track that we're fetching
the PR due to a `closed` event, as in other cases we may still want to
notify the user that we received the request (and it just happened to
resolve to a closed PR).
Fixes #857
- correctly handle projects without a secret set: we don't want the
  requests to blow up by trying to `strip()` a `False` or `None`,
  that is dumb, who would do that?
- provide better reporting on signature mismatch: which repo we tried
  to access, and the full list of headers
- log when no signature matched, either because there was no
  signature in the request and no secret on the project, or because
  the request is signed but no secret is configured on the repo
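
A sketch of the hardened check (standard GitHub webhook HMAC
validation; the function and variable names are illustrative):

```python
import hashlib
import hmac

def check_signature(secret, payload: bytes, headers) -> bool:
    """Validate a GitHub webhook signature, tolerating an unset
    secret instead of blowing up on `False`/`None`."""
    secret = (secret or '').strip()
    signature = headers.get('X-Hub-Signature-256', '')
    if not secret:
        # no secret configured: only accept unsigned requests
        return not signature
    expected = 'sha256=' + hmac.new(
        secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```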
`gc --prune` cannot take a *separate* parameter: the value has to be
part of the same arg (the `=` is not optional), otherwise the `gc`
call blows up.
So use the positional form of the git command to generate the correct
invocation; the Python-level `foo=bar` keyword form generates a
split-style option in two args, which does not please git.
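
For reference, with a plain subprocess invocation (the codebase goes
through its own git wrapper, so this is only illustrative):

```python
import subprocess

# the value must be fused to the flag: `--prune=now` is accepted,
# while `--prune now` makes `git gc` reject the stray argument
subprocess.run(['git', 'gc', '--prune=now'], check=True)
```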
Before this, we would check whether a repository had a name and run
maintenance on it, leading to repeated (but unnoticed until now,
because I didn't monitor it) tracebacks as the maintenance cron would
fail to find the local repo then run maintenance on nowhere anyway.
Also augment the repo-finding process to try and get better
information about what it's doing when it fails, rather than failing
completely silently.