Test and refine the handling of batch forward ports around branch
deactivation, especially with differential limits. Notably, fix an
error in the conversion of the FW process to batches: the individual
PR limit was not correctly taken into account during forward porting
unless *all* PRs were done, even though that is a primary motivation
for the change.
Partial forward porting should now work correctly, and the detection
and handling of the differential next target should be more robust to
boot.
Significantly rework the interplay between batches and PRs being
closed in order to maintain the sequencing / consistency of forward
port sequences: previously a batch would get deleted if all its PRs
were closed, but that is an issue when it is part of a forward port
sequence as we lose information.
Instead, detach the PRs from the batch as before but have the batch
skip unlinking if it has historical value (parent or child
batch). Currently the batch's state is a bit weird as it doesn't get
merged, but...
While at it, significantly simplify `_try_closing`, as it turns out
to have a ton of incidental / historical complexity from old attempts
at fixing concurrency issues, which should not be necessary anymore
and in fact actively interferes with the new, more compute-heavy
state of things.
Thank god I have a bunch of tests because once again I forgot / missed
a bunch of edge cases in doing the conversion, which the tests
caught (sadly that means I almost certainly broke a few untested edge
cases).
Important notes:
Handling of parent links
------------------------
Unlike PRs, batches never lose their parent info, the link is
permanent, which is convenient for trawling through a forward port
sequence (currently implemented very inefficiently, maybe we'll
optimise that in the future).
However this means the batch having a parent and the batch's PRs
having parents are slightly different pieces of information. One of
the edge cases I missed is that of conflicting PRs, which are
de-parented and have to be merged by hand before being forward-ported
further: I had originally replaced the checks on a PR and its
siblings having parents with checks on just the batch.
Batches & targets
-----------------
Batches were originally conceived as being fixed to a target, with
their PRs having that target: a PR being retargeted would move it
from one batch to another.
As it turns out this does not work in the case where people retarget
forward-port PRs, which I know they do because of #551
(2337bd8518). I could not think of a good way to handle this issue
as-is, so I scrapped the moving-PRs thing. Instead, one of the
coherence checks for a batch being ready is that all its PRs have the
same target, and a batch only has a target if all its PRs have the
same target.
It's possible for somewhat odd effects to arise: notably if a PR is
closed (removed from its batch), the other PRs are retargeted, and
the closed PR is then reopened, it will now be on a separate batch
even if it also gets retargeted. This is weird. I don't quite know
how I should handle it, maybe batches could merge if they have the
same target and label? However batches don't currently have a label
so...
Improve limits
--------------
Keep limits on the PRs rather than lifting them to the batch: if we
can add / remove PRs of batches, having different limits on different
PRs of the same batch is reasonable.
Also leave limit unset by default: previously, the limit was eagerly
set to the tip (accessible) branch. That doesn't really seem
necessary, so stop doing that.
Also remove completely unnecessary `max` when trying to find a PR's
next target: `root` is either `self` or `self.source_id`, so it should
not be possible for that to have a later target.
And for now ensure the limits are consistent per batch: a PR defaults
to the limit of its batch-mates if it doesn't have one, and if a
limit is set via command it's set on all PRs of the batch.
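Roughly, the rule is (a sketch only, the model / field names are made
up, not the actual implementation):

    # a PR without its own limit falls back to a batch-mate's, and a
    # limit set via command is propagated to the entire batch
    def effective_limit(pr):
        return pr.limit_id or pr.batch_id.prs.limit_id[:1]

    def set_limit(pr, branch):
        pr.batch_id.prs.limit_id = branch
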
This commit does not allow differential limits via commands, they are
allowed via the backend but not really tested. The issue is mostly
that it's not clear what the UX should look like to have clear and not
super error prone interactions. So punt on it for now, and hopefully
there's no hole I missed which will create inconsistent batches.
In case of PRs not being ready, don't just say the PRs are waiting
for CI even though they might be unreviewed, and distinguish
between *waiting* for CI (pending) and having failed CI.
It's a bit weird and inconsistent to have a PR being staged while
unreviewed or unapproved or w/e. If we compute the state based on
skipchecks then everything is consistent.
Also remove the implicit override of all statuses when explicitly
marking the PR as `ready`: it risks creating difficult-to-understand
states, and it's unnecessary since `skipchecks` gets set.
Also, as with setting `skipchecks`, this sets the current user as
reviewer on all PRs of the batch which don't have a reviewer yet.
- remove the `legal/cla` and `ci/runbot` context names, which I use a
lot for historical reasons but fundamentally they're not useful to
the tests, the `default` context is generally simpler.
- remove `make_branch` helper as we don't actually use branch
protection and at the end of the day it doesn't do much else
- convert a few explicit PR lookups to the project-wide `to_pr` helper
Move staging cancellation to the batch, remove its (complicated)
handling from the PRs.
This loses some precision in the cancellation message, but that could
likely be recovered (in part) by adding more precise checks &
diagnostic extractions in the compute.
Because `alone` (formerly p != 2) is selected before split PRs, if a
prioritised PR gets split (or a split PR gets prioritised) it will be
staged once as prioritised, and a second time as part of the split.
Improve the selection of ready batches to exclude split batches
upstream, such that they don't have to be rechecked over and over, and
their priorities don't cause us issues.
Simplifies the `ready_prs` query a bit and allows it to be converted
to an ORM search by moving the priority check outside. This also
means the caller needs far less post-processing of the resulting
records than before.
`ready_prs` now returns *either* the "alone" batches, or the
non-alone batches, rather than mixing both into a single
sequence. This requires correctly applying the search filters so we
don't pick up the priorities of batches in error or targeting other
branches.
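Something along these lines (a sketch, the domain fields are
assumptions):

    # either the `alone` batches, or everything else which is ready,
    # never a mix of both
    def ready_batches(env, target):
        Batch = env['runbot_merge.batch']
        domain = [
            ('target', '=', target.id),
            ('blocked', '=', False),
            ('split_id', '=', False),  # splits are handled upstream
        ]
        alone = Batch.search(domain + [('priority', '=', 'alone')])
        return (alone, True) if alone else (Batch.search(domain), False)
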
Staging readiness is a batch-level concern, and many of the markers
are already there, though a few need to be aggregated from the
PRs. As such, staging has no reason to be performed in terms of PRs
anymore, it should be performed via batches directly.
There is a bit of a mess in order not to completely fuck up when
retargeting PRs (implicitly via the freeze wizard, or explicitly) as
for now we're moving PRs between batches in order to keep the
batches *mostly* target-bound.
Some of the side-effects in managing the coherence of the targeting
and moving PRs between batches is... not great. This might need to be
revisited and cleaned up with those scenarios better considered.
- `merge_date` should be common to an entire batch, so move it there
- remove `Batch.active` which should probably have been removed when
batches were made persistent (can eventually re-add as a proxy for
`merge_date` being set maybe, but for now removing it seems a better
way to catch mistakes)
- update various sites to use `Batch.merge_date` instead of
`Batch.active`
This probably has latent bugs, and is only the start of the road to
v2 (#789): PR batches are now created up-front (alongside the PR),
with PRs attached and detached as needed, hopefully such that things
are not broken (tests pass but...). This required a fair number of
adjustments to code not taking batches into account, or creating
batches on the fly.
`PullRequests.blocked` has also been updated to rely on the batch to
get its batch-mates, such that it can now be a stored field with the
right dependencies.
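A minimal sketch of the shape of that compute (field and dependency
names are assumptions, not the actual implementation):

    from odoo import api, fields, models

    class PullRequests(models.Model):
        _inherit = 'runbot_merge.pull_requests'

        blocked = fields.Char(
            compute='_compute_blocked', store=True,
            help="Reason the PR can not be staged, if any.")

        @api.depends('state', 'batch_id.prs.state')
        def _compute_blocked(self):
            for pr in self:
                unready = pr.batch_id.prs.filtered(
                    lambda p: p.state != 'ready')
                if pr.state != 'ready':
                    pr.blocked = 'not ready'
                elif unready:
                    pr.blocked = 'waiting on ' + ', '.join(
                        unready.mapped('display_name'))
                else:
                    pr.blocked = False
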
The next step is to better leverage this change:
- move cross-PR state up to the batch (e.g. skipchecks, priority, ...)
- add fw info to the batch, perform forward-ports batchwise in order
to avoid redundant batch-selection work, and allow altering batches
during fw (e.g. adding or removing PRs)
- use batches to select stagings
- maybe expose staging history of a batch?
Not sure it's going to be useful but it's hard to know if we can't
test it. The intent is mostly the ability to prioritize throughput (or
attempt to) during high-load events, if we can favour staging N
new batches over a split's N/2 we might be able to merge more crap.
But maybe not, we'll see, either way now it's here and seems to more
or less work.
Fixes#798
Because the mergebot crons are on such a tight schedule, and even
just finding out they have nothing to do can take them a while,
disabling them can be a chore. Disabling staging via the project is
much less
likely to cause issues as the projects don't normally (or ever?) get
exclusively locked, so they can generally be written to at any moment.
Furthermore, if we ever get in a situation where we have multiple
active projects (not really the case currently, we have multiple
projects but only one is really active) it's less disruptive to
disable stagings on a single specific project.
Fixes#860
This commit revisits the commands set in order to make it more
regular, and limit inconsistent command-sets, although it includes
pseudo-command aliases for common tasks now removed from the core set.
Hard Errors
===========
The previous iteration of the commands set would ignore any
non-command term in a command line. This has been changed to a hard
error (ignoring the entire thing) if any command is unknown or
invalid.
This fixes inconsistent / unexpected interpretations where a user
sends a command, then writes a novel on the same line some words of
which happen to *also* be commands, leading to merge states they did
not expect. They should now be told to fuck off.
Priority Restructuring
----------------------
The numerical priority system was pretty messy in that it confused
"staging priority" (in ways which were not entirely straightforward)
with overrides to other concerns.
This has now been split along all the axes, with separate command
subsets for:
- staging prioritisation, now separated between `default`, `priority`,
and `alone`,
- `default` means PRs are picked by an unspecified order when
creating a staging, if nothing better is available
- `priority` means PRs are picked first when staging, however if
`priority` PRs don't fill the staging the rest will be filled with
`default`, this mode did not previously exist
- `alone` means the PRs are picked first, before splits, and only
`alone` PRs can be part of the staging (which usually matches the
    mode name)
- `skipchecks` overrides both statuses and approval checks, for the
batch, something previously implied in `p=0`, but now
independent. Setting `skipchecks` basically makes the entire batch
`ready`.
For consistency this also sets the reviewer implicitly: since
skipchecks overrides both statuses *and approval*, whoever enables
this mode is essentially the reviewer.
- `cancel` cancels any ongoing staging when the marked PR becomes
ready again, previously this was also implied (in a more restricted
form) by setting `p=0`
FWBot removal
=============
While the "forwardport bot" still exists as an API level (to segregate
access rights between tokens) it has been removed as an interaction
point, as part of the modules merge plan. As a result,
fwbot stops responding
----------------------
Feedback messages are now always sent by the mergebot, the
forward-porting bot should not send any message or notification
anymore.
commands moved to the merge bot
-------------------------------
- `ignore`/`up to` simply change bots
- `close` as well
- `skipci` is now a choice / flag of an `fw` command, which denotes
the forward-port policy,
- `fw=default` is the old `ci` and resets the policy to default,
that is wait for the PR to be merged to create forward ports, and
for the required statuses on each forward port to be received
before creating the next
- `fw=skipci` is the old `skipci`, it waits for the merge of the
base PR but then creates all the forward ports immediately (unless
it gets a conflict)
- `fw=skipmerge` immediately creates all the forward ports, without
even waiting for the PR to be merged
This is a completely new mode, and may be rather broken as until
now the 'bot has always assumed the source PR had been merged.
approval rework
---------------
Because of the previous section, there is no distinguishing feature
between `mergebot r+` = "merge this PR" and `forwardbot r+` = "merge
this PR and all its parents" (with different access rights).
As a result, the two have been merged under a single `mergebot r+`
with heuristics attempting to provide the best experience:
- if approving a non-forward port, the behavior does not change
- else, with review rights on the source, all ancestors are approved
- else, as author of the original, approves all ancestors which descend
from a merged PR
- else, approves all ancestors up to and including the oldest ancestor
to which we have review rights
Most notably, the source's author is not delegated on the source or
any of its descendants anymore. This might need to be revisited if it
proves too restrictive.
For the very specialized need of approving a forward-port *and none of
its ancestors*, `review=` can now take a comma (`,`) separated list of
pull request numbers (github numbers, not mergebot ids).
Computed State
==============
The `state` field of pull requests is now computed. Hopefully this
makes the status more consistent and predictable in the long run, and
importantly makes status management more reliable (because the
reference data gets updated naturally, and flows into the state).
For now however it makes things more complicated as some of the states
have to be separately signaled or updated:
- `closed` and `error` are now separate flags
- `merge_date` is pulled down from forwardport and becomes the
transition signal for ready -> merged
- `reviewed_by` becomes the transition signal for approval (might be a
good idea to rename it...)
- `status` is computed from the head's statuses and overrides, and
*that* becomes the validation state
Ideally, batch-level flags like `skipchecks` should be on, well, the
batch, and `state` should have a dependency on the batch. However
currently the batch is not a durable / permanent member of the system,
so it's a PR-level flag and a messy pile.
One notable change is that *forcing* the state to `ready` now does
that but also sets the reviewer, `skipchecks`, and overrides to
ensure the API-mediated readying does not get rolled back by e.g. the
runbot sending a status.
This is useful for a few types of automated / programmatic PRs
e.g. translation exports, where we set the state programmatically to
limit noise.
recursive dependency hack
-------------------------
Given a sequence of PRs with an override of the source, if one of the
PRs is updated its descendants should not have the override
anymore. However if the updated PR gets overridden, its descendants
should have *that* override.
This requires some unholy manipulations via an override of `modified`,
as the ORM supports recursive fields but not recursive
dependencies (on a different field).
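A minimal sketch of the hack (field names are assumptions):

    from odoo import models

    class PullRequests(models.Model):
        _inherit = 'runbot_merge.pull_requests'

        def modified(self, fnames, create=False, before=False):
            super().modified(fnames, create=create, before=before)
            if 'overrides' in fnames and not create:
                # cascade by hand: the children's effective overrides
                # depend on ours, recursion stops at childless PRs
                children = self.search([('parent_id', 'in', self.ids)])
                if children:
                    children.modified(['overrides'], before=before)
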
unconditional followup scheduling
---------------------------------
Previously, scheduling the forward-port followup was contingent on
the FW policy, but that's not actually correct if the new PR
is *immediately* validated (which can happen now that the field is
computed, if there are no required statuses *or* all of the required
statuses are overridden by an ancestor), as nothing will then trigger
the state change and thus the scheduling of the fp followup.
The followup function checks all the properties of the batch to port,
so this should not result in incorrect ports, although it's a bit
more expensive and will lead to more spam.
Previously this would not happen because on creation of a PR the
validation task (commit -> PR) would still have to execute.
Misc Changes
============
- If a PR is marked as overriding / canceling stagings, it now does
  so on retry, not just when set initially.
  This was not handled at all previously, so a PR in P0 going into
  error due to e.g. a non-deterministic bug would be retried while
  still p=0, but the current staging would not get cancelled. Same
  when a PR in p=0 goes into error because something failed, then is
  updated with a fix.
- Add tracking to a bunch of relevant PR fields.
Post-mortem analysis currently generally requires going through the
text logs to see what happened, which is annoying.
  There is a nondeterminism / inconsistency in the tracking which
  sometimes leads the admin user to trigger tracking before the bot
  does, leading to the staging tracking being attributed to them
  during tests; this is shoved under the carpet by ignoring the user
  to whom that tracking is attributed.
When multiple users update tracked fields in the same transaction
all the changes are attributed to the first one having triggered
tracking (?), I couldn't find why the admin sometimes takes over.
- added and leveraged support for enum-backed selection fields
- moved various fields from forwardport to runbot_merge
- fix a migration which had never worked and never ran (because I
  forgot to bump the version on the module)
- remove some unnecessary intermediate de/serialisation
fixes#673, fixes#309, fixes#792, fixes#846 (probably)
- move all commands parsing to runbot_merge as part of the long-term
unification effort (#789)
- set up an actual parser-ish structure to parse the commands to
something approaching a sum type (fixes#507)
- this is mostly prep for reworking the commands set (#673), although
*strict command parsing* has been implemented (cf update to
`test_unknown_commands`)
Sadly m2ms don't support tracking, so add a bunch of ad-hoc tracking
to the override rights in order to know who, what, when at least.
Do the same for the review rights although maybe tracking works for
those.
It might not be a huge amount of extra work since we're never actually
retrieving the rows, but it still seems completely unnecessary.
Sadly we can't do something cleaner like an aggregation, because
aggregating requires moving the locking query to a subquery, and
experimentally that seems slower than just ignoring / discarding the
result set.
If an untracked PR is closed, especially on an inactive or untracked
branch, the closer (or author) almost certainly doesn't care to
receive 3 different notifications on the subject.
The fix requires a schema change in order to track that we're fetching
the PR due to a `closed` event, as in other cases we may still want to
notify the user that we received the request (and it just happened to
resolve to a closed PR).
Fixes#857
- correctly handle projects without a secret set, we don't want the
requests to blow up by trying to `strip()` a `False` or `None`, that
is dumb, who would do that?
- provide better reporting on signature mismatch: which repo we tried
to access, and the full list of headers
- log when there was no signature matching, either because there was
no signature in the request and no secret on the project, or because
the request is signed but no secret is configured on the repo
`gc --prune` can not take a *separate* parameter, it has to be part of
the same arg (the `=` is not optional), otherwise the `gc` call blows
up.
So use the positional form of the git command to generate the correct
invocation: the Python-level `foo=bar` keyword generates a
split-style option in two args, which does not please git.
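Concretely (plain subprocess for illustration):

    import subprocess

    # split form: `--prune` and its value as two separate args,
    # git gc rejects the invocation
    subprocess.run(['git', 'gc', '--prune', '14.days.ago'])

    # fused / positional form: the `=` makes it a single arg, works
    subprocess.run(['git', 'gc', '--prune=14.days.ago'])
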
Before this, we would check if a repository had a name and run
maintenance on it, leading to repeated (but unnoticed until now
because I didn't monitor it) tracebacks as the maintenance cron would
fail to find the local repo then run maintenance on nowhere anyway.
Also augment the repo-finding process to try and get better
information about what it's doing when it fails, rather than failing
completely silently.
The signature validation code seems correct, but there are validation
failures in production, so increase logging around webhook requests
to try and diagnose things better:
- dump the *entire* body to the github_requests logfile
- add the received & computed signatures to the log error
Turns out I've always been mistaken about the handling of quotes
*inside* shell parameters, apparently they are always consumed by the
shell unless nested so
--foo="bar"
reaches the underlying program as
--foo=bar
This means when using subprocess (without shell=True), adding the
quotes leads to mishandling of the parameters (as the subprocess now
has quotes it's not equipped to deal with).
This exact error was made in the `--pretty` parameter of git show;
locally this results in the author name and the committer email being
terminated by double quotes, although somehow other layers seem to
exclude those from the end result (I assume `commit-tree` strips the
quotes from the envvars under the assumption that users can
mistakenly quote them, or something).
Anyway while it does not seem harmful (so far), better safe than
sorry.
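A small illustration of the difference (representative but made-up
invocation):

    import subprocess

    # through a shell, the quotes are consumed: git sees %an <%ae>
    subprocess.run('git show --no-patch --pretty="%an <%ae>" HEAD',
                   shell=True)
    # -> Jane Doe <jane@example.com>

    # without a shell, the quotes reach git verbatim as part of the
    # format string and leak into the output
    subprocess.run(['git', 'show', '--no-patch',
                    '--pretty="%an <%ae>"', 'HEAD'])
    # -> "Jane Doe <jane@example.com>"
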
Add intermediate forks to a pair of tests, because github now (?)
requires being able to write on a branch to create a PR from it, so
the non-collaborator reviewers were not able to create a PR from a
branch created by another user.
Github delivery delays keep getting worse. Depending on what comes
before `to_pr`, this leads it to fail more often, as it runs before
the PR it's looking for has been signaled to the mergebot.
In order to mitigate this issue, add a wait loop in `to_pr`, waiting
up to 4 seconds for the PR it's looking for before aborting.
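The wait loop is essentially (a sketch, following the test helpers'
conventions):

    import time

    def to_pr(env, pr, attempts=40):
        # poll for up to ~4 seconds (40 x 0.1s) for the PR record to
        # show up, to paper over github's delivery delays
        for _ in range(attempts):
            records = env['runbot_merge.pull_requests'].search([
                ('repository.name', '=', pr.repo.name),
                ('number', '=', pr.number),
            ])
            if records:
                assert len(records) == 1, \
                    f"expected one PR, found {records}"
                return records
            time.sleep(0.1)
        raise TimeoutError(f"unable to find {pr.repo.name}#{pr.number}")
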
Also replace manual lookups by `to_pr` in every method of
`TestPRUpdate` while at it since it hit a few of the issues. And
remove the xfail test case since it seems unlikely github will change
tack (maybe? could be worth testing to be sure).
Reverts commit 85a7890023 which
untrimmed the commits: while it's *probably* true that git and
github's APIs differ in their treatment of whitespace (in that git
pretty much always terminates the commit message with a newline while
github does not, as far as I understand, though I didn't really
validate it) the issue was that github also trims on *output* when
fetching over the API, something the fake did not do.
So rather than update the test data I should have fixed the fake, but
I failed to realise that at the time. I only realised when I decided
to re-run against github actual (something I rarely do anymore as it's
painfully slow) and it went on to choke on every message I'd updated.
The logging line was copied over from the github-api version, but it
was not correctly fixed up to match, leading to a lot of spam on
stderr when debug is enabled (aka spams journalctl on the production
server).
Splat the logging call out of `rebase` and into the various callers,
so they have access to the pr object to log it.
Forgot to bump the version when creating the migration. Also convert
the migration to a single sql query, although the migration will never
run because I ran the query manually to fix things up after finding
out the data was "dirty" since the new code (assuming only modern
statuses) was merged without running the migration.
Thankfully it looks like the impact was not too severe (because the
legacy statuses should only be present on very old commits / PRs), I
don't remember when I deployed the update but apparently just a pair
of PRs got affected, because their `previous_failure` was the old
style and thus broke the "new failure" check.
Forgot to deref the id of the staging we're trying to lock, so the
specific case where we start a freeze with a bump PR and an
outstanding staging in master would instantly blow up.
The low-level APIs used by the staging process don't do any merge
check, so because of the way Git works it's possible for them to
merge commits with content as empty commits, e.g. if something was
merged then backported and the backport was merged on top. This
should trigger a merge failure, as we don't really want to merge
newly-empty commits. This is a feature which some high-level git
commands support, kind of, e.g. by default `git rebase --interactive`
will ask about newly empty commits.
Take care to allow merging already-empty commits, as these do have a
use for signaling, freezes, ....
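At the plumbing level, the check can be sketched as follows (assumes
git >= 2.38 for `merge-tree --write-tree`, names are made up):

    import subprocess

    def check_merge_not_newly_empty(repo_dir, target, pr_head):
        def git(*args):
            return subprocess.check_output(
                ['git', '-C', repo_dir, *args], text=True).strip()

        # a commit which was already empty is fine (signaling, freezes)
        already_empty = (
            git('rev-parse', f'{pr_head}^{{tree}}')
            == git('rev-parse', f'{pr_head}~1^{{tree}}'))
        # if merging pr_head into target yields target's own tree, the
        # merge adds nothing: the commit would be newly empty
        merged = git('merge-tree', '--write-tree', target, pr_head)
        if merged == git('rev-parse', f'{target}^{{tree}}') \
                and not already_empty:
            raise RuntimeError(
                f'{pr_head} is empty once merged into {target}')
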
Fixes#809
Prepares the possibility of either more direct communication with the
CI platform(s) or just assuming CI has gotten reliable enough and
colleagues intelligent enough that this is not an issue anymore
because they've stopped pushing empty branches (which we know is not
the case).
Fixes#806
During the 17.0 freezeathon, the freeze wizard blew up with
MergeError: merge-tree: {oid} - not something we can merge
Turns out when freezes were moved to local
(4d2c0f86e1) I forgot to fetch the heads
of the release and bump PRs into the local repo, so rebasing them atop
their branch would fail because the local repository would just not
find the object being rebased.
I had missed that case in testing as well, but in fairness even if I
had tried testing it I'd likely have missed it: implementation
limitations (shortcuts) of dummy central mean it currently ignores
what objects the client requests and bundles everything it can find
associated with the repository (meaning it sends the entire network).
This is not usually an issue because the test repos are pretty small,
but it means the client can have objects they should not because they
never requested them and might not even be supposed to be aware of
their existence.
Anyway solve by doing the obvious: fetch the heads of the release and
bump PRs at the same time we update the branch being forked off. Also
update the freeze tests to trigger the issue (by creating the release
/ bump PRs in different repos) and run the tests against github
actual to make sure we can actually see them fail (correctly, i.e.
with the merge error we expect, not via errors in the test), and that
we do fix them.
Fixes#821
Currently, once a source PR has been merged it's not possible to set
or update a limit, which can be inconvenient (e.g. people might have
forgotten to set it, or realise after the fact that setting one is not
useful, or might realise after the fact that they should *unset* it).
This PR relaxes that constraint (which is not just a relaxation as it
requires a bunch of additional work and validation), it should now be
possible to:
- update the limit to any target, as long as that target is at or
above the current forwardport tip's
- with the exception of a tip being forward ported, as that can't be
cancelled
- resume a forward port stopped by a previously set limit (just
increase it to whatever)
- set a limit from any forward-port PR
- set a limit from a source, post-merge
The process should also provide pretty chatty feedback:
- feedback on the source and closest root
- feedback about adjustments if any (e.g. setting the limit to B but
there's already a forward port to C, the 'bot should set the limit
to C and tell you that it happened and why)
Fixes#506
If a branch `foo` is disabled, then `tmp.foo` and `staging.foo` become
unnecessary (with #247 fixed the tmp refs are not used for creating
stagings anymore, but for now they're still used for the "safety
dance" of merging a successful staging into the corresponding
mainline).
Fixes#605
Per the Github webhook documentation:
1. sha1 signatures are deprecated, github recommends sha256 (though
that's unlikely to be a concern anyway), and dummy-central supports
both so it should be no issue.
> If possible, we recommend that you use the x-hub-signature-256
> header for improved security.
2. Non-ascii secrets are supported and should be utf8-encoded to
compute signatures... that's not actually documented as github docs
only mention payload encoding but it seems to make sense anyway.
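For reference, the validation itself boils down to (a sketch, header
name per github's documentation):

    import hashlib
    import hmac

    def valid_signature(secret, payload, headers):
        # compare X-Hub-Signature-256 against an HMAC-SHA256 of the
        # raw payload, keyed with the utf8-encoded webhook secret
        expected = 'sha256=' + hmac.new(
            secret.strip().encode(), payload, hashlib.sha256,
        ).hexdigest()
        return hmac.compare_digest(
            expected, headers.get('X-Hub-Signature-256', ''))
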
Also improve the warning message by replacing the signature (which is
useless) with the delivery id (which could allow introspecting the
hook or something).
Currently a user is not notified that the parent of a detached PR
needs to be independently approved and may miss that information. Add
a notification to *that* PR as well.
Fixes#788
The github API has gotten a lot more constraining (with rate
restrictions being newly enforced or added somewhat out of nowhere),
and as importantly a lot less reliable. So move the staging process
off of github and locally, similar to the forward porting
process (whose repo cache is being reused for this).
Fixes#247
Probably less necessary than for the regular staging stuff, but might
as well while at it.
Requires updating one of the tests to generate a non-ff push, as
O_CREAT doesn't exist at the git level, and the client (and it is
client-side) only protects against force pushes. So there is no way
to trigger an issue with just the creation of the new branch, it
needs to exist *and point to a non-ancestor commit*.
Also remove a sleep in the ref update loop as there are no ref updates
anymore, until the very final sync via git.
NB: maybe it'd be possible to push both bump and release PRs together
for each repo, but getting which update failed in case of failure
seems difficult.
It has been a consideration for a while, but the pain of subtly
interacting with git via the ignominious CLI kept it back. Then ~~the
fire nation attacked~~ github got more and more tight-fisted (and in
some ways less reliable) with their API.
Staging pretty much just interacts with the git database, so it's both
a facultative github operator (it can just interact with git directly)
and a big consumer of API requests (because the git database endpoints
are very low level so it takes quite a bit of work to do anything
especially when high-level operations like rebase have to be
replicated by hand).
Furthermore, an issue has also been noticed which can be attributed
to using the github API (and that API's reliability getting worse):
in some cases github will fail to propagate a ref update / reset, so
when staging 2 PRs it's possible that the second one is merged on top
of the temporary branch of the first one, yielding a kinda broken
commit (in that it's a merge commit with a broken commit message)
instead of the rebase / squash commit we expected.
As it turns out it's a very old issue but only happened very early so
was misattributed and not (sufficiently) guarded against:
- 41bd82244bb976bbd4d4be5e7bd792417c7dae6b (October 8th 2018) was
spotted but thought to be a mergebot issue (might have been one of
the opportunities where ref-checks were added though I can't find
any reference to the commit in the runbot repo).
- 2be25052e147b151d1d8a5bc73cceb351586ce03 (October 15th, 2019) was
missed (or ignored).
- 5a9fe7a7d05a9df7186072a7bffd60c6b428fd0e (July 31st, 2023) was
spotted, but happened at a moment where everything kinda broke
because of github rate-limiting ref updates, so the forensics were
difficult and it was attributed to rate limiting issues.
- f10d03bf0f2e8f88f62a5d8356b84f714196130f (August 24th, 2023) broke
the camel's back (and the head block): the logs were not too
interspersed with other garbage and pretty clear that github ack'd a
ref update, returned the correct oid when checking the ref, then
returned the wrong oid when fetching it later on.
No Working Copy
===============
The working copy turns out to not be necessary, the plumbing commands
we *need* work just fine on a bare repository.
Working without a WC means we had to reimplement the high level
operations (rebase) by hand much as we'd done previously, *but* we
needed to do that anyway as git doesn't seem to provide any way to
retrieve the mapping when rebasing/cherrypicking, and cherrypicking by
commit doesn't work well as it can't really find the *merge base* it
needs.
Forward-porting can almost certainly be implemented similarly (with
some overhead), issue #803 has been opened to keep track of the idea.
No TMP
======
The `tmp.` branches are no more, the process of creating stagings is
based entirely around oids, if staging something fails we can just
abandon the oids (they'll be collected by the weekly GC), we only
need to update the staging branches at the very end of the process.
This simplifies things a fair bit.
For now we have stopped checking for visibility / backoff as we're
pushing via git, hopefully it is a more reliable reference than the
API.
Commit Message Formatting
=========================
There's some unfortunate churn in the test, as the handling of
trailing newlines differs between github's APIs and git itself.
Fixes#247
PS: It might be a good idea to use pygit2 instead of the CLI
eventually, the library is typed which is nice, and it avoids
shelling out although that's really unlikely to be a major cost.
Necessary to create commits *as* the mergebot without going through
the github API. Copy of the improved version from forwardport. *Not*
an override, to avoid unnecessarily triggering one or the other which
is confusing and weird.
Move *almost* all the staging code to free functions, in a separate
module, and extensively typed.
The only bits which didn't move are:
- the entry point (the cron hook), because it has to be a model method
in order to be called
- the `_build_merge_message` method, because it needs to be
overridable
There's also a bit of an import mess, because the cron &
`_build_merge_message` need to call into the new module, but the new
module wants the types they belong to, so it's a bit circular.
If the stagings are going to be created locally (via a git working
copy rather than the github API), the mergebot part needs to have
access to the cache, so move the cache over. Also move the maintenance
cron.
In an extremely minor way, this prefigures the (hopeful) eventual
merging of the ~~planes~~ modules.
- add formatting for a bunch of backend objects
- add cross-links in order to use toplevel navigation between objects
e.g. project -> branch -> staging -> PR with breadcrumbs instead of
shitty dialog boxes
Relates to #802
When I updated the status storage (including `previous_failure`) for
some reason I didn't just migrate from the old to the new format, and
added bridge functions instead.
This is not really necessary (or useful), so convert all the legacy
data and remove the conversion helpers.
Relates to #802
Mostly a temporary safety feature after the events of 07-31: it's
still not clear whether that was a one-off issue or a change in
policy (I was not able to reproduce locally even doing several
set_refs a second) and the gh support is not super talkative, but it
probably doesn't hurt to commit the workaround until #247 gets
implemented.
On 2023-07-31, around 08:30 UTC, `set_ref` started failing, a lot
(although oddly enough not continuously), with the unhelpful message
that
> 422: Reference cannot be updated
This basically broke all stagings, until a workaround was implemented
by adding a 1s sleep before `set_ref` to ensure no more than 1
`set_ref` per second, which kinda sorta has been the github
recommendation forever but had never been an issue
before. Contributing to this suspicion is that in late 2022, the
documentation of error 422 on `PATCH git/refs/{ref}` was updated to:
> Validation failed, or the endpoint has been spammed.
Still would be nice if GH was clear about it and sent a 429 instead.
Technically the recommendation is:
> If you're making a large number of POST, PATCH, PUT, or DELETE
> requests for a single user or client ID, wait at least one second
> between each request.
So... actually implement that, on a per-worker basis, as for the most
part these are serial processes (e.g. crons); we can still get above
the rate limit due to concurrent crons but it should be less likely.
Also take `Retry-After` into account, can't hurt, though we're
supposed to retry just the request rather than abort the entire
thing. Maybe a future update can improve this handling.
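A sketch of the per-worker throttle (names are made up):

    import time
    import requests

    class ThrottledSession:
        # at least one second between mutating requests, per worker
        def __init__(self):
            self._session = requests.Session()
            self._last = 0.0

        def request(self, method, url, **kw):
            if method in ('POST', 'PATCH', 'PUT', 'DELETE'):
                wait = self._last + 1.0 - time.monotonic()
                if wait > 0:
                    time.sleep(wait)
            r = self._session.request(method, url, **kw)
            self._last = time.monotonic()
            # take Retry-After into account by just sleeping it off,
            # the caller still sees the original response (no retry)
            retry_after = r.headers.get('Retry-After')
            if retry_after:
                time.sleep(int(retry_after))
            return r
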
Would also be nice to take `X-RateLimit` into account, although
that's supposed to apply to *all* requests so we'd need a second
separate timestamp to track it. Technically that's probably also the
case for `Retry-After`. And fixing #247 should cut down drastically
on the API call traffic, as staging is a very API-intensive process,
especially with the sanity checks we had to add; these days we might
be at 4 calls per commit per PR, and up to 80 PRs/staging (5
repositories and 16 batches per staging), with 13 live
branches (though realistically only 6-7 have significant traffic, and
only 1~2 get close to filling their staging slots).
`/runbot_merge/stagings`
========================
This endpoint is a reverse lookup from any number of commits to a
(number of) staging(s):
- it takes a list of commit hashes as either the `commits` or the
`heads` keyword parameter
- it then returns the stagings which have *all* these commits as
respectively commits or heads, if providing all commits for a
project the result should always be unique (if any)
- `commits` are the merged commits, aka the stuff which ends up in the
actual branches
- `heads` are the staging heads, aka the commits at the tip of the
`staging.$name` branches, those may be the same as the corresponding
commit, or might be deduplicator commits which get discarded on
success
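A hypothetical invocation (the host is made up, and the endpoint
might be exposed differently):

    import requests

    # reverse-lookup the staging(s) containing both of these commits
    resp = requests.get(
        'https://mergebot.example.com/runbot_merge/stagings',
        params={'commits': ['<oid1>', '<oid2>']},
    )
    staging_ids = resp.json()
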
`/runbot_merge/stagings/:id`
============================
Returns a list of all PRs in the staging, grouped by batch (aka PRs
which have the same label and must be merged together).
For each PR, the `repository` name, `number`, and `name` in the form
`$repository#$number` get returned.
`/runbot_merge/stagings/:id1/:id2`
==================================
Returns a list of all the *successfully merged* stagings between `id1`
and `id2`, from oldest to most recent. Individual records have the
form:
- `staging` is the id of the staging
- `prs` is the contents of the previous endpoint (a list of PRs
grouped by batch)
`id1` *must* be lower than `id2`.
By default, this endpoint is inclusive on both ends, the
`include_from` and / or `include_to` parameters can be passed with the
`False` value to exclude the corresponding bound from the result.
Related to #768
`auto_session_tracking` causes issues when not specified on the super
old version of the client which is available on ubuntu.
Also disable tracing as it seems less useful than hoped for, and I've
not been using what's been collected so far.
Currently the heads of a staging (both staging heads and merged
heads) are just JSON data on the staging itself. Historically this
was convenient as the heads were mostly of use to the staging
process, and thus accessed essentially exclusively through the
staging itself.
However this makes finding stagings from merged commits e.g. for
forensic research almost impossible, because querying based on
the *values* of a JSON map is expensive, and indexing it is difficult.
To make this use case more feasible, split the `heads` field into two
join tables, one for the staging heads and one for the merged heads,
this makes looking for stagings by commits much more
efficient (although the queries may not be trivial). Also add two
utility RPC methods, so that it's possible to query stagings
reasonably easily and efficiently based on a set of commits (branch
heads).
related to #768
Allow filtering stagings by state (success or failure), and provide a
control to explicitly update the staging date limit.
Should make it easier to drill through stagings when looking for
specific information.
Related to #751
Fix outstanding query to make a positive `state` filtering, instead of
negative, matching 3b52b1aace8674259812a76b1566260937dbcacb.
Also manually create a map of stagings (grouped by branch) sharing a
single prefetch set.
For odoo the mergebot home page has 12 branches in the odoo project
and 8 in spreadsheet, 6 stagings each. This means 120 queries to
retrieve all the heads (Odoo stagings have 5 heads and spreadsheet
have 1, but that seems immaterial).
By fixing `_compute_statuses` and creating a single prefetch set for
all stagings of all branches we can fetch all the commits in a single
query instead of 120.
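The regrouping can be sketched as follows (the key point being
`with_prefetch` keeping every slice on the common prefetch set):

    # fetch all the relevant stagings once, then slice per branch
    # while keeping everything on a single prefetch set, so reading
    # the head commits is one query for the lot
    stagings = env['runbot_merge.stagings'].search(
        [('target', 'in', branches.ids)], order='staged_at desc')
    by_branch = {
        branch: stagings.filtered(
            lambda s: s.target == branch
        )[:6].with_prefetch(stagings._prefetch_ids)
        for branch in branches
    }
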
- add support for authorship (not just approval)
- make the dashboard display counts directly
- fix `state` filter: postgres can't do negative index lookups
- add indexes for author and reviewed_by as we look them up
- ensure we handle the entire source filtering via a single subquery
Closes#778
A few cases of conflict were missing from the provisioning
handler.
They can't really be auto-fixed, so just output a warning and ignore
the entry, that way the rest of the provisioning succeeds.
During the 16.3 freeze, an issue was noticed with the concurrency
safety of the freeze wizard (because it blew up, which caused a few
issues): it is possible for the cancellation of an active staging on
the master branch to fail, which causes the mergebot side of the
freeze to fail, while the github state is completed, which puts the
entire thing in a less than ideal state.
Especially with the additional issue that the branch inserter has its
own concurrency issue (which maybe I should fix): if there are
branches *being* forward-ported across the new branch, it's unable to
see them, and thus can not create the now-missing PRs.
Try to make the freeze wizard more resilient:
1. Take a lock on the master staging (if any) early on, this means if
we can acquire it we should be able to cancel it, and it won't
suffer a concurrency error.
2. Add the `process_updated_commits` cron to the set of locked crons:
   trying to read the log timeline, it looks like the issue was
   commits of that staging being impacted after the action had
   started: REPEATABLE READ meant the freeze's transaction was unable
   to see the update from the commit statuses, therefore creating a
   diverging update when it cancelled the staging, which postgres
   then reported as a serialization error.
I'd like to relax the locking of the cron (to just FOR SHARE), and I
think it would work, per postgres:
> SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as
> SELECT in terms of searching for target rows: they will only find
> target rows that were committed as of the transaction start
> time. However, such a target row might have already been updated (or
> deleted or locked) by another concurrent transaction by the time it
> is found. In this case, the repeatable read transaction will wait
> for the first updating transaction to commit or roll back (if it is
> still in progress). If the first updater rolls back, then its
> effects are negated and the repeatable read transaction can proceed
> with updating the originally found row. But if the first updater
> commits (and actually updated or deleted the row, not just locked
> it) then the repeatable read transaction will be rolled back with
> the message
This means it would be possible to lock the cron, and then get a
transaction error because the cron modified one of the records we're
going to hit while it was running: as far as the above is concerned
the cron's worker had "just locked" the row so it's fine to
continue. However this makes it more and more likely an error will be
hit when trying to freeze (to no issue, but still). We'll have to see
how that ends up.
Fixes#766 maybe
Currently sentry is only hooked from the outside, which doesn't
necessarily provide sufficiently actionable information.
Add a few hooks to (try and) report odoo / mergebot metadata:
- add the user to WSGI transactions
- add a transaction (with users) around crons
- add the webhook event info to webhook requests
- add a few spans to the long-running crons, when they cover multiple
units per iteration (e.g. a span per branch being staged)
Closes#544
- move sentry configuration and add exception-based filtering
- clarify and reclassify (e.g. from warning to info) a few messages
- convert assertions in rebase to MergeError so they can be correctly
logged & reported, and ignored by sentry, also clarify them
(especially the consistency one)
Related to #544
Largely informed by sentry,
- Fix an inconsistency in staging ref verification, `set_ref`
internally waits for the observed head to match the requested head,
but then `try_staging` would re-check that and immediately fail if
it didn't.
Because github is *eventually* consistent (hopefully) this second
check can fail (and is also an extra API call), breaking staging
unnecessarily, especially as we're still going to wait for the
update to be visible to git.
Remove this redundant check entirely, as github provides no way to
ensure we have a consistent view of anything, it doesn't have much
value and can do much harm.
- Add github request id to one of the sanity check warnings as that
could be a useful thing to send upstream, missing github request ids
in the future should be noted and added.
- Reworked the GH object's calls to be clearer and more coherent:
consistently log the same thing on all GH errors (if `check`),
rather than just on the one without a `check` entry.
Also remove `raise_for_status` and raise `HTTPError` by hand every
time we hit a status >= 400, so we always forward the response body
no matter what its type is.
- Try again to log the request body (in full as it should be pretty
small), also remove stripping since we specifically wanted to add a
newline at the start, I've no idea what I was thinking.
Fixes#735, #764, #544
The current system makes it hard to iterate on feedback messages and
make them clearer; this should improve things a touch.
Use a bespoke model to avoid concerns with qweb rendering
complexity (we just want GFM output and should not need logic).
Also update the fwbot test setup to always configure an fwbot name,
in order to avoid ping messages closing the PRs they're talking
about; that took a while to debug, and given the old message I assume
I'd already hit it and just been too lazy to fix it. This requires
updating a bunch of tests as fwbot pings are sent *to*
`fp_github_name`, but sent *from* the reference user (because that's
the key we set).
Note: noupdate on CSV files doesn't seem to work anymore, which isn't
great. Instead, set tracking on the templates, it's not quite as good
but should be sufficient.
Fixes#769
- currently disabling staging only works globally, allow disabling on
a single branch
- use a toggle
- remove a pair of tests which work specifically with `fp_target`,
can't work with `active` (probably)
- cleanup search of possible and active stagings, add relevant
indexes and use direct search of relevant branches instead of
looking up from the project
- also use toggle button for `active` on branches
- shitty workaround for upgrading DB: apparently mail really wants to
have a `user_id` to do some weird thing, so need to re-add it after
resetting everything
Fixes#727
- github logins are case-insensitive: while the db field is CI, the
  dict in which partners are stored for matching is not, and the
  caller may not preserve casing.
  Thus it's necessary to check the casefolded database values against
  casefolded parameters, rather than match exactly.
- users may get disabled by mistake or when one leaves the project,
they may also get switched from internal to portal, therefore it is
necessary to re-enable and re-enroll them if they come back.
- while at it remove the user's email when they depart, as they likely
use an organisational email which they don't have access to anymore
Side-note, but remove the limit on the number of users / partners
being created at once: because there are almost no modules in the
mergebot's instance, creating partners goes quite fast (compared to a
full instance), thus the limitation is almost certainly unnecessary
(creating ~300 users seems to take ~450ms).
Fixes#776
652b1ff9ae wanted to check if a request
was available, however it deref'd the `request` object without
checking it which is not correct: a `request` normally has an
`httprequest`, but the `request` itself might be missing if the
handler is called from e.g. a cron.
Fixes#739
The mismatch diff attribute contains values from the in-db object and
the github PR structure, some of which are explicitly *not*
strings (e.g. the squash flag, possibly the commits # in the future).
As a result, when the squash flag of a PR differs from the actual
one, the formatting for diffing blows up, because difflib can't
handle non-strings.
Stringify values before passing them to `format_items`, this way the
string operations on names and values should work correctly.
The mergebot page has become a bit slow over the years, it is time to
make small optimisations to speed things up a little.
Note: all changes were applied by modifying the views or adding
indexes by hand. There is still room for improvement but it would
need more in-depth refactoring, mainly adding specialized computed
fields to enable better batching.
The first issue was using branch.staging_ids
branch.staging_ids.sorted(lambda s: s.staged_at, reverse=True)[:6]
The number of staging_ids keeps increasing, and prefetching + sorting
all of them is slow.
The proposed solution is to replace it with a search; not ideal, a
specialized compute field may be a good idea, but this is a quick fix
that can be done by editing a view.
branch.env['runbot_merge.stagings'].search([('target', '=', branch.id)],order='staged_at desc', limit=6)
Other changes are just indexes on critical columns.
Before the changes, the /runbot_merge page took ~5s to load.
After the changes, the /runbot_merge page takes ~1s to load.
Small note: not 100% sure that the index on runbot_merge.batch.target
was useful.
The loggers would only print the "tail" of the path, not including the
repo name, or the `/repos` prefix.
While this made logs shorter, it was not intentional and made
debugging some issues on endpoints harder than necessary as the calls
had to be adjusted mentally, which is completely unnecessary.
1cea247e6c tried to improve staging
checks to avoid staging PRs in the wrong state, however it had two
issues:
PR state
--------
The process would reset the PR's state to open, but unless the head
was being resync'd it wouldn't re-apply the statuses on the state,
leading to a PR with all-valid statuses, but a missing CI.
Message
-------
The message check didn't compose the PR message the same way PR
creation / update did (it did not trim the title and description
individually, only after concatenation), resulting in a
not-actually-existing divergence getting signaled in the case where
the PR title ends or the description starts with whitespace.
Expand relevant test, add a utility function to compose a PR message
and use it everywhere for coherence.
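The utility is essentially (a sketch, the name is made up):

    def make_message(title, body):
        # trim the title and description *individually*, the way PR
        # creation / update does, before concatenating
        title = title.strip()
        body = (body or '').strip()
        return f'{title}\n\n{body}' if body else title
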
Also update the logging and reporting to show a diff of all the
updated items (hidden behind a `details` element).
If there are bump PRs anyway: the bump commits will cause the
forward-port of the staging to fail, so might as well clearly notify
everybody of the issue if there is a pending staging, and not waste
too much time waiting for a staging which can not succeed.
We could also cancel stagings when there's no bump PR, but it's not
clear that there's any reason to do so: if we didn't touch any master
branch, there's no reason for the staging to fail, or to otherwise
cancel it.
And obviously we can't have staged anything on the new branch so
there's nothing to cancel.
Part-Of: #718
I DECLARE BANKRUPTCY!!!
The previous implementation of labels lookup was really not
intuitive (it was just a char field, and matched labels by equality
including the owner tag), and was also full of broken edge
cases (e.g. traceback if a label matched multiple PRs in the same repo
because people reuse branch names).
Tried messing about with contextual `display_name` and `name_search`
on PRs but the client goes wonky in that case, and there is no clean
autocomplete for non-relational fields.
So created a view which reifies labels, and that can be used as the
basis for our search. It doesn't have to be maintained by hand, can be
searched somewhat flexibly, we can add new view fields in the future
if desirable, and it seems to work fine providing a nice
understandable UX, with the reliability of using a normal Odoo model
the normal way.
Also fixed the handling of bump PRs: apparently clearing the entire
field before trying to update existing records (even with a `link_to`
in between) does not suit the web client, re-selecting the current
label would just empty the thing entirely.
So use a two-step process slightly closer to the release PRs instead:
- first update or delete the existing bump PRs
- then add the new ones
The second part is because bump PRs are somewhat less critical than
release, so it can be a bit more DWIM compared to the more deliberate
process of release PRs where first the list of repositories involved
has to be set up just so, then the PRs can be filled in each of them.
Fixes#697
In order to support partial freezing, we need the ability to remove
some of the release lines for the repos we don't want to
freeze (e.g. because they don't use per-version branches).
This subsequently means we need the ability to *create* new lines if
we fucked up and removed one we should not have. Alternatively the
freeze meat-bot could cancel the entire thing and redo the wizard but
that seems harsh and mean, so don't do that.
Fixes 0f3647b7c7 which specifically
mentioned partial freezes, then proceeded to make them entirely
impossible anyway.
Part of #718
Previously the mergebot would only sync the head commit, but syncing
more is useful.
Also update the final sanity check on staging:
- as with check, update the message & target branch
- reset PR state and post a message when updating message instead of
doing so silently
Note: maybe only fail the staging if the message is updated *and*
relevant to staging (aka there's a merge method and it's not
`rebase`)?
Fixes#680
Was missing a logging message in the case where the current and sync'd
head are identical, which seems to occur from time to time but can
only be inferred (by seeing a sync event then nothing happening).
Add a logging warning (because it's a strange situation) in order to
explicitly note the issue.
Also make the sync logging messages more regular for clarity.
And add the delivery information (delivery id and user-agent) to the
event log, so it's easier to report issues to github.
After review, there doesn't seem to be a single integer field created
by the mergebot or forwardport modules for which a `group_operator`
makes sense, let alone the default of `sum`.
So just disable them all.
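That is, on each such field (a representative sketch):

    from odoo import fields, models

    class PullRequests(models.Model):
        _name = 'runbot_merge.pull_requests'

        # summing PR numbers is meaningless, disable the default
        # `sum` group operator
        number = fields.Integer(required=True, group_operator=None)
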
Fixes#674
If commits have different authors (/ committers), the mergebot would
ask github to create a commit with an author (/ committer) of `None` /
`null`.
Apparently github really does not like that, and complains that
> nil is not an object
So remove the key entirely. Also fix the collision between `author`
and the `Co-Authored-By` list, which could lead to trying to set an
`author` of `[name, email]` instead of an object, which is also not
accepted by github.
Fixes to the new bits which didn't really work:
- Fix borked view layout
- Add some help to the label fields
- Improve the resolution of label -> pr, and fix
- Also make the feature actually work for bump PRs
- Also make pr -> label work more reliably, now allows setting one PR
and getting the other PRs of the same batch (with the same label)
even without setting the label by hand
An autocomplete for the label has been considered but there is no
autocomplete field for char/selection fields, and it seems way too
much work for the utility:
- either create a brand new widget for 15.0 which will have to be
entirely rewritten in 16
- or create a transient model composed entirely of fake records to
provide an m2o to records which don't actually exist as label
bearers, which is also a lot of unnecessary work
NOTE: we want to support partial freezing (aka not freeze all the
branches because some of them have different release models
than others), so some project repos *not* having a release
PR is fine and normal, such a validation should not be added.
Fixes#664
In the case where the last branch (before the branch being frozen) is
disabled, the forwardport inserter screws up and fails to correctly
create the intermediate forwardports from the new branch.
Also when disabling a branch, if there are FW PRs which target that
branch and have not been forward-ported further, automatically
forward-port them as if the branch had been disabled when they were
created, this should limit data loss and confusion.
Also change the message set on PRs when disabling a branch: because of
user conflicts in test setup, the message about a branch being
disabled would close the PRs, which would then orphan the followup,
leading to unexpected / inconsistent behaviour.
Fixes#665
The `statuses` field of a staging is always "live" because it's a
computed non-stored field. This is an issue when a staging finishes in
whatever state, then someone gets new statuses sent on one of the head
commits, either by rebuilding (part of) the staging or by just using
the same commit for one of their branches.
This makes the reporting of the main dashboard confusing, as one might
look at a failed staging and see all the required statuses
successful. It also makes post-mortem analysis more complicated as the
logs have to be trawled for what the statuses used to be (and they
don't always tell).
Solve this by storing a snapshot of the statuses the first time a
staging moves away from `pending`, whether it's to success or failure.
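A sketch of the snapshotting step, assuming a stored `statuses_cache`
field (hypothetical name) alongside the live computed `statuses`:

    import json

    def transition(staging, new_state):
        # freeze the live statuses the first time the staging leaves
        # `pending`, whether it's moving to success or failure
        if staging.state == 'pending' and new_state != 'pending':
            staging.statuses_cache = json.dumps(staging.statuses)
        staging.state = new_state
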
Fixes #667
In the branch lists of stagings, the timestamps in the left column and
the labels in the data cells cannot be selected, because they're
buttons, and anyway bootstrap explicitly sets

    .btn {
        ...
        user-select: none;
    }
This can be frustrating: timestamps and labels are useful information
to cross-reference, so being able to copy them is convenient.
Set the reverse via our own CSS.
Fixes #668
Partially revert 0c882fc0df
This turns out to be more bane than boon, as it breaks forward-port
chains and confuses people (despite the message). Update notification
message and don't close the PR anymore.
While at it, disable any pending staging on the branch being deactivated.
Fixes #654
af016f4239 did a half-assed job and
didn't fix the one test which actually checks the dashboard.
TBF I was in a bit of a hurry trying to make the mergebot work and be
presentable again, but still...
15.0 (or 14.0) dropped some of the BS3 (?) compatibility stuff, which
the mergebot was (apparently) relying on. This led to a visual
degradation as well as the frontend dropdown looking absolutely awful.
Fix that, on both style and templates.
15.0 (or 14.0) also dropped the bespoke responsive utility classes,
switch to bootstrap's.
Turns out I was running "15.0" except just on the runbot, enterprise
and community were still the 14.0 repos, so some of the changes were
missing.
While at it, bundle fixes for 3.10, as that's what Jammy needs, and
the mergebot/15.0 will be running on that.
Test seems to fail from time to time with one of the PRs getting
lost. Tried to move code around trying to investigate, can't repro
anymore. Possibly a race condition because the `to_pr` call was
performed too early, before the webhook had run (and thus before the
PR object had been created on the odoo side).
By moving the `to_pr` calls to after the cron run, we really ensure
the webhooks will have run.
Also update `to_pr` to ensure exactly one PR was retrieved, as
currently nothing is checked so we might have gotten none (yet), which
should be noticed early and clearly. In theory this also guards
against multiple PRs, but PRs should be unique on (repo, number).
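A sketch of the hardened helper, assuming the test-utility shape of
`to_pr` (an Odoo environment and a github PR object):

    def to_pr(env, pr):
        # zero matches means the webhook probably hasn't been
        # processed yet; more than one should be impossible as PRs
        # are unique on (repo, number)
        records = env['runbot_merge.pull_requests'].search([
            ('repository.name', '=', pr.repo.name),
            ('number', '=', pr.number),
        ])
        assert len(records) == 1, f"got {len(records)} PRs for {pr}"
        return records
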
- Some batches in a few stagings are apparently empty (e.g. batch
71771 from staging 32511), the listing was not resilient to such
issues.
Update the code to suppress the display of empty batches.
- The possibility of accessing `/runbot_merge/<id>` with a
non-existent branch had not been considered and triggered a
rendering error in the template.
Fixes #630, fixes #631
- override the staging's name_get to provide a slightly more useful
display_name (though still not great as the staging object remains
quite technical and inimical to human interaction)
- show individual PRs in a batch (as m2m tags) for readability
- update PR views to show the author and reviewer, except in the list
of delegations of users where it's a lot less useful
/cc #632
Currently deactivating a branch kinda leaves users in the dark, with
little way to know what has happened aside from inferring it from the
branch having disappeared from the main dashboard.
- surface the state of the branch in the PR dashboard (also surface
the target branch at all so users can see if their PR is targeted
as they expect as far as the mergebot is concerned)
- close & notify every PR to a branch being deactivated
- cancel any current staging to the branch (as a consequence of the
above)
Closes #632
Previously if a branch name could not be line-broken (because it was
full of underscores) it would break the layout by making "its"
staging's column much wider than the expected 1/2 / 1/4 /
1/6 (depending on window width), compressing the width of its
siblings' columns.
By disabling content-based width (only taking in account flex-width)
and setting overflow to hidden, the overlong branch names get cut off
instead.
- if stderr has been rerouted or explicitly redirected to STDOUT,
`e.stderr` is `None` and the error reporting blows up (which is
inconvenient). Handle this case.
- handle the case where `fp_github_name` has not been configured (it's
not a super useful handling but meh, apparently git/hub doesn't
really care if there's a username when using API tokens)
- minor improvement to the refline parser: per-spec, the trailing
newline is optional, so don't fail if it's missing
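For the refline bit, a minimal sketch of the tolerant parse (real
reflines carry more structure, e.g. a capabilities suffix on the first
line):

    def parse_refline(line):
        # per-spec the trailing newline is optional, so strip it if
        # present instead of requiring (and chopping) it blindly
        sha, _, ref = line.rstrip('\n').partition(' ')
        return sha, ref
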
The previous version of the code assumed `pr['body']` is always a
string, which is not correct, when the PR body is emptied the body
itself is removed (its value is `None`).
Add a case for this in the PR edition test, and avoid blowing up (or
adding empty newlines) when the PR body is empty. For PR creation this
issue was fixed in c2db5659d8 but
apparently I missed that the exact same issue occurs just a few lines
above.
Also turns out github does *not* send change information when the body
is updated from (or to?) `None`, so don't even bother with that, just
check every time if the overall message has been updated.
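The guard itself is essentially a one-liner; sketched out with `pr`
being the raw github payload:

    def pr_message(pr):
        # `body` is None (not '') when the description was emptied,
        # so coalesce before joining to avoid blowing up or appending
        # spurious newlines
        body = (pr['body'] or '').strip()
        return f"{pr['title']}\n\n{body}" if body else pr['title']
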
Fixes #629
Stop *staging* release PRs: they are normally fairly simple and should
not fail their staging outside of unreliable tests (or possibly a few
edge cases e.g. forgot one version change thing), however staging them
creates the possibility of a "version hole" on the release branch
which is undesirable.
Instead, immediately and unconditionally push the release commits onto
the newly created branches, if there are things which don't work they
can be fixed afterwards (and the process refined, maybe).
Also add the same feature for *bump* PRs, with the difference that the
bump PRs are not created / requested by default (they have to be opted
in individually).
For convenience, add a feature which automatically finds the PRs via
inputting the label (not really tested yet).
Closes #603
Because the searching of the PR occurs *right* after the PR was
created on the server, despite the additional operations (status,
approval) it's apparently possible for the lookup of the new PR to
occur about the same time the PR is being created, kinda, maybe?
On DS it triggers very reliably for every PR but the first. By moving
the retrieval to after the repo timeout and the crons having run, we
near-guarantee the PRs are visible (it's possible for things to fail on
grounds of github reliability, but then they'd have failed *even more*
before).
- add a logging entry for PR updates
- change the generic log entry to log the sender (of the event) rather
than the PR author
- fix the post-facto PR loader to more systematically and reliably set
a `sender`
It's likely that the PR pages are seen more commonly than the
dashboard by most users, so add alerts there in case users wonder
what's happening.
Fixes #580
The log message only indicated whether the PR was squashed or not, but
that's not actually useful.
Improve the message to log the actual merge method, for
information. The old "squash" (aka squash flag set and no merge
method, since an actual squash merge method was reintroduced a while
ago) has been renamed to "single" for the purpose of this display.
Before this, until the first status for a required context the status
would appear as pending, but would behave oddly (e.g. not clickable).
Update the style of such statuses for clarity:
- use a light background to show them as inactive
- use the `wait` cursor to show their status as oddball (and not clickable)
Setting this styling on the link (or even `li`) doesn't seem to work,
so set it on the `ul`, the actual active links will set the relevant
"active" cursor instead, which seems to work fine.
While at it, extract the status menu to its own template and unify the
disparate bits, mainly in that both the main dashboard and the
per-branch list display the staging instant in UTC on hover: before,
the main dashboard would display a relative delta and provide the
UTC-formatted instant on hover, but the branch would only show a zoned
ISO-8601 instant.
While adjusting it is easy, it's also unnecessary: we can just as
easily provide the UTC staging instant there.
Old messages were quite inconsistent in their pinging of the PR author
and reviewer.
Reviewed the messages (probably missed some but...) and tried to more
consistently ping when the feedback requires some sort of action in
order to proceed.
Fixes #592
A few fixes and improvements after testing the feature:
- ensure the provisioned users are created as internal (not portal)
- assume oauth is installed and just crash if it's not
- handle a user not having an email (ignore)
- return value from json handler, otherwise JsonRequest sends no
payload which is *weird*
Stagings would be cancelled automatically if the PR's commits were
updated, but not if the target (base) was changed, even though that
has a drastic impact on staging.
Add hooks to unstage PRs if their base is updated, or if their message
is updated and relevant to staging (merge or rebase-merge methods).
Fixes #604
Extract the creation of a PR link (to github) as a dedicated
template for easier updates, and to use `display_name`
everywhere (instead of reimplementing it by name).
Also implement support for repo-level groups setting for information
hiding.
Fixes #590
When a staging's fast-forward (to the target branch) fails, the
mergebot would provide no useful information on the staging or the
dashboard.
This is because the reason was set to the HTTP status, which in case
of a fast-forward error is just "422 client error: unprocessable
entity".
Improve this by trying to parse github's response in that case, and
using the JSON error message as failure reason. This provides more
useful failure information like "update is not a fast forward",
"reference does not exist", or a branch protection failure.
Closes #591
- define in code the various menus added over time through the UI (queues,
configuration, ...)
- update / improve PR layout a tick
- fix "outstanding forward ports" count on the dashboard
- improve hover title / help on dashboard
- add date of last modification (usually date of success / failure)
- make casing more coherent (everything lowercase)
- add explicit note that UTC date on staged at label is staged at datetime
- rediscover yet again that the staging information is shown when
hovering on the staging *except on the staged at label*
- improve `PullRequest.unstage` to always insert the PR at the start of the
reason when cancelling the staging, for clarity / traceability
Closes #560, closes #609
New accounts endpoint such that the SSO can push new pre-configured
users / employees directly. This lowers maintenance burden.
Also remove one of the source partners from the merge test, as
ordering seems wonky for unclear reasons leading to random failures of
that test.
On #509, the rebasing process was changed to forcefully update the
commit date of the commits, in order to force trigger builds.
However when squashing was re-enabled for #539 for some fool reason it
implemented its own bespoke rebasing (despite that not actually saving
any API call that I can see), meaning it did *not* update the commit
date. As such, an old commit being squashed would not get picked up by
the runbot, which is what happened to odoo/documentation#1226 (which
ultimately had to be hand-rebased after some confusion as to why it
did not work).
Update `_stage_squash` to go through `rebase` the normal way, also
update `rebase` to pop the commit date entirely instead of setting it
manually, and update the squashing test to check that the commit date
gets properly updated.
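The date-popping step might look like this (a sketch over github-style
commit dicts, not the mergebot's exact code):

    def strip_dates(commit):
        # drop author/committer dates entirely so github stamps the
        # rewritten commit with "now", which the runbot won't
        # consider too old to build
        for role in ('author', 'committer'):
            if isinstance(commit.get(role), dict):
                commit[role].pop('date', None)
        return commit
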
Fixes #579, closes #582
Multiple fixes to various issues of the freeze wizard.
Some of the less fix-oriented sub-tasks were moved to #586 for discussion instead.
Closes #559, #587
Currently limited to release/freeze PRs: it can be difficult to be
sure the right PR was selected then, and a mistake there seems more
impactful than in the PRs being waited for?
Note: adds a test to make sure I don't break the check that all
release PRs must have the same label (be linked). This was
already safe, and in a way this PR adds convenience but not
really safety, but better sure than sorry.
- add flag to not select repos for freezing
- allow removing more repositories from the wizard
- when performing the freeze, only create branches for the selected
repos
The freeze wizard was implemented using a single action to open and
validate the dialog.
This was a mistake, as it means if there are no errors left (e.g. all
the PRs being waited for are now validated) trying to view the freeze
wizard will immediately validate it and commit the freeze, which is
unexpected, surprising, and unsafe e.g.
- open wizard
- add freeze prs
- add a required pr or two
- close and go do something else
- be told that more PRs need to be waited for
- reopen wizard
- oops freeze is done
So split the "open action" part of `action_freeze` into opening the
action and performing the freeze. The "freeze" / "view freeze" button
on the project only activates the latter, and the actual freeze
operation is only triggered from the wizard's "Freeze" button.
Part of #559.
Provides a less manual interface for creating the freeze:
* takes the name of the branch to create
* takes any number of PRs which must be part of the freeze
* takes PRs representing the HEADs of the new branches
Then essentially takes care of the rest.
Implementation of the actual wizard is not trivial but fairly
straightforward and linear; biggest issue is not being able to use
`project_id.branch_ids[1]` to get the new branch, not sure why but it
seems to ignore the ordering, and clearing the cache doesn't fix it.
When creating the branches, add a sleep after each one for secondary
rate limiting purposes. Same when deleting branches.
Also the forwardbot has been updated to disable the forwardport cron
while a freeze is ongoing, this simplifies the freezing process.
Note: after recommendation of @aab-odoo, tried using `_applyChanges`
in `_checkState` but it simply did not work: the two relational fields
got completely frozen and were impossible to update, which is less
than ideal. Oh well, hopefully it works well enough like this for now.
Could not reproduce it in a shell, but in the original version
`self.env.cr.rowcount` would always be 0 after the `modified`.
Turns out the check is really completely dumb, because if we got any
match in `select for update` we're going to find the same one in the
update, and thus the conditional is unnecessary. I've no idea why I
did that.
Anyway remove the conditional and just always try to unstage the PR.
To prep for the addition of the freeze wizard:
* move projects out of `pull_requests.py`
* then realize half the methods there have no relation to projects and
move them to more relevant places in `pull_requests.py`
* update corresponding crons (and tests using those crons) as the
methods have changed model, and the cron definitions thus need to be
updated
* split update to labels out of sending feedback comments while at it:
labels are not used much during tests so their manipulation can be
avoided; and labels are not as urgent as feedback so the crons can
be quite a bit slower
* move the project view out of `mergebot.xml` as well
Following #531, reviews from reviewers without an email set are
rejected.
For delegates this isn't very helpful, however for specifically
configured reviewers we can warn the configurer that they need to set
an email for things to work out.
* Adds a changelog page, linked from the main, with content
automatically loaded from the source. To avoid conflicts, each entry
is its own file and entries are grouped by the month during which
the update will (probably) be deployed
* The last group (most likely "last update") doesn't have a title, the
rest do.
* Add changelog entries from the last update so it's not too empty.
* Also update the layout for the alerts a bit: remove bottom margin to
reduce loss of whitespace.
When a commit is lacking the purpose (?) tag e.g. `[FIX]`, `[IMP]`,
..., a normal commit message of the form `<module>: <info>` matches
the looks of a git pseudo-header.
This results in a commit rewrite rejiggering the entire thing and
breaking the message by moving the title to the pseudo-headers and
mis-promoting either the `closes` line or body content to "title",
resulting in a really crappy commit message
e.g. odoo/odoo@d4aa9ad031.
Update the commit rewriting procedure to specifically skip the title
line, and re-inject it without processing in the output.
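Conceptually the fix is to split the title off before any
pseudo-header processing, then re-attach it untouched; a sketch, with
`process_body` standing in for the actual rewriting:

    def rewrite_message(message, process_body):
        # a `module: description` title looks exactly like a git
        # pseudo-header, so never feed it to the header machinery
        title, sep, body = message.partition('\n')
        return title + sep + process_body(body)
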
Fixes #540
Re-introduce a "squash" mode solely for the purpose of fixing up
commit messages without having to go and edit them: for now "squash"
only works for single-commit PRs, acts as a normal
integration (`rebase-ff`) *but* replaces the message of the commit
itself by that of the PR, similar to the `merge` modes.
This means maintainers can update commit messages to standards by
editing the PR description (though this is obviously sensitive to
edit races with the original author).
Fixes #539
The list of outstanding forwardports was pretty messy as the ordering
was unclear and there was little way to really drill into the thing.
* Shows outstanding forward ports sorted by merged date ascending, the
oldest-merged PRs are the ones most in need of fixing while PRs
which were only just merged can safely be ignored.
* List reviewers with outstanding forward-ports, allow filtering by
clicking on their name, allow deselecting through the subtitle of the
page.
* Don't display reviewer in list when page is already filtered by
reviewer.
Also improve PR page a bit:
* Add reviewer.
* Add direct link to backend (closes #524).
Closes #529
Because only the first staging failure is considered "hard" and will
put the PR in error, when looking at staging logs it's possible to see
the same PR get staged over and over and over again, which is quite
confusing.
To make the logs less weird, always log a staging failure even when it
doesn't put the PR in error. Sadly this can't be tested as `capsys` is
not able to intercept a stderr inherited by a child process (capfd
doesn't work either).
Fixes #527
If a reviewer doesn't have an email set, the Signed-Off-By is an
`@users.noreply.github.com` address which just looks weird in the
final result.
Initially the thinking was that emails would be required for users to
*be* reviewers or self-reviewers, but since those are now o2ms / m2ms
it's a bit of a pain in the ass.
Instead, provide an action to easily try and fetch the public email of
a user from github.
Fixes #531
Since we have the model fields loaded up, we can just check into that
and assume anything that's not a field is a method.
That avoids having to go through `_call`, making things way less awkward.
After internal discussions it was concluded that this didn't extend
much more trust than allowing authors to accept their single-PR
commits without additional supervisions, and it would avoid some
inconveniences and PR-blocking.
Fixes #69 (nice)
Because sometimes github updates are missed (usually because github
never triggers it), it's possible for the mergebot's view of a PR
description to be incorrect. In that case, the PR may get merged with
the wrong merge message entirely, through no fault of the user.
Since we already fetch the PR info when staging it, there's very
little overhead to checking that the PR message we store is correct
then, and update it if it's not. This means the forward-port's
description should also be correct.
While at it, clean the forward port PR's creation a bit:
- there should always be a message since the title is required on
PRs (only the body can be missing), therefore no need to check that
- as we're adding a bunch of pseudo-headers, there always is a body,
no need for the condition
- inline the `pr_data` and `URL`: they were extracted for the support
of draft PRs, since that's been removed it's now unnecessary
Fixes #530
The page of PRs in "error" is currently kinda broken: it does not show
any feedback aside from the PR being in error which is not very
useful.
The intent was always to show an explanation, but when adding the page
I just deref'd `staging_id` which always fails though in two different
ways:
* when the PR can not be staged at all (because of a conflict) there
is no staging at all with a reason to show, so there should be
a fallback that the PR could not even be staged
* `staging_id` is a related field which deref's to the staging_ids
of the first *active* batch, except when a staging completes
(successfully or not) both staging and batch are disabled.
Plus the first batch will be the one for the first staging so if the
PR is retried and fails again the wrong reason may be displayed.
So update the section to show what we want: the reason of the
staging of the *last* batch attached to the PR.
NOTE: there's one failure mode remaining, namely if a staging fails
then on retry the PR conflicts with the new state of the
repository (so it can't be staged at all), the "reason" will
remain that of the staging. This could be mitigated by attaching
a "nonsense" batch on failure to stage (similar to the
forwardport stuff), that batch would have no staging, therefore
no staging reason, therefore fallback.
Closes #525
Draft was added in 82174ae66e but turns
out the v13 ORM is not able to create a required column (even when
given a default value), at least for booleans.
So create it by hand.
On staging failure, the 'bot would point to the first error or failure
status it found on the commit. This turns out not to be correct as
we (now) have various statuses which are optional, and may fail
without blocking stagings (either because they're solely informational
or because they're blocking & overridable on PRs).
Fix this so the 'bot points to the first *required* failure.
Fixes #517
* Remove the forwardport creating PRs in draft, that was mostly to
avoid codeowners triggering but we've removed the github one and
hand-rolled it, so not a concern anymore.
* Prevent merging `draft` PRs, the mergebot rejects approval on draft
PRs and insults people.
TBD (maybe): try to create *conflicting* forward-port PRs in draft so
it's clearer they need to be *fixed*? Issue of not being able to do
that on all private repositories remains so~~
Fixes #500
"Uniquifier" commits were introduced to ensure branches of a staging
on which nothing had been staged would still be rebuilt properly.
This means technically the branches on which something had been
staged never *needed* a uniquifier, strictly speaking. And those lead
to extra building, because once the actually staged PRs get pushed
from staging to their final destination it's an unknown commit to the
runbot, which needs to rebuild it instead of being able to just use
the staging it already has.
Thus only add the uniquifier where it *might* be necessary:
technically the runbot should now manage this use case much better,
however there are still issues like an ancillary build working with
the same branch tip (e.g. the "current master") and sending a failure
result which would fail the entire staging. The uniquifier guards
against this issue.
Also update rebase semantics to always update the *commit date* of the
rebased commits: this ensures the tip commit is always "recent" in the
case of a rebase-ff (which is common as that's what single-commit PRs
do), as the runbot may skip commits it considers "old".
Also update some of the utility methods around repos / commits to be
simpler, and avoid assuming the result is JSON-decodable (sometimes it
is not).
Also update the handling of commit statuses using postgres' ON
CONFLICT and jsonb support, hopefully this improves (or even fixes)
the serialization errors. Should be compatible with 9.5 onwards which
is *ancient* at this point.
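The upsert might look roughly like this (table and column names are
guesses; the point is a single atomic statement merging the new status
into the `jsonb` map instead of a read-modify-write in Python):

    import json

    def update_status(cr, sha, context, status):
        cr.execute("""
            INSERT INTO runbot_merge_commit (sha, statuses)
            VALUES (%s, %s::jsonb)
            ON CONFLICT (sha) DO UPDATE
            SET statuses = runbot_merge_commit.statuses || EXCLUDED.statuses
        """, [sha, json.dumps({context: status})])
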
Fixes #509
Although it's possible to find what PR a commit was part of with a bit
of `git log` magic (e.g. `--ancestry-path COMMIT.. --reverse`) it's
not the most convenient, and many people don't know about it, leading
them to various debatable decisions to try and mitigate the issue,
such as tagging every commit in a PR with the PR's identity, which
then leads github to spam the PR itself with pingbacks from its own
commits. Which is great.
Add this information to the commits when rebasing them (and *only*
when rebasing them), using a `Part-of:` pseudo-header.
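The pseudo-header itself is cheap to add during the rebase; a sketch:

    def add_part_of(message, pr_display_name):
        # pr_display_name is e.g. "odoo/odoo#12345"; skip if a
        # previous rewrite already tagged this commit
        if 'Part-of:' in message:
            return message
        return f"{message.rstrip()}\n\nPart-of: {pr_display_name}"
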
Fixes #482
Proper attribution is important in general, but especially for
external contributors. Before this change, and the previous change
fixing authorship deduplication, it was rather easy for a "squashed"
conflict commit (attributed to the 'bot for lack of a really clean
option) to get merged by mistake.
This commit changes two things:
* The mergebot now refuses to stage commits without an email set, the
github API rejects those commits anyway so any integration mode
other than `merge` would fail, just with a very unclear error
* The forwardbot now creates commits with an empty author / committer
email when the pull request as a whole has multiple authors /
committers. This leverages the mergebot update.
Also clean up the staging process to provide richer error reporting
using bespoke exceptions instead of simple assertions. I'm not sure
we've ever encountered most of these errors so they're really sanity
checks but the old reporting would be... less than great.
Fixes #505
If a PR is closed on github and unknown by the mergebot, when fetched
it should be properly sync'd as "closed" in the backend, otherwise the
PR can get in a weird state and cause issues.
Also move the "I fetched the thing" comment before the actual creation
of the PR for workflow clarity, otherwise the reader has the
impression that the 'bot knew about the PR then fetched it anyway.
And improve savepoint management around the fetching: savepoints
should be released in all cases.
Closes #488.
If two PRs have the same label *in different projects entirely*, the
mergebot should not consider them to be linked, but it did as shown by
the warning message on odoo-dev/odoo#905 (two PRs created from the
same branch in different projects were seen as linked by the status
checker).
3b417b16a1 fixed the actual staging
selection, it's only the warning which did not properly segregate PRs.
Only group PRs which target the same branch (therefore are within the
same project).
Fixes #487
Previously, a PR's status page would only show the linked / related
PRs when `open`.
Since the relations between PRs remains useful, also make this
information available during staging and after merging.
Fixes #463
* in the main dashboard, show the exact UTC timestamp (with a Z
marker) on hover, not just the relative delta
* in the branch details page, show the full timestamp, zoned, in
ISO-8601 format
If a PR got merged to master (or whatever the current development
branch is), there's no easy way to know what maintenance branch it
ended up landing in, except by asking git which branches contain the
commit (which can be rather slow).
Add a special case on merge which labels the PR with a pseudo-branch
patterned after the second-to-last branch of the project:
* if the branch ends with a number, increment the number by one
e.g. 2.0 -> 2.1, 5 -> 5.1
* otherwise, just prefix with `post-` e.g. "maint" ->
"post-maint" (that one doesn't sound very helpful, but I guess it's
nice for the weirdoes who call their branches "natty narwhal" and
shit)
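The rule, sketched to match the examples above (the actual
implementation may differ in its details):

    import re

    def pseudo_branch(branch):
        # 2.0 -> 2.1, 5 -> 5.1, maint -> post-maint
        m = re.search(r'(\d+)(?:\.(\d+))?$', branch)
        if not m:
            return f'post-{branch}'
        if m[2] is not None:  # x.y: bump the last component
            return branch[:m.start(2)] + str(int(m[2]) + 1)
        return f'{branch}.1'  # bare number: append .1
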
Fixes #450
5cf3617eef intended to create merge
messages with only the content of PR descriptions before the first
thematic break. However this processing was incorrectly applied
to all messages being processed (meaning rebased / squashed commit
messages as well).
Properly apply thematic break processing to only commit messages
created from PR descriptions.
Before this, we would "roughly" select stagings by looking at stagings
whose heads matched a specific sha then validating them all. This
could perform extra validations on stagings once in a while but this
was assumed not to be much an issue, at least originally.
However two changes later on have contributed to this likely being the
cause of #429 (stagings never timing out):
* heads of the staging branches are uniquifier commits stored in the
heads map, but the actual heads of the stagings are also stored
there, some of which are no-ops (hence the uniquifiers) so assuming
repos A and B, if a staging contains PRs touching A then the actual
head of B will also be stored as a head of B
* when a staging is validated, if it *contains* any pending result the
timeout limit gets bumped back
The issue here is that if a success / failure status is lost (which
would be the most common reason for timeouts) *and* someone has forked
and is regularly rebuilding a branch-head used as-is by a staging,
each of those rebuilds will trigger a validation of the staging, which
will find that one of the statuses is still pending (because we missed
the success / failure), which will bump up the timeout limit,
continuing until the branch stops getting rebuilt.
This is probably one of the reasons why some stagings last for *way*
more than 2h, though it is far from explaining all of them: 90% of the
stagings lasting more than *3*h end up succeeding. Tho it's always
possible that this is because someone notices and resends a success
for the missing status, it seems somewhat doubtful. Oh well.
Also fix the incorrect log call on `update_timeout_limit` triggering.
I'd forgotten that in order to better handle cases where the CI is
highly backed up (and / or slow for some reason), we actually update
the CI timeout to really take the last "pending" status as the "true
start" of the CI. This might explain why lots of stagings needed extra
time: as of right now, out of 28835 stagings
- 20086 had their timeout bumped by more than 15mn
- 6967 had their timeout bumped by more than 30mn
- 264 had their timeout bumped by more than 1h
- 30 had their timeout bumped by more than 2h
Add some logging every time the CI is bumped this way, so we have
better visibility into that event.
Closes #429
The mergebot has a feature to ping users when an approved PR or
forward-port suffers from a CI failure, as those PRs might be somewhat
unattended (so the author needs to be warned explicitly).
Because the runbot can send the same failure information multiple
times, the mergebot also has a *deduplication* feature, however this
deduplication feature was too weak to handle the case where the PR has
2+ failures e.g. ci and linting as it only stores the last-seen
failure, and there would be two different failures here.
Worse, because the validation step looks at all required statuses, in
that case it would send a failure ping message for each failed
status *on each inbound status*: first it'd notify about the ci
failure and store that, then it'd see the linting failure, check
against the previous (ci), consider it a new failure, notify, and
store that. Rinse and repeat every time runbot sends a ci *or* lint
failure, leading to a lot of dumb and useless spam.
Fix by storing the entire current failure state (a map of context:
status) instead of just the last-seen status data.
Note: includes a backwards-compatibility shim where we just convert a
stored status into a full `{context: status}` map. This uses the
"current context" because we don't have the original, but if it was a
different context it's not going to match anyway (the target_url
should be different) and if it was the same context then there's a
chance we skip sending a redundant notification.
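Sketch of the strengthened dedup, `prev` being the stored
`{context: status}` map (field and key names illustrative):

    def notify_and_update(prev, context, status):
        # compare the inbound failure against *everything* seen so
        # far, not just the last status; the description is ignored
        # as it can vary between pings for the same build
        old = prev.get(context)
        is_new = old is None or old.get('target_url') != status.get('target_url')
        prev[context] = status
        return is_new
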
Fixes #435
Before this change, a CI override would have to be replicated on most
/ all forward-ports of the base PR. This was intentional to see how it
would shake out, the answer being that it's rather annoying.
Also add a `statuses_full` computed field on PRs for the aggregate
status: the existing `statuses` field is just a copy of the commit
statuses which I didn't remember I kept free of the overrides so the
commit statuses could be displayed "as-is" in the backend (the
overrides are displayed separately). And while at it fix the PR
dashboard to use that new field: that was basically the intention but
then I went on to use the "wrong" field hence #433.
Mebbe the UI part should be displayed using a computed M2M (?)
as a table or as tags instead? This m2m could indicate whether the
status is an override or an "intrinsic" status.
Also removed some dead code:
* leftover from the removed tagging feature (removed the tag
manipulation but forgot some of the setup / computations)
* unused local variables
* an empty skipped test case
Fixes #439.
Fixes #433.
Currently when creating *merges* the commit message is the
concatenation of the PR title and the PR body.
However it can be convenient to include description text which would
not be included in the commit message e.g. PR template
stuff. "Thematic breaks" seem like a good way to do this: the commit
message will only include the content preceding the first thematic
break, everything else is ignored (except headings which are not
ignored, double negations FTW).
Might be that that's a bit excessive and we should only ignore what
follows the *last* thematic break. Or that there needs to be a more
specific marker. We'll see.
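Sketch of the truncation, with a simplified thematic-break pattern
(the real markdown rules are hairier, notably a dash line under a
paragraph is a setext heading, which per the above must *not* be
treated as a break):

    import re

    # 0-3 leading spaces then 3+ of the same marker, spaces allowed
    THEMATIC_BREAK = re.compile(r'^ {0,3}([-_*])( *\1){2,} *$', re.M)

    def merge_body(description):
        m = THEMATIC_BREAK.search(description)
        return description[:m.start()].rstrip() if m else description
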
Fixes #443.
Because the commands were collected in a `dict[Command, Params]` a
command could only be present once *in the mergebot* (the forwardbot
already collected commands in a list).
* Update mergebot commands parser to collect commands in a list.
* Improve override to allow a comma-separated list of CIs to override.
* Simplify the parsing of delegate to generate one delegate command
per target (slightly less efficient if multiple logins are provided
but that is likely extremely rare).
Fixes #445
Because github materialises every label change in the
timeline (interspersed with comments), the increasing label churn
contributes to PRs being difficult to read and review.
This change removes the update of labels on PRs, instead the mergebot
will automatically send a comment to created PRs serving as a
notification that the PR was noticed & providing a link to the
mergebot's dashboard for that PR where users should be able to see the
PR state in detail in case they wonder what's what.
Lots of tests had to be edited to:
- remove any check on the labels of the PR
- add checks on the PR dashboard (to ensure that they're at least on
the correct "view")
- add a helper to handle the comment now added to every PR by the 'bot
- since that helper is needed by both mergebot and forwardbot, the
utils modules were unified and moved out of the odoo modules
Probably relevant note: no test was added for the dashboard
ACL, though since I had to explicitly unset the group on the repo used
for tests for things to work it looks to me like it at least excludes
people just fine.
Fixes #419
Convert overridable CI to an m2m from partners, it's significantly
more convenient to manipulate as multiple users can (and likely will)
have access to the same overrides, add a name_search so the override
is easy to find from a partner, and provide a view for the
overrides (with partners as tags).
Also make the repository optional on CI overrides.
Fixes #420
When retrieving unknown PRs, the process would apply all comments,
thereby applying eventual r+ without taking in account their
relationship to a force push. This means it was possible for a
mergebot-unknown PR to be r+'d, updated, retargeted, and the mergebot
would consider it good to go.
The possible damage would be somewhat limited but still, not great.
Sadly Github simply doesn't provide access to the entire event stream
of the PR, so there is no way to even know whether the PR was updated,
let alone when in relation to comments. Therefore just resync the PR
after having applied comments: we still want to apply the merge method
& al, we just want to reset back to un-approved.
An other minor fix (for something we never actually hit but could):
reviews are treated more or less as comments, but separate at github's
level. The job would apply all comments then all reviews, so the
relative order of comments and reviews would be wrong.
Combine and order comments and reviews so they are applied
in (hopefully) the correct order of their creation / submission.
Closes #416
Historically PRs to disabled branches were treated like PRs to
un-managed branches: ignored.
However because they can *already exist* when the branch is disabled,
the effects can be subtly different, and problematically so
e.g. ignoring all PR events on PRs targeting disabled branches means
we can't close them anymore, which is less than great.
So don't ignore events on PRs to disabled branches (creation, sync,
closing, and reopening) but also send feedback on PRs to disabled or
un-managed branches to indicate that they're not merge-able.
Fixes #410
If we can't stage a PR, rather than immediately putting it in error,
wait until it was the first batch we tried staging: otherwise it might
have been conflicting with the previous batch, which ultimately won't
be merged for other reasons, and it would have worked as part of the
next batch.
Note: this commit will lead to false negatives because it's
batch-based rather than repo-based, so if the first batch only affects
repo A and the second batch only affects repo B, if the second batch
triggers a merge error it should be rejected immediately (as it's
applied on a "pristine" repo B) but this change will just bump it to
the next staging.
fixes #209
On per-repo status configurations, convert the "branch_ids" filter to
a domain on branches. Since the selection is generally
binary (statuses either apply to the master branch or apply to
non-master branches) this avoids error-prone missed updates where we
forget to enable statuses pretty much every time we fork off a new
branch.
Fixes #404
Normally opening a PR against a disabled branch is like opening a PR
against a branch which is not configured at all: the PR is ignored
entirely.
However if the PR already exists then the state of the branch isn't
normally checked when interacting with the branch, and it is possible
to trigger its staging, at which point the staging itself will crash:
on a project the branches are `active_test=False` so they're all
visible in the form, but when repos go search()-ing for the branch
they won't find it and will blow up.
Solution: only try staging on branches which are active. Fixes
odoo/runbot#408. Also do the same for checking stagings.
And while at it, fix#409 by wrapping each checking or staging into a
try/except and a savepoint. This way if a staging blows up it should
move on to the next branch instead of getting stuck.
The "blocked" computation would not take branch targets in account, so
PRs with the same label targeting *different branches* (possible if
somewhat rare due to our naming conventions) could block one another,
despite really being unrelated.
Also fix up some messages:
* if a PR is blocked due to having no merge method, it should say
that, not "has no merge" (no merge what?)
* format un-managed branches as `$repo:$branch` in logging messages,
`$repo#$thing` is for issues / PRs and `$branch` alone can be very
unhelpful
Closes #407
When using @fw-bot close, a feedback would be created without a
message (rather than e.g. with an empty one). As a result, the
feedback-sending cron would crash, but not before having closed the
corresponding PR.
This would lead to closing the PR in a loop & spamming the logs with
tracebacks.
In Odoo 13, the cache middleware was modified to straight hit
`http.root` assuming it's the Odoo root object. When `http.root` is
replaced by a wrapping middleware, the entire thing blows up and shits
the bed.
Patch up by automatically delegating attribute accesses to the wrapped
application (which is probably Root), although why this is not just
folded into Root is getting less and less clear.
Seems to be a pretty long-standing issue but I'd not noticed it before
as it's rather rarely taken & our sentry remains rather blown to hell,
I only happened to stumble upon the issue in the logs.
There's no ``number`` attribute on the repository object (to which
``_load_pr`` belongs). We obviously want to use the number of the PR
we're currently loading.
Mistake in the statuses handling: the context is not sufficient to
uniquely identify a staging status as different repositories can get
the same status context (e.g. ci/runbot is present on all our
repositories).
This is only a visual problem, but the status dropdown on
stagings (both the dashboard and the branchwise listing) would reuse
one of the statuses with that context for all of them, leading to
incorrect links and misleading displays.
Fix by keying on (repo, context) instead, that's exactly why the
repository name was part of the status in the first place.
This is a regression due to the implementation details of
odoo/runbot#376: previously _parse_command would only yield the
commands it had specifically recognised (from a whitelist).
22e18e752b simplified the implementation
and (for convenience when adding new commands) now passes through any
command to the executor instead of skipping the unknown one.
But I forgot to update the executor to ignore unknown commands, so it
treats them as *failed* (since the success flag doesn't get set) and
assumes it's an ACL issue, so notifies the user that they can't do the
thing they never really asked for.
Add an end-case which skips the feedback bit for unrecognized
commands, which restores the old behavior.
Fixes #390
Currently it can be difficult to know why the mergebot refuses to
merge a PR (not that people care, they generally just keep sending new
commands without checking what the 'bot is telling them, oh well...).
Anyway knowing the CI state is the most complicated bit because the CI
tag only provides a global pass/fail for statuses but not a view of
specific statuses, and sometimes either the runbot or github fails to
notify the mergebot, leading to inconsistent internal states & shit.
By adding a tag per status context per PR, we can more clearly
indicate what's what.
Fixes #389
Apparently a long-running issue but not really a concern before the
new mergebot started sending a lot more statuses: stagings would show
a list of all statuses they received, including optional / irrelevant
statuses.
Get a list of required statuses and only show that on the staging
dropdowns.
Closes #387
Adds an `override` mergebot command. The ability to override is set on
an individual per-context per-repository basis, similar to but
independent from review rights. That is, a given individual may be
able to override the status X on repository A and unable to do so on
repository B.
Overrides are stored in the same format as regular statuses, but
independent from them in order to persist them across builds.
Only PR statuses can be overridden, statuses which are overridable on
PRs would simply not be required on stagings.
An alternative to implementing this feature in the mergebot would be
to add it to individual status-generating tools on a per-need
basis.
Pros of that alternative:
* display the correct status on PRs, currently the PR will be failing
status-wise (on github) but correct as far as the mergebot is
concerned
* remove complexity from the mergebot
Cons of that alternative:
* each status-generating tool would have to implement some sort of ACL
system
* each status-generating tool would have to receive & parse PR
comments
* each status-generating tool would have to maintain per-pr state in
order to track overrides
Some sort of helper library / framework ought to make that rather easy
though. It could also be linked into the central provisioning system
thing.
Closes #376
Requirement for odoo/runbot#376: one can't expect there being someone
to override CI checks on stagings, so it only makes sense for checks
on PRs, which in turn requires that there could be checks only
required on PRs.
Could also be useful for features like incremental linting /
formatting, we may want to apply checks on PRs which filter on the
lines modified, but not require the entire software be reformatted at
once.
Having the required statuses be a mere list of contexts has become a
bit too limiting for our needs as it doesn't allow e.g. adding new
required statuses on only some branches of a repository (e.g. only
master), nor does it allow putting checks on only branches, or only
stagings, which would be useful for overridable checks and the like,
or for checks which only make sense linked to a specific revision
range (e.g. "incremental" linting which would only check whatever's
been modified in a PR).
Split the required statuses into a separate set of objects, any of
which can be separately marked as applying only to some branches (no
branch = all branches).
Fixes #382
While the head gets updated (properly), the squash flag did not which
could lead to odd results. Since a PR can only be reopened if it was
regular-pushed to (not after a force push) there are two scenarios:
* the PR is updated to have 0 commits, closed, pushed to with one commit
then reopened, after reopening the PR would be marked as !squash and
would ask for a merge method (that's what happened with
odoo/odoo#51763)
* the PR has a single commit, is closed, pushed to then reopened,
after reopening the PR would still be marked a squash and potentially
straight rebased without asking for a merge method
Nothing would break per-se but both scenarios are undesirable.
Close #373
When it updated tagging e82de3136b also
incorrectly replaced a `pr` by `pr.display_name`, probably leftover
from an attempt to update a callsite from `str(pr)` to
`pr.display_name` which I missed when reverting that.
Anyway at that section, `pr` is an integer (as it comes from an SQL
query) not an object.
The logic of the partner merge wizard is to collect all relevant data
from source partners, write them to a destination partner, then remove
the sources.
This... doesn't work when the field in question has a UNIQUE
constraint (like github_login), because it's going to copy the value
from a source onto a dest which will blow the constraint, and so the
copy fails. In that case the user first has to *move over* the unique
field's value then they can use the wizard.
Just fix for the github login: take all sources, remove (and store)
their github logins, then write the login onto the dst.
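Sketch of that move-over, with `sources` / `dst` as partner recordsets
(simplified; the real wizard also keeps the removed logins around):

    def move_github_login(sources, dst):
        # clear the UNIQUE github_login off every source first so
        # writing it onto dst cannot trip the constraint
        logins = {p.id: p.github_login for p in sources if p.github_login}
        sources.write({'github_login': False})
        if logins and not dst.github_login:
            dst.github_login = next(iter(logins.values()))
        return logins
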
An alternative would have been to *defer* the constraint, however:
* it only works on unique constraints, not unique indexes
* it requires the constraint to be declared DEFERRABLE
Closes #301
Up till now if an FF failed with an exception having neither cause nor
context the cancel reason would be an empty string.
Fallback on stringifying the exception itself as a last resort.
Currently the PR becomes successful-green as soon as CI fully passes
but before it's merged, which can be an issue as e.g. merging might be
delayed (there's no visible difference between "CI success" and
"staging merged") or it might ultimately failed (FF error).
Create an intermediate color for "successful" stagings which are still
pending merge.
Also add a fallback message for fast-forward errors instead of an
empty string.
Closes #308
Genericise runbot_merge's tagging (move states to the "UI" but only
store / manage actual tags), and remove forwardport.tagging as it's
now redundant.
Closes #232
approving a PR which failed CI should trigger a feedback message since
6cb58a322d (#158), the code has not been
removed and the tests still pass.
However fwbot r+ would go through its own process for r+ which would
explain why that feedback is sometimes gone / lost (cf #327 and #336).
* make fwbot r+ delegate to mergebot r+
* add dedicated logging for this operation to better analyze
post-mortem
* automatically ping the reviewer to specifically tell them they're idiots
* move the feedback item out of the state change bit, send it even if
it's a useless r+ (because it's already r+'d)
* add a test for forward-ports
Closes #327, closes #336
Remove original-signed-off-by, doesn't actually seem useful given the
semantics of signed-off-by according to the kernel doc'. Plus it
didn't actually work as the intent was to keep the signoff of the
original PR in the forward-port, but that signoff is not part of the
commit we're cherrypicking (it gets added on the fly when the commit
is merged).
Therefore explicitly get the ack-chain into the PR: when merging an FP
PR, try to integrate the signoff of the original PR, that of the final
FP pr, and while at it that of the last explicit update in the commit
chain (e.g. in case there's been a conflict or something).
Fixes #284
* only provide fields which make sense for the mergebot
* provide formatting & searchability for review rights records so
they're visible from the list directly
This is more of a sanity check as it normally should not be a factor:
labels generally contain the target name, and staging checks are
performed per-target so we're not mixing multiple targets anyway.
But let's say a third-party creates a fix-foo branch for A and a
fix-foo branch for B, we want to ensure they're not considered batched
together.
Rather than try to fix up various bits where we search & all and
wonder what index we should be using, make the column a CIText.
For mergebot the main use case would be properly handling
delegate=XXX: currently if XXX is not a case-sensitive match we're
going to create a new partner with the new github login and
give *them* delegation, and the intended target of the delegation
isn't going to work correctly.
Also try to install the citext extension if it's not in the database,
and run the database-creation process with `check=True` so if that
fails we properly bubble up the error and don't try to run tests on a
corrupted / broken DB.
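The extension install is best-effort, as it requires elevated database
rights; a sketch:

    def ensure_citext(cr):
        try:
            # no-op if the extension is already in the database
            cr.execute("CREATE EXTENSION IF NOT EXISTS citext")
        except Exception:
            # not allowed to create extensions: the column migration
            # will fail loudly on its own later
            pass
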
Fixes #318
As the odds of having more projects, or more repos with different
requirements in the same project, increase, the need to have different
sets of reviewers for different repositories increases.
As a result, rather than be trivial boolean flags the review info
should probably depend on the user / partner and the repo. Turns out
the permission checks had already been extracted into their own
function so most of the mess comes from testing utilities which went
and configured their review rights as needed.
Incidentally it might be that the test suite could just use something
like a sequence of commoditized accounts which get configured as
needed and not even looked at unless they're used.
Before this change, `r-` on a pr[p=0] does essentially nothing. At
most it will unstage if the PR had been (somewhat unnecessarily) r+'d
in the past but then the PR will get re-staged immediately.
To avoid this odd behaviour, if r- is sent to a p=0 PR not only is
the PR unreviewed (if it was reviewed) it always gets unstaged, and
its priority gets reset to 1 (high priority but doesn't bypass CI and
review). Also send a comment on that subject so followers of the pr
are notified.
Fixes #313
During freezes it can be useful to notify viewers that nothing is
going to forward port or merge for a while, and that this is
intentional (not something that's broken).
Fixes #307
The staging cron was already essentially split between "check if one
of the stagings is successful (and merge it)" and "check if we should
create a staging" as these were two separate loops in the cron.
But it might be useful to disable these two operations separately
e.g. we might want to stop the creation of new staging but let the
existing stagings complete.
The actual splitting is easy but it turns out a bunch of tests were
"optimised" to only run the merge cron. Most of them didn't blow up
but it seems more prudent to fix them all.
fixes odoo/runbot#310
The PR creation had been fixed to always validate even without a
commit found (in case there was no need for a commit), but the update
of a PR in such a situation was not tested, and thus naturally did not
work because why would it work if it wasn't tested?
Also remove the conditional skip on updating a PR to a new head.
The test was checking that things would work properly with
required_statuses being an empty string, but I'd also forgotten an
empty field gets stored as `False` in the database, so when trying
things out live neither the PRs nor the staging would work: their
assumption that they could straight split the required_statuses would
always fail.
Update the test to better match expectations, and hopefully this is
the end of that saga.
PRs transitioning to 'ready' had been checked and tested but turns out
I had completely forgotten to test that stagings would validate
properly therefore of course they didn't.
The issue here was I'd forgotten `''.split(',')` returns `['']` rather
than `[]`, so on an empty required_statuses the staging validator
would keep looking for a status matching the context `''` and would
never find it, keeping the staging pending until timeout. So most
likely the problem could have been resolved by just adding a condition
to
    [r.strip() for r in repomap[c.sha].required_statuses.split(',')]
but I'd already done all the rest of the reorganisation by that point,
test pass and I think it's a somewhat better logic. Therefore I'll go
with that for now.
* properly handle empty required_statuses during staging validation
* remove the final postcondition: if we're missing commits which don't
require any statuses we should not care
* expand test to include up to merging PRs
* automatically create dummy commits when creating stagings, that way
the relevant commits are in the database (can't hurt)
PS: an other alternative would have been to filter out or skip ahead
on commits which don't require any statuses (aka cmap &
required_statuses): cmap would not even have that entry
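The gotcha and its guard in miniature:

    assert ''.split(',') == ['']  # not [], hence the phantom '' context

    def required_contexts(required_statuses):
        # `False` (unset field) and '' must both yield *no* contexts
        return [
            r.strip()
            for r in (required_statuses or '').split(',')
            if r.strip()
        ]
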
Refactor the selection thingie, hopefully in a way which doesn't
absolutely crater performances, so that it's possible to explain the
reason why a PR is considered blocked.
Despite the existing dedup' sometimes the "xxx failed on this
forward-port PR" would still get multiplicated due to split builds
e.g. in odoo/odoo#43935 4 such messages appear within ~5 minutes, then
one more 10mn later.
This is despite all of them having the same "build" (target_url) and
status (failure). Since the description is the only thing that's not
logged I assume that's the field which varies and makes the dedup'
fail. Therefore:
* add the description to the logging (when getting a status ping)
* exclude the description when checking if a new status should be
taken in account or ignored: the build (and thus url) should change
on rebuild
Hopefully fixes #281
A while back I implemented name_get/display_name to print PRs using
the canonical github format (owner/repo#number), however looks like
some of the logging calls were still using bespoke formatting.
Moving statuses from project to repo was originally developed on 11,
but since the PR was only merged after the 13.0 update, the
migration script should be moved to match.
The pytest suite had been partially unified between mergebot and
forwardport but because of session-scoped modules it could not run
across those.
Make the db cache lazy and able to cache multiple databases, and move
the "current required module" to function scoped, this way things
should (and seem to) work properly on runs involving mergebot & fwbot.
Next step: xdist! (need to randomise repo names for that, probably).
When an employee sadly leaves Odoo,
the Odoo production database (odoo.com) will call these routes
in order to remove the reviewer rights automatically.
So a user who no longer works for Odoo can't "r+" Github PRs.
This is related to odoo/internal#617
Pages take over from redirections which really is a pain in the ass
when trying to find out why the bloody redirection seemingly refuses
to work.
Note: can't use the record tag because homepage_page is marked as
noupdate, so we have to bypass the flag checking.
Interaction of CacheMiss and BaseModel is fucked, leading to an
infinite loop when trying to provide useful __str__ on a model (by
accessing model fields).
bs4 yields complete vomit on the template as-is (see:
https://imgur.com/a/XIMn7MX).
Add a bunch of color and styling overrides to get something closer to
the original, and move the existing styles to a "proper" scss file
while at it.
Using `modified` seems safer than just blowing the cache with respect
to stored computed fields depending on PR state (not sure there are
any but it's likely).
Previous version incorrectly browsed the PR *number* (rather than ID)
so at best it would do nothing and at worst it might go and notify the
wrong PR entirely.
Discussing #238 with @odony, the main concern was the difficulty of
understanding if things merged in one repo were related to things
merged in an other repo: currently, knowing this requires going to the
merged PR, getting its label, and checking the PRs with the same HEAD
in the other repository to see if there's a correlation (e.g. PRs
merged around the same time).
The current structure of the mergebot makes it reasonably easy to add
the other PRs of the batch in the pseudo-headers, such that we get
links to all "related" PRs in the head commit (and links back from the
commits which is probably less useful but...)
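Sketch of the idea; the `Related:` header name is illustrative, the
exact pseudo-header is whatever got settled on:

    def add_related(message, pr, batch_prs):
        # link the head commit to the other PRs merged as part of the
        # same batch, so cross-repo correlation is one click away
        related = [p.display_name for p in batch_prs if p != pr]
        if not related:
            return message
        headers = '\n'.join(f'Related: {r}' for r in related)
        return f'{message.rstrip()}\n\n{headers}'
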
Fixes #238