If an untracked PR is closed, especially on an inactive or untracked
branch, the closer (or author) almost certainly doesn't care to receive
3 different notifications on the subject.
The fix requires a schema change in order to track that we're fetching
the PR due to a `closed` event, as in other cases we may still want to
notify the user that we received the request (and it just happened to
resolve to a closed PR).
Fixes #857
- correctly handle projects without a secret set, we don't want the
requests to blow up by trying to `strip()` a `False` or `None`, that
is dumb, who would do that?
- provide better reporting on signature mismatch: which repo we tried
to access, and the full list of headers
- log when there was no signature matching, either because there was
no signature in the request and no secret on the project, or because
the request is signed but no secret is configured on the repo
`gc --prune` can not take a *separate* parameter, it has to be part of
the same arg (the `=` is not optional), otherwise the `gc` call blows
up.
So use the positional form of the git command to generate the correct
invocation, Python-level `foo=bar` generates a split-style option in
two args which does not please git.
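As a minimal sketch of the difference, assuming a plain argument list
handed to `subprocess` (the helper name `gc_argv` is hypothetical, it
is not the mergebot's git wrapper):

```python
def gc_argv(prune="now"):
    """Build the argv for `git gc`, keeping `--prune` and its value in ONE argument."""
    # Correct:   ["git", "gc", "--prune=now"]
    # Incorrect: ["git", "gc", "--prune", "now"], the value becomes a separate arg.
    return ["git", "gc", f"--prune={prune}"]


argv = gc_argv()
```

The point is only that the option and its value travel as a single
list item, which is what the positional form produces.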
Before this, we would check if a repository had a name and run
maintenance on it, leading to repeated (but unnoticed until now
because I didn't monitor it) tracebacks as the maintenance cron would
fail to find the local repo then run maintenance on nowhere anyway.
Also augment the repo-finding process to try and get better
information about what it's doing when it fails, rather than failing
completely silently.
The signature validation code seems correct, but there are validation
failures in production, increase logging around webhook requests to
try and diagnose things better:
- dump the *entire* body to the github_requests logfile
- add the received & computed signatures to the log error
Turns out I've always been mistaken about the handling of quotes
*inside* shell parameters, apparently they are always consumed by the
shell unless nested so
--foo="bar"
reaches the underlying program as
--foo=bar
This means when using subprocess (without shell=True), adding the
quotes leads to mishandling of the parameters (as the subprocess now
has quotes it's not equipped to deal with).
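The behaviour described above can be checked with the standard
library's `shlex`, which splits a command line following POSIX shell
rules (`prog` is a placeholder program name, not a real command):

```python
import shlex

# What a POSIX shell hands to the program: the quotes are consumed.
shell_args = shlex.split('prog --foo="bar"')

# With subprocess (without shell=True) the list is passed verbatim, so
# quotes written into an argument reach the program as literal characters.
subprocess_args = ["prog", '--foo="bar"']
```

Here `shell_args[1]` is `--foo=bar` while `subprocess_args[1]` still
carries the double quotes.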
This exact error is made in the `--pretty` parameter of git show,
locally this results in the author name and the committer email being
terminated by double quotes although somehow other layers seem to
exclude those from the end result (I assume `commit-tree` strips the
quotes from the envvars under the assumption that users can mistakenly
quote them or something?).
Anyway while it does not seem harmful (so far), better safe than
sorry.
Add intermediate forks to a pair of tests, because github now (?)
requires being able to write on a branch to create a PR from it, so
the non-collaborator reviewers were not able to create a PR from a
branch created by user.
Github delivery delays keep getting worse. Depending on what comes
before `to_pr`, this leads it to fail more often as it runs before the
PR it's looking for was signaled to the mergebot.
In order to mitigate this issue, add a wait loop in `to_pr`, waiting
up to 4 seconds for the PR it's looking for before aborting.
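A wait loop of that kind could look like the sketch below. The name
`wait_for`, the polling interval and the injectable `sleep` / `clock`
parameters are assumptions made for illustration, not the actual
`to_pr` code:

```python
import time


def wait_for(lookup, timeout=4.0, interval=0.5,
             sleep=time.sleep, clock=time.monotonic):
    """Call `lookup()` until it returns something truthy or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        result = lookup()
        if result:
            return result
        if clock() >= deadline:
            # give up: the PR was never signaled within the allotted time
            raise TimeoutError(f"nothing found after {timeout}s")
        sleep(interval)
```

The lookup is always attempted at least once, so a PR which is already
known costs no waiting at all.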
Also replace manual lookups by `to_pr` in every method of
`TestPRUpdate` while at it since it hit a few of the issues. And
remove the xfail test case since it seems unlikely github will change
tack (maybe? could be worth testing to be sure).
Reverts commit 85a7890023 which
untrimmed the commits: while it's *probably* true that git and
github's APIs differ in their treatment of whitespace (in that git
pretty much always terminates the commit message with a newline while
github does not, as far as I understand, though I didn't really
validate it) the issue was that github also trims on *output* when
fetching over the API, something the fake did not do.
So rather than update the test data I should have fixed the fake, but
I failed to realise that at the time. I only realised when I decided
to re-run against github actual (something I rarely do anymore as it's
painfully slow) and it went on to choke on every message I'd updated.
The logging line was copied over from the github-api version, but it
was not correctly fixed up to match, leading to a lot of spam on
stderr when debug is enabled (aka spams journalctl on the production
server).
Splat the logging call out of `rebase` and into the various callers,
so they have access to the pr object to log it.
Forgot to bump the version when creating the migration. Also convert
the migration to a single sql query, although the migration will never
run because I ran the query manually to fix things up after finding
out the data was "dirty" since the new code (assuming only modern
statuses) was merged without running the migration.
Thankfully it looks like the impact was not too severe (because the
legacy statuses should only be present on very old commits / PRs), I
don't remember when I deployed the update but apparently just a pair
of PRs got affected, because their `previous_failure` was the old
style and thus broke the "new failure" check.
Forgot to deref the id of the staging we're trying to lock, so the
specific case where we start a freeze with a bump PR and an
outstanding staging in master would instantly blow up.
The low-level APIs used by the staging process don't do any merge
check, so because of the way Git works it's possible for them to merge
commits with content as empty commits, e.g. if something was merged
then backported and the backport was merged on top. This should
trigger a merge failure as we don't really want to merge newly empty
commits. This is a feature which some high level commands of git
support, kind-of, e.g. by default `git rebase --interactive` will ask
about newly empty commits.
Take care to allow merging already-empty commits, as these do have a
use for signaling, freezes, ....
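One way to express the distinction, as a sketch only: compare tree
oids before and after. The function and parameter names are
hypothetical, and the oids are treated as opaque strings:

```python
def is_newly_empty(commit_tree, parent_tree, merged_tree, target_tree):
    """True if a commit with content turned into an empty commit once merged.

    - commit_tree / parent_tree: trees of the original commit and its parent
    - merged_tree / target_tree: trees after and before merging onto the target
    """
    had_content = commit_tree != parent_tree  # the original changed something
    no_effect = merged_tree == target_tree    # but merging it changed nothing
    return had_content and no_effect
```

An already-empty commit has `commit_tree == parent_tree`, so it is
never flagged and can still be merged.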
Fixes #809
Prepares the possibility of either more direct communication with the
CI platform(s) or just assuming CI has gotten reliable enough and
colleagues intelligent enough that this is not an issue anymore
because they've stopped pushing empty branches (which we know is not
the case).
Fixes #806
During the 17.0 freezeathon, the freeze wizard blew up with
MergeError: merge-tree: {oid} - not something we can merge
Turns out when freezes were moved to local
(4d2c0f86e1) I forgot to fetch the heads
of the release and bump PRs into the local repo, so rebasing them atop
their branch would fail because the local repository would just not
find the object being rebased.
I had missed that case in testing as well, but in fairness even if I
had tried testing it I'd likely have missed it: implementation
limitations (shortcuts) of dummy central mean it currently ignores
what objects the client requests and bundles everything it can find
associated with the repository (meaning it sends the entire network).
This is not usually an issue because the test repos are pretty small,
but it means the client can have objects they should not because they
never requested them and might not even be supposed to be aware of
their existence.
Anyway solve by doing the obvious: fetch the heads of the release and
bump PRs at the same time we update the branch being forked off. Also
update the freeze tests to trigger the issue (by creating the release
/ bump PRs in different repos), and run the tests against github
actual to make sure we can actually see them fail (correctly, with the
merge error we expect, not via errors in the test), and that we do fix
them.
Fixes #821
Currently, once a source PR has been merged it's not possible to set
or update a limit, which can be inconvenient (e.g. people might have
forgotten to set it, or realise after the fact that setting one is not
useful, or might realise after the fact that they should *unset* it).
This PR relaxes that constraint (which is not just a relaxation as it
requires a bunch of additional work and validation), it should now be
possible to:
- update the limit to any target, as long as that target is at or
above the current forwardport tip's
- with the exception of a tip being forward ported, as that can't be
cancelled
- resume a forward port stopped by a previously set limit (just
increase it to whatever)
- set a limit from any forward-port PR
- set a limit from a source, post-merge
The process should also provide pretty chatty feedback:
- feedback on the source and closest root
- feedback about adjustments if any (e.g. setting the limit to B but
there's already a forward port to C, the 'bot should set the limit
to C and tell you that it happened and why)
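The adjustment case can be sketched as follows, assuming branches are
given as a list ordered from oldest to newest (the helper
`adjust_limit` and its return shape are hypothetical):

```python
def adjust_limit(branches, requested, tip):
    """Return `(limit, adjusted)` for a requested forward-port limit.

    The limit can not be set below the branch of the current
    forward-port tip, so it is raised to the tip when needed.
    """
    if branches.index(requested) < branches.index(tip):
        return tip, True
    return requested, False
```

With branches `A, B, C, D` and an existing forward port to `C`,
requesting `B` yields `C` plus a flag telling the caller that feedback
about the adjustment is in order.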
Fixes #506
If a branch `foo` is disabled, then `tmp.foo` and `staging.foo` become
unnecessary (with #247 fixed the tmp refs are not used for creating
stagings anymore, but for now they're still used for the "safety
dance" of merging a successful staging into the corresponding
mainline).
Fixes #605
Per the Github webhook documentation:
1. sha1 signatures are deprecated, github recommends sha256 (though
that's unlikely to be a concern anyway), and dummy-central supports
both so it should be no issue.
> If possible, we recommend that you use the x-hub-signature-256
> header for improved security.
2. Non-ascii secrets are supported and should be utf8-encoded to
compute signatures... that's not actually documented as github docs
only mention payload encoding but it seems to make sense anyway.
Also improve the warning message by replacing the signature (which is
useless) by the delivery id (which could allow introspecting the hook
or something).
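As an illustration of the two points above, a sha256 check with a
utf8-encoded secret might look like this. The function names are
hypothetical, and the handling of projects without a secret is
deliberately left out of the sketch:

```python
import hashlib
import hmac


def compute_signature(secret, body):
    """Signature in the `sha256=<hexdigest>` form of `X-Hub-Signature-256`.

    `secret` is text (possibly non-ascii) and is utf8-encoded,
    `body` is the raw request payload as bytes.
    """
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def signature_matches(secret, body, header):
    """Compare the received header to the computed signature in constant time."""
    expected = compute_signature(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), header.encode("utf-8"))
```

Both sides are encoded before comparison because
`hmac.compare_digest` only accepts non-ascii input as bytes.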
Currently a user is not notified that the parent of a detached PR
needs to be independently approved and may miss that information. Add
a notification to *that* PR as well.
Fixes #788
The github API has gotten a lot more constraining (with rate
restrictions being newly enforced or added somewhat out of nowhere),
and as importantly a lot less reliable. So move the staging process
off of github and locally, similar to the forward porting
process (whose repo cache is being reused for this).
Fixes #247
Probably less necessary than for the regular staging stuff, but might
as well while at it.
Requires updating one of the tests to generate a non-ff push, as
O_CREAT doesn't exist at the git level, and the client (and it is
client-side) only protects against force pushes. So there is no way to
trigger an issue with just the creation of the new branch, it needs to
exist *and point to a non-ancestor commit*.
Also remove a sleep in the ref update loop as there are no ref updates
anymore, until the very final sync via git.
NB: maybe it'd be possible to push both bump and release PRs together
for each repo, but getting which update failed in case of failure
seems difficult.
It has been a consideration for a while, but the pain of subtly
interacting with git via the ignominious CLI kept it back. Then ~~the
fire nation attacked~~ github got more and more tight-fisted (and in
some ways less reliable) with their API.
Staging pretty much just interacts with the git database, so it's both
a facultative github operator (it can just interact with git directly)
and a big consumer of API requests (because the git database endpoints
are very low level so it takes quite a bit of work to do anything
especially when high-level operations like rebase have to be
replicated by hand).
Furthermore, an issue has also been noticed which can be attributed to
using the github API (and that API's reliability getting worse): in
some cases github will fail to propagate a ref update / reset, so when
staging 2 PRs it's possible that the second one is merged on top of
the temporary branch of the first one, yielding a kinda broken commit
(in that it's a merge commit with a broken error message) instead of
the rebase / squash commit we expected.
As it turns out it's a very old issue but only happened very early so
was misattributed and not (sufficiently) guarded against:
- 41bd82244bb976bbd4d4be5e7bd792417c7dae6b (October 8th 2018) was
spotted but thought to be a mergebot issue (might have been one of
the opportunities where ref-checks were added though I can't find
any reference to the commit in the runbot repo).
- 2be25052e147b151d1d8a5bc73cceb351586ce03 (October 15th, 2019) was
missed (or ignored).
- 5a9fe7a7d05a9df7186072a7bffd60c6b428fd0e (July 31st, 2023) was
spotted, but happened at a moment where everything kinda broke
because of github rate-limiting ref updates, so the forensics were
difficult and it was attributed to rate limiting issues.
- f10d03bf0f2e8f88f62a5d8356b84f714196130f (August 24th, 2023) broke
the camel's back (and the head block): the logs were not too
interspersed with other garbage and pretty clear that github ack'd a
ref update, returned the correct oid when checking the ref, then
returned the wrong oid when fetching it later on.
No Working Copy
===============
The working copy turns out to not be necessary, the plumbing commands
we *need* work just fine on a bare repository.
Working without a WC means we had to reimplement the high level
operations (rebase) by hand much as we'd done previously, *but* we
needed to do that anyway as git doesn't seem to provide any way to
retrieve the mapping when rebasing/cherrypicking, and cherrypicking by
commit doesn't work well as it can't really find the *merge base* it
needs.
Forward-porting can almost certainly be implemented similarly (with
some overhead), issue #803 has been opened to keep track of the idea.
No TMP
======
The `tmp.` branches are no more, the process of creating stagings is
based entirely around oids, if staging something fails we can just
abandon the oids (they'll be collected by the weekly GC), we only
need to update the staging branches at the very end of the process.
This simplifies things a fair bit.
For now we have stopped checking for visibility / backoff as we're
pushing via git, hopefully it is a more reliable reference than the
API.
Commit Message Formatting
=========================
There's some unfortunate churn in the test, as the handling of
trailing newlines differs between github's APIs and git itself.
Fixes #247
PS: It might be a good idea to use pygit2 instead of the CLI
eventually, the library is typed which is nice, and it avoids
shelling out although that's really unlikely to be a major cost.
Necessary to create commits *as* the mergebot without going through
the github API. Copy of the improved version from forwardport. *Not*
an override, to avoid unnecessarily triggering one or the other which
is confusing and weird.
Move *almost* all the staging code to free functions, in a separate
module, and extensively typed.
The only bits which didn't move are:
- the entry point (the cron hook), because it has to be a model method
in order to be called
- the `_build_merge_message` method, because it needs to be
overridable
There's also a bit of an import mess, because the cron &
`_build_merge_message` need to call into the new module, but the new
module wants the types they belong to, so it's a bit circular.
If the stagings are going to be created locally (via a git working
copy rather than the github API), the mergebot part needs to have
access to the cache, so move the cache over. Also move the maintenance
cron.
In an extremely minor way, this prefigures the (hopeful) eventual
merging of the ~~planes~~ modules.
- add formatting for a bunch of backend objects
- add cross-links in order to use toplevel navigation between objects
e.g. project -> branch -> staging -> PR with breadcrumbs instead of
shitty dialog boxes
Relates to #802
When I updated the status storage (including `previous_failure`) for
some reason I didn't just migrate from the old to the new format, and
added bridge functions instead.
This is not really necessary (or useful), so convert all the legacy
data and remove the conversion helpers.
Relates to #802
Mostly a temporary safety feature after the events of 07-31: it's
still not clear whether that was a one-off issue or a change in
policy (I was not able to reproduce locally even doing several
set_refs a second) and the gh support is not super talkative, but it
probably doesn't hurt to commit the workaround until #247 gets
implemented.
On 2023-07-31, around 08:30 UTC, `set_ref` started failing, a lot
(although oddly enough not continuously), with the unhelpful message
that
> 422: Reference cannot be updated
This basically broke all stagings, until a workaround was implemented
by adding a 1s sleep before `set_ref` to ensure no more than 1
`set_ref` per second, which kinda sorta has been the github
recommendation forever but had never been an issue
before. Contributing to this suspicion is that in late 2022, the
documentation of error 422 on `PATCH git/refs/{ref}` was updated to:
> Validation failed, or the endpoint has been spammed.
Still would be nice if GH was clear about it and sent a 429 instead.
Technically the recommendation is:
> If you're making a large number of POST, PATCH, PUT, or DELETE
> requests for a single user or client ID, wait at least one second
> between each request.
So... actually implement that. On a per-worker basis as for the most
part these are serial processes (e.g. crons), we can still get above
the rate limit due to concurrent crons but it should be less likely.
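A per-worker throttle along those lines can be sketched as below. The
`Throttle` class is hypothetical, and the clock and sleep functions
are injectable only to keep the sketch testable:

```python
import time


class Throttle:
    """Ensure at least `delay` seconds elapse between two mutating requests."""

    def __init__(self, delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A single module-level instance, with `wait()` called before each POST,
PATCH, PUT or DELETE, gives the per-worker behaviour. It does nothing
about concurrent workers, as noted above.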
Also take `Retry-After` into account, can't hurt, though we're supposed
to retry just the request rather than abort the entire thing. Maybe a
future update can improve this handling.
Would also be nice to take `X-RateLimit` into account, although that's
supposed to apply to *all* requests so we'd need a second separate
timestamp to track it. Technically that's probably also the case for
`Retry-After`. And fixing #247 should cut down drastically on the API
calls traffic as staging is a very API-intensive process, especially
with the sanity checks we had to add, these days we might be at 4
calls per commit per PR, and up to 80 PRs/staging (5 repositories and
16 batches per staging), with 13 live branches (though realistically
only 6-7 have significant traffic, and only 1~2 get close to filling
their staging slots).
`/runbot_merge/stagings`
========================
This endpoint is a reverse lookup from any number of commits to a
(number of) staging(s):
- it takes a list of commit hashes as either the `commits` or the
`heads` keyword parameter
- it then returns the stagings which have *all* these commits as
respectively commits or heads, if providing all commits for a
project the result should always be unique (if any)
- `commits` are the merged commits, aka the stuff which ends up in the
actual branches
- `heads` are the staging heads, aka the commits at the tip of the
`staging.$name` branches, those may be the same as the corresponding
commit, or might be deduplicator commits which get discarded on
success
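The "all these commits" semantics amount to a subset test. A minimal
sketch over in-memory data (the `find_stagings` helper and the dict
standing in for the database are assumptions):

```python
def find_stagings(stagings, commits):
    """Return the ids of stagings which contain *all* the given hashes.

    `stagings` maps a staging id to the hashes attached to it
    (either its merged commits or its staging heads).
    """
    wanted = set(commits)
    return [sid for sid, hashes in stagings.items() if wanted <= set(hashes)]
```

Passing a single hash may match several stagings over time, while
passing every commit of a project should narrow it down to at most one.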
`/runbot_merge/stagings/:id`
============================
Returns a list of all PRs in the staging, grouped by batch (aka PRs
which have the same label and must be merged together).
For each PR, the `repository` name, `number`, and `name` in the form
`$repository#$number` get returned.
`/runbot_merge/stagings/:id1/:id2`
==================================
Returns a list of all the *successfully merged* stagings between `id1`
and `id2`, from oldest to most recent. Individual records have the
form:
- `staging` is the id of the staging
- `prs` is the contents of the previous endpoint (a list of PRs
grouped by batch)
`id1` *must* be lower than `id2`.
By default, this endpoint is inclusive on both ends, the
`include_from` and / or `include_to` parameters can be passed with the
`False` value to exclude the corresponding bound from the result.
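The bounds handling can be sketched like this, assuming staging ids
increase with time and ignoring the "successfully merged" filter (the
function name is hypothetical):

```python
def stagings_between(ids, id1, id2, include_from=True, include_to=True):
    """Return the staging ids within [id1, id2], oldest first."""
    if id1 >= id2:
        raise ValueError("id1 must be lower than id2")
    low = id1 if include_from else id1 + 1
    high = id2 if include_to else id2 - 1
    return sorted(i for i in ids if low <= i <= high)
```

For example, with ids 10 to 14, the range 11 to 13 returns 11, 12 and
13 by default, and only 12 when both bounds are excluded.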
Related to #768
`auto_session_tracking` causes issues when not specified on the super
old version of the client which is available on ubuntu.
Also disable tracing as it seems less useful than hoped for, and I've
not been using what's been collected so far.
Currently the heads of a staging (both staging heads and merged heads)
are just JSON data on the staging itself. Historically this was
convenient as the heads were mostly of use to the staging process, and
thus accessed directly through the staging essentially exclusively.
However this makes finding stagings from merged commits e.g. for
forensic research almost impossible, because querying based on
the *values* of a JSON map is expensive, and indexing it is difficult.
To make this use case more feasible, split the `heads` field into two
join tables, one for the staging heads and one for the merged heads,
this makes looking for stagings by commits much more
efficient (although the queries may not be trivial). Also add two
utility RPC methods, so that it's possible to query stagings
reasonably easily and efficiently based on a set of commits (branch
heads).
related to #768
Allow filtering stagings by state (success or failure), and provide a
control to explicitly update the staging date limit.
Should make it easier to drill through stagings when looking for
specific information.
Related to #751
Fix outstanding query to make a positive `state` filtering, instead of
negative, matching 3b52b1aace8674259812a76b1566260937dbcacb.
Also manually create a map of stagings (grouped by branch) sharing a
single prefetch set.
For odoo the mergebot home page has 12 branches in the odoo project
and 8 in spreadsheet, 6 stagings each. This means 120 queries to
retrieve all the heads (Odoo stagings have 5 heads and spreadsheet
have 1, but that seems immaterial).
By fixing `_compute_statuses` and creating a single prefetch set for
all stagings of all branches we can fetch all the commits in a single
query instead of 120.
- add support for authorship (not just approval)
- make display counts directly
- fix `state` filter: postgres can't do negative index lookups
- add indexes for author and reviewed_by as we look them up
- ensure we handle the entire source filtering via a single subquery
Closes #778
A few cases of conflict were missing from the provisioning
handler.
They can't really be auto-fixed, so just output a warning and ignore
the entry, that way the rest of the provisioning succeeds.
During the 16.3 freeze an issue was noticed with the concurrency
safety of the freeze wizard (because it blew up, which caused a few
issues): it is possible for the cancelling of an active staging to the
master branch to fail, which causes the mergebot side of the freeze to
fail, but the github state is completed, which puts the entire thing
in a less than ideal state.
Especially with the additional issue that the branch inserter has its
own concurrency issue (which maybe I should fix): if there are
branches *being* forward-ported across the new branch, it's unable to
see them, and thus can not create the now-missing PRs.
Try to make the freeze wizard more resilient:
1. Take a lock on the master staging (if any) early on, this means if
we can acquire it we should be able to cancel it, and it won't
suffer a concurrency error.
2. Add the `process_updated_commits` cron to the set of locked crons,
trying to read the log timeline it looks like the issue was commits
being impacted on that staging while the action had started:
REPEATABLE READ meant the freeze's transaction was unable to see
the update from the commit statuses, therefore creating a diverging
update when it cancelled the staging, which postgres then reported
as a serialization error.
I'd like to relax the locking of the cron (to just FOR SHARE), but I
think it would work, per postgres:
> SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as
> SELECT in terms of searching for target rows: they will only find
> target rows that were committed as of the transaction start
> time. However, such a target row might have already been updated (or
> deleted or locked) by another concurrent transaction by the time it
> is found. In this case, the repeatable read transaction will wait
> for the first updating transaction to commit or roll back (if it is
> still in progress). If the first updater rolls back, then its
> effects are negated and the repeatable read transaction can proceed
> with updating the originally found row. But if the first updater
> commits (and actually updated or deleted the row, not just locked
> it) then the repeatable read transaction will be rolled back with
> the message
This means it would be possible to lock the cron, and then get a
transaction error because the cron modified one of the records we're
going to hit while it was running: as far as the above is concerned
the cron's worker had "just locked" the row so it's fine to
continue. However this makes it more and more likely an error will be
hit when trying to freeze (to no issue, but still). We'll have to see
how that ends up.
Fixes #766 maybe
Currently sentry is only hooked from the outside, which doesn't
necessarily provide sufficiently actionable information.
Add a few hooks to (try and) report odoo / mergebot metadata:
- add the user to WSGI transactions
- add a transaction (with users) around crons
- add the webhook event info to webhook requests
- add a few spans to the long-running crons, when they cover multiple
units per iteration (e.g. a span per branch being staged)
Closes #544
- move sentry configuration and add exception-based filtering
- clarify and reclassify (e.g. from warning to info) a few messages
- convert assertions in rebase to MergeError so they can be correctly
logged & reported, and ignored by sentry, also clarify them
(especially the consistency one)
Related to #544
Largely informed by sentry,
- Fix an inconsistency in staging ref verification, `set_ref`
internally waits for the observed head to match the requested head,
but then `try_staging` would re-check that and immediately fail if
it didn't.
Because github is *eventually* consistent (hopefully) this second
check can fail (and is also an extra API call), breaking staging
unnecessarily, especially as we're still going to wait for the
update to be visible to git.
Remove this redundant check entirely, as github provides no way to
ensure we have a consistent view of anything, it doesn't have much
value and can do much harm.
- Add github request id to one of the sanity check warnings as that
could be a useful thing to send upstream, missing github request ids
in the future should be noted and added.
- Reworked the GH object's calls to be clearer and more coherent:
consistently log the same thing on all GH errors (if `check`),
rather than just on the one without a `check` entry.
Also remove `raise_for_status` and raise `HTTPError` by hand every
time we hit a status >= 400, so we always forward the response body
no matter what its type is.
- Try again to log the request body (in full as it should be pretty
small), also remove stripping since we specifically wanted to add a
newline at the start, I've no idea what I was thinking.
Fixes #735, #764, #544
Current system makes it hard to iterate feedback messages and make
them clearer, this should improve things a touch.
Use a bespoke model to avoid concerns with qweb rendering
complexity (we just want GFM output and should not need logic).
Also update fwbot test setup to always configure an fwbot name, in
order to avoid ping messages closing the PRs they're talking
about, that took a while to debug, and given the old message I assume
I'd already hit it and just been too lazy to fix. This requires
updating a bunch of tests as fwbot pings are sent *to*
`fp_github_name`, but sent *from* the reference user (because that's
the key we set).
Note: noupdate on CSV files doesn't seem to work anymore, which isn't
great. But instead set tracking on the template's templates, it's not
quite as good but should be sufficient.
Fixes #769
- currently disabling staging only works globally, allow disabling on
a single branch
- use a toggle
- remove a pair of tests which work specifically with `fp_target`,
can't work with `active` (probably)
- cleanup search of possible and active stagings, add relevant
indexes and use direct search of relevant branches instead of
looking up from the project
- also use toggle button for `active` on branches
- shitty workaround for upgrading DB: apparently mail really wants to
have a `user_id` to do some weird thing, so need to re-add it after
resetting everything
Fixes #727
- github logins are case-insensitive; while the db field is CI, the
dict in which partners are stored for matching is not, and the caller
may not preserve casing.
Thus it's necessary to check the casefolded database values against
casefolded parameters, rather than exactly.
- users may get disabled by mistake or when one leaves the project,
they may also get switched from internal to portal, therefore it is
necessary to re-enable and re-enroll them if they come back.
- while at it remove the user's email when they depart, as they likely
use an organisational email which they don't have access to anymore
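The casefolded matching from the first point can be sketched as
follows (the `match_partners` helper and the sample logins are
hypothetical):

```python
def match_partners(db_logins, requested):
    """Map each requested login to the stored login, ignoring case.

    Both sides are casefolded, so neither the database values nor the
    caller need to preserve the original casing.
    """
    by_login = {login.casefold(): login for login in db_logins}
    return {name: by_login.get(name.casefold()) for name in requested}
```

Logins which are not found map to `None`, leaving it to the caller to
create or re-enable the corresponding user.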
Side-note, but remove the limit on the number of users / partners
being created at once: because there are almost no modules in the
mergebot's instance, creating partner goes quite fast (compared to a
full instance), thus the limitation is almost certainly unnecessary
(creating ~300 users seems to take ~450ms).
Fixes #776
652b1ff9ae wanted to check if a request
was available, however it deref'd the `request` object without
checking it which is not correct: a `request` normally has an
`httprequest`, but the `request` itself might be missing if the
handler is called from e.g. a cron.
Fixes #739
The mismatch diff attribute contains values from the in-db object and
the github PR structure, some of which are explicitly *not*
strings (e.g. the squash flag, possibly the commits # in the future).
As a result, when the squash-flag of a PR differs from the actual the
formatting for diffing blows up, because difflib can't handle
non-strings.
Stringify values before passing them to `format_items`, this way the
string operations on names and values should work correctly.
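A sketch of the idea, using hypothetical helpers rather than the
actual mergebot functions: every value goes through `str` before
`difflib` sees it.

```python
import difflib


def stringify_items(items):
    """Render a mapping as sorted `name: value` lines, stringifying values."""
    return [f"{name}: {value!s}" for name, value in sorted(items.items())]


def diff_items(old, new):
    """Unified diff between two mappings whose values may not be strings."""
    return list(difflib.unified_diff(
        stringify_items(old), stringify_items(new),
        "database", "github", lineterm="",
    ))
```

A boolean such as the squash flag then shows up as an ordinary changed
line in the diff instead of breaking the formatting.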
The mergebot page has become a bit slow over the years, it is time to
make small optimisations to speed things up a little.
Note: all changes were applied by modifying the views or adding
indexes by hand. There is still room for improvement but it would need
more in-depth refactoring, mainly adding specialized computed fields to
enable better batching.
The first issue was using branch.staging_ids
branch.staging_ids.sorted(lambda s: s.staged_at, reverse=True)[:6]
The number of staging_ids is increasing and prefetching + sorting all
of them is slow.
The proposed solution is to replace it by a search, not ideal, a
specialized compute field may be a good idea, but this is a quick fix
that can be done editing a view.
branch.env['runbot_merge.stagings'].search([('target', '=', branch.id)],order='staged_at desc', limit=6)
Other changes are just index on critical columns.
Before changes, /runbot_merge page takes ~5s to load
After changes, /runbot_merge page takes ~1s to load
Small note: not 100% sure that runbot_merge.batch.target was useful
The loggers would only print the "tail" of the path, not including the
repo name, or the `/repos` prefix.
While this made logs shorter, it was not intentional and made
debugging some issues on endpoints harder than necessary as the calls
had to be adjusted mentally, which is completely unnecessary.
1cea247e6c tried to improve staging
checks to avoid staging PRs in the wrong state, however it had two
issues:
PR state
--------
The process would reset the PR's state to open, but unless the head
was being resync'd it wouldn't re-apply the statuses on the state,
leading to a PR with all-valid statuses, but a missing CI.
Message
-------
The message check didn't compose the PR message the same way PR
creation / update did (it did not trim the title and description
individually, only after concatenation), resulting in a
not-actually-existing divergence getting signaled in the case where
the PR title ends or the description starts with whitespace.
Expand relevant test, add a utility function to compose a PR message
and use it everywhere for coherence.
Also update the logging and reporting to show a diff of all the
updated items (hidden behind a `details` element).
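To illustrate the message divergence described above, here is a minimal
sketch of such a composition utility. The function names and the exact
separator between title and description are assumptions, not
necessarily what the mergebot uses:

```python
def make_message(title, description):
    """Compose a PR message, trimming each part *before* joining them."""
    title = title.strip()
    description = description.strip()
    if description:
        return f"{title}\n\n{description}"
    return title


def naive_message(title, description):
    """The diverging variant: only trims after concatenation."""
    return f"{title}\n\n{description}".strip()
```

With a title of `"Fix thing "` (trailing space), `make_message` drops
the space while `naive_message` keeps it in the middle of the result,
so comparing the two outputs reports a change which does not actually
exist. Using a single helper everywhere avoids that.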
If there are bump PRs anyway: the bump commits will cause the
forward-port of the staging to fail, so might as well clearly notify
everybody of the issue if there is a pending staging, and not waste
too much time waiting for a staging which can not succeed.
We could also cancel stagings when there's no bump PR, but it's not
clear that there's any reason to do so: if we didn't touch any master
branch, there's no reason for the staging to fail, or to otherwise
cancel it.
And obviously we can't have staged anything on the new branch so
there's nothing to cancel.
Part-Of: #718
I DECLARE BANKRUPTCY!!!
The previous implementation of labels lookup was really not
intuitive (it was just a char field, and matched labels by equality
including the owner tag), and was also full of broken edge
cases (e.g. traceback if a label matched multiple PRs in the same repo
because people reuse branch names).
Tried messing about with contextual `display_name` and `name_search`
on PRs but the client goes wonky in that case, and there is no clean
autocomplete for non-relational fields.
So created a view which reifies labels, and that can be used as the
basis for our search. It doesn't have to be maintained by hand, can be
searched somewhat flexibly, we can add new view fields in the future
if desirable, and it seems to work fine providing a nice
understandable UX, with the reliability of using a normal Odoo model
the normal way.
Also fixed the handling of bump PRs: clearing the entire field before
trying to update existing records (even with a link_to in between) is
clearly not to the web client's fancy, as re-selecting the current
label would just empty the thing entirely.
So use a two-step process slightly closer to the release PRs instead:
- first update or delete the existing bump PRs
- then add the new ones
The second part is because bump PRs are somewhat less critical than
release, so it can be a bit more DWIM compared to the more deliberate
process of release PRs where first the list of repositories involved
has to be set up just so, then the PRs can be filled in each of them.
Fixes #697
In order to support partial freezing, we need the ability to remove
some of the release lines for the repos we don't want to
freeze (e.g. because they don't use per-version branches).
This subsequently means we need the ability to *create* new lines if
we fucked up and removed one we should not have. Alternatively the
freeze meat-bot could cancel the entire thing and redo the wizard but
that seems harsh and mean, so don't do that.
Fixes 0f3647b7c7 which specifically
mentioned partial freeze then proceeded to make them entirely
impossible anyway.
Part of #718
Previously the mergebot would only sync the head commit, but synching
more is useful.
Also update the final sanity check on staging:
- as with check, update the message & target branch
- reset PR state and post a message when updating message instead of
doing so silently
Note: maybe only fail the staging if the message is updated *and*
relevant to staging (aka there's a merge method and it's not
`rebase`)?
Fixes #680
Was missing a logging message in the case where the current and sync'd
head are identical, which seems to occur from time to time but can
only be inferred (by seeing a sync event then nothing happening).
Add a logging warning (because it's a strange situation) in order to
explicitly note the issue.
Also make the sync logging messages more regular for clarity.
And add the delivery information (delivery id and user-agent) to the
event log, so it's easier to report issues to github.
After review, there doesn't seem to be a single integer field created
by the mergebot or forwardbot modules for which a `group_operator`
makes sense, let alone the default of `sum`.
So just disable them all.
Fixes #674
If commits have different authors (/ committers), the mergebot would
ask github to create a commit with an author (/ committer) of `None` /
`null`.
Apparently github really does not like that, and complains that
> nil is not an object
So remove the key entirely. Also fix the collision between `author`
and the `Co-Authored-By` list, which could lead to trying to set an
`author` of `[name, email]` instead of an object, which is also not
accepted by github.
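A minimal sketch of the "remove the key entirely" approach, assuming
commits are represented as plain dicts with `author` and `committer`
entries; the helper names are hypothetical and this is not the
mergebot's actual code:

```python
def common_identity(commits, key):
    """Return the identity shared by every commit for `key`, else None."""
    identities = {(c[key]["name"], c[key]["email"]) for c in commits}
    if len(identities) == 1:
        name, email = identities.pop()
        # Always an object, never a `[name, email]` list.
        return {"name": name, "email": email}
    return None


def commit_payload(message, tree, parents, commits):
    """Build the body of a commit-creation request."""
    payload = {"message": message, "tree": tree, "parents": parents}
    for key in ("author", "committer"):
        identity = common_identity(commits, key)
        # When identities differ, leave the key out instead of sending null.
        if identity is not None:
            payload[key] = identity
    return payload
```

When the key is absent, github picks the identity on its own, which is
what we want here since there is no single correct value to send.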
Fixes to the new bits which didn't really work:
- Fix borked view layout
- Add some help to the label fields
- Improve the resolution of label -> pr, and fix
- Also make the feature actually work for bump PRs
- Also make pr -> label work more reliably, now allows setting one PR
and getting the other PRs of the same batch (with the same label)
even without setting the label by hand
An autocomplete for the label has been considered but there is no
autocomplete field for char/selection fields, and it seems way too
much work for the utility:
- either create a brand new widget for 15.0 which will have to be
entirely rewritten in 16
- or create a transient model composed entirely of fake records to
provide an m2o to records which don't actually exist as label
bearers, which is also a lot of unnecessary work
NOTE: we want to support partial freezing (aka not freeze all the
branches because some of them have different release models
than others), so some project repos *not* having a release
PR is fine and normal, such a validation should not be added.
Fixes #664
In the case where the last branch (before the branch being frozen) is
disabled, the forwardport inserter screws up, and fails to correctly
create the intermediate forwardports from the new branch.
Also when disabling a branch, if there are FW PRs which target that
branch and have not been forward-ported further, automatically
forward-port them as if the branch had been disabled when they were
created, this should limit data loss and confusion.
Also change the message set on PRs when disabling a branch: because of
user conflicts in test setup, the message about a branch being
disabled would close the PRs, which would then orphan the followup,
leading to unexpected / inconsistent behaviour.
Fixes #665
The `statuses` field of a staging is always "live" because it's a
computed non-stored field. This is an issue when a staging finishes in
whatever state, then someone gets new statuses sent on one of the head
commits, either by rebuilding (part of) the staging or by just using
the same commit for one of their branches.
This makes the reporting of the main dashboard confusing, as one might
look at a failed staging and see all the required statuses
successful. It also makes post-mortem analysis more complicated as the
logs have to be trawled for what the statuses used to be (and they
don't always tell).
Solve this by storing a snapshot of the statuses the first time a
staging moves away from `pending`, whether it's to success or failure.
Fixes #667
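The status snapshot described above can be sketched in plain Python as
follows. The class, the `statuses_cache` attribute and the
`fetch_statuses` callable are hypothetical stand-ins for the Odoo model
and its computed field:

```python
import json


class Staging:
    def __init__(self, fetch_statuses):
        self.state = "pending"
        # Callable returning the *live* statuses of the head commits.
        self._fetch_statuses = fetch_statuses
        # Serialized snapshot, set once when leaving `pending`.
        self.statuses_cache = None

    @property
    def statuses(self):
        if self.statuses_cache is not None:
            return json.loads(self.statuses_cache)
        return self._fetch_statuses()

    def set_state(self, new_state):
        leaving_pending = self.state == "pending" and new_state != "pending"
        if leaving_pending and self.statuses_cache is None:
            # Freeze the statuses as they were when the staging finished.
            self.statuses_cache = json.dumps(self._fetch_statuses())
        self.state = new_state
```

Once the snapshot exists, statuses sent later on the same commits no
longer alter what the finished staging reports.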
In the branch lists of stagings, the timestamps in the left column and
the labels in the data cells can not be selected, because they're
buttons and anyway bootstrap explicitly sets:
    .btn {
        ...
        user-select: none;
    }
This can be frustrating, as timestamps and labels are useful
information to cross-reference, the ability to copy them is
convenient.
Custom-set the reverse via our own CSS.
Fixes #668
Partially revert 0c882fc0df
This turns out to be more bane than boon, as it breaks forward-port
chains and confuses people (despite the message). Update notification
message and don't close the PR anymore.
While at it, disable any pending staging on the branch being deactivated.
Fixes #654
af016f4239 did a half-assed job and
didn't fix the one test which actually checks the dashboard.
TBF I was in a bit of a hurry trying to make the mergebot work and be
presentable again, but still...
15.0 (or 14.0) dropped some of the BS3 (?) compatibility stuff, which
the mergebot was (apparently) relying on. This led to a visual
degradation as well as the frontend dropdown looking absolutely awful.
Fix that, on both style and templates.
15.0 (or 14.0) also dropped the bespoke responsive utility classes,
switch to bootstrap's.
Turns out I was running "15.0" except just on the runbot, enterprise
and community were still the 14.0 repos, so some of the changes were
missing.
While at it, bundle fixes for 3.10, as that's what Jammy needs, and
the mergebot/15.0 will be running on that.
Test seems to fail from time to time with one of the PRs getting
lost. Tried to move code around trying to investigate, can't repro
anymore. Possibly a race condition because the `to_pr` call was
performed too early, before the webhook had run (and thus before the
PR object had been created on the odoo side).
By moving the `to_pr` calls to after the cron run, we really ensure
the webhooks will have run.
Also update `to_pr` to ensure exactly one PR was retrieved, as
currently nothing is checked so we might have gotten none (yet), which
should be noticed early and clearly. In theory this also guards
against multiple PRs, but PRs should be unique on (repo, number).
- Some batches in a few stagings are apparently empty (e.g. batch
71771 from staging 32511), the listing was not resilient to such
issues.
Update the code to suppress the display of empty batches.
- The possibility of accessing `/runbot_merge/<id>` with a
non-existent branch had not been considered and triggered a
rendering error in the template.
Fixes #630, fixes #631
- override the staging's name_get to provide a slightly more useful
display_name (though still not great as the staging object remains
quite technical and inimical to human interaction)
- show individual PRs in a batch (as m2m tags) for readability
- update PR views to show the author and reviewer, except in the list
of delegations of users where it's a lot less useful
/cc #632
Currently deactivating a branch kinda leaves users in the dark, with
little way to know what has happened aside from inferring it from the
branch having disappeared from the main dashboard.
- surface the state of the branch in the PR dashboard (also surface
the target branch at all so users can see if their PR is targeted
as they expect as far as the mergebot is concerned)
- close & notify every PR to a branch being deactivated
- cancel any current staging to the branch (as a consequence of the
above)
Closes #632
Previously if a branch name could not be line-broken (because it was
full of underscores) it would break the layout by making "its"
staging's column much wider than the expected 1/2 / 1/4 /
1/6 (depending on window width), compressing the width of its
sibling's columns.
By disabling content-based width (only taking in account flex-width)
and setting overflow to hidden, the overlong branch names get cut off
instead.
- if stderr has not been captured, or has been explicitly rerouted to
STDOUT, `e.stderr` is `None` and the error reporting blows up (which
is inconvenient). Handle this case.
- handle the case where `fp_github_name` has not been configured (it's
not a super useful handling but meh, apparently git/hub doesn't
really care if there's a username when using API tokens)
- minor improvement to the refline parser: per-spec, the trailing
newline is optional, so don't fail if it's missing
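As an illustration of the first item, a minimal sketch of error
reporting which tolerates `e.stderr` being `None`. The
`describe_failure` helper is hypothetical, only
`subprocess.CalledProcessError` and its attributes come from the
standard library:

```python
import subprocess


def describe_failure(e):
    """Format a CalledProcessError, skipping streams which were not captured."""
    parts = [f"command {e.cmd!r} failed with exit code {e.returncode}"]
    for name, stream in (("stdout", e.stdout), ("stderr", e.stderr)):
        if stream is None:
            # Not captured, or (for stderr) merged into stdout via STDOUT.
            continue
        if isinstance(stream, bytes):
            stream = stream.decode(errors="replace")
        parts.append(f"{name}:\n{stream}")
    return "\n".join(parts)
```

For instance, with `subprocess.run(..., stdout=PIPE, stderr=STDOUT,
check=True)` the error output ends up in `e.stdout` and `e.stderr` is
`None`, so calling `.decode()` on it unconditionally would fail.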
The previous version of the code assumed `pr['body']` is always a
string, which is not correct: when the PR body is emptied, the body
itself is removed (its value is `None`).
Add a case for this in the PR edit test, and avoid blowing up (or
adding empty newlines) when the PR body is empty. For PR creation this
issue was fixed in c2db5659d8 but
apparently I missed that the exact same issue occurs just a few lines
above.
Also turns out github does *not* send change information when the body
is updated from (or to?) `None`, so don't even bother with that, just
check every time if the overall message has been updated.
Fixes #629
Stop *staging* release PRs: they are normally fairly simple and should
not fail their staging outside of unreliable tests (or possibly a few
edge cases e.g. forgot one version change thing), however staging them
creates the possibility of a "version hole" on the release branch
which is undesirable.
Instead, immediately and unconditionally push the release commits onto
the newly created branches, if there are things which don't work they
can be fixed afterwards (and the process refined, maybe).
Also add the same feature for *bump* PRs, with the difference that the
bump PRs are not created / requested by default (they have to be opted
in individually).
For convenience, add a feature which automatically finds the PRs via
inputting the label (not really tested yet).
Closes #603
Because the searching of the PR occurs *right* after the PR was
created on the server, despite the additional operations (status,
approval) it's apparently possible for the lookup of the new PR to
occur about the same time the PR is being created, kinda, maybe?
On DS it triggers very reliably for every PR but the first. By moving
the retrieval after the repo timeout & we've run the crons, we near
guarantee the PRs are visible (it's possible for things to fail on
grounds of github reliability, but then they'd have failed *even more*
before).
- add a logging entry for PR updates
- change the generic log entry to log the sender (of the event) rather
than the PR author
- fix the post-facto PR loader to more systematically and reliably set
a `sender`
It's likely that the PR pages are seen more commonly than the
dashboard by most users, so add alerts there in case users wonder
what's happening.
Fixes #580
The log message only indicated whether the PR was squashed or not, but
that's not actually useful.
Improve the message to log the actual merge method, for
information. The old "squash" (aka squash flag set and no merge
method, since an actual squash merge method was reintroduced a while
ago) has been renamed to "single" for the purpose of this display.
Before this, until the first status for a required context the status
would appear as pending, but would behave oddly (e.g. not clickable).
Update the style of such statuses for clarity:
- use a light background to show them as inactive
- use the `wait` cursor to show their status as oddball (and not clickable)
Setting this styling on the link (or even `li`) doesn't seem to work,
so set it on the `ul`, the actual active links will set the relevant
"active" cursor instead, which seems to work fine.
While at it, extract the status menu to its own template and unify the
disparate bits, mainly in that both the main dashboard and the
per-branch list display the staging instant in UTC on hover: before,
the main dashboard would display a relative delta and provide the
UTC-formatted instant on hover, but the branch would only show a zoned
ISO-8601 instant.
While adjusting is easy, it's unnecessary: we can easily provide the UTC
staging instant there.
Old messages were quite inconsistent in their pinging of the PR author
and reviewer.
Reviewed messages (probably missed some but...) and tried to more
consistently ping when the feedback requires some sort of action in
order to proceed.
Fixes #592
A few fixes and improvements after testing the feature:
- ensure the provisioned users are created as internal (not portal)
- assume oauth is installed and just crash if it's not
- handle a user not having an email (ignore)
- return value from json handler, otherwise JsonRequest sends no
payload which is *weird*
Stagings would be cancelled automatically if the PR's commits were
updated, but not if the target (base) was changed, even though that
has a drastic impact on staging.
Add hooks to unstage PRs if their base is updated, or if their message
is updated and relevant to staging (merge or rebase-merge methods).
Fixes #604
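The unstaging rule described above can be sketched as a small
predicate. The `changes` mapping mirrors the `changes` object of
github's `edited` pull request events; the function name and the
merge method identifiers are assumptions:

```python
# Merge methods for which the PR message is relevant to the staging, per
# the description above (identifiers are assumed).
MESSAGE_SENSITIVE_METHODS = {"merge", "rebase-merge"}


def should_unstage(changes, merge_method, message_changed):
    """Decide whether an edited PR must be removed from its staging."""
    if "base" in changes:
        # Retargeting the PR always invalidates the staging.
        return True
    # A message update only matters if the message ends up being used.
    return message_changed and merge_method in MESSAGE_SENSITIVE_METHODS
```

`message_changed` is passed in separately rather than derived from
`changes`, so the caller can compare the composed messages itself
instead of relying on what the event reports.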
Extract the creation of a PR link (to github) as a dedicated
template for easier updates, and to use `display_name`
everywhere (instead of reimplementing it by name).
Also implement support for repo-level groups setting for information
hiding.
Fixes #590