When an fw batch fails, log a message to its chatter (so the reason
for the failure doesn't necessarily have to be hunted down in the
logs, although depending on the quality of the error that might still
be an issue).
Also if a batch keeps failing (fails for more than a day at a retry
every hour -- increased from a previous 30mn), stop retrying it and
flag it in the list view, it's clearly not going to go through.
This is because we hit an issue with a completion fw batch created on
January 27th which kept failing until it was deleted on February
10th. Thankfully it failed on `git push` and git operations apparently
are not rate limited at all, but still it's not great stewartship to
keep trying a forwardport which keeps failing. Retrying in case of
transient failure makes sense, but after 24 attempts over a day it's
either not transient, or it's not working because github is down and
hammering it won't help.
Previously the alert would show the total number of outstanding
forward ports, which most people don't care about (if they're even
logged which probably isn't the case), so that was mostly useful
for... me.
Add the current user's tally to the message as well. If the user is
logged. Which is always, because the alert only shows up if the user
has read access to PRs.
Fixes#801
Since b45ecf08f9 forwardport batches
which fail have a delay set in order to avoid spamming. However that
delay was not displayed anywhere, which made things confusing as the
batch would not get run even after creating new triggers.
Show the delay if it's set (to a value later than now), as a relative
delta for clarity (as normally the delay is in minutes so a full blown
date is difficult to read / aprehend), and allow viewing and setting
it in the form view.
Fixes#982
- reduce font-size and bold-ness a hair as it causes issues in the
backend
- remove font adjustment on root object
- add `text-bg-primary` in the oustanding FP view, apparently BS5 does
not do anything like that by default when setting `bg-primary` for
some reason so the current team or user appears in black on dark
blue leading to sub-par readability
These are pretty simple to convert as they are straightforward: an
item is added to a work queue (table), then a cron regularly scans
through the table executing the items and deleting them.
That means the cron trigger can just be added on `create` and things
should work out fine.
There's just two wrinkles in the port_forward cron:
- It can be requeued in the future, so needs a conditional trigger-ing
in `write`.
- It is disabled during freeze (maybe something to change), as a
result triggers don't enqueue at all, so we need to immediately
trigger after freeze to force the cron re-enabling it.
Mergebot / forwardport crons need to run in a specific ordering in
order to flow into one another correctly. The default ordering being
unspecified, it was not possible to use the normal cron
runner (instead of the external driver running crons in sequence one
at a time). This can be fixed by setting *sequences* on crons, as the
cron runner (`_process_jobs`) will use that order to acquire and run
crons.
Also override `_process_jobs` however: the built-in cron runner
fetches a static list of ready crons, then runs that.
This is fine for normal situation where the cron runner runs in a loop
anyway but it's any issue for the tests, as we expect that cron A can
trigger cron B, and we want cron B to run *right now* even if it
hadn't been triggered before cron A ran.
We can replace `_process_job` with a cut down version which does
that (cut down because we don't need most of the error handling /
resilience, there's no concurrent workers, there's no module being
installed, versions must match, ...). This allows e.g. the cron
propagating commit statuses to trigger the staging cron, and both will
run within the same `run_crons` session.
Something I didn't touch is that `_process_jobs` internally creates
completely new environments so there is no way to pass context into
the cron jobs anymore (whereas it works for `method_direct_trigger`),
this means the context values have to be shunted elsewhere for that
purpose which is gross. But even though I'm replacing `_process_jobs`,
this seems a bit too much of a change in cron execution semantics. So
left it out.
While at it tho, silence the spammy `py.warnings` stuff I can't do
much about.
Previously it would count the number of source PRs with outstanding
forward ports, which is not the count from the home page so that was
confusing.
Also add counts next to the groups, so teams can be identified at a
glance.
And finally outline the current user in the list, so they can find
themselves faster when they're not one of the top entries.
Displays the entire batch set as a table, along both
repository (linked PRs) and branch (forward ports). Should provide a
much more complete overview.
Adds a copy of the dashboard as a raster render, to link from the PR:
as usual SVG is shit, content-based viewboxes are hell and having to
duplicate the entire CSS because `<img/>`-linked CSS can't run is
gross. And there's no payoff since the image is not interactible
anyway.
Performing manual ad-hoc table rendering via pillow is not
significantly worse, it works fine and it's possible to do *really*
good conditional request handling (hopefully) because I've basically
got all the information I need right here.
In fact it might make sense to upgrade the regular HTML page with
similar conditional request handling, at least for the last-update
bit if not the etag.
Fixes #771,fixes #770
This commit revisits the commands set in order to make it more
regular, and limit inconsistent command-sets, although it includes
pseudo-command aliases for common tasks now removed from the core set.
Hard Errors
===========
The previous iteration of the commands set would ignore any
non-command term in a command line. This has been changed to hard
error (and ignoring the entire thing) if any command is unknown or
invalid.
This fixes inconsistent / unexpected interpretations where a user
sends a command, then writes a novel on the same line some words of
which happen to *also* be commands, leading to merge states they did
not expect. They should now be told to fuck off.
Priority Restructuring
----------------------
The numerical priority system was pretty messy in that it confused
"staging priority" (in ways which were not entirely straightforward)
with overrides to other concerns.
This has now being split along all the axis, with separate command
subsets for:
- staging prioritisation, now separated between `default`, `priority`,
and `alone`,
- `default` means PRs are picked by an unspecified order when
creating a staging, if nothing better is available
- `priority` means PRs are picked first when staging, however if
`priority` PRs don't fill the staging the rest will be filled with
`default`, this mode did not previously exist
- `alone` means the PRs are picked first, before splits, and only
`alone` PRs can be part of the staging (which usually matches the
modename)
- `skipchecks` overrides both statuses and approval checks, for the
batch, something previously implied in `p=0`, but now
independent. Setting `skipchecks` basically makes the entire batch
`ready`.
For consistency this also sets the reviewer implicitly: since
skipchecks overrides both statuses *and approval*, whoever enables
this mode is essentially the reviewer.
- `cancel` cancels any ongoing staging when the marked PR becomes
ready again, previously this was also implied (in a more restricted
form) by setting `p=0`
FWBot removal
=============
While the "forwardport bot" still exists as an API level (to segregate
access rights between tokens) it has been removed as an interaction
point, as part of the modules merge plan. As a result,
fwbot stops responding
----------------------
Feedback messages are now always sent by the mergebot, the
forward-porting bot should not send any message or notification
anymore.
commands moved to the merge bot
-------------------------------
- `ignore`/`up to` simply changes bot
- `close` as well
- `skipci` is now a choice / flag of an `fw` command, which denotes
the forward-port policy,
- `fw=default` is the old `ci` and resets the policy to default,
that is wait for the PR to be merged to create forward ports, and
for the required statuses on each forward port to be received
before creating the next
- `fw=skipci` is the old `skipci`, it waits for the merge of the
base PR but then creates all the forward ports immediately (unless
it gets a conflict)
- `fw=skipmerge` immediately creates all the forward ports, without
even waiting for the PR to be merged
This is a completely new mode, and may be rather broken as until
now the 'bot has always assumed the source PR had been merged.
approval rework
---------------
Because of the previous section, there is no distinguishing feature
between `mergebot r+` = "merge this PR" and `forwardbot r+` = "merge
this PR and all its parent with different access rights".
As a result, the two have been merged under a single `mergebot r+`
with heuristics attempting to provide the best experience:
- if approving a non-forward port, the behavior does not change
- else, with review rights on the source, all ancestors are approved
- else, as author of the original, approves all ancestors which descend
from a merged PR
- else, approves all ancestors up to and including the oldest ancestor
to which we have review rights
Most notably, the source's author is not delegated on the source or
any of its descendants anymore. This might need to be revisited if it
provides too restrictive.
For the very specialized need of approving a forward-port *and none of
its ancestors*, `review=` can now take a comma (`,`) separated list of
pull request numbers (github numbers, not mergebot ids).
Computed State
==============
The `state` field of pull requests is now computed. Hopefully this
makes the status more consistent and predictable in the long run, and
importantly makes status management more reliable (because reference
datum get updated naturally flowing to the state).
For now however it makes things more complicated as some of the states
have to be separately signaled or updated:
- `closed` and `error` are now separate flags
- `merge_date` is pulled down from forwardport and becomes the
transition signal for ready -> merged
- `reviewed_by` becomes the transition signal for approval (might be a
good idea to rename it...)
- `status` is computed from the head's statuses and overrides, and
*that* becomes the validation state
Ideally, batch-level flags like `skipchecks` should be on, well, the
batch, and `state` should have a dependency on the batch. However
currently the batch is not a durable / permanent member of the system,
so it's a PR-level flag and a messy pile.
On notable change is that *forcing* the state to `ready` now does that
but also sets the reviewer, `skipchecks`, and overrides to ensure the
API-mediated readying does not get rolled back by e.g. the runbot
sending a status.
This is useful for a few types of automated / programmatic PRs
e.g. translation exports, where we set the state programmatically to
limit noise.
recursive dependency hack
-------------------------
Given a sequence of PRs with an override of the source, if one of the
PRs is updated its descendants should not have the override
anymore. However if the updated PR gets overridden, its descendants
should have *that* override.
This requires some unholy manipulations via an override of `modified`,
as the ORM supports recursive fields but not recursive
dependencies (on a different field).
unconditional followup scheduling
---------------------------------
Previously scheduling forward-port followup was contigent on the FW
policy, but it's not actually correct if the new PR is *immediately*
validated (which can happen now that the field is computed, if there
are no required statuses *or* all of the required statuses are
overridden by an ancestor) as nothing will trigger the state change
and thus scheduling of the fp followup.
The followup function checks all the properties of the batch to port,
so this should not result on incorrect ports. Although it's a bit more
expensive, and will lead to more spam.
Previously this would not happen because on creation of a PR the
validation task (commit -> PR) would still have to execute.
Misc Changes
============
- If a PR is marked as overriding / canceling stagings, it now does
so on retry not just when setting initially.
This was not handled at all previously, so a PR in P0 going into
error due to e.g. a non-deterministic bug would be retried and still
p=0, but a current staging would not get cancelled. Same when a PR
in p=0 goes into error because something was failed, then is updated
with a fix.
- Add tracking to a bunch of relevant PR fields.
Post-mortem analysis currently generally requires going through the
text logs to see what happened, which is annoying.
There is a nondeterminism / inconsistency in the tracking which
sometimes leads the admin user to trigger tracking before the bot
does, leading to the staging tracking being attributed to them
during tests, shove under the carpet by ignoring the user to whom
that tracking is attributed.
When multiple users update tracked fields in the same transaction
all the changes are attributed to the first one having triggered
tracking (?), I couldn't find why the admin sometimes takes over.
- added and leveraged support for enum-backed selection fields
- moved variuous fields from forwardport to runbot_merge
- fix a migration which had never worked and which never run (because
I forgot to bump the version on the module)
- remove some unnecessary intermediate de/serialisation
fixes#673, fixes#309, fixes#792, fixes#846 (probably)
The github API has gotten a lot more constraining (with rate
restrictions being newly enforced or added somewhat out of nowhere),
and as importantly a lot less reliable. So move the staging process
off of github and locally, similar to the forward porting
process (whose repo cache is being reused for this).
Fixes#247
If the primary email is made public, it is returned directly as part
of the /users endpoint, in which case we don't need to fetch it via
/user/emails.
Also improve error messages, and fix the incorrect checks on the
existence of the github name and email. And allow manually updating
both via the project form.
If the stagings are going to be created locally (via a git working
copy rather than the github API), the mergebot part needs to have
access to the cache, so move the cache over. Also move the maintenance
cron.
In an extermely minor way, this prefigures the (hopeful) eventual
merging of the ~~planes~~ modules.
In 81ce4ea02b the delta for PRs being
listed in the `/forwardport/outstanding` page was increased from 3
days to 7 (1 week), however the warning box in the home page still
used the old cutoffs leading to
An inconsistency between the two and an effective severe overcounting,
as the reason why the cutoff was increased to a week is forward ports
can take a while or the author / reviewer can be a touch busy at end
of week, so 3~4 days are routine when a PR is merged on thursday or
friday (and even worse if there's bank holidays in the mix).
Fix outstanding query to make a positive `state` filtering, instead of
negative, matching 3b52b1aace8674259812a76b1566260937dbcacb.
Also manually create a map of stagings (grouped by branch) sharing a
single prefetch set.
For odoo the mergebot home page has 12 branches in the odoo project
and 8 in spreadsheet, 6 stagings each. This means 120 queries to
retrieve all the heads (Odoo stagings have 5 heads and spreadsheet
have 1, but that seems immaterial).
By fixing `_compute_statuses` and creating a single prefetch set for
all stagings of all branches we can fetch all the commits in a single
query instead of 120.
- add support for authorship (not just approval)
- make display counts directly
- fix `state` filter: postgres can't do negative index lookups
- add indexes for author and reviewed_by as we look them up
- ensure we handle the entire source filtering via a single subquery
Closes#778
- currently disabling staging only works globally, allow disabling on
a single branch
- use a toggle
- remove a pair of tests which work specifically with `fp_target`,
can't work with `active` (probably)
- cleanup search of possible and active stagings, add relevant
indexes and use direct search of relevant branches instead of
looking up from the project
- also use toggle button for `active` on branches
- shitty workaround for upgrading DB: apparently mail really wants to
have a `user_id` to do some weird thing, so need to re-add it after
resetting everything
Fixes#727
Currently, if a PR forward-port PR gets detached the reason for it is
not always obvious, and may have to be hunted in the logs or in
"sibling" PRs.
By writing a forward port reason (hopefully) ever time we detach a PR,
and displaying that reason in the form and dashboard, the
justification should be a lot more obvious.
Fixes#679
It's almost certainly not useful, save as a minor convenience for
tests: decorrelating the branch sequence and the fp sequence seems
like it would only be extremely confusing, and on the mergebot all the
fp_sequence values are set to the default while the sequence values
are set to something useful and sensible (kinda).
Fixes#584
The current system makes / lets GC run during fetching. This has a few
issues:
- the autogc consumes resources during the forward-porting
process (not that it's hugely urgent but it seems unnecessary)
- the autogc commonly fails due to the combination of large repository
(odoo/odoo) and low memory limits (hardmem for odoo, which get
translated into soft ulimits)
As a result, the garbage collection of the repository sometimes stops
entirely, leading to an increase in repository size and a decrease in
performances.
To mitigate this issue, disable the automagic gc and maintenance
during normal operation, and instead add a weekly cron which runs an
aggressive GC with memory limits disabled (as far as they can get, if
the limits are imposed externally there's nothing to be done).
The maintenance is implemented using a full lockout of the
forward-port cron and an in-place GC rather than a copy/gc/swap, as
doing this maintenance at the small hours of the week-end (sat-sun
night) seems like a non-issue: currently an aggressive GC of odoo/odoo
(using the default aggressive options) takes a total of 2:30 wallclock
(5h user) on a fairly elderly machine (it's closer to 20mn wallclock
and 2h user on my local machine, also turns out the cache repos are
kinda badly configured leading to ~30% more objects than necessary
which doesn't help).
For the record, a fresh checkout of odoo/odoo right now yields:
| Overall repository size | |
| * Commits | |
| * Count | 199 k |
| * Total size | 102 MiB |
| * Trees | |
| * Count | 1.60 M |
| * Total size | 2.67 GiB |
| * Total tree entries | 74.1 M |
| * Blobs | |
| * Count | 1.69 M |
| * Total size | 72.4 GiB |
If this still proves insufficient, a further option would be to deploy
a "generational repacking" strategy:
https://gitlab.com/gitlab-org/gitaly/-/issues/2861 (though apparently
it's not yet been implemented / deployed on gitlab so...).
But for now we'll see how it shakes out.
Close#489
It's likely that the PR pages are seen more commonly than the
dashboard by most users, so add alerts there in case users wonder
what's happening.
Fixes#580
This is an important bit of information but it was not visible without
going into the backend.
`user-select-none` doesn't work in BS3 but that way it'll be ready for
an eventual update. Currently when hovering the badge the cursor
switches to text selection, and the text is selectable, which is
useless.
Complements 4e235a2 and finishes the fixes for #617
- code in the various menus added over time through the UI (queues,
configuration, ...)
- update / improve PR layout a tick
- fix "outstanding forward ports" count on the dashboard
- improve hover title / help on dashboard
- add date of last modification (usually date of success / failure)
- make casing more coherent (everything lowercase)
- add explicit note that UTC date on staged at label is staged at datetime
- rediscover yet again that the staging information is when hovering
on the staging *except the staged at label*
- improve `PullRequest.unstage` to always insert the PR at the start of the
reason when cancelling the staging, for clarity / traceability
Closes#560, closes#609
* Adds a changelog page, linked from the main, with content
automatically loaded from the source. To avoid conflicts, each entry
is its own file and entries are grouped by the month during which
the update will (probably) be deployed
* The last group (most likely "last update") doesn't have a title, the
rest do.
* Add changelog entries from the last update so it's not too empty.
* Also update the layout for the alerts a bit: remove bottom margin to
reduce loss of whitespace.
The list of outstanding forwardports was pretty messy as the ordering
was unclear and there was little way to really drill into the thing.
* Shows outstanding forward ports sorted by merged date ascending, the
oldest-merged PRs are the ones most in need of fixing while PRs
which were only just merged can safely be ignored.
* List reviewers with outstanding forward-ports, allow filtering by
clicking on their name, allow deseleting through the subtitle of the
page.
* Don't display reviewer in list when page is already filtered by
reviewer.
Also improve PR page a bit:
* Add reviewer.
* Add direct link to backend (closes#524).
Closes#529
Though the forwardport posts regular reminders that an fw is outdated,
it can be easy to miss for the non-subject (and apparently the
subjects often just ignore the information entirely).
Add a few relevant links there:
* on PR pages, add a link to either the source or the
forward-ports (if applicable), as well as the merge date
* add a new page which lists all the PRs with outstanding
forwardports, as well as the forwardports in question
Fixes#474
Genericise runbot_merge's tagging (move states to the "UI" but only
store / manage actual tags), and remove forwardport.tagging as it's
now redundant.
Closes#232
During freezes it can be useful to notify viewers that nothing is
going to forward port or merge for a while, and that this is
intentional (not something that's broken).
Fixes#307
If a PR is *merged*, enqueue it for deletion (with a 2 weeks delay).
Mainly to avoid FW branches staying around long after they've been
merged (possibly eventually closed?), will also clean up regular
merged branches, including historical merges forgotten by their
author.
Fixes#230
When posting a reminder that there are open / waiting forward ports on
a source PR, also post *which* PRs those are.
While at it, move the cron code in a proper python file (so we can use
stuff from odoo.tools), and fix display_name so we can straight use
display_name as a github ref' ({owner}/{repo}#{number}). This impacts
log-grepping but it seems like an improvement nonetheless.
Closesodoo/runbot#228
Having all the feedback be sent by the mergebot user (github_token) is
confusing. Add a way to specify which field of project should be used to
source the token used when sending feedback.
Fixes#190
* don't warn on every PR on the dot every week, instead wait for the
PRs to be "sufficiently old" (at least 3 days)
* after discussion with bugfix, the reminder ping should be sent every
day following the PR being "old enough"
* run the cron every day instead of every week
* add an override context key for test purposes
Closes#198
* Cherrypicking is handrolled because there seems to be no easy way to
programmatically edit commit messages during the cherrypicking
sequence: `-n` basically squashes all commits and `-e` invokes a
subprocess. `-e` with `VISUAL=false` kinda sorta works (in that it
interrupts the process before each commit), however there doesn't
seem to be clean status codes so it's difficult to know if the
cherrypick failed or if it's just waiting for a commit of this step.
Instead, cherrypick commits individually then edit / rewrite their
commit messages:
* add a reference to the original commit
* convert signed-off-by to something else as the original commit was
signed off but not necessarily this one
* Can't assign users when creating PRs: only repository collaborators
or people who commented on the issue / PR (which we're in the
process of creating) can be assigned.
PR authors are as likely to be collaborators as not, and we can have
non-collaborator reviewers. So pinging via a regular comment seems
less fraught as a way to notify users.