There is no check or unique index, so nothing actually prevents having
multiple fw tasks for the same batch.
In that case, if we try to update the limit the handler will crash
because the code assumes a single forwardport task per batch.
Triggered by nle on odoo/odoo#201394 probably while trying to recover
from the incident of March 13/14 (which was not going to work as the
forwardport queue itself was hosed, see two commits above for the
reason).
The incident of Feb 13/Feb 14 was that the branch
update (`refs/heads/*:refs/heads/*`) failed, but the old message
assumes those always succeed and its the retrieval of the head which
fails.
Don't make assumptions in the error message, and dump the error from
git.
The queueing system clearly tried to mitigate the time delay from the
old cron style while trying to make things safe-ish. However this
probably is unnecessary (and might be harmful) thanks to cron
triggers: instead of processing a batch of records from the queue we
can just process one record, then trigger the cron, until we don't
find any record anymore. This means every cron run is much
shorter (doesn't try to process 10~100 elements per loop) and should
be way more reliable as they're all isolated, with some overhead to
pay. A cron with a long queue also should not lock up a worker for
quite as long.
This means on server shutdown, the crons should just try to finish
their current item (which hopefully doesn't take too long even for a
multi-stage update) and once that's done allow for the cron worker to
see the shutdown request.
Also remove the mostly unnecessary exclusive lock on queue records:
the cron system itself guarantees a given cron job only gets run by
one worker at a time, so there should be no concurrent access to a
job (except for trying to unlink a previously enqueued record I
guess).
Should limit the risks of #1085 repeating assuming it was caused by an
abrupt termination after 90s.
Also reduce the grace period for merged PR branches to 1 week (from
2), and go through the on-disk repository instead of the github API.
Technically it might even be possible to do this un bulk, though that
seems annoying in case of failure (e.g. because somebody else deleted
the branch previously).
Fixes#1082
Partial mitigation of #1065, not complete by any means. Avoiding
updating a local ref during staging is probably the most important bit
here, we're staging on commits (into a new ref) anyway so it should
not be needed, and that avoids conflicts between the staging cron and
e.g. the forwardport cron.
`groupby` returns an iterator of `(key, list[value])`, not a
recordset. So `BaseModel.__or__` is invalid.
I missed that bit because I didn't bother writing a test where a PR
has more than 6 months and triggers generating / sending emails, but
turns out we do have those in the production database, and as a result
the mergebot hasn't sent an outstanding fw notification since I
deployed 892fce4ddf in early
february (oops).
Thanks to @clbr-odoo for the notification / report of the issue.
This can yield odd effects when playing around e.g. opening freeze
wizards then cancelling them. There is a check that there are no
wizards left, but if a freeze is performed then an other is created
and cancelled it will immediately re-enable forward ports,
unexpectedly.
Since forever, if a PR gets added to an existing forward ported batch
it will be forward ported immediately, leading to a forwardport having
a source with no merge date.
Since the addition of the skipmerge feature, it's even more possible
than before.
As a result, the `outstanding` listing should not assume the
merge_date of a source is always set.
Hit issues where this didn't seem to work after deactivating 17.2 and
trying to recover from the autoport being broken (see previous
commits), but apparently this says everything works just fine so
╮( ̄▽ ̄"")╭
Fixes#1052
In the case where multiple batch completions are triggered (see 3
commits earlier) one of the issues is that the "already ported" check
is incorrect, as it checks the (source, target) pair against the PR
used to trigger the completion check. That PR *should* be the one PR
which was just added to a batch, but if one of its descendants is used
to re-trigger a port then the dupe check fails and we try to port the
thing again.
Remember to deref' the subject's source in order to perform the dupe
check, as well as set the source for the forwardport we create.
Fixes#1052
When an fw batch fails, log a message to its chatter (so the reason
for the failure doesn't necessarily have to be hunted down in the
logs, although depending on the quality of the error that might still
be an issue).
Also if a batch keeps failing (fails for more than a day at a retry
every hour -- increased from a previous 30mn), stop retrying it and
flag it in the list view, it's clearly not going to go through.
This is because we hit an issue with a completion fw batch created on
January 27th which kept failing until it was deleted on February
10th. Thankfully it failed on `git push` and git operations apparently
are not rate limited at all, but still it's not great stewartship to
keep trying a forwardport which keeps failing. Retrying in case of
transient failure makes sense, but after 24 attempts over a day it's
either not transient, or it's not working because github is down and
hammering it won't help.
This is a bit of an odd case which was only noticed because of
persistent forwardport.batches, which ended up having a ton of related
traceback in the logs (the mergebot kept trying to create forward
ports from Jan 27th to Feb 10th, thankfully the errors happened in git
so did not seem to eat through our API rate limiting).
The issue was triggered by the addition of odoo/enterprise#77876 to
odoo/odoo#194818. This triggered a completion job which led to the
creation of odoo/enterprise#77877 to odoo/enterprise#77880, so far so
good.
Except the bit of code responsible for creating completion jobs only
checked if the PR was being added to a batch with a descendant. That
is the case of odoo/enterprise#77877 to odoo/enterprise#77879 (not
odoo/enterprise#77880 because that's the end of the line). As a
result, those triggered 3 more completion jobs, which kept failing in
a loop because they tried pushing different commits to their
next-siblings (without forcing, leading git to reject the non-ff push,
hurray).
A completion request should only be triggered by the addition of a
new *source* (a PR without a source) to an existing batch with
descendants, so add that to the check. This requires updating
`_from_json` to create PRs in a single step (rather than one step to
create based on github's data, and an other one for the hierarchical
tracking) as we need the source to be set during `create` not as a
post-action.
Although there was a test which could have triggered this issue, the
test only had 3 branches so was not long enough to trigger the issue:
- Initial PR 1 on branch A merged then forward-ported to B and C.
- Sibling PR 2 added to the batch in B.
- Completed to C.
- Ending there as C(1) has no descendant batch, leading to no further
completion request.
Adding a 4th branch did surface / show the issue by providing space
for a new completion request from the creation of C(2). Interestingly
even though I the test harness attempts to run all triggered crons to
completion there can be misses, so the test can fail in two different
ways (being now checked for):
- there's a leftover forwardport.batch after we've created all our
forwardports
- there's an extra PR targeting D, descending from C(2)
- in almost every case there's also a traceback in the logs, which
does fail the build thanks to the `env` fixture's check
Looks like `--force-with-lease` can work in multi-head updates, and
this should allow just one push per repository for the entire sequence
of update, then commit the entire thing if it succeeds.
For historical reasons pretty much all tests used to use the contexts
legal/cla and ci/runbot. While there are a few tests where we need the
interactions of multiple contexts and that makes sense, on the vast
majority of tests that's just extra traffic and noise in the
test (from needing to send multiple statuses unnecessarily).
In fact on the average PR where everything passes by default we could
even remove the required statuses entirely...
Should limit the risk of cases where the fork contains outdated
versions of the reference branches and we end up with odd outdated /
not up to date basis for branches & updates, which can lead to
confusing situations.
Skipmerge creates forward-ports before the source PR is even merged.
- In a break from the norm, skipmerge will create forwardports even in
the face of conflicts.
- It will also not *detach* pull requests in case of conflicts, this
is so the source PR can be updated and the update correctly cascades
through the stack (likewise for any intermediate PR though *that*
will detach as usual).
Note that this doesn't really look at outstandings, especially as they
were recently updated, so it might need to be fixed up in case of
freakout, but I feel like that should not be too much of an issue, the
authors will just get their FW reminders earlier than usual. If that's
a hassle we can always update the reminder job to ignore forward ports
whose source is not merged I guess.
Fixes#418
There should only be 1 PR with a given parent at most anyway, but
still if it's a boolean we can stop at the first occurrence if any. pg
might even be able to do something more efficient in that case.
After 6 months of sitting unmerged, forward ports will now trigger a
monthly email.
Or would, if an email server was configured on the mergebot, but
adding this now should allow debugging its working and tracking the
generation of emails through the "Delivery Failed" collection. This
way if we ever decide to actually do the thing we can just enable an
SMTP server.
Fixes#801
Previously the alert would show the total number of outstanding
forward ports, which most people don't care about (if they're even
logged which probably isn't the case), so that was mostly useful
for... me.
Add the current user's tally to the message as well. If the user is
logged. Which is always, because the alert only shows up if the user
has read access to PRs.
Fixes#801
In the case where both author and reviewer of a PR have the same
team (which is very common e.g. self review or a team member approving
your PR) then the `outstanding_per_group` entry would be incremented
twice, leading to overcounting outstanding forward ports for the team.
Fixes#801
- only remind weekly initially (not daily)
- root reminders on the forward port's creation, not the source's
merge date
- cap reminder interval at 4 weeks (instead of doubling every time)
- track reminders on forward ports, don't share between siblings
- remove `forwardport_updated_before` from the testing system, it's
now possible to just update `reminder_next` to a past date and see
if it gets triggered (it should to nothing on sources, or on forward
port in a state which does not warrant concern)
Fixes#801
The `to_pr` helper was added a *while* ago to replace the pretty
verbose process of looking a PR with the right number in the right
repository after a `make_pr`. However while I did some ad-hoc
replacement of existing `search` calls as I had to work with existing
tests I never did a full search and replace to try and excise searches
from the test suite.
This is now done. I've undoubtedly missed a few, but a hundred-odd
lines of codes have been simplified.
- don't *fail* in `_compute_identity`, it causes issues when the token
is valid but doesn't have `user:email` access as the request is
aborted and saving doesn't work
- make `github_name` and `github_email` required rather than ad-hoc
requiring them in `_compute_identity` (which doesn't work correctly)
- force copy of `github_name` and `github_email`, with o2ms being
!copy this means duplicating projects now works out of the box (or
should...)
Currently errors in `_compute_identity` are reported via logging which
is not great as it's not UI visible, should probably get moved to
chatter eventually but that's not currently enabled on projects.
Fixes#990
Since b45ecf08f9 forwardport batches
which fail have a delay set in order to avoid spamming. However that
delay was not displayed anywhere, which made things confusing as the
batch would not get run even after creating new triggers.
Show the delay if it's set (to a value later than now), as a relative
delta for clarity (as normally the delay is in minutes so a full blown
date is difficult to read / aprehend), and allow viewing and setting
it in the form view.
Fixes#982
Like limit, fw=no should restrict the table length, in this case to
just the current branch (as we're not forward porting at all).
Before this, `no` would not be applied as a limit visually, the table
would still go up to the main branch which is very confusing.
Fixes#962
This is not a full user-driven backport thingie for now, just one
admins can use to facilitate thing and debug issues with the
system. May eventually graduate to a frontend feature.
Fixes#925
Given branch A, and branch B forked from it. If B removes a file which
a PR to A later modifies, on forward port Git generates a
modify/delete conflict (as in one side modifies a file which the
other deleted).
So far so good, except while it does notify the caller of the issue
the modified file is just dumped as-is into the working copy (or
commit), which essentially resurrects it.
This is an issue, *especially* as the file is already part of a
commit (rather tan just a U local file), if the developer fixes the
conflict "in place" rather than re-doing the forward-port from scratch
it's easy to miss the reincarnated file (and possibly the changes that
were part of it too), which at best leaves parasitic dead code in the
working copy. There is also no easy way for the runbot to check it as
adding unimported standalone files while rare is not unknown
e.g. utility scripts (to say nothing of JS where we can't really track
the usages / imports at all).
To resolve this issue, during conflict generation post-process
modify/delete to insert artificial conflict markers, the file should
be syntactically invalid so linters / checkers should complain, and
the minimal check has a step looking for conflict markers which should
fire and prevent merging the PR.
Fixes#896
- Switch to just `default` ci.
- Autouse fixture to init the master branch.
- Switch to `make_commits`.
- Merge `test_reopen_update` and `test_update_closed_revalidate` into
`test_update_closed`: the former did basically nothing useful and
the latter could easily be folded into `test_update_closed` by just
validating the new commit.
And unconditionally unstage when the HEAD of a PR is synchronised.
While a rebuild on a PR which was previously staged can be a false
positive (e.g. because it hit a non-derministic test the second time
around) it can also be legitimate e.g. auto-rebase of an old PR to
check it out. In that case the PR should be unstaged.
Furthermore, as soon as the PR gets rebuilt it goes back into
`approved` (because the status goes to pending and currently there's
no great way to suppress that in the rebuild case without also fucking
it up for the sync case). This would cause a sync on the PR to be
missed (in that it would not unstage the PR), which is broken. Fix
that by not checking the state of the PR beforehand, it seems to be an
unnecessary remnant of older code, and not really an optimisation (or
at least one likely not worth bothering with, especially as we then
proceed to perform a full PR validation anyway).
Fixes#950
The UX around the split of limit and forward port policy (and
especially moving "don't forward port" to the policy) was not really
considered and caused confusion for both me and devs: after having
disabled forward porting, updating the limit would not restore it, but
there would be no indication of such an issue.
This caused odoo/enterprise#68916 to not be forward ported at merge
(despite looking for all the world like it should be), and while
updating the limit post-merge did force a forward-port that
inconsistency was just as jarring (also not helped by being unable to
create an fw batch via the backend UI because reasons, see
10ca096d86).
Fix this along most axis:
- Notify the user and reset the fw policy if the limit is updated
while `fw=no`.
- Trigger a forward port if the fw policy is updated (from `no`) on a
merged PR, currently only sources.
- Add check & warning to limit update process so it does *not* force a
port (though maybe it should under the assumption that we're
updating the limit anyway? We'll see).
Fixes#953
Apparently when I added these to trigger the corresponding cron the
lack of decorator caused them to not work at all, at least when
invoked from the interface, consequently I notably wasn't able to
force a forward port via creating one such task when trying to work
around #953
Today (or really a month ago) I learned: when giving git a symbolic
ref (e.g. a ref name), if it's ambiguous then
1. If `$GIT_DIR/$name` exists, that is what you mean (this is usually
useful only for `HEAD`, `FETCH_HEAD`, `ORIG_HEAD`, `MERGE_HEAD`,
`REBASE_HEAD`, `REVERT_HEAD`, `CHERRY_PICK_HEAD`, `BISECT_HEAD` and
`AUTO_MERGE`)
2. otherwise, `refs/$name` if it exists
3. otherwise, `refs/tags/$name` if it exists
4. otherwise, `refs/heads/$name` if it exists
5. otherwise, `refs/remotes/$name` if it exists
6. otherwise, `refs/remotes/$name/HEAD` if it exists
This means if a tag and a branch have the same name and only the name
is provided (not the full ref), git will select the tag, which gets
very confusing for the mergebot as it now tries to rebase onto the tag
(which because that's not fun otherwise was not even on the branch of
the same name).
Fix by providing full refs to `rev-parse` when trying to retrieve the
head of the target branches. And as a defense in depth opportunity,
also exclude tags when fetching refs by spec: apparently fetching a
specific commit does not trigger the retrieval of tags, but any sort
of spec will see the tags come along for the ride even if the tags are
not in any way under the fetched ref e.g. `refs/heads/*` will very
much retrieve the tags despite them being located at `refs/tags/*`.
Fixes#922
Reminding users that `r-` on a forward port only unreviews *that*
forwardport is useful, as `r+;r-` is not a no-op (all preceding
siblings are still reviewed).
However it is useless if all siblings are not approved or already
merged. So avoid sending the warning in that case.
Fixes#934
The FW reminder is useful to remind people of the outstanding forward
ports they need to process, as taking too long can be an issue.
They are, however, not useful if the developer has already done
everything, the PR is ready and unblocked, but the mergebot has fallen
behind and has a hard time catching up. In that case there is nothing
for the developer to do, so pinging them is not productive, it's only
frustrating.
Fixes#930
In some cases, feedback to the PR author that an r+ is redundant went
missing.
This turns out to be due to the convolution of the handling of
approval on forward-port, and the fact that the target PR is treated
exactly like its ancestors: if the PR is already approved the approval
is not even attempted (and so no feedback if it's incorrect).
Straighten up this bit and add a special case for the PR being
commented on, it should have the usual feedback if in error or already
commented on.
Furthermore, update `PullRequests._pr_acl` to kinda work out of the
box for forward-port: if the current PR is a forward port,
`is_reviewer` should check delegation on all ancestors, there doesn't
seem to be any reason to split "source_reviewer", "parent_reviewer",
and "is_reviewer".
Fixes#939
- reduce font-size and bold-ness a hair as it causes issues in the
backend
- remove font adjustment on root object
- add `text-bg-primary` in the oustanding FP view, apparently BS5 does
not do anything like that by default when setting `bg-primary` for
some reason so the current team or user appears in black on dark
blue leading to sub-par readability
Broken (can't run odoo at all):
- In Odoo 17.0, the `pre_init_hook` takes an env, not a cursor, update
`_check_citext`.
- Odoo 17.0 rejects `@attrs` and doesn't say where they are or how to
update them, fun, hunt down `attrs={'invisible': ...` and try to fix
them.
- Odoo 17.0 warns on non-multi creates, update them, most were very
reasonable, one very wasn't.
Test failures:
- Odoo 17.0 deprecates `name_get` and doesn't use it as a *source*
anymore, replace overrides by overrides to `_compute_display_name`.
- Multiple tracking changes:
- `_track_set_author` takes a `Partner` not an id.
- `_message_compute_author` still requires overriding in order to
handle record creation, which in standard doesn't support author
overriding.
- `mail.tracking.value.field_type` has been removed, the field type
now needs to be retrieved from the `field_id`.
- Some tracking ordering have changed and require adjusting a few
tests.
Also added a few flushes before SQL queries which are not (obviously
at least) at the start of a cron or controller, no test failure
observed but better safe than sorry (probably).
With the trigger-ification pretty much complete the only cron that's
still routinely triggered explicitly is the cross-pr check, and it's
that in all modules, so there's no cause to keep an overridable
fixture.
The staging cron turns out to be pretty reasonable to trigger, as we
already have a handler on the transition of a batch to `not blocked`,
which is exactly when we want to create a staging (that and the
completion of the previous staging).
The batch transition is in a compute which is not awesome, but on the
flip side we also cancel active stagings in that exact scenario (if it
applies), so that matches.
The only real finesse is that one of the tests wants to observe the
instant between the end of a staging (and creation of splits) and the
start of the next one, which because the staging cron is triggered by
the failure of the previous staging is now "atomic", requiring
disabling the staging cron, which means the trigger is skipped
entirely. So this requires triggering the staging cron by hand.
The merge cron is the one in charge of checking staging state and
either integrating the staging into the reference branch (if
successful) or cancelling the staging (if failed).
The most obvious trigger for the merge cron is a change in staging
state from the states computation (transition from pending to either
success or failure). Explicitly cancelling / failing a staging marks
it as inactive so the merge cron isn't actually needed.
However an other major trigger is *timeout*, which doesn't have a
trivial signal. Instead, it needs to be hooked on the `timeout_limit`,
and has to be re-triggered at every update to the `timeout_limit`,
which in normal operations is mostly from "pending" statuses bumping
the timeout limit. In that case, `_trigger` to the `timeout_limit` as
that's where / when we expect a status change.
Worst case scenario with this is we have parasitic wakeups of this
cron, but having half a dozen wakeups unnecessary wakeups in an hour
is still probably better than having a wakeup every minute.
These are pretty simple to convert as they are straightforward: an
item is added to a work queue (table), then a cron regularly scans
through the table executing the items and deleting them.
That means the cron trigger can just be added on `create` and things
should work out fine.
There's just two wrinkles in the port_forward cron:
- It can be requeued in the future, so needs a conditional trigger-ing
in `write`.
- It is disabled during freeze (maybe something to change), as a
result triggers don't enqueue at all, so we need to immediately
trigger after freeze to force the cron re-enabling it.
Mergebot / forwardport crons need to run in a specific ordering in
order to flow into one another correctly. The default ordering being
unspecified, it was not possible to use the normal cron
runner (instead of the external driver running crons in sequence one
at a time). This can be fixed by setting *sequences* on crons, as the
cron runner (`_process_jobs`) will use that order to acquire and run
crons.
Also override `_process_jobs` however: the built-in cron runner
fetches a static list of ready crons, then runs that.
This is fine for normal situation where the cron runner runs in a loop
anyway but it's any issue for the tests, as we expect that cron A can
trigger cron B, and we want cron B to run *right now* even if it
hadn't been triggered before cron A ran.
We can replace `_process_job` with a cut down version which does
that (cut down because we don't need most of the error handling /
resilience, there's no concurrent workers, there's no module being
installed, versions must match, ...). This allows e.g. the cron
propagating commit statuses to trigger the staging cron, and both will
run within the same `run_crons` session.
Something I didn't touch is that `_process_jobs` internally creates
completely new environments so there is no way to pass context into
the cron jobs anymore (whereas it works for `method_direct_trigger`),
this means the context values have to be shunted elsewhere for that
purpose which is gross. But even though I'm replacing `_process_jobs`,
this seems a bit too much of a change in cron execution semantics. So
left it out.
While at it tho, silence the spammy `py.warnings` stuff I can't do
much about.