There is no check or unique index, so nothing actually prevents having
multiple fw tasks for the same batch.
In that case, if we try to update the limit the handler will crash
because the code assumes a single forwardport task per batch.
Triggered by nle on odoo/odoo#201394 probably while trying to recover
from the incident of March 13/14 (which was not going to work as the
forwardport queue itself was hosed, see two commits above for the
reason).
Also reduce the grace period for merged PR branches to 1 week (from
2), and go through the on-disk repository instead of the github API.
Technically it might even be possible to do this in bulk, though that
seems annoying in case of failure (e.g. because somebody else deleted
the branch previously).
Fixes#1082
Since 94cf3e9647, FW is no longer limited to
reviewers. And it's been possible to set the fw policy on any forward
port pretty much since it was created (during commands rework /
formalisation).
However, the help for the fw subcommands would only be shown on the
source PR, which is an unnecessary restriction.
Because the only connection between PRs and statuses is sharing a
repository (and kinda not even that for stagings), adding or removing
a status on a repository would try to recompute the statuses/state of
essentially every staging in the history of the project, and a
significant fraction of the PRs, leading to tens of thousands of
queries, minutes of computation, and even OOMs of the HTTP workers as
we'd load the PRs, their batches, and the stagings into memory to
update them, then load more things because of what depends on PR
statuses, etc...
But there is no reason to touch any of the closed or merged PRs, or
the completed (deactivated) stagings. And in fact there is every
reason *not* to.
Implementing a search-only m2m on each object in order to restrict the
set of PRs/stagings relevant to a change reduces the number of queries
to a few hundreds, the run time to seconds, and the memory increase to
unnoticeable. The change still takes some time because the RD project
currently has about 7000 open PRs, 4500 of which target
odoo/odoo (which is our test case here), but that is nothing compared
to the 164000 total PRs to odoo/odoo out of some 250000 PRs for the RD
project.
And though they're likely less of an issue as they don't recurse quite
as much, the >120000 stagings during the project's history are not to
be ignored, when the number of *active* stagings at any one time is
at most the number of branches, which is half a dozen to a dozen.
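For the record, a rough sketch of the shape of the change (model and
field names here are made up, not the actual schema): the m2m is never
stored and its compute is trivial, it only exists so a change to the
status configuration can be scoped, via a search domain, to records
which are still live.

```python
from odoo import fields, models


class PullRequests(models.Model):
    _inherit = 'runbot_merge.pull_requests'

    # search-only m2m (illustrative, not the real schema): never stored,
    # only there so a status-configuration change can be scoped to the
    # PRs it can still affect
    relevant_status_ids = fields.Many2many(
        'runbot_merge.repository.status',
        compute='_compute_relevant_statuses',
        search='_search_relevant_statuses',
    )

    def _compute_relevant_statuses(self):
        self.relevant_status_ids = False

    def _search_relevant_statuses(self, operator, value):
        # a status change is only relevant to PRs which are still open
        return [
            ('repository.status_ids', operator, value),
            ('state', 'not in', ('closed', 'merged')),
        ]
```

The status model's create/write can then search on that field instead
of recomputing every PR which merely shares the repository.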
For very basic numbers, as of committing this change, creating a
status in odoo/odoo (RD), optional on PRs and ignored on statuses,
on my current machine (7530U with 32GB RAM):
- without this change, 4835 queries, 37s of SQL, 65s of non-SQL, RSS
climbs to 2258128KiB (2.15GiB)
- with this change, 758 queries, 1.46s SQL, 2.25s non-SQL, RSS climbs
to 187088KiB (182MiB)
Fixes#1067
If a status is defined as `optional`, then the PR is considered valid
if the status is never sent, but *if* the status is sent then it
becomes required.
Note that as ever this is a per-commit requirement, so it's mostly
useful for conditional statuses.
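As a standalone sketch (not the mergebot's actual evaluation code), the
intended semantics are roughly:

```python
def required_contexts(configured, received):
    """Return the set of status contexts a commit must have in `success`.

    `configured` maps context -> requirement ('required', 'optional', ...),
    `received` is the set of contexts actually reported on the commit.
    """
    required = {c for c, req in configured.items() if req == 'required'}
    # an optional status is ignored if never sent, but required once it
    # shows up on the commit
    required |= {c for c, req in configured.items()
                 if req == 'optional' and c in received}
    return required
```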
Fixes#1062
This is a bit of an odd case which was only noticed because of
persistent forwardport.batches, which ended up having a ton of related
tracebacks in the logs (the mergebot kept trying to create forward
ports from Jan 27th to Feb 10th, thankfully the errors happened in git
so did not seem to eat through our API rate limiting).
The issue was triggered by the addition of odoo/enterprise#77876 to
odoo/odoo#194818. This triggered a completion job which led to the
creation of odoo/enterprise#77877 to odoo/enterprise#77880, so far so
good.
Except the bit of code responsible for creating completion jobs only
checked if the PR was being added to a batch with a descendant. That
is the case of odoo/enterprise#77877 to odoo/enterprise#77879 (not
odoo/enterprise#77880 because that's the end of the line). As a
result, those triggered 3 more completion jobs, which kept failing in
a loop because they tried pushing different commits to their
next-siblings (without forcing, leading git to reject the non-ff push,
hurray).
A completion request should only be triggered by the addition of a
new *source* (a PR without a source) to an existing batch with
descendants, so add that to the check. This requires updating
`_from_json` to create PRs in a single step (rather than one step to
create based on github's data, and another one for the hierarchical
tracking) as we need the source to be set during `create` not as a
post-action.
Although there was a test which could have triggered this issue, the
test only had 3 branches so was not long enough to trigger the issue:
- Initial PR 1 on branch A merged then forward-ported to B and C.
- Sibling PR 2 added to the batch in B.
- Completed to C.
- Ending there as C(1) has no descendant batch, leading to no further
completion request.
Adding a 4th branch did surface the issue by providing space
for a new completion request from the creation of C(2). Interestingly,
even though the test harness attempts to run all triggered crons to
completion there can be misses, so the test can fail in two different
ways (which are now checked for):
- there's a leftover forwardport.batch after we've created all our
forwardports
- there's an extra PR targeting D, descending from C(2)
- in almost every case there's also a traceback in the logs, which
does fail the build thanks to the `env` fixture's check
Skipmerge creates forward-ports before the source PR is even merged.
- In a break from the norm, skipmerge will create forwardports even in
the face of conflicts.
- It will also not *detach* pull requests in case of conflicts: this
is so the source PR can be updated and the update correctly cascades
through the stack (likewise for any intermediate PR though *that*
will detach as usual).
Note that this doesn't really look at outstandings, especially as they
were recently updated, so it might need to be fixed up in case of
freakout, but I feel like that should not be too much of an issue, the
authors will just get their FW reminders earlier than usual. If that's
a hassle we can always update the reminder job to ignore forward ports
whose source is not merged I guess.
Fixes#418
Not entirely sure about just allowing any PR author to set the merge
method as it gives them a lot more control over commits (via the PR
message), and I'm uncertain about the ramifications of doing that.
However if the author of the PR is classified as an
employee (via a provisioned user linked to the github account) then
allow it. Should not be prone to abuse at least.
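Roughly the shape of the check, assuming "employee" means the author's
github login maps to a provisioned internal (non-share) user; field
names are guesses:

```python
def _author_may_set_merge_method(pr):
    # pr.author is assumed to be the res.partner linked to the github
    # account; a provisioned employee has at least one internal user
    return any(not user.share for user in pr.author.user_ids)
```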
Fixes#1036
All `pull_request` events seem to provide the `commits` count
property. As such we can use them all to check the `squash` state even
if we don't otherwise care for the event.
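In other words, something like this in the event handler (payload
access simplified, names illustrative):

```python
def update_squash(pr, payload):
    # every pull_request event carries a commits count, so keep `squash`
    # in sync even when the action itself is otherwise ignored
    commits = payload.get('pull_request', {}).get('commits')
    if commits is not None:
        pr.squash = commits == 1
```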
Also split out notifying an approved pull request about its missing
merge method into a separate cron from the one notifying a PR that its
siblings are ready.
Fixes#1036
The old controller system required `type='json'` which only did
JSON-RPC and prevented returning proper responses.
As of 17.0 this is not the case anymore, `type='http'` controllers can
get `content-type: application/json` requests just fine and return
whatever they want. So change that:
- Drop `type='json'` in favour of `type='http'`.
- Return `Response` objects with nice status codes (and text
mimetypes, as otherwise werkzeug defaults to html).
- Update ping to bypass normal flow as otherwise it requires
authentication and event sources which is annoying (it might be a
good idea to have both in order to check for configuration, however
it's not possible to just send a ping via the webhook UI on github
so not sure how useful that would be).
- Add some typing and improve response messages while at it.
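For illustration, the rough shape of a converted endpoint (route, auth,
and dispatch details are simplified / assumed, not the actual
controller):

```python
import json

from odoo import http
from odoo.http import request, Response

IGNORED_ACTIONS = {'locked', 'unlocked'}  # placeholder set


class MergebotHooks(http.Controller):
    @http.route('/runbot_merge/hooks', type='http', auth='none',
                methods=['POST'], csrf=False)
    def event_handler(self):
        if request.httprequest.headers.get('X-Github-Event') == 'ping':
            # bypass the normal flow: no auth or event source needed
            return Response("pong", status=200, mimetype='text/plain')

        payload = json.loads(request.httprequest.get_data())
        if payload.get('action') in IGNORED_ACTIONS:
            # ignored actions are not *failures* as far as github cares
            return Response("ignored", status=200, mimetype='text/plain')

        ...  # actual dispatch
        return Response("ok", status=200, mimetype='text/plain')
```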
Note: some "internal" errors (e.g. ignoring event actions) are
reported as successes because as far as I can tell webhooks only
support success (2xy) / failure (4xy, 5xy) and an event that's ignored
is not really *failed* per se.
Some things are reported as failures even though they are borderline,
because that can be useful to see what's happening, e.g. a comment too
large or an inability to lock rows.
Fixes#1019
Probably should create a mixin for this: when a model is used as a
task queue for a cron, the cron should automatically be triggered on
creation. Requiring an explicit trigger after a creation is error
prone and increases the risk that some of the triggers will be
forgotten/missed.
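Something along these lines (the mixin does not exist, names are made
up):

```python
from odoo import api, models


class CronQueueMixin(models.AbstractModel):
    _name = 'mergebot.cron.queue.mixin'
    _description = "cron-backed work queue"

    # xmlid of the consuming cron, to be set by concrete queue models
    _cron_xmlid = None

    @api.model_create_multi
    def create(self, vals_list):
        records = super().create(vals_list)
        # enqueueing work should wake the worker up, every time
        if records and self._cron_xmlid:
            self.env.ref(self._cron_xmlid)._trigger()
        return records
```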
Adds a very limited ability to try and look for false positive /
non-deterministic staging errors. It tries to err on the side of
limiting false false positives, so it's likely to miss many.
Currently has no automation / reporting, just sets a flag on the
stagings which are strongly believed to have failed due to false
positives.
While at it, add a link between a "root" staging and its splits. It's
necessary to clear the "false positive" flag, and surfacing it in the
UI could be useful one day.
Fixes#660
Requires parsing the commit messages as github plain doesn't collate
the info from there, but also the descriptions: apparently github only
adds those to the references if the PR targets the default
branch. That's not a limitation we want.
Meaning at the end of the day, the only thing we're calling graphql
for is explicit manual linking, which can't be tested without
extending the github API (and then would only be testable on DC), and
which I'm not sure anyone bothers with...
Limitations
-----------
- Links are retrieved on staging (so if one is added later it won't be
taken into account).
- Only repository-local links are handled, not cross-repository.
Fixes#777
If a comment causes an unknown PR to be fetched, it's a bit odd to
ping the author (and possibly reviewer) anyway as they're not super
concerned (and technically we could be ignoring the purported /
attempted reviewer).
So if a fetch job was created because of a comment, remember the
comment author and ping *them* instead of using the default ping
policy.
Fixes#981
If staging gets re-enabled on a branch (or the branch itself gets
re-enabled), immediately run a staging cron as there may already be
PRs waiting, and no trigger enqueued: cron triggers have no payload,
they just get removed when the cron runs, which means if a bunch of PRs
become ready for branch B while staging is disabled, the cron is going
to run, stage nothing on that branch (because staging is disabled),
then delete all the triggers.
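Sketch of the hook (the field and cron xmlid are assumptions):

```python
from odoo import models


class Branch(models.Model):
    _inherit = 'runbot_merge.branch'

    def write(self, vals):
        could_not_stage = self.filtered(
            lambda b: not (b.active and b.staging_enabled))
        r = super().write(vals)
        # if staging just became possible again, run the cron right away:
        # ready PRs may be waiting and their triggers are long gone
        if any(b.active and b.staging_enabled for b in could_not_stage):
            self.env.ref('runbot_merge.staging_cron')._trigger()
        return r
```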
Fixes#979
As far as I can tell they are properly handled:
- In `handle_status` we let the http layer retry the query, which
pretty much always succeeds.
- In `Commit._notify`, we rollback the application of the current
commit, meaning it'll be processed by the next run of the cron,
which also seems to succeed every time (that is, going through the
logs I pretty much never notice the same commit being serialization
failure'd twice in a row).
That cron we can also trigger for faster action; this last item is not
entirely necessary as statuses should generally come in fast,
especially if we have concurrency errors, but it can't hurt.
This means the only genuine issue is... sql_db logging a "bad query"
every time there's a serialization failure.
In `handle_status`, just suppress the message outright, if there's an
error other than serialization the http / dispatch layer should catch
and log it.
In `Commit._notify` things are slightly more difficult as the execute
is implicit (`flush` -> `_write` -> `execute`) so we can't pass the
flag by parameter. One option would be to set and unset
`_default_log_exception`, but it would either be a bit dodgy or it
would require using a context manager and increasing the indentation
level (or using a custom context manager).
Instead just `mute_logger` the fucking thing. It's a bit brutish and
mostly used in tests, but not just, and feels like the least bad
option here...
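i.e. roughly:

```python
from odoo import models
from odoo.tools import mute_logger


class Commit(models.Model):
    _inherit = 'runbot_merge.commit'

    # serialization failures are expected here, keep them out of the logs
    @mute_logger('odoo.sql_db')
    def _notify(self):
        ...

# in handle_status the execute is explicit, so the flag can simply be
# passed along: env.cr.execute(query, params, log_exceptions=False)
```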
Closes#805
Missed it during the previous pass, probably because it's in the
middle of `pull_requests.py`. It's a classic template for triggered
crons since the model is just a queue of actions for the cron.
If a PR is closed but part of an ongoing batch, the change in status
of the batch might be reflected on the PR still:
- if a PR is closed and the batch gets staged, the PR shows up as
being staged
- if the PR is merged then the batch gets merged, the PR shows up as
merged
Fixes#914
Also remove the unused `_tagstate` helper property.
Because of the false negatives due to github's reordering of events on
retargeting, blocking merge methods can be rather frustrating for the
user, as what's happening and how to solve it isn't clear in that case.
Keep the warnings and all, but remove the blocking factor: if a PR
doesn't have a merge method and is not single-commit, just skip it on
staging. This way, PRs which are actually single-commit will stage
fine even if the mergebot thinks they shouldn't be.
Fixes#957
- fix staging reasons containing escaped quotes (would render as
` ` to the end user)
- remove extra spacing around PR title @title/popovers
- simplify a few view conditionals through judicious use of `t-elif`
and nesting
- make `staging_end` non-computed as it's not computed anymore, just
set if and when the staging gets disabled
(146564a90a)
Add warnings when trying to send comments / commands to PRs targeting
inactive branches.
This was missing, leading to confusion, as one warning is clearly not
enough.
Fixes#941
By updating the staging timeout every time we run `_compute_state` and
still have a `pending` status, we'd actually bump the timeout *on
every successful CI* except for the last one. Which was never the
intention and can add an hour or two to the mergebot-side timeout.
Instead, add an `updated_at` attribute to statuses (name taken from
the webhook payload even though we don't use that), read it when we
see `pending` required statuses, and update the staging timeout based
on that if necessary.
That way as long as we don't get *new* pending statuses, the timeout
doesn't get updated.
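Standalone sketch of the new bump logic (field names and the 2h delta
are illustrative):

```python
from datetime import datetime, timedelta


def bump_timeout(staging, statuses, delta=timedelta(hours=2)):
    # only *new* pending statuses (per github's own updated_at) may push
    # the staging timeout back, a pending status we have already
    # accounted for does nothing
    pending = [s for s in statuses if s['state'] == 'pending']
    if not pending:
        return
    latest = max(
        datetime.strptime(s['updated_at'], '%Y-%m-%dT%H:%M:%SZ')
        for s in pending
    )
    if latest + delta > staging.timeout_limit:
        staging.timeout_limit = latest + delta
```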
Fixes#952
And unconditionally unstage when the HEAD of a PR is synchronised.
While a rebuild on a PR which was previously staged can be a false
positive (e.g. because it hit a non-deterministic test the second time
around) it can also be legitimate e.g. auto-rebase of an old PR to
check it out. In that case the PR should be unstaged.
Furthermore, as soon as the PR gets rebuilt it goes back into
`approved` (because the status goes to pending and currently there's
no great way to suppress that in the rebuild case without also fucking
it up for the sync case). This would cause a sync on the PR to be
missed (in that it would not unstage the PR), which is broken. Fix
that by not checking the state of the PR beforehand, it seems to be an
unnecessary remnant of older code, and not really an optimisation (or
at least one likely not worth bothering with, especially as we then
proceed to perform a full PR validation anyway).
Fixes#950
The UX around the split of limit and forward port policy (and
especially moving "don't forward port" to the policy) was not really
considered and caused confusion for both me and devs: after having
disabled forward porting, updating the limit would not restore it, but
there would be no indication of such an issue.
This caused odoo/enterprise#68916 to not be forward ported at merge
(despite looking for all the world like it should be), and while
updating the limit post-merge did force a forward-port that
inconsistency was just as jarring (also not helped by being unable to
create an fw batch via the backend UI because reasons, see
10ca096d86).
Fix this along most axes:
- Notify the user and reset the fw policy if the limit is updated
while `fw=no`.
- Trigger a forward port if the fw policy is updated (from `no`) on a
merged PR, currently only sources.
- Add check & warning to limit update process so it does *not* force a
port (though maybe it should under the assumption that we're
updating the limit anyway? We'll see).
Fixes#953
- rather than enumerate states, forward-porting should just check if
the statuses are successful on a PR
- for the same consistency reasons explained in
f97502e503, `skipchecks` should force
the status of a PR to `success`: it is very odd that a PR would be
ready without being validated...
Rather than only setting `staging_end` on status change, set it when
the staging gets deactivated. This way cancelling a staging (whether
explicitly or via a PR update) will also end it, as will a staging
timeout, etc., rather than keeping the counter running.
Fixes#931
Reminding users that `r-` on a forward port only unreviews *that*
forwardport is useful, as `r+;r-` is not a no-op (all preceding
siblings are still reviewed).
However it is useless if every sibling is either unapproved or already
merged. So avoid sending the warning in that case.
Fixes#934
In some cases, feedback to the PR author that an r+ is redundant went
missing.
This turns out to be due to the convolution of the handling of
approval on forward-port, and the fact that the target PR is treated
exactly like its ancestors: if the PR is already approved the approval
is not even attempted (and so no feedback if it's incorrect).
Straighten up this bit and add a special case for the PR being
commented on, it should have the usual feedback if in error or already
commented on.
Furthermore, update `PullRequests._pr_acl` to kinda work out of the
box for forward-port: if the current PR is a forward port,
`is_reviewer` should check delegation on all ancestors, there doesn't
seem to be any reason to split "source_reviewer", "parent_reviewer",
and "is_reviewer".
Fixes#939
Broken (can't run odoo at all):
- In Odoo 17.0, the `pre_init_hook` takes an env, not a cursor, update
`_check_citext` (see the sketch after this list).
- Odoo 17.0 rejects `@attrs` and doesn't say where they are or how to
update them, fun, hunt down `attrs={'invisible': ...` and try to fix
them.
- Odoo 17.0 warns on non-multi creates, update them, most were very
reasonable, one very wasn't.
Test failures:
- Odoo 17.0 deprecates `name_get` and doesn't use it as a *source*
anymore, replace its overrides by overrides of `_compute_display_name`.
- Multiple tracking changes:
- `_track_set_author` takes a `Partner` not an id.
- `_message_compute_author` still requires overriding in order to
handle record creation, which in standard doesn't support author
overriding.
- `mail.tracking.value.field_type` has been removed, the field type
now needs to be retrieved from the `field_id`.
- Some tracking orderings have changed and require adjusting a few
tests.
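As mentioned above, the `pre_init_hook` item boils down to a signature
change, roughly (check body simplified):

```python
def _check_citext(env):
    # 17.0: hooks receive an Environment, the cursor hangs off of it
    env.cr.execute("SELECT 1 FROM pg_extension WHERE extname = 'citext'")
    if not env.cr.rowcount:
        raise AssertionError("runbot_merge requires the citext extension")
```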
Also added a few flushes before SQL queries which are not (obviously
at least) at the start of a cron or controller, no test failure
observed but better safe than sorry (probably).
odoo/odoo@1341d52 added native support for tracking / overriding
tracking message so we don't need `change-message` anymore. The calls
had to be modified a bit as `_track_set_log_message` has to be invoked
on the records for which the tracking message is being set / updated.
Sadly the same for authors was only added in 16.3 *and* it's
unsuitable for our needs as we want to set the author without knowing
the scope of affected records (at least in the controller).
That way hopefully in 17.0 we can remove the `_message_compute_author`
override and things should work out. And maybe for 19.0 we can get the
ability to set a per-model (or even global) fallback author into
standard.
Because the first operation the notification task performs is updating
the commit, this has to be flushed in order to read-back the commit
hash (probably fixed in later versions).
This means if updating the flag triggers a serialization
failure (which happens and is normal) we transition to the `exception`
handler, which *tries to retrieve the sha again* and we get an "in
failed transaction" error on top of the current "serialization
failure", leading to a giant traceback.
Also an unhelpful one since a serialization failure is expected in
this cron.
- Move the reads with no write dependency out of the `try`, there's no
reason for them to fail, and notably memoize the `sha`.
- Split the handling of serialization failures out of the normal one,
and just log an `info` (we could actually log nothing, probably).
- Also set the precommit data just before the commit, in case staging
tracking is ever a thing which could use it.
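Sketch of the reshuffled handling (structure only, names and the
precommit key are made up):

```python
import logging

from psycopg2 import errors

_logger = logging.getLogger(__name__)


def process(env, commit):
    # reads with no write dependency live outside the try, and the sha is
    # memoized so the error paths never read from a failed transaction
    sha = commit.sha
    try:
        commit.to_check = False
        ...  # update the PRs / stagings attached to the commit
        env.cr.precommit.data['runbot_merge.commit'] = sha  # just before
        env.cr.commit()
    except errors.SerializationFailure:
        # entirely expected under concurrency, the next run will retry
        _logger.info("serialization failure processing %s, retrying later",
                     sha)
        env.cr.rollback()
    except Exception:
        _logger.exception("error while processing %s", sha)
        env.cr.rollback()
```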
Fixes#909
The staging cron turns out to be pretty reasonable to trigger, as we
already have a handler on the transition of a batch to `not blocked`,
which is exactly when we want to create a staging (that and the
completion of the previous staging).
The batch transition is in a compute which is not awesome, but on the
flip side we also cancel active stagings in that exact scenario (if it
applies), so that matches.
The only real finesse is that one of the tests wants to observe the
instant between the end of a staging (and creation of splits) and the
start of the next one. Because the staging cron is now triggered by
the failure of the previous staging that window is "atomic", which
requires disabling the staging cron, meaning the trigger is skipped
entirely. So this requires triggering the staging cron by hand.
The merge cron is the one in charge of checking staging state and
either integrating the staging into the reference branch (if
successful) or cancelling the staging (if failed).
The most obvious trigger for the merge cron is a change in staging
state from the states computation (transition from pending to either
success or failure). Explicitly cancelling / failing a staging marks
it as inactive so the merge cron isn't actually needed.
However another major trigger is *timeout*, which doesn't have a
trivial signal. Instead, it needs to be hooked on the `timeout_limit`,
and has to be re-triggered at every update to the `timeout_limit`,
which in normal operations is mostly from "pending" statuses bumping
the timeout limit. In that case, `_trigger` at the `timeout_limit` as
that's where / when we expect a status change.
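i.e. something like (cron xmlid assumed):

```python
from odoo import fields, models


class Stagings(models.Model):
    _inherit = 'runbot_merge.stagings'

    def write(self, vals):
        r = super().write(vals)
        if vals.get('timeout_limit'):
            # re-arm the merge cron at the new limit: worst case a few
            # parasitic wakeups, never a missed timeout
            self.env.ref('runbot_merge.merge_cron')._trigger(
                fields.Datetime.to_datetime(vals['timeout_limit']))
        return r
```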
Worst case scenario with this is we have parasitic wakeups of this
cron, but having half a dozen unnecessary wakeups in an hour
is still probably better than having a wakeup every minute.
These are pretty simple to convert as they are straightforward: an
item is added to a work queue (table), then a cron regularly scans
through the table executing the items and deleting them.
That means the cron trigger can just be added on `create` and things
should work out fine.
There are just two wrinkles in the port_forward cron:
- It can be requeued in the future, so it needs a conditional
triggering in `write` (see the sketch below).
- It is disabled during freezes (maybe something to change); as a
result triggers don't enqueue at all, so the cron needs to be
triggered immediately when the freeze re-enables it.
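For the first wrinkle, roughly (the requeue field and cron xmlid are
guesses):

```python
from odoo import fields, models


class ForwardPortBatches(models.Model):
    _inherit = 'forwardport.batches'

    def write(self, vals):
        r = super().write(vals)
        # a batch requeued into the future needs its own trigger at that
        # date, otherwise nothing wakes the cron back up
        if vals.get('retry_after'):
            self.env.ref('forwardport.port_forward')._trigger(
                fields.Datetime.to_datetime(vals['retry_after']))
        return r
```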
Merge errors are logical failures, not technical ones: it doesn't make
sense to log them out because there's nothing to be done technically,
a PR having consistency issues or a conflict is "normal". As such
those messages are completely useless and just take unnecessary space
in the logs, making their use more difficult.
Instead of sending them to logging, log staging attempts to the PR
itself, and only do normal logging of the operation as an indicative
item. And remove a bunch of `expect_log_errors` which don't stand
anymore.
While at it, fix a missed issue in forward porting: if the `root.head`
doesn't exist in the repo its `fetch` will immediately fail before
`cat-file` can even run, so the second one is redundant and the first
one needs to be handled properly. Do that. And leave checking
for *that* specific condition as a logged-out error as it should mean
there's something very odd with the repository (how can a pull request
have a head we can't fetch?)
Apparently I'd already fixed that in
286c1fdaee but it has yet to be
deployed.
While at it, add a feedback message to clarify that, unlike `r+`, `r-`
on forward ports does *not* propagate.
Fixes#912
- Update branch name to prefix with project as it can be hard to
differentiate when filtering by or trying to set targets, given some
targets are extremely common (e.g. `master`/`main`) and not all
fields are filtered by project (or even can be).
- Add a proper menu item and list view for batches, maybe it'll be of
use one day.
- Upgrade label in PR search, as it's more likely to be needed than
author or target.
- Put PRs first in the mergebot menu, as it's *by far* the most likely
item to look for, unless it's staging in order to cancel one.
Previously PR descriptions were displayed as raw text in the PR
dashboard. While not wrong per se, this was pretty ugly and not always
convenient as e.g. links had to be copied by hand.
Push descriptions through pymarkdown for rendering them, with a few
customisations:
- Enabled footnotes & tables & fenced code blocks because GFM has
them; this doesn't quite put pymarkdown's base behaviour on par with
gfm (and py-gfm ultimately gave up on that effort, moving to just
wrap github's own markdown renderer instead).
- Don't allow raw html because too much of a hassle to do it
correctly, and very few people ever do it (mostly me I think).
- Added a bespoke handler / renderer for github-style references.
Note: uses positional captures because it started that way and named
captures are not removed from that sequence so mixing and matching
is not very useful, plus python does not support identically named
groups (even exclusive) so all 4 repo captures and all 3 issue
number captures would need different names...
- And added a second bespoke handler for our own opw/issue references
leading to odoo.com, that's something we can't do via github[^1] so
it's a genuine value-add.
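In outline, using the Python-Markdown (`markdown`) API; the reference
handler shown only deals with bare `#123` references and hardcodes the
repository, the real one has to handle the full `owner/repo#123`
family:

```python
import xml.etree.ElementTree as etree

import markdown
from markdown.extensions import Extension
from markdown.inlinepatterns import InlineProcessor


class GithubRef(InlineProcessor):
    def handleMatch(self, m, data):
        number = m.group(1)
        a = etree.Element(
            'a', href=f'https://github.com/odoo/odoo/issues/{number}')
        a.text = f'#{number}'
        return a, m.start(0), m.end(0)


class GithubRefExtension(Extension):
    def extendMarkdown(self, md):
        md.inlinePatterns.register(GithubRef(r'#(\d+)', md), 'github-ref', 175)


def render_description(text: str) -> str:
    return markdown.markdown(
        text,
        extensions=['footnotes', 'tables', 'fenced_code', GithubRefExtension()],
    )
```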
Fixes#889
[^1]: github can do it (though possibly not with the arbitrary
unspecified nonsense I got when I tried to list some of the
reference styles, some folks need therapy), but it's not available
on our plan
I thought I'd removed the error message when approving an already
approved PR but apparently not?
However we can improve the message in that specific case, to make the
expected operation clearer.
Fixes#906
After seeing it used, I foresee confusion around the current
behaviour (where it sets the limit), as one would expect the `fw=`
flags to affect one another when it looks like that would make sense
e.g. no/default/skipci/skipmerge all specify how to forward port, so
`fw=default` not doing anything after you've said `fw=no` (possibly by
mistake) would be fucking weird.
Also since the author can set limits, allow them to reset the fw
policy to default (keep skipci for reviewers), and for @d-fence add a
`fw=disabled` alias.
Fixes#902
If a PR is cancel=staging, even if it's not the
urgentest (priority=alone), odds are good it's being staged to fix the
split. And even if it's not, it probably can't hurt.
So cancel splits in order to stage it. This may be slightly harmful if
the split is legit and has nothing to do with the PR being
prioritised, but that seems like the less likely scenario. And having
to update staging priorities on the fly seems like a bad idea. Though
obviously it might do nothing if the PRs are in "default" priority.
Also simplify the unstage trigger from the PRs becoming ready:
- the user is useless as it's always the system user
- the batch id is not really helpful