There is no check or unique index, so nothing actually prevents having
multiple fw tasks for the same batch.
In that case, if we try to update the limit, the handler will crash
because the code assumes a single forwardport task per batch.
Triggered by nle on odoo/odoo#201394 probably while trying to recover
from the incident of March 13/14 (which was not going to work as the
forwardport queue itself was hosed, see two commits above for the
reason).
Starting in 3.12, `TarFile.extract` (and `extractall`) can take a
`filter` parameter which alters the behaviour of the extractor as it
relates to the application of tar metadata. Not passing this parameter
is deprecated unless the (also new) `TarFile.extraction_filter`
attribute is set.
Now there are a few wrinkles here:
1. Both the parameter and the attribute were backported to older patch
releases, e.g. 3.9.17, 3.10.12, and 3.11.4.
2. The attribute form will *not* resolve string filters, it needs the
callables.
As such, just to be on the safe side, set the attribute using a
`getattr`: in releases with the feature it will be set to a proper
value (the future default, which ignores most or all metadata from the
archive), and in releases without it the attribute is just set to
`None`, which should do no harm.
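Concretely, the described setup boils down to something like this (the
`staticmethod` wrapper follows the `tarfile` documentation's
recommendation, since a plain function set on the class would get bound
as a method on attribute access):

```python
import tarfile

# On releases that ship the feature, `data_filter` exists and becomes
# the class-wide default extraction filter; on older releases `getattr`
# yields None, which is equivalent to leaving the attribute unset.
tarfile.TarFile.extraction_filter = staticmethod(
    getattr(tarfile, "data_filter", None)
)
```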
This was the root cause of the incident of Feb 13/14: because the
patcher pushed to the local branch before pushing to the remote,
failing to push to the remote would leave the local ref broken, as
`fetch("refs/heads/*:refs/heads/*")` apparently does not do non-ff
updates (which does make some sense, I guess).
So in this case a staging finished, was pushed to the remote, then git
delayed the read side just enough that when the patcher looked up the
target it got the old commit. It applied a patch on top of that, tried
to push, and got a failure (non-ff update), which left the local and
remote branches divergent, and caused any further update of the local
reference branches to fail, thus blocking every forward port.
Using symbolic branches during patching was completely dumb (and
updating the local branch unnecessary), so switch the entire thing to
using just commits, and update a bunch of error reporting while at it.
Also reduce the grace period for merged PR branches to 1 week (from
2), and go through the on-disk repository instead of the github API.
Technically it might even be possible to do this in bulk, though that
seems annoying in case of failure (e.g. because somebody else deleted
the branch previously).
Fixes#1082
Partial mitigation of #1065, not complete by any means. Avoiding
updating a local ref during staging is probably the most important bit
here, we're staging on commits (into a new ref) anyway so it should
not be needed, and that avoids conflicts between the staging cron and
e.g. the forwardport cron.
Since 94cf3e9647, FW is not limited to
reviewers. And it's been possible to set the fw policy on any forward
port pretty much since it was created (during commands rework /
formalisation).
However the help for the fw subcommands would only be shown on the
source PR, unnecessarily.
Because the only relation between PRs and statuses is sharing a
repository (and kinda not even that for stagings), adding or removing
a status on a repository would try to recompute the statuses/state of
essentially every staging in the history of the project, and a
significant fraction of the PRs, leading to tens of thousands of
queries, minutes of computation, and even OOMs of the HTTP workers as
we'd load the PRs, their batches, and the stagings into memory to
update them, then load more things because of what depends on PR
statuses, etc...
But there is no reason to touch any of the closed or merged PRs, or
the completed (deactivated) stagings. And in fact there is every
reason *not* to.
Implementing a search-only m2m on each object in order to restrict the
set of PRs/stagings relevant to a change reduces the number of queries
to a few hundred, the run time to seconds, and the memory increase to
unnoticeable. The change still takes some time because the RD project
currently has about 7000 open PRs, 4500 of which target
odoo/odoo (which is our test case here), but that is nothing compared
to the 164000 total PRs to odoo/odoo out of some 250000 PRs for the RD
project.
And though they're likely less of an issue as they don't recurse quite
as much, the >120000 stagings during the project's history are not to
be ignored, when the number of *active* stagings at any one time is
at most the number of branches, which is half a dozen to a dozen.
For very basic numbers, as of committing this change, creating a
status in odoo/odoo (RD), optional on PRs and ignored on statuses,
on my current machine (7530U with 32GB RAM):
- without this change, 4835 queries, 37s of sql, 65s of non-SQL, RSS
climbs to 2258128 (2.15GiB)
- with this change, 758 queries, 1.46s SQL, 2.25s non-SQL, RSS climbs
to 187088 (182MiB)
Fixes#1067
Commits can take some time to propagate through the network (I guess),
or human error can lead to the wrong commit being set.
Either way, because the entire thing was done using a single fetch
with `check=True`, the cron job would fail entirely if any of the
patch commits was still unavailable.
Update the updater to:
- fallback on individual fetches
- remove the patch from the set of applicable patches if we (still)
can't find its commit
I'd have hoped `fetch` could retrieve whatever it found, but
apparently the server just crashes out when it doesn't find the commit
we ask for, and `fetch` doesn't update anything.
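A sketch of that fallback logic, assuming plain `subprocess` calls
(the helper name and signature are illustrative, not the actual
updater's):

```python
import subprocess

def fetch_patch_commits(repo_dir: str, remote: str, commits: list[str]) -> list[str]:
    """Try one bulk fetch first; if the server crashes out because a
    commit is missing, fall back to per-commit fetches and drop the
    patches whose commit (still) can't be found."""
    bulk = subprocess.run(
        ["git", "-C", repo_dir, "fetch", remote, *commits],
        capture_output=True,
    )
    if bulk.returncode == 0:
        return list(commits)
    # bulk fetch failed entirely: retry one commit at a time, keeping
    # only those the remote can actually serve
    found = []
    for commit in commits:
        single = subprocess.run(
            ["git", "-C", repo_dir, "fetch", remote, commit],
            capture_output=True,
        )
        if single.returncode == 0:
            found.append(commit)
    return found
```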
No linked issue because I apparently forgot to jot it down (and only
remembered about this issue with the #1063 patching issue) but this
was reported by mat last week (2025-02-21) when they were wondering
why one of their patches was taking a while:
- At 0832 patch was created by automated script.
- At 0947, an attempt to apply was made, the commit was not found.
- At 1126, a second attempt was made but another patch had been
created whose commit was not found, failing both.
- At 1255, there was a concurrency error ("cannot lock ref" on the
target branch).
- Finally at 1427 the patch was applied.
All in all it took 6 hours to apply the patch, which is 3-4 staging
cycles.
It's shorter, it's simpler (kinda), and it's 70% faster (although
that's unlikely to be any sort of bottleneck given applying patches
involves multiple network roundtrips).
Verify that the tree is different before and after applying the patch,
otherwise if a mistake is made (e.g. a script does not check that its
patches have content and requests a patch applying an existing commit,
which is how odoo/enterprise#612a9cf3cadba64e4b18d535ca0ac7e3f4429a08
occurred) we end up with a completely empty commit and a duplicated
commit message.
Fixes#1063
Note: using `rev-parse` to retrieve the commit's tree would be 50%
faster, but we're talking 3.2 versus 2.4ms and it requires string
formatting instead of nice clean arguments so it's a bit meh.
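The emptiness check then amounts to comparing tree ids around the
patch application (a sketch, the helper is illustrative):

```python
import subprocess

def commit_tree(repo_dir: str, rev: str) -> str:
    """Return the tree id of a commit, via the `show` form discussed
    above (`rev-parse {rev}^{{tree}}` would be faster but needs string
    formatting)."""
    return subprocess.run(
        ["git", "-C", repo_dir, "show", "--no-patch", "--pretty=%T", rev],
        check=True, capture_output=True, text=True,
    ).stdout.strip()

# usage sketch: refuse patches that change nothing
# before = commit_tree(repo, "HEAD")
# ... apply patch and commit ...
# if commit_tree(repo, "HEAD") == before:
#     raise ValueError("patch results in an empty commit")
```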
If a status is defined as `optional`, then the PR is considered valid
if the status is never sent, but *if* the status is sent then it
becomes required.
Note that as ever this is a per-commit requirement, so it's mostly
useful for conditional statuses.
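A sketch of the resulting evaluation rule (plain Python, not the
mergebot's actual state computation):

```python
def required_contexts(required, optional, received):
    """An `optional` context only becomes required once a status has
    actually been sent for it; `required` ones always count."""
    return set(required) | (set(optional) & set(received))

def commit_state(required, optional, received):
    # `received` maps a status context to its latest state
    contexts = required_contexts(required, optional, received)
    states = [received.get(c, "pending") for c in contexts]
    if "failure" in states:
        return "failure"
    if all(s == "success" for s in states):
        return "success"
    return "pending"
```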
Fixes#1062
"Run manually" is a bit meh, as it runs the cron synchronously (so you
have to wait for it, and hope it doesn't run for longer than the
request timeout which may be smaller than the cron timeout) and it can
run in a subtly different environment than normal, which can lead to
different behaviour.
Instead add a button to enqueue a cron trigger, now that they exist
that's much closer to what we actually want, and it does run the cron
in a normal / expected environment.
This is a bit of an odd case which was only noticed because of
persistent forwardport.batches, which ended up having a ton of related
tracebacks in the logs (the mergebot kept trying to create forward
ports from Jan 27th to Feb 10th, thankfully the errors happened in git
so did not seem to eat through our API rate limiting).
The issue was triggered by the addition of odoo/enterprise#77876 to
odoo/odoo#194818. This triggered a completion job which led to the
creation of odoo/enterprise#77877 to odoo/enterprise#77880, so far so
good.
Except the bit of code responsible for creating completion jobs only
checked if the PR was being added to a batch with a descendant. That
is the case of odoo/enterprise#77877 to odoo/enterprise#77879 (not
odoo/enterprise#77880 because that's the end of the line). As a
result, those triggered 3 more completion jobs, which kept failing in
a loop because they tried pushing different commits to their
next-siblings (without forcing, leading git to reject the non-ff push,
hurray).
A completion request should only be triggered by the addition of a
new *source* (a PR without a source) to an existing batch with
descendants, so add that to the check. This requires updating
`_from_json` to create PRs in a single step (rather than one step to
create based on github's data, and another for the hierarchical
tracking) as we need the source to be set during `create` not as a
post-action.
Although there was a test which could have triggered this issue, the
test only had 3 branches so was not long enough to trigger the issue:
- Initial PR 1 on branch A merged then forward-ported to B and C.
- Sibling PR 2 added to the batch in B.
- Completed to C.
- Ending there as C(1) has no descendant batch, leading to no further
completion request.
Adding a 4th branch did surface / show the issue by providing space
for a new completion request from the creation of C(2). Interestingly,
even though the test harness attempts to run all triggered crons to
completion there can be misses, so the test can fail in two different
ways (both now checked for):
- there's a leftover forwardport.batch after we've created all our
forwardports
- there's an extra PR targeting D, descending from C(2)
- in almost every case there's also a traceback in the logs, which
does fail the build thanks to the `env` fixture's check
Skipmerge creates forward-ports before the source PR is even merged.
- In a break from the norm, skipmerge will create forwardports even in
the face of conflicts.
- It will also not *detach* pull requests in case of conflicts, this
is so the source PR can be updated and the update correctly cascades
through the stack (likewise for any intermediate PR though *that*
will detach as usual).
Note that this doesn't really look at outstandings, especially as they
were recently updated, so it might need to be fixed up in case of
freakout, but I feel like that should not be too much of an issue, the
authors will just get their FW reminders earlier than usual. If that's
a hassle we can always update the reminder job to ignore forward ports
whose source is not merged I guess.
Fixes#418
Not entirely sure about just allowing any PR author to set the merge
method as it gives them a lot more control over commits (via the PR
message), and I'm uncertain about the ramifications of doing that.
However if the author of the PR is classified as an
employee (via a provisioned user linked to the github account) then
allow it. Should not be prone to abuse at least.
Fixes#1036
All `pull_request` events seem to provide the `commits` count
property. As such we can use them all to check the `squash` state even
if we don't otherwise care for the event.
Also split out notifying an approved pull request about its missing
merge method into a separate cron from the one notifying a PR that its
siblings are ready.
Fixes#1036
The forward port process adds a uniquifier to the branch name as it's
possible to reuse branch names for different sets of PRs (and some
individuals do that systematically) which can cause collisions in the
fw branches.
Originally this uniquifier was random by necessity, however now that
batches are reified and forward port is performed batch-wise we don't
need the randomness: we can just use the source batch's id as it's
unique per sequence of forward ports. This means it'll eventually be
possible for an external system to retrieve the source(s) of a forward
port by reading the batch object, and it's also possible to correlate
forward ports through the batch id (although not possible to find the
source set without access to the mergebot's information).
Do the same thing for backports, because why not.
pycharm is a bit dumb so it considers `[ ]` to be unnecessary even in
`re.VERBOSE` regexes. But it has a bit of a point in that it's not
super clear, so inject the space characters by (unicode) name, the
behaviour should be the exact same but the regexes should be slightly
less opaque.
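For illustration (this command grammar is made up, not the mergebot's
actual one): `re` resolves `\N{...}` escapes itself since Python 3.8,
so the literal space survives `re.VERBOSE`'s whitespace stripping
while staying visible in the pattern:

```python
import re

COMMAND = re.compile(r"""
    (?P<cmd> r | review | delegate )      # command name
    (?: \N{SPACE}+ (?P<arg> \S+ ) )?      # optional argument
""", re.VERBOSE)
```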
Hopefully this is the last fix to the patcher. From the start of the
implementation I relied on the idea that `git show` was adding a line
composed of a single space (and a newline) before and after the patch
message, as that is what I observed in my terminal, and it's
consistent with RFC 3676 signatures (two dashes, a space, and a
newline).
Turns out that single space, while present in my terminal indeed, was
completely made up by `less(1)`. `git show` itself doesn't generate
that, neither does it appear when using most pagers, or even when
piping the output of `less` into something (a file, an other pager,
...). It's pretty much just something `less(1)` sends to a terminal
during interactive sessions to fuck with you.
Fixes#1037
The old controller system required `type='json'` which only did
JSON-RPC and prevented returning proper responses.
As of 17.0 this is not the case anymore, `type='http'` controllers can
get `content-type: application/json` requests just fine and return
whatever they want. So change that:
- Drop `type='json'`.
- Return `Response` objects with nice status codes (and text
mimetypes, as otherwise werkzeug defaults to html).
- Update ping to bypass normal flow as otherwise it requires
authentication and event sources which is annoying (it might be a
good idea to have both in order to check for configuration, however
it's not possible to just send a ping via the webhook UI on github
so not sure how useful that would be).
- Add some typing and improve response messages while at it.
Note: some "internal" errors (e.g. ignoring event actions) are
reported as successes because as far as I can tell webhooks only
support success (2xx) / failure (4xx, 5xx) and an event that's ignored
is not really *failed* per se.
Some things are reported as failures even though they are on the edge
because that can be useful to see what's happening e.g. comment too
large or unable to lock rows.
Fixes#1019
Probably should create a mixin for this: when a model is used as a
task queue for a cron, the cron should automatically be triggered on
creation. Requiring an explicit trigger after a creation is error
prone and increases the risk that some of the triggers will be
forgotten/missed.
Adds a very limited ability to try and look for false positive /
non-deterministic staging errors. It tries to err on the side of
limiting false false positives, so it's likely to miss many.
Currently has no automation / reporting, just sets a flag on the
stagings which are strongly believed to have failed due to false
positives.
While at it, add link between a "root" staging and its splits. It's
necessary to clear the "false positive" flag, and surfacing it in the
UI could be useful one day.
Fixes#660
Requires parsing the commit messages as github plain doesn't collate
the info from there, but also the descriptions: apparently github only
adds those to the references if the PR targets the default
branch. That's not a limitation we want.
Meaning at the end of the day, the only thing we're calling graphql
for is explicit manual linking, which can't be tested without
extending the github API (and then would only be testable on DC), and
which I'm not sure anyone bothers with...
Limitations
-----------
- Links are retrieved on staging (so if one is added later it won't be
taken into account).
- Only repository-local links are handled, not cross-repository.
Fixes#777
- don't *fail* in `_compute_identity`, it causes issues when the token
is valid but doesn't have `user:email` access as the request is
aborted and saving doesn't work
- make `github_name` and `github_email` required rather than ad-hoc
requiring them in `_compute_identity` (which doesn't work correctly)
- force copy of `github_name` and `github_email`, with o2ms being
!copy this means duplicating projects now works out of the box (or
should...)
Currently errors in `_compute_identity` are reported via logging which
is not great as it's not UI visible, should probably get moved to
chatter eventually but that's not currently enabled on projects.
Fixes#990
Show patch metadata on the patch screen, so it's easier to understand
what the parser sees in case of issues.
Behaviour is not *entirely* consistent: `file_ids` is correctly set but
it looks like during the initial `web_read` it gets stripped out in at
least some cases and the files list is empty even though files have
been found in the patch. nm.
Fixes#987
- Apparently if a user is on windows the ACE editor can swap out their
line endings from unix to windows. The patch parsers were predicated
upon all patches being in unix mode (because git, and diff).
Fix up both parsers to convert windows-style line endings to unix before
trying to parse the patch data. Also add a few fallbacks to limit
the odds of an unhelpful `StopIteration` (though that might hide
errors more than reveal them...)
- Make sure we support `format-patch --no-signature`, just requires
using the correct partition direction: I assume I used `rpartition`
as a form of micro-optimisation *but*
- If the separator is not found the "patch body" ends up in the
third parameter rather than the first, which makes the fallback
difficult.
- There doesn't seem to be anything preventing *multiple* signature
separators in a message, and logically the first one should hold
and the rest is all part of the signature.
As a result, for both reasons we need to look *forwards* for the
signature separator, not backwards. Hence `str.partition`.
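Sketched out (the helper name is illustrative), the forward direction
handles both cases in one go:

```python
def split_signature(message: str) -> tuple[str, str]:
    # `partition`, not `rpartition`: the *first* separator starts the
    # signature (everything after it, further separators included, is
    # signature), and when there is no separator at all the whole
    # message lands in the first slot, making the fallback trivial.
    body, _, signature = message.partition("\n-- \n")
    return body, signature
```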
Fixes#992
Turns out to not work well in 17.0, and after consideration moc hasn't
really used the auto-update feature (or the required-prs gate in
general to be honest), usually he knows what PRs he's waiting for and
only validates once he's confirmed every which way.
So it's probably not worth fixing the thing. According to jpp, this
should probably use something based on bus subscriptions to update
just the field (though tbf the `root.update` call doesn't really seem
to be "deep" anymore, so in reality rather than update the *form*'s
record I should probably have tried reloading the required_pr_ids
records to fetch the new color).
Closes#997
`get_content` round-trips the text part through `ascii` with
`errors=replace`, so if the input is not ascii it screws up
tremendously, which leads to either failing to apply patches (the more
likely situation) or corrupting the patches.
`get_payload`, if called without `decode`, pretty much just returns
the payload unless it needs a decoding pass (e.g. because it contains
raw surrogates, but that should not be an issue for us). So this is
really what we want.
While at it, increase `patch`'s verbosity in case it can give us more
info.
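The difference is easy to reproduce with the stdlib alone (the sample
message is made up):

```python
from email import message_from_string
from email.policy import default

# no charset is declared, so `get_content` decodes the body as ASCII
# with errors='replace' and mangles anything non-ascii, while the
# undecoded `get_payload` hands back the text untouched
raw = "Subject: [PATCH] fix\n\nSigné par Aurélien\n"
msg = message_from_string(raw, policy=default)
assert "\ufffd" in msg.get_content()            # mangled
assert msg.get_payload() == "Signé par Aurélien\n"  # intact
```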
If a comment causes an unknown PR to be fetched, it's a bit odd to
ping the author (and possibly reviewer) anyway as they're not super
concerned (and technically we could be ignoring the purported /
attempted reviewer).
So if a fetch job was created because of a comment, remember the
comment author and ping *them* instead of using the default ping
policy.
Fixes#981
If staging gets re-enabled on a branch (or the branch itself gets
re-enabled), immediately run a staging cron as there may already be
PRs waiting, and no trigger enqueued: cron triggers have no payload,
they just get removed when the cron runs which means if a bunch of PRs
become ready for branch B with staging disabled, the cron is going to
run, it's going to stage nothing on that branch (because staging is
disabled) then it's going to delete all the triggers.
Fixes#979
Turns out you can configure format-patch with `--no-prefix` and some
people (*cough cough* mat) have that in their standard setup, so the
assumption of needing to strip 1 level of prefix does not necessarily
hold.
Also fix a few more issues:
- some people (*cough cough* still mat) also use `-n` by default,
which adds the series sequence (`n/m`) even for a single patch,
handle that correctly
- logging patch application errors is pretty useful when patching
fails and I'm trying to get the information via logs, do that
- especially when I decide to add error messages to tracking *but
forgot to show the chatter by default*, fix that as well
The commit-based patcher worked first try, and patch-based would have
worked too if not for those meddling kids. In the future it might be a
good idea to reify the stripping level (`-p`) on the patch object
though, and maybe provide a computed preview of the list of files to
patch, so issues are easier for the operator to diagnose.
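The series-marker wrinkle above, for instance, comes down to making
the `n/m` part of the subject prefix optional (illustrative regex, not
the actual parser's):

```python
import re

# `format-patch -n` adds a series marker even for a single patch, so
# accept "[PATCH] title", "[PATCH 1/1] title", and re-rolls like
# "[PATCH v2 3/7] title", keeping just the title
PATCH_SUBJECT = re.compile(r"""
    \[\s*PATCH
      (?: \s+ v\d+ )?        # optional re-roll marker
      (?: \s+ \d+/\d+ )?     # optional series sequence from -n
    \s*\]\s*
    (?P<title> .* )
""", re.VERBOSE)
```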
- fix incorrect view specs (the action id comes first)
- add a wizard form and hook it into the PR, completely forgot to do
that
- usability improvements: filter branches to be in the same project as
the PR being backported, and older than the current PR's branch
The latter is a somewhat incomplete condition: ideally we'd want to
only allow selecting branches preceding the target of the *source* of
the PR being backported, that way we don't risk errors when
backporting forward-ports (the condition should be checked in the
final action but still).
Also we're only filtering by sequence, so we're missing the name part
of the ordering, hence if multiple branches have the same sequence we
may not allow selecting some of the preceding branches.
pymarkdown's footnotes plugin *saves footnotes across invocations by
default*. Even if I understand the documented use case it seems wild
that it's not opt-in...
Anyway, disable that by resetting all internal state. Thanks rfr for
the initial report that things were looking odd.
As far as I can tell, serialization failures are properly handled:
- In `handle_status` we let the http layer retry the query, which
pretty much always succeeds.
- In `Commit.notify`, we rollback the application of the current
commit, meaning it'll be processed by the next run of the cron,
which also seems to succeed every time (that is going through the
log I pretty much never notice the same commit being serialization
failure'd twice in a row).
Which we can trigger for faster action. This last item is not
entirely necessary, as statuses should generally come in fast,
especially if we have concurrency errors, but it can't hurt.
This means the only genuine issue is... sql_db logging a "bad query"
every time there's a serialization failure.
In `handle_status`, just suppress the message outright, if there's an
error other than serialization the http / dispatch layer should catch
and log it.
In `Commit._notify` things are slightly more difficult as the execute
is implicit (`flush` -> `_write` -> `execute`) so we can't pass the
flag by parameter. One option would be to set and unset
`_default_log_exception`, but it would either be a bit dodgy or it
would require using a context manager and increasing the indentation
level (or using a custom context manager).
Instead just `mute_logger` the fucking thing. It's a bit brutish and
mostly used in tests, but not just, and feels like the least bad
option here...
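For reference, what `mute_logger` does can be sketched with the stdlib
alone (this is a stand-in for odoo's `odoo.tools.mute_logger`, not its
actual implementation):

```python
import contextlib
import logging

class _Mute(logging.Filter):
    # drop every record that reaches the logger this filter is on
    def filter(self, record):
        return False

@contextlib.contextmanager
def mute_logger(name):
    """Temporarily silence the named logger, restoring it on exit."""
    logger = logging.getLogger(name)
    f = _Mute()
    logger.addFilter(f)
    try:
        yield
    finally:
        logger.removeFilter(f)
```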
Closes#805
This is not a full user-driven backport thingie for now, just one
admins can use to facilitate things and debug issues with the
system. May eventually graduate to a frontend feature.
Fixes#925
- replace manual token_urlsafe by actual token_urlsafe
- make conditional right side up and more readable
- replace match by fullmatch, should not change anything since we end
with a greedy universal match but is slightly more explicit
Missed it during the previous pass, probably because it's in the
middle of `pull_requests.py`. It's a classic template for triggered
crons since the model is just a queue of actions for the cron.
If a PR is closed but part of an ongoing batch, the change in status
of the batch might be reflected on the PR still:
- if a PR is closed and the batch gets staged, the PR shows up as
being staged
- if the PR is merged then the batch gets merged, the PR shows up as
merged
Fixes#914
Also remove the unused `_tagstate` helper property.
Because of the false negatives due to github's reordering of events on
retargeting, blocking merge methods can be rather frustrating for the
user, as what's happening and how to solve it isn't clear in that case.
Keep the warnings and all, but remove the blocking factor: if a PR
doesn't have a merge method and is not single-commit, just skip it on
staging. This way, PRs which are actually single-commit will stage
fine even if the mergebot thinks they shouldn't be.
Fixes#957