The current system makes / lets GC run during fetching. This has a few
issues:
- the autogc consumes resources during the forward-porting
process (not that it's hugely urgent but it seems unnecessary)
- the autogc commonly fails due to the combination of large repository
(odoo/odoo) and low memory limits (hardmem for odoo, which get
translated into soft ulimits)
As a result, the garbage collection of the repository sometimes stops
entirely, leading to an increase in repository size and a decrease in
performances.
To mitigate this issue, disable the automagic gc and maintenance
during normal operation, and instead add a weekly cron which runs an
aggressive GC with memory limits disabled (as far as they can get, if
the limits are imposed externally there's nothing to be done).
The maintenance is implemented using a full lockout of the
forward-port cron and an in-place GC rather than a copy/gc/swap, as
doing this maintenance at the small hours of the week-end (sat-sun
night) seems like a non-issue: currently an aggressive GC of odoo/odoo
(using the default aggressive options) takes a total of 2:30 wallclock
(5h user) on a fairly elderly machine (it's closer to 20mn wallclock
and 2h user on my local machine, also turns out the cache repos are
kinda badly configured leading to ~30% more objects than necessary
which doesn't help).
For the record, a fresh checkout of odoo/odoo right now yields:
| Overall repository size | |
| * Commits | |
| * Count | 199 k |
| * Total size | 102 MiB |
| * Trees | |
| * Count | 1.60 M |
| * Total size | 2.67 GiB |
| * Total tree entries | 74.1 M |
| * Blobs | |
| * Count | 1.69 M |
| * Total size | 72.4 GiB |
If this still proves insufficient, a further option would be to deploy
a "generational repacking" strategy:
https://gitlab.com/gitlab-org/gitaly/-/issues/2861 (though apparently
it's not yet been implemented / deployed on gitlab so...).
But for now we'll see how it shakes out.
Close#489
Github can fail to create the magic refs on PRs
(`pull/refs/?/head`). Since forwardport relies on these refs to fetch
PR content this is an issue when it occurs, as the forward ports fail
in a loop.
After discussion with Github support, it turns out Github enabled
`allowReachableSHA1InWant` a while back, meaning it's possible to
fetch content by commit (rather than ref) as long as the content is
"in network".
Use this property as fallback when checking if we can see the PR head
before forward porting.
Also:
- remove explicit configuration of GC during fetch, it doesn't disable
the autogc (yet?) but that's likely going to happen anyway
- update logging and logger hierarchy during forward port to make
things clearer and easier to extract, although based on PR id rather
than number
- rate limit failing forward ports to avoid running them on every cron
(~ every minute), run them every ~30mn instead, this provides higher
odds of recovery with less log garbage in case of transient github
failure, and if the PR is stuck it limits the log pollution
Fixes#658
Get mergebot updates from since the runbot was upgraded.
NOTE: updates forwardport.models.forwardport.Queue with slots for
compatibility with commit
odoo/odoo@ea3e39506a "use slots for
BaseModel", otherwise we get
TypeError: __bases__ assignment: 'ForwardPortTasks' object layout differs from 'BaseModel'
Old messages were quite inconsistent in their pinging of the PR author
and reviewer.
Reviewed messages (probably missed some but...) and try to more
consistently ping when the feedback requires some sort of action in
order to proceed.
Fixes#592
On two of the freezes, thereafter the logs showed serial crashes in
the forwardport cron when trying to find the insertion point for a new
forward-port.
The first time was not really diagnosed, the second time the cause was
found to be a retargeted PR which led to a failure of the "insertion"
forward port, which did not take that possibility in account (it
assumed -- sensibly I believed -- that an intermediate FP following a
branch insertion would always succeed, sadly the malevolent universe
had other plans).
So only insert the new forward port inside its sequence (if necessary)
if the forward port actually succeeded, otherwise ignore it.
Fixes#551
The freeze wizard has support for merging freeze / release PRs on each
of the newly created branches. But since this would be done by, well,
merging, those PRs would get forward-ported to master, and would have
to be closed there.
This creates additional work for the freeze master, and noise /
parasitic PRs.
Obviously it's possible for the freeze master to set some nonsense `up
to` (nonsense because the "real" limit doesn't exist yet at that
point), but really it never makes any sense to forward port release
PRs, so the wizard should do it.
Normally when a forwardport is updated the forward-bot cascades the
update onto its followups (if it has any), but takes care to keep the
followups attached as they were not updated "by hand".
In the case of odoo/odoo#77677 however that did not work and the
followup PRs got detached. Looking at the logs, it becomes flagrant
that there was a race condition: either Git took a long time to
respond to the push, or there was an IO slowdown which led to the
"local update" taking a very long time. Either way this allowed the
"synchronize" webhook from github to arrive before the current
transaction was committed, rolling back said transaction and making
the forwardbot assume this was a "real" sync and detach the followup
from its parent.
Locking the PR row up-front ought fix the issue, and also move the
local update to before having pushed: the "extra" commits in the local
cache don't matter too much even if pushing to github fails, they'll
be cleaned up by a GC eventually.
Also migrate the `-f` on push to `--force-if-includes` in order to
avoid possible race conditions on the push (since we're not fetching
the current branch, use the full-syntax explicit CAS form, that's
exactly what we're looking for anyway).
Fixes#541 (hopefully)
On conflicts in multi-commit PRs developers sometimes get confused as
to what happened why.
If a conflict occurs and the source pull request had multiple
commits, list all the source commits and show which one broke.
Related to #505
* Remove the forwardport creating PRs in draft, that was mostly to
avoid codeowners triggering but we've removed the github one and
hand-rolled it, so not a concern anymore.
* Prevent merging `draft` PRs, the mergebot rejects approval on draft
PRs and insults people.
TBD (maybe): try to create *conflicting* forward-port PRs in draft so
it's clearer they need to be *fixed*? Issue of not being able to do
that on all private repositories remains so~~
Fixes#500
If a PR is updated and has extent forward-ports, those forwardports
get updated automatically ("followup").
However there is an issue if the udpate causes a conflict in the
followup: the conflict gets silently pushed, and may fairly easily get
merged if it occurs in an area which the CI doesn't cover.
It's unclear what the policy really should be for this issue, and
there is no real way to *block* a pull request at the moment (save by
putting it in error at the mergebot level I guess?), so for now
clearly notify the user on both the modified PR and the followup,
with a comment on both.
We may want to revisit this eventually.
Fixes#467
Fix#457 hopefully: I didn't manage to repro / create a test for.
It looks like in some cases during the update process the PR ref lags
behind the branch itself. This means `forwardport.updates` creates a
new commit, pushes it, then on the next iteration updates the local
cache, tries to find the commit we just pushed... and that fails.
I can only assume this is because when there's enough load on the
github side the update to the `info/refs` pseudo-file can fall
behind (it's now 4MB and holding nearly 65k refs).
So cheat: take the commit we just pushed to the dev remote
and... immediately push it to the local cache under a dummy branch,
which we delete. Since we only gc "1 day ago" this should not vacuum.
The process did properly update the state, but not the squash state.
It's somewhat unclear whether the state should be fully reset and
require reapproval though. Maybe only the validation should be reset?
The CI will eventually run and either succeed (re-validating) or
fail (devalidating, hopefully) but I'm not entirely sure this is
correct.
The exponential backoff offsets from the write_date of the children
PRs, however it doesn't reset, so the offsetting gets bumped up way
more than originally expected or designed if the child PRs are under
active development for some reason.
Fix this by adding a field to specifically record the date of merge of
a PR, and check that feature against the backoff offset. This should
provide more regular and reliable backoff.
Fixes#369
If a new branch is added to a project, there's an issue with *ongoing*
forward ports (forward ports which were not merged before the branch
was forked from an existing one): the new branch gets "skipped" and
might be missing some fixes until those are noticed and backported.
This commit hooks into updating projects to try and see if the update
consists of adding a branch inside the sequence, in which case it
tries to find the FP sequences to update and queues up new
"intermediate" forward ports to insert into the existing sequences.
Note: had to increase the cron socket limit to 2mn as 1mn blew up the
big staging cron in the test (after all the forward-port PRs are
approved).
Fixes#262
[FIX]
If the ref we asked for does not exist, github apparently decides to
fall-back to prefix-matching. So if we're trying to delete
already-deleted branch A and someone called their branch A-x we're
going to get it as a result.
Thankfully they were apparently smart enough to return a list even if
there's only a single fuzzy match. So if we get a list (instead of a
dict) as response to git/refs/heads assume the branch was already
deleted as if we got a 404.
If a PR is *merged*, enqueue it for deletion (with a 2 weeks delay).
Mainly to avoid FW branches staying around long after they've been
merged (possibly eventually closed?), will also clean up regular
merged branches, including historical merges forgotten by their
author.
Fixes#230
In the case where we have a co-dependent forward port (co-dependent
PRs got forward-ported, which they should be together) where *one* of
the PRs got explicitly updated, the batch would "fall into a hole"
being handled as neither "this is part of a forward-port sequence" nor
"this is a new merge to forward-port" (the latter being the proper
one).
Modify & remove guards which checked that either no or all PRs in a
batch have parents: should be either all or not all.
Fixes#231
* shorten the postfix, forwardbot is now a bigram!
* shorten the uniquifier: go from 5 to 3 bytes, and use urlsafe base64
that way we only have a 4-char uniquifier instead of 8
* while at it, fix deprecated calls to logging.warn (should be
logging.warning)
Fixes#226
The queue would get items to process one at a time, process, commit,
and go to the next. However this is an issue if one of the item fails
systematically for some reason (aka it's not just a transient
failure): the cron fails, then restarts at the exact same point, and
fails again with the same issue, leading to following items never
getting processed.
Fix by getting all the queue contents at once, processing them one by
one and "skipping" any item which fails (leaving it in place so it can
get re-processed later).
That way, even if an item causes issues, the rest of the queue gets
processed normally. The interruption was an issue following
odoo/enterprise#5670 not getting properly updated in the
backend (backend didn't get notified of the last two updates /
force-push to the PR, so it was trying to forward-port a commit which
didn't exist - and failing).
* Cherrypicking is handrolled because there seems to be no easy way to
programmatically edit commit messages during the cherrypicking
sequence: `-n` basically squashes all commits and `-e` invokes a
subprocess. `-e` with `VISUAL=false` kinda sorta works (in that it
interrupts the process before each commit), however there doesn't
seem to be clean status codes so it's difficult to know if the
cherrypick failed or if it's just waiting for a commit of this step.
Instead, cherrypick commits individually then edit / rewrite their
commit messages:
* add a reference to the original commit
* convert signed-off-by to something else as the original commit was
signed off but not necessarily this one
* Can't assign users when creating PRs: only repository collaborators
or people who commented on the issue / PR (which we're in the
process of creating) can be assigned.
PR authors are as likely to be collaborators as not, and we can have
non-collaborator reviewers. So pinging via a regular comment seems
less fraught as a way to notify users.