runbot/forwardport/models/project.py

613 lines
25 KiB
Python
Raw Permalink Normal View History

# -*- coding: utf-8 -*-
"""
Technically could be independent from mergebot but would require a lot of
duplicate work e.g. keeping track of statuses (including on commits which
might not be in PRs yet), handling co-dependent PRs, ...
However extending the mergebot also leads to messiness: fpbot should have
its own user / feedback / API keys, mergebot and fpbot both have branch
ordering but for mergebot it's completely cosmetics, being slaved to mergebot
means PR creation is trickier (as mergebot assumes opened event will always
lead to PR creation but fpbot wants to attach meaning to the PR when setting
it up), ...
"""
from __future__ import annotations
[IMP] *: crons tests running for better triggered compatibility Mergebot / forwardport crons need to run in a specific ordering in order to flow into one another correctly. The default ordering being unspecified, it was not possible to use the normal cron runner (instead of the external driver running crons in sequence one at a time). This can be fixed by setting *sequences* on crons, as the cron runner (`_process_jobs`) will use that order to acquire and run crons. Also override `_process_jobs` however: the built-in cron runner fetches a static list of ready crons, then runs that. This is fine for normal situation where the cron runner runs in a loop anyway but it's any issue for the tests, as we expect that cron A can trigger cron B, and we want cron B to run *right now* even if it hadn't been triggered before cron A ran. We can replace `_process_job` with a cut down version which does that (cut down because we don't need most of the error handling / resilience, there's no concurrent workers, there's no module being installed, versions must match, ...). This allows e.g. the cron propagating commit statuses to trigger the staging cron, and both will run within the same `run_crons` session. Something I didn't touch is that `_process_jobs` internally creates completely new environments so there is no way to pass context into the cron jobs anymore (whereas it works for `method_direct_trigger`), this means the context values have to be shunted elsewhere for that purpose which is gross. But even though I'm replacing `_process_jobs`, this seems a bit too much of a change in cron execution semantics. So left it out. While at it tho, silence the spammy `py.warnings` stuff I can't do much about.
2024-07-30 14:21:05 +07:00
import builtins
import datetime
import itertools
import json
import logging
import operator
import subprocess
import typing
import dateutil.relativedelta
import requests
from odoo import models, fields, api
from odoo.exceptions import UserError
from odoo.osv import expression
from odoo.tools.misc import topological_sort, groupby
from odoo.addons.base.models.res_partner import Partner
[FIX] forwardport: don't break forward porting on huge conflicts On forward-porting, odoo/odoo#170183 generates a conflict on pretty much every one of the 1111 files it touches, because they are modify/delete conflicts that generates a conflict message over 200 bytes per file, which is over 200kB of output. For this specific scenario, the commit message was being passed through arguments to the `git` command, resulting in a command line exceeding `MAX_ARG_STRLEN`[^1]. The obvious way to fix this is to pass the commit message via stdin as is done literally in the line above where we just copy a non-generated commit message. However I don't think hundreds of kbytes worth of stdout[^2] is of any use, so shorten that a bit, and stderr while at it. Don't touch the commit message size for now, possibly forever, but note that some log diving reveals a commit with a legit 18kB message (odoo/odoo@42a3b704f7c7889a74338253138d0201d3b6e7a3) so if we want to restrict that the limit should be at least 32k, and possibly 64. But it might be a good idea to make that limit part of the ready / merge checks too, rather than cut things off or trigger errors during staging. Fixes #900 [^1]: Most resources on "Argument list too long" reference `ARG_MAX`, but on both my machine and the server it is 2097152 (25% of the default stack), which is ~10x larger than the commit message we tried to generate. The actual limit is `MAX_ARG_STRLEN` which can't be queried directly but is essentially hard-coded to PAGE_SIZE * 32 = 128kiB, which tracks. [^2]: Somewhat unexpectedly, that's where `git-cherry-pick` sends the conflict info.
2024-06-25 18:11:32 +07:00
from odoo.addons.runbot_merge import git, utils
from odoo.addons.runbot_merge.models.pull_requests import Branch
from odoo.addons.runbot_merge.models.stagings_create import Message
Conflict = tuple[str, str, str, list[str]]
DEFAULT_DELTA = dateutil.relativedelta.relativedelta(days=3)
_logger = logging.getLogger('odoo.addons.forwardport')
class Project(models.Model):
_inherit = 'runbot_merge.project'
id: int
github_prefix: str
def write(self, vals):
# check on branches both active and inactive so disabling branches doesn't
# make it look like the sequence changed.
self_ = self.with_context(active_test=False)
previously_active_branches = {project: project.branch_ids.filtered('active') for project in self_}
branches_before = {project: project._forward_port_ordered() for project in self_}
r = super().write(vals)
self_._followup_prs(previously_active_branches)
self_._insert_intermediate_prs(branches_before)
return r
def _followup_prs(self, previously_active_branches):
"""If a branch has been disabled and had PRs without a followup (e.g.
because no CI or CI failed), create followup, as if the branch had been
originally disabled (and thus skipped over)
"""
[CHG] *: move forward-porting over to batches Thank god I have a bunch of tests because once again I forgot / missed a bunch of edge cases in doing the conversion, which the tests caught (sadly that means I almost certainly broke a few untested edge cases). Important notes: Handling of parent links ------------------------ Unlike PRs, batches don't lose their parent info ever, the link is permanent, which is convenient to trawl through a forward port (currently implemented very inefficiently, maybe we'll optimise that in the future). However this means the batch having a parent and the batch's PRs having parents are slightly different informations, one of the edge cases I missed is that of conflicting PRs, which are deparented and have to be merged by hand before being forward ported further, I had originally replaced the checks on a pr and its sibling having parents by just the batch. Batches & targets ----------------- Batches were originally concepted as being fixed to a target and PRs having that target, a PR being retargeted would move it from one batch to an other. As it turns out this does not work in the case where people retarget forward-port PRs, which I know they do because #551 (2337bd85186de000d074e5a7178cfeb110d15de7). I could not think of a good way to handle this issue as is, so scrapped the moving PRs thing, instead one of the coherence checks of a batch being ready is that all its PRs have the same target, and a batch only has a target if all its PRs have the same target. It's possible for somewhat odd effects to arise, notably if a PR is closed (removed from batch), the other PRs are retargeted, and the new PR is reopened, it will now be on a separate batch even if it also gets retargeted. This is weird. I don't quite know how I should handle it, maybe batches could merge if they have the same target and label? however batches don't currently have a label so... Improve limits -------------- Keep limits on the PRs rather than lift them on the batchL if we can add/remove PRs of batches having different limits on different PRs of the same batch is reasonable. Also leave limit unset by default: previously, the limit was eagerly set to the tip (accessible) branch. That doesn't really seem necessary, so stop doing that. Also remove completely unnecessary `max` when trying to find a PR's next target: `root` is either `self` or `self.source_id`, so it should not be possible for that to have a later target. And for now ensure the limits are consistent per batch: a PR defaults to the limit of their batch-mate if they don't have one, and if a limit is set via command it's set on all PRs of a batch. This commit does not allow differential limits via commands, they are allowed via the backend but not really tested. The issue is mostly that it's not clear what the UX should look like to have clear and not super error prone interactions. So punt on it for now, and hopefully there's no hole I missed which will create inconsistent batches.
2024-05-21 20:45:53 +07:00
Batch = self.env['runbot_merge.batch']
ported = self.env['runbot_merge.pull_requests']
for p in self:
actives = previously_active_branches[p]
for deactivated in p.branch_ids.filtered(lambda b: not b.active) & actives:
# if a non-merged batch targets a deactivated branch which is
# not its limit
[CHG] *: move forward-porting over to batches Thank god I have a bunch of tests because once again I forgot / missed a bunch of edge cases in doing the conversion, which the tests caught (sadly that means I almost certainly broke a few untested edge cases). Important notes: Handling of parent links ------------------------ Unlike PRs, batches don't lose their parent info ever, the link is permanent, which is convenient to trawl through a forward port (currently implemented very inefficiently, maybe we'll optimise that in the future). However this means the batch having a parent and the batch's PRs having parents are slightly different informations, one of the edge cases I missed is that of conflicting PRs, which are deparented and have to be merged by hand before being forward ported further, I had originally replaced the checks on a pr and its sibling having parents by just the batch. Batches & targets ----------------- Batches were originally concepted as being fixed to a target and PRs having that target, a PR being retargeted would move it from one batch to an other. As it turns out this does not work in the case where people retarget forward-port PRs, which I know they do because #551 (2337bd85186de000d074e5a7178cfeb110d15de7). I could not think of a good way to handle this issue as is, so scrapped the moving PRs thing, instead one of the coherence checks of a batch being ready is that all its PRs have the same target, and a batch only has a target if all its PRs have the same target. It's possible for somewhat odd effects to arise, notably if a PR is closed (removed from batch), the other PRs are retargeted, and the new PR is reopened, it will now be on a separate batch even if it also gets retargeted. This is weird. I don't quite know how I should handle it, maybe batches could merge if they have the same target and label? however batches don't currently have a label so... Improve limits -------------- Keep limits on the PRs rather than lift them on the batchL if we can add/remove PRs of batches having different limits on different PRs of the same batch is reasonable. Also leave limit unset by default: previously, the limit was eagerly set to the tip (accessible) branch. That doesn't really seem necessary, so stop doing that. Also remove completely unnecessary `max` when trying to find a PR's next target: `root` is either `self` or `self.source_id`, so it should not be possible for that to have a later target. And for now ensure the limits are consistent per batch: a PR defaults to the limit of their batch-mate if they don't have one, and if a limit is set via command it's set on all PRs of a batch. This commit does not allow differential limits via commands, they are allowed via the backend but not really tested. The issue is mostly that it's not clear what the UX should look like to have clear and not super error prone interactions. So punt on it for now, and hopefully there's no hole I missed which will create inconsistent batches.
2024-05-21 20:45:53 +07:00
extant = Batch.search([
('parent_id', '!=', False),
('target', '=', deactivated.id),
# if at least one of the PRs has a different limit
[CHG] *: move forward-porting over to batches Thank god I have a bunch of tests because once again I forgot / missed a bunch of edge cases in doing the conversion, which the tests caught (sadly that means I almost certainly broke a few untested edge cases). Important notes: Handling of parent links ------------------------ Unlike PRs, batches don't lose their parent info ever, the link is permanent, which is convenient to trawl through a forward port (currently implemented very inefficiently, maybe we'll optimise that in the future). However this means the batch having a parent and the batch's PRs having parents are slightly different informations, one of the edge cases I missed is that of conflicting PRs, which are deparented and have to be merged by hand before being forward ported further, I had originally replaced the checks on a pr and its sibling having parents by just the batch. Batches & targets ----------------- Batches were originally concepted as being fixed to a target and PRs having that target, a PR being retargeted would move it from one batch to an other. As it turns out this does not work in the case where people retarget forward-port PRs, which I know they do because #551 (2337bd85186de000d074e5a7178cfeb110d15de7). I could not think of a good way to handle this issue as is, so scrapped the moving PRs thing, instead one of the coherence checks of a batch being ready is that all its PRs have the same target, and a batch only has a target if all its PRs have the same target. It's possible for somewhat odd effects to arise, notably if a PR is closed (removed from batch), the other PRs are retargeted, and the new PR is reopened, it will now be on a separate batch even if it also gets retargeted. This is weird. I don't quite know how I should handle it, maybe batches could merge if they have the same target and label? however batches don't currently have a label so... Improve limits -------------- Keep limits on the PRs rather than lift them on the batchL if we can add/remove PRs of batches having different limits on different PRs of the same batch is reasonable. Also leave limit unset by default: previously, the limit was eagerly set to the tip (accessible) branch. That doesn't really seem necessary, so stop doing that. Also remove completely unnecessary `max` when trying to find a PR's next target: `root` is either `self` or `self.source_id`, so it should not be possible for that to have a later target. And for now ensure the limits are consistent per batch: a PR defaults to the limit of their batch-mate if they don't have one, and if a limit is set via command it's set on all PRs of a batch. This commit does not allow differential limits via commands, they are allowed via the backend but not really tested. The issue is mostly that it's not clear what the UX should look like to have clear and not super error prone interactions. So punt on it for now, and hopefully there's no hole I missed which will create inconsistent batches.
2024-05-21 20:45:53 +07:00
('prs.limit_id', '!=', deactivated.id),
('merge_date', '=', False),
]).filtered(lambda b:\
# and has a next target (should already be a function of
# the search but doesn't hurt)
b._find_next_target() \
# and has not already been forward ported
and Batch.search_count([('parent_id', '=', b.id)]) == 0
)
# PRs may have different limits in the same batch so only notify
# those which actually needed porting
ported |= extant._schedule_fp_followup(force_fw=True)\
.prs.filtered(lambda p: p._find_next_target())
if not ported:
return
for feedback in self.env['runbot_merge.pull_requests.feedback'].search(expression.OR(
[('repository', '=', p.repository.id), ('pull_request', '=', p.number)]
for p in ported
)):
# FIXME: better signal
if 'disabled' in feedback.message:
feedback.message += '\n\nAs this was not its limit, it will automatically be forward ported to the next active branch.'
def _insert_intermediate_prs(self, branches_before):
"""If new branches have been added to the sequence inbetween existing
branches (mostly a freeze inserted before the main branch), fill in
forward-ports for existing sequences
"""
Branches = self.env['runbot_merge.branch']
for p in self:
# check if the branches sequence has been modified
bbefore = branches_before[p]
bafter = p._forward_port_ordered()
if bafter.ids == bbefore.ids:
continue
logger = _logger.getChild('project').getChild(p.name)
logger.debug("branches updated %s -> %s", bbefore, bafter)
# if it's just that a branch was inserted at the end forwardport
# should keep on keeping normally
if bafter.ids[:-1] == bbefore.ids:
continue
if bafter <= bbefore:
raise UserError("Branches can not be reordered or removed after saving.")
# Last possibility: branch was inserted but not at end, get all
# branches before and all branches after
before = new = after = Branches
for b in bafter:
if b in bbefore:
if new:
after += b
else:
before += b
else:
if new:
raise UserError("Inserting multiple branches at the same time is not supported")
new = b
logger.debug('before: %s new: %s after: %s', before.ids, new.ids, after.ids)
# find all FPs whose ancestry spans the insertion
leaves = self.env['runbot_merge.pull_requests'].search([
('state', 'not in', ['closed', 'merged']),
('target', 'in', after.ids),
('source_id.target', 'in', before.ids),
])
# get all PRs just preceding the insertion point which either are
# sources of the above or have the same source
candidates = self.env['runbot_merge.pull_requests'].search([
('target', '=', before[-1].id),
'|', ('id', 'in', leaves.mapped('source_id').ids),
('source_id', 'in', leaves.mapped('source_id').ids),
])
logger.debug("\nPRs spanning new: %s\nto port: %s", leaves, candidates)
# enqueue the creation of a new forward-port based on our candidates
# but it should only create a single step and needs to stitch back
# the parents linked list, so it has a special type
for _, cs in groupby(candidates, key=lambda p: p.label):
self.env['forwardport.batches'].create({
'batch_id': cs[0].batch_id.id,
'source': 'insert',
})
class Repository(models.Model):
_inherit = 'runbot_merge.repository'
id: int
project_id: Project
name: str
branch_filter: str
fp_remote_target = fields.Char(help="where FP branches get pushed")
class PullRequests(models.Model):
_inherit = 'runbot_merge.pull_requests'
id: int
display_name: str
number: int
repository: Repository
target: Branch
reviewed_by: Partner
head: str
state: str
merge_date: datetime.datetime
parent_id: PullRequests
reminder_backoff_factor = fields.Integer(default=-4, group_operator=None)
@api.model_create_multi
def create(self, vals_list):
created = []
to_create = []
old = self.browse(())
for vals in vals_list:
# PR opened event always creates a new PR, override so we can precreate PRs
existing = self.search([
('repository', '=', vals['repository']),
('number', '=', vals['number']),
])
created.append(not existing)
if existing:
old |= existing
continue
to_create.append(vals)
if vals.get('parent_id') and 'source_id' not in vals:
vals['source_id'] = self.browse(vals['parent_id']).root_id.id
if not to_create:
return old
new = super().create(to_create)
for pr in new:
# added a new PR to an already forward-ported batch: port the PR
if self.env['runbot_merge.batch'].search_count([
('parent_id', '=', pr.batch_id.id),
]):
self.env['forwardport.batches'].create({
'batch_id': pr.batch_id.id,
'source': 'complete',
'pr_id': pr.id,
})
if not old:
return new
new = iter(new)
old = iter(old)
return self.browse(next(new).id if c else next(old).id for c in created)
def write(self, vals):
# if the PR's head is updated, detach (should split off the FP lines as this is not the original code)
# TODO: better way to do this? Especially because we don't want to
# recursively create updates
# also a bit odd to only handle updating 1 head at a time, but then
# again 2 PRs with same head is weird so...
newhead = vals.get('head')
with_parents = {
p: p.parent_id
for p in self
if p.state not in ('merged', 'closed')
if p.parent_id
}
closed_fp = self.filtered(lambda p: p.state == 'closed' and p.source_id)
if newhead and not self.env.context.get('ignore_head_update') and newhead != self.head:
vals.setdefault('parent_id', False)
if with_parents and vals['parent_id'] is False:
vals['detach_reason'] = f"Head updated from {self.head} to {newhead}"
# if any children, this is an FP PR being updated, enqueue
# updating children
if self.search_count([('parent_id', '=', self.id)]):
self.env['forwardport.updates'].create({
'original_root': self.root_id.id,
'new_root': self.id
})
if vals.get('parent_id') and 'source_id' not in vals:
parent = self.browse(vals['parent_id'])
vals['source_id'] = (parent.source_id or parent).id
r = super().write(vals)
if self.env.context.get('forwardport_detach_warn', True):
for p, parent in with_parents.items():
if p.parent_id:
continue
self.env.ref('runbot_merge.forwardport.update.detached')._send(
repository=p.repository,
pull_request=p.number,
token_field='fp_github_token',
format_args={'pr': p},
)
if parent.state not in ('closed', 'merged'):
self.env.ref('runbot_merge.forwardport.update.parent')._send(
repository=parent.repository,
pull_request=parent.number,
token_field='fp_github_token',
format_args={'pr': parent, 'child': p},
)
return r
def _commits_lazy(self):
s = requests.Session()
s.headers['Authorization'] = 'token %s' % self.repository.project_id.fp_github_token
for page in itertools.count(1):
r = s.get('https://api.github.com/repos/{}/pulls/{}/commits'.format(
self.repository.name,
self.number
), params={'page': page})
r.raise_for_status()
yield from r.json()
if not r.links.get('next'):
return
def commits(self):
""" Returns a PR's commits oldest first (that's what GH does &
is what we want)
"""
commits = list(self._commits_lazy())
# map shas to the position the commit *should* have
idx = {
c: i
for i, c in enumerate(topological_sort({
c['sha']: [p['sha'] for p in c['parents']]
for c in commits
}))
}
return sorted(commits, key=lambda c: idx[c['sha']])
def _create_port_branch(
self,
source: git.Repo,
target_branch: Branch,
*,
forward: bool,
) -> tuple[typing.Optional[Conflict], str]:
""" Creates a forward-port for the current PR to ``target_branch`` under
``fp_branch_name``.
:param source: the git repository to work with / in
:param target_branch: the branch to port ``self`` to
:param forward: whether this is a forward (``True``) or a back
(``False``) port
:returns: a conflict if one happened, and the head of the port branch
(either a succcessful port of the entire `self`, or a conflict
commit)
"""
logger = _logger.getChild(str(self.id))
root = self.root_id
logger.info(
"%s %s (%s) to %s",
"Forward-porting" if forward else "Back-porting",
self.display_name, root.display_name, target_branch.name
)
fetch = source.with_config(stdout=subprocess.PIPE, stderr=subprocess.STDOUT).fetch()
logger.info("Updated cache repo %s:\n%s", source._directory, fetch.stdout.decode())
[FIX] forwardport: storage of old garbage, repo cache sizes Since the forwardport bot works off of PRs, when it was created leveraging the magic refs of github (refs/pull/*/head) seemed sensible: when updating the cache repo, the magic refs would be updated alongside and the forward-porting would have all the commits available. Turns out there are a few issues with this: - the magic refs have become a bit unreliable, so we need a fallback (b1c16fce8768080d30430f4e6d3788b60ce13de7) - the forwardport bot only needs the commits on a transient basis, but the magic refs live forever and diverge from all other content (since we rarely `merge` PRs), for a large repo with lots of PRs this leads to a significant inflation in size of repo (both on-disk and objects count) e.g. odoo/odoo has about 25% more objects with the pull refs included (3486550 to 4395319) and takes nearly 50% more space on disk (1.21G to 1.77) As a result, we can just stop configuring the pull refs entirely, and instead fetch the heads (~refs) we need as we need them. This can be a touch more expensive at times as it requires two `fetch` calls, and we'll have a few redundant calls as every forward port of a chain will require a fetch of the root's head, *however* it will avoid retrieving PR data for e.g. prs to master, as they don't get forward-ported, they also won't get parasite updates from PRs being ignored or eventually closed. Eventually this could be optimised further by - performing an update of the cache repo at the start of the cron iff there are batches to port - creating a temp clone for the batch - fetching the heads of all PRs to forward port in the temp clone in a single `fetch` - performing all the ports by cloning the temp clone (and not `fetch`-ing into those) - then cleaning up the temp clone after having cleaned up individual forward port clones Small followup for #489
2022-11-08 18:02:45 +07:00
head_fetch = source.with_config(stdout=subprocess.PIPE, stderr=subprocess.STDOUT, check=False) \
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
.fetch(git.source_url(self.repository), root.head)
if head_fetch.returncode:
[FIX] forwardport: storage of old garbage, repo cache sizes Since the forwardport bot works off of PRs, when it was created leveraging the magic refs of github (refs/pull/*/head) seemed sensible: when updating the cache repo, the magic refs would be updated alongside and the forward-porting would have all the commits available. Turns out there are a few issues with this: - the magic refs have become a bit unreliable, so we need a fallback (b1c16fce8768080d30430f4e6d3788b60ce13de7) - the forwardport bot only needs the commits on a transient basis, but the magic refs live forever and diverge from all other content (since we rarely `merge` PRs), for a large repo with lots of PRs this leads to a significant inflation in size of repo (both on-disk and objects count) e.g. odoo/odoo has about 25% more objects with the pull refs included (3486550 to 4395319) and takes nearly 50% more space on disk (1.21G to 1.77) As a result, we can just stop configuring the pull refs entirely, and instead fetch the heads (~refs) we need as we need them. This can be a touch more expensive at times as it requires two `fetch` calls, and we'll have a few redundant calls as every forward port of a chain will require a fetch of the root's head, *however* it will avoid retrieving PR data for e.g. prs to master, as they don't get forward-ported, they also won't get parasite updates from PRs being ignored or eventually closed. Eventually this could be optimised further by - performing an update of the cache repo at the start of the cron iff there are batches to port - creating a temp clone for the batch - fetching the heads of all PRs to forward port in the temp clone in a single `fetch` - performing all the ports by cloning the temp clone (and not `fetch`-ing into those) - then cleaning up the temp clone after having cleaned up individual forward port clones Small followup for #489
2022-11-08 18:02:45 +07:00
raise ForwardPortError(
f"During forward port of {self.display_name}, unable to find "
f"expected head of {root.display_name} ({root.head})"
)
logger.info(
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
"Fetched head of %s (%s):\n%s",
[FIX] forwardport: storage of old garbage, repo cache sizes Since the forwardport bot works off of PRs, when it was created leveraging the magic refs of github (refs/pull/*/head) seemed sensible: when updating the cache repo, the magic refs would be updated alongside and the forward-porting would have all the commits available. Turns out there are a few issues with this: - the magic refs have become a bit unreliable, so we need a fallback (b1c16fce8768080d30430f4e6d3788b60ce13de7) - the forwardport bot only needs the commits on a transient basis, but the magic refs live forever and diverge from all other content (since we rarely `merge` PRs), for a large repo with lots of PRs this leads to a significant inflation in size of repo (both on-disk and objects count) e.g. odoo/odoo has about 25% more objects with the pull refs included (3486550 to 4395319) and takes nearly 50% more space on disk (1.21G to 1.77) As a result, we can just stop configuring the pull refs entirely, and instead fetch the heads (~refs) we need as we need them. This can be a touch more expensive at times as it requires two `fetch` calls, and we'll have a few redundant calls as every forward port of a chain will require a fetch of the root's head, *however* it will avoid retrieving PR data for e.g. prs to master, as they don't get forward-ported, they also won't get parasite updates from PRs being ignored or eventually closed. Eventually this could be optimised further by - performing an update of the cache repo at the start of the cron iff there are batches to port - creating a temp clone for the batch - fetching the heads of all PRs to forward port in the temp clone in a single `fetch` - performing all the ports by cloning the temp clone (and not `fetch`-ing into those) - then cleaning up the temp clone after having cleaned up individual forward port clones Small followup for #489
2022-11-08 18:02:45 +07:00
root.display_name,
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
root.head,
head_fetch.stdout.decode()
)
try:
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
return None, root._cherry_pick(source, target_branch.name)
except CherrypickError as e:
[FIX] forwardport: storage of old garbage, repo cache sizes Since the forwardport bot works off of PRs, when it was created leveraging the magic refs of github (refs/pull/*/head) seemed sensible: when updating the cache repo, the magic refs would be updated alongside and the forward-porting would have all the commits available. Turns out there are a few issues with this: - the magic refs have become a bit unreliable, so we need a fallback (b1c16fce8768080d30430f4e6d3788b60ce13de7) - the forwardport bot only needs the commits on a transient basis, but the magic refs live forever and diverge from all other content (since we rarely `merge` PRs), for a large repo with lots of PRs this leads to a significant inflation in size of repo (both on-disk and objects count) e.g. odoo/odoo has about 25% more objects with the pull refs included (3486550 to 4395319) and takes nearly 50% more space on disk (1.21G to 1.77) As a result, we can just stop configuring the pull refs entirely, and instead fetch the heads (~refs) we need as we need them. This can be a touch more expensive at times as it requires two `fetch` calls, and we'll have a few redundant calls as every forward port of a chain will require a fetch of the root's head, *however* it will avoid retrieving PR data for e.g. prs to master, as they don't get forward-ported, they also won't get parasite updates from PRs being ignored or eventually closed. Eventually this could be optimised further by - performing an update of the cache repo at the start of the cron iff there are batches to port - creating a temp clone for the batch - fetching the heads of all PRs to forward port in the temp clone in a single `fetch` - performing all the ports by cloning the temp clone (and not `fetch`-ing into those) - then cleaning up the temp clone after having cleaned up individual forward port clones Small followup for #489
2022-11-08 18:02:45 +07:00
h, out, err, commits = e.args
# commits returns oldest first, so youngest (head) last
[FIX] forwardport: storage of old garbage, repo cache sizes Since the forwardport bot works off of PRs, when it was created leveraging the magic refs of github (refs/pull/*/head) seemed sensible: when updating the cache repo, the magic refs would be updated alongside and the forward-porting would have all the commits available. Turns out there are a few issues with this: - the magic refs have become a bit unreliable, so we need a fallback (b1c16fce8768080d30430f4e6d3788b60ce13de7) - the forwardport bot only needs the commits on a transient basis, but the magic refs live forever and diverge from all other content (since we rarely `merge` PRs), for a large repo with lots of PRs this leads to a significant inflation in size of repo (both on-disk and objects count) e.g. odoo/odoo has about 25% more objects with the pull refs included (3486550 to 4395319) and takes nearly 50% more space on disk (1.21G to 1.77) As a result, we can just stop configuring the pull refs entirely, and instead fetch the heads (~refs) we need as we need them. This can be a touch more expensive at times as it requires two `fetch` calls, and we'll have a few redundant calls as every forward port of a chain will require a fetch of the root's head, *however* it will avoid retrieving PR data for e.g. prs to master, as they don't get forward-ported, they also won't get parasite updates from PRs being ignored or eventually closed. Eventually this could be optimised further by - performing an update of the cache repo at the start of the cron iff there are batches to port - creating a temp clone for the batch - fetching the heads of all PRs to forward port in the temp clone in a single `fetch` - performing all the ports by cloning the temp clone (and not `fetch`-ing into those) - then cleaning up the temp clone after having cleaned up individual forward port clones Small followup for #489
2022-11-08 18:02:45 +07:00
head_commit = commits[-1]['commit']
to_tuple = operator.itemgetter('name', 'email')
authors, committers = set(), set()
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
for commit in (c['commit'] for c in commits):
authors.add(to_tuple(commit['author']))
committers.add(to_tuple(commit['committer']))
fp_authorship = (self.repository.project_id.fp_github_name, '', '')
[FIX] forwardport: don't break forward porting on huge conflicts On forward-porting, odoo/odoo#170183 generates a conflict on pretty much every one of the 1111 files it touches, because they are modify/delete conflicts that generates a conflict message over 200 bytes per file, which is over 200kB of output. For this specific scenario, the commit message was being passed through arguments to the `git` command, resulting in a command line exceeding `MAX_ARG_STRLEN`[^1]. The obvious way to fix this is to pass the commit message via stdin as is done literally in the line above where we just copy a non-generated commit message. However I don't think hundreds of kbytes worth of stdout[^2] is of any use, so shorten that a bit, and stderr while at it. Don't touch the commit message size for now, possibly forever, but note that some log diving reveals a commit with a legit 18kB message (odoo/odoo@42a3b704f7c7889a74338253138d0201d3b6e7a3) so if we want to restrict that the limit should be at least 32k, and possibly 64. But it might be a good idea to make that limit part of the ready / merge checks too, rather than cut things off or trigger errors during staging. Fixes #900 [^1]: Most resources on "Argument list too long" reference `ARG_MAX`, but on both my machine and the server it is 2097152 (25% of the default stack), which is ~10x larger than the commit message we tried to generate. The actual limit is `MAX_ARG_STRLEN` which can't be queried directly but is essentially hard-coded to PAGE_SIZE * 32 = 128kiB, which tracks. [^2]: Somewhat unexpectedly, that's where `git-cherry-pick` sends the conflict info.
2024-06-25 18:11:32 +07:00
author = fp_authorship if len(authors) != 1 \
else authors.pop() + (head_commit['author']['date'],)
committer = fp_authorship if len(committers) != 1 \
else committers.pop() + (head_commit['committer']['date'],)
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
conf = source.with_params(
'merge.renamelimit=0',
'merge.renames=copies',
'merge.conflictstyle=zdiff3'
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
).with_config(stdout=subprocess.PIPE, stderr=subprocess.PIPE)
[FIX] forwardport: don't break forward porting on huge conflicts On forward-porting, odoo/odoo#170183 generates a conflict on pretty much every one of the 1111 files it touches, because they are modify/delete conflicts that generates a conflict message over 200 bytes per file, which is over 200kB of output. For this specific scenario, the commit message was being passed through arguments to the `git` command, resulting in a command line exceeding `MAX_ARG_STRLEN`[^1]. The obvious way to fix this is to pass the commit message via stdin as is done literally in the line above where we just copy a non-generated commit message. However I don't think hundreds of kbytes worth of stdout[^2] is of any use, so shorten that a bit, and stderr while at it. Don't touch the commit message size for now, possibly forever, but note that some log diving reveals a commit with a legit 18kB message (odoo/odoo@42a3b704f7c7889a74338253138d0201d3b6e7a3) so if we want to restrict that the limit should be at least 32k, and possibly 64. But it might be a good idea to make that limit part of the ready / merge checks too, rather than cut things off or trigger errors during staging. Fixes #900 [^1]: Most resources on "Argument list too long" reference `ARG_MAX`, but on both my machine and the server it is 2097152 (25% of the default stack), which is ~10x larger than the commit message we tried to generate. The actual limit is `MAX_ARG_STRLEN` which can't be queried directly but is essentially hard-coded to PAGE_SIZE * 32 = 128kiB, which tracks. [^2]: Somewhat unexpectedly, that's where `git-cherry-pick` sends the conflict info.
2024-06-25 18:11:32 +07:00
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
tree = conf.with_config(check=False).merge_tree(
'--merge-base', commits[0]['parents'][0]['sha'],
target_branch.name,
root.head,
).stdout.decode().splitlines(keepends=False)[0]
# if there was a single commit, reuse its message when committing
# the conflict
[FIX] forwardport: storage of old garbage, repo cache sizes Since the forwardport bot works off of PRs, when it was created leveraging the magic refs of github (refs/pull/*/head) seemed sensible: when updating the cache repo, the magic refs would be updated alongside and the forward-porting would have all the commits available. Turns out there are a few issues with this: - the magic refs have become a bit unreliable, so we need a fallback (b1c16fce8768080d30430f4e6d3788b60ce13de7) - the forwardport bot only needs the commits on a transient basis, but the magic refs live forever and diverge from all other content (since we rarely `merge` PRs), for a large repo with lots of PRs this leads to a significant inflation in size of repo (both on-disk and objects count) e.g. odoo/odoo has about 25% more objects with the pull refs included (3486550 to 4395319) and takes nearly 50% more space on disk (1.21G to 1.77) As a result, we can just stop configuring the pull refs entirely, and instead fetch the heads (~refs) we need as we need them. This can be a touch more expensive at times as it requires two `fetch` calls, and we'll have a few redundant calls as every forward port of a chain will require a fetch of the root's head, *however* it will avoid retrieving PR data for e.g. prs to master, as they don't get forward-ported, they also won't get parasite updates from PRs being ignored or eventually closed. Eventually this could be optimised further by - performing an update of the cache repo at the start of the cron iff there are batches to port - creating a temp clone for the batch - fetching the heads of all PRs to forward port in the temp clone in a single `fetch` - performing all the ports by cloning the temp clone (and not `fetch`-ing into those) - then cleaning up the temp clone after having cleaned up individual forward port clones Small followup for #489
2022-11-08 18:02:45 +07:00
if len(commits) == 1:
msg = root._make_fp_message(commits[0])
else:
[FIX] forwardport: don't break forward porting on huge conflicts On forward-porting, odoo/odoo#170183 generates a conflict on pretty much every one of the 1111 files it touches, because they are modify/delete conflicts that generates a conflict message over 200 bytes per file, which is over 200kB of output. For this specific scenario, the commit message was being passed through arguments to the `git` command, resulting in a command line exceeding `MAX_ARG_STRLEN`[^1]. The obvious way to fix this is to pass the commit message via stdin as is done literally in the line above where we just copy a non-generated commit message. However I don't think hundreds of kbytes worth of stdout[^2] is of any use, so shorten that a bit, and stderr while at it. Don't touch the commit message size for now, possibly forever, but note that some log diving reveals a commit with a legit 18kB message (odoo/odoo@42a3b704f7c7889a74338253138d0201d3b6e7a3) so if we want to restrict that the limit should be at least 32k, and possibly 64. But it might be a good idea to make that limit part of the ready / merge checks too, rather than cut things off or trigger errors during staging. Fixes #900 [^1]: Most resources on "Argument list too long" reference `ARG_MAX`, but on both my machine and the server it is 2097152 (25% of the default stack), which is ~10x larger than the commit message we tried to generate. The actual limit is `MAX_ARG_STRLEN` which can't be queried directly but is essentially hard-coded to PAGE_SIZE * 32 = 128kiB, which tracks. [^2]: Somewhat unexpectedly, that's where `git-cherry-pick` sends the conflict info.
2024-06-25 18:11:32 +07:00
out = utils.shorten(out, 8*1024, '[...]')
err = utils.shorten(err, 8*1024, '[...]')
msg = f"""Cherry pick of {h} failed
stdout:
[FIX] forwardport: don't break forward porting on huge conflicts On forward-porting, odoo/odoo#170183 generates a conflict on pretty much every one of the 1111 files it touches, because they are modify/delete conflicts that generates a conflict message over 200 bytes per file, which is over 200kB of output. For this specific scenario, the commit message was being passed through arguments to the `git` command, resulting in a command line exceeding `MAX_ARG_STRLEN`[^1]. The obvious way to fix this is to pass the commit message via stdin as is done literally in the line above where we just copy a non-generated commit message. However I don't think hundreds of kbytes worth of stdout[^2] is of any use, so shorten that a bit, and stderr while at it. Don't touch the commit message size for now, possibly forever, but note that some log diving reveals a commit with a legit 18kB message (odoo/odoo@42a3b704f7c7889a74338253138d0201d3b6e7a3) so if we want to restrict that the limit should be at least 32k, and possibly 64. But it might be a good idea to make that limit part of the ready / merge checks too, rather than cut things off or trigger errors during staging. Fixes #900 [^1]: Most resources on "Argument list too long" reference `ARG_MAX`, but on both my machine and the server it is 2097152 (25% of the default stack), which is ~10x larger than the commit message we tried to generate. The actual limit is `MAX_ARG_STRLEN` which can't be queried directly but is essentially hard-coded to PAGE_SIZE * 32 = 128kiB, which tracks. [^2]: Somewhat unexpectedly, that's where `git-cherry-pick` sends the conflict info.
2024-06-25 18:11:32 +07:00
{out}
stderr:
[FIX] forwardport: don't break forward porting on huge conflicts On forward-porting, odoo/odoo#170183 generates a conflict on pretty much every one of the 1111 files it touches, because they are modify/delete conflicts that generates a conflict message over 200 bytes per file, which is over 200kB of output. For this specific scenario, the commit message was being passed through arguments to the `git` command, resulting in a command line exceeding `MAX_ARG_STRLEN`[^1]. The obvious way to fix this is to pass the commit message via stdin as is done literally in the line above where we just copy a non-generated commit message. However I don't think hundreds of kbytes worth of stdout[^2] is of any use, so shorten that a bit, and stderr while at it. Don't touch the commit message size for now, possibly forever, but note that some log diving reveals a commit with a legit 18kB message (odoo/odoo@42a3b704f7c7889a74338253138d0201d3b6e7a3) so if we want to restrict that the limit should be at least 32k, and possibly 64. But it might be a good idea to make that limit part of the ready / merge checks too, rather than cut things off or trigger errors during staging. Fixes #900 [^1]: Most resources on "Argument list too long" reference `ARG_MAX`, but on both my machine and the server it is 2097152 (25% of the default stack), which is ~10x larger than the commit message we tried to generate. The actual limit is `MAX_ARG_STRLEN` which can't be queried directly but is essentially hard-coded to PAGE_SIZE * 32 = 128kiB, which tracks. [^2]: Somewhat unexpectedly, that's where `git-cherry-pick` sends the conflict info.
2024-06-25 18:11:32 +07:00
{err}
"""
# if a file is modified by the original PR and added by the forward
# port / conflict it's a modify/delete conflict: the file was
# deleted in the target branch, and the update (modify) in the
# original PR causes it to be added back
original_modified = set(conf.diff(
"--diff-filter=M", "--name-only",
"--merge-base", root.target.name,
root.head,
).stdout.decode().splitlines(keepends=False))
conflict_added = set(conf.diff(
"--diff-filter=A", "--name-only",
target_branch.name,
tree,
).stdout.decode().splitlines(keepends=False))
if modify_delete := (conflict_added & original_modified):
# rewrite the tree with conflict markers added to modify/deleted blobs
tree = conf.modify_delete(tree, modify_delete)
target_head = source.stdout().rev_parse(f"refs/heads/{target_branch.name}")\
.stdout.decode().strip()
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
commit = conf.commit_tree(
tree=tree,
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
parents=[target_head],
message=str(msg),
author=author,
committer=committer[:2],
)
assert commit.returncode == 0,\
f"commit failed\n\n{commit.stdout.decode()}\n\n{commit.stderr.decode}"
hh = commit.stdout.strip()
[FIX] forwardport: don't break forward porting on huge conflicts On forward-porting, odoo/odoo#170183 generates a conflict on pretty much every one of the 1111 files it touches, because they are modify/delete conflicts that generates a conflict message over 200 bytes per file, which is over 200kB of output. For this specific scenario, the commit message was being passed through arguments to the `git` command, resulting in a command line exceeding `MAX_ARG_STRLEN`[^1]. The obvious way to fix this is to pass the commit message via stdin as is done literally in the line above where we just copy a non-generated commit message. However I don't think hundreds of kbytes worth of stdout[^2] is of any use, so shorten that a bit, and stderr while at it. Don't touch the commit message size for now, possibly forever, but note that some log diving reveals a commit with a legit 18kB message (odoo/odoo@42a3b704f7c7889a74338253138d0201d3b6e7a3) so if we want to restrict that the limit should be at least 32k, and possibly 64. But it might be a good idea to make that limit part of the ready / merge checks too, rather than cut things off or trigger errors during staging. Fixes #900 [^1]: Most resources on "Argument list too long" reference `ARG_MAX`, but on both my machine and the server it is 2097152 (25% of the default stack), which is ~10x larger than the commit message we tried to generate. The actual limit is `MAX_ARG_STRLEN` which can't be queried directly but is essentially hard-coded to PAGE_SIZE * 32 = 128kiB, which tracks. [^2]: Somewhat unexpectedly, that's where `git-cherry-pick` sends the conflict info.
2024-06-25 18:11:32 +07:00
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
return (h, out, err, [c['sha'] for c in commits]), hh
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
def _cherry_pick(self, repo: git.Repo, branch: Branch) -> str:
""" Cherrypicks ``self`` into ``branch``
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
:return: the HEAD of the forward-port is successful
:raises CherrypickError: in case of conflict
"""
# <xxx>.cherrypick.<number>
logger = _logger.getChild(str(self.id)).getChild('cherrypick')
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
# target's head
head = repo.stdout().rev_parse(f"refs/heads/{branch}").stdout.decode().strip()
commits = self.commits()
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
logger.info(
"%s: copy %s commits to %s (%s)%s",
self, len(commits), branch, head, ''.join(
'\n- %s: %s' % (c['sha'], c['commit']['message'].splitlines()[0])
for c in commits
)
)
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
conf = repo.with_params(
'merge.renamelimit=0',
'merge.renames=copies',
).with_config(
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
check=False,
)
for commit in commits:
commit_sha = commit['sha']
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
# merge-tree is a bit stupid and gets confused when the options
# follow the parameters
r = conf.merge_tree('--merge-base', commit['parents'][0]['sha'], head, commit_sha)
new_tree = r.stdout.decode().splitlines(keepends=False)[0]
if r.returncode:
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
# For merge-tree the stdout on conflict is of the form
#
# oid of toplevel tree
# conflicted file info+
#
# informational messages+
#
# to match cherrypick we only want the informational messages,
# so strip everything else
r.stdout = r.stdout.split(b'\n\n')[-1]
else:
# By default cherry-pick fails if a non-empty commit becomes
# empty (--empty=stop), also it fails when cherrypicking already
# empty commits which I didn't think we prevented but clearly we
# do...?
parent_tree = conf.rev_parse(f'{head}^{{tree}}').stdout.decode().strip()
if parent_tree == new_tree:
r.returncode = 1
r.stdout = f"You are currently cherry-picking commit {commit_sha}.".encode()
r.stderr = b"The previous cherry-pick is now empty, possibly due to conflict resolution."
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
logger.debug("Cherry-picked %s: %s\n%s\n%s", commit_sha, r.returncode, r.stdout.decode(), _clean_rename(r.stderr.decode()))
if r.returncode: # pick failed, bail
# try to log inflateInit: out of memory errors as warning, they
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
# seem to return the status code 128 (nb: may not work anymore
# with merge-tree, idk)
logger.log(
logging.WARNING if r.returncode == 128 else logging.INFO,
"forward-port of %s (%s) failed at %s",
self, self.display_name, commit_sha)
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
raise CherrypickError(
commit_sha,
r.stdout.decode(),
_clean_rename(r.stderr.decode()),
[FIX] forwardport: storage of old garbage, repo cache sizes Since the forwardport bot works off of PRs, when it was created leveraging the magic refs of github (refs/pull/*/head) seemed sensible: when updating the cache repo, the magic refs would be updated alongside and the forward-porting would have all the commits available. Turns out there are a few issues with this: - the magic refs have become a bit unreliable, so we need a fallback (b1c16fce8768080d30430f4e6d3788b60ce13de7) - the forwardport bot only needs the commits on a transient basis, but the magic refs live forever and diverge from all other content (since we rarely `merge` PRs), for a large repo with lots of PRs this leads to a significant inflation in size of repo (both on-disk and objects count) e.g. odoo/odoo has about 25% more objects with the pull refs included (3486550 to 4395319) and takes nearly 50% more space on disk (1.21G to 1.77) As a result, we can just stop configuring the pull refs entirely, and instead fetch the heads (~refs) we need as we need them. This can be a touch more expensive at times as it requires two `fetch` calls, and we'll have a few redundant calls as every forward port of a chain will require a fetch of the root's head, *however* it will avoid retrieving PR data for e.g. prs to master, as they don't get forward-ported, they also won't get parasite updates from PRs being ignored or eventually closed. Eventually this could be optimised further by - performing an update of the cache repo at the start of the cron iff there are batches to port - creating a temp clone for the batch - fetching the heads of all PRs to forward port in the temp clone in a single `fetch` - performing all the ports by cloning the temp clone (and not `fetch`-ing into those) - then cleaning up the temp clone after having cleaned up individual forward port clones Small followup for #489
2022-11-08 18:02:45 +07:00
commits
)
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
# get the "git" commit object rather than the "github" commit resource
cc = conf.commit_tree(
tree=new_tree,
parents=[head],
message=str(self._make_fp_message(commit)),
author=map_author(commit['commit']['author']),
committer=map_committer(commit['commit']['committer']),
)
if cc.returncode:
raise CherrypickError(commit_sha, cc.stdout.decode(), cc.stderr.decode(), commits)
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
head = cc.stdout.strip()
logger.info('%s -> %s', commit_sha, head)
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
return head
def _make_fp_message(self, commit):
cmap = json.loads(self.commits_map)
msg = Message.from_message(commit['commit']['message'])
# write the *merged* commit as "original", not the PR's
msg.headers['x-original-commit'] = cmap.get(commit['sha'], commit['sha'])
# don't stringify so caller can still perform alterations
return msg
def _outstanding(self, cutoff: str) -> typing.ItemsView[PullRequests, list[PullRequests]]:
""" Returns "outstanding" (unmerged and unclosed) forward-ports whose
source was merged before ``cutoff`` (all of them if not provided).
:param cutoff: a datetime (ISO-8601 formatted)
:returns: an iterator of (source, forward_ports)
"""
return groupby(self.env['runbot_merge.pull_requests'].search([
# only FP PRs
('source_id', '!=', False),
# active
('state', 'not in', ['merged', 'closed']),
# not ready
('blocked', '!=', False),
('source_id.merge_date', '<', cutoff),
], order='source_id, id'), lambda p: p.source_id)
def _reminder(self):
[IMP] *: crons tests running for better triggered compatibility Mergebot / forwardport crons need to run in a specific ordering in order to flow into one another correctly. The default ordering being unspecified, it was not possible to use the normal cron runner (instead of the external driver running crons in sequence one at a time). This can be fixed by setting *sequences* on crons, as the cron runner (`_process_jobs`) will use that order to acquire and run crons. Also override `_process_jobs` however: the built-in cron runner fetches a static list of ready crons, then runs that. This is fine for normal situation where the cron runner runs in a loop anyway but it's any issue for the tests, as we expect that cron A can trigger cron B, and we want cron B to run *right now* even if it hadn't been triggered before cron A ran. We can replace `_process_job` with a cut down version which does that (cut down because we don't need most of the error handling / resilience, there's no concurrent workers, there's no module being installed, versions must match, ...). This allows e.g. the cron propagating commit statuses to trigger the staging cron, and both will run within the same `run_crons` session. Something I didn't touch is that `_process_jobs` internally creates completely new environments so there is no way to pass context into the cron jobs anymore (whereas it works for `method_direct_trigger`), this means the context values have to be shunted elsewhere for that purpose which is gross. But even though I'm replacing `_process_jobs`, this seems a bit too much of a change in cron execution semantics. So left it out. While at it tho, silence the spammy `py.warnings` stuff I can't do much about.
2024-07-30 14:21:05 +07:00
cutoff = getattr(builtins, 'forwardport_updated_before', None) \
or fields.Datetime.to_string(datetime.datetime.now() - DEFAULT_DELTA)
cutoff_dt = fields.Datetime.from_string(cutoff)
for source, prs in self._outstanding(cutoff):
backoff = dateutil.relativedelta.relativedelta(days=2**source.reminder_backoff_factor)
if source.merge_date > (cutoff_dt - backoff):
continue
source.reminder_backoff_factor += 1
# only keep the PRs which don't have an attached descendant
for pr in set(prs).difference(p.parent_id for p in prs):
self.env.ref('runbot_merge.forwardport.reminder')._send(
repository=pr.repository,
pull_request=pr.number,
token_field='fp_github_token',
format_args={'pr': pr, 'source': source},
)
[CHG] forwardport: perform forward porting without working copies The goal is to reduce maintenance and odd disk interactions & concurrency issues, by not creating concurrent clones, not having to push forks back in the repository, etc... it also removes the need to cleanup "scratch" working copies though that looks not to have been an issue in a while. The work is done on isolated objects without using or mutating refs, so even concurrent work should not be a problem. This turns out to not be any more verbose (less so if anything) than using `cherry-pick`, as that is not really designed for scripted / non-interactive use, or for squashing commits thereafter. Working directly with trees and commits is quite a bit cleaner even without a ton of helpers. Much of the credit goes to Julia Evans for [their investigation of 3-way merges as the underpinnings of cherry-picking][3-way merge], this would have been a lot more difficult if I'd had to rediscover the merge-base trick independently. A few things have been changed by this: - The old trace/stderr from cherrypick has disappeared as it's generated by cherrypick, but for a non-interactive use it's kinda useless anyway so I probably should have looked into removing it earlier (I think the main use was investigation of the inflateinit issue). - Error on emptied commits has to be hand-rolled as `merge-tree` couldn't care less, this is not hard but is a bit annoying. - `merge-tree`'s conflict information only references raw commits, which makes sense, but requires updating a bunch of tests. Then again so does the fact that it *usually* doesn't send anything to stderr, so that's usually disappearing. Conveniently `merge-tree` merges the conflict marker directly in the files / tree so we don't have to mess about moving them back out of the repository and into the working copy as I assume cherry-pick does, which means we don't have to try and commit them back in ether. That is a huge part of the gain over faffing about with the working copy. Fixes #847 [3-way merge]: https://jvns.ca/blog/2023/11/10/how-cherry-pick-and-revert-work/
2024-07-05 18:32:02 +07:00
map_author = operator.itemgetter('name', 'email', 'date')
map_committer = operator.itemgetter('name', 'email')
class Stagings(models.Model):
_inherit = 'runbot_merge.stagings'
def write(self, vals):
r = super().write(vals)
# we've just deactivated a successful staging (so it got ~merged)
if vals.get('active') is False and self.state == 'success':
[CHG] *: move forward-porting over to batches Thank god I have a bunch of tests because once again I forgot / missed a bunch of edge cases in doing the conversion, which the tests caught (sadly that means I almost certainly broke a few untested edge cases). Important notes: Handling of parent links ------------------------ Unlike PRs, batches don't lose their parent info ever, the link is permanent, which is convenient to trawl through a forward port (currently implemented very inefficiently, maybe we'll optimise that in the future). However this means the batch having a parent and the batch's PRs having parents are slightly different informations, one of the edge cases I missed is that of conflicting PRs, which are deparented and have to be merged by hand before being forward ported further, I had originally replaced the checks on a pr and its sibling having parents by just the batch. Batches & targets ----------------- Batches were originally concepted as being fixed to a target and PRs having that target, a PR being retargeted would move it from one batch to an other. As it turns out this does not work in the case where people retarget forward-port PRs, which I know they do because #551 (2337bd85186de000d074e5a7178cfeb110d15de7). I could not think of a good way to handle this issue as is, so scrapped the moving PRs thing, instead one of the coherence checks of a batch being ready is that all its PRs have the same target, and a batch only has a target if all its PRs have the same target. It's possible for somewhat odd effects to arise, notably if a PR is closed (removed from batch), the other PRs are retargeted, and the new PR is reopened, it will now be on a separate batch even if it also gets retargeted. This is weird. I don't quite know how I should handle it, maybe batches could merge if they have the same target and label? however batches don't currently have a label so... Improve limits -------------- Keep limits on the PRs rather than lift them on the batchL if we can add/remove PRs of batches having different limits on different PRs of the same batch is reasonable. Also leave limit unset by default: previously, the limit was eagerly set to the tip (accessible) branch. That doesn't really seem necessary, so stop doing that. Also remove completely unnecessary `max` when trying to find a PR's next target: `root` is either `self` or `self.source_id`, so it should not be possible for that to have a later target. And for now ensure the limits are consistent per batch: a PR defaults to the limit of their batch-mate if they don't have one, and if a limit is set via command it's set on all PRs of a batch. This commit does not allow differential limits via commands, they are allowed via the backend but not really tested. The issue is mostly that it's not clear what the UX should look like to have clear and not super error prone interactions. So punt on it for now, and hopefully there's no hole I missed which will create inconsistent batches.
2024-05-21 20:45:53 +07:00
# check all batches to see if they should be forward ported
for b in self.with_context(active_test=False).batch_ids:
if b.fw_policy == 'no':
continue
# if all PRs of a batch have parents they're part of an FP
# sequence and thus handled separately, otherwise they're
# considered regular merges
if not all(p.parent_id for p in b.prs):
self.env['forwardport.batches'].create({
'batch_id': b.id,
'source': 'merge',
})
return r
class Feedback(models.Model):
_inherit = 'runbot_merge.pull_requests.feedback'
token_field = fields.Selection(selection_add=[('fp_github_token', 'Forwardport Bot')])
class CherrypickError(Exception):
...
class ForwardPortError(Exception):
pass
def _clean_rename(s):
""" Filters out the "inexact rename detection" spam of cherry-pick: it's
useless but there seems to be no good way to silence these messages.
"""
return '\n'.join(
l for l in s.splitlines()
if not l.startswith('Performing inexact rename detection')
)
class HallOfShame(typing.NamedTuple):
reviewers: list
outstanding: list
class Outstanding(typing.NamedTuple):
source: object
prs: object