[IMP] forwardport: gc/maintenance of local repo caches

The current system lets git's automatic GC run during fetches. This
has a few issues:

- the autogc consumes resources during the forward-porting
  process (not that it's hugely urgent, but it seems unnecessary)
- the autogc commonly fails due to the combination of a large repository
  (odoo/odoo) and low memory limits (the hardmem limit for odoo, which
  gets translated into a soft ulimit)

As a result, the garbage collection of the repository sometimes stops
entirely, leading to an increase in repository size and a decrease in
performance.

To mitigate this issue, disable the automagic gc and maintenance
during normal operation, and instead add a weekly cron which runs an
aggressive GC with the memory limits disabled (as far as they can be:
if the limits are imposed externally there's nothing to be done).
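
During normal operation the caches are thus driven with git's automatic
housekeeping switched off; a minimal sketch of the kind of invocation this
results in (the cache path and the subcommand are placeholders, the two `-c`
overrides are the ones this commit adds as the ALWAYS tuple, see the last
hunk below):

    # Minimal sketch: every git call against a cache repository now carries
    # these two config overrides; the path and subcommand are placeholders.
    import subprocess

    subprocess.run([
        'git', '-C', '/path/to/cache/odoo/odoo',
        '-c', 'gc.auto=0',           # never trigger automatic gc after fetches
        '-c', 'maintenance.auto=0',  # never trigger `git maintenance run --auto`
        'fetch', 'origin',
    ], check=True)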

The maintenance is implemented using a full lockout of the
forward-port cron and an in-place GC rather than a copy/gc/swap, as
doing this maintenance in the small hours of the week-end (the
Saturday-to-Sunday night) seems like a non-issue: currently an
aggressive GC of odoo/odoo (using the default aggressive options)
takes about 2h30 wallclock (5h user) on a fairly elderly machine, and
closer to 20min wallclock (2h user) on my local machine. It also
turns out the cache repos are somewhat badly configured, leading to
~30% more objects than necessary, which doesn't help.
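
In outline, the weekly job boils down to the following; this is a condensed
sketch of the implementation in the diff below, with `env` and `repo_dir`
standing in for what the real code looks up:

    # Condensed sketch of the maintenance flow implemented in the diff below;
    # `env` stands in for an Odoo environment and `repo_dir` for the path of
    # one cache repository.
    import subprocess

    def weekly_maintenance(env, repo_dir):
        # take the forward-port cron's row FOR UPDATE: cron workers trying to
        # run it block until this transaction ends, so nothing fetches into
        # the cache while it is being repacked
        env.cr.execute(
            "SELECT 1 FROM ir_cron WHERE id = %s FOR UPDATE",
            [env.ref('forwardport.port_forward').id],
        )
        # in-place aggressive repack, dropping unreachable objects immediately
        subprocess.run(
            ['git', '--git-dir', repo_dir, 'gc', '--aggressive', '--prune=now'],
            check=True,
        )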

For the record, a fresh checkout of odoo/odoo right now yields:

    | Overall repository size      |           |
    | * Commits                    |           |
    |   * Count                    |   199 k   |
    |   * Total size               |   102 MiB |
    | * Trees                      |           |
    |   * Count                    |  1.60 M   |
    |   * Total size               |  2.67 GiB |
    |   * Total tree entries       |  74.1 M   |
    | * Blobs                      |           |
    |   * Count                    |  1.69 M   |
    |   * Total size               |  72.4 GiB |

If this still proves insufficient, a further option would be to deploy
a "generational repacking" strategy:
https://gitlab.com/gitlab-org/gitaly/-/issues/2861 (though apparently
it's not yet been implemented / deployed on gitlab so...).

But for now we'll see how it shakes out.

Close #489
Xavier Morel 2022-11-07 09:53:11 +01:00
parent 985aaa5798
commit c35b721f0e
3 changed files with 62 additions and 1 deletion


@@ -42,4 +42,17 @@
        <field name="numbercall">-1</field>
        <field name="doall" eval="False"/>
    </record>

    <record model="ir.cron" id="maintenance">
        <field name="name">Maintenance of repo cache</field>
        <field name="model_id" ref="model_forwardport_maintenance"/>
        <field name="state">code</field>
        <field name="code">model._run()</field>
        <!-- run sunday morning as it can take a while, unlikely someone will need to forward-port stuff at that point -->
        <field name="nextcall" eval="datetime.utcnow() + relativedelta(weekday=6, hour=2, minute=0, second=0, microsecond=0)"/>
        <field name="interval_number">1</field>
        <field name="interval_type">weeks</field>
        <field name="numbercall">-1</field>
        <field name="doall" eval="False"/>
    </record>
</odoo>
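
For reference, the nextcall expression above resolves to the upcoming Sunday
at 02:00 UTC (dateutil's relativedelta treats weekday=6 as Sunday, and keeps
the date if it already is one); a quick check:

    # Quick check of the nextcall expression used in the cron record above.
    from datetime import datetime
    from dateutil.relativedelta import relativedelta

    nextcall = datetime.utcnow() + relativedelta(
        weekday=6, hour=2, minute=0, second=0, microsecond=0)
    # run on the commit date (Monday 2022-11-07) this prints 2022-11-13 02:00:00
    print(nextcall)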


@@ -1,5 +1,8 @@
# -*- coding: utf-8 -*-
import logging
import pathlib
import resource
import subprocess
import uuid
from contextlib import ExitStack
from datetime import datetime, timedelta
@@ -8,6 +11,7 @@ from dateutil import relativedelta
from odoo import fields, models
from odoo.addons.runbot_merge.github import GH
from odoo.tools.appdirs import user_cache_dir

# how long a merged PR survives
MERGE_AGE = relativedelta.relativedelta(weeks=2)
@@ -266,3 +270,46 @@ class DeleteBranches(models.Model, Queue):
r.json()
)
_deleter.info('✔ deleted branch %s of PR %s', self.pr_id.label, self.pr_id.display_name)
_gc = _logger.getChild('maintenance')

def _bypass_limits():
    """Allow git to go beyond the limits set for Odoo.

    On large repositories, git gc can take a *lot* of memory (especially with
    `--aggressive`), if the Odoo limits are too low this can prevent the gc
    from running, leading to a lack of packing and a massive amount of cruft
    accumulating in the working copy.
    """
    resource.setrlimit(resource.RLIMIT_AS, (resource.RLIM_INFINITY, resource.RLIM_INFINITY))

class GC(models.TransientModel):
    _name = 'forwardport.maintenance'
    _description = "Weekly maintenance of... cache repos?"

    def _run(self):
        # lock out the forward port cron to avoid concurrency issues while we're
        # GC-ing it: wait until it's available, then SELECT FOR UPDATE it,
        # which should prevent cron workers from running it
        fp_cron = self.env.ref('forwardport.port_forward')
        self.env.cr.execute("""
            SELECT 1 FROM ir_cron
            WHERE id = %s
            FOR UPDATE
        """, [fp_cron.id])

        repos_dir = pathlib.Path(user_cache_dir('forwardport'))
        # run on all repos with a forwardport target (~ forwardport enabled)
        for repo in self.env['runbot_merge.repository'].search([('fp_remote_target', '!=', False)]):
            repo_dir = repos_dir / repo.name
            if not repo_dir.is_dir():
                continue

            _gc.info('Running maintenance on %s', repo.name)
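            # _bypass_limits runs in the forked child just before git is
            # exec'd, so the raised limits only apply to the gc process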
            r = subprocess.run(
                ['git', '--git-dir', repo_dir, 'gc', '--aggressive', '--prune=now'],
                stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                encoding='utf-8',
                preexec_fn=_bypass_limits,
            )
            if r.returncode:
                _gc.warning("Maintenance failure (status=%d):\n%s", r.returncode, r.stdout)


@@ -1152,6 +1152,7 @@ class Feedback(models.Model):
    token_field = fields.Selection(selection_add=[('fp_github_token', 'Forwardport Bot')])

ALWAYS = ('gc.auto=0', 'maintenance.auto=0')
def git(directory): return Repo(directory, check=True)

class Repo:
    def __init__(self, directory, **config):
@@ -1167,7 +1168,7 @@ class Repo:
    def _run(self, *args, **kwargs):
        opts = {**self._config, **kwargs}
        args = ('git', '-C', self._directory)\
-           + tuple(itertools.chain.from_iterable(('-c', p) for p in self._params))\
+           + tuple(itertools.chain.from_iterable(('-c', p) for p in self._params + ALWAYS))\
            + args
        try:
            return self._opener(args, **opts)