Commit Graph

1680 Commits

Author SHA1 Message Date
Xavier Morel
994cea467c [FIX] runbot_merge: typo in freeze wizard
Forgot to deref the id of the staging we're trying to lock, so the
specific case where we start a freeze with a bump PR and an
outstanding staging in master would instantly blow up.
2024-01-16 07:54:43 +01:00
Xavier Morel
b21fbaf9cc [IMP] runbot_merge: prevent merging empty commits
The low-level APIs used by the staging process don't do any merge
check, so because of the way Git works it's possible for them to merge
commits with content as empty commits, e.g. if something was merged
then backported and the backport was merged on top. This should
trigger a merge failure as we don't really want to merge newly
empty. This is a feature which some high level commands of git
support, kind-of, e.g. by default `git rebase --interactive` will ask
about newly empty commits.

Take care to allow merging already-empty commits, as these do have a
use for signaling, freezes, ....

Fixes #809
2023-11-30 12:45:39 +01:00
Xavier Morel
2cd3fb8999 [IMP] runbot_merge: make uniquifier commit optional
Prepares the possibility of either more direct communication with the
CI platform(s) or just assuming CI has gotten reliable enough and
colleagues intelligent enough that this is not an issue anymore
because they've stopped pushing empty branches (which we know is not
the case).

Fixes #806
2023-11-30 12:45:39 +01:00
Xavier Morel
a15086a8a9 [FIX] runbot_merge: "not something we can merge" freeze error
During the 17.0 freezeathon, the freeze wizard blew up with

    MergeError: merge-tree: {oid} - not something we can merge

Turns out when freezes were moved to local
(4d2c0f86e1) I forgot to fetch the heads
of the release and bump PRs into the local repo, so rebasing them atop
their branch would fail because the local repository would just not
find the object being rebased.

I had missed that case in testing as well, but in fairness even if I
had tried testing it I'd likely have missed it: implementation
limitations (shortcuts) of dummy central mean it currently ignores
what objects the client requests and bundles everything it can find
associated with the repository (meaning it sends the entire network).

This is not usually an issue because the test repos are pretty small,
but it means the client can have objects they should not because they
never requested them and might not even be supposed to be aware of
their existence.

Anyway solve by doing the obvious: fetch the heads of the release and
bump PRs at the same time we update the branch being forked off. Also
update the freeze tests to trigger the issue (by creating the release
/ bump PRs in different repos) and running the tests against github
actual to make sure we can actually see them fail (correctly, the
merge error we expect) not via errors in the test), and we do fix
them.

Fixes #821
2023-11-30 12:45:39 +01:00
Xavier Morel
134ce03053 [FIX] forwardport: fix sourcing on reparenting
When reparenting a commit (mostly when inserting a new forwardport in
an existing chain after a freeze / branch insertion), the new source
should be the source of the new parent (which is likely a not-change
of the source).

This was miscomputed to the root of the new parent, which often
matches but breaks if there was a conflict or a mid-port update,
leading to inconsistent presentation. Nothing critical, just somewhat
annoying.
2023-11-30 12:45:39 +01:00
Xavier Morel
f44b0c018e [IMP] forwardport: allow updating the fw limit after merging the source
Currently, once a source PR has been merged it's not possible to set
or update a limit, which can be inconvenient (e.g. people might have
forgotten to set it, or realise after the fact that setting one is not
useful, or might realise after the fact that they should *unset* it).

This PR relaxes that constraint (which is not just a relaxation as it
requires a bunch of additional work and validation), it should now be
possible to:

- update the limit to any target, as long as that target is at or
  above the current forwardport tip's
  - with the exception of a tip being forward ported, as that can't be
    cancelled
- resume a forward port stopped by a previously set limit (just
  increase it to whatever)
- set a limit from any forward-port PR
- set a limit from a source, post-merge

The process should also provide pretty chatty feedback:

- feedback on the source and closest root
- feedback about adjustments if any (e.g. setting the limit to B but
  there's already a forward port to C, the 'bot should set the limit
  to C and tell you that it happened and why)

Fixes #506
2023-10-06 13:19:01 +02:00
Xavier Morel
ea2857ec31 [IMP] forwardport: parametrize main fw limits
Probably a bit more expensive, but makes the code shorter and more
straightforward.
2023-08-31 12:31:59 +02:00
Xavier Morel
76f4ed3bf6 [ADD] runbot_merge: delete scratch branches when a branch is disabled
If a branch `foo` is disabled, then `tmp.foo` and `staging.foo` become
unnecessary (with #247 fixed the tmp refs are not used for creating
stagings anymore, but for now they're still used for the "safety
dance" of merging a successful staging into the corresponding
mainline).

Fixes #605
2023-08-31 09:07:01 +02:00
Xavier Morel
2fb26f10fb [IMP] *: make dummy saas_worker module installable
And install it. And add a hook to trigger "ready crons" (including
triggered crons).

Rather convenient to install test helpers inside the SUT.
2023-08-31 08:58:25 +02:00
Xavier Morel
2c177c83f6 [IMP] forwardport: check fwbot approval for a conflicting PR
A conflicting forward port should not be approvable via fwbot, it
should be merged like a novel PR (via the mergebot).
2023-08-30 12:24:05 +02:00
Xavier Morel
65ed7c51bc [IMP] *: note to merge using mergebot in conflict message
The message has a lot of info, but left the merging bit
unwritten. Correct this issue.

Fixes #765
2023-08-30 12:10:46 +02:00
Xavier Morel
69f5cac2d7 [FIX] runbot_merge: support non-ascii secrets & sha256 signatures
Per the Github webhook documentation:

1. sha1 signatures are deprecated, github recommends sha256 (though
   that's unlikely to be a concern anyway), and dummy-central supports
   both so it should be no issue.

   > If possible, we recommend that you use the x-hub-signature-256
   > header for improved security.

2. Non-ascii secrets are supported and should be utf8-encoded to
   compute signatures... that's not actually documented as github docs
   only mention payload encoding but it seems to make sense anyway.

Also improve the warning message by replacing the signature (which is
useless) by the delivery id (which could allow introspecing the hook
or something).
2023-08-30 11:43:13 +02:00
Xavier Morel
302fd42cae [ADD] forwardport: message on parent of detached PR
Currently a user is not notified that the parent of a detached PR
needs to be independently approved and may miss that information. Add
a notification to *that* PR as well.

Fixes #788
2023-08-29 15:59:05 +02:00
Xavier Morel
73e4ac6066 [REM] runbot_merge: check_visibility
Its sole use was removed with the switch to local staging, but I
missed removing it.

Closes #625 as there is no need to update it to v2 smart protocol.
2023-08-29 13:26:12 +02:00
Xavier Morel
b0b609ebe7 [CHG] runbot_merge: perform stagings in a local clone of the repo
The github API has gotten a lot more constraining (with rate
restrictions being newly enforced or added somewhat out of nowhere),
and as importantly a lot less reliable. So move the staging process
off of github and locally, similar to the forward porting
process (whose repo cache is being reused for this).

Fixes #247
2023-08-25 15:33:25 +02:00
Xavier Morel
136bb7d9fc [IMP] forwardport: when fetching identity avoid emails if possible
If the primary email is made public, it is returned directly as part
of the /users endpoint, in which case we don't need to fetch it via
/user/emails.

Also improve error messages, and fix the incorrect checks on the
existence of the github name and email. And allow manually updating
both via the project form.
2023-08-25 15:31:06 +02:00
Xavier Morel
f0344fd34a [ADD] runbot_merge: link back from commit to PR 2023-08-25 15:31:06 +02:00
Xavier Morel
4d2c0f86e1 [CHG] runbot_merge: convert freeze wizard to local repo
Probably less necessary than for the regular staging stuff, but might
as well while at it.

Requires updating one of the test to generate a non-ff push, as
O_CREAT doesn't exist at the git level, and the client (and it is
client-side) only protects against force pushes. So there is no way to
trigger an issue with just the creation of the new branch, it needs to
exist *and point to a non-ancestor commit*.

Also remove a sleep in the ref update loop as there are no ref updates
anymore, until the very final sync via git.

NB: maybe it'd be possible to push both bump and release PRs together
for each repo, but getting which update failed in case of failure
seems difficult.
2023-08-25 15:06:04 +02:00
Xavier Morel
85a7890023 [CHG] runbot_merge: switch staging from github API to local
It has been a consideration for a while, but the pain of subtly
interacting with git via the ignominous CLI kept it back. Then ~~the
fire nation attacked~~ github got more and more tight-fisted (and in
some ways less reliable) with their API.

Staging pretty much just interacts with the git database, so it's both
a facultative github operator (it can just interact with git directly)
and a big consumer of API requests (because the git database endpoints
are very low level so it takes quite a bit of work to do anything
especially when high-level operations like rebase have to be
replicated by hand).

Furthermore, an issue has also been noticed which can be attributed to
using the github API (and that API's reliability getting worse): in
some cases github will fail to propagate a ref update / reset, so when
staging 2 PRs it's possible that the second one is merged on top of
the temporary branch of the first one, yielding a kinda broken commit
(in that it's a merge commit with a broken error message) instead of
the rebase / squash commit we expected.

As it turns out it's a very old issue but only happened very early so
was misattributed and not (sufficiently) guarded against:

- 41bd82244bb976bbd4d4be5e7bd792417c7dae6b (October 8th 2018) was
  spotted but thought to be a mergebot issue (might have been one of
  the opportunities where ref-checks were added though I can't find
  any reference to the commit in the runbot repo).
- 2be25052e147b151d1d8a5bc73cceb351586ce03 (October 15th, 2019) was
  missed (or ignored).
- 5a9fe7a7d05a9df7186072a7bffd60c6b428fd0e (July 31st, 2023) was
  spotted, but happened at a moment where everything kinda broke
  because of github rate-limiting ref updates, so the forensics were
  difficult and it was attributed to rate limiting issues.
- f10d03bf0f2e8f88f62a5d8356b84f714196130f (August 24th, 2023) broke
  the camel's back (and the head block): the logs were not too
  interspersed with other garbage and pretty clear that github ack'd a
  ref update, returned the correct oid when checking the ref, then
  returned the wrong oid when fetching it later on.

No Working Copy
===============

The working copy turns out to not be necessary, the plumbing commands
we *need* work just fine on a bare repository.

Working without a WC means we had to reimplement the high level
operations (rebase) by hand much as we'd done previously, *but* we
needed to do that anyway as git doesn't seem to provide any way to
retrieve the mapping when rebasing/cherrypicking, and cherrypicking by
commit doesn't work well as it can't really find the *merge base* it
needs.

Forward-porting can almost certainly be implemented similarly (with
some overhead), issue #803 has been opened to keep track of the idea.

No TMP
======

The `tmp.` branches are no more, the process of creating stagings is
based entirely around oids, if staging something fails we can just
abandon the oids (they'll be collected by the weekly GC), we only
need to update the staging branches at the very end of the process.

This simplifies things a fair bit.

For now we have stopped checking for visibility / backoff as we're
pushing via git, hopefully it is a more reliable reference than the
API.

Commmit Message Formatting
==========================

There's some unfortunate churn in the test, as the handling of
trailing newlines differs between github's APIs and git itself.

Fixes #247

PS: It might be a good idea to use pygit2 instead of the CLI
    eventually, the library is typed which is nice, and it avoids
    shelling out although that's really unlikely to be a major cost.
2023-08-25 15:06:04 +02:00
Xavier Morel
2fbbe3fcdb [ADD] runbot_merge: github identity for the mergebot
Necessary to create commits *as* the mergebot without going through
the github API. Copy of the improved version from forwardport. *Not*
an override, to avoid unnecessarily triggering one or the other which
is confusing and weird.
2023-08-25 15:04:48 +02:00
Xavier Morel
86a1b5523e [MOV] runbot_merge: all the staging creation code to a separate module
Move *almost* all the staging code to free functions, in a separate
module, and extensively typed.

The only bits which didn't move are:

- the entry point (the cron hook), because it has to be a model method
  in order to be called
- the `_build_merge_message` method, because it needs to be
  overridable

There's also a bit of an import mess, because the cron &
`_build_merge_message` need to call into the new module, but the new
module wants the types they belong to, so it's a bit circular.
2023-08-25 15:04:48 +02:00
Xavier Morel
9de18de454 [CHG] *: move repo cache from forwardbot to mergebot
If the stagings are going to be created locally (via a git working
copy rather than the github API), the mergebot part needs to have
access to the cache, so move the cache over. Also move the maintenance
cron.

In an extermely minor way, this prefigures the (hopeful) eventual
merging of the ~~planes~~ modules.
2023-08-25 15:04:48 +02:00
Xavier Morel
7bca6f0bd7 [ADD] runbot_merge: allow resolving commits by sha
`_rec_name = 'sha'` means name_search and cross-model searches will
work much better.

Relates to #802
2023-08-25 11:01:46 +02:00
Xavier Morel
0826b3484b [ADD] runbot_merge: view improvements
- add formatting for a bunch of backend objects
- add cross-links in order to use toplevel navigation between objects
  e.g. project -> branch -> staging -> PR with breadcrumbs instead of
  shitty dialog boxes

Relates to #802
2023-08-25 11:01:38 +02:00
Xavier Morel
9b5bb338b4 [REM] runbot_merge: status compatibility functions
When I updated the status storage (including `previous_failure`) for
some reason I didn't just migrate from the old to the new format, and
added bridge functions instead.

This is not really necessary (or useful), so convert all the legacy
data and remove the conversion helpers.

Relates to #802
2023-08-24 10:47:16 +02:00
Xavier Morel
90961b99c9 [ADD] *: changelog entries I forgot
Can't hurt to *have* them.
2023-08-14 09:28:19 +02:00
Xavier Morel
62fb880a45 [FIX] forwardport: sync outstandings notification with page
In 81ce4ea02b the delta for PRs being
listed in the `/forwardport/outstanding` page was increased from 3
days to 7 (1 week), however the warning box in the home page still
used the old cutoffs leading to

An inconsistency between the two and an effective severe overcounting,
as the reason why the cutoff was increased to a week is forward ports
can take a while or the author / reviewer can be a touch busy at end
of week, so 3~4 days are routine when a PR is merged on thursday or
friday (and even worse if there's bank holidays in the mix).
2023-08-14 08:00:56 +02:00
Xavier Morel
7348e4d7a4 [IMP] runbot_merge: ensure at least 1s between mutating GH calls
Mostly a temporary safety feature after the events of 07-31: it's
still not clear whether that was a one-off issue or a change in
policy (I was not able to reproduce locally even doing several
set_refs a second) and the gh support is not super talkative, but it
probably doesn't hurt to commit the workaround until #247 gets
implemented.

On 2023-07-31, around 08:30 UTC, `set_ref` started failing, a lot
(although oddly enough not continuously), with the unhelpful message
that

> 422: Reference cannot be updated

This basically broke all stagings, until a workaround was implemented
by adding a 1s sleep before `set_ref` to ensure no more than 1
`set_ref` per second, which kinda sorta has been the github
recommendation forever but had never been an issue
before. Contributing to this suspicion is that in late 2022, the
documentation of error 422 on `PATCH git/refs/{ref}` was updated to:

> Validation failed, or the endpoint has been spammed.

Still would be nice if GH was clear about it and sent a 429 instead.

Technically the recommendation is:

> If you're making a large number of POST, PATCH, PUT, or DELETE
> requests for a single user or client ID, wait at least one second
> between each request.

So... actually implement that. On a per-worker basis as for the most
part these are serial processes (e.g. crons), we can still get above
the rate limit due to concurrent crons but it should be less likely.

Also take `Retry-After` in account, can't hurt, though we're supposed
to retry just the request rather than abort the entire thing. Maybe a
future update can improve this handling.

Would also be nice to take `X-RateLimit` in account, although that's
supposed to apply to *all* requests so we'd need a second separate
timestamp to track it. Technically that's probably also the case for
`Retry-After`. And fixing #247 should cut down drastically on the API
calls traffic as staging is a very API-intensive process, especially
with the sanity checks we had to add, these days we might be at 4
calls per commit per PR, and up to 80 PRs/staging (5 repositories and
16 batches per staging), with 13 live branches (though realistically
only 6-7 have significant traffic, and only 1~2 get close to filling
their staging slots).
2023-08-11 12:32:21 +02:00
Xavier Morel
85a74a9e32 [ADD] runbot_merge: staging query endpoints
`/runbot_merge/stagings`
========================

This endpoint is a reverse lookup from any number of commits to a
(number of) staging(s):

- it takes a list of commit hashes as either the `commits` or the
  `heads` keyword parameter
- it then returns the stagings which have *all* these commits as
  respectively commits or heads, if providing all commits for a
  project the result should always be unique (if any)
- `commits` are the merged commits, aka the stuff which ends up in the
  actual branches
- `heads` are the staging heads, aka the commits at the tip of the
  `staging.$name` branches, those may be the same as the corresponding
  commit, or might be deduplicator commits which get discarded on
  success

`/runbot_merge/stagings/:id`
============================

Returns a list of all PRs in the staging, grouped by batch (aka PRs
which have the same label and must be merged together).

For each PR, the `repository` name, `number`, and `name` in the form
`$repository#$number` get returned.

`/runbot_merge/stagings/:id1/:id2`
==================================

Returns a list of all the *successfully merged* stagings between `id1`
and `id2`, from oldest to most recent. Individual records have the
form:

- `staging` is the id of the staging
- `prs` is the contents of the previous endpoint (a list of PRs
  grouped by batch)

`id1` *must* be lower than `id2`.

By default, this endpoint is inclusive on both ends, the
`include_from` and / or `include_to` parameters can be passed with the
`False` value to exclude the corresponding bound from the result.

Related to #768
2023-08-11 11:13:34 +02:00
Xavier Morel
4eefc980bb [IMP] runbot_merge: logger messages 2023-08-10 16:14:33 +02:00
Xavier Morel
e9f7252ed1 [FIX] runbot_merge: sentry issue via monkeypatch
`auto_session_tracking` causes issues when not specified on the super
old version of the client which is available on ubuntu.

Also disable tracing as it seems less useful than hoped for, and I've
not been using what's been collected so far.
2023-08-10 15:27:20 +02:00
Xavier Morel
b1af2e573a [IMP] runbot_merge: split staging heads out to join tables
Currently the heads of a staging (both staging heads and merged heads)
are just JSON data on the staging itself. Historically this was
convenient as the heads were mostly of use to the staging process, and
thus accessed directly through the staging essentially exclusively.

However this makes finding stagings from merged commits e.g. for
forensic research almost impossible, because querying based on
the *values* of a JSON map is expensive, and indexing it is difficult.

To make this use case more feasible, split the `heads` field into two
join tables, one for the staging heads and one for the merged heads,
this makes looking for stagings by commits much more
efficient (although the queries may not be trivial). Also add two
utility RPC methods, so that it's possible to query stagings
reasonably easily and efficiently based on a set of commits (branch
heads).

related to #768
2023-08-10 14:04:59 +02:00
Xavier Morel
cdffa83191 [IMP] runbot_merge, forwardport: minor cleanups
Remove unused imports, unnecessary f-strings, dead code, fix
less-than-ideal operators.
2023-08-10 13:33:16 +02:00
Xavier Morel
a692163f6e [IMP] runbot_merge: add quick jump from stagings to PRs
In the backend, the intermediate jump through batches is really not
convenient (even if we kinda have to jump through batches *anyway*).

Fixes #751
2023-07-10 15:23:31 +02:00
Xavier Morel
780e20bfd6 [IMP] runbot_merge: filtering options and UX on stagings list
Allow filtering stagings by state (success or failure), and provide a
control to explicitly update the staging date limit.

Should make it easier to drill through stagings when looking for
specific information.

Related to #751
2023-07-10 15:23:31 +02:00
Xavier Morel
5bce73c97d [IMP] *: optimise loading of home page
Fix outstanding query to make a positive `state` filtering, instead of
negative, matching 3b52b1aace8674259812a76b1566260937dbcacb.

Also manually create a map of stagings (grouped by branch) sharing a
single prefetch set.

For odoo the mergebot home page has 12 branches in the odoo project
and 8 in spreadsheet, 6 stagings each. This means 120 queries to
retrieve all the heads (Odoo stagings have 5 heads and spreadsheet
have 1, but that seems immaterial).

By fixing `_compute_statuses` and creating a single prefetch set for
all stagings of all branches we can fetch all the commits in a single
query instead of 120.
2023-07-10 15:23:31 +02:00
Xavier Morel
81ce4ea02b [IMP] rewrite /forwardport/outstanding
- add support for authorship (not just approval)
- make display counts directly
- fix `state` filter: postgres can't do negative index lookups
- add indexes for author and reviewed_by as we look them up
- ensure we handle the entire source filtering via a single subquery

Closes #778
2023-07-10 15:23:31 +02:00
Xavier Morel
aefbdaf974 [IMP] *: client side sorted implementation
Allow sorting by a callback. Sort remains client-side.
2023-07-10 15:23:31 +02:00
Xavier Morel
d748d4b215 [IMP] *: create a single template db per module to test
Before this, when testing in parallel (using xdist) each worker would
create its own template database (per module, so 2) then would copy
the database for each test.

This is pretty inefficient as the init of a db is quite expensive in
CPU, and when increasing the number of workers (as the test suite is
rather IO bound) this would trigger a stampede as every worker would
try to create a template at the start of the test suite, leading to
extremely high loads and degraded host performances (e.g. 16 workers
would cause a load of 20 on a 4 cores 8 thread machine, which makes
its use difficult).

Instead we can have a lockfile at a known location of the filesystem,
the first worker to need a template for a module install locks it,
creates the templates, then writes the template's name to the
lockfile.

Every other worker can then lock the lockfile and read the name out,
using the db for duplication.

Note: needs to use `os.open` because the modes of `open` apparently
can't express "open at offset 0 for reading or create for writing",
`r+` refuses to create the file, `w+` still truncates, and `a+` is
undocumented and might not allow seeking back to the start on all
systems so better avoid it.

The implementation would be simplified by using `lockfile` but that's
an additional dependency plus it's deprecated. It recommends
`fasteners` but that seems to suck (not clear if storing stuff in the
lockfile is supported, it opens the lockfile in append mode). Here the
lockfiles are sufficient to do the entire thing.

Conveniently, this turns out to improve *both* walltime CPU time
compared to the original version, likely because while workers now
have to wait on whoever is creating the template they're not competing
for resources with it.
2023-07-10 15:23:31 +02:00
Xavier-Do
f22ae357e3 [DEL] runbot: 15.0 eol 2023-06-22 15:59:50 +02:00
Xavier-Do
b72fdff01d [DEL] runbot: 15.0 eol 2023-06-22 15:59:36 +02:00
Xavier Morel
72281b0c63 [IMP] runbot_merge: allow running test suite without an explicit addons path 2023-06-22 14:38:10 +02:00
Xavier-Do
fc41e09f44 [IMP] runbot: add useful indexes remaining from upgrade 2023-06-22 11:23:43 +02:00
Xavier Morel
765281a665 [FIX] runbot_merge: make provisioning more resilient
A few cases of conflict were missing from the provisioning
handler.

They can't really be auto-fixed, so just output a warning and ignore
the entry, that way the rest of the provisioning succeeds.
2023-06-21 14:26:19 +02:00
Xavier Morel
9260384284 [FIX] runbot_merge: concurrency error in freeze wizard (hopefully)
During the 16.3 freeze an issue was noticed with the concurrency
safety of the freeze wizard (because it blew up, which caused a few
issues): it is possible for the cancelling of an active staging to the
master branch to fail, which causes the mergebot side of the freeze to
fail, but the github state is completed, which puts the entire thing
in a less than ideal state.

Especially with the additional issue that the branch inserter has its
own concurrency issue (which maybe I should fix): if there are
branches *being* forward-ported across the new branch, it's unable to
see them, and thus can not create the now-missing PRs.

Try to make the freeze wizard more resilient:

1. Take a lock on the master staging (if any) early on, this means if
   we can acquire it we should be able to cancel it, and it won't
   suffer a concurrency error.
2. Add the `process_updated_commits` cron to the set of locked crons,
   trying to read the log timeline it looks like the issue was commits
   being impacted on that staging while the action had started:
   REPEATABLE READ meant the freeze's transaction was unable to see
   the update from the commit statuses, therefore creating a diverging
   update when it cancelled the staging, which postgres then reported
   as a serialization error.

I'd like to relax the locking of the cron (to just FOR SHARE), but I
think it would work, per postgres:

> SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as
> SELECT in terms of searching for target rows: they will only find
> target rows that were committed as of the transaction start
> time. However, such a target row might have already been updated (or
> deleted or locked) by another concurrent transaction by the time it
> is found. In this case, the repeatable read transaction will wait
> for the first updating transaction to commit or roll back (if it is
> still in progress). If the first updater rolls back, then its
> effects are negated and the repeatable read transaction can proceed
> with updating the originally found row. But if the first updater
> commits (and actually updated or deleted the row, not just locked
> it) then the repeatable read transaction will be rolled back with
> the message

This means it would be possible to lock the cron, and then get a
transaction error because the cron modified one of the records we're
going to hit while it was running: as far as the above is concerned
the cron's worker had "just locked" the row so it's fine to
continue. However this makes it more and more likely an error will be
hit when trying to freeze (to no issue, but still). We'll have to see
how that ends up.

Fixes #766 maybe
2023-06-21 14:26:19 +02:00
Xavier Morel
ed0fd88854 [ADD] runbot_merge: sentry instrumentation
Currently sentry is only hooked from the outside, which doesn't
necessarily provide sufficiently actionable information.

Add some a few hooks to (try and) report odoo / mergebot metadata:

- add the user to WSGI transactions
- add a transaction (with users) around crons
- add the webhook event info to webhook requests
- add a few spans to the long-running crons, when they cover multiple
  units per iteration (e.g. a span per branch being staged)

Closes #544
2023-06-21 14:26:19 +02:00
Xavier Morel
06a3a1bab5 [IMP] runbot_merge: add sentry filtering, rework some error messages
- move sentry configuration and add exception-based filtering
- clarify and reclassify (e.g. from warning to info) a few messages
- convert assertions in rebase to MergeError so they can be correctly
  logged & reported, and ignored by sentry, also clarify them
  (especially the consistency one)

Related to #544
2023-06-15 08:21:20 +02:00
Xavier Morel
cd4ded899b [IMP] runbot_merge: error reporting
Largely informed by sentry,

- Fix an inconsistency in staging ref verification, `set_ref`
  internally waits for the observed head to match the requested head,
  but then `try_staging` would re-check that and immediately fail if
  it didn't.

  Because github is *eventually* consistent (hopefully) this second
  check can fail (and is also an extra API call), breaking staging
  unnecessarily, especially as we're still going to wait for the
  update to be visible to git.

  Remove this redundant check entirely, as github provides no way to
  ensure we have a consistent view of anything, it doesn't have much
  value and can do much harm.
- Add github request id to one of the sanity check warnings as that
  could be a useful thing to send upstream, missing github request ids
  in the future should be noted and added.
- Reworked the GH object's calls to be clearer and more coherent:
  consistently log the same thing on all GH errors (if `check`),
  rather than just on the one without a `check` entry.

  Also remove `raise_for_status` and raise `HTTPError` by hand every
  time we hit a status >= 400, so we always forward the response body
  no matter what its type is.
- Try again to log the request body (in full as it should be pretty
  small), also remove stripping since we specifically wanted to add a
  newline at the start, I've no idea what I was thinking.

Fixes #735, #764, #544
2023-06-14 16:01:45 +02:00
Xavier Morel
270dfdd495 [REF] *: move most feedback messages to pseudo-templates
Current system makes it hard to iterate feedback messages and make
them clearer, this should improve things a touch.

Use a bespoke model to avoid concerns with qweb rendering
complexity (we just want GFM output and should not need logic).

Also update fwbot test setup to always configure an fwbot name, in
order to avoid ping messages closing the PRs they're talking
about, that took a while to debug, and given the old message I assume
I'd already hit it and just been too lazy to fix. This requires
updating a bunch of tests as fwbot ping are sent *to*
`fp_github_name`, but sent *from* the reference user (because that's
the key we set).

Note: noupdate on CSV files doesn't seem to work anymore, which isn't
great. But instead set tracking on the template's templates, it's not
quite as good but should be sufficient.

Fixes #769
2023-06-14 16:01:45 +02:00
Xavier Morel
e14616b2fb [IMP] runbot_merge: add support for draft check
1cea247e6c missed the update of the
`draft` flag, add support for it.

Fixes #753
2023-06-14 16:01:45 +02:00