Fix the outstanding query to use positive `state` filtering instead of
negative, matching 3b52b1aace8674259812a76b1566260937dbcacb.
Also manually create a map of stagings (grouped by branch) sharing a
single prefetch set.
For odoo, the mergebot home page has 12 branches in the odoo project
and 8 in spreadsheet, with 6 stagings each. This means 120 queries to
retrieve all the heads (Odoo stagings have 5 heads and spreadsheet
stagings have 1, but that seems immaterial).
By fixing `_compute_statuses` and creating a single prefetch set for
all stagings of all branches we can fetch all the commits in a single
query instead of 120.
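A minimal sketch of the idea (the `target` field name and the helper are
illustrative, not the actual mergebot code): keep all displayed stagings
on one shared prefetch set while grouping them by branch, so reading the
heads of any one group fetches the commits for every staging at once.

```python
from collections import defaultdict

def stagings_by_branch(stagings):
    # `stagings` is the recordset of every staging shown on the home page
    ids_by_branch = defaultdict(list)
    for staging in stagings:
        ids_by_branch[staging.target].append(staging.id)
    # rebuild each group on the *shared* prefetch set: accessing the heads
    # of any one group then loads the commits of all stagings in a single
    # query instead of one query per staging
    return {
        branch: stagings.browse(ids).with_prefetch(stagings._prefetch_ids)
        for branch, ids in ids_by_branch.items()
    }
```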
- add support for authorship (not just approval)
- make display counts directly
- fix `state` filter: postgres can't do negative index lookups (see the
sketch after this list)
- add indexes for author and reviewed_by as we look them up
- ensure we handle the entire source filtering via a single subquery
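Illustrative shape of the filter and index changes (the state values and
the field extension are assumptions, not copied from the code):

```python
from odoo import fields, models

class PullRequests(models.Model):
    _inherit = 'runbot_merge.pull_requests'

    # looked up directly by the dashboard, so indexed (attributes merge
    # with the base definition when extending an existing field)
    author = fields.Many2one(index=True)
    reviewed_by = fields.Many2one(index=True)

# positive `state` filter: postgres can use the btree index for IN (...),
# while the previous NOT IN (...) could not (state values assumed here)
OPEN_STATES = ('opened', 'validated', 'approved', 'ready', 'error')
open_prs_domain = [('state', 'in', OPEN_STATES)]
```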
Closes #778
Before this, when testing in parallel (using xdist) each worker would
create its own template database (per module, so 2) then would copy
the database for each test.
This is pretty inefficient as the init of a db is quite expensive in
CPU, and when increasing the number of workers (as the test suite is
rather IO bound) this would trigger a stampede as every worker would
try to create a template at the start of the test suite, leading to
extremely high loads and degraded host performance (e.g. 16 workers
would cause a load of 20 on a 4-core / 8-thread machine, which makes
the machine difficult to use).
Instead we can have a lockfile at a known location on the filesystem:
the first worker to need a template for a module install locks it,
creates the template, then writes the template's name to the lockfile.
Every other worker can then lock the lockfile and read the name out,
using that db for duplication.
Note: needs to use `os.open` because the modes of `open` apparently
can't express "open at offset 0 for reading or create for writing",
`r+` refuses to create the file, `w+` still truncates, and `a+` is
undocumented and might not allow seeking back to the start on all
systems so better avoid it.
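A minimal sketch of the scheme under those constraints (paths, helper
name and the createdb invocation are illustrative, not the actual test
fixture):

```python
import fcntl
import os
import subprocess

def shared_template(module, lock_path='/tmp/runbot_template.lock'):
    # os.open because builtin open() can't say "read from offset 0, or
    # create empty for writing": r+ won't create, w+ truncates, a+ is iffy
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    with os.fdopen(fd, 'r+') as lockfile:
        fcntl.lockf(lockfile, fcntl.LOCK_EX)   # serialize all xdist workers
        template = lockfile.read().strip()
        if not template:
            # first worker in: create the template db and install `module`
            template = f'template_{module}'
            subprocess.run(['createdb', template], check=True)
            # ... install `module` into the template database here ...
            lockfile.write(template)
            lockfile.flush()
    # the lock is released when the file is closed; each test then copies
    # the template, e.g. `createdb -T <template> <test_db>`
    return template
```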
The implementation would be simplified by using `lockfile` but that's
an additional dependency plus it's deprecated. It recommends
`fasteners` but that seems to suck (not clear if storing stuff in the
lockfile is supported, it opens the lockfile in append mode). Here the
lockfiles are sufficient to do the entire thing.
Conveniently, this turns out to improve *both* walltime *and* CPU time
compared to the original version, likely because while workers now
have to wait on whoever is creating the template, they're not competing
for resources with it.
This reverts commit 9e7441e098.
This doesn't work as expected because of the db filter.
Will be changed later, reverting for now.
The token field is kept, it could still be used later.
When investigating kernel logs, e.g. to find oom-killed containers,
the kernel does not log the name of the offending container but only
its id. With this commit the runbot will also log the container's short
id, which is enough to correlate the logs.
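A hedged sketch of the extra logging (a docker-py style container object
is assumed; runbot's own docker helpers may differ):

```python
import logging
_logger = logging.getLogger(__name__)

def log_container_start(container, build_dest):
    # the kernel / OOM killer only logs the container id, so logging the
    # short id next to the build name makes the two easy to correlate
    _logger.info('Container %s (%s) started for build %s',
                 container.name, container.short_id, build_dest)
```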
There are two wizards for the runbot build errors:
- One to close an error with a reason
- One to update the team/user or PR
With this commit, the two wizards are merged into one wizard that helps
to update errors in bulk.
Also, a button is added in the list view that saves a mouse click.
The `NEW` button is removed from the tree view as it should not be of
any use.
Pausing a host can be useful in some cases, mainly when testing new
code: the loop will have no effect, avoiding breaking builds waiting
for testing.
Profiling will help to identify potential performance flaws during the
loop.
One of the most common custom triggers is to restore a build before
starting some test, either to create a multibuild or to make the
execution and debugging of some test faster.
It is sometimes tedious to use because we need to give the url of a
build to restore. This build must correspond to the right commits,
must still exist, ... this means that the dump url must be adapted
every time a branch is rebased.
The way the dump_url is defined is by going to the last batch, following
the link to the `base_reference_batch_id`, finding a slot corresponding
to the right repo set (e.g. Custom enterprise -> enterprise), and
copying the dump_url of this build.
The base_reference_batch_id is easy to automate but we have to find the
right trigger; this is now a parameter of the custom trigger wizard.
There are now two strategies to define how to download the dump:
- `url`, using `restore_dump_url`
- `auto`, using `restore_trigger_id` and `restore_database_suffix`
To ease the setup, a `restore_trigger_id` is added on a trigger, so that
when selecting a trigger, let's say `Custom enterprise`, the defined
`trigger.restore_trigger_id` is automatically chosen for the
`custom_trigger.restore_trigger_id` and the `restore_mode` is set to
`auto`.
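A rough sketch of what `auto` mode has to resolve (model and field names
other than `base_reference_batch_id`, `dump_url`, `restore_trigger_id`
and `restore_database_suffix` are assumptions based on this
description):

```python
def _auto_restore_build(bundle, custom_trigger):
    # last batch of the bundle -> its reference batch on the base bundle
    reference = bundle.last_batchs[:1].base_reference_batch_id
    # pick the slot whose trigger matches the configured restore trigger
    slot = reference.slot_ids.filtered(
        lambda s: s.trigger_id == custom_trigger.restore_trigger_id
    )[:1]
    # the dump to restore is this build's dump_url, with
    # custom_trigger.restore_database_suffix selecting the database
    return slot.build_id
```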
Two actions are also added to the header of a bundle: a shortcut to
set up a multibuild (restore in children) or a restore-and-test build
(restore in parent).
In list views, widgets are not always instantiated and a formatter is
used instead. This means that the t-esc will try to output a jsonb field
without knowing how to render it, making the page crash.
This is quickly fixed by forcing the widget on the field in the tree
view.
The python steps can be long and interesting to track, but the change is
actually hard to see since a block of code is logged instead of the diff.
Also, the whitespace is not preserved since we are not in a <pre>
block, making it hard to read.
This proposes an alternative that displays python code tracking values
as a diff, with options to copy the raw content of the old and new
versions (as well as showing unchanged lines or not).
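A minimal sketch of the diffing idea, assuming the tracked old/new
values are plain text (the actual rendering lives in the widget; the
function name is made up):

```python
import difflib

def python_code_diff(old_value, new_value, context_lines=3):
    # unified diff of the tracked python step instead of dumping the whole
    # new block; a large `context_lines` corresponds to the "show
    # unchanged lines" option
    return '\n'.join(difflib.unified_diff(
        (old_value or '').splitlines(),
        (new_value or '').splitlines(),
        fromfile='old', tofile='new',
        n=context_lines, lineterm='',
    ))
```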
A few cases of conflict were missing from the provisioning
handler.
They can't really be auto-fixed, so just output a warning and ignore
the entry; that way the rest of the provisioning succeeds.
During the 16.3 freeze an issue was noticed with the concurrency
safety of the freeze wizard (because it blew up, which caused a few
issues): it is possible for the cancelling of an active staging to the
master branch to fail, which causes the mergebot side of the freeze to
fail, but the github state is completed, which puts the entire thing
in a less than ideal state.
Especially with the additional issue that the branch inserter has its
own concurrency issue (which maybe I should fix): if there are
branches *being* forward-ported across the new branch, it's unable to
see them, and thus can not create the now-missing PRs.
Try to make the freeze wizard more resilient:
1. Take a lock on the master staging (if any) early on; if we can
acquire it we should be able to cancel it without hitting a concurrency
error (see the sketch after this list).
2. Add the `process_updated_commits` cron to the set of locked crons:
reading the log timeline, it looks like the issue was commits on that
staging being updated while the action was running: REPEATABLE READ
meant the freeze's transaction was unable to see the update from the
commit statuses, therefore creating a diverging update when it
cancelled the staging, which postgres then reported as a serialization
error.
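A hedged sketch of point 1 (table and column names are assumed from the
model names, not checked against the code): lock the target branch's
active staging row up front so the later cancellation can't run into a
concurrent update.

```python
def _lock_active_staging(env, branch):
    # row lock acquired early in the freeze transaction
    env.cr.execute("""
        SELECT id
          FROM runbot_merge_stagings
         WHERE target = %s
           AND active
           FOR UPDATE
    """, [branch.id])
    return env['runbot_merge.stagings'].browse(
        [row[0] for row in env.cr.fetchall()]
    )
```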
I'd like to relax the locking of the cron (to just FOR SHARE); I think
it would work, but per postgres:
> SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as
> SELECT in terms of searching for target rows: they will only find
> target rows that were committed as of the transaction start
> time. However, such a target row might have already been updated (or
> deleted or locked) by another concurrent transaction by the time it
> is found. In this case, the repeatable read transaction will wait
> for the first updating transaction to commit or roll back (if it is
> still in progress). If the first updater rolls back, then its
> effects are negated and the repeatable read transaction can proceed
> with updating the originally found row. But if the first updater
> commits (and actually updated or deleted the row, not just locked
> it) then the repeatable read transaction will be rolled back with
> the message
This means it would be possible to lock the cron, and then get a
transaction error because the cron modified one of the records we're
going to hit while it was running: as far as the above is concerned
the cron's worker had "just locked" the row so it's fine to
continue. However this makes it more and more likely an error will be
hit when trying to freeze (to no issue, but still). We'll have to see
how that ends up.
Fixes #766 maybe
Currently sentry is only hooked from the outside, which doesn't
necessarily provide sufficiently actionable information.
Add a few hooks to (try and) report odoo / mergebot metadata (a sketch
follows the list):
- add the user to WSGI transactions
- add a transaction (with users) around crons
- add the webhook event info to webhook requests
- add a few spans to the long-running crons, when they cover multiple
units per iteration (e.g. a span per branch being staged)
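A rough sketch of the kind of hooks involved (standard `sentry_sdk`
calls; the surrounding function names are made up for illustration):

```python
import sentry_sdk

def run_cron(env, cron_name, body):
    # wrap each cron run in a transaction, tagged with the odoo user
    with sentry_sdk.start_transaction(op='cron', name=cron_name):
        sentry_sdk.set_user({'username': env.user.login})
        body()

def handle_webhook(request, event, payload):
    # attach the webhook event info to the request's sentry scope
    sentry_sdk.set_context('webhook', {
        'event': event,
        'delivery': request.headers.get('X-Github-Delivery'),
    })
    ...

def stage_branches(branches):
    # one span per branch being staged inside the long-running cron
    for branch in branches:
        with sentry_sdk.start_span(op='staging', description=branch.name):
            ...
```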
Closes #544
A first step to get more independent from web and website was done in
15.0, but some files were moved and it looks like the bootstrap version
changed (4.3 -> 5.1), breaking the frontend again.
This commit simply copies the libs from odoo/15.0 to avoid losing time
fixing the frontend look and feel for the new bootstrap version for now.
A future refactoring could change the vendored version to adapt to 5.1
while modernizing the frontend style.
- move sentry configuration and add exception-based filtering
- clarify and reclassify (e.g. from warning to info) a few messages
- convert assertions in rebase to MergeError so they can be correctly
logged & reported, and ignored by sentry, also clarify them
(especially the consistency one)
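Sketch of the assertion-to-exception conversion from the last point (the
exact checks shown are illustrative, not the real ones):

```python
class MergeError(Exception):
    """Expected merge/rebase failure: logged and reported to the user,
    but filtered out of sentry, unlike a bare AssertionError."""

def rebase(pr, commits):
    # was: assert commits
    if not commits:
        raise MergeError(f"PR {pr.display_name} has no commits to rebase")
    ...
```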
Related to #544
Largely informed by sentry,
- Fix an inconsistency in staging ref verification, `set_ref`
internally waits for the observed head to match the requested head,
but then `try_staging` would re-check that and immediately fail if
it didn't.
Because github is *eventually* consistent (hopefully) this second
check can fail (and is also an extra API call), breaking staging
unnecessarily, especially as we're still going to wait for the
update to be visible to git.
Remove this redundant check entirely, as github provides no way to
ensure we have a consistent view of anything, it doesn't have much
value and can do much harm.
- Add github request id to one of the sanity check warnings as that
could be a useful thing to send upstream, missing github request ids
in the future should be noted and added.
- Reworked the GH object's calls to be clearer and more coherent:
consistently log the same thing on all GH errors (if `check`),
rather than just on the one without a `check` entry.
Also remove `raise_for_status` and raise `HTTPError` by hand every
time we hit a status >= 400, so we always forward the response body
no matter what its type is (a sketch follows this list).
- Try again to log the request body (in full as it should be pretty
small), also remove stripping since we specifically wanted to add a
newline at the start, I've no idea what I was thinking.
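Roughly what the `HTTPError` change looks like (requests API; the helper
name is illustrative, not the mergebot code):

```python
import requests

def check(response):
    # raise_for_status() builds its own message and can drop the body;
    # raising by hand always forwards the body, whatever its content type
    if response.status_code >= 400:
        raise requests.HTTPError(response.text, response=response)
    return response
```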
Fixes #735, #764, #544
The current system makes it hard to iterate on feedback messages and
make them clearer; this should improve things a touch.
Use a bespoke model to avoid concerns with qweb rendering
complexity (we just want GFM output and should not need logic).
Also update the fwbot test setup to always configure an fwbot name, in
order to avoid ping messages closing the PRs they're talking
about; that took a while to debug, and given the old message I assume
I'd already hit it and just been too lazy to fix it. This requires
updating a bunch of tests, as fwbot pings are sent *to*
`fp_github_name` but sent *from* the reference user (because that's
the key we set).
Note: noupdate on CSV files doesn't seem to work anymore, which isn't
great. Instead, tracking is set on the template's templates; it's not
quite as good but should be sufficient.
Fixes #769
I'd been convinced this was an ORM error because the field is not
recursive... in runbot_merge; in forwardbot it is, and thus does indeed
need to be flagged to avoid the warning.
- currently disabling staging only works globally, allow disabling on
a single branch (sketched after this list)
- use a toggle
- remove a pair of tests which work specifically with `fp_target`,
can't work with `active` (probably)
- cleanup search of possible and active stagings, add relevant
indexes and use direct search of relevant branches instead of
looking up from the project
- also use toggle button for `active` on branches
- shitty workaround for upgrading DB: apparently mail really wants to
have a `user_id` to do some weird thing, so need to re-add it after
resetting everything
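Illustrative shape of the per-branch toggle and the direct branch
search (the `staging_enabled` field name is an assumption):

```python
from odoo import fields, models

class Branch(models.Model):
    _inherit = 'runbot_merge.branch'

    # per-branch switch, instead of only the project-wide one
    staging_enabled = fields.Boolean(default=True)

def stageable_branches(env):
    # search the relevant branches directly instead of going through the
    # project, relying on the newly added indexes
    return env['runbot_merge.branch'].search([
        ('active', '=', True),
        ('staging_enabled', '=', True),
    ])
```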
Fixes #727