Because the mergebot cron can run on any runbot, it's apparently
possible that a staging gets merged and the "closed" feedback from
github overwrites the "merged" status which the mergebot is supposed
to set, despite the supposed protection.
* [ADD] runbot_merge: more informative states to stagings on error
Currently, when a staging fails for reasons other than a CI failure:
* the staging having been cancelled is only known implicitly: the
staging is deactivated but never gets a status beyond pending (it's
not found when looked up, since it's no longer `active`)
* the fast-forward having failed is completely silent (logging
aside); it looks for all the world like the staging succeeded
Timeout already fails the PR, but split-on-timeout did not, so fix
that one bit.
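Roughly the shape of the change, as a sketch with hypothetical
model/field names (the actual field may differ):

    from odoo import fields, models

    # make cancellation and fast-forward failure explicit states
    # instead of leaving the staging stuck at "pending" or silently
    # looking successful
    class Stagings(models.Model):
        _name = 'runbot_merge.stagings'

        state = fields.Selection([
            ('success', 'Success'),
            ('failure', 'Failure'),
            ('pending', 'Pending'),
            ('cancelled', 'Cancelled'),
            ('ff_failed', 'Fast forward failed'),
        ], default='pending')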
* [FIX] odoo/odoo@cb2862ad2a60ff4ce66c14e7af2548fdf6fc5961
Closes #41
The webhook used the "sender" of the event as comment author; however,
if the comment is edited by a maintainer, github sends an
"issue_comment" event with that maintainer as sender.
This means a random user could create a comment with a robodoo
command, and if a registered reviewer happened to edit the comment the
command would suddenly be taken into account. This was not the
intention.
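A minimal sketch of the corrected attribution, using the key layout of
GitHub's issue_comment payload:

    # credit commands to the comment's author rather than the event's
    # sender: editing a comment makes the *editor* the sender
    def command_author(event):
        # not event['sender']['login']
        return event['comment']['user']['login']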
I just spent 10 minutes trying to find out why staging 28 was
cancelled (a p=0 comment). Add a common prefix to all staging cancels
to make them easier to find.
The staging delay was mistakenly commented out in
bb664455ec
Also modified the testing fixtures so the staging delay is not enabled
when running tests locally: on my box it increases the local runtime
from ~70s to ~1500s (20s/staging, ~1 staging/test, 73 tests)
It should be unnecessary: creating commits directly does not update
the ref (hence 2b1cd83b07) and we're
forcefully setting the ref afterwards, either resetting it to the
original head (for rebase) or updating it to the commits we've just
created (for squash).
Before this, the bot would only acknowledge commands of the form
<botname>: <commands>
but since the bot is an actual user, people regularly use `@<botname>`
as it seems like it should work *and* provides for autocompletion.
Support that, as well as the octothorpe in case users want to pound
robodoo.
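A hedged sketch of the relaxed matching (the bot's actual pattern may
differ):

    import re

    # accept "<botname>: ...", "@<botname> ..." and "#<botname> ..."
    def extract_commands(botname, comment):
        m = re.match(r'\s*[@#]?%s[:\s]' % re.escape(botname), comment, re.I)
        if not m:
            return []
        return comment[m.end():].split()

    assert extract_commands('robodoo', '@robodoo r+') == ['r+']
    assert extract_commands('robodoo', '#robodoo retry') == ['retry']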
Related to odoo/runbot#38
Continuation of fa94b269de, which is
apparently not sufficient:
1. log the staging event so we can check that we're staging in the
correct order
2. add a delay after each staging in case there's some sort of race
in the updating of codependent repositories
When creating staging branches from tmp, use the iteration order of
the repos in the project: that way it's easy to see, and eventually to
configure if we add sequences or whatever; in the short term it's the
order in which the repos were added, which is the one we want.
This ensures we stage odoo/odoo before we stage odoo/enterprise
without relying on dict iteration order, or needing meta to be an
OrderedDict.
The issue is that if stagings are created/updated the other way
around, the runbot may pick up the staging on odoo/enterprise before
odoo/odoo has been updated, and thus build odoo/enterprise with the
wrong odoo/odoo commit, defeating the entire point of the exercise.
Example: http://runbot.odoo.com/runbot/build/376112 was triggered by
the same staging as http://runbot.odoo.com/runbot/build/376113, but
used the previous staging head.
The creation order of tmp branches should not matter, so ignore it.
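A sketch of the intent, with hypothetical model and helper names
(repo_ids, github(), set_ref):

    def stage_heads(project, branch, heads):
        # repo_ids is ordered (insertion order for now, maybe an
        # explicit sequence later), so odoo/odoo is pushed before
        # odoo/enterprise regardless of dict iteration order
        for repo in project.repo_ids:
            repo.github().set_ref(
                'staging.%s' % branch.name, heads[repo.name])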
A limit of 50 commits per PR was put in place to avoid rebasing huge
PRs (as a rebase means 1 merge + 1 commit *per source commit*);
however the way it was done would also limit regular merges, and the
way the limitation was implemented was not clear.
* explicitly check that limit in the rebase case
* make the error message on PR sizes (50 for rebase, 250 for merge)
clearer
* remove limit from commits-fetching (check it beforehand)
* add a test to merge >50 commits PRs
* fix the local implementation of pulls/:number/commits to properly
paginate (see the sketch below)
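The pagination fix boils down to GitHub-style 1-indexed page slicing,
something like this hypothetical helper for the local mock:

    def page_of(commits, page=1, per_page=50):
        start = (page - 1) * per_page
        return commits[start:start + per_page]

    assert page_of(list(range(120)), page=3) == list(range(100, 120))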
2b1cd83b07 fixed a bug in PR
squashing (introduced when it was mis-rebuilt on top of rebase) which
was immediately committed & pushed so we could fix the running
mergebot.
This adds a test for that issue: it was checked to fail as of
2b1cd83b075a99da7ed905b9e62d7e5acb48b253~1 and to pass as of the
current head.
Turns out the previous tests checked all the new/complex features to
see if they worked correctly, but I completely forgot that the
previously working squash had been rebuilt.
Staging 13 tried merging 3 PRs (27085, 27083 and 27071) and supposedly
succeeded *but* only merged one of the 3 PRs despite marking all three
as merged. I tried building a few tests constructing multi-PR graphs
and checking them, but the only thing they exposed was the local
github implementation not correctly updating merge targets.
So fixed that, which is good.
Doesn't tell me why the staging didn't work right though.
a0063f9df0 slightly improved the error
message on non-PR ci failures (e.g. a community PR breaking
enterprise) by adding the failed commit, but that's still not exactly
clear, even for technical users (plus it requires having access to all
the repos, which is not the case for everyone).
This commit further improves the situation by storing the target_url
and description fields of the commit statuses, and printing out the
target_url on failure if it's present.
That way the PR comment denoting build failure should now have a link to
the relevant failed build on runbot, as that's the target_url it
provides.
The change is nontrivial as it tries to be compatible with both the
old and new statuses storage formats, such that there is no migration
to perform.
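The compatibility shim amounts to normalizing on read, along these
lines (assuming statuses are stored as a JSON mapping of context to
value):

    import json

    def to_status(v):
        # old format: bare state string; new format: dict carrying
        # state/target_url/description
        if isinstance(v, dict):
            return v
        return {'state': v, 'target_url': None, 'description': None}

    def read_statuses(stored):
        return {c: to_status(v) for c, v in json.loads(stored).items()}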
e98a8caffb added dummy commits to the
heads of stagings and fixed most places to distinguish between the
staging head (including the dummy commit) and the actual merge head,
but the difference was missed in the comment closing a PR, which was
still using the staging head and thus pointing to the dummy commit
(e.g. https://github.com/odoo/odoo/pull/26821#issuecomment-420244592).
If CI fails on a non-PR'd branch of a staging (e.g. given repos A and B
and a PR to A, CI fails on the staging branch to B), the error message
(log and comment on the PR) is unhelpful, as it states that the
staging failed for an "unknown reason".
Improve this by providing the failed CI context and the commit, which
should allow finding the branch & CI logs and understanding why it
failed.
Fixes #36
Before this change, when staging batches only affecting one repo (of n)
the unaffected repositories would get a staging branch exactly matching
the target.
As a result, either runbot_merge or runbot would simply return the
result of an unrelated build, potentially providing incorrect
information and either failing a staging which should have succeeded
(e.g. change in repo A broke B, PR is making a change in repo A which
fixes B, but B's state is reported as the previous broken build) or
succeeding a staging which should have failed (change in repo A breaking
B except a previous build of the exact same B succeeded with a different
A and is returned).
To fix this issue, create a dummy commit at the head of each staging
branch. Because commit dates are included in the hash and have second
precision, it's pretty unlikely that we'd get duplicate builds, but
just to be completely sure some random bits are added to the commit
message as well.
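A sketch of the uniquifier, assuming a hypothetical gh.commit wrapper
around GitHub's create-commit endpoint:

    import secrets

    def dummy_commit(gh, head):
        # timestamps already make collisions unlikely; the random bits
        # make identical staging trees practically impossible to confuse
        return gh.commit(
            message='force rebuild\n\nuniquifier: %s' % secrets.token_hex(16),
            tree=head['tree'],
            parents=[head['sha']],
        )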
Various tests fixed to correctly handle the extra dummy commit on
staging branches.
Fixes #35
After discussion with mat, rco and moc: if a PR is updated, it should
be unapproved for safety reasons. If a reviewer approves a PR, that's
what should be merged; if there are things to fix/change, a reviewer
should at least rubberstamp the changes to avoid mistakes.
This is a bit more noisy/constraining, but it can be changed or tuned
afterwards if it proves too strict.
rebase-and-merge (or squash-merge if pr.commits == 1) remains the
default, but there are use cases such as forward ports (merge branch X
into branch X+1 so that fixes to X are available in X+1) where we
really, really don't want to rebase the source.
This commit implements two alternative merge methods:
If the PR and its target are ~disjoint, perform a straight merge (same
as the old default mode).
However, if the head of the PR has two parents *and* one of these
parents is a commit of the target, assume this is a merge commit
intended to fix a conflict (common during forward ports, as X+1 will
have changed independently from, and incompatibly with, X in some
ways).
In that case, merge by copying the PR's head atop the target:
basically rebase just that one commit, only updating the link to the
parent which is part of the target so that it points to the head of
the target instead of whatever it was previously.
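A sketch of that second method, with hypothetical commit structures
(dicts with message/tree/parents) and a hypothetical gh.commit helper:

    def copy_atop_target(gh, head, target_head, target_shas):
        # rewrite the PR's merge commit so its in-target parent is
        # replaced by the current head of the target branch
        parents = [
            target_head if p in target_shas else p
            for p in head['parents']
        ]
        return gh.commit(
            message=head['message'],
            tree=head['tree'],
            parents=parents,
        )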
After discussion with al & rco, the conclusion was that the default PR
merging method should be rebase-and-merge, for cleaner history.
Add test for that scenario (w/ test for final DAG) and implement this
change.
* Add an `ids` accessor to the remote Model fake
* Explicitly ignore ordering when unnecessary; a test fails since the
ordering of prs has been changed for UI purposes. This is only an
issue for Remote, though it's unclear why (the local Issue/PR
objects should still have a per-repo sequence)
* avoid fetching PRs for un-managed branches if we know up-front
* avoid processing comments with no commands (avoids fetching the
corresponding PR which we know nothing about yet and which may or
may not be for a managed branch)
The old "sync pr" thing is turning out to be a bust, while it
originally worked fine these days it's a catastrophe as the v4 API
performances seem to have significantly degraded, to the point that
fetching all 15k PRs by pages of 100 simply blows up after a few
hundreds/thousands.
Instead, add a table of PRs to sync: if we get notified of a
"compatible" PR (enabled repo & target) which we don't know of, create
an entry in a "fetch jobs" table, and have a cron handle fetching the
PR then fetching/applying all relevant metadata (statuses,
review-comments and reviews).
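Something like the following, with hypothetical model names:

    # record an unknown but compatible PR for the cron to fetch later,
    # instead of bulk-syncing 15k PRs through the v4 API
    def enqueue_fetch(env, repository, number):
        env['runbot_merge.fetch_job'].create({
            'repository': repository.id,
            'number': number,
        })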
Also change the indexation of Commit(sha) and PR(head) to hash, as
btree indexes are not really sensible for such content: the ordering
is unhelpful and the index locality is awful by design/definition.
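With illustrative table/column names, the index change is roughly:

    # Odoo model hook (sketch): sha1 hex digests have no meaningful
    # ordering, so a hash index beats btree for equality lookups
    def init(self):
        self.env.cr.execute(
            "CREATE INDEX IF NOT EXISTS runbot_merge_commit_sha_idx "
            "ON runbot_merge_commit USING hash (sha)"
        )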
Previously, when splitting a staging we'd create two never-staged
stagings. In a system where stagings get deleted once done
with (succeeded or failed) that's not really important, but now that
we want to keep stagings around once inactive, things get problematic:
this method gunks up the stagings table, plus the post-split stagings
would "steal" the original's batches, losing information (the relation
between stagings and batches).
Replace these empty stagings with dedicated *split* objects. A batch
can belong to both a staging and a split; the split is deleted once a
new staging has been created from it.
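A sketch of the new object, with hypothetical model/field names:

    from odoo import fields, models

    class Split(models.Model):
        _name = 'runbot_merge.split'

        # a batch may point to both its original staging and the split
        # it was moved to; the split is unlinked once re-staged
        target = fields.Many2one('runbot_merge.branch', required=True)
        batch_ids = fields.One2many('runbot_merge.batch', 'split_id')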
Eventually we may want to make batches shared between stagings (so we
can track the entire history of a batch) but currently that's only
PR-level.
If we want a dashboard with a history of stagings, maybe not deleting
them would be a good idea.
A replacement for the headless stagings would probably be a good idea:
currently they're created when splitting a failed staging containing
more than one batch, but their only purpose is as splits of existing
batches to be deactivated/deleted and re-staged (new batches &
stagings are created at that point, as e.g. some of the batches may
not be mergeable anymore), and that's a bit weird.
Github apparently doesn't sync merged/closed PRs (which makes sense
but isn't really documented), so strip out the test and assume that
never happens (logging an error in case it ever does).
Remote's labels are not entirely under our control, as the part before
":" is the *owner* of the source repo => introduce an additional
"owned" fixture to handle this case, as it may diverge from the "user"
role if running the tests against an organisation.
Can't really assume we can get the github logins "user" or "reviewer"
to run the test suite remotely, so add an indirection and backronym
those to *roles* instead. The local test suite has identical roles &
logins, but the remote version does not.
Also use the "other" role for any random user, and don't create its
partner up-front.
Also renamed the self-reviewer user to self_reviewer; that's a bit
less weird when dealing with e.g. ini files.
Turns out PATCH /git/refs/:ref returns a 422 when the ref does not
exist, rather than the 404 I'd expected.
Also improve the error message by including the JSON body, which tends
to be more descriptive/helpful than the HTTP reason given by Github's
API.
Maybe I should replace all raise_for_status() by printing the JSON
body instead...
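A sketch of the improved reporting, using a plain requests session
against the REST API:

    import requests

    def set_ref(session, repo, branch, sha):
        r = session.patch(
            'https://api.github.com/repos/%s/git/refs/heads/%s'
                % (repo, branch),
            json={'sha': sha, 'force': True},
        )
        # github answers 422 (not 404) for a missing ref, and the JSON
        # body explains why far better than the bare HTTP reason
        assert r.status_code == 200, 'set_ref(%s): %s' % (branch, r.json())
        return r.json()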
This is preparation for an attempt to make these tests work with
both a local github mock (in-memory) and a remote actual github.
Move a bunch of fixtures relying on the specific github
implementation (and odoo-as-library access) to the "local" plugin,
including splitting the "repo" fixture.
The specific fixtures will likely have to be adjusted as the
remote endpoint is fleshed out.
AL thinks it's not useful and it's better to always squash/rebase a
single commit & merge multiple. Mark tests as xfail'd instead of
removing them.
Also mark test_edit_retarget_managed as skipped explicitly
Reviews are interpreted like comments and can contain any number of
commands, with the difference that APPROVED and REQUEST_CHANGES are
interpreted as (respectively) r+ and r- prefixes.
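A sketch of the mapping, with a parse callback standing in for the
bot's comment parser:

    PREFIXES = {'APPROVED': 'r+', 'REQUEST_CHANGES': 'r-'}

    def review_commands(review, parse):
        # the review state contributes an implicit command ahead of
        # whatever commands the review body itself contains
        prefix = PREFIXES.get(review['state'])
        commands = parse(review['body'] or '')
        return ([prefix] if prefix else []) + commands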
* p0 cancels existing stagings in order to be staged as soon as
possible
* p0 PRs should be picked over split batches
* p0 bypasses PR-level CI and review requirements
* p0 can be set on any of a batch's PRs; matched PRs will be staged
alongside even if their priority is the default
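A sketch of the resulting staging gate, with hypothetical attribute
names:

    def batch_urgent(batch):
        # any p=0 PR makes the whole batch urgent
        return any(pr.priority == 0 for pr in batch.prs)

    def stageable(pr, urgent):
        # urgent PRs bypass the PR-level CI & review requirements
        return urgent or (pr.approved and pr.ci_passed)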