With this commit, a new model is introduced to facilitate the tracking
of the build errors.
Its based on an SQL view (Thanks @Xavier-Do), that way, there is no new
table in DB and this view is also useful from the PSQL CLI.
In the UI, the search for errors easier than manipulate the ir_logging
view because the builds informations can be used in search and filters.
The since google chrome 74, a random bug makes it crash at startup,
making the odoo tests crash.
With this commit, an odoo custom deb repo is used on nightly with a
known working chrome version.
When a build is wake-up and something goes wrong during the
_run_odoo_run method, the "fetch and build" cron is broken and the
concerned runbot host stops working.
With this commit, the exception is catched and the build goes back to
the "done" state whith a log.
With this commit, a new RunbotBuilError model is added in order to
classify and manage errors that appears during runbot builds. This is
an helper to find undeterministic bugs in Odoo builds. Build logs can
be parsed on demand, during the parsing, the logs are cleaned with some
regexes stored on the RunbotErrorRegex model. A hash is computed on the
cleaned log, if a build error already exists with the same fingerprint,
the build is appended on the build error.
Errors can also be manually linked together with a parent/children
relation in case of a related error log. e.g. the error message is
different in two different branches but the bug is the same.
Also, a new build_url field is added to the runbot_build in order to
access the build web page from the backend.
Add a new model runbot.host to keep info and configuration about
hosts (worker servers), like number of worker, reserved or not,
ping times (last start loop, successful iteration, end loop, ...)
and also last errors, number of testing per host, psql connection
count, ...
A new monitoring frontend page is created, similar to glances
but with additionnal information like hosts states and
last_monitored builds (for nightly)
Later this model will be used for runbot_build host instead of char.
Host are automaticaly created when running _scheduler.
* split action_cancel (UI button) from cancel (internal): since the
xhr mapping is weird, if there are available args the mapper thinks
it should pass the call context as reason which is unexpected
* make cancel a no-op when called on already inactive stagings
* make cancel work when called on multiple statgings
* make computing the active staging work properly in an
active_test=False context (e.g. when it's interacted with from the
form view because that comes from the list view which is
active_test=False, probably so we can see not just the stagings but
recursively see deactivated batches in deactivated stagings)
* don't show the cancel button on inactive stagings
A deactivated branch is generally treated as unmanaged which is mostly
correct except for the case of retargeting an existing PR.
When a branch is deactivated the corresponding PRs are not removed, so
it's possible to have live PRs associated with ~unmanamaged
branches. When retargeting those PRs to active branches, the mergebot
would assume there was no existing PR and would create a duplicate,
then either get completely lost (before
a84595ea04) or blow up (after the same).
Properly search amongst deactivated branches for retargeting sources
so we update the relevant PR instead of trying to create duplicates.
Fixes#169
Stagings have a "statuses" field which was shown but useless (as it's
a binary), they also have a "heads" field which only provides a
mapping of repository names to commits.
This change provides the staging heads as a commits m2m.
Fixes#178
Running multiple ngrok concurrently is only allowed from pro and up
(OOTB and without shenanigans) is only allowed from Pro and up. However
multiple tunnels through a single ngrok is allowed
-> when tunneling through ngrok, start the process without any tunnel,
use the API to create then remove the local tunnel, and shut down ngrok
IIF there's no tunnel left.
There's plenty of race conditions but given how slow tests are when they
involve github that's probably not an issue.
* extract method to create a PR object from a github result (from the
PR endpoint)
* move some of the remote's fixtures to a global conftest (so they can
be reused in the forwardbot)
In case of error while fast-forwarding a staging to its source, we'd
log the target to which we couldn't FF. Sadly this relied on a
`repo_name` variable which (likely since the introduction of the
"safety dance" fast forwarding) can not actually be set in case of
failure.
So stash the relevant bit (the repo name) inside the FF error exception
and use that to compose our logging message instead of a variable which
can only be None.
Github constrains a single issue (/PR) number per repository, having
different targets does not allow two PRs to share a number.
Doesn't fix but should mitigate #169 slightly.
Before this change mergebot assumes github's tags are in sync with its
"previous" state, but because tags update was highly non-atomic (one
call per removal plus one for additions) and state can further change
between a failure and an update retry (especially as the labels endpoint
fails *a lot*), it's possible for set tags (in github) to be completely
desync'd from the mergebot state, leading to very misleading on-pr
indications.
This first fetches the current tagstate from github (to not lose non-
mergebot tags) then (hopefully atomically) resets all tags tags based on
the current mergebot state. This should avoid desyncs, and eventually
resync PRs (if they change state).
Fixes#170
In some case _force can return an empty recordset,
if the corresponding branch is in no_build mode in other
repo may be an explanation here.
This commit avoid to stuck the fetch and build loop in this case.
Hook can represent label changes, closed pr, ....
We only want to fetch is some push or synchronize are sent.
TODO We also may want to catch retarget later in order to update branch.
indirect state was writen on parent leading to unconsistent info.
indirect was using last build regardless of build_type. Now, indirect
will only use normal build to avoid red-chain after a sticky rebuild.
A prototype of feature was added some times a go.
No really tested, this commit improves parmater format
and makes dependency closest_branch_id not required
since a repo/sha is all we need.
Docker can take some time to be considered as running after docker_run. This
issue can appear when we speedup sheduling loop. To avoid that when can add an
time condition to consider if a docker is running, but we want to avoid to wait
to much since some jobs are fast.
This solution check if a job is a docker run before waiting, and will also
update job_start after a checkout since this can take some time if
a git fetch is performed.
When a build is killed, result will be set to manually killed,
removing the 'error' or 'warn' result.
This commit removes this behaviour in order to keep error result
in this case.
If children are killed, they will all look the same in the parent view
making difficult to find the failed one in staging branches.
This commit displays result rather than status in priority if build
is in failure.
If a user really wants to keep a database up for a long time, he has the possibility
to wake it up multiple times.
Using last job end as reference will allow to keep a database alive longer.
The main motivation of this commit is to be able to notify github status only
when all children are done.
Until today, children where only used for dev branches and nightly. The needs to
use this system for staging need to enforce github_status behaviour.
Before this commit, a parent won't send github status since he will only
create childrens. And childrens are not awared of other children state,
so sending a succes may be wrong if another one failed.
Asking the parent to make the github_status looks the easiest solution:
-If top parent config will have update_github_state False, we also want to take that into account.
-If a child wants to contact github for failfast, parent will be in global_state error too and will send
message immediatly.
-If a child want to contact github for succsess, we actually want to wait for last child, parent will
be in waiting global_state and notify nothing (or pending).
Only last child will be able to notiffy success since global_state will be running or done at this step.
Orphan builds wont have any impact on result in with this scenario.
Some data are logged at each loop turn even if nothing interresting was done:
- ... builds [] where allocated to runbot
- reload nginx
That kind of info was interresting for debug but now this noise makes
logs heavier and more difficult to read.
Reload ngnix will be done only if file changed and this this will avoid
a log at each loop turn.
We also display difference between existing sources and source that should
be there instead of complete lists.
Sources can be easily exported if needed since they are in the bare repository
most of the time. To avoid using to much space, this commit will garbage collect
all sources at the beginning of a long_running cron.
Only real side effect of this is that it will be impossible to wake up
a build that was force pushed since source cannot be fetched anymore.
We may imagine that we could keep sources of recent build, maybe for
48 hours, but keeping build specific data (logs, database)
is more interresting.
This commit add a wake up button in place of connect button when build is
not running and may be wake up.
Connect button will also be visible immediatly when local_state is 'running'
since we don't need to wait sub build to finish.
On a PR being updated, closed or unreviewed, if it is part of an
active staging that staging would get cancelled (yay). However, if the
PR was part of a pending *split*, then the split would *not* get
cancelled / updated (to remove the PR from it), and the PR could go on
to get staged as if everything were right in the world which is an
issue.
It doesn't look like it actually happened (at least I got no echo of
it), but it almost did at least once.
fixes#160
Also add test for it & feedback of an approved PR failing CI, and fix
corner case with it (might not send a warning immediately on CI failure
depending on status requirement ordering).
Fixes#158
* when rebasing, store a map of rebased to source, that way it'll be
possible to link cherry-picked forward ports to the originally
integrated commit rather than just the one from the PR (which was
likely not itself integrated as the straight merge mode is somewhat
rare: as of 5600 PRs merged so far only 100 were straight merged)
* while at it, store the "merge head" of the PR (whether squashed,
merged or rebased) and put *that* in the commit message
fixes#161
When a build is completely dead, with directory and db deleted, the
wake-up system fails.
With this commit, a wake-up is not allowed on such dead builds.
In order to stores other things than logs, that could be accessible by
end users, for example screenshots and screencasts, a "tests" directory
is allowed thruough the nginx template in the builds directories.
Also, the "with" context manager is used to open the nginx configuration
to ensure that the file descriptor is released during long running crons.
When an Odoo instance is run in a Docker container, it listen on all
interfaces by default and a bridge interface is used to communicate with
the outside world.
This bridge interface is necessary to allow the instance to send logging
messages into the runbot postgresql database.
Even though Docker isolate the container from the oustide world, it's
still possible to reach the Odoo instace from the runbot host and worse,
from other Docker containers using the same bridge interface.
To confirm the Murphy's law, it finally happened with this commit
odoo/enterprise@0ba0ef99de that scanned the ip range of the interface,
disturbing other builds.
One solution would be to create a bridge interface for each instance to
isolate each Docker but that would imply a big change to garbage collect
forgotten bridges ...
An 'icc' (Inter Container Communication) option exists for the Docker
daemon which defaults to True. Setting it to False was tested and works
but it appears that this option can be messed-up by the firewall on the
runbot host.
Finally, if applied, this commit will prevent the Odoo instance to
listen on the bridge interface by only listening to 127.0.0.1 during
tests. A local_only parameter is a added on the _cmd method which is
true by default and means that the Odoo instance will only listen on
127.0.0.1. This parameter is set to False for the running step to allow
the running to be contacted through the Docker exposed ports.