Add a new model runbot.host to keep info and configuration about
hosts (worker servers), like number of workers, reserved or not,
ping times (last start loop, successful iteration, end loop, ...)
and also last errors, number of testing builds per host, psql
connection count, ...
A new monitoring frontend page is created, similar to glances
but with additional information like host states and
last_monitored builds (for nightly).
Later, this model will be used for the runbot_build host field instead of a char.
Hosts are automatically created when running _scheduler.
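A minimal sketch of what such a model could look like (a sketch only;
field names beyond the concepts listed above are assumptions, not the
actual implementation):

    import socket

    from odoo import fields, models

    class RunbotHost(models.Model):
        # hypothetical sketch of the runbot.host model described above
        _name = 'runbot.host'

        name = fields.Char('Host name', required=True)
        nb_worker = fields.Integer('Number of workers')
        assigned_only = fields.Boolean('Reserved host', default=False)
        last_start_loop = fields.Datetime('Last start of loop')
        last_end_loop = fields.Datetime('Last end of loop')
        last_success = fields.Datetime('Last successful iteration')
        last_exception = fields.Char('Last exception')

        def _get_current(self):
            # hosts are created automatically when _scheduler runs
            name = socket.gethostname()
            return self.search([('name', '=', name)]) or self.create({'name': name})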
In some cases, _force can return an empty recordset; the corresponding
branch being in no_build mode in another repo may be an explanation.
This commit avoids getting the fetch and build loop stuck in this case.
The indirect state was written on the parent, leading to inconsistent info.
Indirect builds were using the last build regardless of build_type. Now,
indirect builds will only use normal builds, to avoid a red chain after a
sticky rebuild.
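The kind of filter this implies could look like this (a sketch; the
exact domain and field values are assumptions):

    # before: the last build was taken regardless of its type, so a
    # sticky rebuild could propagate a red chain to indirect builds;
    # after: only normal builds are considered
    last_build = Build.search([
        ('branch_id', '=', branch.id),
        ('build_type', '=', 'normal'),
    ], order='id desc', limit=1)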
Docker can take some time to be considered as running after docker_run.
This issue can appear when we speed up the scheduling loop. To avoid that,
we can add a time condition before considering that a docker is not
running, but we want to avoid waiting too much since some jobs are fast.
This solution checks whether a job is a docker run before waiting, and
also updates job_start after a checkout, since a checkout can take some
time if a git fetch is performed.
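A sketch of that time condition (names and the grace period value are
illustrative):

    import time

    def docker_state(docker_name, job_start, is_running, grace_period=15):
        # right after docker_run the container may not be listed yet,
        # so give it a short grace period before concluding it is gone
        if is_running(docker_name):
            return 'RUNNING'
        if time.time() - job_start < grace_period:
            return 'UNKNOWN'  # too early to tell, check again next loop
        return 'END'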
Some data is logged at each loop turn even if nothing interesting was done:
- ... builds [] were allocated to runbot
- reload nginx
That kind of info was interesting for debugging, but now this noise makes
logs heavier and more difficult to read.
Nginx will now be reloaded only if the configuration file changed, which
avoids a log at each loop turn.
We also display the difference between existing sources and the sources
that should be there, instead of complete lists.
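The reload-on-change logic could look like this (a sketch; the path and
the reload command are illustrative):

    import os
    import subprocess

    def write_nginx_config(path, content):
        previous = ''
        if os.path.exists(path):
            with open(path) as f:
                previous = f.read()
        if previous == content:
            return  # unchanged: no rewrite, no reload, no log line
        with open(path, 'w') as f:
            f.write(content)
        subprocess.call(['systemctl', 'reload', 'nginx'])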
Sources can be easily exported if needed since they are in the bare
repository most of the time. To avoid using too much space, this commit
will garbage collect all sources at the beginning of a long_running cron.
The only real side effect of this is that it will be impossible to wake up
a build that was force-pushed, since its source cannot be fetched anymore.
We may imagine that we could keep sources of recent builds, maybe for
48 hours, but keeping build-specific data (logs, database)
is more interesting.
In order to store things other than logs that could be accessible by
end users, for example screenshots and screencasts, a "tests" directory
is allowed through the nginx template in the builds directories.
Also, the "with" context manager is used to open the nginx configuration,
ensuring that the file descriptor is released during long-running crons.
The requirements path and python version were defined from the
server in cmd. Since in coverage mode we add a 'python' before the server,
it is difficult to tell which element of the cmd is the server.
A solution here is simply to define the requirements install and
python version when building cmd, since we have access to all
build/source information at that point. We also add the python part in
every case, and coverage params are now a _cmd python_params.
The _cmd method now returns a Command object instead of a
list, which behaves as a list for the cmd part but also contains
a pres and a posts list.
pres are requirements install, preparation, ...
cmd is the original cmd list; elements can be appended or added, which
will allow keeping existing python jobs without too many changes.
posts are post-cmd commands, like coverage result generation.
This commit also fixes an issue with create_job dependencies.
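A sketch of such a Command object (assuming the pres/cmd/posts split
described above; method names are illustrative):

    class Command:
        def __init__(self, pres=None, cmd=None, posts=None):
            self.pres = pres or []    # e.g. requirements install, preparation
            self.cmd = cmd or []      # the original cmd list
            self.posts = posts or []  # e.g. coverage result generation

        def __getattr__(self, name):
            # delegate list behaviour (append, extend, ...) to the cmd part
            return getattr(self.cmd, name)

        def __add__(self, other):
            return Command(self.pres, self.cmd + list(other), self.posts)

        def build(self):
            # assemble the final shell line: pres && cmd && posts
            parts = self.pres + [self.cmd] + self.posts
            return ' && '.join(' '.join(part) for part in parts)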
Multibuilds can generate a lot of checkouts, especially for small
and fast jobs, which can overload runbot discs since we are trying not
to clean builds immediately (to ease bug fixing and allow wake up).
This commit proposes to store sources in a single place, so that
docker can add them as a ro volume in the build directory.
The checkout is also moved to the install jobs, so that
builds containing only create-build steps won't check out
the sources.
This change implies using --addons-path correctly, since odoo
and enterprise addons won't be merged into the same repo anymore.
This will allow testing addons with a command line closer
to what a dev would use.
This implies changing the code structure a little; some changes
were made to remove not-so-useful fields on build, and some
hard-coded logic (manifest_names and server_names) is now
stored on the repo instead.
This change implies that a build CANNOT write in its sources.
That shouldn't be the case anyway, but it means that runbot cannot be
tested on runbot until data is written elsewhere than in static.
Other approaches are possible, like bind-mounting the sources
into the build directory instead of adding ro volumes in docker.
Unfortunately, this requires giving the runbot user sudo access
to mount and changing the docker config to allow mounts
in volumes, which is not the case by default. A plus of that
solution would be the ability to make an overlay mount.
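A sketch of the resulting docker invocation (paths and the helper are
illustrative, not the actual runbot code):

    import subprocess

    def docker_run(image, build_dir, sources_dir, cmd):
        # the shared sources are mounted read-only inside the build dir,
        # so the build can read but never write them
        subprocess.check_call([
            'docker', 'run', '--rm',
            '-v', '%s:/data/build' % build_dir,
            '-v', '%s:/data/build/src:ro' % sources_dir,
            image,
        ] + cmd)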
When updating repos, the order is odoo/odoo, odoo-dev/odoo, odoo/enterprise, odoo-dev/enterprise.
The problem with this is that if a community hash and an enterprise hash arrive exactly
between the odoo/odoo and odoo/enterprise updates, enterprise could have a newer version
than community when creating builds.
By inverting this order, we have fewer chances to hit this corner case (as unlikely as
it may be).
get_ref_time was cast from float to datetime and thus
milliseconds were lost. Storing it as a float makes the code
easier to read and avoids the rounding that was breaking
this feature.
When getting new refs, a lot of them are really old, and
find_new_commits is called for each one, browsing branches every time.
With this commit, refs older than the configured max_age are ignored.
Co-authored-by: Xavier Dollé (xdo@odoo.com)
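A sketch of the age filter (the tuple layout and the default value are
assumptions):

    import time

    def filter_refs(refs, max_age_days=30):
        # refs: (name, sha, committer_epoch) tuples parsed from
        # `git for-each-ref`; really old refs are simply skipped
        limit = time.time() - max_age_days * 24 * 3600
        return [ref for ref in refs if ref[2] >= limit]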
When updating github statuses, it happens that we face a "Bad gateway"
from github. In that case, the error is logged in the runbot logs and
that's it. As a consequence, when runbot_merge is waiting for the status
of the staging branch and this kind of error occurs, runbot_merge
times out and the users vainly search for the reason.
With this commit, the runbot tries to update the status at least twice.
If it fails, an INFO message is logged on the build itself.
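The retry could be as simple as this (a sketch; helper names are
illustrative):

    import logging
    import time

    _logger = logging.getLogger(__name__)

    def update_github_status(build, post_status, retries=2, delay=2):
        for attempt in range(retries):
            try:
                post_status(build)
                return True
            except Exception:
                _logger.exception('github status failed (attempt %s)', attempt + 1)
                time.sleep(delay)
        # surface the failure on the build itself, visible to users
        build._log('github_status', 'Failed to send github status', level='INFO')
        return False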
Nightly builds have a low priority, but once they have a slot, they keep it.
When pushing a branch or asking robodoo to nicely merge a branch for the
third time at 22:00, there may be no slot left since all the nightly builds
have been created.
This commit will only assign scheduled builds if there is no other build to
create, and will always keep a free slot for other builds.
When a runbot instance is scheduling builds, the number of builds
depends on a global ir.config_parameter. Even if one of the runbot
instances is running on a more powerful system, its number of workers is
limited by this global parameter.
With this commit, this parameter still exists but can be overridden by a
host-specific ir.config_parameter.
For example, if the host 'runbot24.odoo.com' has more CPU power, the
number of workers for this host can be specified in the
ir.config_parameter named 'runbot24.odoo.com.workers'.
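The lookup then becomes a host-specific parameter with a global fallback
(the host key follows the example above; the global key and default are
assumptions):

    def get_nb_workers(env, host_name):
        icp = env['ir.config_parameter'].sudo()
        value = (icp.get_param('%s.workers' % host_name)
                 or icp.get_param('runbot.workers', default='6'))
        return int(value)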
With recursive state computation, schedule is
more likely to run into transactional errors.
This is particularly a problem when external
operations are done during the transaction,
like running a docker.
Adding some database commits will help to reduce
transactional errors and ensure that the db
is consistent with the docker states.
When searching for new refs to create builds, the for-each-ref git
command is run and each ref is searched in the database, which is a
somewhat heavy operation.
With this commit, the timestamp of the last database update with the
refs is stored in a field on the repo. This timestamp is checked each
time a for-each-ref is needed, running the operation only when
necessary.
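A sketch of the check (assuming the fetch time is taken from the mtime
of FETCH_HEAD, and using the float storage from the earlier commit):

    import os

    def get_new_refs(repo, for_each_ref):
        fetch_time = os.path.getmtime(os.path.join(repo.path, 'FETCH_HEAD'))
        if repo.get_ref_time >= fetch_time:
            return []  # no fetch since the last ref scan: skip the command
        repo.get_ref_time = fetch_time
        return for_each_ref(repo)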
This commit aims to replace static jobs with fully configurable build configs.
Each build has a config (custom or inherited from the repo or branch).
Each config has a list of steps.
For now, a step can test/run odoo or create a new child build. A python job is
also available.
To mimic the previous behaviour of runbot, a default config is available with
three steps: an install of base, an install+test of all modules, and a last
step for run.
Multibuilds are replaced by a config containing creation steps.
The created builds are not displayed in the main views, but are available
on the parent build log page. The result of a parent takes the results of
all children into account.
This new mechanism will help to create custom behaviours for specific
use cases, and later help to parallelise work.
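Such a default config could be created like this (a sketch; model and
field names are assumptions, not the actual implementation):

    # three steps mimicking the old static jobs: install base,
    # install+test all modules, then keep the last build running
    config = env['runbot.build.config'].create({'name': 'Default'})
    env['runbot.build.config.step'].create({
        'name': 'base', 'config_id': config.id, 'install_modules': 'base'})
    env['runbot.build.config.step'].create({
        'name': 'all', 'config_id': config.id, 'test_enable': True})
    env['runbot.build.config.step'].create({
        'name': 'run', 'config_id': config.id, 'job_type': 'run_odoo'})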
At the end of the _update_git method, the "git fetch" command is run.
That makes it difficult to override to change its behavior (for example
to avoid fetching pull requests).
With this commit, the command is separated into a new small method that
can be easily overridden.
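The split could look like this (a sketch with a simplified _git helper):

    import subprocess

    class Repo:
        def __init__(self, path):
            self.path = path

        def _git(self, args):
            return subprocess.check_output(['git', '-C', self.path] + args)

        def _update_git(self):
            # ...repo bookkeeping happens here, then the fetch is delegated
            self._git_fetch()

        def _git_fetch(self):
            # isolated in its own small method so it is easy to override
            self._git(['fetch', '-p', 'origin',
                       '+refs/heads/*:refs/heads/*',
                       '+refs/pull/*/head:refs/pull/*'])

    class RepoWithoutPulls(Repo):
        def _git_fetch(self):
            # example override: do not fetch pull requests
            self._git(['fetch', '-p', 'origin', '+refs/heads/*:refs/heads/*'])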
When searching for new builds by parsing git refs, the new branches are
created as well as the pending builds, in the same _find_new_commits
method.
With this commit, this behavior is split into two methods; that way,
it's now possible to create missing branches without creating new
builds. The closest_branch detection is enhanced because all the new
branches are created before the builds (separate loops).
The find_new_commits method uses an optimized way to search for
existing builds. Before this commit, a build search was performed for
each git reference, potentially a huge number.
With this commit, a raw SQL query is performed to create a set of tuples
(branch_id, sha), which is enough to decide whether a build already exists.
A test was added to verify that new refs lead to pending builds.
Also, a performance test was added but is skipped by default because it
needs an existing repo with 20000 branches in the database, so it will
not work with an empty database. This test showed a performance gain
from 8 sec to 2 sec when creating builds from new commits.
co-authored by @Xavier-Do
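The optimized lookup boils down to one query (a sketch; table and column
names are assumptions):

    def existing_builds(cr, branch_ids):
        # one raw SQL query instead of one ORM search per git ref
        cr.execute(
            "SELECT branch_id, name FROM runbot_build WHERE branch_id IN %s",
            (tuple(branch_ids),),
        )
        return set(cr.fetchall())  # set of (branch_id, sha) tuples

    # for each parsed ref, membership is then a cheap set lookup:
    # if (branch.id, sha) not in existing: create a pending build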
Before this commit, dependencies (i.e. the community commit to use when testing enterprise)
were computed at checkout, when the build was going from the pending to the testing state,
and were not stored.
Since the duplicate detection was done at create, get_closest_branch_name was called
in a loop for each possible duplicate candidate, then one last time at checkout. The main
idea of this PR is to store the build dependencies on the build at create, making the
duplicate detection faster (especially when the build name matches many indirect builds).
The side effect of this change is that the build dependencies won't be affected if a new
commit is pushed between the build creation and the checkout. The build is fully
determined at creation. get_closest_branch is only called once per build.
The duplicate detection will also be more precise, since we are matching on the groups of
commits that were used to run the build, and not only on the branch name.
Some work has also been done to rework the closest branch detection in order to manage new
corner cases. Hopefully, everything should work as before (or in a better way).
In the near future, it will also be possible to use this information to make an "exact
rebuild" or to find the corresponding community build.
Pr: #117
When some special builds are scheduled during the night, free slots on
runbot instances are used. Depending on the number of scheduled builds,
all the slots can be used. That prevents people from using the runbot for
normal builds during this time.
To mitigate the problem, the scheduled builds were postponed to the
middle of the night ... the CET night. It means that it could be morning
in India.
With this commit, a build priority is given to normal builds. On the
other hand, scheduled builds are pushed to the end of the queue.
So even if there are plenty of builds during the Belgian night, if
someone pushes a commit in between, it will be built in priority, before
the scheduled pending builds.
When using a local git repo, the git name does not have a colon, making
the frontend crash.
With this commit, a non-stored computed field 'short_name' is added to
compute a shorter version of the name.
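A sketch of the computed field (assuming it lives on the repo model; the
compute logic is illustrative):

    from odoo import api, fields, models

    class RunbotRepo(models.Model):
        _inherit = 'runbot.repo'

        short_name = fields.Char(compute='_compute_short_name', store=False)

        @api.depends('name')
        def _compute_short_name(self):
            for repo in self:
                # 'git@github.com:odoo/odoo' -> 'odoo/odoo'; a local path
                # without a colon still yields a sensible short name
                repo.short_name = '/'.join(repo.name.replace(':', '/').split('/')[-2:])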
When github reaches the hook controller, the repo hook_time field is
updated. That way, a git fetch is done only when the hook_time is newer
than the last fetch. If the hook_time is updated during the long-running
cron that runs the _cron_fetch_and_schedule method, the hook_time is cached
and only the old hook time is seen until the cron's end. The cursor
commit is not enough. As a result, the new builds are scheduled in the
next cron run.
With this commit, the cache is invalidated after the commit; that way,
the hook_time field contains the correct value.
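In the cron loop, this amounts to (a sketch; method names are
illustrative):

    import time

    def schedule_loop(env, repos, deadline):
        while time.time() < deadline:
            for repo in repos:
                repo._create_pending_builds()
            env.cr.commit()
            # commit alone is not enough: hook_time was read earlier in
            # the transaction and stays cached, so clear the cache to see
            # values written by the hook controller in the meantime
            env.clear()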
When a runbot instance executes the cron_fetch_and_build method, the repo
is updated only if the webhook time is newer than the last fetch
time.
As the cron is now split into long-running crons, the hook_time field is
cached. The runbot instance that sees a new pending build uses this
cached value to estimate whether the repo update is needed.
With this commit, the repo update is done right before exporting the
repo, and only if the commit hash is not found.
As a bonus, the environment is reset in the long-running cron of the
runbot builders to update the cached values.
The runbot cron is executed on each runbot instance. When the number of
instances scales, the time needed for an instance to obtain the cron
increases.
With this commit, the original runbot_cron is removed. Instead, a cron
has to be created to run the _cron_fetch_and_schedule method.
This method will fetch the repo and create pending builds. This cron is
intended to run only on one runbot instance. The method needs a host
parameter to specify which runbot instance will be in charge of this
task.
On the other hand, a dedicated cron has to be manually created for each
runbot instance that will have the build task.
Those crons only have to call the _cron_fetch_and_build method with the
runbot hostname as a parameter. This method will then self-assign
pending builds if there are slots available.
All available build slots are reserved in a single locked SQL query.
Both methods are intended to last a large amount of time, just a few
minutes below the cron timeout, to maximize the cron productivity.
The timeout is randomized to avoid deadlocks if the runbot instances are
started at the same time.
So the --limit-time-real parameter has to be set to a minimum of 180
sec (600 or 1200 are probably better targets).
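The single locked query reserving the free slots could look like this
(a sketch; column names and the SKIP LOCKED choice are assumptions):

    def reserve_pending_builds(cr, host_name, nb_slots):
        # all free slots are taken in one query; SKIP LOCKED lets several
        # builders run concurrently without blocking on the same rows
        cr.execute("""
            UPDATE runbot_build
               SET host = %s
             WHERE id IN (
                    SELECT id FROM runbot_build
                     WHERE local_state = 'pending' AND host IS NULL
                     ORDER BY id
                     LIMIT %s
                       FOR UPDATE SKIP LOCKED)
        """, (host_name, nb_slots))
        return cr.rowcount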
Since the runbot_merge module, some branches do not need to be built,
for example the tmp.* branches.
Some other branches do need to be tested, but it could be useless to
keep them running, for example the staging branches.
Finally, some builds are generated by server actions during the night.
Those builds do not need to be kept running despite the branch configuration.
For example, the master branch can be configured to create builds with
testing and running, but multiple nightly builds can be generated with
testing only.
For that purpose, this commit adds a job_type selection field on the
branch. That way, a branch can be configured by selecting the type of
jobs wanted.
The same kind of job_type field was also added on the build, which uses
the branch's value if nothing is specified at build creation.
A decorator is used on the job_ methods to specify their job types.
For example, a job method decorated with 'testing' will run if the
branch/build job_type is 'testing' or 'all'.
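The decorator just tags the method, and the scheduler checks the tag
(a sketch; names are illustrative):

    def job_type(*types):
        def decorator(func):
            func._job_types = types  # e.g. ('testing',) or ('running',)
            return func
        return decorator

    class BuildJobs:
        @job_type('testing')
        def _job_10_test_all(self, build):
            ...

        @job_type('running')
        def _job_30_run(self, build):
            ...

    def should_run(method, build_job_type):
        # a method tagged 'testing' runs when the branch/build job_type
        # is 'testing' or 'all'
        return build_job_type == 'all' or build_job_type in method._job_types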
This commit permits prioritizing a branch when scheduling builds.
Its main purpose is the runbot_merge module: it avoids having
staging branches as sticky, polluting the main repo view.
Also, such a branch can benefit from the autokill feature when a newer
build is found, freeing resources for other builds.
Closes #43
The unicode icon added to the build subject is not clear for the users.
In that state, it's not easy to add a title on the icon or the subject.
With this commit, a build type field is added to differentiate the
builds and add the appropriate icon and title.
Closes: #24
When a new commit is found and a new build is created, if a user
pushes more commits to the same branch a few minutes later, it causes
more builds in the testing phase, resulting in wasted CPU.
With this commit, previous builds in the testing phase on non-sticky
branches are killed when a new build is created.
Closes: #20
When someone tries to log in to an old runbot build that is not running
anymore, he lands on the runbot instance that was running the build.
Also, all the running builds are allowed on all runbot instances,
leading to the same behavior.
With this commit, only the builds that are running on the runbot
instance can be reached; the others default to a 404.
closes #21
When the coverage is activated on a branch, the coverage result is not
stored.
With this commit, the coverage result will be stored on the build.
The last result will be shown on the frontend for sticky branches.
Also, an extra_parameter field is added on the build model.
When a build fails in a repo that depends on another repo, it's difficult
to figure out which commit of the depending repo it comes from.
With this commit, when a new commit is found in a repo's sticky
branch, the latest build with the same branch name in the depending
repo will be forced to rebuild.
The previous code of runbot and runbot_cla was made for Odoo API version
8.0. This commit makes it work with Odoo API 11.0 and Python 3.
Also, the present refactoring splits the code into multiple files to
make it easier to read (I hope).
The main change due to Python 3 is the job locking mechanism:
since PEP 446, file descriptors are non-inheritable by default.
A new method (os.set_inheritable) was introduced to explicitly make
an fd inheritable. Also, the close_fds parameter of the subprocess.Popen
method is now True by default.
Finally, PEP 3151 changed the exception raised by fcntl.flock from IOError
to OSError (and IOError became an alias of OSError).
As a consequence of all that, the runbot locking mechanism used to check
whether a job is finished was not working in Python 3.
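The fixed mechanism boils down to this (a sketch of the inherited-flock
pattern the message describes; paths and names are illustrative):

    import fcntl
    import os
    import subprocess

    def start_job(cmd, lock_path):
        fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
        fcntl.flock(fd, fcntl.LOCK_EX)   # the lock follows the open fd
        os.set_inheritable(fd, True)     # PEP 446: fds are non-inheritable now
        subprocess.Popen(cmd, close_fds=False)  # close_fds defaults to True in py3
        os.close(fd)  # the child inherited the fd, so it keeps the lock

    def job_is_finished(lock_path):
        fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True   # lock acquired: no process holds it anymore
        except OSError:   # PEP 3151: was IOError in python 2
            return False  # still locked: the job is still running
        finally:
            os.close(fd)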