The current runbot infrastructure has 100+ machine that will each build
all docker images.
This is unpractical for multiple reasons:
- impotant load on all machine when the build could be done once
- possible differences for the same image depending on the moment the
base was pulled on the host. An extreme example was the version of
python (3.11 or 3.12) when building the noble docker image before it was
stabilized.
- increase the chance to have a docker build failure in case of network
problem. (random)
A centralized registry will help with that, and allow future devlopment
to generate more docker images. All docker images will be exactly the
same, the pull time is faster than build time, only one build per docker
image, ...
When building a docker image the error is part of the stream, and
at the end.
The current behaviour will append the error message at the begining of
the "result log" breaking the temporality of the output. Adding it at
the end should be more intuitive to read. This will also help to get a
more usefull error sumary in some cases.
The migration to the new dockerfile also includes a change to the noble
docker image
This is done in a separate commit to ease the comparaison.
Notes:
The postgresql-client was initially pinned to version 16 and removed
from debian control, but this version proposes add the repositories and
let apt chose the latest version. This is experimental but was tested
on ubuntu:jammy, with 16 version installed as expected.
The requirements are installed as user instead of root to avoid the need
to use the ignore_installed flag (and solve a bunch of issues related
to this flag)
When a docker fails to build, the output is logged in the chatter
leading to a lot of noise and a not so readable output. Moreover, the
output tries to format markdown and don't render line break correctly.
This commit proposes to introduce a model to store this output, as well
as some other info like the image identifier, build time, ...
This will help to compare images versions between hosts and should be
useful later to have multiple version of the same image with variant
once the docker registry is introduced.
The batch prepare fill missing won't work if the repo is single version,
in a foreign bundle.
This commit simply adds the branches depending on their version instead
of the bundle version.
When cleaning build errors before fingerprinting, it's only possible to
replace the matching regex with something else but not an empty string.
Since the python 3.11 that may adds lines in error message in order to
visually improve them, the fingerprint of those errors does not match
anymore between different versions.
With this commit, when the replacement string is two consecutive simple
quotes, the matching element is replaced by an empty sting, allowing to
remove unwanted characters.
The order of requirements may have an impact on final outcome.
In documentation builds, we have two requirements with conflicting needs
Lets make the doc requirements install after the odoo ones
(sequence dependant)
The runbot colors can lack contrast in some case, especially for
colorblinded people.
This commit introduces a theme that should help a little.
Note that this was done without much analysis and should be tweaked in
the future.
There is an issue in unidiff 0.7.3 fixed in 0.7.4
a3faffc54e
This version is not able to parse a diff with removed files.
Since the unidiff packaged version in noble is 0.7.3 patching it looks
like the easiest solution
The MAKE_CELL opcode appeared in python 3.11 and is needed in some
python steps when using closures and generators.
Like:
`(all(s > e for e in [1,2]) for s in [0,1])`
The Debian control file was changed in odoo/odoo@55849aca in order to
work with Ubuntu Noble. Because of that, it was needed to have a more
robust parsing of the Debian Control file format.
The "Error or traceback found in logs" message is sometimes confusing
since we don't know what is the cause of the issue.
The first idea is to split the two concept, error and traceback to have
a better idea of the cause of the issue
The second one will also log the content of the line in the error
message. For traceback, tries to get the complete traceback, getting all
indentend lines and one last non idented one.
While working on it, cleaning slighty to partially get rid of
returning a dict, artefact from odoo < 13.0
Somme trigger may have an important depth and nightly result can be long
to check.
A custom view was already done for upgrade nightly, but this is hidden
and the same logic could be applied to the distro builds, ...
This commit adds a custom view on the trigger and related controller to
display a custom view for a trigger.
The current logic to define if a build has logs or not is based on
is_docker_step. This related logic is not always valid, since it is
based on step type and step content, but there are many way to have a
result that could be a docker start, one of them being to call another
step. The main issue is that we don't always now if this other step is a
docker step or not before runtime. Moreover the logic becamore more and
more complex adding more conditions like commands, _run_, docker_params,
...
Moreover, the log list is completed before the build start, meaning that
if the build is killed before completion, some logs may be missing but
will be listed. This is also an issue when trying to access logs before
the correspondong step ran.
This commit changes the logic by adding the log file to the list (and
start log) when starting a docker, wich should give a more consistent
result.
The bootstrap version was freezed during a previous migration to avoid
loosing to much time adapting the style again to fit the previous
look and feel.
Anyway, the latest version of bootsrap offers more flexibility about
themes, and it could be a good oportunity to modernise a little the
runbot interface and answer to long lasting requests.
The main part of the adaptation is to tweak colors to match the
previous style, and adapt some class in xml views.
Some css rules are also tweaked to keep the same looks without the need
to rewrite xml views too much, this could be done in a future commit.
The current stratey to check if a build can be killed is to ensure that
no slot unskipped slots points to the same build. Since the check is
only done after a prepare, it should ensure that the new slot have the
build linked before checking if the previous batch can kill it's build.
Since the slot are now filled latter on (after the minimal check) it is
possible that a build is considered killable an killed before being
linked again.
To fix this, we just need to consier params instead of builds to define
if a build is killable or not. If the params of a builds are linked
elsewere, don't kill them.
Previous commit introduced commit tree hash, but only when calling
get_commit_infos. This fixes the get_ref and find_new_commit to have
the same infos.
Right now the commit is considered as a part of the build uniquifier.
If it makes sence for a check on the commit metadata, it is not the case
for execution tests where only the content matters.
This will allow to mark a trigger as non depending on commit but tree
hash, and avoid rebuild when only fixing a commit message.
In order to keep it coherent, the cpu_limit computation can be done in
one place instead of defining it in all _run* methods.
In python steps, it can still be overridden when returning
docker_params.
When build eats all the CPU's resources, it may interfere with other
builds and cause collateral damages.
With this commit, a new settings parameter `Containers CPUs` is added in
order to limit the usage of available CPU's on runbot instances.
If left to 0, no limts are applied.
Otherwise, the cpu_quota docker parameter is computed as Containers
CPU's * (logical cpu's count / nb parallel builds) * cpu period which defaults to 100000.
e.g.:
- on a host with 16 logical CPU's
- with 8 parallel builds allowed
- with Containers CPUs set to 1.5
- with the default cpu_period
cpu_quota will be:
(16/8) * 1.5 * 100000 = 300000
This system parameter can be overridden by the `container_cpus` field on
steps.
When using a repo as a dependency for another trigger, the default
module filter for a repo is not always ideal
As an example, when using odoo as a dependency for another repo,
we may only want to install the module from the new repo.
This iss done right now by creating a custom config but this lead to
duplicates config and steps only to customize the module to install.
This commit proposes a new model to store the filters.
Note that this may be used later as module blacklist on repo too.
Git gc can last a few minutes, it's not a big deal since it's executed
once a day but the transaction is kept idele during this time wich is
not useful. This commit should help to avoid this.
Because of a bad dependency on the compute, the first seen date and last
seen date are not always updated.
e.g.: a new build is scanned and a build is added to a linked error, the
parent error seen dates are not updated.
A test is added to reproduce the case.
remove all attrs in xml views
To help with that, a scripts was written, minimal but sufficent
#!/usr/bin/python3
import glob
import re
from ast import literal_eval
def leaf_to_python(leaf):
if len(leaf) != 3:
raise ValueError('This script doesnt support leaf', leaf)
field, operator, value = leaf
if operator == '=':
return f'not {field}' if value is False else field if value is True else f'{field} == {value!r}'
if operator == '!=':
return f'not {field}' if value is True else field if value is False else f'{field} != {value!r}'
if operator == 'in':
return f'{field} in {value!r}'
if operator == 'not in':
return f'{field} not in {value!r}'
raise ValueError('This script doesnt support operator', operator)
for file in glob.glob('**/*.xml', recursive=True):
with open(file) as f:
content = f.read()
attrs_list = re.findall(r'attrs="{.*}"', content)
if attrs_list:
for attrs in attrs_list:
match = re.match(r'''attrs="{'(invisible|readonly)': ?(\[.*\])}"''', attrs)
attr = match.groups()[0]
domain = literal_eval(match.groups()[1])
condition = ' and '.join([leaf_to_python(leaf) for leaf in domain])
replace = f'{attr}="{condition}"'
content = content.replace(attrs, replace)
with open(file, 'w') as fw:
fw.write(content)
When build errors are merged together, the builds of the merged errors
should be moved to the only error that will be kept.
It 's not the case because the merge method is assigning a compute field
and moreover it was hidden in the tests because the compute was not
triggered.
With this commit, the build_error_link is updated to point to the new
error. The test is modified to properly check the case and also to add a
case when the link already exists.
The access rights are updated to allow admin to unlink the
build_error_link records. Otherwise the action could fail when the link
already exists.
When a build that contains the same error that appears two times is
parsed, it crashes because of the unique constraint on build error link.
With this commit, only one link with the same error is created.
Two tests are added for the two cases:
- a new error appearing two times in a same build
- an existing error appearing two times in a same build
Since c6f9d1f0c a new model was added to link build errors and builds.
The _search_version and _search_trigger_ids were not adapted to work
with this new model.
In the build error view, a list of build is displayed with a confusing
create date. The create date in the list is the creation date of the
build, leading to a confusion with the creation of the build log
creation.
With this commit, the real log creation is used in this view.
To achieve that, the many2many relation is extended with a
log_date which is filled when a build log entry is parsed.
It happens that a user edits or annotates a build error form without
noticing that the error is linked to another one. In that case, the
modification or the note is probably useless. So a warning ribbon should
grab his attention.
While at it, we change the warning ribbon when an error has a test-tag
to not be confused with the link ribbon and also because removing a
test-tag can lead to a real danger (for the mergebot stagings).
When filtering the build error tree view based on the versions equality,
the results may not be what you expect.
e.g.: searching for `versions is equal to 16.0` gives the errors that
appeared in `16.0` (hopefully) but also those which appeared in other
versions too.
With this commit, this search will give the errors that appeared in the
specified version only. When the user wants to list errors that appeared
in `16.0` and other versions too, he has to use the `contains 16.0`
criteria.
When defining cleaning regex, the replacement character is always the
percent sign as it's hard coded in various methods.
With this commit, a replacement string can be defined by cleaning regex
and fallback to the percent sign by default.
With this commit, when build errors are re-cleaned, they are also merged
if the fingerprints when fingerprints are matching.
Also, this fixes the ir_logging compute that associate a build error
so that the active build error is preferred over an inactive one.
Using dev branch from foreign project to fill missing commits looks like
a bad idea, mainly because the lifecycle of the branches is not the same
In some case, it can be useful to allow that to test a branch with a
future change in the base project that will be needed to make the branch
work. As an example introducing a small api change in odoo to make an
override easier, or introducing a module that may be needed to use
the feature.
This commit changes that by allowing to configure on the project or
bundle if we allow to use foreign bundle as reference *before* checking
the base bundle.
A check was add to avoid to wakeup a child if there is a parent database
Most of the time, it was a mistake.
In some case it can be legit, if the parent only creates subbuid without
installing any database.
This commit fixes that by allowing to wake up child if the parent has no
database.
In some circumstances, a commit is created from scratch with only a hash
for sole information. This is not very convenient and can lead to issues
that may be difficult to investigate.
With this commit, when creating this kind of commit object, we try to
get commit informations from git repo.
When exporting a commit, the commit date is used in the `tar` command to
set the date of the exported folder. On the other hand it happens that a
commit is not found in the database and should be quickly created on the
fly. e.g.: with the `_get` method. In this case, if the commit needs to
be exported later, the method fails and may break a runbot build.
It happened with a custom python step.
- clean thread username
- allow to write on params for debug (was mainly usefull to forbid it
at the beginning)
- imrpove some guidelines about method and actions naming/ ordering
- move some code for a cleaner organisation.
- remove some useless request.env.user (not useful anymore)
When configuring a custom trigger on a bundle by using the wizard, the
child extra params field is often too small to display all the
parameters.
e.g., specify two long test-tags as it's often the case for
multi-builds.
With this commit, the field span over 4 columns.
The runbot settings view is a bit messy and the 16.0 upgrade added mess
on the existing one.
This commit is an attempt to make it a bit clearer and cleaner.
This commit will replace the symlink used for upgrade by the
upgrade-path.
The symlink was used before because old version does not support upgrade
path, but the decision was taken to now limit the testing to version
suporting upgrade paths in order to be able to support utils in another
repository latter.
When trying to open a linked error or an error from the history, the
object is opened in a useless modal. With this commit, the object is
opened in a regular form view.
This reverts commit 9e7441e098.
This doesn't work as expected because of db filter.
Will eb changed latter, reverting for now
Token field is kept, could still be used later.
When investigating kernel logs e.g.: for finding oom killed containers,
the kernel does not log the name of the incriminated container but only
the id. With this commit the runbot will also log the container short id
which is enough to correlate the logs.
There are two wizards for the runbot build errors:
- One to close an error with a reason
- One to update the team/user or PR
With this commit, the two wizards are merged into one wizard that helps
to update errors in bulk.
Also, a button is added in the list view that allow to save a mouse
click.
The `NEW` button is removed from the tree view as it should not be of
any use.
Pausing a host can be usefull in some case, mainly when testing new code
The loop will have no effect avoiding to break some build wainting for
testing.
Profile will help to identify potential performance flows during the
loop.
One of the most common custom trigger is to restore a build before
starting some test, either to create a multibuild or make the execution
and debug of some test faster.
It is somethimes tedious to use because we need to give an url of a
build to restore. This build must correspond to the right commits,
must still exixt, ... this means that the dump url must be adapted
everytime a branch is rebased.
The way the dump_url is defined is by going on the last batch, following
the link to the `base_reference_batch_id`, finding a slot corresponding
to the right repo set, (ex: Custom enterprise -> enterprise), and
copying the dump_url in this build.
The base_reference_batch_id is eay to automated but we have to find the
right trigger, this is now a parameter of the custom trigger wizard.
There are actually 2 strategy now to define how to download the dump:
- `url`, using `restore_ dump_url`
- `auto`, using `restore_trigger_id` and `restore_database_suffix`
To ease the setup, a `restore_trigger_id` is added on a trigger, so that
when selecting a trigger, lets say `Custom enterprise`, the defined
`trigger.restore_trigger_id` is automatically chosen for the
`custom_trigger.restore_trigger_id` and the `restore_mode` is setted to
auto.
Two actions are also added to the header of a bundle, a shorcut to
setup a multi build (restore in children) or a restore and test build
(restore in parent).
In view list widget are not always instanciated and a formater is used
instead. This means that the t-esc will try to output a jsonb field
without nowing how to render it, making the page crash.
This is quickly fixed by forcing the widget on the field in tree view.
The python steps can be long and interresting to track but the change is
actually hard to see since a block of code is logged instead of the diff.
Also, the whitespaces are not preserverd since we are note in a <pre>
block making it hard to read.
This proposes an alternative to display python code tracking values as
a diff with options to copu the raw content of the old and new version.
(as well as showing unchanged lines or not)
A first step to get more independant from web and website was done in
15.0 but some file were moved and it looks like the bootstrap version
changed breaking the frontend again. (4.3->5.1)
This commit simply copies the libs from odoo/15.0 to avoid losing time
fixing the frontend look and feel for the new bootstrap version for now.
A future refactoring could change the vendored version to addapt to 5.1
while modernizing the frontend style.
Firefox blocks downloads from http link if you are on an https page
Allow to deactivate via an ICP in case the runbot is configured over
HTTP (you shouldn't really)
When the runbot tries to drop a local database, if the that raises an
exception, it goes in a loop failure. It mays happen for example if
someone forgot to close a psql during an investigation :-)
With this commit, the exceptions are catched and at least the database
name is logged.
Since all versions will have a defined dockerfile, the project one
will alway be ignored. The idea here is that for a project, we may
definea default dockerfile_id so that we don't have to set it on all
bundle to make it work.
The _kill method was called in multiple case, usually when something
wrong happen:
- exception initiating pending
- kill requested manually
- testing time exceeded
- exception running a job
- ...
But it will also be called when killing a running build.
It was usually not an issue since the status remains the same, but it is
not true if the same commit is used in two build, the new one is green,
the old one is red (enterprise commit remaining the same but community
commit changed as an example)
In this situation, the enterprise commit may receive the red ci from the
old build while the last one is green.
Since with the last version, the github status responsibility is left to
write method, this github status is not useful anymore, updating the
state and result is enough.
This commit also removes the commit since it is not always a god idea.
Most of the time the transaction will be comited quite fast after that
with the new scheduler.
Note that checking in github status if no status has a more recent build
may be a good idea. Only the most recent build using a commit could
sending a status? This would not alway be helpful Imagine a commit used
in 2 branches by mistake, the last build is not always the one we want
(usually fixed by rebuilding a subbuild of the good build)
In some case, a build can add a lot of info in a log, there
is already a limit to the number of entry but not to the size of an
entry. This will limit the database usage in case of mistake/abuse.
When filtering bundles in the frontend, the user is not able to search
for its final trigram because of the `like`search.
With this commit, if the search contains a `%` symbol, the `=like`
operator is used permitting more accurate searches.
With this commit, a custom widget is added to go to the reunbot frontend
from a Char field. This allows to go from the bundle backend page to the
bundle frontend page wich is more useful in some situations.
e.g.: when creating a custom trigger with the wizard, this allows to
test the trigger with 2 clicks.
* show only the all builds tab
* hide linked errors tab when there is no linked errors
* hide error history tab when there is no history
* add some readonly
* add a link to the fixing PR on github
* add a warning ribbon on test-tagged errors
* show different colors in tree view to spot fixed PR's
* add some search filters
The initial idea to have a wakeup for public users stopped being viable
due to some abuse of the system, maybe unintentional crawling of
some build page but still, this feature will now be limited to internal
users only.
The current post_install build mecanism is using extra params to give
test-tags. Unfortunately this disables the support for auto tags
and this have to be done manually. This means that auto tags are in the
build extra-params and not dynamic at rebuild of a post_install.
Also, using extraparams in the post install creation was removing
extra_params comming from custom trigger.
With this commit, the test-tags can be given inside config_data
and will be combined with config step test-tags and auto-tags.
This was an opportunity to simplify the logic.
This commit also fixes the test_install_tags that was broken.
SInce the previous version the build end is written when going in any
done state. This means that when a build is skipped, it has a end
but no start.
Adapat the build dime to manage this use case.
The force buttons were hidden because unfortunately miss used as a
rebuild in some case instead. The position of the button was to obvious
and used as a "magic fix" when the intended behavior was only for really
specific cases.
Unfortunately the routes were know and still used manually. This commit
blocs the access giving a message to ask for the group if needed.
Those feature would benefit for some documentation.
When a build is created, it will first check for another
build having the same params. It is usually a good idea to avoid
to much load. In some case, a build can be found, but a killed one.
This is not what we want:
The first scenario is to consecutive force push,
commit1 -> commit2 -> commit1
The build of commit1 may be killed because of commit2, then when
forcepushing commit1 again, it will be linked to a killed build.
A even more problematic problem was discovered because of a delay In
odoo/odoo repo hook. An odoo-dev/odoo 16.0-... branch was discovered
first using this commit, and a build was created.
Then, the branch was forcedpushed and the build was killed.
Finally, the 16.0 commit was discovered, and was linked to the killed
build. This was mainly an issue because the build was a template.
With this changes, the 16.0 would have created a new build, not linking
to a killed one.
Note that linking to a red build is not an error. Only a killed one.
The assigned build are in the same count of the pending build. This can
sometimes create a false queue, because you can have 1000 pending builds
on one host, this doesn't mean that a new standard build cannot be
immediatly taken by another host. This is mainly to hide the false queue
created by the full charge zfs build currently running and creating
~400 assigned build.
The main motivation is to allow to create params from data
the "new" method was called with a value list instead of a dict.
Also, makes it possible to update params when the registry is not laoded
The initial motivation is to remove the flush when a log_counter is
written. This flush was initially usefull when the limit was in a
psql trigger, but finally add a side effect to flush everything before
starting the docker. This was limiting concurrent update after starting
the docker, but we still have no garantee that the transaction is
commited after starting the docker. The use case where the docker is
started but the transaction is not commited was not handled well and was
leading to an infinite loop of trying to start a docker (while the
docker was already started)
This refactoring returns the docker to the scheduler so that the
schedulter can commit before starting the docker.
To achieve this, it is ideal to have only one method that could return
a callable in the _scheduler loop. This is done by removing the run_job
from the init_pending method. All satellite method like make result
are also modified and adapted to make direct write: the old way was
technical debt, useless optimization from pre-v13.
Other piece of code are moved arround to prepare for future changes,
mainly to make the last commit easier to revert if needed.
[FIX] runbot: adapt tests to previous refactoring