GitLab Pipeline Run Python Script

This blog post describes how to configure a Continuous Integration (CI) process on GitLab for a Python application. It uses one of my Python applications (bild) to show how to set up the CI process.

In this blog post, I’ll show how I set up a GitLab CI process to run the following jobs on a Python application:

  • Unit and functional testing using pytest
  • Linting using flake8
  • Static analysis using pylint
  • Type checking using mypy

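Here is my ./ci-cdscripts/getversion.py, which prints the current version to the pipeline log:

    import os

    # Both variables are predefined by GitLab CI for every job.
    refName = os.environ.get('CI_COMMIT_REF_NAME')
    piplineID = os.environ.get('CI_PIPELINE_ID')

    # e.g. a branch 'rel.1' in pipeline 42 becomes '1.0.42'
    relVersion = refName + '.0.' + piplineID
    version = relVersion.replace('rel.', '')
    print('current version is', version)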

What is CI?

To me, Continuous Integration (CI) means frequently testing your application in an integrated state. However, the term ‘testing’ should be interpreted loosely as this can mean:

  • Integration testing
  • Unit testing
  • Functional testing
  • Static analysis
  • Style checking (linting)
  • Dynamic analysis

To facilitate running these tests, it’s best to have these tests run automatically as part of your configuration management (git) process. This is where GitLab CI is awesome!

In my experience, I’ve found it really beneficial to develop a test script locally and then add it to the CI process that gets automatically run on GitLab CI.

Getting Started with GitLab CI

Before jumping into GitLab CI, here are a few definitions:

– pipeline: a set of tests to run against a single git commit.

– runner: GitLab uses runners on different servers to actually execute the tests in a pipeline; GitLab provides runners to use, but you can also spin up your own servers as runners.

– job: a single test being run in a pipeline.

– stage: a group of related tests being run in a pipeline.

Here’s a screenshot from GitLab CI that helps illustrate these terms:

GitLab utilizes the ‘.gitlab-ci.yml’ file to run the CI pipeline for each project. The ‘.gitlab-ci.yml’ file should be found in the top-level directory of your project.

While there are different methods of running a test in GitLab CI, I prefer to utilize a Docker container to run each test. I’ve found the overhead in spinning up a Docker container to be trivial (in terms of execution time) when doing CI testing.

Creating a Single Job in GitLab CI

The first job that I want to add to GitLab CI for my project is to run a linter (flake8). In my local development environment, I would run this command:

This command can be transformed into a job on GitLab CI in the ‘.gitlab-ci.yml’ file:

This YAML file tells GitLab CI what to run on each commit pushed up to the repository. Let’s break down each section.


The first line (image: "python:3.7") instructs GitLab CI to use Docker for ALL of the tests in this project, specifically the ‘python:3.7’ image found on Docker Hub.

The second section (before_script) is the set of commands to run in the Docker container before starting each job. This is really beneficial for getting the Docker container into the correct state by installing all the Python packages needed by the application.

The third section (stages) defines the different stages in the pipeline. There is only a single stage (Static Analysis) at this point, but later a second stage (Test) will be added. I like to think of stages as a way to group together related jobs.

The fourth section (flake8) defines the job; it specifies the stage (Static Analysis) that the job should be part of and the commands to run in the Docker container for this job. For this job, the flake8 linter is run against the Python files in the application.

At this point, the updates to the ‘.gitlab-ci.yml’ file should be committed to git and then pushed up to GitLab:

GitLab CI will see that there is a CI configuration file (.gitlab-ci.yml) and use it to run the pipeline:

This is the start of a CI process for a Python project! GitLab CI will run a linter (flake8) on every commit that is pushed up to GitLab for this project.

Running Tests with pytest on GitLab CI

When I run my unit and functional tests with pytest in my development environment, I run the following command in my top-level directory:

My initial attempt at creating a new job to run pytest in the ‘.gitlab-ci.yml’ file was:

However, this did not work, as pytest was unable to find the ‘bild’ module (i.e. the source code) to test:

The problem is that the test_*.py files cannot find the ‘bild’ module, because the top-level directory of the project is not on the system path:

The solution that I came up with was to add the top-level directory to the system path within the Docker container for this job:
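The original snippet is not preserved here. As a rough illustration of the same idea done from Python rather than in the job's script, the sketch below puts a directory at the front of sys.path. The helper name add_to_sys_path is made up for this example, and it assumes pytest is started from the project's top-level directory (for instance from a conftest.py located there).

```python
import os
import sys


def add_to_sys_path(directory: str) -> str:
    """Put `directory` at the front of sys.path (once) and return its absolute path."""
    absolute = os.path.abspath(directory)
    if absolute not in sys.path:
        sys.path.insert(0, absolute)
    return absolute


# Assumption: the current working directory is the project's top-level
# directory, i.e. the one that contains the 'bild' package.
add_to_sys_path(os.getcwd())
```

Once the top-level directory is on the path, an "import bild" inside the test files resolves in the same way as it does in a local development environment.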

With the updated system path, this job was able to run successfully:

Final GitLab CI Configuration

Here is the final .gitlab-ci.yml file that runs the static analysis jobs (flake8, mypy, pylint) and the tests (pytest):

Here is the resulting output from GitLab CI:

One item that I’d like to point out is that pylint is reporting some warnings, but I find this to be acceptable. I still want pylint to run in my CI process, but I don’t care if it fails; I’m more concerned with trends over time (are new warnings being introduced?). Therefore, I set the pylint job to be allowed to fail via the ‘allow_failure’ setting:
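To make the effect of ‘allow_failure’ concrete, here is a small Python model of how the overall result is decided. It is only a sketch of the behaviour, not GitLab's implementation; the JobResult class and the pipeline_status function are names invented for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JobResult:
    name: str
    passed: bool
    allow_failure: bool = False


def pipeline_status(results: Sequence[JobResult]) -> str:
    """Return 'failed', 'passed with warnings' or 'passed'."""
    # A failing job only breaks the pipeline if it is not allowed to fail.
    if any(not r.passed and not r.allow_failure for r in results):
        return "failed"
    # Failures of allowed-to-fail jobs are reported as warnings.
    if any(not r.passed for r in results):
        return "passed with warnings"
    return "passed"


results = [
    JobResult("flake8", passed=True),
    JobResult("mypy", passed=True),
    JobResult("pylint", passed=False, allow_failure=True),
    JobResult("pytest", passed=True),
]
print(pipeline_status(results))  # prints "passed with warnings"
```

With this setting the pylint findings stay visible in every pipeline without blocking a merge.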

Python packaging has recently been discussed a lot, but the articles usually only focus on publishing (open source) code to PyPI.

But what do you do when your organization uses Python for in-house development and you can’t (or don’t want to) make everything Open Source? Where do you store and manage your code? How do you distribute your packages?

In this article, I describe how we solve this problem with GitLab, Conda and a few other tools.

You can find all code and examples referenced in this article under gitlab.com/ownconda. These tools and examples use the own prefix in order to make a clear distinction between our own and third-party code. I will not necessarily update and fix the code, but it is released under the Blue Oak license so you can copy and use it. Any feedback is welcome, nonetheless.

Contents:

  • Software selection
  • How it should work
  • Making it work

Software selection

In this section I’ll briefly explain the reasons why we are using GitLab and Conda.

Code and issue management

Though you could use private repositories from one of the well-known cloud services, you should probably use a self-hosted service to retain full control over your code. In some countries it may even be forbidden to use a US cloud service for your organization’s data.

There are plenty of competitors in this field: GitLab, Gitea, Gogs, Gitbucket or Kallithea, just to name a few.

Our most important requirements are:

  • Repository management
  • Pull/Merge requests
  • Issue management
  • CI/CD pipelines

The only tool that (currently) meets these requirements is GitLab. It has a lot more features that are very useful for organization-wide use, e.g., LDAP and Kerberos support, issue labels and boards, Mattermost integration or Git LFS support. And, more importantly, it also has a really nice UX and is one of the few pieces of software that I actually enjoy using.

GitLab has a free core and some paid versions that add more features and support.

The package manager: Pip or Conda?

Pip is the official package installer for Python. It supports Python source distributions and (binary) Wheel packages. Pip only installs files in the current environment’s site-packages directory and can optionally create entry points in its bin directory. You can use Virtualenv to isolate different projects from one another, and Devpi to host your own package index. Devpi can both mirror/cache PyPI and store your own packages. The Python packaging ecosystem is overseen by the Python Packaging Authority working group (PyPA).

Conda stems from the scientific community and is being developed by Anaconda. In contrast to Pip, Conda is a full-fledged package manager similar to apt or dnf. Like virtualenv, Conda can create isolated virtual environments. Conda is not directly compatible with Python’s setup.py or pyproject.toml files. Instead, you have to create a Conda recipe for every package and build it with conda-build. This is a bit more involved because you have to convert every package that you find on PyPI, but it also lets you patch and extend every package. With very little effort you can create a self-extracting Python distribution with a selection of custom packages (similar to the Miniconda distribution).

Conda-forge is a (relatively) new project that has a huge library of Conda recipes and packages. However, if you want full control over your own packages you may want to host and build everything on your own.

What to use?

  • Both Conda and Pip allow you to host your own packages as well as 3rd party packages inside your organization.
  • Both Conda and Pip provide isolated virtual environments.
  • Conda can package anything (Python, C libraries, Rust apps, etc.) while Pip is exclusively for Python packages.
  • With Conda, you need to package and build everything on your own. Even packages from PyPI need to be re-packaged. On the other hand, this makes it easier to patch and extend the package’s source.
  • Newer Conda versions allow you to build everything on your own, even GCC and libc. This is, however, not required and you can rely on some low-level system libraries like the manylinux standard for Wheels does. (You just have to decide which ones, but more on that later.)
  • Due to its larger scope, Conda is slower and more complex than Pip. In the past, even patch releases introduced backwards incompatible changes and bugs that broke our stack. However, the devs are very friendly and usually fix critical bugs quite fast. And maybe we would have had similar problems, too, if we used a Pip based stack.

Because we need to package more than just Python, we chose to use Conda. This dates back at least to Conda v2.1, which was released in 2013. At that time, projects like conda-forge weren’t even in sight.

Supplementary tools

To aid our work with GitLab and Conda, we developed some supplementary tools. I have released a slightly modified version of them, called ownconda tools, along with this article.

The ownconda tools are a click based collection of commands that reside under the entry point ownconda.

Initially, they were only meant to help with the management of recipes for external packages, and with running the build/test/upload steps in our GitLab pipeline. But they have become a lot more powerful by now and even include a GitLab Runner that lets you run your projects’ pipelines locally (including artifacts handling, which the official gitlab-runner cannot do locally).

I will talk about the various subcommands in more detail in later sections.

How it should work

The subject of packaging consists of several components: the platforms on which your code needs to build and run, the package manager and repository, management of external and internal packages, a custom Python distribution, and means to keep an overview of all packages and their dependencies. I will go into detail about each aspect in the following sections.

Runtime and build environment

Our packages need to run on Fedora desktop systems and on CentOS 7. Packages built on CentOS also run on Fedora, so we only have a single build environment: CentOS 7.

We use different Docker images for our build pipeline and some deployments. The most important ones are centos7-ownconda-runtime and centos7-ownconda-develop. The former only contains a minimal setup to install and run Conda packages while the latter includes all build dependencies, conda-build and the ownconda tools.

If your OS landscape is more heterogeneous, you may need to add more build environments, which makes things a bit more complicated, especially if you need to support macOS or even Windows.

To build Docker images in our GitLab pipelines, we use docker-in-docker. That means that the GitLab runners start Docker containers that can access /var/run/docker.sock to run docker build.

GitLab provides a Docker registry that allows any project to host its own images. However, if a project is private, other projects’ pipelines cannot access these images. For this reason, we have decided to serve Docker images from a separate host.

3rd party packages

We re-package all external dependencies as Conda packages and host them in our own Conda repository.

This has several benefits:

  • We can prohibit installing software from sources other than our internal Conda repository.
  • If users want to depend on new libraries, we can propose alternatives that we might already have in our index. This keeps our tree of dependencies a bit smaller.
  • We cannot accidentally depend on packages with “bad” licenses.
  • We can add patches to fix bugs or extend the functionality of a package (e.g., we added our internal root certificate to Certifi).
  • We can reduce network traffic to external servers and are less dependent on their availability.

Recipe organization

We can either put the recipe for every package into its own repository (which is what conda-forge does) or use a single repository for all recipes (which is what we are doing).

The multi-repository approach makes it easier to only build packages that have changed. It also makes it easier to manage access levels if you have a lot of contributors that each only manage a few packages.

The single-repository approach has less overhead if you only have a few maintainers that take care of all the recipes. To identify updated packages that need re-building, we can use ownconda’s show-updated-recipes command.

Linking against system packages

With Conda, we can (and must) decide whether we want to link against system packages (e.g., installed with yum) or use other Conda packages to satisfy a package’s dependencies.

One extreme would be to only build Python packages on our own and completely depend on system packages for all C libraries. The other extreme would be to build everything on our own, even glibc and gcc.

The former has a lot less overhead but becomes more fragile the more heterogeneous your runtime environments become. The latter is a lot more complicated and involved but gives you more control and reliability.

We decided to take the middle ground between these two extremes: we build many libraries on our own but rely on the system’s gcc, glibc, and X11 libraries. This is quite similar to what the manylinux standard for Python Wheels does.

Recipes must list the system libraries that they link against. The rules for valid system libraries are encoded in ownconda validate-recipes and enforced by conda-build’s --error-overlinking option.

Recipe management

Recipes for Python packages can easily be created with ownconda pypi-recipe. This is similar to conda skeleton pypi but tailored to our needs. Recipes for other packages have to be created manually.

We also implemented an update check for our recipes. Every recipe contains a script called update_check.py which uses one of the update checkers provided by the ownconda tools.

These checkers can query PyPI, GitHub release lists and (FTP) directory listings, or crawl an entire website. The command ownconda check-for-updates runs the update scripts and compares the version numbers they find against the recipes’ current versions. It can also print URLs to the packages’ changelogs.
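The comparison step of such an update check can be sketched in a few lines of Python. This is not the ownconda implementation: parse_version and find_updates are hypothetical helpers, the version numbers in the example are made up, and the naive parser only handles purely numeric versions (pre-release tags such as "rc1" would need a proper version parser).

```python
import re
from typing import Dict, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.10.2' into (1, 10, 2) so that versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def find_updates(current: Dict[str, str], latest: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Map package name to (current, latest) for every outdated recipe."""
    updates = {}
    for name, cur in current.items():
        new = latest.get(name)
        if new is not None and parse_version(new) > parse_version(cur):
            updates[name] = (cur, new)
    return updates


# Hypothetical data: versions in our recipes vs. versions found upstream.
recipes = {"click": "7.0", "numpy": "1.16.4"}
upstream = {"click": "7.1.2", "numpy": "1.16.4"}
print(find_updates(recipes, upstream))  # {'click': ('7.0', '7.1.2')}
```

Comparing tuples of integers rather than strings avoids the classic mistake of sorting "1.10" before "1.9".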

We can then update all recipes with ownconda update-recipes:

The update process

Our Conda repository has various channels for packages of different maturity, e.g. experimental, testing, staging, and stable.

Updates are first built locally and uploaded to the testing channel for some manual testing.

If everything goes well, the updates are committed into the develop branch, pushed to GitLab and uploaded to the staging channel. We also send a changelog around to notify everyone about important updates and when they will be uploaded into the stable channel.

After a few days in testing, the updates are merged into the master branch and uploaded to the stable channel for production use.

This is a relatively safe procedure which (usually) catches any problems before they go into production.

Example recipes

You can find the recipes for all packages required to run the ownconda tools here. As a bonus, I also added the recipes for NumPy and PyQt5.

Internal projects

Internal packages are structured in a similar way to most projects that you see on PyPI. We put the source code into src, the pytest tests into tests and the Sphinx docs into docs. We do not use namespace packages. They can lead to various nasty bugs. Instead, we just prefix all packages with own_ to avoid name clashes with other packages and to easily tell internal and external packages apart.

The biggest difference to “normal” Python projects is the additional Conda recipe in each project. It contains all metadata and the requirements. The setup.py contains only the minimum amount of information to get the package installed via pip:

  • Conda-build runs it to build the Conda package.
  • ownconda develop runs it to install the package in editable mode.

ownconda develop also creates/updates a Conda environment for the current project and installs all requirements that it collects from the project’s recipe.

Projects also contain a .gitlab-ci.yml which defines the GitLab CI/CD pipeline. Most projects have at least a build, a test and an upload stage. The test stage is split into parallel steps for various test tools (e.g., pytest, pylint and bandit). Projects can optionally build documentation and upload it to our docs server. The ownconda tools provide helpers for all of these steps:

  • ownconda build builds the package.
  • ownconda test runs pytest.
  • ownconda lint runs pylint.
  • ownconda sec-check runs bandit.
  • ownconda upload uploads the package to the package index.
  • ownconda make-docs builds and uploads the documentation.

We also use our own Git flow:

  • Development happens in a develop branch. Builds from this branch are uploaded into a staging Conda channel.

  • Larger features can optionally branch off into a feature branch. Their builds are not uploaded into a public Conda channel.

  • Stable develop states get merged into the master branch. Builds are uploaded into our stable Conda channel.

  • Since we continuously deploy packages, we don’t put a lot of effort into versioning. The package version consists of a major release which rarely changes and the number of commits since the last tagged major release. The GitLab pipeline ID is used as a build number:

    • Version: $GIT_DESCRIBE_TAG.$GIT_DESCRIBE_NUMBER
    • Build: py37_$CI_PIPELINE_ID

    The required values are automatically exported by Conda and GitLab as environment variables.
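As a quick illustration of the versioning scheme above, this Python sketch assembles the two strings from the environment variables. The function names and the example values are invented for this sketch; how the recipe itself consumes the variables is not shown in this article.

```python
import os
from typing import Mapping


def package_version(env: Mapping[str, str] = os.environ) -> str:
    """<tagged major release>.<number of commits since that tag>"""
    return f"{env['GIT_DESCRIBE_TAG']}.{env['GIT_DESCRIBE_NUMBER']}"


def build_string(env: Mapping[str, str] = os.environ, python_tag: str = "py37") -> str:
    """<python tag>_<GitLab pipeline ID>"""
    return f"{python_tag}_{env['CI_PIPELINE_ID']}"


# Hypothetical values: 14 commits after tag "2", built by pipeline 1234.
example_env = {
    "GIT_DESCRIBE_TAG": "2",
    "GIT_DESCRIBE_NUMBER": "14",
    "CI_PIPELINE_ID": "1234",
}
print(package_version(example_env), build_string(example_env))  # 2.14 py37_1234
```

Because the pipeline ID grows with every run, two builds of the same commit still get distinct, ordered build strings.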

Package and documentation hosting

Hosting a Conda repository is very easy. In fact, you can just run python -m http.server in your local Conda base directory if you previously built any packages. You can then use it like this: conda search --override-channels --channel=http://localhost:8000/conda-bld PKG.

A Conda repository consists of one or more channels. Each channel is a directory that contains a noarch directory and additional platform directories (like linux-64). You put your packages into these directories and run conda index channel/platform to create an index for each platform (you can omit the platform with newer versions of conda-build). The noarch directory must always exist, even if you put all your packages into the linux-64 directory.
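The directory layout described above is easy to create with a short script. The following is a minimal sketch; create_channel is a hypothetical helper and the example uses a temporary directory in place of a real www root.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence


def create_channel(repo_root: Path, channel: str,
                   platforms: Sequence[str] = ("linux-64",)) -> List[Path]:
    """Create <repo_root>/<channel>/noarch plus one directory per platform."""
    created = []
    # noarch must always exist, even if all packages live in linux-64.
    for subdir in ("noarch", *platforms):
        path = repo_root / channel / subdir
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    for path in create_channel(Path(tmp), "stable"):
        print(path.relative_to(tmp))
```

After copying packages into these directories, conda index still has to be run so that clients can find them.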

The base URL for our Conda channels is https://forge.services.own/conda/channel. You can put a static index.html into each channel’s directory that parses the repo data and displays it nicely:

The upload service (for packages created in GitLab pipelines) resides under https://forge.services.own/upload/<channel>. It is a simple web application that stores the uploaded file in channel/linux-64 and runs conda index. For packages uploaded to the stable channel, it also creates a hard link in a special archive channel.
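The core of such an upload service fits into one function. The sketch below is my own illustration, not the code of our service: store_package, the archive directory name and the injectable run_index callback are assumptions, and the HTTP layer is left out entirely.

```python
import os
import subprocess
from pathlib import Path
from typing import Callable, Optional


def store_package(conda_root: str, channel: str, filename: str, data: bytes,
                  run_index: Optional[Callable[[Path], None]] = None) -> Path:
    """Store an uploaded package in <channel>/linux-64 and re-index it."""
    platform_dir = Path(conda_root) / channel / "linux-64"
    platform_dir.mkdir(parents=True, exist_ok=True)

    # Only keep the file name so that uploads cannot escape the channel.
    target = platform_dir / Path(filename).name
    target.write_bytes(data)

    if channel == "stable":
        # Assumption: the archive channel lives next to the other channels.
        archive_dir = Path(conda_root) / "archive" / "linux-64"
        archive_dir.mkdir(parents=True, exist_ok=True)
        link = archive_dir / target.name
        if not link.exists():
            os.link(target, link)  # hard link, no second copy on disk

    if run_index is None:
        def run_index(path: Path) -> None:
            subprocess.run(["conda", "index", str(path)], check=True)
    run_index(platform_dir)
    return target
```

Passing run_index as a parameter is a design choice for this sketch: it lets the function be tried out with a stub on a machine that has no Conda installation. The hard link means that a package pruned from stable remains available in the archive at no extra storage cost.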

Every week, we prune our channels with ownconda prune-index. In case we accidentally prune too aggressively, we have the option to restore packages from the archive.

We also host our own Read the Docs like service. GitLab pipelines can upload Sphinx documentation to https://forge.services.own/docs via ownconda make-docs.

Note

The server name forge does not refer to conda-forge but to SourceForge.net, which was quite popular back in the day.

Python distribution

With Constructor, you can easily create your own self-extracting Python distribution. These distributions are similar to Miniconda, but you can customize them to your needs.

A constructor file is a simple YAML file with some metadata (e.g., the distribution name and version) and the list of packages that should be included. You can also specify a post-install script.

The command constructor <distdir>/construct.yaml will then download all packages and put them into a self-extracting Bash script. We upload the installer scripts onto our Conda index, too.

Instead of managing multiple construct.yaml files manually, we create them dynamically in a GitLab pipeline, which makes building multiple similar distributions (e.g., for different Python versions) a bit easier.
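Generating the files can be as simple as filling in a text template. The sketch below is an assumption about how this could look, not our pipeline code: render_construct_yaml is a hypothetical helper, the package list and Python versions are examples, and only the basic Constructor keys name, version, channels and specs are used.

```python
from typing import Sequence


def render_construct_yaml(name: str, version: str, channel_url: str,
                          python_version: str, packages: Sequence[str]) -> str:
    """Render a minimal construct.yaml for one Python version."""
    lines = [
        f"name: {name}",
        f"version: '{version}'",  # quoted so YAML keeps it a string
        "channels:",
        f"  - {channel_url}",
        "specs:",
        f"  - python {python_version}.*",
    ]
    lines += [f"  - {package}" for package in packages]
    return "\n".join(lines) + "\n"


# Hypothetical build matrix: one distribution per Python version.
for python_version in ("3.6", "3.7"):
    text = render_construct_yaml(
        name=f"ownconda-{python_version}",
        version="1.0",
        channel_url="https://forge.services.own/conda/stable",
        python_version=python_version,
        packages=["conda", "pip"],
    )
    print(text)
```

Each rendered text would be written to its own <distdir>/construct.yaml before the constructor command is run on it.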

Deployment

We are currently on the road from copy-stuff-with-fabric-to-vms to docker-kubernetes-yay-land. I am not going to go into too much detail here: this topic is not directly related to packaging and is worth its own article.

Most of our deployments are now Ansible based. Projects contain an ansible directory with the required playbooks and other files. Shared roles are managed in a separate ownsible project. The Ansible deployments are usually part of the GitLab CI/CD pipeline. Some are run automatically, some need to be triggered manually.

Some newer projects are already using Docker based deployments. Docker images are built as part of the pipeline and uploaded into our Docker registry from which they are then pulled for deployments.

Dependency management

It is very helpful if you can build a dependency graph of all your packages.

Not only can it be used to build all packages in the correct order (as we will shortly see), but visualizing your dependencies may also help you to improve your architecture, detect circular dependencies or unused packages.

The command ownconda dep-graph builds such a dependency graph from the packages that you pass to it. It can either output a sorted list of packages or a DOT graph. Since the resulting graph can become quite large, there are several ways to filter packages. For example, you can only show a package’s dependencies or why the package is needed.
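Both output forms can be reproduced with the standard library. The sketch below is not how ownconda dep-graph is implemented; it uses graphlib (Python 3.9 or newer) for the sorted list, and the package names own_app and own_lib are made up for the example.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def build_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return packages so that every package comes after its requirements.

    Raises graphlib.CycleError if there is a circular dependency.
    """
    return list(TopologicalSorter(dependencies).static_order())


def to_dot(dependencies: Dict[str, Set[str]]) -> str:
    """Render the graph in DOT format (package -> requirement)."""
    lines = ["digraph deps {"]
    for package, requirements in sorted(dependencies.items()):
        for requirement in sorted(requirements):
            lines.append(f'    "{package}" -> "{requirement}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical packages: each key maps to the set of packages it requires.
deps = {
    "own_app": {"own_lib", "click"},
    "own_lib": {"click"},
    "click": set(),
}
print(build_order(deps))  # ['click', 'own_lib', 'own_app']
print(to_dot(deps))
```

The sorted list is what you need for building packages in the right order, while the DOT text can be rendered with Graphviz for a visual overview.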

The following figure shows the dependency graph for our python recipe. It was created with the command ownconda dep-graph external-recipes/ --implicit --requirements python --out=dot > deps_python.dot:

These graphs can become quite unclear relatively fast, though. This is the full dependency graph for the ownconda tools:

I do not want to know how this would have looked if these were all JavaScript packages.


Making it work

Now that you know the theory of how everything should work, we can start to bootstrap our packaging infrastructure.

Some of the required steps are a bit laborious and you may need the assistance of your IT department in order to set up the domains and GitLab. Other steps can be automated and should be relatively painless, though:

Set up GitLab and a Conda repo server

  1. Install GitLab. I’ll assume that it will be available under https://git.services.own.
  2. Set up the forge server. I’ll assume that it will be available under https://forge.services.own:

    • In your www root, create a conda folder which will contain the channels and their packages.
    • Create the upload service that copies files sent to /upload/channel into www-root/conda/channel/linux-64 and calls conda index.
    • Set up a Docker registry on the server.

Bootstrap Python, Pip and Conda

  1. Clone all repositories that you need for the bootstrapping process:

  2. Build all packages needed to create your Conda distribution. The ownconda tools provide a script that uses a Docker container to build all packages and upload them into the stable channel:

    Note

    The script might fail to build some packages. The most probable causes are HTTP timeouts or unavailable servers. Just re-run the script and hope for the best. If the issue persists, you might need to fix the corresponding Conda recipe, though (sometimes, people re-upload a source archive and thereby change its SHA256 value).

  3. Create the initial Conda distributions and upload them:

    You can now download the installers from https://forge.services.own/conda/stable/ownconda[-dev][-3.7].sh

  4. Set up your local ownconda environment. You can use the installer that you just built (or (re)download it from the forge if you want to test it):

Build the docker images

  1. Create a GitLab pipeline for the centos7-ownconda-runtime project. This will generate your runtime Docker image.
  2. When the runtime image is available, create a GitLab pipeline for the centos7-ownconda-develop project. This will generate your development Docker image used in your projects’ pipelines.

Build all packages

  1. Create a GitLab pipeline for the external-recipes project to build and upload the remaining 3rd party packages.
  2. You can now build the packages for your internal projects. You must create the pipelines in dependency order so that the requirements for each project are built first. The ownconda tools help you with that:

    If a pipeline fails and the script aborts, just remove the successful projects from the projects.txt and re-run the for loop.

Congratulations, you are done! You have built all internal and external packages, you have created your own Conda distribution and you have all Docker images that you need for running and building your packages.

Outlook / Future work and unsolved problems

Managing your organization’s packaging infrastructure like this is a whole lot of work but it rewards you with a lot of independence, control and flexibility.

We have been continuously improving our process during the last years and still have a lot of ideas on our roadmap.

While, for example, GitLab has a very good authentication and authorization system, our Conda repository lacks all of this (apart from IP restrictions for uploading and downloading packages). We do not want users (or automated scripts) to enter credentials when they install or update packages, but we are not aware of a (working) password-less alternative. Combining Conda with Kerberos might work in theory, but in practice this is not yet possible. Currently, we are experimenting with HTTPS client certificates. This might work well enough but it also doesn’t seem to be the Holy Grail of Conda Authorization.

Another big issue is creating more reproducible builds and easier rollback mechanisms in case an update ships broken code. Currently, we are pinning the requirements’ versions during a pipeline’s test stage. We are also working towards dockerized Blue Green Deployments and are exploring tools for container orchestration (like Kubernetes). On the other hand, we are still delivering GUI applications to client workstations via Bash scripts (this works quite well, though, and provides us with a good amount of control and flexibility).

We are also still keeping an eye on Pip. Conda has the biggest benefits when deploying packages to VMs and client workstations. The more we use Docker, the smaller the benefit might become, and we might eventually switch back to Pip.

But for now, Conda serves us very well.

Comments

You can leave comments and suggestions at Hacker News and Reddit or reach me via Twitter and Mastodon.




