atlasgce-modules

Description

The atlasgce-modules are Puppet modules for contextualizing analysis clusters for the ATLAS Experiment. They are developed primarily for Google Compute Engine (GCE).

Operating system support

The modules have been tested on the CentOS 6 operating system. They should work out of the box on most RedHat-based systems of the same generation, such as SL6 and SLC6.

Work is also in progress to partially support CernVM. The SLC5-based CernVM 2.6 and 2.7 will support all modules except packagerepos (due to the lack of Conary support in Puppet) and cvmfs (which is already configured during the CernVM contextualization). μCernVM will be SLC6 based, will use RPM for package management, and should be fully supported.

Debian-based systems are not supported, but support can be added.

Cloud support

These modules have been developed for GCE, but support for other clouds can be implemented.

Functionality

Overview

The atlasgce-modules provide from-scratch contextualization on bare machines (virtual or physical) for ATLAS analysis.

Three different roles are available: the manager role (head), the worker role (node), and the worker role for a Cloud Scheduler environment (csnode).

The manager role (head)

The manager role consists of the following elements:

  • The AutoPyFactory service fetches jobs from a PanDA queue and submits them locally to Condor.
  • The Condor collector, negotiator, and schedd services manage job submission and the distribution of subjobs over the worker nodes.
  • The XRootD and Cluster Management services act as a local XRootD redirector and are responsible for accessing and caching input data files through the Federated ATLAS XRootD system. (Optional)
  • The CernVM-FS service provides consistent access to ATLAS software. CernVM-FS is not strictly required for the manager role, but can be helpful when debugging the Condor services. (Optional)
  • Compatibility packages for running SLC5 binaries on SLC6.

The worker role (node)

The worker role consists of the following elements:

  • The Condor startd service runs the individual subjobs as dictated by the manager.
  • The XRootD, Cluster Management, and File Residency Management services are responsible for accessing input data files through the manager and for downloading those that are not yet available in the cache.
  • The CernVM-FS service provides consistent access to ATLAS software.
  • Compatibility packages for running SLC5 binaries on SLC6.

The Cloud Scheduler worker role (csnode)

The Cloud Scheduler worker role consists of the following elements:

  • The Condor startd service runs the individual subjobs as dictated by the Cloud Scheduler.
  • The CernVM-FS service provides consistent access to ATLAS software.
  • Compatibility packages for running SLC5 binaries on SLC6.
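
In all three cases the role is selected when the gce_node umbrella class (see Contents below) is declared in the node template. A minimal sketch, assuming a role-style parameter (the actual parameter names are defined by the gce_node module and may differ):

# Minimal sketch only; check the gce_node module documentation for the
# actual interface.
class { 'gce_node':
  role => 'head',   # 'head', 'node', or 'csnode'
}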

Contextualization

The contextualization of the machine is governed by Puppet by means of a public collection of modules (the atlasgce-modules, this repository) and one or more bootstrapping scripts.

Bootstrapping

The standard way of bootstrapping GCE images is to use a startup script, and this method is used for the manager and worker nodes.

For Cloud Scheduler worker nodes, startup scripts are not supported; bootstrapping is instead achieved using a machine image prepared with software that downloads and runs a bootstrap script supplied through the userdata metadata attribute. (Cloud Scheduler might support startup scripts on GCE in the future.)

See atlasgce-scripts for more information on the bootstrapping procedure.

What is contextualized?

The atlasgce-modules, in combination with the bootstrapping procedure, handle contextualization of everything from preparing and mounting additional storage, to adding package repositories and downloading required software, to creating and setting up user accounts for services, to configuring and starting the supported services.

This contextualization is done on a bare machine, meaning that no software other than the package manager is required. Even Puppet is installed during the bootstrapping procedure.

This means that the extra work of preparing machine images with the required software and configuration, and the rather high turnaround time that comes with it, is effectively eliminated. The extra cost of redoing the contextualization at instantiation has been found to be very small on GCE.
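
To give a flavor of what the modules do, the sketch below uses standard built-in Puppet resource types (mount, user, service) of the kind involved in this contextualization; the resource titles and values are placeholders, not taken from the modules themselves.

# Illustration only: built-in Puppet resource types of the kind the
# modules declare. Titles, devices, and service names are placeholders.
mount { '/data':
  ensure  => mounted,
  device  => '/dev/sdb',   # hypothetical scratch disk
  fstype  => 'ext4',
  options => 'defaults',
}

user { 'xrootd':
  ensure => present,
  system => true,
}

service { 'autofs':
  ensure => running,
  enable => true,
}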

Puppet

See What is Puppet?

Usage

This section describes suggested usage together with the atlasgce-scripts. It describes how to configure GCE options, how to configure the node template and other parts of the bootstrapping procedure, and how to start, update, and stop a cluster.

Note: Configuration of the GCE project including adding SSH keys and configuring the firewall for incoming traffic (if necessary) is not covered here. Refer to the official documentation.

Note: Detailed information about configurable parts of the atlasgce-scripts can be found in its documentation.

Getting started

  1. Download atlasgce-scripts:
git clone https://github.com/spiiph/atlasgce-scripts.git
  2. Download atlasgce-modules (optional):
git clone https://github.com/spiiph/atlasgce-modules.git
  3. Enter the atlasgce-scripts directory and edit defaults.sh to change the GCE configuration to reflect your project and cluster.
  4. Edit gce_node_head.pp and gce_node_worker.pp to configure important options such as the role, manager node address, XRootD redirector, PanDA settings, etc. (a hypothetical template is sketched after this list).
  5. Edit mount-head.sh and mount-worker.sh to match your disk setup. (Remember to change the mounts in gce_node_head.pp and gce_node_worker.pp accordingly.)
  6. Edit modules.sh if you want to download the module repository in a non-standard way. Note: if the repository format is changed from git to something else, the update-cluster.sh script also has to be updated.
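
The node templates are Puppet manifests that declare the gce_node class. The snippet below is only a hypothetical sketch of what gce_node_head.pp might contain; the parameter names are illustrative and should be checked against the gce_node module documentation and the templates shipped with atlasgce-scripts.

# Hypothetical sketch, not the actual gce_node_head.pp shipped with
# atlasgce-scripts. All parameter names below are illustrative.
node default {
  class { 'gce_node':
    role              => 'head',                      # 'head', 'node', or 'csnode'
    head              => 'head.example.com',          # manager node address (placeholder)
    panda_queue       => 'EXAMPLE_QUEUE',              # PanDA queue name (placeholder)
    xrootd_redirector => 'redirector.example.com',    # FAX redirector (placeholder)
  }
}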

Managing the cluster

Once the node template and bootstrapping procedure have been configured, three commands are used to control the cluster. These commands read the information they require (such as the GCE project, the number of worker nodes in the cluster, etc.) from defaults.sh.

  • start-cluster.sh — starts a manager node and worker nodes
  • stop-cluster.sh — deletes the manager node and worker nodes
  • update-cluster.sh — fetches updates to the module repository on each node and applies them

Contents

Detailed module documentation

See the documentation in each subdirectory for detailed information about each module.

packagerepos

The packagerepos module manages package repositories containing extra software and compatibility libraries required to run ATLAS software. These include the SLC repositories and repositories for HT Condor, CernVM-FS, and AutoPyFactory.
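
In Puppet this typically amounts to yumrepo resources; a hedged example with a placeholder name and URL (not the repositories actually shipped with the module):

# Placeholder repository; the module defines the real names and URLs.
yumrepo { 'example-extra-repo':
  descr    => 'Extra software and compatibility libraries',
  baseurl  => 'http://repo.example.com/el6/$basearch',
  enabled  => 1,
  gpgcheck => 0,
}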

autofs and cvmfs

The autofs and cvmfs modules manage the CernVM-FS configuration and the Autofs service.

xrootd

The xrootd module manages configuration and services for XRootD, Cluster Management, and File Residency Management.

condor

The condor module manages the HT Condor configuration and the collector, negotiator, schedd, and startd services.
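
Which daemons run on a given machine follows from its role; HT Condor itself is controlled through a single service plus its DAEMON_LIST configuration. A minimal sketch, assuming the stock service name:

# Sketch only; the condor module's actual resources and parameters may differ.
# On the manager, DAEMON_LIST typically includes COLLECTOR, NEGOTIATOR, and
# SCHEDD; on workers it includes STARTD.
service { 'condor':
  ensure => running,
  enable => true,
}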

apf

The apf module manages the AutoPyFactory configuration and service.

gce_node

The gce_node module is an umbrella module to configure a machine for one of the specified roles. It is responsible for installing compatibility packages and doing any contextualization that is not directly tied to any of the other modules.
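
As an umbrella, gce_node essentially selects and declares the other modules according to the chosen role. A hypothetical sketch of that composition (the real class is parameterized and more involved):

# Hypothetical composition; see the gce_node module for the real structure.
class gce_node_example {
  include packagerepos
  include cvmfs
  include xrootd
  include condor
  include apf
}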
