ATOM monitoring agent for near-real time collection of infrastructure and application metrics

excess-project/monitoring-agent

EXCESS ATOM Monitoring Agent

ATOM enables users to monitor applications at run-time with ease. In contrast to existing frameworks, our solution profiles applications with high resolution, focuses on energy measurements, and supports a heterogeneous infrastructure.

Motivation

Reducing energy consumption is a leading design constraint of current and future HPC systems. Aside from investing in energy-efficient hardware, optimizing applications is key to substantially reducing the energy consumption of HPC clusters. Software developers, however, are usually in the dark when it comes to the energy consumption of their applications; HPC clusters rarely provide capabilities to monitor energy consumption at a fine-grained level. Predicting the energy consumption of specific applications is even more difficult when the allocated hardware resources vary with each execution. To lower the hurdle of energy-aware development, we present ATOM, a light-weight neAr-real Time mOnitoring fraMework.

Prerequisites

The monitoring agent requires a running server and database. To install them, please check out the associated monitoring server and monitoring frontend first. Please note that the installation and setup steps below assume that you are running a current Linux distribution. We have tested the monitoring agent with Ubuntu 14.04 LTS as well as with Scientific Linux 6 (Carbon).

Before you can proceed, please clone the repository:

git clone https://github.com/excess-project/monitoring-agent.git

Dependencies

This project requires the following dependencies to be installed:

Component         Homepage                                                       Version
PAPI-C            http://icl.cs.utk.edu/papi/                                    5.4.0
CURL              http://curl.haxx.se/download/                                  7.37.0
Apache APR        https://apr.apache.org/                                        1.5.1
Apache APR Utils  https://apr.apache.org/                                        1.5.3
Nvidia GDK        https://developer.nvidia.com/gpu-deployment-kit/               352.55
bison             http://ftp.gnu.org/gnu/bison/                                  2.3
flex              http://prdownloads.sourceforge.net/flex/                       2.5.33
sensors           https://fossies.org/linux/misc/                                3.4.0
EXCESS queue      https://github.com/excess-project/data-structures-library.git  release/0.1.0

To ease the process of setting up a development environment, we provide a basic script that downloads all dependencies, installs them locally in the project directory, and then performs some clean-up operations. Thus, compiling the monitoring agent can be performed in a sandbox without affecting your current operating system.

Executing the following script

./setup.sh

results in a new directory named bin, which holds the required dependencies for compiling the project.

Installation

This section assumes that you've successfully installed all required dependencies as described in the previous paragraphs.

make
make install

The above commands compile and install the monitoring agent into the directory dist within the project's repository. The dist folder includes all required binaries, shared libraries, scripts, and configuration files to get you started.

Start monitoring

If you haven't yet followed our guide to set up the associated monitoring server and database, please do so before continuing. Please check once more that an Elasticsearch database is running

curl localhost:9200

and that the monitoring server is running at

http://localhost:3030

and that the monitoring frontend is running at

http://localhost:3000
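
The three checks above can also be scripted. The following is a minimal Python sketch and not part of ATOM: the names SERVICES, http_ok, and check_services are illustrative assumptions, and the URLs are simply the default addresses quoted above. Any HTTP response, including an error status, is treated as "the service is up".

```python
import http.client
import urllib.error
import urllib.request

# Default endpoints quoted in this README; adjust them if your setup differs.
SERVICES = {
    "elasticsearch": "http://localhost:9200",
    "monitoring server": "http://localhost:3030",
    "monitoring frontend": "http://localhost:3000",
}


def http_ok(url, timeout=2.0):
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The service answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, malformed reply, ...
        return False


def check_services(services=SERVICES, fetch=http_ok):
    """Map each service name to True (reachable) or False (not reachable)."""
    return {name: fetch(url) for name, url in services.items()}
```

Calling check_services() with no arguments probes the three default addresses and returns a dictionary such as {"elasticsearch": True, ...}; the fetch parameter exists only so that the probe can be swapped out, for example in tests.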

Next, start the monitoring agent with a default set of plugins enabled to monitor, for instance, the memory consumption of your current system as follows:

cd dist
./start.sh

You can learn more about the options accepted by the monitoring agent by calling

./start.sh -h

While the monitoring agent collects metric data, you can already open the Web front-end located at

http://localhost:3000

You should see your first experiment being registered. The front-end allows you to

  • visualize sampled data
  • download collected data as JSON
  • download collected data as CSV

Configuring plug-ins and update intervals

The monitoring agent and its plug-ins are configured at run-time through a global configuration file named mf_config.ini. The file uses the INI format; each section name, such as timings or plugins, is enclosed in square brackets. Each section holds a set of parameters, and the parameters of a plug-in section are specific to that plug-in. For instance, the PAPI plug-in is activated by setting mf_plugin_papi = on. In the small example below it lists four counters, two of which are switched on: PAPI_FP_INS and PAPI_L1_DCM. The configuration file can be altered while the agent is running; new values are applied periodically, by default every 3 minutes. The interval is controlled by update_configuration, which the example below sets to 360s (6 minutes).

;EXCESS ATOM Monitoring Framework Configuration

[generic]
server = http://localhost:3030

[plugins]
mf_plugin_papi    = on
mf_plugin_meminfo = off

[timings]
default               = 100000000ns
publish_data_interval = 0s
update_configuration  = 360s
mf_plugin_papi        = 1000000000ns
mf_plugin_meminfo     = 1000000000ns

[mf_plugin_papi]
MAX_CPU_CORES = 8
PAPI_FP_INS  = on
PAPI_LST_INS = off
PAPI_L1_DCM  = on
PAPI_FLOPS   = off
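
To illustrate how this file is structured, here is a minimal Python sketch that reads the example with the standard configparser module. It is not how the agent itself parses the file. The helper names (to_nanoseconds, load_config, enabled_plugins, enabled_counters) are assumptions made for this example, and only the two unit suffixes that appear above (ns and s) are handled.

```python
import configparser
import re

EXAMPLE = """\
;EXCESS ATOM Monitoring Framework Configuration
[generic]
server = http://localhost:3030

[plugins]
mf_plugin_papi    = on
mf_plugin_meminfo = off

[timings]
default               = 100000000ns
publish_data_interval = 0s
update_configuration  = 360s
mf_plugin_papi        = 1000000000ns
mf_plugin_meminfo     = 1000000000ns

[mf_plugin_papi]
MAX_CPU_CORES = 8
PAPI_FP_INS  = on
PAPI_LST_INS = off
PAPI_L1_DCM  = on
PAPI_FLOPS   = off
"""

# Assumption: only the suffixes seen in the example are supported here.
_UNIT_NS = {"ns": 1, "s": 1_000_000_000}


def to_nanoseconds(value):
    """Convert a string such as '360s' or '100000000ns' to nanoseconds."""
    match = re.fullmatch(r"(\d+)(ns|s)", value.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {value!r}")
    return int(match.group(1)) * _UNIT_NS[match.group(2)]


def load_config(text):
    """Parse INI text; ';' comment lines are skipped by configparser."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case, e.g. PAPI_FP_INS
    parser.read_string(text)
    return parser


def enabled_plugins(parser):
    """Names in [plugins] whose switch is 'on'."""
    return [name for name in parser["plugins"]
            if parser.getboolean("plugins", name)]


def enabled_counters(parser, plugin="mf_plugin_papi"):
    """Keys in the plug-in section that are on/off switches set to 'on'."""
    section = parser[plugin]
    return [key for key, value in section.items()
            if value.lower() in ("on", "off") and section.getboolean(key)]
```

With the example text, enabled_plugins(load_config(EXAMPLE)) yields only mf_plugin_papi, and to_nanoseconds("360s") gives 360000000000, which makes it easy to compare intervals written in different units.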

Several parameters, such as the sampling interval of each plug-in or the address of the monitoring server, can be set through this configuration file. The file is called mf_config.ini and is located at dist/mf_config.ini.
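
Since the file can be changed while the agent is running, a switch can be flipped from a script. The sketch below is one possible approach and not a tool shipped with ATOM; set_option is a hypothetical helper. It edits the text line by line so that comments, ordering, and column alignment are preserved.

```python
import re


def set_option(text, section, key, value):
    """Return `text` with `key` in `[section]` set to `value`.

    Only the value on the first matching line changes; everything else
    in the file is passed through untouched.
    """
    out, current, done = [], None, False
    pattern = re.compile(
        r"([ \t]*" + re.escape(key) + r"[ \t]*=[ \t]*)\S*(.*)", re.S
    )
    for line in text.splitlines(keepends=True):
        header = re.match(r"\s*\[([^\]]+)\]", line)
        if header:
            current = header.group(1)  # entering a new section
        elif current == section and not done:
            match = pattern.match(line)
            if match:
                # Keep the "key = " prefix and the line ending as they were.
                line = match.group(1) + value + match.group(2)
                done = True
        out.append(line)
    if not done:
        raise KeyError(f"{key} not found in [{section}]")
    return "".join(out)
```

A typical use would be to read dist/mf_config.ini, call set_option(text, "plugins", "mf_plugin_meminfo", "on"), and write the result back to the same path; the agent should then pick up the change the next time it re-reads the file, at the update_configuration interval.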

Implementing new plugins

We provide more details on how to implement additional plugins here.

Acknowledgment

This project is realized through EXCESS. EXCESS is funded by the EU 7th Framework Programme (FP7/2013-2016) under grant agreement number 611183. We are also collaborating with the European project DreamCloud.

Contributing

Find a bug? Have a feature request? Please create an issue.

Main Contributors

Dennis Hoppe, HLRS

Fangli Pi, HLRS

Dmitry Khabi, HLRS

Yosandra Sandoval, HLRS

Anthony Sulisto, HLRS

Release History

Date        Version  Comment
2016-10-12  16.8.1   5th release (stable and acme plugin)
2016-08-15  16.8     4th release (stable and libcurl multi-perform)
2016-06-13  16.6     3rd release (stable and new features)
2016-02-26  16.2     2nd release (removed backend interface)
2015-12-18  1.0      Public release.

License

Copyright (C) 2014,2015 University of Stuttgart

Apache License v2.
