docs/source/AdministratorGuide/Systems/WorkloadManagement/Pilots/index.rst
+71-2
@@ -181,7 +181,6 @@ Pilots started when not controlled by the SiteDirector
181
181
182
182
You should keep reading if your resources include IAAS and IAAC type of resources, like Virtual Machines.
183
183
If this is the case, then you need to:
184
-
185
184
- provide a certificate, or a proxy, to start the pilot;
186
185
- such certificate/proxy should have the `GenericPilot` property;
187
186
- in case of multi-VO environment, the Pilot should set the `/Resources/Computing/CEDefaults/VirtualOrganization` (as done e.g. by `vm-pilot <https://github.com/DIRACGrid/DIRAC/blob/integration/src/DIRAC/WorkloadManagementSystem/Utilities/CloudBootstrap/vm-pilot#L122>`_);
@@ -190,7 +189,7 @@ If this is the case, then you need to:
190
189
We have introduced a special command named "GetPilotVersion" that you should use,
191
190
and possibly extend, in case you want to send/start pilots that don't know beforehand the (VO)DIRAC version they are going to install.
192
191
In this case, you have to provide a json file freely accessible that contains the pilot version.
193
-
This is tipically the case for VMs in IAAS and IAAC.
192
+
This is typically the case for VMs in IAAS and IAAC.
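As a rough illustration of the idea above, the sketch below parses a JSON document and extracts a version string. The key name `Version`, the helper name `read_pilot_version`, and the placeholder value are hypothetical and chosen only for this example; the actual layout of the pilot JSON file is defined by the Pilot repository and may differ.

```python
import json


def read_pilot_version(json_text, key="Version"):
    """Return the version stored under `key` in a JSON document.

    Hypothetical helper: the key name and the flat layout are assumptions
    made for illustration, not the schema used by DIRAC.
    """
    data = json.loads(json_text)
    value = data[key]
    # Accept either a single string or a list of versions (first entry wins).
    if isinstance(value, list):
        return value[0]
    return value


# Placeholder content standing in for a freely accessible JSON file.
example_text = '{"Version": "X.Y.Z"}'
pilot_version = read_pilot_version(example_text)
```

In a real deployment the text would be downloaded from the published location rather than embedded in the script.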
194
193
195
194
The files to consider are in https://github.com/DIRACGrid/Pilot
196
195
@@ -269,3 +268,73 @@ A simple example using the LHCbPilot extension follows::
269
268
--name "$1" \
270
269
--cert \
271
270
--certLocation=/scratch/dirac/etc/grid-security \
271
+
272
+
Centralised Pilot Logging
273
+
===========================
274
+
Pilot jobs generate log files, which are primarily accessed for debugging when
275
+
there are issues with a particular resource; these (*classic*) log files are stored in a
276
+
resource-dependent manner. On a grid CE, the pilot writes logs to stdout/stderr
277
+
which are captured by the batch system and can later be retrieved using a
278
+
CE-specific tool. For a cloud resource, the logs are typically written to a file on
279
+
a given virtual machine instance where there is no standard or simple way for
280
+
them to be retrieved.
281
+
282
+
The centralised (*remote*) pilot logging system offers new resource-agnostic logging
283
+
to ensure that the pilot logs are captured and made readily accessible for all
284
+
resources as an extra debugging facility in parallel with the existing CE-based
285
+
logging system. It also offers the ability to preview logs while the pilot
286
+
is running.
287
+
288
+
The design of the new pilot logging system for DIRAC is based around having the
289
+
pilot jobs periodically send their logs back to a central storage service based on the
290
+
Tornado web server. For this to work, *TornadoPilotLoggingHandler* has to be installed on Tornado.
291
+
Further processing of the log entries is done by a back-end plugin;
292
+
the plugin to use is selected by the collector service configuration. Currently only
293
+
a plugin which stores logs in a file on Tornado is implemented (*FileCacheLoggingPlugin*).
294
+
When a pilot job marks a log file as finalised, the file can be copied by the *PilotLoggingAgent*
295
+
to a selected SE.
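The flow described above (periodic log batches, a file-based back-end, and a finalisation step that makes the log eligible for copying) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the DIRAC implementation: the class `FileCachePluginSketch`, its method names, and the `.final` suffix convention are all hypothetical.

```python
import tempfile
from pathlib import Path


class FileCachePluginSketch:
    """Hypothetical stand-in for a file-based back-end plugin (not the DIRAC class)."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _log_path(self, pilot_id):
        return self.cache_dir / f"{pilot_id}.log"

    def send_message(self, pilot_id, lines):
        """Append a batch of log lines sent by a running pilot."""
        with self._log_path(pilot_id).open("a", encoding="utf-8") as handle:
            for line in lines:
                handle.write(line.rstrip("\n") + "\n")

    def finalise(self, pilot_id):
        """Mark the log as complete by renaming it; return the new path."""
        source = self._log_path(pilot_id)
        target = source.with_suffix(".final")
        source.rename(target)
        return target

    def finalised_logs(self):
        """List the logs that an agent could now copy to a storage element."""
        return sorted(self.cache_dir.glob("*.final"))


def demo():
    """Simulate one pilot sending two batches and then finalising its log."""
    with tempfile.TemporaryDirectory() as tmp:
        plugin = FileCachePluginSketch(tmp)
        plugin.send_message("pilot-0001", ["pilot started", "installing DIRAC"])
        plugin.send_message("pilot-0001", ["pilot done"])
        final_path = plugin.finalise("pilot-0001")
        content = final_path.read_text(encoding="utf-8")
        names = [path.name for path in plugin.finalised_logs()]
        return content, names
```

Because batches are appended as they arrive, the partial file can be read while the pilot is still running, which is one plausible way to support the log preview mentioned earlier.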
296
+
297
+
The centralised logger can be enabled on a VO-by-VO basis. A CE whitelist can
298
+
also be provided to restrict pilot logging to those CEs.
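The selection rule described here can be expressed as a small predicate. The sketch below is illustrative only: the function name and its arguments are hypothetical, and it assumes that an absent or empty whitelist means every CE of an enabled VO is allowed, which the text implies but does not state.

```python
def remote_logging_enabled(vo, ce, enabled_vos, ce_whitelist=None):
    """Decide whether a pilot of `vo` running on `ce` should log remotely.

    Hypothetical helper; in DIRAC the equivalent decision is driven by the
    configuration rather than by function arguments.
    """
    if vo not in enabled_vos:
        return False
    # Assumption: no whitelist (or an empty one) places no restriction on CEs.
    if not ce_whitelist:
        return True
    return ce in ce_whitelist


# Example VO and CE names below are made up for illustration.
allowed = remote_logging_enabled("vo.a", "ce01.example.org", {"vo.a"}, ["ce01.example.org"])
blocked_ce = remote_logging_enabled("vo.a", "ce02.example.org", {"vo.a"}, ["ce01.example.org"])
blocked_vo = remote_logging_enabled("vo.b", "ce01.example.org", {"vo.a"})
```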
299
+
300
+
The remote logger plugin *FileCacheLoggingPlugin* requires the following mandatory configuration parameters to be set in *Operations/<vo_name>/Pilot* or *Operations/Defaults/Pilot*: