Archiver Reference¶
Introduction¶
pScheduler’s architecture features the concept of archivers, which send completed measurement results elsewhere for storage or processing. Archivers are plugins, which means the set of destinations available for use with pScheduler can be easily expanded.
Archiving in pScheduler is reliable. After each attempt to dispose of the result, the archiver plugin will tell pScheduler whether it succeeded and, if not, whether or not to try again and how long to wait before the next attempt.
Basic JSON Syntax¶
Archiving is accomplished by providing an archive specification in the form of a JSON object containing these values:
archiver
- The name of the archiver to use. See Archivers for a list of the archivers available in the base pScheduler distribution.
data
- A JSON object containing archiver-specific data to be used in deciding how to dispose of the result.
ttl
- The absolute amount of time after which the result should be discarded if not successfully archived, specified as an ISO8601 duration. This value is optional and will be treated as infinite if not provided.
For example (commentary is not part of the specification):
{
    "archiver": "bitbucket",    Send to the archiver that goes nowhere.
    "data": { },                The "bitbucket" archiver takes no specific data.
    "ttl": "PT12H"              Give up after 12 hours if not successfully archived.
}
Archiving from the Command Line¶
pScheduler can be directed to send measurement results to an archiver by using the --archive switch followed by an archive specification.
Specifying Archivers¶
Directly, as a String Literal¶
The archive specification may be added directly to the command line as a string literal containing its JSON:
% pscheduler task --archive '{ "archiver": "bitbucket", "data": {} }' trace --dest www.perfsonar.net
Indirectly, From a File¶
If the argument given to the --archive switch begins with @, the remainder of the argument will be treated as the path to a JSON file containing an archive specification, which will be opened, read and treated as if it had been typed in by hand. If the first character of the path is a tilde (~), it will be expanded to the user’s home directory. For example:
% cat /home/fred/archive-to-bitbucket.json
{
    "archiver": "bitbucket",
    "data": {}
}
% pscheduler task --archive @/home/fred/archive-to-bitbucket.json trace --dest www.perfsonar.net
Multiple Archivers¶
The results of a task can be sent to multiple archivers by using the --archive switch multiple times:
% pscheduler task \
--archive @/home/fred/archive-to-esmond.json \
--archive '{ "archiver": "bitbucket", "data": {} }' \
trace --dest www.perfsonar.net
Other than system-imposed limits on the length of the command line, there is no limit on the number of archivers that may be specified as part of a task.
Archiving as Part of a JSON Task Specification¶
Archive specifications can be added to a JSON task specification as an array of JSON objects as part of the archives property:
% cat mytask.json
{
    "schema": 1,
    "test": {
        "type": "trace",
        "spec": {
            "schema": 1,
            "dest": "www.perfsonar.net"
        }
    },
    "schedule": {
        "slip": "PT5M"
    },
    "archives": [
        {
            "archiver": "bitbucket",
            "data": { }
        },
        {
            "archiver": "syslog",
            "data": { "ident": "just-testing" }
        }
    ]
}
% pscheduler task --import mytask.json .
Note
The . in the command above is a placeholder for the test type, which is imported from mytask.json.
Archiving in pSConfig Templates¶
pSConfig allows for the use of archive objects in the archives section of pSConfig templates. They take exactly the same format as described in this document. For more information on pSConfig templates, see Introduction to pSConfig Templates.
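As a rough sketch (the names example_archive, example_group, example_test and example_task are hypothetical, and the template's addresses, groups and tests are omitted), a pSConfig template defines named archive objects under its top-level archives property and references them by name from tasks:
{
    "archives": {
        "example_archive": {
            "archiver": "syslog",
            "data": { "ident": "psconfig-example" }
        }
    },
    "tasks": {
        "example_task": {
            "group": "example_group",
            "test": "example_test",
            "archives": [ "example_archive" ]
        }
    }
}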
Archiving Globally¶
pScheduler can be configured to apply an archive specification to every run it performs on a host by placing each one in a file in /etc/pscheduler/default-archives. Files must be readable by the pscheduler user.
For example, this file will use the HTTP archiver to post the results of all throughput tests to https://host.example.com/place/to/post:
{
    "archiver": "http",
    "data": {
        "_url": "https://host.example.com/place/to/post",
        "op": "post"
    },
    "transform": {
        "script": "if (.test.type == \"throughput\") then . else null end"
    },
    "ttl": "PT5M"
}
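One way to install such a file (the file name throughput-to-http.json here is arbitrary) is to copy it into place and make it world-readable so the pscheduler user can read it:
% sudo cp throughput-to-http.json /etc/pscheduler/default-archives/
% sudo chmod 644 /etc/pscheduler/default-archives/throughput-to-http.json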
Archivers¶
The archivers listed below are supplied as part of the standard distribution of pScheduler.
Note
All items listed in each Archiver Data subsection are required unless otherwise noted.
bitbucket¶
The bitbucket archiver sends measurement results to the bit bucket (i.e., it does nothing with them). This archiver was developed for testing pScheduler and serves no useful function in a production setting.
Archiver Data¶
This archiver uses no archiver-specific data.
Example¶
{
    "archiver": "bitbucket",
    "data": { }
}
esmond¶
The esmond archiver submits measurement results to the esmond time series database, using specialized translations of results for throughput, latency, trace and rtt tests into a format used by earlier versions of perfSONAR. If it does not recognize a test, it will store the raw JSON of the pScheduler result in the pscheduler-raw event type.
Archiver Data¶
url
- The URL for the esmond server which will collect the result.
_auth-token
- Optional. The authorization token to be used when submitting the result. Note that the _ prefix indicates that this value is considered a secret and will not be supplied if the task specification is retrieved from pScheduler via its REST API. If not specified, IP authentication is assumed.
measurement-agent
- Optional. The name of the pScheduler host that produced the result. If not specified, defaults to the endpoint pScheduler deemed the lead.
retry-policy
- Optional. Describes how to retry failed attempts to submit the measurement to esmond before giving up. The default behavior is to try once and then give up.
data-formatting-policy
- Optional. Indicates how the record should be stored. Valid values are:
prefer-mapped
- This is the default. If the test is of type throughput, latency, trace or rtt, the result is stored using the traditional metadata and event types. If the result is not recognized, it is stored as a pscheduler-raw record.
mapped-and-raw
- Store both a mapped type and a raw record. Stores neither if the test is not a recognized type that can be mapped.
mapped-only
- Only store a mapped type; store nothing if the test is not a known type.
raw-only
- Only store a pscheduler-raw record regardless of test type.
summaries
- Optional. A list of objects containing an event-type, summary-type and summary-window. If not specified, defaults to a standard set of summaries used by perfSONAR. See the esmond documentation for more details on summaries, and the sketch following this list for the general shape of an entry.
verify-ssl
- Optional. Defaults to false. If enabled, the SSL certificate of the esmond server is checked against a list of known certificate authorities (CAs). See the requests documentation for more details on environment variables and other options for specifying the path to a CA store.
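As referenced in the summaries item above, and only as a rough sketch (the values here are hypothetical and the summary window is expressed in seconds; see the esmond documentation for the authoritative format), a summaries entry looks something like:
"summaries": [
    {
        "event-type": "throughput",
        "summary-type": "average",
        "summary-window": 86400
    }
]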
Example¶
{
    "archiver": "esmond",
    "data": {
        "measurement-agent": "ps.example.net",
        "url": "http://ma.example.net/esmond/perfsonar/archive/",
        "_auth-token": "35dfc21ebf95a6deadbeef83f1e052fbadcafe57",
        "retry-policy": [
            { "attempts": 1, "wait": "PT60S" },
            { "attempts": 1, "wait": "PT300S" },
            { "attempts": 11, "wait": "PT3600S" }
        ]
    }
}
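Read loosely (the exact semantics of retry policies are determined by the archiver plugin), this policy retries a failed submission once after 60 seconds, once more after 300 seconds and then up to eleven more times at one-hour intervals before giving up.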
failer¶
The failer archiver provides the same archiving function as bitbucket but introduces failure and retries a random fraction of the time. This archiver was developed for testing pScheduler and serves no useful function in a production setting.
Archiver Data¶
fail
- The fraction of the time that archive attempts will fail, in the range [0.0, 1.0].
retry
- The fraction of the time that archive attempts will be retried after a failure, in the range [0.0, 1.0].
Example¶
{
    "archiver": "failer",
    "data": {
        "fail": 0.5,
        "retry": 0.75
    }
}
rabbitmq¶
The rabbitmq archiver sends raw JSON results to RabbitMQ.
Archiver Data¶
_url
- An amqp URL for the RabbitMQ instance which will receive the result.
routing-key
- Optional. The routing key to be used when queueing the message.
retry-policy
- Optional. Describes how to retry failed attempts to submit the measurement to RabbitMQ before giving up. The default behavior is to try once and then give up.
Example¶
{
    "archiver": "rabbitmq",
    "data": {
        "_url": "amqp://rabbithole.example.org/",
        "routing-key": "bugs",
        "retry-policy": [
            { "attempts": 5, "wait": "PT1S" },
            { "attempts": 5, "wait": "PT3S" }
        ]
    }
}
syslog¶
The syslog archiver sends the raw JSON result to the system log.
Note that because most syslog implementations cannot handle arbitrarily-long log messages, this archiver should not be relied upon for anything other than debugging.
Archiver Data¶
ident
- Optional. The identification string to be used when submitting the log message.
facility
- Optional. The syslog facility to be used when the log entry is submitted. Valid values are kern, user, mail, daemon, auth, lpr, news, uucp, cron, syslog, local0, local1, local2, local3, local4, local5, local6 and local7.
priority
- Optional. The syslog priority to be used when the log entry is submitted. Valid values are emerg, alert, crit, err, warning, notice, info and debug.
Example¶
{
    "archiver": "syslog",
    "data": {
        "ident": "mytests",
        "facility": "local3",
        "priority": "warning"
    }
}
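On a system that logs to the systemd journal, one way to confirm that results are being logged (assuming the ident value shown above) is:
% journalctl -t mytests
On other systems, the messages can be found in whatever file the local syslog configuration routes the selected facility and priority to.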
Transforms¶
As part of an archive specification, pScheduler may be instructed to pre-process a run result before it is handed to the archiver plugin. This is accomplished by adding a transform section to the archive specification:
{
    "archiver": "syslog",
    "data": {
        "ident": "user-task",
        "facility": "local4",
        "priority": "info"
    },
    "transform": {
        "script": "...JQ Script...",
        "output-raw": false
    }
}
The script is a string containing a valid script for the jq JSON processor, version 1.5. There is a tutorial on jq and pScheduler available on the perfSONAR project’s YouTube channel. The value returned by the script should be JSON or plain text (see output-raw, below).
If the script returns a JSON value of null
, pScheduler will discard the result and not pass it to the plugin. Because the transformation happens within pScheduler before any plugin code is invoked, this mechanism is a very efficient way to filter results and is preferred over writing custom plugins.
If output-raw is present and true, the output will be treated as plain text instead of JSON.
Note that some archiver plugins, notably esmond, may expect the input to be in the un-transformed format produced by pScheduler. Using a transform in this case is not recommended.
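Because transform scripts are ordinary jq programs, one way to develop them is to run jq directly against a previously captured run result (the file name result.json here is hypothetical):
% jq 'if (.test.type == "trace") then . else null end' result.json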
Example Transforms¶
Convert to Plain Text¶
"transform": {
"script": "\"Ran \\(.test.type) with \\(.tool.name)\"",
"output-raw": true
}
Generate Different JSON¶
"transform": {
"script": "{ \"foo\": 123456, \"type\": .test.type, \"tool\": .tool.name }"
}
Archive Only One Test Type¶
"transform": {
"script": "if (.test.type == \"trace\") then . else null end"
}
Archive One Test Type, Log Others¶
"transform": {
"script": "if (.test.type == \"trace\") then . else \"Discarded unwanted \\(.test.type) test.\" end"
}
Drop and Transform¶
"transform": {
"script": "if (.test.type == \"idle\") then null else { \"foo\": 123456, \"type\": .test.type, \"tool\": .tool.name } end"
}
Summarize Trace Results¶
"transform": {
"script": "if (.test.type == \"trace\") then \"Trace to \\(.test.spec.dest), \\(.result.paths[0] | length) hops\" else null end"
}
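For a hypothetical trace to www.perfsonar.net whose first path contained 12 hops, this script would produce the JSON string "Trace to www.perfsonar.net, 12 hops".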
Alternate JSON with Trace Hop List¶
"transform": {
"script": "if (.test.type == \"trace\") then { \"test\": .test.type, \"from\": .participants[0], \"to\": .test.spec.dest, \"id\": .id, \"start\": .schedule.start, \"ips\": [ .result.paths[0] | .[].ip ] } else null end"
}