Archiver Reference

Introduction

pScheduler’s architecture features the concept of archivers, which send completed measurement results elsewhere for storage or processing. Archivers are plugins, which means the set of destinations available for use with pScheduler can be easily expanded.

Archiving in pScheduler is reliable. After each attempt to dispose of the result, the archiver plugin will tell pScheduler whether it succeeded and, if not, whether or not to try again and how long to wait before the next attempt.

Basic JSON Syntax

Archiving is accomplished by providing an archive specification in the form of a JSON object containing these values:

archiver - The name of the archiver to use. See Archivers for a list available in the base pScheduler distribution.

data - A JSON object containing archiver-specific data to be used in deciding how to dispose of the result.

ttl - The absolute amount of time after which the result should be discarded if not successfully archived, specified as an ISO8601 duration. This value is optional and will be treated as infinite if not provided.

For example (commentary is not part of the specification):

{
    "archiver": "bitbucket",      Send to the archiver that goes nowhere.
    "data": { },                  The "bitbucket" archiver takes no specific data.
    "ttl": "PT12H"                Give up after 12 hours if not successfully archived.
}

Archiving from the Command Line

pScheduler can be directed to send measurement results to an archiver by using the --archive switch followed by an archive specification.

Specifying Archivers

Directly, as a String Literal

The archive specification may be added directly to the command line as a string literal containing its JSON:

% pscheduler task --archive '{ "archiver": "bitbucket", "data": {} }' trace --dest www.perfsonar.net

Indirectly, From a File

If the argument given to the --archive switch begins with @, the remainder of the argument will be treated as the path to a JSON file containing an archive specification which will be opened, read and treated as if it had been typed in by hand. If the first character of the path is a tilde (~), it will be expanded to the user’s home directory. For example:

% cat /home/fred/archive-to-bitbucket.json
{
    "archiver": "bitbucket",
    "data": {}
}

% pscheduler task --archive @/home/fred/archive-to-bitbucket.json trace --dest www.perfsonar.net

Multiple Archivers

The results of a task can be sent to multiple archivers by using the --archive switch multiple times:

% pscheduler task \
      --archive @/home/fred/archive-to-esmond.json \
      --archive '{ "archiver": "bitbucket", "data": {} }' \
      trace --dest www.perfsonar.net

Other than system-imposed limits on the length of the command line, there is no limit on the number of archivers that may be specified as part of a task.

Archiving as Part of a JSON Task Specification

Archive specifications can be added to a JSON task specification as an array of JSON objects as part of the archives property:

% cat mytask.json
{
    "schema": 1,
    "test": {
        "type": "trace",
        "spec": {
            "schema": 1,
            "dest": "www.perfsonar.net"
        }
    },
    "schedule": {
        "slip": "PT5M"
    },
    "archives": [
        {
            "archiver": "bitbucket",
            "data": { }
        },
        {
            "archiver": "syslog",
            "data": { "ident": "just-testing" }
        }

    ]
}

% pscheduler task --import mytask.json .

Note

The . in the command above is a placeholder for the test type, which is imported from mytask.json.)

Archiving in pSConfig Templates

pSConfig allows for the use of archive objects in the archives section of pSConfig templates. They take the exact same format as described in this document. For more information on pSConfig templates see Introduction to pSConfig Templates

Archiving Globally

pScheduler can be configured to apply an archive specification to every run it performs on a host by placing each one in a file in /etc/pscheduler/default-archives. Files must be readable by the pscheduler user.

For example, this file will use the HTTP archiver to post the results of all throughput tests to https://host.example.com/place/to/post:

{
    "archiver": "http",
    "data": {
        "_url": "https://host.example.com/place/to/post",
        "op": "post",
    },
    "transform": {
        "script": ""if (.test.type == \"throughput\") then . else null end""
    }
    "ttl": "PT5M"
}

Archivers

The archivers listed below are supplied as part of the standard distribution of pScheduler.

Note

All items listed in each Archiver Data subsection are required unless otherwise noted.

bitbucket

The bitbucket archiver sends measurement results to the bit bucket (i.e., it does nothing with them). This archiver was developed for testing pScheduler and serves no useful function in a production setting.

Archiver Data

This archiver uses no archiver-specific data.

Example

{
    "archiver": "bitbucket",
    "data": { }
}

esmond

The esmond archiver submits measurement results to the esmond time series database using specialized translations of results for throughput, latency, trace and rtt tests into a format used by earlier versions of perfSONAR. If it does not recognize a test it will store the raw JSON of the pscheduler result in the pscheduler-raw event type.

Archiver Data

url - The URL for the esmond server which will collect the result.

_auth-token - Optional. The authorization token to be used when submitting the result. Note that the _ prefix indicates that this value is considered a secret and will not be supplied if the task specification is retrieved from pScheduler via its REST API. If not specified, IP authentication is assumed.

measurement-agent - Optional. The name of the pScheduler host that produced the result. If not specified, defaults to the endpoint pscheduler deemed the lead.

retry-policy - Optional. Describes how to retry failed attempts to submit the measurement to esmond before giving up. The default behavior is to try once and then give up.

data-formatting-policy - Optional. Indicates how the record should be stored. Valid values are:
  • prefer-mapped - This is the default. It means that if test is type throughput, latency, trace and rtt than store using the traditional metadata and event types. If it does not recognize the result it will store as a pscheduler-raw record.
  • mapped-and-raw - Store both a mapped type and a raw record. Will not store either if not a recognized type that can be mapped.
  • mapped-only - Only store a mapped type and do not store anything if it is not a known type
  • raw-only - Only store a pscheduler-raw record regardless of test type.

summaries - Optional. A list of objects containing an event-type, summary-type and summary-window. If not specified, defaults to a standard set of summaries used by perfSONAR. See the esmond documentation for more details on summaries.

verify-ssl - Optional. Defaults to false. If enabled, check SSL certificate of esmond server against list of known certificate authorities (CAs). See the requests documentation for more details on environment variables and other options for specifying path to CA store.

Example

{
    "archiver": "esmond",
    "data": {
        "measurement-agent": "ps.example.net",
        "url": "http://ma.example.net/esmond/perfsonar/archive/",
        "_auth-token": "35dfc21ebf95a6deadbeef83f1e052fbadcafe57",
        "retry-policy": [
            { "attempts": 1,  "wait": "PT60S"   },
            { "attempts": 1,  "wait": "PT300S"  },
            { "attempts": 11, "wait": "PT3600S" }
        ]
    }
}

failer

The failer archiver provides the same archiving function as bitbucket but introduces failure and retries a random fraction of the time. This archiver was developed for testing pScheduler and serves no useful function in a production setting.

Archiver Data

fail - The fraction of the time that archive attempts will fail, in the range [0.0,1.0].

retry - The fraction of the time that archive attempts will be retried after a failure, in the range [0.0,1.0].

Example

{
    "archiver": "failer",
    "data": {
        "fail": 0.5,
        "retry": 0.75
    }
}

rabbitmq

The rabbitmq archiver sends raw JSON results to RabbitMQ.

Archiver Data

_url - An amqp URL for the RabbitMQ instance which will receive the result.

routing-key - Optional. The routing key to be used when queueing the message.

retry-policy - Optional. Describes how to retry failed attempts to submit the measurement to esmond before giving up. The default behavior is to try once and then give up.

Example

{
    "archiver": "rabbitmq",
    "data": {
        "_url": "amqp://rabbithole.example.org/",
        "routing-key": "bugs",
        "retry-policy": [
            { "attempts": 5,  "wait": "PT1S" },
            { "attempts": 5,  "wait": "PT3S" }
        ]
    }
}

syslog

The syslog archiver sends the raw JSON result to the system log.

Note that because most syslog implementations cannot handle arbitrarily-long log messages, this archiver should not be relied upon for anything other than debugging.

Archiver Data

ident - Optional. The identification string to be used when submitting the log message.

facility - Optional. The syslog facility to be used when the log entry is submitted. Valid valies are kern, user, mail, daemon, auth, lpr, news, uucp, cron, syslog, local0, local1, local2, local3, local4, local5, local6 and local7.

priority - Optional. The syslog priority to be used when the log entry is submitted. Valid values are emerg, alert, crit, err, warning, notice, info and debug.

Example

{
    "archiver": "syslog",
    "data": {
        "ident": "mytests",
        "facility": "local3",
        "priority": "warning"
    }
}

Transforms

As part of an archive specification, pScheduler may be instructed to pre-process a run result before it is handed to the archiver plugin. This is accomplished by adding a transform section to the archive specification:

{
    "archiver": "syslog",
    "data": {
        "ident": "user-task",
        "facility": "local4",
        "priority": "info"
    },
    "transform": {
        "script": "...JQ Script...",
        "raw-output": false
    }
}

The script is a string containing a valid script for the jq JSON processor version 1.5. There is a tutorial on jq and pScheduler available on the perfSONAR project’s YouTube channel. The value returned by the script should be JSON or plain text (see raw-output, below).

If the script returns a JSON value of null, pScheduler will discard the result and not pass it to the plugin. Because the transformation happens within pScheduler before any plugin code is invoked, this mechanism is a very efficient way to filter results and is preferred over writing custom plugins.

If raw-output is present and true, the output will be treated as plain text instead of JSON.

Note that some archiver plugins, notably esmond, may expect the input to be in the un-transformed format produced by pScheduler. Using a transform in this case is not recommended.

Example Transforms

Convert to Plain Text

"transform": {
    "script": "\"Ran \\(.test.type) with \\(.tool.name)\"",
    "output-raw": true
}

Generate Different JSON

"transform": {
    "script": "{ \"foo\": 123456, \"type\": .test.type, \"tool\": .tool.name }"
}

Archive Only One Test Type

"transform": {
    "script": "if (.test.type == \"trace\") then . else null end"
}

Archive One Test Type, Log Others

"transform": {
    "script": "if (.test.type == \"trace\") then . else \"Discarded unwanted \\(.test.type) test.\" end"
}

Drop and Transform

"transform": {
    "script": "if (.test.type == \"idle\") then null else { \"foo\": 123456, \"type\": .test.type, \"tool\": .tool.name } end"
}

Summarize Trace Results

"transform": {
    "script": "if (.test.type == \"trace\") then \"Trace to \\(.test.spec.dest), \\(.result.paths[0] | length) hops\" else null end"
}

Alternate JSON with Trace Hop List

"transform": {
    "script": "if (.test.type == \"trace\") then { \"test\": .test.type, \"from\": .participants[0], \"to\": .test.spec.dest,  \"id\": .id, \"start\": .schedule.start, \"ips\": [ .result.paths[0] | .[].ip ] } else null end"
}