Configuring the MaDDash Server¶
Note
The primary way most users configure MaDDash is with the pSConfig MaDDash Agent. The agent automatically fills in the values detailed below, and most users never need to touch any of the options listed. If you find yourself in a situation where you do need to make manual configuration changes, this document provides guidance.
The main configuration file for the maddash-server component is located at /etc/maddash/maddash-server/maddash.yaml
The file is in YAML format. In this file you define the checks you want to run and how they are organized. You can also tweak settings for the embedded web server and various other aspects of the software. The configuration file is broken into the following primary sections:
groups - In general, this is the section where you define lists of hosts that you want to check. You can define any number of custom groups in this section. When you define grids these groups will be used to define what constitutes the rows and columns. In theory they don’t have to be hosts, but for the perfSONAR checks this is the common case because your dimensions are endpoints of a test.
checks - This is where you define how to run each check. For Nagios checks, this means defining the command-line script to run and arguments to pass it.
grids - This is where you define how the groups will be applied to the checks. You define the groups that will compose the rows and columns, then define the type of check that will be run.
dashboards - This is where you group grids together.
In addition to the sections above, there are also the following optional sections:
reports - A set of patterns that can match across one or more cells in individual grids and generate notifications
notifications - Here you can define the types of notifications sent (e.g. email) when a pattern defined in the reports is matched.
There are also a number of miscellaneous parameters related to the general operation of the service. Each of these property categories is described in the remainder of this section.
groups¶
groups are used to define the resources that will compose the rows and columns of a grid. For the perfSONAR checks, this is often used to group together the endpoints used in performance tests (e.g. create a group for all the hosts running OWAMP tests and another for all the hosts running BWCTL tests). The names, members and number of groups are all customizable. groups take the following format:
    groups:
      groupName:
        - "member"
        - "member"
All groups go under the groups block. In the example above, groupName can be replaced with any alphanumeric string that does not contain whitespace. For example “myOwampHosts”, “campusHosts”, or any other string that meets the requirements. This name will be used later in the configuration, so it's important to make note of it. In the above example, you should change member to the name of the host or other resource you want in the list.
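As an illustration, a groups block with two groups might look like the following sketch. The group names and hostnames are hypothetical placeholders, not values shipped with MaDDash.

```yaml
groups:
  # Hypothetical group of hosts running OWAMP tests
  myOwampHosts:
    - "owamp1.example.org"
    - "owamp2.example.org"
  # Hypothetical group of hosts running BWCTL tests
  myBwctlHosts:
    - "bwctl1.example.org"
    - "bwctl2.example.org"
```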
groupMembers¶
groupMembers are used to further describe members of a group. Any custom property may be added, but only the values listed are understood by the default MaDDash interface. There is also a special map property that can be used to dynamically set values based on the value of the opposing row/column. groupMembers take the following format where “member” is the name of the member referenced:
    groupMembers:
      - id: "member"
        ...member properties...
Below is the list of well-known properties:
| Name | Type | Required | Description |
|---|---|---|---|
| id | String | Yes | The group member this is referencing. For example, the hostname or address of a single item in a group list. |
| label | String | No | The value to show when displaying information about the referenced group member. |
| pstoolkiturl | String | No | The URL of the perfSONAR Toolkit web page. |
| map | YAML Object | No | Defines properties that change depending on the opposing row/column. The keys of this object are the ids of other groupMembers, and under each key is a custom set of parameters that can be accessed in template strings using %row.map.<property> and %col.map.<property> respectively. The reserved key default can be used to define a set of properties that apply to hosts not explicitly listed otherwise. |
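A groupMembers entry that uses the well-known properties might be sketched as follows. The hostnames, the toolkit URL, and the custom testPort property with its values are all hypothetical and only illustrate the shape of the map object.

```yaml
groupMembers:
  - id: "owamp1.example.org"
    label: "OWAMP Host 1"
    pstoolkiturl: "https://owamp1.example.org/toolkit"
    map:
      # Applies only when the opposing row/column is owamp2.example.org
      "owamp2.example.org":
        testPort: "8760"   # hypothetical custom property
      # Applies to every opposing member not listed explicitly
      default:
        testPort: "861"
```

With a definition like this, a template string could refer to the value as %row.map.testPort when owamp1.example.org is the row, or %col.map.testPort when it is the column.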
checks¶
General Parameters¶
checks are where you provide instructions as to how results should be obtained. Checks take the following format in the configuration file:
    checks:
      checkName:
        ...check-parameters...
The checkName can be any alphanumeric string with no whitespace. It is used to identify the check later in the file, so you should note it. For example you could name it something like “owampLossCheck” or “bwctl100Gbps”. The following parameters are available for the check:
| Name | Type | Required | Description |
|---|---|---|---|
| name | String | Yes | A human-readable name used for display purposes when describing this check. |
| description | String | Yes | Human-readable text describing the purpose of this check. The description field accepts the template variables %row and %col that will be populated with the current row and column values respectively when the check is applied to a grid. |
| type | String | Yes | The type of check. Currently the software supports net.es.maddash.checks.NagiosCheck, net.es.maddash.checks.PSNagiosCheck, and net.es.maddash.checks.RandomCheck. |
| params | YAML Object | Type dependent | A YAML object containing parameters specific to the check. See the Type-specific Parameters section. |
| checkInterval | int | Yes | How frequently to run the check, in seconds. |
| retryInterval | int | Yes | How frequently to run the check, in seconds, if it encounters a state different from the current state. |
| retryAttempts | int | Yes | The number of consecutive times a new state must be seen before it changes the state of the check. For example, if a check has been OK for many days, but suddenly a critical is seen, then a critical state must be seen 2 more times before the status will change. |
| timeout | int | Yes | The number of seconds to wait for the check to return. If it does not return in this timeframe, the check is set to the UNKNOWN status. |
Type-specific Parameters¶
Currently the software supports the following types of checks:
net.es.maddash.checks.NagiosCheck - This check is performed using a Nagios command. The parameters provided describe how to run that command.
net.es.maddash.checks.PSNagiosCheck - This check is a perfSONAR Nagios command. It is an extension of net.es.maddash.checks.NagiosCheck, but includes additional fields to collect information necessary to display graphs from the perfSONAR toolkit.
net.es.maddash.checks.RandomCheck - This should only be used for testing. This check returns a random result every time it runs.
NagiosCheck¶
| Name | Type | Required | Description |
|---|---|---|---|
| command | String or YAML Object | Yes | The full Nagios command to run on the local system. It accepts the template variables and can also be a map where the outer key is the row and inner is the column (both can take special value “default”) allowing for deterministic setting of the command. |
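Putting the general parameters and the NagiosCheck command together, a complete check definition might look like the sketch below. The check name, the plugin path, and the interval values are assumptions chosen for illustration; check_http is a standard Nagios plugin, but its location varies by system.

```yaml
checks:
  httpResponseCheck:
    name: "HTTP Response"
    description: "HTTP response time of %col as seen from the MaDDash server"
    type: "net.es.maddash.checks.NagiosCheck"
    params:
      # Assumed plugin location; adjust the path for your system
      command: "/usr/lib64/nagios/plugins/check_http -H %col -w 5 -c 10"
    checkInterval: 1800   # run every 30 minutes
    retryInterval: 300    # re-run after 5 minutes when a new state is seen
    retryAttempts: 3
    timeout: 60
```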
PSNagiosCheck¶
| Name | Type | Required | Description |
|---|---|---|---|
| command | String or YAML Object | Yes | See NagiosCheck. |
| maUrl | YAML Object | Yes | The URL of the measurement archive where performance data related to this check may be retrieved. This accepts the template variables listed in the Nagios Check Template Variables section. The object has one key that is called default which will be the default URL used for any cell in a grid. The remaining keys are members of groups assigned to the row. If default and a row key are specified, the row key is preferred for that row. The value of each key is a map where the key is a member of a group assigned to the column or you can use the default key to apply the URL to every column in the row. If default is specified and a specific value for a column, the specific value for the column is preferred. See the default configuration file for a full example. |
| graphUrl | String | Yes | A URL where a graph of data related to the check can be retrieved. This accepts the template variables listed in the Nagios Check Template Variables section. |
| metaDataKeyLookup | String | Yes | DEPRECATED A URL where metaDataKeys can be looked up for the data. These are often needed to generate the graph URL. This accepts some of the template variables listed in the Nagios Check Template Variables section. Note: Some variables it cannot accept because it is responsible for generating them. |
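The nesting of maUrl is easier to see in an example. The following is a sketch only: the script path and its arguments, the hostnames, and all URLs are hypothetical, and it assumes the top-level default key takes a plain URL string. The deprecated metaDataKeyLookup parameter is left out. The default configuration file remains the authoritative example.

```yaml
checks:
  owampLossCheck:
    name: "OWAMP Loss"
    description: "Loss from %row to %col"
    type: "net.es.maddash.checks.PSNagiosCheck"
    params:
      # Hypothetical script path and arguments
      command: "/path/to/check_loss_script -u %maUrl -s %row -d %col"
      maUrl:
        # Used for any cell without a more specific entry (assumed to be a string)
        default: "https://%row/esmond/perfsonar/archive"
        # Row-specific entries, keyed by a member of the row group
        "owamp1.example.org":
          # Used for every column in this row unless overridden below
          default: "https://archive.example.org/esmond/perfsonar/archive"
          # Used only for this row/column pair
          "owamp2.example.org": "https://archive2.example.org/esmond/perfsonar/archive"
      # Hypothetical graph page that takes the archive URL and endpoints
      graphUrl: "https://dashboard.example.org/graphs?url=%maUrl&source=%row&dest=%col"
    checkInterval: 1800
    retryInterval: 300
    retryAttempts: 3
    timeout: 60
```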
Nagios Check Template Variables¶
| Name | Description |
|---|---|
| %row | The row in the grid associated with this check at the time it is run. |
| %col | The column in the grid associated with this check at the time it is run. |
| %row.<prop> | Custom properties defined in the groupMembers section. |
| %col.<prop> | Custom properties defined in the groupMembers section. |
| %row.map.<prop> | Custom properties defined in the groupMembers section that change depending on opposing row or column. |
| %col.map.<prop> | Custom properties defined in the groupMembers section that change depending on opposing row or column. |
| %maUrl | The URL of the measurement archive. You can’t use this in the maUrl parameters as this is generated from that template. |
| %maKeyF | DEPRECATED A comma-separated list of the metaDataKeys for the forward direction of a test. This cannot be used in metaDataKeyLookup as it is generated after the URL is called. |
| %maKeyR | DEPRECATED A comma-separated list of the metaDataKeys for the reverse direction of a test. This cannot be used in metaDataKeyLookup as it is generated after the URL is called. |
| %srcName | DEPRECATED The hostname of the source endpoint of a point-to-point test. This cannot be used in metaDataKeyLookup as it is generated after the URL is called. |
| %srcIP | DEPRECATED The IP address of the source endpoint of a point-to-point test. This cannot be used in metaDataKeyLookup as it is generated after the URL is called. |
| %dstName | DEPRECATED The hostname of the destination endpoint of a point-to-point test. This cannot be used in metaDataKeyLookup as it is generated after the URL is called. |
| %dstIP | DEPRECATED The IP address of the destination endpoint of a point-to-point test. This cannot be used in metaDataKeyLookup as it is generated after the URL is called. |
| %eventType | DEPRECATED The eventType returned by metaDataKeyLookup of the destination endpoint of a point-to-point test. This cannot be used in metaDataKeyLookup as it is generated after the URL is called. |
| %event.delayBuckets | The string http://ggf.org/ns/nmwg/characteristic/delay/summary/20110317 |
| %event.delay | The string http://ggf.org/ns/nmwg/characteristic/delay/summary/20070921 |
| %event.bandwidth | The string http://ggf.org/ns/nmwg/characteristics/bandwidth/achievable/2.0 |
| %event.iperf | The string http://ggf.org/ns/nmwg/tools/iperf/2.0 |
| %event.utilization | The string http://ggf.org/ns/nmwg/characteristic/utilization/2.0 |
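As a rough sketch of how these variables combine, the check below passes the column name and a per-pair custom property to a command. It assumes a groupMembers entry for each row member that defines a hypothetical testPort property inside its map object; the check name and plugin path are likewise assumptions. check_tcp is a standard Nagios plugin.

```yaml
checks:
  tcpPortCheck:
    name: "TCP Port Check"
    description: "TCP connection test toward %col using the port that %row defines for it"
    type: "net.es.maddash.checks.NagiosCheck"
    params:
      # %col expands to the column member; %row.map.testPort expands to the
      # testPort value the row member defines for this column (or its default)
      command: "/usr/lib64/nagios/plugins/check_tcp -H %col -p %row.map.testPort"
    checkInterval: 600
    retryInterval: 120
    retryAttempts: 3
    timeout: 30
```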
grids¶
grids associate groups with checks and arrange them in a two-dimensional structure. Grids are arranged as a list of objects with the following parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| name | String | Yes | A human-readable name of the grid. |
| rows | String | Yes | The name of the group that will compose the rows of the grid. This must match a group name defined in the groups section of the configuration file or an error will be returned. |
| columns | String | Yes | The name of the group that will compose the columns of the grid. This must match a group name defined in the groups section of the configuration file or an error will be returned. |
| checks | List of Strings | Yes | The names of the check elements that need to be run for each row and column. Each element must match a check name defined under the checks section of the configuration or an error will be returned. |
| rowOrder | String | Yes | Specifies how the rows should be ordered. Valid values are alphabetical, which will sort them alphabetically, or group, which will present them exactly in the order they are defined in the groups section. |
| colOrder | String | Yes | Specifies how the columns should be ordered. Valid values are alphabetical, which will sort them alphabetically, or group, which will present them exactly in the order they are defined in the groups section. |
| excludeSelf | boolean | Yes | If set to 1, then a check will not be run where the value of the current row is equal to the value of the current column. If set to 0, then a check will be run in this case. |
| excludeChecks | YAML Object | No | This excludes individual checks based on the row and column. The structure is a map where the key is the name of the row where you want to exclude a check. It should match a member of the group assigned to the “rows” property of this grid or it can be the special key ‘default’ that matches every row. The value is a list of columns that should not appear in the grid. An item in the list must be a member of the group assigned to the “columns” property of this grid or the special value “all” which removes all columns for a row. A full example is provided in the default configuration file. |
| columnAlgorithm | String | Yes | Determines which checks will be run. Valid values are as follows: all - Run a check between every row and column; afterSelf - Run a check to every host that’s defined after the current row in the ‘rows’ group; beforeSelf - Run a check to every host that’s defined before the current row in the ‘rows’ group. |
| reports | String | No | References the id field of a report in the reports section to match against this grid. |
| statusLabels | YAML Object | Yes | Describes what each status means. It is structured as a set of key/value pairs where the key is the status and the value is the description of the status. Valid status values are ok, warning, critical, unknown, notrun, and extra. You do not need to define every status if not all are applicable to your check. |
| statusLabels.extra | YAML Object | Yes | Object where you can define custom status labels. Valid keys are value, which is an integer identifying the custom state, shortName, which is a name to label the state, and description, which is text that will appear in the GUI legend. |
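A single grid built from these parameters might be sketched as follows. The grid, group, check, and host names are hypothetical and must match names defined in the groups and checks sections of your own file. The extra status label is omitted here because its exact layout is best taken from the default configuration file.

```yaml
grids:
  - name: "OWAMP Loss Grid"
    rows: "myOwampHosts"        # must match a name under groups
    columns: "myOwampHosts"     # must match a name under groups
    checks:
      - "owampLossCheck"        # must match a name under checks
    rowOrder: "alphabetical"
    colOrder: "alphabetical"
    excludeSelf: 1              # skip cells where row equals column
    columnAlgorithm: "all"
    excludeChecks:
      # Do not run the check from owamp1 to owamp2
      "owamp1.example.org":
        - "owamp2.example.org"
    statusLabels:
      ok: "Loss is below the warning threshold"
      warning: "Loss is above the warning threshold"
      critical: "Loss is above the critical threshold"
      unknown: "Unable to retrieve data"
      notrun: "Check has not yet run"
```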
dashboards¶
dashboards group grids together. You define them as a list of YAML objects with the following properties:
| Name | Type | Required | Description |
|---|---|---|---|
| name | String | Yes | The name you want displayed as the title of the dashboard. |
| grids | List of YAML Objects | Yes | The list of grids you want included in this dashboard. Each item in the list has one property, name, where you specify the name of the grid. This must map to a name property for one of the defined grids in the configuration file. |
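For instance, a dashboard that shows two grids could be written as in this sketch. The dashboard and grid names are hypothetical; each grid name has to match the name property of a grid defined in the grids section.

```yaml
dashboards:
  - name: "Example Dashboard"
    grids:
      - name: "OWAMP Loss Grid"
      - name: "BWCTL Throughput Grid"
```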
reports¶
reports define patterns that match across one or more cells in a single grid. You define them as a list of YAML objects with the following properties:
| Name | Type | Required | Description |
|---|---|---|---|
| id | String | Yes | The identifier used in grids to reference this rule. |
| rule | YAML Object | Yes | The parent rule object that defines the patterns to match and the “problem” it identifies. |
| rule.type | String | Yes | The type of rule. See Rule Types. |
| rule.selector | YAML Object | Type dependent | An object describing what cells to look at when evaluating the rule. Valid when rule.type is rule or siteRule. |
| rule.match | YAML Object | Type dependent | An object describing how to determine if this pattern should generate a notification. Valid when rule.type is rule or siteRule. |
| rule.problem | YAML Object | Type dependent | An object describing what to do if the rule matches. Valid when rule.type is rule or siteRule. |
| rule.rules | List of YAML Objects | Type dependent | List of rule objects to evaluate. Valid when rule.type is forEachSite, matchAll or matchFirst. |
| rule.site | String | Type dependent | Only valid when rule.type is siteRule. This is the name of the row or column to evaluate. |
| rule.selector.type | String | Yes | The type of selector. See Selector Types. |
| rule.selector.rowSite | String | Yes | The name of the row to select when using a selector of type cell. |
| rule.selector.colSite | String | Yes | The name of the column to select when using a selector of type cell. |
| rule.selector.rowIndex | String | Yes | The index of the check to select when selecting a row in a match of type cell or check. |
| rule.selector.colIndex | String | Yes | The index of the check to select when selecting a column in a match of type cell or check. |
| rule.match.type | String | Yes | The type of match. See Match Types. |
| rule.match.status | Integer | Type dependent | For match types status and statusThreshold, the integer value of the status to match. |
| rule.match.statuses | List of Numbers | Type dependent | For match type statusWeightedThreshold, the list of statuses where the index in the list corresponds to the integer value of the status (starting at 0). |
| rule.match.threshold | Number | Type dependent | For match type statusThreshold, the percentage of checks that must have status; for statusWeightedThreshold, the average weight of each selected check. |
| rule.problem.severity | Integer | Yes | The severity of the problem. 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN. |
| rule.problem.category | String | Yes | A string to define the category of the problem, for example CONFIGURATION or PERFORMANCE. |
| rule.problem.message | String | Yes | A message describing the problem (i.e. what it means when the rules match). |
| rule.problem.solutions | List of Strings | No | A list of potential solutions to the problem. |
Rule Types¶
| Name | Description |
|---|---|
| forEachSite | A site in this context is a group member that is in a row and/or column. This type of rule loops through these unique groupMembers and applies the rule object in the rules property. |
| matchAll | All the sub-rules defined in rules must match for the pattern to match. |
| matchFirst | The first sub-rule defined in rules that matches causes this rule to match. |
| siteRule | A rule that only looks at the site specified by a site property. |
| rule | The fundamental building block of the other rule types. It selects a set of cells, matches them against a set of criteria, and defines the problem a match identifies. |
Selector Types¶
| Name | Description |
|---|---|
| cell | Selects a specific cell for a site given by rowSite and/or colSite. |
| check | Selects an individual check specified by rowIndex and colIndex. |
| column | Selects a column. |
| grid | Selects the entire grid. |
| row | Selects an entire row. |
| site | Selects row and column. |
Match Types¶
| Name | Description |
|---|---|
| status | Match if all the checks selected have the given status. |
| statusThreshold | Match if the percentage of cells specified by threshold have the given status. |
| statusWeightedThreshold | Assign a weight to each status using the statuses array and, if the average weight of the selected cells is above threshold, generate an alarm. |
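To show how rule, selector, match, and problem fit together, here is a sketch of a report that uses matchFirst with two sub-rules over the whole grid. The id, messages, and solutions are hypothetical, and the sketch assumes that check status integers follow the same numbering as the severity values (2 for critical, 3 for unknown).

```yaml
reports:
  - id: "exampleGridReport"
    rule:
      type: "matchFirst"
      rules:
        # First sub-rule: every check in the grid is critical
        - type: "rule"
          selector:
            type: "grid"
          match:
            type: "status"
            status: 2     # assumed integer value for critical
          problem:
            severity: 2
            category: "PERFORMANCE"
            message: "Every check in the grid is critical"
            solutions:
              - "Look for a network issue affecting all hosts"
        # Second sub-rule: every check in the grid is unknown
        - type: "rule"
          selector:
            type: "grid"
          match:
            type: "status"
            status: 3     # assumed integer value for unknown
          problem:
            severity: 3
            category: "CONFIGURATION"
            message: "Every check in the grid is unknown"
            solutions:
              - "Verify that the measurement archive is reachable"
```

A grid would then opt in to this report by setting its reports property to exampleGridReport.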
notifications¶
notifications define when and where to send notifications of problems found in reports. You define them as a list of YAML objects with the following properties:
| Name | Type | Required | Description |
|---|---|---|---|
| name | String | Yes | An identifier that can be any string. Mainly used as a description and may be used by the notification plug-in. |
| type | String | Yes | The type of notification. Current valid values are email or servicenow. |
| schedule | String | Yes | A cron schedule in MIN HOUR DAY-OF-MONTH MONTH DAY-OF-WEEK format. |
| problemReportFrequency | Integer | Yes | Frequency in seconds with which to report the same problem. |
| problemResolveAfter | Integer | Yes | If supported, the length of time to wait after a problem has gone away to mark it as resolved. |
| minimumSeverity | Integer | Yes | The minimum severity a problem must be to generate a notification, where 0=OK, 1=Warning, 2=Critical. |
| filters | List of YAML Objects | Yes | A list of filter objects that define the types of problems for which to generate notifications. |
| filters.type | String | Yes | The type of filter. Valid values are dashboard, grid, site, and category. |
| filters.value | String | Yes | The value to match. For example, if this is of type dashboard then this is the name of the dashboard for which notifications should be generated. |
| parameters | YAML Object | Yes | Type-specific parameters. See Email Parameters and ServiceNow Parameters. |
Email Parameters¶
| Name | Type | Required | Description |
|---|---|---|---|
| mailServer | YAML Object | No | An object describing how to contact your mail server. If not specified, defaults are used. |
| mailServer.address | String | No | The IP or hostname of your mail server. Default is 127.0.0.1. |
| mailServer.port | String | No | The port of your mail server. Default is port 25. |
| mailServer.username | String | No | The username for authenticating to your mail server. Default is none. |
| mailServer.password | String | No | The password for authenticating to your mail server. Default is none. |
| mailServer.useSSL | boolean | No | Indicates whether or not to use SSL when communicating with the mail server. Default is false. |
| from | String | No | The from address of emails sent. |
| replyTo | List of Strings | No | A list of replyTo addresses for emails sent. |
| to | List of Strings | No | A list of addresses where the emails should be sent. |
| cc | List of Strings | No | A list of CC addresses to send copies of the email notifications. |
| bcc | List of Strings | No | A list of BCC addresses to send blind copies of the email notifications. |
| subjectPrefix | String | No | A string to prepend to the email subject. Note that the subject is the name property of the notification definition. |
| format | String | No | The format of the email sent. Valid values are html or text. Default is html. |
| dashboardUrl | String | No | The URL of your dashboard used to add links in emails. If not specified then links will not be included. |
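An email notification combining the general notification properties with the email parameters might look like the sketch below. The addresses, dashboard name, URL, and timing values are hypothetical, and the unit of problemResolveAfter is assumed to be seconds since the table does not state it.

```yaml
notifications:
  - name: "Example MaDDash Alert"
    type: "email"
    schedule: "0 * * * *"           # at minute 0 of every hour
    problemReportFrequency: 86400   # report the same problem at most daily
    problemResolveAfter: 3600       # unit assumed to be seconds
    minimumSeverity: 2              # critical only
    filters:
      - type: "dashboard"
        value: "Example Dashboard"
    parameters:
      mailServer:
        address: "127.0.0.1"
        port: "25"
        useSSL: false
      from: "maddash@example.org"
      to:
        - "noc@example.org"
      subjectPrefix: "[MaDDash] "
      format: "html"
      dashboardUrl: "https://dashboard.example.org/maddash-webui"
```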
ServiceNow Parameters¶
| Name | Type | Required | Description |
|---|---|---|---|
| instance | String | Yes | Name of the ServiceNow instance. |
| oauthFile | String | No | A YAML file containing a combination of the username, password, clientID, and clientSecret. |
| clientID | String | No | The client ID to authenticate to ServiceNow. |
| clientSecret | String | No | The client secret to authenticate to ServiceNow. |
| username | String | No | The username to authenticate to ServiceNow. |
| password | String | No | The password to authenticate to ServiceNow. |
| recordTable | String | No | The name of the table in which to create the record. |
| recordFields | YAML Object | Yes | A YAML object with the fields to set on creation. This is highly dependent on your ServiceNow setup. The keys are the names of the fields. The values support multiple ServiceNow Template Variables. |
| resolveFields | YAML Object | No | A YAML object with the fields to set on resolve. This is highly dependent on your ServiceNow setup. The keys are the names of the fields. The values support multiple ServiceNow Template Variables. |
| dashboardUrl | String | No | The URL of the dashboard. This is used to generate links to the dashboard in created records. |
| duplicateRules | YAML Object | No | Rules used to determine if a record that already exists in ServiceNow matches a problem and what action to take if it does. |
| duplicateRules.identityFields | List of Strings | No | A list of fields that MUST match in both records for them to be considered equal. |
| duplicateRules.rules | List of YAML Objects | No | A list of rules, evaluated in order until one matches, that determine the action to take when a record is found that matches the given identity fields. |
| duplicateRules.rules[N].equalsFields | YAML Object | No | An object where the key is the field to match and the value is what that field must equal for the given updateFields to be applied to the record. |
| duplicateRules.rules[N].gtFields | YAML Object | No | An object where the key is the field to match and the value is what that field must be greater than for the given updateFields to be applied to the record. |
| duplicateRules.rules[N].ltFields | YAML Object | No | An object where the key is the field to match and the value is what that field must be less than for the given updateFields to be applied to the record. |
| duplicateRules.rules[N].updateFields | YAML Object | No | The fields to update in the record and the values they should be given if this rule matches. The values support multiple ServiceNow Template Variables. |
ServiceNow Template Variables¶
| Name | Description |
|---|---|
| %br | A line break. |
| %problemEntity | The source of the problem - either a grid or a site (i.e. a row and/or column). |
| %gridUrl | The URL of the grid. |
| %gridLink | An HTML link to the grid. |
| %siteName | The name of the site (i.e. row and/or column) that is the source of the alarm. |
| %gridName | Name of the grid that is the source of the alarm. |
| %isGlobal | Boolean indicating whether this affects the entire grid. |
| %severity | The severity of the problem. |
| %category | The problem category. |
| %solutions | A bulleted list of potential solutions. |
| %name | The name of the problem. |
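A ServiceNow notification using these parameters and template variables could be sketched as follows. Everything specific here is an assumption: the instance name, the OAuth file path, the table and field names (which depend entirely on your ServiceNow setup), the state values, and the timing values.

```yaml
notifications:
  - name: "Example ServiceNow Alert"
    type: "servicenow"
    schedule: "0 * * * *"
    problemReportFrequency: 86400
    problemResolveAfter: 3600       # unit assumed to be seconds
    minimumSeverity: 2
    filters:
      - type: "category"
        value: "PERFORMANCE"
    parameters:
      instance: "exampleinstance"
      # Hypothetical file holding username, password, clientID and clientSecret
      oauthFile: "/path/to/servicenow-oauth.yaml"
      recordTable: "incident"
      recordFields:
        short_description: "%name"
        description: "%problemEntity%br%category (severity %severity)%br%solutions%br%gridUrl"
      resolveFields:
        state: "6"                  # hypothetical resolved state
      dashboardUrl: "https://dashboard.example.org/maddash-webui"
      duplicateRules:
        identityFields:
          - "short_description"
        rules:
          # If a matching record is not yet resolved, add a note instead of
          # opening a new record
          - ltFields:
              state: "6"
            updateFields:
              work_notes: "Problem still present on %gridName"
```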
General Properties¶
| Name | Type | Required | Description |
|---|---|---|---|
| database | String | Yes | The path to the directory where the database is stored. |
| jobThreadPoolSize | Integer | No | The maximum number of checks that can run in parallel. Defaults to 20. |
| jobBatchSize | Integer | No | The maximum number of checks that can be running or waiting to run in memory. Defaults to 250. |
| disableScheduler | Boolean | No | If set to 1 then the server will only run as a REST server and not execute any new checks. Default is 0. |
| skipTableBuild | Boolean | No | If set to 1 then the database tables will not be built and indexes will not be built/rebuilt. The first time you run the server it must be set to 0. After that, you may find that setting it to 1 significantly speeds up boot time. Leaving it at 0, though, has the advantage of rebuilding indexes on startup, which can improve query performance. |
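These properties sit at the top level of the file. A sketch with the documented defaults written out explicitly follows; the database path is an assumed location, not necessarily the one used by your installation.

```yaml
# Assumed database directory; use the path from your installation
database: "/var/lib/maddash/"
jobThreadPoolSize: 20    # documented default
jobBatchSize: 250        # documented default
disableScheduler: 0      # 0 = run checks as well as the REST server
skipTableBuild: 0        # must be 0 on first start so tables are created
```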
Web Server Properties¶
| Name | Type | Required | Description |
|---|---|---|---|
| serverHost | String | No | The hostname of the interface where the web server should listen. Defaults to localhost. |
| http | YAML Object | Yes (unless https specified) | Parameters related to HTTP. See http properties section. |
| https | YAML Object | Yes (unless http specified) | Parameters related to HTTPS. See https properties section. |
http properties¶
| Name | Type | Required | Description |
|---|---|---|---|
| port | Integer | Yes | The port on which the web server should listen for HTTP connections. |
| proxyMode | String | Yes | Reserved for future use. Currently lets the server know if it is behind a proxy. This may be used in a later implementation to extract headers that forward information related to authentication. |
https properties¶
| Name | Type | Required | Description |
|---|---|---|---|
| port | Integer | Yes | The port on which the web server should listen for HTTPS connections. |
| keystore | String | Yes | The keystore containing the key ‘mykey’ to use as the SSL server certificate. It should also contain any trusted certificates if doing client authentication. |
| keystorePassword | String | Yes | The password to access the keystore. |
| clientAuth | String | Yes | Indicates whether a client to the REST server must have a trusted SSL certificate. Valid values are require, want and off. require means the user MUST have a trusted certificate or the request will be rejected. want means the server will check the certificate if one is presented, but will not reject requests that do not provide one. off means no certificate is required. |
| proxyMode | String | Yes | Reserved for future use. Currently lets the server know if it is behind a proxy. This may be used in a later implementation to extract headers that forward information related to authentication. |
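The web server settings might be combined as in the sketch below. The port numbers, keystore path, and password are hypothetical. proxyMode is left out because the tables above do not list its valid values; consult the default configuration file for a value suitable for your deployment.

```yaml
serverHost: "localhost"
http:
  port: 8881                 # arbitrary example port
https:
  port: 8882                 # arbitrary example port
  # Hypothetical keystore; it must contain the key 'mykey'
  keystore: "/path/to/maddash.jks"
  keystorePassword: "changeit"
  # Quoted so that YAML parsers do not read off as a boolean
  clientAuth: "off"
```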