Checks are used to determine the state or health of infrastructure resources, services, or applications.

Checks are dynamic and customizable: you define a time duration (a time to live, or TTL), and Metricly notifies you when the state of a check changes. You can build checks entirely yourself or use our out-of-the-box (OOTB) checks.

Prerequisites

Before you set up the checks described in this document, you must have installed and configured the latest Windows and/or Linux agent.

Enable Linux Checks

Currently, Metricly comes with three pre-built checks: Heartbeat, Processes, and Ports. These are turnkey checks that do not require any scripting or coding, just simple configuration settings in the respective configuration files.

  1. Make sure the Linux agent is installed.
  2. Metricly checks are enabled via the configuration files included with the agent.
  3. All check configuration files for the Linux agent can be found in /opt/netuitive-agent/conf/collectors.
  4. Some checks are enabled by default; others must be enabled manually.

Heartbeat Check

This check is enabled by default. For additional configuration, modify the following settings in the HeartbeatCollector.conf file, where ttl is the time to live (TTL) of the check, expressed in seconds:

enabled = True
ttl = 120

Port Checks

This check is not enabled by default. To enable Port Checks, update the PortCheckCollector.conf file. Make sure to follow the structure indicated in the file for [port] and [[port_app_name]].

# To enable the collector, set to True.
# Configure the ports that need to be monitored.

# [port] is the parent and [[port_app_name]] are the child entries.  
# Child entries must be listed below the parent as shown below. 

enabled = False
ttl = 150

[port]

[[port_app_name]]
number = 8888
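
To make the idea of a port check concrete, the following Python sketch tries to open a TCP connection to a port and reports whether it succeeded. It is only an illustration of the concept, not the collector's actual implementation; the helper name `port_is_open`, the host `127.0.0.1`, and the one-second timeout are assumptions.

```python
import socket


def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        # create_connection raises OSError if the port is closed or unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8888 mirrors the sample "number" value in the configuration above.
    print("port 8888 open:", port_is_open(8888))
```

Running the script on the monitored host is a quick way to confirm that the application is actually listening on the port you put in the configuration file.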

Process Checks

This check is not enabled by default. Users need to update ProcessCheckCollector.conf to enable the collector and configure the processes that need to be monitored. There are three options for matching a process: process name, process executable or process command line. Note: exe and name are both lists of comma-separated regexps.

enabled = True
ttl = 150

[process]

[[example_process_name]]
name = ".*example_regex.*"

Match by Name:

[[nginx]]
name=^nginx

Match by Executable:

[[postgres]]
exe=^/usr/lib/postgresql/\d+\.\d+/bin/postgres$

Match by Command Line:

[[elasticsearch]]
cmdline=java.*Elasticsearch
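
Before deploying a pattern, it can help to try it against the values you expect a process to report. The Python sketch below does that for the three matching options. It assumes "search" semantics (the pattern may match anywhere unless anchored with ^), which may differ from how the collector applies the expressions; the `matches` helper and the sample process values are hypothetical.

```python
import re

# Sample process attributes; the values are made up for illustration.
process = {
    "name": "nginx",
    "exe": "/usr/sbin/nginx",
    "cmdline": "nginx: master process /usr/sbin/nginx -g daemon on;",
}


def matches(process, name=None, exe=None, cmdline=None):
    """Return True if any supplied regex is found in the corresponding attribute."""
    checks = (("name", name), ("exe", exe), ("cmdline", cmdline))
    return any(
        pattern is not None and re.search(pattern, process[field]) is not None
        for field, pattern in checks
    )


print(matches(process, name=r"^nginx"))                  # True
print(matches(process, cmdline=r"java.*Elasticsearch"))  # False
```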

DNS Checks

The DNS check is not enabled by default. To enable this collector, update the DNSLookupCheckCollector.conf file. Hostnames added to the list must be in the following format: www.google.com. Note that dnsAddressList requires a trailing comma even when only one hostname is provided. If the comma is missing, the collector returns a 'cannot resolve hostname' error.

enabled = true
ttl = 150

#Replace www.google.com and www.yahoo.com with the DNS names you want to check
dnsAddressList = www.google.com, www.yahoo.com
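
One plausible explanation for the comma requirement, offered here as an assumption rather than a description of the agent's code: if the configuration parser returns a list only when a value contains a comma, a single hostname without a comma stays a plain string, and looping over it yields single characters instead of hostnames. The Python sketch below imitates that behaviour; `parse_value` and `hostnames_seen` are hypothetical helpers.

```python
def parse_value(raw):
    """Mimic a parser that returns a list only when the value contains a comma.

    This behaviour is an assumption about the agent's config parser.
    """
    if "," not in raw:
        return raw.strip()
    return [item.strip() for item in raw.split(",") if item.strip()]


def hostnames_seen(value):
    """Iterate the parsed value the way a simple 'for host in value' loop would."""
    return list(value)


print(hostnames_seen(parse_value("www.google.com,")))     # ['www.google.com']
print(hostnames_seen(parse_value("www.google.com"))[:3])  # ['w', 'w', 'w']
```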

Enable Windows Checks

  1. Make sure the Windows agent is installed.
  2. Metricly checks can be enabled via the configuration files included with the agent.
  3. All check configuration files for the Windows agent can be found in C:/Program Files (x86)/CollectdWin/conf/ or C:/Program Files/CollectdWin/conf/ (depending on the version of Windows).
  4. To enable the system checks, change the “enable” setting for ReadSystemChecks from “false” to “true” in the CollectdWin.config file.
  5. To configure the checks, edit the ReadSystemChecks.config file.

Currently, Metricly comes with three pre-built checks: Heartbeat, Processes, and Ports. These are turnkey checks that do not require any scripting or coding, just simple configuration settings in the respective configuration files.


Heartbeat Checks

The Heartbeat check monitors the state of the agent. It is enabled by default, so no additional configuration is required once the Windows checks have been enabled.

To disable this check, open the ../CollectdWin/conf/ReadSystemChecks.config file and change the “EnableAgentHeartbeat” setting to “false”.

Note that each check has a TTL (time to live) timer, which is expressed as a multiple of the agent collection interval. The default agent collection interval is 60 seconds, so a TTLMultiplier of 2.0 means that the check timer expires if no new post has been made to the API within 120 seconds. The minimum value allowed is 1.0, and decimal values are allowed. The intent is for the TTL timer to be slightly longer than the posting frequency of the checks. This leaves some buffer and avoids potential “flapping” due to network latency or processing delays.

<ReadSystemChecks EnableAgentHeartbeat="true" HeartbeatTTLMultiplier="2.0"> </ReadSystemChecks>
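
The TTL arithmetic described above can be sketched in a few lines of Python. The function name `effective_ttl` is hypothetical; the 60-second default interval and the 1.0 minimum come from the text.

```python
def effective_ttl(collection_interval_s=60.0, ttl_multiplier=2.0):
    """Return the check TTL in seconds: collection interval times the multiplier."""
    if ttl_multiplier < 1.0:
        # The documentation states that 1.0 is the minimum allowed value.
        raise ValueError("TTLMultiplier must be at least 1.0")
    return collection_interval_s * ttl_multiplier


print(effective_ttl(60, 2.0))  # 120.0
print(effective_ttl(60, 2.5))  # 150.0
```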

HTTP Checks

Configure HTTP checks to send an HTTP GET request to a URL. If a successful response is returned, a check is sent to Metricly. By default, no HTTP checks are configured.

Add an entry such as the following:

<HttpCheck Name="MyTestHTTPCheck" Url="http://www.google.com" StatusMatches="^(?!4|5)" />

to your ReadSystemChecks.config file, as shown here:

<ReadSystemChecks EnableAgentHeartbeat="true" HeartbeatTTLMultiplier="2.5">
  <Checks>
    <HttpCheck Name="MyTestHTTPCheck" Url="http://www.google.com" StatusMatches="^(?!4|5)" />
  </Checks>
</ReadSystemChecks>

  • Name: This is used as the name of the check in Metricly if Alias is not set.
  • Url: This is the URL to test. A check is sent if an HTTP GET request to the given URL returns a successful response. Redirects are automatically followed.
  • StatusMatches: (optional) A regular expression to evaluate a successful response code. The default expression is “^2” which matches any 2xx code.
    Other examples:
    ^(?!4|5) : any code except 4xx or 5xx
    ^(2|3) : any 2xx or 3xx code
  • AuthHeader:  (optional) An authorization header to send with the request. e.g., “Basic dXNlcm5hbWU6cGFzc3dvcmQ”
  • Alias: (optional) An alias to use for the check name in Metricly.
  • TTLMultiplier: (optional) Sets the time-to-live of the check as a multiple of the agent execution interval. For example, if the agent is configured to collect data every 60 seconds (the default) and the check is configured with a TTLMultiplier of 2.5 (the default) then the next check must be received by Metricly within 150 seconds in order to pass.
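
To see how the StatusMatches expressions listed above classify response codes, you can evaluate them outside the agent. The Python sketch below uses Python's `re` module as a stand-in for the agent's regex engine, which is an assumption, although these particular patterns use only widely supported syntax. The helper name `status_ok` is hypothetical.

```python
import re


def status_ok(status_code, status_matches=r"^2"):
    """Return True if the status code, as text, matches the StatusMatches pattern."""
    return re.search(status_matches, str(status_code)) is not None


# Compare the default pattern with the two alternatives from the list above.
for pattern in (r"^2", r"^(?!4|5)", r"^(2|3)"):
    results = {code: status_ok(code, pattern) for code in (200, 301, 404, 503)}
    print(pattern, results)
```

With the default pattern only the 200 response passes; the other two patterns also accept the 301 response while still rejecting 404 and 503.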

Process and Service Checks

The agent supports “Service” checks which verify whether a Windows Service is in the running state and “Process” checks which verify whether a process of the given name is in the list of processes running on the computer. By default no process or service checks are enabled. To add a new check:

  1. Edit the ../CollectdWin/conf/ReadSystemChecks.config file
  2. Insert a new entry between the … tags:
  3. The check can be either Service or Process.
  4. For a Service check the “Name” setting is the service name. This can be found by opening the service in the Service Control Manager (note that it is the Service Name, not the Display Name).
  5. For a Process check, the “Name” setting is the process name as it appears in the performance monitor process list (this is typically the same as it appears in Task Manager but without the file extension).
  6. (Optional) Set the “TTLMultiplier” to configure the check time-to-live as a multiple of the agent collection interval. For example, if the agent is configured to collect data every 60 seconds (the default) and the check is configured with a TTLMultiplier of 2.5 (the default), then the next check must be received by Metricly within 150 seconds in order to pass. The minimum allowed value is 1.0, but we recommend setting it slightly higher to allow for processing time, network latency, etc.
  7. (Optional) Add an Alias="my check alias" setting to provide an alias for the check received by Metricly. If it is not supplied, the process name is used.
  8. (Advanced) To capture multiple processes with a single check, add UseRegex="true" to the check configuration. With this set to true, the Name field is used as a regular expression instead of an exact match and may match several processes.
<ServiceCheck Name="MSSQLSERVER" Alias="sqlservercheck" TTLMultiplier="2.5"/>
<ProcessCheck Name="Process123" Alias="process123" TTLMultiplier="2.5"/>

Port Checks

No port checks are configured by default. Configuring port checks requires the same steps documented above for the Service and Process checks, except that the Name is simply the check name and the Port must be specified.

<PortCheck Name="ApplicationABC" Port="8081"/>

Create Custom Checks

Our platform is flexible enough to support any custom check, but you need a mechanism to schedule your scripts. Linux cron jobs or the Windows Task Scheduler work for most cases. If you are running on Linux, our agent can also schedule your scripts via the Users Scripts Integration. This option lets you schedule a script that posts a system check, a time-series metric value, or even text-based data to our REST API, and it removes the need for a separate scheduler or a loop function. The agent executes the script on the same cycle as the data collection (ex. 60 seconds).

Only four parameters are required to send a custom check:

  1. apiId – This is the API key, which can be found by clicking the corresponding integration (ex. Windows or Linux) on the integration page of our product once you are logged in. We suggest using the apiId associated with the element type (ex. Linux).
  2. checkName – This is any name you want to give the check. It is the name that shows up in the user interface and that you would use to create an alerting policy.
  3. elementFqn – This is the name of the element (ex. Linux hostname) that you want to associate with the system check. For example, if you are checking whether an application is running on Server123, you would set the elementFqn to “Server123”. If the element does not exist in the system, we will create an element with the type “check” and add it to the system. It is best practice to always associate a check with a monitored element.
  4. TTL – The value is in seconds. It is the amount of time within which you expect to see a response back from your check. This time can exactly match the interval of your scheduled job; for example, the check could run every 60 seconds and have a 60-second TTL. However, we suggest adding a buffer to absorb small latencies (ex. network or DNS delay) and prevent “check flapping”. A better example would be to run the check on a 60-second cycle and set the TTL to 90 seconds.

Endpoint URL
The URL format:

https://api.app.metricly.com/check/{apiId}/{checkName}/{elementFqn}/{ttl}

CURL Example
For example, if you had a daily backup running on host db1234 you could run the following at the end of your backup script:

curl -X POST https://api.app.metricly.com/check/00000000000000000000000000000000/dailybackup/db1234/90000

25 hours (90000 seconds) is used as the check TTL to provide a one hour buffer in case backup times fluctuate a bit, reducing false alarms.
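
If your scheduled job is written in Python rather than shell, the same request can be sketched with the standard library. The helper names `build_check_url` and `post_check` are hypothetical, the API key is the same placeholder used in the curl example, and URL-encoding each path segment and sending an empty request body are assumptions about what the endpoint accepts.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.app.metricly.com/check"


def build_check_url(api_id, check_name, element_fqn, ttl_seconds):
    """Assemble /check/{apiId}/{checkName}/{elementFqn}/{ttl}, URL-encoding each part."""
    parts = [api_id, check_name, element_fqn, str(int(ttl_seconds))]
    return BASE_URL + "/" + "/".join(urllib.parse.quote(p, safe="") for p in parts)


def post_check(api_id, check_name, element_fqn, ttl_seconds, timeout=10):
    """POST the check with an empty body and return the HTTP status code."""
    url = build_check_url(api_id, check_name, element_fqn, ttl_seconds)
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder API key, as in the curl example; replace it before posting.
    print(build_check_url("0" * 32, "dailybackup", "db1234", 25 * 3600))
```

Calling `post_check(...)` as the last step of the backup script plays the same role as the curl command: if the script fails before reaching that line, no check is posted and the TTL expires.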

Find Your Checks

Saved filters are a good way to find your checks. All checks are tagged with the key n.checks and the check name as the value.

  1. Click Type (squared in green) and select either SERVER (for Linux) or WINSRV (for Windows) to pick where the check has been configured.
  2. Navigate to MoreTag (squared in blue) and search for n.checks.
  3. Select checks by name.
  4. Click Save Filter (squared in green) and input a filter name (squared in blue).

Alerting on System Checks

Setting up an alert in Metricly requires the creation of a policy, and system checks are no exception. Any check coming into the system can have a corresponding alert as well as a notification.

    1. Click Policies and select “New Policy”.
    2. Name the policy and apply any scoping or filtering required (for example, narrowing the scope to WinServ in the US-West region with the tag Environment:Production).
    3. Click “Conditions”, then “Add Condition”, and select “Add System Check Condition” from the drop-down.
    4. Select the particular check. As long as the check has been posted to the API at least once, it shows up in this menu automatically. Then save the policy.
    5. To add notifications, click the notifications tab and select the notification type.