Checks are used to determine the state or health of infrastructure resources, services, or applications.
Checks are dynamic and customizable: you define a time duration for a check, and Metricly notifies you when the check's state changes. You can build checks entirely yourself or use our out-of-the-box (OOTB) checks.
Enable Linux Checks
Currently, Metricly comes with three pre-built checks: Heartbeat, Processes, and Ports. These are turnkey checks that do not require any scripting or coding, just simple settings in their respective configuration files.
- Make sure the Linux agent is installed.
- Metricly checks can be enabled via the configuration files included with the agent.
- All checks configuration files for the Linux agent can be found in
- Some checks are enabled by default; others you must enable yourself.
The Heartbeat check is enabled by default. For additional configuration, you can modify the following settings in the HeartbeatCollector.conf file, where ttl is the check's time to live (TTL), expressed in seconds:
    enabled = True
    ttl = 120
The Processes check is not enabled by default. To use it, update ProcessCheckCollector.conf to enable the collector and configure the processes to monitor. There are three options for matching a process: process name, process executable, or process command line. Note: exe and name are both lists of comma-separated regexps.
    enabled = True
    ttl = 150
    [process]
    [[example_process_name]]
    name = ".*example_regex.*"
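To get a feel for what a pattern like the one in the configuration above would select, the following Python sketch applies a comma-separated list of regexps to a few made-up process names. The helper `matches_any`, the sample names, and the use of an unanchored `re.search` are assumptions made for illustration; the agent's actual matching rules may differ.

```python
import re

def matches_any(patterns_csv, candidate):
    """Return True if candidate matches any regexp in a comma-separated list."""
    patterns = [p.strip() for p in patterns_csv.split(",") if p.strip()]
    return any(re.search(p, candidate) for p in patterns)

# Hypothetical process names, for illustration only.
running = ["nginx", "my_example_regex_worker", "sshd"]

# Same pattern as the "name" setting in the sample configuration.
matched = [name for name in running if matches_any(".*example_regex.*", name)]
print(matched)
```

Trying a pattern against known process names like this before putting it in the configuration file can help avoid a check that silently matches nothing.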
Match by Name:
Match by Executable:
Match by Command Line:
Enable Windows Checks
- Make sure the Windows agent is installed.
- Metricly checks can be enabled via the configuration files included with the agent.
- All check configuration files for the Windows agent can be found in C:/Program Files (x86)/CollectdWin/conf/ or C:/Program Files/CollectdWin/conf/ (depending on the version of Windows).
- Simply change the “enable” setting for the ReadSystemChecks from “false” to “true” in the CollectdWin.config file to enable the system checks.
- To configure the checks, edit the ReadSystemChecks.config file.
The Heartbeat check monitors the state of the agent. It is enabled by default, so no additional configuration is required once the Windows checks have been enabled.
To disable this check, open ../CollectdWin/conf/ReadSystemChecks.config and change the “EnableAgentHeartbeat” setting to “false”.
Note that each check has a TTL (time to live) timer, which is expressed as a multiple of the agent collection interval. The default agent collection interval is 60 seconds, so a TTLMultiplier of 2.0 means that the check timer expires if no new post has been made to the API within 120 seconds. The minimum value allowed is 1.0, and decimal values are allowed. The intent is to make the TTL timer slightly longer than the posting interval for the checks. This leaves some buffer and avoids potential “flapping” due to network latency or processing delays.
<ReadSystemChecks EnableAgentHeartbeat="true" HeartbeatTTLMultiplier="2.0"> </ReadSystemChecks>
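The TTL arithmetic described above can be sketched in a few lines of Python. The function name `effective_ttl_seconds` is hypothetical and exists only for this illustration; it multiplies the collection interval by the multiplier and rejects values below the documented minimum of 1.0.

```python
def effective_ttl_seconds(collection_interval_s, ttl_multiplier):
    """Return the check TTL in seconds for a given interval and multiplier."""
    if ttl_multiplier < 1.0:
        # The documentation states that 1.0 is the minimum allowed value.
        raise ValueError("TTL multiplier must be at least 1.0")
    return collection_interval_s * ttl_multiplier

# Default 60-second interval with a multiplier of 2.0.
print(effective_ttl_seconds(60, 2.0))  # 120.0
```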
Service and Process checks are not enabled by default. The agent supports “Service” checks, which verify whether a Windows Service is in the running state, and “Process” checks, which verify whether a process of the given name is in the list of processes running on the computer. To add a new check:
- Edit the ../CollectdWin/conf/ReadSystemChecks.config file
- Insert a new entry between the … tags:
- The check can be either Service or Process.
- For a Service check the “Name” setting is the service name. This can be found by opening the service in the Service Control Manager (note that it is the Service Name, not the Display Name).
- For a Process check the “Name” setting is the process name as it appears in the performance monitor process list (this is typically the same as it appears in Task Manager but without the file extension).
- (Optional) Set the “TTLMultiplier” to configure the check time-to-live as a multiple of the agent collection interval. For example, if the agent is configured to collect data every 60 seconds (the default) and the check is configured with a TTLMultiplier of 1.2 (the default), then the next check must be received by Metricly within 72 seconds in order to pass. The minimum allowed value is 1.0, but we recommend setting it slightly higher to allow for processing time, network latency, and so on.
- (Optional) You can add an Alias=”my check alias” setting to provide an alias for the check received by Metricly. If it is not supplied, the process name is used.
- (Advanced) To capture multiple processes with a single check you can add UseRegex=”true” to the check configuration. With this set to true the Name field is used as a regular expression instead of an exact match and may match several processes.
<ServiceCheck Name="MSSQLSERVER" Alias="sqlservercheck" TTLMultiplier="1.2"/>
<ProcessCheck Name="Process123" Alias="My Process" TTLMultiplier="1.5"/>
Port checks are not enabled by default. Configuring a port check requires the same steps documented above for the Service and Process checks, except that the Name is simply the check name and the Port must be specified:
<PortCheck Name="ApplicationABC" Port="8081"/>
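This page does not describe how the agent tests a port. As a rough mental model only, a port check can be thought of as an attempt to open a TCP connection, as in the Python sketch below. The function `port_is_open`, the host `127.0.0.1`, and the two-second timeout are assumptions for illustration and are not taken from the agent.

```python
import socket

def port_is_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

# Port 8081 matches the sample PortCheck entry; the result depends on the machine.
print(port_is_open("127.0.0.1", 8081))
```

Running something like this by hand on the monitored host can help confirm that the configured port is the one the application actually listens on.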
Create Custom Checks
Our platform is flexible enough to support any custom check, but you need a mechanism to schedule your scripts. Linux cron jobs or the Windows Task Scheduler work for most cases. If you are running on Linux, our agent can also schedule your scripts via the Users Scripts Integration. With this option, a scheduled script can post a system check, a time-series metric value, or even text-based data to our REST API as its output, which removes the need for a separate scheduler or a loop function. The agent executes the script on the same cycle as data collection (ex. 60 seconds).
Only four parameters are required to send custom checks:
- apiId – This is the API key that can be found by clicking on the corresponding integration (ex. Windows or Linux) on the integration page of our product once you are logged in. We suggest using the apiId associated with the element type (ex. Linux)
- checkName – This is any name you want to give the check. It is the name that will show up in the user interface which you would also use to create an alerting policy.
- elementFqn – This is the name of the element (ex. Linux hostname) that you want to associate with the system check. For example, if you are checking whether an application is running on Server123, you would set the elementFqn to “Server123”. If the element does not exist in the system, we will create an element with the type “check” and add it to the system. It is best practice to always associate a check with some monitored element.
- TTL – The value is in seconds. It is the amount of time within which you expect to see a response from your check. This time can exactly match the interval of the scheduled job; for example, the check could run every 60 seconds and have a TTL of 60. But we suggest adding a buffer to absorb small delays (ex. network or DNS latency) and prevent “check flapping”. A better setup would be to run the check on a 60-second cycle and set the TTL to 90 seconds.
The URL format:
For example, if you had a daily backup running on host db1234 you could run the following at the end of your backup script:
curl -X POST https://api.app.netuitive.com/check/00000000000000000000000000000000/dailybackup/db1234/90000
25 hours (90000 seconds) is used as the check TTL to provide a one hour buffer in case backup times fluctuate a bit, reducing false alarms.
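The same request can be assembled in a script. The Python sketch below builds the URL from the four parameters by following the pattern visible in the curl example above (apiId, checkName, elementFqn, TTL, in that order). The function names `build_check_url` and `post_check` are hypothetical, and percent-encoding the path segments is an assumption rather than a documented requirement of the API.

```python
import urllib.request
from urllib.parse import quote

# Base URL taken from the curl example above.
API_BASE = "https://api.app.netuitive.com/check"

def build_check_url(api_id, check_name, element_fqn, ttl_seconds):
    """Join the four check parameters into a URL, one path segment each."""
    parts = [api_id, check_name, element_fqn, str(int(ttl_seconds))]
    # Encoding each segment is an assumption, useful if a name contains spaces.
    return API_BASE + "/" + "/".join(quote(p, safe="") for p in parts)

def post_check(url, timeout_s=10):
    """Send an empty POST to the check URL and return the HTTP status code."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status

# 25 hours expressed in seconds, as in the daily backup example.
url = build_check_url(
    "00000000000000000000000000000000", "dailybackup", "db1234", 25 * 60 * 60
)
print(url)
# post_check(url)  # uncomment to send; requires a real apiId and network access
```

Computing the TTL as `25 * 60 * 60` keeps the one-hour buffer visible in the script instead of hiding it in a bare number.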
Find Your Checks
Leveraging our saved filters is a good way to find your checks. All checks are tagged with the key
n.checks and value
- Click Type (squared in green) and select either SERVER (for Linux) or WINSRV (for Windows) to pick where the check has been configured.
- Navigate to More > Tag (squared in blue) and search for
- Select checks by name
- Click Save Filter (squared in green) and input a filter name (squared in blue).
Alerting on System Checks
Setting up an alert in Metricly requires the creation of a policy and the system checks are no exception. Any check coming into the system can have a corresponding alert as well as a notification.
- Click on policies and select “New Policy”
- Name the policy and apply any scoping or filtering required (for example, narrowing the scope to WinServ in US-West region with Tag Environment:Production)
- Next click “Conditions”, “Add Condition”, and from the drop down you will see “Add System Check Condition”
- Select the particular check. As long as the check has been posted to the API at least once, it will automatically appear in this menu. Then save, and you are done.
- To add notifications, click the tab, and select the notification type. For more details, see the documentation on configuring notifications.