Metrinome – Continuous Monitoring and Security Validation of Distributed Systems

IV. Modes of Use

The Metrinome framework supports a number of operational use cases, including demonstrations, experimentation, and continuous monitoring.

A. Runtime Visualization During Demonstrations

A major hurdle facing users during a demonstration is presenting a holistic view of system operation while highlighting the specific aspects being demonstrated, such as performance, load balancing, or resistance to security attacks. Metrinome’s GUI equips the demonstrator not only to pinpoint changes in the system as these events occur, but also to visualize the resulting changes in measurements graphically at runtime.

B. Experimentation

Metrinome integrates seamlessly with off-the-shelf continuous integration frameworks, such as Jenkins [17]. Users can easily specify assertions that capture desired system behavior, and Metrinome evaluates these assertions at specific control points within an experiment or at the end of an experiment. Metrinome’s HTTP interface also allows user-controlled, on-demand evaluation of assertions at runtime. The HTTP response indicates whether assertion evaluation passed or failed; in the case of failure, it also includes information about the particular assertions that failed.

When an experiment is complete, Metrinome stores the state of all assertions along with metric values, historical statistics, and metric definitions. This supports offline analysis and reproducibility of experiments, and can also generate inputs to C&A processes. Finally, Metrinome can export the metric data in Comma-Separated Values (CSV) format, which allows administrators to perform customized analysis over the data using spreadsheet and visualization software of their choice.

C. Continuous Monitoring

Continuous monitoring is a desirable feature in enterprise environments because it decreases the time to react to occasional hardware and software failures and minimizes the time to mitigate security attacks such as Denial of Service attacks. While guidance for continuous monitoring is maturing [18], agencies have already started to struggle with compliance, mainly due to implementation costs [19]. Metrinome reduces these costs by integrating with existing logging and auditing frameworks, and it provides ready-to-use dashboard functionality that increases situational awareness at no additional implementation cost.

V. Interfaces

A. Metrinome Language

Metrinome processes received messages according to user-specified processing logic, which is dynamically loaded into the Metrics Server. This processing logic reflects the user’s view of desired system behavior and is declared in terms of metrics and assertions, which users express in an XML-based representation.

Figure 2 shows the XML schema for specifying the processing logic. The metrics element serves as the enclosing element for the entire document, while the section element serves not only to organize metrics and assertions into different clusters, but also to limit the scope of assertions.

Figure 2: Metrinome’s DSL Schema

Thus, an assertion specified for a particular section will not be triggered against metrics in another section.
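
Figure 2 is reproduced only as an image, so the following outline is a rough sketch of how a configuration document is organized. The element names come from the description above, while the nesting and placeholder attributes are illustrative rather than taken from the actual schema:

    <metrics>
      <!-- A section groups related metrics and assertions and bounds assertion scope. -->
      <section name="exampleSection">
        <metric name="exampleMetric" description="What this metric measures">
          <!-- function and event definitions go here -->
        </metric>
        <assert name="exampleAssert">
          <!-- metricRef and function definitions go here -->
        </assert>
      </section>
    </metrics>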

The core of the language consists of two major elements: metric and assert. A metric specifies a measurement evaluation, while an assert, associated with a metric or a set of metrics, specifies the expected system behavior.

A metric element has a unique name and a description that provides information about the metric. An assert element has a unique name and a metricRef element, which uses regular expressions to reference the metric or set of metrics against which the assertion will be evaluated. Both elements contain a function that expresses a statistical calculation to perform. Functions allow the user to configure the actual operation: for metrics, the operation is applied to incoming messages (e.g., counting the number of exceptions that occurred); for assertions, it is specified as a logical expression (e.g., the number of errors is zero).

A function specified as part of an assertion is triggered when an experiment completes or upon request by an external entity. The main purpose of assertion functions is to validate metric values, so they tend to be logical in nature.

A function specified in a metric can be triggered by a single event, which is equivalent to specifying a unary function such as count, or by two separate events (denoted start_event and end_event), which is equivalent to specifying a binary function such as time difference.

An event consists of two parts: a component and a regex. The component identifies the set of processes whose log messages can trigger the event; processes not matched by the component element will not trigger it. The regex specifies the message string to be matched. The processing engine allows regular expressions in both component and regex, enabling concise specification of processes and messages.
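
As an illustration of the event mechanism, a hypothetical latency-style metric using the binary diff function might be sketched as follows; the metric name, component patterns, message patterns, and the exact way the function type is declared are assumptions for illustration only:

    <metric name="request_latency" description="Time between request receipt and response">
      <!-- diff is a binary function, so it takes a start_event and an end_event. -->
      <function type="diff">
        <start_event>
          <component>.*RequestHandler.*</component>
          <regex>Received request.*</regex>
        </start_event>
        <end_event>
          <component>.*RequestHandler.*</component>
          <regex>Sent response.*</regex>
        </end_event>
      </function>
    </metric>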

Finally, a function element can have several attributes:

  • round: rounds the numeric value of a measurement to the specified number of decimal places.
  • roundhistory: similar to round, but applies to the statistical calculations rather than to individual values.

Two special attributes, epochs and colors, indicate the staleness of a measurement as observed by the Metrics Server and can be customized per metric. A user can specify a staleness threshold and an associated severity color, which is then highlighted in the HTML GUI.
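
For instance, these attributes could be combined on a single function roughly as follows; the attribute values, and the exact way epochs and colors pair staleness thresholds with severity colors, are hypothetical:

    <!-- Round values to 2 decimal places; flag measurements older than 30 s yellow and older than 120 s red. -->
    <function type="count" round="2" roundhistory="2" epochs="30,120" colors="yellow,red">
      <!-- event definition goes here -->
    </function>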

Metrinome provides a set of predefined functions for computing metrics, including the following:

  • count: counts the number of occurrences of an expression,
  • ratio: provides the ratio of two expressions,
  • diff: calculates the time difference between two events,
  • absdiff: calculates the absolute time difference between two events, and
  • sum: calculates the sum of two expressions.

Examples of assertion functions are equals, greater than, less than, and greater than or equal. The library of functions can be easily extended to support additional functions, which currently requires changes to the Metrics Server but not to monitored processes.

Figure 3: Example Metric: Count

Figure 4: Example Assertion: No Out Of Memory Error

Figure 5: Example Assertion: All metrics except processing and error metrics should be greater than zero.

Figure 3 shows an example security assessment metric called ‘reqAuthz_pass’, which counts the number of requests that failed authorization, generated by processes whose descriptive names contain ‘CoTToPubSvc’ or ‘CoTToSubSvc’ (the names under which they send log messages to the Metrics Server). This metric is especially useful for testing an application’s authorization processing under high load or automated attacks.
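
Figure 3 appears only as an image; a hypothetical XML rendering consistent with this description might look like the following, where the message pattern and the exact function syntax are illustrative:

    <metric name="reqAuthz_pass" description="Requests that failed authorization">
      <function type="count">
        <!-- Only log messages from the CoT publish/subscribe services trigger this event. -->
        <event>
          <component>.*(CoTToPubSvc|CoTToSubSvc).*</component>
          <regex>.*authorization failed.*</regex>
        </event>
      </function>
    </metric>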

Figure 4 shows a simple assertion over a metric called ‘error_outOfMemoryErrors’. As the name indicates, this assertion is useful for verifying that a system experienced no out-of-memory errors.
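
Again as a hypothetical rendering of the figure (the function syntax and the way the comparison value is encoded are assumptions):

    <assert name="noOutOfMemoryErrors">
      <!-- Reference the single metric to be checked. -->
      <metricRef>error_outOfMemoryErrors</metricRef>
      <!-- Assert that the metric value equals zero. -->
      <function type="equals">0</function>
    </assert>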

Another example, shown in Figure 5, highlights an assertion that checks the correct functioning of the system under evaluation. The assertion uses regular expressions to state that all metrics except those containing ‘error’ or ‘processing’ in their names should have values greater than zero.
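
A hypothetical rendering of Figure 5 could use a negative lookahead in the metricRef to exclude the error and processing metrics; as before, the exact function syntax is illustrative:

    <assert name="nonErrorMetricsPositive">
      <!-- Match every metric whose name contains neither "error" nor "processing". -->
      <metricRef>^(?!.*(error|processing)).*$</metricRef>
      <!-- Each matched metric must have a value greater than zero. -->
      <function type="greaterthan">0</function>
    </assert>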

B. HTML User Interface

Figure 6 displays a screenshot of the GUI. The first column shows the name of each metric as specified in the configuration file, and the next column shows its latest measurement. By default, Metrinome also provides statistical information such as the average, median, and standard deviation across historical values. The last column graphs the changes in the metric’s value over time, which makes it easy to quickly pinpoint measurement anomalies. Users can view metrics without the graphs by clicking on the “Metrics” link.

C. Metrinome API Interface

The service API consists of an Assertion service and a Metrics service, both accessible via HTTP.

The Assertion service offers the following functionality:

  • HTTP GET http://localhost:8080/assertions
    • Triggers evaluation of assertions against the current state of the metrics. Success is returned as an HTTP response code of 204; otherwise, the list of failed assertions is encoded as an XML payload in the HTTP response (see the illustrative sketch after this list).
  • HTTP GET http://localhost:8080/assertions?SHOWDEFS
    • Displays a table of current assertion definitions.
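
The article does not show the failed-assertion payload itself; purely as an illustration of the kind of information it carries (the names of the assertions that failed), such a response might resemble:

    <!-- Hypothetical shape only; the actual schema is defined by the Metrics Server. -->
    <failedAssertions>
      <assertion name="noOutOfMemoryErrors"/>
      <assertion name="nonErrorMetricsPositive"/>
    </failedAssertions>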

The Metrics service offers the following API:

  • HTTP GET http://localhost:8080/metrics?CSV
    • Returns the metrics values in a CSV format.
  • HTTP GET http://localhost:8080/metrics?EVENTS
    • Returns the collected events that were used to generate the metrics.

Figure 6: Metrinome’s Metrics with Graphs Interface
