Chapter 2 Introduction

The MultiNav R package provides facilities for visualizing multivariate data with focus on tools custom made for unsupervised anomaly detection tasks, such as:

  • EDA - Exploratory Data Analysis.
  • Evaluating different methods of anomaly scores.
  • Given an anomaly: understand the nature of the anomaly.
  • Tracking anomaly scores levels over time.

The MultiNav R package contains variety of custom displays with interactive visualizations as well as data pre-processing functions with algorithms from several fields. These include multivariate statistics, robust statistics, process control, unsupervised machine learning, and network analysis.

The package is developed based on learning’s from data analysis projects of industrial IoT real-world use cases. It is developed as part of a PhD study.

MultiNav is powered by an R backend, with javascript + D3.js frontend (enabled by utilizing HTMLwidgets).

2.1 Concept

The MultiNav package is designed to work with multivariate streaming data coming from sensors network, where all variables are continuous and from the same scale. But usage can be extended to any general multivariate dataset with continuous variables.

  • n - number of observations. In our use cases, typically the observations are sensors readings from different equally spaced time periods. Row name will be used as unique identifier for the observation. Row names should be numeric.
  • p - number of variables. In our use cases, typically the variables are readings from different sensors (each variable contains data from a different sensor). A unique numeric column name should be assigned to each variable.

  • Snapshot Scores vs Sliding Temporal Window: There are two modes of analysis. “Snapshot Scores” mode - the input data is analyzed as one window. On the other hand, in the “Sliding Temporal Window” mode, we use the sliding window approach. Each sliding window is analyzed separately.

Important: Missing values should be handled prior to using MultiNav functions. Any records with missing values will be automatically omitted. It is recommended to handle missing values as a pre-processing step (for example by interpolations and extrapolations).

2.2 Definitions / Terminology

  • Scoring - Each anomaly method typically calculates “outlyingness” score for each sensor (variable) in the dataset. Lower scores indicate normal sensors and high scores reflect outliers.

  • Thresholding - Cut-off values of anomaly scores in order to label sensors into anomaly classes.

  • Anomaly / Outlier classification / Filtering - The process of giving anomaly scores and applying a threshold to identify “Outlier” sensors, assigning each sensor a label. Typically classifying the sensors into binary class of “Outlier” and “Normal”.

  • Signal Interpretation - Understanding why an instance (in our use-case a sensor) was given an anomalous score.

  • Sliding Temporal Window - Anomalies are declared based on readings from recent history. This is achieved by choosing a time window of a fixed length, and sliding it along the streaming dataset. The anomaly scores are calculated per sliding window. Like any multi-period process-control algorithm, the window’s length balances detection power against speed of detection.

2.3 Development

The package is currently in active development. To get updates on package development news, subscribe for the MultiNav updates mailing list.

Testing, feedback and ideas are welcomed (via the github repository).

In the future, additional functionality may be added:

  • Support more unsupervised anomaly detection tasks, such as:
    • Evaluating thresholds for anomaly classification.
    • Ensemble anomaly scores.
    • Data pre-processing and transformation.
  • Additional functionality for multivariate exploratory data analysis.
  • Tools for unsupervised clustering tasks.

2.4 Installation

Stable version from CRAN - not available yet (coming soon …).

install.packages("MultiNav")

Or development version from GitHub

#install.packages("devtools")
devtools::install_github("efratvil/MultiNav")