Assessment of outliers in statistical data analysis


Onoz B., Oguz B.

NATO Advanced Research Workshop on Integrated Technologies for Environmental Monitoring and Information Production, Marmaris, Turkey, 10 - 16 September 2001, vol.23, pp.173-180

  • Publication Type: Conference Paper / Full Text
  • Volume: 23
  • City: Marmaris
  • Country: Turkey
  • Page Numbers: pp.173-180

Abstract

The first step in any statistical data analysis is to check whether the data are appropriate for the analysis. Outliers are an important and unavoidable problem at this stage, so they must be defined and treated if the data are to be managed properly. Tests for outliers are well established in the statistical literature. Methods for processing outliers are entirely relative, that is, relative to the basic model, so that examining an outlier allows a more appropriate model to be formulated. The two extreme choices in the analysis of outliers are to reject them, at the risk of losing genuine information, or to include them, at the risk of contamination. Outliers can be treated in four possible ways: they can be rejected as erroneous, identified as important, tolerated within the analysis, or incorporated into the analysis. This treatment can be carried out by two different approaches. The first, accommodation procedures, uses 'robust' methods of inference that employ all the data but minimize the influence of any outliers. The second, discordancy procedures, 'tests' an outlier with the prospect of either rejecting it from the data set or 'identifying' it as a feature of special interest. Outlier analysis can be performed for multivariate as well as univariate cases. In this study, different techniques for the statistical analysis of outliers are considered, and some examples are given.
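
The contrast between accommodation and discordancy procedures can be illustrated with a minimal univariate sketch. The abstract does not name the specific techniques used in the paper, so the choices here are assumptions made for illustration only: the median and the scaled median absolute deviation (MAD) stand in for a 'robust' accommodation method, and a Grubbs-type statistic (largest absolute deviation from the mean, in units of the sample standard deviation) stands in for a discordancy test. The function names and the sample data are hypothetical.

```python
import statistics


def robust_summary(data):
    """Accommodation (assumed example): keep every observation but use
    estimators that outliers can barely influence."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    # 1.4826 rescales the MAD so that it estimates the standard deviation
    # when the underlying data are normally distributed.
    return med, 1.4826 * mad


def grubbs_statistic(data):
    """Discordancy (assumed example): return the most extreme observation
    and its Grubbs-type statistic G = max|x - mean| / s."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1)
    suspect = max(data, key=lambda x: abs(x - mean))
    return suspect, abs(suspect - mean) / sd


def is_discordant(data, critical_value):
    """Compare G with a critical value supplied by the caller.

    The critical value depends on the sample size and the chosen
    significance level and must be taken from a table for Grubbs' test;
    it is deliberately not hard-coded here.
    """
    suspect, g = grubbs_statistic(data)
    return suspect, g > critical_value


if __name__ == "__main__":
    sample = [9.8, 10.1, 10.0, 9.9, 10.2, 15.0]  # hypothetical measurements
    print("mean, stdev:", statistics.fmean(sample), statistics.stdev(sample))
    print("median, scaled MAD:", robust_summary(sample))
    print("suspect, G:", grubbs_statistic(sample))
```

In this sketch the single extreme value pulls the mean and standard deviation well away from the bulk of the data, while the median and scaled MAD remain close to what the other five values suggest. Whether the suspect value is then rejected or flagged as a feature of special interest depends on comparing G with a tabulated critical value for the sample size in question.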