Missing values and outliers are frequently encountered during the data collection phase of observational or experimental studies in all fields of the natural and social sciences. Missing values can arise from information loss as well as from dropouts and nonresponses of study participants. The presence of missing values leads to a smaller sample size than intended and eventually compromises the reliability of the study results. Missing values can also produce biased results when inferences about a population are drawn from such a sample, undermining the reliability of the data. As part of the pretreatment process, missing data are either ignored in favor of simplicity or replaced with substituted values estimated with a statistical method. In general, the analysis of missing values involves weighing efficiency, the handling of missing data and the resulting complexity of the analysis, and the bias between missing and observed values.
The other problem is that of outliers, which are extreme values that lie abnormally outside the overall pattern of a distribution of variables. For example, when weight data are collected, a value of 250 kg cannot fit into the normal distribution of weights; it thus represents an outlier. Outliers result from various factors, including participant response errors and data entry errors. In a distribution of variables, outliers lie far from the majority of the other data points because the corresponding values are extreme or abnormal. Outliers contained in sample data introduce bias into statistical estimates such as mean values, leading to under- or overestimated values. Dealing with outliers is essential before analyzing a data set that contains them. This involves modifying outliers after identifying their sources or replacing them with substituted values. The different approaches to handling missing values and outliers can drastically change the results of data analysis. Therefore, adequate treatment of missing data and outliers is crucial for analysis. In this review paper, we discuss the types of missing values and the different methods used to identify outliers and to handle missing values and outliers efficiently.
Given that outliers are data points lying far away from the majority of other data points, outliers in data that are not normally distributed do not require identification. As most statistical tests assume that data are normally distributed, outlier identification should precede data analysis. Different methods are used to identify outliers in a normal distribution.
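The minimal Python sketch below contrasts the two pretreatment options for missing data discussed in this section. The first option ignores missing entries and keeps only the observed values. The second replaces each missing entry with the mean of the observed values. Mean substitution is only one of many statistical substitution methods and is used here purely as an illustration. The helper names `complete_cases` and `mean_impute` and the sample data are hypothetical, and missing values are assumed to be encoded as `None`.

```python
import statistics
from typing import List, Optional


def complete_cases(values: List[Optional[float]]) -> List[float]:
    """Ignore missing values: keep only the observed entries."""
    return [v for v in values if v is not None]


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace each missing entry with the mean of the observed entries (hypothetical helper)."""
    fill = statistics.mean(complete_cases(values))
    return [fill if v is None else v for v in values]


# Hypothetical measurements; None marks a missing value (e.g. a nonresponse).
data = [5.1, None, 4.8, 6.0, None, 5.5]

deleted = complete_cases(data)
imputed = mean_impute(data)

print(len(data), len(deleted), len(imputed))  # 6 4 6
# Mean substitution keeps the sample size but shrinks the variance,
# because every imputed value sits exactly at the observed mean.
print(statistics.variance(deleted), statistics.variance(imputed))
```

Ignoring the missing entries shrinks the sample from six values to four. Mean substitution keeps all six, but it understates variability. This is the trade-off between simplicity, sample size, and bias that the choice of handling method has to weigh.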
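The following sketch, written under stated assumptions, illustrates how an outlier in approximately normal data can be flagged and how it distorts the mean. It computes z-scores (each value's distance from the sample mean in standard deviations) and flags values with |z| > 3. The z-score rule and the cutoff of 3 are a common convention assumed here, not a method prescribed by this review. The function names and all weights except the 250 kg value are hypothetical.

```python
import statistics
from typing import List


def z_scores(values: List[float]) -> List[float]:
    """Standardize each value by the sample mean and sample standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def flag_outliers(values: List[float], threshold: float = 3.0) -> List[float]:
    """Return the values whose absolute z-score exceeds the threshold (hypothetical helper)."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical body weights in kg, plus the implausible 250 kg entry.
weights = [62, 65, 68, 70, 71, 72, 74, 75, 77, 80,
           63, 66, 69, 70, 73, 76, 78, 81, 67, 72, 250]

outliers = flag_outliers(weights)
cleaned = [w for w in weights if w not in outliers]

print(outliers)                            # [250]
print(round(statistics.mean(weights), 2))  # 79.95 (mean inflated by the outlier)
print(round(statistics.mean(cleaned), 2))  # 71.45
```

Removing the single 250 kg entry lowers the mean from about 79.95 kg to 71.45 kg. This illustrates the overestimation of mean values that outliers cause. The outlier also inflates the standard deviation used to compute its own z-score. With very small samples, this rule may therefore fail to flag it: with n values, no |z| can exceed (n − 1)/√n.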