Descriptive Statistics in R

Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way. It does not allow us to draw conclusions beyond the data we have analyzed, or to reach conclusions about any hypothesis we have made.

Descriptive statistics are generally of two types:-

  • Measures of central tendency
  • Measures of Variability

Measures of central tendency:- Measures of central tendency are numbers that describe what is average or typical within a distribution of data. While they are all measures of central tendency, each is calculated differently and measures something different from the others.

Measures of Variability:- These show how the data within the set is “spread out” (or “dispersed”, or “scattered”). If the data is clustered around the center value, the “spread” is small. The further the data values lie from the center value, the greater the “spread”.


Mean or average is the sum of all the elements of a data set divided by the number of elements in the data set. The mean can be treated as a collective property of the whole set of values.

Example :-
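
A minimal sketch of the definition above, assuming a small made-up numeric vector `x` (the values are illustrative only and do not come from a real data set). It computes the mean by hand and with R's built-in `mean()` function:

```r
# Hypothetical sample data, chosen only for illustration
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

# Sum of all the elements divided by the number of elements
manual_mean <- sum(x) / length(x)

# Built-in equivalent
builtin_mean <- mean(x)

print(manual_mean)    # 8.22
print(builtin_mean)   # 8.22
```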


Median is the middle value of a sorted set. If the set consists of an odd number of values, the middle value is the median, and if the set consists of an even number of values, the median is the average of the two middle values. The median separates a set of data into two halves.

If the vector contains missing values (NA), use na.rm=TRUE to ignore them.

na.rm=TRUE:- missing values are dropped from the vector before the calculation is performed. Without it, functions such as mean() and median() return NA.
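
The odd and even cases and the effect of `na.rm` can be sketched as follows. The three vectors are hypothetical and exist only to make each case visible:

```r
# Hypothetical vectors for illustration
odd_set  <- c(5, 1, 9, 3, 7)        # sorted: 1 3 5 7 9
even_set <- c(5, 1, 9, 3, 7, 11)    # sorted: 1 3 5 7 9 11
with_na  <- c(5, 1, NA, 9, 3, 7)    # same as odd_set plus one missing value

median(odd_set)                  # 5, the single middle value
median(even_set)                 # 6, the average of the two middle values 5 and 7
median(with_na)                  # NA, because one value is missing
median(with_na, na.rm = TRUE)    # 5, the NA is dropped before the calculation
```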



Mode is the most frequently occurring value in a distribution. R has no built-in function for the statistical mode (the built-in mode() reports the storage type of an object instead), so a user-defined function is needed.
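
One possible user-defined function is sketched below. The name `get_mode` and the sample data are assumptions made for this example, not the author's original code. When several values tie for the highest count, this version returns all of them:

```r
# Hypothetical helper: returns the most frequent value(s) of a vector
get_mode <- function(v) {
  uniq <- unique(v)                     # distinct values, in order of first appearance
  counts <- tabulate(match(v, uniq))    # how often each distinct value occurs
  uniq[counts == max(counts)]           # every value tied for the highest count
}

# Hypothetical sample data
x <- c(2, 1, 2, 3, 1, 2, 3, 4, 1, 5, 5, 3, 2)
get_mode(x)                               # 2 (it occurs four times)

# Works for character vectors too; ties are all returned
get_mode(c("a", "b", "b", "c", "c"))      # "b" "c"
```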


Range:-

Range of a variable is the difference between its largest and smallest data values. It is a measure of how far the data as a whole spread in value.
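
As a small sketch with made-up data: note that R's `range()` returns the smallest and largest values rather than their difference, so the difference has to be taken explicitly.

```r
# Hypothetical sample data
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

range(x)                          # -21 54 (minimum and maximum)

range_by_hand <- max(x) - min(x)  # largest value minus smallest value
range_by_diff <- diff(range(x))   # same result, built from range()

print(range_by_hand)              # 75
print(range_by_diff)              # 75
```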



There are several quartiles of an observation variable. The first quartile, or lower quartile, is the value that cuts off the first 25% of the data when it is sorted in ascending order. The second quartile, or median, is the value that cuts off the first 50%. The third quartile, or upper quartile, is the value that cuts off the first 75%.
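
The quartiles can be obtained with `quantile()`, as in the sketch below on a made-up vector. One caveat: `quantile()` supports several interpolation rules through its `type` argument, so other software, or another `type`, may give slightly different quartiles for small samples.

```r
# Hypothetical sample data; quantile() sorts internally
x <- c(13, 1, 9, 17, 5, 3, 15, 7, 11)

# With no probs argument, quantile() returns the minimum,
# the three quartiles and the maximum
q <- quantile(x)
print(q)
#   0%  25%  50%  75% 100%
#    1    5    9   13   17

q[["25%"]]   # first (lower) quartile: 5
q[["50%"]]   # second quartile, equal to median(x): 9
q[["75%"]]   # third (upper) quartile: 13
```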


Inter Quartile Range:-

The interquartile range is the distance or range between the 25th percentile and the 75th percentile.

Formula:-  Inter quartile Range = Upper Quartile – Lower Quartile
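
A brief sketch of the formula, using a made-up vector. It applies the formula by hand and compares the result with R's built-in `IQR()` function:

```r
# Hypothetical sample data
x <- c(13, 1, 9, 17, 5, 3, 15, 7, 11)

upper_q <- unname(quantile(x, 0.75))   # upper quartile: 13
lower_q <- unname(quantile(x, 0.25))   # lower quartile: 5

iqr_by_hand <- upper_q - lower_q       # Upper Quartile - Lower Quartile
iqr_builtin <- IQR(x)                  # built-in equivalent

print(iqr_by_hand)                     # 8
print(iqr_builtin)                     # 8
```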



The nth percentile of an observation variable is the value that cuts off the first n percent of the data values when they are sorted in ascending order.
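
Percentiles use the same `quantile()` function, with the wanted percentages passed as proportions in `probs`. The data and the chosen percentiles below are assumptions for illustration. Between data points, R interpolates linearly by default:

```r
# Hypothetical sample data
x <- c(13, 1, 9, 17, 5, 3, 15, 7, 11)

# 10th, 50th and 90th percentiles
p <- quantile(x, probs = c(0.10, 0.50, 0.90))
print(p)
#  10%  50%  90%
#  2.6  9.0 15.4
```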


Variance is a measure of the spread between numbers in a data set: it measures how far each number in the set is from the mean. Variance is calculated by taking the difference between each number in the set and the mean, squaring the differences (to make them positive) and dividing the sum of the squares by the number of values in the set.

Formula:-  Variance = Σ (X - μ)² / N

X: individual data point

μ: mean of the data points

N: total number of data points
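
The calculation can be sketched step by step as below, on a made-up vector. One point to be aware of: R's built-in `var()` divides by N - 1 (the sample variance), not by N, so it does not match the population formula exactly.

```r
# Hypothetical sample data with mean 5
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n  <- length(x)                 # N: total number of data points
mu <- mean(x)                   # mean of the data points

sq_diff <- (x - mu)^2           # squared difference of each point from the mean

pop_var    <- sum(sq_diff) / n         # divide by N: 4
sample_var <- sum(sq_diff) / (n - 1)   # divide by N - 1: about 4.57

print(pop_var)
print(sample_var)
print(var(x))                   # same as sample_var, not pop_var
```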


Standard deviation:-

The square root of the variance is called the standard deviation. Many useful interpretations can be made by analyzing the variance in the data. The variance is obtained by:

  1. Finding the difference between the mean and each value in the set.
  2. Squaring those differences.
  3. Adding the squared differences and dividing the sum by the number of values.

A rule of thumb for the standard deviation is that, for data that are approximately normally distributed, about 68% of the values lie within one standard deviation of the mean, about 95% within two standard deviations and about 99.7% within three standard deviations. The rule does not hold for arbitrary distributions.
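
A short sketch with made-up data. Like `var()`, R's `sd()` uses the N - 1 divisor, so the population version is computed by hand here. The second part checks the 68 / 95 / 99.7 figures against the standard normal distribution with `pnorm()`:

```r
# Hypothetical sample data
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

sample_sd <- sd(x)                          # sqrt(var(x)), about 2.14
pop_sd    <- sqrt(mean((x - mean(x))^2))    # divides by N instead: 2

print(sample_sd)
print(pop_sd)

# Share of a normal distribution within 1, 2 and 3 standard deviations of the mean
k <- 1:3
coverage <- pnorm(k) - pnorm(-k)
print(round(coverage, 3))                   # 0.683 0.954 0.997
```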



Skewness describes the asymmetry of a distribution in a set of statistical data, relative to a symmetric distribution such as the normal distribution. Skewness can be negative or positive: with negative skewness the longer tail of the distribution lies to the left of the average, and with positive skewness the longer tail lies to the right.
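
Base R has no skewness function; add-on packages such as e1071 and moments provide one. To stay free of dependencies, the sketch below defines a hypothetical helper, `skewness_moment`, using the simple moment-based formula (third central moment divided by the 1.5 power of the second). Packages may apply small-sample corrections, so their results can differ slightly. The data are made up.

```r
# Hypothetical helper: moment-based skewness, m3 / m2^(3/2)
skewness_moment <- function(v) {
  d <- v - mean(v)                  # deviations from the mean
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Hypothetical data sets
right_tail <- c(1, 2, 2, 3, 3, 3, 4, 20)   # one large value stretches the right tail
left_tail  <- -right_tail                  # mirror image: long left tail
symmetric  <- c(1, 2, 3, 4, 5)

skewness_moment(right_tail)   # positive
skewness_moment(left_tail)    # negative, same magnitude
skewness_moment(symmetric)    # 0
```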



Kurtosis describes the shape of the tails of a distribution relative to those of the normal distribution. A distribution with high kurtosis has fat tails, so extreme values are more common, while a distribution with low kurtosis has skinny tails, with the values staying within a limited distance of the mean.
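
As with skewness, base R has no kurtosis function, while the e1071 and moments packages do. The sketch below assumes the plain moment-based definition (fourth central moment divided by the squared second moment), under which a normal distribution scores 3. The helper name `kurtosis_moment` and the data are hypothetical.

```r
# Hypothetical helper: moment-based kurtosis, m4 / m2^2 (a normal distribution gives 3)
kurtosis_moment <- function(v) {
  d <- v - mean(v)              # deviations from the mean
  mean(d^4) / mean(d^2)^2
}

# Hypothetical data sets
flat  <- 1:10                       # evenly spread values, no extreme observations
heavy <- c(rep(0, 18), -10, 10)     # mostly at the mean, plus two extreme values

kurtosis_moment(flat)           # below 3: thin tails
kurtosis_moment(heavy)          # 10: fat tails
kurtosis_moment(heavy) - 3      # "excess" kurtosis relative to the normal distribution
```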





About Shyama:

Shyama is an MBA. She currently works as an Analyst Intern with NikhilGuru Consulting Analytics Service LLP, Bangalore. She previously worked for around 3 years with Apollo Hospitals.
