What is standard deviation and how does it work in descriptive statistics?

Find out what standard deviation is, its relationship to variance, how to apply it, and what its role is in descriptive statistics

By Alicia Soares

Published on Aug 19, 2022

The standard deviation is one of the most important measures in descriptive statistics. Like the variance, it is a measure of dispersion, whereas the mode, the arithmetic mean and the median are measures of central tendency.

It describes the degree of dispersion of a dataset around its arithmetic mean, indicating how homogeneous or heterogeneous the data are.

Read on to learn more about this statistical measure and how to apply it step by step, through the following topics:

What is standard deviation?
How to calculate standard deviation?
What is the difference between standard deviation and variance?
What are the other elements of descriptive statistics?
Learn to use Microsoft Excel!

Let's get started?

What is standard deviation?

Standard deviation is a measure of the dispersion of a data set in relation to its arithmetic mean. It indicates how close to or far from the mean the data are. It can be applied in areas such as biology, finance, physics and laboratory analysis.

In statistics, the sample standard deviation is calculated from a sample of the data and estimates how much the values of the whole population deviate from the mean.

When the calculation covers every value in the population, measuring how the values of the complete set differ from each other, it is called the population standard deviation.
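
As a minimal sketch of this distinction, Python's standard `statistics` module provides both versions: `pstdev` treats the data as the whole population and divides by n, while `stdev` treats it as a sample and divides by n - 1. The data values below are invented for illustration.

```python
import statistics

# Hypothetical data: five measurements invented for this example.
data = [2, 4, 4, 4, 6]

# Population standard deviation: `data` is the whole population (divides by n).
population_sd = statistics.pstdev(data)

# Sample standard deviation: `data` is a sample of a larger population
# (divides by n - 1).
sample_sd = statistics.stdev(data)

print("population:", population_sd)
print("sample:", sample_sd)
```

For the same data, the sample version is always slightly larger, and the gap shrinks as the number of values grows.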

How did standard deviation come about?

The term was first used in 1894 by Karl Pearson, replacing the term “mean error” previously used by Carl Friedrich Gauss.

Pearson was an important figure in statistical studies. He was responsible for founding the Department of Applied Statistics at University College London in 1911, making it the first department in the field in the world.

In 1918, statistician William Gosset (who published under the pseudonym "Student") distinguished between the sample standard deviation and the population standard deviation after empirical studies.

How to calculate standard deviation?

This is the standard deviation formula:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}}$$

Where:

$S$: standard deviation;
$X_i$: values in the data set (with i = 1, 2, 3, …, n);
$\bar{X}$: arithmetic mean;
$n$: number of values in the data set.

The standard deviation is the square root of the following result: the sum of the squared differences between each value and the arithmetic mean, divided by the number of values.

Thus, the lower the value found, the more homogeneous and constant this dataset will be. The higher the standard deviation, the more heterogeneous the dataset will be.
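
The calculation can be sketched step by step in Python. This is an illustrative version that divides by the number of values (the population form); the function name and both datasets are hypothetical, chosen so that they share the same mean but differ in spread.

```python
import math

def population_std_dev(values):
    """Square root of the mean squared deviation from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    # Squared difference between each value and the mean.
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)

# Hypothetical datasets: both have mean 10, but different dispersion.
homogeneous = [9, 10, 10, 11]
heterogeneous = [1, 5, 15, 19]

print(population_std_dev(homogeneous))
print(population_std_dev(heterogeneous))
```

The first dataset yields a much lower value than the second, which matches the reading above: lower means more homogeneous.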

These calculations can be simplified by performing them in spreadsheet tools such as Microsoft Excel and Google Sheets.

What is the difference between standard deviation and variance?

Both standard deviation and variance are measures of dispersion used in statistical studies. They are directly related, but they are not the same thing.

The variance is the average of the squared differences between each collected data point and the arithmetic mean, so it is expressed in the squared units of the data. The standard deviation expresses the same homogeneity or heterogeneity in the original units of the data.

Therefore, the variance can be calculated as follows:

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$$

Where:

$S^2$: variance;
$X_i$: values in the data set (with i = 1, 2, 3, …, n);
$\bar{X}$: arithmetic mean;
$n$: number of values in the data set.

Hence, the standard deviation is the square root of the variance.
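
A short check of this relationship, using the population functions from Python's standard `statistics` module. The data values are hypothetical.

```python
import math
import statistics

# Hypothetical data invented for this example.
data = [3, 5, 7, 9]

variance = statistics.pvariance(data)  # mean of the squared deviations
std_dev = statistics.pstdev(data)      # population standard deviation

# The standard deviation should equal the square root of the variance.
print(variance, std_dev)
print(math.isclose(std_dev, math.sqrt(variance)))
```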

What are the other elements of descriptive statistics?

Descriptive statistics is the initial step of data analysis used to summarize and understand information.

The availability of a large amount of data and very efficient computational methods has reinvigorated this area of statistics. It can be used in the application of the Lean Six Sigma methodology.

Standard deviation and variance are measures of descriptive statistics. Check out other measures below:

Mean

The mean is nothing more than the sum of all values in the data set divided by the total number of elements.

There is also the weighted mean, in which each value is assigned a weight and multiplied by it. The sum of these products is then divided by the sum of all weights.
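
Both definitions can be sketched in a few lines of Python. The function names, grades and weights below are hypothetical.

```python
def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    weighted_total = sum(v * w for v, w in zip(values, weights))
    return weighted_total / sum(weights)

# Hypothetical example: three grades, the last two counting double.
grades = [6.0, 8.0, 10.0]
weights = [1, 2, 2]

print(mean(grades))
print(weighted_mean(grades, weights))
```

Because the higher grades carry more weight here, the weighted mean comes out above the simple mean.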

Mode

The mode of a data set is its most frequent value, that is, the number that appears the most times.

It is worth remembering that if no value is repeated in your data set, then it has no mode.
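
A minimal sketch of this definition, including the case where no value repeats. The function name `modes` is hypothetical, and returning a list (so that ties are all reported) is a design choice of this example.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or an empty list if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # No value is repeated: no mode, following the convention above.
        return []
    return [value for value, count in counts.items() if count == highest]

print(modes([1, 2, 2, 3, 3, 3]))
print(modes([1, 2, 3]))
```

Library functions may follow a different convention for data without repeats, so it is worth checking their documentation before relying on them.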

Median

The median is a measure of the central position of the data. It is the middle value of a data set placed in ascending or descending order.

If the number of sorted values is odd, the median is exactly the number located in the middle of the list. If the number of ordered values is even, the median is calculated as the average of the two central values.
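
The odd and even cases can be sketched as follows. The function name and the sample lists are hypothetical.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the value exactly in the middle of the list.
        return ordered[middle]
    # Even count: average of the two central values.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([7, 1, 5]))
print(median([7, 1, 5, 3]))
```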

Percentiles

Percentiles are measures that divide the sample (in ascending order of data) into 100 equal parts, each with an approximately equal percentage of data.

Therefore:

the 1st percentile is the value below which the lowest 1% of the data fall;
the 50th percentile is the value below which the lowest 50% of the data fall, and it is equal to the median seen previously;
the 98th percentile is the value below which the lowest 98% of the data fall.

The position of a percentile in the ordered data can be calculated as follows:

Percentile

In which:

K = the position where the percentile will be in the data
i = the desired percentile number
n = number of samples.
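
As an illustration, percentiles can be obtained with `statistics.quantiles` from Python's standard library (available since Python 3.8). Note that this function applies its own default interpolation rule, which is an assumption of this sketch and may not match the position formula above exactly. The data is hypothetical.

```python
import statistics

# Hypothetical data: the integers from 1 to 199.
data = list(range(1, 200))

# n=100 requests the 99 cut points that split the data into 100 parts.
cut_points = statistics.quantiles(data, n=100)

p1 = cut_points[0]    # 1st percentile
p50 = cut_points[49]  # 50th percentile
p98 = cut_points[97]  # 98th percentile

print(p1, p50, p98)
# The 50th percentile should coincide with the median.
print(p50 == statistics.median(data))
```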

Quartiles

In descriptive statistics, quartiles are values that divide ordered data into four equal parts.

Using quartiles, you can quickly assess the dispersion and central tendency of a set of samples, which are important steps in understanding your data.

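Hence:

the 1st quartile is the value below which 25% of the data fall;
the 2nd quartile is the value below which 50% of the data fall (the median);
the 3rd quartile is the value below which 75% of the data fall.

The position of a quartile in the ordered data can be calculated as follows: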

Quartile

In which:

Q = the position where the quartile will be in the data
i = the quartile we want to find
n = number of samples.
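
A short sketch using `statistics.quantiles` with `n=4`, which returns the three quartile cut points. The function's default method is one common convention and is an assumption here, so its results may differ slightly from other position formulas. The data values are invented.

```python
import statistics

# Hypothetical data: seven values invented for this example.
data = [2, 4, 4, 5, 7, 8, 10]

# n=4 requests the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(q1, q2, q3)
# The 2nd quartile should coincide with the median.
print(q2 == statistics.median(data))
```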

Range

The range shows how far apart the data in the sample are.

To calculate it, simply subtract the lowest value from the highest value.

If the range is high, your data is spread over a wide interval. If it is low, the data is concentrated in a narrow interval.
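
In code, this is a single subtraction. The function name `value_range` and both samples are hypothetical.

```python
def value_range(values):
    """Difference between the highest and the lowest value."""
    return max(values) - min(values)

# Hypothetical samples: one tightly grouped, one widely spread.
narrow = [48, 50, 52]
wide = [10, 50, 90]

print(value_range(narrow))
print(value_range(wide))
```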

Interquartile range

The interquartile range measures the degree of dispersion around the center of the data.

This measure is calculated as the difference between the third quartile and the first quartile.
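
A minimal sketch of this difference, again relying on `statistics.quantiles` and its default method (an assumption of this example). The function name and data are hypothetical; the second list replaces the largest value with an extreme one to show how the result reacts.

```python
import statistics

def interquartile_range(values):
    """Third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical data, with and without an extreme value.
data = [2, 4, 4, 5, 7, 8, 10]
data_with_outlier = [2, 4, 4, 5, 7, 8, 100]

print(interquartile_range(data))
print(interquartile_range(data_with_outlier))
```

In this example the extreme value leaves the result unchanged, because only the middle half of the data enters the calculation.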

Learn to use Microsoft Excel!

Are you tired of missing out on incredible career opportunities because you don't know about the main Excel tools?

Then you are in the right place! In the Excel for Beginners course, you will learn to work with data in an agile way.

In addition, the course is available for free.