Lean Six Sigma

How to analyze a Boxplot graph?

Understand the particularities of the boxplot and how to interpret this graph.

Thiago Coutinho
Published on Jul 30, 2021  ·  Updated on Nov 22, 2021

Handling and interpreting data is of paramount importance in a Lean Six Sigma program and is key to the success of the DMAIC method.

To make the distribution of data easier to visualize, a number of graphical tools have been developed that turn a table of data into a diagram, such as the histogram and the boxplot.
You've probably heard of the histogram, measures of central tendency, or even the time-series plot. But what about the boxplot? In this article you'll read about:

  • What is a Boxplot graph?
  • How to analyze a Boxplot graph?
  • How to create a Boxplot graph with Minitab software?
  • How to analyze the practical result?

What is a Boxplot graph?


The boxplot, or box plot, is a box-shaped diagram built from the minimum and maximum values, the first and third quartiles, the median, and the outliers of a data set.

The boxplot is used to study statistical properties of the data set, such as location, variability, the median, and outliers. You may be thinking that all of this can also be read from a histogram, right?

Actually, the two tools are very similar, each with its own peculiarities, and used together they complement each other.

The histogram gives a better view of the mean and standard deviation. The boxplot, in turn, gives a clearer view of the quartiles, the median, and the range, and it identifies outliers very well.

In the boxplot, the central part of the chart (the box) contains the values between the first quartile and the third quartile. The lower and upper whiskers extend, respectively, from the first quartile down to the lowest value and from the third quartile up to the highest value.
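As a rough illustration of the values a boxplot is built from, the sketch below computes the five reference points with Python's standard library. The function name `five_number_summary` and the sample data are made up for this example, and the quartile interpolation used here may differ slightly from the one Minitab applies.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    values = sorted(data)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # method="inclusive" is one of several interpolation rules; it is an
    # assumption of this sketch, not necessarily what Minitab uses.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, median, q3, values[-1]


sample = [2, 4, 4, 5, 6, 7, 8, 9, 10]  # hypothetical measurements
print(five_number_summary(sample))  # (2, 4.0, 6.0, 8.0, 10)
```

The box would be drawn from 4 to 8 with the median line at 6, and the whiskers would reach down to 2 and up to 10.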


How to analyze a Boxplot graph?


To help you better understand this diagram, I have prepared the following image with two boxplot examples, indicating what each of the key points means.

boxplot graph diagram


The boxplot starts at the minimum value of the data set and ends at the maximum value, just like the histogram. When there are outliers, they are drawn as separate points beyond the whiskers.

The box represents the central values of the data set. In a histogram of roughly symmetric data, this part corresponds to the tallest bars, that is, the most frequent values.

The line where the box begins represents the first quartile, the line inside the box represents the median, and the box ends at the third quartile.

Note that a line extends from each end of the box: down to the minimum value at the bottom and up to the maximum value at the top.

In other words, the whole boxplot represents 100% of the data. Its great advantage is that each region of the graph represents a known share of the data, which makes decision making easier.

From the minimum value to the start of the box lies 25% of the data, 50% of the data lies inside the box, and the upper whisker covers the remaining 25%.
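The 25% / 50% / 25% split can be checked numerically. The sketch below is only an illustration: `region_shares` is a hypothetical helper, the data are invented, and with small or tied data sets the shares are only approximately 25/50/25.

```python
import statistics


def region_shares(data):
    """Share of points below the box, inside the box, and above the box."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    n = len(data)
    below = sum(1 for x in data if x < q1) / n
    inside = sum(1 for x in data if q1 <= x <= q3) / n
    above = sum(1 for x in data if x > q3) / n
    return below, inside, above


values = list(range(1, 102))  # hypothetical data: 1, 2, ..., 101
below, inside, above = region_shares(values)
print(f"below box: {below:.1%}, inside box: {inside:.1%}, above box: {above:.1%}")
```

The points that fall exactly on Q1 or Q3 are counted as part of the box here, which is why the middle share comes out slightly above 50%.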

When the data contain outliers, that is, atypical values, they are easy to spot because the boxplot marks them with an asterisk. This can be seen in the right-hand boxplot in the figure.
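The article does not state the rule used to flag a point as an outlier. A widely used convention, assumed here, is to mark any value lying more than 1.5 times the interquartile range (IQR) beyond the box edges. The sketch below applies that convention; `split_outliers` and the data are hypothetical.

```python
import statistics


def split_outliers(data, k=1.5):
    """Return the whisker ends and the outliers under the k * IQR convention."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    regular = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    # Under this convention the whiskers stop at the most extreme
    # points that are not flagged as outliers.
    return (min(regular), max(regular)), outliers


data = [10, 12, 13, 14, 15, 16, 17, 18, 40]  # hypothetical measurements
whiskers, outliers = split_outliers(data)
print(whiskers, outliers)  # (10, 18) [40]
```

Here Q1 is 13 and Q3 is 17, so the fences sit at 7 and 23, and the value 40 would be drawn as a separate asterisk above the upper whisker.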

An outlier makes the figure less symmetric than the one without outliers. This happens when the data set contains discrepant values, and it shows up as a difference between the lengths of the lower and upper whiskers in the figure on the right.

If we drew a histogram of the data behind the figure on the right, we would see a bell-shaped curve stretched further to one side than the other, that is, a skewed (asymmetric) distribution.
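The whisker comparison described above can be turned into a crude numerical check. This is only a sketch: `skew_hint` is a hypothetical helper, it measures the whiskers over the full range of the data (outliers included), and it is not a formal skewness statistic.

```python
import statistics


def skew_hint(data):
    """Compare the lower and upper whisker lengths to suggest a skew direction."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    lower = q1 - min(data)  # length of the lower whisker
    upper = max(data) - q3  # length of the upper whisker
    if upper > lower:
        return "stretched toward high values"
    if lower > upper:
        return "stretched toward low values"
    return "roughly symmetric"


balanced = [1, 2, 3, 4, 5, 6, 7, 8, 9]     # hypothetical symmetric data
stretched = [1, 2, 3, 4, 5, 6, 7, 8, 30]   # same data with one discrepant value
print(skew_hint(balanced))   # roughly symmetric
print(skew_hint(stretched))  # stretched toward high values
```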


How to create a Boxplot graph with Minitab software?


Great, now we know how to interpret one. But how about building a boxplot for later analysis? With that in mind, I set aside a section of this article for an example. Check it out!

At the company Voitto Tubes, the manager collected data on the three main reasons for declassification in his company in order to analyze them.

For this we will use the boxplot, because it makes the presence of outliers and the asymmetry of the data more evident. In addition, comparing the three reasons will be more effective, since the range and location of a boxplot are visually simpler to identify than in a histogram.

To perform this analysis, after collecting the data, open Minitab, select the Graph menu, and then Boxplot, as shown in the following figure.

minitab lean six sigma

To make the data sets easier to compare, in this case choose the Multiple Y's option so that all the boxplots are plotted on the same graph, as shown below.

boxplot lean six sigma

Once this is done, just click OK and Minitab generates the chart that we will compare.

boxplot lean six sigma

How to analyze the practical result?


From the boxplot, we can see that each of the three reasons behaves differently. This did not happen by chance: the idea is to walk you through several types of analysis so that you can absorb as much as possible.

The splice boxplot is flatter, indicating low variability and a small standard deviation. However, it sits at the top of the chart, which means its mean and median values are very high.

The weak solder boxplot tells a slightly different story. Here the median is lower than for the splice, but the variability is very high.

This brings a certain unpredictability, because sometimes the values are very low and sometimes they are very high. The greater the variability, the lower the predictability.

For machine adjustment, we have two outliers. This boxplot has a lower median, and two of the machine adjustment values lie far from it, so we conclude that they are outliers.

If we drew a histogram of this situation, the outliers for machine adjustment would not be so easy to see, but with the boxplot they become very evident.
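The same kind of comparison can be sketched outside Minitab. The example below summarizes three groups by median, interquartile range (IQR), and outliers. All numbers are invented to mimic the pattern described above (they are not the Voitto Tubes data), `describe` is a hypothetical helper, and the 1.5 × IQR outlier rule is an assumption.

```python
import statistics


def describe(data, k=1.5):
    """Median, IQR, and k * IQR outliers for one group."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    outliers = [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]
    return {"median": median, "iqr": iqr, "outliers": outliers}


# Hypothetical data, chosen only to reproduce the three behaviors discussed.
groups = {
    "Splice": [48, 49, 50, 50, 51, 51, 52],
    "Weak solder": [10, 18, 25, 30, 36, 44, 55],
    "Machine adjustment": [8, 9, 10, 10, 11, 12, 13, 30, 34],
}

for name, values in groups.items():
    s = describe(values)
    print(f"{name:<20} median={s['median']:<6} IQR={s['iqr']:<6} outliers={s['outliers']}")
```

With these made-up numbers, the splice group shows a high median and a small IQR, weak solder shows a lower median and a much larger IQR, and machine adjustment shows a low median with two flagged values, 30 and 34.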


Do you want to learn more about Lean Six Sigma?

Lean Six Sigma is a methodology that seeks to increase the profitability of companies through the improvement of their processes.

Thinking about you and your career, we at Think Lean Six Sigma created a FREE Yellow Belt training in Lean Six Sigma. In this training, we will introduce you to the DMAIC method and its 5 steps: Define, Measure, Analyze, Improve and Control.

With this course, you will be able to develop small improvement projects within your area of expertise and work with Green Belts and Black Belts in support of the Lean Six Sigma program.

Have you ever thought about being the agent of change within your organization? Click on the image below for more information and become a Yellow Belt, for FREE!

Yellow Belt in Lean Six Sigma

Written by Thiago Coutinho
Thiago has a degree in Production Engineering, a graduate course in statistics and a degree in administration from the Federal University of Juiz de Fora (UFJF). Black Belt in Lean…
