Get a Pareto Chart & Analysis Template Right Here

Sometimes, the vast majority of our problems have just a few small causes.

Pareto charts, and Pareto analysis, are tools that help us discover which problems are causing most of our defects (or alternatively, which small number of opportunities offer the majority of the payoff).

Today, I’ll talk about how to build a Pareto Chart and run analysis on it, as well as address some common questions about it, including a few pitfalls you’ll want to avoid.

As usual, I’ve also put together a spreadsheet that you can use to skip all the busy work. The spreadsheet does all the sorting, calculates your cumulative percentages, and builds the chart for you, as well as identifying your “vital few” and “trivial many,” and helps you identify whether they actually “obey” the 80-20 rule. You’ll need Excel 2007 or higher for this one.

You can get that spreadsheet, and sign up for more practical updates like this one, using the form below. (I will never share your email address with anybody.)


Continue reading →

Bootstrapping Statistics & Confidence Intervals, Tutorial

If you use applied statistics in your career, odds are you’ve used the Great Assumption Of Our Era, the assumption of the Normal distribution. There are some good reasons for this. The Central Limit Theorem is usually thrown in there as a justification, and it works reasonably well for practical applications. But the Central Limit [...]

Continue reading →

6 Sigma Calculator to Convert Between PPM / DPMO & Sigma

One of the primary goals of statistical process control is to reduce the probability of a “defect,” however you define it, to acceptable levels.

Probably the most widely known example is Six Sigma, which aims to keep the number of defects below 3.4 per million. (More on that later, considering that it technically corresponds to 4.5 sigma.)

Defects are often measured in PPM (parts per million), but statistical processes are usually understood in terms of standard deviations (sigma).

The terminology DPMO (defects per million opportunities) is also sometimes used in place of PPM, but it means essentially the same thing.

It should go without saying that being able to convert back and forth between PPM and sigma can be very handy. I went ahead and put together a “calculator” for you to accomplish this (okay, it’s an Excel spreadsheet). You can use the calculator to convert between PPM and sigma in both traditional Statistical Process Control and Six Sigma (and yes, the results are different).

To get the calculator, and to sign up for more updates like this one, go ahead and sign up using the form below. (I will never share your email address with anybody.) Or you can go ahead and read on to understand how this conversion works, and set it up yourself.

Continue reading →

The Cohen’s d Formula

Repeat after me: “statistical significance is not everything.” It’s just as important to have some measure of how practically significant an effect is, and this is done using what we call an effect size. Cohen’s d is one of the most common ways we measure the size of an effect. Here, I’ll show you how to calculate it. [...]

Continue reading →

I’ve Added a Normality Test Spreadsheet to the Menu

Blog posts can only be so useful.

Today, I’ve updated the menu with a new “Downloads” section, and I’ve added a resource to this site, a spreadsheet that you can use in Excel to test whether your data comes from the normal distribution.

Expect a lot more updates like this soon.

Continue reading →

Learn Data Analysis for Excel in 2.5 Hours, Part 3: AKA – “Everything Else”

Today I’m going to finish up this series on data analysis in Excel. This time around, I’ll cover all the basic statistics like correlation, covariancedescriptive statistics, and so on. We’ll also talk about a few miscellaneous tools for exponential smoothing, Fourier analysis, moving averages, random number generation, rank and percentile, and sampling.

You’ll need to enable the Analysis ToolPak if you haven’t already. Click here to take a look at Part 1 to find out how. If you were trying to figure out how to do real regression analysis in Excel, you will also find that in Part 1.

For statistical tests like the ANOVA, t-tests, and the F-test, click here to take a look at Part 2.

I’ll just go ahead and dive right in.

Continue reading →

Learn Data Analysis for Excel in 2.5 Hours, Part 2: Statistical Testing

excel test

If you’ve ever tried to set up a legitimate statistical test in Excel, you already know it’s painful, but if you have the Analysis ToolPak enabled, things get a bit easier. Today, we’re going to learn how to run statistical tests in Excel. We’ll cover F-tests to compare variances, t-tests to compare 2 averages, and ANOVAs to compare multiple averages.

(If you need to know how to enable the Analysis ToolPak, or how to run in-depth regression analysis, click here and take a look at Part 1 now.)

(For correlation analysis, covariance, descriptive stats, and miscellaneous tools like exponential smoothing, moving averages, random number generation, rank and percentile, and sampling, check out part 3.)

A word of warning: the ToolPak might seem self explanatory, but some of the results are easy to interpret incorrectly.

This post should take between 22 and 44 minutes to read and toy with. Alright, let’s go ahead and get started.

Continue reading →

Learn Data Analysis for Excel in 2.5 Hours, Part 1: Enabling the ToolPak, and Regression Analysis

excel-2010-logoStatistical analysis in Excel is a huge pain unless you know how to enable the Analysis Toolpak. In part 1 of this series on data analysis in Excel, I’m going to tell you how to do that. Next, we’ll talk about regression analysis (the real thing, including multiple variables, not just fitting a line to a graph). This whole post should take 20 to 40 minutes.

(If you’re looking for info on statistical tests for Excel, click here to check out Part 2.)

(Take a look at Part 3 for more basic but less useful statistical tools, as well as miscellaneous tools like Fourier Analysis, moving averages, exponential smoothing, random number generation, sampling, and rank and percentile.)

Continue reading →

The P-Value “Formula,” Testing Your Hypothesis

The p-value, while it is one of the most widely-used and important concepts in statistics, is actually widely misunderstood. Today we’ll talk about what it is, and how to obtain it.

(If you’re in a statistics class, or using this stuff out there in the real world, consider ordering . It’s got the readability of the Idiot’s Guide on the same subject, and (thank God) a non-textbook price, but without the glaring mistakes.)

Continue reading →