Which of the Following Is a Biased Estimator? — A Real‑World Guide to Spotting Bias in Statistics
Ever stared at a list of formulas and wondered which one “lies” about the true value? In class, on a test, or while skimming a research paper, the phrase biased estimator pops up, and suddenly you’re asked to pick the odd one out. You’re not alone. The short answer: it’s the estimator that systematically misses the target. The long answer is what we’re digging into here.
What Is a Biased Estimator
Think of an estimator as a recipe. You feed it data, and it spits out a number that’s supposed to stand in for some unknown population parameter, like the average height of all adult cats in the world. If you run the same recipe over and over on different random samples, an unbiased estimator will, on average, hit the true parameter right on the bullseye.
A biased estimator, on the other hand, consistently lands a few centimeters off. The bias can be positive (always too high) or negative (always too low). It’s not a mistake in the math; it’s a built‑in systematic error that shows up no matter how many times you repeat the experiment.
Formal definition (in plain English)
\[ \text{Bias}(\hat\theta)=E[\hat\theta]-\theta \]
- \(\hat\theta\) = the estimator (the number you calculate)
- \(\theta\) = the true population parameter you’re after
- \(E[\hat\theta]\) = the expected value of the estimator over all possible samples
If that difference equals zero, you have an unbiased estimator. Anything else, and you’ve got bias.
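To make the definition concrete, here is a minimal sketch that computes the bias exactly by enumerating every possible sample instead of simulating. The toy population `[1, 2, 3]`, the sample size of 2, and the helper names `exact_bias`, `sample_mean`, and `variance_div_n` are all assumptions chosen for illustration. Sampling is with replacement, so every ordered sample is equally likely.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population; each value is equally likely on every draw.
POPULATION = [1, 2, 3]
SAMPLE_SIZE = 2


def sample_mean(sample):
    return Fraction(sum(sample), len(sample))


def variance_div_n(sample):
    """Variance with divisor n (number of values in `sample`)."""
    m = sample_mean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


def exact_bias(estimator, true_value):
    """Bias = E[estimator] - true value, with E taken over all ordered samples."""
    samples = list(product(POPULATION, repeat=SAMPLE_SIZE))
    expected = sum(estimator(s) for s in samples) / len(samples)
    return expected - true_value


mu = sample_mean(POPULATION)         # true mean: 2
sigma2 = variance_div_n(POPULATION)  # true variance: 2/3

print("bias of sample mean:         ", exact_bias(sample_mean, mu))         # 0
print("bias of divide-by-n variance:", exact_bias(variance_div_n, sigma2))  # -1/3
```

The sample mean comes out with zero bias, while the divide-by-n variance misses the true value of 2/3 by exactly -1/3 in this toy setup.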
Why It Matters / Why People Care
You might think, “A few percent off isn’t the end of the world.” In practice, bias can wreck decisions.
- Policy making – A biased unemployment rate could trigger unnecessary stimulus or, worse, leave people out in the cold.
- Medical trials – Over‑estimating a drug’s effectiveness can expose patients to false hope and side‑effects.
- Business analytics – A biased forecast of sales leads to over‑stocking, wasted warehouse space, and bruised margins.
In short, bias skews the story your data tells, and the story drives action. Knowing which estimator is biased lets you either correct it or choose a better tool.
How It Works: Spotting the Biased Estimator
Below we walk through the most common estimators you’ll see in textbooks, exams, or data‑science interviews. For each, we’ll ask: Does it give the right answer on average?
1. Sample mean (\(\bar X\))
The go-to for estimating a population mean \(\mu\).
How it’s calculated: Add up all the observations and divide by the number of observations.
Bias check: The expected value of \(\bar X\) is exactly \(\mu\). No surprise here: the sample mean is unbiased.
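The reason is linearity of expectation. A short derivation, assuming only that each observation \(X_i\) has mean \(\mu\):

\[
E[\bar X] = E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
= \frac{1}{n}\sum_{i=1}^{n} E[X_i]
= \frac{1}{n}\cdot n\mu
= \mu
\]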
2. Sample variance with divisor \(n\)
\[ s_n^2=\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar X)^2 \]
How it’s calculated: Same as the usual variance formula, but you divide by the total number of observations \(n\), not \(n-1\).
Bias check: The expected value of \(s_n^2\) equals \(\frac{n-1}{n}\sigma^2\), so the bias is \(-\sigma^2/n\). That’s a systematic under-estimate of the true variance \(\sigma^2\). Biased: the bias shrinks as \(n\) grows, but it never disappears completely for any finite \(n\).
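Where does the factor \(\frac{n-1}{n}\) come from? A sketch of the standard derivation, assuming the observations are independent with common mean \(\mu\) and variance \(\sigma^2\), so that \(E[(\bar X-\mu)^2]=\sigma^2/n\):

\[
\begin{aligned}
\sum_{i=1}^{n}(X_i-\bar X)^2 &= \sum_{i=1}^{n}(X_i-\mu)^2 - n(\bar X-\mu)^2 \\
E\!\left[\sum_{i=1}^{n}(X_i-\bar X)^2\right] &= n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2 \\
E[s_n^2] &= \frac{n-1}{n}\,\sigma^2
\end{aligned}
\]

Measuring deviations from \(\bar X\) instead of the unknown \(\mu\) uses up one unit of \(\sigma^2\), which is exactly what the divisor \(n\) fails to account for.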
3. Sample variance with divisor \(n-1\) (the “unbiased” version)
\[ s_{n-1}^2=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar X)^2 \]
How it’s calculated: Same sum as before, but you divide by \(n-1\).
Bias check: Its expected value is exactly \(\sigma^2\). This is the classic unbiased estimator for variance.
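The two versions differ only by the constant factor n/(n-1), which is easy to check on a concrete sample. A minimal sketch using Python's standard library, where `statistics.pvariance` divides by n and `statistics.variance` divides by n-1. The five data values are made up for illustration.

```python
import statistics

# Made-up sample, purely for illustration.
data = [2.0, 4.0, 4.0, 5.0, 10.0]
n = len(data)

var_div_n = statistics.pvariance(data)         # divides by n
var_div_n_minus_1 = statistics.variance(data)  # divides by n - 1

print("divide by n:    ", var_div_n)
print("divide by n - 1:", var_div_n_minus_1)

# The n - 1 version is the n version scaled up by n / (n - 1).
print("ratio:", var_div_n_minus_1 / var_div_n, "vs n/(n-1) =", n / (n - 1))
```

For this sample the sum of squared deviations is 36, so the two estimates are 36/5 and 36/4. The gap is 25% at n = 5 and shrinks as the sample grows.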
4. Maximum-likelihood estimator (MLE) for the variance of a normal distribution
\[ \hat\sigma_{\text{MLE}}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar X)^2 \]
How it’s calculated: Same algebra as the biased sample variance.
Bias check: Maximizing the normal likelihood over both \(\mu\) and \(\sigma^2\) lands on the divide-by-\(n\) formula, because the MLE plugs in \(\bar X\) for \(\mu\) without adjusting for the fact that the mean was estimated from the same data. So, for a normal distribution with unknown mean, the MLE of variance is biased.
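For readers who want to see where the divisor comes from, here is a sketch of the standard derivation, assuming \(X_1,\dots,X_n\) are i.i.d. normal with both \(\mu\) and \(\sigma^2\) unknown. Setting the partial derivatives of the log-likelihood \(\ell\) to zero gives:

\[
\begin{aligned}
\ell(\mu,\sigma^2) &= -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i-\mu)^2 \\
\frac{\partial \ell}{\partial \mu} &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i-\mu) = 0
\;\Rightarrow\; \hat\mu = \bar X \\
\frac{\partial \ell}{\partial \sigma^2} &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(X_i-\mu)^2 = 0
\;\Rightarrow\; \hat\sigma_{\text{MLE}}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar X)^2
\end{aligned}
\]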
5. Proportion estimator \(\hat p = \frac{X}{n}\)
How it’s calculated: Count the number of “successes” \(X\) in a sample of size \(n\) and divide by \(n\).
Bias check: The expected value of \(\hat p\) equals the true proportion \(p\). Unbiased, simple and clean.
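This one can be verified exactly rather than by simulation. The sketch below assumes the success count follows a Binomial(n, p) distribution and sums k/n against the binomial probabilities using exact fractions. The function name `expected_p_hat` and the example value p = 3/10 are made up for illustration.

```python
from fractions import Fraction
from math import comb


def expected_p_hat(n, p):
    """E[X/n] for X ~ Binomial(n, p), summed exactly over k = 0..n."""
    return sum(
        Fraction(k, n) * comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    )


# Hypothetical true proportion.
p = Fraction(3, 10)
for n in (1, 5, 20):
    print(f"n = {n:>2}: E[p_hat] = {expected_p_hat(n, p)}")
```

Every sample size returns exactly 3/10, matching the true proportion.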
6. Sample median for a symmetric distribution
How it’s calculated: Sort the data, pick the middle value (or average the two middle values if \(n\) is even).
Bias check: For perfectly symmetric distributions, the median’s expected value matches the population median, making it unbiased. For skewed distributions, the sample median can be biased in finite samples, but that’s a nuance we’ll revisit later.
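A quick simulation shows the contrast between the symmetric and skewed cases. This is a rough sketch under stated assumptions: the standard normal and the rate-1 exponential are arbitrary example distributions, the sample size of 5 is deliberately small, and `median_bias` is a made-up helper name.

```python
import math
import random
import statistics


def median_bias(draw, true_median, n=5, reps=20_000, seed=0):
    """Monte Carlo estimate of E[sample median] - true median."""
    rng = random.Random(seed)
    medians = [
        statistics.median(draw(rng) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.fmean(medians) - true_median


# Symmetric case: standard normal, true median 0.
bias_normal = median_bias(lambda rng: rng.gauss(0.0, 1.0), 0.0)

# Skewed case: exponential with rate 1, true median ln 2.
bias_expo = median_bias(lambda rng: rng.expovariate(1.0), math.log(2))

print(f"normal:      {bias_normal:+.3f}")
print(f"exponential: {bias_expo:+.3f}")
```

The normal result should hover around zero. For the exponential with n = 5, the exact expected sample median is 1/5 + 1/4 + 1/3 ≈ 0.783 against a true median of ln 2 ≈ 0.693, so the simulated bias should land near +0.09.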
Quick cheat‑sheet
| Estimator | Formula (short) | Biased? | Why |
|---|---|---|---|
| Sample mean | \(\bar X\) | No | Expected value = \(\mu\) |
| Sample variance (divide by \(n\)) | \(s_n^2\) | Yes | Under-estimates \(\sigma^2\) by factor \((n-1)/n\) |
| Sample variance (divide by \(n-1\)) | \(s_{n-1}^2\) | No | Corrects the divisor |
| MLE of variance (normal) | \(\hat\sigma_{\text{MLE}}^2\) | Yes | Same as biased variance |
| Proportion \(\hat p\) | \(X/n\) | No | Linear expectation |
| Sample median (symmetric) | middle value | No | Symmetry guarantees unbiasedness |
Common Mistakes / What Most People Get Wrong
Mistake #1: Assuming “MLE = unbiased”
Maximum-likelihood estimators are great for many reasons (consistency, asymptotic efficiency), but they are not automatically unbiased. The variance example is a classic trap.
Mistake #2: Forgetting the sample size effect
People often think bias disappears once you have “enough” data. The biased variance estimator still under-estimates \(\sigma^2\) on average no matter how large \(n\) gets; the bias just becomes tiny. In high-stakes settings (e.g., quality-control tolerances), even a 0.5% bias can be costly.
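Since the expected value of the divide-by-n variance is (n-1)/n times the true variance, its relative bias is exactly -1/n. A small sketch that tabulates this for a few sample sizes; the function name and the chosen values of n are illustrative only.

```python
def relative_bias_div_n(n):
    """Relative bias of the divide-by-n variance: (n - 1)/n - 1 = -1/n."""
    return (n - 1) / n - 1


for n in (5, 30, 200, 10_000):
    print(f"n = {n:>6}: relative bias = {relative_bias_div_n(n):+.4%}")
```

The 0.5% figure mentioned above corresponds to n = 200. The bias keeps shrinking after that, but it only reaches zero in the limit.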
Mistake #3: Mixing up population vs. sample formulas
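You might see the same algebraic expression written with different denominators in different textbooks. The key is which denominator matches your situation: dividing by the number of values is correct when the data are the entire population, while the \(n-1\) divisor is the unbiased choice when the data are a sample used to estimate the population variance.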
Mistake #4: Treating the median as always unbiased
The median shines in robust statistics, but if the underlying distribution is skewed, the sample median’s expected value drifts away from the true median in finite samples. That’s a subtle source of bias that trips up analysts who rely on the median as a “safe” default.
Mistake #5: Ignoring finite‑population correction
When you sample without replacement from a small population of size \(N\), the usual formula \(\sigma^2/n\) for the variance of the sample mean needs a correction factor \((N-n)/(N-1)\). Skipping that step overstates the variance, which gives a biased variance estimate for the finite population.
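The correction can be checked exactly on a tiny population by listing every possible sample. In this sketch the five population values and the sample size of 2 are made up, and the population variance uses the divisor N, which is the convention under which the (N-n)/(N-1) factor applies to the variance of the sample mean.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite population of N = 5 values.
population = [3, 7, 8, 12, 20]
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance (divisor N)

# Every unordered sample of size n drawn without replacement is equally likely.
means = [Fraction(sum(s), n) for s in combinations(population, n)]
var_of_mean = sum((m - mu) ** 2 for m in means) / len(means)

naive = sigma2 / n                          # ignores the correction
corrected = naive * Fraction(N - n, N - 1)  # applies (N - n) / (N - 1)

print("exact Var(sample mean):", var_of_mean)  # 249/20
print("sigma^2 / n:           ", naive)        # 83/5
print("with correction:       ", corrected)    # 249/20
```

The corrected value matches the exact variance of the sample mean, while the uncorrected formula overstates it by a third in this example.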
Practical Tips / What Actually Works
- Always check the divisor
  - If you see a variance formula with \(n\) in the denominator, flag it as potentially biased unless the context explicitly calls for the MLE.
- Use unbiased estimators for small samples
  - When \(n < 30\), the bias in variance can meaningfully affect confidence intervals. Switch to the \(n-1\) version or apply a small-sample correction.
- Use bootstrap bias correction
  - Resample your data with replacement many times and compute the estimator each round. The average of those values minus the original estimate is the estimated bias; subtract it from the original estimate. This works for medians, quantiles, or any statistic where a closed-form unbiased version is messy.
- Report bias alongside standard error
  - Transparency builds trust. If you know an estimator is biased, state the bias magnitude (or that you used a bias-corrected version).
- Prefer the \(n-1\) divisor for variance when you need unbiasedness
  - The method-of-moments estimator for variance is the second central sample moment, which divides by \(n\), so it is biased just like the MLE. The \(n-1\) divisor version is simple and unbiased for i.i.d. samples from any distribution with finite variance.
- Check symmetry before trusting the median
  - Plot a histogram or compute skewness. If the distribution is clearly right-skewed, consider a bias-adjusted median or switch to a trimmed mean.
- Use software defaults wisely
  - R’s `var()` uses \(n-1\) by default (unbiased). Python’s `numpy.var()` defaults to \(n\) (biased) unless you set `ddof=1`. Know what your tool does under the hood.
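The bootstrap tip can be sketched in a few lines. This is a minimal illustration, not a production routine: the function name `bootstrap_bias_corrected`, the 2000 resamples, and the eight data values are all assumptions, and the divide-by-n variance stands in for whatever statistic you actually need to correct.

```python
import random
import statistics


def bootstrap_bias_corrected(data, estimator, n_boot=2000, seed=0):
    """Return (original estimate, estimated bias, bias-corrected estimate)."""
    rng = random.Random(seed)
    original = estimator(data)
    boot_values = [
        estimator(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_boot)
    ]
    bias = statistics.fmean(boot_values) - original
    return original, bias, original - bias


# Made-up sample; the divide-by-n variance is the estimator being corrected.
sample = [4.1, 5.6, 3.9, 7.2, 6.0, 5.1, 4.8, 6.7]
original, bias, corrected = bootstrap_bias_corrected(sample, statistics.pvariance)
print(f"original={original:.3f}  bias={bias:+.3f}  corrected={corrected:.3f}")
```

Because the divide-by-n variance under-estimates on average, the estimated bias should come out negative and the corrected value should move upward. Passing `statistics.median` or a quantile function instead exercises the cases where no simple closed-form fix exists.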
FAQ
Q1: Can an estimator be unbiased but have huge variance?
Yes. The sample mean is unbiased for any distribution with finite mean, but if the data are extremely noisy, individual estimates can still be wildly spread out. Unbiasedness is about the center of the sampling distribution, not its width.
Q2: Is bias always a bad thing?
Not necessarily. In some shrinkage methods (e.g., ridge regression), we intentionally introduce bias to dramatically reduce variance, improving overall predictive performance. It’s a trade‑off, not a moral judgment.
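The trade-off is easiest to see through mean squared error, which equals variance plus squared bias. The sketch below is not ridge regression. It swaps in a simpler illustration: variance estimators of the form S/c for i.i.d. normal data, where S is the sum of squared deviations from the sample mean. It relies on the standard normal-theory facts E[S] = (n-1)σ² and Var[S] = 2(n-1)σ⁴, and the function name `mse_in_sigma4_units` is made up.

```python
from fractions import Fraction


def mse_in_sigma4_units(n, c):
    """MSE of S/c as a multiple of sigma^4, assuming i.i.d. normal data.

    S is the sum of squared deviations from the sample mean, so that
    E[S] = (n - 1) sigma^2 and Var[S] = 2 (n - 1) sigma^4.
    """
    bias = Fraction(n - 1, c) - 1            # in units of sigma^2
    variance = Fraction(2 * (n - 1), c * c)  # in units of sigma^4
    return variance + bias**2


n = 10
for label, c in (("n - 1 (unbiased)", n - 1), ("n (MLE)", n), ("n + 1", n + 1)):
    mse = mse_in_sigma4_units(n, c)
    print(f"divisor {label:<17} MSE = {float(mse):.4f} sigma^4")
```

With n = 10 the unbiased divisor gives the largest MSE of the three (2/9), the MLE divisor is lower (19/100), and dividing by n + 1 is lower still (2/11). Under these assumptions, accepting a little bias buys a smaller overall error.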
Q3: How do I know if a textbook’s estimator is biased?
Look at the denominator in variance-type formulas, and check whether the authors mention “MLE” or “method-of-moments.” If they cite “biased estimator,” they’ll usually note the correction factor.
Q4: Does bias matter for large‑sample asymptotics?
As \(n\to\infty\), many biased estimators are still consistent: they converge to the true parameter because the bias vanishes in the limit. Still, the finite-sample bias can matter for confidence intervals and hypothesis tests.
Q5: What’s the quickest way to test bias in practice?
Simulate. Generate many synthetic datasets from a known distribution, apply your estimator to each, and average the results. The difference between that average and the true value is the empirical bias.
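That recipe translates almost line for line into code. A rough sketch using only the standard library: the helper name `empirical_bias`, the normal distribution with variance 4, the sample size of 5, and the 20,000 repetitions are all assumptions picked for the demo.

```python
import random
import statistics


def empirical_bias(estimator, sampler, true_value, n, reps=20_000, seed=0):
    """Average the estimator over many simulated datasets and subtract the truth."""
    rng = random.Random(seed)
    estimates = [
        estimator([sampler(rng) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.fmean(estimates) - true_value


# Known distribution (an assumption for the demo): normal, mean 0, variance 4.
def draw(rng):
    return rng.gauss(0.0, 2.0)


n = 5
bias_div_n = empirical_bias(statistics.pvariance, draw, 4.0, n)
bias_div_n_minus_1 = empirical_bias(statistics.variance, draw, 4.0, n)

print(f"divide by n:     {bias_div_n:+.3f}")
print(f"divide by n - 1: {bias_div_n_minus_1:+.3f}")
```

Theory predicts a bias of -4/5 = -0.8 for the divide-by-n version and zero for the n-1 version, so the simulated numbers should land close to those values, up to Monte Carlo noise.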
That’s the long and short of it. When you’re handed a list of estimators and asked, “Which of the following is a biased estimator?”, look for the one that divides the sum of squares by n instead of n-1, or the MLE version of variance for a normal model.
Understanding why that estimator is biased gives you the power to correct it, choose a better alternative, or at least explain the limitation to a stakeholder. In the world of data‑driven decisions, that kind of clarity is worth its weight in gold.
Happy estimating!