All Statistics Calculators

📚 Sources & Methodology

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.) — origin of the d = 0.2/0.5/0.8 effect size benchmarks for small/medium/large effects (ncbi.nlm.nih.gov).
Goodman, S.N. (2008). “A Dirty Dozen: Twelve P-Value Misconceptions,” Seminars in Hematology — seminal paper discussing the JAMA survey in which 88% of medical residents expressed confidence in their understanding of p-values yet 100% misinterpreted them (ncbi.nlm.nih.gov).
American Statistical Association (2016). ASA Statement on P-Values — official guidance on correct and incorrect interpretation of statistical significance (amstat.org).
American Psychological Association (2020). Publication Manual (7th ed.) — statistical reporting standards, effect size requirement alongside p-values, Bessel’s correction notation (apastyle.apa.org).

Standard Deviation, P-Values & Sample Size — The Concepts Behind the Calculations

Statistics is a rigorous discipline, but it becomes a source of systematic errors when the underlying concepts are unclear. Standard deviation has two versions (population and sample) that differ by whether you divide by N or N-1, and the N-1 correction exists for a specific mathematical reason that matters for small samples. P-values measure one very specific thing and are routinely misinterpreted as measuring something else entirely. And sample size has a quadratic relationship with precision — halving your margin of error requires four times as many observations. The percentile calculator, quartile calculator, and sample size calculator cover the mechanical calculations; this page covers the conceptual context that makes those calculations interpretable.

Standard Deviation — Population (N) vs Sample (N-1) and Why Bessel’s Correction Matters

Standard deviation measures how spread out data values are from the mean. Two versions exist. Population standard deviation (σ) divides the sum of squared deviations by N — the full count of data points. This is correct when you have every member of the population you care about. Sample standard deviation (s) divides by N-1. This is used when your data is a sample drawn from a larger population and you want to estimate the population’s standard deviation. The N-1 denominator is Bessel’s correction, named after Friedrich Bessel. The reason: when you calculate a sample mean, all data points are defined relative to that sample mean. The data points are systematically closer to their own sample mean than they would be to the true population mean. Dividing by N produces a variance that systematically underestimates the true population variance. Dividing by N-1 corrects for this bias. For large samples (N=100), the difference is 1% and negligible. For small samples it is not: with N=4, as in the worked example below, dividing by N gives a variance estimate 25% smaller than dividing by N-1 — a meaningful bias.

Standard Deviation — Population vs Sample

Population SD (σ): σ = √( ∑(x − μ)² / N )
Sample SD (s): s = √( ∑(x − x̄)² / (N−1) ) ← Bessel’s correction

Worked example: data = {4, 7, 13, 2}
Mean (x̄): (4+7+13+2)/4 = 6.5
Squared deviations: (4−6.5)² = 6.25, (7−6.5)² = 0.25, (13−6.5)² = 42.25, (2−6.5)² = 20.25
Sum = 69
Population variance: 69/4 = 17.25 → σ = 4.15
Sample variance: 69/3 = 23.00 → s = 4.80

✗ Using N (population formula) on a sample produces a biased, underestimated SD
✓ For sample data estimating a population, always use N−1 (sample formula)

Most calculators, spreadsheets (Excel STDEV), and statistical software default to sample SD (N−1). Population SD (N) is STDEV.P in Excel. When in doubt which to use: if your data is a subset of a larger group you want to make inferences about, use N−1.
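To see the two denominators in action, here is a minimal Python sketch that reproduces the worked example above (the helper name std_dev is ours, not a library API):

```python
import math

def std_dev(data, sample=True):
    """Standard deviation: divide by N-1 (sample, Bessel's correction) or N (population)."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations (69 here)
    return math.sqrt(ss / (n - 1 if sample else n))

data = [4, 7, 13, 2]
print(round(std_dev(data, sample=False), 2))  # population sigma: 4.15
print(round(std_dev(data, sample=True), 2))   # sample s: 4.80
```

Python’s standard library offers the same pair directly as statistics.pstdev (population) and statistics.stdev (sample).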

P-Values — What They Mean, What They Do Not Mean, and Why Experts Get It Wrong

A p-value is: the probability of observing data as extreme as or more extreme than your result, assuming the null hypothesis is true. Nothing more, nothing less. P = 0.04 means: if there were truly no effect (null hypothesis), there would be only a 4% chance of seeing data this extreme or more extreme purely by chance. A JAMA study found that 88% of medical residents felt confident in their understanding of p-values — and 100% had the interpretation wrong. The most common error: treating the p-value as the probability the null hypothesis is true, or the probability the result is real. These are completely different things (this error is called the “inverse probability fallacy”). A p = 0.04 does not mean there is a 96% chance the result is real. It means the data would be observed by chance only 4% of the time if there were no effect.
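The definition is easier to internalise with a simulation. The sketch below uses invented numbers (two groups of 30 and a made-up observed mean difference of 0.5): it estimates a two-sided p-value by generating many experiments in a world where the null hypothesis is true and counting how often chance alone produces a difference at least as extreme as the one observed.

```python
import random
import statistics

random.seed(42)
observed_diff = 0.5          # hypothetical observed difference in group means
n, sims = 30, 20_000
extreme = 0
for _ in range(sims):
    # Under H0 (no effect) both groups are drawn from the same distribution
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    if abs(statistics.mean(a) - statistics.mean(b)) >= observed_diff:
        extreme += 1
print(f"simulated two-sided p ≈ {extreme / sims:.3f}")  # ≈ 0.05 for these inputs
```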

Effect Size — Why Statistical Significance Without Effect Size Is Meaningless

Statistical significance (p < 0.05) tells you a result is unlikely to be noise. Effect size (Cohen’s d, r, eta-squared) tells you how large the effect is. With large enough samples, even trivially small effects become statistically significant. A study with n = 500 participants finding p < 0.001 for a meditation intervention sounds compelling. If Cohen’s d = 0.10, the standardised effect is 0.1 standard deviations — a 1.5-point change on a 100-point scale with a 15-point SD. Statistically significant, practically negligible. Cohen’s benchmarks: small effect = d ≈ 0.2, medium = d ≈ 0.5, large = d ≈ 0.8. The APA Publication Manual now requires effect sizes to be reported alongside p-values for this reason. A p-value without an effect size is an incomplete statistical report.
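For two groups of raw scores, Cohen’s d is the mean difference divided by the pooled standard deviation. A minimal sketch (the helper cohens_d and the score lists are invented for illustration):

```python
import math
import statistics

def cohens_d(group1, group2):
    """Standardised mean difference using the pooled (N-1) standard deviation."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

treatment = [72, 75, 78, 80, 69, 74]
control   = [70, 71, 73, 76, 68, 72]
print(round(cohens_d(treatment, control), 2))  # ≈ 0.88 — a large effect for these made-up scores
```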

P-Value — Correct Interpretation vs Common Misinterpretations

P-value = P(data this extreme or more extreme | null hypothesis is true)

What p = 0.04 MEANS:
✓ If there were no effect, we would see data this extreme only 4% of the time by chance
✗ NOT: 4% chance the null hypothesis is true
✗ NOT: 96% chance the result is real or the alternative is true
✗ NOT: the probability the study will replicate 96% of the time
✗ NOT: a measure of effect size (small p does not mean large effect)

Statistical vs practical significance example:
n=500, p<0.001 (highly significant), Cohen’s d = 0.10 → statistically significant, practically negligible (1.5 pts on 100-pt scale)
n=30, p=0.08 (not significant), Cohen’s d = 0.60 → underpowered study: real medium effect, insufficient sample to detect it

Always report effect size (Cohen’s d, r, or eta-squared) alongside any p-value. A p-value without effect size is an incomplete report of a finding.
💡

P-value misinterpretation is the norm, not the exception — even among experts: A seminal JAMA survey (Goodman 2008) tested medical residents on p-value interpretation. 88% expressed fair to complete confidence in their understanding. 100% got the interpretation wrong. The most common errors: treating p-value as the probability the null is true, treating it as the probability the result will replicate, and conflating statistical significance with practical importance. The American Statistical Association issued a formal statement in 2016 clarifying that a p-value below 0.05 does not by itself constitute adequate evidence for a scientific claim. The replication crisis in psychology, medicine, and social science is partly traceable to over-reliance on p < 0.05 without consideration of effect size, study power, and pre-registration.

Statistics Reference Tables — Effect Size, Sample Size & Percentile Interpretation

Cohen’s d Effect Size Benchmarks — What Small, Medium, and Large Mean in Practice

Cohen’s benchmarks are guidelines for effect interpretation in the absence of domain-specific context. A d = 0.2 is “small” in abstract terms but may be very meaningful if the intervention is cheap and safe. A d = 0.8 may be meaningful in education but insufficient in medical treatment contexts. Always interpret effect size relative to the domain and practical stakes.

Cohen’s d | Classification | Distribution Overlap | Practical Example
0.10 | Negligible | ~92% overlap | 1.5-pt change on 100-pt scale (often noise)
0.20 | Small | ~85% overlap | Height difference between 15–16 year olds
0.50 | Medium | ~67% overlap | IQ difference between clerical and semi-skilled workers
0.80 | Large | ~53% overlap | IQ difference between PhD and typical college freshman
1.20 | Very Large | ~37% overlap | Substantial clinically meaningful difference
2.0+ | Huge | <22% overlap | Group differences clearly visible without statistics

Sample Size vs Margin of Error — The Quadratic Relationship

The table below assumes estimation of a proportion at 95% confidence with maximum variance (p = 0.5). To halve the margin of error, you need four times the sample. This diminishing return is why large surveys require careful cost-benefit analysis: going from ±5% to ±2.5% precision costs 4× more but delivers only 2× more precision.

Sample Size (n) | Margin of Error | To Halve MOE | Notes
100 | ±9.8% | → need n=400 | Rough estimates only
385 | ±5.0% | → need n=1,537 | Common survey standard
600 | ±4.0% | → need n=2,401 | Political polling minimum
1,067 | ±3.0% | → need n=4,268 | National survey standard
1,537 | ±2.5% | → need n=6,147 | High-precision surveys
9,604 | ±1.0% | → need n=38,416 | Census-level precision
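Every row in the table comes from one formula, n = z² · p(1−p) ÷ e². A short sketch (helper names are ours; 95% confidence and p = 0.5 assumed as defaults):

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Margin of error for a proportion at the given confidence z-score."""
    return z * math.sqrt(p * (1 - p) / n)

def required_n(e, z=1.96, p=0.5):
    """Sample size for margin of error e (as a decimal), rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

print(f"±{margin_of_error(385):.1%}")  # ±5.0%
print(required_n(0.025))               # 1537 — halving the MOE quadruples n
```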

Z-Score and Percentile Reference — Standard Normal Distribution

A z-score measures how many standard deviations a data point is from the mean. Positive z-scores are above average; negative are below. Used for normalising datasets and calculating percentile rank from a known distribution.

Z-Score | Percentile Rank | Meaning
−3.0 | 0.13% | Extreme low — rarer than 1 in 750
−2.0 | 2.28% | Low — bottom 2.3%
−1.0 | 15.87% | Below average
0.0 | 50.00% | Exactly at the mean
+1.0 | 84.13% | Above average
+1.65 | 95.05% | Top 5% threshold (one-tailed)
+1.96 | 97.50% | 95% confidence interval boundary
+2.0 | 97.72% | Top 2.3%
+2.576 | 99.50% | 99% confidence interval boundary
+3.0 | 99.87% | Top 0.13% — rare
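Every percentile in the table is just the standard normal CDF evaluated at the z-score; in Python this needs only math.erf (the function name percentile_from_z is ours):

```python
import math

def percentile_from_z(z):
    """Percentile rank = standard normal CDF × 100."""
    return 50 * (1 + math.erf(z / math.sqrt(2)))

for z in (-2.0, 0.0, 1.96, 2.576):
    print(f"z = {z:+.3f} → {percentile_from_z(z):.2f}th percentile")
# z = -2.000 → 2.28, z = +0.000 → 50.00, z = +1.960 → 97.50, z = +2.576 → 99.50
```

Python 3.8+ also exposes this directly as statistics.NormalDist().cdf(z).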
⚠️

Statistical significance ≠ practical significance — the most consequential misunderstanding in applied statistics: With large enough samples, any effect, no matter how small, will be statistically significant. A randomised trial with 10,000 participants comparing two teaching methods might find p < 0.0001 — massively significant. If the effect is d = 0.05, that is approximately a 0.75-point improvement on a 100-point test. The finding is real (not noise), replicable, and practically meaningless for most policy decisions. Conversely, a study with n = 30 finding p = 0.08 and d = 0.65 has a genuine medium effect that failed to reach significance because the study was underpowered — the failure to reach p < 0.05 is a failure of sample size, not a failure of the effect. The APA now requires, and the ASA strongly recommends, effect size reporting alongside p-values precisely because p-values alone cannot distinguish “real but tiny” from “real and meaningful.”

Which Statistics Calculator to Use — A Practical Guide for Researchers, Analysts & Students

For Descriptive Statistics (Summarising Data)

Use the standard deviation calculator for any dataset where you want to understand spread. Enter the data and specify whether you want population standard deviation (you have all the data you care about) or sample standard deviation (your data is a sample and you want to estimate the population). For most research and survey contexts, use sample standard deviation (N-1). Use the ascending order calculator as a first step before calculating median, quartiles, and percentiles — all of these require sorted data. Use the percentile calculator to find where a specific value ranks in a distribution, and the quartile calculator to find Q1, Q2, Q3, and IQR for box plot construction and outlier detection.

For Hypothesis Testing

When interpreting any p-value result: read the p-value correctly (probability of data this extreme given null is true), report the effect size alongside it (Cohen’s d for means, r for correlations, odds ratio for categorical outcomes), and consider study power. A non-significant p-value does not mean “no effect” — it means “insufficient evidence to reject the null at this significance level.” An underpowered study can miss a real and meaningful effect. Always run a power analysis before data collection to ensure your sample size is large enough to detect the effect size you consider meaningful.

For Survey and Study Design

Use the sample size calculator before designing any survey or study. Enter your required confidence level (95% is standard), desired margin of error (5% for general surveys, 3% for policy-relevant research), and expected proportion (use 0.5 if unknown — this maximises the required sample). Remember the quadratic relationship: cutting margin of error in half requires four times the sample. Population size has minimal impact on required sample size for large populations (a nationally representative sample of 385 achieves ±5% margin of error whether the population is 100,000 or 100 million) — this counterintuitive result confuses many people who assume larger populations always require larger samples.
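When the population is small enough to matter, the standard finite population correction n₀ ÷ (1 + (n₀ − 1)/N) shrinks the required sample. A sketch (the helper name sample_size_fpc is ours) showing why population size barely moves the answer once N is large:

```python
import math

def sample_size_fpc(e, population, z=1.96, p=0.5):
    """Required n with the finite population correction applied."""
    n0 = z ** 2 * p * (1 - p) / e ** 2      # infinite-population sample size (384.16)
    return math.ceil(n0 / (1 + (n0 - 1) / population))

for pop in (1_000, 100_000, 100_000_000):
    print(pop, sample_size_fpc(0.05, pop))   # 278, 383, 385 — barely moves once N is large
```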

What Researchers and Analysts Consistently Get Wrong

Three statistical errors are pervasive across fields. First: p-hacking — running multiple analyses and reporting only the one that crosses p < 0.05. Each additional analysis increases the chance of false positives; with 20 independent tests at α = 0.05, at least one significant result is expected by chance alone even if all null hypotheses are true. Second: treating a non-significant result as evidence of no effect, especially from an underpowered study. Absence of evidence is not evidence of absence when the study had insufficient power to detect a meaningful effect. Third: using N instead of N-1 for sample standard deviation on small datasets, which systematically understates spread and affects all downstream calculations including confidence intervals and t-tests that depend on the standard deviation estimate.
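The p-hacking arithmetic is worth making explicit: with every null true, each test has a 5% false positive rate, so across 20 independent tests the expected number of false positives is 1 and the chance of at least one is roughly 64%.

```python
# Family-wise false positive risk across m independent tests at level alpha,
# assuming every null hypothesis is true.
alpha, m = 0.05, 20
p_at_least_one = 1 - (1 - alpha) ** m
print(f"P(at least one p < {alpha}) = {p_at_least_one:.2f}")  # 0.64
print(f"expected false positives   = {alpha * m:.1f}")        # 1.0
```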

Frequently Asked Questions — Statistics Calculators

What is the difference between population and sample standard deviation?
Population SD (σ) divides by N. Sample SD (s) divides by N-1 (Bessel’s correction). Use population SD when you have all data from the population you care about. Use sample SD when your data is a subset and you want to estimate the population’s SD. Why N-1: sample data points are closer to their sample mean than to the true population mean, causing systematic underestimation of variance when dividing by N. N-1 corrects this bias. For N=100, the difference in the variance estimate is 1%. For N=5, it is 20% — substantial. Most software defaults to sample SD (N-1). Excel STDEV = sample; STDEV.P = population.
What does a p-value actually mean?
P-value = probability of observing data this extreme or more extreme, assuming the null hypothesis is true. P = 0.04: if there were truly no effect, data this extreme would occur only 4% of the time by chance. What it is NOT: not the probability the null hypothesis is true. Not the probability the result will replicate. Not a measure of effect size. Not the probability the finding matters. A JAMA survey found 100% of medical residents misinterpreted p-values despite 88% feeling confident. Always pair p-values with effect size (Cohen’s d). A p < 0.001 with d = 0.10 is significant but trivial. P = 0.08 with d = 0.65 may be a real medium effect in an underpowered study.
What is the difference between statistical and practical significance?
Statistical significance (p < 0.05) means the result is unlikely by chance alone. Practical significance (effect size) means the result is large enough to matter in the real world. Large samples make tiny effects statistically significant. A study of n = 10,000 might find p < 0.00001 for a teaching intervention with Cohen’s d = 0.04 — a 0.6-point improvement on a 100-point test with a 15-point SD, statistically undeniable and practically meaningless. Always report effect size. Always ask: is this effect large enough to be worth acting on, regardless of significance? The APA and ASA both now require or strongly recommend effect size reporting alongside p-values.
How do I calculate standard deviation step by step?
Step 1: Find the mean. Step 2: Subtract the mean from each value; square the result. Step 3: Sum all squared differences. Step 4: Divide by N-1 (sample) or N (population). Step 5: Take the square root. Example: {4, 7, 13, 2}. Mean = 6.5. Squared deviations: 6.25, 0.25, 42.25, 20.25. Sum = 69. Sample variance = 69 ÷ 3 = 23. Sample SD = √23 = 4.80. Population SD = √(69÷4) = √17.25 = 4.15. The standard deviation calculator handles any number of data points and shows the step-by-step work.
What is a percentile and how do I calculate it?
Percentile = the value below which a given percentage of observations fall. 70th percentile: 70% of data is below this value. To find the percentile rank of a score: (values below your score ÷ total values) × 100. To find the value at the kth percentile in a sorted dataset: index = (k ÷ 100) × N. If the index is a whole number: average that element and the next. If not: round up and take that element. Example: {2, 4, 7, 9, 11, 15}, 75th percentile: index = 0.75 × 6 = 4.5. Round up to the 5th element = 11. The percentile calculator handles any dataset and any percentile.
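A direct translation of that rule into Python (percentile_value is our own helper, valid for 0 < k < 100):

```python
import math

def percentile_value(data, k):
    """Value at the kth percentile: index = (k/100)·N; whole index → average
    that element and the next; otherwise round up (positions are 1-based)."""
    s = sorted(data)
    idx = k / 100 * len(s)
    if idx == int(idx):                  # whole index: average element and next
        i = int(idx)
        return (s[i - 1] + s[i]) / 2
    return s[math.ceil(idx) - 1]         # fractional: round up, convert to 0-based

print(percentile_value([2, 4, 7, 9, 11, 15], 75))  # index 4.5 → 5th element → 11
```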
What are quartiles and how do they relate to percentiles?
Quartiles are the 25th, 50th, and 75th percentiles. Q1 = 25th percentile (lower quartile). Q2 = 50th percentile (median). Q3 = 75th percentile (upper quartile). IQR = Q3 − Q1 = the spread of the middle 50% of data. IQR is used for outlier detection: the Tukey fence method gives lower fence = Q1 − 1.5×IQR, upper fence = Q3 + 1.5×IQR. Points outside these fences are potential outliers. Percentiles are more granular (any percentage) while quartiles divide the data into four equal parts. The quartile calculator returns Q1, Q2, Q3, IQR, and outlier fences for any dataset. A standard-library sketch of the fences follows below.
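A quartile-and-fence sketch using Python’s standard library (note that statistics.quantiles supports several interpolation methods, so its quartiles can differ slightly from the hand method described above; the sample data is invented):

```python
import statistics

def tukey_fences(data):
    """Lower and upper Tukey outlier fences from Q1, Q3, and the 1.5×IQR rule."""
    q1, _q2, q3 = statistics.quantiles(data, n=4)   # default 'exclusive' method
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 7, 9, 11, 15, 40]
low, high = tukey_fences(data)
print([x for x in data if x < low or x > high])     # [40] flagged as a potential outlier
```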
How do I calculate the required sample size for a survey?
n = (z² × p × (1−p)) ÷ e². z = z-score for confidence level: 1.96 for 95%, 2.576 for 99%. p = expected proportion (use 0.5 to maximise the required sample when unknown). e = margin of error as a decimal. For 95% confidence, ±5% MOE: n = (1.96² × 0.5 × 0.5) ÷ 0.05² = 384.16, rounded up to 385. Key insight: halving the margin of error requires 4× the sample (quadratic relationship). Population size barely matters for large populations: n=385 achieves ±5% MOE whether the population is 10,000 or 10 million.
What is Bessel’s correction and why divide by N-1?
Bessel’s correction uses N-1 instead of N in sample variance calculations. Reason: when you use the sample mean as an estimate of the population mean, the data points are by definition pulled toward their own sample mean — they were used to calculate it. This makes the sample appear less spread out than the population actually is. Dividing by N systematically underestimates population variance. Dividing by N-1 corrects this. For N observations, there are only N-1 degrees of freedom for estimating variance: the Nth deviation is constrained by the others, because deviations from the sample mean must sum to zero. N-1 provides an unbiased estimator of population variance; N does not when used on sample data.
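The bias can be verified empirically: draw many small samples from a population whose variance is known to be 1.0 and average the two estimators. A quick Monte Carlo sketch (illustrative only, not part of the calculator):

```python
import random

random.seed(1)
n, trials = 5, 50_000
biased = unbiased = 0.0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]   # true variance = 1.0
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    biased += ss / n            # divide by N
    unbiased += ss / (n - 1)    # Bessel's correction
print(f"divide by N:   {biased / trials:.3f}")    # ≈ 0.80 — systematically low
print(f"divide by N-1: {unbiased / trials:.3f}")  # ≈ 1.00 — unbiased
```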
What is Cohen’s d and how do I interpret it?
Cohen’s d = (Mean₁ − Mean₂) ÷ pooled SD. It measures the standardised difference between two means. Cohen’s benchmarks (1988): small = 0.2 (~85% distribution overlap), medium = 0.5 (~67% overlap), large = 0.8 (~53% overlap). Context matters: d = 0.2 is “small” but meaningful for a cheap, safe intervention. d = 0.8 might be required in a high-stakes clinical context. Effect sizes should be reported alongside every p-value per APA standards. A p < 0.05 with d = 0.10 is significant but negligible. P = 0.07 with d = 0.60 is an underpowered study with a meaningful effect that failed to reach significance.
What is ascending order and why does it matter in statistics?
Ascending order: smallest to largest. Required before calculating the median, quartiles, percentiles, IQR, and rank-based statistics. For {14, 3, 7, 21, 1, 9}: ascending = {1, 3, 7, 9, 14, 21}. Descending = {21, 14, 9, 7, 3, 1}. Many statistical errors stem from operating on unsorted data — if you take the “middle value” of an unsorted dataset, you get an arbitrary value, not the median. The ascending order calculator takes any list of numbers separated by commas, spaces, or line breaks, sorts them, and returns rank order alongside the sorted sequence.
How does sample size affect the margin of error?
Confidence interval width ∝ 1 ÷ √n. The relationship is quadratic: to double precision (halve the MOE), you need 4× the sample. For a proportion at 95% confidence: n=100 gives ±9.8%, n=400 gives ±4.9%, n=1,600 gives ±2.5%. Each 4× increase in sample halves the margin of error. Switching from 95% to 99% confidence at the same margin of error: multiply the required sample by (2.576÷1.96)² ≈ 1.73 — about 73% more observations. Population size has minimal impact for large populations: the finite population correction factor approaches 1.0 when the sample is under 10% of the population, which is almost always the case in national surveys.
Do these calculators send my data to a server?
No. Every statistics calculation runs entirely in your browser. Your datasets, test statistics, and all other inputs never leave your device. Nothing is logged or stored. Statistical results are only as valid as the data entered and the statistical assumptions met. Always verify that your data meets test assumptions (normality for parametric tests, independence of observations, appropriate measurement scale). For publication or high-stakes research, verify results with dedicated statistical software (R, SPSS, SAS, Stata) and have analyses reviewed by a qualified statistician.


📊

Missing a Statistics Calculator?

Can’t find the tool you need? Tell us — we build new statistics calculators every week, prioritising the most-requested tools for researchers, analysts, and students.