📊 Enter Confusion Matrix Values
📌 Enter the four values from your binary classification confusion matrix. Predicted Positive is the left column (TP + FP); Actual Positive is the top row (TP + FN).
← PREDICTED →
ACTUAL
TP
True Positive
FN
False Negative (Type II)
FP
False Positive (Type I)
TN
True Negative
Confusion Matrix Summary
Accuracy
⚠️ Disclaimer: Choose your evaluation metric based on your specific problem, class balance, and the relative costs of false positives vs false negatives. No single metric is universally best.

Sources & Methodology

All formulas verified against Google ML Crash Course, Wikipedia, and peer-reviewed research on classification metrics.
📘
Google Machine Learning Crash Course — Classification Metrics
Authoritative reference for accuracy, precision, recall, TPR, FPR, and their relationships used in machine learning model evaluation. Defines all metrics used in this calculator.
📊
Chicco D., Jurman G. (2020) — The advantages of the MCC over F1 score and accuracy in binary classification evaluation
Peer-reviewed paper demonstrating that Matthews Correlation Coefficient (MCC) is a more reliable metric than F1 score and accuracy for imbalanced binary classification tasks. Basis for recommending MCC for imbalanced datasets.
All formulas (n = TP+TN+FP+FN):
Accuracy = (TP+TN)/n | Precision = TP/(TP+FP) | Recall = TP/(TP+FN)
Specificity = TN/(TN+FP) | F1 = 2×(P×R)/(P+R) | NPV = TN/(TN+FN)
Balanced Acc = (Recall+Specificity)/2 | FPR = FP/(FP+TN) | FNR = FN/(FN+TP)
MCC = (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
FDR = 1 − Precision | FOR = 1 − NPV
Values are undefined (NaN) when the denominator is zero.
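The formula set above can be sketched as a single Python function. This is a minimal illustration, not the calculator's actual implementation; the name `classification_metrics` is hypothetical:

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute the binary-classification metrics listed above.
    Returns NaN wherever a denominator is zero, matching the
    calculator's stated convention."""
    def ratio(num, den):
        return num / den if den else math.nan

    n = tp + tn + fp + fn
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    npv = ratio(tn, tn + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, n),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": ratio(2 * precision * recall, precision + recall),
        "npv": npv,
        "balanced_accuracy": (recall + specificity) / 2,
        "fpr": ratio(fp, fp + tn),
        "fnr": ratio(fn, fn + tp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        "fdr": 1 - precision,   # false discovery rate
        "for": 1 - npv,         # false omission rate
    }
```

With the cancer-screening example used throughout this page (TP=80, TN=70, FP=10, FN=20), this returns accuracy ≈ 0.833 and MCC ≈ 0.671.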

Accuracy, Precision, Recall, F1 — Complete Guide to Classification Metrics

In binary classification, accuracy alone is rarely enough to evaluate a model. A spam filter that marks every email as "not spam" would be 100% accurate on a dataset with no spam — yet completely useless. To properly evaluate classifiers, you need a suite of metrics derived from the confusion matrix: the 2×2 table of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

The Confusion Matrix Explained

Predicted Positive | Predicted Negative
Actual Positive: TP (correctly identified positive) | FN (missed positive — Type II error)
Actual Negative: FP (false alarm — Type I error) | TN (correctly identified negative)

Example: Cancer screening on 180 patients (100 with cancer, 80 without).
Model predicts: 80 TP (correctly identified cancer), 20 FN (missed cancer), 10 FP (false alarms), 70 TN.
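Tallying the four cells from raw labels is a one-line count per cell. A minimal sketch (the helper name `confusion_counts` is illustrative, with 1 = positive):

```python
def confusion_counts(actual, predicted):
    """Tally TP, FP, TN, FN from parallel lists of 0/1 labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # hit
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarm (Type I)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # correct rejection
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # miss (Type II)
    return tp, fp, tn, fn
```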

All 12 Classification Metrics — Formulas and Interpretation

Metric | Formula | Meaning | Best For
Accuracy | (TP+TN)/n | Overall fraction correct | Balanced classes
Precision (PPV) | TP/(TP+FP) | Of predicted positives, how many are real | Costly false positives
Recall (Sensitivity) | TP/(TP+FN) | Of actual positives, how many found | Costly false negatives
Specificity (TNR) | TN/(TN+FP) | Of actual negatives, how many identified | Clinical tests
F1 Score | 2PR/(P+R) | Harmonic mean of precision & recall | Imbalanced data
Balanced Accuracy | (Recall+Specificity)/2 | Average per-class accuracy | Highly imbalanced
MCC | (TP×TN−FP×FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)) | Correlation between actual & predicted | Best overall for imbalanced
NPV | TN/(TN+FN) | Of predicted negatives, how many are real | Ruling out disease
FPR | FP/(FP+TN) | False alarm rate (1 − Specificity) | ROC curve x-axis
FNR | FN/(FN+TP) | Miss rate (1 − Recall) | Safety-critical
FDR | FP/(TP+FP) = 1 − Precision | Of predicted positives, how many are false | Complement of Precision
FOR | FN/(TN+FN) = 1 − NPV | Of predicted negatives, how many are false | Complement of NPV

When Accuracy Is Misleading — Class Imbalance Problem

If 99% of data belongs to the negative class, a model that always predicts negative achieves 99% accuracy while being completely useless. This is the accuracy paradox:

Rule: When the minority class is less than 20–30% of the data, do not use accuracy as your primary metric. Use F1, balanced accuracy, or MCC instead.
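The paradox is easy to reproduce with made-up numbers (the 990/10 split below is hypothetical, chosen for illustration):

```python
# Accuracy paradox: an always-negative classifier on 99% negative data.
tp, fn = 0, 10      # every positive is missed
tn, fp = 990, 0     # every negative is "correctly" predicted

accuracy = (tp + tn) / (tp + tn + fp + fn)            # 0.99 — looks great
recall = tp / (tp + fn)                               # 0.0 — finds nothing
balanced_accuracy = (recall + tn / (tn + fp)) / 2     # 0.5 — no better than chance
```

Balanced accuracy (and MCC) expose the useless model that plain accuracy hides.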

Precision vs Recall Trade-off

Precision and recall are inversely related at most threshold settings. Raising the classification threshold increases precision but decreases recall. Lowering it increases recall but decreases precision. The F1 score is the harmonic mean that balances both. For asymmetric costs, use F-beta where β > 1 to emphasize recall, or β < 1 to emphasize precision.

Fβ = (1+β²) × (Precision × Recall) / (β² × Precision + Recall)
β = 1: standard F1 (equal weight). β = 2: double weight on recall (cancer detection). β = 0.5: double weight on precision (content recommendation).
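The Fβ formula translates directly to code; `f_beta` is an illustrative helper, shown here with the precision/recall values from the running example:

```python
def f_beta(precision, recall, beta):
    """F-beta score: beta > 1 weights recall, beta < 1 weights precision."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Running example: Precision = 0.889, Recall = 0.80
f1  = f_beta(0.889, 0.8, 1)    # ≈ 0.842 (harmonic mean)
f2  = f_beta(0.889, 0.8, 2)    # ≈ 0.816 (pulled toward the lower recall)
f05 = f_beta(0.889, 0.8, 0.5)  # ≈ 0.870 (pulled toward the higher precision)
```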

Matthews Correlation Coefficient (MCC) — The Best Single Metric

MCC is considered the single most reliable metric for binary classification evaluation, especially for imbalanced data. It is essentially the Pearson correlation coefficient between the actual and predicted binary labels. MCC = +1 means perfect prediction, MCC = 0 means no better than random, MCC = −1 means perfect inverse prediction. Unlike F1 and accuracy, MCC gives a high score only when the model performs well on both classes.

MCC = (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
Example: TP=80, TN=70, FP=10, FN=20.
Numerator: 80×70 − 10×20 = 5600 − 200 = 5400
Denominator: √((90)(100)(80)(90)) = √64,800,000 ≈ 8050
MCC = 5400/8050 ≈ 0.671 (strong positive correlation)
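The same arithmetic, checked in Python:

```python
import math

tp, tn, fp, fn = 80, 70, 10, 20
num = tp * tn - fp * fn                                        # 5600 − 200 = 5400
den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) # √64,800,000 ≈ 8050
mcc = num / den                                                # ≈ 0.671
```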
💡 Quick metric selection guide:
Balanced classes, equal error costs → use Accuracy.
High FP cost (spam filter, false security alerts) → optimize Precision.
High FN cost (cancer screening, fraud) → optimize Recall.
Need one number for imbalanced data → use MCC.
Need an intuitive single metric → use F1 or Balanced Accuracy.
Clinical diagnostic test evaluation → report Sensitivity and Specificity separately (they are prevalence-invariant, unaffected by class balance).
Frequently Asked Questions
How do you calculate accuracy from a confusion matrix?
Accuracy = (TP + TN) / (TP + TN + FP + FN). It measures the fraction of all predictions that are correct. Example: TP=80, TN=70, FP=10, FN=20. Total n = 180. Accuracy = (80+70)/180 = 150/180 = 83.3%. This is the simplest metric but can be misleading for imbalanced datasets where one class is much more common than the other.
What is the difference between precision and accuracy?
Precision (PPV) = TP/(TP+FP). It measures: of all the cases you predicted as positive, how many are actually positive? Example: you predicted 90 positives (TP=80, FP=10). Precision = 80/90 = 88.9%. Accuracy considers all four cells (TP, TN, FP, FN); precision only considers the positive predictions (TP, FP). Use precision when false positives are costly: flagging legitimate emails as spam is worse than missing spam.
What is recall, and when should you prioritize it?
Recall (sensitivity, TPR) = TP/(TP+FN). It measures: of all actual positives, how many did you correctly identify? Example: 100 actual positives (TP=80, FN=20). Recall = 80/100 = 80%. Use recall when false negatives are costly: missing a cancer diagnosis (FN) is far worse than a false alarm (FP). In these cases, maximize recall even if it means more false positives.
What is the F1 score and when should you use it?
F1 = 2×(Precision×Recall)/(Precision+Recall). It is the harmonic mean of precision and recall. Use it when you want to balance both and classes are somewhat imbalanced. Example: Precision=0.889, Recall=0.80. F1 = 2×0.711/1.689 = 0.842 = 84.2%. F1 penalizes extreme values: a model with 100% precision and 0% recall gets F1=0, not the average (50%).
What is balanced accuracy?
Balanced accuracy = (Recall + Specificity)/2. It averages sensitivity and specificity, giving equal weight to both classes regardless of their size. For a model that always predicts the majority class: Recall = 0%, Specificity = 100%, balanced accuracy = 50% (correctly identifying it as no better than chance). Regular accuracy would give 90%+ for a 90% majority class. Use balanced accuracy whenever your positive class represents less than 30% of the data.
What is the Matthews Correlation Coefficient (MCC)?
MCC = (TP×TN−FP×FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It ranges from −1 to +1. MCC = +1: perfect predictions. MCC = 0: random chance. MCC = −1: perfectly wrong. Unlike F1 and accuracy, MCC uses all four cells of the confusion matrix and is only high when both precision and recall are high AND both specificity and NPV are high. Research by Chicco & Jurman (2020) recommends MCC as the single most reliable metric for binary classification evaluation.
What are sensitivity and specificity?
Specificity (TNR) = TN/(TN+FP). Sensitivity (Recall) = TP/(TP+FN). Together they describe a test's performance: sensitivity asks "what fraction of disease cases does the test detect?" Specificity asks "what fraction of healthy cases does the test correctly clear?" They are used in clinical diagnostic test evaluation. Both are prevalence-invariant (unlike precision, which changes with class balance). The ROC curve plots sensitivity vs (1−specificity) across all decision thresholds.
What is negative predictive value (NPV)?
NPV = TN/(TN+FN). It answers: when the test says negative, how likely is the patient truly negative? Critical in medicine: a high NPV means the test is good at ruling OUT disease. Unlike specificity, NPV depends on disease prevalence. For rare diseases (low prevalence), even a moderately specific test has very high NPV. For common diseases, NPV is lower even with the same test.
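The prevalence dependence is easy to verify numerically. A sketch assuming a hypothetical test with 90% sensitivity and 90% specificity (`npv_at_prevalence` is an illustrative helper):

```python
def npv_at_prevalence(sensitivity, specificity, prevalence, n=100_000):
    """NPV of a test with fixed sensitivity/specificity at a given prevalence."""
    pos = n * prevalence          # truly diseased
    neg = n - pos                 # truly healthy
    tn = neg * specificity        # healthy, correctly cleared
    fn = pos * (1 - sensitivity)  # diseased, missed
    return tn / (tn + fn)

rare   = npv_at_prevalence(0.9, 0.9, 0.01)  # rare disease: NPV ≈ 99.9%
common = npv_at_prevalence(0.9, 0.9, 0.30)  # common disease: NPV ≈ 95.5%
```

Same test, same sensitivity and specificity, but the NPV drops as the disease becomes more common.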
Why is accuracy misleading for imbalanced datasets?
Accuracy is misleading when classes are imbalanced. Example: 1000 patients, 10 with a rare disease (1%). A model predicting "healthy" for everyone gets 99% accuracy but 0% recall: it misses every patient. This is why you always need to check both accuracy AND recall (or F1 or MCC) when class imbalance exists. Rule of thumb: if the minority class < 20% of the data, do not use accuracy as your primary metric.
Which metric should you choose for your problem?
Use Accuracy: balanced classes, all errors cost equally (e.g. digit recognition). Use F1: imbalanced data, you want to balance precision and recall (information retrieval, spam detection). Use MCC: highly imbalanced data, you want the single most reliable metric (medical diagnosis, fraud, rare event detection). Use Balanced Accuracy: when MCC is hard to interpret and you want an intuitive 0–100% scale. For medical tests: always report sensitivity AND specificity separately.
What are the false positive rate (FPR) and false negative rate (FNR)?
FPR (false positive rate) = FP/(FP+TN) = 1 − Specificity. It measures the fraction of actual negatives incorrectly flagged as positive, and serves as the x-axis of ROC curves. FNR (false negative rate) = FN/(FN+TP) = 1 − Recall. It measures the fraction of actual positives missed. The ideal classifier has FPR=0 and FNR=0. In practice there is a trade-off: lowering the decision threshold to reduce FNR increases FPR, and vice versa.
What is a confusion matrix?
A confusion matrix is a 2×2 table for binary classification showing: TP (predicted positive, actually positive), FP (predicted positive, actually negative — Type I error), FN (predicted negative, actually positive — Type II error), TN (predicted negative, actually negative). All of the classification metrics on this page are derived from these four values. For multi-class problems, the confusion matrix is k×k for k classes, and per-class metrics are computed using one-vs-rest comparisons.
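The one-vs-rest reduction can be sketched in a few lines (`per_class_counts` is an illustrative helper name):

```python
def per_class_counts(actual, predicted, cls):
    """One-vs-rest TP/FP/FN/TN for class `cls` in a multi-class problem:
    `cls` is treated as positive, every other class as negative."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == cls and p == cls)
    fp = sum(1 for a, p in pairs if a != cls and p == cls)
    fn = sum(1 for a, p in pairs if a == cls and p != cls)
    tn = sum(1 for a, p in pairs if a != cls and p != cls)
    return tp, fp, fn, tn
```

Running this once per class and averaging the resulting per-class metrics gives macro-averaged precision, recall, etc.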