Accuracy is a comforting number.
It looks decisive. Clean. Simple.
In regulated environments, that’s dangerous.
In payments, small failure rates compound.
In banking, edge cases become regulatory events.
In infrastructure, “mostly correct” is not sufficient.
That’s why evaluation metrics matter more than model complexity.
This week I revisited precision, recall, and confusion matrices.
On an imbalanced dataset, a classifier can show 95% accuracy while failing on the minority class entirely.
That’s not performance.
That’s blindness.
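The claim is easy to verify by hand. A minimal sketch with made-up numbers (95 negatives, 5 positives, and a model that only ever predicts the majority class):

```python
# Toy imbalanced dataset: 95 negatives (0), 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A degenerate "classifier" that always predicts the majority class.
y_pred = [0] * 100

# Accuracy: fraction of all predictions that match.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class: of the 5 real positives, how many were caught?
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(t == 1 for t in y_true)

print(accuracy)  # 0.95
print(recall)    # 0.0
```

Accuracy reports 95%; minority-class recall is zero. The headline number hides total failure on the cases that matter.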
The parallel with regulated delivery is obvious.
In banking, you design controls for tail risk.
In ML, you design evaluation for minority error.
Precision and recall are not academic constructs.
They represent tradeoffs between false positives and false negatives — between friction and risk.
In regulated domains, those tradeoffs must be explicit.
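One way to make the tradeoff explicit is to compute precision and recall from the confusion-matrix counts at different decision thresholds. A sketch on invented scores and labels (the data here is illustrative, not from any real system):

```python
# Toy model scores and ground-truth labels (illustrative values).
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0,   0,   0,    1,   0,   1,   1,   1]

def precision_recall(threshold):
    """Precision and recall when flagging everything at or above threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))  # caught risk
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))  # friction
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))  # missed risk
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

print(precision_recall(0.65))  # (1.0, 0.75): no friction, but one miss
print(precision_recall(0.3))   # recall rises to 1.0, precision falls
```

Lowering the threshold trades missed positives (risk) for false alarms (friction). Neither operating point is "correct"; the point is that choosing one is a policy decision, and it should be written down.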
Machine learning is not just modelling.
It’s governance expressed in math.