ANN COURSE PROJECT — PROBABILITY CONCEPTS

Probability Visualized.

Four interactive demos and a full presentation deck to build intuitive understanding of core probability concepts — the mathematical foundation of Artificial Neural Networks.

PRESENTATION — 21 SLIDES
📑

Probability Slides

21 slides across 7 sections — from axioms and distributions to MLE and cross-entropy loss — with a slide browser, thumbnail strip, and fullscreen mode.

Covers: Foundations · Rules · Bayes · Distributions · Expectation · MLE & Cross-Entropy
DEMO 01

Independence & Dependence

Explore how events relate through interactive Venn diagrams — mutually exclusive, overlapping, and fully dependent events.

ANN link: feature independence assumptions & Naïve Bayes classifiers
DEMO 02
🔄

Bayes' Theorem

Update your beliefs with new evidence. Medical testing and spam filtering scenarios bring Bayesian reasoning to life with real-time calculations.

ANN link: probabilistic classifiers & spam filters as ML models
DEMO 03
📈

Law of Large Numbers

Watch how the observed frequency of heads in random coin flips converges toward the true probability as the number of trials grows. The uncertainty in the estimate shrinks, although it never reaches exactly zero.

ANN link: why large training datasets make gradient estimates reliable
DEMO 04
📊

Softmax Function

See how a neural network turns raw scores into a probability distribution — and why exp() makes classifiers more decisive than simple normalization.

ANN link: output layer of most multi-class neural network classifiers
ASSESSMENT
📝

20 MCQ Quiz

Test your understanding of all four demos and the presentation content. Instant feedback, score tracking, and curriculum references for every question.

Covers: Venn, Bayes, LLN, Softmax, Distributions, MLE & more
DEMO 01 — INTERACTIVE

Independence & Dependence

Having an intersection does not automatically mean independence. It depends on exactly how much the circles overlap — independence requires P(A∩B) = P(A)×P(B) precisely.

CONCEPTUAL MAP
All Event Relationships
  ├── ● Independent  → overlap exists, but P(A∩B) = P(A)×P(B) exactly. Knowing B tells you nothing about A.
  └── ● Dependent  → knowing one event changes the probability of the other
        ├── ◌ Mutually Exclusive  → no intersection. If B happened, A definitely did NOT.
        ├── ◌ Overlap — wrong size  → circles intersect but P(A∩B) ≠ P(A)×P(B)
        └── ◌ Subset  → B inside A. If B happened, A definitely happened.
Mutually Exclusive — A special case of Dependence
No intersection. If B happens, A cannot. This is dependence.
[Venn diagram: sample space S containing event A with P(A) = 0.40 and event B with P(B) = 0.30; the A∩B region is labeled]

Probabilities

P(A): 0.40
P(B): 0.30
P(A ∩ B), actual: 0.00
P(A)×P(B), if independent: 0.12
P(A ∪ B): 0.70
P(A | B): 0.00
P(B | A): 0.00
Independence check — actual overlap vs expected
■ Actual P(A∩B) | Expected P(A)×P(B)
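
The quantities in the panel above all follow from three inputs: P(A), P(B), and P(A∩B). The sketch below is a minimal illustration of that arithmetic and of the independence check, not the demo's actual code. The function name `describe_events`, the dictionary keys, and the tolerance `tol` are assumptions made for this example.

```python
import math


def describe_events(p_a: float, p_b: float, p_ab: float, tol: float = 1e-9) -> dict:
    """Derive union, conditionals, and the relationship label from P(A), P(B), P(A∩B).

    Hypothetical helper for illustration; names and labels are assumptions.
    """
    if not (0 < p_a <= 1 and 0 < p_b <= 1):
        raise ValueError("P(A) and P(B) must lie in (0, 1]")
    if not (0 <= p_ab <= min(p_a, p_b) + tol):
        raise ValueError("P(A∩B) cannot exceed P(A) or P(B)")
    union = p_a + p_b - p_ab  # inclusion-exclusion
    if union > 1 + tol:
        raise ValueError("P(A∪B) would exceed 1")

    expected_if_independent = p_a * p_b
    if math.isclose(p_ab, expected_if_independent, abs_tol=tol):
        relationship = "independent"
    elif p_ab == 0:
        relationship = "mutually exclusive (dependent)"
    elif math.isclose(p_ab, min(p_a, p_b), abs_tol=tol):
        relationship = "subset (dependent)"
    else:
        relationship = "overlap, wrong size (dependent)"

    return {
        "union": union,
        "a_given_b": p_ab / p_b,  # P(A|B) = P(A∩B) / P(B)
        "b_given_a": p_ab / p_a,  # P(B|A) = P(A∩B) / P(A)
        "expected_if_independent": expected_if_independent,
        "relationship": relationship,
    }


# The mutually exclusive case shown above: P(A)=0.40, P(B)=0.30, P(A∩B)=0.00
result = describe_events(0.40, 0.30, 0.00)
print(result["relationship"])
```

Setting P(A∩B) to 0.12 with the same P(A) and P(B) is the only overlap that this check labels as independent; any other value falls into one of the dependent cases of the conceptual map.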

Formula

DEMO 02 — INTERACTIVE

Bayes' Theorem

How should you update your belief after seeing new evidence? Adjust the sliders and watch the posterior probability change in real time.

Scenario

P(Disease), Base Rate: 1%
P(+|Disease), Sensitivity: 95%
P(−|Healthy), Specificity: 90%

Population Visualization

Out of 1000 people (each dot = 10 people)

Has Disease
True Positive
False Positive
Healthy (−)

Result

P(DISEASE | POSITIVE TEST)
True Positives (TP)
False Positives (FP)
False Negatives (FN)
True Negatives (TN)
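
As a rough illustration of where the four counts come from, the sketch below splits a population of 1000 using the base rate, sensitivity, and specificity. It computes expected (possibly fractional) counts; how the demo itself rounds them is not known, and the function name `confusion_counts` is an assumption for this example.

```python
def confusion_counts(base_rate: float, sensitivity: float, specificity: float,
                     population: int = 1000) -> dict:
    """Expected TP/FP/FN/TN counts for a population (hypothetical helper)."""
    diseased = population * base_rate
    healthy = population - diseased
    tp = diseased * sensitivity   # sick and test positive
    fn = diseased - tp            # sick but test negative
    tn = healthy * specificity    # healthy and test negative
    fp = healthy - tn             # healthy but test positive
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Default scenario: 1% base rate, 95% sensitivity, 90% specificity
counts = confusion_counts(0.01, 0.95, 0.90)
# Roughly 9.5 true positives against 99 false positives
print({name: round(value, 1) for name, value in counts.items()})

# Share of positive tests that are true positives, i.e. P(Disease | Positive)
print(round(counts["TP"] / (counts["TP"] + counts["FP"]), 4))
```

Because healthy people vastly outnumber sick ones at a 1% base rate, false positives dominate the positive results even with a fairly accurate test.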

Step-by-Step Calculation

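1. Prior: P(D) = 0.01, P(Healthy) = 0.99
2. Likelihood: P(+|D) = 0.95, P(+|Healthy) = 1 − specificity = 0.10
3. Total probability of a positive test: P(+) = P(+|D)·P(D) + P(+|Healthy)·P(Healthy) = 0.95·0.01 + 0.10·0.99 = 0.1085
4. Bayes: P(D|+) = P(+|D)·P(D) / P(+) = 0.0095 / 0.1085 ≈ 0.0876 (about 8.8%)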
Bayes' Theorem:
P(H|E) = P(E|H) · P(H) / P(E)

H = Hypothesis
E = Evidence
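
The formula above can be written as a small function. This is a minimal sketch under the assumption of a binary hypothesis (H is either true or false), so that P(E) expands by the law of total probability; the name `posterior` and its parameter names are chosen for this example only.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """P(H|E) for a binary hypothesis.

    prior               = P(H)
    likelihood          = P(E|H)
    false_positive_rate = P(E|not H)
    """
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    if evidence == 0:
        raise ValueError("P(E) is zero, so the posterior is undefined")
    return likelihood * prior / evidence


# Medical scenario: P(D) = 0.01, P(+|D) = 0.95, P(+|Healthy) = 1 - 0.90
print(f"{posterior(0.01, 0.95, 0.10):.4f}")  # prints 0.0876
```

Raising the prior while keeping the test unchanged raises the posterior, which is why the base rate slider has such a strong effect on the result.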
DEMO 03 — INTERACTIVE

Law of Large Numbers

Flip a coin thousands of times and watch the observed frequency converge to the true probability. Individual flips stay random, but the running average becomes increasingly stable as trials accumulate.

Speed:
Coin bias: Fair (50%)
TOTAL FLIPS: 0
HEADS: 0
OBSERVED P(H): ?
TRUE P(H): 50%
Last 30 flips will appear here...
[Chart: Convergence to True Probability, plotting Observed P(H) against True P(H)]
Law of Large Numbers: As the number of trials increases, the sample mean approaches the expected value (true probability). Start flipping to see the convergence happen in real time.
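
The convergence described above can be reproduced with a short simulation. The sketch below is an illustration rather than the demo's implementation: `running_frequency`, `p_heads`, and `seed` are names assumed for this example, and the fixed seed only makes the run repeatable.

```python
import random


def running_frequency(n_flips: int, p_heads: float = 0.5, seed: int = 0) -> list:
    """Observed P(H) after each flip of a (possibly biased) coin."""
    rng = random.Random(seed)  # local generator, so the global state is untouched
    heads = 0
    frequencies = []
    for flip_number in range(1, n_flips + 1):
        if rng.random() < p_heads:
            heads += 1
        frequencies.append(heads / flip_number)
    return frequencies


freqs = running_frequency(100_000)
for checkpoint in (10, 100, 1_000, 10_000, 100_000):
    observed = freqs[checkpoint - 1]
    print(f"{checkpoint:>7} flips: observed P(H) = {observed:.4f}, "
          f"gap to 0.5 = {abs(observed - 0.5):.4f}")
```

The gap at early checkpoints can be large and does not have to shrink at every step; the law only says that it tends toward zero as the number of flips grows.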
DEMO 04 — INTERACTIVE

Softmax Function

Softmax converts raw scores (logits) from a neural network into a proper probability distribution — all values between 0 and 1, summing to exactly 1. It's the final step in almost every classifier.

Scenario

🌡 Temperature (T): 1.0
Range: 0.1 (more confident) to 3.0 (more uniform)

Output Probabilities

✓  Sum of all probabilities = 1.000

Step-by-Step Calculation

1. Raw logits (z)
2. Apply exp(z_i / T) to each logit
3. Sum of exponentials: Σ_j exp(z_j / T)
4. Divide each exponential by the sum
Softmax Formula:
σ(z)_i = exp(z_i / T) / Σ_j exp(z_j / T)

z = raw scores (logits) · T = temperature (default 1.0)
Output: probability distribution over classes
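
A minimal sketch of the formula with temperature follows. The example logits are made up for illustration, and subtracting the maximum before exponentiating is a common numerical-stability step added here; it does not change the result and is not stated in the formula above.

```python
import math


def softmax(logits: list, temperature: float = 1.0) -> list:
    """Softmax with temperature over a list of raw scores."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Assumption for this sketch: shift by the max to avoid overflow in exp().
    # The shift cancels in the ratio, so the probabilities are unchanged.
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
for t in (0.1, 1.0, 3.0):
    probs = softmax(logits, temperature=t)
    print(f"T={t}: {[round(p, 3) for p in probs]} (sum = {sum(probs):.3f})")
```

Lower temperatures push most of the probability onto the top-scoring class, while higher temperatures flatten the distribution toward uniform, which matches the slider labels.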

Why exp()? — Softmax vs. Simple Normalization

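Simple normalization (dividing each score by the total) also produces values that sum to 1, but only when every score is non-negative and the total is non-zero, and it treats scores linearly. Softmax uses exp(), which accepts any real-valued score and amplifies differences: the highest score gets a disproportionately larger probability, making the model more decisive. Because exp() is always positive, every class also receives a non-zero probability, so the logarithm in cross-entropy loss is always defined during training.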
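
To make the comparison concrete, the sketch below applies both methods to the same scores. The scores `[1.0, 2.0, 3.0]` and the function names are assumptions for this example, and temperature is fixed at 1.

```python
import math


def softmax(scores: list) -> list:
    """Exp-based normalization (temperature fixed at 1)."""
    shift = max(scores)  # stability shift; cancels in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_normalize(scores: list) -> list:
    """Divide each score by the total; only valid for non-negative scores."""
    total = sum(scores)
    if any(s < 0 for s in scores) or total == 0:
        raise ValueError("linear normalization needs non-negative scores with a non-zero sum")
    return [s / total for s in scores]


scores = [1.0, 2.0, 3.0]  # hypothetical raw scores
print("linear :", [round(p, 3) for p in linear_normalize(scores)])  # about 0.167, 0.333, 0.5
print("softmax:", [round(p, 3) for p in softmax(scores)])           # about 0.09, 0.245, 0.665
```

With these scores the top class gets about 0.50 under linear normalization and about 0.67 under softmax. Scores that include a negative value, which are common for logits, cannot be handled by the linear version at all.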

Softmax (exp-based)

Simple Normalization (linear)