ANN COURSE PROJECT — PROBABILITY CONCEPTS

Probability
Visualized.

Four interactive demos and a full presentation deck to build intuitive understanding of core probability concepts — the mathematical foundation of Artificial Neural Networks.

PRESENTATION — 21 SLIDES

📑

Probability Slides

21 slides across 7 sections — from axioms and distributions to MLE and cross-entropy loss — with a slide browser, thumbnail strip, and fullscreen mode.

Covers: Foundations · Rules · Bayes · Distributions · Expectation · MLE & Cross-Entropy


DEMO 01

⭕

Independence & Dependence

Explore how events relate through interactive Venn diagrams — mutually exclusive, overlapping, and fully dependent events.

ANN link: feature independence assumptions & Naïve Bayes classifiers


DEMO 02

🔄

Bayes' Theorem

Update your beliefs with new evidence. Medical testing and spam filtering scenarios bring Bayesian reasoning to life with real-time calculations.

ANN link: probabilistic classifiers & spam filters as ML models


DEMO 03

📈

Law of Large Numbers

Watch the observed frequency of heads in random coin flips converge toward the true probability as the number of trials grows. The estimate steadies as data accumulates.

ANN link: why large training datasets make gradient estimates reliable


DEMO 04

📊

Softmax Function

See how a neural network turns raw scores into a probability distribution — and why exp() makes classifiers more decisive than simple normalization.

ANN link: output layer of most multi-class neural network classifiers


ASSESSMENT

📝

20 MCQ Quiz

Test your understanding of all four demos and the presentation content. Instant feedback, score tracking, and curriculum references for every question.

Covers: Venn, Bayes, LLN, Softmax, Distributions, MLE & more


DEMO 01 — INTERACTIVE

Independence & Dependence

Having an intersection does not automatically mean independence. It depends on exactly how much the circles overlap — independence requires P(A∩B) = P(A)×P(B) precisely.

CONCEPTUAL MAP

All Event Relationships
  ├── ● Independent → overlap exists, but P(A∩B) = P(A)×P(B) exactly. Knowing B tells you nothing about A.
  └── ● Dependent → knowing one event changes the probability of the other
        ├── ◌ Mutually Exclusive → no intersection. If B happened, A definitely did NOT.
        ├── ◌ Overlap — wrong size → circles intersect but P(A∩B) ≠ P(A)×P(B)
        └── ◌ Subset → B inside A. If B happened, A definitely happened.

Mutually Exclusive — A special case of Dependence

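No intersection. If B happens, A cannot, so P(A | B) = 0 even though P(A) = 0.40. Knowing B changes the probability of A, which is dependence (this holds whenever both events have nonzero probability).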

Probabilities

P(A): 0.40
P(B): 0.30
P(A ∩ B), actual: 0.00
P(A)×P(B), if independent: 0.12
P(A ∪ B): 0.70
P(A | B): 0.00
P(B | A): 0.00
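
The conceptual map and the probability panel above can be sketched in a few lines of Python. This is a minimal illustration, not the demo's actual code: the function name `describe_events`, the tolerance `tol`, and the returned keys are assumptions made for this example.

```python
import math

def describe_events(p_a, p_b, p_ab, tol=1e-9):
    """Classify how events A and B relate and derive the panel's probabilities.

    p_a, p_b: P(A) and P(B), each assumed to be in (0, 1].
    p_ab:     the actual overlap P(A∩B).
    """
    if not (0 < p_a <= 1 and 0 < p_b <= 1):
        raise ValueError("P(A) and P(B) must be in (0, 1]")
    lower = max(0.0, p_a + p_b - 1.0)   # smallest overlap that keeps P(A∪B) <= 1
    upper = min(p_a, p_b)               # overlap cannot exceed either event
    if not (lower - tol <= p_ab <= upper + tol):
        raise ValueError("P(A∩B) is not consistent with P(A) and P(B)")

    expected = p_a * p_b                # overlap required for independence
    if math.isclose(p_ab, expected, abs_tol=tol):
        relation = "independent"
    elif p_ab == 0:
        relation = "dependent: mutually exclusive"
    elif math.isclose(p_ab, upper, abs_tol=tol):
        relation = "dependent: subset"
    else:
        relation = "dependent: overlap of the wrong size"

    return {
        "relation": relation,
        "expected_if_independent": expected,
        "union": p_a + p_b - p_ab,      # inclusion-exclusion
        "a_given_b": p_ab / p_b,        # P(A | B)
        "b_given_a": p_ab / p_a,        # P(B | A)
    }

# The mutually exclusive setting shown in the panel.
result = describe_events(0.40, 0.30, 0.00)
print(result["relation"])
print(round(result["union"], 2), round(result["expected_if_independent"], 2))
```

The independence test runs first on purpose: if P(A) = 1, a subset B satisfies P(A∩B) = P(A)×P(B) and is therefore independent of A.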

Independence check: actual overlap vs. expected

[Chart: Actual P(A∩B) compared with Expected P(A)×P(B)]

DEMO 02 — INTERACTIVE

Bayes' Theorem

How should you update your belief after seeing new evidence? Adjust the sliders and watch the posterior probability change in real time.

Scenario

P(Disease), base rate: 1%
P(+|Disease), sensitivity: 95%
P(−|Healthy), specificity: 90%

Population Visualization

Out of 1000 people (each dot = 10 people)

Legend: Has Disease · True Positive · False Positive · Healthy (−)

Result

P(DISEASE | POSITIVE TEST)

Reported counts: True Positives (TP) · False Positives (FP) · False Negatives (FN) · True Negatives (TN)
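
The four counts in the Result panel follow directly from the three slider values. The sketch below computes expected counts for a population of 1000 using the default settings shown above; the function name `expected_counts` is an assumption for this example, and the demo itself may round differently when drawing dots.

```python
def expected_counts(population, base_rate, sensitivity, specificity):
    """Expected confusion-matrix counts for a screened population."""
    diseased = population * base_rate
    healthy = population - diseased
    tp = diseased * sensitivity   # diseased and test positive
    fn = diseased - tp            # diseased but test negative
    tn = healthy * specificity    # healthy and test negative
    fp = healthy - tn             # healthy but test positive
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

counts = expected_counts(1000, base_rate=0.01, sensitivity=0.95, specificity=0.90)
print({name: round(value, 1) for name, value in counts.items()})

# Natural-frequency view of the posterior: of everyone who tests positive,
# what fraction actually has the disease?
posterior = counts["TP"] / (counts["TP"] + counts["FP"])
print(f"P(Disease | +) is about {posterior:.3f}")
```

With these settings roughly 9.5 of the 10 diseased people test positive, but so do about 99 of the 990 healthy people, which is why a positive result still leaves the disease unlikely.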

Step-by-Step Calculation

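1. Prior: P(D) = 0.01, P(Healthy) = 0.99

2. Likelihood: P(+|D) = 0.95, P(+|Healthy) = 1 − specificity = 0.10

3. Total probability of a positive test: P(+) = P(+|D)·P(D) + P(+|Healthy)·P(Healthy)
   = 0.95·0.01 + 0.10·0.99 = 0.1085

4. Bayes: P(D|+) = P(+|D)·P(D) / P(+)
   = 0.0095 / 0.1085 ≈ 0.0876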

Bayes' Theorem:
P(H|E) = P(E|H) · P(H) / P(E)

H = Hypothesis
E = Evidence
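
The general formula translates into a short function. This is a sketch under stated assumptions: `bayes_posterior` and its parameter names are chosen for this example, and the evidence term P(E) is expanded over the two cases "H is true" and "H is false".

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H|E) given P(H), P(E|H), and P(E|not H)."""
    # Law of total probability: P(E) = P(E|H)·P(H) + P(E|not H)·P(not H)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    if evidence == 0:
        raise ValueError("P(E) is zero, so the posterior is undefined")
    return likelihood * prior / evidence

# Medical test with the default slider values: 1% base rate,
# 95% sensitivity, 90% specificity (false positive rate = 1 - 0.90).
p = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=1 - 0.90)
print(f"P(D|+) = {p:.4f}")

# Moving the base-rate slider shows how strongly the prior drives the result.
for base_rate in (0.001, 0.01, 0.10, 0.50):
    post = bayes_posterior(base_rate, 0.95, 1 - 0.90)
    print(f"base rate {base_rate:>5.1%} -> posterior {post:.1%}")
```

The same function covers the spam-filter scenario by reading H as "message is spam" and E as "message contains a given word".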

DEMO 03 — INTERACTIVE

Law of Large Numbers

Flip a coin thousands of times and watch the observed frequency converge to the true probability. Early results swing widely; with enough trials the estimate settles close to the true value.

Coin bias: Fair (50%)

Total flips: 0 · Heads: 0 · Observed P(H): (none yet) · True P(H): 50%

[Chart: Convergence to True Probability, plotting Observed P(H) against True P(H)]

Law of Large Numbers: As the number of trials increases, the sample mean approaches the expected value (true probability). Start flipping to see the convergence happen in real time.
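
The convergence described above can be reproduced offline with a small simulation. This is a minimal sketch rather than the demo's code: `running_frequency`, `p_heads`, and the fixed `seed` are assumptions introduced for this example.

```python
import random

def running_frequency(n_flips, p_heads=0.5, seed=0):
    """Simulate coin flips and return the observed P(H) after each flip."""
    rng = random.Random(seed)        # seeded so the run is repeatable
    heads = 0
    frequencies = []
    for flip in range(1, n_flips + 1):
        if rng.random() < p_heads:   # heads with probability p_heads
            heads += 1
        frequencies.append(heads / flip)
    return frequencies

freqs = running_frequency(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    observed = freqs[n - 1]
    print(f"n={n:>7}  observed P(H)={observed:.4f}  |error|={abs(observed - 0.5):.4f}")
```

The error does not shrink on every flip, only on average: for a fair coin the typical deviation after n flips is about 0.5 / sqrt(n), so 100 times more data buys roughly one extra decimal digit of accuracy.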

DEMO 04 — INTERACTIVE

Softmax Function

Softmax converts raw scores (logits) from a neural network into a proper probability distribution: all values lie between 0 and 1 and sum to 1. It is the final step in most multi-class neural network classifiers.

Scenario

🌡 Temperature (T): 1.0

Slider range: 0.1 (more confident) to 3.0 (more uniform)

Output Probabilities

✓ Sum of all probabilities = 1.000

Step-by-Step Calculation

1. Raw logits (z)

2. Apply exp(z_i / T)

3. Sum the exponentials: Σ_j exp(z_j / T)

4. Divide each exponential by the sum

Softmax Formula:
σ(z)_i = exp(z_i / T) / Σ_j exp(z_j / T)

z = raw scores (logits) · T = temperature (default 1.0)
Output: probability distribution over classes
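
The formula above, including the temperature parameter, can be sketched as follows. The logits are hypothetical values chosen for illustration, not one of the demo's scenarios, and subtracting the maximum before exponentiating is a standard numerical-stability step added here that leaves the result unchanged.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities: exp(z_i / T) / sum_j exp(z_j / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Shifting by the max avoids overflow in exp() and cancels out in the ratio.
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
for t in (0.1, 1.0, 3.0):
    probs = softmax(logits, temperature=t)
    print(f"T={t}: {[round(p, 3) for p in probs]}  sum={sum(probs):.3f}")
```

Lower temperatures push nearly all of the probability onto the largest logit, while higher temperatures flatten the distribution toward uniform, matching the two ends of the slider.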

Why exp()? — Softmax vs. Simple Normalization

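Simple normalization (dividing each score by the total) also produces values that sum to 1, but it treats scores linearly and only yields valid probabilities when every score is non-negative and the total is positive. Softmax uses exp(), which maps any real-valued score to a positive number and responds to differences between scores rather than their ratios: when scores are separated by about a unit or more, the highest score gets a disproportionately larger probability, making the model more decisive. Paired with cross-entropy loss, softmax also gives a simple gradient with respect to the logits (predicted probability minus target), which is one reason the combination is standard for training classifiers.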
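
A side-by-side comparison makes the difference concrete. The scores below are hypothetical, and the helper names `simple_normalize`, `softmax`, and `cross_entropy` are assumptions for this sketch.

```python
import math

def simple_normalize(scores):
    """Divide each score by the total; only valid for non-negative scores."""
    total = sum(scores)
    if min(scores) < 0 or total <= 0:
        raise ValueError("simple normalization needs non-negative scores with a positive total")
    return [s / total for s in scores]

def softmax(scores):
    """Exponentiate, then normalize; works for any real-valued scores."""
    shift = max(scores)                       # stability shift, cancels in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target_index])

scores = [1.0, 2.0, 3.0]                      # hypothetical raw scores
linear = simple_normalize(scores)
expo = softmax(scores)
print("simple :", [round(p, 3) for p in linear])   # [0.167, 0.333, 0.5]
print("softmax:", [round(p, 3) for p in expo])     # [0.09, 0.245, 0.665]

# If class 2 is correct, the more decisive softmax output incurs a lower loss.
print(round(cross_entropy(linear, 2), 3), round(cross_entropy(expo, 2), 3))
```

Scores such as [-1.0, 0.0, 1.0] show the other practical difference: `simple_normalize` rejects them because the total is zero and one score is negative, while `softmax` handles them without special cases.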

[Comparison chart panels: Softmax (exp-based) · Simple Normalization (linear)]