Discrete Probability Distributions

A probability distribution is a table or graph that shows the probability with which a given random variable takes each of its possible values.

The following concepts are useful to understand probability distributions:

  • If Event A can occur in p possible ways and Event B can occur in q possible ways, then both A and B can occur in p x q ways.
  • The number of different ways that a set of objects can be selected, without regard to order, is called Combination. The number of combinations of n objects taken r at a time is given by nCr = n! / ((n - r)! r!)
  • The number of different ways that a set of objects can be arranged in order is called Permutation. The number of permutations of n objects taken r at a time is given by
    nPr = n! / (n - r)!
Here is a PL/SQL code snippet to compute factorial:
FUNCTION factorial(p_n IN NUMBER) RETURN NUMBER IS
BEGIN
  -- Reject NULL and negative input.
  IF p_n IS NULL OR p_n < 0 THEN
    RAISE_APPLICATION_ERROR(-20000, 'Invalid Input Value');
  -- Base case: 0! = 1! = 1.
  ELSIF p_n <= 1 THEN
    RETURN 1;
  -- Recursive step: n! = (n - 1)! * n.
  ELSE
    RETURN factorial(p_n-1) * p_n;
  END IF;
END;
EX> Compute 9!
select factorial(9) from dual;
I was curious to see how far I could push this function: the largest input that worked was 83.7 with NUMBER types (an Oracle NUMBER cannot hold values of 1.0E+126 or more), and 84.7 after I changed the input parameter and return type to BINARY_DOUBLE (the factorial2 variant below).
SQL> select factorial(83.7) from dual;
FACTORIAL(83.7)
---------------
9.642E+125
SQL> select factorial(83.71) from dual;
FACTORIAL(83.71)
---------------
~
SQL> select factorial2(84.7) from dual;
FACTORIAL2(84.7)
----------------
8.167E+127
SQL> select factorial2(84.71) from dual;
FACTORIAL2(84.71)
-----------------
Inf
EX> Compute the number of combinations of 9 objects taken 3 at a time.
select factorial(9)/(factorial(9-3) * factorial(3)) from dual;
EX> Compute the number of different ways of arranging 9 objects taken 3 at a time.
select factorial(9)/factorial(9-3) from dual;
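Because full factorials overflow quickly, nCr and nPr can also be computed as running products. The block below is a minimal sketch of that idea, not part of the original code: the names n_p_r and n_c_r are made up for illustration, and it assumes integer inputs and a client such as SQL*Plus that understands SET SERVEROUTPUT ON.

```sql
SET SERVEROUTPUT ON

DECLARE
  -- Hypothetical helper: nPr = n * (n - 1) * ... * (n - r + 1)
  FUNCTION n_p_r(p_n IN PLS_INTEGER, p_r IN PLS_INTEGER) RETURN NUMBER IS
    l_result NUMBER := 1;
  BEGIN
    IF p_n IS NULL OR p_r IS NULL OR p_r < 0 OR p_r > p_n THEN
      RAISE_APPLICATION_ERROR(-20000, 'Invalid Input Value');
    END IF;
    FOR i IN 0 .. p_r - 1 LOOP
      l_result := l_result * (p_n - i);
    END LOOP;
    RETURN l_result;
  END n_p_r;

  -- Hypothetical helper: nCr as the product of (n - r + i) / i for i = 1..r.
  -- After step i the running value equals C(n - r + i, i), which is an integer.
  FUNCTION n_c_r(p_n IN PLS_INTEGER, p_r IN PLS_INTEGER) RETURN NUMBER IS
    l_result NUMBER := 1;
  BEGIN
    IF p_n IS NULL OR p_r IS NULL OR p_r < 0 OR p_r > p_n THEN
      RAISE_APPLICATION_ERROR(-20000, 'Invalid Input Value');
    END IF;
    FOR i IN 1 .. p_r LOOP
      l_result := l_result * (p_n - p_r + i) / i;
    END LOOP;
    RETURN ROUND(l_result);  -- guard against any stray rounding
  END n_c_r;
BEGIN
  DBMS_OUTPUT.PUT_LINE('9C3 = ' || n_c_r(9, 3));
  DBMS_OUTPUT.PUT_LINE('9P3 = ' || n_p_r(9, 3));
END;
/
```

For the two examples above this should print 84 and 504, matching the factorial-based queries.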
Discrete Probability Distributions
  • The discrete probability distribution is a table that lists the discrete values (outcomes) of an experiment along with the relative frequency (a.k.a. probability) of each outcome.
    Example: Tossing a coin two times gives you the combinations (H,H), (H,T), (T,H), (T,T) and hence, the following tuples for (#Heads, Frequency, Relative_Frequency):
    (0, 1, 1/4=0.25), (1, 2, 2/4=0.5), (2, 1, 1/4=0.25).
    This is the probability distribution for # heads after flipping a coin twice.
  • Mean or expected value of the discrete probability distribution: $\mu = \sum_{i=1}^{n} x_i \, P(x_i)$. For the coin example, $\mu = 0 \times 0.25 + 1 \times 0.5 + 2 \times 0.25 = 1$
  • Variance of the discrete probability distribution: $\sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 \, P(x_i)$
  • Standard deviation is the square root of the variance
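The three formulas above can be checked directly in SQL. This is only a sketch: the inline view dist and the column names x, p, mu, sigma_sq and sigma are made up for illustration, and the rows are the coin-toss distribution from the example.

```sql
-- Coin example: x = number of heads, p = probability of that outcome
WITH dist AS (
  SELECT 0 AS x, 0.25 AS p FROM dual UNION ALL
  SELECT 1, 0.50 FROM dual UNION ALL
  SELECT 2, 0.25 FROM dual
),
stats AS (
  -- mean = sum of x * P(x)
  SELECT SUM(x * p) AS mu FROM dist
)
SELECT s.mu                                  AS mu,
       SUM(POWER(d.x - s.mu, 2) * d.p)       AS sigma_sq,  -- variance
       SQRT(SUM(POWER(d.x - s.mu, 2) * d.p)) AS sigma      -- standard deviation
FROM   dist d CROSS JOIN stats s
GROUP  BY s.mu;
```

For two coin tosses this should return a mean of 1, a variance of 0.5 and a standard deviation of about 0.7071.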
Binomial Probability Distribution
A binomial or Bernoulli experiment consists of a fixed number of independent trials, each with only two possible outcomes (success or failure) and the same probability of success on every trial. The Bernoulli process counts the number of successes over a given number of attempts; in other words, the random variable for a binomial distribution is the number of successes over the given number of attempts.
  • The probability of r successes in n trials, with probability of success p and probability of failure q = 1 - p, is given by $P(r, n) = \frac{n!}{(n - r)! \, r!} \, p^r q^{n - r}$
  • The binomial probability distribution is a table of (r, P(r, n)) which can be subsequently graphed, as discussed in this example
EX> Over the next 7 days, assume a 40% chance of rain and 60% chance of no rain. The probability that it will rain exactly 2 days over the next 7 days is $P(2, 7) = \frac{7!}{(7 - 2)! \, 2!} \, 0.4^2 \, 0.6^{7 - 2}$, which can be computed using
select factorial(7) * power(0.4,2) * power(0.6,(7-2))/
(factorial(7-2) * factorial(2)) p_2_7
from dual;
The probability that it will rain at least 6 days over the next 7 days is P(r >= 6) = P(6,7)+P(7,7), computed using
select (factorial(7) * power(0.4,6) * power(0.6,(7-6))/
(factorial(7-6) * factorial(6))) +
(factorial(7) * power(0.4,7) * power(0.6,(7-7))/
(factorial(7-7) * factorial(7))) p_r_ge_6
from dual;
Finally, the probability that it will rain no more than 2 days over the next 7 days is P(r <= 2) = P(0,7) + P(1,7) + P(2,7)
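The sum for P(r <= 2) can be computed with a short loop. The anonymous block below is a sketch that carries its own local, iterative factorial so it runs without any stored function; the variable names are illustrative.

```sql
SET SERVEROUTPUT ON

DECLARE
  l_n   CONSTANT PLS_INTEGER := 7;    -- days (trials)
  l_p   CONSTANT NUMBER      := 0.4;  -- chance of rain on any one day
  l_sum NUMBER := 0;

  -- Local iterative factorial; the loop body is skipped for 0 and 1
  FUNCTION factorial(p_n IN PLS_INTEGER) RETURN NUMBER IS
    l_result NUMBER := 1;
  BEGIN
    FOR i IN 2 .. p_n LOOP
      l_result := l_result * i;
    END LOOP;
    RETURN l_result;
  END factorial;
BEGIN
  -- P(r <= 2) = P(0,7) + P(1,7) + P(2,7)
  FOR r IN 0 .. 2 LOOP
    l_sum := l_sum
             + factorial(l_n) * POWER(l_p, r) * POWER(1 - l_p, l_n - r)
               / (factorial(l_n - r) * factorial(r));
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('P(r <= 2) = ' || TO_CHAR(l_sum, 'FM0.000000'));
END;
/
```

Working the three terms by hand gives 0.0279936 + 0.1306368 + 0.2612736 = 0.419904, so the block should print a value of about 0.419904.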
  • The mean of a binomial distribution is μ = np
  • The variance is σ² = npq, so the standard deviation is σ = √(npq)
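These two results can be checked numerically for the rain example (n = 7, p = 0.4). The sketch below builds each probability from the previous one using P(r) = P(r - 1) * ((n - r + 1) / r) * (p / q), which avoids factorials entirely, and computes the variance as E[X²] - μ², which is algebraically the same as the sum of (x - μ)² P(x). All variable names are illustrative.

```sql
SET SERVEROUTPUT ON

DECLARE
  l_n    CONSTANT PLS_INTEGER := 7;
  l_p    CONSTANT NUMBER      := 0.4;
  l_q    CONSTANT NUMBER      := 1 - l_p;
  l_prob NUMBER := POWER(l_q, l_n);  -- P(0, n) = q^n
  l_mean NUMBER := 0;                -- running sum of r * P(r)
  l_ex2  NUMBER := 0;                -- running sum of r^2 * P(r)
BEGIN
  FOR r IN 0 .. l_n LOOP
    IF r > 0 THEN
      -- Step from P(r - 1) to P(r)
      l_prob := l_prob * (l_n - r + 1) / r * l_p / l_q;
    END IF;
    DBMS_OUTPUT.PUT_LINE(r || '  ' || TO_CHAR(l_prob, 'FM0.000000'));
    l_mean := l_mean + r * l_prob;
    l_ex2  := l_ex2 + r * r * l_prob;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('mean     = ' || TO_CHAR(l_mean, 'FM0.000000')
                       || '  (np  = ' || l_n * l_p || ')');
  DBMS_OUTPUT.PUT_LINE('variance = ' || TO_CHAR(l_ex2 - l_mean * l_mean, 'FM0.000000')
                       || '  (npq = ' || l_n * l_p * l_q || ')');
END;
/
```

The summed mean and variance should agree with np = 2.8 and npq = 1.68 up to rounding.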
Excel has a function BINOMDIST(r, n, p, cumulative), where p is the probability of success. Set cumulative=TRUE if you want the probability of r or fewer successes, and cumulative=FALSE if you want exactly r successes. Here is the PL/SQL version:
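FUNCTION binomdist(r NUMBER, n NUMBER, p NUMBER, cumulative BOOLEAN DEFAULT FALSE) RETURN NUMBER IS
  ret NUMBER;
  fn  NUMBER;
BEGIN
  ret := 0;
  fn := factorial(n);  -- n! is the same for every term, so compute it once
  -- Count down from r so the non-cumulative case exits after the first term.
  -- The loop index ri is declared implicitly by the FOR statement.
  FOR ri IN REVERSE 0..r LOOP
    ret := ret + (fn * power(p, ri) * power((1-p),(n - ri)))/
                 (factorial(n - ri) * factorial(ri));
    IF NOT cumulative THEN
      EXIT;
    END IF;
  END LOOP;
  RETURN ret;
END binomdist;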

Poisson Probability Distribution
The random variable for a Poisson distribution is the number of occurrences of an event over a measurable interval (of time or space). In a Poisson process, the (measured) mean number of occurrences of an event is the same for each interval of measurement, and the number of occurrences in a particular interval is independent of the number of occurrences in other intervals.

  • The probability of exactly r occurrences over a given interval is given by $P(r) = \frac{\mu^r e^{-\mu}}{r!}$
  • The variance of the Poisson distribution is the same as the (observed) mean.
  • A goodness of fit test helps verify if a given dataset fits the Poisson distribution
A simple example of a Poisson process is customer arrival at your favorite coffee shop. Assume you know that an average of 25 customers walk into a Dunkin Donuts every hour; then the likelihood of exactly 31 customers walking into the shop in the next hour is
select power(25,31) * exp(-25)/factorial(31) p_31
from dual;
Just as we saw in the binomial distribution, the probability that no more than 31 customers will walk into the coffee shop is P(r <= 31) = P(0)+P(1)+..+P(31). Conversely, the probability that at least 31 customers will walk into the coffee shop is P(r >= 31) = 1 - P(r < 31) = 1 - P(r <= 30). This leads to the need for a function similar to POISSON(r, μ, cumulative) in Excel, where cumulative = FALSE gives the probability of exactly r occurrences, and cumulative = TRUE gives the probability of r or fewer.
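FUNCTION poissondist(r NUMBER, mu NUMBER,
                     cumulative BOOLEAN DEFAULT FALSE) RETURN NUMBER IS
  ret NUMBER;
BEGIN
  ret := 0;
  -- Count down from r so the non-cumulative case exits after the first term.
  -- The loop index ri is declared implicitly by the FOR statement.
  FOR ri IN REVERSE 0..r LOOP
    -- P(ri) = mu^ri * e^(-mu) / ri!
    ret := ret + (power(mu, ri) * exp(-mu)/factorial(ri));
    IF NOT cumulative THEN
      EXIT;
    END IF;
  END LOOP;
  RETURN ret;
END poissondist;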
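For the coffee shop numbers (μ = 25, r = 31), the exact, cumulative and "at least" probabilities can all be produced in one pass. The block below is a sketch rather than a replacement for poissondist: it uses P(k) = P(k - 1) * μ / k, so it needs neither POWER nor factorial, and its variable names are illustrative.

```sql
SET SERVEROUTPUT ON

DECLARE
  l_mu    CONSTANT NUMBER      := 25;  -- mean arrivals per hour
  l_r     CONSTANT PLS_INTEGER := 31;
  l_term  NUMBER := EXP(-l_mu);        -- P(0) = e^(-mu)
  l_cdf   NUMBER := l_term;            -- running P(r <= k)
  l_below NUMBER := 0;                 -- P(r <= k - 1)
BEGIN
  FOR k IN 1 .. l_r LOOP
    l_below := l_cdf;                  -- remember P(r <= k - 1)
    l_term  := l_term * l_mu / k;      -- P(k) = P(k - 1) * mu / k
    l_cdf   := l_cdf + l_term;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('P(r =  31) = ' || TO_CHAR(l_term, 'FM0.000000'));
  DBMS_OUTPUT.PUT_LINE('P(r <= 31) = ' || TO_CHAR(l_cdf, 'FM0.000000'));
  -- At least 31 is the complement of 30 or fewer
  DBMS_OUTPUT.PUT_LINE('P(r >= 31) = ' || TO_CHAR(1 - l_below, 'FM0.000000'));
END;
/
```

The first two lines correspond to poissondist(31, 25) and poissondist(31, 25, TRUE); the third is 1 - poissondist(30, 25, TRUE).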

Poisson approximation - a Poisson distribution with μ = np can be used to approximate a binomial distribution if the number of trials n (in the binomial experiment) is >= 20 and the probability of success p is <= 5%.
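To get a feel for how close the approximation is, the two distributions can be printed side by side. The block below is a sketch with assumed values n = 100 and p = 0.02 (so μ = np = 2), chosen only for illustration; both columns are built with the step-by-step recurrences rather than factorials.

```sql
SET SERVEROUTPUT ON

DECLARE
  l_n     CONSTANT PLS_INTEGER := 100;        -- assumed number of trials
  l_p     CONSTANT NUMBER      := 0.02;       -- assumed probability of success
  l_q     CONSTANT NUMBER      := 1 - l_p;
  l_mu    CONSTANT NUMBER      := l_n * l_p;  -- Poisson mean, mu = np
  l_binom NUMBER := POWER(l_q, l_n);          -- binomial P(0) = q^n
  l_pois  NUMBER := EXP(-l_mu);               -- Poisson  P(0) = e^(-mu)
BEGIN
  DBMS_OUTPUT.PUT_LINE('r  binomial  poisson');
  FOR r IN 0 .. 5 LOOP
    IF r > 0 THEN
      l_binom := l_binom * (l_n - r + 1) / r * l_p / l_q;
      l_pois  := l_pois * l_mu / r;
    END IF;
    DBMS_OUTPUT.PUT_LINE(r || '  ' || TO_CHAR(l_binom, 'FM0.000000')
                           || '  ' || TO_CHAR(l_pois, 'FM0.000000'));
  END LOOP;
END;
/
```

Changing l_n and l_p to values outside the rule of thumb (for example a small n with a large p) should make the two columns drift apart.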
