Probability for Data Science

Probability is simply the chance of something happening, that is, the likelihood that an event will occur.

Probability plays a vital role in decision making, which is why company executives rely on it before taking any investment decision.

An event can be likely or unlikely, and it can have one specific outcome or several outcomes. A probability can be expressed as a decimal, a percentage or a fraction (0.30, 33%, 1/6).

The probability of an event A is denoted P(A). The general formula of probability is:

P(A) = number of favorable outcomes / total number of possible outcomes

  • The probability of an event A always lies between 0 and 1: 0 ≤ P(A) ≤ 1
  • If P(A) > P(B) then event A is more likely to occur than event B.
  • If P(A) = P(B) then events A and B are equally likely to occur. 

Example: What is the probability of getting a tail when tossing a coin?

Sample Space = {Head, Tail}
Total number of possible outcomes = 2
Number of favorable outcomes = 1 (only one of the two sides is a tail)

P(Tail) = 1/2 = 0.5
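The ratio of favorable outcomes to possible outcomes can be sketched in Python. The helper name `probability` is hypothetical, and the sketch assumes that every outcome in the sample space is equally likely.

```python
from fractions import Fraction

def probability(favorable, sample_space):
    """Classical probability: favorable outcomes / all possible outcomes.

    Assumes every outcome in sample_space is equally likely.
    """
    space = set(sample_space)
    return Fraction(len(set(favorable) & space), len(space))

coin = {"Head", "Tail"}
p_tail = probability({"Tail"}, coin)
print(p_tail)         # 1/2
print(float(p_tail))  # 0.5
```

Using `Fraction` keeps the result exact, so it can be compared directly with the hand calculation.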

Experimental Probability:

It is the ratio of the number of successful trials to the total number of trials performed.

Experimental probability = number of successful trials / total number of trials

For instance, if a die is rolled 500 times and the number ‘6’ occurs 100 times, then the experimental probability that ‘6’ shows up on the die is 100/500 = 0.2
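As a rough illustration, the same ratio can be computed from a list of recorded outcomes. The function name `experimental_probability` and the random seed are assumptions made for this sketch only.

```python
import random

def experimental_probability(outcomes, target):
    """Share of recorded trials whose outcome equals target."""
    return outcomes.count(target) / len(outcomes)

# The example from the text: 500 rolls, 100 of which show a 6.
recorded = [6] * 100 + [1] * 400
print(experimental_probability(recorded, 6))  # 0.2

# A simulated experiment; the seed is arbitrary and only makes the run repeatable.
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(500)]
estimate = experimental_probability(rolls, 6)
print(estimate)  # usually near 1/6; the exact value depends on the seed
```

The simulated estimate will generally differ a little from the theoretical 1/6, which is exactly the difference between experimental and theoretical probability.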

A trial and an experiment are two different things.

Trial – a single observation of an event occurring, together with the recording of its outcome.

Experiment – a collection of one or more trials.

Trial | Experiment
Rolling a die and recording the outcome | Rolling a die 6 times and recording the 6 individual outcomes

The theoretical probability described above cannot be calculated in some cases, so we have to rely on experimental probability instead. Experimental probability describes how likely an event is, based on the results observed when an experiment is actually conducted. It is commonly used in research and experiments.

Expected Value:

It is the specific outcome that we expect to occur when we run an experiment. An expected value can be Boolean, numerical, categorical or of some other type, depending on the kind of event involved. For an event A with probability P(A), repeated over n trials, the basic expected value formula is:

E(A) = P(A) × n

For example, with P(A) = 0.50 and n = 30 trials: E(A) = P(A) × n = 0.50 × 30 = 15

If the outcomes are numerical values, each with its own probability, the expected value is the sum of every value multiplied by its probability: E(X) = ∑ x × P(x)

For instance, when we roll a six-sided die, each of the faces 1, 2, 3, 4, 5 and 6 has an equal chance of 1/6. So we can calculate the expected value as follows:

(1/6 × 1) + (1/6 × 2) + (1/6 × 3) + (1/6 × 4) + (1/6 × 5) + (1/6 × 6) = 3.5

We can never roll 3.5 on a single throw, but if we keep rolling the die, the average of the results gets closer and closer to 3.5.
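The weighted sum for the die can be checked with a few lines of Python. The helper `expected_value` is a hypothetical name, and the sketch assumes the distribution is given as a mapping from each numerical outcome to its probability.

```python
from fractions import Fraction

def expected_value(distribution):
    """Sum of value * probability over a {value: probability} mapping."""
    return sum(value * p for value, p in distribution.items())

# A fair six-sided die: every face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))         # 7/2
print(float(expected_value(die)))  # 3.5
```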

Probability Frequency Distribution

A probability frequency distribution is a collection of the probabilities of each possible outcome. It is a way to express how frequently an event can occur.


In a group of 30 girls, 15 have black hair, 5 have brown hair, 5 have blond hair and 5 have red hair. Find the probability that a girl has neither blond nor red hair.

Hair Color | Frequency | Probability
Black | 15 | 15/30
Brown | 5 | 5/30
Blond | 5 | 5/30
Red | 5 | 5/30

Probability of a girl having black hair = 15/30

Probability of a girl having brown hair = 5/30

Probability of a girl having black or brown hair = 15/30 + 5/30 = 20/30 = 2/3

Therefore, 20 out of 30 girls have neither blond nor red hair.
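The hair-color table can be turned into a distribution programmatically. This is only a sketch; the variable names are chosen for illustration.

```python
from fractions import Fraction

# Frequencies from the hair-color example (30 girls in total).
frequency = {"Black": 15, "Brown": 5, "Blond": 5, "Red": 5}
total = sum(frequency.values())

# Divide each frequency by the total to get the probability distribution.
distribution = {color: Fraction(count, total) for color, count in frequency.items()}

# "Neither blond nor red" is the same as "black or brown".
p_neither = distribution["Black"] + distribution["Brown"]
print(p_neither)                   # 2/3
print(sum(distribution.values()))  # 1
```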


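A' (also written Ac) is the complement of A, which means everything that the event A is not.

A + A' = Sample space

P(A) + P(A') = 1

P(A) = 1 – P(A')

(A')' = A

If the sample space is made up of three mutually exclusive events A, B and C, then

A' = B + C

It is important to note that the probabilities of all possible outcomes must add up to one: P(A) + P(B) + P(C) = 1 (which means 100% certain).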

We can illustrate this with a coin that has two sides, A and B, where A denotes Head and B denotes Tail. Then

P(A) + P(B) = 1/2 + 1/2 = 1

  • If P = 1, the event is absolutely certain
  • If P = 1.5, the value does not make sense, because a probability cannot exceed 1
  • If P < 1, the event is not guaranteed to occur


Suppose we roll a die and consider every outcome except 4.

A → 1, 2, 3, 5, 6 (the die shows 1, 2, 3, 5 or 6)
B → not 4, so its complement is B' → 4

P(A) = P(1) + P(2) + P(3) + P(5) + P(6) = 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 5/6

P(B') = P(4) = 1/6 (the complement of B is the event that 4 appears)

P(B) = 1 – P(B') = 1 – 1/6 = 5/6

Therefore, P(A) = 5/6 = P(B)
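The complement rule P(B) = 1 – P(B') can be checked with Python sets. The helpers `p` and `complement` are hypothetical names, and the sketch assumes a fair six-sided die.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def p(event):
    """Probability of an event on a fair die (equally likely outcomes)."""
    return Fraction(len(event & sample_space), len(sample_space))

def complement(event):
    """Everything in the sample space that is not in the event."""
    return sample_space - event

four = {4}
not_four = complement(four)
print(sorted(not_four))      # [1, 2, 3, 5, 6]
print(p(four), p(not_four))  # 1/6 5/6
```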


Combinatorics is the branch of mathematics that deals with combinations of objects belonging to a specific finite set. The important parts of combinatorics are:

  • Permutations
  • Variations
  • Combinations

i. Permutations

A permutation is one of the different possible ways in which we can arrange a set of elements, where the elements can be objects, digits or people. The number of permutations of n elements is

P(n) = n × (n – 1) × (n – 2) × … × 1 = n!

There is no repetition: each element is used exactly once. For instance, if we have to arrange 3 students in a row, there are P(3) = 3! = 6 ways to arrange them.
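The six arrangements of three students can be listed with the standard library. The student names below are made up for illustration.

```python
import math
from itertools import permutations

students = ["Ana", "Ben", "Cem"]  # hypothetical names

# Every ordering of the three students, each student used exactly once.
orders = list(permutations(students))
for order in orders:
    print(order)

print(len(orders))                    # 6
print(math.factorial(len(students)))  # 6
```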


The factorial of n is simply the product of the integers from 1 to n. It is nothing more than a notation, written with the sign !

n! = 1 × 2 × 3 × … × n

5! = 1 × 2 × 3 × 4 × 5 = 120


  • 0! = 1
  • If n < 0 then n! does not exist, i.e. there is no factorial of a negative number

Important properties:

  • n! = (n – 1)! × n
  • (n + 1)! = n! × (n + 1)
  • For n > k, n! / k! = (k + 1) × (k + 2) × … × n


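How many two-letter arrangements can be made from the letters in “SPEAK”?


We know that the number of ways to arrange r elements chosen from n different elements is n! / (n – r)!. Here n = 5 and r = 2, so

5! / (5 – 2)! = 5! / 3! = 5 × 4 = 20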
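The count of two-letter arrangements of “SPEAK” can be confirmed by brute force. This sketch assumes no letter is reused within an arrangement; `math.perm` requires Python 3.8 or later.

```python
import math
from itertools import permutations

letters = "SPEAK"

# All ordered pairs of distinct letters, e.g. "SP" and "PS" count separately.
arrangements = ["".join(pair) for pair in permutations(letters, 2)]
print(arrangements[:5])   # ['SP', 'SE', 'SA', 'SK', 'PS']
print(len(arrangements))  # 20

# The same count from the formula n! / (n - r)!
print(math.factorial(5) // math.factorial(5 - 2))  # 20
print(math.perm(5, 2))                             # 20
```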

Two dependent task possibilities:

If the first task can be performed in m ways and, once it is done, the second task can be performed in n ways, then the two tasks together can be performed in m × n ways.


If a stadium has 4 gates, in how many ways can a person enter the stadium through one gate and come out through another gate?

Solution: The person can enter through any of the 4 gates and must then come out through one of the remaining 3 gates. Therefore, the total number of ways is 4 × 3 = 12
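The stadium example can be enumerated directly. The gate labels are hypothetical; the only assumption taken from the example is that the exit gate must differ from the entry gate.

```python
gates = ["G1", "G2", "G3", "G4"]  # hypothetical gate labels

# Choose an entry gate, then any different gate to leave through.
routes = [(enter, leave) for enter in gates for leave in gates if leave != enter]

print(len(routes))  # 12
print(routes[:3])   # [('G1', 'G2'), ('G1', 'G3'), ('G1', 'G4')]
```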

Two independent task possibilities:

If a task can be completed by choosing exactly one option from either of two separate groups, where the first group offers m options and the second offers n options, then the task can be performed in m + n ways.


In a classroom there are 20 students, of whom 12 are boys and 8 are girls. The class teacher wants to select one class monitor, who can be either a girl or a boy. In how many ways can the teacher select the monitor?


In this situation, the class teacher can select the monitor from 12 boys or from 8 girls, so the number of ways is 12 + 8 = 20

ii. Variations

A variation is one of the different possible ways in which we can pick and arrange some of the elements of a given set.

Variation with repetition:

Number of variations with repetition = n^p

where n = the number of different elements we have available and p = the number of elements we are going to arrange.


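If we have three letters a, b and c and 2 positions in which to arrange them, how many different ways are there to pick and arrange these letters?

With repetition allowed: n^p = 3^2 = 9

Variation without repetition:

Number of variations without repetition = n! / (n – p)!

For the same three letters and 2 positions: 3! / (3 – 2)! = 6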
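Both kinds of variation for the letters a, b and c in 2 positions can be listed with `itertools`. This is a sketch that uses `product` for the case with repetition and `permutations` for the case without.

```python
from itertools import permutations, product

letters = "abc"
positions = 2

# With repetition: a letter may be used in both positions (n^p results).
with_repetition = ["".join(v) for v in product(letters, repeat=positions)]
print(with_repetition)
# ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']

# Without repetition: each letter is used at most once (n! / (n - p)! results).
without_repetition = ["".join(v) for v in permutations(letters, positions)]
print(without_repetition)
# ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']
```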

iii. Combinations

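A combination is one of the different possible ways in which we can pick a number of elements of a set, where the order of the picked elements does not matter.

Combination with repetition:

Number of combinations with repetition = (n + p – 1)! / (p! × (n – 1)!)

Combination without repetition:

Number of combinations without repetition = C(n, p) = n! / (p! × (n – p)!)

For instance, to pick 4 students out of 10 to send to a quiz program:

C(10, 4) = 10! / (4! × 6!) = 210

All the different permutations of a single combination are different variations.

Once you pick more than half of the available elements, picking even more leaves you with fewer combinations.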
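The quiz example (4 students out of 10) can be verified with `math.comb` and `itertools`. The sketch also checks the link between combinations and variations; `math.comb` and `math.perm` require Python 3.8 or later.

```python
import math
from itertools import combinations, combinations_with_replacement

n, p = 10, 4
students = range(n)  # students are simply numbered 0..9 for illustration

# Without repetition: order does not matter, nobody is picked twice.
print(len(list(combinations(students, p))))  # 210
print(math.comb(n, p))                       # 210

# With repetition: (n + p - 1)! / (p! * (n - 1)!)
print(len(list(combinations_with_replacement(students, p))))  # 715
print(math.comb(n + p - 1, p))                                # 715

# Each combination can be ordered in p! ways, giving the variations.
print(math.comb(n, p) * math.factorial(p), math.perm(n, p))   # 5040 5040
```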

Symmetry of combinations:

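We apply the symmetry of combinations to avoid calculating factorials of large numbers and to simplify calculations. Writing C(n, p) for the number of ways to pick p elements out of n:

C(n, p) = C(n, n – p)

For instance, picking the 6 students out of 10 who do not attend the quiz program is the same as picking the 4 who do:

C(10, 6) = C(10, 4) = 210

The number of combinations is therefore symmetric with respect to n/2.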
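The symmetry can be seen by listing the number of combinations for every group size. This short sketch uses n = 10, as in the student example.

```python
import math

n = 10
print(math.comb(n, 6), math.comb(n, 4))  # 210 210

# Number of ways to pick p elements for p = 0..n; the list reads the same in both directions.
counts = [math.comb(n, p) for p in range(n + 1)]
print(counts)  # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
```

The counts rise until p = n/2 and fall afterwards, which is why picking more than half of the elements yields fewer combinations.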

Sometimes a combination is a mixture of several smaller individual events. In such cases, we simply multiply the number of options available for each individual event. For instance, if a restaurant offers 3 different kinds of juice and 12 dishes, then a lunch of one juice and one dish can be chosen in 3 × 12 = 36 ways.


A set is a collection of elements that hold certain values, and every event has a set of outcomes that satisfy it. A set that has no elements is called the null set or empty set, denoted Ø.

An element is denoted by a lowercase letter such as ‘x’, whereas a set is denoted by a capital letter such as ‘A’.


Multiple Events:

Here we deal with two or more events at once.

Events never touch:

If two events never touch at all, they can never happen simultaneously.

[Figure: events never touch]

If event A occurs, event B is guaranteed not to occur, and if event B occurs, event A is guaranteed not to occur.

Events partially intersect:

When events partially intersect, or overlap, the two events can occur at the same time.

[Figure: events partially intersect]

Events completely overlap:

When events completely overlap, one event is entirely contained in the other: event B can occur only when event A occurs as well.

In this situation, if event A does not occur, then event B is guaranteed not to occur. However, event B not occurring does not guarantee that event A does not occur.

More precisely, if an outcome is not part of a set, then we can be sure that it is not part of any of its subsets. Similarly, an outcome not being part of some subset does not exclude it from the larger set.


The intersection of two or more events is the set of outcomes that are favorable for both event A and event B at the same time. Generally, we use the intersection when we need both events to happen simultaneously. It is denoted A ∩ B.


The union of two or more events is the combination of all outcomes that are favorable for either A or B. It is denoted A ∪ B.


Mutually exclusive sets are not allowed to have any overlapping elements, so their intersection is the empty set. Conversely, if the intersection of two sets is the empty set, then they must be mutually exclusive.



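The complement of a set contains all the values that are part of the sample space but not part of the set. Complements are not the same thing as mutually exclusive sets: complements are always mutually exclusive, but mutually exclusive sets are not always complements. Let us make this clear with an example.

If set A contains all odd numbers and set B contains all even numbers, then A and B are mutually exclusive, and because together they cover every whole number, they are also complements of each other. By contrast, rolling a 1 and rolling a 2 on a die are mutually exclusive events but not complements, since other outcomes remain.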
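Python's built-in `set` type supports intersection, union and difference directly, which makes these ideas easy to try out. The sample space below (the whole numbers 1 to 10) and the event `low` are assumptions made purely for illustration.

```python
# Assumed sample space for illustration: the whole numbers 1 to 10.
sample_space = set(range(1, 11))

odd = {x for x in sample_space if x % 2 == 1}
even = sample_space - odd  # complement of odd within the sample space
low = {1, 2, 3}            # a hypothetical third event

print(sorted(odd & low))   # intersection: [1, 3]
print(sorted(even | low))  # union: [1, 2, 3, 4, 6, 8, 10]

# odd and even share nothing (mutually exclusive) and together cover everything (complements).
print(odd & even == set(), odd | even == sample_space)  # True True

# {1} and {2} are mutually exclusive but not complements: other outcomes remain.
print({1} & {2} == set(), {1} | {2} == sample_space)    # True False
```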

Dependent Events:

If the likelihood of event A happening is affected by event B happening, then we say that A and B are dependent events. Here, the outcome of event A depends on the outcome of event B.

Independent Events:

If the likelihood of event A happening is not affected by event B happening, then we say that A and B are independent events. Here, the outcome of event A does not depend on the outcome of event B.

For independent events A and B: P(A ∩ B) = P(A) × P(B)


Conditional probability is the likelihood of event A occurring given that event B has already happened. The formula of conditional probability is:

P(A | B) = P(A ∩ B) / P(B)

P(A | B) is read as “the conditional probability of A, given B”.

  • If P(B) > 0, event B can occur and P(A | B) is defined
  • If P(B) = 0, event B never occurs and P(A | B) is undefined

It is important to note that P(A | B) is not the same as P(B | A).
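A small die example shows that P(A | B) and P(B | A) can differ. The sketch below uses the standard definition P(A | B) = P(A ∩ B) / P(B); the events `even` and `high` and the helper names are assumptions chosen for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # a fair six-sided die

def p(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def conditional(a, b):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    if p(b) == 0:
        raise ValueError("P(B) is 0, so P(A | B) is undefined")
    return p(a & b) / p(b)

even = {2, 4, 6}
high = {5, 6}

print(conditional(even, high))  # 1/2
print(conditional(high, even))  # 1/3
```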


The probability of the union of two sets A and B is equal to the sum of the individual probabilities of the two events minus the probability of their intersection.

P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
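The additive law, P(A ∪ B) = P(A) + P(B) – P(A ∩ B), can be checked numerically on a fair die. The events below are assumptions chosen for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def p(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

a = {2, 4, 6}  # the roll is even
b = {5, 6}     # the roll is greater than 4

direct = p(a | b)                 # count the union directly
by_law = p(a) + p(b) - p(a & b)   # 1/2 + 1/3 - 1/6
print(direct, by_law)             # 2/3 2/3
```

Subtracting P(A ∩ B) matters because the outcome 6 belongs to both events and would otherwise be counted twice.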


This rule is used to calculate the probability of the intersection from the conditional probability.

P(A ∩ B) = P(A | B) × P(B)


If event B occurs 60% of the time, then P(B) = 0.6. If event A occurs in 30% of the cases in which B occurs, then P(A | B) = 0.3. Therefore,

P(A ∩ B) = P(A | B) × P(B) = 0.3 × 0.6 = 0.18

A and B occur together 18% of the time.
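The same calculation in Python, using exact fractions to avoid floating-point rounding. The function name `intersection_probability` is hypothetical.

```python
from fractions import Fraction

def intersection_probability(p_a_given_b, p_b):
    """Multiplication rule: P(A ∩ B) = P(A | B) × P(B)."""
    return p_a_given_b * p_b

p_b = Fraction(6, 10)          # B occurs 60% of the time
p_a_given_b = Fraction(3, 10)  # A occurs in 30% of the cases where B occurs

p_both = intersection_probability(p_a_given_b, p_b)
print(p_both)         # 9/50
print(float(p_both))  # 0.18
```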


Bayes’ law helps us understand the relationship between two events by comparing their conditional probabilities. It is used in medical research to work out how two symptoms relate to each other. For instance, 60% of patients with headaches wear glasses, while 35% of patients with eyesight issues have headaches.

Mathematically, Bayes’ theorem is given by

P(A | B) = P(B | A) × P(A) / P(B)
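Bayes’ theorem, P(A | B) = P(B | A) × P(A) / P(B), can be written as a small function. The numbers below come from an assumed fair-die example (A = the roll is even, B = the roll is greater than 4), not from the medical example, which does not give enough figures to compute a result.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A | B) = P(B | A) × P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) is 0, so P(A | B) is undefined")
    return p_b_given_a * p_a / p_b

# Fair die: A = {2, 4, 6}, B = {5, 6}, so A ∩ B = {6}.
p_a = Fraction(1, 2)          # P(A) = 3/6
p_b = Fraction(1, 3)          # P(B) = 2/6
p_b_given_a = Fraction(1, 3)  # of the three even faces, only 6 is greater than 4

print(bayes(p_b_given_a, p_a, p_b))  # 1/2
```

The result agrees with counting directly: of the two faces in B, only the 6 is even, so P(A | B) = 1/2.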
