Practice Exercise: Inferential Statistics

- Jayabharathi Hari (https://www.jayabharathi-hari.com/)

Importing the necessary libraries

In [1]:

#Scipy has all the probability distributions available along with many statistical functions
import scipy.stats as stats 
import numpy as np

Problems on Probability

A multinational bank is concerned about the waiting time of its customers before they use the ATM for their transactions. A study of a random sample of 500 customers reveals the following probability distribution. The below table shows the distribution.

Waiting time(in minutes)	Probability
0	0.2
1	0.18
2	0.16
3	0.12
4	0.1
5	0.09
6	0.08
7	0.04
8	0.03

a) What is the probability that a customer will have to wait for more than 5 minutes?

In [2]:

poisson = np.array([.2,.18,.16,.12,.1,.09,.08,.04,.03])

In [59]:

prob = np.sum(poisson[6:]) # indices 6, 7 and 8 are the waiting times longer than 5 minutes
print(f"The probability that a customer will have to wait for more than 5 minutes is {prob.round(4)}")

The probability that a customer will have to wait for more than 5 minutes is 0.15

b) What is the probability that when customers visit the bank they will not have to wait at all?

In [4]:

prob = poisson[0]
print(f"The probability that the customer will not have to wait at all is {prob:.3f}")

The probability that the customer will not have to wait at all is 0.200

c) What is the expected waiting time for a customer?

Hint : Expected value = $\sum_i x_i \, P(x_i)$

In [67]:

value = np.array([(i*p) for i, p in enumerate(poisson)])
#print(value)
expected_value = np.sum(value)
print(f"Expected wait time for the customer is {expected_value.round(4)} minutes")

Expected wait time for the customer is 2.71 minutes
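As a cross-check that does not depend on any variable reused later in the notebook, both quantities can be recomputed directly from the table. This is a minimal sketch; the names `waiting_times` and `probabilities` are illustrative and do not appear in the original cells.

```python
import numpy as np

# Waiting-time distribution copied from the table above
waiting_times = np.arange(9)  # 0, 1, ..., 8 minutes
probabilities = np.array([.2, .18, .16, .12, .1, .09, .08, .04, .03])

# A valid probability distribution must sum to 1
assert np.isclose(probabilities.sum(), 1.0)

# P(X > 5): add up the probabilities of waiting 6, 7 or 8 minutes
prob_more_than_5 = probabilities[waiting_times > 5].sum()

# E[X] = sum of x * P(x), written as a dot product
expected_wait = np.dot(waiting_times, probabilities)

print(f"P(X > 5) = {prob_more_than_5:.2f}")
print(f"E[X] = {expected_wait:.2f} minutes")
```

Selecting with a boolean mask (`waiting_times > 5`) ties the slice to the question being asked rather than to a hand-counted index.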

2) The past 12 months of data from a car rental company shows that 45.8% of people rented a car for business purposes, 54% rented a car for personal reasons, and 30% for both personal & business reasons.

a) What is the probability that a car was rented for business or personal reasons.

Hint : P(A or B) = P(A) + P(B) - P(A and B)

In [64]:

# not mutually exclusive events, need to subtract out the intersection
prob = .458 + .54 - .30
print(f"The probability that a car was rented for business or personal reasons is {prob:.3f}")
print("The probability that a car was rented for business or personal reasons is", round(prob, 4))

The probability that a car was rented for business or personal reasons is 0.698
The probability that a car was rented for business or personal reasons is 0.698

b) What is the probability that a car was not rented for either business or personal reasons

Hint: Complement Rule

Complement of P(A) = 1 - P(A)

In [68]:

print(f"The probability that a car was not rented for either business or personal reasons is {1-prob:.3f}")
print("The probability that a car was not rented for either business or personal reasons is", round(1-prob, 3))

The probability that a car was not rented for either business or personal reasons is 0.302
The probability that a car was not rented for either business or personal reasons is 0.302

Problems on Marginal, Joint and Conditional Probability

A multinational firm conducted a job satisfaction survey on a sample of its employees. The table below summarizes the result of the survey. Assuming that the sample drawn is a random sample from its entire employee population, the HR would like to know the probability of the following events.

Education Level	Satisfied	Neutral	Dissatisfied	Highly Dissatisfied	Total
Did not complete High School	10	20	30	40	100
High School graduate	20	30	25	50	125
Some college	30	60	35	25	150
College Graduate	120	40	30	10	200
Post-Graduate	60	15	0	0	75
Total	240	165	120	125	650

a) What is the probability that a randomly selected employee would be satisfied?

In [8]:

prob = (240/650)
print(f"The probability of a satisfied employee is {prob:.3f}")

The probability of a satisfied employee is 0.369

b) What percentage of employees are at a risk of attrition? (Assume a dissatisfied employee will attrite eventually.)

In [9]:

perc = (120+125)/650*100
print(f"The percentage of employees at a risk of attrition is {perc:.1f}%")

The percentage of employees at a risk of attrition is 37.7%

c) If a dissatisfied employee attrites with probability 0.4 within a year, what is the probability that an employee attrites within a year? (Do not consider neutral employees)

Hint : P(An employee will attrite) = P(Employee dissatisfied AND Employee attrite) = P(Employee dissatisfied) * P(Employee attrite|Dissatisfied)

In [10]:

prob = 0.4 * ((120 + 125)/650) 
print(f"The probability that an employee will attrite is {prob:.3f}")

The probability that an employee will attrite is 0.151

d) If an employee is selected randomly, what is the probability that he/she is at least a high school graduate?

Hint : P(At least High School) = P(High school) + P(Some college) + P(College Grad) + P(Post-Grad) = 1 – P(Did not complete school)

In [11]:

prob = 1 - (100/650)
print(f"The probability that the employee is at least a high school graduate is {prob:.3f}")

The probability that the employee is at least a high school graduate is 0.846

e) What is the probability that employees who are college graduate or more are satisfied?

In [12]:

prob = (180/275)
print(f"The probability that employees who are college graduate or more are satisfied is {prob:.3f}")

The probability that employees who are college graduate or more are satisfied is 0.655

f) Among the employees who have at least some college education, what percentage of employees are either satisfied or neutral?

Hint : Calculate Probability(Not dissatisfied | at least some college)

In [69]:

prob = (90 + 160 + 75)/(150 + 200 + 75) * 100
print(f"The percentage of employees with some college education who are either satisfied or neutral is {prob:.1f}%")

The percentage of employees with some college education who are either satisfied or neutral is 76.5%

g) Among the employees who are not satisfied, what percentage of employees are college graduate or more?

Hint : Calculate Probability(College Grad or Post-grad | Not satisfied)

In [14]:

prob = ((40 + 30 + 10 + 15)/(165 + 120 + 125)) * 100
print(f"Among those employees not satisfied, the percentage of employees who are college grads or more is {prob:.2f}%")

Among those employees not satisfied, the percentage of employees who are college grads or more is 23.17%

h) Among the employees who have not had any college education, what percentage of employees are not highly dissatisfied?

In [15]:

prob = (1 - (90/225)) * 100
print(f"Among those employees with no college, the percentage of employees who are not highly dissatisfied is {prob:.1f}%")

Among those employees with no college, the percentage of employees who are not highly dissatisfied is 60.0%
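The answers above hard-code sums read off the table. A sketch of the same marginal and conditional probabilities computed from the table itself may make the row and column logic easier to audit; the array `counts` and the variable names are illustrative, not part of the original notebook.

```python
import numpy as np

# Rows: education level; columns: Satisfied, Neutral, Dissatisfied, Highly Dissatisfied
counts = np.array([
    [10, 20, 30, 40],    # Did not complete High School
    [20, 30, 25, 50],    # High School graduate
    [30, 60, 35, 25],    # Some college
    [120, 40, 30, 10],   # College Graduate
    [60, 15, 0, 0],      # Post-Graduate
])
total = counts.sum()  # 650 employees

# Marginal probability: P(Satisfied)
p_satisfied = counts[:, 0].sum() / total

# Conditional probability: P(Satisfied | College Graduate or Post-Graduate)
p_sat_given_college_plus = counts[3:, 0].sum() / counts[3:].sum()

# Conditional probability: P(not Highly Dissatisfied | no college education)
p_not_hd_given_no_college = 1 - counts[:2, 3].sum() / counts[:2].sum()

print(f"P(Satisfied) = {p_satisfied:.3f}")
print(f"P(Satisfied | college grad or more) = {p_sat_given_college_plus:.3f}")
print(f"P(not highly dissatisfied | no college) = {p_not_hd_given_no_college:.3f}")
```

A conditional probability restricts the denominator to the rows (or columns) named after the "given", which is why parts e) to h) divide by a subtotal rather than by 650.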

A company collected data on how many people were planning to purchase iPhone Xs Max and how many of them ended up actually placing the order and how many didn't:

Planned to purchase Apple iPhone Xs Max	Actually placed an order for Apple iPhone Xs Max- Yes	Actually placed an order for Apple iPhone Xs Max - No	Total
Yes	400	100	500
No	200	1300	1500
Total	600	1400	2000

a) What percentage of the population makes a planned purchase?

In [16]:

prob = 400/2000 * 100
print(f"The percentage of the population that made a planned purchase is {prob:.1f}%")

The percentage of the population that made a planned purchase is 20.0%

b) Out of all those people who planned to buy an iPhone, What percentage of them actually placed an order?

In [17]:

prob = 400/500* 100
print(f"The percentage of those who planned to buy an iPhone and actually bought one is {prob:.1f}%")

The percentage of those who planned to buy an iPhone and actually bought one is 80.0%

3) Visa Card studied how frequently young consumers, ages 18-24, use plastic cards. The results provided the following probabilities.

• The probability that a consumer uses a plastic card when making a purchase is .37.

• Given that a consumer uses a plastic card, there is a .19 probability that the consumer is 18-24 years old.

• Given that a consumer uses a plastic card, there is a .81 probability that the consumer is older than 24.

• 14% of the consumer population is 18-24 years old.

Age Group	18-24	older than 24	Total
Consumer uses plastic cards	0.37 * 0.19 = 0.0703	0.81* 0.37=0.2997	0.37
Consumer does not use plastic cards	0.14 - 0.0703 = 0.0697	0.86 - 0.2997 =0.5603	1 - 0.37 = 0.63
Total	0.14	1-0.14 = 0.86	1

a) Given the consumer is between 18-24, what is the probability that the consumer uses plastic card

In [18]:

prob = .0703/.14
print(f"The probability that the customer uses plastic card is {prob:.3f}")

The probability that the customer uses plastic card is 0.502

b) Given the consumer is 24+, what is the probability that the consumer uses plastic card

In [19]:

prob = .2997/.86
print(f"The probability that the customer uses plastic card is {prob:.3f}")

The probability that the customer uses plastic card is 0.348
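The two answers above reuse numbers typed in from the joint table. As a hedged sketch, the same conditional probabilities can be derived from the three given values alone, using P(A and B) = P(B | A) * P(A); the variable names are illustrative.

```python
# Given values from the problem statement
p_card = 0.37              # P(uses a plastic card)
p_young_given_card = 0.19  # P(18-24 | uses a plastic card)
p_young = 0.14             # P(18-24)

# Joint probabilities: P(card and age group) = P(age group | card) * P(card)
p_card_and_young = p_young_given_card * p_card
p_card_and_older = (1 - p_young_given_card) * p_card

# Conditional probabilities: P(card | age group) = P(card and age group) / P(age group)
p_card_given_young = p_card_and_young / p_young
p_card_given_older = p_card_and_older / (1 - p_young)

print(f"P(card | 18-24) = {p_card_given_young:.3f}")
print(f"P(card | older than 24) = {p_card_given_older:.3f}")
```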

Problem on Bayes' Theorem

A local bank reviewed its credit card policy with the intention of recalling some of its credit cards. In the past, approximately 5% of cardholders defaulted, leaving the bank unable to collect the outstanding balance. Hence, management established a prior probability of 0.05 that any particular cardholder will default. The bank also found that the probability of missing a monthly payment is 0.2 for customers who do not default. The probability of missing a monthly payment for those who default is 0.5.

Given that a customer missed one or more monthly payments, compute the probability that a customer will default

Hint : Bayes' Theorem

P(Default|Missed) = P(Missed|Default) * P(Default) / (P(Missed|Default) * P(Default) + P(Missed|Not-Default) * P(Not-Default))

P(Missed|Not-Default) : Probability of missing the payment given that the customer doesn't default

In [20]:

prob = (.5*.05)/((.5*.05) + (.2*.95))
print(f"The probability that a customer will default is {prob:.3f}")

The probability that a customer will default is 0.116

2) A computer component is given scores (A, B, C) after production.

On an average, 70% components were given a score of A, 18% were given a score of B and 12% a score of C.
It was found that 2% of the components that were given a score of A, 10% that were given a score of B and 18% that were given a score of C, eventually failed.
If you randomly pickup a failed component, what is the probability that it had received a score of B?

Hint :

The probability is given by the formula below:

P(B | F) = Probability that the quality score was B and the component failed / Probability that a component with any quality score failed

It can be written as:

P(B | F) = P(F | B) * P(B) / P(F)

and the denominator can be broken down using the law of total probability:

P(F) = P(F | A) * P(A) + P(F | B) * P(B) + P(F | C) * P(C)

In [21]:

prob = (.18*.10)/(.7*.02 + .18*.10 + .12*.18)
print(f"The probability that the quality is B, given the component failed is {prob:.3f}")

The probability that the quality is B, given the component failed is 0.336
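Both Bayes' Theorem problems follow the same pattern: multiply each prior by its likelihood, then normalise by the total. A minimal sketch of that pattern is shown below; the helper `posterior` is a hypothetical name introduced here for illustration and is not part of the original notebook or of SciPy.

```python
import numpy as np

def posterior(priors, likelihoods):
    """Return P(H_i | E) for every hypothesis H_i.

    priors[i] is P(H_i) and likelihoods[i] is P(E | H_i).
    """
    joint = np.asarray(priors, dtype=float) * np.asarray(likelihoods, dtype=float)
    # Dividing by the sum applies the law of total probability for P(E)
    return joint / joint.sum()

# Component problem: hypotheses are scores A, B, C; evidence is a failure
score_posterior = posterior([.70, .18, .12], [.02, .10, .18])

# Credit card problem: hypotheses are (default, no default); evidence is a missed payment
default_posterior = posterior([.05, .95], [.5, .2])

print(f"P(B | failed) = {score_posterior[1]:.3f}")
print(f"P(default | missed) = {default_posterior[0]:.3f}")
```

Because the function returns the posterior for every hypothesis at once, the results should sum to 1, which is a quick sanity check on the inputs.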

Problems on Binomial Distribution

HHH HHT HTH HTT THH THT TTH TTT

In [22]:

p = .083333*3
.60*.45

Out[22]:

0.27

1) You flip a fair coin 10 times. What is the probability of getting 8 or more heads?

In [23]:

#probability of flipping heads with a fair coin
p = .5
n = 10
k = np.arange(0, n + 1)
binomial = stats.binom.pmf(k, n, p)

In [24]:

prob = sum(binomial[8:])
print(f"The probability of getting 8 or more heads is {prob:.3f}")

The probability of getting 8 or more heads is 0.055
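For a fair coin (p = 0.5) flipped n = 10 times, the upper tail can also be taken from SciPy's survival function instead of summing a slice of the PMF. This is a small sketch with illustrative variable names; note that `sf(k)` is P(X > k), so "8 or more" corresponds to `sf(7)`.

```python
import scipy.stats as stats

n, p = 10, 0.5  # 10 flips of a fair coin

# Survival function: sf(k) = P(X > k), so P(X >= 8) = sf(7)
prob_sf = stats.binom.sf(7, n, p)

# Equivalent form using the complement of the CDF
prob_cdf = 1 - stats.binom.cdf(7, n, p)

print(f"P(X >= 8) = {prob_sf:.4f}")
```

The exact value is (45 + 10 + 1) / 1024, the number of outcomes with 8, 9 or 10 heads divided by the 2^10 equally likely sequences.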

2) The probability that you will win a certain game is 0.3. You play the game 20 times. What is the mean of this binomial distribution?

Hint :Mean = n * p

In [25]:

def binom_mean(n: int, p: float) -> float:
    return n * p

assert binom_mean(n=10, p=.1) == 1

In [26]:

binomial_mean = binom_mean(n=20, p=.3)
print(f"The mean of this distribution is {binomial_mean:.1f}")

The mean of this distribution is 6.0

3) An automatic camera records the number of cars running a red light at an intersection (that is, the cars were going through when the red light was against the car). Analysis of the data shows that on average 15% of light changes record a car running a red light. Assume that the data has a binomial distribution. What is the probability that in 20 light changes there will be exactly three (3) cars running a red light?

In [27]:

p = .15
n = 20
k = np.arange(0, n + 1) # X can take every value from 0 to n inclusive
binomial = stats.binom.pmf(k, n, p)
print(f"The probability of exactly three cars running the red light is {binomial[3]:.3f}")

The probability of exactly three cars running the red light is 0.243

4) There are 15 sets of traffic lights on the journey to Alliance University. The probability that a driver must stop at any one set of traffic lights is 0.3.

a) What is the probability that a student must stop at exactly 2 of the 15 sets of traffic lights?

In [28]:

p = .3
n = 15
k = np.arange(0, 16)
binomial = stats.binom.pmf(k, n, p)
print(f"The probability that a student must stop at exactly 2 traffic lights is {binomial[2]:.3f}")

The probability that a student must stop at exactly 2 traffic lights is 0.092

b) What is the probability that a student will be stopped at 1 or more of the 15 sets of traffic lights?

In [29]:

print(f"The probability that a student will be stopped at 1 or more traffic lights is {1 - binomial[0]:.3f}")

The probability that a student will be stopped at 1 or more traffic lights is 0.995

5) Determine the mean and standard deviation of the variable X in each of the following binomial distributions:

a) n = 4 and p = 0.10

b) n = 5 and p = 0.80

In [30]:

def binom_stddev(n: int, p: float) -> float:
    return (binom_mean(n, p) * (1-p))**(1/2)

n, n2 = 4, 5
p, p2 = .1, .8

print(f"a) mean: {binom_mean(n, p)}  \n   standard deviation: {binom_stddev(n, p):.2f}\n")
print(f"b) mean: {binom_mean(n2, p2)} \n   standard deviation: {binom_stddev(n2, p2):.2f}")

a) mean: 0.4  
   standard deviation: 0.60

b) mean: 4.0 
   standard deviation: 0.89
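The hand-written helpers above implement mean = n * p and standard deviation = sqrt(n * p * (1 - p)). As a cross-check, the same values can be read from a frozen `scipy.stats.binom` distribution; the dictionary names below are illustrative.

```python
import scipy.stats as stats

# (n, p) pairs from parts a) and b)
cases = {"a": (4, 0.10), "b": (5, 0.80)}

results = {}
for label, (n, p) in cases.items():
    # Frozen distribution object for Binomial(n, p)
    dist = stats.binom(n, p)
    results[label] = (dist.mean(), dist.std())
    print(f"{label}) mean: {dist.mean():.2f}, standard deviation: {dist.std():.2f}")
```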

6) A survey of 500 individuals (ages 25 and older) shows that 28% of them have completed 4 years of college. For a sample of 20 individuals, answer the following.

a) What is the probability 4 people would have completed four years of college

In [31]:

p = .28
n = 20
k = np.arange(0, 21)
binomial = stats.binom.pmf(k, n, p)
print(f"The probability 4 people would have completed 4 years of college is {binomial[4]:.3f}")

The probability 4 people would have completed 4 years of college is 0.155

b) What is the probability that 3 or more people would have completed 4 years of college

In [32]:

print(f"The probability that 3 or more people would have completed 4 years of college is {binomial[3:].sum():.3f}")

The probability that 3 or more people would have completed 4 years of college is 0.947

7) If the likelihood of a tagged order form is 0.1, what is the probability that there are three tagged order forms in a sample of four?

Hint: Use Binomial distribution equation

In [33]:

p = .1
n = 4
k = np.arange(0, 5)
binomial = stats.binom.pmf(k, n, p)
print(f"The probability that there are three tagged order forms in the sample of four is {binomial[3]:.3f}")

The probability that there are three tagged order forms in the sample of four is 0.004

8) Determine the following:

a) For n = 4 and p = 0.12, what is P(X = 0)?

b) For n = 6 and p = 0.83, what is P(X = 5)?

In [34]:

binomial1 = stats.binom.pmf(np.arange(0, 5), 4, .12)
binomial2 = stats.binom.pmf(np.arange(0, 7), 6, .83)
print(f"a) P(X=0) = {binomial1[0]}")
print(f"b) P(X=5) = {binomial2[5]}")

a) P(X=0) = 0.59969536
b) P(X=5) = 0.4017821455860002

Problems on Poisson Distribution

1) Assume a Poisson distribution with lambda = 5.0. What is the probability that

a) X <= 1?

In [35]:

rate = 5
n = np.arange(0 , 20)
prob = stats.poisson.pmf(n, rate)[:2].sum()
print(f"P(X <= 1) is {prob}")

P(X <= 1) is 0.040427681994512805

b) X > 1?

In [36]:

print(f"P(X > 1) is {1-prob}")

P(X > 1) is 0.9595723180054871

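2) Assume the number of work-related injuries in a month follows a Poisson distribution with lambda = 2.5.

Hint: Use the Poisson distribution equation with lambda = 2.5; for part a), find P(X = 0).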

a) What is the probability that in a given month, no work-related injuries occur?

In [37]:

rate = 2.5
n = np.arange(0, 20)
poisson = stats.poisson.pmf(n, rate)
print(f"The probability that in a given month, no work-related injuries is {poisson[0]}")

The probability that in a given month, no work-related injuries is 0.0820849986238988

b) That at least one work- related injury occurs?

In [38]:

print(f"The probability that at least one work-related injury occurs is {poisson[1:].sum()}")

The probability that at least one work-related injury occurs is 0.9179150013726206
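The cells above build the PMF on a finite grid (0 to 19) and sum slices, which silently drops the small tail beyond 19. A sketch using SciPy's `cdf` and `sf` avoids that truncation; the variable names are illustrative.

```python
import scipy.stats as stats

# Problem 1: lambda = 5
p_at_most_1 = stats.poisson.cdf(1, 5)    # P(X <= 1)
p_more_than_1 = stats.poisson.sf(1, 5)   # P(X > 1) = 1 - P(X <= 1)

# Work-related injuries: lambda = 2.5 per month
p_no_injury = stats.poisson.pmf(0, 2.5)       # P(X = 0)
p_at_least_one = stats.poisson.sf(0, 2.5)     # P(X >= 1) = P(X > 0)

print(f"P(X <= 1) = {p_at_most_1:.4f}, P(X > 1) = {p_more_than_1:.4f}")
print(f"P(no injury) = {p_no_injury:.4f}, P(at least one) = {p_at_least_one:.4f}")
```

With these calls each pair of complementary probabilities sums to 1 up to floating-point error, whereas a truncated PMF sum falls slightly short.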

3) A 5-liter bucket of water is taken from a swamp. The water contains 75 mosquito larvae. A 200mL flask of water is taken from the bucket for further analysis. What is

Hint:

Number of larvae in 5L (5000mL) = 75

Number of larvae per mL = 75/5000

Expected number of larvae in 200mL = (75/5000) * 200

lambda = 3

a) the expected number of larvae in the flask?

Hint: expected value of a poisson distribution is equal to lambda

In [39]:

rate = 3
n = np.arange(0, 20)
poisson = stats.poisson.pmf(n, rate)
expected_value = np.array([(i*p) for i,p in enumerate(poisson)]).sum()
print(f"The expected value is {expected_value}")

The expected value is 2.999999998323492

b) the probability that the flask contains at least one mosquito larva?

In [40]:

print(f"The probability that the flask contains at least one mosquito larva is {poisson[1:].sum()}")

The probability that the flask contains at least one mosquito larva is 0.9502129315489921
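The hint's rate calculation can be written out in code so that lambda is derived rather than typed in. This is a sketch that assumes, as the Poisson model itself does, that the larvae are spread evenly through the bucket; the variable names are illustrative.

```python
import scipy.stats as stats

larvae_in_bucket = 75
bucket_ml = 5000   # 5 liters
flask_ml = 200

# The Poisson rate scales in proportion to the volume sampled
rate = larvae_in_bucket / bucket_ml * flask_ml

# The mean of a Poisson distribution equals its rate
expected_larvae = stats.poisson.mean(rate)

# P(X >= 1) = P(X > 0), taken from the survival function
p_at_least_one = stats.poisson.sf(0, rate)

print(f"lambda = {rate:.1f}")
print(f"Expected larvae in the flask = {expected_larvae:.1f}")
print(f"P(at least one larva) = {p_at_least_one:.4f}")
```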

4) A bank is interested in studying the number of people who use the ATM located outside during night hours. On average, 1.6 customers walk up to the ATM during a 10-minute interval, between 9 pm and midnight.

In [41]:

rate = 1.6
n = np.arange(0, 20)
poisson = stats.poisson.pmf(n, rate)
poisson

Out[41]:

array([2.01896518e-01, 3.23034429e-01, 2.58427543e-01, 1.37828023e-01,
       5.51312092e-02, 1.76419869e-02, 4.70452985e-03, 1.07532111e-03,
       2.15064222e-04, 3.82336394e-05, 6.11738231e-06, 8.89801063e-07,
       1.18640142e-07, 1.46018636e-08, 1.66878441e-09, 1.78003670e-10,
       1.78003670e-11, 1.67532866e-12, 1.48918103e-13, 1.25404719e-14])

a) Find the probability of exactly 3 customers using the ATM in a 10-minute interval.

In [42]:

print(f"The probability of exactly three customers using the ATM in a 10 minute interval is {poisson[3]}")

The probability of exactly three customers using the ATM in a 10 minute interval is 0.13782802295101812

b) What is the probability of 3 or fewer customers using the ATM?

In [43]:

print(f"The probability of three or fewer customers using the ATM is {1 - poisson[4:].sum()}")

The probability of three or fewer customers using the ATM is 0.9211865127702822

Problems on Normal Distribution

1) A radar unit is used to measure speeds of cars on a highway. The speeds are normally distributed with a mean of 70 km/hr and a standard deviation of 10 km/hr.

a) What is the probability that a car picked at random is traveling at more than 100 km/hr?

In [44]:

def z(xbar: float, mu: float, stddev: float) -> float:
    return (xbar - mu)/stddev

assert z(1, 0, 1) == 1

In [45]:

prob = 1 - stats.norm.cdf(z(100, 70, 10))
print(f"The probability that a car picked at random is traveling at more than 100 km/hr is {prob}")

The probability that a car picked at random is traveling at more than 100 km/hr is 0.0013498980316301035

b) What is the probability that the car speed is between 80 Km / hr and 100 Km / hr

In [46]:

prob = stats.norm.cdf(z(100, 70, 10)) - stats.norm.cdf(z(80, 70, 10))
print(f"The probability that the car speed is between 80km/hr and 100km/hr is {prob}")

The probability that the car speed is between 80km/hr and 100km/hr is 0.15730535589982697
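The cells above standardise by hand with `z(...)` and then query the standard normal. An equivalent sketch passes the mean and standard deviation to SciPy through `loc` and `scale`, so no manual z score is needed; the name `speeds` is illustrative.

```python
import scipy.stats as stats

# Frozen normal distribution with mean 70 km/hr and standard deviation 10 km/hr
speeds = stats.norm(loc=70, scale=10)

# P(X > 100): survival function, equal to 1 - cdf(100)
p_over_100 = speeds.sf(100)

# P(80 < X < 100): difference of two CDF values
p_80_to_100 = speeds.cdf(100) - speeds.cdf(80)

print(f"P(X > 100) = {p_over_100:.5f}")
print(f"P(80 < X < 100) = {p_80_to_100:.5f}")
```

Both routes give the same probabilities, because `norm(loc, scale).cdf(x)` is defined as the standard normal CDF evaluated at (x - loc) / scale.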

2) For on-campus recruitment, Ms. Z has sat for tests by Company A and Company B. For both tests her score is 50. It is known that for Company A, scores have a normal distribution with mean 40 and standard deviation 15, whereas for Company B, scores have a normal distribution with mean 45 and standard deviation 10. Relatively speaking, in which test has Ms. Z done better?

In [47]:

za = z(50, 40, 15)
zb = z(50, 45, 10)

if za > zb:
    print(f"Ms. Z has done better in Test A with a z score of {za:.3f}")
elif za < zb:
    print(f"Ms. Z has done better in Test B with a z score of {zb:.3f}")
else:
    print("Both of Ms. Z's test scores are equally good.")

Ms. Z has done better in Test A with a z score of 0.667

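3) Why the need for Standardization?

Standardization converts a value from any normal distribution into a z score, z = (x - mean) / standard deviation, which measures how many standard deviations the value lies from the mean. Because every z score refers to the same standard normal distribution (mean 0, standard deviation 1), we can read probabilities from a single curve and compare values drawn from distributions with different means and spreads, as in the comparison of Ms. Z's two test scores.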

4) What is the area under the curve of a probability distribution? Explain.

The area under the curve between two values is the probability that the random variable falls between those values. The total area under the curve is equal to 1, because the variable is certain (100% likelihood) to take some value in its range.
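The claim about the area can be checked numerically. The sketch below integrates the standard normal density with `scipy.integrate.quad` and compares an interval area with the corresponding CDF difference; it is an illustration for the normal case only, and the variable names are assumptions.

```python
import numpy as np
import scipy.stats as stats
from scipy.integrate import quad

# Total area under the standard normal density over the whole real line
total_area, _ = quad(stats.norm.pdf, -np.inf, np.inf)

# Area between -1 and 1, i.e. P(-1 < Z < 1), by numerical integration
area_within_1sd, _ = quad(stats.norm.pdf, -1, 1)

# The same probability from the CDF
cdf_within_1sd = stats.norm.cdf(1) - stats.norm.cdf(-1)

print(f"Total area = {total_area:.6f}")
print(f"Area within one standard deviation = {area_within_1sd:.4f}")
```

`quad` returns a pair (estimate, error bound); only the estimate is used here.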

5) After a course in business statistics, 300 students sat for a written examination. The result of the exam gives the following information: marks obtained have a mean of 60 and a standard deviation of 12, and the pattern of marks follows a normal distribution.

a) what is the percentage of students who score more than 80

In [48]:

perc = 100 * (1 - stats.norm.cdf(z(80,60,12)))
print(f"The percentage of students who score more than 80 is {perc:.2f}%")

The percentage of students who score more than 80 is 4.78%

b) What is the percentage of students who score less than 50

In [49]:

# P(X < 50) is the CDF itself, not its complement
perc = 100 * stats.norm.cdf(50, loc=60, scale=12)
print(f"The percentage of students who score less than 50 is {perc:.2f}%")

The percentage of students who score less than 50 is 20.23%