
Introduction to Statistics

by Elvira Siegel
(Published: Sun Nov 10, 2019)

Part III

In this third and last part of the series "Introduction to Statistics" we will cover questions such as what probability is and which types of probability there are, as well as the three probability axioms on which the entire probability theory is built.


Understanding the Terminology

Let's start with probability. In simple terms, probability is a measure of how likely it is that an event will happen. In other words, it tells us how probable the occurrence of an event is.

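We used the word event. In probability theory, an event means much the same as in everyday life, namely something that happens (or has already happened). More precisely, an event is a set of possible outcomes of an experiment, i.e. a subset of the sample space (the set of all possible outcomes). A quantity whose value is determined by the outcome of an experiment is called a random variable.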

A random variable is a value that depends on chance: formally, it is a function that assigns a number to each outcome of a random experiment. If I toss a coin, I get heads or tails at random. Heads and tails are the outcomes of my experiment; a random variable X could, for example, take the value 1 for heads and 0 for tails. Random variables can be of two types: discrete and continuous.

  1. a discrete random variable takes values from a countable set (finite, or infinite but countable like 0, 1, 2, ...), e.g.: the number of ill patients, the number of times you get a 6 when rolling a die seven times, etc.
  2. a continuous random variable takes values from an interval of real numbers, which contains infinitely many values, e.g.: temperature, the height of a tree, a cat's weight or the time it takes to get to work, etc. In all these examples, there are infinitely many intermediate values a variable can take on. This means that the probability that a continuous random variable X takes on one specific value is equal to 0.
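To make the difference between the two types a bit more concrete, here is a small sketch in Python. It assumes a fair die with independent rolls (so the number of sixes in seven rolls follows the binomial formula) and, for the continuous case, a number drawn uniformly from [0, 1]. The helper names p_sixes and p_uniform_interval are made up for this illustration.

```python
from fractions import Fraction
from math import comb

# Discrete: X = number of sixes in seven rolls of a fair die (assumption: fair, independent rolls).
# X can only take the countable values 0, 1, ..., 7, and each one has a positive probability.
def p_sixes(k, rolls=7):
    return comb(rolls, k) * Fraction(1, 6) ** k * Fraction(5, 6) ** (rolls - k)

pmf = {k: p_sixes(k) for k in range(8)}

# Continuous: Y = a number drawn uniformly from [0, 1] (assumption).
# P(a <= Y <= b) is just the length of the part of [a, b] that lies inside [0, 1].
def p_uniform_interval(a, b):
    return max(0.0, min(b, 1.0) - max(a, 0.0))

# Shrinking an interval around 0.3 drives its probability towards 0,
# which is why P(Y = 0.3) itself is 0.
shrinking = [round(p_uniform_interval(0.3 - h, 0.3 + h), 6) for h in (0.1, 0.01, 0.001, 0.0001)]

print(pmf[1])             # 109375/279936, exactly one six in seven rolls
print(sum(pmf.values()))  # 1
print(shrinking)          # [0.2, 0.02, 0.002, 0.0002]
```

Exact fractions are used for the discrete case so that the probabilities add up to exactly 1 without floating-point rounding.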

Note: for continuous random variables, probabilities are described by the Probability Density Function (PDF); consult it for more detail.

Another frequently used term is experiment. An experiment in probability theory is a random trial that can, at least in principle, be repeated any number of times under the same conditions. It has a well-defined set of possible outcomes called the sample space.

Consider some descriptive examples: you toss a coin and get heads. Tossing the coin is the experiment, getting heads is an event, and the sample space of this experiment is {heads, tails}. You roll a fair die and get a 6. Rolling the die is the experiment, getting a 6 is an event, and the sample space of the experiment is {1, 2, 3, 4, 5, 6}. A sample space is often denoted by the Greek letter Ω (omega).
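Sample spaces and events map naturally onto sets. The following Python sketch writes the coin and die examples as sets, treats events as subsets of Ω, and runs one trial of the die experiment. The event getting_even and the helper run_experiment are hypothetical additions for illustration, and the fixed random seed is only there to make the run repeatable.

```python
import random

# Sample spaces (Ω) of the two experiments from the text.
omega_coin = {"heads", "tails"}
omega_die = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space.
getting_heads = {"heads"}
getting_six = {6}
getting_even = {2, 4, 6}  # hypothetical extra event: an event may contain several outcomes

assert getting_heads <= omega_coin  # "<=" means "is a subset of"
assert getting_six <= omega_die and getting_even <= omega_die

def run_experiment(omega, rng):
    """One trial: pick one outcome from the sample space at random."""
    return rng.choice(sorted(omega, key=str))

rng = random.Random(42)  # fixed seed so the run is repeatable
outcome = run_experiment(omega_die, rng)

# The event "getting a 6" occurred exactly when the outcome lies in that subset.
print(outcome, outcome in getting_six)
```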

Probability Axioms

An axiom is a statement that is accepted as true without proof. It serves as a premise on which we can build our further argumentation.

Probability theory is based on three axioms, known as the Kolmogorov axioms.

Axiom 1.

The first axiom says: the probability of any event E is a non-negative real number, P(E) ≥ 0. Combined with the other two axioms, this means that every probability lies between 0 and 1, where 0 denotes a 0% probability that an event happens and 1 denotes a 100% probability that it happens.


Axiom 2.

The second axiom says that the probability of the whole sample space is 1: P(Ω) = 1. For a coin or a die, this means that if we add up the probabilities of all the outcomes in the sample space, we get 1. The axiom also means that, with 100% probability, at least one of the outcomes will happen. If we toss a coin, we will definitely get heads or tails. We can't know for sure whether it will be heads or tails, but we know for certain that one of them will happen.

Note: remember, all possible outcomes together form the sample space.
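For a finite sample space, the first two axioms can be checked directly on a table of outcome probabilities. This is a minimal Python sketch, assuming "fair" means all outcomes are equally likely; the function name satisfies_axioms_1_and_2 is hypothetical. Fractions are used so that 1/6 + ... + 1/6 adds up to exactly 1.

```python
from fractions import Fraction

# Outcome probabilities for a fair coin and a fair die (assumption: all outcomes equally likely).
fair_coin = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

def satisfies_axioms_1_and_2(p):
    """Check an assignment of probabilities over a finite sample space."""
    non_negative = all(v >= 0 for v in p.values())  # Axiom 1: P(E) >= 0
    total_is_one = sum(p.values()) == 1             # Axiom 2: P(Ω) = 1
    return non_negative and total_is_one

broken_coin = {"heads": Fraction(1, 2), "tails": Fraction(1, 3)}  # adds up to only 5/6

print(satisfies_axioms_1_and_2(fair_coin))    # True
print(satisfies_axioms_1_and_2(fair_die))     # True
print(satisfies_axioms_1_and_2(broken_coin))  # False
```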


Axiom 3.

This one sounds complicated when you first read it: if two events A and B are mutually exclusive, then the probability that either of them happens is the sum of their individual probabilities: P(A ∪ B) = P(A) + P(B).

All right, mutually exclusive means that the two events cannot occur at the same time. Therefore, to find the probability that one of these two events happens, we simply add up their two probabilities.


Example: if I roll a fair die, the probability of getting 1 OR 6 is:

number of all possible outcomes when rolling a fair die: 6

probability of rolling a 1: 1/6

probability of rolling a 6: 1/6

1/6 + 1/6 = 2/6 = 1/3 for rolling 1 OR 6
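The same calculation can be reproduced in Python, assuming a fair die. The sketch also adds a counter-example (the events "even" and "six", which overlap) to show why Axiom 3 requires the events to be mutually exclusive. The helper prob and the event names are hypothetical.

```python
from fractions import Fraction

# Assumption: a fair die, so every face has probability 1/6.
p = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """P(event) on a finite sample space: add up the probabilities of its outcomes."""
    return sum((p[o] for o in event), Fraction(0))

one, six = {1}, {6}
print(one.isdisjoint(six))     # True: the events are mutually exclusive
print(prob(one | six))         # 1/3, probability of rolling 1 OR 6
print(prob(one) + prob(six))   # 1/3, the sum stated by Axiom 3

# Counter-example: "even" and "six" overlap (both contain 6), so simply adding overcounts.
even = {2, 4, 6}
print(prob(even | six))        # 1/2
print(prob(even) + prob(six))  # 2/3
```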

These three axioms are the universal foundation of probability theory. We use them to derive or prove other results through logical reasoning.

Nevertheless, these three axioms don't give us all the answers. Any function that assigns a probability to every event and satisfies all three axioms is called a probability function. However, the axioms cannot tell us WHICH of these functions we should use for a given problem. They only require that the function of our choice satisfies all three of them.
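To see that the axioms allow more than one valid choice, the Python sketch below defines two probability functions for the same six-sided die: a fair one and a made-up loaded one where face 6 is three times as likely as any other face. Both pass a check of all three axioms over all 64 events, yet they assign different probabilities to rolling a 6. Note that because P(event) is built by summing outcome weights here, Axiom 3 holds by construction; the check just makes that explicit. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(1, 7))

# Two different assignments for the same six-sided die.
fair = {face: Fraction(1, 6) for face in OMEGA}
# Hypothetical loaded die: face 6 is three times as likely as any other face.
loaded = {face: (Fraction(3, 8) if face == 6 else Fraction(1, 8)) for face in OMEGA}

def make_prob(weights):
    """Turn per-outcome weights into a function P(event) on subsets of OMEGA."""
    return lambda event: sum((weights[o] for o in event), Fraction(0))

def all_events(omega):
    """Every subset of the sample space (2**6 = 64 events for a die)."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def is_probability_function(P, omega):
    events = all_events(omega)
    axiom1 = all(P(e) >= 0 for e in events)
    axiom2 = P(omega) == 1
    axiom3 = all(P(a | b) == P(a) + P(b)
                 for a in events for b in events if a.isdisjoint(b))
    return axiom1 and axiom2 and axiom3

P_fair, P_loaded = make_prob(fair), make_prob(loaded)
print(is_probability_function(P_fair, OMEGA))    # True
print(is_probability_function(P_loaded, OMEGA))  # True
print(P_fair({6}), P_loaded({6}))                # 1/6 3/8
```

Which of the two is "right" cannot be decided from the axioms; it depends on what we know about the die.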

Types of probability

In statistics, when dealing with real-world problems, we often have to calculate probabilities that involve multiple random variables. To do so, we can use the following types of probability, which form the fundamental basis of statistics:

Joint Probability

The joint probability describes events that happen at one and the same time. Visually, we can represent two events A and B as two sets. The joint probability of the two sets (as the name already suggests) is the probability that an outcome belongs to set A AND to set B, i.e. that it lies in the overlap of the two sets.


An example would be: what is the probability of pulling a card from a card deck which is a Queen AND black?

Note: in the context of joint probability you will sometimes see the symbol ∩, which comes from set theory and is called intersection:

p(A ∩ B) = p(Queen AND black)


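So, the joint probability is the probability that event A and event B occur simultaneously. For example, a standard 52-card deck contains exactly two black Queens (the Queen of spades and the Queen of clubs), so the probability that a card I pull from the deck is a Queen and black is: p(Queen ∩ black) = 2/52 = 1/26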


Statisticians apply joint probability when they want to measure two or more events happening simultaneously. For example: what is the probability that the Dow Jones Index and Amazon shares both drop at the same time, p(DJIA drop ∩ Amazon drop)?
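The Queen-AND-black example can be checked by brute-force counting. This Python sketch builds a standard 52-card deck as (rank, suit) pairs and assumes every card is equally likely to be pulled (a well-shuffled deck); the names deck, is_queen and is_black are illustrative.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck as (rank, suit) pairs.
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["spades", "clubs", "hearts", "diamonds"]
deck = list(product(ranks, suits))

def is_queen(card):
    return card[0] == "Q"

def is_black(card):
    return card[1] in ("spades", "clubs")

# Joint probability: the fraction of cards that satisfy BOTH conditions at once.
both = [card for card in deck if is_queen(card) and is_black(card)]
p_joint = Fraction(len(both), len(deck))

print(both)     # [('Q', 'spades'), ('Q', 'clubs')]
print(p_joint)  # 1/26
```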

Conditional Probability

This type gives us the probability that an event A will happen, given (the condition) that an event B already happened. This relationship is represented as:

p(A | B)

Let's look at an example, again with a deck of 52 cards: you pull a card and it's black. That is the only information you have. What you want to know is the probability that this black card is a Queen: p(Queen | black).

Of the 52 cards, 26 are black, and exactly 2 of these black cards are Queens. Since we know the card is black, only these 26 cards are still possible:

p(Queen | black) = 2 / 26 = 1/13
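Conditioning can be read as "shrinking the sample space". The Python sketch below (again assuming a well-shuffled 52-card deck, with illustrative names) keeps only the black cards and then counts the Queens among them.

```python
from fractions import Fraction
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["spades", "clubs", "hearts", "diamonds"]
deck = list(product(ranks, suits))

# Conditioning on "black": the new, smaller sample space is just the black cards.
black_cards = [card for card in deck if card[1] in ("spades", "clubs")]
black_queens = [card for card in black_cards if card[0] == "Q"]

p_queen_given_black = Fraction(len(black_queens), len(black_cards))

print(len(black_cards), len(black_queens))  # 26 2
print(p_queen_given_black)                  # 1/13
```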

Marginal Probability

This type of probability can be thought of as "unconditional" probability. It is simply the probability of some event A happening: p(A), so there is only one event we want to analyze. A nice example of marginal probability would be: if you take a card from a deck of 52 cards, what's the probability that this card is black, p(black)? We know that there are 52 cards in total and that 26 of them are red, so the other 26 are black. That means 26/52 = 1/2 = 0.5 is the marginal probability of pulling a black card.

Connecting the probability types together

The conditional probability can be expressed using the joint and the marginal probabilities (as long as p(B) > 0):

p(A | B) = p(A ∩ B) / p(B)

We can also rearrange the above equation to build the joint probability:

p(A ∩ B) = p(A | B) * p(B)
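Both formulas can be verified on the card examples from above. In this Python sketch (assuming a well-shuffled 52-card deck; the helper p is hypothetical), A is "Queen" and B is "black": the marginal and the joint probability are counted, the conditional probability is computed with the first formula, and the rearranged formula is checked.

```python
from fractions import Fraction
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["spades", "clubs", "hearts", "diamonds"]
deck = list(product(ranks, suits))

def p(condition):
    """Fraction of cards in the deck for which the condition holds."""
    return Fraction(sum(1 for card in deck if condition(card)), len(deck))

def is_queen(card):
    return card[0] == "Q"

def is_black(card):
    return card[1] in ("spades", "clubs")

p_black = p(is_black)                                     # marginal p(B)
p_joint = p(lambda card: is_queen(card) and is_black(card))  # joint p(A ∩ B)
p_cond = p_joint / p_black                                # p(A | B) = p(A ∩ B) / p(B)

print(p_black, p_joint, p_cond)  # 1/2 1/26 1/13

# Rearranged: p(A ∩ B) = p(A | B) * p(B)
assert p_cond * p_black == p_joint
```

The result 1/13 matches the direct count of 2 black Queens among 26 black cards.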

Probability vs. Likelihood

Non-statisticians often use probability and likelihood as synonyms, but strictly speaking they are not. Let's clarify the difference between the two.

When we talk about probability for a continuous random variable, we normally mean an area under the curve of a distribution.

It is the probability of data given some fixed distribution (curve). We can think of applying a probability function as taking an interval of possible values and computing the area under the distribution over that interval. We need intervals because the probability of any single, exact value is 0. For example, what is the probability that a friend of mine, picking a number at random between 0 and 1, picks exactly 0.3? She could just as well pick 0.5 or 0.7 or 0.3459; there are infinitely many possible values, so the probability of hitting exactly 0.3 is 0. Instead, we pick an interval (that is, an area under the distribution over that interval) and find its probability. If every number between 0 and 1 is equally likely, the probability that she picks a number between 0.25 and 0.35 is 0.35 - 0.25 = 0.1.

This interval approach is used with continuous random variables (we already mentioned them above). Because the probability that a continuous random variable X takes one exact value is 0, we instead find the probability that X falls in an interval between a and b: P(a < X < b). This probability equals the area under the density curve f(x) between a and b, i.e. the integral of f(x) from a to b.
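Here is a small numerical sketch of P(a < X < b) as an area under a curve. It assumes, purely for illustration, that X follows a standard normal distribution. The area is approximated with the trapezoidal rule and compared with the exact value obtained from the normal CDF via the error function math.erf. The function names are hypothetical.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) of a normal distribution (the example distribution assumed here)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def prob_between(pdf, a, b, steps=10_000):
    """P(a < X < b): area under the density between a and b, via the trapezoidal rule."""
    h = (b - a) / steps
    inner = sum(pdf(a + i * h) for i in range(1, steps))
    return h * (0.5 * pdf(a) + inner + 0.5 * pdf(b))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Exact reference: P(X <= x) expressed with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

approx = prob_between(normal_pdf, -1.0, 1.0)
exact = normal_cdf(1.0) - normal_cdf(-1.0)
print(round(approx, 4), round(exact, 4))  # 0.6827 0.6827

# Narrower and narrower intervals around 0.3 have smaller and smaller probabilities.
for width in (0.1, 0.001, 0.00001):
    print(width, prob_between(normal_pdf, 0.3 - width / 2, 0.3 + width / 2))
```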

To deepen your knowledge about probability distributions, read more about the PDF (Probability Density Function).

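With likelihood, the roles are reversed: the data are fixed (they have already been observed), and we want to find the distribution (curve) that best explains them:

Likelihood(distribution | data) = p(data | distribution), viewed as a function of the distribution while the data stay fixed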


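We can summarize likelihood as a measure of how plausible a particular distribution (or parameter value) is, given data that have already been observed. Unlike probabilities, the likelihoods of all candidate distributions do not have to add up to 1.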

Likelihood refers to a past event with known outcomes, while probability refers to the occurrence of future events.
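A classic way to see likelihood in action is a coin of unknown bias. The Python sketch below assumes some hypothetical observed data (7 heads in 10 tosses), keeps it fixed, and evaluates the likelihood for candidate coins with bias p between 0 and 1. The best-fitting coin has p = 0.7, and the area under the likelihood curve is roughly 0.09 rather than 1, so the likelihood is not a probability distribution over p.

```python
from math import comb

# Hypothetical observed data: 7 heads in 10 tosses. The data stay fixed.
n, heads = 10, 7

def likelihood(p_heads):
    """L(p | data) = P(data | p): how probable the observed data are for a coin with bias p."""
    return comb(n, heads) * p_heads ** heads * (1 - p_heads) ** (n - heads)

grid = [i / 100 for i in range(101)]  # candidate coins with bias 0.00, 0.01, ..., 1.00
values = [likelihood(p) for p in grid]

best = grid[values.index(max(values))]
print(best)  # 0.7, the bias that makes the observed data most probable

# The area under L(p) over all candidate p is about 1/11, not 1.
area = sum(values) * 0.01
print(round(area, 3))  # 0.091
```

Maximizing the likelihood like this is the idea behind maximum likelihood estimation.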


In these three parts of Introduction to Statistics, you've learned about measures of central tendency, measures of variability, different distributions and types of probability, as well as the three fundamental axioms of probability theory. Feel free to read other related articles on the blog!

Further recommended readings:

Introduction to Statistics Part I

Introduction to Statistics Part II

Classification with Naive Bayes


Copyright © 2020 by
Richard Siegel at siegel.work
