Poisson Process as counting process distribution

The Poisson process is one of the most popular and widely used counting processes. It applies whenever we need to count occurrences of an event that happens at random but at a steady average rate. Examples include the number of customers visiting a barbershop, the number of earthquakes or accidents in a given year, the growth of bacteria, and many others. The Poisson process is also a common choice for modeling count data, where model fitting and prediction are among its biggest use cases.
At the heart of the Poisson process are the Poisson distribution and the exponential distribution. Suppose X∼poisson(λ); then the probability mass function of X is:

P(X = x) = e^{-λ} λ^x / x!, for x = 0, 1, 2, …

The only parameter of the Poisson distribution is the rate parameter λ, which determines how frequently the event occurs. The mean and variance of the distribution are both λ. The higher the rate parameter, the more events are expected per unit of time. The support of the Poisson distribution is the set of non-negative integers, and the probabilities decay once x exceeds the rate parameter. The Poisson distribution is also a member of the exponential family of distributions, which is widely used in theoretical statistics. It is also sometimes helpful to know that the Poisson process is the limiting case of the Bernoulli process: the Binomial(n, p) distribution converges to Poisson(λ) as n grows and p shrinks with np = λ held fixed.
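As a quick numerical check, the sketch below compares the mass function P(X = x) = e^{-λ} λ^x / x! written out by hand with R's built-in dpois(), and then measures how closely Binomial(n, λ/n) matches Poisson(λ) as n grows. The rate λ = 2, the range x = 0, …, 10, and the values of n are illustrative assumptions, not values taken from the text.

```r
lambda <- 2      # illustrative rate (an assumption for this sketch)
x <- 0:10        # a few points of the support

# Poisson mass function written out by hand: e^(-lambda) * lambda^x / x!
pmf_manual <- exp(-lambda) * lambda^x / factorial(x)
pmf_builtin <- dpois(x, lambda)
print(all.equal(pmf_manual, pmf_builtin))

# Largest pointwise gap between Binomial(n, lambda / n) and Poisson(lambda)
n_values <- c(10, 100, 1000, 10000)
max_gap <- sapply(n_values, function(n) {
  max(abs(dbinom(x, size = n, prob = lambda / n) - pmf_builtin))
})
print(data.frame(n = n_values, max_gap = max_gap))
```

The gap should shrink as n increases, which is the binomial-to-Poisson limit in action.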

Exponential distribution

The exponential distribution is also at the heart of the Poisson process. The inter-arrival times can be shown to be independent and identically distributed exponential random variables with rate parameter λ. The inter-arrival time is defined as the time between two consecutive arrivals, that is, between two consecutive occurrences of the event.
The exponential distribution is a continuous distribution with the density function given below: if Y∼exp(λ), then

f(y) = λ e^{-λy}, for y ≥ 0

The mean of the distribution is 1/λ, and the variance of the distribution is 1/λ^2. The distribution also has a very interesting memoryless property, which can be represented mathematically as follows: for a > b > 0,

P(Y > a | Y > b) = P(Y > a − b)
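The memoryless property, P(Y > a | Y > b) = P(Y > a − b) for a > b, can be checked numerically. The sketch below does this with pexp() and then with a small simulation. The rate of 1/30 per minute (a mean of 30 minutes) and the cut-offs a = 45 and b = 15 are assumptions chosen only for illustration.

```r
rate <- 1 / 30   # assumed rate per minute, so the mean is 30 minutes
a <- 45
b <- 15

# P(Y > a | Y > b) = P(Y > a) / P(Y > b), using the upper tail of pexp()
conditional_tail <- pexp(a, rate, lower.tail = FALSE) /
  pexp(b, rate, lower.tail = FALSE)
shifted_tail <- pexp(a - b, rate, lower.tail = FALSE)
print(all.equal(conditional_tail, shifted_tail))

# Simulation: among draws that exceed b, the mean remaining time
# should still be about 1 / rate
set.seed(1)
y <- rexp(100000, rate)
remaining <- y[y > b] - b
print(mean(remaining))
```

The simulated mean of the remaining time should land close to 30, the same as the unconditional mean.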

This means that conditioning on the time already elapsed does not change the distribution of the remaining time. Suppose you go to a barber whose service time follows an exponential distribution. You ask him what the expected time for a haircut is, and he says 30 minutes. If you ask him again 15 minutes into the haircut how much time is expected to be left, he will still say 30 minutes! This is why the above property is known as the memoryless property.

Let’s plot some histograms to see what the Poisson distribution looks like for different rates.

# Draw 1000 Poisson samples at each rate and plot them on a 2 x 2 grid
par(mfrow = c(2, 2))
for (rate in c(2, 4, 0.5, 10)) {
  x <- rpois(1000, lambda = rate)
  hist(x, main = paste("Rate =", rate))
}
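Alongside the plots, a numerical summary can help confirm that the mean and variance of a Poisson sample are both close to the rate. The sketch below reuses the four rates from the histograms. The seed and the sample size of 100,000 are arbitrary assumptions made for reproducibility and stability.

```r
set.seed(123)                 # arbitrary seed, for reproducibility only
rates <- c(0.5, 2, 4, 10)     # the same rates used in the histograms
n_draws <- 100000             # assumed sample size

# One row per rate: the rate, the sample mean, and the sample variance
moments <- t(sapply(rates, function(rate) {
  x <- rpois(n_draws, lambda = rate)
  c(rate = rate, sample_mean = mean(x), sample_var = var(x))
}))
print(round(moments, 3))
```

Each row should show a sample mean and sample variance that both sit near the rate in the first column.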


Poisson process modeling

The Poisson process is defined as follows: Let λ>0 be fixed. The process {N(t),t∈R^+} is called a Poisson process with rate λ if it satisfies the following conditions:

* N(0)=0

* N(t) has independent increments

* N(t)∼poisson(λt), i.e., the arrival count up to time t follows a Poisson distribution.

From the above definition, it can be proved that the inter-arrival time is exponential.
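The link between exponential inter-arrival times and Poisson counts can also be illustrated in the other direction. The sketch below builds arrival times by accumulating exponential gaps and then counts arrivals in unit-length intervals. If the construction is right, those counts should have mean and variance close to λ. The rate λ = 2, the horizon of 5000 intervals, and the seed are assumptions made for this illustration.

```r
set.seed(42)
lambda <- 2        # assumed rate (events per unit time)
horizon <- 5000    # assumed number of unit-length intervals to simulate

# Draw more exponential gaps than needed, then keep arrivals inside the horizon
gaps <- rexp(ceiling(1.5 * lambda * horizon), rate = lambda)
arrivals <- cumsum(gaps)
arrivals <- arrivals[arrivals <= horizon]

# Count arrivals in each unit interval (0,1], (1,2], ...
counts <- tabulate(ceiling(arrivals), nbins = horizon)

print(mean(counts))   # should be close to lambda
print(var(counts))    # should also be close to lambda
```

Drawing 1.5 times the expected number of gaps is a simple safety margin so that the simulated arrivals cover the whole horizon.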

Let’s look at an example using the Poisson distribution.

Q. If the number of earthquakes in Florida follows a Poisson distribution with rate 2/year, then find:

a. The probability that there are three earthquakes in the year 2004.

b. The probability that there are four earthquakes in 2005, given that there were three earthquakes in 2004.

c. The expected number of earthquakes in 2010-2012.

Ans: a. The probability that there are three earthquakes in 2004 is 0.18. We can calculate this in R:

dpois(3 , 2)

## [1] 0.180447

b. Counts of events in disjoint time intervals are independent, so the condition has no effect on the probability. We can simply calculate the probability of four events in a year, which is 0.09:

dpois(4 , 2)

## [1] 0.09022352

c. The number of earthquakes over the three years 2010-2012 follows poisson(3 × 2 = 6). Hence, the expected number of earthquakes is 6, with the variance also being 6.
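The independence argument used in part b can be checked by simulation. The sketch below simulates a long run of years from exponential inter-arrival times with rate 2 per year. It then estimates the probability of four events in a year, given exactly three events in the previous year, and compares it with the unconditional value from dpois(). The number of simulated years and the seed are assumptions for illustration.

```r
set.seed(1)
lambda <- 2       # rate per year, as in the example
years <- 50000    # assumed number of simulated years

# Build arrival times from exponential gaps and keep those inside the horizon
gaps <- rexp(ceiling(1.5 * lambda * years), rate = lambda)
arrivals <- cumsum(gaps)
arrivals <- arrivals[arrivals <= years]

# Yearly counts, then pair each year with the one that follows it
counts <- tabulate(ceiling(arrivals), nbins = years)
this_year <- counts[-years]
next_year <- counts[-1]

# Estimate P(four events next year | three events this year)
conditional_est <- mean(next_year[this_year == 3] == 4)
print(c(conditional = conditional_est, unconditional = dpois(4, lambda)))
```

The two numbers should agree up to simulation noise, which is what independent increments predict.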

Example:

Q. The sequencing error rate in a genome sequencing project is, on average, 1 wrong base pair in 100,000. A genome has a length of 50,000,000 base pairs.

(a) Explain what a Poisson distribution is.

(b) (i) Using a Poisson distribution, calculate the expected number of sequencing errors in the genome.

(ii) If the genome were sequenced multiple times, how would the number of errors fluctuate around this expected value?

Ans:

  1. Poisson distribution: X follows a Poisson distribution if P(X = x) = e^{-λ} λ^x / x! for x = 0, 1, 2, …, for a real positive parameter λ. It has mean and variance both equal to λ.
  2. (i) Since the Poisson distribution here has mean λ = 50,000,000 × (1/100,000) = 500, the expected number of errors is 500.
(ii) The number of errors in a single sequencing run has variance 500, so the variance of the mean over n runs will be 500/n. This indicates that the variance reduces as the number of replications of the experiment increases. (Since the different sequencing runs are considered independent,