Problem 1

A ``truncated" distribution is a distribution where the support of the random variable is restricted to some interval \([a,b]\). If we know the pdf \(f(x)\) and cdf \(F(x)\) of the original random variable, then the pdf and cdf of the truncated random variable can be easily derived as: \[ f_{Tr}(x) = \begin{cases} \frac{f(x)}{F(b) - F(a)} \; \forall a < x \le b \\ 0 \; \text{otherwise} \end{cases} \] Consider the exponential random variable \(X \sim \text{Exp}(\lambda)\), with pdf \[ f(x) = \lambda e^{-\lambda x}, \; x \ge 0, \lambda > 0 \]

Answer the following:

  1. How would you generate from an exponential truncated to the interval \([0,b]\) using accept-reject scheme? Try the R code for \(\lambda = 2\), \(b = 5\).

  2. Can you use the inverse CDF transformation here?

Solution

Part 1 is easy. All you have to do is sample from \(Exp(\lambda)\) and discard all samples that fall outside the truncation boundary. Also, to generate from exponential we can use the inverse CDF method with \[ F^{-1}(u) = - \frac{1}{\lambda} \log(1-u) \]

set.seed(123)
n = 1000
u = runif(n)
lambda = 2 
a = 0
b = 5

x = (-1/lambda)*log(1-u*(1-exp(-lambda*b)))
## Standard Exponential then truncate 
x = -1/lambda*log(u) #u ~ U[0,1] implies 1-u ~ U[0,1]
x = x[x<=b]

## Plotting
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.2
ggplot(data.frame(x))+geom_histogram(aes(x=x),bins = 50)

Part 2 involves a little math! The pdf of the truncated distribution is given above. We can then calculate the CDF of the truncated exponential:

First the area between \([a,b]\) would be \[ \int_{a}^{b} \lambda e^{-\lambda x} d x = F(b)-F(a) = e^{-\lambda a} - e^{-\lambda b} \]

The CDF for the truncated distribution can be written as: \[ F_{tr}(x) = \frac{F(x)-F(a)}{F(b)-F(a)}, a \le x \le b \]

Since \(F(0) = 0\) for exponential, the inverse can be derived by solving the following equation for \(x\) \[ \text{Solve } u = \frac{F(x)-F(a)}{F(b)-F(a)} = \frac{1- \exp(-\lambda x)}{1- exp(-\lambda b)} \\ 1 - exp(-\lambda x) = u (1 - exp(-\lambda b)) \\ x = (-1/\lambda)*\log(1-u (1 - exp(-\lambda b))) \] The R code can be written as follows:

n = 1000
lambda = 2
b = 5
u = runif(n)
x = (-1/lambda)*log(1-u*(1-exp(-lambda*b)))

## Plotting
library(ggplot2)
ggplot(data.frame(x))+geom_histogram(aes(x=x),bins = 50)

Problem 2 (Bonus Problem)

(Standard Normal Sampling without Box-Muller)

The aim of this problem is to generate standard normal random variables by a different method. To do this, generate \(U \sim \text{Unif}[0.1]\), and apply the following approximation to the inverse CDF for standard Normal: \[ \Phi^{-1}(\alpha) = t - \frac{a_0 + a_1 t}{1 + b_1 t + b_2 t^2}, \] where \(t^2 = \log(\alpha^{-2})\) and \[ a_0 = 2.30753; a_1 = 0.27061; b_1 = 0.99229, b_2 = 0.04481. \] Plot the histogram of the generated \(X\) samples, and compare it with a standard \(N(0,1)\) density curve. Do they look similar? Will you be able to use these samples for your experiments? Submit the plot and the R code.

Solution

The following solution was submitted by Samrat Nath. This is a good example of well-documented coding.

# Inverse CDF transform method
Finv <- function(a){
  t=sqrt(log(1/(a*a)))
  a0 = 2.30753
  a1 = 0.27061
  b1 = 0.99229
  b2 = .04481
  x= t - ((a0+a1*t)/(1+b1*t+b2*t*t))
}

n=1e5
u= runif(n)
x=Finv(u)

y=seq(-2,2,0.05)
fy= (exp(-(y^2)/2))/(sqrt(2*pi))

hist(x,breaks = 50, freq=F, col= "blue")
lines(y, fy, col = "red")
legend("topright",c("Histogram from Inverse CDF \n transform method","Density"),
       col=c("blue","red"), lty = c(1,1))