About

This page provides the R (and RStan) codes necessary to implement the Gauss hypergeometric prior for quasi-sparse count data proposed in the following paper:

Bayesian inference on quasi-sparse count data: Datta, J. and Dunson, D. (2016+)

Gauss hypergeometric prior

Our Poisson-Gamma hierarchical model can be expressed as: \[ y_i \sim \mbox{Poi} (\theta_i),\quad \theta_i \sim \mathrm{Ga}(\alpha, \lambda_i^2 \tau^2),\quad \lambda_i \sim p(\lambda_i^2),\quad \tau \sim p(\tau^2), \] where \(p(\lambda_i^2)\) and \(p(\tau^2)\) are densities for \(\lambda_i^2\) and \(\tau^2\), respectively. Marginalizing out \(\theta_i\) and writing \(\kappa_i = 1/(1+\lambda_i^2\tau^2)\), the hierarchical relationship can be re-written as \[ p(y_i \mid \lambda_i, \tau) \propto \left(\frac{\lambda_i^2\tau^2}{1+ \lambda_i^2\tau^2} \right)^{y_i} \left( \frac{1}{1+ \lambda_i^2\tau^2} \right)^\alpha,\\ p(y_i \mid \kappa_i) \propto (1-\kappa_i)^{y_i} \kappa_i^\alpha \equiv [y_i \mid \kappa_i] \sim \mathrm{NB}(\alpha, 1-\kappa_i). \] The Gauss hypergeometric prior (abbreviated as GH) is pseudo-conjugate in the sense that it produces a Gauss hypergeometric posterior when combined with a negative binomial likelihood: \[ p(\kappa_i \mid \tau^2, \gamma) \propto \kappa_i^{a-1} (1-\kappa_i)^{b-1} (1+\kappa_i (\tau^2-1))^{-\gamma}. \] For modelling quasi-sparse count data, we recommend \(a = b = 1/2\), which results in the posterior \[ p(\kappa_i \mid y_i, \gamma, \tau^2, \alpha) \propto \kappa_i^{\alpha-1/2} (1-\kappa_i)^{y_i - 1/2} (1+\kappa_i (\tau^2-1))^{-\gamma}. \] The \(k^{th}\) posterior moment of \(\kappa_i\) given \(y_i\), \(\tau\), and \(\gamma\) can be written as \[ E (\kappa_i^k \mid y_i, \tau, \gamma) = \frac{\beta(k+\alpha+1/2,\, y_i+1/2)\; {}_2F_1(\gamma,\, k+\alpha+1/2,\, y_i+\alpha+1+k,\, 1-\tau^2)}{\beta(\alpha+1/2,\, y_i+1/2)\; {}_2F_1(\gamma,\, \alpha+1/2,\, y_i+\alpha+1,\, 1-\tau^2)}. \] This provides a way to rapidly calculate an empirical Bayes estimate of the posterior mean \(E (\kappa_i \mid y_i, \hat{\tau}, \hat{\gamma})\).
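As a quick illustration of this empirical Bayes calculation (a minimal sketch using the f21hyper function from the BMS package, with illustrative values of \(\alpha\), \(\tau^2\), and \(\gamma\) rather than data-driven estimates), the \(k = 1\) moment above gives the shrinkage weight in closed form:

library(BMS)   # f21hyper() evaluates the Gauss hypergeometric function 2F1
post.mean.kappa <- function(y, alpha, tau.sq, gam) {
  # k = 1 posterior moment E(kappa | y, tau, gamma) from the formula above
  num <- beta(alpha + 3/2, y + 1/2) * f21hyper(gam, alpha + 3/2, y + alpha + 2, 1 - tau.sq)
  den <- beta(alpha + 1/2, y + 1/2) * f21hyper(gam, alpha + 1/2, y + alpha + 1, 1 - tau.sq)
  num/den
}
post.mean.kappa(y = 0,  alpha = 0.5, tau.sq = 0.1, gam = 1.5)   # large: strong shrinkage towards zero
post.mean.kappa(y = 10, alpha = 0.5, tau.sq = 0.1, gam = 1.5)   # small: essentially no shrinkage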

Simulation

There are a number of ways to carry out Bayesian inference using the Gauss hypergeometric prior, depending on the posterior sampling scheme. We discuss three different approaches in this project: full Bayesian inference using Stan (RStan), a slice sampling algorithm, and a fast empirical Bayes (EBayes) procedure based on the closed-form posterior moments.

The Stan code and the slice sampling algorithm (with its code) are given in the links above; below we show the R code for the EBayes inference, together with R code for reproducing one of the examples in our paper. But first, we discuss a few things needed to understand the code better.

Kiefer–Wolfowitz and REBayes

Note that we need the ‘REBayes’ package for the Kiefer–Wolfowitz nonparametric MLE. The REBayes package depends on the Mosek software for convex optimization, which offers a free personal academic license. For instructions on how to install Mosek, Rmosek, and REBayes, please see the following:

Once we are past the installation phase for running the KW estimator, we can start loading the libraries.

library(BMS)      # provides f21hyper() for the Gauss hypergeometric function 2F1
library(REBayes)  # Kiefer-Wolfowitz NPMLE (Pmix); needs Rmosek/Mosek installed
printf <- function(...) invisible(print(sprintf(...)))
vf21hyper = Vectorize(f21hyper)  # vectorized 2F1, so it can be applied to a vector of counts

We generate a sparse \(\theta\) and a sparse observation vector \(y \sim \mbox{Poi}(\theta)\).

set.seed(123)
niter = 1000                        # number of Monte Carlo replicates
direrr = rep(0,niter);kwerr = rep(0,niter)
nverr = rep(0,niter);
n = 200;w = 0.9                     # n observations; w = proportion of true zeros
thetasparse = rep(0,n)
for (i in 1:n)
{
  if(runif(1)<w){thetasparse[i]<-0}
  else {thetasparse[i] <-abs(rt(1,3))}   # non-null means drawn as |t_3|
}
truerisk = sum(thetasparse); printf("True Risk %f",truerisk)  # sum of theta_i = E||y - theta||^2, the risk of using y itself
## [1] "True Risk 17.705109"

Choice of Gamma hyper-parameter

One way to speed up the computation is to fix the hyperparameter \(\gamma\) at a suitable constant given the data, rather than putting a prior on it. One way of fixing \(\gamma\) for a sparse two-groups scenario is to run a k-means clustering on the data \(y\) and take \(\gamma\) to be the midpoint of the line segment connecting the centroids of the two clusters.

y.sample <- rpois(n,thetasparse)
gam.km <- kmeans(y.sample,centers =2)   # two-cluster split of the counts
gam.c = mean(gam.km$centers)            # gamma = midpoint of the two cluster centroids
printf("Gamma = %f", gam.c)
## [1] "Gamma = 1.507653"

To assess whether this is a sensible choice, we can look at posterior density plots for a small and a large observation (which can be classified as noise and signal, respectively) at \(\gamma =\) 1.5076531. If, for this value of \(\gamma\), the posterior of \(\kappa\) puts most of its mass near \(0\) for a large \(y\) and near \(1\) for a small \(y\), it should be a reasonable choice for the threshold parameter \(\gamma\).

Sampling from GH distribution

To illustrate that the value of \(\gamma\) obtained by the k-means algorithm works for the quasi-sparse case, we plot the posterior density of \(\kappa\) at this value of \(\gamma\) for a small and a large observation. Here we do so by plotting the kernel density estimate of 1,000 draws from the GH posterior distribution, obtained using a slice sampling algorithm. A detailed discussion of this algorithm appears here.
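In brief, the sampler below relies on a data augmentation of the GH posterior: writing \[ (1+\kappa_i(\tau^2-1))^{-\gamma} \propto \int_0^\infty \omega^{\gamma-1} e^{-\omega\{1+\kappa_i(\tau^2-1)\}}\, d\omega, \] the latent variable \(\omega\) has full conditional \(\mathrm{Ga}(\gamma,\, 1+\kappa_i(\tau^2-1))\); after drawing a uniform slice variable \(u \mid \omega, \kappa_i \sim \mathrm{Unif}(0,\, e^{\omega\kappa_i(1-\tau^2)})\), the full conditional of \(\kappa_i\) is a \(\mathrm{Beta}(\alpha+1/2,\, y_i+1/2)\) distribution truncated to the set \(\{\kappa_i : e^{\omega\kappa_i(1-\tau^2)} > u\}\). The truncated Beta draws below use the truncdist package.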

library(truncdist)
## Sampling from the Gauss hypergeometric posterior for given tau-sq and gamma
## (slice sampler: r = number of iterations, y = vector of counts)
rGaussHG <- function(r,y,gamma,t2,alpha)
{
  p = length(y)
  k = rep(1/4,p)        # starting value for kappa
  K = matrix(0,p,r)     # storage for the kappa, omega, u and theta draws
  Omega = matrix(0,p,r)
  U = matrix(0,p,r)
  TH = matrix(0,p,r)
  
  for ( i in 1:r )
  {
    omega = rgamma(p,shape=gamma,rate=k*(t2-1)+1)  # omega | kappa ~ Ga(gamma, 1 + kappa*(t2-1))
    u = runif(p,0,exp(omega*k*(1-t2)))             # uniform slice variable
    s1 = rep(alpha+1/2,p) 
    s2 = y+1/2
    if (t2<1)                                      # truncation region implied by the slice
    { 
      lb = pmax(log(u)/(omega*(1-t2)),0)
      ub = rep(1,p)
    }
    else
    {
      ub = pmin(log(1/u)/(omega*(t2-1)),1)
      lb = rep(0,p)
    }
    k = rtrunc(p,"beta",a = lb, b = ub, shape1=s1,shape2=s2)  # truncated Beta(alpha+1/2, y+1/2)
    theta = rgamma(p,y+1,1/(1-k))                  # draw theta given y and kappa
    K[,i] = k
    Omega[,i] = omega
    U[,i] = u
    TH[,i] = theta
  }
  list(K=K,U=U,TH=TH)
}
library(truncdist)
y.small = min(y.sample[y.sample > 0])  # smallest nonzero count: a "noise" observation
y.large = max(y.sample[y.sample > 0])  # largest count: a "signal" observation
y.set = c(y.small, y.large)

par(mfrow = c(1, 2))
for (i in 1:2) {
    y.tmp = y.set[i]
    tau.sq = sum((y.sample > gam.c))/length(y.sample)  # crude estimate of tau^2: proportion of "signal" counts
    ans = rGaussHG(1000, y.tmp, gam.c, tau.sq, alpha = 1/2)
    rpost <- ans$K
    d_rpost <- density(rpost)
    plot(d_rpost, col = "red", lwd = 2, xlim = c(0, 1), xlab = expression(kappa), 
        ylab = "density", main = paste("y =", y.tmp, ", gamma =", 
            format(gam.c, digits = 3)))
}

Comparison

We calculate four different types of estimators, as discussed before: the posterior mean under the Gauss hypergeometric prior, the Kiefer–Wolfowitz nonparametric maximum likelihood estimator, the global shrinkage estimator, and Robbins' frequency-based estimator. (The R code below compares the GH, Kiefer–Wolfowitz, and Robbins estimators.)
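For reference, both the Robbins estimator and its Kiefer–Wolfowitz counterpart used below are of the form \[ \hat{\theta}_i = (y_i+1)\, \frac{\hat{f}(y_i+1)}{\hat{f}(y_i)}, \] where \(\hat{f}\) is the raw empirical frequency of the counts for Robbins' rule and the Kiefer–Wolfowitz NPMLE-based marginal for the KW estimator; the GH estimate is the closed-form posterior mean \((1-\hat{\kappa}_i)(y_i+\alpha)\).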

for (iter in 1:niter)
{
  if(iter%%(niter/10)==0){printf("Iteration = %d",iter)}
  y = rpois(n,thetasparse)
  ## Direct posterior mean (GH empirical Bayes, k = 1 moment formula)
  alpha = 0.5;tau.sq = sum((y>0))/sum((y==0))   # plug-in value for tau^2
  kappahat = (beta(alpha+3/2,y+1/2)/beta(alpha+1/2,y+1/2))*
    (vf21hyper(gam.c,3/2+alpha,y+alpha+2,1-tau.sq)/vf21hyper(gam.c,alpha+1/2,y+alpha+1,1-tau.sq))
  thetahat.dir = (1-kappahat)*(y+alpha)         # E(theta | y) = (1 - E(kappa | y))*(y + alpha)

  # Kiefer-Wolfowitz NPMLE of the Poisson mixture via REBayes::Pmix
  fit <- Pmix(y)   
  ymax = max(y)
  xgrid <- seq(from=0,to=ymax+1,length=ymax+2)   # integer grid 0, 1, ..., ymax+1
  # Naive (Robbins): empirical frequencies of the counts on the same grid
  pmf = sapply(xgrid,function(x) sum((xgrid[x+1]==y)))/n
  pmf[length(pmf)]<- pmf[length(pmf)-1]          # avoid a zero frequency at ymax+1
  thetahat.fr = rep(0,n)
  thetahat.kw = rep(0,n)
  for (i in 1:n)
  {
    val = y[i]
    id = which(xgrid == val)
    id1 = which(xgrid == val+1)
    thetahat.kw[i] = (val+1)*fit$g[id1]/fit$g[id]   # (y+1)*fhat(y+1)/fhat(y) with the KW marginal
    thetahat.fr[i] = (val+1)*pmf[id1]/pmf[id]       # same rule with raw frequencies (Robbins)
  }
  thetahat.kw[is.na(thetahat.kw)] <- 0
  direrr[iter] = sum((thetasparse-thetahat.dir)^2)
  kwerr[iter] = sum((thetasparse-thetahat.kw)^2)
  nverr[iter] = sum((thetasparse-thetahat.fr)^2)
}
## [1] "Iteration = 100"
## [1] "Iteration = 200"
## [1] "Iteration = 300"
## [1] "Iteration = 400"
## [1] "Iteration = 500"
## [1] "Iteration = 600"
## [1] "Iteration = 700"
## [1] "Iteration = 800"
## [1] "Iteration = 900"
## [1] "Iteration = 1000"

Plots and Tables

The total Bayes risks (mean, standard deviation, and median of the total squared-error loss across the 1000 replicates) are summarized in the table below:

Competing Methods    Mean   Std. Dev   Median
GH prior             16.3        7.5     15.0
Kiefer-Wolfowitz     24.9        6.5     25.1
Robbins              40.4       26.2     34.4
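For completeness, these summary statistics can be computed directly from the loss vectors stored in the simulation loop; a minimal sketch:

# Summarize the stored total losses (direrr = GH prior, kwerr = Kiefer-Wolfowitz, nverr = Robbins)
risk.tab <- rbind(
  GH      = c(mean(direrr), sd(direrr), median(direrr)),
  KW      = c(mean(kwerr),  sd(kwerr),  median(kwerr)),
  Robbins = c(mean(nverr),  sd(nverr),  median(nverr))
)
colnames(risk.tab) <- c("Mean", "Std. Dev", "Median")
round(risk.tab, 1)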

Since the current version of the paper reports the average Bayes risk \(n^{-1}E(|| \hat{\theta} - \theta ||^2)\), we also report those values for \(n = 200, \omega = 0.1\) below:

##            KW       GH  Robbins
## [1,] 12.45777 8.131078 20.18886