There are 4 questions. Each question is worth 7 to 11 points.
1. (10 pts) Consider a random variable X with probability density fX(x) = cos(x)/2, −π/2 < x < π/2.
- (a) Write a function that takes input u ∈ (0, 1) and computes the inverse of the CDF, F^{-1}(u).
(Note: The R asin() function computes arcsine, i.e., sin^{-1}(·).)
- (b) Using the function in (a), generate 1000 samples from fX(x) with the inverse CDF method. On a Q-Q plot, compare the empirical quantiles of the obtained draws and the theoretical quantiles of X.
(Note: The inverse CDF function is equivalent to the theoretical quantile function.)
- (c) Propose a probability density g(y) to conduct acceptance-rejection sampling from fX(x). You should provide the analytic expression for the PDF, g(y).
- (d) Using acceptance-rejection sampling with g(y) from part (c), draw approximately 1000 accepted samples from fX(x). Plot a histogram of the empirical density and superimpose the theoretical density curve.
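One possible starting point for parts (a) and (b) is sketched below. It assumes the density is cos(x)/2 on (−π/2, π/2), the form that appears in the integral of Question 2, so that F(x) = (sin(x) + 1)/2. The function name inv_cdf and the seed are illustrative choices, not part of the question.

```r
# Inverse CDF of f_X(x) = cos(x)/2 on (-pi/2, pi/2) (assumed form of the density):
# F(x) = (sin(x) + 1) / 2, hence F^{-1}(u) = asin(2u - 1).
inv_cdf <- function(u) asin(2 * u - 1)

set.seed(1)                      # arbitrary seed, only for reproducibility
x <- inv_cdf(runif(1000))        # 1000 draws via the inverse CDF method

# Theoretical quantiles at the same plotting positions as the sorted draws
q_theory <- inv_cdf(ppoints(1000))
q_sample <- sort(x)
```

Calling plot(q_theory, q_sample) followed by abline(0, 1) would then draw the Q-Q plot; points close to the diagonal indicate agreement.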
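2. (11+3 pts) Suppose we want to use Monte Carlo estimation to approximate $E\left(\frac{1}{\sqrt{2\pi}} e^{-X^2/2}\right)$, where X follows fX(·) in Question 1. Then,
$$\theta = E\left(\frac{1}{\sqrt{2\pi}} e^{-X^2/2}\right) = \int_{-\pi/2}^{\pi/2} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) \cdot \frac{\cos(x)}{2} \, dx.$$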
- (a) Using fX(·) in Question 1 as the importance function and exactly 1000 random samples, perform importance sampling to obtain a Monte Carlo estimate for θ.
(Note: You may reuse R objects in Q1(b) from the inverse CDF sampling.)
- (b) Using simple Monte Carlo estimation with 1000 samples, obtain a point estimate and a 95% confidence interval for θ.
- (c) With a total sample size of 1000 and 4 equal-width, equal-size strata, compute a stratified sampling estimate for θ.
- (d) Out of the three estimators, (a) the importance sampling estimator, (b) the simple Monte Carlo estimator, and (c) the stratified sampling estimator, which one do you expect to be the LEAST efficient? Provide a brief verbal justification.
- (e) (Bonus, 3 pts) One practical application of importance sampling is for estimating an expectation, E[g(X)], when the random variable X is not straightforward to draw from. For this question, θ can also be regarded as E_Y[a·cos(Y)/2], where Y follows the truncated standard normal distribution on (−π/2, π/2). Find the value of the constant a, and use 1000 samples from the truncated N(0, 1) on (−π/2, π/2) to estimate θ.
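A minimal sketch of how parts (a) and (b) might be set up follows. It assumes fX(x) = cos(x)/2 on (−π/2, π/2) and reads "simple Monte Carlo" as averaging over Uniform(−π/2, π/2) draws; both readings, and all object names, are assumptions rather than the intended solution.

```r
set.seed(2)                       # arbitrary seed, only for reproducibility
n <- 1000
h <- function(x) dnorm(x)         # (1 / sqrt(2 * pi)) * exp(-x^2 / 2)

# (a) f_X as importance function: theta = E_f[h(X)], so the weights cancel
#     and the estimate is a plain average of h over draws from f_X.
x <- asin(2 * runif(n) - 1)       # inverse CDF draws from f_X
theta_is <- mean(h(x))

# (b) Simple Monte Carlo (assumed reading): U ~ Unif(-pi/2, pi/2),
#     theta = pi * E[h(U) * cos(U) / 2], where pi is the interval length.
u <- runif(n, -pi / 2, pi / 2)
w <- pi * h(u) * cos(u) / 2
theta_mc <- mean(w)
ci_mc <- theta_mc + c(-1, 1) * qnorm(0.975) * sd(w) / sqrt(n)   # normal-approximation 95% CI
```

Comparing sd(h(x)) with sd(w) gives a rough empirical sense of the relative efficiency asked about in part (d).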
3. (7 pts) Suppose Y1, Y2 are independent, and Y1 ∼ χ²(m), Y2 ∼ χ²(n). Then $X = \frac{Y_1/m}{Y_2/n} \sim F(m, n)$,
that is, X follows the F distribution with m and n degrees of freedom.
- (a) Using only draws from the standard normal distribution (rnorm()), apply the transformation method to generate 1000 samples from the F(2, 30) distribution.
(Note: Do NOT use rchisq, rt, rf or the corresponding density/quantile functions.)
- (b) Using the samples from part (a) and the F(2, 30) density, fX(·), as the importance function, obtain a Monte Carlo estimate of $\int_0^{\infty} \ldots \, dx$. Compare your results with that from the R built-in function, 1-pf(3, df1 = 2, df2 = 30).
(Note: You do not need to evaluate fX(), because the importance function and the integrand should partially cancel out.)
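The transformation in part (a) could be sketched as follows, using the fact that a χ²(k) variable is the sum of k squared independent standard normals. The helper rchisq_norm and the other object names are hypothetical, and the final line assumes the target of part (b) is the tail probability P(X > 3), which is what 1-pf(3, df1 = 2, df2 = 30) returns.

```r
set.seed(3)                       # arbitrary seed, only for reproducibility
n_draws <- 1000
df1 <- 2
df2 <- 30

# Hypothetical helper: chi-square(df) draws as row sums of squared N(0, 1) draws
rchisq_norm <- function(n, df) {
  rowSums(matrix(rnorm(n * df)^2, nrow = n, ncol = df))
}

y1 <- rchisq_norm(n_draws, df1)
y2 <- rchisq_norm(n_draws, df2)
x <- (y1 / df1) / (y2 / df2)      # F(2, 30) draws by the transformation method

# Assumed target: P(X > 3), estimated by the proportion of draws above 3
p_hat <- mean(x > 3)
```

The value p_hat can then be set beside 1 - pf(3, df1 = 2, df2 = 30) for the comparison the question asks for.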
4. (9 pts) An instructor who teaches on a whiteboard has run out of whiteboard markers and plans to buy new markers. Suppose that, for the type of marker he will order:
- Each marker has 70% chance of coming from a “high-quality” manufacturer and 30% chance of coming from a “low-quality” manufacturer.
- The lifetime (in weeks) of a marker produced by the high-quality manufacturer follows an exponential distribution with rate λ = .5 (i.e., Exp(.5)).
- Markers produced by the low-quality manufacturer wear out faster, with lifetime following Exp(2).
Therefore, the lifetime of a randomly chosen marker (i.e., the number of weeks it takes to be worn out), X, follows a discrete mixture distribution: fX(x) = .7f1(x) + .3f2(x), x > 0, where f1(·) is the Exp(.5) density, and f2(·) is the Exp(2) density.
- (a) Generate a random sample of size 500 for X, i.e., the lifetime of 500 randomly chosen markers. Plot the histogram of empirical density, with the theoretical density, fX(·), superimposed.
- (b) This instructor plans to buy 20 markers. He uses one marker at a time and replaces it with a new one when the current one is worn out. Thus, the total time it takes for all 20 markers to be worn out is
$$T = \sum_{i=1}^{20} X_i,$$
where X1, . . . , X20 ∼ fX(·). Draw 1000 random samples of the total time, T.
- (c) Let us use Monte Carlo integration to help the instructor estimate when he should expect to run out of markers again, i.e., E(T). Use the samples from part (b) to obtain an importance sampling estimate for E(T).
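One way the mixture sampling in Question 4 might be sketched is shown below: first pick the manufacturer with probability .7 or .3, then draw the lifetime from the matching exponential. The function name rmarker and the seed are illustrative assumptions, and the lifetimes are treated as independent draws.

```r
set.seed(4)                       # arbitrary seed, only for reproducibility

# Hypothetical helper: n draws from the mixture .7 * Exp(.5) + .3 * Exp(2)
rmarker <- function(n) {
  high <- runif(n) < 0.7                      # TRUE = high-quality manufacturer
  rexp(n, rate = ifelse(high, 0.5, 2))        # rate chosen per marker
}

# Mixture density, for superimposing on a histogram
dmarker <- function(x) 0.7 * dexp(x, rate = 0.5) + 0.3 * dexp(x, rate = 2)

x <- rmarker(500)                             # (a) lifetimes of 500 markers
t_draws <- replicate(1000, sum(rmarker(20)))  # (b) 1000 draws of T
t_hat <- mean(t_draws)                        # (c) Monte Carlo estimate of E(T)
```

As a sanity check, E(X) = .7 · 2 + .3 · .5 = 1.55, so by linearity E(T) = 20 · 1.55 = 31, and t_hat should land near that value. For part (a), hist(x, freq = FALSE) followed by curve(dmarker, add = TRUE) would overlay the density.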