Question 1
1,200 words limit
(a) [20%]
Using a random sample of workers, a researcher estimates the following log wage regression:
$$\log(\mathit{wage}_i) = \beta_0 + \beta_1 S_i + \beta_2 \mathit{sibs}_i + \beta_3 \mathit{wexp}_i + \beta_4 \mathit{wexp}_i^2 + u_i, \tag{1.1}$$
where $i = 1, \dots, n$ is the subscript for individuals, $\log(\mathit{wage}_i)$ is the natural logarithm of the hourly wage, $S_i$ is schooling in number of years, $\mathit{sibs}_i$ is the number of siblings and $\mathit{wexp}_i$ is the number of years of work experience. Provide details on the interpretation of the coefficients. Explain why schooling $S_i$ could be endogenous and describe a potential cause of this endogeneity. Explain why the ordinary least squares estimator of model (1.1) is biased and inconsistent. More points will be given for a formal explanation. Make sure to adapt formulas and discussion to model (1.1).
(b) [15%]
Define two different instruments that could be used for schooling $S_i$ to solve the issue of endogeneity in model (1.1). Define the formula for the two-stage least squares estimator of model (1.1) using the two instruments you proposed and explain how the first-stage and second-stage estimations are computed. Make sure to adapt formulas and discussion to model (1.1).
(c) [20%]
Explain what assumptions your instrumental variables must satisfy to produce a consistent estimation of model (1.1). Explain why you think the two instruments proposed in (1.b) satisfy these assumptions. Show that the instrumental variable estimator defined in (1.b) is consistent under these assumptions.
(d) [20%]
Explain how you would test for the validity of your instrumental variables. Explain also how you would test for whether there is an endogeneity issue in your model. Provide details on how you would perform these tests for model (1.1).
(e) [25%]
The researcher decided to estimate model (1.1) using two-stage least squares estimation, instrumenting schooling with birth order (brthord), a variable taking value 1 if the individual is a first-born child, 2 if he/she is a second-born child, and so on. Do you think it is a valid instrument? Notice that the model is conditional on sibs, so variation in brthord is not explained by the number of siblings. Explain your answer in detail, first under the assumption that brthord is moderately correlated with sibs and then under the assumption that the correlation between brthord and sibs is almost 1.
Question 2
1,200 words limit
(a) [30%]
Discuss an empirical example of a panel data model where you would use a fixed effect estimation rather than a random effect estimation. The example must consider a panel data set of children, but you are free to consider any dependent variable. You could consider, for example, school test scores, cognitive skills, socio-emotional skills, health outcomes, level of competitiveness, measure of social skills, average wage for the aspired occupation, parental time investment in the child, expenditure on child private tuition, child health expenditure, hours of physical activity per week, time spent in school, calorie intake per day, height, birth weight, BMI, days of absence from school, etc. Write the regression equation and provide details on the dependent and explanatory variables, on the error term and on the interpretation of the coefficients. Explain how you would compute the fixed effect estimation for your defined model.
(b) [35%]
What does the unobserved individual effect in the model defined in (2.a) capture? Explain the differences in the assumptions needed for the consistency of the fixed effect estimation and of the random effect estimation for the model you discussed in (2.a). Why is the fixed effect estimation more appropriate than the random effect estimation in the empirical example you discussed in (2.a)? Explain how you would perform a test to decide whether to adopt a random effect or a fixed effect estimation.
(c) [35%]
Discuss an empirical example of a panel data model where one of the explanatory variables is endogenous because it is correlated with unobserved variables that are relevant to explain both the dependent variable and the endogenous variable. The example must be different from those in lectures, seminars and past exams. Explain under which conditions the fixed effect estimation can solve such an issue of endogeneity. Explain which type of estimation you would adopt to solve the endogeneity issue if the conditions for the consistency of the fixed effect estimation were not satisfied.
Question 3
1,200 words limit
(a) [20%]
Black people are much more likely to have high blood pressure, and a researcher wants to understand whether this is caused by genetic differences between black and white people or by differences in diet between black and white people, which can lead to overweight and higher levels of triglycerides in the blood. For a sample of black and white people, the researcher can observe the following variables:
- highbp, a dummy variable taking value 1 if an individual has high blood pressure and 0 otherwise,
- black, a dummy variable taking value 1 if an individual is black and 0 if white,
- female, a dummy variable taking value 1 for women and 0 for men,
- bmi, the body mass index which is weight in kilograms divided by height in meters squared,
- age, age in years,
- tgresult, level of triglycerides (mg/dL) in the blood,
- tgresult2, tgresult squared.
Provide an interpretation of the results that are reported in Table 3.1 below. Write down the models that the researcher has estimated. Explain what estimator the researcher has used and provide details on the optimization procedure the researcher adopted. Discuss the interpretation of the coefficients and of the tests reported in Table 3.1. Based on these results can you conclude that the higher propensity of black people for high blood pressure is caused by diet? Explain in detail your answer using any test that might be useful.
Table 3.1
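. reg highbp black female age

      Source |       SS           df       MS      Number of obs   =    10,351
-------------+----------------------------------   F(3, 10347)     =    597.93
       Model |  373.211904         3  124.403968   Prob > F        =    0.0000
    Residual |  2152.78558    10,347  .208058914   R-squared       =    0.1477
-------------+----------------------------------   Adj R-squared   =    0.1475
       Total |  2525.99749    10,350  .244057728   Root MSE        =    .45613

------------------------------------------------------------------------------
      highbp | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       black |    .105039   .0146385     7.18   0.000     .0763447    .1337333
      female |   -.091399   .0089789   -10.18   0.000    -.1089993   -.0737987
         age |   .0106311   .0002606    40.80   0.000     .0101203     .011142
       _cons |  -.0460855   .0140881    -3.27   0.001    -.0737008   -.0184702
------------------------------------------------------------------------------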

. reg highbp black female bmi tgresult tgresult2

      Source |       SS           df       MS      Number of obs   =     5,050
-------------+----------------------------------   F(5, 5044)      =    146.98
       Model |   153.92635         5  30.7852701   Prob > F        =    0.0000
    Residual |  1056.46098     5,044  .209449044   R-squared       =    0.1272
-------------+----------------------------------   Adj R-squared   =    0.1263
       Total |  1210.38733     5,049   .23972813   Root MSE        =    .45766

------------------------------------------------------------------------------
      highbp | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       black |    .110363   .0216769     5.09   0.000     .0678669    .1528591
      female |  -.0695884   .0129744    -5.36   0.000    -.0950238   -.0441531
         bmi |   .0272994   .0013635    20.02   0.000     .0246265    .0299724
    tgresult |   .0010101   .0001101     9.17   0.000     .0007942     .001226
   tgresult2 |  -5.66e-07   1.10e-07    -5.16   0.000    -7.81e-07   -3.51e-07
       _cons |  -.4018466   .0348891   -11.52   0.000    -.4702444   -.3334488
------------------------------------------------------------------------------

. test bmi tgresult tgresult2

 ( 1)  bmi = 0
 ( 2)  tgresult = 0
 ( 3)  tgresult2 = 0

       F(  3,  5044) =  220.53
            Prob > F =    0.0000
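
As a reading aid for the first regression in Table 3.1, the header statistics can be recovered from the sums of squares printed in the same output. This is a sketch under the standard OLS definitions; $SS_M$, $SS_R$ and $SS_T$ are shorthand introduced here for the Model, Residual and Total rows and are not labels used in the exam.

```latex
\[
\begin{aligned}
F(3,\,10347) &= \frac{SS_M/3}{SS_R/10347} = \frac{124.403968}{0.208058914} \approx 597.93,\\[4pt]
R^2 &= \frac{SS_M}{SS_T} = \frac{373.211904}{2525.99749} \approx 0.1477,\\[4pt]
\text{Root MSE} &= \sqrt{SS_R/10347} = \sqrt{0.208058914} \approx 0.4561.
\end{aligned}
\]
```

The same three relations apply to the second regression, with 5 and 5,044 degrees of freedom.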
(b) [25%]
The researcher also produced the results in Table 3.2. Write down the models that the researcher has estimated. Explain what type of estimator the researcher has used and provide details on the optimization procedure the researcher adopted. Discuss the interpretation of the coefficients and the tests reported in Table 3.2. Consider a test procedure to decide which of the two probit models considered in Table 3.2 is to be preferred. Explain in detail your answer.
Table 3.2
. probit highbp black female age

Iteration 0:  log likelihood = -7050.7655
Iteration 1:  log likelihood = -6241.0373
Iteration 2:  log likelihood = -6238.8309
Iteration 3:  log likelihood = -6238.8309

Probit regression                                       Number of obs = 10,351
                                                        LR chi2(3)    = 1623.87
                                                        Prob > chi2   = 0.0000
Log likelihood = -6238.8309                             Pseudo R2     = 0.1152

------------------------------------------------------------------------------
      highbp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       black |    .302711    .042485     7.13   0.000     .2194419    .3859802
      female |  -.2743514   .0261651   -10.49   0.000    -.3256341   -.2230687
         age |      .0297   .0007895    37.62   0.000     .0281526    .0312475
       _cons |  -1.529171   .0427501   -35.77   0.000     -1.61296   -1.445383
------------------------------------------------------------------------------

. probit highbp black female bmi age tgresult tgresult2

Iteration 0:  log likelihood = -3395.4388
Iteration 1:  log likelihood =  -2804.804
Iteration 2:  log likelihood =  -2801.251
Iteration 3:  log likelihood = -2801.2489
Iteration 4:  log likelihood = -2801.2489

Probit regression                                       Number of obs =  5,050
                                                        LR chi2(6)    = 1188.38
                                                        Prob > chi2   = 0.0000
Log likelihood = -2801.2489                             Pseudo R2     = 0.1750

------------------------------------------------------------------------------
      highbp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       black |   .3658577   .0643065     5.69   0.000     .2398193    .4918961
      female |  -.2607344   .0391055    -6.67   0.000    -.3373798   -.1840891
         bmi |   .0751805   .0043272    17.37   0.000     .0666995    .0836616
         age |   .0264713   .0011995    22.07   0.000     .0241203    .0288224
    tgresult |   .0018013   .0003208     5.62   0.000     .0011726      .00243
   tgresult2 |  -9.67e-07   3.02e-07    -3.21   0.001    -1.56e-06   -3.76e-07
       _cons |  -3.642458   .1256051   -29.00   0.000    -3.888639   -3.396276
------------------------------------------------------------------------------
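
The LR chi2 and Pseudo R2 entries in the headers of Table 3.2 can be reproduced from the iteration logs. The sketch below assumes the usual reading in which the Iteration 0 value is the log likelihood of the constant-only model; $\ell_0$ (Iteration 0) and $\ell_1$ (final iteration) are notation introduced here, not taken from the exam.

```latex
\[
\begin{aligned}
\text{Model 1:}\quad LR &= 2(\ell_1 - \ell_0) = 2\,(-6238.8309 + 7050.7655) \approx 1623.87,
& 1 - \frac{\ell_1}{\ell_0} &= 1 - \frac{6238.8309}{7050.7655} \approx 0.1152,\\[4pt]
\text{Model 2:}\quad LR &= 2(\ell_1 - \ell_0) = 2\,(-2801.2489 + 3395.4388) \approx 1188.38,
& 1 - \frac{\ell_1}{\ell_0} &= 1 - \frac{2801.2489}{3395.4388} \approx 0.1750.
\end{aligned}
\]
```

Each statistic compares a model with its own constant-only benchmark, estimated on that model's own sample.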
(c) [30%]
Provide an explanation line by line of the Stata code that the researcher has used to produce the results in Tables 3.3 and 3.4. Explain in detail what the reported ‘scaleprobit’ and ‘scaleprobit2’ are. Use the results in Tables 3.3 and 3.4 to provide comments on the average marginal effect for each explanatory variable including tgresult. Based on these results can you conclude that the higher probability of black people to have high blood pressure is caused by diet? How would you assess the goodness of fit of the model in Table 3.4?
Table 3.3
. probit highbp black female age

Iteration 0:  log likelihood = -7050.7655
Iteration 1:  log likelihood = -6241.0373
Iteration 2:  log likelihood = -6238.8309
Iteration 3:  log likelihood = -6238.8309

Probit regression                                       Number of obs = 10,351
                                                        LR chi2(3)    = 1623.87
                                                        Prob > chi2   = 0.0000
Log likelihood = -6238.8309                             Pseudo R2     = 0.1152

------------------------------------------------------------------------------
      highbp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       black |    .302711    .042485     7.13   0.000     .2194419    .3859802
      female |  -.2743514   .0261651   -10.49   0.000    -.3256341   -.2230687
         age |      .0297   .0007895    37.62   0.000     .0281526    .0312475
       _cons |  -1.529171   .0427501   -35.77   0.000     -1.61296   -1.445383
------------------------------------------------------------------------------

. predict xbprobit, xb

. gen scaleprobit=normalden(xbprobit)

. sum scaleprobit

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
 scaleprobit |     10,351    .3426472    .0568408   .1919711   .3989289
Table 3.4
. probit highbp black female bmi age tgresult tgresult2

Iteration 0:  log likelihood = -3395.4388
Iteration 1:  log likelihood =  -2804.804
Iteration 2:  log likelihood =  -2801.251
Iteration 3:  log likelihood = -2801.2489
Iteration 4:  log likelihood = -2801.2489

Probit regression                                       Number of obs =  5,050
                                                        LR chi2(6)    = 1188.38
                                                        Prob > chi2   = 0.0000
Log likelihood = -2801.2489                             Pseudo R2     = 0.1750

------------------------------------------------------------------------------
      highbp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       black |   .3658577   .0643065     5.69   0.000     .2398193    .4918961
      female |  -.2607344   .0391055    -6.67   0.000    -.3373798   -.1840891
         bmi |   .0751805   .0043272    17.37   0.000     .0666995    .0836616
         age |   .0264713   .0011995    22.07   0.000     .0241203    .0288224
    tgresult |   .0018013   .0003208     5.62   0.000     .0011726      .00243
   tgresult2 |  -9.67e-07   3.02e-07    -3.21   0.001    -1.56e-06   -3.76e-07
       _cons |  -3.642458   .1256051   -29.00   0.000    -3.888639   -3.396276
------------------------------------------------------------------------------

. predict xbprobit2, xb
(5,301 missing values generated)

. gen scaleprobit2=normalden(xbprobit2)
(5,301 missing values generated)

. sum scaleprobit2

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
scaleprobit2 |      5,050    .3139517    .0892273   .0280051   .3989423

. predict highbphat, p
(5,301 missing values generated)

. gen phighbp=highbphat>0.5

. tab phighbp highbp

           |  High blood pressure
   phighbp |         0          1 |     Total
-----------+----------------------+----------
         0 |     2,399        869 |     3,268
         1 |     3,576      3,507 |     7,083
-----------+----------------------+----------
     Total |     5,975      4,376 |    10,351
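
One common reading of the means reported for ‘scaleprobit’ and ‘scaleprobit2’ is as the sample average of the standard normal density evaluated at each fitted index, which turns a probit coefficient on a continuous regressor into an approximate average marginal effect. The lines below apply that reading to age only, as an illustration under that assumption and not as a model answer; the symbol $\bar{\phi}$ is notation introduced here.

```latex
\[
\begin{aligned}
\text{Table 3.3:}\quad \hat{\beta}_{age}\,\bar{\phi} &= 0.0297 \times 0.3426472 \approx 0.0102,\\[4pt]
\text{Table 3.4:}\quad \hat{\beta}_{age}\,\bar{\phi} &= 0.0264713 \times 0.3139517 \approx 0.0083.
\end{aligned}
\]
```

The simple product is less informative for the dummy variables, whose effect is a discrete change in probability, and for tgresult, whose marginal effect also involves the coefficient on tgresult2.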
(d) [25%]
Using the sample and explanatory variables described above, discuss how you would estimate a model to predict blood pressure if the dependent variable took four ordered values: 0 for normal blood pressure, 1 for elevated blood pressure, 2 for high blood pressure and 3 for very high blood pressure. Write the likelihood for the model and provide an interpretation for all the parameters included in the likelihood. Make sure to adapt formulas and discussion to this specific empirical example.
Question 4
1,200 words limit
(a) [25%]
For a sample of individuals involved in car accidents you observe their insurance reimbursement for health expenditure. Each individual got a reimbursement for health expenditure up to a maximum of £100,000. You cannot observe the individual health expenditure, but you can observe the reimbursement. Explain what type of model you would use to explain the individual health expenditure in pounds using, as explanatory variables, the value of the car before the accident in pounds, the age of the individual in years, and the number of days of hospitalization of the individual. Write down the model and explain how you would interpret the coefficients in this model. Explain why the ordinary least squares estimator would be biased and inconsistent.
(b) [20%]
Using the sample in (4.a) and dropping all observations with health expenditure over £100,000, consider a truncated regression. Define the truncated model by adapting any formula to the specific example in (4.a) and explain why the maximum likelihood estimator of this truncated model is consistent but inefficient.
(c) [30%]
Write down the likelihoods for the models discussed in (4.a) and (4.b) and define each of the parameters and variables. Make sure to adapt formulas and discussion to the specific example.
(d) [25%]
Now assume that you observe a sample of individuals involved in car accidents whose insurance company has a fixed excess of £200 but no maximum threshold for reimbursements for health expenditure. This implies that all individuals with damage of less than £200 do not get any payment from the insurance company. For this reason, all people with a health expenditure of less than £200 did not file any claim and are not included in the sample. Explain what type of model you would use to explain the amount of health expenditure in pounds using, as explanatory variables, the value of the car before the accident in pounds, the age of the individual in years, and the number of days of hospitalization of the individual. Write down the model and explain how you would interpret the coefficients in such a model.