**Question 1**

1,200-word limit

(a) [20%]

Using a random sample of workers, a researcher estimates the following log wage regression:

$$\log(\mathit{wage}_i) = \beta_0 + \beta_1 S_i + \beta_2\,\mathit{sibs}_i + \beta_3\,\mathit{wexp}_i + \beta_4\,(\mathit{wexp}_i)^2 + u_i, \qquad (1.1)$$

where $i = 1, \dots, n$ indexes individuals, $\log(\mathit{wage}_i)$ is the natural logarithm of the hourly wage, $S_i$ is schooling in years, $\mathit{sibs}_i$ is the number of siblings and $\mathit{wexp}_i$ is the number of years of work experience. Provide details on the interpretation of the coefficients. Explain why schooling $S_i$ could be endogenous and describe a potential cause of this endogeneity. Explain why the ordinary least squares estimator of model (1.1) is biased and inconsistent. More points will be given for a formal explanation. Make sure to adapt formulas and discussion to model (1.1).
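As a hedged sketch of the kind of formal argument part (a) asks for, the block below partials out the exogenous regressors of model (1.1). The symbol $\tilde S_i$ is introduced here for illustration and is not in the original text: it denotes the residual from regressing $S_i$ on a constant, $\mathit{sibs}_i$, $\mathit{wexp}_i$ and $(\mathit{wexp}_i)^2$. The sketch assumes those three regressors are exogenous.

```latex
% OLS slope on schooling after partialling out the other regressors
\hat\beta_1
  = \frac{\sum_{i=1}^{n} \tilde S_i \,\log(\mathit{wage}_i)}{\sum_{i=1}^{n} \tilde S_i^{2}}
  = \beta_1 + \frac{\sum_{i=1}^{n} \tilde S_i \, u_i}{\sum_{i=1}^{n} \tilde S_i^{2}}

% Probability limit as n grows
\operatorname{plim}_{n\to\infty} \hat\beta_1
  = \beta_1 + \frac{\operatorname{Cov}(\tilde S_i, u_i)}{\operatorname{Var}(\tilde S_i)}
```

The second term vanishes only if schooling is uncorrelated with $u_i$ once the other regressors are held fixed. If $u_i$ contains an unobserved factor that also drives schooling, the term is nonzero in expectation and in the limit, which is the sense in which OLS is both biased and inconsistent.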

(b) [15%]

Define two different instruments that could be used for schooling $S_i$ to solve the issue of endogeneity in model (1.1). Define the formula for the two-stage least squares estimation for model (1.1) using the two instruments you proposed and explain how the first- and second-stage estimates are computed. Make sure to adapt formulas and discussion to model (1.1).
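A minimal sketch of the two stages for model (1.1) could look as follows. The instruments $z_{1i}$ and $z_{2i}$ are placeholders introduced here, since the choice of instruments is left to the answer; the first-stage coefficients $\pi_j$, the first-stage error $v_i$ and the second-stage error $e_i$ are likewise illustrative notation.

```latex
% First stage: regress schooling on both instruments and all exogenous regressors
S_i = \pi_0 + \pi_1 z_{1i} + \pi_2 z_{2i} + \pi_3\,\mathit{sibs}_i
      + \pi_4\,\mathit{wexp}_i + \pi_5\,(\mathit{wexp}_i)^2 + v_i,
\qquad \hat S_i = \text{fitted value from this regression}

% Second stage: replace S_i by its fitted value in model (1.1)
\log(\mathit{wage}_i) = \beta_0 + \beta_1 \hat S_i + \beta_2\,\mathit{sibs}_i
      + \beta_3\,\mathit{wexp}_i + \beta_4\,(\mathit{wexp}_i)^2 + e_i

% Equivalent matrix form, with X = [1, S, sibs, wexp, wexp^2]
% and Z = [1, z_1, z_2, sibs, wexp, wexp^2]
\hat\beta_{2SLS} = (X' P_Z X)^{-1} X' P_Z\, y,
\qquad P_Z = Z (Z'Z)^{-1} Z'
```

Running the second stage by hand gives the same point estimates as the matrix formula, but its reported standard errors are not valid because they ignore that $\hat S_i$ is itself estimated.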

(c) [20%]

Explain what assumptions your instrumental variables must satisfy to produce a consistent estimation of model (1.1). Explain why you think the two instruments proposed in (1.b) satisfy these assumptions. Show that the instrumental variable estimation defined in (1.b) is consistent under these assumptions.

(d) [20%]

Explain how you would test for the validity of your instrumental variables. Explain also how you would test for whether there is an endogeneity issue in your model. Provide details on how you would perform these tests for model (1.1).

(e) [25%]

The researcher decided to estimate model (1.1) using two-stage least squares estimation and instrumenting schooling with the birth order (brthord), which is a variable taking value 1 if the individual is a first-born child, 2 if he/she is a second-born child, and so on. Do you think it is a valid instrument? Notice that the model is conditional on sibs, so variation in brthord is not explained by the number of siblings. Explain your answer in detail, first under the assumption that brthord is moderately correlated with sibs and then under the assumption that the correlation between brthord and sibs is almost 1.


**Question 2**

1,200-word limit

(a) [30%]

Discuss an empirical example of a panel data model where you would use a fixed effect estimation rather than a random effect estimation. The example must consider a panel data of children but you are free to consider any dependent variable. You could consider e.g. school test scores, cognitive skills, socio-emotional skills, health outcomes, level of competitiveness, measure of social skills, average wage for the aspired occupation, parental time investment in the child, expenditure in child private tuition, child health expenditure, hours of physical activity per week, time spent in school, calorie intake per day, height, birth weight, BMI, days of absence in school, etc. Write the regression equation and provide details on the dependent and explanatory variables, on the error term and on the interpretation of the coefficients. Explain how you would compute the fixed effect estimation for your defined model.
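Since the choice of example is left open, the following is only a generic sketch of the within (fixed effect) computation. All symbols are illustrative: $y_{it}$ is the chosen outcome for child $i$ in period $t$, $x_{it}$ the vector of time-varying explanatory variables, $\alpha_i$ the child-specific effect and $u_{it}$ the idiosyncratic error.

```latex
% Model with an unobserved child-specific effect
y_{it} = x_{it}'\beta + \alpha_i + u_{it}, \qquad i = 1,\dots,N,\; t = 1,\dots,T

% Within transformation: subtract child-level time averages, which removes alpha_i
\ddot y_{it} = y_{it} - \bar y_i, \qquad
\ddot x_{it} = x_{it} - \bar x_i, \qquad
\ddot y_{it} = \ddot x_{it}'\beta + \ddot u_{it}

% Fixed effect estimator: pooled OLS on the demeaned data
\hat\beta_{FE} =
  \Big( \sum_{i=1}^{N}\sum_{t=1}^{T} \ddot x_{it}\,\ddot x_{it}' \Big)^{-1}
  \sum_{i=1}^{N}\sum_{t=1}^{T} \ddot x_{it}\,\ddot y_{it}
```

Because the transformation removes anything constant within a child, coefficients on time-invariant variables cannot be estimated this way.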

(b) [35%]

What does the unobserved individual effect in the model defined in (2.a) capture? Explain the differences in the assumptions needed for the consistency of the fixed effect estimation and of the random effect estimation for the model you discussed in (2.a). Why is the fixed effect estimation more appropriate than the random effect estimation in the empirical example you discussed in (2.a)? Explain how you would perform a test to decide whether to adopt a random effect or a fixed effect estimation.
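One standard way to frame the choice between the two estimators is a Hausman-type comparison. The sketch below uses illustrative notation that does not appear in the question: $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are the estimates of the coefficients on the $k$ time-varying regressors, and $\alpha_i$ is the unobserved individual effect.

```latex
% Null hypothesis: the individual effect is uncorrelated with the regressors
H_0:\ \operatorname{Cov}(\alpha_i, x_{it}) = 0

% Hausman statistic, asymptotically chi-squared with k degrees of freedom under H_0
H = (\hat\beta_{FE} - \hat\beta_{RE})'
    \big[\widehat{\operatorname{Var}}(\hat\beta_{FE})
       - \widehat{\operatorname{Var}}(\hat\beta_{RE})\big]^{-1}
    (\hat\beta_{FE} - \hat\beta_{RE})
    \ \xrightarrow{d}\ \chi^2_k
```

Under the null both estimators are consistent and random effect is efficient, so a large value of $H$ is read as evidence against the random effect assumption.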

(c) [35%]

Discuss an empirical example of a panel data model where one of the explanatory variables is endogenous because it is correlated with unobserved variables that are relevant to explain both the dependent variable and the endogenous variable. The example must be different from those in lectures, seminars and past exams. Explain under which conditions the fixed effect estimation can solve such an issue of endogeneity. Explain which type of estimation you would adopt to solve the endogeneity issue if the conditions for the consistency of the fixed effect estimation were not satisfied.

**Question 3**

1,200-word limit

(a) [20%]

Black people are much more likely to have high blood pressure, and a researcher wants to understand whether this is caused by genetic differences between black and white people or by differences in diet between black and white people, which can lead to overweight and higher levels of triglycerides in the blood. For a sample of black and white people, the researcher can observe the following variables:

– highbp, a dummy variable taking value 1 if an individual has high blood pressure and 0 otherwise,

– black, a dummy variable taking value 1 if an individual is black and 0 if white,

– female, a dummy variable taking value 1 for women and 0 for men,

– bmi, the body mass index, which is weight in kilograms divided by height in meters squared,

– age, age in years,

– tgresult, level of triglycerides (mg/dL) in the blood,

– tgresult2, tgresult squared.

Provide an interpretation of the results that are reported in Table 3.1 below. Write down the models that the researcher has estimated. Explain what estimator the researcher has used and provide details on the optimization procedure the researcher adopted. Discuss the interpretation of the coefficients and of the tests reported in Table 3.1. Based on these results, can you conclude that the higher propensity of black people for high blood pressure is caused by diet? Explain your answer in detail using any test that might be useful.

Table 3.1

. reg highbp black female age

Number of obs = 10,351, F(3, 10347) = 597.93, Prob > F = 0.0000, R-squared = 0.1477, Adj R-squared = 0.1475, Root MSE = .45613

| Source | SS | df | MS |
|---|---|---|---|
| Model | 373.211904 | 3 | 124.403968 |
| Residual | 2152.78558 | 10,347 | .208058914 |
| Total | 2525.99749 | 10,350 | .244057728 |

| highbp | Coefficient | Std. err. | t | P>\|t\| | 95% conf. interval (lower) | 95% conf. interval (upper) |
|---|---|---|---|---|---|---|
| black | .105039 | .0146385 | 7.18 | 0.000 | .0763447 | .1337333 |
| female | -.091399 | .0089789 | -10.18 | 0.000 | -.1089993 | -.0737987 |
| age | .0106311 | .0002606 | 40.80 | 0.000 | .0101203 | .011142 |
| _cons | -.0460855 | .0140881 | -3.27 | 0.001 | -.0737008 | -.0184702 |

. reg highbp black female bmi tgresult tgresult2

Number of obs = 5,050, F(5, 5044) = 146.98, Prob > F = 0.0000, R-squared = 0.1272, Adj R-squared = 0.1263, Root MSE = .45766

| Source | SS | df | MS |
|---|---|---|---|
| Model | 153.92635 | 5 | 30.7852701 |
| Residual | 1056.46098 | 5,044 | .209449044 |
| Total | 1210.38733 | 5,049 | .23972813 |

| highbp | Coefficient | Std. err. | t | P>\|t\| | 95% conf. interval (lower) | 95% conf. interval (upper) |
|---|---|---|---|---|---|---|
| black | .110363 | .0216769 | 5.09 | 0.000 | .0678669 | .1528591 |
| female | -.0695884 | .0129744 | -5.36 | 0.000 | -.0950238 | -.0441531 |
| bmi | .0272994 | .0013635 | 20.02 | 0.000 | .0246265 | .0299724 |
| tgresult | .0010101 | .0001101 | 9.17 | 0.000 | .0007942 | .001226 |
| tgresult2 | -5.66e-07 | 1.10e-07 | -5.16 | 0.000 | -7.81e-07 | -3.51e-07 |
| _cons | -.4018466 | .0348891 | -11.52 | 0.000 | -.4702444 | -.3334488 |

. test bmi tgresult tgresult2

( 1) bmi = 0

( 2) tgresult = 0

( 3) tgresult2 = 0

F( 3, 5044) = 220.53

Prob > F = 0.0000
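When reading the Table 3.1 output, it can help to see how the summary statistics follow from the sums of squares and what the quadratic in tgresult implies. The Python sketch below only redoes arithmetic on figures copied from the regression of highbp on black, female, bmi, tgresult and tgresult2. The function names are made up for this illustration, and the turning point is approximate because the printed coefficients are rounded.

```python
# Figures copied from the regression in Table 3.1 that includes
# bmi, tgresult and tgresult2 (5,050 observations).
MODEL_SS, MODEL_DF = 153.92635, 5
RESID_SS, RESID_DF = 1056.46098, 5044
B_TG, B_TG2 = 0.0010101, -5.66e-07  # coefficients on tgresult and tgresult2


def r_squared(model_ss, resid_ss):
    """Share of the total sum of squares explained by the model."""
    return model_ss / (model_ss + resid_ss)


def overall_f(model_ss, model_df, resid_ss, resid_df):
    """F statistic for the joint significance of all slope coefficients."""
    return (model_ss / model_df) / (resid_ss / resid_df)


def turning_point(b_linear, b_squared):
    """Value of x where b_linear*x + b_squared*x**2 reaches its maximum
    (assumes b_squared is negative, as in the reported output)."""
    return -b_linear / (2 * b_squared)


r2 = r_squared(MODEL_SS, RESID_SS)
f_stat = overall_f(MODEL_SS, MODEL_DF, RESID_SS, RESID_DF)
tg_peak = turning_point(B_TG, B_TG2)

print(f"R-squared = {r2:.4f}")
print(f"F(5, 5044) = {f_stat:.2f}")
print(f"tgresult turning point = {tg_peak:.0f} mg/dL")
```

The first two printed values should agree with the reported R-squared = 0.1272 and F(5, 5044) = 146.98. The turning point of roughly 890 mg/dL indicates the triglyceride level up to which the estimated effect of tgresult on the probability of high blood pressure is positive.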


(b) [25%]

The researcher also produced the results in Table 3.2. Write down the models that the researcher has estimated. Explain what type of estimator the researcher has used and provide details on the optimization procedure the researcher adopted. Discuss the interpretation of the coefficients and the tests reported in Table 3.2. Consider a test procedure to decide which of the two probit models considered in Table 3.2 is to be preferred. Explain your answer in detail.
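As a hedged sketch of what "writing down the model" could look like for the probit specifications in Table 3.2, the block below states the response probability and the log-likelihood that the iteration log is maximizing. The coefficient labels $\beta_0,\dots,\beta_3$ and the shorthand $x_i'\beta$ are notation introduced here.

```latex
% Response probability for the specification with black, female and age
P(\mathit{highbp}_i = 1 \mid x_i) = \Phi(x_i'\beta), \qquad
x_i'\beta = \beta_0 + \beta_1\,\mathit{black}_i + \beta_2\,\mathit{female}_i + \beta_3\,\mathit{age}_i

% Log-likelihood maximized numerically over beta
\ell(\beta) = \sum_{i=1}^{n} \Big\{ \mathit{highbp}_i \,\ln \Phi(x_i'\beta)
      + (1 - \mathit{highbp}_i)\,\ln\big[1 - \Phi(x_i'\beta)\big] \Big\}
```

The larger specification adds bmi, tgresult and tgresult2 to the index $x_i'\beta$. Note that the two probits are reported on different numbers of observations (10,351 and 5,050), so a likelihood ratio comparison would need both models estimated on the same sample.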

Table 3.2

. probit highbp black female age

Iteration 0: log likelihood = -7050.7655

Iteration 1: log likelihood = -6241.0373

Iteration 2: log likelihood = -6238.8309

Iteration 3: log likelihood = -6238.8309

Probit regression: Number of obs = 10,351, LR chi2(3) = 1623.87, Prob > chi2 = 0.0000, Log likelihood = -6238.8309, Pseudo R2 = 0.1152

| highbp | Coefficient | Std. err. | z | P>\|z\| | 95% conf. interval (lower) | 95% conf. interval (upper) |
|---|---|---|---|---|---|---|
| black | .302711 | .042485 | 7.13 | 0.000 | .2194419 | .3859802 |
| female | -.2743514 | .0261651 | -10.49 | 0.000 | -.3256341 | -.2230687 |
| age | .0297 | .0007895 | 37.62 | 0.000 | .0281526 | .0312475 |
| _cons | -1.529171 | .0427501 | -35.77 | 0.000 | -1.61296 | -1.445383 |

. probit highbp black female bmi age tgresult tgresult2

Iteration 0: log likelihood = -3395.4388

Iteration 1: log likelihood = -2804.804

Iteration 2: log likelihood = -2801.251

Iteration 3: log likelihood = -2801.2489

Iteration 4: log likelihood = -2801.2489

Probit regression: Number of obs = 5,050, LR chi2(6) = 1188.38, Prob > chi2 = 0.0000, Log likelihood = -2801.2489, Pseudo R2 = 0.1750

| highbp | Coefficient | Std. err. | z | P>\|z\| | 95% conf. interval (lower) | 95% conf. interval (upper) |
|---|---|---|---|---|---|---|
| black | .3658577 | .0643065 | 5.69 | 0.000 | .2398193 | .4918961 |
| female | -.2607344 | .0391055 | -6.67 | 0.000 | -.3373798 | -.1840891 |
| bmi | .0751805 | .0043272 | 17.37 | 0.000 | .0666995 | .0836616 |
| age | .0264713 | .0011995 | 22.07 | 0.000 | .0241203 | .0288224 |
| tgresult | .0018013 | .0003208 | 5.62 | 0.000 | .0011726 | .00243 |
| tgresult2 | -9.67e-07 | 3.02e-07 | -3.21 | 0.001 | -1.56e-06 | -3.76e-07 |
| _cons | -3.642458 | .1256051 | -29.00 | 0.000 | -3.888639 | -3.396276 |

(c) [30%]

Provide a line-by-line explanation of the Stata code that the researcher has used to produce the results in Tables 3.3 and 3.4. Explain in detail what the reported ‘scaleprobit’ and ‘scaleprobit2’ are. Use the results in Tables 3.3 and 3.4 to provide comments on the average marginal effect for each explanatory variable including tgresult. Based on these results, can you conclude that the higher probability of black people to have high blood pressure is caused by diet? How would you assess the fit of the model in Table 3.4?
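One common fit measure for a binary model is the share of correctly predicted outcomes, which can be read off the cross-tabulation of phighbp against highbp reported with Table 3.4. The Python sketch below redoes that arithmetic; the dictionary layout and function name are illustrative only.

```python
# Cell counts copied from the `tab phighbp highbp` output reported with Table 3.4.
# Keys are (predicted phighbp, observed highbp).
CELLS = {
    (0, 0): 2399,
    (0, 1): 869,
    (1, 0): 3576,
    (1, 1): 3507,
}


def share_correct(cells):
    """Fraction of observations whose predicted class equals the observed class."""
    total = sum(cells.values())
    correct = cells[(0, 0)] + cells[(1, 1)]
    return correct / total


n_total = sum(CELLS.values())
pct_correct = 100 * share_correct(CELLS)

print(f"observations in the table: {n_total}")
print(f"percent correctly predicted: {pct_correct:.1f}")
```

The table covers 10,351 observations, while the probit behind highbphat uses 5,050. In Stata, numeric missing values are treated as larger than any number, so `highbphat>0.5` evaluates to 1 when highbphat is missing. The 5,301 observations outside the estimation sample therefore appear to be counted as predicted 1, and the raw percentage should be read with that caveat.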

Table 3.3

. probit highbp black female age

Iteration 0: log likelihood = -7050.7655

Iteration 1: log likelihood = -6241.0373

Iteration 2: log likelihood = -6238.8309

Iteration 3: log likelihood = -6238.8309

Probit regression: Number of obs = 10,351, LR chi2(3) = 1623.87, Prob > chi2 = 0.0000, Log likelihood = -6238.8309, Pseudo R2 = 0.1152

| highbp | Coefficient | Std. err. | z | P>\|z\| | 95% conf. interval (lower) | 95% conf. interval (upper) |
|---|---|---|---|---|---|---|
| black | .302711 | .042485 | 7.13 | 0.000 | .2194419 | .3859802 |
| female | -.2743514 | .0261651 | -10.49 | 0.000 | -.3256341 | -.2230687 |
| age | .0297 | .0007895 | 37.62 | 0.000 | .0281526 | .0312475 |
| _cons | -1.529171 | .0427501 | -35.77 | 0.000 | -1.61296 | -1.445383 |

. predict xbprobit, xb

. gen scaleprobit=normalden(xbprobit)

. sum scaleprobit

| Variable | Obs | Mean | Std. dev. | Min | Max |
|---|---|---|---|---|---|
| scaleprobit | 10,351 | .3426472 | .0568408 | .1919711 | .3989289 |
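The mean of scaleprobit in Table 3.3 is the sample average of the standard normal density evaluated at the fitted index, so multiplying it by a probit coefficient gives an approximate average marginal effect. The Python sketch below does this with the figures reported in Table 3.3; the function name is hypothetical. For the dummies black and female the product is only an approximation to the discrete change in probability.

```python
# Figures copied from Table 3.3 (probit of highbp on black, female, age).
MEAN_SCALE = 0.3426472  # mean of scaleprobit from `sum scaleprobit`
PROBIT_COEFS = {
    "black": 0.302711,
    "female": -0.2743514,
    "age": 0.0297,
}


def average_marginal_effects(coefs, mean_scale):
    """Scale each probit coefficient by the average normal density."""
    return {name: mean_scale * b for name, b in coefs.items()}


ame = average_marginal_effects(PROBIT_COEFS, MEAN_SCALE)

for name, effect in ame.items():
    print(f"{name}: {effect:.4f}")
```

The resulting values (about 0.104 for black, -0.094 for female and 0.0102 for age) are close to the linear probability coefficients on the same three variables in Table 3.1 (.105039, -.091399 and .0106311).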


Table 3.4

. probit highbp black female bmi age tgresult tgresult2

Iteration 0: log likelihood = -3395.4388

Iteration 1: log likelihood = -2804.804

Iteration 2: log likelihood = -2801.251

Iteration 3: log likelihood = -2801.2489

Iteration 4: log likelihood = -2801.2489

Probit regression: Number of obs = 5,050, LR chi2(6) = 1188.38, Prob > chi2 = 0.0000, Log likelihood = -2801.2489, Pseudo R2 = 0.1750

| highbp | Coefficient | Std. err. | z | P>\|z\| | 95% conf. interval (lower) | 95% conf. interval (upper) |
|---|---|---|---|---|---|---|
| black | .3658577 | .0643065 | 5.69 | 0.000 | .2398193 | .4918961 |
| female | -.2607344 | .0391055 | -6.67 | 0.000 | -.3373798 | -.1840891 |
| bmi | .0751805 | .0043272 | 17.37 | 0.000 | .0666995 | .0836616 |
| age | .0264713 | .0011995 | 22.07 | 0.000 | .0241203 | .0288224 |
| tgresult | .0018013 | .0003208 | 5.62 | 0.000 | .0011726 | .00243 |
| tgresult2 | -9.67e-07 | 3.02e-07 | -3.21 | 0.001 | -1.56e-06 | -3.76e-07 |
| _cons | -3.642458 | .1256051 | -29.00 | 0.000 | -3.888639 | -3.396276 |

. predict xbprobit2, xb

(5301 missing values generated)

. gen scaleprobit2=normalden(xbprobit2)

(5,301 missing values generated)

. sum scaleprobit2

| Variable | Obs | Mean | Std. dev. | Min | Max |
|---|---|---|---|---|---|
| scaleprobit2 | 5,050 | .3139517 | .0892273 | .0280051 | .3989423 |

. predict highbphat, p

(5,301 missing values generated)

. gen phighbp=highbphat>0.5

. tab phighbp highbp

| phighbp | High blood pressure = 0 | High blood pressure = 1 | Total |
|---|---|---|---|
| 0 | 2,399 | 869 | 3,268 |
| 1 | 3,576 | 3,507 | 7,083 |
| Total | 5,975 | 4,376 | 10,351 |

(d) [25%]

By using the sample and explanatory variables described above, discuss how you would estimate a model to predict the blood pressure if the dependent variable took four ordered values: 0 for normal pressure, 1 for elevated blood pressure, 2 for high blood pressure and 3 for very high blood pressure. Write the likelihood for the model and provide an interpretation for all the parameters included in the likelihood. Make sure to adapt formulas and discussion to this specific empirical example.
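For part (d), one candidate is an ordered probit. The sketch below is written under that assumption and uses notation introduced here: $\mathit{bp}_i \in \{0,1,2,3\}$ is the four-category outcome, $\mathit{bp}_i^{*}$ a latent blood pressure index, $x_i$ the explanatory variables listed in part (a) of Question 3 (without a constant) and $\kappa_1 < \kappa_2 < \kappa_3$ the cut-off parameters.

```latex
% Latent index and observation rule
\mathit{bp}_i^{*} = x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \mid x_i \sim N(0,1)

\mathit{bp}_i =
\begin{cases}
0 & \text{if } \mathit{bp}_i^{*} \le \kappa_1 \\
1 & \text{if } \kappa_1 < \mathit{bp}_i^{*} \le \kappa_2 \\
2 & \text{if } \kappa_2 < \mathit{bp}_i^{*} \le \kappa_3 \\
3 & \text{if } \mathit{bp}_i^{*} > \kappa_3
\end{cases}

% Category probabilities
P(\mathit{bp}_i = 0 \mid x_i) = \Phi(\kappa_1 - x_i'\beta)
P(\mathit{bp}_i = 1 \mid x_i) = \Phi(\kappa_2 - x_i'\beta) - \Phi(\kappa_1 - x_i'\beta)
P(\mathit{bp}_i = 2 \mid x_i) = \Phi(\kappa_3 - x_i'\beta) - \Phi(\kappa_2 - x_i'\beta)
P(\mathit{bp}_i = 3 \mid x_i) = 1 - \Phi(\kappa_3 - x_i'\beta)

% Log-likelihood
\ell(\beta, \kappa_1, \kappa_2, \kappa_3)
  = \sum_{i=1}^{n} \sum_{j=0}^{3} \mathbf{1}[\mathit{bp}_i = j]\,
    \ln P(\mathit{bp}_i = j \mid x_i)
```

In this formulation the sign of each element of $\beta$ gives the direction in which a variable shifts the probabilities of the lowest and highest categories, while the effect on the two middle categories depends on the cut-offs as well.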


**Question 4**

1,200-word limit

(a) [25%]

For a sample of individuals involved in car accidents you observe their insurance reimbursement for health expenditure. Each individual got a reimbursement for health expenditure up to a maximum of £100,000. You cannot observe the individual health expenditure, but you can observe the reimbursement. Explain what type of model you would use to explain the individual health expenditure in pounds using, as explanatory variables, the value of the car before the accident in pounds, the age of the individual in years, and the number of days of hospitalization of the individual. Write down the model and explain how you would interpret the coefficients in this model. Explain why the ordinary least squares estimator would be biased and inconsistent.
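A censored regression is one natural candidate here. The sketch below assumes normal errors and uses variable names invented for illustration: $h_i^{*}$ is the unobserved health expenditure, $r_i$ the observed reimbursement, and $\mathit{carvalue}_i$, $\mathit{age}_i$ and $\mathit{hospdays}_i$ the three explanatory variables.

```latex
% Latent expenditure equation
h_i^{*} = \beta_0 + \beta_1\,\mathit{carvalue}_i + \beta_2\,\mathit{age}_i
          + \beta_3\,\mathit{hospdays}_i + \varepsilon_i,
\qquad \varepsilon_i \mid x_i \sim N(0, \sigma^2)

% Observation rule: reimbursement is expenditure capped at 100,000 pounds
r_i = \min(h_i^{*},\ 100{,}000)
```

In this setup each $\beta_j$ is the effect of a one-unit change in the regressor on expected health expenditure $h_i^{*}$, not on the capped reimbursement $r_i$.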

(b) [20%]

By considering the sample in (4.a) and dropping all observations with health expenditure over £100,000 consider a truncated regression. Define the truncated model by adapting any formula to the specific example in (4.a) and explain why the maximum likelihood estimator of this truncated model is consistent but inefficient.

(c) [30%]

Write down the likelihoods for the models discussed in (4.a) and (4.b) and define each of the parameters and variables. Make sure to adapt formulas and discussion to the specific example.
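The two likelihoods could be sketched as follows, assuming normal errors with variance $\sigma^2$. The notation is introduced here for illustration: $r_i$ is the observed reimbursement, $x_i$ collects a constant, the car value, age and days of hospitalization, and $\phi$ and $\Phi$ are the standard normal density and distribution function.

```latex
% Censored model (4.a): uncensored observations contribute a density,
% observations at the cap contribute the probability of exceeding it
L_C(\beta,\sigma) = \prod_{i=1}^{n}
  \left[ \frac{1}{\sigma}\,\phi\!\left(\frac{r_i - x_i'\beta}{\sigma}\right) \right]^{\mathbf{1}[r_i < 100{,}000]}
  \left[ 1 - \Phi\!\left(\frac{100{,}000 - x_i'\beta}{\sigma}\right) \right]^{\mathbf{1}[r_i = 100{,}000]}

% Truncated model (4.b): only observations below the cap are kept,
% and each density is rescaled by the probability of being below the cap
L_T(\beta,\sigma) = \prod_{i:\, r_i < 100{,}000}
  \frac{\dfrac{1}{\sigma}\,\phi\!\left(\dfrac{r_i - x_i'\beta}{\sigma}\right)}
       {\Phi\!\left(\dfrac{100{,}000 - x_i'\beta}{\sigma}\right)}
```

The truncated likelihood discards the information carried by the capped observations, which is the source of the efficiency loss mentioned in (4.b).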

(d) [25%]

Now assume that you observe a sample of individuals involved in car accidents whose insurance company has a fixed excess at £200 but no maximum threshold for reimbursements for health expenditure. This implies that all individuals with damage of less than £200 do not get any payment from the insurance company. For this reason, all people with a health expenditure of less than £200 did not file any claim and are not included in the sample. Explain what type of model you would use to explain the amount of health expenditure in pounds using, as explanatory variables, the value of the car before the accident in pounds, the age of the individual in years, and the number of days of hospitalization of the individual. Write down the model and explain how you would interpret the coefficients in such a model.
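Because individuals below the excess never enter the sample, a regression truncated from below at £200 is one candidate. The sketch assumes normal errors and uses illustrative notation: $h_i$ is health expenditure and $x_i$ collects a constant, the car value, age and days of hospitalization.

```latex
% Population model for health expenditure
h_i = x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \mid x_i \sim N(0, \sigma^2)

% Density of an observation conditional on being in the sample (h_i >= 200)
f(h_i \mid h_i \ge 200,\ x_i) =
  \frac{\dfrac{1}{\sigma}\,\phi\!\left(\dfrac{h_i - x_i'\beta}{\sigma}\right)}
       {1 - \Phi\!\left(\dfrac{200 - x_i'\beta}{\sigma}\right)}
```

Under these assumptions each element of $\beta$ describes the effect of a regressor on expected health expenditure in the whole population of accident victims, not only among those who filed a claim.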
