Part A (20 marks)

The file parta.dta contains data from a randomized intervention in which 1,000 children received a daily dose of fish oil for three months.


The study then compared the test scores of the treated students with those of a group of students who were randomly assigned a placebo. None of the participants knew whether they had been given the real fish oil or the placebo.


1) Using t-tests, explain whether the treated and control groups had, on average, the same characteristics before the intervention took place. Include the t-test table with your response.

                                                                                                                                     [5 marks]
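As an illustration of the arithmetic behind such a balance check, the following minimal sketch computes a pooled (equal-variance) two-sample t statistic in plain Python. The variable names and the baseline ages are invented for illustration and are not taken from parta.dta; the assignment itself expects the corresponding Stata output.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic assuming equal variances in both groups."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled sample variance: each group weighted by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error


# Hypothetical baseline ages (invented data, not from parta.dta).
treated_age = [10.0, 11.0, 12.0, 11.0, 10.0, 12.0]
control_age = [11.0, 10.0, 12.0, 11.0, 12.0, 10.0]

balance_t = pooled_t_statistic(treated_age, control_age)
shifted_t = pooled_t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In this invented example the two groups have identical mean ages, so the t statistic is zero and the groups look balanced on that characteristic; a statistic that is large in absolute value would instead point to a pre-existing difference.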


2) Estimate the impact of the intervention by comparing the outcome (the students’ test scores) after the intervention between the treatment and control groups. Show the t-test table and clearly explain the impact of the intervention (if any) and whether this impact is statistically significant.

                                                                                                                                      [5 marks]


3) Using an OLS regression, estimate the impact of the intervention by comparing the test scores of the treatment and control groups while controlling, in the same regression, for other covariates that might have affected the outcome. Explain whether your results differ between sub-questions 2) and 3). If so, explain which results give the more reliable estimate of the true impact. Include the regression table with your response.

                                                                                                                                     [5 marks]
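To see why a raw comparison of means and a regression with covariates can disagree, the sketch below solves the OLS normal equations in plain Python on a tiny invented dataset. The names treat, baseline and score are hypothetical, and the data are constructed so that score = 50 + 5*treat + 2*baseline exactly; nothing here comes from parta.dta.

```python
from statistics import mean


def ols_coefficients(covariates, outcome):
    """OLS coefficients (intercept first) from the normal equations (X'X) b = X'y."""
    X = [[1.0] + [float(v) for v in row] for row in covariates]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, outcome))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Invented data: the treated group happens to have a higher baseline score.
treat = [1, 1, 1, 0, 0, 0]
baseline = [10, 12, 14, 9, 11, 13]
score = [75, 79, 83, 68, 72, 76]

raw_difference = mean(score[:3]) - mean(score[3:])
intercept, treat_effect, baseline_effect = ols_coefficients(
    list(zip(treat, baseline)), score
)
```

Here the raw difference in means is 7, while the coefficient on treat is 5, because part of the raw gap reflects the treated group's higher baseline. In a well-randomized sample the two estimates should be close, and the covariates mainly improve precision.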


4) Test whether the OLS regression estimated in sub-question 3) violates any of the assumptions required for OLS to be reliable and BLUE. If there are any violations, try to correct them, clearly explaining your rationale for each correction. Note: some of you will receive files in which some of these violations cannot be corrected. In these cases, just explain briefly how you tried to correct them. Advice: do not spend more than an hour on sub-question 4.

                                                                                                                                      [5 marks]






Part B (30 marks)


The file partb.dta contains data from a non-randomized intervention. The intervention consisted of providing job training to people working in the fast food industry in New Jersey, USA. The training comprised courses on IT, numeracy and customer service. The people used as a “control group” also worked in the fast food industry, but in the state of Pennsylvania.


Independent researchers hope to investigate whether the intervention had any impact by comparing the change in earnings (measured in natural logarithms, variable learnings) of programme participants in New Jersey, before and after the programme was implemented, with that of the control group in Pennsylvania.


1) Estimate and interpret the impact of the programme using the difference-in-differences estimator with panel fixed effects. Provide the regression table with your response.

                                                                                                                                    [10 marks]
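The logic of the difference-in-differences estimator can be sketched with four group means. The log earnings below are invented for illustration and are not from partb.dta. In a balanced two-period panel, the fixed-effects coefficient on the treatment-by-period interaction should coincide with this difference of mean changes.

```python
from statistics import mean

# Hypothetical log earnings by state and period (invented data).
log_earnings = {
    ("NJ", "before"): [2.0, 2.2, 2.1],
    ("NJ", "after"): [2.4, 2.6, 2.5],
    ("PA", "before"): [1.9, 2.1, 2.0],
    ("PA", "after"): [2.0, 2.2, 2.1],
}

# Change over time within each group.
change_treated = mean(log_earnings[("NJ", "after")]) - mean(log_earnings[("NJ", "before")])
change_control = mean(log_earnings[("PA", "after")]) - mean(log_earnings[("PA", "before")])

# Difference-in-differences: treated change net of the control change.
did_estimate = change_treated - change_control
```

With these invented numbers the treated group's log earnings rise by about 0.4 and the control group's by about 0.1, so the estimate is about 0.3 log points. The estimate is only credible if the two groups would have followed parallel trends in the absence of the programme.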


2) Estimate and interpret the impact of the programme using the difference-in-differences estimator combined with kernel matching. To match people, use the following variables: bk kfc mc wendy. Provide the regression table with your response.

                                                                                                                                    [10 marks]


3) With the data provided, test whether the treatment and control groups were statistically similar before the intervention took place, and discuss whether this might affect the reliability of the difference-in-differences estimates obtained above. Provide the Stata output that supports your answer.

                                                                                                                                    [10 marks]



Part C (50 marks)

The file partc.dta contains data from a real policy programme implemented in Colombia in the 1990s that aimed to increase educational attainment among poor people. To this end, the World Bank gave secondary school vouchers to poor children who wished to continue their education at secondary level.


These vouchers covered about half of students’ schooling expenses and were renewable depending upon students’ performance.


Because the programme did not have enough funds to give vouchers to all poor children, the vouchers were allocated at random through a lottery among eligible households.


The variable won_lottry denotes whether the student won (=1) or lost (=0) the lottery.


The variable use_fin_aid denotes whether the student used the voucher or any other sort of scholarship (=1) or not (=0).


To estimate the impact of this school voucher programme, all students were tested after the intervention. The file provides the students’ test scores (lscores), both for those who won the voucher and for those who did not. Note that this test score variable is already measured in natural logarithms.





Questions for Part C:

1) Using simple OLS, estimate the following regression:

lscores = a + b1 won_lottry + b2 male + b3 base_age + error


Provide the regression table and interpret the coefficient on having won the lottery (variable won_lottry). In your interpretation, state clearly whether this variable has a significant impact on the dependent variable, the scores obtained (lscores), and discuss the magnitude of the coefficient.

                                                                                                                                    [10 marks]
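Because lscores is in natural logarithms, the coefficient on a 0/1 variable such as won_lottry is read as an approximate proportional change in the test score. The short sketch below converts a coefficient into the exact percentage change it implies. The value 0.05 is invented for illustration and is not an estimate from partc.dta.

```python
from math import exp


def log_coefficient_to_percent(coefficient):
    """Exact percentage change in the outcome implied by a dummy's coefficient
    when the dependent variable is in natural logarithms."""
    return 100.0 * (exp(coefficient) - 1.0)


hypothetical_b1 = 0.05  # invented value, not an estimate from the data
percent_effect = log_coefficient_to_percent(hypothetical_b1)
```

For small coefficients the exact figure (about 5.13% here) is close to 100 times the coefficient itself (5%), which is why the approximation is commonly used when interpreting log-outcome regressions.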


The regression estimated in question 1 above is likely to be biased. As you can see in the dataset, some students who won the lottery ended up not using the voucher. In addition, some students who did NOT win the school voucher still managed to go to secondary school because they obtained other scholarships or funding (use_fin_aid). Thus, a simple comparison of test scores between winners and losers of the lottery is likely to give a biased estimate of the effect of the intervention.


Researchers from MIT and Stanford have therefore suggested identifying the effect of this intervention on test scores using instrumental variables.


These researchers suggest investigating the impact of use_fin_aid on test scores. Since use_fin_aid is likely to be endogenous, they suggest using the lottery variable (won_lottry) as its instrument.


The researchers argue that having won the lottery (won_lottry) is a good instrument because it is random and very closely correlated with having obtained a school voucher.


2) Run an instrumental variable regression using lscores, the test score variable, as the dependent variable.


The main covariate of this regression is use_fin_aid. Since use_fin_aid is likely to be endogenous, use as its instrument whether or not the student won the lottery (variable won_lottry). In your IV regression, also control for male and base_age as additional covariates.


Interpret the results (the regression coefficients) of both the first and the second stage of the IV regression. Present the results of both stages as tables as well.

                                                                                                                                    [20 marks]
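The two stages can be illustrated with the simplest possible case: one binary instrument and no additional covariates, where the IV estimate reduces to the Wald ratio of the reduced form to the first stage. The data below are invented (they are not from partc.dta), and the sketch deliberately omits male and base_age, which the question asks you to include.

```python
from statistics import mean

# Invented records: (won_lottry, use_fin_aid, lscores).
records = [
    (1, 1, 4.2), (1, 1, 4.2), (1, 1, 4.2), (1, 0, 4.0),
    (0, 0, 4.0), (0, 0, 4.0), (0, 0, 4.0), (0, 1, 4.2),
]

winners = [r for r in records if r[0] == 1]
losers = [r for r in records if r[0] == 0]

# First stage: effect of winning the lottery on using financial aid.
first_stage = mean(r[1] for r in winners) - mean(r[1] for r in losers)

# Reduced form: effect of winning the lottery on log test scores.
reduced_form = mean(r[2] for r in winners) - mean(r[2] for r in losers)

# Wald / IV estimate: effect of using financial aid on log test scores.
iv_estimate = reduced_form / first_stage
```

In this invented example, winning raises the probability of using aid by 0.5 and raises log scores by about 0.1, so the IV estimate is about 0.2. A first stage close to zero would make the ratio unstable, which is one reason the strength of the instrument matters.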


3) Explain what characteristics a good instrument should have in order to deal with endogeneity, and whether the instrument used in question 2 above satisfies them. Show all the tests you used to reach your answer.

                                                                                                                                    [10 marks]



4) Based on an endogeneity test, which regression offers the more reliable estimate of the impact of the intervention: the OLS or the IV regression? Present the results of the endogeneity test and bear in mind the response you gave in question 3.


                                                                                                                                    [10 marks]

