Data Science & Machine Learning in Finance
- This assignment counts towards 35% of the overall course grade. This is an individual assessment. Answer all questions. Submissions are to be made electronically via the course Moodle page. Each part specifies further instructions. The grading weights are described below:
Part    Weight
[1]     10%
[2]     30%
[3]     30%
[4]     30%
- Results should be reported in a clear format. Avoid reporting numbers in scientific format, e.g. 7.2031e-06. All reported numbers should be rounded to two decimal places; for example, report 0.00 in place of 7.2031e-06.
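A minimal Python sketch of the rounding rule, assuming results are converted to strings for the report; `fmt2` is a hypothetical helper name. Fixed-point formatting (`.2f`) never switches to scientific notation. Mapping `-0.00` to `0.00` is a presentation choice in this sketch, not a stated requirement.

```python
def fmt2(value: float) -> str:
    """Format a number in fixed-point notation with two decimal places."""
    text = f"{value:.2f}"
    # Tiny negative values round to "-0.00"; report them as "0.00" instead.
    if text == "-0.00":
        text = "0.00"
    return text

print(fmt2(7.2031e-06))   # 0.00
print(fmt2(-7.2031e-06))  # 0.00
print(fmt2(0.04567))      # 0.05
```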
Report Organization. The requested results are described under Section (E.), Parts [1]-[4]; clearly number each part as [1]-[4] in the report. The contents are to be structured as follows:
Grading is based strictly on the precision of the results and the clarity of the visualisations. The final part is graded on the relevance of the finance analysis, as supported by the methodologies and empirical results.
Obtain data for the variables needed to construct and estimate the model in Section (D.). The data should cover the period 2000/01/03-2022/12/30 at a daily frequency. When acquiring the data, ensure that relevant characteristics, such as calendar dates and timestamps, are obtained, as these are essential throughout data cleaning and dataset arrangement.
(r_{t}) real log-returns, constructed from the Microsoft (MSFT) stock price acquired from WRDS^{1}
^{1}wharton.upenn.edu — Get Data, CRSP, Annual Update, Stock / Security Files, Daily Stock File
(r_{M,t}) real market log-returns, constructed from the S&P 500 composite market index acquired from WRDS
(r_{f,t}) real interest rates associated with US 10-year maturity Treasuries, acquired from FRED^{2}
(CPI) the US consumer price index, which may be used to transform nominal data into real terms^{3}
All series must be researched thoroughly to ensure consistency with the other variables in terms of economic interpretation, units, frequency, and other characteristics.
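To make units consistent, the annualised percentage yield in DGS10 and the monthly CPI index both need to be brought onto the same per-day log scale as the stock returns. The sketch below shows one possible convention, not the prescribed one. It assumes 252 trading days per year, treats the quoted yield as an effective annual rate (a simplification), and spreads each monthly CPI log change evenly over that month's trading days. All function names are hypothetical.

```python
import math

def daily_log_rf(yield_percent: float, periods_per_year: int = 252) -> float:
    """Convert an annual yield quoted in percent (e.g. 4.25) into a
    per-day continuously compounded rate. Assumes an effective annual quote."""
    return math.log(1.0 + yield_percent / 100.0) / periods_per_year

def daily_log_inflation(cpi_prev: float, cpi_curr: float, trading_days: int) -> float:
    """Spread the month-on-month log change in CPI evenly over the month's
    trading days (an assumed interpolation scheme)."""
    return math.log(cpi_curr / cpi_prev) / trading_days

def to_real(nominal_log_return: float, log_inflation: float) -> float:
    """Real log-return = nominal log-return minus log inflation."""
    return nominal_log_return - log_inflation
```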
The cleaned dataset must be arranged in both daily and weekly frequencies in preparation for various results requested in Section (E.).
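Because MSFT, the S&P 500 and DGS10 come from different sources, their calendars do not line up exactly; for example, the Treasury series can be missing on bond-market holidays when the stock market is open. Below is a minimal standard-library sketch of aligning series on calendar dates. It assumes each series has been read as (date string, value string) pairs in ISO `YYYY-MM-DD` format, and that missing values appear as blank strings or '.'. `parse_series` and `align` are hypothetical helpers.

```python
from datetime import datetime

def parse_series(rows, fmt="%Y-%m-%d"):
    """Convert (date_string, value_string) pairs into a {date: float} dict,
    skipping blank or '.' values (assumed missing-value markers)."""
    series = {}
    for day, value in rows:
        value = value.strip()
        if value in ("", "."):
            continue
        series[datetime.strptime(day, fmt).date()] = float(value)
    return series

def align(*series):
    """Inner-join several {date: value} dicts on the dates they share,
    returning date-sorted tuples (date, value_1, value_2, ...)."""
    common = set(series[0])
    for s in series[1:]:
        common &= set(s)
    return [(day, *(s[day] for s in series)) for day in sorted(common)]
```

An inner join keeps only dates present in every series, which is one design choice; forward-filling the rate series is another.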
^{2}fred.stlouisfed.org/ — key: DGS10
^{3}fred.stlouisfed.org/ — key: CPILFESL
Consider the capital asset pricing model characterised by the following specification, used to interrelate the real excess log-return on a given asset, r_{t} − r_{f,t}, where r_{f,t} is the risk-free rate, to the real market log-return, denoted by r_{M,t}:

. In partic

utive but limited span of data, between the market excess log-return r_{M,t} − r_{f,t} and an individual investment's excess log-return. The diagram below illustrates the overlapping windows (w), each including a calendar year of data:
[Figure 1 diagram: weekly timeline W1, W2, W3, W4, W5, W6, ..., W48, W49, W50, W51, W52, W53, W54, W55, ... spanned by overlapping rolling windows w1, w2, w3, w4, ...]
Figure 1: The timeline illustrates a rolling-window setup in which each iteration includes 52 consecutive weekly data points. W1, W2, ... refer to week numbers throughout the entire sample, and w_{i} refers to a rolling-window identifier.
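The window indexing in Figure 1 can be expressed as a short Python generator. The sketch assumes observations are stored in a zero-indexed list, so week W1 sits at index 0; `rolling_windows` is a hypothetical name. Each window advances by one week, so a sample of n weekly observations yields n - 51 windows of width 52.

```python
def rolling_windows(n_obs: int, width: int = 52):
    """Yield (start, end) index pairs (end exclusive) for every window of
    `width` consecutive observations, advancing one observation at a time."""
    for start in range(n_obs - width + 1):
        yield start, start + width

# 55 weekly observations give w1 = W1..W52, w2 = W2..W53, w3 = W3..W54, w4 = W4..W55.
print(list(rolling_windows(55)))  # [(0, 52), (1, 53), (2, 54), (3, 55)]
```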
Based on the dataset and instructions in Sections (B.)-(C.), complete the following parts.
Part [1] Construct daily real excess MSFT log-returns (xr) and daily real excess market log-returns (xrm). Report (i) the precise values of the two averages, xr% and xrm%, over the entire sample, expressed as daily net log-return averages in percent (rounded to two decimal places), and (ii) a diagram depicting the cumulative daily log-returns {r_{t}, r_{M,t}, r_{f,t}}, overlaid within the same diagram^{4} space, with the vertical axis showing the cumulative log-returns cumsum(rets) and the horizontal axis showing calendar time. (Mark: 10%)
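The Part [1] quantities can be sketched with the Python standard library. The sketch assumes the stock, market-index and risk-free series are already aligned on the same dates and expressed as daily log rates. All helper names are hypothetical, and plotting is omitted. One identity that may help when checking results: if the same daily log inflation π_t is subtracted from both r_t and r_{f,t}, it cancels in the excess return, since (r_t − π_t) − (r_{f,t} − π_t) = r_t − r_{f,t}.

```python
import math
from itertools import accumulate

def log_returns(prices):
    """Daily log-returns ln(P_t / P_{t-1}) from a price series."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def excess(returns, rf):
    """Element-wise excess returns r_t - r_{f,t}; inputs must share dates."""
    return [r - f for r, f in zip(returns, rf)]

def mean_pct(xs):
    """Sample average expressed in percent, rounded to two decimal places."""
    return round(100.0 * sum(xs) / len(xs), 2)

def cumulative(xs):
    """Running sum, i.e. cumsum(rets), for the cumulative log-return plot."""
    return list(accumulate(xs))
```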
Arrange the dataset, based on a calendar variable, so that all data are at a weekly frequency. The construction must be based on the first trading day of each week, so that each weekly observation spans the interval between the first trading days of two consecutive weeks. Proceed to Parts [2]-[4] using this data frequency. Each window within the rolling specification contains 52 adjacent weekly observations.
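One way to implement the weekly arrangement is to keep the first available observation in each calendar week and compute log-returns between those observations. This Python sketch assumes a date-sorted list of `(datetime.date, price)` pairs. It also assumes ISO weeks (Monday to Sunday) as the week definition. The helper names are hypothetical.

```python
import math
from datetime import date

def first_trading_day_each_week(observations):
    """Keep the earliest observation in each ISO calendar week.
    `observations` is a date-sorted list of (datetime.date, price) pairs."""
    weekly, seen = [], set()
    for day, price in observations:
        iso = day.isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        if key not in seen:
            seen.add(key)
            weekly.append((day, price))
    return weekly

def weekly_log_returns(weekly):
    """Log-return between the first trading days of consecutive weeks,
    stamped with the later date."""
    return [(d1, math.log(p1 / p0)) for (_, p0), (d1, p1) in zip(weekly, weekly[1:])]

# Usage: 4 Jan 2000 falls in the same ISO week as 3 Jan 2000 and is dropped.
obs = [(date(2000, 1, 3), 1.0), (date(2000, 1, 4), 2.0), (date(2000, 1, 10), 2.0)]
print(first_trading_day_each_week(obs))
```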
^{4}Overlaid diagrams can be created via various approaches, e.g. use plot(xseries,yseries); hold on, followed by additional plots in the same format, ending the last plot with hold off.
Part [2] Implement a restricted least squares model based on specification (1), in addition to the following constraints, implemented individually across two cases c_{j}, j = 1, 2:
α_{w} < τ (c_{2})

the results in three diagrams, one for each of the estimated variables, along the time horizon. For example, the first diagram depicts the two α^_{w} series from expressions (c_{1}) and (c_{2}) (overlaid in the same diagram on the vertical axis) versus time (horizontal axis, displayed as the year or a simplified date format), with the legend identifying each series as the outcome of expression (c_{1}) or (c_{2}).^{5} Lines must be clearly identifiable, either by color or by line pattern. Clearly label all axes with variable names and their units. (Mark: 30%)
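For constraint (c_{2}) with a single regressor, the restricted problem can be solved analytically. The least squares objective is convex, so if the unrestricted estimate satisfies α^_{w} ≤ τ, it is also the restricted solution; otherwise, the solution lies on the boundary α_{w} = τ. The Python sketch below applies this logic to one window under the following assumptions:

- The strict inequality α_{w} < τ is treated as α_{w} ≤ τ, since a strict constraint has no minimiser when it binds.
- x holds the window's market excess returns and y the asset's excess returns.
- (c_{1}), whose form is not reproduced here, is not covered.

Function names are hypothetical; the lsqlin route in [G2] is the computational alternative.

```python
def ols(x, y):
    """Unrestricted OLS of y on a constant and x; returns (alpha, beta)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta

def restricted_ls_alpha_upper(x, y, tau):
    """Least squares subject to alpha <= tau (one inequality constraint).
    If the unrestricted alpha is feasible it is optimal; otherwise the
    optimum sits on alpha = tau and beta is the no-intercept OLS slope
    of (y - tau) on x."""
    alpha, beta = ols(x, y)
    if alpha <= tau:
        return alpha, beta
    sxx0 = sum(xi * xi for xi in x)
    beta = sum(xi * (yi - tau) for xi, yi in zip(x, y)) / sxx0
    return tau, beta
```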
Part [3] Assume τ is now variable and implement the following optimization. Consider a linear regression of the LHS of expression (1) on the lagged value of λ^_{w}(c_{j}), i.e. for the two cases j = 1, 2:
xr_{t} = θ_{0} + θ_{1} λ^_{w,t−1}(c_{j}) + ν_{t}.
Compute the forecast MSE (FMSE) of 1-step-ahead predictions (based on 52 consecutive observations to predict one week ahead, using weekly data) for both expressions (c_{1}) and (c_{2}) throughout the series horizon, computed separately for every point in the grid for parameter τ ∈ [−1% : 0.01% : +1%], i.e. the points −1.00%, −0.99%, ..., +0.99%, +1.00%. Based on the FMSEs (lower FMSE is better) for each of the 201 cases within the grid search, report the best value τ^{*}_{j} for expressions (c_{1}) and (c_{2}), i.e. the value that generates the lowest FMSE, together with the FMSE(τ^{*}_{j}) values for both expressions (no additional comments). (Mark: 30%)
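The Part [3] search can be organised as in the Python sketch below, under these assumptions:

- The grid is built from integer basis points, so it contains exactly 201 points from −1% to +1%, stored as decimals (−0.01 to 0.01).
- `rolling_fmse` assumes `x_lag[t]` already holds the lagged regressor λ^_{w,t−1}(c_{j}), aligned with xr_{t}.
- `best_tau` takes a hypothetical callable that rebuilds the λ^ series for a given τ and returns its FMSE.

Each forecast is fitted only on the 52 observations before the forecast date.

```python
def tau_grid():
    """201 grid points from -1% to +1% in 0.01% steps, as decimals.
    Built from integer basis points to avoid floating-point drift."""
    return [k / 10000 for k in range(-100, 101)]

def rolling_fmse(y, x_lag, window=52):
    """One-step-ahead forecast MSE of y_t = theta0 + theta1 * x_lag_t + nu_t.
    x_lag[t] must already hold the lagged regressor; each forecast is
    estimated on the `window` observations strictly before t."""
    errors = []
    for t in range(window, len(y)):
        xs, ys = x_lag[t - window:t], y[t - window:t]
        mx, my = sum(xs) / window, sum(ys) / window
        sxx = sum((a - mx) ** 2 for a in xs)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        theta1 = 0.0 if sxx == 0 else sxy / sxx  # guard against a constant regressor
        theta0 = my - theta1 * mx
        errors.append((y[t] - (theta0 + theta1 * x_lag[t])) ** 2)
    if not errors:
        raise ValueError("need more observations than the window length")
    return sum(errors) / len(errors)

def best_tau(fmse_of_tau):
    """Evaluate a user-supplied FMSE(tau) function on the grid and return
    (tau_star, fmse_star), the grid point with the lowest FMSE."""
    return min(((tau, fmse_of_tau(tau)) for tau in tau_grid()), key=lambda p: p[1])
```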
Part [4] Explain, with comments, why predictions based on expressions (c_{1}) and (c_{2}) should produce the FMSE values (and ranking) reported above. Comments should draw on finance analysis in connection with the empirical framework developed in Parts [1]-[3]. The discussion may refer to the two references (F[5], F[6]). (Mark: 30%, 500 words)
While the calendar timeline may be handled via various approaches, using the following built-in libraries is recommended:
^{5}Similarly, a diagram including the β^ series from expressions (c_{1}) and (c_{2}), and a separate diagram including the λ^ series in
[G1] Lecture slides
[G2] Constrained least squares optimization with inequality constraints may be carried out using analytical or computational approaches, such as lsqlin, as described in The MathWorks Inc. Optimization Toolbox: Solve Constrained Linear Least-Squares Problems (lsqlin), 2022
[G3] Frank A. Wolak. An exact test for multiple inequality and equality constraints in the linear regression model. Journal of the American Statistical Association, 82(399):782–793, 1987
[G4] Chong Kiew Liew. Inequality Constrained Least-Squares Estimation. Journal of the American Statistical Association, 71(355):746–751, 1976
[G5] S&P U.S. Indices Methodology
[G6] Microsoft Annual Report (2023)