1. Open the dataset “mortgage_rates_and_house_prices”, which contains the 30-year fixed-rate mortgage interest rate and an index that summarizes house prices in the U.S., observed quarterly between 1975 and 2023. Use these data to answer all of the following:
a. Generate a clock variable that starts at 1 in the first row and increases by one in each subsequent row until the last row. Name this clock variable “time”. Then, declare the dataset as a time series using this clock variable.
b. Use the “time” variable in a simple regression setting to answer two separate questions concerning the linear trends of these two series:
i. What is the average change in the house price index with each new quarter?
ii. What is the average change in the mortgage rate with each new quarter?
c. Depict the mortgage rate and the house price index on two separate plots. Show them. By eyeballing the plots only, do these time series look stationary or non-stationary to you? Why? How can you tell? Explain in plain English.
d. Use two different flavors of the Dickey-Fuller (DF) test (the DF test with trend and the DF test with drift) to determine whether mortgage rates and house prices are non-stationary, trend-stationary, or difference-stationary. What do you conclude? Explain your reasoning.
i. State the null and the alternative hypotheses of the DF test with trend.
ii. State the null and the alternative hypotheses of the DF test with drift.
iii. Explain why you rejected or failed to reject the null hypotheses.
e. If both mortgage rates and house prices are non-stationary, check whether they are cointegrated using the Engle-Granger procedure. You will recall that a regression is involved in this procedure, so use good judgment to decide which series will be the dependent variable and which the independent variable.
i. Describe every step, in words, that you are taking in your .do file.
f. Based on the Engle-Granger test output, would running a regression between the house price index and mortgage rates produce spurious results? Why or why not? Explain.
g. Based on the results of the DF tests, conduct the appropriate treatment to induce stationarity in mortgage rates and house prices. Then, depict their stationary versions on two separate time series plots. Show them.
h. Now, create a detrended version of house prices. Name it “detrended_prices”.
i. Then, subject the detrended house prices to the DF test with drift.
ii. What are the null and alternative hypotheses? What is the conclusion of this test?
iii. Show the detrended house prices on a time series plot.
i. Now, create the differenced version of house prices. Name it “differences_prices”.
i. Then, subject the differenced house prices to the DF test with drift.
ii. What are the null and alternative hypotheses? What is the conclusion of this test?
iii. Show the differenced house prices on a time series plot.
j. Now, extract a cointegrating relationship between house prices in levels (raw form, not detrended or differenced) and mortgage rates in levels (raw form, not detrended or differenced) using a simple vector error correction (VEC) model. Again, use the appropriate variable as dependent here, and the other as independent.
i. Note the exact cointegrating equation in the Stata output once you run this model.
ii. Write that equation out.
iii. Interpret the coefficient of the independent variable here.
iv. Recall that unless a variable is explicitly logged, you default to a lin-lin interpretation, in whatever unit the variable is measured.
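For orientation, the mechanics of Question 1 could be laid out in a .do file roughly as follows. This is a sketch under stated assumptions, not a definitive solution: the variable names “mortgage_rate” and “house_price_index” are hypothetical (check the dataset for the real ones), the file is assumed to be in Stata format, and treating house prices as the dependent series is only one possible choice. Part (g) is left out because the right treatment depends on what the DF tests show.

```stata
* Sketch only. mortgage_rate and house_price_index are ASSUMED variable
* names; replace them with the names actually used in the dataset.
use "mortgage_rates_and_house_prices", clear

* (a) Clock variable and time-series declaration
generate time = _n
tsset time

* (b) Linear trends: the coefficient on time is the average change per quarter
regress house_price_index time
regress mortgage_rate time

* (c) Two separate time-series plots
tsline mortgage_rate, name(rate_plot, replace)
tsline house_price_index, name(price_plot, replace)

* (d) Dickey-Fuller tests, with trend and with drift, for each series
dfuller house_price_index, trend
dfuller house_price_index, drift
dfuller mortgage_rate, trend
dfuller mortgage_rate, drift

* (e) Engle-Granger: regress in levels, then test the residuals for a unit root.
* ASSUMPTION: house prices are treated as the dependent variable here.
* Note: the critical values dfuller reports are not the Engle-Granger ones,
* so compare the test statistic with the appropriate cointegration table.
regress house_price_index mortgage_rate
predict eg_resid, residuals
dfuller eg_resid, noconstant

* (h) Detrended house prices: residuals from the regression on the time trend
regress house_price_index time
predict detrended_prices, residuals
dfuller detrended_prices, drift
tsline detrended_prices, name(detrended_plot, replace)

* (i) Differenced house prices, using the D. time-series operator
generate differences_prices = D.house_price_index
dfuller differences_prices, drift
tsline differences_prices, name(differenced_plot, replace)

* (j) VEC model in levels; the output includes the cointegrating equation
vec house_price_index mortgage_rate
```

The “vec” command is run here with its default lag order and rank; whether those defaults suit these data is a judgment the analyst still has to make.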
2. Open the dataset “airtravel_monthly”, which gives a long-run time series of monthly domestic airline passengers in the United States from October 2002 through July 2023.
a. Generate a clock variable that starts at 1 in the first row and increases by one in each subsequent row until the last row. Name this clock variable “time”. Then, declare the dataset as a time series using this clock variable.
b. Run a regression with the current number of passengers as the dependent variable and, as independent variables, a linear trend and X lags of the number of passengers, where X should be identified correctly using the Akaike Information Criterion (AIC).
c. Then, generate the predicted (fitted) values of the number of passengers, your dependent variable.
d. Place both the actual and the predicted series on the same time series plot.
e. Perform a Chow structural break test with an unknown break date using the model you estimated in part (b). Note the estimated break date (given as a value of the clock variable). What month and year does it correspond to? Explain why the test finds what it does.
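The steps of Question 2 might be sketched in a .do file as shown below. The variable name “passengers” is an assumption, as is the maximum lag of 12 considered in the lag search; the values assigned to the locals “p” and “break_time” are placeholders to be replaced with what the output actually reports.

```stata
* Sketch only. passengers is an ASSUMED variable name.
use "airtravel_monthly", clear

* (a) Clock variable and time-series declaration
generate time = _n
tsset time

* (b) Lag selection: varsoc reports AIC for each lag order and marks the
* preferred one with an asterisk. maxlag(12) is an ASSUMPTION (one year).
varsoc passengers, maxlag(12) exog(time)

* PLACEHOLDER: set p to the lag order chosen by AIC above
local p = 2
regress passengers time L(1/`p').passengers

* (c) Fitted values of the dependent variable
predict passengers_hat, xb

* (d) Actual and fitted series on one plot
tsline passengers passengers_hat

* (e) Structural break test with an unknown break date
estat sbsingle

* Convert the reported break (a clock value) to a calendar month.
* time = 1 corresponds to October 2002, per the dataset description.
* PLACEHOLDER: set break_time to the clock value reported by the test
local break_time = 1
display %tm (tm(2002m10) + `break_time' - 1)
```

Because the locals are defined inside the do-file, the whole file should be run in one go; running the lines one at a time from the editor would drop the local values between runs.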
3. Open the dataset “Airlines”, which contains the daily number of airline passengers and the daily number of new COVID-19 cases in the United States between March 1 and October 31, 2020. Use these data to answer the following:
a. Generate a clock variable that starts at 1 in the first row and increases by one in each subsequent row until the last row. Name this clock variable “time”. Then, declare the dataset as a time series using this clock variable.
b. Create a time series plot of airline passengers.
c. Perform the Granger causality test between new COVID cases and the number of airline passengers. Use the optimal number of lags, as determined by AIC for both variables, in the vector autoregression (VAR) model to set up the Granger test. Recall that in a VAR, it doesn’t matter what you initially select as dependent and independent variables, since you’re estimating a system where they swap places anyway.
i. Describe every step that you are taking for this test procedure in your .do file.
ii. State the null hypothesis of the Granger test. State the alternative hypothesis.
iii. Using p = 0.10 as the threshold for statistical significance (as is sometimes done), what do you conclude based on the findings of the Granger test? Which variable “causes” which, if any? Which doesn’t cause which, if any?
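A possible outline of the Question 3 workflow is given below. It is a sketch under assumptions: the variable names “passengers” and “new_cases” are hypothetical, the maximum lag of 14 days in the lag search is an arbitrary choice, and the value assigned to the local “p” is a placeholder for whatever lag order AIC selects.

```stata
* Sketch only. passengers and new_cases are ASSUMED variable names.
use "Airlines", clear

* (a) Clock variable and time-series declaration
generate time = _n
tsset time

* (b) Time-series plot of airline passengers
tsline passengers

* (c) Lag selection for the two-variable system; the lag order preferred
* by AIC is marked with an asterisk. maxlag(14) is an ASSUMPTION.
varsoc passengers new_cases, maxlag(14)

* PLACEHOLDER: set p to the lag order chosen by AIC above
local p = 2
var passengers new_cases, lags(1/`p')

* Granger causality Wald tests for each equation of the VAR
vargranger
```

In the “vargranger” table, each row tests whether the lags of the “excluded” variable are jointly zero in the listed equation, so a p-value below the chosen threshold is read as evidence that the excluded variable Granger-causes the equation variable.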
4. Carefully address each of the sub-problems, as successful completion of one part requires the preceding part to be correct also. Use the dataset “Employment_06_07.dta”.
a. Generate a binary dummy variable called “white” that equals 1 if a person is white, and 0 otherwise. Rely on the non-binary indicator “race” to generate “white”.
b. Generate the log of earnings by taking the natural logarithm of “earnwke”. Call it “log_earning”.
c. Run a regression with “log_earning” as the dependent variable, and “white”, “union”, “age”, “unemployed”, and “female” as independent variables. Interpret the magnitude (size) of each effect (coefficient) for each independent variable. Use robust standard errors.
d. Now, run the same regression, but with an added interaction term between “white” and “female”. Don’t generate the interaction term manually. Use the hash symbols instead.
e. From the above regression in (d), give the actual magnitude (size) of the effect of being white on earnings, when female = 0. What is this effect relative to? Be sure to interpret coefficients correctly. Is this a log-lin, lin-log, … regression? Use the table in Notes Set 8 to help you with interpretation.
f. From the above regression in (d), give the actual magnitude (size) of the effect of being white on earnings, when female = 1. What is this effect relative to? Be sure to interpret coefficients correctly. Is this a log-lin, lin-log, … regression? Use the table in Notes Set 8 to help you with interpretation.
h. Give the actual magnitude (size) of the effect on earnings of being a white female relative to a white male.
i. Use the command that starts with margins, dydx(… to confirm your findings in (e) and (f) on the impact of “white” on earnings when the person is, separately, female and male.
j. Use the command marginsplot immediately after the preceding command to generate the plot of these conditional marginal effects. Which is statistically significant? Which is insignificant? Explain how you know.
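The commands for Question 4 could look roughly like the sketch below. One assumption matters a great deal: the coding race == 1 for white respondents is hypothetical and should be verified (for example with “tabulate race”) before use. The percent-effect lines use the exact conversion 100*(exp(b) - 1) for a dummy variable in a log-lin model, which may differ from the approximation given in the course notes.

```stata
* Sketch only. The coding race == 1 for "white" is an ASSUMPTION;
* verify it with: tabulate race
use "Employment_06_07.dta", clear

* (a) Dummy for white, left missing where race is missing
generate white = (race == 1) if !missing(race)

* (b) Natural log of weekly earnings
generate log_earning = ln(earnwke)

* (c) Baseline log-lin regression with robust standard errors
regress log_earning white union age unemployed female, vce(robust)

* (d) Same regression with the interaction built by factor-variable notation
regress log_earning i.white##i.female union age unemployed, vce(robust)

* (e) Effect of white when female = 0, as an exact percent change
display 100 * (exp(_b[1.white]) - 1)

* (f) Effect of white when female = 1: main effect plus interaction
display 100 * (exp(_b[1.white] + _b[1.white#1.female]) - 1)

* (i) Marginal effect of white (in log points) at female = 0 and female = 1
margins, dydx(white) at(female = (0 1))

* (j) Plot of the two conditional marginal effects with confidence intervals
marginsplot
```

The “margins” output is on the log scale, so it matches the raw coefficient sums rather than the percent figures printed by the “display” lines.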