** Chapter 6.4 ** for W.H. Greene, Econometric Analysis 6th ed. *****************
* (c) Noel Roy 2003, 2008
*
*                   MODELING AND TESTING FOR A STRUCTURAL BREAK
*
* The tutorial for this chapter will review the following tasks in SHAZAM:
* •	Use of temporary variables
* •	Use of dummy variables
* •	Storing covariance matrix of estimates (/COV= option)
* •	PDF and CDF calculation (DISTRIB command)
* •	Conditional execution (IF command)
* •	Tests of parameter instability: the Chow test (DIAGNOS command)
*
*===============================================================================
*
* Example 6.7 (p. 125) Structural Break in the Gasoline Market
*
* Repeat the setup in Example 4.4. 
TIME 1953.0 1
SAMPLE 1953.0 2004.0
READ (TableF2-2.txt) Year,GasExp,Pop,Gasp,Income,PNC,PUC,PPT,PD,PN,PS /SKIPLINES=1 
*
GENR lnGpop = LOG(GasExp/Pop/Gasp)
GENR lnincome = LOG(Income)
GENR lnPg = LOG(Gasp)
GENR lnPnc = LOG(Pnc)
GENR lnPuc = LOG(Puc)
*
* Replicate Figure 6.5
*
GENR G=GasExp/Gasp
GRAPH Gasp G /NOKEY
*
* Calculate time trend 
GENR t=year-1952
* Overall regression
OLS lnGpop lnPg lnincome lnPnc lnPuc t / LOGLOG
*
* 6.4.1 Different Parameter Vectors 
*
* Test for a structural break in the model after the OPEC price shock in 1973.
* The DIAGNOS command performs numerous diagnostic tests. Its CHOWONE= option
* performs the Chow test done in the text. The value given to CHOWONE= is the
* number of observations before the break, that is, the position of the last
* observation in the first subperiod. With 21 observations in the preshock
* period 1953-1973, the test is implemented using the option CHOWONE=21. The
* DIAGNOS command must immediately follow an estimation command. See Chapter
* 14 of the SHAZAM Manual for further information on the DIAGNOS command.
* 
DIAGNOS / CHOWONE=21
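*
* For reference, the statistic reported by CHOWONE= should be the standard
* Chow F test. Written (as a sketch, in LaTeX notation) in terms of the sums
* of squared errors computed below, with SSE from the pooled 1953-2004
* regression and SSE1, SSE2 from the two subperiods:
*
*   F = \frac{(SSE - SSE_1 - SSE_2)/K}{(SSE_1 + SSE_2)/(N_1 + N_2 - 2K)}
*       \sim F[K, N_1 + N_2 - 2K]
*
* Here K = 6, N1 = 21 (1953-1973) and N2 = 31 (1974-2004), so the degrees
* of freedom should be 6 and 21 + 31 - 12 = 40.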
*
* Now calculate the Chow test statistic the "long way". Note that $N (number
* of observations), $SSE (sum of squared errors), and $K (number of
* coefficients, including the constant term) are temporary variables
* available following an OLS command.
*
* (Save the estimated coefficients and the covariance matrix of the
* coefficients with the COEF= and COV= options; they are used later to
* calculate the Wald statistic (6-17).)
*
OLS lnGpop lnPg lnincome lnPnc lnPuc t 
GEN1 SSE=$SSE
GEN1 K=$K
SAMPLE 1953.0 1973.0
OLS lnGpop lnPg lnincome lnPnc lnPuc t / COEF=Theta1 COV=V1 
GEN1 SSE1=$SSE
GEN1 N1=$N
SAMPLE 1974.0 2004.0
OLS lnGpop lnPg lnincome lnPnc lnPuc t / COEF=Theta2 COV=V2 
GEN1 SSE2=$SSE
GEN1 N2=$N
*
* Now compute the test statistic.
*
GEN1 DF1=K
GEN1 DF2=N1+N2-2*K
GEN1 F=((SSE-SSE1-SSE2)/DF1)/((SSE1+SSE2)/DF2) 
PRINT F DF1 DF2
*
* The DISTRIB command computes the probability density function (PDF) and
* the cumulative distribution function (CDF) for a number of distributions.
* This is useful when adequate statistical tables are unavailable. The
* general format is: DISTRIB vars / options, where vars is a list of
* variables and options is the list of options required by the specified
* distribution. The TYPE= option specifies the type of distribution, and the
* DF= option (DF1= and DF2= for the F distribution) specifies the degrees of
* freedom. So, the command
* 
DISTRIB F /TYPE=F DF1=DF1 DF2=DF2
*
* gives the p-value of the test statistic (the area in the upper tail of the
* distribution) as 1-CDF. See Chapter 36 of the SHAZAM Manual for further
* information on the DISTRIB command.
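*
* If you prefer to store the p-value, p = 1 - CDF = Prob[F(DF1,DF2) > F],
* rather than read it off the printed output, a sketch follows. It assumes
* that DISTRIB accepts a CDF= option that saves the CDF value in a variable
* (check Chapter 36 of the manual before relying on it); FCDF and PVAL are
* hypothetical names. The lines are left commented out so that the script
* runs unchanged:
*
* DISTRIB F /TYPE=F DF1=DF1 DF2=DF2 CDF=FCDF
* GEN1 PVAL=1-FCDF
* PRINT PVAL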
*
* 6.4.2 Insufficient Observations
*
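* Test whether the observations for 1974, 1975, 1980, and 1981 are
* consistent with the regression estimated from the remaining observations.
* A period of four observations is too short to test for stability using
* the standard test: with fewer observations (4) than coefficients (K = 6),
* a separate regression cannot be estimated for that period. We must
* therefore use the expression (6-15).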
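*
* As a sketch in LaTeX notation (SSE_{48} is our own label for the $SSE of
* the regression that omits the four years, and SSE is the full-sample value
* saved earlier), the statistic (6-15) as implemented below is
*
*   F = \frac{(SSE - SSE_{48})/4}{SSE_{48}/(48 - K)} \sim F[4, 48 - K]
*
* The retained sample has 21 + 4 + 23 = 48 observations (1953-1973,
* 1976-1979, 1982-2004), so with K = 6 the value of DF2 should be 42.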
*
SAMPLE 1953.0 1973.0 1976.0 1979.0 1982.0 2004.0
* This SAMPLE command defines the sample as 1953-1973, 1976-1979, and
* 1982-2004. The four years 1974, 1975, 1980, and 1981 are excluded.
?OLS lnGpop lnPg lnincome lnPnc lnPuc t 
*
* Now compute the test statistic (6-15).
*
GEN1 DF1=4
GEN1 DF2=$N-K
GEN1 FSTAT=((SSE-$SSE)/DF1)/($SSE/DF2)
PRINT FSTAT DF1 DF2
DISTRIB FSTAT /TYPE=F DF1=DF1 DF2=DF2
*
* An alternative method of calculating this statistic uses the full sample
* with a dummy variable for each of the four excluded years. These dummies
* can be created with the DUM function or, alternatively, with the IF
* command.
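*
* Why the dummy-variable route gives the same statistic (a sketch of the
* argument in LaTeX notation, with D the 52 x 4 matrix of year dummies):
*
*   y = X\beta + D\gamma + \varepsilon
*
* Each dummy equals 1 in exactly one year, so its coefficient absorbs that
* observation completely and its residual is zero. The SSE of this augmented
* regression therefore equals the SSE of the 48-observation regression, and
* its residual degrees of freedom are 52 - (K + 4) = 48 - K = 42. Testing
* \gamma = 0 with the usual F test reproduces (6-15).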
*
* Restore the full sample.
SAMPLE 1953.0 2004.0
* Define the dummy variables
GENR Y1974 = 0
GENR Y1975 = 0
GENR Y1980 = 0
GENR Y1981 = 0
*
* The IF command sets a variable to a given value
* when a logical condition is satisfied.
*
IF (YEAR .EQ. 1974) Y1974 = 1
IF (YEAR .EQ. 1975) Y1975 = 1
IF (YEAR .EQ. 1980) Y1980 = 1
IF (YEAR .EQ. 1981) Y1981 = 1
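*
* The same dummies could be built with the DUM function instead. A sketch,
* assuming DUM(x) returns 1 when x > 0 and 0 otherwise (verify this in the
* GENR chapter of the manual): each product below is 1 only in the named
* year, because YEAR takes integer values. The lines are left commented out
* since the IF commands above already define these variables:
*
* GENR Y1974 = DUM(YEAR-1973)*DUM(1975-YEAR)
* GENR Y1975 = DUM(YEAR-1974)*DUM(1976-YEAR)
* GENR Y1980 = DUM(YEAR-1979)*DUM(1981-YEAR)
* GENR Y1981 = DUM(YEAR-1980)*DUM(1982-YEAR)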
* Estimate the equation with dummy variables included.
?OLS lnGpop lnPg lnincome lnPnc lnPuc t Y1974 Y1975 Y1980 Y1981
*
* Now compute the test statistic.
*
GEN1 DF1=4
GEN1 DF2=$N-$K
GEN1 F=((SSE-$SSE)/DF1)/(($SSE)/DF2) 
PRINT F DF1 DF2
DISTRIB F /TYPE=F DF1=DF1 DF2=DF2
*
* 6.4.3 Change in a subset of coefficients
*
* Test the restriction that the coefficients in the two equations
* are the same apart from the constant term.
* Estimate the pooled model with different constant terms CON1 and CON2.
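*
* A sketch of this restricted model in LaTeX notation, where x_t holds the
* five non-constant regressors (lnPg, lnincome, lnPnc, lnPuc, t):
*
*   lnGpop_t = \alpha_1 CON1_t + \alpha_2 CON2_t + x_t'\beta + \varepsilon_t
*
* The unrestricted model (separate regressions) has 2K = 12 coefficients and
* this model has K + 1 = 7, so there are K - 1 = 5 restrictions. The
* restricted SSE is the $SSE of the pooled regression and the unrestricted
* SSE is SSE1 + SSE2, which is why DF1 = K - 1 and DF2 = N1 + N2 - 2K below.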
*
* The constants can be generated using the IF command.
*
GENR CON1 = 1
GENR CON2 = 1
IF (YEAR .LE. 1973) CON2 = 0
IF (YEAR .GT. 1973) CON1 = 0
*
* Since the constant terms are included in the variable list, the NOCONSTANT
* option must be used.
*
OLS LnGPOP CON1 CON2 lnpg lnincome lnPnc lnPuc t / NOCONSTANT
*
* Now compute the test statistic.
*
GEN1 DF1=K-1
GEN1 DF2=N1+N2-2*K
GEN1 F=(($SSE-SSE1-SSE2)/DF1)/((SSE1+SSE2)/DF2) 
PRINT F DF1 DF2
DISTRIB F /TYPE=F DF1=DF1 DF2=DF2
*
* Now suppose that, in the restricted model, the constant and the coefficients
* of LY (lnincome) and LPG (lnPg) may differ between the two periods.
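*
* As a sketch in LaTeX notation, this model keeps a single slope for each of
* lnPnc, lnPuc and t, but splits the constant, LY and LPG by period:
*
*   lnGpop_t = \alpha_1 CON1_t + \alpha_2 CON2_t + \beta_1 LY1_t + \beta_2 LY2_t
*            + \gamma_1 LPG1_t + \gamma_2 LPG2_t + \delta' z_t + \varepsilon_t
*
* with z_t = (lnPnc_t, lnPuc_t, t)'. It has K + 3 = 9 coefficients against
* 2K = 12 in the unrestricted model, hence the K - 3 = 3 restrictions used
* for DF1 below.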
*
GENR LY1 = CON1*lnincome
GENR LY2 = CON2*lnincome
GENR LPG1 = CON1*LnPG
GENR LPG2 = CON2*LnPG
?OLS LnGPOP CON1 CON2 LY1 LY2 LPG1 LPG2 LnPNC LnPUC t  / NOCONSTANT
*
* Now compute the test statistic.
*
GEN1 DF1=K-3
GEN1 DF2=N1+N2-2*K
GEN1 F=(($SSE-SSE1-SSE2)/DF1)/((SSE1+SSE2)/DF2)
PRINT F DF1 DF2
DISTRIB F /TYPE=F DF1=DF1 DF2=DF2
*
* 6.4.4 Tests of structural break with unequal variances
*
* Compute the Wald statistic (6-17) testing the null hypothesis
* that the difference between the parameters of the two time periods
* is zero. This requires the use of SHAZAM's matrix manipulation capabilities.
* See Chapter 33 of the SHAZAM Manual for more information on the MATRIX command.
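*
* In LaTeX notation, with \hat\theta_1, \hat\theta_2 the subperiod coefficient
* vectors (Theta1, Theta2) and V_1, V_2 their estimated covariance matrices
* (V1, V2), the statistic computed by the MATRIX command is
*
*   W = (\hat\theta_1 - \hat\theta_2)'(V_1 + V_2)^{-1}(\hat\theta_1 - \hat\theta_2)
*
* Under the null it is asymptotically chi-squared with K = 6 degrees of
* freedom. Adding the two covariance matrices relies on the subsamples being
* independent; the disturbance variances need not be equal.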
*
MATRIX W=(Theta1-Theta2)'*INV(V1+V2)*(Theta1-Theta2)
PRINT W
* Note that the value of this statistic is far greater than that reported in 
* Greene's textbook (158.753). Apparently Greene (incorrectly) compared 
* the preshock regression with the overall (1953-2004) regression, not the
* postshock regression.
*
* The Wald statistic is asymptotically chi-squared with K degrees of freedom,
* so we can use the DISTRIB command to calculate its p-value.
DISTRIB W /TYPE=CHI DF=K
STOP
*
*===============================================================================
*
* Updated September 2, 2008