** Chapter 3 ** for W.H. Greene, Econometric Analysis 6th ed. ******************
* (c) Noel Roy 2003, 2008
*
* LEAST SQUARES
*
* The tutorial for this chapter will review the following tasks in SHAZAM:
* • Reading data files (READ command)
* • Setting sample size (SAMPLE and TIME commands)
* • Creating variables (GENR command)
* • Generating a time trend (TIME function)
* • Generating lagged variables (LAG function)
* • Producing descriptive statistics (STAT command)
* • Correlation coefficients (STAT / PCOR)
* • Ordinary Least Squares regression (OLS command)
* • Analysis of Variance (OLS / ANOVA)
*
* You may run this command file in SHAZAM in order to replicate the examples
* in Chapter 3 of Greene's textbook.
*
*===============================================================================
*
* A line beginning with an asterisk is ignored by SHAZAM. Such lines will be
* used as comment lines to briefly describe the commands and options as they
* are initially introduced to the user. For a complete description of these
* and other options, see the SHAZAM User's Reference Manual.
*
*===============================================================================
*
* READING DATA FILES
*Most of the data files which will be used in these tutorials consist of
*plaintext data, delimited by spaces, with observations for all variables
*on separate lines. Normally, the first line will consist of a header line
*containing variable names. Such files can be read very easily using the
*READ command with the NAMES option as follows:
*
* READ(filename) /NAMES
*
*The NAMES option is used to indicate that the names in the header line
*should be used (the forward slash symbol / is used to introduce options).
*Variable names in SHAZAM may be up to 8 characters long and must consist
*only of letters or numbers and start with a letter.
*Where the data file comes without variable names, or where some of these
*names may not be suitable, the variable names must be specified by the
*READ command, as in READ (filename) varnames.
*
*When the data are not in a separate file, the data directly follow the READ command.
*
*The filename can be any legal name in the current directory. If the file is
*not in the current directory, the complete pathname must be given. The current
*directory can be determined by giving the FILE PWD command, and can be changed
*by the FILE CD folder command, where folder is the name of the new directory.
*
FILE PWD
*
* In the Professional Edition, the default folder for files can be set through
* the Project/Options... menu item.
*
*While we will not be using this feature, the READ command can also read
*data from an Excel spreadsheet (.XLS extension), if the XLS option
*is given after the READ command. The first row of the spreadsheet
*MUST contain variable names.
*
*While any filename can be used with a READ command, the Data Editor in
*the Professional Edition will import space delimited data only from files
*with a .PRN extension. While we will not be using the Data Editor in
*these tutorials, its use has some advantages (see Chapter 45 of the
*SHAZAM Manual for further information), so where the formatting of the file
*permits, we have renamed data files to have a .PRN extension to permit its use.
*
*Further information about the READ command can be obtained from
*Chapter 3 of the SHAZAM Manual.
*
*===============================================================================
*
* 3.2 LEAST SQUARES REGRESSION
*
* 3.2.2 (p. 22) Application: an Investment Equation
*
* First attempt to replicate Table 3.1 in the textbook, using the raw data from
* Data Table F3.1 (see p. 947). We begin by reading the data file.
*
READ (TableF3-1.prn) / NAMES LIST
*
* If the current sample has not been set (which is the case here), SHAZAM
*reads the data to the end of the file, then sets the sample implicitly
*in accordance with the number of observations that have been read.
*In this case, the sample consists of 15 yearly observations, so each observation
*is denoted by a number from 1 through 15. The LIST option lists the data, which
* can be useful with small datasets to confirm that the data are being read
* correctly.
*
* Now replicate Table 3.1. First, convert the nominal GNP and Invest variables
* to real terms (deflating by CPI) and scale them so they are measured in
* trillions (not hundreds of billions) of dollars. New variables are created
* using the GENR (for GENeRate) command, as in
*
GENR Y=Invest/CPI/10
GENR G=GNP/CPI/10
*
* The GENR command has the format GENR newvar=expression, where expression
* is an arithmetic expression involving existing variables, constants, and
* mathematical functions. The command is described further in Chapter 6
* of the SHAZAM manual.
*
* The GENR command supports a number of special functions. The TIME(x) function
* generates a time trend beginning at value x+1. So the trend variable in Table
* 3.1 can be generated by
*
GENR T=TIME(0)
*
* Also generate a variable for the inflation rate (percentage rate of change
* in CPI). The LAG(x,n) function with the GENR command lags the variable x by n
* time periods. If n is omitted, the series will be lagged one period.
* Notice, however, that the value of this function is undefined for the first
* observation, since we have no data for the preceding year. When this
* happens, SHAZAM inserts -99999 (by default) for the missing observation.
* This default value can be modified by the SET MISSV= command.
*
GENR CPILAG=LAG(CPI)
PRINT Year CPILAG
*
* Appendix F tells us that CPI 1967 is 79.06.
* We can include this by setting the
* sample to the first observation only, and changing the value of CPILAG to this
* number, then resetting the sample. This is accomplished through the
* SAMPLE command, which specifies the beginning and ending observations for
* subsequent commands.
*
SAMPLE 1 1
GENR CPILAG=79.06
SAMPLE 1 15
PRINT Year CPILAG
*
* Now generate the Inflation Rate variable.
*
GENR P=((CPI/CPILAG)-1)*100
*
* Without information for CPI 1967, we would have had to discard the first
* observation (using the command SAMPLE 2 15) whenever we used the variable P.
* IT IS A COMMON ERROR TO FORGET TO DO THIS.
*
* Print Table 3.1.
*
PRINT Y T G Interest P
*
* The regression results can be generated by the OLS command. The OLS command
* performs Ordinary Least Squares regressions where the first variable listed
* is the dependent variable. For example, the results on p. 23 can be obtained
* by
*
OLS Y T G
*
* Note that SHAZAM by default estimates a constant term in the regression, and
* there is no need to explicitly allow for it in the OLS command. However,
* unlike most presentations, SHAZAM reports the constant term last.
*
* The results on the top of p. 25 can be obtained by the command
*
OLS Y T G Interest P
*
* The results produced by SHAZAM do not exactly replicate the results in the
* textbook. This is because the textbook bases its calculations on the rounded
* data in Table 3.1, while SHAZAM uses full-precision results from the raw data.
*
*
*===============================================================================
*
* 3.4 PARTIAL REGRESSION AND PARTIAL CORRELATION COEFFICIENTS
*
* Example 3.1 (p. 31): Partial Correlations
*
* The simple correlation coefficients can be obtained from the STAT
* command. The STAT command computes means, standard deviations, variances,
* minima, and maxima for the variables listed.
* The PCOV and PCOR options with
* the STAT command print the matrices of covariances and of correlation
* coefficients, respectively, for pairs of the listed variables. The column of
* simple correlation coefficients between investment Y and the four regressors
* can be obtained from the first column of the correlation matrix generated by
* the STAT command
*
STAT Y T G Interest P / PCOR
*
* The partial correlation coefficients are printed with the results of the OLS
* command (in the sixth column).
*
*
*===============================================================================
*
* 3.6 GOODNESS OF FIT AND ANALYSIS OF VARIANCE
*
* Example 3.3 (p. 34-35) Analysis of Variance for an Investment Equation
*
* The OLS command automatically reports the coefficient of determination as
* R-SQUARE, along with R-SQUARE ADJUSTED (see p. 35) and the sum of squared
* errors (SSE). It does not, however, perform an Analysis of Variance
* unless the ANOVA option is used, as in
*
OLS Y T G Interest P /ANOVA
*
* The Amemiya Prediction Criterion PC (see p. 37) and a number of alternative
* model selection criteria are also calculated.
*
*
* Example 3.2 (p. 34) Fit of a Consumption Function
*
* This example is based on the Consumption (C) and disposable income (X) data
* in Table F2.1.
*
* Since the investment data are no longer needed, it is good practice to delete
* them from the workspace.
*
DELETE /ALL
*
* When referencing specific years in a time series, it is often convenient to
* use an alternative form of the SAMPLE command in which the SAMPLE range is
* specified in dates rather than observation numbers. Such a SAMPLE command must
* be preceded by a TIME command which specifies the beginning year and frequency
* for the time series. The general format is TIME beg freq, where beg specifies
* the start of the series and freq specifies the frequency (e.g., 1 for annual,
* 4 for quarterly, 12 for monthly; annual is the default).
* Since Table F2.1 consists of
* annual data for the period 1940-1950, we can use the commands
*
TIME 1940 1
SAMPLE 1940.0 1950.0
*
* Note the use of the decimal in the SAMPLE command, which is used to indicate
* the quarter or month (as appropriate) of the beginning and ending observations
* but is necessary even with annual data in order to distinguish this form of
* the SAMPLE command from the other form.
*
READ (TableF2-1.prn) Year x y W / SKIPLINES=1 LIST
*
* Note that we have chosen y rather than C as the variable name for consumption,
*in order to use the same notation as in the text. Therefore, we must specify
*the variable names in the READ command. (An alternative would have been to use
*the RENAME command, as in RENAME C y.) Because the data file has a header line
*at the top, which we are not using, SHAZAM must be told to skip that line,
*which is the purpose of the SKIPLINES= option.
*
* The STAT command gives sample means of the listed data. The PCPDEV option
* prints a cross-product matrix of the variables listed in deviations from the
* means.
*
STAT x y / PCPDEV
*
* Analysis of variance:
*
OLS y x /ANOVA
*
* We can omit the war years 1942-45 from the sample simply by modifying the
* SAMPLE command. Two or more discontinuous intervals can be chained together
* as in
*
SAMPLE 1940.0 1941.0 1946.0 1950.0
OLS y x
*
* Alternatively, we can account for the war years by using the dummy variable W.
*
SAMPLE 1940.0 1950.0
OLS y x W
*
STOP
*===============================================================================
*
* Updated August 27, 2008.