Type of paper: Essay
Categories: Data analysis, Statistics
Pages: 5
Wordcount: 1374 words
The multiple regression project involves a dataset used to approximate the stock price index (S&P 500). The stock market offers investors opportunities to make money as well as to risk it, because the market depends on many factors, including the performance of foreign economies, unanticipated events, and overall observed national performance. The S&P 500 is one of the most significant stock market indexes, as it comprises 500 of the largest American firms across many industries and sectors. The majority of individuals invest their money to earn a return on their investment. Investors ask questions such as how to make money on the stock market and whether there is a way to forecast, to some extent, how the stock market will behave. Many variables influence how the stock market behaves at any given time. Seven factors were used to model the S&P 500, with input variables collected on an annual basis from 1980 to 2011. The annual average CPI was represented by v1. Secondly, the annual average PPI was represented by v2. Thirdly, the annual average house price index was signified by v3, while the fourth factor, the annual average interest rate, was represented by v4. The fifth factor, the percentage change of the annual average GDP of the US, was represented by v5. Sixthly, the percentage change of the annual average GDP of Spain was represented by v6. Lastly, the percentage change of the annual average GDP of Germany was represented by v7. The results of the multiple regression are displayed in Appendix B.
From a statistical viewpoint, the output of the regression can be interpreted for the convenience of the audience. The regression equation for the model is y = 0.3533x + 17.774, so a one-unit increase in the predictor is associated with an increase of 0.3533 in the S&P 500. The R-squared equals 0.515380787, a reasonably good fit: 51.54% of the variation in the S&P 500 is explained by the independent variables v1 through v7. The closer R-squared is to 1, the better the regression line fits the data; hence, the relationship between the dependent and independent variables is fairly strong. Assessing the relationship between the Y and X variables helps in establishing the significance of the independent variables.
The assumptions of the multiple regression model are evident. Firstly, there is a linear relationship between the outcome and the independent factors, which is reflected in the R-squared; the scatter plots illustrate that the relationship is linear. Secondly, the model assumes multivariate normality of the residuals. Thirdly, multiple regression presumes that the independent factors are not highly correlated with one another; this assumption is evaluated using variance inflation factor (VIF) values. Fourthly, the assumption of homoscedasticity must hold: the variance of the error terms is equivalent across the values of the independent factors.
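The VIF check mentioned above can be sketched as follows. The data here are synthetic for illustration (the project's dataset is not reproduced), but the formula is the standard one: each predictor's VIF is 1/(1 - R²) from regressing that predictor on all the others.

```python
import numpy as np

# Illustrative VIF computation: VIF_j = 1 / (1 - R^2_j), where R^2_j comes
# from regressing predictor j on the remaining predictors.
def vif(X):
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # add an intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.1 * rng.normal(size=50)   # deliberately collinear with x1
x3 = rng.normal(size=50)
vifs = vif(np.column_stack([x1, x2, x3]))
# vifs[0] and vifs[1] are large (collinear pair); vifs[2] stays near 1.
```

A common rule of thumb flags VIF values above 5 or 10 as a sign of problematic collinearity.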
The multiple correlation coefficient is 0.717900262, which indicates that the correlation between the dependent and independent elements is positive. This statistic ranges from 0 to 1 in multiple regression and does not by itself establish statistical significance. The coefficient of determination, R², is 0.515380787, meaning that 51.54% of the variation is accounted for by the model. The adjusted R-squared of 0.367887983 quantifies the explanatory power after penalizing for the number of predictors. The standard error of the regression is 0.133187449, which estimates the typical scatter of the observed S&P 500 values about the regression line.
The analysis-of-variance section decomposes the total variation of the dependent variable (here, the projected value of the S&P 500) into explained and unexplained portions. The regression sum of squares (0.43389239) measures the variation accounted for by the regression line, while the residual sum of squares (0.407994621) is the variation of the dependent variable left unexplained.
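The reported fit statistics can be checked arithmetically from these sums of squares. The sample size and predictor count are assumptions for illustration (31 annual observations and 7 predictors, which are consistent with the reported figures):

```python
# Arithmetic check of the fit statistics, using the SS values reported
# in the ANOVA section. n and k are assumptions for illustration.
ss_regression = 0.43389239
ss_residual = 0.407994621
ss_total = ss_regression + ss_residual

r_squared = ss_regression / ss_total                 # explained share of variation
n, k = 31, 7                                         # assumed observations, predictors
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(r_squared, 6))       # -> 0.515381
print(round(adj_r_squared, 6))   # -> 0.367888
```

The results reproduce the R-squared of 0.515380787 and adjusted R-squared of 0.367887983 quoted earlier, confirming that the ANOVA decomposition and the summary statistics are internally consistent.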
The F-statistic is the ratio of the regression mean square to the residual mean square. This statistic is then compared with the critical F-value to test the null hypothesis that all slope coefficients are zero. Equivalently, the p-value associated with the calculated F-statistic can be compared with a significance level such as 5% to decide whether to reject the null hypothesis.
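This F-test can be sketched from the sums of squares above; the degrees of freedom again assume 7 predictors and 31 observations:

```python
# Sketch of the overall F-test, using the SS values from the ANOVA section.
# Degrees of freedom assume k = 7 predictors and n = 31 observations.
ss_regression, ss_residual = 0.43389239, 0.407994621
df_regression, df_residual = 7, 23            # k and n - k - 1

ms_regression = ss_regression / df_regression # regression mean square
ms_residual = ss_residual / df_residual       # residual mean square
f_stat = ms_regression / ms_residual          # a ratio, not a percentage

# The 5% critical value of F(7, 23) is roughly 2.44, so an F-statistic
# near 3.49 leads to rejecting the null hypothesis of all-zero slopes.
print(round(f_stat, 2))  # -> 3.49
```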
The estimated regression results comprise the predicted parameters, their standard errors, the corresponding t-statistics and p-values, and the bounds of the 95% and 90% confidence intervals. The explanatory elements are the seven factors whose t-statistics surpass the critical values at the 5% significance level. The relationship between the independent variables and the S&P 500 is positive. The coefficient is 0.331186474, which implies that a one-unit increase in the predictor increases the S&P 500 by 0.331186474. CPI, US PPI, HPI, maturity rate, US GDP, Spain GDP, and Germany GDP are positively correlated with the projected S&P 500, which is essential in stock market returns (Bolin, 2014).
Introducing an interaction term into a regression model can clarify the relationships among the factors and enable further hypothesis testing. A variable can also be eliminated to improve the model.
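An interaction term is simply the product of two predictors added as an extra column in the design matrix, letting the slope of one predictor depend on the level of the other. The data below are synthetic for illustration, not the project's dataset:

```python
import numpy as np

# Illustrative interaction term: the column x1 * x2 captures how the
# effect of x1 on y changes with the level of x2.
rng = np.random.default_rng(3)
x1 = rng.normal(size=80)
x2 = rng.normal(size=80)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + 0.8 * x1 * x2 + 0.1 * rng.normal(size=80)

A = np.column_stack([np.ones(80), x1, x2, x1 * x2])  # interaction column last
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
# beta[3] estimates the interaction effect (close to the true 0.8 here).
```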
The stepwise selection regression of the dataset does not improve the model. The R² values are biased upward (0.363, 0.509, and 0.561). The F-statistics of 21.690, 19.195, and 15.343 do not follow their claimed distribution. The standard errors of the coefficients are too small (0.138426230, 0.123171454, and 0.118081723), and the confidence intervals around the coefficients (at the 0.1 and 0.05 levels) are too narrow. The p-values are too low because of the multiple comparisons involved and are difficult to correct. The problem of collinearity is also exacerbated.
Forward selection is a form of stepwise regression that starts with an empty model and adds variables one at a time. At every forward step, the statistician includes the one variable that offers the single best improvement in the model. This regression does not improve the model: it has lower adjusted and predicted R-squared values. Adjusted R-squared falls when a new variable does not improve the model by more than chance. The predicted R-squared, a cross-validation statistic, can also fall; the cross-validation partitions help to evaluate whether the model generalizes beyond the dataset. The adjusted R-squared values are 0.347, 0.483, and 0.525.
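The forward-selection procedure described above can be sketched as a simple loop: at each step, add the remaining predictor that most reduces the residual sum of squares. The data are synthetic for illustration:

```python
import numpy as np

# Minimal forward-selection sketch on synthetic data.
def rss(X_sub, y):
    """Residual sum of squares from an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return r @ r

def forward_select(X, y, n_keep):
    chosen, remaining = [], list(range(X.shape[1]))
    while len(chosen) < n_keep:
        # Greedily add the predictor giving the best improvement in fit.
        best = min(remaining, key=lambda j: rss(X[:, chosen + [j]], y))
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 5))
y = 2.0 * X[:, 3] + 0.5 * X[:, 1] + rng.normal(size=60)
sel = forward_select(X, y, 2)
# The strongest predictor (column 3) enters the model first.
```

This greedy rule is exactly why stepwise R² values are biased upward: each step picks the best of many candidates, so some apparent improvement is pure chance.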
Backward selection does not improve the model either. This form of regression guarantees similar optimality for large subsets but may yield very poorly performing small subsets. The regression has lower adjusted and predicted R-squared values; these values decrease when a new variable does not enhance the model by more than chance. The adjusted R-squared values are 0.516, 0.531, 0.542, and 0.542.
The chosen model is the second regression model, which includes the dummy variable and the interaction term. The intercept is the expected mean value of Y when all X variables equal zero. It is helpful to start with a regression equation with a single predictor: if x is sometimes equal to zero, then the intercept is simply the expected mean value of Y at that value. If x is never equal to zero, the intercept has no intrinsic meaning. The purpose of a regression model is to understand the association between the predictors and the response; in that case, and when x is never equal to zero, there is no interest in the intercept, because it says nothing about the relationship between X and Y.
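A small numeric illustration of this intercept point (synthetic data): when the predictor is never near zero, the raw intercept is an extrapolation, but centering x makes the intercept equal the mean response while leaving the slope unchanged.

```python
import numpy as np

# Illustration: the intercept is only meaningful where x = 0 is plausible.
rng = np.random.default_rng(4)
x = rng.normal(loc=100.0, scale=5.0, size=50)   # x is never close to zero
y = 3.0 + 0.2 * x + 0.5 * rng.normal(size=50)

def fit(xcol):
    A = np.column_stack([np.ones(len(xcol)), xcol])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

b_raw = fit(x)                   # intercept is an extrapolation to x = 0
b_centered = fit(x - x.mean())   # intercept now equals y.mean(): interpretable
# The slope is identical in both fits; only the intercept's meaning changes.
```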
I am comfortable with the model because of its relatively large R-squared, although multicollinearity remains a concern. Increasing the sample size of the dataset from 40 to 55 observations would be the top priority if permitted. The statistician can also rebuild the regression model by removing highly correlated predictors from it. Secondly, if two or more variables have a high VIF, it is important to remove one of them from the model. Finally, the statistician can use partial least squares (PLS) regression or principal components analysis to enhance the model; these methods reduce the predictors to a smaller set of uncorrelated components.
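The principal-components remedy can be sketched as follows (synthetic data, not the project's dataset): the correlated raw predictors are replaced by a few uncorrelated component scores before fitting the regression.

```python
import numpy as np

# Sketch of principal-components regression as a collinearity remedy.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 7))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=40)   # induce collinearity
y = X @ rng.normal(size=7) + rng.normal(size=40)

Xc = X - X.mean(axis=0)                 # center the predictors
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3                                   # keep a few leading components
scores = Xc @ Vt[:k].T                  # uncorrelated component scores

A = np.column_stack([np.ones(40), scores])       # regress y on the scores
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
# The score columns are mutually uncorrelated, so the VIF problem vanishes.
```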
Work Cited
Bolin, Jocelyn H. "Hayes, Andrew F. (2013). Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach. New York, NY: The Guilford Press." Journal of Educational Measurement 51.3 (2014): 335-337.
APPENDIX
Appendix A
Year | S&P500 Change | US CPI | US PPI | US HPI | US Maturity Rate | US GDP | Spain GDP | Germany GDP
1978 0.153153 109.6 56.0986 135.6 0.0885 0.0451 0.0093 0.0511
1979 0.03125 113.6 57.48551 144.33 0.0849 0.0425 -0.0103 0.0192
1980 0.112225 82.4 48.15998 102.71 0.1143 0.0962 0.0221 0.0101
1981 0.199279 90.9 52.62937 107.79 0.1392 0.0969 -0.0013 0.0053
1982 -0.11805 96.5 54.72269 111.5 0.1301 0.0379 0.0125 -0.0039
1983 0.230179 99.6 55.62113 115.8 0.111 0.1139 0.0177 0.0157
1984 0.153153 103.9 56.78408 120.81 0.1246 0.0926 0.0178 0.0282
1985 0.03125 107.6 57.27663 126.97 0.1062 0.0737 0.0232 0.0233
1986 0.213287 109.6 56.0986 135.6 0.0767 0.0486 0.0325 0.0229
1987 0.270413 113.6 57.48551 144.33 0.0839 0.0757 0.0555 0.014
1988 -0.05293 118.3 59.49463 152.04 0.0885 0.0776 0.0509 0.0371
1989 0.139321 124 62.48219 160.05 0.0849 0.0648 0.0483 0.039
1990 0.191205 130.7 65.23701 165.04 0.0855 0.0451 0.0378 0.0526
1991 -0.04259 136.2 66.05396 168.17 0.0786 0.0425 0.0255 0.0511
1992 0.278319 140.3 66.89465 172.87 0.0701 0.0666 0.0093 0.0192
1993 0.046025 144.5 67.89209 176.88 0.0587 0.05 -0.0103 -0.0096
1994 0.086759 148.2 68.79928 181.41 0.0709 0.0631 0.0238 0.0246
1995 -0.01636 152.4 70.80365 186.94 0.0657 0.0432 0.0276 0.0174
1996 0.320623 156.9 72.43279 193.43 0.0644 0.0625 0.0267 0.0082
1997 0.247062 160.5 72.67027 199.95 0.0635 0.0605 0.0369 0.0185
1998 0.257289 163 71.90083 210.2 0.0526 0.0611 0.0431 0.0198
1999 0.296265 166.6 73.112 220.55 0.0565 0.0644 0.0448 0.0199
2000 0.141595 172.2 76.09006 234.66 0.0603 0.055 0.0529 0.0296
2001 -0.0631 177.1 76.68851 252.2 0.0502 0.0219 0.04 0.017
2002 -0.14631 179.9 76.1803 268.16 0.0461 0.0376 0.0288 0
2003 -0.21432 184 78.10867 284.85 0.0401 0.0642 0.0319 -0.0071
2004 0.264199 188.9 81.46671 311.47 0.0427 0.0631 0.0317 0.0117
2005 0.043169 195.3 85.95992 346.78 0.0429 0.0652 0.0372 0.0071
2006 0.082376 201.6 89.43668 371.6 0.048 0.0512 0.0417 0.037
2007 0.11373 207.3 92.85172 375.69 0.0463 0.044 0.0377 0.0326
2009 -0.3722 214.537 95.25031 336.68 0.0326 0.0011 -0.0357 -0.0562
2010 0.298066 218.056 100 322.88 0.0322 0.0456 0.0001 0.0408
2011 0.141548 224.939 107.78 310.5 0.0278 0.0364 -0.01 0.0366
2012 0.264199 188.9 81.46671 311.47 0.0427 0.0631 0.0317 0.0117
2013 0.043169 195.3 85.95992 346.78 0.0429 0.0652 0.0372 0.0071
2014 0.082376 201.6 89.43668 371.6 0.048 0.0512 0.0417 0.037
2015 0.11373 207.3 92.85172 375.69 0.0463 0.044 0.0377 0.0326
2016 -0.3722 214.537 95.25031 336.68 0.0326 0.0011 -0.0357 -0.0562
2017 0.298066 218.056 100 322.88 0.0322 0.0456 0.0001 0.0408
2018 0.141548 224.939 107.78 310.5 0.0278 0.0364 -0.01 0.0366