1: Identify the three most expensive ushabtis in the dataset and describe their attributes. Are they statistical outliers? Do you think there may have been a mistake in recording the entries? If so, suggest a suitable remedy for dealing with these entries. (10 marks)
ANSWER: The ushabtis with lot numbers 27, 155 and 84 are the most expensive, second most expensive and third most expensive respectively. Their position in the boxplots below shows that these points are clearly statistical outliers, most likely caused by erroneous data entry. Such errors can be avoided by double entry, cross-checking, or modern digital data entry tools that enforce data quality checks.
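As a hedged illustration of how the outlier claim could be checked numerically, the sketch below applies the common 1.5 x IQR boxplot rule. The prices are made-up placeholder values, not the assignment data, and `iqr_outliers` is a hypothetical helper name.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside the boxplot fences Q1 - k*IQR and Q3 + k*IQR."""
    # statistics.quantiles with n=4 returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Placeholder hammer prices (hypothetical, NOT the assignment data)
prices = [1200, 1500, 1800, 2000, 2200, 2500, 3000, 3500, 4000, 450000]
flagged = iqr_outliers(prices)
print(flagged)
```

A value flagged this way is only a candidate for a recording error; it would still need to be checked against the original auction record before being corrected or dropped.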
2: Display the distribution of "hammerpriceinpounds" and its (natural) logarithmic transformation in histograms. Explain whether you would use the raw data or the logarithmic transformation for OLS regression analysis. (10 marks)
ANSWER: I would use the log-transformed hammer price rather than the raw price because, as the histograms below show, the log-transformed observations closely conform to a normal distribution, which is a major requirement for an ordinary least squares (OLS) regression. The raw price is heavily positively skewed, so it violates this parametric requirement and is not suitable for OLS regression.
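The effect of the log transformation on skewness can be sketched numerically. The example below is a minimal illustration on made-up, right-skewed placeholder prices (not the assignment data); `skewness` is a hypothetical helper computing the moment-based skewness.

```python
import math

def skewness(values):
    """Moment-based (population) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Placeholder right-skewed prices (hypothetical, NOT the assignment data)
prices = [500, 800, 1200, 1500, 2000, 3000, 5000, 9000, 20000, 60000]
log_prices = [math.log(p) for p in prices]

raw_skew = skewness(prices)
log_skew = skewness(log_prices)
print(round(raw_skew, 2), round(log_skew, 2))
```

A skewness close to zero after the transformation is consistent with the roughly symmetric histogram described above, although it does not by itself establish normality.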
3: There are 5 different categories of material. Create a single graph with boxplots comparing lnhammerprice for wooden, stone, and faience ushabtis. (10 marks)
ANSWER: The figure below shows three box plots of the natural logarithm (ln) of the hammer price in dollars for three types of material: wood, faience and stone. The median price for stone is clearly higher than those of wood and faience, and stone has no outliers, whereas wood has one outlier above 13 and faience has two, one below 7 and the other above 11.5. The observations for wood are more tightly clustered than those for stone and faience, as shown by the shorter distance between its upper and lower whiskers, suggesting a smaller variance.
4: Display the relationship between "lnhammerprice" and your "lowdate" in a scatterplot. Does the plot suggest a significant correlation between the variables? Describe the relationship in appropriate statistical language. (10 marks)
ANSWER: The scatter plot below of the natural log of hammer price in dollars against low date hints at a weak negative correlation; that is, hammer price declines as low date increases. Further correlational analysis gives a correlation coefficient of -0.28 for this relationship. Moreover, the relationship is evidently linear.
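Whether a correlation of this size is statistically significant depends on the sample size, which the answer does not state. The sketch below shows the standard t statistic for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2). The value n = 100 is purely an assumption for illustration, and `pearson_r` and `correlation_t_stat` are hypothetical helper names.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# r is taken from the answer above; n = 100 is an ASSUMED sample size
t_stat = correlation_t_stat(-0.28, 100)
print(round(t_stat, 2))
```

The resulting statistic would be compared with a t distribution on n - 2 degrees of freedom; with the real sample size the conclusion may differ from this illustration.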
5: We have data from three auction houses: Sotheby's, Christies, and Bonhams and two locations (London and New York). Use (appropriate) t-tests to explore the following hypotheses. Give a detailed explanation of your method and findings.
a) Christies sells more expensive ushabtis than Sotheby's on average. (10 marks)
b) Higher priced ushabtis are traded in London (10 marks)
ANSWER: 5 (a). A variable holding the natural log of hammer price in dollars was generated for Christies and another for Sotheby's. The means of the two variables were compared using an independent-samples t-test with unequal variances, to check whether the mean hammer price at Christies was greater than that at Sotheby's, as per the following hypotheses:
H0: Mc <= Ms
Ha: Mc > Ms
where Mc stands for the mean hammer price at Christies and Ms stands for the mean hammer price at Sotheby's. The log-transformed hammer price was used so as to meet the parametric requirement of the t-test. The test gave a p-value of 0.0471, so we reject the null hypothesis at the 5% level and conclude that Mc > Ms. Therefore, Christies sells more expensive ushabtis than Sotheby's on average.
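The unequal-variances (Welch) t-test used here can be sketched as follows. This is a minimal illustration on made-up placeholder log prices, not the assignment data, and `welch_t` is a hypothetical helper; it returns the test statistic and the Welch-Satterthwaite degrees of freedom, not the p-value.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite df for mean(a) - mean(b)."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean (using sample variances)
    se2_a = statistics.variance(sample_a) / na
    se2_b = statistics.variance(sample_b) / nb
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t = diff / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Placeholder log hammer prices (hypothetical, NOT the assignment data)
christies = [8.1, 8.9, 9.4, 7.8, 9.0, 8.6]
sothebys = [7.9, 8.2, 8.8, 7.5, 8.4]
t_stat, df = welch_t(christies, sothebys)
print(round(t_stat, 3), round(df, 1))
```

For the one-sided alternative Ha: Mc > Ms, the p-value is the upper-tail probability of the t distribution with `df` degrees of freedom evaluated at `t_stat`.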
ANSWER: 5 (b). To test the hypothesis that London sells the most expensive ushabtis of all the groups in the data (Bonhams, Christies, Sotheby's, London and New York), we compare the mean hammer prices across the five groups using analysis of variance (ANOVA). We generate five variables based on hammer prices and whether sales took place in the respective group. The ANOVA showed a significant difference in the means across the five groups (p-value = 0.015). We then run a post-hoc test to find out which groups have means that differ from the rest. A Bonferroni test was conducted for this purpose, and it supported the hypothesis for London.
6: Use bivariate OLS regressions to explore the correlation between the following characteristics and lnhammerprice paid. Report all your findings in a single table. In each case carefully explain your findings and highlight the statistical significance in the table using the *-system (with a legend). (5 marks each, 20 marks overall)
a) lowdate
b) sizeincm
c) Year (i.e. is there a linear time trend in sales prices)
d) Whether an object has been "published" (i.e. is considered of scholarly importance)
ANSWER: The table below shows the coefficients, p-values and 95% confidence intervals for the bivariate linear regressions in which the natural log of hammer price is the dependent variable. Low date has a coefficient of -0.000937, meaning that a one-unit increase in low date is associated with a decrease of 0.000937 in the natural log of hammer price. This coefficient is highly statistically significant, with a p-value of 0.002 and a 95% confidence interval from -0.00152 to -0.0003585. Similarly, a one-unit increase in size is associated with a highly significant (p-value < 0.0001) increase of 0.119355 in the natural log of hammer price, with a confidence interval from 0.078573 to 0.1601368. A one-unit increase in year is also associated with a highly significant (p-value = 0.002) increase of 0.0725211 in the natural log of hammer price, with a confidence interval from 0.027072 to 0.1179701. However, a change in published status from "not" to "yes" is associated with a statistically insignificant (p-value = 0.09) increase of 0.4461607 in the natural log of hammer price, with a confidence interval from -0.07065 to 0.9629729.
BIVARIATE LINEAR REGRESSION
ln(Hammer price)    Coefficient    p-value    95% CI lower    95% CI upper
Low date            -0.000937      0.002      -0.00152        -0.0003585
Size (cm)            0.119355      <0.0001     0.078573        0.1601368
Year                 0.0725211     0.002       0.027072        0.1179701
Published            0.4461607     0.09       -0.07065         0.9629729
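The question asks for significance to be flagged with the *-system and a legend. The sketch below shows one way to attach stars to the coefficients in the table above. The legend (*** p < 0.01, ** p < 0.05, * p < 0.10) is an assumed, commonly used convention rather than one stated in the assignment, and `stars` is a hypothetical helper.

```python
def stars(p):
    """Map a p-value to stars under an ASSUMED legend:
    *** p < 0.01, ** p < 0.05, * p < 0.10."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""

# Coefficients and p-values copied from the bivariate table;
# "<0.0001" is entered as its upper bound 0.0001.
rows = [
    ("Low date", -0.000937, 0.002),
    ("Size (cm)", 0.119355, 0.0001),
    ("Year", 0.0725211, 0.002),
    ("Published", 0.4461607, 0.09),
]
starred = {name: f"{coef:.4f}{stars(p)}" for name, coef, p in rows}
for name, cell in starred.items():
    print(f"{name:<10} {cell}")
```

Under this assumed legend, Published would carry a single star at the 10% level, which is consistent with the answer treating it as insignificant at the conventional 5% level.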
7: Use multivariate OLS regression analysis to explore the effect of material on the hammer price. In your model use "lowdate" and "Size", and "Year" as control variables and test whether wooden, metal, stone, and faience objects command a price premium (compared to composition / terracotta / other).
a) Display your results in a table with all important information included and statistical significance highlighted. (10 marks)
b) Explain your results in plain English. (10 marks)
ANSWER: 7 (a).
Multivariate OLS Regression Analysis
ln(Hammer price)    Coefficient    p-value    95% CI lower    95% CI upper
Wood                -0.6059513     0.430      -2.12054         0.908633
Stone                0.3336815     0.654      -1.13884         1.806207
Faience             -0.5714101     0.429      -1.99609         0.853265
Metal                0.0439654     0.959      -1.66097         1.748900
Low date            -0.0000142     0.969      -0.00074         0.000707
Size (cm)            0.1246007     <0.0001     0.078841        0.170360
Year                 0.1039464     <0.0001     0.066591        0.141302
Constant          -201.1890        <0.0001  -276.383        -125.995
ANSWER: 7 (b). The table in 7 (a) above shows the multiple linear regression coefficients, p-values and 95% confidence intervals. Wood, faience and low date have insignificant negative coefficients of -0.6059513, -0.5714101 and -0.0000142, with p-values of 0.430, 0.429 and 0.969 respectively. The 95% confidence intervals are (-2.12054, 0.908633), (-1.99609, 0.853265) and (-0.00074, 0.000707) for wood, faience and low date respectively. A negative coefficient implies an inverse association: holding all other factors constant, wooden ushabtis have a natural log of hammer price 0.6059513 lower than the reference category (composition / terracotta / other), faience ushabtis 0.5714101 lower, and a one-unit increase in low date corresponds to a decrease of 0.0000142 in the natural log of hammer price.
On the other hand, stone, metal, size and year have a positive relationship with the natural log of hammer price. Stone is associated with an increase of 0.3336815, with an insignificant p-value of 0.654 and a confidence interval from -1.13884 to 1.806207. Metal is associated with an increase of 0.0439654, with an insignificant p-value of 0.959 and a confidence interval from -1.66097 to 1.748900. A one-unit increase in size is significantly (p-value < 0.0001) associated with an increase of 0.1246007, with a confidence interval from 0.078841 to 0.170360. Finally, a one-unit increase in year is significantly associated with an increase of 0.1039464, with a confidence interval from 0.066591 to 0.141302 and a p-value below 0.0001. The constant term of -201.1890, with a confidence interval from -276.383 to -125.995, is also significant, with a p-value below 0.0001. Because none of the four material coefficients is statistically significant, the model gives no evidence that wooden, metal, stone or faience ushabtis command a price premium over the reference category.
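Because the dependent variable is in natural logs, each coefficient b can also be read as a percentage change in hammer price of 100 * (exp(b) - 1), and a coefficient is significant at the 5% level exactly when its 95% confidence interval excludes zero. The sketch below applies both readings to the figures from the table in 7 (a); `pct_effect` is a hypothetical helper name.

```python
import math

def pct_effect(coef):
    """Percentage change in price implied by a coefficient in a log-linear model."""
    return 100 * (math.exp(coef) - 1)

# (coefficient, CI lower, CI upper) copied from the multivariate table
table = {
    "Wood": (-0.6059513, -2.12054, 0.908633),
    "Stone": (0.3336815, -1.13884, 1.806207),
    "Faience": (-0.5714101, -1.99609, 0.853265),
    "Metal": (0.0439654, -1.66097, 1.748900),
    "Low date": (-0.0000142, -0.00074, 0.000707),
    "Size (cm)": (0.1246007, 0.078841, 0.170360),
    "Year": (0.1039464, 0.066591, 0.141302),
}

# For each variable: (percentage effect, True if the 95% CI excludes zero)
summary = {
    name: (round(pct_effect(b), 1), not (lo <= 0 <= hi))
    for name, (b, lo, hi) in table.items()
}
for name, (pct, significant) in summary.items():
    print(f"{name:<10} {pct:>7}%  significant at 5%: {significant}")
```

Read this way, only size and year have intervals that exclude zero; the percentage figures for the material dummies are point estimates surrounded by wide intervals and should not be interpreted as established premiums or discounts.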
Cite this page
Stata Assignment. (2022, Dec 06). Retrieved from https://speedypaper.com/essays/stata-assignment