Task 1: Summarising Data
Demographics
The company has 474 employees, 258 males and 216 females. The average salary paid by the company is £5441.36. The highest-earning employee is paid £25876.91, while the lowest earner takes home £2304.25. The income disparity between the lowest and the highest earner is £23037.04. A typical employee earns £4787.21. The mean age of employees is 37.186 years; the oldest person is 64.50 years old and the youngest is 23. The most common age is 30 years. The mean salary now is £13819.093, the lowest salary now is £2807.42, and the highest salary now is £43896.31. Most people earn salaries above average.
The main issues facing the company include:
1. Gender inequality. There are fewer women than men: 216 female employees compared to 258 men.
2. Gender inequality is also seen in incomes, as the salary of some female employees is lower than the salary of male employees in the same position.
3. Most of the people in the higher positions are men, with only a few women.
4. Finally, the salary disparity is also large, as the highest income earner takes home £54100 and the lowest salaried employees earn £2807.42. The range of the salary now is £41080.77.
5. Management needs to understand the disparity and implement solutions aimed at ensuring equality in the company (Decker, Kipping, and Wadhwani, 2015).
There are ten variables.
1. Classify the ten variables by type (that is, nominal, ordinal, interval, ratio scale), and say whether they are quantitative or qualitative.
Variable    Type          Data type
IDNUM       ordinal       Quantitative
SEX         nominal       Qualitative
SALBEG      ordinal       Quantitative
AGE         ratio scale   Quantitative
SALNOW      ordinal       Quantitative
JOBCAT      nominal       Quantitative
ISWELSH     nominal       Qualitative
EDLEVEL     nominal       Quantitative
AGEBAND     interval      Quantitative
SALNBAND    interval      Quantitative
2. Using Excel, present the Job Category as a Pie Chart and as a Simple Bar Chart. Pay attention to labels, titles, and general neatness
3. Using Excel, use sensible grouping to organize the Salbeg data into a grouped frequency distribution and cumulative frequency distribution.
4. Using Excel construct a Histogram, Frequency Polygon and a Relative Cumulative Frequency curve for the data obtained in Q3 (the variable Salbeg). Pay particular attention to scale, labels, titles and general neatness.
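The grouping asked for in Q3 and Q4 can also be sketched outside Excel. The following is a minimal illustration, not the method used in this report: the sample values, the class width of 2000, and the function name `grouped_frequency` are all assumptions chosen for demonstration.

```python
from itertools import accumulate

def grouped_frequency(values, start, width, n_classes):
    """Count values in classes [lower, lower + width) and accumulate them.

    Returns one tuple per class:
    (lower, upper, frequency, cumulative frequency, relative cumulative frequency).
    """
    counts = [0] * n_classes
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_classes:
            counts[idx] += 1
    cumulative = list(accumulate(counts))
    total = sum(counts)
    rows = []
    for i, (freq, cum) in enumerate(zip(counts, cumulative)):
        lower = start + i * width
        rel = cum / total if total else 0.0
        rows.append((lower, lower + width, freq, cum, rel))
    return rows

# Hypothetical beginning salaries, for illustration only (not the company data).
sample = [3980, 4500, 5000, 5400, 5900, 5900, 6300, 6500, 7200, 9800, 12000, 18896]

table = grouped_frequency(sample, start=2000, width=2000, n_classes=9)
for lower, upper, freq, cum, rel in table:
    print(f"{lower:>6} to under {upper:<6} freq={freq:<3} cum={cum:<3} rel cum={rel:.1%}")
```

The frequency column feeds the histogram and frequency polygon, and the relative cumulative column feeds the relative cumulative frequency curve.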
Statistic             Value
Anderson-Darling      Non-Normal at 0.01
A-Squared             8.202
P-value               0.000
95% Critical Value    0.787
99% Critical Value    1.092
Mean                  6544.084
Mode                  5900.000
Standard Deviation    2719.522
Variance              7395798.801
Skewness              2.296
Kurtosis              5.862
Count                 95.000
Std Err               279.017
Minimum               3980.000
1st Quartile          5000.000
Median                5900.000
3rd Quartile          6500.000
Maximum               18896.000
Range                 14916.000
Confidence Interval   553.995
5. By putting lines on the Relative Cumulative Frequency Curve, estimate the median and quartiles of Salbeg. (You can draw the lines using the 'Insert > Shapes > Lines' tool, or hand-draw them and scan the sheet in with your submission.) What are the estimated values?
Statistic       Value
Minimum         3980.000
1st Quartile    5000.000
Median          5900.000
3rd Quartile    6500.000
Maximum         18896.000
Range           14916.000
6. Is the variable Salbeg symmetrical, negatively skew or positively skew? Write a short note summarizing the evidence for your answer.
The variable Salbeg is positively skewed. The skewness coefficient is 2.296, well above zero. The sample mean (6544.084) is larger than the median and the mode (both 5900), and the maximum (18896) lies much further from the median than the minimum (3980) does. The data are not normally distributed (the Anderson-Darling test reports non-normality at the 0.01 level), and the longer tail lies on the positive side, to the right of the peak of the graph (Eriksson and Kovalainen, 2008).
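As a rough cross-check on the direction of skew, the reported summary statistics can be plugged into Pearson's median skewness coefficient, 3 x (mean - median) / standard deviation. This is a sketch using only the figures from the descriptive statistics table. It is a different measure from the moment-based skewness of 2.296 that Excel reports, so only the sign is expected to agree.

```python
# Summary statistics for Salbeg as reported in the descriptive statistics table.
MEAN = 6544.084
MEDIAN = 5900.0
STD_DEV = 2719.522

def pearson_median_skewness(mean, median, std_dev):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / std_dev

sk2 = pearson_median_skewness(MEAN, MEDIAN, STD_DEV)

# A positive value means the mean is pulled above the median by a right tail.
direction = "positive" if sk2 > 0 else "negative" if sk2 < 0 else "symmetric"
print(f"Pearson median skewness = {sk2:.3f} ({direction})")
```

A positive coefficient here is consistent with the conclusion that Salbeg has its longer tail on the right.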
Task 2: Regression & Correlation
1. Use Excel to obtain scatter plots and coefficients of correlation (r values please, not R2) for the following pairs of variables:
(a) Salary Now and Age.
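r = -0.145 (Excel reports Multiple R = 0.145, and the slope on AGE is negative). The r value indicates a weak negative linear association: age explains only about 2.1% of the variability in salary now (R Square = 0.021).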
SUMMARY OUTPUT
Force Constant to Zero: FALSE

Regression Statistics
Multiple R           0.145
R Square             0.021    (Goodness of Fit < 0.80)
Adjusted R Square    0.019
Standard Error       6753.012
Observations         474

ANOVA
              df     SS / MS         F        P-value
Regression    1      46101946101     10.10    0.002
Residual      472    21524  45603
Total         473    21985

Confidence Level: 0.95, 0.99
             Coefficients  Standard Error  t Stat        P-value  Lower 95%     Upper 95%     Lower 99%  Upper 99%
Intercept    16933.66      1027.505834     16.48035721   0.000    14914.61155   18952.71481   14276.24   19591.09
AGE          -83.76        26.34236445     -3.179525414  0.002    -135.5190317  -31.99340273  -151.885   -15.6273

y = 16933.663 - 83.756*AGE
(b) Salary at Beginning and Age.
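r = -0.006 (Excel reports Multiple R = 0.006, and the slope is negative). The r value indicates essentially no linear association between salary at the beginning and age: the model explains almost none of the variability around the mean (R Square = 0.000). In this output AGE is the response and SALBEG the predictor.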
SUMMARY OUTPUT
Force Constant to Zero: FALSE

Regression Statistics
Multiple R           0.006
R Square             0.000    (Goodness of Fit < 0.80)
Adjusted R Square    -0.002
Standard Error       11.799
Observations         474

ANOVA
              df     SS          MS       F          P-value
Regression    1      2.5203      2.520    0.01810    0.893
Residual      472    65715.6     139.2
Total         473    65718.18

Confidence Level: 0.95, 0.99
             Coefficients   Standard Error  t Stat     P-value  Lower 95%  Upper 95%  Lower 99%  Upper 99%
Intercept    37.34235205    1.28132046      29.1436    0.000    34.8245    39.860     34.02      40.65
SALBEG       -2.32938E-05   0.000173132     -0.13454   0.893    -0.00036   0.0003     -0.00047   0.0004

y = 37.342 - 0.0000233*SALBEG
(c) Salary Now and Salary at Beginning.
r = 0.870. The r value indicates a strong positive linear association: salary at the beginning explains about 75.7% of the variability in salary now (R Square = 0.757). This means that the salary at the beginning is a good predictor of the salary now.
SUMMARY OUTPUT
Force Constant to Zero: FALSE

Regression Statistics
Multiple R           0.870
R Square             0.757    (Goodness of Fit < 0.80)
Adjusted R Square    0.75
Standard Error       3364.85
Observations         474

ANOVA
              df     SS             MS             F              P-value
Regression    1      16641625972    16641625972    1469.819141    0.000
Residual      472    5344091145     11322227
Total         473    21985717118

Confidence Level: 0.95, 0.99
             Coefficients   Standard Error  t Stat        P-value  Lower 95%     Upper 95%     Lower 99%  Upper 99%
Intercept    1125.406899    365.3930181     3.079990158   0.002    407.4086708   1843.405128   180.3962   2070.418
SALBEG       1.892827702    0.04937182      38.33822037   0.000    1.795811948   1.989843456   1.765138   2.020517

y = 1125.407 + 1.893*SALBEG
RESIDUAL OUTPUT / PROBABILITY OUTPUT (first two observations)
Observation  Predicted SALNOW  Residuals     Standard Residuals  Sorted Residuals  Percentile  SALNOW
1            11157.39372       -377.39372    -0.11228            ###########       0.10549     3460
2            10589.54541       -1549.54541   -0.46100            ###########       0.31646     6400
Produce the results. Write a short note on what you think these graphs and r values tell you about the association between these variables.
Of the three variables, age is not a good predictor of the salary now, nor does it predict the salary at the beginning. However, salary at the beginning significantly predicts the salary now (Bajpai, 2011). This means that if the entry salary of an individual is higher, their salary now will tend to be higher, and if their beginning salary is low, their salary now will tend to be low. Additionally, many other factors may contribute to salary, including experience, age, employment category, educational level, and ethnic background, such as whether one is Welsh or not. Conducting a multivariate analysis can help determine which of these factors significantly predict the dependent variable (Balsley, Clover, and Clover, 1988).
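For a regression with a single predictor, R Square is the square of the correlation coefficient, so squaring r gives the share of variability explained by each pairing. The short check below uses only the Multiple R and R Square values quoted in the three Excel outputs. The dictionary labels and the function name `share_explained` are illustrative choices.

```python
# (Multiple R, R Square) as reported in the three Excel summary outputs.
reported = {
    "Salary Now and Age": (0.145, 0.021),
    "Salary at Beginning and Age": (0.006, 0.000),
    "Salary Now and Salary at Beginning": (0.870, 0.757),
}

def share_explained(r):
    """Proportion of variance explained in a one-predictor regression."""
    return r * r

for pair, (r, r_square) in reported.items():
    print(f"{pair}: r^2 = {share_explained(r):.3f}, reported R Square = {r_square:.3f}")
```

Squaring discards the sign of r, which is why the direction of each association still has to be read from the slope or the scatter plot.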
2. Use Excel to calculate the regression equation of Salary Now on Salary at Beginning and to draw the line of this equation through the scatter plot.
Write out the equation in full.
For someone who began with a salary of 20,000, what would your regression line predict they would earn now? Would you have much faith in the accuracy of this forecast? Briefly, explain your answer.
y = 1125.407 + 1.893*SALBEG
Salary Now = 1125.407 + (1.893 x salary at the beginning)
For someone who started with a salary of 20,000, their salary now would be:
Y = 1125.407 + (20,000 x 1.893)
Y = 1125.407 + 37860
Y = 38985.407
The salary now for the individual would be 38985.407. From the equation, it is clear that the beginning salary is a good predictor of the salary now. However, the forecast is not 100% accurate, as a number of factors other than beginning salary can influence the salary now, for example the level of professional education, the nature of the job, and other economic factors. In the absence of these extraneous factors, the beginning salary is a good predictor of salary now (Chesneau, 2007). Beginning salary significantly predicted the salary now, F(1, 472) = 1469.819141, p < 0.001, with an R2 of 0.757. Beginning salary also explained a significant proportion of the variance in salary now (Clover and Balsley, 1979).
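The forecast can be reproduced with a few lines of code. The sketch below uses the rounded intercept and slope from the fitted equation. It also flags whether the input lies beyond the largest beginning salary in the descriptive statistics table (18896), on the assumption that this table covers the data used for the fit. A forecast outside the observed range is an extrapolation and deserves less confidence. The function names are illustrative.

```python
INTERCEPT = 1125.407           # rounded intercept from the regression output
SLOPE = 1.893                  # rounded slope on SALBEG
MAX_OBSERVED_SALBEG = 18896.0  # maximum in the descriptive statistics table (assumed to apply)

def predict_salnow(salbeg):
    """Predicted salary now from the fitted line SALNOW = a + b * SALBEG."""
    return INTERCEPT + SLOPE * salbeg

def is_extrapolation(salbeg, max_observed=MAX_OBSERVED_SALBEG):
    """True when the input lies above the largest observed beginning salary."""
    return salbeg > max_observed

start = 20000
print(f"Predicted salary now: {predict_salnow(start):.3f}")
print(f"Outside observed range: {is_extrapolation(start)}")
```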
Task 3: Probability
1. Use Excel to construct a pivot table (two-way table) of Sex and Educational Level. Submit this table. Then submit the table as if it was to be included in a page of a report: that is, it must be completely self-explanatory with titles etc.
Row Labels     Sum of SEX
0              30
1              128
2              33
3              24
4              1
5              0
Grand Total    216
2. If a member of staff is selected at random, what is the probability that they are:
(a) In the Professional Qualification category.
4%
(b) In the No Formal Qualification category.
11%
(c) Female.
67.90%
Row Labels     Sum of the EDLEVEL
0              67.90%
1              32.10%
Grand Total    100.00%
(d) Male.
32.10%
(e) In the Professional Qualification category given that they are female.
P(professional qualification) and P(female)
= 4.22/100 x 67.90/100
= Probability that event A and event B occur, P(A∪B): 0.692482
(f) In the Professional Qualification category given that they are male.
P(A∩B): 0.01344
(g) In the No Formal Qualification given that they are female.
0.07579872
(h) In the No Formal Qualification category given that they are male.
0.035712
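A conditional probability such as P(category | sex) is the number of staff in both groups divided by the number in the conditioning group, P(A | B) = n(A and B) / n(B). The sketch below applies that definition to the pivot table of Sex and Educational Level. It assumes that "Sum of SEX" equals a head count for the group coded SEX = 1, and it leaves the EDLEVEL codes unnamed because the report does not say which code corresponds to which qualification.

```python
# Pivot-table values: EDLEVEL code -> Sum of SEX.
# Assumption: SEX is coded 0/1, so each sum is the head count of the SEX = 1 group.
counts_sex1 = {0: 30, 1: 128, 2: 33, 3: 24, 4: 1, 5: 0}

def conditional_probability(joint_count, condition_count):
    """P(A | B) = n(A and B) / n(B)."""
    if condition_count == 0:
        raise ValueError("conditioning group is empty")
    return joint_count / condition_count

group_total = sum(counts_sex1.values())
p_level_given_sex1 = {
    level: conditional_probability(count, group_total)
    for level, count in counts_sex1.items()
}

for level, prob in p_level_given_sex1.items():
    print(f"P(EDLEVEL = {level} | SEX = 1) = {prob:.4f}")
```

The same function gives the probabilities for the other sex once the matching counts (category total minus the SEX = 1 count) are available.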
3. Is Educational Level independent of Sex in this company? Explain your answer.
Educational level is dependent on sex in the company. Most of the workers with professional qualifications are highly educated and are mostly men rather than women. This means that most of the people in positions of power were men, while most women worked menial jobs in the company. This finding indicates a high level of gender inequality that should be addressed by top management. It is possible that management only appoints the most qualified person to the higher positions and that, unfortunately, the women are not as highly educated. As usual, correlation does not mean causality. Additionally, there is also significant income disparity, as the range between the highest earners and the lowest earners is quite large. A range of $50,640 is too large for a company. The income gap should be reduced by either increasing the salaries of low-income earners or reducing the salaries of the highest income earners.
Task 4: Probability Distributions
1. Which variables in the file are continuous?
Only AGE is a continuous variable.
2. A new office, within the company, is to be established with eight occupants. Using the proportion of males to females in the company, and assuming an independent random selection of each office member with constant probabilities, draw up a probability distribution for the number of females in the office.
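Under the stated assumptions (independent selections, constant probability), the number of females among the eight occupants follows a binomial distribution, P(X = k) = C(8, k) p^k (1 - p)^(8 - k). The sketch below tabulates it. The value p = 216/474 is an assumption based on the figure of 216 female employees quoted earlier in the report.

```python
from math import comb

def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Assumption: 216 of the 474 employees are female.
P_FEMALE = 216 / 474
OFFICE_SIZE = 8

distribution = binomial_pmf(OFFICE_SIZE, P_FEMALE)
for k, prob in enumerate(distribution):
    print(f"P({k} females) = {prob:.4f}")
```

The nine probabilities sum to 1, which is a quick check that the table is complete.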
3. The Personnel Manager of the company has said: "salaries are fairly distributed in this firm over the whole salary range for both men and women, and we guarantee that (eventually) everyone will have a salary above 15,000".
Assume that by 'fairly distributed' the personnel manager means 'normally distributed'. Given this assumption, what would be the probability now of earning at least £12170.88 for males and females (separately)? What are the actual numbers (males and females) earning at least £12170.88? Does a comparison of these calculations support the personnel manager's statement? Explain your answer.
The proportions of males and females likely to earn at least $15,000 differ: only 40% of males and 64 females are likely to earn more than $15,000. The proportions likely to earn more than £12170.88 are 48% of males and 65% of females. The comparison of these calculations does not support the personnel manager's statement, because the relationship is not linear enough to be predictable. There are other extraneous factors that can influence the salary now of men and women. In an ideal situation, with no other factors affecting relationships and decisions at work, the statement might come true.
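Under the normality assumption, the probability of earning at least a given amount is the upper tail of a normal distribution, P(X >= x) = 1 - Φ((x - mean) / sd). The report does not give the mean and standard deviation of salary now separately for males and females, so the sketch below takes them as parameters. For illustration it runs on the overall figures only: the mean salary now of 13819.093, and a standard deviation derived from the total sum of squares (21985717118) in the Salary Now regression output with 474 observations. The per-sex inputs would have to come from the data.

```python
from math import sqrt
from statistics import NormalDist

def prob_at_least(threshold, mean, std_dev):
    """P(X >= threshold) for X ~ Normal(mean, std_dev)."""
    return 1.0 - NormalDist(mu=mean, sigma=std_dev).cdf(threshold)

# Overall figures quoted in the report (not split by sex).
OVERALL_MEAN = 13819.093
TOTAL_SS = 21985717118   # Total SS of SALNOW from the ANOVA table
N = 474
OVERALL_SD = sqrt(TOTAL_SS / (N - 1))  # sample standard deviation

THRESHOLD = 12170.88
p_overall = prob_at_least(THRESHOLD, OVERALL_MEAN, OVERALL_SD)
print(f"Overall sd = {OVERALL_SD:.2f}")
print(f"P(salary now >= {THRESHOLD}) = {p_overall:.4f}")
```

Calling `prob_at_least` once with the male mean and standard deviation and once with the female figures gives the two probabilities the question asks for, which can then be compared with the observed proportions.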
Reflection
The statistics and task report reveals that the company is considerably large, with a significant number of employees who are well paid. Despite the development achieved by the company, it has some challenges affecting its system, most of which influence the workers directly, for instance the gender inequality issue. The use of various statistical presentation methods makes the report easy to understand. For example, the pie chart, bar chart, histogram, and tables provide detailed information in a relatively small space. The histogram reveals the distribution of salaries among the men and women, and there is a plan that soon everyone will have a salary of more than a given figure. On this note, the Personnel Manager of the company reports that salaries are fairly distributed in this firm over the whole salary range, and therefore guarantees that everyone will soon have a salary above £12170.88. It is, however, a challenge for the manager to prove that the implementation would mean a fair distribution of salary among the men and women, owing to the fact that the relationship is not linear and hence cannot be predicted. The number of males and females earning above the mentioned figure does not support the argument given by the manager. The report confirms that the manager has failed to recognize some extraneous factors which may considerably influence the distribution and the subsequent increment of salaries for the men and women. However, in an ideal situation where such factors are absent, the manager's statement might still come true.
Bibliography
Bajpai, N. (2011). Business research methods. 1st ed. Delhi: Pearson.
Balsley, H., Clover, V. and Clover, V. (1988). Research for business decisions. 1st ed. Columbus, Ohio: Publishing Horizons.
Chesneau, C. (2007). Regression with random design: A minimax study. Statistics & Probability Letters, 77(1), pp.40-53.
Clover, V. and Balsley, H. (1979). Business research methods. 1st ed. Columbus, Ohio: Grid Pub.
Decker, S., Kipping, M. and Wadhwani, R. (2015). New business histories! Plurality in business history research methods. Business History, 57(1), pp.30-40.
Eriksson, P. and Kovalainen, A. (2008). Qualitative methods in business research. 1st ed. Los Angeles: SAGE.
Eryilmaz, S. (2011). Joint distribution of run statistics in partially exchangeable processes. Statistics & Probability Letters, 81(1), pp.163-168.
Feller, W. (1957). An introduction to probability theory and its applications. 1st ed. New York: John Wiley & Sons, Inc.
Fuller, C., Simmering, M., Atinc, G., Atinc, Y. and Babin, B. (2016). Common methods variance detection in business research. Journal of Business Research, 69(8), pp.3192-3198.
Harrison, R. (2013). Using mixed methods designs in the Journal of Business Research, 1990-2010. Journal of Business Research, 66(11), pp.2153-2162.
Krishnaswamy, O. and Satyaprasad, B. (2010). Business research methods. 1st ed. Mumbai [India]: Himalaya Pub. House.
Marcoulides, G. (1998). Modern methods for business research. 1st ed. Mahwah, N.J.: Lawrence Erlbaum.
Oppewal, H., Louviere, J. and Timmermans, H. (2000). Modifying Conjoint Methods to Model Managers' Reactions to Business Environmental Trends. Journal of Business Research, 50(3), pp.245-257.
Saunders, M., Lewis, P. and Thornhill, A. (2007). Research methods for business students. 1st ed. Harlow, England: Financial Times/Prentice Hall.
Zikmund, W. (2003). Business research methods. 1st ed. Mason, OH: Thomson/South-Western.
Cite this page
Statistics tasks and report. (2018, Feb 07). Retrieved from https://speedypaper.com/essays/102statisticstasksandreport