Applied Statistical Methods 1000
Assignments

Rules

  • All assignments should be your individual work; otherwise, points will be deducted. [Students who wish to work together on homework must request my permission to do so in advance.]
  • Because answer keys are made available after homework is turned in, late homeworks will not be accepted. In a valid emergency, your recitation instructor may make an exception.
  • Your homework should be neat and well-organized. Show your work and circle your answers. Your recitation instructor is a student like you and will not take time to decipher poor handwriting, put pages in order, or read notes scrawled in margins.
  • Be sure to write or print your name at the top of the first page of your homework. Put your name or initials at the top of each additional sheet of paper or computer output. Staple your pages together.
  • Answer keys are placed on file in the Math-Stat Library (4th floor Thackeray) on Mondays after assignments are handed in. They are on two-hour reserve so that you can take them out to be copied.
  • Computer output must be circled/underlined and explained in order to receive full credit. Hand in printouts of the Session Window and/or graphs, not the worksheet of data values.
  • For any Exercise that requires you to find an article or internet report, hand in a copy of the article or report with your work.

Homework 0 Due in lecture January 9. Points shown total 2.

Exercise: Hand in an article or report about a statistical study; tell what variable or variables are involved and whether they are quantitative or categorical. If there are two variables, tell which is explanatory and which is response.

Homework 00 Due in lecture January 16. Points shown total 6.

Exercise: Pick a quantitative variable from those in the survey. Use MINITAB to display the variable's values with all three graphs discussed: a dotplot, a histogram, and a stemplot. Report the median for center, range for spread, and describe the shape. Be sure to mention if there are outliers.
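If you want to double-check MINITAB's stemplot by hand, here is a minimal Python sketch of the idea. It assumes whole-number values split into a tens-digit stem and a units-digit leaf; MINITAB chooses its own leaf unit and adds a depth column, so its display may differ. The function name `stemplot` and the data values are hypothetical.

```python
import statistics


def stemplot(values):
    """Return the lines of a stem-and-leaf plot for non-negative whole numbers.

    Assumes stem = tens digit(s) and leaf = units digit.
    """
    data = sorted(values)
    low_stem, high_stem = data[0] // 10, data[-1] // 10
    width = len(str(high_stem))
    lines = []
    for stem in range(low_stem, high_stem + 1):
        # Show every stem in the range, even empty ones, so gaps stay visible.
        leaves = "".join(str(v % 10) for v in data if v // 10 == stem)
        lines.append(f"{stem:>{width}} | {leaves}".rstrip())
    return lines


scores = [72, 61, 85, 70, 68, 63, 72]  # hypothetical values
print("\n".join(stemplot(scores)))
print("median:", statistics.median(scores), "range:", max(scores) - min(scores))
```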

Exercise: Consider the values of one quantitative variable in our survey compared for two categorical groups. First, state your expectations about how the quantitative values would compare for the two groups. Then use MINITAB to get side-by-side boxplots and report the Five Number Summary for each. Tell how their centers, spreads, and shapes compare. Use the 1.5*IQR Rule to report the boundaries for low and high outliers in both groups, and tell whether there are any outliers according to the Rule.
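The Five Number Summary and the 1.5*IQR Rule can be checked with a short Python sketch. It uses `statistics.quantiles(..., method="exclusive")`, which interpolates at the (n+1)p positions; this appears to match MINITAB's quartiles, but the hand method in the textbook may give slightly different Q1 and Q3 for small samples. The function names and data values are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles from (n+1)p interpolation."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return data[0], q1, median, q3, data[-1]


def outlier_fences(values):
    """1.5*IQR Rule boundaries: Q1 - 1.5*IQR (low) and Q3 + 1.5*IQR (high)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values):
    """Values falling beyond either fence."""
    low, high = outlier_fences(values)
    return [v for v in values if v < low or v > high]


group = [2, 4, 4, 5, 6, 7, 8, 20]  # hypothetical values for one group
print(five_number_summary(group))  # (2, 4.0, 5.5, 7.75, 20)
print(outlier_fences(group))       # (-1.625, 13.375)
print(outliers(group))             # [20]
```

Run it once per group and compare the two summaries side by side, as the boxplots do.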

Exercise: Find an article or report about an experiment. Tell what the variables of interest are, whether they are quantitative or categorical, and which is explanatory and response. Describe the subjects, treatments, whether or not the study was blind, etc.

Homework 1 Due in lecture January 23. Points shown total 20.5.

[1 pt.] 1.7(a)(b) (page 9)
[1 pt.] 1.10 (page 9)
[1.5 pts.] 1.12 (page 9)
[1.5 pts.] 1.17 (page 10)
[.5 pt.] 1.18(a) (page 10)
[1.5 pts.] 2.5(b)(c)(d) (page 48)
[2 pts.] 2.7 (page 48)
[2.5 pts.] 2.16 (page 49)
[1 pt.] 2.26 (page 51)
[2.5 pts.] 2.41 (page 52)
[.5 pt.] 2.54(c) (page 53)
[1.5 pts.] 2.56(b)(c)(d) (page 53)
[1 pt.] 2.57(b)(c) (page 53)
[.5 pt.] 2.84(b) (page 55)
[2 pts.] Exercise   Find an article or report about an observational study. Tell what the variables of interest are, whether they are quantitative or categorical, and which is explanatory and which is response (if there are two variables). Are there any potential confounding variables that should have been controlled for? Are there any other pitfalls of concern?

Homework 2 Due in lecture January 30. Points shown total 17.5.

[.5 pt.] 3.9 (page 82)
[.5 pt.] 3.10 (refer to 3.9(b)) (page 82)
[.5 pt.] 3.11 (refer to 3.9(b)) (page 82)
[.5 pt.] 3.27(b) (page 83)
[.5 pt.] 3.39 (page 84)
[.5 pt.] 3.40 (page 84)
[1 pt.] 3.54 (page 85)
[.5 pt.] 3.59 (page 86)
[1 pt.] 3.62 (page 86)
[1 pt.] 4.4 (page 121)
[1.5 pts.] 4.7 (page 121)
[.5 pt.] 4.10 (page 122)
[.5 pt.] 4.20 (page 123)
[1 pt.] 4.31 (page 124)
[1 pt.] 4.38 (page 124)
[1 pt.] 4.57(a)(d) (page 125)
[1.5 pts.] 4.81 (page 128)
[2 pts.] Exercise   Find an article or internet report about a sample survey. Tell whether the variable(s) of interest are quantitative or categorical. Then tell how the individuals were selected and whether or not you believe they adequately represent the population of interest. Discuss whether any of the 5 common problems in the selection process (using the wrong sampling frame, etc.) apply, or if any of the 7 pitfalls in the surveying process (deliberate bias, etc.) apply. Were the questions open or closed?
[2 pts.] Exercise   Pick two quantitative variables from our survey, decide on roles of explanatory and response, and tell what you expect to see in terms of their relationship. Use MINITAB to explore the relationship between them: start by assessing the scatterplot. Be sure to mention direction, form, strength, and outliers. Your summary should tell the value of the correlation r and the equation of the regression line, if the form appeared linear. Summarize your findings in the context of the specific variables chosen.
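For a numerical check on MINITAB's correlation and regression output, the sketch below computes r and the least-squares line y-hat = a + b x from sums of deviations. This is the standard formula (equivalent to r = (sum of z_x z_y)/(n - 1) and b = r s_y / s_x); the function name and data are hypothetical, and r and the line should only be reported when the scatterplot looks linear.

```python
import math


def correlation_and_line(x, y):
    """Return (r, a, b), where the least-squares line is y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx            # slope, equal to r * s_y / s_x
    a = y_bar - b * x_bar    # intercept: the line passes through (x-bar, y-bar)
    return r, a, b


x = [1, 2, 3, 4, 5]   # hypothetical explanatory values
y = [2, 4, 5, 4, 5]   # hypothetical response values
r, a, b = correlation_and_line(x, y)
print(f"r = {r:.3f}; y-hat = {a:.2f} + {b:.2f} x")  # r = 0.775; y-hat = 2.20 + 0.60 x
```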

Homework 3 Due in lecture February 6. Points shown total 22.

[1 pt.] 5.1(c)(d) (page 161)
[1.5 pts.] 5.3(b)(c)(d) (page 161)
[1 pt.] 5.4 (page 162)
[1 pt.] 5.10 (page 163)
[.5 pt.] 5.23 (page 164)
[.5 pt.] 5.32 (page 165)
[6 pts.] 5.55 (page 167) Use MINITAB; mark and hand in relevant output along with specific answers to textbook questions.
[1 pt.] 6.3(d)(e) (page 193)
[2.5 pts.] 6.7 (page 194)
[.5 pt.] 6.15 (page 194)
[1.5 pts.] 6.25 (page 195)
[1 pt.] 6.50 (page 199)
[2 pts.] Exercise   Read the article "Boys spur marriage" and complete a two-way table for this study consistent with all the numbers reported. Assume that the 600 children are equally divided between boys and girls, and assume that half of the fathers of girls ended up marrying the mother. If the proportion marrying the mother is 42% higher in the case of boys, how many fathers of boys would that be? Use a chi-square procedure to tell whether the difference observed is statistically significant.
[2 pts.] Exercise   Pick two categorical variables from our survey, decide which should be explanatory and response, and discuss if and how you expect them to be related. Then analyze the relationship between them: compare conditional percentages in the response category of interest and tell whether the observed difference seems to you to be significant. Then compute a table of counts expected if the variables were not related, and compute the chi-square statistic. Use Table A.5 to tell whether there is a statistically significant relationship.
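The expected counts and chi-square statistic described above (and in the "Boys spur marriage" Exercise) follow a fixed recipe: each expected count is (row total)(column total)/(grand total), and the statistic sums (observed - expected)^2/expected over all cells. A minimal Python sketch, with a hypothetical 2x2 table and hypothetical function names:

```python
def expected_counts(table):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


# Hypothetical counts: rows are explanatory groups, columns are response yes/no.
observed = [[30, 20],
            [20, 30]]
print(expected_counts(observed))  # [[25.0, 25.0], [25.0, 25.0]]
print(chi_square(observed))       # 4.0
```

A table with r rows and c columns has (r - 1)(c - 1) degrees of freedom, so a 2x2 table has 1; compare the statistic with Table A.5 (for 1 degree of freedom the 5% cutoff is about 3.84).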

Extra Credit 1 Due in lecture February 6. Worth 5 pts.

Students in a class were classified according to whether their major was undecided or not, and whether they lived on or off campus. 40 students lived off campus and had a decided major; 10 students lived off campus and had an undecided major. 24 students lived on campus and had a decided major; 26 students lived on campus and had an undecided major.
  1. First analyze the relationship:
    1. Complete a two-way table for the data.
    2. Which group has a higher proportion living on campus: the decided or the undecided majors?
    3. Compute a table of counts expected if there were no relationship between living situation and major decided or not.
    4. Calculate the chi-square statistic.
    5. Which one of the following is the best way to summarize the situation? (i) There is no statistically significant relationship between living situation and major being decided or not. (ii) Year at Pitt is a confounding variable in the relationship between living situation and major decided or not. (iii) Living on campus prevents students from deciding on a major. (iv) Deciding on a major causes students to move off campus.
  2. Now create two separate two-way tables for "underclassmen" and "upperclassmen" whose counts add up to those in the original table, but neither of which shows a significant relationship between living situation and major being decided or not. In other words, create a scenario that demonstrates Simpson's Paradox.
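One way to check a proposed Simpson's Paradox scenario is to confirm that the two layers add up to the combined table and then compare conditional proportions within each layer. The Python sketch below does this with made-up counts that are deliberately different from the exercise's; rows are (off campus, on campus) and columns are (decided, undecided), and the function names are hypothetical. In each hypothetical layer, decided and undecided majors live on campus at the same rate, so neither layer shows any relationship, yet the combined table does.

```python
def add_tables(t1, t2):
    """Cell-by-cell sum of two tables with the same shape."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(t1, t2)]


def proportion_in_row(table, row):
    """For each column, the proportion of that column's total falling in `row`."""
    return [table[row][j] / sum(r[j] for r in table) for j in range(len(table[0]))]


ON_CAMPUS = 1                # row index; row 0 is off campus
under = [[5, 5], [20, 20]]   # hypothetical underclassmen
upper = [[30, 3], [10, 1]]   # hypothetical upperclassmen
combined = add_tables(under, upper)

print(combined)                                # [[35, 8], [30, 21]]
print(proportion_in_row(under, ON_CAMPUS))     # [0.8, 0.8]
print(proportion_in_row(upper, ON_CAMPUS))     # [0.25, 0.25]
print(proportion_in_row(combined, ON_CAMPUS))  # about [0.46, 0.72]
```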

Homework 4 Due in lecture February 13. Points shown total 13.5.

[1.5 pts.] 7.17 (page 241)
[1 pt.] 7.18 (page 241)
[1 pt.] 7.25(c)(d) (page 242)
[2 pts.] 7.34 (page 242)
[1.5 pts.] 7.78 (page 246)
[1 pt.] 7.83 (page 246)
[.5 pt.] 7.85 (page 246)
[.5 pt.] 7.93 (page 247)
[.5 pt.] 7.94 (page 247)
[2 pts.] Exercise   Write up and email me (directly, not as an attachment) a personal coincidence story that happened to you. Were the occurrences really so unlikely?
[2 pts.] Exercise   Use the class survey to report the probability distribution of year for the surveyed undergraduates (years 1, 2, 3, and 4). [You will need to tally the years and adjust the total to exclude "other" students.] Find the mean, variance, and standard deviation. Use mean and standard deviation in a sentence about the distribution of year in order to tell what is typical for surveyed students.
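The mean, variance, and standard deviation of a discrete probability distribution are weighted sums: mean = sum of x p(x), and variance = sum of (x - mean)^2 p(x). A minimal Python sketch, with hypothetical tallies standing in for the survey counts (leave "other" students out of the tallies, as the Exercise says); the function name is hypothetical.

```python
import math


def distribution_summary(tallies):
    """tallies maps each value (here, year 1-4) to how many students reported it."""
    total = sum(tallies.values())
    probs = {x: count / total for x, count in tallies.items()}
    mean = sum(x * p for x, p in probs.items())
    variance = sum((x - mean) ** 2 * p for x, p in probs.items())
    return probs, mean, variance, math.sqrt(variance)


# Hypothetical tallies; "other" students are already excluded.
probs, mean, variance, sd = distribution_summary({1: 40, 2: 30, 3: 20, 4: 10})
print(probs)
print(f"mean {mean:.2f}, variance {variance:.2f}, sd {sd:.2f}")
```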

Homework 5 Due in lecture February 20. Points shown total 17.5.

[1 pt.] 8.6 (page 285)
[1 pt.] 8.7(b)(c) (page 285)
[1.5 pts.] 8.18 (page 286)
[.5 pt.] 8.27 (page 287)
[1 pt.] 8.34(a)(b) (page 287)
[1.5 pts.] 8.43(a)(c)(d) (page 288)
[1 pt.] 8.44(f)(g) (page 288)
[1.5 pts.] 8.45 (page 288)
[1.5 pts.] 8.49(b)(c)(d) (page 288)
[1.5 pts.] 8.50 (page 288)
[1 pt.] 8.51(b)(c) (page 288)
[1 pt.] 8.53(c)(d) (page 289)
[1 pt.] 8.54(d)(e) (page 289)
[.5 pt.] 8.56 (page 289)
[1 pt.] 8.60(a)(b) (page 289)
[1 pt.] 8.62 (page 289)

Homework 6 Due in lecture February 27. Points shown total 17.5.

[.5 pt.] 9.6(a) (page 319)
[1 pt.] 9.12(c)(d) (page 319)
[.5 pt.] 9.13 (page 319)
[1 pt.] 9.17(b)(c) (page 320)
[1 pt.] 9.28(b)(d) (page 321)
[.5 pt.] 9.30 (page 321)
[1 pt.] 9.45(c)(d) (page 322)
[1 pt.] 9.47(c)(d) (page 323)
[1.5 pts.] 9.55(b)(c)(d) (page 323)
[.5 pt.] 9.56 (page 323)
[1.5 pts.] 9.69 (page 324)
[1.5 pts.] 9.70 (page 324)
[2 pts.] Exercise   Assume the proportion of females in all intro Stat classes is p=.5. What are the mean and standard deviation of the sample proportion, if the population proportion were indeed .5? Use our class survey responses to find the sample proportion of females in the survey. Then use a normal approximation to find the probability of a sample proportion as high as the one observed, if the population proportion were truly .5. Characterize the results, based on your probability, in words such as "not unusual", "unlikely", "almost impossible", etc. Finally, tell whether you believe p is .5.
[2 pts.] Exercise   If students each picked a number truly at random from 1 to 20, then their responses would follow a "uniform distribution", with each of the numbers appearing with probability 1/20=.05. It can be shown that the mean of all the numbers between 1 and 20 is 10.5, and the standard deviation is 5.77. What are the mean and standard deviation of the sample mean of the selections for a sample of 400? students, if their selections are truly random? Use our class survey responses to find the sample mean "random" number selected. Then use a normal approximation to find the probability of a sample mean as high as the one observed, if the population mean were truly 10.5. Characterize the results, based on your probability, in words such as "not unusual", "unlikely", "almost impossible", etc. Finally, tell whether you have statistical evidence of bias in favor of higher numbers.
[2 pts.] Exercise   Find an article or report that includes mention of sample size and summarizes values of a categorical variable with a count, proportion, or percentage. Based on that information, set up a 95% confidence interval for population proportion in the category of interest.
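The two normal-approximation Exercises above share one recipe: find the mean and standard deviation of the statistic (p and sqrt(p(1 - p)/n) for a sample proportion; mu and sigma/sqrt(n) for a sample mean), then find the area above the observed value. A minimal Python sketch using the standard library's `statistics.NormalDist` (Python 3.8 or later); the function names are hypothetical, and the observed values and sample sizes are placeholders, not our survey results.

```python
import math
from statistics import NormalDist


def prob_phat_at_least(p_hat, p, n):
    """P(sample proportion >= p_hat) under the normal approximation."""
    sd = math.sqrt(p * (1 - p) / n)
    return 1 - NormalDist(mu=p, sigma=sd).cdf(p_hat)


def prob_xbar_at_least(x_bar, mu, sigma, n):
    """P(sample mean >= x_bar) under the normal approximation."""
    sd = sigma / math.sqrt(n)
    return 1 - NormalDist(mu=mu, sigma=sd).cdf(x_bar)


# Placeholder inputs: substitute the survey's sample size and observed statistic.
print(prob_phat_at_least(p_hat=0.60, p=0.5, n=100))                  # z = 2, about .023
print(prob_xbar_at_least(x_bar=11.077, mu=10.5, sigma=5.77, n=400))  # z = 2, about .023
```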

Extra Credit 2 Due in lecture March 15. Worth 5 pts.

Extra Credit Exercises 2 through 10 are based on Fall 2003 Survey data survey9-21-03.txt, which is taken to be our population. To download it into MINITAB, type ctrl A to highlight, ctrl C to copy, start up MINITAB, type ctrl V to paste it. If it asks about delimiters, click OK. Do not hand in all the details of the Session Window; please just include the relevant descriptive summaries, graphs, and your answers to all questions posed. The purpose of this exercise is to explore how sample size affects the distribution of sample proportion.

  1. First verify that the population of categorical values for the variable "live" is very symmetric, with equal proportions of the two possible values "off" and "on":
    • Stat>Tables>Tally
    • Variables Live
    • Display Counts and Percents
  2. Next take repeated small samples (size 10) from the population of values for the variable "live", for which the population proportion p living off campus you have verified to be approximately 50% or .5. [About half of our population of students live off campus, the other half on campus.] Our theory about the behavior of sampling distributions is for an infinite number of repetitions, but for practical purposes you will take 20 random samples altogether.
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Live
    • Store samples in Livesmallsample1
    • Stat>Tables>Tally
    • Variables Livesmallsample1
    • Display Counts and Percents
    • Create a column called "phatliven=10" and type in the sample proportion living off campus (for example, .6 if the sample proportion is 60%)
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Live
    • Store samples in Livesmallsample2 [The easiest way to do this is to simply change the "1" in the variable name to a "2".]
    • Stat>Tables>Tally
    • Variables Livesmallsample2 [Again, just change the "1" to a "2".]
    • Display Counts and Percents
    • Type the second sample proportion as the second entry in the column "phatliven=10"
    • Repeat the process above 20 times altogether, finishing with "Livesmallsample20", for which the proportion living off campus will be the 20th entry in "phatliven=10"
  3. Finally, obtain summaries and display:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable phatliven=10
    • Graph>Stem-and-Leaf
    • Enter the variable phatliven=10
    • Summarize the distribution of sample proportion for samples of size 10 by reporting center, spread, and shape.
  4. Now take repeated large samples (size 40) from the population of values for the variable "live" (20 samples altogether):
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Live
    • Store samples in Livelargesample1
    • Stat>Tables>Tally
    • Variables Livelargesample1
    • Display Counts and Percents
    • Create a column called "phatliven=40" and type in the sample proportion living off campus (for example, .525 if the sample proportion is 52.5%)
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Live
    • Store samples in Livelargesample2
    • Stat>Tables>Tally
    • Variables Livelargesample2
    • Display Counts and Percents
    • Type the second sample proportion as the second entry in the column "phatliven=40"
    • Repeat the process above 20 times altogether, finishing with "Livelargesample20", for which the proportion living off campus will be the 20th entry in "phatliven=40"
  5. Finally, obtain summaries and display:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable phatliven=40
    • Graph>Stem-and-Leaf
    • Enter the variable phatliven=40
    • Summarize the distribution of sample proportion for samples of size 40 by reporting center, spread, and shape.
  6. Lastly, and most importantly, compare the centers, spreads, and shapes for samples of size 10 vs. 40.
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variables phatliven=10 and phatliven=40
    • Stat>Basic Statistics>2-sample t
    • Activate "Samples in Different Columns"
    • Enter the variables phatliven=10 and phatliven=40
    • Check Graph>Boxplots of Data
    Write a few sentences to compare the distribution of sample proportion for small vs. large samples, including mention of center, spread, and shape. Are your results consistent with the theory presented in Chapter 9?
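If you want to see the same experiment with many more repetitions than 20, the sketch below imitates Calc>Random Data>Sample from Columns in Python. It assumes MINITAB samples rows without replacement (its default, as far as I know), so it uses `random.Random.sample`; the population is a hypothetical stand-in with 200 "off" and 200 "on" values rather than the actual survey file, and the function name is made up. For p = .5, theory predicts a standard deviation of about .158 for n = 10 and about .079 for n = 40.

```python
import math
import random
import statistics


def simulate_phats(population, n, reps=20, success="off", seed=None):
    """Draw `reps` samples of size n without replacement; return each sample proportion."""
    rng = random.Random(seed)
    phats = []
    for _ in range(reps):
        sample = rng.sample(population, n)
        phats.append(sample.count(success) / n)
    return phats


# Hypothetical population: 200 "off" and 200 "on" (not the survey file).
population = ["off"] * 200 + ["on"] * 200
p = population.count("off") / len(population)

for n in (10, 40):
    phats = simulate_phats(population, n, seed=1)
    print(f"n = {n}: mean {statistics.mean(phats):.3f}, "
          f"sd {statistics.stdev(phats):.3f}, "
          f"theory sd {math.sqrt(p * (1 - p) / n):.3f}")
```

Raising `reps` to 1000 or so should make the simulated spread settle close to the theoretical value (slightly below it, because the population is finite).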

Extra Credit 3 Prerequisite: Extra Credit 2. Due in lecture March 15. Worth 5 pts.

The purpose of this exercise is to explore how population shape affects the distribution of sample proportion.

  1. First verify that the population of categorical values for the variable "handed", for which we are interested in the proportion who are ambidextrous, is very skewed, with only about 3% (.03) who are ambidextrous; the remaining 97% favor either the right or the left hand.
    • Stat>Tables>Tally
    • Variables Handed
    • Display Counts and Percents
  2. Next take repeated small samples (size 10) from the population of values for the variable "handed", for which the population proportion p who are ambidextrous you have verified to be approximately 3% or .03. Our theory about the behavior of sampling distributions is for an infinite number of repetitions, but for practical purposes you will take 20 random samples altogether.
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Handed
    • Store samples in Handedsmallsample1
    • Stat>Tables>Tally
    • Variables Handedsmallsample1
    • Display Counts and Percents
    • Create a column called "phathandedn=10" and type in the sample proportion who are ambidextrous (for example, .1 if the sample proportion is 10%, or 0 if the sample only contains right- and left-handed people)
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Handed
    • Store samples in Handedsmallsample2 [The easiest way to do this is to simply change the "1" in the variable name to a "2".]
    • Stat>Tables>Tally
    • Variables Handedsmallsample2 [Again, just change the "1" to a "2".]
    • Display Counts and Percents
    • Type the second sample proportion as the second entry in the column "phathandedn=10"
    • Repeat the process above 20 times altogether, finishing with "Handedsmallsample20", for which the proportion who are ambidextrous will be the 20th entry in "phathandedn=10"
  3. Finally, obtain summaries and display:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable phathandedn=10
    • Graph>Stem-and-Leaf
    • Enter the variable phathandedn=10
    • Summarize the distribution of sample proportion for samples of size 10 by reporting center, spread, and shape.
  4. Now take repeated large samples (size 40) from the population of values for the variable "handed" (20 samples altogether):
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Handed
    • Store samples in Handedlargesample1
    • Stat>Tables>Tally
    • Variables Handedlargesample1
    • Display Counts and Percents
    • Create a column called "phathandedn=40" and type in the sample proportion who are ambidextrous (for example, .075 if the sample proportion is 7.5%)
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Handed
    • Store samples in Handedlargesample2
    • Stat>Tables>Tally
    • Variables Handedlargesample2
    • Display Counts and Percents
    • Type the second sample proportion as the second entry in the column "phathandedn=40"
    • Repeat the process above 20 times altogether, finishing with "Handedlargesample20", for which the proportion who are ambidextrous will be the 20th entry in "phathandedn=40"
  5. Finally, obtain summaries and display:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable phathandedn=40
    • Graph>Stem-and-Leaf
    • Enter the variable phathandedn=40
    • Summarize the distribution of sample proportion for samples of size 40 by reporting center, spread, and shape.
  6. Next compare the centers, spreads, and shapes for samples of size 10 vs. 40.
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variables phathandedn=10 and phathandedn=40
    • Stat>Basic Statistics>2-sample t
    • Activate "Samples in Different Columns"
    • Enter the variables phathandedn=10 and phathandedn=40
    • Check Graph>Boxplots of Data
    Are your results consistent with the theory presented in Chapter 9?
  7. Lastly, and most importantly, compare the shapes of the distributions of sample proportion for samples of size 10 coming from "Live" vs. from "Handed" and for samples of size 40 coming from "Live" vs. from "Handed". For which population do the distributions of sample proportion for a given sample size tend to be more normal, for the variable "Live" or for the variable "Handed"?
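Chapter 9's normal approximation for sample proportions comes with a rule of thumb about how many successes and failures to expect. The Python sketch below checks it for the two populations in these exercises. It assumes the common cutoff of 10 for both n*p and n*(1 - p), but textbooks differ (some use 5 or 15), so use the one in Chapter 9; the function name is hypothetical.

```python
def normal_approx_ok(p, n, threshold=10):
    """Rule of thumb: expect at least `threshold` successes and failures."""
    return n * p >= threshold and n * (1 - p) >= threshold


for name, p in (("Live", 0.5), ("Handed", 0.03)):
    for n in (10, 40):
        print(f"{name}, n = {n}: n*p = {n * p:.1f}, "
              f"normal approximation reasonable? {normal_approx_ok(p, n)}")
```

Compare what the rule predicts with the shapes you actually see in your stemplots.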

Extra Credit 4 Due in lecture March 15. Worth 5 pts.

Extra Credit Exercises 2 through 10 are based on Fall 2003 Survey data survey9-21-03.txt, which is taken to be our population. To download it into MINITAB, type ctrl A to highlight, ctrl C to copy, start up MINITAB, type ctrl V to paste it. If it asks about delimiters, click OK. Do not hand in all the details of the Session Window; please just include the relevant descriptive summaries, graphs, and your answers to all questions posed. The purpose of this exercise is to explore how sample size affects the distribution of sample mean.

  1. First verify that our population of quantitative values for the variable "math" has mean mu=610.44 and standard deviation sigma=72.14, and that the shape is quite normal:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Math
    • Graph>Histogram
    • Variables Math
  2. Now take repeated small samples (size 10) from the population of quantitative values for the variable "math". Our theory about the behavior of sampling distributions is for an infinite number of repetitions, but for practical purposes you will take 20 random samples altogether.
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Math
    • Store samples in Mathsmallsample1
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Mathsmallsample1
    • Create a column called "xbarmathn=10" and type in the sample mean Math SAT score
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Math
    • Store samples in Mathsmallsample2
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Mathsmallsample2
    • Type the second sample mean as the second entry in the column "xbarmathn=10"
    • Repeat the process above 20 times altogether, finishing with "Mathsmallsample20", for which the sample mean Math SAT score will be the 20th entry in "xbarmathn=10"
  3. Finally, obtain summaries and display:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable xbarmathn=10
    • Graph>Stem-and-Leaf
    • Enter the variable xbarmathn=10
    • Summarize the distribution of sample mean for samples of size 10 by reporting center, spread, and shape.
  4. Now take repeated large samples (size 40) from the population of values for the variable "math" (20 samples altogether):
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Math
    • Store samples in Mathlargesample1
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Mathlargesample1
    • Create a column called "xbarmathn=40" and type in the sample mean Math SAT score
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Math
    • Store samples in Mathlargesample2
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Mathlargesample2
    • Type the second sample mean as the second entry in the column "xbarmathn=40"
    • Repeat the process above 20 times altogether, finishing with "Mathlargesample20", for which the sample mean Math SAT score will be the 20th entry in "xbarmathn=40"
  5. Finally, obtain summaries and display:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable xbarmathn=40
    • Graph>Stem-and-Leaf
    • Enter the variable xbarmathn=40
    • Summarize the distribution of sample mean for samples of size 40 by reporting center, spread, and shape.
  6. Lastly, and most importantly, compare the centers, spreads, and shapes for samples of size 10 vs. 40.
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variables xbarmathn=10 and xbarmathn=40
    • Stat>Basic Statistics>2-sample t
    • Activate "Samples in Different Columns"
    • Enter the variables xbarmathn=10 and xbarmathn=40
    • Check Graph>Boxplots of Data
    Are your results consistent with the theory presented in Chapter 9? Write a paragraph to explain your answer.
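Chapter 9's theory for sample means says x-bar centers at mu with standard deviation sigma/sqrt(n), so quadrupling n from 10 to 40 should cut the spread in half. The Python sketch below computes these theoretical values from the population figures in step 1 and can set them beside the 20 sample means you collected; the function names are hypothetical.

```python
import math
import statistics


def sd_of_sample_mean(sigma, n):
    """Theoretical standard deviation of x-bar for random samples of size n."""
    return sigma / math.sqrt(n)


def compare_to_theory(xbars, mu, sigma, n):
    """Put the observed center and spread of your sample means beside the theory."""
    return {
        "observed mean": statistics.mean(xbars),
        "theoretical mean": mu,
        "observed sd": statistics.stdev(xbars),
        "theoretical sd": sd_of_sample_mean(sigma, n),
    }


mu, sigma = 610.44, 72.14  # population values given in step 1
for n in (10, 40):
    print(f"n = {n}: x-bar should center near {mu} with sd about "
          f"{sd_of_sample_mean(sigma, n):.2f}")
```

To compare, type the 20 values from xbarmathn=10 into a Python list (say, a hypothetical `xbars_n10`) and call `compare_to_theory(xbars_n10, mu, sigma, 10)`; do the same for n = 40.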

Extra Credit 5 Prerequisite: Extra Credit 4. Due in lecture March 15. Worth 5 pts.

The purpose of this exercise is to explore how population shape affects the distribution of sample mean.

  1. First verify that our population of quantitative values for the variable "Earned" has mean mu=3.776 thousand and standard deviation sigma=6.503 thousand, and that the shape is quite skewed to the right:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Earned
    • Graph>Histogram
    • Variables Earned
  2. Now take repeated small samples (size 10) from the population of quantitative values for the variable "earned". Our theory about the behavior of sampling distributions is for an infinite number of repetitions, but for practical purposes you will take 20 random samples altogether.
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Earned
    • Store samples in Earnedsmallsample1
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Earnedsmallsample1
    • Create a column called "xbarearnedn=10" and type in the sample mean amount earned
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Earned
    • Store samples in Earnedsmallsample2
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Earnedsmallsample2
    • Type the second sample mean as the second entry in the column "xbarearnedn=10"
    • Repeat the process above 20 times altogether, finishing with "Earnedsmallsample20", for which the sample mean amount earned will be the 20th entry in "xbarearnedn=10"
  3. Finally, obtain summaries and display:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable xbarearnedn=10
    • Graph>Stem-and-Leaf
    • Enter the variable xbarearnedn=10
    • Summarize the distribution of sample mean for samples of size 10 by reporting center, spread, and shape.
  4. Now take repeated large samples (size 40) from the population of values for the variable "earned" (20 samples altogether):
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Earned
    • Store samples in Earnedlargesample1
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Earnedlargesample1
    • Create a column called "xbarearnedn=40" and type in the sample mean amount earned
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Earned
    • Store samples in Earnedlargesample2
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Earnedlargesample2
    • Type the second sample mean as the second entry in the column "xbarearnedn=40"
    • Repeat the process above 20 times altogether, finishing with "Earnedlargesample20", for which the sample mean amount earned will be the 20th entry in "xbarearnedn=40"
  5. Finally, obtain summaries and display:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable xbarearnedn=40
    • Graph>Stem-and-Leaf
    • Enter the variable xbarearnedn=40
    • Summarize the distribution of sample mean for samples of size 40 by reporting center, spread, and shape.
  6. Next compare the centers, spreads, and shapes for samples of size 10 vs. 40.
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variables xbarearnedn=10 and xbarearnedn=40
    • Stat>Basic Statistics>2-sample t
    • Activate "Samples in Different Columns"
    • Enter the variables xbarearnedn=10 and xbarearnedn=40
    • Check Graph>Boxplots of Data
    Are your results consistent with the theory presented in Chapter 9? Write a paragraph to explain your answer.
  7. Lastly, and most importantly, compare the shapes of the distributions of sample mean for samples of size 10 coming from "Math" vs. from "Earned" and for samples of size 40 coming from "Math" vs. from "Earned". For which population do the distributions of sample mean for a given sample size tend to be more normal, for the variable "Math" or for the variable "Earned"?

Homework 7 Due in lecture March 5. Points shown total 10.

[1 pt.] 10.1(a)(b) (page 349)
[.5 pt.] 10.11(e) (page 350)
[1.5 pts.] 10.13 (page 350)
[.5 pt.] 10.14 (page 350)
[.5 pt.] 10.19(a) (page 350)
[.5 pt.] 10.24 (page 351)
[1 pt.] 10.27(a)(c) (page 351)
[1 pt.] 10.28 (page 351)
[.5 pt.] 10.31 (page 351)
[.5 pt.] 10.33 (page 351)
[.5 pt.] 10.42 (page 352)
[2 pts.] Exercise   Here is an excerpt from a Pittsburgh Post-Gazette article entitled "Criminal pasts cited for many city school bus drivers": "State auditors checking the records of a random sample of 100 city bus drivers have found that more than a quarter of them had criminal histories. The audit also found that 26 of the drivers were never checked for child abuse histories (in Pennsylvania schools, a mandate for all employees and even some volunteers). In all, the auditors discovered 80 convictions for various offenses among the 100 sampled. Thirty-four of those incidents occurred more than ten years ago, including one rape and four drug offenses. In Pennsylvania, it's perfectly legal for school officials to hire a bus driver with certain convictions that are more than five years old, but that doesn't mean they should, state Auditor General Robert P. Casey Jr. said yesterday in releasing the report. 'No one convicted of rape should be driving a school bus full of children,' said Casey, who also said he was disappointed with the school district's initial response to the audit. 'The General Assembly needs to look at this law,' he said. A series of problems last year with school bus drivers, including a February accident that was nearly fatal to an 8-year-old Elliott girl, prompted Casey to take a closer look at Pittsburgh's staff of 750 drivers, he said. When his office presented their results to school officials about eight months ago, Casey said, 'they were very reluctant to do anything about it,' and sent him only a brief response outlining what steps were being taken to remedy the problems..."
Note that the article states that about 25% in a sample of Pittsburgh school bus drivers had criminal records. Report a 98% confidence interval for the proportion of all Pittsburgh school bus drivers with criminal records. One of the conditions for our approximation is not quite met; what is it?
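The normal-approximation confidence interval for a proportion is p-hat plus or minus z* sqrt(p-hat(1 - p-hat)/n), where z* depends on the confidence level (about 1.96 for 95% and 2.33 for 98%). The minimal Python sketch below computes z* from the standard normal curve rather than a table, so its third decimal may differ slightly from Table values; the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # about 2.326 for 98%
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(p_hat=0.60, n=100, confidence=0.95)  # hypothetical inputs
print(f"95% CI: ({low:.3f}, {high:.3f})")  # 95% CI: (0.504, 0.696)
```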

Homework 8 Due in lecture March 19. Points shown total 14.5.

[1 pt.] 11.1(c)(d) (page 382)
[.5 pt.] 11.6 (page 383)
[1 pt.] 11.10(a)(b) (page 383)
[1.5 pts.] 11.19(a)(b)(c) (page 384)
[3 pts.] 11.27 (page 384)
[.5 pt.] 11.29 (page 385)
[.5 pt.] 11.36 (page 385)
[.5 pt.] 11.45 (page 386)
[2 pts.] 11.55 (page 387)
[2 pts.] Exercise   In a previous Exercise, we explored the sampling distribution of sample proportion of females, when random samples are taken from a population where the proportion of females is .5. We noted the sample proportion of females among surveyed Stats students, and calculated by hand the probability of observing such a high sample proportion, if population proportion were really only .5. We used this probability to decide whether we were willing to believe that population proportion is in fact .5. For this Exercise, address the same question by carrying out a formal hypothesis test using MINITAB. Be sure to specify the appropriate alternative hypothesis. State your conclusions clearly in context.
[2 pts.] Exercise   Refer to the article How not to catch a spy: Use a lie detector, which reports at the bottom of the first column, "Even if the test were designed to catch eight of every 10 spies, it would produce false results for large numbers of people. For every 10,000 employees screened, Fienberg said, eight real spies would be singled out, but 1,598 innocent people would be singled out with them, with no hint of who's a spy and who isn't." Based on this information, set up a two-way table, classifying 10,000 employees as actually being spies or not, and being singled out as a spy by the lie detector or not. Report the probability of a Type I Error and of a Type II Error. If someone is identified by the lie detector as being a spy, what is the probability that he or she is actually a spy?
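The two-way table can be filled in from the article's counts. A minimal Python sketch, assuming the 10,000 employees include exactly 10 real spies (inferred from "eight of every 10 spies" together with "eight real spies would be singled out") and assuming the null hypothesis is that an employee is not a spy. Variable names are illustrative.

```python
# Counts stated in the article. The total of 10 real spies is an inference
# from "eight of every 10 spies" plus "eight real spies would be singled out".
TOTAL = 10_000
SPIES = 10
SPIES_FLAGGED = 8
INNOCENT_FLAGGED = 1_598

innocent = TOTAL - SPIES
table = {
    "spy": {"flagged": SPIES_FLAGGED, "not flagged": SPIES - SPIES_FLAGGED},
    "not spy": {"flagged": INNOCENT_FLAGGED,
                "not flagged": innocent - INNOCENT_FLAGGED},
}

print(f"{'':8} {'flagged':>8} {'not flagged':>12} {'total':>7}")
for row, cells in table.items():
    print(f"{row:8} {cells['flagged']:>8} {cells['not flagged']:>12} "
          f"{sum(cells.values()):>7}")

# Assumed framing: Ho says the employee is not a spy.
type_1 = table["not spy"]["flagged"] / innocent  # innocent person singled out
type_2 = table["spy"]["not flagged"] / SPIES     # real spy not singled out
flagged_total = table["spy"]["flagged"] + table["not spy"]["flagged"]
p_spy_given_flagged = table["spy"]["flagged"] / flagged_total
print(f"P(Type I error) = {type_1:.3f}, P(Type II error) = {type_2:.3f}")
print(f"P(spy | singled out) = {p_spy_given_flagged:.4f}")
```

The last line conditions on the "flagged" column rather than on the "spy" row, which is why it differs so sharply from the detector's 80% catch rate.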

Extra Credit 6 Due in lecture March 29. Worth 5 pts.

Extra Credit Exercises 2 through 10 are based on Fall 2003 Survey data survey9-21-03.txt, which is taken to be our population. To download it into MINITAB, type ctrl A to highlight, ctrl C to copy, start up MINITAB, type ctrl V to paste it. If it asks about delimiters, click OK. The purpose of this exercise is to understand the long-run behavior of confidence intervals.

  1. First verify that the population proportion p of all Fall 2003 survey respondents living off campus is almost exactly .5:
    • Stat>Tables>Tally
    • Variables Live, Check "Counts and Percents".
  2. Next, take repeated samples (20 altogether) of size 40 from the population of categorical values for the variable "live", obtaining a 90% confidence interval each time for the "unknown" population proportion, based on each sample proportion.
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Live
    • Store samples in Livesample1
    • Stat>Basic Statistics>1 proportion
    • Samples in columns Livesample1
    • Options>Confidence Level>90 and Check "Use test and interval based on normal distribution"
    • The first confidence interval is shown in the session window; you will need to examine all 20 intervals together once they've been produced.
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Live
    • Store samples in Livesample2
    • Stat>Basic Statistics>1 proportion
    • Samples in columns Livesample2
    • [No need to specify 90% confidence and normal distribution, since these continue to be enabled by default.]
    • Repeat the process above 20 times altogether, finishing with "Livesample20".
  3. Now examine all 20 confidence intervals. How many of them contain the actual population proportion .5? In the long run, what percentage of the 90% confidence intervals should contain p? [Note: keep all the results in your session window handy if you intend to do Extra Credit 7, which will focus on p-values rather than on confidence intervals.]
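The long-run question in step 3 can be explored beyond 20 intervals with a short simulation. This Python sketch is an approximation under stated assumptions: instead of sampling rows of survey9-21-03.txt, it models each sampled student as living off campus with probability .5, and it uses the normal-based interval p-hat +/- 1.645 sqrt(p-hat(1 - p-hat)/n). Function names such as coverage are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

P_TRUE = 0.5        # population proportion living off campus (from step 1)
N = 40              # sample size used in step 2
CONFIDENCE = 0.90
Z_STAR = NormalDist().inv_cdf(1 - (1 - CONFIDENCE) / 2)  # about 1.645


def one_interval(rng: random.Random) -> tuple[float, float]:
    # ASSUMPTION: each sampled student is off campus with probability P_TRUE.
    off_campus = sum(rng.random() < P_TRUE for _ in range(N))
    p_hat = off_campus / N
    margin = Z_STAR * sqrt(p_hat * (1 - p_hat) / N)
    return p_hat - margin, p_hat + margin


def coverage(num_intervals: int, seed: int = 1) -> float:
    """Fraction of simulated 90% intervals that contain P_TRUE."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_intervals):
        lo, hi = one_interval(rng)
        if lo <= P_TRUE <= hi:
            hits += 1
    return hits / num_intervals


cov_20 = coverage(20)
cov_long = coverage(20_000)
print("Fraction of 20 intervals containing .5:", cov_20)
print("Fraction of 20,000 intervals containing .5:", cov_long)
```

Because p-hat can take only 41 values when n = 40, the simulated long-run coverage of this particular interval may come out a little above 90% rather than exactly 90%.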

Extra Credit 7. Prerequisite Extra Credit 6. Due in lecture March 29. Worth 5 pts.

The purpose of this exercise is to understand the long-run behavior of hypothesis tests.

  1. First recall that the population proportion p of all Fall 2003 survey respondents living off campus is almost exactly .5.
  2. Next examine all 20 p-values obtained in Extra Credit 6. [These were produced for the two-sided test of the null hypothesis that the population proportion p living off campus is .5.] How many of these tests reject the null hypothesis at the 10% level? In the long run, what percentage of the tests should reject the null hypothesis in favor of the two-sided alternative at the 10% level, when the null hypothesis is in fact true?
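A similar simulation can estimate how often a true null hypothesis gets rejected. This Python sketch assumes the z statistic uses the null value .5 in its standard error, z = (p-hat - .5)/sqrt(.5(1 - .5)/n), and again models sampling as independent draws with probability .5 rather than sampling the survey file. Function names are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

P0 = 0.5      # null value, which step 1 shows is (essentially) the truth
N = 40
ALPHA = 0.10


def two_sided_p_value(successes: int, n: int, p0: float) -> float:
    # z statistic with the null proportion in the standard error.
    z = (successes / n - p0) / sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(num_tests: int, seed: int = 2) -> float:
    """Fraction of simulated tests with p-value below ALPHA."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(num_tests):
        # ASSUMPTION: sample drawn from a population where Ho is true.
        successes = sum(rng.random() < P0 for _ in range(N))
        if two_sided_p_value(successes, N, P0) < ALPHA:
            rejections += 1
    return rejections / num_tests


rate_20 = rejection_rate(20)
rate_long = rejection_rate(20_000)
print("Rejections out of 20:", round(rate_20 * 20))
print("Long-run rejection rate:", rate_long)
```

With n = 40 the discreteness of p-hat means the simulated rate may sit somewhat below 10%.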

Homework 9 Due in lecture March 26. Points shown total 22.5.

[1 pt.] 12.2(b)(c) (page 425)
[1 pt.] 12.10(a)(b) (page 426)
[1 pt.] 12.13(b)(c) (page 426)
[.5 pt.] 12.26(e) (page 427)
[1.5 pts.] 12.28(a)(b)(c) (page 427)
[1 pt.] 12.32 (page 428)
[.5 pt.] 12.36(c) (page 428)
[3.5 pts.] 12.42 (page 429) Use MINITAB; mark and hand in relevant output along with specific answers to textbook questions.
[.5 pt.] 12.69(c) (page 432)
[2 pts.] Exercise   In a previous Exercise, we explored the sampling distribution of the sample mean number selected when random samples are taken from a population where all whole numbers from 1 to 20 are equally likely, so that the population mean is 10.5. We noted the sample mean selection by surveyed Stats students and calculated by hand the probability of observing such a high sample mean if the population mean were really only 10.5. We used this probability to decide whether we were willing to believe that the population mean was in fact 10.5, or whether students were instead biased towards higher numbers. For this Exercise, address the same question by using MINITAB to set up a confidence interval for the unknown population mean selection, given that the population standard deviation is 5.77. Does your interval contain 10.5? What do you conclude?
[2 pts.] Exercise   For this Exercise, address the same question again by using MINITAB to set up a confidence interval for the unknown population mean selection, but this time assume that the population standard deviation is unknown. Does your interval contain 10.5? What do you conclude?
[2 pts.] Exercise   Find paired data in our survey, such as math and verbal SATs, ages of mothers and fathers, heights of females and their mothers, or heights of males and their fathers. Use MINITAB to test Ho: mu(d)=0 against an appropriate Ha. State your conclusion in terms of the variable chosen.
[2 pts.] Exercise   Compare values of a quantitative survey variable for two categorical groups, such as males and females or on and off campus students, by testing Ho: mu1-mu2=0 against an appropriate Ha. State your conclusion in terms of the variable chosen.
[2 pts.] Exercise   Read the article The most important meal, which reports that in a study of American eighth-graders in 96 public schools in San Diego, New Orleans, Minneapolis, and Austin, overweight students were more likely to skip breakfast than students who were not overweight. Unstack the data in our class survey according to gender, then for each gender group test the null hypothesis of equal mean weights for students who did and did not eat breakfast, according to their survey responses. Make sure to formulate the correct alternative hypothesis.
[2 pts.] Exercise   Read Science lifts 'mummy's curse' and use the means for Age at death, exposed vs. unexposed, along with the sample sizes n and standard deviations (in parentheses) to test for a significant difference in age at death between those who were and were not exposed to the "mummy's curse". State your conclusions clearly.
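For the mummy's-curse Exercise, the unpooled two-sample t statistic can be computed directly from summary statistics as t = (xbar1 - xbar2)/sqrt(s1^2/n1 + s2^2/n2). This Python sketch also reports the conservative degrees of freedom, min(n1, n2) - 1, used with a t table by hand, and the Welch-Satterthwaite approximation of the kind software typically reports. The numbers in the example call are hypothetical placeholders, not values from the article.

```python
from math import sqrt


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic from summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    # Conservative degrees of freedom, for use with a t table by hand.
    df_conservative = min(n1, n2) - 1
    # Welch-Satterthwaite approximate degrees of freedom.
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df_conservative, df_welch


# HYPOTHETICAL summary statistics for illustration only; substitute the
# means, standard deviations, and sample sizes reported in the article.
t, df_cons, df_welch = two_sample_t(70.0, 10.0, 20, 74.0, 12.0, 15)
print(f"t = {t:.2f}, conservative df = {df_cons}, Welch df = {df_welch:.1f}")
```

The p-value still comes from Table A.2 (or from MINITAB); the sketch only handles the arithmetic leading up to it.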

Homework 10 Due in lecture April 9. Points shown total 23.

[1 pt.] 13.2(c)(d) (page 482)
[2 pts.] 13.5 (page 482) Instead of n=28 or n=81, use n=65 for (a)(b)(c)(d)
[3.5 pts.] 13.7 (page 482)
[2.5 pts.] 13.11 (page 482)
[1.5 pts.] 13.14 (page 483)
[1 pt.] 13.16(d) and add (e) (page 483) (e) If you were using Table A.2, keeping in mind Ha, the value of t, and the df, what range would you report for the p-value?
[.5 pt.] 13.33(b) (page 485)
[1.5 pts.] 13.70 (page 489) MINITAB is optional
[1.5 pts.] 14.2 (page 518)
[1 pt.] 14.11(b)(c) (page 519)
[1.5 pts.] 14.13(a)(b)(c) (page 519)
[1 pt.] 14.17(c)(d) (page 520)
[.5 pt.] 14.33 (page 522)
[2 pts.] Exercise   Find two quantitative variables from our survey, summarize their relationship as in Chapter 5 (see the Exercise at the end of HW2), and then test Ho: beta1=0. State your conclusions in terms of the variables of interest.
[2 pts.] Exercise   Compare values of a quantitative survey variable for more than two categorical groups by carrying out an ANOVA test in MINITAB. State your conclusions in terms of the particular variables chosen.
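The F statistic in the ANOVA output compares variation between group means to variation within groups. A minimal Python sketch of that arithmetic, using hypothetical data for three groups (not survey values); it computes F and its degrees of freedom only, since the p-value needs an F table or software such as MINITAB.

```python
from statistics import mean


def one_way_anova(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return the F statistic and its (numerator, denominator) df."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-groups sum of squares: how far group means sit from the grand mean.
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of values around their own group mean.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_groups, df_error = k - 1, n - k
    f_stat = (ssg / df_groups) / (sse / df_error)
    return f_stat, df_groups, df_error


# HYPOTHETICAL values of a quantitative variable for three categories.
groups = [[6, 7, 8, 7], [5, 6, 6, 7], [8, 9, 9, 10]]
f_stat, df1, df2 = one_way_anova(groups)
print(f"F = {f_stat:.2f} on {df1} and {df2} df")
```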

Extra Credit 8 Due in lecture April 12. Worth 5 pts.

Extra Credit Exercises 2 through 10 are based on Fall 2003 Survey data survey9-21-03.txt, which is taken to be our population. To download it into MINITAB, type ctrl A to highlight, ctrl C to copy, start up MINITAB, type ctrl V to paste it. If it asks about delimiters, click OK. The purpose of this exercise is to understand the long-run behavior of confidence intervals.

  1. First verify that the population mean mu of all Fall 2003 survey respondents' Math SAT scores is 610.44, and the standard deviation sigma is 72.14.
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Math
  2. Next, take repeated samples (20 altogether) of size 40 from the population of quantitative values for the variable "math", obtaining a 90% confidence interval each time for the "unknown" population mean, based on each sample mean.
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Math
    • Store samples in Mathsample1
    • Stat>Basic Statistics>1-sample z
    • Variables Mathsample1; Test Mean 610.44 [This will be needed for Extra Credit 9] and enter Sigma as 72.14
    • Options>Confidence Level>90
    • The first confidence interval is shown in the session window; you will need to examine all 20 intervals together once they've been produced.
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Math
    • Store samples in Mathsample2
    • Stat>Basic Statistics>1-sample z
    • Variables Mathsample2; Continue to Test Mean 610.44, given sigma 72.14 and opt for Confidence Level 90.
    • Repeat the process above 20 times altogether, finishing with "Mathsample20".
  3. Now examine all 20 confidence intervals. How many of them contain the actual population mean 610.44? In the long run, what percentage of the 90% confidence intervals should contain mu? [Note: keep all the results in your session window handy if you intend to do Extra Credit 9, which will focus on p-values rather than on confidence intervals.]
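With sigma known, each 90% z interval is xbar +/- 1.645 sigma/sqrt(n), so all 20 intervals have the same width and differ only in their centers. This Python sketch computes that common margin of error and then estimates long-run coverage by simulation. The simulation assumes a normal population with the mean and standard deviation from step 1, rather than sampling the survey file itself; the function name is illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

MU, SIGMA = 610.44, 72.14   # population values from step 1
N, CONFIDENCE = 40, 0.90
z_star = NormalDist().inv_cdf(1 - (1 - CONFIDENCE) / 2)
# With sigma known, every interval has the same margin of error;
# only its center (the sample mean) changes from sample to sample.
margin = z_star * SIGMA / sqrt(N)


def z_interval_covers(rng: random.Random) -> bool:
    # ASSUMPTION: scores drawn from a normal model, not the survey file itself.
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = fmean(sample)
    return x_bar - margin <= MU <= x_bar + margin


rng = random.Random(3)
covered = sum(z_interval_covers(rng) for _ in range(20_000))
print(f"margin of error = {margin:.2f}")
print(f"long-run coverage = {covered / 20_000:.3f}")
```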

Extra Credit 9. Prerequisite Extra Credit 8. Due in lecture April 12. Worth 5 pts.

The purpose of this exercise is to understand the long-run behavior of hypothesis tests.

  1. First recall that the population mean Math SAT score mu of all Fall 2003 survey respondents was 610.44.
  2. Next examine all 20 p-values obtained in Extra Credit 8. [These were produced for the two-sided z test of the null hypothesis that the population mean mu was 610.44.] How many of these tests reject the null hypothesis at the 10% level? In the long run, what percentage of the tests should reject the null hypothesis in favor of the two-sided alternative at the 10% level, when the null hypothesis is in fact true?
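Because sigma is known, a two-sided z test at the 10% level rejects Ho exactly when the 90% z interval from the same sample fails to contain 610.44, so the count here should match the number of intervals that missed in Extra Credit 8. A short Python sketch checking that correspondence over a range of hypothetical sample means:

```python
from math import sqrt
from statistics import NormalDist

MU0, SIGMA, N = 610.44, 72.14, 40   # null value and sigma from Extra Credit 8
ALPHA = 0.10
std_normal = NormalDist()
se = SIGMA / sqrt(N)
z_star = std_normal.inv_cdf(1 - ALPHA / 2)


def rejects(x_bar: float) -> bool:
    """Two-sided z test at level ALPHA."""
    z = (x_bar - MU0) / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return p_value < ALPHA


def interval_misses_mu0(x_bar: float) -> bool:
    """True when the 90% z interval does not contain MU0."""
    return not (x_bar - z_star * se <= MU0 <= x_bar + z_star * se)


# Hypothetical sample means: the two decisions agree for each one
# (a tie exactly on the boundary has probability zero).
for x_bar in range(560, 661, 5):
    assert rejects(x_bar) == interval_misses_mu0(x_bar)
print("10% test and 90% interval agree for every x_bar checked")
```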

Extra Credit 10. Due in lecture April 12. Worth 5 pts.

The purpose of this exercise is to learn about chi-square goodness of fit tests.

  1. Read Section 15.3 of your textbook, pages 544 to 547.
  2. Find the counts of surveyed fall stats students in years 1, 2, 3, 4 by accessing the survey results: survey9-21-03.txt. Carry out a chi-square goodness of fit test by hand to determine if the population of stats students may be evenly divided among years 1, 2, 3, 4, or if the proportions in the various years differ significantly. ["Other" students should be excluded from your calculations.]
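The hand calculation uses chi-square = sum of (observed - expected)^2/expected, where under an even split each year's expected count is one fourth of the total, with df = 4 - 1 = 3. A minimal Python sketch to check the arithmetic; the counts in the example are hypothetical placeholders, to be replaced by the tallies from the survey file. At the .05 level the table value for 3 df is 7.81, and the usual guideline is that every expected count should be at least 5.

```python
def chi_square_gof(observed: list[int], expected_props: list[float]) -> tuple[float, int]:
    """Chi-square goodness-of-fit statistic and its degrees of freedom."""
    total = sum(observed)
    expected = [total * p for p in expected_props]
    # Sum of (observed - expected)^2 / expected over all categories.
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# HYPOTHETICAL counts of students in years 1, 2, 3, 4 ("other" excluded);
# replace with the counts tallied from survey9-21-03.txt.
counts = [50, 90, 70, 30]
stat, df = chi_square_gof(counts, [0.25] * 4)
print(f"chi-square = {stat:.2f} with df = {df}")
```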

Homework 11 Due in lecture Monday, April 12. Points shown total 19.

[1 pt.] 15.1(c)(d) (page 550)
[1 pt.] 15.2(c)(d) (page 550) Use Table A.5.
[1 pt.] 15.14(a)(b) (page 552)
[2 pts.] 15.41(a)(b)(d) (page 556)
[1 pt.] 16.1(b)(c) (page 584)
[1.5 pts.] 16.3 (page 584)
[1.5 pts.] 16.6(b)(c)(d) (page 585) Your conclusion should state whether or not population means could be equal.
[2 pts.] 16.7 (page 585)
[5 pts.] 16.9(a)(b) and add on (c)(d) (page 585) (c) State Ho and Ha. (d) Use MINITAB to carry out a test, then state your conclusions.
[1 pt.] 16.11 (page 586)
[2 pts.] Exercise   Pick two categorical variables from our survey. Decide which should be explanatory (row variable) and which response. Use MINITAB to compare conditional percentages in each row [explanatory variable must be entered before response] and carry out a chi-square test for a relationship. Use Table A.5 to give a range for the P-value.
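The chi-square statistic for a two-way table uses expected counts of (row total x column total)/grand total in each cell, sums (observed - expected)^2/expected over all cells, and has df = (rows - 1)(columns - 1). This Python sketch mirrors that arithmetic on a hypothetical 2 x 2 table (not survey data), printing the conditional row percentages first, with rows as the explanatory variable.

```python
def chi_square_independence(table: list[list[int]]) -> tuple[float, int]:
    """Chi-square statistic and df for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if there is no relationship between the variables.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# HYPOTHETICAL counts: rows = explanatory categories, columns = response categories.
table = [[30, 20], [15, 35]]
for row in table:
    # Conditional percentages within each row of the explanatory variable.
    print([f"{100 * c / sum(row):.1f}%" for c in row])
stat, df = chi_square_independence(table)
print(f"chi-square = {stat:.2f}, df = {df}")
```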

