Applied Statistical Methods 1000
Extra Credit Exercises

Rules

  • All extra credit should be your individual work; otherwise, points will be deducted. [Students who wish to work together on these problems must request my permission to do so in advance.]
  • Hand them in to me in lecture, in a separate pile from regular assignments.

Extra Credit 1 Due in lecture February 2. Worth 5 pts.

Students in a class were classified according to whether their major was undecided or not, and whether they lived on or off campus. Forty students lived off campus and had a decided major; 10 students lived off campus and had an undecided major. Twenty-four students lived on campus and had a decided major; 26 students lived on campus and had an undecided major.
  1. First analyze the relationship:
    1. Complete a two-way table for the data.
    2. Which group has a higher proportion living on campus: the decided or the undecided majors?
    3. Compute a table of counts expected if there were no relationship between living situation and major decided or not.
    4. Calculate the chi-squared statistic.
    5. Which one of the following is the best way to summarize the situation? (i) There is no statistically significant relationship between living situation and major being decided or not. (ii) Year at Pitt is a confounding variable in the relationship between living situation and major decided or not. (iii) Living on campus prevents students from deciding on a major. (iv) Deciding on a major causes students to move off campus.
  2. [This is the challenging part!] Now create two separate two-way tables, one for "underclassmen" and one for "upperclassmen", whose counts add up cell by cell to those in the original table, but neither of which shows a significant relationship between living situation and major being decided or not. In other words, create a scenario that demonstrates Simpson's Paradox.
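If you want to check your hand calculations for steps 3 and 4 of part 1, or test a candidate split for part 2, the arithmetic can be sketched in Python. This is a minimal sketch, not a required method: the function names are hypothetical, the cutoff 3.841 assumes you test at the .05 level (df = 1 for a 2x2 table), and the toy table is made up rather than taken from the exercise.

```python
# Sketch: expected counts and the chi-squared statistic for a two-way table,
# plus a checker for a proposed Simpson's Paradox split (part 2).
# Assumptions: tables are lists of rows with every row and column total > 0,
# and "significant" means chi-squared above 3.841 (df = 1, alpha = .05).

CRITICAL_DF1_05 = 3.841  # chi-squared critical value for df = 1, alpha = .05


def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def check_simpson_split(original, under, upper):
    """True if the two sub-tables add up to the original cell by cell and
    neither sub-table is significant at the .05 level (2x2 tables only)."""
    adds_up = all(
        u + v == o
        for o_row, u_row, v_row in zip(original, under, upper)
        for o, u, v in zip(o_row, u_row, v_row)
    )
    return (adds_up
            and chi_squared(under) < CRITICAL_DF1_05
            and chi_squared(upper) < CRITICAL_DF1_05)


# Hypothetical toy table (NOT the exercise data): rows = off/on campus,
# columns = decided/undecided.
toy = [[30, 20], [20, 30]]
print(expected_counts(toy))  # [[25.0, 25.0], [25.0, 25.0]]
print(chi_squared(toy))      # 4.0
```

To test a split, pass the original table and your two proposed sub-tables (same row and column order) to check_simpson_split; a zero row or column total in a sub-table would make an expected count zero, which this sketch does not handle.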

Extra Credit 2 Due in lecture March 12. Worth 5 pts.

Extra Credit Exercises 2 through 10 are based on the student survey data in survey9-21-03.txt, which we treat as our population. To load it into MINITAB, open survey9-21-03.txt, press Ctrl+A to select everything and Ctrl+C to copy it, then start MINITAB and press Ctrl+V to paste it into the worksheet. If MINITAB asks about delimiters, click OK. The purpose of this exercise is to explore how sample size affects the distribution of sample proportion.

  1. First verify that, in the population of categorical values for the variable "live", the two possible values "off" and "on" occur in approximately equal proportions:
    • Stat>Tables>Tally
    • Variables Live
    • Display Counts and Percents
  2. Next take repeated small samples (size 10) from the population of values for the variable "live". You have just verified that the population proportion p living off campus is approximately .5 (50%): about half of our population of students live off campus, and the other half on campus. Our theory about the behavior of sampling distributions describes an infinite number of repetitions, but for practical purposes you will take 20 random samples altogether.
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Live
    • Store samples in Livesmallsample1
    • Stat>Tables>Tally
    • Variables Livesmallsample1
    • Display Counts and Percents
    • Create a column called "phatliven=10" and type in the sample proportion living off campus (for example, .6 if the sample proportion is 60%)
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Live
    • Store samples in Livesmallsample2 [The easiest way to do this is to simply change the "1" in the variable name to a "2".]
    • Stat>Tables>Tally
    • Variables Livesmallsample2 [Again, just change the "1" to a "2".]
    • Display Counts and Percents
    • Type the second sample proportion as the second entry in the column "phatliven=10"
    • Repeat the process above 20 times altogether, finishing with "Livesmallsample20", for which the proportion living off campus will be the 20th entry in "phatliven=10"
  3. Then obtain summaries and displays:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable phatliven=10
    • Graph>Stem-and-Leaf
    • Enter the variable phatliven=10
    • Summarize the distribution of sample proportion for samples of size 10 by reporting center, spread, and shape.
  4. Now take repeated large samples (size 40) from the population of values for the variable "live" (20 samples altogether):
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Live
    • Store samples in Livelargesample1
    • Stat>Tables>Tally
    • Variables Livelargesample1
    • Display Counts and Percents
    • Create a column called "phatliven=40" and type in the sample proportion living off campus (for example, .525 if the sample proportion is 52.5%)
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Live
    • Store samples in Livelargesample2
    • Stat>Tables>Tally
    • Variables Livelargesample2
    • Display Counts and Percents
    • Type the second sample proportion as the second entry in the column "phatliven=40"
    • Repeat the process above 20 times altogether, finishing with "Livelargesample20", for which the proportion living off campus will be the 20th entry in "phatliven=40"
  5. Then obtain summaries and displays:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable phatliven=40
    • Graph>Stem-and-Leaf
    • Enter the variable phatliven=40
    • Summarize the distribution of sample proportion for samples of size 40 by reporting center, spread, and shape.
  6. Lastly, and most importantly, compare the centers, spreads, and shapes for samples of size 10 vs. 40.
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variables phatliven=10 and phatliven=40
    • Stat>Basic Statistics>2-sample t
    • Activate "Samples in Different Columns"
    • Enter the variables phatliven=10 and phatliven=40
    • Check Graph>Boxplots of Data
    Write a few sentences to compare the distribution of sample proportion for small vs. large samples, including mention of center, spread, and shape. Are your results consistent with the theory presented in Chapter 9?
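The Chapter 9 theory for sample proportions (center p, spread sqrt(p(1 - p)/n)) can be previewed with a quick simulation. The Python sketch below is an illustration only, not part of the MINITAB procedure: it assumes an effectively infinite population with p = .5, whereas MINITAB samples without replacement from the survey file, and the function names simulate_phats and theoretical_sd are hypothetical.

```python
# Sketch: simulate the sampling distribution of the sample proportion for
# n = 10 and n = 40 when p = .5, and compare with the theoretical spread.
# Assumption: each sampled student is off campus with probability p,
# independently (an infinite-population model, not the finite survey file).
import math
import random
import statistics


def simulate_phats(p, n, reps, rng):
    """Return `reps` sample proportions, each from a sample of size n."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


def theoretical_sd(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)


rng = random.Random(2003)  # fixed seed so the run is reproducible
for n in (10, 40):
    phats = simulate_phats(0.5, n, 20, rng)  # 20 samples, as in the exercise
    print(f"n={n}: mean {statistics.fmean(phats):.3f}, "
          f"sd {statistics.stdev(phats):.3f}, "
          f"theory sd {theoretical_sd(0.5, n):.3f}")
```

With only 20 samples the simulated spread will wobble around the theoretical value; raising 20 to a few thousand shows the long-run pattern (same center, spread halved when n is quadrupled) more clearly.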

Extra Credit 3 Prerequisite: Extra Credit 2. Due in lecture March 12. Worth 5 pts.

The purpose of this exercise is to explore how population shape affects the distribution of sample proportion.

  1. First verify that the population of categorical values for the variable "handed" is very skewed with respect to the proportion who are ambidextrous: only about 3% (.03) are ambidextrous, while the remaining 97% favor either the right or the left hand.
    • Stat>Tables>Tally
    • Variables Handed
    • Display Counts and Percents
  2. Next take repeated small samples (size 10) from the population of values for the variable "handed". You have just verified that the population proportion p who are ambidextrous is approximately .03 (3%). Our theory about the behavior of sampling distributions describes an infinite number of repetitions, but for practical purposes you will take 20 random samples altogether.
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Handed
    • Store samples in Handedsmallsample1
    • Stat>Tables>Tally
    • Variables Handedsmallsample1
    • Display Counts and Percents
    • Create a column called "phathandedn=10" and type in the sample proportion ambidextrous (for example, .1 if the sample proportion is 10%, or 0 if the sample only contains right- and left-handed people)
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Handed
    • Store samples in Handedsmallsample2 [The easiest way to do this is to simply change the "1" in the variable name to a "2".]
    • Stat>Tables>Tally
    • Variables Handedsmallsample2 [Again, just change the "1" to a "2".]
    • Display Counts and Percents
    • Type the second sample proportion as the second entry in the column "phathandedn=10"
    • Repeat the process above 20 times altogether, finishing with "Handedsmallsample20", for which the proportion who are ambidextrous will be the 20th entry in "phathandedn=10"
  3. Then obtain summaries and displays:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable phathandedn=10
    • Graph>Stem-and-Leaf
    • Enter the variable phathandedn=10
    • Summarize the distribution of sample proportion for samples of size 10 by reporting center, spread, and shape.
  4. Now take repeated large samples (size 40) from the population of values for the variable "handed" (20 samples altogether):
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Handed
    • Store samples in Handedlargesample1
    • Stat>Tables>Tally
    • Variables Handedlargesample1
    • Display Counts and Percents
    • Create a column called "phathandedn=40" and type in the sample proportion who are ambidextrous (for example, .075 if the sample proportion is 7.5%)
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Handed
    • Store samples in Handedlargesample2
    • Stat>Tables>Tally
    • Variables Handedlargesample2
    • Display Counts and Percents
    • Type the second sample proportion as the second entry in the column "phathandedn=40"
    • Repeat the process above 20 times altogether, finishing with "Handedlargesample20", for which the proportion who are ambidextrous will be the 20th entry in "phathandedn=40"
  5. Then obtain summaries and displays:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable phathandedn=40
    • Graph>Stem-and-Leaf
    • Enter the variable phathandedn=40
    • Summarize the distribution of sample proportion for samples of size 40 by reporting center, spread, and shape.
  6. Next compare the centers, spreads, and shapes for samples of size 10 vs. 40.
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variables phathandedn=10 and phathandedn=40
    • Stat>Basic Statistics>2-sample t
    • Activate "Samples in Different Columns"
    • Enter the variables phathandedn=10 and phathandedn=40
    • Check Graph>Boxplots of Data
    Are your results consistent with the theory presented in Chapter 9?
  7. Lastly, and most importantly, compare the shapes of the distributions of sample proportion for samples of size 10 coming from "Live" vs. from "Handed" and for samples of size 40 coming from "Live" vs. from "Handed". For which population do the distributions of sample proportion for a given sample size tend to be more normal, for the variable "Live" or for the variable "Handed"?
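For the comparison in step 7, the binomial model gives the exact center, spread, and skewness of the sample proportion without any random sampling. This Python sketch assumes sampling with replacement (or from a population much larger than n), uses hypothetical function names, and applies a common "np >= 10 and n(1 - p) >= 10" rule of thumb; check Chapter 9 for the exact condition your textbook uses.

```python
# Sketch: exact center, spread, and skewness of the sample proportion,
# using the binomial model (count ~ Binomial(n, p), p-hat = count / n).
# Assumption: sampling with replacement or from a much larger population;
# the rule-of-thumb cutoff of 10 may differ from your textbook's.
import math


def phat_summary(p, n):
    """Return (mean, sd, skewness, P(p-hat = 0)) of the sample proportion."""
    mean = p
    sd = math.sqrt(p * (1 - p) / n)
    skewness = (1 - 2 * p) / math.sqrt(n * p * (1 - p))  # 0 means symmetric
    prob_zero = (1 - p) ** n  # chance that a sample has no one in the category
    return mean, sd, skewness, prob_zero


for label, p in (("Live", 0.5), ("Handed", 0.03)):
    for n in (10, 40):
        mean, sd, skewness, prob_zero = phat_summary(p, n)
        rule_ok = n * p >= 10 and n * (1 - p) >= 10
        print(f"{label:6} n={n:2}: mean {mean:.3f}, sd {sd:.3f}, "
              f"skewness {skewness:.2f}, P(p-hat=0) {prob_zero:.3f}, "
              f"rule of thumb met: {rule_ok}")
```

A skewness near 0 indicates a symmetric distribution; the large positive values for "Handed" reflect the pile-up of samples containing no ambidextrous students at all.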

Extra Credit 4 Due in lecture March 12. Worth 5 pts.

Extra Credit Exercises 2 through 10 are based on the student survey data in survey9-21-03.txt, which we treat as our population. To load it into MINITAB, open survey9-21-03.txt, press Ctrl+A to select everything and Ctrl+C to copy it, then start MINITAB and press Ctrl+V to paste it into the worksheet. If MINITAB asks about delimiters, click OK. The purpose of this exercise is to explore how sample size affects the distribution of sample mean.

  1. First verify that our population of quantitative values for the variable "math" has mean mu=610.44 and standard deviation sigma=72.14, and that its shape is approximately normal:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Math
    • Graph>Histogram
    • Variables Math
  2. Now take repeated small samples (size 10) from the population of quantitative values for the variable "math". Our theory about the behavior of sampling distributions is for an infinite number of repetitions, but for practical purposes you will take 20 random samples altogether.
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Math
    • Store samples in Mathsmallsample1
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Mathsmallsample1
    • Create a column called "xbarmathn=10" and type in the sample mean Math SAT score
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Math
    • Store samples in Mathsmallsample2
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Mathsmallsample2
    • Type the second sample mean as the second entry in the column "xbarmathn=10"
    • Repeat the process above 20 times altogether, finishing with "Mathsmallsample20", for which the sample mean Math SAT score will be the 20th entry in "xbarmathn=10"
  3. Then obtain summaries and displays:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable xbarmathn=10
    • Graph>Stem-and-Leaf
    • Enter the variable xbarmathn=10
    • Summarize the distribution of sample mean for samples of size 10 by reporting center, spread, and shape.
  4. Now take repeated large samples (size 40) from the population of values for the variable "math" (20 samples altogether):
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Math
    • Store samples in Mathlargesample1
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Mathlargesample1
    • Create a column called "xbarmathn=40" and type in the sample mean Math SAT score
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Math
    • Store samples in Mathlargesample2
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Mathlargesample2
    • Type the second sample mean as the second entry in the column "xbarmathn=40"
    • Repeat the process above 20 times altogether, finishing with "Mathlargesample20", for which the sample mean Math SAT score will be the 20th entry in "xbarmathn=40"
  5. Then obtain summaries and displays:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable xbarmathn=40
    • Graph>Stem-and-Leaf
    • Enter the variable xbarmathn=40
    • Summarize the distribution of sample mean for samples of size 40 by reporting center, spread, and shape.
  6. Lastly, and most importantly, compare the centers, spreads, and shapes for samples of size 10 vs. 40.
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variables xbarmathn=10 and xbarmathn=40
    • Stat>Basic Statistics>2-sample t
    • Activate "Samples in Different Columns"
    • Enter the variables xbarmathn=10 and xbarmathn=40
    • Check Graph>Boxplots of Data
    Are your results consistent with the theory presented in Chapter 9? Write a paragraph to explain your answer.
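To judge whether your results agree with the theory, it helps to know what Chapter 9 predicts for the sample mean: center mu and spread sigma / sqrt(n). This Python sketch just does that arithmetic with the population values from step 1; the function name sd_of_xbar is hypothetical, and it assumes sampling with replacement or from a population much larger than n.

```python
# Sketch: theoretical center and spread of the sample mean Math SAT score.
# Assumption: sampling with replacement or from a population much larger
# than n; MINITAB samples without replacement from the survey file, which
# would make the true spread slightly smaller (finite population correction).
import math

MU = 610.44    # population mean Math SAT score (from step 1)
SIGMA = 72.14  # population standard deviation (from step 1)


def sd_of_xbar(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)


for n in (10, 40):
    print(f"n={n}: center {MU}, spread {sd_of_xbar(SIGMA, n):.2f}")
```

The predicted spreads are roughly 22.8 for n = 10 and 11.4 for n = 40: quadrupling the sample size halves the spread, while the center stays at mu.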

Extra Credit 5 Prerequisite: Extra Credit 4. Due in lecture March 12. Worth 5 pts.

The purpose of this exercise is to explore how population shape affects the distribution of sample mean.

  1. First verify that our population of quantitative values for the variable "Earned" has mean mu=3.776 thousand and standard deviation sigma=6.503 thousand, and that its shape is strongly skewed to the right:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Earned
    • Graph>Histogram
    • Variables Earned
  2. Now take repeated small samples (size 10) from the population of quantitative values for the variable "earned". Our theory about the behavior of sampling distributions is for an infinite number of repetitions, but for practical purposes you will take 20 random samples altogether.
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Earned
    • Store samples in Earnedsmallsample1
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Earnedsmallsample1
    • Create a column called "xbarearnedn=10" and type in the sample mean of Earned
    • Calc>Random Data>Sample from Columns
    • Sample 10 rows from Column Earned
    • Store samples in Earnedsmallsample2
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Earnedsmallsample2
    • Type the second sample mean as the second entry in the column "xbarearnedn=10"
    • Repeat the process above 20 times altogether, finishing with "Earnedsmallsample20", for which the sample mean of Earned will be the 20th entry in "xbarearnedn=10"
  3. Then obtain summaries and displays:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable xbarearnedn=10
    • Graph>Stem-and-Leaf
    • Enter the variable xbarearnedn=10
    • Summarize the distribution of sample mean for samples of size 10 by reporting center, spread, and shape.
  4. Now take repeated large samples (size 40) from the population of values for the variable "earned" (20 samples altogether):
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Earned
    • Store samples in Earnedlargesample1
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Earnedlargesample1
    • Create a column called "xbarearnedn=40" and type in the sample mean of Earned
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Earned
    • Store samples in Earnedlargesample2
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Earnedlargesample2
    • Type the second sample mean as the second entry in the column "xbarearnedn=40"
    • Repeat the process above 20 times altogether, finishing with "Earnedlargesample20", for which the sample mean of Earned will be the 20th entry in "xbarearnedn=40"
  5. Then obtain summaries and displays:
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variable xbarearnedn=40
    • Graph>Stem-and-Leaf
    • Enter the variable xbarearnedn=40
    • Summarize the distribution of sample mean for samples of size 40 by reporting center, spread, and shape.
  6. Next compare the centers, spreads, and shapes for samples of size 10 vs. 40.
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Enter the variables xbarearnedn=10 and xbarearnedn=40
    • Stat>Basic Statistics>2-sample t
    • Activate "Samples in Different Columns"
    • Enter the variables xbarearnedn=10 and xbarearnedn=40
    • Check Graph>Boxplots of Data
    Are your results consistent with the theory presented in Chapter 9? Write a paragraph to explain your answer.
  7. Lastly, and most importantly, compare shapes of the distributions of sample mean for samples of size 10 coming from "Math" vs. from "Earned", and for samples of size 40 coming from "Math" vs. from "Earned". For which population do the distributions of sample mean for a given sample size tend to be more normal, for the variable "Math" or for the variable "Earned"?
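The effect of population shape in step 7 can be previewed by simulation: the skewness of the sample mean shrinks roughly like the population skewness divided by sqrt(n). This Python sketch is an illustration under a loud assumption: the real "Earned" column is replaced by a hypothetical gamma population with the same mean (3.776) and standard deviation (6.503), so the actual survey values will have a different, lumpier shape. Function names are hypothetical.

```python
# Sketch: simulate sample means from a right-skewed stand-in population and
# measure how the skewness of the sample mean changes with n.
# Assumption: gamma population matched to mean 3.776 and sd 6.503 (NOT the
# real survey data); theory then gives skewness of x-bar = 2*SIGMA/MU / sqrt(n).
import random
import statistics

MU, SIGMA = 3.776, 6.503
SHAPE = (MU / SIGMA) ** 2   # gamma shape k, chosen so that k * scale = MU
SCALE = SIGMA ** 2 / MU     # gamma scale, chosen so that k * scale**2 = SIGMA**2


def skewness(values):
    """Moment coefficient of skewness (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


def simulate_xbars(n, reps, rng):
    """Return `reps` sample means, each from n draws of the gamma stand-in."""
    return [statistics.fmean([rng.gammavariate(SHAPE, SCALE) for _ in range(n)])
            for _ in range(reps)]


rng = random.Random(2003)
for n in (10, 40):
    xbars = simulate_xbars(n, 5000, rng)  # many repetitions to see the long run
    print(f"n={n}: mean {statistics.fmean(xbars):.3f}, "
          f"sd {statistics.stdev(xbars):.3f}, skewness {skewness(xbars):.2f}")
```

Both sample sizes center near 3.776, but the n = 40 means are noticeably less skewed than the n = 10 means, which is the Central Limit Theorem at work; for a normal population such as "Math" the sample means are close to normal at either size.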

Extra Credit 6 Due in lecture March 30. Worth 5 pts.

Extra Credit Exercises 2 through 10 are based on the student survey data in survey9-21-03.txt, which we treat as our population. To load it into MINITAB, open survey9-21-03.txt, press Ctrl+A to select everything and Ctrl+C to copy it, then start MINITAB and press Ctrl+V to paste it into the worksheet. If MINITAB asks about delimiters, click OK. The purpose of this exercise is to understand the long-run behavior of confidence intervals for a population proportion.

  1. First verify that the population proportion p of all survey respondents living off campus is almost exactly .5:
    • Stat>Tables>Tally
    • Variables Live, Check "Counts and Percents".
  2. Next, take repeated samples (20 altogether) of size 40 from the population of categorical values for the variable "live", obtaining a 90% confidence interval each time for the "unknown" population proportion, based on each sample proportion.
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Live
    • Store samples in Livesample1
    • Stat>Basic Statistics>1 proportion
    • Samples in columns Livesample1
    • Options>Confidence Level>90 and Check "Use test and interval based on normal distribution"
    • The first confidence interval is shown in the session window; you will need to examine all 20 intervals together once they've been produced.
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Live
    • Store samples in Livesample2
    • Stat>Basic Statistics>1 proportion
    • Samples in columns Livesample2
    • [No need to specify 90% confidence and the normal distribution again, since these settings remain in effect from the previous run.]
    • Repeat the process above 20 times altogether, finishing with "Livesample20".
  3. Now examine all 20 confidence intervals. How many of them contain the actual population proportion .5? In the long run, what percentage of the 90% confidence intervals should contain p? [Note: keep all the results in your session window handy if you intend to do Extra Credit 7, which will focus on p-values rather than on confidence intervals.]
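Because a sample of 40 can only produce 41 different sample proportions, the long-run coverage of the 90% interval can be computed exactly by adding up the binomial probabilities of every count whose interval captures p. This Python sketch assumes the large-sample interval p-hat +/- z* sqrt(p-hat(1 - p-hat)/n), which we take to be what MINITAB reports with "Use test and interval based on normal distribution" checked, and treats sampling as with replacement; function names are hypothetical.

```python
# Sketch: exact long-run coverage of the large-sample 90% interval for p.
# Assumptions: interval p-hat +/- z* sqrt(p-hat(1 - p-hat)/n) and a
# binomial (with-replacement) model for the count living off campus.
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # z* for 90% confidence, about 1.645


def wald_interval(x, n, z=Z90):
    """Large-sample interval p-hat +/- z * sqrt(p-hat(1 - p-hat)/n)."""
    phat = x / n
    margin = z * math.sqrt(phat * (1 - phat) / n)
    return phat - margin, phat + margin


def exact_coverage(p, n, z=Z90):
    """Long-run fraction of intervals containing p, summed over all counts x."""
    total = 0.0
    for x in range(n + 1):
        low, high = wald_interval(x, n, z)
        if low <= p <= high:  # this sample count gives an interval capturing p
            total += math.comb(n, x) * p ** x * (1 - p) ** (n - x)
    return total


print(f"coverage for p = .5, n = 40: {exact_coverage(0.5, 40):.3f}")
```

The exact coverage here lands a little above 90% because only a limited set of sample proportions is possible; with 20 intervals you should expect roughly 18 of them to capture .5, give or take a couple.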

Extra Credit 7 Prerequisite: Extra Credit 6. Due in lecture March 30. Worth 5 pts.

The purpose of this exercise is to understand the long-run behavior of hypothesis tests.

  1. First recall that the population proportion p of all survey respondents living off campus is almost exactly .5.
  2. Next examine all 20 p-values obtained in Extra Credit 6. [These were produced for the two-sided test of the null hypothesis that the population proportion p living off campus is .5.] How many of them lead you to reject the null hypothesis at the 10% level? In the long run, what percentage of the tests should reject the null hypothesis in favor of the two-sided alternative at the 10% level, when the null hypothesis is in fact true?
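The long-run rejection rate when the null hypothesis is true can also be computed exactly, by summing the binomial probabilities of all counts whose p-value falls below .10. This Python sketch assumes the usual large-sample test statistic z = (p-hat - p0) / sqrt(p0(1 - p0)/n), which we take to match MINITAB's normal-based test (compare with your session output), and a with-replacement binomial model; function names are hypothetical.

```python
# Sketch: exact long-run rejection rate of the two-sided large-sample z test
# for a proportion when H0: p = p0 is true.
# Assumption: z = (p-hat - p0) / sqrt(p0(1 - p0)/n), binomial model for counts.
import math
from statistics import NormalDist


def z_test_pvalue(x, n, p0):
    """Two-sided p-value for H0: p = p0 based on the count x out of n."""
    phat = x / n
    z = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def exact_rejection_rate(p0, n, alpha=0.10):
    """Long-run fraction of samples whose p-value is below alpha when H0 is true."""
    return sum(
        math.comb(n, x) * p0 ** x * (1 - p0) ** (n - x)
        for x in range(n + 1)
        if z_test_pvalue(x, n, p0) < alpha
    )


print(f"rejection rate at the 10% level, n = 40: {exact_rejection_rate(0.5, 40):.3f}")
```

Because only a limited set of sample proportions is possible with n = 40, the exact rate comes out a little below the nominal 10%.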

Extra Credit 8 Due in lecture April 13. Worth 5 pts.

Extra Credit Exercises 2 through 10 are based on the student survey data in survey9-21-03.txt, which we treat as our population. To load it into MINITAB, open survey9-21-03.txt, press Ctrl+A to select everything and Ctrl+C to copy it, then start MINITAB and press Ctrl+V to paste it into the worksheet. If MINITAB asks about delimiters, click OK. The purpose of this exercise is to understand the long-run behavior of confidence intervals for a population mean.

  1. First verify that the population mean mu of all survey respondents' Math SAT scores is 610.44, and the standard deviation sigma is 72.14.
    • Stat>Basic Statistics>Display Descriptive Statistics
    • Variables Math
  2. Next, take repeated samples (20 altogether) of size 40 from the population of quantitative values for the variable "math", obtaining a 90% confidence interval each time for the "unknown" population mean, based on each sample mean.
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Math
    • Store samples in Mathsample1
    • Stat>Basic Statistics>1-sample z
    • Variables Mathsample1; Test Mean 610.44 [This will be needed for Extra Credit 9] and enter Sigma as 72.14
    • Options>Confidence Level>90
    • The first confidence interval is shown in the session window; you will need to examine all 20 intervals together once they've been produced.
    • Calc>Random Data>Sample from Columns
    • Sample 40 rows from Column Math
    • Store samples in Mathsample2
    • Stat>Basic Statistics>1-sample z
    • Variables Mathsample2; Continue to Test Mean 610.44, given sigma 72.14 and opt for Confidence Level 90.
    • Repeat the process above 20 times altogether, finishing with "Mathsample20".
  3. Now examine all 20 confidence intervals. How many of them contain the actual population mean 610.44? In the long run, what percentage of the 90% confidence intervals should contain mu? [Note: keep all the results in your session window handy if you intend to do Extra Credit 9, which will focus on p-values rather than on confidence intervals.]
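The known-sigma 90% interval is x-bar +/- z* sigma / sqrt(n), so with sigma = 72.14 and n = 40 every interval has the same margin of error. This Python sketch computes that margin and estimates long-run coverage by simulation; it assumes a hypothetical normal population with mu = 610.44 and sigma = 72.14 standing in for the survey file (step 1 suggests the Math scores are roughly normal), and its function names are hypothetical.

```python
# Sketch: known-sigma 90% confidence interval for mu and its simulated
# long-run coverage.
# Assumption: samples come from a normal population with mean MU and sd SIGMA,
# standing in for the actual Math SAT column.
import math
import random
from statistics import NormalDist, fmean

MU, SIGMA, N = 610.44, 72.14, 40   # values from step 1 and the sample size
Z90 = NormalDist().inv_cdf(0.95)   # z* for 90% confidence, about 1.645


def z_interval(xbar, sigma, n, z=Z90):
    """xbar +/- z * sigma / sqrt(n), the known-sigma confidence interval."""
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


def simulated_coverage(reps, rng):
    """Fraction of `reps` intervals that capture MU, drawing each sample from
    the hypothetical normal population."""
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
        low, high = z_interval(xbar, SIGMA, N)
        hits += low <= MU <= high
    return hits / reps


print(f"margin of error: {Z90 * SIGMA / math.sqrt(N):.2f}")
print(f"simulated coverage: {simulated_coverage(10000, random.Random(2003)):.3f}")
```

Every interval is about 18.8 points on either side of its sample mean; only the center moves from sample to sample, which is why some intervals miss 610.44.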

Extra Credit 9 Prerequisite: Extra Credit 8. Due in lecture April 13. Worth 5 pts.

The purpose of this exercise is to understand the long-run behavior of hypothesis tests.

  1. First recall that the population mean Math SAT score mu of all survey respondents was 610.44.
  2. Next examine all 20 p-values obtained in Extra Credit 8. [These were produced for the two-sided z test of the null hypothesis that the population mean mu is 610.44.] How many of them lead you to reject the null hypothesis at the 10% level? In the long run, what percentage of the tests should reject the null hypothesis in favor of the two-sided alternative at the 10% level, when the null hypothesis is in fact true?
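The p-value for the two-sided known-sigma z test, and its long-run rejection rate when the null hypothesis is true, can be sketched in Python. As before, this assumes a hypothetical normal population with mean 610.44 and sd 72.14 in place of the survey file, and the function names are hypothetical.

```python
# Sketch: two-sided z test for a mean with known sigma, and a simulation of
# how often it rejects at the 10% level when H0: mu = 610.44 is true.
# Assumption: samples come from a normal population with mean MU0 and sd SIGMA.
import math
import random
from statistics import NormalDist, fmean

MU0, SIGMA, N = 610.44, 72.14, 40  # null value, known sigma, sample size


def z_test_pvalue(xbar, mu0=MU0, sigma=SIGMA, n=N):
    """Two-sided p-value for H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulated_rejection_rate(reps, rng, alpha=0.10):
    """Fraction of tests rejecting H0 when samples really come from a normal
    population with mean MU0 (so H0 is true)."""
    rejections = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(MU0, SIGMA) for _ in range(N)])
        rejections += z_test_pvalue(xbar) < alpha
    return rejections / reps


print(f"simulated rejection rate: {simulated_rejection_rate(10000, random.Random(2003)):.3f}")
```

For this known-sigma test, a two-sided p-value below .10 goes together with a 90% interval that excludes 610.44, so your count of rejections should match the number of intervals in Extra Credit 8 that missed mu.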

Extra Credit 10 Due in lecture April 13. Worth 5 pts.

The purpose of this exercise is to learn about chi-square goodness of fit tests.

  1. Read Section 15.3 of your textbook, pages 652 to 656 (2nd ed. 544 to 547).
  2. Find the counts of surveyed fall stats students in years 1, 2, 3, 4 by accessing the survey results: survey9-21-03.txt. Carry out a chi-square goodness of fit test by hand to determine if the population of undergraduate stats students may be evenly divided among years 1, 2, 3, 4, or if the proportions in the various years differ significantly. ["Other" students should be excluded from your calculations.]
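The test asks for a hand calculation, but the arithmetic can be checked with a short Python sketch. The statistic is the sum of (observed - expected)^2 / expected, with each expected count equal to the total times 1/4 under the "evenly divided" hypothesis, and df = 4 - 1 = 3. The counts below are hypothetical placeholders, NOT the survey results; pvalue_df3 uses the closed form for a chi-square tail area with exactly 3 degrees of freedom, so it applies only to four categories. Function names are hypothetical.

```python
# Sketch: chi-square goodness-of-fit statistic and p-value for 4 categories.
# Assumptions: H0 says all four years are equally likely; the counts below
# are made up for illustration; pvalue_df3 is valid for df = 3 only.
import math


def chi_square_gof(observed, expected_props=None):
    """Chi-square goodness-of-fit statistic; default H0 is equal proportions."""
    k = len(observed)
    total = sum(observed)
    if expected_props is None:
        expected_props = [1 / k] * k
    expected = [total * prop for prop in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def pvalue_df3(x):
    """Upper-tail area of the chi-square distribution with 3 degrees of freedom
    (closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2))."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical counts for years 1-4 (NOT the survey results); replace with yours.
counts = [30, 20, 25, 25]
stat = chi_square_gof(counts)
print(f"chi-square = {stat:.3f}, df = {len(counts) - 1}, p-value = {pvalue_df3(stat):.3f}")
```

Equivalently, compare the statistic with the df = 3 table value 7.815 for a test at the .05 level.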

