## ISBN-13 9780538733526 ISBN-10 0538733527

Probability And Statistics For Engineering And The Sciences Devore 8th Edition by Jay L. Devore – Test Bank

Sample Questions

Chapter 1 – Overview and Descriptive Statistics

1. Give one possible sample of size 4 from each of the following populations:
a. All daily newspapers published in the United States
b. All companies listed on the New York Stock Exchange
c. All students at your college or university
d. All grade point averages of students at your college or university

2. A Southern State University system consists of 23 campuses. An administrator wishes to make an inference about the average distance between the hometowns of students and their campuses. Describe and discuss several different sampling methods that might be employed. Would this be an enumerative or an analytic study? Explain your reasoning.

3. A Michigan city divides naturally into ten distinct neighborhoods. How might a real estate appraiser select a sample of single-family homes that could be used as a basis for developing an equation to predict appraised value from characteristics such as age, size, number of bathrooms, distance to the nearest school, and so on? Is the study enumerative or analytic?

4. An experiment was carried out to study how flow rate through a solenoid valve in an automobile’s pollution-control system depended on three factors: armature length, spring load, and bobbin depth. Two different levels (low and high) of each factor were chosen, and a single observation on flow was made for each combination of levels.

a. The resulting data set consisted of how many observations?
b. Is this an enumerative or analytic study? Explain your reasoning.

5. The accompanying data are specific gravity values for various wood types used in construction.

.41 .41 .42 .42 .42 .42 .42 .43 .44
.54 .55 .58 .62 .66 .66 .67 .68 .75
.31 .35 .36 .36 .37 .38 .40 .40 .40
.45 .46 .46 .47 .48 .48 .48 .51 .54

Construct a stem-and-leaf display using repeated stems and comment on any interesting features of the display.

6. Temperature transducers of a certain type are shipped in batches of 50. A sample of 60 batches was selected, and the number of transducers in each batch not conforming to design specifications was determined, resulting in the following data:

0 4 2 1 3 1 1 3 4 1 2 3 2 2 8 4 5 1 3 1
2 1 2 4 0 1 3 2 0 5 3 3 1 3 2 4 7 0 2 3
5 0 2 3 2 1 0 6 4 2 1 6 0 3 3 3 6 1 2 3

a. Determine frequencies and relative frequencies for the observed values of x = number of nonconforming transducers in a batch.
b. What proportion of batches in the sample has at most four nonconforming transducers? What proportion has fewer than four? What proportion has at least four nonconforming units?
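
A minimal Python sketch of the tally asked for in Question 6, using only the 60 values listed above. The variable names are illustrative and are not part of the original question.

```python
from collections import Counter

# Number of nonconforming transducers in each of the 60 sampled batches
data = [
    0, 4, 2, 1, 3, 1, 1, 3, 4, 1, 2, 3, 2, 2, 8, 4, 5, 1, 3, 1,
    2, 1, 2, 4, 0, 1, 3, 2, 0, 5, 3, 3, 1, 3, 2, 4, 7, 0, 2, 3,
    5, 0, 2, 3, 2, 1, 0, 6, 4, 2, 1, 6, 0, 3, 3, 3, 6, 1, 2, 3,
]

n = len(data)
freq = Counter(data)                                 # frequency of each x
rel_freq = {x: freq[x] / n for x in sorted(freq)}    # relative frequency

for x in sorted(freq):
    print(f"x = {x}: frequency {freq[x]:2d}, relative frequency {rel_freq[x]:.3f}")

# Proportions for part (b)
at_most_four = sum(1 for x in data if x <= 4) / n
fewer_than_four = sum(1 for x in data if x < 4) / n
at_least_four = sum(1 for x in data if x >= 4) / n
print(at_most_four, fewer_than_four, at_least_four)
```

Note that "fewer than four" and "at least four" are complementary, so the last two proportions sum to 1.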

7. The number of contaminating particles on a silicon wafer prior to a certain rinsing process was determined for each wafer in a sample of size 100, resulting in the following frequencies:

Number of particles Frequency Number of particles Frequency
0 1 8 12
1 2 9 4
2 3 10 5
3 12 11 3
4 11 12 1
5 15 13 2
6 18 14 1
7 10

a. What proportion of the sampled wafers had at least two particles? At least six particles?
b. What proportion of the sampled wafers had between four and nine particles, inclusive? Strictly between four and nine particles?

8. The cumulative frequency and cumulative relative frequency for a particular class interval are the sum of frequencies and relative frequencies, respectively, for that interval and all intervals lying below it. Compute the cumulative frequencies and cumulative relative frequencies for the following data:

75 89 80 93 64 67 72 70 66 85
89 81 81 71 74 82 85 63 72 81
81 95 84 81 80 70 69 66 60 83
85 98 84 68 90 82 69 72 87 88

9. Consider the following observations on shear strength of a joint bonded in a particular manner:

30.0 4.4 33.1 66.7 81.5 22.2 40.4 16.4 73.7 36.6 109.9

a. Determine the value of the sample mean.
b. Determine the value of the sample median. Why is it so different from the mean?
c. Calculate a trimmed mean by deleting the smallest and largest observations. What is the corresponding trimming percentage? How does its value compare to the mean and median?
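
The three summaries in Question 9 can be sketched in Python as follows. This uses the standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Shear strength observations from Question 9
strength = [30.0, 4.4, 33.1, 66.7, 81.5, 22.2, 40.4, 16.4, 73.7, 36.6, 109.9]

sample_mean = mean(strength)
sample_median = median(strength)

# Trimmed mean: drop the single smallest and single largest observation
trimmed = sorted(strength)[1:-1]
trimmed_mean = mean(trimmed)

# One observation trimmed from each end out of n = 11
trimming_pct = 100 * 1 / len(strength)

print(sample_mean, sample_median, trimmed_mean, trimming_pct)
```

Because one of eleven observations is removed from each end, the trimming percentage is 100/11, roughly 9.1%.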

10. A sample of 26 offshore oil workers took part in a simulated escape exercise, resulting in the accompanying data on time (sec) to complete the escape:

373 370 364 366 364 325 339 393
356 359 363 375 424 325 394 402
392 369 374 359 356 403 334 397

a. Construct a stem-and-leaf display of the data. How does it suggest that the sample mean and median will compare?
b. Calculate the values of the sample mean and median.
c. By how much could the largest time, currently 424, be increased without affecting the value of the sample median? By how much could this value be decreased without affecting the value of the sample median?
d. What are the values of the sample mean and sample median when the observations are re-expressed in minutes?

11. A sample of n = 10 automobiles was selected, and each was subjected to a 5-mph crash test. Denoting a car with no visible damage by S (for success) and a car with such damage by F, results were as follows: S S S F F S S F S S

a. What is the value of the sample proportion of successes x/n?
b. Replace each S with a 1 and each F with a 0. Then calculate the sample mean x̄ for this numerically coded sample. How does x̄ compare to x/n?
c. Suppose it is decided to include 15 more cars in the experiment. How many of these would have to be S’s to give x/n = .80 for the entire sample of 25 cars?

12. Answer the following two questions:

a. If a constant c is added to each x_i in a sample, yielding y_i = x_i + c, how do the sample mean and median of the y_i’s relate to the mean and median of the x_i’s? Verify your conjectures.
b. If each x_i is multiplied by a constant c, yielding y_i = c·x_i, answer the question of part (a). Again, verify your conjectures.

13. Calculate and interpret the values of the sample mean and sample standard deviation for the following observations on fracture strength.

128 131 142 168 87 93 105 114 96 98

14. The first four deviations from the mean in a sample of n = 5 reaction times were .6, .9, 1.0, and 1.5. What is the fifth deviation from the mean? Give a sample for which these are the five deviations from the mean.
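
Question 14 rests on the fact that deviations from the sample mean always sum to zero. A small Python sketch of that reasoning follows. The mean of 10.0 used to build an example sample is a hypothetical choice, and any other value would also work.

```python
# The four known deviations from the mean (Question 14)
known = [0.6, 0.9, 1.0, 1.5]

# Deviations from the sample mean sum to zero, so the fifth is the negative
# of the sum of the other four.
fifth = -sum(known)
deviations = known + [fifth]

# Hypothetical mean chosen only to construct one possible sample
assumed_mean = 10.0
sample = [assumed_mean + d for d in deviations]

print(fifth)
print(sample)
```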

15. A sample of 20 glass bottles of a particular type was selected, and the internal pressure strength of each bottle was determined. Consider the following partial sample information:

Median = 202.2
Lower fourth = 196.0
Upper fourth = 216.8

Three smallest observations 125.8 188.1 193.7
Three largest observations 221.3 230.5 250.2

Are there any outliers in the sample? Any extreme outliers?
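
A sketch of the outlier check for Question 15. It assumes the usual boxplot convention, which is not stated in the question itself: an observation is an outlier if it lies more than 1.5 fourth spreads from the nearest fourth, and an extreme outlier if it lies more than 3 fourth spreads away.

```python
lower_fourth, upper_fourth = 196.0, 216.8
f_s = upper_fourth - lower_fourth        # fourth spread

# Fences under the assumed 1.5 * f_s and 3 * f_s convention
mild_low, mild_high = lower_fourth - 1.5 * f_s, upper_fourth + 1.5 * f_s
extreme_low, extreme_high = lower_fourth - 3 * f_s, upper_fourth + 3 * f_s

def classify(x):
    """Label x as 'extreme', 'mild', or 'none' relative to the fences."""
    if x < extreme_low or x > extreme_high:
        return "extreme"
    if x < mild_low or x > mild_high:
        return "mild"
    return "none"

# Only the tails can contain outliers, so the six listed values suffice
for x in [125.8, 188.1, 193.7, 221.3, 230.5, 250.2]:
    print(x, classify(x))
```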

Chapter 2 – Probability

1. Suppose that vehicles taking a particular freeway exit can turn right (R), turn left (L), or go straight (S). Consider observing the direction for each of three successive vehicles.

a. List all outcomes in the event A that all three vehicles go in the same direction.
b. List all outcomes in the event B that all three vehicles take different directions.
c. List all outcomes in the event C that exactly two of the three vehicles turn right.
d. List all outcomes in the event D that exactly two vehicles go in the same direction.
e. List the outcomes in D′, C ∪ D, and C ∩ D.

2. Each of a sample of four home mortgages is classified as fixed rate (F) or variable rate (V).

a. What are the 16 outcomes in S?
b. Which outcomes are in the event that exactly two of the selected mortgages are fixed rate?
c. Which outcomes are in the event that all four mortgages are of the same type?
d. Which outcomes are in the event that at most one of the four is a variable-rate mortgage?
e. What is the union of the events in parts (c) and (d), and what is the intersection of these two events?
f. What are the union and intersection of the two events in parts (b) and (c)?

3. A college library has five copies of a certain text on reserve. Two copies (1 and 2) are first printings, and the other three (3, 4, and 5) are second printings. A student examines these books in random order, stopping only when a second printing has been selected. One possible outcome is 4, and another is 125.

a. List the outcomes in S.
b. Let A denote the event that exactly one book must be examined. What outcomes are in A?
c. Let B be the event that book 4 is the one selected. What outcomes are in B?
d. Let C be the event that book 2 is not examined. What outcomes are in C?

4. The Department of Statistics at a state university in California has just completed voting by secret ballot for a department head. The ballot box contains four slips with votes for candidate A and three slips with votes for candidate B. Suppose these slips are removed from the box one by one.

a. List all possible outcomes.
b. Suppose a running tally is kept as slips are removed. For what outcomes does A remain ahead of B throughout the tally?

5. Let A denote the event that the next item checked out at a college library is a math book, and let B be the event that the next item checked out is a history book. Suppose that P(A) = .40 and P(B) = .50.

a. Why is it not the case that P(A) + P(B) = 1?
b. Calculate P( )
c. Calculate P(A B).
d. Calculate P( ).

6. A large company offers its employees two different health insurance plans and two different dental insurance plans. Plan 1 of each type is relatively inexpensive, but restricts the choice of providers, whereas plan 2 is more expensive but more flexible. The accompanying table gives the percentages of employees who have chosen the various plans:

Dental Plan
Health Plan 1 2
1 27% 14%
2 24% 35%

Suppose that an employee is randomly selected and both the health plan and dental plan chosen by the selected employee are determined.

a. What are the four simple events?
b. What is the probability that the selected employee has chosen the more restrictive plan of each type?
c. What is the probability that the employee has chosen the more flexible dental plan?

7. The economics department at a state university has five faculty members (Anderson, Box, Cox, Carter, and Davis) and must select two of them to serve on a program review committee. Because the work will be time-consuming, no one is anxious to serve, so it is decided that the representatives will be selected by putting five slips of paper in a box, mixing them, and selecting two.

a. What is the probability that both Anderson and Box will be selected? (Hint: List the equally likely outcomes.)
b. What is the probability that at least one of the two members whose name begins with C is selected?
c. If the five faculty members have taught for 3, 6, 7, 10, and 14 years, respectively, at the university, what is the probability that the two chosen representatives have a total of at least 15 years’ teaching experience at the university?
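
Following the hint in Question 7, the equally likely outcomes can be listed by brute force. The sketch below assumes that part (c) refers to the combined experience of the two representatives and that the years are assigned in the order the names are listed.

```python
from fractions import Fraction
from itertools import combinations

# Years of teaching, in the order the names appear in the question (assumed)
years = {"Anderson": 3, "Box": 6, "Cox": 7, "Carter": 10, "Davis": 14}

# All equally likely unordered pairs of faculty members
pairs = list(combinations(years, 2))

def prob(event):
    """Probability of an event as (favorable pairs) / (all pairs)."""
    return Fraction(sum(1 for p in pairs if event(p)), len(pairs))

p_a = prob(lambda p: set(p) == {"Anderson", "Box"})
p_b = prob(lambda p: any(name.startswith("C") for name in p))
p_c = prob(lambda p: sum(years[name] for name in p) >= 15)

print(len(pairs), p_a, p_b, p_c)
```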

8. The Student Engineers Council at an Indiana college has one student representative from each of the five engineering majors (civil, electrical, industrial, materials, and mechanical). In how many ways can

a. Both a council president and a vice president be selected?
b. A president, a vice president, and a secretary be selected?
c. Two members be selected for the President’s Council?

9. A real estate agent is showing homes to a prospective buyer. There are ten homes in the desired price range listed in the area. The buyer has time to visit only four of them.

a. In how many ways could the four homes be chosen if the order of visiting is considered?
b. In how many ways could the four homes be chosen if the order is disregarded?
c. If four of the homes are new and six have previously been occupied and if the four homes to visit are randomly chosen, what is the probability that all four are new? (The same answer results regardless of whether order is considered.)

10. An experimenter is studying the effects of temperature, pressure, and type of catalyst on yield from a certain chemical reaction. Three different temperatures, four different pressures, and five different catalysts are under consideration.

a. If any particular experimental run involves the use of a single temperature, pressure, and catalyst, how many experimental runs are possible?
b. How many experimental runs are there that involve use of the lowest temperature and two lowest pressures?

11. A certain sports car comes equipped with either an automatic or a manual transmission, and the car is available in one of four colors. Relevant probabilities for various combinations of transmission type and color are given in the accompanying table.

Color
Transmission Type White Blue Black Red
A .13 .10 .11 .11
M .15 .07 .15 .18

Let A = {automatic transmission}, B = {black}, and C = {white}.

a. Calculate P(A), P(B), and P(A ∩ B).
b. Calculate both P(A|B) and P(B|A), and explain in context what each of these probabilities represents.

12. Consider the following information, where A = {Visa Card}, B = {MasterCard}, P(A) = .5, P(B) = .4, and P(A ∩ B) = .25. Calculate each of the following probabilities.

a. P(B|A)
b. P(B′|A)
c. P(A|B)
d. P(A′|B)
e. Given that an individual is selected at random and that he or she has at least one card, what is the probability that he or she has a Visa card?

13. A certain shop repairs both audio and video components. Let A denote the event that the next component brought in for repair is an audio component, and let B be the event that the next component is a compact disc player (so the event B is contained in A). Suppose that P(A) = .625 and P(B) = .05. What is P(B|A)?

14. At a certain gas station, 40% of the customers use regular unleaded gas (A1), 35% use extra unleaded gas (A2), and 25% use premium unleaded gas (A3). Of those customers using regular gas, only 30% fill their tanks (event B). Of those customers using extra gas, 60% fill their tanks, whereas of those using premium, 50% fill their tanks.

a. What is the probability that the next customer will request extra unleaded gas and fill the tank?
b. What is the probability that the next customer fills the tank?
c. If the next customer fills the tank, what is the probability that regular gas is requested? Extra gas? Premium gas?
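
Question 14 combines the multiplication rule, the law of total probability, and Bayes' theorem. A minimal Python sketch under the percentages stated in the question follows. The dictionary keys are illustrative labels.

```python
# P(gas type) and P(fill tank | gas type), taken from the question
prior = {"regular": 0.40, "extra": 0.35, "premium": 0.25}
fill_given = {"regular": 0.30, "extra": 0.60, "premium": 0.50}

# Multiplication rule: P(type and fill) = P(type) * P(fill | type)
joint = {g: prior[g] * fill_given[g] for g in prior}

# Law of total probability: P(fill) is the sum over gas types
p_fill = sum(joint.values())

# Bayes' theorem: P(type | fill) = P(type and fill) / P(fill)
posterior = {g: joint[g] / p_fill for g in prior}

print(joint["extra"], p_fill, posterior)
```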

15. Suppose that the proportions of blood phenotypes in a particular population are as follows:

A B AB O
.42 .10 .04 .44

Assuming that the phenotypes of two randomly selected individuals are independent of one another, what is the probability that both phenotypes are O? What is the probability that the phenotypes of two randomly selected individuals match?

16. Two pumps connected in parallel fail independently of one another on any given day. The probability that only the older pump will fail is .15, and the probability that only the newer pump will fail is .05. What is the probability that the pumping system will fail on any given day (which happens if both pumps fail)?

17. Seventy percent of all vehicles examined at a certain emissions inspection station pass the inspection. Assuming that successive vehicles pass or fail independently of one another, calculate the following probabilities.

a. P(all of the next three vehicles inspected pass)
b. P(at least one of the next three inspected fail)
c. P(exactly one of the next three inspected passes)
d. P(at most one of the next three vehicles inspected passes)
e. Given that at least one of the next three vehicles passes inspection, what is the probability that all three pass?

Chapter 3 – Discrete Random Variables and Probability Distributions

1. Three automobiles are selected at random, and each is categorized as having a diesel (S) or nondiesel (F) engine (so outcomes are SSS, SSF, etc.). If X = the number of cars among the three with diesel engines, list each outcome in S and its associated X value.

2. Let X = the number of nonzero digits in a randomly selected zip code. What are the possible values of X? Give three possible outcomes and their associated X values.

3. If the sample space is an infinite set, does this necessarily imply that any random variable X defined from S will have an infinite set of possible values? If yes, say why. If no, give an example.

4. An automobile service facility specializing in engine tune-ups knows that 50% of all tune-ups are done on four-cylinder automobiles, 40% on six-cylinder automobiles, and 10% on eight-cylinder automobiles. Let X = the number of cylinders on the next car to be tuned. What is the pmf of X?

5. The n candidates for a store manager have been ranked 1,2,3,…,n. Let X = the rank of a randomly selected candidate, so that X has pmf

6. A chemical supply company currently has in stock 100 lb of a certain chemical, which it sells to customers in 5-lb lots. Let X = the number of lots ordered by a randomly chosen customer, and suppose that X has pmf

x 1 2 3 4
P(x) .2 .3 .3 .2

Compute E(X) and V(X). Then compute the expected number of pounds left after the next customer’s order is shipped, and the variance of the number of pounds left. (Hint: The number of pounds left is a linear function of X.)
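
The hint in Question 6 points to the rules E(aX + b) = aE(X) + b and V(aX + b) = a²V(X). A short Python sketch follows, taking the pounds left to be 100 − 5X, as implied by the 100-lb stock and 5-lb lots.

```python
# pmf of X = number of 5-lb lots ordered
pmf = {1: 0.2, 2: 0.3, 3: 0.3, 4: 0.2}

e_x = sum(x * p for x, p in pmf.items())
e_x2 = sum(x * x * p for x, p in pmf.items())
v_x = e_x2 - e_x ** 2                 # shortcut formula E(X^2) - [E(X)]^2

# Pounds left after the order: 100 - 5X, a linear function a*X + b
a, b = -5, 100
e_left = a * e_x + b
v_left = a ** 2 * v_x

print(e_x, v_x, e_left, v_left)
```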

7. Twenty-five percent of all telephones of a certain type are submitted for service while under warranty. Of these, 60% can be repaired whereas the other 40% must be replaced with new units. If a company purchases ten of these telephones, what is the probability that exactly two will end up being replaced under warranty?

8. A geologist has collected 10 specimens of basaltic rock and 10 specimens of granite. The geologist instructs a laboratory assistant to randomly select 15 of the specimens for analysis.

a. What is the pmf of the number of granite specimens selected for analysis?
b. What is the probability that all specimens of one of the two types of rock are selected for analysis?
c. What is the probability that the number of granite specimens selected for analysis is within 1 standard deviation of its mean value?

9. A family decides to have children until it has three children of the same gender. Assuming P(B) = P(G) = .5, what is the pmf of X = the number of children in the family?

10. Three brothers and their wives decide to have children until each family has two female children. What is the pmf of X = the total number of male children born to the brothers? What is E(X), and how does it compare to the expected number of male children born to each brother?

11. Suppose the number X of tornadoes observed in Kansas during a 1-year period has a Poisson distribution with

a. Compute
b. Compute
c. Compute

12. Assume that 1 in 200 people carry the defective gene that causes inherited colon cancer. In a sample of 1000 individuals, what is the approximate distribution of the number who carry this gene? Use this distribution to calculate the approximate probability that

a. Between 6 and 9 (inclusive) carry the gene.
b. At least 10 carry the gene.

13. The number of tickets issued by a meter reader for parking-meter violations can be modeled by a Poisson process with a rate parameter of five per hour.

a. What is the probability that exactly three tickets are given out during a particular hour?
b. What is the probability that at least three tickets are given out during a particular hour?
c. How many tickets do you expect to be given during a 45-min period?
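
For Question 13, the number of tickets in a period of t hours is Poisson with mean 5t. A sketch using only the standard library follows. The helper name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu ** x / factorial(x)

rate = 5.0                                # tickets per hour

p_exactly_3 = poisson_pmf(3, rate)
p_at_least_3 = 1 - sum(poisson_pmf(k, rate) for k in range(3))

# Expected count over 45 minutes = rate * (45/60) hours
expected_45_min = rate * 45 / 60

print(round(p_exactly_3, 4), round(p_at_least_3, 4), expected_45_min)
```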

14. Automobiles arrive at a vehicle equipment inspection station according to a Poisson process with rate = 10 per hour. Suppose that with probability .5 an arriving vehicle will have no equipment violations.

a. What is the probability that exactly ten arrive during the hour and all ten have no violations?
b. For any fixed y ≥ 10, what is the probability that y arrive during the hour, of which ten have no violations?
c. What is the probability that ten “no-violation” cars arrive during the next hour? [Hint: Sum the probabilities in part (b) from y = 10 to ∞.]

15. A mail-order computer business has five telephone lines. Let X denote the number of lines in use at a specified time. Suppose the pmf of X is as given in the accompanying table.

x 0 1 2 3 4 5
P(x) .10 .15 .20 .25 .22 .08

Calculate the probability of each of the following events.
a. {at most 3 lines are in use}
b. {fewer than 3 lines are in use}
c. {at least 3 lines are in use}
d. {between 2 and 5 lines, inclusive, are in use}
e. {between 2 and 4 lines, inclusive, are not in use}
f. {at least 4 lines are not in use}

16. Suppose that in one area in California, 40% of all homeowners are insured against earthquake damage. Four homeowners are to be selected at random; let X denote the number among the four who have earthquake insurance.

a. Find the probability distribution of X. [Hint: Let S denote a homeowner who has insurance and F one who does not. Then one possible outcome is SFSS, with probability (.4)(.6)(.4)(.4) and associated X value 3. There are 15 other outcomes.]
b. What is the most likely value for X?
c. What is the probability that at least two of the four selected have earthquake insurance?

17. An insurance company offers its policyholders a number of different payment options. For a randomly selected policyholder, let X = the number of months between successive payments. The cdf of X is as follows:

a. What is the pmf of X?
b. Using just the cdf, compute
c. Using just the pmf, compute P(X>6).

18. The pmf for X = the number of major defects on a randomly selected gas stove of a certain type is

x 0 1 2 3 4
P(x) .10 .15 .45 .25 .05

Compute the following:
a. E(X)
b. V(X) directly from the definition
c. The standard deviation of X
d. V(X) using the shortcut formula

19. An appliance dealer sells three different models of upright freezers having 13.5, 15.9, and 19.1 cubic feet of storage space, respectively. Let X = the amount of storage space purchased by the next customer to buy a freezer. Suppose that X has pmf

x 13.5 15.9 19.1
P(x) .2 .4 .4

a. Compute
b. If the price of a freezer having capacity X cubic feet is 25X – 8.5, what is the expected price paid by the next customer to buy a freezer?
c. What is the variance of the price 25X – 8.5 paid by the next customer?
d. Suppose that although the rated capacity of a freezer is X, the actual capacity is What is the expected actual capacity of the freezer purchased by the next customer?

20. Compute the following binomial probabilities directly from the formula for b(x;n,p).

a. b(3; 8, .7)
b. b(5; 8, .7)
c.
d.

21. Use the cumulative binomial probabilities table available in your text to obtain the following probabilities:

a. B(4; 10, .4)
b. b(4; 10, .4)
c. b(6; 10, .6)
d. when Bin(10,.4)
e. when Bin(10,.3)
f. when Bin(10,.7)
g. when Bin(10,.3)

22. Suppose that only 25% of all drivers come to a complete stop at an intersection having flashing red lights in all directions when no other cars are visible. What is the probability that, of 20 randomly chosen drivers coming to an intersection under these conditions,

a. At most 6 will come to a complete stop?
b. Exactly 6 will come to a complete stop?
c. At least 6 will come to a complete stop?
d. How many of the next 20 drivers do you expect to come to a complete stop?

23. Each of 12 refrigerators of a certain type has been returned to a distributor because of the presence of a high-pitched oscillating noise when the refrigerator is running. Suppose that 5 of these 12 have defective compressors and the other 7 have less serious problems. If they are examined in random order, let X = the number among the first 6 examined that have a defective compressor. Compute the following:

a. P(X = 1)
b.

24. The number of pumps in use at both a six-pump station and a four-pump station will be determined. Give the possible values for each of the following random variables.

a. T = the total number of pumps in use
b. X = the difference between the numbers in use at stations 1 and 2
c. U = the maximum number of pumps in use at either station
d. Z = the number of stations having exactly two pumps in use

Chapter 4 – Continuous Random Variables and Probability Distributions

1. Let X denote the amount of time for which a book on 2-hour reserve at a college library is checked out by a randomly selected student and suppose that X

Calculate the following probabilities:

a.
b.

2. A college professor always finishes his lectures within 2 minutes after the bell rings to end the period. Let X = the time that elapses between the bell and the end of the lecture and suppose the pdf of X is

a. Find the value of k. [Hint: Total area under the graph of f(x) is 1.]
b. What is the probability that the lecture ends within 1 minute of the bell ringing?
c. What is the probability that the lecture continues beyond the bell for between 60 and 90 seconds?
d. What is the probability that the lecture continues for at least 90 seconds beyond the bell?

3. The time X (minutes) for a lab assistant to prepare the equipment for a certain experiment is believed to have a uniform distribution with A = 20 and B = 30.

a. Write the pdf of X and sketch its graph.
b. What is the probability that preparation time exceeds 27 minutes?
c. Find the mean preparation time, then calculate the probability that preparation time is within 2 minutes of the mean.
d. For any a such that 20 < a < a + 2 < 30, what is the probability that preparation time is between a and a + 2 minutes?
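
For the uniform distribution in Question 3, every probability is an interval length divided by B − A. A minimal Python sketch follows. The helper `uniform_prob` and the trial value a = 22.5 are illustrative choices.

```python
A, B = 20.0, 30.0                         # endpoints of the uniform distribution

def uniform_prob(lo, hi):
    """P(lo <= X <= hi) for X uniform on [A, B]."""
    lo, hi = max(lo, A), min(hi, B)       # clip the interval to [A, B]
    return max(hi - lo, 0.0) / (B - A)

p_exceeds_27 = uniform_prob(27, B)

mean_time = (A + B) / 2
p_within_2 = uniform_prob(mean_time - 2, mean_time + 2)

# Part (d): any window of width 2 inside (20, 30) has the same probability
a = 22.5                                  # arbitrary illustrative value
p_window = uniform_prob(a, a + 2)

print(p_exceeds_27, mean_time, p_within_2, p_window)
```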

4. “Time headway” in traffic flow is the elapsed time between the time that one car finishes passing a fixed point and the instant that the next car begins to pass that point. Let X = the time headway for two randomly chosen consecutive cars on a freeway during a period of heavy flow. The following pdf of X is

What is the probability that the time headway is

a. At most 6 seconds?
b. At least 6 seconds?
c. At most 5 seconds?
d. Between 5 and 6 seconds?

5. The cdf of checkout duration X for a book on a 2-hour reserve at a college library is given by:

Use this cdf to compute the following:

a.
b.
c.
d. The median checkout duration
e.

6. Let X denote the amount of space occupied by an article placed in a packing container. The pdf of X is

a. Obtain the cdf of X.
b. What is
c. What is ?
d. What is the 75th percentile of the distribution?
e. Compute
f. What is the probability that X is within 1 standard deviation of its mean value?

7. Let X have a uniform distribution on the interval [a, b].

a. Obtain an expression for the (100p)th percentile.
b. Compute E(X), V(X), and σ_X.
c. For n a positive integer, compute E(X^n).

8. Let X be the temperature in °C at which a certain chemical reaction takes place, and let Y be the temperature in °F (so Y = 1.8X + 32).

a. If the median of the X distribution is μ̃, show that 1.8μ̃ + 32 is the median of the Y distribution.
b. How is the 90th percentile of the Y distribution related to the 90th percentile of the X distribution? Verify your conjecture.
c. More generally, if Y = aX + b, how is any particular percentile of the Y distribution related to the corresponding percentile of the X distribution?

9. Let Z be a standard normal random variable and calculate the following probabilities:

a.
b.
c.
d.
e.
f.
g.
h.
i.
j.

10. In each case, determine the value of the constant c that makes the probability statement correct.

a.
b.
c.
d.
e.

11. If X is a normal random variable with mean 85 and standard deviation 10, compute the following probabilities by standardizing.

a.
b.
c.
d.
e.
f.

12. The distribution of resistance for resistors of a certain type is known to be normal, with 10% of all resistors having a resistance exceeding 10.634 ohms, and 5% having a resistance smaller than 9.7565 ohms. What are the mean value and standard deviation of the resistance distribution?
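
Question 12 gives two percentiles of a normal distribution, which amounts to two linear equations in μ and σ. The sketch below solves them using `statistics.NormalDist` from the Python standard library. The variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()                 # mean 0, standard deviation 1

# 10% exceed 10.634, so 10.634 is the 90th percentile;
# 5% fall below 9.7565, so 9.7565 is the 5th percentile.
x_upper, x_lower = 10.634, 9.7565
z_upper = std_normal.inv_cdf(0.90)
z_lower = std_normal.inv_cdf(0.05)

# x = mu + z * sigma at both percentiles; subtract to eliminate mu
sigma = (x_upper - x_lower) / (z_upper - z_lower)
mu = x_upper - z_upper * sigma

print(round(mu, 3), round(sigma, 3))
```

Using table values of z rounded to two or three decimals gives slightly different answers in the last digit.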

13. Let X have a binomial distribution with parameters n = 25 and p. Calculate each of the following probabilities using the normal approximation (with the continuity correction) for the cases p = .5 and .6, and compare to the exact probabilities calculated from the cumulative binomial probabilities table available in your text.

a.
b.
c.

14. Suppose only 40% of all drivers in Florida regularly wear a seatbelt. A random sample of 500 drivers is selected. What is the probability that

a. Between 170 and 220 (inclusive) of the drivers in the sample regularly wear a seatbelt?
b. Fewer than 175 of those in the sample regularly wear a seatbelt?

15. Let X = the time between two successive arrivals at the drive-up window of a local bank. If X has an exponential distribution with λ = 1 (which is identical to a standard gamma distribution with α = 1), compute the following:

a. The expected time between two successive arrivals.
b. The standard deviation of the time between two successive arrivals.
c. .
d. .

16. Let X have a standard gamma distribution with Evaluate the following:

a.
b.
c.
d.
e.
f.

17. Suppose that when a transistor of a certain type is subjected to an accelerated life test, the lifetime X (in weeks) has a gamma distribution with mean of 40 and variance of 320.

a. What is the probability that a transistor will last between 1 and 40 weeks?
b. What is the probability that a transistor will last at most 40 weeks? Is the median of the lifetime distribution less than 40? Why or why not?
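
For Question 17, the gamma parameters follow from mean = αβ and variance = αβ². The sketch below recovers them and, because α turns out to be an integer here, evaluates the cdf with the closed-form sum for integer shape (the Erlang case) instead of an incomplete gamma table. The helper name `erlang_cdf` is illustrative.

```python
from math import exp, factorial

mean, variance = 40.0, 320.0

# mean = alpha * beta and variance = alpha * beta**2
beta = variance / mean                    # scale parameter
alpha = mean / beta                       # shape parameter

def erlang_cdf(x, k, scale):
    """Gamma cdf for integer shape k: 1 - exp(-t) * sum_{j<k} t^j / j!, t = x/scale."""
    if x <= 0:
        return 0.0
    t = x / scale
    return 1 - exp(-t) * sum(t ** j / factorial(j) for j in range(k))

k = round(alpha)                          # valid only because alpha is an integer
p_at_most_40 = erlang_cdf(40, k, beta)
p_between_1_and_40 = p_at_most_40 - erlang_cdf(1, k, beta)

print(alpha, beta, round(p_at_most_40, 4), round(p_between_1_and_40, 4))
```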

18. The lifetime X (in hundreds of hours) of a certain type of vacuum tube has a Weibull distribution with parameters Compute the following:

a. E(X) and V(X)
b.
c.

19. Let X = hourly median power (in decibels) of received radio signals transmitted between two cities. It is believed that the lognormal distribution provides a reasonable probability model for X. If the parameter values are calculate the following:

a. The mean value and standard deviation of received power
b. The probability that received power is between 50 and 250 dB
c. The probability that X is less than its mean value. Why is this probability not .5?

20. Suppose the proportion X of surface area in a randomly selected quadrat that is covered by a certain plant has a standard beta distribution with

a. Compute E(X) and V(X).
b. Compute
c. Compute
d. What is the expected proportion of the sampling region not covered by the plant?

21. Stress is applied to a 20-in. steel bar that is clamped in a fixed position at each end. Let Y = distance from the left end at which the bar snaps. Suppose Y/20 has a standard beta distribution with E(Y) = 10 and V(Y) =

a. What are the parameters of the relevant standard beta distribution?
b. Compute
c. Compute the probability that the bar snaps more than 2 in. from where you expect it to.

22. Consider the following ten observations on bearing lifetime (in hours):

152.7 172.0 172.5 173.3 193.0
204.7 216.5 234.9 262.6 422.6

Construct a normal probability plot and comment on the plausibility of the normal distribution as a model for bearing lifetime.

23. Construct a normal probability plot for the following sample of observations on coating thickness for low-viscosity paint. Would you feel comfortable estimating population mean thickness using a method that assumed a normal population distribution?

.83 .98 1.06 1.14 1.20 1.25 1.29 1.40
1.48 1.49 1.51 1.62 1.65 1.71 1.76 1.83

24. The propagation of fatigue cracks in various aircraft parts has been the subject of extensive study in recent years. The accompanying data consists of propagation lives to reach crack size in fastener holes intended for use in military aircraft.

.736 .863 .865 .913 .915 .937 .983 1.007
1.011 1.064 1.109 1.132 1.140 1.153 1.253 1.394

Construct a normal probability plot for the data. Does it appear plausible that propagation life has a normal distribution? Explain.

Chapter 5 – Joint Probability Distributions and Random Samples

1. Each front tire on a particular type of vehicle is supposed to be filled to a pressure of 26 psi. Suppose the actual air pressure in each tire is a random variable—X for the right tire and Y for the left tire, with joint pdf

a. What is the value of K?
b. What is the probability that both tires are underfilled?
c. What is the probability that the difference in air pressure between the two tires is at most 2 psi?
d. Determine the (marginal) distribution of air pressure in the right tire alone.
e. Are X and Y independent random variables?

2. Let X denote the number of brand X VCRs sold during a particular week by a certain store. The pmf of X is

x 0 1 2 3 4
p(x) .1 .2 .3 .25 .15

Seventy percent of all customers who purchase brand X VCRs also buy an extended warranty. Let Y denote the number of purchasers during this week who buy an extended warranty.

a. What is P(X = 4, Y = 2)? [Hint: This probability equals P(Y = 2 | X = 4) · P(X = 4); now think of the four purchases as four trials of a binomial experiment, with success on a trial corresponding to buying an extended warranty.]
b. Calculate P(X = Y).
c. Determine the joint pmf of X and Y and then the marginal pmf of Y.
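
The hint in part (a) can be turned into a short computation. The sketch below assumes that purchasers decide about the warranty independently of one another, so that Y given X = x is binomial with x trials and success probability .7; the dictionary names are illustrative.

```python
from math import comb

# pmf of X (number of VCRs sold) from the table in Question 2.
p_x = {0: 0.10, 1: 0.20, 2: 0.30, 3: 0.25, 4: 0.15}
p_warranty = 0.70  # each purchaser buys a warranty with probability .7

# Joint pmf: p(x, y) = P(Y = y | X = x) * P(X = x), with Y | X = x
# treated as binomial(x, .7), as the hint in part (a) suggests.
joint = {}
for x, px in p_x.items():
    for y in range(x + 1):
        cond = comb(x, y) * p_warranty**y * (1 - p_warranty)**(x - y)
        joint[(x, y)] = cond * px

# Marginal pmf of Y: sum the joint pmf over x.
p_y = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in range(5)}

p_x4_y2 = joint[(4, 2)]                          # part (a)
p_x_equals_y = sum(joint[(x, x)] for x in p_x)   # part (b)
```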

3. Two components of a minicomputer have the following joint pdf for their useful lifetimes X and Y:

a. What is the probability that the lifetime X of the first component exceeds 3?
b. What are the marginal pdf’s of X and Y? Are the two lifetimes independent? Explain.
c. What is the probability that the lifetime of at least one component exceeds 3?

4. The joint pdf of pressures for right (X) and left (Y) front tires is given by .

a. Determine the conditional pdf of Y given that X = x and the conditional pdf of X given that Y = y if you are given
b. If the pressure in the right tire is found to be 22 psi, what is the probability that the left tire has a pressure of at least 25 psi? Compare this to
c. If the pressure in the right tire is found to be 22 psi, what is the expected pressure in the left tire, and what is the standard deviation of pressure in this tire?

5. An instructor has given a short test consisting of two parts. For a randomly selected student, let X = the number of points earned on the first part and Y = the number of points earned on the second part. Suppose that the joint pmf of X and Y is given in the accompanying table.

p(x,y) 0 5 10 15
0 .02 .06 .02 .10
5 .04 .15 .20 .10
10 .01 .15 .14 .01

a. If the score recorded in the grade book is the total number of points earned on the two parts, what is the expected recorded score E(X + Y)?
b. If the maximum of the two scores is recorded, what is the expected recorded score?
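
Both parts of Question 5 are expected values of a function h(X, Y), computed by weighting h(x, y) by the joint pmf. The sketch below reads the table with rows x = 0, 5, 10 and columns y = 0, 5, 10, 15, which is an assumption about the layout; the helper name expected_value is illustrative.

```python
# Joint pmf from the table in Question 5, read as rows x = 0, 5, 10
# and columns y = 0, 5, 10, 15 (an assumption about the table layout).
x_values = [0, 5, 10]
y_values = [0, 5, 10, 15]
table = [
    [0.02, 0.06, 0.02, 0.10],
    [0.04, 0.15, 0.20, 0.10],
    [0.01, 0.15, 0.14, 0.01],
]

def expected_value(h):
    """E[h(X, Y)] = sum over all (x, y) of h(x, y) * p(x, y)."""
    return sum(h(x, y) * table[i][j]
               for i, x in enumerate(x_values)
               for j, y in enumerate(y_values))

total_prob = expected_value(lambda x, y: 1)          # sanity check: should be 1
expected_total = expected_value(lambda x, y: x + y)  # part (a)
expected_max = expected_value(max)                   # part (b)
```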

6. Abby and Bianca have agreed to meet for lunch between noon and 1:00 P.M. Denote Abby’s arrival time by X, Bianca’s by Y, and suppose X and Y are independent with pdf’s.

What is the expected amount of time that the one who arrives first must wait for the other person? [Hint: h(X, Y ) = |X – Y|.]

7. Show that if X and Y are independent random variables, then

8. Show that if Under what conditions will

9. A particular brand of dishwasher soap is sold in three sizes: 25 oz, 40 oz, and 65 oz. Twenty percent of all purchasers select a 25 oz box, fifty percent select a 40 oz box, and the remaining thirty percent choose a 65 oz box. Let denote the package sizes selected by two independently selected purchasers.

a. Determine the sampling distribution of , calculate , and compare to
b. Determine the sampling distribution of the sample variance

10. It is known that 80% of all brand A zip drives work in a satisfactory manner throughout the warranty period (are “successes”). Suppose that n = 10 drives are randomly selected. Let X = the number of successes in the sample. The statistic X/n is the sample proportion (fraction) of successes. Obtain the sampling distribution of this statistic. [Hint: One possible value of X/n is .3, corresponding to X = 3. What is the probability of this value (what kind of random variable is X)?]

11. Let X be the number of packages being mailed by a randomly selected customer at a certain shipping facility. Suppose the distribution of X is as follows:

x 1 2 3 4
p(x) .4 .3 .2 .1

a. Consider a random sample of size n = 2 (two customers), and let be the sample mean number of packages shipped. Obtain the probability distribution of .
b. Refer to part (a) and calculate
c. Again consider a random sample of size n = 2, but now focus on the statistic R = the sample range (difference between the largest and smallest values in the sample). Obtain the distribution of R. [Hint: Calculate the value of R for each outcome and use the probabilities from part (a).]
d. If a random sample of size n = 4 is selected, what is ? (Hint: You should not have to list all possible outcomes, only those for which
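
For parts (a) and (c) of Question 11, the sampling distributions can be found by listing every ordered sample of size 2. The sketch below does this by enumeration, assuming the two customers are independent; exact fractions are used to avoid rounding, and the dictionary names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# pmf of X (packages per customer) from Question 11, as exact fractions.
pmf = {1: Fraction(4, 10), 2: Fraction(3, 10),
       3: Fraction(2, 10), 4: Fraction(1, 10)}

mean_dist = defaultdict(Fraction)   # sampling distribution of the sample mean
range_dist = defaultdict(Fraction)  # sampling distribution of the sample range

# Enumerate all ordered pairs (x1, x2); independence gives p(x1) * p(x2).
for (x1, p1), (x2, p2) in product(pmf.items(), repeat=2):
    prob = p1 * p2
    mean_dist[Fraction(x1 + x2, 2)] += prob
    range_dist[abs(x1 - x2)] += prob

for xbar in sorted(mean_dist):
    print(float(xbar), float(mean_dist[xbar]))
```

The same loop with repeat=4 would enumerate samples of size 4, although part (d) is designed so that only a few outcomes need to be listed by hand.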

12. A company maintains three offices in a certain region, each staffed by two employees. Information concerning yearly salaries (1000’s of dollars) is as follows:

Office 1 1 2 2 3 3
Employee 1 2 3 4 5 6
Salary 19.7 23.6 20.2 23.6 15.8 19.7

a. Suppose two of these employees are randomly selected from among the six (without replacement). Determine the sampling distribution of the sample mean salary
b. Suppose one of the three offices is randomly selected. Let denote the salaries of the two employees. Determine the sampling distribution of
c. How does from parts (a) and (b) compare to the population mean salary

13. The breaking strength of a rivet has a mean value of 10,000 psi and a standard deviation of 500 psi.

a. What is the probability that the sample mean breaking strength for a random sample of 40 rivets is between 9950 and 10,250?
b. If the sample size had been 15 rather than 40, could the probability requested in part (a) be calculated from the given information?
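
Part (a) of Question 13 relies on the Central Limit Theorem. The sketch below assumes n = 40 is large enough for the normal approximation to the distribution of the sample mean; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 10_000, 500, 40   # values given in Question 13

# By the Central Limit Theorem, the sample mean is approximately normal
# with mean mu and standard deviation sigma / sqrt(n) when n is large.
xbar_dist = NormalDist(mu, sigma / sqrt(n))

prob = xbar_dist.cdf(10_250) - xbar_dist.cdf(9_950)
print(round(prob, 4))
```

With n = 15 the same code would run, but the approximation would be hard to justify unless the population itself were known to be normal, which is the point of part (b).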

14. The lifetime of a certain type of battery is normally distributed with mean value 12 hours and standard deviation 1 hour. There are four batteries in a package. What lifetime value is such that the total lifetime of all batteries in a package exceeds that value for only 5% of all packages?
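
Question 14 uses the fact that a sum of independent normal variables is again normal. The sketch below assumes the four lifetimes in a package are independent; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12, 1, 4   # hours per battery, batteries per package

# A sum of n independent normal variables is normal with mean n * mu
# and variance n * sigma**2 (independence is assumed here).
total = NormalDist(n * mu, sqrt(n) * sigma)

# The value exceeded by only 5% of packages is the 95th percentile.
cutoff = total.inv_cdf(0.95)
print(round(cutoff, 2))
```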

15. The number of parking tickets issued in Grand Rapids on any given weekday has a Poisson distribution with parameter What is the approximate probability that
a. Between 40 and 70 tickets are given out on a particular day? (Hint: When is large, a Poisson random variable has approximately a normal distribution.)
b. The total number of tickets given out during a 5-day week is between 215 and 265?

16. Let represent the times necessary to perform three successive repair tasks at a certain service facility. Suppose they are independent normal random variables with expected values respectively.

a. If
Calculate What is
b. Using the given in part (a), calculate
c. Using the given in part (a), calculate
d. If calculate

17. Suppose your waiting time for a bus in the morning is uniformly distributed on [0,5], whereas waiting time in the evening is uniformly distributed on [0,10] independent of morning waiting time.

a. If you take the bus each morning and evening for a week, what is your total expected waiting time? [Hint: Define random variables and use a rule of expected value.]
b. What is the variance of your total waiting time?
c. What are the expected value and variance of the difference between morning and evening waiting times on a given day?
d. What are the expected value and variance of the difference between morning waiting time and total evening waiting time for a particular week?
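
Parts (a) through (c) of Question 17 follow from the rules for expected values and variances of linear combinations. The sketch below assumes a 5-day commuting week (the question does not say how long the week is) and independence of all waiting times; the helper name uniform_mean_var is illustrative.

```python
from fractions import Fraction

def uniform_mean_var(a, b):
    """Mean and variance of a uniform distribution on [a, b]."""
    return Fraction(a + b, 2), Fraction((b - a) ** 2, 12)

days = 5  # assumption: a 5-day commuting week; use 7 if that is intended

m_mean, m_var = uniform_mean_var(0, 5)    # one morning wait
e_mean, e_var = uniform_mean_var(0, 10)   # one evening wait

# Total waiting time: expected values add; variances add under independence.
total_mean = days * (m_mean + e_mean)
total_var = days * (m_var + e_var)

# Difference (morning - evening) on a single day.
diff_mean = m_mean - e_mean
diff_var = m_var + e_var   # variances add even for a difference
```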

18. Three different roads feed into a particular freeway entrance. Suppose that during a fixed time period, the number of cars coming from each road onto the freeway is a random variable, with expected value and standard deviation as given in the table.

Road 1 2 3
Expected value 750 1000 550
Standard deviation 16 24 18

a. What is the expected total number of cars entering the freeway at this point during the period? (Hint: Let
b. What is the variance of the total number of entering cars? Have you made any assumptions about the relationship between the numbers of cars on the different roads?
c. With denoting the number of cars entering from road i during the period, suppose that
(so that the three streams of traffic are not independent). Compute the expected total number of entering cars and the standard deviation of the total.

19. In an area having sandy soil, 50 small trees of a certain type were planted, and another 50 trees were planted in an area having clay soil. Let X = the number of trees planted in sandy soil that survive 1 year and Y = the number of trees planted in clay soil that survive 1 year. If the probability that a tree planted in sandy soil will survive 1 year is .7 and the probability of 1-year survival in clay soil is .6, compute an approximation to (do not bother with the continuity correction).

Chapter 6 – Point Estimation

COMPLETION

1. The objective of __________ is to select a single number such as , based on sample data, that represents a sensible value (good guess) for the true value of the population parameter, such as .

2. Given four observed values: would result in a point estimate for that is equal to __________.

3. An estimator that has the properties of __________ and __________ will often be regarded as an accurate estimator.

4. A point estimator is said to be an __________ estimator of if for every possible value of .

5. The sample median and any trimmed mean are unbiased estimators of the population mean if the random sample is from a population that is __________ and __________.

6. Among all estimators of parameter that are unbiased, choose the one that has minimum variance. The resulting is called the __________ of .

7. The standard error of an estimator is the __________ of .

8. In your text, two important methods were discussed for obtaining point estimates: the method of __________ and the method of __________.

9. Let be a random sample from a probability mass function or probability density function f(x). For k = 1, 2, 3, ..., the kth population moment is denoted by __________, while the kth sample moment is __________.

10. Let be a random sample of size n from an exponential distribution with parameter . The moment estimator of = __________.

11. Let be the maximum likelihood estimates (mle’s) of the parameters . Then the mle of any function h( ) of these parameters is the function of the mle’s. This result is known as the __________ principle.

MULTIPLE CHOICE

1. Which of the following statements are true?
a. A point estimate of a population parameter is a single number that can be regarded as a sensible value of .
b. A point estimate of a population parameter is obtained by selecting a suitable statistic and computing its value from the given sample data. The selected statistic is called the point estimator of .
c. The sample mean is a point estimator of the population mean .
d. The sample variance is a point estimator of the population variance .
e. All of the above statements are true.

2. Which of the following statements are not true?
a. The symbol is customarily used to denote the estimator of parameter and the point estimate resulting from a given sample.
b. The equality is read as “the point estimator of
c. The difference between and the parameter is referred to as error of estimation.
d. None of the above statements is true.

3. Which of the following statements are not always true?
a. A point estimator is said to be an unbiased estimator of parameter if for every possible value of .
b. If the estimator is not unbiased of parameter , the difference is called the bias of .
c. A point estimator is unbiased if its probability sampling distribution is always “centered” at the true value of the parameter , where “centered” here means that the median of the distribution of .
d. All of the above statements are true.

4. Which of the following statements are not always true?
a. It is necessary to know the true value of the parameter to determine whether the estimator is unbiased.
b. When X is a binomial random variable with parameters n and p, the sample proportion is an unbiased estimator of p.
c. When choosing among several different estimators of parameter , select one that is unbiased.
d. All of the above statements are not always true.

5. Which of the following statements are true if is a random sample from a distribution with mean ?
a.
b.
c.
d.
e. All of the above statements are true provided that the sample size n > 30.

6. Which of the following statements are true if is a random sample from a distribution with mean ?
a. The sample mean is always an unbiased estimator of .
b. The sample mean is an unbiased estimator of if the distribution is continuous and symmetric.
c. Any trimmed mean is an unbiased estimator of if the distribution is continuous and symmetric.
d. None of the above statements are true.
e. All of the above statements are true.

7. Which of the following statements are not true?
a. Maximum likelihood estimators are generally preferable to moment estimators because of certain efficiency properties.
b. Maximum likelihood estimators often require significantly more computation than do moment estimators.
c. The definition of unbiasedness in general indicates how unbiased estimators can be derived.
d. None of the above statements are true.
e. All of the above statements are true.

8. Which of the following statements are correct?
a. The first population moment is , while the first sample moment is .
b. The moment estimators are obtained by equating the first m sample moments to the corresponding first m population moments, and solving for the unknown parameters .
c. The method of maximum likelihood was first introduced by R.A. Fisher, a geneticist and statistician, in the 1920’s.
d. All of the above statements are true.
e. Only (a) and (b) are true.

9. Which of the following statements are not true?
a. Maximizing the likelihood function gives the parameter values for which the observed sample is most likely to have been generated—that is, the parameter values that “agree most likely” with the observed data.
b. Different principles of estimation may yield different estimators of the unknown parameters.
c. The maximum likelihood estimator of the population standard deviation is the sample standard deviation S.
d. None of the above statements are true.

10. Which of the following statements are true?
a. Maximum likelihood estimation is the most widely used estimation technique among statisticians.
b. Under very general conditions on the joint distribution of the sample, when the sample size n is large, the maximum likelihood estimator of any parameter is approximately unbiased; that is, .
c. Under very general conditions on the joint distribution of the sample, when the sample size n is large, the maximum likelihood estimator of any parameter has variance that is nearly as small as can be achieved by any estimator.
d. In recent years, statisticians have proposed an estimator, called an M-estimator, which is based on a generalization of maximum likelihood estimation.
e. All of the above are true statements.

11. The accompanying data on flexural strength (MPa) for concrete beams of a certain type were introduced in Example 1.2.

9.2 9.7 8.8 10.7 8.4 8.7 10.7
6.9 8.2 8.3 7.3 9.1 7.8 8.0
8.6 7.8 7.5 8.0 7.3 8.9 10.0
8.8 8.7 12.6 12.3 12.8 11.7
a. Calculate a point estimate of the mean value of strength for the conceptual population of all beams manufactured in this fashion, and state which estimator you used. Hint:
b. Calculate a point estimate of the strength value that separates the weakest 50% of all such beams from the strongest 50%, and state which estimator you used.
c. Calculate and interpret a point estimate of the population standard deviation Which estimator did you use? Hint:
d. Calculate a point estimate of the proportion of all such beams whose flexural strength exceeds 11 MPa. Hint: Think of an observation as a “success” if it exceeds 11.
e. Calculate a point estimate of the population coefficient of variation and state which estimator you used.
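
The point estimates requested in Question 11 can be computed directly from the data. The sketch below uses the usual estimators (sample mean, sample median, sample standard deviation with divisor n - 1, sample proportion, and s divided by the sample mean); these choices are assumptions, since each part asks you to state which estimator you used.

```python
from statistics import mean, median, stdev

# Flexural strength data (MPa) from Question 11.
strength = [
    9.2, 9.7, 8.8, 10.7, 8.4, 8.7, 10.7,
    6.9, 8.2, 8.3, 7.3, 9.1, 7.8, 8.0,
    8.6, 7.8, 7.5, 8.0, 7.3, 8.9, 10.0,
    8.8, 8.7, 12.6, 12.3, 12.8, 11.7,
]

x_bar = mean(strength)       # part (a): sample mean
x_tilde = median(strength)   # part (b): sample median
s = stdev(strength)          # part (c): sample standard deviation (divisor n - 1)
p_hat = sum(x > 11 for x in strength) / len(strength)  # part (d): proportion above 11 MPa
cv_hat = s / x_bar           # part (e): estimated coefficient of variation
```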

ESSAY

1. a. A random sample of 10 houses in Big Rapids, each of which is heated with natural gas, is selected and the amount of gas (therms) used during the month of January is determined for each house. The resulting observations are 108, 161, 123, 94, 130, 152, 127, 114, 143, 104. Let denote the average gas usage during January by all houses in this area. Compute a point estimate of .
b. Suppose there are 10,000 houses in Big Rapids that use natural gas for heating. Let denote the total amount of gas used by all of these houses during January. Estimate using the data of part (a). What estimator did you use in computing your estimate?
c. Use the data in part (a) to estimate p, the proportion of all houses that used at least 105 therms.
d. Give a point estimate of the population median usage (the middle value in the population of all houses) based on the sample of part (a). What estimator did you use?

2. Consider a random sample from the pdf

where (this distribution arises in particle physics). Show that is an unbiased estimator of [ Hint: First determine

3. Let represent a random sample from a Rayleigh distribution with pdf

a. It can be shown that Use this fact to construct an unbiased estimator of based on (and use rules of expected value to show that it is unbiased).
b. Estimate from the following observations on vibratory stress of a turbine blade under specified conditions:

17.08 10.43 4.79 6.86 13.88
14.43 20.07 9.60 6.71 11.15

4. A random sample of bike helmets manufactured by a certain company is selected. Let = the number among the that are flawed and let = (flawed). Assume that only is observed, rather than the sequence of

a. Derive the maximum likelihood estimator of . If = 25 and =5, what is the estimate?
b. Is the estimator of part (a) unbiased?
c. If = 25 and =5, what is the mle of the probability that none of the next five helmets examined is flawed?
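
Parts (a) and (c) of Question 4 can be checked numerically. The sketch below assumes the missing symbols are n = 25 (helmets sampled) and x = 5 (flawed helmets observed), and it treats the count of flawed helmets as binomial; both readings are assumptions.

```python
# Assumed reading of Question 4: n = 25 helmets sampled, x = 5 flawed.
n, x = 25, 5

# For a binomial count, the likelihood p**x * (1 - p)**(n - x) is
# maximized at the sample proportion x / n.
p_hat = x / n

# Invariance principle: the mle of a function of p is that function
# evaluated at p_hat. Here the function is (1 - p)**5, the probability
# that none of the next five helmets is flawed.
p_none_flawed = (1 - p_hat) ** 5
```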

5. Let denote the proportion of allotted time that a randomly selected student spends working on a certain aptitude test. Suppose the pdf of is

where > -1. A random sample of ten students yields data

a. Use the method of moments to obtain an estimator of and then compute the estimate for this data.
b. Obtain the maximum likelihood estimator of and then compute the estimate for the given data.

6. The shear strength of each of ten test spot welds is determined, yielding the following data (psi):

395 379 404 370 392 365 412 418 361 378

a. Assuming that shear strength is normally distributed, estimate the true average shear strength and standard deviation of shear strength using the method of maximum likelihood.
b. Again assuming a normal distribution, estimate the strength value below which 95% of all welds will have their strengths. (Hint: What is the 95th percentile in terms of ? Now use the invariance principle.)
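
Both parts of Question 6 can be sketched with the standard library. The code assumes the usual normal-theory results: the mle of the mean is the sample mean, and the mle of the standard deviation uses divisor n rather than n - 1. Variable names are illustrative.

```python
from statistics import NormalDist, fmean, pstdev

# Shear strength data (psi) from Question 6.
strength = [395, 379, 404, 370, 392, 365, 412, 418, 361, 378]

# Under a normal model, the mle of the mean is the sample mean and the
# mle of sigma uses divisor n (not n - 1), which is what pstdev computes.
mu_hat = fmean(strength)
sigma_hat = pstdev(strength)

# The 95th percentile is mu + z * sigma with z about 1.645; by the
# invariance principle its mle plugs in mu_hat and sigma_hat.
z_95 = NormalDist().inv_cdf(0.95)
percentile_95_hat = mu_hat + z_95 * sigma_hat
```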

7. Consider a random sample from the shifted exponential pdf

a. Obtain the maximum likelihood estimators of
b. A random sample of size n = 10 results in the values 3.12, .65, 2.56, 2.21, 5.45, 3.43, 10.40, 8.94, 17.83, and 1.31. Calculate the estimates of

Chapter 7 – Statistical Intervals Based on a Single Sample

COMPLETION

1. The formula used to construct a 95% confidence interval for the mean of a normal population when the value of the standard deviation is known is given by __________.

2. If the random sample is taken from a normal distribution with mean value and standard deviation , then regardless of the sample size n, the sample mean is distributed with expected value __________ and standard deviation __________.

3. The standard normal random variable has a mean value of __________ and standard deviation of __________.

4. If a confidence level of 90% is used to construct a confidence interval for the mean of a normal population when the value of the standard deviation is known, the z critical value is __________.

5. If you want to develop a 99% confidence interval for the mean of a normal population, when the standard deviation is known, the confidence level is __________.

6. If we think of the width of the confidence interval as specifying its precision or accuracy, then the confidence level (or reliability) of the interval is __________ related to its precision.

7. The ability of a confidence interval to contain the value of the population mean is described by the __________.

8. Let be a random sample from a population having a mean and standard deviation . Provided that n is large, the Central Limit Theorem (CLT) implies that is __________ distributed.

9. A random sample of 50 observations produced a mean value of 55 and standard deviation of 6.25. The 95% confidence interval for the population mean is between __________ and __________. (two decimal places)

10. The formula used to construct approximately confidence interval for a population proportion p when the sample size n is large enough is given by __________, where is the sample proportion, and

11. A large-sample lower confidence bound for the population mean __________.

12. When is the mean of a random sample of size n (n is small) from a normal population with mean , the random variable has a probability distribution called t-distribution with n-1 __________.

13. When is the mean of a random sample of size n (n is large) from a normal population with mean , the random variable has approximately a __________ distribution with mean value of __________ and standard deviation of __________.

14. Let denote the density function curve for a t-distribution with degrees of freedom. As __________, the spread of the corresponding curve decreases.

15. The z curve is often called the t curve with degrees of freedom equal to __________.

16. The area under a t-density curve between the critical values is __________.

17. Let be a random sample from a normal distribution with mean and variance . Then the random variable has a __________ probability distribution with __________ degrees of freedom.

18. The chi-squared critical value, , denotes the number on the measurement axis such that __________ of the area under the chi-squared curve with __________ degrees of freedom lies to the __________ of .

19. The 5th percentile of a chi-squared distribution with 10 degrees of freedom is equal to __________.

20. The 90th percentile of a chi-squared distribution with 15 degrees of freedom is equal to __________.

21. The area under a chi-squared curve with 10 degrees of freedom, which is captured between the two critical values is __________.

MULTIPLE CHOICE

1. Which of the following statements are true?
a. A confidence interval is always calculated by first selecting a confidence level, which is a measure of the degree of reliability of the interval.
b. A confidence level of 95% implies that 95% of all samples would give an interval that includes the parameter being estimated, and only 5% of all samples would yield an erroneous interval.
c. Information about the precision of an interval estimate is conveyed by the width of the interval.
d. The higher the confidence level, the more strongly we believe that the value of the parameter being estimated lies within the interval.
e. All of the above statements are true.

2. Which of the following statements are true?
a. The interval is random, while its width is not random.
b. The interval is not random, while its width is random.
c. The interval is random, while its width is not random.
d. The interval is not random, while its width is random.
e. None of the above statements are true.

3. Which of the following statements are not true?
a. A correct interpretation of a confidence interval for the mean relies on the long-run frequency interpretation of probability.
b. It is correct to write a statement such as
c. The probability is .95 that the random interval includes or covers the true value of .
d. The interval is a 90% confidence interval for the mean .
e. None of the above statements are true.

4. If one wants to develop a 90% confidence interval for the mean of a normal population, when the standard deviation is known, the confidence level is
a. .10
b. .45
c. .90
d. 1.645

5. Which of the following statements are true?
a. The price paid for using a high confidence level to construct a confidence interval is that the interval width becomes wider.
b. The only 100% confidence interval for the mean is .
c. If we wish to estimate the mean of a normal population when the value of the standard deviation is known, and be within an amount B with confidence, the formula for determining the necessary sample size n is .
d. All of the above statements are true.
e. None of the above statements are true.

6. A 99% confidence interval for the mean of a normal population when the standard deviation is known is found to be 98.6 to 118.4. If the confidence level is reduced to .95, the confidence interval for
a. becomes wider
b. becomes narrower
c. remains unchanged
d. None of the above answers are correct.

7. If the width of a confidence interval for is too wide when the population standard deviation is known, which one of the following is the best action to reduce the interval width?
a. Increase the confidence level
b. Reduce the population standard deviation
c. Increase the population mean
d. Increase the sample size n
e. None of the above answers are correct.

ANS: D
Section 7.2

PTS: 1

8. Which of the following statements are not true?
a. Provided that the sample size n is large, the standardized variable is approximately normally distributed, while the variable is not.
b. The formula is a large-sample confidence interval for with confidence level approximately .
c. Generally speaking, n > 40 will be sufficient to justify the use of the formula as a large-sample confidence interval for .
d. None of the above statements are true.
e. All of the above statements are true.

9. A random sample of 64 observations produced a mean value of 82 and standard deviation of 5.5. The 90% confidence interval for the population mean is between
a. 81.86 and 82.14
b. 80.65 and 83.35
c. 80.87 and 83.13
d. 81.31 and 82.69
e. None of the above answers are correct.

10. A random sample of 100 observations produced a sample proportion of .25. An approximate 90% confidence interval for the population proportion p is
a. .248 and .252
b. .179 and .321
c. .423 and .567
d. .246 and .254
e. None of the above answers are correct.

11. Which of the following expressions are true about a large-sample upper confidence bound for the population mean ?
a.
b.
c.
d.
e. None of the above statements are true.

12. Suppose that an investigator believes that virtually all values in the population are between 38 and 70. The appropriate sample size for estimating the true population mean within 2 units with 95% confidence level is approximately
a. 61
b. 62
c. 15
d. 16
e. None of the above answers are correct.

13. A 99% confidence interval for the population mean is determined to be (65.32 to 73.54). If the confidence level is reduced to 90%, the 90% confidence interval for
a. becomes wider
b. becomes narrower
c. remains unchanged
d. None of the above answers are correct.

14. In developing a confidence interval for the population mean , a sample of 50 observations was used, and the confidence interval was 15.24 ± 1.20. Had the sample size been 200 instead of 50, the confidence interval would have been
a. 7.62 ± 1.20
b. 15.24 ± .30
c. 15.24 ± .60
d. 3.81 ± 1.20
e. None of the above answers are correct.

15. Which of the following statements are not true in developing a confidence interval for the population mean
a. The width of the confidence interval becomes narrower when the sample mean increases.
b. The width of the confidence interval becomes wider when the sample mean increases.
c. The width of the confidence interval becomes narrower when the sample size n increases.
d. All of the above statements are true.
e. None of the above statements are true.

16. Which of the following statements are true when is the mean of a random sample of size n from a normal distribution with mean ?
a. The random variable has approximately a standard normal distribution for large n.
b. The random variable has a t-distribution with n-1 degrees of freedom for small n.
c. The normal distribution is governed by two parameters, the mean and the standard deviation .
d. A t-distribution is governed by only one parameter, called the number of degrees of freedom.
e. All of the above answers are true.

17. Which of the following statements are not true if denotes the density function curve for a t-distribution with degrees of freedom?
a. The t-distribution is governed by only.
b. Each curve is bell-shaped and centered around 0.
c. Each curve is less spread out than the standard normal z curve.
d. As increases, the spread of the corresponding curve decreases.
e. None of the above answers are true.

18. Which of the following statements are not true?
a. The notation is often used to denote the number on the measurement axis for which the area under the t-curve with degrees of freedom to the left of is , where is called a t critical value.
b. The number of degrees of freedom for a t variable is the number of freely determined deviations on which the estimated standard deviation in the denominator of is based.
c. A larger value of degrees of freedom implies a t-distribution with smaller spread.
d. All of the above statements are true.
e. None of the above statements are true.

19. A random sample of size 16 is taken from a normal population with mean . If the sample mean is 75 and the sample standard deviation is 5, then a 95% upper confidence bound for is
a. 77.664
b. 77.191
c. 72.809
d. 72.336
e. None of the above answers are correct.

20. A random sample of 10 observations was selected from a normal population distribution. The sample mean and sample standard deviation were 20 and 3.2, respectively. A 95% prediction interval for a single observation selected from the same population is
a. 20 ± 6.152
b. 20 ± 4.244
c. 20 ± 7.962
d. 20 ± 7.592
e. None of the above answers are correct.

21. Which of the following statements are false about the chi-squared distribution with degrees of freedom?
a. It is a discrete probability distribution with a single parameter .
b. It is positively skewed (long upper tail)
c. It becomes more symmetric as increases.
d. All of the above statements are true.
e. All of the above statements are false.

22. Which of the following statements are true about the percentiles of a chi-squared distribution with 20 degrees of freedom?
a. The 5th percentile is 31.410
b. The 95th percentile is 10.851
c. The 10th percentile is 12.443
d. The 90th percentile is 37.566
e. All of the above statements are true.

23. The lower limit of a 95% confidence interval for the variance of a normal population using a sample of size n and variance value is given by:
a.
b.
c.
d.
e. None of the above answers are correct.

24. The upper limit of a 95% confidence interval for the variance of a normal population using a sample of size n and variance value is given by:
a.
b.
c.
d.
e. None of the above answers are correct.

ESSAY

1. Suppose that a random sample of 50 bottles of a particular brand of cough syrup is selected, and the alcohol content of each bottle is determined. Let denote the average alcohol content for the population of all bottles of the brand under study. Suppose that the resulting 95% confidence interval is (8.0, 9.6).

a. Would a 90% confidence interval calculated from this same sample have been narrower or wider than the given interval? Explain your reasoning.
b. Consider the following statement: There is a 95% chance that is between 8 and 9.6. Is this statement correct? Why or why not?
c. Consider the following statement: We can be highly confident that 95% of all bottles of this type of cough syrup have an alcohol content that is between 8.0 and 9.6. Is this statement correct? Why or why not?
d. Consider the following statement: If the process of selecting a sample of size 50 and then computing the corresponding 95% interval is repeated 100 times, 95 of the resulting intervals will include . Is this statement correct? Why or why not?

2. A CI is desired for the true average stray-load loss (watts) for a certain type of induction motor when the line current is held at 10 amps for a speed of 1500 rpm. Assume that stray-load loss is normally distributed with = 3.0.

a. Compute a 95% CI for when n = 25 and = 60.
b. Compute a 95% CI for when n = 100 and = 60.
c. Compute a 99% CI for when n = 100 and = 60.
d. Compute an 82% CI for when n = 100 and = 60.
e. How large must n be if the width of the 99% interval for is to be 1.0?
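
The intervals in Question 2 all use the known-sigma formula, so one small function covers parts (a) through (d). The sketch below assumes the missing symbols are a population standard deviation of 3.0 and a sample mean of 60; the function names z_interval and sample_size_for_width are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, level):
    """Two-sided CI for a normal mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def sample_size_for_width(sigma, width, level):
    """Smallest n whose interval width 2 * z * sigma / sqrt(n) is at most `width`."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((2 * z * sigma / width) ** 2)

sigma, xbar = 3.0, 60   # assumed reading of the missing symbols in Question 2

ci_a = z_interval(xbar, sigma, 25, 0.95)
ci_b = z_interval(xbar, sigma, 100, 0.95)
ci_c = z_interval(xbar, sigma, 100, 0.99)
ci_d = z_interval(xbar, sigma, 100, 0.82)
n_e = sample_size_for_width(sigma, 1.0, 0.99)
```

Comparing ci_a with ci_b shows the effect of sample size at a fixed level, and comparing ci_b with ci_c shows the effect of the confidence level at a fixed sample size.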

3. By how much must the sample size n be increased if the width of the CI is to be halved? If the sample size is increased by a factor of 25, what effect will this have on the width of the interval? Justify your assertions.
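
The dependence of width on sample size can be written out explicitly. This is a sketch that assumes the interval in question is the known-σ interval x̄ ± z·σ/√n, with w(n) denoting its width:

```latex
w(n) = 2\, z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}, \qquad
\frac{w(4n)}{w(n)} = \frac{1}{\sqrt{4}} = \frac{1}{2}, \qquad
\frac{w(25n)}{w(n)} = \frac{1}{\sqrt{25}} = \frac{1}{5}.
```

Under that assumption, halving the width requires four times the sample size, and a 25-fold increase in n shrinks the width by a factor of 5.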

4. Consider the 1000 95% confidence intervals (CI) for μ that a statistical consultant will obtain for various clients. Suppose the data sets on which the intervals are based are selected independently of one another. How many of these 1000 intervals do you expect to capture the corresponding value of μ? What is the probability that between 950 and 970 of these intervals contain the corresponding value of μ? (Hint: Let Y = the number among the 1000 intervals that contain μ. What kind of random variable is Y?)
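
Following the hint, Y can be treated as a binomial count of intervals that cover their target. The sketch below assumes "between 950 and 970" is meant inclusively, and compares the exact binomial sum with a normal approximation that uses a continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 1000, 0.95  # 1000 independent intervals, each covering with probability .95

mean = n * p                 # expected number of covering intervals
sd = sqrt(n * p * (1 - p))   # binomial standard deviation

# Exact binomial probability P(950 <= Y <= 970), endpoints included (an assumption)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(950, 971))

# Normal approximation with continuity correction
approx = NormalDist(mean, sd).cdf(970.5) - NormalDist(mean, sd).cdf(949.5)

print(f"E(Y) = {mean:.0f}, exact = {exact:.4f}, normal approx = {approx:.4f}")
```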

5. A random sample of 100 lightning flashes in a certain region resulted in a sample average radar echo duration of .81 sec and a sample standard deviation of .34 sec. Calculate a 99% (two-sided) confidence interval for the true average echo duration μ, and interpret the resulting interval.

6. Determine the confidence level for each of the following large-sample one-sided confidence bounds:

a. Upper bound:
b. Lower bound:
c. Upper bound:

7. It was reported that, in a sample of 507 adult Americans, only 142 correctly described the Bill of Rights as the first ten amendments to the U.S. Constitution. Calculate a (two-sided) confidence interval using a 99% confidence level for the proportion of all U.S. adults that could give a correct description of the Bill of Rights.
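
One way to carry out this calculation is the large-sample score (Wilson) interval, sketched below. Whether the score interval or the simpler p̂ ± z·√(p̂q̂/n) form is intended is an assumption here, and the function name `score_interval` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def score_interval(x, n, level):
    """Large-sample score (Wilson) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_hat = x / n
    center = p_hat + z**2 / (2 * n)
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    denom = 1 + z**2 / n
    return (center - half) / denom, (center + half) / denom

lo, hi = score_interval(142, 507, 0.99)
print(f"99% interval for p: ({lo:.3f}, {hi:.3f})")
```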

8. The superintendent of a large school district, having once had a course in probability and statistics, believes that the number of teachers absent on any given day has a Poisson distribution with parameter λ. Use the accompanying data on absences for 50 days to derive a large-sample CI for λ. [Hint: The mean and variance of a Poisson variable both equal λ, so Z = (X̄ – λ)/√(λ/n) has approximately a standard normal distribution. Now proceed as in the derivation of the interval for p by making a probability statement (with probability 1 – α) and solving the resulting inequalities for λ.]

# Absences 0 1 2 3 4 5 6 7 8 9 10
Frequency 1 6 8 10 8 7 5 3 2 1 1

9. Determine the t critical value for a two-sided confidence interval in each of the following situations.

a. Confidence level = 95%, df = 12
b. Confidence level = 95%, df = 15
c. Confidence level = 99%, df = 20
d. Confidence level = 99%, n = 8
e. Confidence level = 98%, df = 25
f. Confidence level = 99%, n = 40

compute a 95% CI for true average stress.

11. A sample of 14 joint specimens of a particular type gave a sample mean proportional limit stress of 8.50 MPa and a sample standard deviation of .80 MPa.

a. Calculate and interpret a 95% lower confidence bound for the true average proportional limit stress of all such joints. What, if any, assumptions did you make about the distribution of proportional limit stress?
b. Calculate and interpret a 95% lower prediction bound for the proportional limit stress of a single joint of this type.

12. A study of the ability of individuals to walk in a straight line reported the accompanying data on cadence (strides per second) for a sample of n = 20 randomly selected healthy men:

.95 .81 .93 .95 .93 .86 1.05 .92 .85 .81
.92 .96 .92 1.00 .78 1.06 1.06 .96 .85 .92

A normal probability plot gives substantial support to the assumption that the population distribution of cadence is approximately normal. A descriptive summary of the data from MINITAB follows.

Variable N Mean Median StDev SEMean
Cadence 20 0.9255 0.9300 0.0809 0.0181

a. Calculate and interpret a 95% confidence interval for the population mean cadence.
b. Calculate and interpret a 95% prediction interval for the cadence of a single individual randomly selected from this population.
c. Calculate an interval that includes at least 99% of the cadences in the population distribution using a confidence level of 95%.

13. A more extensive tabulation of t critical values than what appears in your text shows that for the t distribution with 20 df, the areas to the right of the values .687, .860, and 1.064 are .25, .20, and .15, respectively. What is the confidence level for each of the following three confidence intervals for the mean of a normal population distribution? Which of the three intervals would you recommend be used, and why?

a.
b.
c.

14. Determine the values of the following quantities:
a.
b.
c.
d.
e.
f.

15. Determine the following:

a. The 90th percentile of the chi-squared distribution with ν = 12.
b. The 10th percentile of the chi-squared distribution with ν = 12.
c. where is a chi-squared rv with ν = 22.
d. where is a chi-squared rv with ν = 25.

16. The amount of lateral expansion (mils) was determined for a sample of n = 9 pulsed-power gas metal arc welds used in LNG ship containment tanks. The resulting sample standard deviation was s = 2.80 mils. Assuming normality, derive a 95% CI for σ² and for σ.
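
For reference, the standard chi-squared interval for a normal variance, and the corresponding interval for the standard deviation, can be written as follows. The notation is an assumption: the critical value with subscripts α and ν is taken to cut off upper-tail area α under the chi-squared curve with ν degrees of freedom.

```latex
\left( \frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\;
       \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}} \right)
\ \text{for } \sigma^2,
\qquad
\left( \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}},\;
       \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}} \right)
\ \text{for } \sigma .
```

In this exercise n – 1 = 8, s² = (2.80)² = 7.84, and α = .05, so the two critical values come from the chi-squared table with 8 degrees of freedom.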

17. The results of a Wagner turbidity test performed on 15 samples of standard Ottawa testing sand were (in microamperes)

26.9 25.8 24.4 24.1 26.4 25.9 24.0 21.7
24.9 25.9 27.3 26.7 26.9 24.8 27.3

a. Is it plausible that this sample was selected from a normal population distribution?
b. Calculate an upper confidence bound with confidence level 90% for the population standard deviation of turbidity.

Chapter 8 – Tests of Hypotheses Based on a Single Sample

COMPLETION

1. In many situations, the alternative hypothesis is referred to as the __________ hypothesis, since it is the statement the researcher would really like to validate.

2. The __________ hypothesis should be identified with the hypothesis of no change, no difference, no improvement, and so on.

3. An engineer has suggested a change in the production process in the belief that it will result in a reduced defective rate. Let p denote the true proportion of defective items resulting from the changed process, and suppose that 5% of items produced by the manufacturer during a certain period were defective. Then the research hypothesis is the assertion that __________.

4. In our treatment of hypothesis testing, the __________ hypothesis will always be stated as an equality claim.

5. The null hypothesis will be rejected if and only if the observed or computed __________ value falls in the __________.

6. A __________ error involves not rejecting the null hypothesis when it is false.

7. A __________ error consists of rejecting the null hypothesis when it is true.

8. The probabilities of type I and type II errors are traditionally denoted by the Greek letters __________ and __________, respectively.

9. The rejection region is called __________ if it consists only of large values of the test statistic.

10. The rejection region is called __________ if it consists only of small values of the test statistic.

11. A __________ error is usually more serious than a __________ error.

12. The value that represents the probability of type I error is often referred to as the __________ of the test.

13. If the null hypothesis is then the test statistic value is z = __________.

14. For any significance level the two-tailed rejection region has type I error probability equal to __________.

15. Suppose a test procedure about the population mean is performed, when the population is normal with known standard deviation then if the alternative hypothesis is the rejection region for a level test is __________.

16. Suppose a test procedure about the population mean is performed, when the population is normal with known standard deviation then if the alternative hypothesis is the rejection region for a level test is __________.

17. Suppose a test procedure about the population mean is performed, when the population is normal with known standard deviation then if the alternative hypothesis is the rejection region for a level test is either __________ or __________.

18. If is a random sample from a normal distribution, and the sample size n is small, then the standardized variable has a __________ distribution with __________ degrees of freedom.

19. Suppose a test procedure about the population mean is performed, when the population is normal and the sample size n is small, then if the alternative hypothesis is the rejection region for a level test is __________.

20. Suppose a test procedure about the population mean is performed, when the population is normal and the sample size n is small, then if the alternative hypothesis is the rejection region for a level test is __________.

21. Suppose a test procedure about the population mean is performed, when the population is normal and the sample size n is small, then if the alternative hypothesis is the rejection region for a level test is either __________ or __________.

22. Let p denote the proportion of individuals in a population who possess a specified property, and X denote the number of individuals in the sample who possess the same property. Provided that the sample size n is small relative to the population size, then X has approximately a __________ distribution.

23. Let p denote the proportion of individuals in a population who possess a specified property, and X denote the number of individuals in the sample who possess the same property. Provided that the sample size n is large, then both X and the estimator are approximately __________ distributed.

24. Let p denote the proportion of individuals in a population who possess a specified property, and X denote the number of individuals in the sample who possess the same property. The estimator is __________ if and its standard deviation = __________.

25. Suppose that a test procedure about the population proportion p is performed, and that the sample proportion is approximately normally distributed. If the alternative hypothesis is , then the rejection region for a level test is __________.

26. Suppose that a test procedure about the population proportion p is performed, and that the sample proportion is approximately normally distributed. If the alternative hypothesis is then the rejection region for a level test is __________.

27. Suppose that a test procedure about the population proportion p is performed, and that the sample proportion is approximately normally distributed. If the alternative hypothesis is , then the rejection region for a level test is either __________ or __________.

28. As the level of significance decreases, the critical value __________.

29. For an upper-tailed z test, the level of significance is just the area under the z curve to the __________ of the critical value __________.

30. The smallest level of significance for which the null hypothesis would be rejected is the tail area captured by the computed value of the test statistic. This smallest is referred to as the __________.

31. If the P-value is smaller than or equal to the level of significance then the researcher should __________ at level

32. It is customary to call the data significant when the null hypothesis is __________, and not significant otherwise.

33. If the P-value is larger than the level of significance then the researcher should __________ at level

34. Suppose that when data from an experiment was analyzed, the P-value for testing was calculated as .047. At the .01 level, would __________.

35. If the calculated test statistic for an upper-tailed z test is 2.15, then the P-value is __________.

36. If the calculated test statistic for a two-tailed z test is -1.84, then the P-value is __________.

37. Suppose that a t test of is based on 10 degrees of freedom. If the calculated value of the test statistic is t = 2.4, then the P-value for this test is __________.

38. Suppose that a t test of is based on 18 degrees of freedom. If the calculated value of the test statistic is t = -1.80, then the P-value for this test is __________.

39. Suppose that a z test of is conducted. Intuition then suggests rejecting when the value of test statistic z is __________.

40. Suppose that a z test of is conducted. The critical value is determined by specifying __________ and using the fact that has a __________ distribution when is true.

41. One must be especially careful in interpreting evidence when the sample size is __________, since any small departure from will almost surely be detected by a test, yet such a departure may have little practical significance.

42. The likelihood ratio test procedure consists of rejecting the null hypothesis when the likelihood ratio statistic value is __________.

MULTIPLE CHOICE

1. Which of the following statements are not true?
a. A statistical hypothesis is a claim or assertion either about the value of a single parameter, about the values of several parameters, or about the form of an entire probability distribution.
b. In any hypothesis-testing problem, there are two contradictory hypotheses under consideration.
c. A test of hypothesis is a method for using sample data to decide whether the null hypothesis should be rejected.
d. A type I error consists of not rejecting the null hypothesis when it is false.
e. None of the above statements are true.

2. Which of the following statements are true?
a. The null hypothesis, denoted by , is the claim that is initially assumed to be true (the “prior belief” claim).
b. The alternative hypothesis, denoted by , is the assertion that is contradictory to the null hypothesis .
c. The null hypothesis will be rejected in favor of the alternative hypothesis only if sample evidence suggests that is false.
d. If sample evidence does not strongly contradict the null hypothesis , we will continue to believe in the truth of .
e. All of the above statements are true.

3. Which of the following statements are not correctly stated?
a. The two possible conclusions from a hypothesis-testing analysis are rejecting the null hypothesis or accepting .
b. In many situations, the alternative hypothesis is referred to as the “research hypothesis” since it is the claim that the researcher would really like to validate.
c. In our treatment of hypothesis testing, the null hypothesis will always be stated as an equality claim.
d. A test statistic is a rule, based on sample data, for deciding whether to reject the null hypothesis.
e. All of the above statements are correctly stated.

4. Which of the following statements are not correct?
a. It is possible that the null hypothesis may be rejected when it is true.
b. It is impossible that the null hypothesis may be rejected when it is true.
c. It is possible that the null hypothesis may not be rejected when it is false.
d. All of the above statements are correct.
e. None of the above statements are correct.

5. If denotes the parameter of interest, and the simplified null hypothesis has the form is a specified number called the “null value” of the parameter, then the alternative hypothesis will be
a. (so the implicit null hypothesis is
b. (so the implicit null hypothesis is
c.
d. The alternative hypothesis will look like any of the above three assertions.
e. The alternative hypothesis must be the assertion specified in (C).

6. Which of the following statements are not true?
a. A test statistic is a function of the sample data on which the decision to reject or not reject the null hypothesis is to be based.
b. A rejection region consists of the set of all test statistic values for which the null hypothesis will be rejected.
c. A rejection region consists of the set of all test statistic values for which the alternative hypothesis will be rejected.
d. A good hypothesis-testing procedure is one for which the probability of making either type I or type II error is small.
e. None of the above statements are true.

7. In hypothesis-testing analysis, a type I error occurs only if
a. the null hypothesis is rejected when it is true
b. the null hypothesis is rejected when it is false
c. the null hypothesis is not rejected when it is false
d. the null hypothesis is not rejected when it is true

8. In hypothesis-testing analysis, a type II error occurs only if
a. the null hypothesis is rejected when it is true.
b. the null hypothesis is rejected when it is false.
c. the null hypothesis is not rejected when it is false.
d. the null hypothesis is not rejected when it is true.

9. Which of the following statements are not true?
a. A rejection region is called upper-tailed if it consists only of large values of the test statistic.
b. A rejection region is called upper-tailed if it consists only of small values of the test statistic.
c. A rejection region is called lower-tailed if it consists only of small values of the test statistic.
d. All of the above statements are true.
e. None of the above statements are true.

10. Which of the following statements are true?
a. The probability of type I error, is computed using the probability distribution of the test statistic when the null hypothesis is true.
b. The probability of type II error, requires knowing the distribution of the test statistic when the null hypothesis is false.
c. The probability of type I error, is computed by summing over probabilities of test statistic values in the rejection region.
d. The probability of type II error, is computed by summing over probabilities of test statistic values in the complement of the rejection region.
e. All of the above statements are true.

11. Which of the following statements are not generally true?
a. A type I error is usually more serious than a type II error.
b. A type II error is usually more serious than a type I error.
c. A test with significance level is one for which the type I error probability is controlled at the specified level.
d. When an experiment and a sample size are fixed, then decreasing the size of the rejection region to obtain a smaller value of (probability of type I error) results in a larger value of (probability of type II error) for any particular parameter value consistent with the alternative hypothesis .
e. None of the above statements are true.

12. Which of the following statements are not true?
a. If the null hypothesis is , then the test statistic value is t = -2.5.
b. If the alternative hypothesis has the form , then an value less than certainly does not provide support for .
c. If the alternative hypothesis has the form , then an value that exceeds by only a small amount (corresponding to z which is positive but small) does not suggest that should be rejected in favor of .
d. All of the above statements are true.
e. None of the above statements are true.

13. Which of the following statements are true?
a. When the alternative hypothesis is , the null hypothesis should be rejected if is too far to the left of .
b. When the alternative hypothesis is , the null hypothesis should be rejected if is too far to the right of .
c. When the alternative hypothesis is , the null hypothesis should be rejected if is too far to either side of .
d. All of the above statements are true.
e. None of the above statements are true.

14. Which of the following statements are not true for any significance level?
a. The one-tailed rejection region has type I error probability
b. The one-tailed rejection region has type I error probability
c. The two-tailed rejection region has type I error probability α, since area α/2 is captured under each of the two tails of the z curve.
d. All of the above statements are true.
e. None of the above statements are true.

15. Which of the following statements are not true if a test procedure about the population mean is performed when the population is normal with known standard deviation ?
a. The rejection region for level test is if the test is an upper-tailed test.
b. The rejection region for level test is if the test is an upper-tailed test.
c. The rejection region for level test is if the test is a lower-tailed test.
d. The rejection region for level test is either or if the test is a two-tailed test.
e. None of the above statements are true.

16. Which of the following statements are true in testing based on a sample of size 15 from a normal population with unknown standard deviation ?
a. The test procedure requires the use of standard normal distribution.
b. The test procedure requires the use of binomial distribution.
c. The test procedure requires the use of exponential distribution.
d. The test procedure requires the use of t-distribution with 15 degrees of freedom.
e. None of the above statements are true.

17. Which of the following statements are true?
a. Knowledge of the test statistic’s distribution when the null hypothesis is true allows us to construct a rejection region for which the type I error probability is controlled at the desired level.
b. The rejection region for the t test differs from that for the z test only in that a t critical value replaces the z critical value.
c. If (n is small) is a random sample from a normal distribution, the standardized variable has a t distribution with n-1 degrees of freedom.
d. All of the above statements are true.
e. None of the above statements are true.

18. Suppose that a two-tailed test procedure about the population mean is performed when the population is normal, but the sample size n is small. The null hypothesis will be rejected at significance level if the value of the standardized test statistic t is such that
a.
b.
c. either
d.
e. None of the above inequalities are correct.

19. Let p denote the proportion of individuals in a population who possess a specified property, and let X denote the number of individuals in the sample who possess the same property. Provided that the sample size n is small relative to the population size, X has approximately
a. a normal distribution
b. a binomial distribution
c. an exponential distribution
d. a Poisson distribution

20. Let p denote the proportion of individuals in a population who possess a specified property, and let X denote the number of individuals in the sample who possess the same property. Provided that the sample size n is large, the estimator has approximately
a. a normal distribution
b. a binomial distribution
c. an exponential distribution
d. a Poisson distribution

21. Which of the following statements are not true?
a. If the alternative hypothesis has the form , then a certainly does not provide support for .
b. If the alternative hypothesis has the form , then a by only a small amount (corresponding to z value which is positive but small) does not suggest that should be rejected in favor of .
c. If the null hypothesis is and n = 100, then the test statistic value is z = 4
d. All of the above statements are true.
e. None of the above statements are true.

22. Which of the following statements are true?
a. When the alternative hypothesis is , the null hypothesis should be rejected if is too far to the left of .
b. When the alternative hypothesis is , the null hypothesis should be rejected if is too far to the right of .
c. When the alternative hypothesis is , the null hypothesis should be rejected if is too far to either side of .
d. All of the above statements are true.
e. None of the above statements are true.

23. Which of the following statements are not true if a test procedure about the population proportion p is performed provided that
a. The rejection region for level test is if the test is an upper-tailed test.
b. The rejection region for level test is if the test is a lower-tailed test.
c. The rejection region for level test is either if the test is a two-tailed test.
d. The rejection region for level test is if the test statistic is an upper-tailed test.
e. None of the above statements are true.

24. Which of the following statements are not true?
a. A P-value conveys much information about the strength of evidence against the null hypothesis and allows an individual decision maker to draw a conclusion at any specified significance level
b. The P-value (or observed significance level) is the largest level of significance at which the null hypothesis would be rejected when a specified test procedure is used on a given data set.
c. If P-value
d. If P-value
e. All of the above statements are true.

25. Which of the following statements are true?
a. It is customary to call the data significant when is rejected.
b. It is customary to call the data not significant when is not rejected.
c. The calculation of the P-value depends on whether the test is upper-tailed, lower-tailed, or two-tailed.
d. The P-value for a z test (one based on a test statistic whose distribution when is true is at least approximately standard normal) is easily determined from the information in the standard normal probability table.
e. All of the above are true statements.

26. Suppose that when data from an experiment was analyzed, the P-value for testing was calculated as .0244. Which of the following statements are true?
a. is rejected at .10 level.
b. is not rejected at .05 level.
c. is not rejected at .025 level.
d. is rejected at any level
e. All of the above statements are true.

27. Which of the following statements are true if the value of the test statistic for a two-tailed z test is z = -1.56?
a. P-value = .4406
b. P-value = .0594
c. P-value = .1188
d. .0594 < P-value < .1188
e. P-value = .0406

28. Which of the following statements are true about the P-value, where z is the calculated value of the test statistic and is the corresponding cumulative area under the standard normal curve?
a. P-value = for an upper-tailed test.
b. P-value = for a lower-tailed test.
c. P-value = for a lower-tailed test
d. All of the above statements are true.
e. None of the above statements are true.
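
The three standard P-value rules for a z test (upper-tailed, lower-tailed, and two-tailed) can be sketched in a few lines. The function name `z_p_value` and its `tail` argument are invented for this illustration; the sample calls reuse test statistic values that appear in nearby questions.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """P-value for a z test; tail is 'upper', 'lower', or 'two'."""
    phi = NormalDist().cdf  # standard normal cumulative area
    if tail == "upper":
        return 1 - phi(z)
    if tail == "lower":
        return phi(z)
    if tail == "two":
        return 2 * (1 - phi(abs(z)))
    raise ValueError("tail must be 'upper', 'lower', or 'two'")

print(round(z_p_value(2.15, "upper"), 4))
print(round(z_p_value(-1.84, "two"), 4))
print(round(z_p_value(-1.56, "two"), 4))
```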

29. Suppose that a t test of is based on 12 degrees of freedom. If the calculated value of the test statistic is 2.8, then the P-value is
a. .008
b. .992
c. .016
d. .492
e. .496

30. Which of the following P-values will lead us to reject the null hypothesis at the .05 level?
a. .10
b. .025
c. .075
d. .15
e. Any P-value greater than .05

31. Which of the following statements are necessary to construct an appropriate test procedure?
a. Specify a test statistic.
b. Decide on the general form of the rejection region.
c. Select the specified numerical critical value or values that will separate the rejection region from the acceptance region.
d. All of the above statements are necessary.
e. Only A and B are necessary statements.

32. Which of the following statements are not true?
a. The test statistic has a standard normal distribution when H0 is true.
b. The reliability of hypothesis testing procedure in reaching a correct decision can be assessed by studying type I error probability.
c. The process of reaching a decision by using the methodology of classical hypothesis testing involves selecting a level of significance and then rejecting or not rejecting the null hypothesis at that level
d. All of the above statements are true.
e. None of the above statements are true.

33. Which of the following statements are true?
a. When the results of an experiment are to be communicated to a large audience, rejection of at level will be much more convincing if the observed value of the test statistic greatly exceeds the critical value than if it barely exceeds that value.
b. A large P-value would indicate statistical significance in that it would strongly suggest rejection of
c. In many experimental situations, only departures from of small magnitude would be worthy of detection, whereas a large departure from would have little practical significance.
d. All of the above statements are true.
e. None of the above statements are true.

34. Which of the following statements are needed in constructing the likelihood ratio test?
a. Finding the largest value of the likelihood for any (by finding the maximum likelihood estimate within and substituting back into the likelihood function).
b. Find the largest value of the likelihood for any
c. Forming the ratio = maximum likelihood for maximum likelihood for
d. All of the above statements are needed.
e. Only statements A and B are needed.

ESSAY

1. For the following pairs of assertions, indicate which do not comply with our rules for setting up hypotheses and why (the subscripts 1 and 2 differentiate between quantities for two different populations or samples).

a.
b.
c.
d.
e.
f.
g.
h.

2. Let denote the true average radioactivity level (picocuries per liter). The value 5 pCi/L is considered the dividing line between safe and unsafe water. Would you recommend testing versus Explain your reasoning. (Hint: Think about the consequences of a type I and type II error for each possibility.)

3. Before agreeing to purchase a large order of polyethylene sheaths for a particular type of high-pressure oil-filled submarine power cable, a company wants to see conclusive evidence that the true standard deviation of sheath thickness is less than .05 mm. What hypotheses should be tested, and why? In this context, what are the type I and type II errors?

4. Two different companies have applied to provide cable television service in a certain region. Let p denote the proportion of all potential subscribers who favor the first company over the second. Consider testing based on a random sample of 25 individuals. Let X denote the number in the sample who favor the first company and x represent the observed value of X.

a. Which of the following rejection regions is most appropriate and why?
or
b. In the context of this problem situation, describe what type I and type II errors are.
c. What is the probability distribution of the test statistic X when H0 is true? Use it to compute the probability of a type I error.
d. Compute the probability of a type II error for the selected region when p = .6 and p = .7.
e. Using the selected region, what would you conclude if 6 of the 25 queried favored company 1?

5. Let the test statistic Z have a standard normal distribution when is true. Give the significance level for each of the following situations.

a.
b.
c.

6. Let the test statistic T have a t distribution when is true. Give the significance level for each of the following situations:

a.
b.
c.

7. Light bulbs of a certain type are advertised as having an average lifetime of 800 hours. The price of these bulbs is very favorable, so a potential customer has decided to go ahead with a purchase arrangement unless it can be conclusively demonstrated that the true average lifetime is smaller than what is advertised. A random sample of 50 bulbs was selected, the lifetime of each bulb determined, and the appropriate hypotheses were tested using MINITAB, resulting in the accompanying output.

Variable n Mean St. Dev SE of Mean Z P-Value
Lifetime 50 738.44 38.20 5.40 -2.14 0.016

What conclusion would be appropriate for a significance level of .05? A significance level of .01?

8. The desired percentage of Si in a certain type of aluminous cement is 5.5. To test whether the true average percentage is 5.5 for a particular production facility using a significance level of .01, 16 independently obtained samples are analyzed. Suppose that the percentage of Si in a sample is normally distributed with and that

a. Does this indicate conclusively that the true average percentage differs from 5.5?
b. If the true average percentage is and a level based on n = 16 is used, what is the probability of detecting this departure from
c. What value of n is required to satisfy and

9. A sample of 12 radon detectors of a certain type was selected, and each was exposed to 100 pCi/L of radon. The resulting readings were as follows:

104.3 89.6 89.9 95.6 95.2 90.0
98.8 103.7 98.3 106.4 102.0 91.1

a. Do these data suggest that the population mean reading under these conditions differs from 100? State and test the appropriate hypotheses using α = .05.
b. Suppose that prior to the experiment, a value of σ = 7.5 had been assumed. How many determinations would then have been appropriate to obtain for the alternative

10. State DMV records indicate that of all vehicles undergoing emissions testing during the previous year, 70% passed on the first try. A random sample of 200 cars tested in a particular county during the current year yields 160 that passed on the initial test. Does this suggest that the true proportion for this county during the current year differs from the previous statewide proportion? Test the relevant hypotheses using
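
The large-sample test for a proportion in this exercise can be sketched as follows. The significance level is missing from the question as printed, so the code reports only the test statistic and a two-sided P-value; `one_proportion_z_test` is a name made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(x, n, p0):
    """Return (z, two-sided P-value) for H0: p = p0 using the large-sample z test."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses the null value p0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_proportion_z_test(160, 200, 0.70)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

The P-value can then be compared with whatever significance level the question intends.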

11. A university library ordinarily has a complete shelf inventory done once every year. Because of new shelving rules instituted the previous year, the head librarian believes it may be possible to save money by postponing the inventory. The librarian decides to select at random 1000 books from the library’s collection and have them searched in a preliminary manner. If evidence indicates strongly that the true proportion of misshelved or unlocatable books is less than .02, then the inventory will be postponed.

a. Among the 1000 books searched, 15 were misshelved or unlocatable. Test the relevant hypotheses and advise the librarian what to do (use ).
b. If the true proportion of misshelved and lost books is actually .01, what is the probability that the inventory will be (unnecessarily) taken?
c. If the true proportion is .05, what is the probability that the inventory will be postponed?

12. A plan for an executive traveler’s club has been developed by an airline on the premise that 5% of its current customers would qualify for membership. A random sample of 500 customers yielded 40 who would qualify.

a. Using this data, test at level .01 the null hypothesis that the company’s premise is correct against the alternative that it is not correct.
b. What is the probability that when the test of part (a) is used, the company’s premise will be judged correct when in fact 10% of all current customers qualify?
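For part (b), the type II error probability of a two-tailed large-sample proportion test can be approximated with the normal distribution. The function below is a sketch under that approximation; `beta_two_tailed` and its parameter names are hypothetical helpers, not part of any library.

```python
import math
from statistics import NormalDist


def beta_two_tailed(p0, p_alt, n, alpha):
    """Approximate P(fail to reject H0: p = p0) when the true proportion is p_alt."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    se0 = math.sqrt(p0 * (1 - p0) / n)            # standard error under H0
    se_alt = math.sqrt(p_alt * (1 - p_alt) / n)   # standard error under the alternative
    upper = (p0 - p_alt + z_crit * se0) / se_alt
    lower = (p0 - p_alt - z_crit * se0) / se_alt
    return std.cdf(upper) - std.cdf(lower)


# Premise p0 = .05, true value .10, n = 500 customers, level .01.
beta = beta_two_tailed(p0=0.05, p_alt=0.10, n=500, alpha=0.01)
print(f"beta(.10) is approximately {beta:.4f}")
```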

13. Each of a group of 20 intermediate tennis players is given two tennis rackets, one with nylon strings and the other with gut strings. After playing with the two rackets, each player will be asked to state a preference for one of the two types of strings. Let p denote the proportion of all such players who would prefer gut to nylon, and let X be the number of players in the sample who prefer gut. Because gut strings are more expensive, consider the null hypothesis that at most 50% of all such players prefer gut. We simplify this to H0: p = .5, planning to reject H0 only if sample evidence strongly favors gut strings.

a. Which of the rejection regions {15, 16, 17, 18, 19, 20}, {0, 1, 2, 3, 4, 5}, or {0, 1, 2, 3, 17, 18, 19, 20} is most appropriate, and why are the other two not appropriate?
b. What is the probability of a type I error for the chosen region of part (a)? Does the region specify a level .05 test? Is it the best level .05 test?
c. If 60% of all enthusiasts prefer gut, calculate the probability of a type II error using the appropriate region from part (a). Repeat if 80% of all enthusiasts prefer gut.
d. If 13 out of the 20 players prefer gut, should H0 be rejected using a significance level of .10?
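Error probabilities for a rejection region of a binomial test can be computed exactly. The sketch below assumes the upper-tailed region {15, ..., 20} is the one chosen in part (a) and that X is binomial with n = 20; `binom_pmf` is an illustrative helper.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 20
upper_region = range(15, 21)   # assumed rejection region {15, ..., 20}

# Type I error: probability of landing in the region when p = .5.
type_i = sum(binom_pmf(k, n, 0.5) for k in upper_region)

# Type II error: probability of missing the region when p is .6 or .8.
type_ii = {p: 1 - sum(binom_pmf(k, n, p) for k in upper_region) for p in (0.6, 0.8)}

print(f"P(type I error) = {type_i:.4f}")
for p, beta in type_ii.items():
    print(f"beta({p}) = {beta:.4f}")
```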

14. Pairs of P-values and significance levels, α, are given. For each pair, state whether the observed P-value would lead to rejection of H0 at the given significance level.

a.
b.
c.
d.
e.
f.

15. Let μ denote the mean reaction time to a certain stimulus. For a large-sample z test of versus find the P-value associated with each of the given values of the z test statistic.
a. 1.52
b. .95
c. 1.96
d. 2.33
e. .79

16. Give as much information as you can about the P-value of a t test in each of the following situations:

a. Upper-tailed test, df = 8, t = 2.15
b. Lower-tailed test, df = 11, t = -2.52
c. Two-tailed test, df = 15, t = -1.69
d. Upper-tailed test, df = 19, t = 2.539
e. Upper-tailed test, df = 5, t = 5.25
f. Two-tailed test, df = 40, t = -4.5

17. The times of first sprinkler activation (in sec) for a series of tests with fire prevention sprinkler systems using an aqueous film-forming foam were as follows:

28 42 23 28 24 36 31 34 25 28 29 23 25

The system has been designed so that true average activation time is at most 25 sec under such conditions. Does the data strongly contradict the validity of this design specification? Test the relevant hypotheses at significance level .05 using the P-value approach.

18. A certain pen has been designed so that true average writing lifetime under controlled conditions (involving the use of a writing machine) is at least 12 hours. A random sample of 18 pens is selected, the writing lifetime of each is determined, and a normal probability plot of the resulting data supports the use of a one-sample t test.

a. What hypotheses should be tested if the investigator believes a priori that the design specification has been satisfied?
b. What conclusion is appropriate if the hypotheses of part (a) are tested, t = -2.5, and ?
c. What conclusion is appropriate if the hypotheses of part (a) are tested, t = -2, and ?
d. What should be concluded if the hypotheses of part (a) are tested and t = -3.25?

19. Consider the large-sample level .01 test for testing versus

a. For the alternative value p = .21, compute sample sizes n = 100, 2500, 10,000, 40,000, and 90,000.
b. For , compute the p-value when n = 100, 2500, 10,000, and 40,000.

Chapter 9

COMPLETION

1. Let be a random sample from a population with mean be a random sample with mean and that the X and Y samples are independent of one another. The expected value of is __________ and the standard deviation of = __________.

2. Let be a random sample from a normal population with mean be a random sample from a normal population with mean be a random sample from a normal population with mean =16, and that X and Y samples are independent of one another. If the sample mean values are then the value of the test statistic to test is z = __________ and that will be rejected at .01 significance level if

3. In testing the computed value of the test statistic is z = 2.25. The P-value for this two-tailed test is then __________.

4. Investigators are often interested in comparing the effects of two different treatments on a response. If the individuals or subjects to be used in the comparison are not assigned by the investigators to the two treatments, the study is said to be __________. If the investigators assign individuals or subjects to the two treatments in a random fashion, this is referred to as __________.

5. Provided that the sample sizes m and n of two independent samples X and Y are both large (i.e., m > 40 and n > 40), then a confidence interval for the difference between the two population means, with a confidence level of approximately is __________, where the values of the population variances are unknown.

6. Provided that at least one of the sample sizes m and n of two independent samples X and Y is small, and that the corresponding populations are both normally distributed with unknown values of the population variances, then a confidence interval for the difference between the two population means, with a confidence level of is __________.

7. The pooled t procedures are alternatives to the two-sample t procedures for situations in which not only the two population distributions are assumed to be __________ but also they have equal __________.

8. The degrees of freedom associated with the pooled t test, based on sample sizes m and n, is given by __________.

9. The pooled t confidence interval for estimating with confidence level using two independent samples X and Y with sizes m and n is given by __________.

10. The weighted average of the variances of two independent samples is referred to as the __________ of (the common variance of the two population variances), and is denoted by __________.

ANS: pooled estimator,

11. The number of degrees of freedom for a paired t test, where the data consists of n independent pairs, is __________.

12. The rejection region for level .025 paired t test in testing is __________, where the data consists of 12 independent pairs.

13. A 90% confidence interval for the true mean difference in paired data consisting of n independent pairs, is determined by the formula __________.

14. In testing where is the true mean difference in paired data consisting of 16 independent pairs, the value of the test statistic is found to be 2.8. Then the P-value is approximately __________.

15. Let with X and Y independent variables, and let is an __________ estimator of

16. In testing denote the two population proportions, the P-value is found to be .0715. Then at .05 level, should __________.

17. In testing denote the two population proportions, the following summary statistics are given: m = 400, x = 140, n = 500 and y = 160. Then the value of the test statistic is z = __________.
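A pooled two-proportion z statistic for the summary values in item 17 might be computed as in the sketch below. Because the hypotheses are missing from the item, the sketch assumes the null hypothesis is that the two proportions are equal; `pooled_two_proportion_z` is a hypothetical helper name.

```python
import math


def pooled_two_proportion_z(x, m, y, n):
    """z statistic for H0: p1 - p2 = 0 using the pooled estimate of the common p."""
    p1_hat, p2_hat = x / m, y / n
    p_pooled = (x + y) / (m + n)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / m + 1 / n))
    return (p1_hat - p2_hat) / se


z = pooled_two_proportion_z(x=140, m=400, y=160, n=500)
print(f"z = {z:.4f}")
```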

18. In testing where denote the two population proportions, the standardized variable is an estimate of the common value of and m and n are the two sample sizes, has approximately a standard normal distribution when __________.

19. If are independent __________ random variables with degrees of freedom respectively, then the random variable has an F distribution.

20. Analogous to the notation for the point on the axis that captures __________ of the area under the F density curve with degrees of freedom in the __________ tail.

21.

22. Let be a random sample from a normal distribution with variance be another random sample (independent of the from a normal distribution with variance denote the two sample variances. Then the random variable has an F distribution with

23. Two independent samples of sizes m and n and variances are selected at random from two normal distributions with variances In testing where the test statistic value is the rejection region for a level .05 test is either

24. In testing degrees of freedom, if the test statistic value f = 4.53, then P-value = __________.

MULTIPLE CHOICE

1. Let be a random sample from a population with mean be a random sample from a population with mean and that the X and Y samples are independent of one another. Which of the following statements are not true?
a. The natural estimator of
b. The expected value of
c. The expected value of
d. is an unbiased estimator of
e. All of the above statements are true.

2. Let be a random sample from a normal population with mean and let be a random sample from a normal population with mean and that the X and Y samples are independent of one another. Which of the following statements are true?
a. is normally distributed with expected value
b. is normally distributed with expected value
c. is normally distributed with expected value
d. is an unbiased estimator of .
e. All of the above statements are true.

3. Which of the following statements are true?
a. When the alternative hypothesis is the null hypothesis is considerably smaller than the null value .
b. When the alternative hypothesis is the null hypothesis is considerably larger than the null value .
c. When the alternative hypothesis is the null hypothesis is too far to either side of the null value .
d. All of the above statements are true.
e. None of the above statements are true.

4. Which of the following statements are not true if a test procedure about the difference between two population means is performed when both population distributions are normal and that the values of both population variances are known?
a. The rejection region for level if the test is an upper-tailed test.
b. The rejection region for level if the test is a lower-tailed test.
c. The rejection region for level if the test is a two-tailed test.
d. All of the above statements are true.
e. None of the above statements are true.

5. Let be a random sample from a normal population with mean and variance be a random sample from a normal population with mean and that X and Y samples are independent of one another. Assume the sample mean values are and we want to test Which of the following statements are correct?
a. The value of the test statistic is z = 2.83
b. The value of the test statistic is z = 1.88
c. is rejected at the .05 level if
d. is rejected at the .05 level if
e. None of the above statements are correct.

6. In testing the computed value of the test statistic is z = 1.98. The P-value for this two-tailed test is then
a. .4761
b. .0478
c. .0239
d. .2381
e. .2619

7. In calculating 95% confidence interval for the difference between the means of two normally distributed populations, summary statistics from two independent samples are: Then, the lower limit of the confidence interval is:
a. 29.994
b. 11.587
c. 10.006
d. 10.797
e. 28.413

8. In calculating 95% confidence interval for the difference between the means of two normally distributed populations, summary statistics from two independent samples are: Then, the upper limit of the confidence interval is
a. 10.953
b. 9.047
c. 9.216
d. 10.784
e. 10.0

9. Which of the following statements are true?
a. In real problems, it is virtually always the case that the values of the population variances are unknown.
b. The two-sample t test is applicable in situations in which population distributions are both normal when population variances have unknown values, and at least one of the two sample sizes is small.
c. The pooled t test procedure is applicable if the two population distribution curves are assumed normal with equal spreads.
d. All of the above statements are true.
e. None of the above statements are true.

10. Which of the following statements are not true?
a. Many statisticians recommend pooled t procedures over the two-sample t procedures.
b. The pooled t test is not a likelihood ratio test, whereas the two-sample t test can be derived from the likelihood ratio principle.
c. The significance level for the pooled t test is exact.
d. The significance level for the two-sample t test is only approximate.
e. All of the above statements are true.

11. The degrees of freedom associated with the pooled t test, based on sample sizes 10 and 12 are
a. 22
b. 21
c. 20
d. 19
e. 18

12. Which of the following statements are not correct assumptions for developing pooled confidence intervals and for testing hypotheses about the difference between two population means?
a. Both populations are normally distributed
b. The samples selected from the two populations are independent random samples.
c. At least one of the two sample sizes is small.
d. The two population variances are equal
e. The two population variances are not equal

13. When variances of two independent samples are combined and is computed, the is referred to as
a. the pooled estimator of
b. the combined estimator of
c. the pooled estimator of the common variance of the two populations
e. None of the above answers are correct.

14. Two independent samples of sizes 15 and 17 are randomly selected from two normal populations with equal variances. Which of the following distributions should be used for developing confidence intervals and for testing hypotheses about the difference between the two population means?
a. The standard normal distribution
b. The t distribution with 32 degrees of freedom
c. The t distribution with 31 degrees of freedom
d. The t distribution with 30 degrees of freedom
e. Any continuous distribution since the sum of the two sample sizes exceeds 30

15. Which of the following statements are not necessarily true about the paired t test?
a. The data consists of n independently selected pairs
b. The differences are assumed to be normally distributed.
c. The X and Y observations within each pair are independent.
d. The are not independent of one another.
e. All of the above statements are true.

16. The number of degrees of freedom for a paired t test, where the data consists of 10 independent pairs, is equal to
a. 20
b. 18
c. 10
d. 9
e. 8

17. At the .05 significance level, the null hypothesis is rejected in a paired t test, where the data consists of 15 independent pairs, if
a.
b.
c. either
d.
e.

18. A 95% confidence interval for the true mean difference in paired data, where is determined by
a. 20 ± 2.048(0.80)
b. 20 ± 2.145(3.098)
c. 20 ± 2.131(0.894)
d. 20 ± 1.761(1.118)
e. 20 ± 1.753(1.291)

19. Which of the following statements are true?
a. Whenever there is positive dependence within pairs, the denominator for the paired t statistic should be smaller than for t of the independent-samples test.
b. When data is paired, the paired t confidence interval will usually be narrower than the (incorrect) two-sample t confidence interval.
c. If there is great heterogeneity between experimental units and a large correlation within experimental units, a paired experiment is preferable to an independent-samples experiment.
d. If the experimental units are relatively homogeneous and the correlation within pairs is not large, an independent-samples experiment should be used.
e. All of the above statements are true.

20. In testing is the true mean difference in paired data consisting of 12 independent pairs, the sample mean and sample standard deviation are, respectively, 7.25 and 8.25. Which of the following statements are true?
a. The value of the test statistic is z = 3.04.
b. The P-value is .0013.
c. The P-value is .0026.
d. The null hypothesis is rejected at the .01 level.
e. The null hypothesis is rejected at the .005 level.

21. Let with X and Y independent variables, and let Which of the following statements are not correct?
a. is an unbiased estimator of
b. When both m and n are large, the estimator individually has approximately normal distributions.
c. When both m and n are large, the estimator has approximately a normal distribution.
d.
e. All of the above statements are correct.

22. In testing denote the two population proportions, the value of the test statistic is found to be z = -1.82. Then, the P-value is
a. .9312
b. .4656
c. .0688
d. .0344
e. .9656

23. In testing denote the two population proportions, and both sample sizes are assumed to be large, the rejection region for approximate level .025 test is
a.
b.
c. either
d.
e.

24. Let denote two population proportions, and let be the sample proportions of samples of sizes 150 and 200, respectively. Then a large sample confidence interval for with a confidence level of approximately 99% is determined by
a.
b.
c.
d.
e.

25. When the necessary conditions are met in testing the two sample proportions are is true. Then, the value of the test statistic is
a. 10.0
b. 2.5
c. 7.5
d. 62.5
e. 0.70

26. In testing the difference between two population proportions, a weighted average of the sample proportions should be used in computing the value of the test statistic when
a. the two populations are normally distributed
b. the two sample sizes are small
c. the two samples are independent of each other
d. the null hypothesis states that the two population proportions are equal
e. the null hypothesis states that the two sample proportions are equal

27. Which of the following statements are not true about the F distribution with parameters
a. The parameter is called the number of numerator degrees of freedom.
b. The parameter is called the number of denominator degrees of freedom.
c. A random variable that has an F distribution can assume a negative value; depends on the values of
d. All of the above statements are true.
e. None of the above statements are true.

28. Which of the following statements are true?
a. Methods for comparing two population variances (or standard deviations) are occasionally needed, though such problems arise much less frequently than those involving means or proportions.
b. If are independent chi-squared random variables with degrees of freedom, respectively, divided by their respective degrees of freedom can be shown to have an F distribution.
c. The density curve of an F distribution is positively skewed (skewed to the right).
d. All of the above statements are true.
e. None of the above statements are true.

29. Let be a random sample from a normal distribution with variance be another random sample (independent of the from a normal distribution with variance denote the two sample variances. Which of the following statements are not true in testing where the test statistic value is and the test is performed at .10 level?
a. The rejection region is
b. The rejection region is
c. The rejection region is either
d. All of the above statements are true.
e. None of the above statements are true.

30. Let be a random sample from a normal distribution with variance be another random sample (independent of the from a normal distribution with variance denote the two sample variances. Which of the following statements are not true?
a. The random variable has an F distribution with parameters
b. The random variables each have a t distribution with m-1 and n-1 degrees of freedom, respectively.
c. The hypothesis is rejected if the ratio of the sample variances differs by too much from 1.
d. In testing the rejection region for a level
e. All of the above statements are true.

31. Which of the following statements are not necessarily true?
a. The density curve of an F distribution is not symmetric, so it would be necessary to tabulate both upper- and lower-tail critical values.
b.
c. There is an important connection between an F distribution and independent chi-squared random variables.
d. A random variable that has an F distribution cannot assume a negative value.
e. All of the above statements are true.

32. For an F distribution with parameters is the number of numerator degrees of freedom, and is the number of denominator degrees of freedom, which of the following statements are true?
a.
b.
c.
d. can be larger than, smaller than, or equal to
e. None of the above answers are true.

1. A study comparing different types of batteries showed that the average lifetimes of Duracell Alkaline AA batteries and Eveready Energizer Alkaline AA batteries were given as 4.5 hours and 4.2 hours, respectively. Suppose these are the population average lifetimes.

a. Let be the sample average lifetime of 150 Duracell batteries and be the sample average lifetime of 150 Eveready batteries. What is the mean value of (i.e., where is the distribution of centered)? How does your answer depend on the specified sample sizes?
b. Suppose the population standard deviations of lifetime are 1.8 hours for Duracell batteries and 2.0 hours for Eveready batteries. With the sample sizes given in part (a), what is the variance of the statistic , and what is its standard deviation?
c. For the sample sizes given in part (a), what is the approximate distribution curve of (include a measurement scale on the horizontal axis)? Would the shape of the curve necessarily be the same for sample sizes of 10 batteries of each type? Explain.

2. Let denote true average tread life for a premium brand of radial tire and let denote the true average tread life for an economy brand of the same size. Test versus at level .01 using the following statistics:

3. Tensile strength tests were carried out on two different grades of wire rod resulting in the accompanying data:

| Grade | Sample Size | Sample Mean | Sample St. Dev. |
|---|---|---|---|
| 1064 | 130 | 108 | 1.3 |
| 1078 | 130 | 124 | 2.0 |

a. Does the data provide compelling evidence for concluding that true average strength for the 1078 grade exceeds that for the 1064 grade by more than 10? Test the appropriate hypotheses using the P-value approach.
b. Estimate the difference between true average strengths for the two grades in a way that provides information about precision and reliability.
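With sample sizes this large, both parts can be approached with the two-sample z procedures. The sketch below takes grade 1064 as sample 1 and grade 1078 as sample 2, so the claim in part (a) becomes a lower-tailed test with null value -10; the 95% confidence level for part (b) is an assumption, since the problem does not specify one.

```python
import math
from statistics import NormalDist

m, xbar, s1 = 130, 108.0, 1.3   # grade 1064: size, mean, standard deviation
n, ybar, s2 = 130, 124.0, 2.0   # grade 1078: size, mean, standard deviation
delta0 = -10.0                  # H0: mu1 - mu2 = -10 versus Ha: mu1 - mu2 < -10

se = math.sqrt(s1**2 / m + s2**2 / n)
z = (xbar - ybar - delta0) / se
p_value = NormalDist().cdf(z)   # lower-tailed P-value

# Part (b): large-sample confidence interval, assuming a 95% level.
z_crit = NormalDist().inv_cdf(0.975)
ci = (xbar - ybar - z_crit * se, xbar - ybar + z_crit * se)

print(f"z = {z:.2f}, P-value = {p_value:.3g}")
print(f"95% CI for mu1 - mu2: ({ci[0]:.2f}, {ci[1]:.2f})")
```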

4. To decide whether two different types of steel have the same true average fracture toughness values, specimens of each type are tested, yielding the following results:

| Type | Sample Average | Sample St. Dev. |
|---|---|---|
| 1 | 60.2 | 1.0 |
| 2 | 60.0 | 1.0 |

Calculate the P-value for the appropriate two-sample test, assuming that the data was based on m = n = 100. Then repeat the calculation for m = n = 400. Is the small P-value for m = n = 400 indicative of a difference that has practical significance? Would you have been satisfied with just a report of the P-value? Comment briefly.

5. Suppose are true mean stopping distances at 50 mph for cars of a certain type equipped with two different types of braking systems. Use the two-sample test at significance level .01 to test for the following statistics:

6. Suppose are true mean stopping distances at 50 mph for cars of a certain type equipped with two different types of braking systems. The following statistics are given: m = 6, Calculate a 95% CI for the difference between true average stopping distance for cars equipped with system 1 and cars equipped with system 2. Does the interval suggest that precise information about the value of this difference is available?

7. A study includes the accompanying data on compression strength (lb) for a sample of 12-oz aluminum cans filled with strawberry drink and another sample filled with cola. Does the data suggest that the extra carbonation of cola results in a higher average compression strength? Base your answer on a P-value. What assumptions are necessary for your analysis?

| Beverage | Sample Size | Sample Mean | Sample St. Dev. |
|---|---|---|---|
| Strawberry drink | 15 | 546 | 21 |
| Cola | 15 | 560 | 15 |
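For small samples with unknown and possibly unequal variances, the two-sample t statistic and its approximate degrees of freedom can be computed from the summary statistics alone. The sketch below assumes normal populations and independent samples, with the degrees of freedom rounded down; `welch_t_and_df` is a hypothetical helper name.

```python
import math


def welch_t_and_df(mean1, s1, m, mean2, s2, n):
    """Two-sample t statistic (null difference 0) and its approximate df, rounded down."""
    v1, v2 = s1**2 / m, s2**2 / n     # estimated variances of the two sample means
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (m - 1) + v2**2 / (n - 1))
    return t, math.floor(df)


# Sample 1 = strawberry drink, sample 2 = cola.
t_stat, df = welch_t_and_df(546, 21, 15, 560, 15, 15)
print(f"t = {t_stat:.3f} with about {df} degrees of freedom")
```

The P-value would then be the lower-tail area under the t curve with that many degrees of freedom, read from a t table or statistical software.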

8. Summary data on proportional stress limits for specimens constructed using two different types of wood is shown below:

| Type of wood | Sample size | Sample mean | Sample St. Dev. |
|---|---|---|---|
| Red oak | 14 | 8.50 | .80 |
| Douglas fir | 10 | 6.65 | 1.28 |

Assuming that both samples were selected from normal distributions, carry out a test of hypotheses to decide whether the true average proportional stress limit for red oak joints exceeds that for Douglas fir joints by more than 1 MPa.

9. Consider the accompanying data on breaking load (kg/25 mm width) for various fabrics in both an unabraded condition and an abraded condition. Use the paired t test at significance level .01 to test .

| Fabric | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| U | 25.6 | 48.8 | 49.8 | 43.2 | 38.7 | 55.0 | 36.4 | 51.5 |
| A | 26.5 | 52.5 | 46.5 | 36.5 | 34.5 | 20.0 | 28.5 | 46.0 |
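The paired t statistic is built from the within-fabric differences. The sketch below computes it for the data above; because the hypotheses are missing from the problem statement, the null value of 0 for the mean difference (unabraded minus abraded) is an assumption.

```python
import math
import statistics

unabraded = [25.6, 48.8, 49.8, 43.2, 38.7, 55.0, 36.4, 51.5]
abraded = [26.5, 52.5, 46.5, 36.5, 34.5, 20.0, 28.5, 46.0]

# Differences within each fabric (U minus A).
diffs = [u - a for u, a in zip(unabraded, abraded)]

n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)             # sample standard deviation of the differences
t_stat = d_bar / (s_d / math.sqrt(n))     # assumed null value of 0 for the mean difference
df = n - 1

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.4f}, t = {t_stat:.3f}, df = {df}")
```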

10. Two types of fish attractors, one made from vitrified clay pipes and the other from cement blocks and brush, were used during 16 different time periods spanning 4 years at Lake Tohopekaliga, Florida. The following observations are of fish caught per fishing day.

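| Period | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Pipe | .00 | 1.80 | 4.86 | .58 | .37 | .32 | .11 | .23 |
| Brush | .48 | 2.33 | 5.38 | .79 | .32 | .76 | .52 | .91 |

| Period | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|
| Pipe | .29 | .85 | 6.64 | .57 | 1.83 | 7.89 | .63 | .42 |
| Brush | .75 | 1.61 | 9.73 | .83 | 2.17 | 8.21 | .56 | .75 |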

Does one attractor appear to be more effective on average than the other?
a. Use the paired t test with
b. What happens if the two-sample t test is used?

11. In an experiment designed to study the effects of illumination level on task performance, subjects were required to insert a fine-tipped probe into the eyeholes of ten needles in rapid succession both for a low light level with black background and a higher level with a white background. Each data value is the time (sec) required to complete the task.

| Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Black | 25.01 | 41.05 | 27.47 | 25.74 | 24.96 | 28.84 | 25.85 | 20.89 | 32.05 |
| White | 16.61 | 24.98 | 24.59 | 19.68 | 16.07 | 20.84 | 18.23 | 19.50 | 22.96 |

Does the data indicate that the higher level of illumination yields a decrease of more than 5 sec in true average task completion time? Test the appropriate hypotheses using the P-value approach.

12. In an experiment designed to study the effects of illumination level on task performance, subjects were required to insert a fine-tipped probe into the eyeholes of ten needles in rapid succession both for a low light level with black background and a higher level with a white background. Each data value is the time (sec) required to complete the task.

| Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Black | 25.01 | 41.05 | 27.47 | 25.74 | 24.96 | 28.84 | 25.85 | 20.89 | 32.05 |
| White | 16.61 | 24.98 | 24.59 | 19.68 | 16.07 | 20.84 | 18.23 | 19.50 | 22.96 |

Compute an interval estimate for the difference between true average task time under the high illumination level and true average time under the low level.

13. A sample of 300 urban adult residents of Michigan revealed 63 who favored increasing the highway speed limit from 55 to 70 mph, whereas a sample of 180 rural residents yielded 72 who favored the increase. Does this data indicate that the sentiment for increasing the speed limit is different for the two groups of residents? Test using , where refers to the urban population.

14. A random sample of 5726 telephone numbers from a certain region taken in March 2002 yielded 1105 that were unlisted, and 1 year later a sample of 5384 yielded 980 unlisted numbers.

a. Test at level .10 to see whether there is a difference in true proportions of unlisted numbers between the two years.
b. If what sample sizes (m = n) would be necessary to detect such a difference with probability .90?

15. Ionizing radiation is being given increasing attention as a method for preserving horticultural products. A study reports that 153 of 180 irradiated garlic bulbs were marketable (no external sprouting, rotting, or softening) 240 days after treatment, whereas only 117 of 180 untreated bulbs were marketable after this length of time. Does this data suggest that ionizing radiation is beneficial as far as marketability is concerned?

16. Two different types of alloy, A and B, have been used to manufacture experimental specimens of a small tension link to be used in a certain engineering application. The ultimate strength (ksi) of each specimen was determined, and the results are summarized in the accompanying frequency distribution.

| Ultimate strength (ksi) | A | B |
|---|---|---|
| 26 – < 30 | 6 | 4 |
| 30 – < 34 | 12 | 9 |
| 34 – < 38 | 15 | 19 |
| 38 – < 42 | 7 | 10 |
| Sample size | m = 40 | n = 42 |

Compute a 95% CI for the difference between the true proportions of all specimens of alloys A and B that have an ultimate strength of at least 34 ksi.
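One way to set up this interval is the large-sample confidence interval for a difference of proportions, counting the specimens in the two classes at or above 34 ksi. The sketch below follows that approach; the variable names are illustrative.

```python
import math
from statistics import NormalDist

m, n = 40, 42
x = 15 + 7     # alloy A specimens with strength of at least 34 ksi
y = 19 + 10    # alloy B specimens with strength of at least 34 ksi

p1_hat, p2_hat = x / m, y / n
# Unpooled standard error, as used for confidence intervals.
se = math.sqrt(p1_hat * (1 - p1_hat) / m + p2_hat * (1 - p2_hat) / n)
z_crit = NormalDist().inv_cdf(0.975)   # critical value for a 95% level

diff = p1_hat - p2_hat
ci = (diff - z_crit * se, diff + z_crit * se)
print(f"95% CI for p1 - p2: ({ci[0]:.4f}, {ci[1]:.4f})")
```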

17. Obtain or compute the following quantities using the table of “critical values for F distribution” available in your text

a.
b.
c.
d.
e. The 95th percentile of the F distribution with
f. The 5th percentile of the F distribution with
g.
h.

18. Give as much information as you can about the P-value of the F test in each of the following situations:

a.
b.
c.
d.
e.

19. The sample standard deviation of sodium concentration in whole blood (mEq/L) for m = 20 marine eels was found to be whereas the sample standard deviation of concentration for n = 20 freshwater eels was . Assuming normality of the two concentration distributions, test at level .10 to see whether the data suggests any difference between concentration variances for the two types of eels.

20. In a study of copper deficiency in cattle, the copper values (µg Cu/100 mL blood) were determined both for cattle grazing in an area known to have well-defined molybdenum anomalies (metal values in excess of the normal range of regional variation) and for cattle grazing in a nonanomalous area, resulting in (m = 48) for the anomalous condition and (n = 45) for the nonanomalous condition. Test for the equality versus inequality of population variances at significance level .10 by using the P-value approach.

Chapter 10

1. The simplest ANOVA problem is referred to variously as a single-factor, single-classification, or __________ ANOVA.

2. In a single-factor ANOVA, the characteristic that differentiates the treatments or populations from one another is called the __________ under study, and the different treatments or populations are referred to as the __________ of the factor.

3. An experiment is conducted to study the effectiveness of three teaching methods on student performance. In this experiment, the factor of interest is __________, and there are __________ different levels of the factor.

4. An experiment is conducted to study the effects of the presence of four different sugar solutions (glucose, sucrose, fructose, and a mixture of the three) on bacterial growth. In this experiment, the factor of interest is __________, and there are __________ different levels of the factor.

5. Single-factor ANOVA focuses on a comparison of more than two population or treatment __________.

6. In a one-way ANOVA problem involving four populations or treatments, the null hypothesis of interest is

7. In single-factor ANOVA, the __________ is a measure of between-samples variation, and is denoted by __________.

8. In single-factor ANOVA, the __________ is a measure of within-samples variation, and is denoted by __________.

9. In one-factor ANOVA, both mean square for treatments (MSTr) and mean square for error (MSE) are unbiased estimators for estimating the common population variance when __________, but MSTr tends to overestimate when __________.

10. In single-factor ANOVA, SST – SSTr = __________.

11. In one-factor ANOVA, __________ denoted by __________ is the part of total variation that is unexplained by the truth or falsity of $H_0$.

12. In one-factor ANOVA, __________ denoted by __________ is the part of total variation that can be explained by possible differences in the population means.

13. Let F = MSTr/MSE be the test statistic in a single-factor ANOVA problem involving four populations or treatments with a random sample of six observations from each one. When $H_0$ is true and the four population or treatment distributions are all normal with the same variance $\sigma^2$, then F has an F distribution with degrees of freedom $\nu_1$ and $\nu_2$. With f denoting the computed value of F, the rejection region for a level .05 test is __________.

14. When $H_0$ is rejected in single-factor ANOVA, the investigator will usually want to know which of the population means are different from one another. A method for carrying out this further analysis is called a __________.

15. __________ multiple comparisons procedure involves the use of a probability distribution called the Studentized range distribution.

16. If three 95% confidence intervals for a population mean are calculated based on three independent samples selected randomly from the population, then the simultaneous confidence level will be about __________%.

17. Sometimes an experiment is carried out to compare each of several new treatments to a control treatment. In such situations, a multiple comparisons technique called __________’s method is appropriate.

18. In the “model equation” $X_{ij} = \mu_i + \epsilon_{ij}$, the $\epsilon_{ij}$ are assumed to be independent and normally distributed random variables with mean 0 and standard deviation $\sigma$ for every i and j. Then the mean and variance of $X_{ij}$ are __________ and __________, respectively.

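19. Define a parameter $\mu$ by $\mu = \frac{1}{I}\sum_{i=1}^{I} \mu_i$ and the parameters $\alpha_1, \ldots, \alpha_I$ by $\alpha_i = \mu_i - \mu$. Then, $\sum_{i=1}^{I} \alpha_i$ = __________.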

20. In a single-factor ANOVA problem involving 5 populations, assume that the sample sizes from each population are not equal, and that the total number of observations is 21. If SSTr = 8 and SST = 12, then MSTr = __________, MSE = __________, and the test statistic value f = __________.

21. If $V(X_{ij}) = g(\mu_i)$, a known function of $\mu_i$, then a transformation $h(X_{ij})$ that “stabilizes the variance” so that $V[h(X_{ij})]$ is approximately the same for each i is given by $h(x) \propto$ __________.

MULTIPLE CHOICE

1. Which of the following statements are not true?
a. The analysis of variance, or more briefly ANOVA, refers broadly to a collection of experimental situations and statistical procedures for the analysis of quantitative responses from experimental units.
b. The simplest ANOVA problem is referred to as two-way ANOVA.
c. Single-factor ANOVA focuses on a comparison of more than two population or treatment means.
d. All of the above statements are true.
e. None of the above statements are true.

2. In a single-factor ANOVA problem involving five populations or treatments, which of the following statements are true about the alternative hypothesis?
a. All five population means are equal.
b. All five population means are different.
c. At least two of the population means are different.
d. At least three of the population means are different.
e. At most, two of the population means are equal.

3. Which of the following statements are true?
a. In some experiments, different samples contain different numbers of observations. However, the concepts and methods of single-factor ANOVA are most easily developed for the case of equal sample sizes.
b. The population or treatment distributions in single-factor ANOVA are all assumed to be normally distributed with the same variance $\sigma^2$.
c. In one-way ANOVA, if either the normality assumption or the assumption of equal variances is judged implausible, a method of analysis other than the usual F test must be employed.
d. The test statistic for single-factor ANOVA is F = MSTr/MSE, where MSTr is the mean square for treatments, and MSE is the mean square for error.
e. All of the above statements are true.

4. In single-factor ANOVA, MSTr is the mean square for treatments, and MSE is the mean square for error. Which of the following statements are not true?
a. MSE is a measure of between-samples variation.
b. MSE is a measure of within-samples variation.
c. MSTr is a measure of between-samples variation.
d. The value of MSTr is affected by the status of $H_0$ (true or false).
e. All of the above statements are true.

5. In single-factor ANOVA, MSE is the mean square for error, and MSTr is the mean square for treatments. Which of the following statements are not true?
a. The value of MSTr is affected by the status of $H_0$ (true or false).
b. When $H_0$ is true, E(MSTr) = E(MSE) = $\sigma^2$, where $\sigma^2$ is the common population variance.
c. When $H_0$ is false, E(MSTr) > E(MSE) = $\sigma^2$, where $\sigma^2$ is the common population variance.
d. The value of MSE is affected by the status of $H_0$ (true or false).
e. All of the above statements are true.

6. In a single-factor ANOVA problem involving four populations or treatments, the four sample standard deviations are 25.6, 30.4, 28.7, and 32.50. Then, the mean square for error is
a. 29.3
b. 117.2
c. 864.865
d. 29.409
e. None of the above answers are correct.
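
When every sample has the same size, MSE reduces to the plain average of the sample variances. The sketch below illustrates that identity with the four standard deviations from the question; the equal-sample-size assumption and the helper name `mse_from_sds` are introduced here for illustration and are not stated in the question.

```python
def mse_from_sds(sample_sds):
    """MSE under equal sample sizes: the mean of the sample variances."""
    variances = [s ** 2 for s in sample_sds]
    return sum(variances) / len(variances)

# Sample standard deviations quoted in the question (equal n assumed).
mse = mse_from_sds([25.6, 30.4, 28.7, 32.50])
print(f"MSE = {mse:.3f}")
```

With unequal sample sizes the variances would instead be pooled with weights $n_i - 1$, so this shortcut would not apply.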

7. In a single-factor ANOVA problem involving five populations or treatments with a random sample of four observations from each one, it is found that SSTr = 16.1408 and SSE = 37.3801. Then the value of the test statistic is
a. 1.619
b. 2.316
c. 0.432
d. 1.522
e. 4.248
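
The arithmetic behind a question like this one can be sketched in a few lines: divide each sum of squares by its degrees of freedom, I - 1 for treatments and I(J - 1) for error, then take the ratio. The function name `f_statistic` is a hypothetical helper, and the sketch assumes equal sample sizes as in the question.

```python
def f_statistic(sstr, sse, num_groups, n_per_group):
    """Return (f, df_treatments, df_error) for equal-sample-size one-way ANOVA."""
    df_tr = num_groups - 1
    df_err = num_groups * (n_per_group - 1)
    mstr = sstr / df_tr   # mean square for treatments
    mse = sse / df_err    # mean square for error
    return mstr / mse, df_tr, df_err

f, df_tr, df_err = f_statistic(16.1408, 37.3801, num_groups=5, n_per_group=4)
print(f"f = {f:.3f} on ({df_tr}, {df_err}) df")
```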

8. In one-way ANOVA, which of the following statements are true?
a. SST is a measure of the total variation in the data.
b. SSE measures variation that would be present within treatments even if $H_0$ were true, and is thus the part of total variation that is unexplained by the truth or falsity of $H_0$.
c. SSTr is the amount of variation between treatments that can be explained by possible differences in the population or treatments’ means.
d. If explained variation is large relative to unexplained variation, then $H_0$ is rejected in favor of $H_a$.
e. All of the above statements are true.

9. Which of the following statements are not true?
a. An F distribution arises in connection with a ratio in which there is one number of degrees of freedom associated with the numerator and another number of degrees of freedom associated with the denominator.
b. For single-factor ANOVA, a value of the test statistic F = MSTr/MSE that is much smaller than 1 casts considerable doubt on $H_0$.
c. The unbiasedness of MSE in single-factor ANOVA is a consequence of $E(S_i^2) = \sigma^2$ (the common population variance) whether $H_0$ is true or false.
d. In a single-factor ANOVA problem, the F critical value that captures upper-tail area .05 under the F curve with
e. All of the above statements are true.

10. The distribution of the test statistic in single-factor ANOVA is the
a. binomial distribution
b. normal distribution
c. t distribution
d. F distribution
e. None of the above answers are correct.

11. In a single-factor ANOVA problem involving 3 treatments, the sample means were 5, 6, and 9. If each observation in the third sample was increased by 20, the test statistic value f would
a. increase
b. decrease
c. remain the same
d. increase by 20
e. decrease by 10

12. Which of the following statements are not true?
a. When the computed value of the F statistic in single-factor ANOVA is significant, the analysis is terminated because no differences among the population means have been identified.
b. When $H_0$ is rejected in single-factor ANOVA, further analysis is carried out by applying the multiple comparisons procedure.
c. Tukey’s multiple comparisons procedure involves the use of a probability distribution called the Studentized range distribution.
d. All of the above statements are true.
e. None of the above statements are true.

13. In a single-factor ANOVA problem involving five populations or treatments with a random sample of nine observations from each one, suppose that $H_0$ is rejected at the .05 level. Which of the following values are correct for the appropriate critical value needed to perform Tukey’s procedure?
a. 4.76
b. 3.79
c. 4.04
d. 3.85
e. 4.80

14. Which of the following statements are not correct?
a. The simultaneous confidence level is controlled by Tukey’s method.
b. The Tukey intervals are based on independent samples.
c. To obtain a 95% simultaneous confidence level using the Tukey method, the individual level for each interval must be considerably larger than 95%.
d. All of the above statements are true.
e. None of the above statements are true.

15. Consider calculating a 95% confidence interval for a population mean $\mu$ based on a sample from a population, and then a 95% confidence interval for a population proportion p based on another sample selected independently from the same population. Which of the following statements are true?
a. Prior to obtaining data, the probability that the first interval will include $\mu$ is .95.
b. Prior to obtaining data, the probability that the second interval will include p is .95.
c. The probability that both intervals will include the values of the respective parameters is about .90.
d. All of the above statements are correct.

16. If three 90% confidence intervals for a population proportion p are calculated based on three independent samples selected randomly from the population, then the simultaneous confidence level will be
b. exactly 90%
c. exactly 81%
d. exactly 270%
e. None of the above answers are correct.
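
For intervals built from independent samples, the probability that all of them cover their targets is the product of the individual confidence levels. A minimal sketch of that multiplication rule follows; the helper name `simultaneous_level` is hypothetical, and the rule holds only under independence.

```python
def simultaneous_level(individual_level, num_intervals):
    """Joint coverage probability of independent intervals with a common level."""
    return individual_level ** num_intervals

three_at_90 = simultaneous_level(0.90, 3)
three_at_95 = simultaneous_level(0.95, 3)
two_at_95 = simultaneous_level(0.95, 2)
print(f"{three_at_90:.3f} {three_at_95:.3f} {two_at_95:.4f}")
```

The joint level drops quickly as intervals are added, which is the motivation for procedures that control the simultaneous level directly.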

17. The assumptions of single-factor ANOVA can be described succinctly by means of the “model equation” $X_{ij} = \mu_i + \epsilon_{ij}$, where $\epsilon_{ij}$ represents a random deviation from the population or true treatment mean $\mu_i$. Which of the following statements are true?
a. The $\epsilon_{ij}$ are assumed to be independent.
b. The $\epsilon_{ij}$ are normally distributed random variables.
c. $E(\epsilon_{ij}) = 0$ for every i and j.
d. $V(\epsilon_{ij}) = \sigma^2$ for every i and j.
e. All of the above statements are true.

18. Which of the following statements are not true?
a. ANOVA can be used to test .
b. ANOVA cannot be used to test .
c. ANOVA can be used to test at least two of the are different.
d. The two-sample t test can be used to test .
e. All of the above statements are true.

19. Which of the following statements are not true?
a. The two-sample t test is more flexible than the F test when the number of treatments or populations is 2.
b. The two-sample t test is valid without the assumption that the two population variances are equal.
c. The two-sample t test can be used to test as well as .
d. The F test can be used to test as well as .
e. When the number of treatments or populations is at least 3, there is no general test procedure known to have good properties without assuming equal populations variances.

20. In a single-factor ANOVA problem involving 4 populations, the sample sizes are 7, 5, 6, and 6. If SST = 65.27 and SSTr = 23.49, then the test statistic value f is
a. 3.75
b. 2.09
c. 7.83
d. 0.56
e. 6.67

26. Consider the accompanying data on plant growth after the application of different types of growth hormone.

Hormone   Observations
1         15  19   9  16
2         23  15  22  19
3         20  17  22  19
4          9  13  20  12
5          8  13  17  10

a. Perform an F test at level
b. What happens when Tukey’s procedure is applied?
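
One way to check work on this problem is sketched below: it computes SSTr, SSE, and f from the raw data, then compares the largest gap between sample means with Tukey's yardstick $w = Q\sqrt{MSE/J}$. The Studentized range value `Q_CRIT = 4.37` is an assumed table lookup for level .05 with 5 treatments and 15 error df; confirm it against your own table before relying on it. The function name `one_way_anova` is hypothetical.

```python
from math import sqrt

def one_way_anova(groups):
    """Return (SSTr, SSE, df_tr, df_err, f) for a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-samples variation: weighted squared deviations of group means.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-samples variation: squared deviations from each group's own mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_tr = len(groups) - 1
    df_err = n_total - len(groups)
    f = (sstr / df_tr) / (sse / df_err)
    return sstr, sse, df_tr, df_err, f

hormone = [
    [15, 19, 9, 16],
    [23, 15, 22, 19],
    [20, 17, 22, 19],
    [9, 13, 20, 12],
    [8, 13, 17, 10],
]
sstr, sse, df_tr, df_err, f = one_way_anova(hormone)

Q_CRIT = 4.37  # ASSUMED Studentized range table value, not given in the question
J = len(hormone[0])  # common sample size
w = Q_CRIT * sqrt((sse / df_err) / J)
means = [sum(g) / len(g) for g in hormone]
largest_gap = max(means) - min(means)

print(f"f = {f:.2f} on ({df_tr}, {df_err}) df")
print(f"w = {w:.2f}, largest difference in sample means = {largest_gap:.2f}")
```

If the largest gap falls below w, Tukey's procedure flags no pair of means as different, even when the F test itself rejects.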

31. Folacin is the only B vitamin present in any significant amount in tea, and recent advances in assay methods have made accurate determination of folacin content feasible. Consider the accompanying data on folacin content for randomly selected specimens of the four leading brands of green tea.

Brand Observations

1 8.0 6.3 6.7 8.7 9.0 10.2 9.7
2 5.8 7.6 9.9 6.2 8.5
3 6.9 7.6 5.1 7.5 5.4 6.2
4 6.5 7.2 8.0 4.6 5.1 4.1

Does this data suggest that true average folacin content is the same for all brands?
a. Carry out a test using via the P-value method.
b. Perform a multiple comparisons analysis to identify significant differences among brands.
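
With unequal sample sizes the computing formulas use each sample's own size: SSTr is the sum of (sample total)² / (sample size) minus the correction factor. The sketch below applies those formulas to the folacin data; the function name `anova_from_totals` is a hypothetical helper, and the P-value and the multiple comparisons step are left to tables or software.

```python
def anova_from_totals(groups):
    """One-way ANOVA via totals; handles unequal sample sizes."""
    n = sum(len(g) for g in groups)
    correction = sum(sum(g) for g in groups) ** 2 / n  # (grand total)^2 / n
    sst = sum(x * x for g in groups for x in g) - correction
    sstr = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    sse = sst - sstr
    df_tr, df_err = len(groups) - 1, n - len(groups)
    f = (sstr / df_tr) / (sse / df_err)
    return sst, sstr, sse, df_tr, df_err, f

folacin = [
    [8.0, 6.3, 6.7, 8.7, 9.0, 10.2, 9.7],
    [5.8, 7.6, 9.9, 6.2, 8.5],
    [6.9, 7.6, 5.1, 7.5, 5.4, 6.2],
    [6.5, 7.2, 8.0, 4.6, 5.1, 4.1],
]
sst, sstr, sse, df_tr, df_err, f = anova_from_totals(folacin)
print(f"SST = {sst:.2f}, SSTr = {sstr:.2f}, SSE = {sse:.2f}")
print(f"f = {f:.2f} on ({df_tr}, {df_err}) df")
```

These data have sample sizes 7, 5, 6, and 6, the same configuration as multiple-choice question 20 in this chapter.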

Chapter 11

COMPLETION

1. In two-factor ANOVA, when factor A consists of I levels and factor B consists of J levels, there are __________ different combinations (pairs) of levels of the two factors, each called a __________.

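2. Assume the existence of I parameters $\alpha_1, \ldots, \alpha_I$ and J parameters $\beta_1, \ldots, \beta_J$ such that $X_{ij} = \alpha_i + \beta_j + \epsilon_{ij}$ (i = 1, ..., I; j = 1, ..., J), so that $\mu_{ij} = \alpha_i + \beta_j$. The model specified by the above equations is called an __________ model because each mean response $\mu_{ij}$ is the __________ of an effect due to factor A at level i and an effect due to factor B at level j.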

3. In a two-factor experiment where factor A consists of I levels, factor B consists of J levels, and there is only one observation on each of the IJ treatments, SSE has __________ degrees of freedom.

4. In a two-factor experiment where factor A consists of I levels, factor B consists of J levels, and there is only one observation on each of the IJ treatments, SST has __________ degrees of freedom.

5. In a two-factor experiment where factor A consists of 5 levels, factor B consists of 4 levels, and there is only one observation on each of the 20 treatments, the critical value for testing the null hypothesis that the different levels of factor B have no effect on true average response at significance level .05 is denoted by __________, and is equal to __________.

6. In two-factor ANOVA, additivity means that the difference in true average responses for any two levels of one of the factors is the same for each level of the other factor. When additivity does not hold, we say that there is __________ between the different levels of the factors.

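7. The parameters for the fixed effects model with interaction are $\alpha_i$, $\beta_j$, and $\gamma_{ij}$. Thus the model is $X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$. The $\alpha_i$ are called the __________ for factor A, whereas the $\beta_j$ are the __________ for factor B. The $\gamma_{ij}$ are referred to as the __________ parameters.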

8. In the fixed effects model with interaction, assume that there are I levels of factor A, J levels of factor B, and K observations (replications) for each of the IJ combinations of levels of the two factors. Then SST (the total sum of squares) has df = __________.

9. In the fixed effects model with interaction, assume that there are I levels of factor A, J levels of factor B, and K observations (replications) for each of the IJ combinations of levels of the two factors. Then SSE (the error sum of squares) has df = __________.

10. In the fixed effects model with interaction, assume that there are I levels of factor A, J levels of factor B, and K observations (replications) for each of the IJ combinations of levels of the two factors. Then SSAB (the interaction sum of squares) has df = __________.
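
The degrees-of-freedom bookkeeping for the two-factor model with K > 1 replications can be sketched as a small lookup, with the component df adding up to the total. The function name `two_factor_df` is hypothetical; the example values (5, 4, 3) and (4, 3, 3) are taken from the multiple-choice questions later in this chapter.

```python
def two_factor_df(I, J, K):
    """Degrees of freedom for two-factor ANOVA with interaction (K >= 2)."""
    return {
        "A": I - 1,
        "B": J - 1,
        "AB": (I - 1) * (J - 1),
        "Error": I * J * (K - 1),
        "Total": I * J * K - 1,
    }

df_543 = two_factor_df(5, 4, 3)
df_433 = two_factor_df(4, 3, 3)
print(df_543)
print(df_433)
```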

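11. The three-factor fixed effects model, with the same number of observations for each combination of levels, I, J, and K of the three factors A, B, and C, respectively, is represented by

$X_{ijkl} = \mu + \alpha_i + \beta_j + \delta_k + \gamma_{ij}^{AB} + \gamma_{ik}^{AC} + \gamma_{jk}^{BC} + \gamma_{ijk} + \epsilon_{ijkl}$

The parameters $\gamma_{ij}^{AB}$, $\gamma_{ik}^{AC}$, and $\gamma_{jk}^{BC}$ are called __________, and $\gamma_{ijk}$ is called a __________, whereas $\alpha_i$, $\beta_j$, and $\delta_k$ are the __________ parameters.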

12. In the three-factor fixed effects model, assume that there are 5 levels of factor A, 4 levels of factor B, 3 levels of factor C, and 2 observations for each combination of levels of the three factors. Then the error sum of squares (SSE) has df = __________.

13. In the three-factor fixed effects model, assume that there are 4 levels for each of the three factors A, B, and C, and 3 observations for each combination of levels of the three factors. Then, the two-factor interaction sum of squares for factors B and C (SSBC) has df = __________.

14. In the three-factor fixed effects model, assume that there are 4 levels for factor A, 2 levels for factor B, 3 levels for factor C, and 4 observations for each combination of levels of the three factors. Then, the three-factor interaction sum of squares (SSABC) has df = __________.
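
The same bookkeeping extends to three factors with L > 1 observations per cell: each interaction's df is the product of the df of the factors involved, and error has IJK(L - 1) df. A minimal sketch follows, with `three_factor_df` as a hypothetical helper name and the level counts taken from questions 12 to 14 above.

```python
def three_factor_df(I, J, K, L):
    """Degrees of freedom for the three-factor fixed effects model (L >= 2)."""
    a, b, c = I - 1, J - 1, K - 1
    return {
        "A": a, "B": b, "C": c,
        "AB": a * b, "AC": a * c, "BC": b * c,
        "ABC": a * b * c,
        "Error": I * J * K * (L - 1),
        "Total": I * J * K * L - 1,
    }

q12 = three_factor_df(5, 4, 3, 2)
q13 = three_factor_df(4, 4, 4, 3)
q14 = three_factor_df(4, 2, 3, 4)
print(q12["Error"], q13["BC"], q14["ABC"])
```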

15. When several factors are to be studied simultaneously, an experiment in which there is at least one observation for every possible combination of levels is referred to as __________.

16. A three-factor experiment, with I levels of factor A, J levels of factor B, and K levels of factor C, in which fewer than IJK observations are made is called an __________.

17. In a three-factor experiment, if the levels of factor A are identified with the rows of a two-way table and the levels of B with the columns of the table, then the defining characteristic of a Latin square design is that every level of factor C appears exactly __________ in each row and exactly __________ in each column.

18. An experiment in which there are p factors, each at two levels, is referred to as a __________.

19. A experiment has __________ factors, and each factor has __________ levels.

20. Consider a $2^p$ experiment with 2 blocks. The price paid for this blocking is that __________ of the factor effects cannot be estimated.

21. Consider a $2^p$ experiment with four blocks. In this case, __________ factor effects are confounded with the blocks.

22. To select a quarter-replicate of a $2^p$ factorial experiment ($2^{p-2}$ of the $2^p$ possible treatment conditions), the number of defining effects that must be selected is __________.

23. To select a half-replicate of a $2^p$ factorial experiment ($2^{p-1}$ of the $2^p$ possible treatment conditions), the number of defining effects that must be selected is __________.

MULTIPLE CHOICE

1. Which of the following statements are not true?
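a. The model specified by $X_{ij} = \alpha_i + \beta_j + \epsilon_{ij}$ and $\mu_{ij} = \alpha_i + \beta_j$ is called an additive model.
b. The model $X_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$, where $\sum_{i=1}^{I} \alpha_i = 0$, $\sum_{j=1}^{J} \beta_j = 0$, and the $\epsilon_{ij}$ are assumed independent, normally distributed with mean 0 and common variance $\sigma^2$, is an additive model in which the parameters are uniquely determined.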
c. In two-way ANOVA, when the model is additive, additivity means that the difference in mean responses for two levels of one of the factors is the same for all levels of the other factor.
d. All of the above statements are true.
e. None of the above statements are true.

2. Which of the following statements are not true regarding the model $X_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$, where $\sum_{i=1}^{I} \alpha_i = 0$ and $\sum_{j=1}^{J} \beta_j = 0$?
a. $\mu$ is the true grand mean; that is, the mean response averaged over all levels of both factors A and B.
b. $\alpha_i$ is the effect of Factor A at level i.
c. $\beta_j$ is the effect of Factor B at level j.
d. The $\epsilon_{ij}$ are assumed independent and normally distributed with mean 0 and variance 1.
e. All of the above statements are true.

3. In a two-factor experiment where factor A consists of 4 levels, factor B consists of 3 levels, and there is only one observation on each of the 12 treatments, which of the following statements are not true?
a. SST has 12 degrees of freedom
b. SSA has 3 degrees of freedom
c. SSB has 2 degrees of freedom
d. SSE has 6 degrees of freedom
e. None of the above statements are correct.

4. A two-factor experiment where factor A consists of I levels, factor B consists of J levels, and there is only one observation on each of the IJ treatments, can be represented by the model $X_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$. Which of the following is the correct form in testing the null hypothesis that the different levels of factor A have no effect on true average response?
a.
b.
c.
d.
e. None of the above answers are correct.

5. The primary interest of designing a randomized block experiment is:
a. to reduce the variation among blocks
b. to increase the between-treatments variation to more easily detect differences among the treatment means
c. to reduce the within-treatments variation to more easily detect differences among the treatment means
d. to increase the total sum of squares
e. All of the above statements are true.

6. In the randomized block design for ANOVA where the single factor of primary interest has I levels, and b blocks are created to control for extraneous variability in experimental units or subjects, the number of degrees of freedom for SSE (error sum of squares) is given by
a. Ib-1
b. (I-1) (b-1)
c. I-1
d. b-1
e. I+b-1

7. In the fixed effects model with interaction, assume that there are 5 levels of factor A, 4 levels of factor B, and 3 observations (replications) for each of the 20 combinations of levels of the two factors. Then the number of degrees of freedom of the interaction sum of squares (SSAB) is
a. 60
b. 20
c. 15
d. 12
e. 59

8. Which of the following statements are not true?
a. In a two-factor experiment there are I levels of one factor and J levels of the other factor. When there is more than one observation for at least one (i,j) pair of the IJ combinations of levels of the two factors, a valid estimator of the random error variance cannot be obtained without assuming additivity.
b. In a two-factor experiment, additivity means that the difference in true average responses for any two levels of one of the factors is the same for each level of the other factor.
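c. The parameters for the fixed effects model with interaction are $\alpha_i$, the effect of factor A at level i; $\beta_j$, the effect of factor B at level j; and $\gamma_{ij}$, the interaction of factor A at level i and factor B at level j. Thus, the model is $X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$. This model is additive if and only if $\gamma_{ij} = 0$ for all i and j.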
d. All of the above statements are true.
e. None of the above statements are true.

9. In the fixed effects model with interaction, assume that there are 4 levels of factor A, 3 levels of factor B, and 3 observations for each of the 12 combinations of levels of the two factors. Then, the number of degrees of freedom for the error sum of squares (SSE) is
a. 36
b. 35
c. 24
d. 10
e. 9

10. In the fixed effects model with interaction, assume that there are 3 levels of factor A, 2 levels of factor B, and 3 observations for each of the six combinations of levels of the two factors. Then the critical value for testing the null hypothesis of no interaction between the levels of the two factors at the .05 significance level is
a. 3.49
b. 3.89
c. 3.00
d. 3.55
e. 3.11

11. Which of the following statements are true regarding a two-factor experiment?
a. In some experiments, the levels of either factor may have been chosen from a large population of possible levels, so that the effects contributed by the factor are random rather than fixed.
b. If both factors contribute random effects, the model is referred to as a random effects model.
c. If one factor is fixed, and the other contributes random effects, a mixed effects model results.
d. If both factors are fixed, the model is referred to as a fixed effects model.
e. All of the above statements are true.

12. The following equation SST = SSA + SSB + SSAB + SSE applies to which ANOVA model?
a. One-factor ANOVA
b. Two-factor ANOVA with interaction
c. Three-factor ANOVA
d. Randomized block design
e. All of the above

13. In a two-factor ANOVA problem, there are 4 levels of factor A, 5 levels of factor B, and 2 observations (replications) for each combination of levels of the two factors. Then, the number of treatments in this experiment is
a. 40
b. 11
c. 10
d. 20
e. 8

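14. The three-factor fixed effects model, with the same number of observations for each combination of levels I, J, and K of the three factors A, B, and C, respectively, is represented by

$X_{ijkl} = \mu + \alpha_i + \beta_j + \delta_k + \gamma_{ij}^{AB} + \gamma_{ik}^{AC} + \gamma_{jk}^{BC} + \gamma_{ijk} + \epsilon_{ijkl}$.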

Which of the following statements are true?
a. The restrictions necessary to obtain uniquely defined parameters are that the sum over any subscript of any parameter on the right-hand side of the above equation equals 0.
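b. The parameters $\alpha_i$, $\beta_j$, and $\delta_k$ are called the main effects of the factors A, B, and C, respectively.
c. The parameters $\gamma_{ij}^{AB}$, $\gamma_{ik}^{AC}$, and $\gamma_{jk}^{BC}$ are called two-factor interactions.
d. The parameter $\gamma_{ijk}$ is called a three-factor interaction.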
e. All of the above statements are true.

15. In the three-factor fixed effects model, assume that there are 3 levels for each of the three factors A, B, and C, and 2 observations for each combination of levels of the three factors. Then the number of degrees of freedom for the error sum of squares (SSE) is
a. 54
b. 27
c. 11
d. 18
e. 16

16. In the three-factor fixed effects model, assume that there are 4 levels of factor A, 2 levels of factor B, 4 levels of factor C, and 3 observations for each combination of levels of the three factors. Then, the number of degrees of freedom for the three-factor interaction sum of squares (SSABC) is
a. 32
b. 10
c. 9
d. 12
e. 13

17. The following equation SST = SSA + SSB + SSC + SSAB + SSAC + SSBC + SSABC + SSE applies to which ANOVA model?
a. One-factor ANOVA
b. Two-factor ANOVA with interaction
c. Three-factor ANOVA with interactions
d. Randomized block design
e. Latin square design

18. Which of the following statements are not true?
a. Tukey’s multiple comparison procedure can be used in two-factor ANOVA but not in three-factor (or more) ANOVA.
b. When several factors are to be studied simultaneously, an experiment in which there is at least one observation for every possible combination of levels is referred to as a complete layout.
c. A three-factor experiment, with I levels of factor A, J levels of factor B, and K levels of factor C, in which fewer than IJK observations are made is called an incomplete layout.
d. There are some incomplete layouts in which the pattern of combinations of factors is such that the analysis is straightforward. One such three-factor design is called a Latin square.
e. All of the above statements are true.

19. The following equation SST = SSA + SSB + SSC + SSE applies to which ANOVA model?
a. One-factor ANOVA
b. Two-factor ANOVA with interaction
c. Three-factor ANOVA with interactions
d. Latin square design
e. Randomized block design

20. Which of the following statements are not true?
a. An experiment in which there are p factors, each at two levels, is referred to as a $2^p$ factorial experiment.
b. A $2^p$ factorial experiment provides a simple setting for introducing the important concepts of confounding and fractional replications.
c. A $2^4$ experiment, with four factors A, B, C, and D, has 16 different experimental conditions.
d. All of the above statements are true.
e. None of the above statements are true.

21. Which of the following statements are true?
a. Blocking is always effective in reducing variation associated with extraneous sources.
b. It is often not possible to carry out all $2^p$ experimental conditions of a $2^p$ factorial experiment in a homogeneous experimental environment.
c. When the $2^p$ experimental conditions are placed in $2^r$ homogeneous blocks (r < p), the price paid for this blocking is that $2^r - 1$ of the factor effects cannot be estimated.
d. All of the above statements are true.
e. None of the above statements are true.

22. Which of the following statements are not true?
a. If the two three-factor interactions BCD and CDE are chosen for confounding, then their generalized interaction is BE.
b. If the two three-factor interactions ABC and CDE are chosen for confounding, then their generalized interaction is ABCDE.
c. When the number p of factors is large, a single replicate of a $2^p$ experiment can be expensive and time consuming.
d. All of the above statements are true.
e. None of the above statements are true.
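
The generalized interaction of two effects is found by multiplying them and cancelling any letter that appears twice, which amounts to a symmetric difference of their letter sets. A minimal sketch, with `generalized_interaction` as a hypothetical helper name:

```python
def generalized_interaction(effect1, effect2):
    """Multiply two effects, cancelling letters that appear in both."""
    letters = set(effect1) ^ set(effect2)  # symmetric difference
    return "".join(sorted(letters))

first = generalized_interaction("BCD", "CDE")
second = generalized_interaction("ABC", "CDE")
print(first, second)
```

Representing effects as strings of factor letters keeps the rule visible: only the shared letters drop out.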

23. Which of the following statements are true?
a. For experimental situations with more than three factors, there are often no replications, so sums of squares associated with nonconfounded higher-order interactions are usually pooled to obtain an error sum of squares that can be used in the denominators of the various F statistics.
b. One replicate of a $2^6$ factorial experiment involves an observation for each of the 64 different experimental conditions.
c. If an experimenter decides to include only $2^{p-1}$ of the $2^p$ possible conditions in the experiment, this is usually called a half-replicate.
d. The first step in selecting a half-replicate is to select a defining effect as the nonestimable effect.
e. All of the above statements are true.

1. The number of miles of useful tread wear (in 1000s) was determined for tires of five different makes of subcompact car (factor A, with I = 5) in combination with each of four different brands of radial tires (factor B, with J = 4), resulting in IJ = 20 observations. The values SSA = 30, SSB = 45, and SSE = 60 were then computed. Assume that an additive model is appropriate.

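a. Test $H_0: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 0$ (no differences in true average tire lifetime due to makes of cars) versus $H_a$: at least one $\alpha_i \neq 0$ using a level .05 test.
b. Test $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ (no differences in true average tire lifetime due to brands of tires) versus $H_a$: at least one $\beta_j \neq 0$ using a level .05 test.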
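
For the additive model with one observation per cell, both F ratios share the same denominator, MSE = SSE / [(I - 1)(J - 1)]. The sketch below works through that arithmetic for the sums of squares given in this problem; `additive_two_factor_f` is a hypothetical helper name, and the critical values still have to be read from an F table.

```python
def additive_two_factor_f(ssa, ssb, sse, I, J):
    """F ratios for two-factor ANOVA with one observation per cell."""
    df_a, df_b = I - 1, J - 1
    df_e = (I - 1) * (J - 1)
    mse = sse / df_e
    f_a = (ssa / df_a) / mse
    f_b = (ssb / df_b) / mse
    return f_a, f_b, (df_a, df_b, df_e)

f_a, f_b, dfs = additive_two_factor_f(30, 45, 60, I=5, J=4)
print(f"fA = {f_a:.2f}, fB = {f_b:.2f}, df (A, B, error) = {dfs}")
```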

2. In an experiment to see whether the amount of coverage of light-blue interior paint depends either on the brand of paint or on the brand of roller used, 1 gallon of each of four brands of paint was applied using each of three brands of roller, resulting in the following data (number of square feet covered).

                   Roller Brand
Paint Brand      1      2      3
1              404    396    401
2              396    394    397
3              389    392    394
4              394    387    393

a. Construct the ANOVA table.
b. State and test hypotheses appropriate for deciding whether paint has any effect on coverage. Use
c. Repeat part (b) for brand of roller.
d. Use Tukey’s method to identify significant differences among brands. Is there one brand that seems clearly preferable to the others?
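
A sketch of how the ANOVA table entries for this problem could be computed from row and column totals, assuming the additive model with one observation per paint-roller combination. The function name `two_way_no_replication` and the dictionary keys are hypothetical; the Tukey step in part (d) additionally needs a Studentized range table value, which is not computed here.

```python
def two_way_no_replication(table):
    """Sums of squares and F ratios for a two-way layout, one obs per cell."""
    I, J = len(table), len(table[0])
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / (I * J)
    sst = sum(x * x for row in table for x in row) - correction
    ssa = sum(sum(row) ** 2 for row in table) / J - correction  # rows
    ssb = sum(sum(row[j] for row in table) ** 2 for j in range(J)) / I - correction  # columns
    sse = sst - ssa - ssb
    mse = sse / ((I - 1) * (J - 1))
    return {
        "SSA": ssa, "SSB": ssb, "SSE": sse, "SST": sst,
        "fA": (ssa / (I - 1)) / mse,
        "fB": (ssb / (J - 1)) / mse,
    }

# Rows are paint brands 1-4, columns are roller brands 1-3.
coverage = [
    [404, 396, 401],
    [396, 394, 397],
    [389, 392, 394],
    [394, 387, 393],
]
result = two_way_no_replication(coverage)
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```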

3. A particular county in Indiana employs three assessors who are responsible for determining the value of residential property in the county. To see whether these assessors differ systematically in their assessments, five houses are selected, and each assessor is asked to determine the market value of each house. With factor A denoting assessors (I = 3) and factor B denoting houses (J = 5), suppose SSA = 12, SSB = 110, and SSE = 26.

a. Test $H_0: \alpha_1 = \alpha_2 = \alpha_3 = 0$ ($H_0$ states that there are no systematic differences among assessors).
b. Explain why a randomized block experiment with only 5 houses was used rather than a one-way ANOVA experiment involving a total of 15 different houses with each assessor asked to assess 5 different houses (a different group of 5 for each assessor).

4. The strength of concrete used in commercial construction tends to vary from one batch to another. Consequently, small test cylinders of concrete sampled from a batch are “cured” for periods up to about 28 days in temperature- and moisture-controlled environments before strength measurements are made. Concrete is then “bought and sold on the basis of strength test cylinders”. The accompanying data resulted from an experiment carried out to compare three different curing methods with respect to compressive strength (MPa). Analyze this data.

Batch Method A Method B Method C
1 30.2 33.2 30.0
2 28.6 30.1 32.1
3 29.5 31.7 30.0
4 31.4 34.1 33.0
5 30.0 32.5 31.9
6 26.4 28.8 27.3
7 27.7 27.9 30.2
8 31.9 31.9 33.1
9 26.1 29.0 28.7
10 28.1 28.9 32.7

5. In an experiment to assess the effects of curing time (factor A) and type of mix (factor B) on the compressive strength of hardened cement cubes, three different curing times were used in combination with four different mixes, with three observations obtained for each of the 12 curing time-mix combinations. The resulting sums of squares were computed to be SSA = 30,763.0, SSB = 34,185.6, SSE = 97,436.8, and SST = 205,966.6.

a. Construct an ANOVA table.
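b. Test at level .05 the null hypothesis $H_{0AB}$: all $\gamma_{ij} = 0$ (no interaction of factors) against $H_{aAB}$: at least one $\gamma_{ij} \neq 0$.
c. Test at level .05 the null hypothesis $H_{0A}: \alpha_1 = \alpha_2 = \alpha_3 = 0$ (factor A main effects are absent) against $H_{aA}$: at least one $\alpha_i \neq 0$.
d. Test $H_{0B}: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ versus $H_{aB}$: at least one $\beta_j \neq 0$ using a level .05 test.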
e. The values of the Use Tukey’s procedure to investigate significant differences among the three curing times.
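
Because SST = SSA + SSB + SSAB + SSE, the interaction sum of squares in this problem follows by subtraction, and under the fixed effects model every F ratio uses MSE as its denominator. The sketch below carries out that arithmetic with the sums of squares given above; `two_factor_from_ss` is a hypothetical helper name.

```python
def two_factor_from_ss(ssa, ssb, sse, sst, I, J, K):
    """Fill in SSAB by subtraction and form the fixed-effects F ratios."""
    ssab = sst - ssa - ssb - sse
    df = {"A": I - 1, "B": J - 1, "AB": (I - 1) * (J - 1), "Error": I * J * (K - 1)}
    mse = sse / df["Error"]
    f = {
        "A": (ssa / df["A"]) / mse,
        "B": (ssb / df["B"]) / mse,
        "AB": (ssab / df["AB"]) / mse,
    }
    return ssab, df, f

# I = 3 curing times, J = 4 mixes, K = 3 observations per combination.
ssab, df, f = two_factor_from_ss(30763.0, 34185.6, 97436.8, 205966.6, I=3, J=4, K=3)
print(f"SSAB = {ssab:.1f}, df = {df}")
print({name: round(value, 2) for name, value in f.items()})
```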

6. The accompanying data table gives observations on total acidity of coal samples of three different types, with determinations made using three different concentrations of ethanolic NaOH.

                          Type of Coal
NaOH Conc.
.404N        8.27, 8.17    8.66, 8.61    8.14, 7.96
.626N        8.03, 8.21    8.42, 8.58    8.02, 7.89
.786N        8.60, 8.20    8.61, 8.76    8.13, 8.07

a. Assuming both effects to be fixed, construct an ANOVA table, test for the presence of interaction, and then test for the presence of main effects for each factor (all using level .01).
b. Use Tukey’s procedure to identify significant differences among the types of coal.

7. The current (in ) necessary to produce a certain level of brightness of a television tube was measured for two different types of glass and three different types of phosphor, resulting in the accompanying data:

Phosphor Type
1 2 3
Glass 1 280, 290, 285 300, 310, 295 270, 285, 290
Type 2 230, 235, 240 260, 240, 235 220, 225, 230

Assuming that both factors are fixed, test $H_{0AB}$ versus $H_{aAB}$ at level .01. Then if $H_{0AB}$ cannot be rejected, test the two sets of main effect hypotheses.

8. The accompanying data was obtained in an experiment to investigate whether compressive strength of concrete cylinders depends on the type of capping material used or on variability in different batches. Each number is a cell total based on K = 3 observations.

Batch
1 2 3 4 5
Capping Material 1 1847 1942 1935 1891 1795
2 1779 1850 1795 1785 1626
3 1806 1892 1889 1891 1756

In addition, Obtain the ANOVA table and then test at level .01 the hypotheses assuming that capping is a fixed effect and batches is a random effect.

9. The output of a continuous extruding machine that coats steel pipe with plastic was studied as a function of the thermostat temperature profile (A, at three levels), type of plastic (B, at three levels), and the speed of the rotating screw that forces the plastic through a tube-forming die (C, at three levels). There were two replications (L = 2) at each combination of levels of the factors, yielding a total of 54 observations on output. The sums of squares were SSA = 14,144.44, SSB = 5511.27, SSC = 244,696.39, SSAB = 1069.62, SSAC = 62.67, SSBC = 331.67, SSE = 3127.50, and SST = 270,024.33.

a. Construct the ANOVA table.
b. Use appropriate F tests to show that none of the F ratios for two- or three-factor interactions is significant at level .05.
c. Which main effects appear significant?
d. With use Tukey’s procedure to identify significant
differences among the levels of factor C.

10. The following summary quantities were computed from an experiment involving four levels of nitrogen (A), two times of planting (B), and two levels of potassium (C). Only one observation (N content, in percentage, of corn grain) was made for each of the 16 combinations of levels.

SSA = .22625 SSB = .000025 SSC = .0036 SSAB = .004325
SSAC = .00065 SSBC = .000625 SST = .2384.

a. Construct the ANOVA table.
b. Assume that there are no three-way interaction effects, so that MSABC is a valid estimate of $\sigma^2$, and test at level .05 for interaction and main effects.
c. The nitrogen averages are Use Tukey’s method to examine differences in percentage N among the nitrogen levels

11. Because of potential variability in aging due to different castings and segments on the castings, a Latin square design with N = 7 was used to investigate the effect of heat treatment on aging. With A = castings, B = segments, C = heat treatments, summary statistics include and Obtain the ANOVA table and test at level .05 the hypothesis that heat treatment has no effect on aging.

12. A four-factor ANOVA experiment was carried out to investigate the effects of fabric (A), type of exposure (B), level of exposure (C), and fabric direction (D) on extent of color change in exposed fabric as measured by a spectrocolorimeter. Two observations were made for each of the three fabrics, two types, three levels, and two directions, resulting in MSA = 2207.329, MSB = 47.255, MSC = 491.783, MSD = .044, MSAB = 15.303, MSAC = 275.446, MSAD = .470, MSBC = 2.141, MSBD = .280, MSE = .977, and MST = 93.621 (“Accelerated Weathering of Marine Fabrics,” J. Testing and Eval., 1992: 139-143). Assuming fixed effects for all factors, carry out an analysis of variance using for all tests and summarize your conclusions.

13. The accompanying data resulted from a $2^3$ experiment with three replications per combination of treatments designed to study the effects of concentration of detergent (A), concentration of sodium carbonate (B), and concentration of sodium carboxymethyl cellulose (C) on the cleaning ability of a solution in washing tests (a larger number indicates better cleaning ability than a smaller number).

Factor Levels
A B C Condition Observations
1 1 1 (1) 106, 93, 116
2 1 1 a 198, 200, 214
1 2 1 b 197, 202, 185
2 2 1 ab 329, 331, 307
1 1 2 c 149, 169, 135
2 1 2 ac 243, 247, 220
1 2 2 bc 255, 230, 252
2 2 2 abc 383, 360, 364

a. After obtaining cell totals compute estimates of
b. Use the cell totals along with Yates’s method to compute the effect contrasts and sums of squares. Then construct an ANOVA table and test all appropriate hypotheses using
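
The repeated sum-and-difference passes of the Yates algorithm can be sketched as follows. This is a minimal illustration, assuming the eight conditions are listed in standard order, (1), a, b, ab, c, ac, bc, abc, as in the table above; the function name `yates` and the dictionary layout are hypothetical. Each effect sum of squares is taken as contrast² divided by (8 × 3), the total number of observations.

```python
# Observations for each condition, in standard (Yates) order.
observations = {
    "(1)": [106, 93, 116],
    "a":   [198, 200, 214],
    "b":   [197, 202, 185],
    "ab":  [329, 331, 307],
    "c":   [149, 169, 135],
    "ac":  [243, 247, 220],
    "bc":  [255, 230, 252],
    "abc": [383, 360, 364],
}


def yates(totals):
    """Return the final Yates column (grand total, then effect contrasts)."""
    col = list(totals)
    n = len(col)
    n_factors = n.bit_length() - 1      # n is assumed to be a power of 2
    for _ in range(n_factors):
        sums = [col[i] + col[i + 1] for i in range(0, n, 2)]
        diffs = [col[i + 1] - col[i] for i in range(0, n, 2)]
        col = sums + diffs              # top half: pair sums; bottom half: pair differences
    return col


cell_totals = [sum(obs) for obs in observations.values()]
contrasts = yates(cell_totals)

reps = 3
labels = ["total", "A", "B", "AB", "C", "AC", "BC", "ABC"]
ss = {
    label: contrast ** 2 / (len(cell_totals) * reps)
    for label, contrast in zip(labels[1:], contrasts[1:])
}

for label, contrast in zip(labels, contrasts):
    print(label, contrast, round(ss[label], 2) if label in ss else "")
```

The first entry of the final column is the grand total; the remaining entries are the effect contrasts in the same standard order as the conditions.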

14. Data from an experiment to assess the effects of vibration (A), temperature cycling (B), altitude cycling (C), and temperature for altitude cycling and firing (D) on thrust duration are shown below. Use Yates’s method to obtain sums of squares and the ANOVA table. Then assume that three- and four-factor interactions are absent, pool the corresponding sums of squares to obtain an estimate of $\sigma^2$, and test all appropriate hypotheses at level .05.

21.60 21.60 11.54 11.50
21.09 22.17 11.14 11.32

21.60 21.86 11.75 9.82
19.57 21.85 11.69 11.18

15. In an experiment involving four factors (A,B,C, and D) and four blocks show that at least one main effect or two-factor interaction effect must be confounded with the block effect.

a. In a $2^4$ experiment, suppose two blocks are to be used, and it is decided to confound the ABCD interaction with the block effect. Which treatments should be carried out in the first block [containing the treatment (1)], and which treatments are allocated to the second block?
b. In an experiment to investigate niacin retention in vegetables as a function of cooking temperature (A), sieve size (B), type of processing (C), and cooking time (D), each factor was held at two levels. Two blocks were used, with the allocation of blocks as given in part (a) to confound only the ABCD interaction with blocks. Use Yates’s procedure to obtain the ANOVA table for the accompanying data.

Treatment Treatment
(1) 91 d 72
b 92 bd 68
ab 94 abd 79
c 86 cd 69
ac 83 acd 75
bc 85 bcd 72
abc 90 abcd 71

c. Assume that all three-way interaction effects are absent, so that the associated sums of squares can be combined to yield an estimate of $\sigma^2$, and carry out all appropriate tests at level .05.

Chapter 12

COMPLETION

1. If y = 2x + 5, then y __________ by __________ when x increases by 1.

2. If y = -2x – 8, then the y-intercept is __________.

3. In general, the variable whose value is fixed by the experimenter will be denoted by x and will be called the independent, predictor, or __________ variable. For fixed x, the second variable will be random; we denote this random variable and its observed value by Y and y, respectively, and refer to it as the dependent or __________ variable.

4. A first step in a regression analysis involving two variables is to construct a __________. In such a plot, each (x,y) is represented as a point plotted on a two-dimensional coordinate system.

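5. The simple linear regression model is $Y = \beta_0 + \beta_1 x + \varepsilon$, where $\varepsilon$ is a random variable assumed to be __________ distributed, with $E(\varepsilon) = 0$ and $V(\varepsilon) = \sigma^2$.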

6. The estimated regression line or least squares line for the simple linear regression model is the line whose equation is given by __________.

7. If then the least squares estimate of the slope coefficient of the true regression line = __________.

8. If then the least squares estimate of the slope coefficient of the true regression line = __________.

9. If then the least squares estimate of the intercept of the true regression line = __________.

10. The vertical deviations from the estimated regression line are referred to as the __________.

11. When the estimated regression line is obtained via the principle of least squares, the sum of the residuals (i = 1, 2, …, n) should in theory be __________.

12. In a simple linear regression problem, the following statistics are given:

Then, the error sum of squares is __________.

13. In simple linear regression analysis, the __________, denoted by __________, can be interpreted as a measure of how much variability in y is left unexplained by the model – that is, how much cannot be attributed to a linear relationship.

14. In simple linear regression analysis, a quantitative measure of the total amount of variation in observed y values is given by the __________, denoted by __________.

15. If SSE = 36 and SST = 500, then the proportion of total variation that can be explained by the simple linear regression model is __________.

16. In simple linear regression analysis, SST is the total sum of squares, SSE is the error sum of squares, and SSR is the regression sum of squares. The coefficient of determination is given by

17. Since the mean of $\hat\beta_1$ is $E(\hat\beta_1) = \beta_1$, $\hat\beta_1$ is an __________ estimator of $\beta_1$.

18. In the simple linear regression model $Y = \beta_0 + \beta_1 x + \varepsilon$, the quantity $\varepsilon$ is a random variable, assumed to be normally distributed with $E(\varepsilon) = 0$ and $V(\varepsilon) = \sigma^2$. The estimated standard error of $\hat\beta_1$ (the least squares estimate of $\beta_1$), denoted by $s_{\hat\beta_1}$, is __________ divided by __________, where .

19. In the simple linear regression model $Y = \beta_0 + \beta_1 x + \varepsilon$, the quantity $\varepsilon$ is a random variable, assumed to be normally distributed with $E(\varepsilon) = 0$ and $V(\varepsilon) = \sigma^2$. The estimator $\hat\beta_1$ has a __________ distribution, because it is a linear function of independent __________ random variables.

20. The assumptions of the simple linear regression model imply that the standardized variable $T = (\hat\beta_1 - \beta_1)/S_{\hat\beta_1}$ has a t distribution with __________ degrees of freedom.

21. A 100(1 – $\alpha$)% confidence interval for the slope of the true regression line is __________.

22. Given that , and n = 15, the 95% confidence interval for the slope of the true regression line (__________,__________).

23. The t critical value for a confidence level of 90% for the slope of the regression line, based on a sample of size 20, is t = __________.

24. In a simple linear regression, the most commonly encountered pair of hypotheses about $\beta_1$ is $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$. A test of these two hypotheses is often referred to as the __________.

25. In testing $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$, the test statistic value is the t ratio t = __________ divided by __________.

26. In testing using a sample of 15 observations, the rejection region for .05 level test is either __________ or __________.

27. In testing using a sample of 18 observations, the rejection region for .025 level test is __________.

28. The null hypothesis $H_0: \beta_1 = 0$ can be tested against $H_a: \beta_1 \neq 0$ by constructing an ANOVA table and rejecting $H_0$ at level of significance $\alpha$ if the test statistic value f __________, where n is the sample size.

29. In testing $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$, the t test statistic value is found to be t = 2.15. Should the null hypothesis be tested by constructing an ANOVA table, the F test would result in a test statistic value f = __________.
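
The link between the t test and the ANOVA F test for the slope can be sketched with the standard simple linear regression identities. The notation $S_{xx} = \sum (x_i - \bar{x})^2$, $s^2 = SSE/(n-2)$, and $SSR = \hat\beta_1^2 S_{xx}$ is assumed here rather than taken from the question itself:

```latex
f = \frac{SSR}{SSE/(n-2)}
  = \frac{\hat\beta_1^{2} S_{xx}}{s^{2}}
  = \left( \frac{\hat\beta_1}{s/\sqrt{S_{xx}}} \right)^{2}
  = t^{2},
\qquad \text{so } t = 2.15 \text{ corresponds to } f = (2.15)^{2} = 4.6225 .
```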

30. Both the confidence interval for $\mu_{Y \cdot x^*}$, the expected value of Y when $x = x^*$, and the prediction interval for a future Y observation to be made when $x = x^*$ are __________ for an $x^*$ near $\bar{x}$ than for an $x^*$ far from $\bar{x}$.

31. Let $\hat{Y} = \hat\beta_0 + \hat\beta_1 x^*$, where $x^*$ is some fixed value of x. Then the mean value of $\hat{Y}$ is __________.

32. If the confidence interval . For the expected value of Y when is computed both for x = a and for x = b to obtain joint confidence intervals for then the joint confidence coefficient on the resulting pair of intervals is at least __________ %.

33. The validity of joint or simultaneous confidence intervals for the expected value of Y when rests on a probability result called the __________ inequality, so the joint confidence intervals are referred to as __________ intervals.

34. A confidence interval refers to a parameter, or population characteristic, whose value is fixed but unknown to us. In contrast, a future value of Y is not a parameter but instead a random variable; for this reason we refer to an interval of plausible values for a future Y as a __________ rather than a confidence interval.

35. The __________ is a measure of how strongly related two variables x and y are in a sample.

36. Given n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, if large x’s are paired with large y’s and small x’s are paired with small y’s, then a __________ relationship between the variables is implied. Similarly, it is natural to speak of x and y having a __________ relationship if large x’s are paired with small y’s and small x’s are paired with large y’s.

37. The value of the sample correlation coefficient r is always between __________ and __________.

38. The sample correlation coefficient r equals 1 if and only if all pairs lie on a straight line with __________ slope.

39. The sample correlation coefficient r equals -1 if and only if all pairs lie on a straight line with __________ slope.

40. If the sample correlation coefficient r equals -.80, then the value of the coefficient of determination is __________.

41. If then the sample correlation coefficient r equals __________.

42. A reasonable rule of thumb is to say that the correlation is weak if __________ __________, strong if __________ __________, and moderate otherwise.

43. When $H_0: \rho = 0$ is true, the test statistic $T = R\sqrt{n-2}/\sqrt{1-R^2}$ has a t distribution with __________ degrees of freedom, where n is the sample size.

44. In testing using a sample of size 25, the test statistic value is found to be t = 2.50. The corresponding P-value for the test is __________, and we __________ when

MULTIPLE CHOICE

1. Which of the following statements are not true?
a. The objective of regression analysis is to exploit the relationship between two (or more) variables so that we can gain information about one of them through knowing values of the other(s).
b. Saying that variables x and y are deterministically related means that once we are told the value of x, the value of y is completely specified.
c. Regression analysis is the part of statistics that deals with investigation of the relationship between two or more variables related in a deterministic fashion.
d. All of the above statements are true.
e. None of the above statements are true.

2. Which of the following statements are true?
a. The simplest deterministic mathematical relationship between two variables x and y is a linear relationship $y = \beta_0 + \beta_1 x$.
b. The set of pairs (x, y) for which $y = \beta_0 + \beta_1 x$ determines a straight line with slope $\beta_1$ and y-intercept $\beta_0$.
c. The slope of a line is the change in y per a 1-unit increase in x.
d. The y-intercept of a line is the height at which the line crosses the vertical axis and is obtained by setting x = 0 in the equation.
e. All of the above statements are true.

3. Which of the following statements are not true if y = -3x + 7?
a. The y-intercept is 7
b. y decreases by 3 when x increases by 4
c. y decreases by 3 when x increases by 1
d. The slope of the line is -3
e. All of the above statements are not true.

4. Which of the following statements are not true?
a. In regression analysis, the independent variable is also referred to as the predictor or explanatory variable.
b. In regression analysis, the dependent variable is also referred to as the response variable.
c. A first step in a regression analysis involving two variables is to construct a scatter plot.
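d. The simple linear regression model is $Y = \beta_0 + \beta_1 x + \varepsilon$, where the quantity $\varepsilon$ is a random variable, assumed to be normally distributed with $E(\varepsilon) = 0$ and $V(\varepsilon) = \sigma^2$.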
e. All of the above statements are true.

5. The simple linear regression model is where is a random variable assumed to be normally distributed with Let denote a particular value of the independent variable x. Which of the following identities are true regarding the expected or mean value of Y when ?
a.
b.
c.
d.
e.

6. The simple linear regression model is is a random variable assumed to be normally distributed with denote a particular value of the independent variable x. Which of the following identities are true regarding the variance of Y when ?
a.
b.
c.
d.
e.

7. Which of the following statements are true?
a. The true regression line is the line of mean values.
b. The height of the true regression line above any particular x value is the expected value of Y for that value of x.
c. The slope of the true regression line is interpreted as the expected change in Y associated with a 1-unit increase in the value of x.
d. The equation states that the amount of variability in the distribution of Y values is the same at each different value of x (homogeneity of variance).
e. All of the above statements are true.

8. In the simple linear regression model $Y = \beta_0 + \beta_1 x + \varepsilon$, which of the following statements are not required assumptions about the random error term $\varepsilon$?
a. The expected value of $\varepsilon$ is zero.
b. The variance of $\varepsilon$ is the same for all values of the independent variable x.
c. The error term is normally distributed.
d. The values of the error term are independent of one another.
e. All of the above are required assumptions about $\varepsilon$.

9. A procedure used to estimate the regression parameters and to find the least squares line which provides the best approximation for the relationship between the explanatory variable x and the response variable Y is known as the
a. least squares method
b. best squares method
c. regression analysis method
d. coefficient of determination method
e. prediction analysis method

10. The principle of least squares results in values of $\hat\beta_0$ and $\hat\beta_1$ that minimize the sum of squared deviations between
a. the observed values of the explanatory variable x and the estimated values
b. the observed values of the response variable y and the estimated values
c. the observed values of the explanatory variable x and the response variable y
d. the observed values of the explanatory variable x and the response values
e. the estimated values of the explanatory variable x and the observed values of the response variable y

11. If then the least squares estimate of the slope coefficient of the true regression line is
a. 11.314
b. 8.944
c. 1.600
d. 0.625
e. cannot be determined from the given information

12. If then the least squares estimate of the slope coefficient of the true regression line is
a. 18.75
b. 28.42
c. 9.15
d. 9.76
e. 10.50

13. If then the least squares estimate of the slope coefficient of the true regression line is
a. 3.60
b. 0.75
c. 1.33
d. 4.80
e. 1.68

14. Which of the following statements are not true regarding the normal equations

?
a. The normal equations are linear in the unknowns .
b. The least squares estimates are always the unique solution to the system of normal equations.
c. Provided that at least two of the values are different, the least squares estimates are the unique solution to the system of normal equations.
d. The quantity is not needed to solve the system of normal equations.
e. All of the above statements are true.

15. Which of the following statements are true?
a. Before the least squares estimates are computed, a scatter plot should be examined to see whether a linear probabilistic model is plausible.
b. For a fixed (the height of the estimated regression line above ) gives either a point estimate of the expected value of Y when or a point prediction of the Y value that will result from a single new observation made at .
c. The least squares regression line should not be used to make a prediction for an x value much beyond the range of the data x values.
d. The residuals are the vertical deviations from the estimated regression line.
e. All of the above statements are true.

16. Which of the following statements are not true?
a. The predicted value is the value of y that we would predict or expect when using the estimated regression line with
b. The predicted value is the height of the estimated regression line above the value for which the observation was made.
c. The residual is the difference between the observed and the predicted
d. If the residuals are all large in magnitude, then much of the variability in observed y values appears to be due to the linear relationship between x and y, whereas many small residuals suggest quite a bit of inherent variability in y relative to the amount due to the linear relation.
e. All of the above statements are true.

17. The quantity in the simple linear regression model is a random variable, assumed to be normally distributed with Based on 20 observations, if the residual sum of squares is 8, then the estimated standard deviation is
a. 2.500
b. 0.400
c. 0.667
d. 0.444
e. None of the above answers are correct.

18. Which of the following statements are not true?
a. The total sum of squares is the sum of squared deviations about the sample mean of the observed y values.
b. The error sum of squares is the sum of squared deviations about the least squares line
c. The ratio of the error sum of squares to the total sum of squares is the proportion of total variation that cannot be explained by the simple linear regression model.
d. The sum of squared deviations about the least squares regression line is always smaller than the sum of squared deviations about any other line.
e. All of the above statements are true.

19. If the error sum of squares is 12 and the total sum of squares is 400, then the proportion of observed y variation explained by the simple linear regression model is
a. 0.030
b. 0.173
c. 0.970
d. 0.985
e. None of the above answers are correct.
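
The arithmetic behind questions 17 and 19 uses two standard formulas: the estimate of σ is the square root of SSE/(n – 2), and the proportion of explained variation is 1 – SSE/SST. A minimal sketch in Python, with illustrative function names that are not part of the test bank:

```python
import math


def estimated_sigma(sse, n):
    """Estimate of sigma in simple linear regression: sqrt(SSE / (n - 2))."""
    return math.sqrt(sse / (n - 2))


def coefficient_of_determination(sse, sst):
    """Proportion of observed y variation explained by the model: 1 - SSE/SST."""
    return 1 - sse / sst


s = estimated_sigma(8, 20)                   # question 17: SSE = 8, n = 20
r2 = coefficient_of_determination(12, 400)   # question 19: SSE = 12, SST = 400
print(round(s, 3), round(r2, 3))
```

The divisor n – 2 reflects the two parameters (intercept and slope) estimated before the residuals are computed.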

20. Which of the following statements are not correct?
a. The coefficient of determination, denoted by $r^2$, is interpreted as the proportion of observed y variation that cannot be explained by the simple linear regression model.
b. The higher the value of the coefficient of determination, the more successful is the simple linear regression model in explaining y variation.
c. If the coefficient of determination is small, an analyst will usually want to search for an alternative model (either a nonlinear model or a multiple regression model that involves more than a single independent variable).
d. The coefficient of determination can be calculated as the ratio of the regression sum of squares (SSR) to the total sum of squares.
e. All of the above statements are correct.

21. The quantity in the simple linear regression model is a random variable, assumed to be normally distributed with The estimated standard deviation is given by
a. SSE / (n – 2)
b.
c.
d.
e.

22. In simple linear regression analysis, if the residual sum of squares is zero, then the coefficient of determination must be
a. -1
b. 0
c. between -1 and zero
d. 1
e. between -1 and 1

23. In testing $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$ using a sample of 20 observations, the rejection region for a .01 level of significance test is
a. $t \le -2.878$
b. $t \ge 2.878$
c. $-2.878 \le t \le 2.878$
d. either $t \ge 2.878$ or $t \le -2.878$
e. t = 0

24. Which of the following statements are not true?
a. The slope of the population regression line is the true average change in the independent variable x associated with a 1 – unit increase in the dependent variable y.
b. The slope of the least squares line, $\hat\beta_1$, gives a point estimate of the slope of the population regression line.
c. Inferences about the slope of the population regression line are based on thinking of the slope of the least squares line as a statistic and investigating its sampling distribution.
d. All of the above statements are true
e. None of the above statements are true.

25. Which of the following statements are true?
a. The denominator of the slope of the least squares line is $S_{xx} = \sum (x_i - \bar{x})^2$, which is a constant since it depends only on the $x_i$’s and not on the $Y_i$’s.
b. The slope of the least squares line is a linear function of the “independent” random variables $Y_1, Y_2, \ldots, Y_n$, each of which is normally distributed.
c. The distribution of the slope of the least squares line is always centered at the value of the slope of the population regression line.
d. All of the above statements are true.
e. None of the above statements are true.

26. Which of the following statements are not true?
a. The slope of the least squares line is an unbiased estimator of the slope coefficient of the true regression line.
b. The variance of the least squares line equals the variance of the random error divided by , where
c. Values of $x_i$ all close to one another imply a highly variable estimator of the slope of the true regression line.
d. Values of $x_i$ that are quite spread out result in a more precise estimator of the slope of the true regression line.
e. All of the above statements are true

27. In testing using a sample of 22 observations, the test statistic value is found to be t = -2.528. The approximate P-value of the test is
a. .01
b. .02
c. .025
d. .05
e. .99

28. Which of the following statements are true?
a. The assumptions of the simple linear regression model imply that the standardized variable $T = (\hat\beta_1 - \beta_1)/S_{\hat\beta_1}$ has a t distribution with n – 2 degrees of freedom.
b. The estimated standard error of $\hat\beta_1$, namely $s_{\hat\beta_1}$, will tend to be small when there is little variability in the distribution of $\hat\beta_1$ and large otherwise.
c. There is an estimated standard error for the statistic $\hat\beta_0$ from which a confidence interval for the intercept $\beta_0$ of the population regression line can be calculated.
d. The most commonly encountered pair of hypotheses about the slope of the population regression line is $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$.
e. All of the above statements are true.

29. Which of the following statements are not true?
a. The model utility test is the test of in which case the test statistic value is the t ratio t = .
b. The null hypothesis can be tested against the alternative hypothesis by constructing an ANOVA table and rejecting if the test statistic value , when n is the sample size.
c. The simple linear regression model should not be used for further inferences (estimates of mean value or predictions of future values) unless the model utility test results in acceptance of for a suitably small significance level .
d. All of the above statements are true.
e. None of the above statements are true.

30. Which of the following statements are not true?
a. , where is a specified value of the independent variable x, can be regarded either as a point estimate of (the expected or true average value of Y when ) or as a prediction of the Y value that will result from a single observation made when .
b. Before we obtain sample data, both and are subject to sampling variability – that is, they are both statistics whose values will vary from sample to sample.
c. A confidence interval for a mean y value in regression is based on properties of the sampling distribution of the statistic .
d. All of the above statements are true.
e. None of the above statements are true.

31. Which of the following statements are not true?
a. Let where is some fixed value of x, then the mean value of is E( ) = .
b. is an unbiased estimator for (i.e., for )
c. The estimation for $\mu_{Y \cdot x^*}$ is more precise when $x^*$ is near the center of the $x_i$’s than when it is far from the x values at which observations have been made.
d. All of the above statements are true.
e. None of the above statements are true.

32. Which of the above statements are not true?
a. A t variable obtained by standardizing leads to a confidence interval and test procedure concerning (the expected value of Y when ).
b. The variable T = has a t distribution with n – 1 degrees of Freedom, where n is the sample size and is a specified value of the independent variable x.
c. A 100 confidence interval for ; the expected value of Y when , is given by where n is the sample size.
d. All of the above statements are true.
e. None of the above statements are true.

33. Which of the following statements are true?
a. The confidence interval for $\mu_{Y \cdot x^*}$, the expected value of Y when $x = x^*$, is centered at the point estimate for $\mu_{Y \cdot x^*}$ and extends out to each side by an amount that depends on the confidence level and on the extent of variability in the estimator on which the point estimate is based.
b. In some situations, a confidence interval is desired not just for a single x value but for two or more x values.
c. The joint or simultaneous confidence level for a set of K Bonferroni intervals is guaranteed to be at least 100(1 – K$\alpha$)%.
d. We refer to an interval of plausible values for a future Y as a prediction interval rather than a confidence interval, since a future value of Y is a random variable.
e. All of the above statements are true.

34. A 95% confidence interval for the expected value of Y is constructed first for x = 2, then for x = 3, then for x = 4, and finally for x = 5. This yields a set of four confidence intervals for which the joint or simultaneous confidence level is guaranteed to be at least
a. 95%
b. 90%
c. 85%
d. 80%
e. 75%
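
The Bonferroni bound used in question 34 is simple enough to compute directly. The sketch below assumes K intervals, each built at individual level 100(1 – α)%; the function names are illustrative. The second helper shows the reverse calculation, the per-interval α needed to guarantee a chosen joint level.

```python
def bonferroni_joint_level(k, alpha):
    """Guaranteed lower bound (in percent) on the joint confidence level
    of k intervals, each constructed at level 100(1 - alpha)%."""
    return 100 * (1 - k * alpha)


def per_interval_alpha(k, joint_alpha):
    """Individual alpha that guarantees a joint level of at least 100(1 - joint_alpha)%."""
    return joint_alpha / k


joint = bonferroni_joint_level(4, 0.05)   # four 95% intervals, as in question 34
needed = per_interval_alpha(4, 0.05)      # alpha per interval for a joint 95% guarantee
print(joint, needed)
```

The bound is conservative: the actual joint confidence level is at least the returned value and may be higher.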

35. The test statistic value for testing is found to be z = 1.52. The corresponding P-value for the test is
a. .9357
b. .0643
c. .1286
d. .4357
e. .3714

36. In testing the rejection region for .05 level of significance test is
a. $z \ge 1.645$
b. $z \le -1.645$
c. $-1.645 \le z \le 1.645$
d. either $z \ge 1.645$ or $z \le -1.645$
e. z = 1.96

37. Which of the following statements are true?
a. When the relationship between x and y is positive, $S_{xy}$ will be positive; a negative relationship implies that $S_{xy}$ will be negative.
b. By changing the units of measurement of either x or y, $S_{xy}$ can be made either arbitrarily large in magnitude or arbitrarily close to zero.
c. A reasonable condition to impose on any measure of how strongly x and y are related is that the calculated measure should not depend on the particular units used to measure them. This condition is achieved by using the sample correlation coefficient r.
d. All of the above statements are true.
e. None of the above statements are true.

38. Which of the following statements are not true about the sample correlation coefficient r?
a. The value of r depends on which of the two variables under study is labeled x and which is labeled y.
b. The value of r is independent of the units in which x and y are measured.
c. The value of r is always between -1 and +1, inclusive.
d. The value of r = 1 if all pairs lie on a straight line with positive slope.
e. The value of r = -1 if all pairs lie on a straight line with negative slope.

39. Which of the following statements are not true?
a. The proportion of variation in the dependent variable explained by fitting the simple linear regression model does not depend on which variable is treated as the dependent variable.
b. A value of the sample correlation coefficient r near 0 is not evidence of the lack of a strong relationship, but only the absence of a linear relation, so that such a value of r must be interpreted with caution.
c. It may surprise you that a value of the sample correlation coefficient r = .5 is considered weak, but $r^2 = .25$ implies that in a regression of y on x, only 25% of observed y variation would be explained by the model.
d. The sample correlation coefficient r can be used to make various inferences about the population correlation coefficient $\rho$.
e. The square root of the sample correlation coefficient gives the value of the coefficient of determination that would result from fitting the simple linear regression model.

40. A data set consists of 15 pairs of observations If each is replaced by and if each is replaced by then the sample correlation coefficient r
a. increases by 3/15
b. increases by 4/15
c. remains unchanged
d. decreases by 3/15
e. decreases by 4/15

41. A data set consists of 20 pairs of observations If each is replaced by and if each is replaced by then the sample correlation coefficient r
a. decreases by .05
b. decreases by .10
c. increases by .05
d. increases by .10
e. remains unchanged

42. Which of the following statements are not true?
a. The correlation coefficient r is a measure of how strongly related x and y are in the observed sample, while the correlation coefficient $\rho$ is a measure of how strongly related x and y are in the population.
b. When is a sample from a bivariate normal distribution, where are the mean and standard deviation of X, and are the mean and standard deviation of Y , then the random variable where has approximately a t distribution with 2n degrees of freedom.
c. When is true, the test statistic where has a t distribution with n -2 degrees of freedom.
d. All of the above statements are true
e. None of the above statements are true.

1. The accompanying observations on x = hydrogen concentration (ppm) using a gas chromatography method and y = concentration using a new sensor method were obtained in a recent study

x 47 62 65 70 70 78 95 100 114 118
y 38 62 53 67 84 79 93 106 117 116

x 124 127 140 140 140 150 152 164 198 221
y 127 114 134 139 142 156 149 154 200 215

Construct a scatter plot. Does there appear to be a very strong relationship between the two types of concentration measurements? Do the two methods appear to be measuring roughly the same quantity? Explain your reasoning.

2. An experiment is conducted to investigate how the behavior of mozzarella cheese varied with temperature. Consider the accompanying data on x = temperature and y = elongation (%) at failure of the cheese.

x 59 63 67 72 74 78 83
y 118 182 247 208 197 160 132

a. Construct a scatter plot in which the axes intersect at (0,0). Mark 0, 20, 40, 60, 80, and 100 on the horizontal axis and 0, 50, 100, 150, 200, and 250 on the vertical axis.
b. Construct a scatter plot in which the axes intersect at (55,100). Does this plot seem preferable to the one in part (a)? Explain your reasoning.
c. What do the plots of parts (a) and (b) suggest about the nature of the relationship between the two variables?

3. Suppose the expected cost of a production run is related to the size of the run by the equation y = 4000 + 10x. Let Y denote an observation on the cost of a run. If the variables size and cost are related according to the simple linear regression model, could it be the case that Explain.

4. Suppose that in a certain chemical process the reaction time y (hours) is related to the temperature x (°F) in the chamber in which the reaction takes place according to the simple linear regression model with equation y = 5.00 – .01x and σ = .075.

a. What is the expected change in reaction time for a 1°F increase in temperature? For a 10°F increase in temperature?
b. What is the expected reaction time when temperature is 200°F? When temperature is 250°F?
c. Suppose five observations are made independently on reaction time, each one for a temperature of 250°F. What is the probability that all five times are between 2.4 and 2.6 hours?
d. What is the probability that two independently observed reaction times for temperatures apart are such that the time at the higher temperature exceeds the time at the lower temperature?
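
The calculations in parts (b) and (c) can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the garbled quantity ".075" is the error standard deviation σ and that observations are normally distributed about the line, as the simple linear regression model requires. The helper names (normal_cdf, expected_time, prob_between) are hypothetical.

```python
import math

# Model from the exercise: y = 5.00 - .01x, with sigma = .075 (assumed to be
# the error standard deviation; the symbol is missing in the source text).
BETA0, BETA1, SIGMA = 5.00, -0.01, 0.075

def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_time(temp_f):
    """Expected reaction time (hours) at a given temperature."""
    return BETA0 + BETA1 * temp_f

def prob_between(lo, hi, temp_f):
    """P(lo < Y < hi) for one observation made at temp_f."""
    mu = expected_time(temp_f)
    return normal_cdf((hi - mu) / SIGMA) - normal_cdf((lo - mu) / SIGMA)

# Part (b): expected reaction times.
print(expected_time(200), expected_time(250))

# Part (c): five independent observations at 250, all between 2.4 and 2.6.
p_one = prob_between(2.4, 2.6, 250)
p_all_five = p_one ** 5
print(round(p_one, 4), round(p_all_five, 4))
```

Independence is what justifies raising the single-observation probability to the fifth power.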

5. The accompanying data on x = current density (mA/cm²) and y = rate of deposition appeared in a recent study. Do you agree with the claim by the article’s author that “a linear relationship was obtained from the tin-lead rate of deposition as a function of current density”? Explain your reasoning.

x 20 40 60 80
y .24 1.20 1.71 2.22

6. A scatter plot, along with the least squares line, of x = rainfall volume and y = runoff volume for a particular location was given. The accompanying values were read from the plot.

x 5 12 14 17 23 30 40 47
y 4 10 13 15 15 25 27 46

x 55 67 72 81 96 112 127
y 38 46 53 70 82 99 100

a. Does a scatter plot of the data support the use of the simple linear regression model?
b. Calculate point estimates of the slope and intercept of the population regression line.
c. Calculate a point estimate of the true average runoff volume when rainfall volume is 50.
d. Calculate a point estimate of the standard deviation σ.
e. What proportion of the observed variation in runoff volume can be attributed to the simple linear regression relationship between runoff and rainfall?
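
The quantities requested in parts (b) through (e) follow from the usual least squares formulas. The sketch below applies them to the rainfall/runoff data listed above; the function and variable names are illustrative choices, not anything given in the source.

```python
import math

# Rainfall (x) and runoff (y) values as listed in the exercise.
x = [5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127]
y = [4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100]

def fit_line(x, y):
    """Return (intercept, slope) of the least squares line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

n = len(x)
b0, b1 = fit_line(x, y)                      # part (b)
y_hat_50 = b0 + b1 * 50                      # part (c)

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e * e for e in residuals)
ybar = sum(y) / n
sst = sum((yi - ybar) ** 2 for yi in y)

s = math.sqrt(sse / (n - 2))                 # part (d): estimate of sigma
r2 = 1.0 - sse / sst                         # part (e): coefficient of determination

print(b0, b1, y_hat_50, s, r2)
```

The divisor n − 2 in the estimate of σ reflects the two parameters estimated from the data.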

7. The accompanying data was read from a graph that appeared in a recent study. The independent variable is and the dependent variable is steel weight loss (g/m²).

x 14 18 40 43 45 112
y 280 350 470 500 560 1200

a. Construct a scatter plot. Does the simple linear regression model appear to be reasonable in this situation?
b. Calculate the equation of the estimated regression line.
c. What percentage of observed variation in steel weight loss can be attributed to the model relationship in combination with variation in deposition rate?
d. Because the largest x value in the sample greatly exceeds the others, this observation may have been very influential in determining the equation of the estimated line. Delete this observation and recalculate the equation. Does the new equation appear to differ substantially from the original one (you might consider predicted values)?

8. The following summary statistics were obtained from a study that used regression analysis to investigate the relationship between pavement deflection and surface temperature of the pavement at various locations on a state highway. Here x = temperature (°F) and y = deflection adjustment factor

, ,

a. Compute and the equation of the estimated regression line. Graph the estimated line.
b. What is the estimate of expected change in the deflection adjustment factor when temperature is increased by 1°F?
c. Suppose temperature were measured in °C rather than in °F. What would be the estimated regression line? Answer part (b) for an increase of 1°C. (Hint: °F = (9/5)°C + 32; now substitute for the “old x” in terms of the “new x.”)
d. If a 200°F surface temperature were within the realm of possibility, would you use the estimated line of part (a) to predict deflection factor for this temperature? Why or why not?

9. A study reports on an investigation of methods for age determination based on tooth characteristics. With x = percentage of root with transparent dentine and y = age (years), consider the following representative data for anterior teeth:

x 15 19 31 39 41 44 47 48 55 64
y 23 52 65 55 32 60 78 59 61 60

a. Calculate a 95% CI for the expected change in age associated with a 1% increase in transparent dentine content. What does the interval suggest about usefulness of the model?
b. Carry out a test of model utility based on the P-

10. A study reports the results of a regression analysis based on n = 15 observations in which x = filter application temperature (°C) and y = % efficiency of BOD removal. Calculated quantities include

a. Test at level .01 which states that the expected increase in % BOD removal is 1 when filter application temperature increases by 1°C, against the alternative
b. Compute a 99% CI for the expected increase in % BOD removal for a 1°C increase in filter application temperature.

11. A study contains a plot of the following data pairs, where x = pressure of extracted gas (microns) and y = extraction time (min):

x 40 130 155 160 260 275 325 370 420 480
y 2.5 3.0 3.1 3.3 3.7 4.1 4.3 4.8 5.0 5.4

a. Estimate and the standard deviation of
b. Suppose the investigators had believed prior to the experiment that on average there would be an increase of .006 min. in extraction time associated with an increase of 1 micron in pressure. Use the P-value approach with a significance level of .10 to decide whether the data contradicts this prior belief.

12. An investigation of the relationship between traffic flow x (1000’s of cars per 24 hours) and lead content y of bark on trees near the highway ( dry wt) yielded the data in the accompanying table.

x 8.3 8.3 12.1 12.1 17.0 17.0 17.0 24.3 24.3 24.3 33.6
y 227 312 362 521 640 539 728 945 738 759 1263

The summary statistics are:

,
In addition, the least squares estimates are given by:
Carry out the model utility test using the ANOVA approach for the traffic flow/lead-content data of Example 12.6. Verify that it gives a result equivalent to that of the t test.

13. The simple linear regression model provides a very good fit to a data set on rainfall and runoff volume. The equation of the least squares line is

a. Use the fact that when rainfall volume is 40 m³ to predict runoff in a way that conveys information about reliability and precision. Does the resulting interval suggest that precise information about the value of runoff for this future observation is available? Explain your reasoning.
b. Calculate a PI for runoff when rainfall is 50 using the same prediction level as in part (a). What can be said about the simultaneous prediction level for the two intervals you have calculated?

14. You are told that a 95% CI for expected lead content when traffic flow is 15, based on a sample of n = 10 observations, is (462.1, 597.7). Calculate a CI with confidence level 99% for expected lead content when traffic flow is 15.

15. An experiment to measure the macroscopic magnetic relaxation time in crystals as a function of the strength of the external biasing magnetic field (KG) yielded the following data:

x 11.0 12.5 15.2 17.2 19.0 20.8
y 187 225 305 318 367 365

x 22.0 24.2 25.3 27.0 29.0
y 400 435 450 506 558

The summary statistics are and
Compute the following:

a. A 95% CI for expected relaxation time when field strength equals 18.
b. A 95% PI for future relaxation time when field strength equals 18.
c. Simultaneous confidence intervals for expected relaxation time when field strength equals 15, 18, and 20; your joint confidence coefficient should be at least 97%.

16. Infestation of crops by insects has long been of great concern to farmers and agricultural scientists. A study reports data on x = age of a cotton plant (days) and y = % damaged squares. Consider the accompanying n = 12 observations:

x 9 12 12 15 18 18
y 11 12 23 30 29 52

x 21 21 27 30 30 33
y 41 65 60 72 84 93

a. Why is the relationship between x and y not deterministic?
b. Does a scatter plot suggest that the simple linear regression model will describe the relationship between the two variables?
c. The summary statistics are
Determine the equation of the least squares line.
d. Predict the percentage of damaged squares when the age is 20 days by giving an interval of plausible values.

17. The Turbine Oil Oxidation Test (TOST) and the Rotating Bomb Oxidation Test (RBOT) are two different procedures for evaluating the oxidation stability of steam turbine oils. The accompanying observations on x = TOST time (hr) and y = RBOT time (min) for 12 oil specimens have been reported:

TOST 4200 3600 3750 3675 4050 2770
RBOT 370 340 375 310 350 200

TOST 4870 4500 3450 2700 3750 3300
RBOT 400 375 285 225 345 285

a. Calculate and interpret the value of the sample correlation coefficient r.
b. How would the value of r be affected if we had let x = RBOT time and y = TOST time?
c. How would the value of r be affected if RBOT time were expressed in hours?
d. Normal probability plots indicate that both TOST and RBOT times appear to have come from normally distributed populations. Carry out a test of hypotheses to decide whether RBOT time and TOST time are linearly related.
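
Parts (b) and (c) rest on two properties of r: it is symmetric in x and y, and it is unchanged by a change of units. The following sketch checks both numerically on the TOST/RBOT data and also computes the statistic t = r√(n − 2)/√(1 − r²) used for the test in part (d). The function name sample_r is an illustrative choice.

```python
import math

# TOST time (hr) and RBOT time (min) for the 12 oil specimens in the exercise.
tost = [4200, 3600, 3750, 3675, 4050, 2770, 4870, 4500, 3450, 2700, 3750, 3300]
rbot = [370, 340, 375, 310, 350, 200, 400, 375, 285, 225, 345, 285]

def sample_r(x, y):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

r_xy = sample_r(tost, rbot)                          # part (a)
r_yx = sample_r(rbot, tost)                          # part (b): roles swapped
r_hours = sample_r(tost, [m / 60.0 for m in rbot])   # part (c): RBOT in hours

# Part (d): test statistic for H0: rho = 0, with n - 2 degrees of freedom.
n = len(tost)
t_stat = r_xy * math.sqrt(n - 2) / math.sqrt(1.0 - r_xy ** 2)

print(r_xy, r_yx, r_hours, t_stat)
```

The three printed correlations agree up to floating-point rounding, which is the numerical counterpart of the algebraic argument the question asks for.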

18. Toughness and fibrousness of asparagus are major determinants of quality. This was the focus of a study reported in “Post-Harvest Glyphosphate Application Reduces Toughening, Fiber Content, and Lignification of Stored Asparagus Spears” (J. of the Amer. Soc. Of Horticultural Science, 1988: 569-572). The article reported the accompanying data (read from a graph) on x = shear force (kg) and y = percent fiber dry weight.

x 46 48 55 57 60 72 81 85 94
y 2.18 2.10 2.13 2.28 2.34 2.53 2.28 2.62 2.63

x 109 121 132 137 148 149 184 185 187
y 2.50 2.66 2.79 2.80 3.01 2.98 3.34 3.49 3.26

a. Calculate the value of the sample correlation coefficient. Based on this value, how would you describe the nature of the relationship between the two variables?
b. If a first specimen has a larger value of shear force than does a second specimen, what tends to be true of percent dry fiber weight for the two specimens?
c. If shear force is expressed in pounds, what happens to the value of r? Why?
d. If the simple linear regression model were fit to this data, what proportion of observed variation in percent fiber dry weight could be explained by the model relationship?
e. Carry out a test at significance level .01 to decide whether there is a positive linear association between the two variables.

19. Hydrogen content is conjectured to be an important factor in porosity of aluminum alloy castings. The accompanying data on x = content and y = gas porosity for one particular measurement technique have been reported:

x .18 .20 .21 .21 .21 .22 .23
y .46 .70 .41 .45 .55 .44 .24

x .23 .24 .24 .25 .28 .30 .37
y .47 .22 .80 .88 .70 .72 .75

MINITAB gives the following output in response to a CORRELATION command:

Correlation of Hydrogen and Porosity = 0.449

a. Test at level .05 to see whether the population correlation coefficient differs from 0.
b. If a simple linear regression analysis had been carried out, what percentage of observed variation in porosity could be attributed to the model relationship?
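
A short sketch of the arithmetic behind both parts, using only the reported correlation r = 0.449 and the n = 14 pairs listed above. The critical value 2.179 is the standard t-table entry for a two-tailed test at level .05 with 12 degrees of freedom; it is typed in by hand here rather than computed, and the function name is hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, based on n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

r, n = 0.449, 14                      # MINITAB output and number of pairs
t_stat = correlation_t_statistic(r, n)

T_CRIT = 2.179                        # table value t_{.025, 12} (entered by hand)
reject_h0 = abs(t_stat) >= T_CRIT     # part (a): two-tailed test at level .05

r_squared_pct = 100.0 * r * r         # part (b): percent of variation explained

print(round(t_stat, 3), reject_h0, round(r_squared_pct, 1))
```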

20. A sample of n = 500 (x, y) pairs was collected and a test of was carried out. The resulting P-value was computed to be .00032.

a. What conclusion would be appropriate at level of significance .001?
b. Does this small P-value indicate that there is a very strong linear relationship between x and y (a value of ρ that differs considerably from 0)? Explain.

21. A sample of n = 10,000 (x, y) pairs resulted in r = .022. Test at level .05. Is the result statistically significant? Comment on the practical significance of your analysis.

Chapter 13

COMPLETION

1. Multiple regression analysis involves building models for relating the dependent variable y to __________ or more independent variables.

2. Many statisticians recommend __________ for an assessment of model validity and usefulness. These include plotting the residuals or standardized residuals on the vertical axis versus the independent variable or fitted values on the horizontal axis.

3. If the regression parameters and are estimated by minimizing the expression , where the ’s are weights that decrease with increasing , this yields ____________ estimates.

4. The principle __________ selects and to minimize .

5. The transformation __________ is used to linearize the function

6. A function relating y to x is ___________ if by means of a transformation on x and/or y, the function can be expressed as , where is the transformed independent variable and is the transformed dependent variable.

7. For the exponential function , only the __________ variable is transformed via the transformation __________ to achieve linearity.

8. The transformation __________ of the dependent variable y and the transformation __________ of the independent variable x are used to linearize the power function

9. The transformation __________ is used to linearize the reciprocal function

10. The additive exponential and power models, and are ___________ linear.

11. The function has been found quite useful in many applications. This function is well known as the ___________function.

12. In logistic regression it can be shown that . The expression on the left-hand side of this equality is well known as the ___________.

13. The kth -degree polynomial regression model equation is , where is a normally distributed random variable with = ___________ and = ___________

14. With , the sum of squared residuals (error sum of squares) is . Hence the mean square error is MSE =__________/___________.

15. If we let , and , then SSE/SST is the proportion of the total variation in the observed y’s that is ___________ by the polynomial model.

16. If we let , and , then 1 - SSE/SST is the proportion of the total variation in the observed y’s that is __________ by the polynomial model. It is called the ____________, and is denoted by R².

17. In general, with is the error sum of squares from a kth degree polynomial, ____________ , and ____________ whenever > k.

18. If R² = .75 is the value of the coefficient of multiple determination from a cubic regression model and n = 15, then the adjusted R² value is _____________.

19. The regression coefficient in the multiple regression model is interpreted as the expected change in ___________ associated with a 1-unit increase in ___________, while ___________ are held fixed.

20. A dichotomous variable, one with just two possible categories, can be incorporated into a regression model via a ___________ or __________ variable x whose possible values 0 and 1 indicate which category is relevant for any particular observation.

21. Incorporating a categorical variable with 5 possible categories into a multiple regression model requires the use of __________ dummy variables.

22. Inferences concerning a single parameter in a multiple regression model with 5 predictors and 25 observations are based on a standardized variable T which has a t distribution with ___________ degrees of freedom.

23. If a data set on at least five predictors is available, regressions involving all possible subsets of the predictors involve at least __________ different models.

24. A multiple regression model with k predictors will include __________ regression parameters, because β0 will always be included.

25. If is the error sum of squares computed from a model with k predictors and n observations, then the mean squared error for the model is = __________/__________.

26. When the number of predictors is too large to allow for an explicit or implicit examination of all possible subsets, several alternative selection procedures generally will identify good models. The simplest such procedure is the __________, known as the BE method.

27. In many multiple regression data sets, the predictors are highly interdependent. When the sample values can be predicted very well from the other predictor values, for at least one predictor, the data is said to exhibit __________.

MULTIPLE CHOICE

1. Which of the following statements are true?
a. One way to study the fit of a model is to superimpose a graph of the best-fit function on the scatter plot of the data.
b. An effective approach to assessment of model adequacy is to compute the fitted or predicted values and the residuals , then plot various functions of these computed quantities, and examine the plots either to confirm our choice of model or for indications that the model is not appropriate.
c. Multiple regression analysis involves building models for relating the dependent variable y to two or more independent variables.
d. All of the above statements are true.
e. None of the above statements are true.

2. Which of the following statements are not true?
a. If a particular standardized residual is 1.5, then the residual itself is 3 estimated standard deviations larger than what would be expected from fitting the correct model.
b. Plotting the fitted or predicted values on the vertical axis versus the actual values on the horizontal axis is a diagnostic plot that can be used for assessing model validity and usefulness.
c. A normal probability plot of the standardized residuals is a basic plot that many statisticians recommend for an assessment of model validity and usefulness.
d. All of the above statements are true.
e. None of the above statements are true.

3. Which of the following statements are not true?
a. Provided that the model is correct, no residual plot should exhibit distinct patterns.
b. Provided that the model is correct, the residuals should be randomly distributed about 0 according to a normal distribution, so all but a very few standardized residuals should lie between -2 and +2 ( i.e., all but a few residuals are within 2 standard deviations of their expected value 0 ).
c. If we plot the fitted or predicted values on the vertical axis versus the actual values on the horizontal axis, and the plot yields points close to the line, then the estimated regression function gives accurate predictions of the values actually observed.
d. All of the above statements are true.
e. None of the above statements are true.

4. Quite frequently, residual plots as well as other plots of the data will suggest some difficulties or abnormality in the data. Which of the following statements are not considered difficulties?
a. A nonlinear probabilistic relationship between x and y is appropriate.
b. The variance of the error term (and of Y) is a constant σ².
c. The error term does not have a normal distribution.
d. The selected model fits the data well except for very few discrepant or outlying data values, which may have greatly influenced the choice of the best-fit function.
e. One or more relevant independent variables have been omitted from the model.

5. A multiple regression model has
a. One independent variable.
b. Two dependent variables
c. Two or more dependent variables.
d. Two or more independent variables.
e. One independent variable and one independent variable.

6. In multiple regression models, the error term is assumed to have:
a. a mean of 1.
b. a standard deviation of 1.
c. a variance of 0.
d. negative values.
e. normal distribution.

7. Which of the following statements are not true?
a. The exponential function is intrinsically linear.
b. The power function can be linearized by the transformations and .
c. The function is intrinsically linear.
d. All of the above statements are true.
e. None of the above statements are true.

8. Which of the following statements are true?
a. The function is intrinsically linear.
b. The reciprocal function can be linearized by the transformation .
c. For an exponential function relationship , only y is transformed to achieve linearity.
d. All of the above statements are true.
e. None of the above statements are true.

9. Which of the following statements are not true?
a. The function is intrinsically linear.
b. Intrinsically linear functions lead directly to probabilistic models which, though not linear in x as a function, have parameters whose values are easily estimated using ordinary least squares.
c. The multiplicative exponential model is an intrinsically linear probabilistic model.
d. All of the above statements are true.
e. None of the above statements are true.

10. Which of the following statements are not true?
a. In analyzing transformed data, one should keep in mind that if a transformation on y has been made and one wishes to use the standard formulas to test hypotheses or construct confidence intervals, the transformed error term should be at least approximately normally distributed.
b. When y is transformed, the coefficient of determination value from the resulting regression refers to variation in the ’s explained by the original (non-transformed) regression model.
c. The additive exponential and power models, and , respectively, are not intrinsically linear.
d. When the transformed model satisfies all required assumptions, the method of least squares yields best estimates of the transformed parameters. However, estimates of the original parameters may not be best in any sense, though they will be reasonable.
e. All of the above statements are true.

11. The coefficient of multiple determination R² is
a. SSE/SST
b. SST/SSE
c. 1-SSE/SST
d. 1-SST/SSE
e. ( SSE + SST ) / 2

12. Which of the following statements are true?
a. The kth-degree polynomial model with k large is quite unrealistic in virtually all applications, and in most applications k = 2 (quadratic) or k = 3 (cubic) is appropriate.
b. The objective of regression analysis is to find a model that is both simple (relatively few parameters) and provides a good fit to the data.
c. A higher-degree polynomial may not specify a better model than a lower-degree model despite its higher coefficient of multiple determination value.
d. All of the above statements are true.
e. None of the above statements are true.

13. Which of the following statements are not true?
a. To balance the cost of using more parameters against the gain in the coefficient of multiple determination R², many statisticians use the adjusted R².
b. It is always true whenever for any kth-degree polynomial regression model.
c. It is always true > whenever for any kth -degree polynomial regression model.
d. All of the above statements are true.
e. None of the above statements are true.

14. If the value of the coefficient of multiple determination R² is .80 for a quadratic regression model and n = 11, then the adjusted R² value is
a. .75
b. .80
c. .85
d. .90
e. .95
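
The adjusted coefficient asked for here can be computed from R², the sample size n, and the number of predictors k (k = 2 for a quadratic model) using the standard formula: adjusted R² = 1 − [(n − 1)/(n − (k + 1))](1 − R²). A minimal sketch, with a hypothetical function name:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fit to n observations."""
    return 1.0 - (n - 1) / (n - (k + 1)) * (1.0 - r2)

# Quadratic model (k = 2), n = 11, R^2 = .80, as in the question above.
value = adjusted_r2(0.80, 11, 2)
print(round(value, 4))
```

The same function applies to any polynomial or multiple regression model once k and n are known; the adjustment always lowers the value when 0 < R² < 1 and k ≥ 1.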

15. For the quadratic model with regression function , the parameters characterize the behavior of the function near
a. x = 2.0
b. x = 1.5
c. x = 1.0
d. x = .05
e. x = 0.0

16. For a multiple regression model, , and , then the proportion of the total variation in the observed ’s that is not explained by the model is
a. .76
b. .24
c. 310
d. 190
e. .52

a. The value of the error term
b. The number of dependent variables in the model
c. The number of parameters in the model
d. The number of outliers
e. The level of significance

18. Which of the following statements are not true?
a. In multiple regression, the objective is to build a probabilistic model that relates a dependent variable y to more than one independent or predictor variable.
b. , where E ( ) = 0 and V( ) = is the equation of the general additive multiple regression model.
c. The coefficient in the multiple regression model is interpreted as the expected change in Y when is held constant (fixed).
d. All of the above statements are true.
e. None of the above statements are true.

19. Which of the following statements are true?
a. In general, it is not only permissible for some independent or predictor variables to be mathematical functions of others, but also often highly desirable, in the sense that the resulting model may be much more successful in explaining variation in y than any model without such predictors.
b. Polynomial regression is indeed a specific case of multiple regression.
c. The coefficient in the multiple regression model is interpreted as the expected change in Y with a 1-unit increase in , when are held fixed.
d. All of the above statements are true.
e. None of the above statements are true.

20. For the case of two independent variables and , which of the following statements are not true?
a. is the first-order no-interaction model
b. is the second-order no-interaction model
c. is the model with first-order predictors and interaction
d. is the complete second-order or full quadratic model
e. All of the above statements are true.

21. Which of the following statements are not true?
a. The way to incorporate a qualitative (categorical) variable with three possible categories into a regression model is to define a single numerical variable with coded values such as 0, 1, and 2 corresponding to the three categories.
b. Incorporating a categorical variable with c possible categories into a multiple regression model requires the use of c-1 indicator variables.
c. The positive square root of the coefficient of multiple determination is called the multiple correlation coefficient R.
d. All of the above statements are true.
e. None of the above statements are true.

22. Which of the following statements are true?
a. The proportion of total variation explained by the multiple regression model is R², the coefficient of multiple determination.
b. The coefficient of multiple determination is often adjusted for the number of parameters (k+1) in the model by the formula
c. With multivariate data, there is no preliminary picture analogous to a scatter plot to indicate whether a particular multiple regression model will be judged useful.
d. The model utility test in multiple regression involves testing versus (i = 1, 2, …, k).
e. All of the above statements are true.

23. In multiple regression analysis with n observations and k predictors (or equivalently k+1 parameters), inferences concerning a single parameter are based on the standardized variable , which has a t-distribution with degrees of freedom equal to
a. n-k+1
b. n-k
c. n-k-1
d. n+k-1
e. n+k+1

24. Which of the following statements are not true?
a. The model utility F test is appropriate for testing whether there is useful information about the dependent variable in any of the k predictors (i.e., whether ).
b. If we let be the sum of squared residuals for the full multiple regression model with k predictors and be the corresponding sum for the reduced model with l predictors (l < k), then .
c. The standardized residuals in multiple regression result from dividing each residual by its estimated standard deviation; the formula for these standard deviations is substantially more complicated than in the case of simple linear regression.
d. All of the above statements are true.
e. None of the above statements are true.

25. A first-order no-interaction model has the form . As increases by 1-unit, while holding fixed, then y will be expected to
a. increase by 10
b. increase by 5
c. increase by 3
d. decrease by 3
e. decrease by 6

26. Incorporating a categorical variable with 4 possible categories into a multiple regression model requires the use of
a. 4 indicator variables
b. 3 indicator variables
c. 2 indicator variables
d. 1 indicator variable
e. no indicator variables at all
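
Indicator (dummy) coding of a categorical predictor can be illustrated with a small sketch. One category is chosen as the baseline and receives all zeros, so a variable with c categories needs c − 1 indicator columns. The category labels, the choice of baseline, and the function name below are hypothetical and serve only as an illustration.

```python
def indicator_coding(values, baseline):
    """Code a categorical variable using one 0/1 column per non-baseline level."""
    levels = [lv for lv in sorted(set(values)) if lv != baseline]
    rows = [[1 if v == lv else 0 for lv in levels] for v in values]
    return levels, rows

# Hypothetical categorical predictor with 4 possible categories.
regions = ["north", "south", "east", "west", "south", "north"]
levels, rows = indicator_coding(regions, baseline="east")

print(levels)          # the non-baseline categories, one indicator each
for region, row in zip(regions, rows):
    print(region, row)
```

Using a fourth indicator alongside an intercept would make the columns sum to the intercept column, which is why one category is left out.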

27. A multiple regression model has the form , where the dependent variable Y represents sales (in \$1,000), represents unit price (in dollars), and represents advertising (in dollars). As increases by \$1, while holding fixed, then sales are expected to
a. increase by \$7
b. increase by \$13
c. decrease by \$4
d. decrease by \$4,000
e. remain the same

28. Which of the following statements are not true?
a. Often theoretical considerations suggest a nonlinear relation between a dependent variable and two or more independent variables, whereas on other occasions, diagnostic plots indicate that some type of nonlinear function should be used.
b. The logistic regression model is used to relate a dichotomous variable y to a single predictor. Unfortunately, this model cannot be extended to incorporate more than one predictor.
c. A multiple regression model with k predictors includes k + 1 regression parameters (the β’s), because β0 will always be included.
d. All of the above statements are true.
e. None of the above statements are true.

29. Which of the following statements are not true?
a. , the coefficient of multiple determination for a k-predictor model, will virtually always increase as k does, and can never decrease.
b. We are not interested in the number of predictors k that maximizes , the coefficient of multiple determination for a k-predictor model. Instead, we wish to identify a small k for which is nearly as large as for all predictors in the model.
c. is the mean squared error for a k-predictor model.
d. All of the above statements are true.
e. None of the above statements are true.

30. Which of the following statements are not true?
a. Generally speaking, when a subset of k predictors (k < m) is used to fit a model, the estimators will be unbiased for , and will also be an unbiased estimator for the true E(Y).
b. When the number of predictors is too large to allow for explicit or implicit examination of all possible subsets, several alternative selection procedures generally will identify good models.
c. The backward elimination method starts with the model in which all predictors under consideration are used.
d. All of the above statements are true.
e. None of the above statements are true.

31. Which of the following statements are true?
a. The forward selection method, an alternative to the backward elimination method, starts with no predictors in the model and considers fitting in turn the model with only , only , …, and finally only .
b. The stepwise procedure most widely used is a combination of forward selection (FS) method and backward elimination (BE) method.
c. The stepwise procedure starts by adding variables to the model, but after each addition it examines those variables previously entered to see whether any is a candidate for elimination.
d. All of the above statements are true.
e. None of the above statements are true.

32. Which of the following statements are true?
a. The idea behind the stepwise procedure is that with forward selection, a single variable may be more strongly related to y than either of two or more other variables individually, but the combination of those variables may make the single variable subsequently redundant.
b. When the predictors are highly interdependent, the data is said to exhibit multicollinearity.
c. There is unfortunately no consensus among statisticians as to what remedies are appropriate when severe multicollinearity is present. One possibility involves continuing to use a model that includes all the predictors but estimating parameters by using something other than least squares.
d. All of the above statements are true.
e. None of the above statements are true.

1. Suppose the variables x=commuting distance and y=commuting time are related according to the simple linear regression model with

a. If n=5 observations are made at the x values calculate the standard deviations of the five corresponding residuals.
b. Repeat part (a) for
c. What do the results of parts (a) and (b) imply about the deviation of the estimated line from the observation made at the largest sampled x value?

2. Wear resistance of certain nuclear reactor components made of Zircaloy-2 is partly determined by properties of the oxide layer. The following data appears in a study that proposed a new nondestructive testing method to monitor thickness of the layer. The variables are x = oxide-layer thickness and y = eddy-current response (arbitrary units).

x 0 7 17 114 133 142 190 218 237 285
y 20.3 19.8 19.5 15.9 15.1 14.7 11.9 11.5 8.3 6.6

The equation of the least squares line is $\hat{y} = 20.6 - .047x$. Calculate and plot the residuals against x and then comment on the appropriateness of the simple linear regression model.
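
The residual calculation requested here can be sketched as follows. This is a minimal illustration, assuming the second data row holds the y values and using the intercept 20.6 and slope -.047 quoted in the problem; the function name `residuals` is hypothetical.

```python
# Data copied from the table above; the second row is read as y.
x = [0, 7, 17, 114, 133, 142, 190, 218, 237, 285]
y = [20.3, 19.8, 19.5, 15.9, 15.1, 14.7, 11.9, 11.5, 8.3, 6.6]


def residuals(xs, ys, intercept, slope):
    """Return observed minus fitted values for the line intercept + slope * x."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(xs, ys)]


res = residuals(x, y, 20.6, -0.047)

# Print (x, residual) pairs; these are the points to plot against x.
for xi, ei in zip(x, res):
    print(f"x = {xi:3d}   residual = {ei:7.3f}")
```

A curved or funnel-shaped pattern in these pairs would argue against the simple linear regression model; the sketch only produces the numbers and leaves the judgment to the reader.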

3.
a. Show that $\sum_{i=1}^{n} e_i = 0$ when the $e_i$ are the residuals from a simple linear regression.
b. Are the residuals from a simple linear regression independent of one another, positively correlated, or negatively correlated? Explain.
c. Show that $\sum_{i=1}^{n} x_i e_i = 0$ for the residuals from a simple linear regression. [This result along with part (a) shows that there are two linear restrictions on the $e_i$, resulting in a loss of 2 df when the squared residuals are used to estimate $\sigma^2$.]

4.
a. Could a linear regression result in residuals 25, -25, 7, 19, -6, 11, and 17? Why or why not?
b. Could a linear regression result in residuals 25, -25, 7, 19, -6, -10, and 4 corresponding to x values 4, -3, 9, 13, -13, -19, and 26? Why or why not?
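
Least squares residuals from a line fitted with an intercept must sum to zero and must also be orthogonal to the x values. A minimal sketch of a check based on those two standard facts is given below; the function name and tolerance are illustrative assumptions.

```python
def satisfies_ls_constraints(residuals, xs=None, tol=1e-9):
    """Check the two linear restrictions on least squares residuals.

    The residuals must sum to zero; when x values are supplied,
    the sum of x_i * e_i must also be zero.
    """
    if abs(sum(residuals)) > tol:
        return False
    if xs is not None:
        if abs(sum(x * e for x, e in zip(xs, residuals))) > tol:
            return False
    return True


# Values copied from the two parts of the problem.
case_a = [25, -25, 7, 19, -6, 11, 17]
case_b_res = [25, -25, 7, 19, -6, -10, 4]
case_b_x = [4, -3, 9, 13, -13, -19, 26]

print(sum(case_a), satisfies_ls_constraints(case_a))
print(sum(case_b_res), satisfies_ls_constraints(case_b_res, case_b_x))
```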

5. It is important to find characteristics of the production process that produce tortilla chips with an appealing texture. The following data on x = frying time (sec) and y = moisture content (%) are obtained:

x 5 10 15 20 25 30 45 60
y 16.3 11.4 8.1 4.5 3.4 2.9 1.9 1.3

a. Construct a scatter plot of y versus x and comment.
b. Construct a scatter plot of the (ln(x), ln(y)) pairs and comment.
c. What probabilistic relationship between x and y is suggested by the linear pattern in the plot of part (b)?
d. Predict the value of moisture content when frying time is 20 in a way that conveys information about reliability and precision.
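
One way to explore parts (b) through (d) is to fit a straight line to the (ln(x), ln(y)) pairs by ordinary least squares and back-transform. The sketch below assumes the second data row holds the y values; `fit_line` is a hypothetical helper, and only a point prediction is produced (the interval asked for in part (d) is not computed).

```python
import math

x = [5, 10, 15, 20, 25, 30, 45, 60]
y = [16.3, 11.4, 8.1, 4.5, 3.4, 2.9, 1.9, 1.3]


def fit_line(u, v):
    """Ordinary least squares fit of v on u; returns (intercept, slope)."""
    n = len(u)
    ubar = sum(u) / n
    vbar = sum(v) / n
    sxy = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
    sxx = sum((ui - ubar) ** 2 for ui in u)
    slope = sxy / sxx
    return vbar - slope * ubar, slope


log_x = [math.log(xi) for xi in x]
log_y = [math.log(yi) for yi in y]
b0, b1 = fit_line(log_x, log_y)

# Back-transform the fitted log-scale value at frying time 20 sec.
point_prediction = math.exp(b0 + b1 * math.log(20))
print(f"intercept = {b0:.4f}, slope = {b1:.4f}, prediction = {point_prediction:.2f}")
```

A roughly linear pattern on the log-log scale corresponds to a multiplicative power relationship on the original scale, which is why the prediction is exponentiated at the end.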

6. Consider the following data on mass rate of burning x and flame length y:

x 1.7 2.2 2.3 2.6 2.7 3.0 3.2
y 1.3 1.8 1.6 2.0 2.1 2.2 3.0

x 3.3 4.1 4.3 4.6 5.7 6.1
y 2.6 4.1 3.7 5.0 5.8 5.3

a. Estimate the parameters of a power function model.
b. Assume that the power function is an appropriate model, test using a level .05 test.
c. Test the null hypothesis that states that the median flame length when burning rate is 5.0 is twice the median flame length when burning rate is 2.5 against the alternative that this is not the case.

7. An investigation of the influence of sodium benzoate concentration on the critical minimum pH necessary for the inhibition of Fe yielded the accompanying data, which suggests that expected critical minimum pH is linearly related to the natural logarithm of concentration:

Concentration .01 .025 .1 .95
pH 5.1 5.5 6.1 7.3

a. What is the implied probabilistic model, and what are the estimates of the model parameters?
b. What critical minimum pH would you predict for a concentration of 1.0? Obtain a 95% PI for critical minimum pH when concentration is 1.0.

8. Suppose that the expected value of thermal conductivity y is a linear function of where x is lamellar thickness.

x 240 410 460 490 520 590 745 8300
y 12.0 14.7 14.7 15.2 15.2 15.6 16.0 18.1

a. Estimate the parameters of the regression function and the regression function itself.
b. Predict the value of thermal conductivity when lamellar thickness is 500 angstroms.

9. In each of the following cases, decide whether the given function is intrinsically linear. If so, identify and then explain how a random error term can be introduced to yield an intrinsically linear probabilistic model.

a.
b.
c. (a Gompertz curve)
d.

10. The following data on y=glucose concentration (g/L) and x=fermentation time (days) for a particular blend of malt liquor were obtained:

x 1 2 3 4 5 6 7 8
y 74 54 52 51 52 53 58 71

a. Verify that a scatter plot of the data is consistent with the choice of a quadratic regression model.
b. The estimated quadratic regression equation is Predict the value of glucose concentration for a fermentation time of 6 days and compute the corresponding residual.
c. Using SSE=61.77, what proportion of observed variation can be attributed to the quadratic regression relationship?
d. The n=8 standardized residuals based on the quadratic model are 1.91, -1.95, -.25, .58, .90, .04, -.66, and .20. Construct a plot of the standardized residuals versus x and a normal probability plot. Do the plots exhibit any troublesome features?
e. The estimated standard deviation of ; that is, Compute a 95% CI for .
f. Compute a 95% PI for a glucose concentration observation made after 6 days of fermentation time.
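
Part (c) can be worked directly from the data and the quoted SSE. The sketch below assumes the second data row holds the glucose concentrations y and uses the standard relation $R^2 = 1 - \text{SSE}/\text{SST}$; the variable names are illustrative.

```python
# Glucose concentrations from the table above (second row read as y).
y = [74, 54, 52, 51, 52, 53, 58, 71]
sse = 61.77  # value quoted in part (c)

ybar = sum(y) / len(y)
sst = sum((yi - ybar) ** 2 for yi in y)  # total sum of squares
r_squared = 1 - sse / sst                # proportion of variation explained

print(f"SST = {sst:.3f}, R^2 = {r_squared:.4f}")
```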

11. The viscosity (y) of an oil was measured by a cone and plate viscometer at six different cone speeds (x). It was assumed that a quadratic regression model was appropriate, and the estimated regression function resulting from the n=6 observations was

a. Estimate , the expected viscosity when speed is 75 rpm.
b. What viscosity would you predict for a cone speed of 60 rpm?
c. If and compute SSE
d. From part ( c ), Using SSE computed in part ( c ), what is the computed value of
e. If the estimated standard deviation of at level .01.

12. The accompanying data on exposure time to radiation x (in kr/16 hr) and dry weight of roots y (in mg × $10^{-1}$) are given:

x 0 2 4 6 8
y 110 123 119 86 62

The estimated quadratic regression function seems to fit the data well.

a. Compute the predicted values and residuals. Then compute SSE and $s^2$.
b. Compute the coefficient of multiple determination
c. The estimated standard deviation of the estimator of the quadratic coefficient is Does the quadratic term belong in the model? State and test the appropriate hypotheses at level .05.
d. The estimated standard deviation of Use this information in part (c) to obtain joint CI’s for and with joint confidence level (at least) 95%.
e. The estimated standard deviation of is 5.01. Compute a 90% CI for .
f. Estimate the exposure time that maximizes expected dry weight of roots.

13. The following data resulted from an experiment to assess the potential of unburnt colliery spoil as a medium for plant growth. The variables are x=acid extractable cations and y=exchangeable acidity/total cation exchange capacity.

x -23 -5 16 26 30 38 52
y 1.50 1.46 1.32 1.17 .96 .78 .77

x 58 67 81 96 100 113
y .91 .78 .69 .52 .48 .55

Standardizing the independent variable x to obtain and fitting the regression function yielded the accompanying computer output.

Parameter Estimate Estimated St. Dev.
.8733 .0421
-.3255 .0316
.0448 .0319

a. Estimate .
b. Compute the value of the coefficient of multiple determination.
c. What is the estimated regression function using the unstandardized variable x?
d. What is the estimated standard deviation of computed in part ( c )?
e. Carry out a test using the standardized estimates to decide whether the quadratic term should be retained in the model. Repeat using the unstandardized estimates. Do your conclusions differ?

14. Cardiorespiratory fitness is widely recognized as a major component of overall physical well-being. Direct measurement of maximal oxygen uptake is the single best measure of such fitness, but direct measurement is time-consuming and expensive. It is therefore desirable to have a prediction equation for maximal oxygen uptake in terms of easily obtained quantities. Consider the variables

Here is one possible model, for male students: , and
a. Interpret .
b. What is the expected value of maximal oxygen uptake when weight is 75 kg, age is 20 yr, walk time is 15 minutes, and heart rate is 140 b/m?
c. What is the probability that maximal oxygen uptake will be between 1.00 and 2.60 for a single observation made when the values of the predictors are as stated in part (b)?

15. A trucking company considered a multiple regression model for relating the dependent variable y = total daily travel time for one of its drivers (hours) to the predictors $x_1$ = distance traveled (miles) and $x_2$ = the number of deliveries made. Suppose that the model equation is

a. What is the mean value of travel time when distance traveled is 50 miles and three deliveries are made?
b. How would you interpret the coefficient of the predictor $x_1$? What is the interpretation of the coefficient of $x_2$?
c. If hour, what is the probability that travel time will be at most 6 hours when three deliveries are made and the distance traveled is 50 miles?

16. Let y = sales at a fast food outlet (1000’s of \$), $x_1$ = number of competing outlets within a 1-mile radius, $x_2$ = the population within a 1-mile radius (1000’s of people), and $x_3$ be an indicator variable that equals 1 if the outlet has a drive-up window and 0 otherwise. Suppose that the true regression model is

a. What is the mean value of sales when the number of competing outlets is 2, there are 8000 people within a 1-mile radius, and outlet has a drive-up window?
b. What is the mean value of sales for an outlet without a drive-up window that has three competing outlets and 5000 people within a 1-mile radius?
c. Interpret

17. A multiple regression model with four independent variables to study accuracy in reading liquid crystal displays was used. The variables were

y = error percentage for subjects reading a four-digit liquid crystal display
$x_1$ = level of backlight (ranging from 0 to 122 )
$x_2$ = character subtense (ranging from )
$x_3$ = viewing angle (ranging from )
$x_4$ = level of ambient light (ranging from 20 to 1500 lux)

The model fit to data was The resulting estimated coefficients were

a. Calculate an estimate of expected error percentage when
b. Estimate the mean error percentage associated with a backlight level of 20, character subtense of .5, viewing angle of 10, and ambient light level of 30.
c. What is the estimated expected change in error percentage when the level of ambient light is increased by 1 unit while all other variables are fixed at the values given in part (a)? Answer for a 100-unit increase in ambient light level.
d. Explain why the answers in part ( c ) do not depend on the fixed values of Under what conditions would there be such a dependence?
e. The estimated model was based on n=30 observations, with SST=39.2 and SSE=20.0. Calculate and interpret the coefficient of multiple determination, and then carry out the model utility test using
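
The quantities in part (e) follow from the sums of squares alone. The sketch below takes k = 4 predictors from the problem statement and uses the usual model utility statistic $F = \dfrac{R^2/k}{(1 - R^2)/(n - k - 1)}$; the significance level is not reproduced in the text above, so the comparison against an F critical value is left to the reader.

```python
n, k = 30, 4            # observations and predictors stated in the problem
sst, sse = 39.2, 20.0   # sums of squares quoted in part (e)

r_squared = 1 - sse / sst
f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))

print(f"R^2 = {r_squared:.4f}")
print(f"F = {f_stat:.2f} on {k} and {n - k - 1} df")
```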

18. A study reports the accompanying data on discharge amount (q), flow area (a), and slope of the water surface (b, in m/m) obtained at a number of floodplain stations. The study proposed a multiplicative power model .

q 17.6 23.8 5.7 3.0 7.5
a 8.4 31.6 5.7 1.0 3.3
b .0048 .0073 .0037 .0412 .0413

q 89.2 60.9 27.5 13.2 12.2
a 41.1 26.2 16.4 6.7 9.7
b .0063 .0061 .0036 .0039 .0025

a. Use an appropriate transformation to make the model linear and then estimate the regression parameters for the transformed model. Finally, estimate (the parameters of the original model). What would be your prediction of discharge amount when flow area is 10 and slope is .01?
b. Without actually doing any analysis, how would you fit a multiplicative exponential model ?
c. After the transformation to linearity in part (a), a 95% CI for the value of the transformed regression function when a = 3.3 and b = .0046 was obtained from computer output as (.217, 1.755). Obtain a 95% CI for when a = 3.3 and b = .0046.
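
For part (c), an interval on the transformed scale can be carried back to the original scale by applying the inverse transformation to both endpoints. The sketch below assumes the transformation in part (a) was the natural logarithm, so the inverse is the exponential function; that assumption is not confirmed by the text above.

```python
import math

# Interval for the transformed regression function, quoted in part (c).
lower_log, upper_log = 0.217, 1.755

# Assumption: natural logs were used, so exponentiate each endpoint.
ci_original_scale = (math.exp(lower_log), math.exp(upper_log))

print(f"({ci_original_scale[0]:.3f}, {ci_original_scale[1]:.3f})")
```

Because the exponential function is increasing, the endpoints keep their order and the confidence level is unchanged.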

19. In an experiment to study factors influencing wood specific gravity, a sample of 20 mature wood samples was obtained, and measurements were taken on number of fibers/ in springwood ( ), number of fibers/ in summerwood ( ), % springwood ( ), light absorption in springwood ( ), and light absorption in summerwood ( ).

a. Fitting the regression function resulted in Does the data indicate that there is a linear relationship between specific gravity and at least one of the predictors? Test using
b. When is dropped from the model, the value of remains at .769. Compute adjusted for both the full model and the model with deleted.
c. When The total sum of squares is SST = .0196610. Does the data suggest that all of have zero coefficients in the true regression model? Test the relevant hypotheses at level .05.
d. The mean and standard deviation of were 52.540 and 5.4447, respectively, whereas those of were 89.195 and 3.6660, respectively. When the model involving these two standardized variables was fit, the estimated regression equation was What value of specific gravity would you predict for a wood sample with % springwood = 50 and % light absorption in summerwood = 90?
e. The estimated standard deviation of the estimated coefficient ( i.e, for of the standardized model) was .0046. Obtain a 95% CI for
f. Using the information in parts (d) and (e), what is the estimated coefficient of in the unstandardized model (using only predictors and ), and what is the estimated standard deviation of the coefficient estimator (i.e., in the unstandardized model)?
g. The estimate of for the two-predictor model is s = .02001, whereas the estimated standard deviation of (i.e., when = 50.5 and = 88.9) is .00482. Compute a 95% PI for specific gravity when % springwood = 50.5 and % light absorption in summerwood = 88.9.

20. In the accompanying table, we give the smallest SSE for each number of predictors k (k = 1, 2, 3, 4) for a regression problem in which y = cumulative heat of hardening in cement, $x_1$ = % tricalcium aluminate, $x_2$ = % tricalcium silicate, $x_3$ = % aluminum ferrate, and $x_4$ = % dicalcium silicate.

Number of Predictors k Predictor (s) SSE
1 880.85
2 58.01
3 49.20
4 47.86
a. Use the criteria discussed in the text to recommend the use of a particular regression model.
b. Would forward selection result in the best two-predictor model? Explain.

21. A study reported data on y = tensile strength (MPa), $x_1$ = slab thickness (cm), $x_2$ = load (kg), $x_3$ = age at loading (days), and $x_4$ = time under test (days) resulting from stress tests of n = 9 reinforced concrete slabs. The results of applying the backward elimination (BE) method of variable selection are summarized in the accompanying tabular format. Explain what occurred at each step of the procedure.

Step 1 2 3
Constant 8.496 12.670 12.989
-.29 -.42 -.49
T-RATIO -1.33 -2.89 -3.14
.0104 .0110 .0116
T-RATIO 6.30 7.40 7.33
.0059
T-RATIO .83
-.023 -.023
T-RATIO -1.48 -1.53
S .533 .516 .570
R-SQ 95.81 95.10 92.82

Chapter 14

COMPLETION

1. A __________ generalizes a binomial experiment by allowing each trial to result in one of k possible outcomes (categories), where k > 2.

2. The chi-squared distribution has a single parameter $\nu$, called the number __________ of the distribution.

3. The critical value $\chi^2_{\alpha,\nu}$ for the chi-squared distribution is the value such that __________ of the area under the $\chi^2$ curve with $\nu$ degrees of freedom lies to the right of $\chi^2_{\alpha,\nu}$.

4. Provided that $np_i \ge 5$ for every i (i = 1, 2, 3, 4, 5), the goodness-of-fit test statistic when category probabilities are completely specified has approximately a chi-squared distribution with __________ degrees of freedom.

5. If Z is a standard normal random variable, that is, $Z \sim N(0, 1)$, then $Z^2$ has a __________ distribution with degrees of freedom = __________.

6. The area to the right of 4.93 under the 2 degrees of freedom chi-squared curve is __________.

7. One may wish to test $H_0: p_1 = p_{10}, \ldots, p_k = p_{k0}$ versus $H_a$: $H_0$ is not true. The null hypothesis is a __________ hypothesis in the sense that each $p_{i0}$ is a specified number, so that the expected cell counts when $H_0$ is true are uniquely determined numbers.

8. One may wish to test $H_0: p_1 = \pi_1(\theta), \ldots, p_k = \pi_k(\theta)$ versus $H_a$: $H_0$ is not true. The null hypothesis is a __________ hypothesis because knowing that $H_0$ is true does not uniquely determine the cell probabilities and expected cell counts but only their general form.

9. The goodness-of-fit test statistic, when there are 6 categories and 2 parameters to be estimated, has approximately a chi-squared distribution with __________ degrees of freedom.

10. If the computed value of the chi-squared test statistic is $\chi^2 = 2.83$, and the test has 2 degrees of freedom, then the null hypothesis is __________ at the .05 level of significance.

11. It is true that the more the sample correlation coefficient r deviates from __________, the less the normal probability plot resembles a straight line.

12. In a two-way contingency table, if the second row total is 125, the third column total is 60, and the total number of observations is 375, then the estimated expected count in cell (2, 3) is __________.

13. A two-way contingency table has 3 rows and 5 columns. Then, the number of degrees of freedom associated with the chi-squared test for homogeneity is __________.

14. The chi-squared test for homogeneity can safely be applied as long as the estimated expected count is at least __________ for all cells.

15. A two-way contingency table has r rows and c columns. Then, the number of degrees of freedom associated with the chi-squared test for independence is __________.
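
Several of the completion items above rest on three small calculations: the estimated expected count in a two-way table, the degrees of freedom for an r-by-c table, and an upper-tail area for a chi-squared curve. A minimal sketch follows; the helper names are hypothetical, and the tail-area function applies only to 2 degrees of freedom, where the chi-squared distribution is exponential with mean 2 and the area to the right of x is $e^{-x/2}$.

```python
import math


def expected_count(row_total, col_total, n):
    """Estimated expected cell count: (row total)(column total) / n."""
    return row_total * col_total / n


def contingency_df(rows, cols):
    """Degrees of freedom for a two-way table: (rows - 1)(cols - 1)."""
    return (rows - 1) * (cols - 1)


def chi2_upper_tail_df2(x):
    """Area to the right of x under the chi-squared curve with 2 df only."""
    return math.exp(-x / 2)


print(expected_count(125, 60, 375))
print(contingency_df(3, 5))
print(round(chi2_upper_tail_df2(4.93), 4))
```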

MULTIPLE CHOICE

1. Which of the following statements are not true?
a. The chi-squared distribution is used to obtain a confidence interval for the variance of a normal population.
b. Provided that $np_i \ge 5$ for every i (i = 1, 2, ..., k), the goodness-of-fit test statistic when all k category probabilities are completely specified has approximately a t distribution with k-1 degrees of freedom.
c. A multinomial experiment generalizes a binomial experiment by allowing each trial to result in one of k possible outcomes, where k>2. In general, we refer to these outcomes as categories.
d. All of the above statements are correct.
e. None of the above statements are correct.

2. Which of the following statements are true regarding the critical value $\chi^2_{\alpha,\nu}$ for the chi-squared distribution when $\alpha = .05$ and $\nu = 4$?
a. The area to the right of 9.488 is .05.
b. The area to the left of 9.488 is .95.
c. The total area under the chi-squared curve is 9.488.
d. All of the above statements are true.
e. None of the above statements are true.

3. In testing versus the alternative that states that at least one does not equal rejection of is appropriate at .10 significance level when the test statistic value is
a. greater than or equal to 9.236.
b. smaller than or equal to 11.070
c. between 9.236 and 11.070
d. smaller than or equal to 7.779
e. greater than or equal to 7.779

4. If $Z \sim N(0, 1)$, then $Z^2$ has a
a. standard normal distribution.
b. binomial distribution.
c. multinomial distribution.
d. chi-squared distribution with one degree of freedom.
e. t distribution with two degrees of freedom.

5. Which of the following statements are true?
a. The goodness-of-fit test can be used when the number of categories k is two or more.
b. If $Z \sim N(0, 1)$, then $Z^2$ has a t distribution with one degree of freedom.
c. The chi-squared tests in this chapter are not all upper-tailed.
d. The P-value for an upper-tailed chi-squared test is the area under the chi-squared curve with $\nu$ degrees of freedom to the left of the calculated test statistic value.
e. All of the above statements are true.

6. Which of the following statements are not true?
a. In testing $H_0: p_1 = p_{10}, \ldots, p_k = p_{k0}$ versus $H_a$: $H_0$ is not true, the null hypothesis is a simple hypothesis in the sense that each $p_{i0}$ is a specified number.
b. In testing $H_0: p_1 = \pi_1(\theta), \ldots, p_k = \pi_k(\theta)$ versus $H_a$: $H_0$ is not true, the null hypothesis is composite in the sense that knowing $H_0$ is true does not uniquely determine the cell probabilities and expected cell counts but only their general form.
c. A general rule of thumb for degrees of freedom in a chi-squared test is that d.f. is the difference between the number of freely determined cell counts and the number of independent parameters estimated.
d. All of the above statements are true
e. None of the above statements are true

7. Let be the maximum likelihood estimators of the unknown parameters , and let denote the test statistic value based on these estimators. If the data are classified into k categories, then the critical value that specifies a level upper-tailed test satisfies
a.
b.
c.
d.
e.

8. The goodness-of-fit test statistic, when there are k categories and m parameters to be estimated from the sample data, has approximately a chi-squared distribution with $\nu$ degrees of freedom, where $\nu$ equals
a. m-k-1
b. k-m
c. k-m-1
d. m+k-1
e. k-m+1

9. Which of the following statements are not true?
a. The chi-squared goodness-of-fit test can be used to test whether the sample comes from a specified family of continuous distributions, such as the normal family, but it cannot be used to test whether the sample comes from a specified discrete distribution, such as Poisson.
b. A normal probability plot is used for checking whether any member of the normal distribution family is plausible.
c. The sample correlation coefficient r is a quantitative measure of the extent to which points cluster about a straight line.
d. The null hypothesis of population normality is rejected if the sample correlation coefficient r is less than or equal to $c_\alpha$, where $c_\alpha$ is a critical value chosen to yield the desired significance level $\alpha$.
e. All of the above statements are true.

10. The number of degrees of freedom for a two-way contingency table with I rows and J columns is
a.
b.
c.
d.
e.

11. In a two-way contingency table with 3 rows and 5 columns, assume that the second row total is 120 and the fourth column total is 50, and the total number of observations is 600. Then, the estimated expected count in cell (2, 4) is
a. 50
b. 40
c. 30
d. 20
e. 10

12. The chi-squared test for homogeneity can safely be applied as long as the estimated expected count for every cell in the contingency table is
a. at least 5
b. at most 10
c. at least 10
d. at most 15
e. any number between 10 and 15

13. Which of the following statements are not true?
a. The chi-squared test statistic used in testing for independence is identical to that used in testing for homogeneity.
b. In general, the number of degrees of freedom when testing for independence is larger than those used in testing for homogeneity.
c. The chi-squared test for independence can safely be applied as long as the estimated expected count for all cells in the contingency table is larger than or equal to 5.
d. The rejection region in testing for homogeneity at significance level $\alpha$ is that the test statistic value $\chi^2 \ge \chi^2_{\alpha,(I-1)(J-1)}$, where I and J are the number of rows and columns, respectively, in the two-way contingency table.
e. All of the above statements are true.

14. The number of degrees of freedom in testing for independence when using a contingency table with 6 rows and 4 columns is:
a. 24
b. 10
c. 15
d. 20
e. 12

1. What conclusion would be appropriate for an upper-tailed chi-squared test in each of the following situations?

a.
b.
c.
d.

2. Say as much as you can about the P-value for an upper-tailed chi-squared test in each of the following situations:

a.
b.
c.
d.
e.

3. A statistics department at a state university maintains a tutoring service for students in its introductory service courses. The service has been staffed with the expectation that 40% of its students would be from the business statistics course, 30% from engineering statistics, 20% from the statistics course for social science students, and the other 10% from the course for agriculture students. A random sample of n=120 students revealed 50, 40, 18, and 12 from the four courses. Does this data suggest that the percentages on which staffing was based are not correct? State and test the relevant hypotheses using
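
The test statistic for this problem can be sketched as follows, using the usual goodness-of-fit form $\chi^2 = \sum (\text{observed} - \text{expected})^2 / \text{expected}$ with expected counts $n p_{i0}$. The significance level is not reproduced in the text above, so the sketch stops at the statistic and its degrees of freedom; variable names are illustrative.

```python
observed = [50, 40, 18, 12]          # sample counts from the four courses
p0 = [0.40, 0.30, 0.20, 0.10]        # staffing percentages under the null hypothesis

n = sum(observed)
expected = [n * p for p in p0]       # expected counts n * p_i0

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1               # k - 1, since no parameters are estimated

print(f"expected = {expected}")
print(f"chi-squared = {chi_sq:.4f} on {df} df")
```

The statistic is then compared with the upper-tail chi-squared critical value for 3 degrees of freedom at the chosen level.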

4. Criminologists have long debated whether there is a relationship between weather conditions and the incidence of violent crime. A study classified 1400 homicides according to season, resulting in the accompanying data. Test the null hypothesis of equal proportions using by using the chi-squared table to say as much as possible about the P-value.

Winter Spring Summer Fall
338 345 382 335

5. A study focuses on the existence of any relationship between date of patient admission for treatment of alcoholism and patient’s birthday. Assuming a 365-day year (i.e., excluding leap year), in the absence of any relation, a patient’s admission date is equally likely to be any one of the 365 possible days. The investigators established four different admission categories: (1) within 7 days of birthday, (2) between 8 and 30 days, inclusive, from the birthday, (3) between 31 and 90 days, inclusive, from the birthday, (4) more than 90 days from the birthday. A sample of 200 patients gave observed frequencies of 11, 24, 69, and 96 for categories 1, 2, 3, and 4, respectively. State and test the relevant hypotheses using a significance level of .01.

6. Consider a large population of families in which each family has exactly three children. If the genders of the three children in any family are independent of one another, the number of male children in a randomly selected family will have a binomial distribution based on three trials.

a. Suppose a random sample of 200 families yields the following results. Test the relevant hypotheses.

Number of male children 0 1 2 3
Frequency 18 82 80 20

b. Suppose a random sample of 50 families in a non-human population resulted in observed frequencies of 15, 20, 12, and 3, respectively. Would the chi-squared test be based on the same number of degrees of freedom as the test in part (a)? Explain.

7. A certain type of flashlight is sold with the four batteries included. A random sample of 150 flashlights is obtained, and the number of defective batteries in each is determined, resulting in the following data:

Number of defective 0 1 2 3 4
Frequency 26 51 47 16 10

Let X be the number of defective batteries in a randomly selected flashlight. Test the null hypothesis that the distribution of X is Bin That is, with test
i=0,1,2,3,4
[Hint: To obtain the MLE of write the likelihood (the function to be maximized) as where the exponents are linear functions of the cell counts. Then take the natural log, differentiate with respect to equate the result to 0, and solve for ]

8. A study reports the following data on the number of borers in each of 120 groups of borers. Does the Poisson pmf provide a plausible model for the distribution of the number of borers in a group? (Hint: Add the frequencies for 7, 8, …, 12 to establish a single category “≥ 7.”)

Number of borers 0 1 2 3 4 5 6 7 8 9 10 11 12
Frequency 24 16 16 18 15 9 6 5 3 4 3 0 1

9. A study reports data on the rate of oxygenation in streams at C in a certain region. The sample mean and standard deviation were computed as respectively. Based on the accompanying frequency distribution, can it be concluded that oxygenation rate is a normally distributed variable? Use the chi-squared test with

Rate (per day) Frequency
Below .100 14
.100-below .150 24
.150-below .200 28
.200-below .250 18
.250 or more 16

10. A study reports the following data on 7-day flexural strength of nonbloated burned clay aggregate concrete samples (psi):

456 476 480 490 497 526 546 700 386
393 407 407 434 427 440 407 450 440
327 317 300 340 340 343 374 377 460

Test at level .10 to decide whether flexural strength is a normally distributed variable.

11. The accompanying data refers to leaf marks found on white clover samples selected from both long-grass areas and short-grass areas. Use a test at .01 level of significance to decide whether the true proportions of different marks are identical for the two types of regions.

Type of mark
L LL Y+YL O Others Row Total
Long-grass areas 410 12 23 8 277 730
Short-grass areas 515 6 16 13 220 770
Column Total 925 18 39 21 497
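
The homogeneity test for this table can be sketched as below. Estimated expected counts are (row total)(column total)/n, and the statistic sums (observed - expected)²/expected over all cells with (I-1)(J-1) degrees of freedom. The function name `chi2_two_way` is a hypothetical helper, and the comparison with the level .01 critical value is left to a chi-squared table.

```python
# Observed counts: rows are long-grass and short-grass areas,
# columns are the mark types L, LL, Y+YL, O, Others.
table = [
    [410, 12, 23, 8, 277],
    [515, 6, 16, 13, 220],
]


def chi2_two_way(counts):
    """Return (chi-squared statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(counts, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(counts) - 1) * (len(counts[0]) - 1)
    return stat, df, expected


chi_sq, df, expected = chi2_two_way(table)
print(f"chi-squared = {chi_sq:.2f} on {df} df")
print(f"smallest expected count = {min(min(row) for row in expected):.2f}")
```

Printing the smallest expected count is a quick way to confirm that the usual rule of thumb (every estimated expected count at least 5) holds before trusting the approximation.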

12. A study reports on research into the effect of different injection treatments on the frequencies of audiogenic seizures.

Treatment No response Wild running Clonic seizure Tonic seizure
Thienylalanine 22 8 25 45
Solvent 14 14 20 52
Sham 22 10 23 45
Unhandled 47 13 28 32

Does the data suggest that the true percentages in the different response categories depend on the nature of the injection treatment? State and test the appropriate hypotheses using

13. Each individual in a random sample of high school and college students was cross-classified with respect to both political views and marijuana usage, resulting in the data displayed in the accompanying two-way table. Does the data support the hypothesis that political views and marijuana usage level are independent within the population? Test the appropriate hypotheses using level of significance .01.

Usage Level
Political Views Never Rarely Frequently
Liberal 480 180 120
Conservative 215 50 15
Other 170 45 85

14. Consider the accompanying 2 × 3 table displaying the sample proportions that fell in the various combinations of categories (e.g., 13% of those in the sample were in the first category of both factors).

1 2 3
1 .13 .19 .28
2 .07 .11 .22

a. Suppose the sample consisted of n = 100 people. Use the chi-squared test for independence with significance level .10.
b. Repeat part (a) assuming that the sample size was n = 1000.
c. What is the smallest sample size n for which these observed proportions would result in rejection of the independence hypothesis?
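
With the sample proportions held fixed, the chi-squared statistic for independence is proportional to n, which makes parts (a) through (c) a single calculation. The sketch below treats the tabled proportions as exact (so cell counts need not be integers) and obtains the level .10 critical value for 2 degrees of freedom from the closed form $-2\ln(.10)$, which is valid only for 2 df; the function name is illustrative.

```python
import math

# Sample proportions from the 2 x 3 table above.
props = [
    [0.13, 0.19, 0.28],
    [0.07, 0.11, 0.22],
]


def chi2_per_observation(p):
    """Chi-squared statistic divided by n, computed from cell proportions."""
    row = [sum(r) for r in p]
    col = [sum(c) for c in zip(*p)]
    return sum(
        (p[i][j] - row[i] * col[j]) ** 2 / (row[i] * col[j])
        for i in range(len(row))
        for j in range(len(col))
    )


unit = chi2_per_observation(props)
critical = -2 * math.log(0.10)   # upper 10% point of chi-squared with 2 df

print(f"n = 100:  chi-squared = {100 * unit:.4f}")
print(f"n = 1000: chi-squared = {1000 * unit:.4f}")
print(f"critical value = {critical:.3f}")
print(f"smallest n with rejection = {math.ceil(critical / unit)}")
```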

Chapter 15

COMPLETION

1. Because the t and F procedures require the distributional assumption of normality, they are not __________ procedures.

2. Because the t and F procedures are based on a particular parametric family of distributions (normal), they are not __________ procedures.

3. The observed value of the Wilcoxon Signed-Rank Test statistic is the sum of the ranks associated with the __________ observations.

4. Let be a random sample from a continuous and symmetric probability distribution with mean (and median) In testing using the Wilcoxon signed-rank test, the rejection region for level .01 test is

5. Let be a random sample from a continuous and symmetric probability distribution with mean (and median) In testing using the Wilcoxon signed-rank test, the rejection region for level .01 test is

6. For n = 8 observations, there are __________ possible signed-rank sequences, and to list these sequences would be very tedious.

7. The table of critical values for the Wilcoxon signed-rank test, as shown in your text, provides critical values for level $\alpha$ tests only when n is less than or equal to __________.

8. When the underlying distribution being sampled is normal, the t test or the Wilcoxon signed-rank test can be used to test a hypothesis about the population mean $\mu$. However, the __________ is the best test in such a situation because among all level $\alpha$ tests it is the one having minimum $\beta$ (i.e., minimum probability of Type II error).

9. The asymptotic relative efficiency (ARE) of one test with respect to another is essentially the limiting ratio of the __________ necessary to obtain identical error probabilities for the two tests.

10. When the underlying distribution is normal, the asymptotic relative efficiency (ARE) of the Wilcoxon signed-rank test with respect to the t test is approximately __________.

11. For any distribution, the asymptotic relative efficiency (ARE) will be at least __________, and for many distributions will be much greater than 1.

12. An alternative name for the Wilcoxon rank-sum test is the __________ test.

13. The Wilcoxon rank-sum test is applied to three values of x and four values of y. Then, the smallest possible value of the test statistic W is w = __________ and the largest possible value is w = __________.

14. The Wilcoxon rank-sum test statistic W is the sum of the ranks in the combined X and Y sample observations associated with __________ observations.

15. Suppose Then, the computed value of the Wilcoxon rank-sum test statistic W is w = __________.

16. For values of m (number of observed x values) and n (number of observed y values) that exceed __________, a normal approximation for the distribution of the Wilcoxon rank-sum statistic W can be used.

17. Suppose that a random sample of size 30 from a normal population is used to test The t test at level .10 specifies that should be rejected if the test statistic value t is either

18. A 95% distribution-free confidence interval for a parameter θ can be obtained from a level __________ test for H0: θ = θ0 versus Ha: θ ≠ θ0.

19. For large samples when the underlying population is normal, the Wilcoxon signed-rank interval will tend to be slightly __________ than the t interval.

20. For large samples when the underlying population is quite nonnormal (symmetric but with heavy tails), the Wilcoxon signed-rank interval will tend to be much __________ than the t interval.

21. The Wilcoxon signed-rank interval uses pairwise averages from a single sample, whereas the Wilcoxon rank-sum interval uses pairwise differences from __________ samples.

22. Let N be the total number of observations in a data set, and suppose we rank all N observations from 1 (the smallest) to N (the largest). When H0 is true, and R_ij denotes the rank of X_ij among the N observations, then

23. When H0 is true, and either the number of population or treatment means I = 3 and the sample sizes Ji ≥ 6 (i = 1, 2, 3), or I > 3 and Ji ≥ 5 (i = 1, ..., I), then the Kruskal-Wallis test statistic K has approximately a __________ distribution with __________ degrees of freedom.

24. When H0 is tested using the Kruskal-Wallis test statistic K with approximate significance level α, then H0 is rejected if K is greater than or equal to __________.

25. Friedman’s test for a randomized block experiment rejects H0: α1 = α2 = ... = αI = 0 (where αi is the ith treatment effect) when the computed value of the test statistic is too __________.

26. When H0: α1 = α2 = ... = αI = 0 is tested using Friedman’s test statistic Fr with significance level .025 (where αi is the ith treatment effect), then H0 is rejected if Fr is greater than or equal to __________.

27. For moderate values of the number of blocks J, the Friedman’s test statistic has approximately a __________ distribution with __________ degrees of freedom, where I is the number of treatments.
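
The counting behind items 6 and 13 can be checked with a few lines of Python. This is only a sketch under the assumption of no ties; the helper names `signed_rank_sequences` and `rank_sum_range` are made up for illustration and do not come from the text.

```python
def signed_rank_sequences(n):
    """Number of ways to attach a + or - sign to each of the ranks 1..n."""
    return 2 ** n


def rank_sum_range(m, n):
    """Smallest and largest possible rank sum W for m x values and n y values.

    W is the sum of the ranks held by the x values in the combined sample
    of size m + n (assumption: no ties).
    """
    smallest = m * (m + 1) // 2           # x values hold ranks 1..m
    largest = m * (m + 2 * n + 1) // 2    # x values hold ranks n+1..n+m
    return smallest, largest


print(signed_rank_sequences(8))
print(rank_sum_range(3, 4))
```

The same function applies to any other pair of sample sizes, for example `rank_sum_range(3, 5)`.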

MULTIPLE CHOICE

1. Which of the following tests would be an example of a nonparametric procedure?
a. Wilcoxon signed-rank test
b. The t test for population mean
c. The F test for population means
d. All of the above tests are correct.
e. Only B and C are correct tests.

2. When ranking data in a Wilcoxon signed-rank test, the data value that receives a rank of 1 is the
a. largest value regardless of its size
b. smallest value regardless of its size
c. middle value regardless of its size
d. 25th percentile value
e. 75th percentile value

3. Which of the following statements are not true?
a. The t and F procedures are not “distribution-free” procedures because they require the distributional assumption of normality.
b. The t and F procedures are not “nonparametric” procedures because they are based on the normal parametric family of distributions.
c. Distribution-free and nonparametric procedures are valid for very few different types of underlying distributions.
d. Generally speaking, the distribution-free procedures perform almost as well as their t and F counterparts on the “home ground” of the normal distribution, and will often yield a considerable improvement under nonnormal conditions.
e. All of the above statements are true.

4. A random sample of size 15 is drawn from a continuous and symmetric probability distribution with mean In testing using the Wilcoxon signed-rank test with approximate level of significance of .05, the rejection region for the test is
a.
b.
c. either or
d.
e.

5. Which of the following statements are not true?
a. Any normal distribution is symmetric, so symmetry is actually a weaker assumption than normality.
b. Any symmetric distribution is normal, so normality is actually a weaker assumption than symmetry.
c. When testing H0: μ = 0 versus Ha: μ > 0 (μ is the median) using the Wilcoxon signed-rank test, H0 is rejected when the test statistic value s+ is too large because a large value of s+ indicates that most of the observations with large absolute magnitude are positive, which in turn indicates a median greater than 0.
d. When the data consists of pairs (X1, Y1), ..., (Xn, Yn) and the differences Di = Xi − Yi (i = 1, ..., n) are normally distributed, a paired t test is used to test hypotheses about the expected difference μD.
e. All of the above statements are true.

6. Which of the following statements are true?
a. When the data consists of pairs (X1, Y1), ..., (Xn, Yn) and the differences Di = Xi − Yi (i = 1, ..., n) are not assumed to be normally distributed, hypotheses about the expected difference μD can be tested by using the Wilcoxon signed-rank test on the Di’s, provided that the distribution of the differences is continuous and symmetric.
b. When the sample size n is larger than 20, it can be shown that the Wilcoxon signed-rank test statistic has approximately a normal distribution when the null hypothesis is true.
c. When the underlying distribution being sampled is normal, either the t test or the Wilcoxon signed-rank test can be used to test a hypothesis about the population mean μ.
d. A number of different efficiency measures have been proposed by statisticians; one that many statisticians regard as credible is called asymptotic relative efficiency (ARE).
e. All of the above statements are true.

7. Which of the following statements are not true?
a. When the underlying distribution being sampled has “heavy tails”, that is, when observed values lying far from the population mean are relatively more likely than they are when the distribution is normal, the t test can perform poorly.
b. If the asymptotic relative efficiency (ARE) of one test with respect to a second equals .50, then when sample sizes are large, twice as large a sample size will be required of the first test to perform as well as the second test.
c. When the underlying distribution is normal, the asymptotic relative efficiency of the Wilcoxon signed-rank test with respect to the t test is approximately .95.
d. For any distribution, the asymptotic relative efficiency will be at least .86, and for many distributions it will be much greater than 2.
e. All of the above statements are true.

8. Which of the following statements are true?
a. In large-sample problems, the Wilcoxon signed-rank test is never very much less efficient than the t test and may be much more efficient if the underlying distribution is far from normal.
b. The Wilcoxon signed-rank test statistic for large-sample is where n is the sample size and is the sum of the ranks associated with the positive observations.
c. When the sample size n > 20, the Wilcoxon signed-rank test statistic has approximately a normal distribution with mean and variance given by , respectively
d. All of the above statements are true.
e. None of the above statements are true.

9. A random sample of size 24 is drawn from a continuous and symmetric probability distribution with mean In testing it can be shown that the Wilcoxon signed-rank test statistic has approximately a normal distribution with mean and standard deviation given, respectively, by
a. 150 and 35
b. 25 and 300
c. 150 and 300
d. 35 and 25
e. 25 and 35

10. Which of the following statements are not true?
a. When m and n (number of observed x values and y values, respectively, in the combined sample) exceed 8, the Wilcoxon rank-sum test statistic W has approximately a t distribution with m + n – 1 degrees of freedom
b. The table of critical values for the Wilcoxon rank-sum test, which is available in your text, gives information only for where m and n are the number of observed x and y values, respectively, in the combined sample.
c. If m and n are the number of observed x and y values, respectively, in the combined sample, then to use the table of critical values for the Wilcoxon rank-sum test, which is available in your text, the X and Y samples should be labeled so that
d. As with the Wilcoxon signed-rank test, the common practice in dealing with ties when using the Mann-Whitney test is to assign each of the tied observations in a particular set of ties the average of the ranks they would require if they differed very slightly from one another.
e. All of the above statements are true.

11. Which of the following tests would be an example of a distribution-free procedure?
a. The t test for population mean
b. The paired t test for the expected difference
c. The F test for two or more population means
d. The Wilcoxon rank-sum test
e. Only A and B are correct tests

12. Which of the following statements are not true?
a. When at least one of the sample sizes in a two-sample problem is small, the t test requires the assumption of normality (at least approximately).
b. The Wilcoxon rank-sum test statistic W is the sum of the ranks in the combined (X, Y) sample associated with X observations.
c. Because the Wilcoxon rank-sum test statistic W has a continuous probability distribution, there will always be a critical value corresponding exactly to one of the usual levels of significance.
d. All of the above statements are true.
e. None of the above statements are true.

13. Which of the following statements are true?
a. The Wilcoxon rank-sum test procedure is not distribution-free because it will not have the desired level of significance for a very large class of underlying distributions.
b. If there are three observed values of x and five observed values of y, then the smallest possible value of the Wilcoxon rank-sum test statistic W is w = 6 and the largest possible value is w = 21.
c. When the distributions being sampled are both normal with and therefore have the same shapes and spreads, only the pooled t test can be used in testing whereas the Wilcoxon rank-sum test should not be used because it is distribution-free.
d. When normality and equal variances both hold, the Wilcoxon rank-sum test is approximately 75% as efficient as the pooled t test in large samples.
e. All of the above statements are true.

14. Two independent random samples of sizes 5 and 7 are selected from two continuous distributions with means and that the two distributions have the same shape and spread. In testing using the Wilcoxon rank-sum test with approximate significance level of .05, the rejection region for the test is
a. either
b. either
c. either
d. either
e. either

15. Which of the following statements are not true?
a. A general method for obtaining confidence intervals takes advantage of a relationship between test procedures and confidence intervals; a 100(1 − α)% confidence interval for a parameter θ can be obtained from a level α test for H0: θ = θ0 versus Ha: θ ≠ θ0.
b. To test H0: μ = μ0 using the Wilcoxon signed-rank test, where μ is the mean of a continuous symmetric distribution, the absolute values |x1 − μ0|, ..., |xn − μ0| are ordered from largest to smallest, with the largest receiving rank 1 and the smallest receiving rank n. Each rank is then given the sign of its associated xi − μ0, and the test statistic is the sum of the positively signed ranks.
c. For fixed x1, ..., xn, the 100(1 − α)% Wilcoxon signed-rank interval will consist of all μ0 for which H0: μ = μ0 is not rejected at level α, where μ is the mean of a continuous symmetric distribution.
d. All of the above statements are true.
e. None of the above statements are true.

16. A sample of size 8 is selected at random from a continuous symmetric distribution. A 95% Wilcoxon signed-rank interval (actually 94.5%) has the form
a.
b.
c.
d.
e.

17. Which of the following statements are not true?
a. The efficiency of the Wilcoxon signed-rank interval relative to the t interval is roughly the same as that for the Wilcoxon test relative to the t test.
b. For large samples when the underlying population is normal, the Wilcoxon signed-rank interval will tend to be slightly longer than the t interval.
c. For large samples when the underlying population is quite nonnormal (symmetric but with heavy tails), then the Wilcoxon signed-rank interval will tend to be much shorter than the t interval.
d. All of the above statements are true.
e. None of the above statements are true.

18. Which of the following statements are true?
a. The Wilcoxon rank-sum test for testing H0: μ1 − μ2 = Δ0 is carried out by first combining the (Xi − Δ0)’s and Yj’s into one sample of size m + n and ranking them from smallest (rank 1) to largest (rank m + n). The test statistic W is then the sum of the ranks of the (Xi − Δ0)’s.
b. The Wilcoxon rank-sum interval is very similar to the Wilcoxon signed-rank interval; the latter uses pairwise averages from a single sample, whereas the former uses pairwise differences from two samples.
c. The Wilcoxon rank-sum interval is quite efficient with respect to the t interval.
d. For large samples, the Wilcoxon rank-sum interval will tend to be only a bit longer than the t interval when the underlying populations are normal, and may be considerably shorter than the t interval if the underlying populations have heavier tails than do normal populations.
e. All of the above statements are true.

19. The nonparametric counterpart of the parametric single-factor ANOVA F-test is the
a. Wilcoxon signed-rank test.
b. Wilcoxon rank-sum test.
c. Kruskal-Wallis test.
d. Friedman’s test.
e. None of the above tests are correct.

20. Which of the following statements are not true?
a. The single-factor ANOVA model for comparing I population or treatment means assumes that for i = 1, 2, ..., I, a random sample of size Ji is drawn from any population with mean μi and variance σ².
b. Let N = ΣJi be the total number of observations in a data set, and suppose we rank all N observations from 1 to N. When H0 is false, some samples will consist mostly of observations having small ranks in the combined sample, whereas others will consist mostly of observations having large ranks.
c. Let N = ΣJi be the total number of observations in a data set, and suppose we rank all N observations from 1 to N. When H0 is true, the N observations all come from the same distribution, in which case all possible assignments of the ranks 1, 2, ..., N to the J samples are equally likely and we expect ranks to be intermingled in these samples.
d. All of the above statements are true.
e. None of the above statements are true.

21. The Kruskal-Wallis test is always
a. a two-tailed test.
b. a one-tailed test.
c. used with one sample.
d. used when the populations are normally distributed.
e. used with matched-pairs samples.

22. Friedman’s test is always
a. a two-tailed test.
b. a one-tailed test.
c. used with one sample.
d. used when the populations are normally distributed.
e. used with matched-pairs samples.

23. Which of the following distributions approximates the Kruskal-Wallis test statistic K when H0 is true, the number of populations or treatments is I = 3, and the sample sizes are Ji ≥ 6 (i = 1, 2, 3)?
a. Standard normal distribution
b. T distribution with I-1 degrees of freedom
c. F distribution with I-1 and degrees of freedom.
d. Chi-squared distribution with I-1 degrees of freedom.
e. Either B or C.
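
The large-sample items on the signed-rank test (for example, multiple-choice question 9) rest on the standard null moments of the statistic S+ when the distribution is continuous and symmetric: mean n(n+1)/4 and variance n(n+1)(2n+1)/24. The sketch below evaluates them; the function names are hypothetical, and the formulas assume no ties among the absolute differences.

```python
import math


def signed_rank_null_moments(n):
    """Mean and standard deviation of S+ under H0 (assumes no ties)."""
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    return mean, math.sqrt(variance)


def signed_rank_z(s_plus, n):
    """Standardized S+ for the large-sample normal approximation."""
    mean, sd = signed_rank_null_moments(n)
    return (s_plus - mean) / sd


mean, sd = signed_rank_null_moments(24)
print(mean, sd)
```

The value returned by `signed_rank_z` would then be compared with a standard normal critical value, as in any z test.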

1. A sample of 12 radon detectors of a certain type was selected, and each was exposed to 100 pCi/L of radon. The resulting readings were as follows:

109.6 94.9 95.2 100.9 100.5 95.3
105.1 110.0 103.6 111.7 107.3 96.4

Does this data suggest that the population mean reading under these conditions differs from 100? Use the Wilcoxon test with significance level α to test the relevant hypotheses.

2. A random sample of 15 automobile mechanics certified to work on a certain type of car was selected, and the time (in minutes) necessary for each one to diagnose a particular problem was determined, resulting in the following data:

32.6 32.1 17.6 28.7 29.1 27.4 37.0 32.8
33.9 55.2 14.5 25.2 10.8 26.9 32.2

Use the Wilcoxon test at significance level .10 to decide whether the data suggests that true average diagnostic time is less than 32 minutes.

3. In an experiment designed to study the effects of illumination level on task performance, subjects were required to insert a fine-tipped probe into the eyeholes of ten needles in rapid succession both for a low light level with black background and a higher level with a white background. Each data value is the time (sec) required to complete the task.

Subject      1      2      3      4      5      6      7      8      9
Black    28.85  31.84  35.05  28.74  23.89  44.05  28.01  27.96  30.47
White    21.23  23.84  25.96  22.68  22.50  27.98  19.61  19.07  27.59

Does the data indicate that the higher level of illumination yields a decrease of more than 5 sec in true average task completion time? Test the appropriate hypotheses using the Wilcoxon test.

4. The accompanying 25 observations on fracture toughness of base plate of 18% nickel maraging steel were obtained. Suppose a company will agree to purchase this steel for a particular application only if it can be strongly demonstrated from experimental evidence that true average toughness exceeds 80. Assuming that the fracture toughness distribution is symmetric, state and test the appropriate hypotheses at level .05 and compute a P-value.

74.5 76.9 77.6 78.1 78.3 78.5 79.1 79.2 80.3
80.5 80.7 80.8 81.1 81.2 81.2 81.9 82.0 82.9
83.1 84.6 84.7 85.1 87.2 88.7 98.7

5. In an experiment to compare the bond strength of two different adhesives, each adhesive was used in five bondings of two surfaces, and the force necessary to separate the surfaces was determined for each bonding. For adhesive 1, the resulting values were 240, 297, 256, 310, and 261, whereas the adhesive 2 observations were 224, 190, 174, 258, and 236. Let μi denote the true average bond strength of adhesive type i. Use the Wilcoxon rank-sum test at level .05 to test H0: μ1 = μ2 versus Ha: μ1 > μ2.

6. A Study of Wood reports the following data on burn time (hours) for samples of oak and pine. Test at level .05 to see whether there is any difference in true average burn time for the two types of wood.

Oak 1.80 .75 1.63 1.64 4.50 1.31 1.85 .56
Pine 1.06 1.48 1.41 1.60 .81 1.28

7. The accompanying data resulted from an experiment to compare the effects of vitamin C in orange juice and in synthetic ascorbic acid on the length of odontoblasts in guinea pigs over a 6-week period. Use the Wilcoxon rank-sum test at level .01 to decide whether true length differs for the two types of vitamin C intake. Compute also an approximate P-value.

Orange juice 8.5 9.7 9.9 10.0 10.3 14.8 15.5 16.4 17.9 21.8
Ascorbic acid 4.5 5.5 6.1 6.7 7.3 7.6 10.4 11.5 11.6 11.8

8. Reports are available on a study in which various measurements were taken both from a random sample of infants who had been exposed to household smoke and from a sample of unexposed infants. The accompanying data consists of observations on urinary concentration of cotinine, a major metabolite of nicotine. Does the data suggest that true average cotinine level is higher in exposed infants than in unexposed infants by more than 25? Carry out a test at significance level .05.

Unexposed 8 11 12 14 20 43 111
Exposed 35 56 83 92 128 150 176 208

9. A study reports the accompanying data on lead concentration in samples gathered during eight different summer rainfalls: 19.0, 23.4, 32.6, 7.0, 14.2, 13.8, 19.3, and 20.8. Assuming that the lead-content distribution is symmetric, use the Wilcoxon signed-rank interval to obtain a 95% CI for μ.

10. The following observations are amounts of hydrocarbon emissions resulting from road wear of bias-belted tires under a 522-kg load inflated at 228 kPa and driven at 64 km/hr for 6 hours: .048, .120, .065, and .075. What confidence levels are achievable for this sample size using the signed-rank interval? Select an appropriate confidence level and compute the interval.

11. Compute the 90% rank-sum CI for μ1 − μ2 using the following data:

Sample A: 229 286 245 299 250
Sample B: 213 179 163 247 225

12. Compute a 99% CI for μ1 − μ2 using the following data:

Sample A: 1.72 0.67 1.55 1.56 1.42 1.23 1.77 0.48
Sample B: 0.98 1.40 1.33 1.52 0.73 1.20

13. The accompanying data refers to concentration of the radioactive isotope strontium-90 in milk samples obtained from five randomly selected dairies in each of four different regions.

Region 1    7.0   6.4   7.1   8.3   6.7
Region 2    7.7  10.5  11.8  11.1   9.4
Region 3    6.3   6.5   8.8   7.2   5.7
Region 4   10.1  12.7  10.9  13.0  12.3

Test at level .10 to see whether true average strontium-90 concentration differs for at least two of the regions.

14. The accompanying data on cortisol level was reported in a research paper. Experimental subjects were pregnant women whose babies were delivered between 38 and 42 weeks gestation. Group 1 individuals elected to deliver by Caesarean section before labor onset, group 2 delivered by emergency Caesarean during induced labor, and group 3 individuals experienced spontaneous labor. Use the Kruskal-Wallis test at level .05 to test for equality of the three population means.

Group 1   267  312  216   328  459  344  309  159  292  361
Group 2   470  506  460   360  473  367
Group 3   348  777  212  1053  843  692

15. In a test to determine whether soil pretreated with small amounts of Basic-H makes the soil more permeable to water, soil samples were divided into blocks, and each block received each of the four treatments under study. The treatments were (A) water with .001% Basic-H flooded on control soil, (B) water without Basic-H on control soil, (C) water with Basic-H flooded on soil pretreated with Basic-H, and (D) water without Basic-H on soil pretreated with Basic-H. Test at level .01 to see whether there are any effects due to the different treatments.

Block     1     2     3     4     5     6     7     8     9    10
A      40.1  34.8  31.0  28.9  28.5  28.3  26.7  27.4  24.7  29.2
B      36.2  28.3  23.2  23.3  21.3  22.3  20.3  20.0  19.7  21.3
C      61.9  57.2  52.2  50.9  41.2  51.8  50.8  43.2  47.0  49.4
D      59.7  52.6  49.4  43.9  42.4  40.1  40.5  42.6  38.1  39.5

16. In an experiment to study the way in which different anesthetics affect plasma epinephrine concentration, ten dogs were selected and concentration was measured while they were under the influence of the anesthetics isoflurane, halothane, and cyclopropane. Test at level .05 to see whether there is an anesthetic effect on concentration.

Dog
1 2 3 4 5 6 7 8 9 10
Isoflurane .30 .53 1.02 .41 .31 .38 .34 .71 .19 .35
Halothane .32 .41 .65 .40 .23 .90 .41 .53 .34 .44
Cyclopropane 1.09 1.37 .71 .30 1.26 1.55 .51 .58 1.04 .32
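
For the signed-rank problems above, the test statistic s+ can be computed mechanically. The following is a minimal sketch using the radon data of problem 1 with null value 100. It assumes the usual conventions: tied absolute differences share the average of their ranks, and zero differences (none occur here) are dropped. The function names are hypothetical, and the rounding step is only a guard against floating-point noise when detecting ties.

```python
def midranks(values):
    """Ranks 1..n of values, with tied values sharing the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1   # mean of positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def signed_rank_statistic(data, mu0):
    """Sum of the ranks of |x - mu0| that belong to positive differences."""
    diffs = [round(x - mu0, 10) for x in data]   # rounding guards tie detection
    diffs = [d for d in diffs if d != 0]         # assumption: drop zero differences
    ranks = midranks([abs(d) for d in diffs])
    return sum(r for d, r in zip(diffs, ranks) if d > 0)


radon = [109.6, 94.9, 95.2, 100.9, 100.5, 95.3,
         105.1, 110.0, 103.6, 111.7, 107.3, 96.4]
s_plus = signed_rank_statistic(radon, 100)
print(s_plus)
```

The resulting value would then be compared with the tabulated critical value for n = 12, which this sketch does not attempt to reproduce.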

Chapter 16

COMPLETION

1. __________ are now used extensively in industry as diagnostic techniques for monitoring production processes to identify instability and unusual circumstances.

2. Sources of variation that may have a pernicious impact on the quality of items produced by some process, such as contaminated material, are referred to as __________ in the quality control literature.

3. In addition to the plotted points themselves (e.g., sample means or sample proportions), a control chart has a __________ and two __________.

4. When an out-of-control process produces a point inside the control limits, a type __________ error has occurred.

5. When an in-control process yields a point outside the control limits (an out-of-control signal), a type __________ error has occurred.

6. If the points on a control chart all lie between the two control limits, the process is deemed to __________.

7. Any point outside the lower control limit (LCL) and/or the upper control limit (UCL) of a 3-sigma chart suggests that the process may have been __________ at that time, so a search for __________ should be initiated.

8. An X̄ control chart is constructed using 25 samples of size 3 each, when the process is in control and the random variable of interest is normally distributed with a mean of 20 and a standard deviation of 1.732. Then the 3-standard-deviation control limits are LCL = __________ and UCL = __________.

9. The two control limits for a 3-sigma chart have been calculated. If the variable of interest is normally distributed, and 3 is replaced by 3.09 in the control limit formulas, then the probability of a Type I error __________, but for any fixed n and σ, the probability of a Type II error will __________.

ANS: decreases, increase

PTS: 1

10. The two control limits for a 3-sigma chart have been calculated. If the variable of interest is normally distributed, and 3 is replaced by 2.5 in the control limit formulas, then the probability of a Type I error __________, but for fixed n and σ, the probability of a Type II error will __________.

11. An control chart is based on control limits When the process is in control, then the probability that a point falls outside the limits is __________.

12. An control chart is based on control limits When the process is in control, then the average run length (ARL) is __________.

13. The sample variance S² is an unbiased estimator of the population variance σ²; that is, __________ = __________.

14. For a sample of size 5, the tabulated value of If the population standard deviation then the standard deviation of the sample standard deviation S is __________, rounded to 3 decimal places.

15. The 3-sigma lower control limit (LCL) for an S chart is given by LCL = s̄ − 3s̄√(1 − a_n²)/a_n, where the values of a_n for n = 3, ..., 8 are tabulated in your text. This expression for LCL will be negative if __________, in which case it is customary to use LCL = __________.

16. Suppose there are 25 samples obtained at equally spaced time points, and n = 5 observations in each sample. If the sum of the 25 sample standard deviations is 50, then the center line of the S chart will be at a height equal to __________.

17. Suppose there are 24 samples obtained at equally spaced time points, and n = 4 observations in each sample. If the sum of the 24 sample ranges is 108, then the center line of the R chart will be at a height equal to __________.

18. The 3-sigma lower control limit (LCL) for an R chart is given by LCL = r̄ − 3c_n r̄/b_n, where the values of b_n and c_n for n = 3, ..., 8 are tabulated in your text. This expression for LCL will be negative if __________, in which case it is customary to use LCL = __________.

19. The term __________ data is used in quality control to describe situations such as each item produced conforms to specifications or does not, or a single item (e.g., one automobile) may have one or more defects, and the number of defects is determined.

20. The c control chart for the number of defects in a single item (e.g., one automobile) or a group of items (e.g., blemishes on a set of four tires) is based on the __________ probability distribution.

21. Suppose that 25 samples, each of size 100, were selected from what is believed to be an in-control process, and that where is the fraction of defective items in sample i. The p chart for the fraction of defective items has its center line at height equals to __________.

22. If Y is a Poisson random variable with parameter μ, then E(Y) = __________, V(Y) = __________, and Y also has approximately a __________ distribution when μ is large.

23. The p control chart for the fraction of defective items produced is based on the __________ probability distribution.

24. If Y1, Y2, ..., Yn are independent Poisson variables with common parameter μ, then Y1 + Y2 + ... + Yn also has a Poisson distribution with parameter __________, where μ denotes the expected number of defects per unit.

25. The use of 3-sigma control limits is presumed to result in P(statistic < LCL) ≈ P(statistic > UCL) ≈ __________ when the process is in control.

26. The 3-sigma control limits of the p chart for the fraction of defective items in a process believed to be in control are LCL = .0125 and UCL = .1425. If the smallest and largest values of the fraction of defective items are .0062 and .1685, respectively, then the process is __________.

27. There are two equivalent versions of a cumulative sum procedure for a process mean, one __________ and the other __________.

28. At time r, a process is judged out of control if any of the cumulative sum plotted values lies outside the __________; either above the upper arm or below the lower arm.

29. If the size of a shift in a process mean is .40, then the customary value of the slope of the lower arm of the V-mask is __________.

30. The __________ is a chart developed by Kenneth Kemp that can be used to determine values of h and n that achieve a specified ARL (average run length).

31. The behavior of a sampling plan can be nicely summarized by graphing P(A), the probability that the lot is accepted, as a function of p, the proportion of defective items in the lot. Such a graph is called the __________ for the plan.

32. Let p denote the proportion of defective items in the lot, and P(A) denote the probability that the lot is accepted. Let us designate two different values of p, one for which P(A) is a specified value close to 1 and the other for which P(A) is a specified value near 0. These two values of p are often called the __________, denoted by AQL, and the __________, denoted by LTPD.

33. In a double-sampling plan, it is customary to terminate inspection of the second sample if the number of defectives is sufficient to justify rejection before all items have been examined. This is referred to as __________ in the second sample.

34. One important characteristic of a sampling plan with rectifying inspection is the __________, denoted by AOQ.

35. Under curtailment in a double-sampling plan, it can be shown that the expected number of items inspected in a __________ is smaller than the number of items examined in a __________ when the operating characteristic (OC) curves of the two plans are close to identical.

36. The average outgoing quality equals 0 when the proportion of defective items in the lot, p, is either __________ or __________.
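
Several of the completion items above (8 through 12) involve the same three quantities: control limits for a sample mean, the in-control probability of a point outside the limits, and the average run length. The sketch below shows one way to compute them, assuming a normally distributed variable with known mean and standard deviation and independent samples, so that the in-control ARL is the reciprocal of the false-alarm probability. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def xbar_limits(mu, sigma, n, k=3.0):
    """Lower and upper k-sigma control limits for the mean of n observations."""
    half_width = k * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width


def false_alarm_probability(k=3.0):
    """P(a point falls outside the k-sigma limits) for an in-control process."""
    return 2.0 * (1.0 - normal_cdf(k))


def average_run_length(p_signal):
    """Expected number of samples until a signal (independent samples assumed)."""
    return 1.0 / p_signal


lcl, ucl = xbar_limits(mu=20, sigma=1.732, n=3)
p = false_alarm_probability(3.0)
arl = average_run_length(p)
print(round(lcl, 2), round(ucl, 2), round(p, 4), round(arl))
```

Calling `false_alarm_probability` with 3.09 or 2.5 in place of 3 shows the direction of the change in the Type I error probability asked about in items 9 and 10.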

MULTIPLE CHOICE

1. Which of the following statements are not true?
a. Raising quality levels can lead to decreased costs, a greater degree of consumer satisfaction, and thus increased profitability.
b. Control charting is now used extensively in industry as a diagnostic technique for monitoring production processes to identify instability and unusual circumstances.
c. The basis for most control charts lies in our work concerning probability distributions of various statistics such as the sample mean and sample proportion.
d. All of the above statements are true.
e. None of the above statements are true.

2. Which of the following statements are not true?
a. Control charts and acceptance sampling plans were first developed in the 1960’s and 1970’s.
b. Statisticians and engineers have recently introduced many statistical methods for identifying types and levels of production inputs that will ensure high-quality output.
c. There is a large body of material known as “Taguchi methods”, named after Japanese engineer/statistician G. Taguchi.
d. All of the above statements are true.
e. None of the above statements are true.

3. Which of the following are not examples of assignable causes of variation?
a. Contaminated material
b. Incorrect machine settings
c. Environmental factors
d. Unusual machine tool wear
e. All of the above

4. Which of the following statements are true?
a. We might think of “natural random variation” as uncontrollable background noise.
b. Control charts provide a mechanism for recognizing situations where assignable causes may be adversely affecting product quality.
c. Once a control chart indicates an out-of-control situation, an investigation can be launched to identify causes and take corrective action.
d. A basic element of control charting is that samples have been selected from the process of interest at a sequence of time points.
e. All of the above statements are true.

5. Which of the following statements are not true?
a. The basis for the choice of a center line for a control chart is sometimes a target value or design specification, for example a desired value of the bearing diameter.
b. An “in-control” process is a process that “meets design specifications or tolerance”.
c. An in-control process is simply one whose behavior with respect to variation is stable over time, showing indications of unusual extraneous causes.
d. If the points on a control chart all lie between the two control limits, the process is deemed to be in control.
e. All of the above statements are true.

6. Which of the following statements are not true?
a. An in-control process is simply one that is operating in a stable fashion, reflecting only natural random variation.
b. An out-of-control “signal” occurs whenever a plotted point falls outside the two control limits.
c. There is a strong analogy between the logic of control charting and hypothesis testing. The null hypothesis here is that the process is out-of-control.
d. The two control limits are designed so that an in-control process generates very few false alarms, whereas a process not in control quickly gives rise to a point outside the limits.
e. All of the above statements are true.

7. Which of the following statements are not true?
a. To construct an $\bar{X}$ chart, there are two different commonly used methods for estimating the unknown process standard deviation $\sigma$: one based on the k sample standard deviations and the other on the k sample ranges.
b. In the case of a normal population distribution, the estimator of the unknown process standard deviation based on sample standard deviations S is more efficient than that based on the sample range.
c. The sample standard deviation S is an unbiased estimator of the population standard deviation $\sigma$; that is, $E(S) = \sigma$.
d. All of the above statements are true.
e. None of the above statements are true.

8. Which of the following statements are not true?
a. One important use of control charts is to see whether some measure of location of the variable’s distribution remains stable over time.
b. It is highly unlikely that, for an in-control process, the sample mean will fall within 3 standard deviations of the process mean.
c. The use of control charts based on 3 standard deviation limits is traditional, but tradition is certainly not inviolable.
d. All of the above statements are true.
e. None of the above statements are true.

9. Assume that for an in-control process, the random variable of interest X has a normal distribution with mean value $\mu$ and standard deviation $\sigma$. If $\bar{X}$ denotes the sample mean for a random sample of size n selected at a particular time, then
a.
b.
c. $\bar{X}$ has a normal distribution.
d. All of the above are true.
e. None of the above are true.

10. Suppose that at each of the time points 1, 2, 3, ..., a sample of size n is selected at random from a normal distribution with known mean $\mu$ and standard deviation $\sigma$. In order to construct a 3-sigma $\bar{X}$ chart, we need to
a. determine $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots$, the calculated values of the corresponding sample means.
b. plot the $\bar{x}$'s over time; that is, plot the points $(1, \bar{x}_1)$, $(2, \bar{x}_2)$, $(3, \bar{x}_3)$, and so on.
c. draw horizontal lines across the plot at $\mu \pm 3\sigma/\sqrt{n}$.
d. All of the above are needed.
e. Only A and B are needed.

11. Which of the following statements are true?
a. Generally speaking, a control chart will be effective if it gives very few out-of-control signals when the process is in control, but shows a point outside the control limits almost as soon as the process goes out of control.
b. One assessment of a control chart’s effectiveness is based on the notion of “error probabilities”.
c. The use of 3-sigma limits for an $\bar{X}$ chart makes it highly unlikely that an out-of-control signal will result from an in-control process.
d. One assessment of a control chart’s effectiveness involves the average run length (ARL) needed to observe an out-of-control signal.
e. All of the above statements are true.

12. The inability of $\bar{X}$ charts with 3-sigma limits to quickly detect small shifts in the process mean has prompted investigators to develop procedures that provide improved behavior in this respect. Which of the following conditions need to be satisfied for an appropriate intervention to take corrective action?
a. Two out of three successive points fall outside 2-sigma limits on the same side of the center line.
b. Four out of five successive points fall outside 1-sigma limits on the same side of the center line.
c. Eight successive points fall on the same side of the center line.
d. All of the above.
e. None of the above.

13. Which of the following statements are true?
a. It is important to ensure that a process is under control with respect to location (equivalently, central tendency) as well as variation.
b. Most practitioners recommend that control of a process be established on variation prior to constructing an $\bar{X}$ chart or any other chart for controlling location.
c. Charts for variation are based on the sample standard deviation S and also based on the sample range R.
d. Charts for variation that are based on the sample standard deviation S are generally preferred over charts that are based on the sample range R because the standard deviation gives a more efficient assessment of variation than does the range.
e. All of the above statements are true.

14. Suppose that k independently selected samples are available, each one consisting of n observations on a normally distributed variable. Denote the sample standard deviations by $s_1, s_2, \ldots, s_k$. The values $s_1, s_2, \ldots, s_k$ are plotted in sequence on an S chart. The center line of the chart will be at a height equal to the
a. average of the values $s_1, s_2, \ldots, s_k$.
b. range of the values $s_1, s_2, \ldots, s_k$.
c. standard deviation of the values $s_1, s_2, \ldots, s_k$.
d. first quartile of the values $s_1, s_2, \ldots, s_k$.
e. third quartile of the values $s_1, s_2, \ldots, s_k$.

15. Suppose there are 20 samples obtained at equally spaced time points, with n = 4 observations in each sample. If the sum of the 20 sample standard deviations is 42 and the tabulated value of $a_4$ is .921, then the 3-sigma control limits LCL and UCL for an S control chart are, respectively,
a. 1.062 and 3.138
b. 3.138 and 4.765
c. 0.0 and 4.765
d. -.354 and 4.554
e. -.565 and 4.765
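
The arithmetic in question 15 can be checked with a short Python sketch. It assumes the usual 3-sigma S chart limits, $\bar{s} \pm 3\bar{s}\sqrt{1 - a_n^2}/a_n$, with a negative lower limit replaced by 0, and it takes $\bar{s} = 42/20$ because 20 standard deviations are summed. The name `s_chart_limits` is hypothetical.

```python
import math

def s_chart_limits(s_bar, a_n):
    """3-sigma S chart limits (assumed form); a negative LCL is replaced by 0."""
    half_width = 3 * s_bar * math.sqrt(1 - a_n ** 2) / a_n
    return max(0.0, s_bar - half_width), s_bar + half_width

s_bar = 42 / 20  # average of the 20 sample standard deviations
lcl, ucl = s_chart_limits(s_bar, a_n=0.921)
print(round(lcl, 3), round(ucl, 3))
```

Truncating at 0 reflects that a standard deviation cannot be negative, so a negative computed LCL carries no information.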

16. Suppose there are 28 samples obtained at equally spaced time points, with n = 7 observations in each sample. If the sum of the 28 sample ranges is 98 and the tabulated values of $b_7$ and $c_7$ are 2.706 and .833, respectively, then the 3-sigma control limits LCL and UCL for an R control chart are, respectively,
a. 0.0 and 6.73
b. .27 and 6.73
c. -30.61 and 37.61
d. 0.0 and 37.61
e. None of the above answers are correct.
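
A similar sketch covers the R chart in question 16. It assumes limits of the form $\bar{r} \pm 3 c_n \bar{r}/b_n$, where the two tabulated constants (2.706 and .833 here) are taken to be the mean and standard deviation of the range of a standard normal sample. The name `r_chart_limits` is hypothetical.

```python
def r_chart_limits(r_bar, b_n, c_n):
    """3-sigma R chart limits (assumed form); a negative LCL is replaced by 0."""
    half_width = 3 * c_n * r_bar / b_n
    return max(0.0, r_bar - half_width), r_bar + half_width

r_bar = 98 / 28  # average of the 28 sample ranges
lcl_r, ucl_r = r_chart_limits(r_bar, b_n=2.706, c_n=0.833)
print(round(lcl_r, 2), round(ucl_r, 2))
```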

17. Which of the following statements are not correct?
a. For a 3-sigma $\bar{X}$ chart, where the process mean and standard deviation are known, the probability that a point on the chart falls above the UCL is .013, as is the probability that the point falls below the LCL.
b. For a 3-sigma S chart,
c. For a 3-sigma R chart,
d. Only B and C are not correct.
e. Only A and C are not correct.

18. Consider a sample of n items obtained at a particular time, and let X be the number of defectives and $\hat{p} = X/n$. Which of the following statements are not true?
a. E(X) = np
b. V(X) = p(1-p)
c.
d.
e. If $np \ge 10$ and $n(1-p) \ge 10$, $\hat{p}$ has approximately a normal distribution.

19. Suppose that 25 samples, each of size 200, were selected from what is believed to be an in-control process, and that $\hat{p}_i$ is the fraction of defective items in sample i. The 3-sigma control limits, LCL and UCL, of the p chart for the fraction of defective items are, respectively,
a. .0236 and .244
b. -.0462 and .0942
c. 0 and .0564
d. 0 and .0942
e. -.0084 and .0564

20. Assume that the total number of defects in 50 samples is 450. Which of the following statements are not true regarding the c chart for the number of defectives?
a. The center line of the chart is at height 9
b. The lower control limit (LCL) is 0
c. The upper control limit (UCL) is 18
d. All of the above statements are true
e. None of the above statements are true

21. Which of the following statements are true?
a. The p control chart for the fraction of items not conforming to specifications is based on the binomial probability distribution.
b. The c control chart for the number of defectives is based on the Poisson probability distribution.
c. If the random variable X has a binomial distribution with parameters n and p, then when p is small the transformation is recommended and that the points should be used to construct the p chart.
d. If the random variable X has a Poisson distribution with parameter and is small, then the transformation is recommended and that the points should be used to construct the c chart.
e. All of the above statements are correct

22. Which of the following statements are true?
a. The CUSUM procedures discussed in your text are used for controlling process location.
b. There are CUSUM procedures for controlling process variation.
c. There are CUSUM procedures for attribute data.
d. None of the above statements are true.
e. All of the above statements are true.

23. Which of the following statements are true?
a. A defect of the traditional $\bar{X}$ chart is its inability to detect a relatively small change in a process mean.
b. Whether a process is judged out of control at a particular time depends only on the sample at that time, and not on the past history of the process.
c. The computational version of a cumulative sum (CUSUM) procedure is used almost exclusively in practice, but the logic behind the procedure is most easily grasped by first considering the graphical form.
d. All of the above statements are true.
e. None of the above statements are true.

24. Which of the following statements are not true?
a. A particular V-mask is determined by specifying the “lead distance” d and “half angle” $\theta$, or equivalently, by specifying d and the length h of the vertical line segment from 0 to the lower (or to the upper) arm of the mask.
b. One method for deciding which V-mask to use involves specifying the size of a shift in the process mean that is of particular concern to an investigator, then the parameters of the mask are chosen to give desired values of the false-alarm probability and the probability of not detecting the specified shift, respectively.
c. One method for deciding which V-mask to use involves selecting the mask that yields specified values of the ARL (average run length) both for an in-control process and for a process in which the mean has shifted by a designated amount.
d. All of the above statements are true.
e. None of the above statements are true.

25. Which of the following statements are not true?
a. If we let $\Delta$ denote the size of a shift in a process mean that is to be quickly detected using a CUSUM procedure, then it is common practice to let $k = \Delta/2$, where k denotes the slope of the lower arm of the V-mask.
b. A quality control practitioner may specify a desired value of an ARL (average run length) when the process is in control.
c. A quality control practitioner may specify a desired value of an ARL (average run length) when the process is out of control because the mean has shifted by $\Delta$.
d. All of the above statements are true.
e. None of the above statements are true.

26. Which of the following statements are true?
a. Until quite recently, control chart procedures and acceptance sampling techniques were regarded by practitioners as equally important parts of quality control methodology, but this is no longer the case.
b. Acceptance sampling deals with what has already been produced and thus does not provide for any direct control over process quality.
c. The most straightforward type of acceptance sampling plan involves selecting a single random sample of size n and then rejecting the lot if the number of defectives in the sample exceeds a specified critical value c.
d. We want an operating characteristic (OC) curve that is higher for very small p (the proportion of defective items) and lower for larger p. This can be achieved by increasing the sample size n and the specified critical value c.
e. All of the above statements are true.

27. In acceptance sampling, the risk of accepting a poor quality lot is considered a
a. Type I error.
b. consumer’s risk.
c. producer’s risk.
e. None of the above.

28. Let the random variable X denote the number of defective items in the lot, A denote the event that the lot is accepted, and p denote the proportion of defective items in the lot. Which of the following statements is not true?
a. If the sample size n is large relative to the lot size N, then the probability of accepting the lot, P(A), is calculated using the hypergeometric distribution.
b. When the sample size n is small relative to the lot size N (the rule of thumb suggested in your text was $n \le .05N$), then the probability of accepting the lot, P(A), is calculated using the binomial distribution.
c. If the probability of accepting the lot, P(A), is large only when p is small (this, of course, depends on the specified critical value c), then the Poisson approximation to the binomial distribution is justified.
d. The larger the value of p, the larger the probability P(A) of accepting the lot.
e. All of the above statements are true.

29. In acceptance sampling, the risk of rejecting a good quality lot is considered a
a. Type II error
b. consumer’s risk
c. producer’s risk
e. None of the above

30. Consider a double-sampling plan with Which of the following equalities are not correct if the lot will be accepted?
a.
b.
c.
d.
e. None of the above are correct

31. Which of the following statements are not true?
a. One standard method for designing a double-sampling plan involves specifying values along with corresponding acceptance probabilities, and then finding a plan that satisfies these conditions.
b. The average outgoing quality (AOQ) is the short-run proportion of defective items among those sent on before the sampling plan is employed.
c. Because the average outgoing quality AOQ = 0 when either p = 0 or p = 1, it follows that there is a value of p between 0 and 1 for which AOQ is a maximum.
d. The maximum value of the average outgoing quality (AOQ) is called the average outgoing quality limit (AOQL).
e. It is common practice to select a sampling plan that has a specified average outgoing quality limit (AOQL) and, in addition, minimum average total number inspected (ATI) at a particular quality level p.

1. A control chart for thickness of rolled-steel sheets is based on an upper control limit of .0525 inch and a lower limit of .0475 inch. The first ten values of the quality statistic (in this case the sample mean thickness of n = 5 sample sheets) are .0508, .0495, .0504, .0503, .0514, .0500, .0487, .0502, .0507, and .0485. Comment on the behavior of this control chart.

2. A control chart for thickness of rolled-steel sheets is based on an upper control limit of .0525 inch and a lower limit of .0475 inch. Suppose the ten most recent values of the quality statistic are .0493, .0485, .0490, .0503, .0492, .0486, .0495, .0494, .0493, and .0488. Comment on the behavior of this control chart.

3. Suppose a control chart is constructed so that the probability of a point falling outside the control limits when the process is actually in control is .005.

a. What is the probability that ten successive points (based on independently selected samples) will be within the control limits?
b. What is the probability that 25 successive points will all lie within the control limits?
c. What is the smallest number of successive points plotted for which the probability of observing at least one outside the control limits exceeds .10?
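
The three parts of problem 3 reduce to powers of the in-control probability, as the following Python sketch illustrates. It assumes the plotted points are independent, as the problem states, and the variable names are illustrative only.

```python
p_out = 0.005       # P(point outside the limits | process in control)
p_in = 1 - p_out    # P(point inside the limits)

prob_10 = p_in ** 10   # all of 10 successive points inside the limits
prob_25 = p_in ** 25   # all of 25 successive points inside the limits

# Smallest number of points for which P(at least one outside) exceeds .10
n_points = 1
while 1 - p_in ** n_points <= 0.10:
    n_points += 1

print(round(prob_10, 4), round(prob_25, 4), n_points)
```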

4. In the case of known $\mu$ and $\sigma$, what control limits are necessary for the probability of a single point being outside the limits for an in-control process to be .003?

5. Consider a 3-sigma control chart with center line at $\mu_0$ and based on n = 5. Assuming normality, calculate the probability that a single point will fall outside the control limits when the actual process mean is

a.
b.
c.

6. The accompanying table gives sample means and standard deviations, each based on n = 6 observations of the refractive index of fiber-optic cable. Calculate the control limits for an $\bar{X}$ chart based on the sample standard deviations given, and comment on the behavior of the chart. [Hint: $\sum \bar{x}_i = 2317.07$ and $\sum s_i = 30.34$]

Day $\bar{x}$ s Day $\bar{x}$ s
1 96.43 0.75 13 96.63 1.48
2 97.06 1.34 14 96.50 0.80
3 98.34 1.60 15 97.22 1.42
4 96.42 1.22 16 96.55 1.65
5 95.99 1.18 17 96.01 1.58
6 96.52 1.27 18 95.39 0.98
7 96.08 1.16 19 96.58 1.21
8 96.48 0.79 20 95.47 1.30
9 97.02 1.28 21 97.38 0.88
10 95.55 1.14 22 96.85 1.43
11 96.29 1.37 23 96.64 1.59
12 96.80 1.40 24 96.87 1.52

7. Consider the control chart based on control limits

a. What is the ARL when the process is in control?
b. What is the ARL when n = 4 and the process mean has shifted to ?
c. How do the values of parts (a) and (b) compare to the corresponding values for a 3-sigma chart?

8. A manufacturer of dustless chalk instituted a quality control program to monitor chalk density. The sample standard deviations of densities of n = 8 chalk specimens were as follows:

.165 .231 .073 .165 .292 .371 .179 .234
.408 .207 .170 .249 .296 .138 .111 .076
.224 .335 .116 .204 .250 .232 .342 .307

Calculate limits for an S chart, and check for out-of-control points. If there is an out-of-control point, delete it and repeat the process.

9. Subgroups of power supply units are selected once each hour from an assembly line, and the high-voltage output of each unit is determined.

a. Suppose the sum of the resulting sample ranges for 30 subgroups, each consisting of four units, is 84. Calculate control limits for an R chart.
b. Repeat part (a) if each subgroup consists of eight units and the sum is 105.

10. On each of the previous 25 days, 100 electronic devices of a certain type were randomly selected and subjected to a severe heat stress test. The total number of items that failed to pass the test was 570.

a. Determine control limits for a 3-sigma p chart.
b. The highest number of failed items on a given day was 40, and the lowest number was 12. Does either of these correspond to an out-of-control point? Explain.
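
The calculation in problem 10 can be sketched in Python as follows, assuming the standard 3-sigma p chart limits $\bar{p} \pm 3\sqrt{\bar{p}(1-\bar{p})/n}$ with a negative lower limit replaced by 0. The name `p_chart_limits` is hypothetical.

```python
import math

def p_chart_limits(p_bar, n):
    """3-sigma p chart limits (assumed form); a negative LCL is replaced by 0."""
    half_width = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - half_width), p_bar + half_width

n = 100                   # devices tested per day
p_bar = 570 / (25 * n)    # overall fraction failing over the 25 days
lcl_p, ucl_p = p_chart_limits(p_bar, n)

# Flag the daily extremes (40 and 12 failures) that fall outside the limits
out_of_control = {x: not (lcl_p <= x / n <= ucl_p) for x in (40, 12)}
print(round(lcl_p, 4), round(ucl_p, 4), out_of_control)
```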

11. A sample of ROM computer chips was selected on each of 30 consecutive days, and the number of nonconforming chips on each day was as follows:

30 16 17 22 19 18 16 23 13 21
18 19 13 25 31 17 12 21 15 29
11 19 25 18 38 20 8 26 12 25

Determine control limits for a 3-sigma p chart and specify any out-of-control points.

12. When n = 180, what is the smallest value of $\bar{p}$ for which the LCL in a 3-sigma p chart is positive?

13. The accompanying observations are numbers of defects in 25 1-square-yard specimens of woven fabric of a certain type: 3, 7, 5, 3, 4, 2, 8, 4, 3, 3, 6, 6, 2, 3, 2, 4, 7, 3, 2, 3, 4, 1, 5, 4, 6. Determine control limits for a 3-sigma c chart for the number of defects, and discuss the behavior of the chart.
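
For problem 13, a minimal Python sketch of the c chart calculation is given below. It assumes the Poisson-based 3-sigma limits $\bar{x} \pm 3\sqrt{\bar{x}}$, with a negative lower limit replaced by 0. The variable names are illustrative only.

```python
import math

defects = [3, 7, 5, 3, 4, 2, 8, 4, 3, 3, 6, 6, 2,
           3, 2, 4, 7, 3, 2, 3, 4, 1, 5, 4, 6]

c_bar = sum(defects) / len(defects)    # center line
half_width = 3 * math.sqrt(c_bar)      # Poisson: variance equals the mean
lcl_c = max(0.0, c_bar - half_width)
ucl_c = c_bar + half_width

# (specimen number, count) pairs falling outside the limits
outside = [(i, x) for i, x in enumerate(defects, start=1)
           if not lcl_c <= x <= ucl_c]
print(c_bar, lcl_c, ucl_c, outside)
```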

14. For what values will the LCL in a 3-sigma c chart be negative?

15. Containers of a certain treatment for septic tanks are supposed to contain 16 oz of liquid. A sample of five containers is selected from the production line once each hour and the sample average content is determined. Consider the following results:

15.993 16.052 16.067 15.913 16.031 16.061 15.983 15.900
16.039 16.075 16.040 15.936 16.033 15.961 16.056

Using $\Delta$ = .10 and h = .20, employ the computational form of the CUSUM procedure to investigate the behavior of this process.
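
One way to organize the computation in problem 15 is sketched below in Python. Several choices here are assumptions rather than part of the problem statement: the target value is taken to be the nominal 16 oz, the value .10 is treated as the shift size, the reference value is k = .10/2 = .05, and the two one-sided sums are updated as d = max(0, d + (x̄ − (μ0 + k))) and e = max(0, e − (x̄ − (μ0 − k))), with a signal whenever either sum exceeds h. The function name `cusum` is hypothetical.

```python
def cusum(xbars, mu0, k, h):
    """Two-sided computational CUSUM (assumed form); returns path and signal times."""
    d = e = 0.0
    path, signals = [], []
    for t, x in enumerate(xbars, start=1):
        d = max(0.0, d + (x - (mu0 + k)))   # accumulates upward drift
        e = max(0.0, e - (x - (mu0 - k)))   # accumulates downward drift
        path.append((t, d, e))
        if d > h or e > h:
            signals.append(t)
    return path, signals

xbars = [15.993, 16.052, 16.067, 15.913, 16.031, 16.061, 15.983, 15.900,
         16.039, 16.075, 16.040, 15.936, 16.033, 15.961, 16.056]

path, signals = cusum(xbars, mu0=16.0, k=0.05, h=0.20)
max_d = max(d for _, d, _ in path)
max_e = max(e for _, _, e in path)
print(signals, round(max_d, 3), round(max_e, 3))
```

Under these assumptions neither cumulative sum comes close to h, so the sketch reports no out-of-control signal for the listed sample averages.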

16. The standard deviation of a certain dimension on an aircraft part is .005 cm. What CUSUM procedure will give an in-control ARL of 600 and an out-of-control ARL of 4 when the mean value of the dimension shifts by .004 cm?

17. When the out-of-control ARL corresponds to a shift of 1 standard deviation in the process mean, what are the characteristics of the CUSUM procedure that has ARLs of 250 and 4.8 for the in-control and out-of-control conditions, respectively?

18. A sample of 50 items is to be selected from a batch consisting of 5000 items. The batch will be accepted if the sample contains at most one defective item. Calculate the probability of lot acceptance, P(A), for p = .01, .02,…, .10, and sketch the OC curve.
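
The acceptance probabilities in problem 18 can be tabulated with a short Python sketch. It assumes the binomial approximation is adequate because the sample is only 1% of the lot. The name `accept_prob` is hypothetical.

```python
from math import comb

def accept_prob(n, c, p):
    """P(A) = P(X <= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(c + 1))

# OC curve values for n = 50, c = 1 at p = .01, .02, ..., .10
oc = {round(0.01 * i, 2): accept_prob(50, 1, 0.01 * i) for i in range(1, 11)}
for p, pa in oc.items():
    print(f"p = {p:.2f}  P(A) = {pa:.4f}")
```

Plotting the printed P(A) values against p gives the OC curve; the same function applies to other single-sample plans by changing n and c.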

19. A sample of 100 items is to be selected from a batch consisting of 5000 items. The batch will be accepted if the sample contains at most two defective items.

a. Calculate the probability of lot acceptance, P(A), for p = .01, .02, …, .05, and sketch the OC curve.
b. Sketch the OC curve in problem 18 above (with n = 50 and c = 1) and the OC curve in part (a) of this problem (with n = 100 and c = 2) on the same set of axes. Which of the two plans is preferable (leaving aside the cost of sampling) and why?

20. Consider the single-sample plan that utilizes n = 50 and c = 1 when N = 2000. Determine the values of AOQ and ATI for selected values of p and graph each of these against p. Also determine the value of AOQL.
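
A hedged Python sketch for problem 20 follows. It assumes rectifying inspection (rejected lots are fully inspected and defectives replaced), the approximation AOQ ≈ p·P(A), the formula ATI = n·P(A) + N·(1 − P(A)), and a binomial P(A). AOQL is approximated by a search over a grid of p values. The function names are hypothetical.

```python
from math import comb

def accept_prob(n, c, p):
    """P(A) = P(X <= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(c + 1))

def aoq(p, n, c):
    """Average outgoing quality, using the approximation p * P(A)."""
    return p * accept_prob(n, c, p)

def ati(p, n, c, lot_size):
    """Average total number inspected under rectifying inspection."""
    pa = accept_prob(n, c, p)
    return n * pa + lot_size * (1 - pa)

n, c, lot_size = 50, 1, 2000
grid = [i / 1000 for i in range(0, 201)]   # p = 0, .001, ..., .200

# Approximate AOQL: the largest AOQ over the grid, and the p where it occurs
p_at_max, aoql = max(((p, aoq(p, n, c)) for p in grid), key=lambda t: t[1])
print(round(p_at_max, 3), round(aoql, 4), round(ati(0.01, n, c, lot_size), 1))
```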