Q: But what about omitted variable bias? We can see from regression 2 that competitors’ advertising spending seems related to our copy and ad spending. Doesn't this mean that the estimated coefficients on copy and ad spending in regression 3 are biased?
A: Yes, we would expect the coefficient estimates to be biased. However, recall the nature of the bias: the coefficient estimates reflect not only the effect of the variable included in the regression (holding other included variables fixed) but also include the effect of any variables left out of the regression, to the extent that they are related to the included variable. Specifically, our coefficient estimate of -3.00 on Dum Copy in Regression 3 includes the effect of the expected change in competitors’ advertising spending when Nopane's copy changes. Since we are assuming that competitors will behave in the real campaign as they did in the experiment, this "combined effect" is exactly what we want to estimate, so the "bias" works in our favor.
Sidelight: How can we relate this combined effect to the other regressions? In Regression 1, the estimated coefficient on Dum Copy is 2.134. This says that if we use the "emotional" copy rather than the "rational" copy, and all the other variables (including competitors' behavior) are held fixed, we would expect 2.134 more sales per 100 prospects. But what does Regression 2 say is the effect on competitors' behavior of using "emotional" copy rather than "rational" copy? It says that on average we would expect them to spend 9.083 dollars more per 100 prospects when we use the "emotional" copy. What effect does this have on sales? From Regression 1, we expect this to change sales by -(0.5652)*9.083 = -5.134 units. So, the expected net effect of using "emotional" rather than "rational" copy and having our competitors react is 2.134 - 5.134 = -3.00 !
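The arithmetic in this sidelight is the usual omitted-variable decomposition: the coefficient in the regression that leaves out competitors' spending equals the direct effect plus the effect that runs through competitors' spending. A minimal Python sketch, using only the three estimates quoted above (the variable names are ours, not Kstat's):

```python
# Coefficient estimates quoted in the text; the names are illustrative.
direct_copy_effect = 2.134      # Dum Copy in Regression 1 (competitors held fixed)
competitor_ad_effect = -0.5652  # competitors' ad dollars in Regression 1
competitor_response = 9.083     # Dum Copy in Regression 2 (extra competitor dollars)

# Effect on sales that runs through the competitors' reaction.
indirect_effect = competitor_ad_effect * competitor_response

# Net effect of switching copy once competitors have reacted.
net_copy_effect = direct_copy_effect + indirect_effect

print(round(indirect_effect, 3))  # -5.134
print(round(net_copy_effect, 2))  # -3.0
```

The result matches the coefficient on Dum Copy in Regression 3 up to rounding.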
(Hard bonus sidelight to think about: What is wrong with using Regression 2 to predict competitors' advertising spending in response to our policy and then plug that value into Regression 1 to get a prediction for sales? Does this give the same predicted value? The same prediction interval?)
(Answer: Yes, it gives the same predicted value, but a smaller prediction interval, and that interval is not the correct one. It is smaller only because Regression 1 treats competitors' ad spending as known, including the part of its movement that is unrelated to the other variables (we can see there is such a part, since the R-squared in Regression 2 is only .571). But this is precisely the component of competitors' behavior that we do not know when we use Regression 2 to predict it. Thus we need to take into account the extra variability that comes from knowing the competitors' behavior imperfectly. Regression 3 does this.)
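One way to see the point about the interval is the following sketch. It rests on simplifying assumptions that are ours: the error terms of the two regressions are uncorrelated, and estimation error in the coefficients is ignored. Here X stands for the policy and segment variables, CompAd for competitors' ad spending, \beta_c for its coefficient in Regression 1, and \sigma_1^2 and \sigma_2^2 for the error variances of Regressions 1 and 2:

```latex
\begin{align*}
\text{Sales}  &= X b + \beta_c\,\text{CompAd} + e_1, & \operatorname{Var}(e_1) &= \sigma_1^2,\\
\text{CompAd} &= X g + e_2,                          & \operatorname{Var}(e_2) &= \sigma_2^2,\\
\text{Sales}  &= X\,(b + \beta_c g) + (\beta_c e_2 + e_1),
              & \operatorname{Var}(\beta_c e_2 + e_1) &= \beta_c^2\,\sigma_2^2 + \sigma_1^2 .
\end{align*}
```

The last line has the form of Regression 3. Its error variance exceeds that of Regression 1 by \beta_c^2\,\sigma_2^2, which is the variability that is dropped when a predicted value of competitors' spending is plugged into Regression 1 as if it were known.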
3. Answer all the questions in 2 assuming that Stanley Skamarycz's hypothesis is correct.
Here Regression 1 is the most appropriate choice, since Mr. Skamarycz believes (see p. 3 of the case) that he can accurately specify competitors' behavior ($19 per 100 prospects always) and this behavior differs substantially from that in the experiment. This is where the argument above about omitted variable bias does have bite. Once we know competitors' ad spending, we want to estimate the effects of our strategy holding the competitors’ ad spending fixed at the known level. The proper way to do this is by including competitors’ ad spending in the multiple regression. Regression 1 does this.
4. Given the data from the case (in nopane.xls), what national advertising strategy (i.e., copy and one of the three levels of ad spending) would you advocate? Assume an additional unit sold per 100 prospects over a six-month period yields a profit (net of production and delivery costs, but not net of advertising costs) of 10 dollars per 100 prospects over a six-month period. Provide support for your position.
As noted above, Regression 1 and Regression 3 have opposite (or indeterminate) implications for which ad copy to use. In both of these regressions, the coefficient on Nopane ad dollars is significantly greater than 0.10 (the break-even level of 1 extra unit expected for each 10 dollars in additional advertising) at a 5% level, thus both suggest setting ad spending as high as possible -- $8.00 per 100 prospects. It seems that the regressions don't give an unambiguous recommendation for copy, though they do for spending.
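The break-even comparison can be sketched as a one-sided t statistic against 0.10 rather than against zero. The Python below is only an illustration: the coefficient and standard error passed in are hypothetical placeholders, not the Kstat estimates, and the function name is ours.

```python
PROFIT_PER_UNIT = 10.0                    # dollars per extra unit sold (question 4)
BREAK_EVEN_SLOPE = 1.0 / PROFIT_PER_UNIT  # 0.10 extra units needed per ad dollar


def break_even_t_stat(coef, std_err, break_even=BREAK_EVEN_SLOPE):
    """t statistic for testing H0: coefficient <= break_even."""
    return (coef - break_even) / std_err


# Hypothetical estimate and standard error, for illustration only.
t_stat = break_even_t_stat(coef=0.35, std_err=0.05)
print(round(t_stat, 2))
```

The statistic is then compared with the one-sided 5% critical value of the t distribution with the regression's residual degrees of freedom.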
However, in thinking further about these regression specifications, it seems we may have left out some obvious effects. In particular, these regressions assume that the effect of advertising expenditure is the same no matter what ad copy we run. While this may be true, there is certainly no reason to think it must be. They also assume that advertising expenditure is equally effective across regions and that copy is equally effective across regions as well. We can modify our analysis to check for these effects by adding slope dummy variables to the regressions: specifically we add (Nopane ad dollars)*(Dum Copy), (Nopane ad dollars)*(Dum Region), and (Dum Copy)*(Dum Region). The modified Regression 1 is shown below:
Note that the coefficient on Nopane Ad*dum copy is statistically significant at the 5% level (p-value = 0.0235) despite the fact that multicollinearity has greatly reduced the precision of the estimate (VIF = 20.24). It seems that increased ad spending is really only helpful when running the "rational" copy and doesn't do much (or even hurts somewhat) when running the "emotional" copy. Below are the predictions for different policies (in Segment A) that are calculated using the above regression.
Policy: $8.00 ad spending, "rational" copy
Policy: $4.75 ad spending, "rational" copy
Policy: $2.50 ad spending, "rational" copy
Policy: $8.00 ad spending, "emotional" copy
Policy: $4.75 ad spending, "emotional" copy
Policy: $2.50 ad spending, "emotional" copy
These results suggest that the "rational" copy and the highest level of ad spending would lead to the highest sales. Doing the profit calculations with these sales numbers shows that this policy is also best in terms of profit: it is predicted to yield a profit of 39.55 * ($10) - $8.00 = $395.50 - $8.00 = $387.50 per 100 prospects in Segment A.
Note that we could have also added slope dummies involving the competitors’ ad spending. These do not come out significant and the regression with them leads to similar conclusions to the above.
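The slope dummies used in these modified regressions are simply products of columns already in the data. A minimal Python sketch of how one row of explanatory variables could be built for a given policy; the function name and column order are ours, Dum Copy = 1 denotes the "emotional" copy as in Regression 1, and the coding of the region dummy is left open:

```python
def design_row(ad_dollars, dum_copy, dum_region):
    """Return [const, ad, copy, region, ad*copy, ad*region, copy*region]."""
    return [
        1.0,                      # intercept
        ad_dollars,               # Nopane ad dollars per 100 prospects
        dum_copy,                 # 1 = "emotional" copy, 0 = "rational" copy
        dum_region,               # region/segment dummy
        ad_dollars * dum_copy,    # lets the ad-spending slope differ by copy
        ad_dollars * dum_region,  # lets the ad-spending slope differ by region
        dum_copy * dum_region,    # lets the copy effect differ by region
    ]


row = design_row(8.00, 1, 0)
print(row)  # [1.0, 8.0, 1, 0, 8.0, 0.0, 0]
```

With these columns, the slope on ad spending under the "emotional" copy is the sum of the coefficients on ad dollars and on the ad*copy product, which is why a negative interaction coefficient can cancel the benefit of extra spending.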
We can also add slope dummy variables to the specification in Regression 3 (appropriate if competitors' behavior is as in the experiment). The regression results are below:
Observe that the coefficient estimates are roughly similar to those in the regression we used for prediction above. In fact, if we calculate predictions for the various policies we again find that the policy which gives the highest predicted profit in Segment A is to use the "rational" copy and spend $8.00 per 100 prospects on advertising.
Policy: $8.00 ad spending, "rational" copy
Profit = 40.153*($10) - $8.00 = $401.53 - $8.00 = $393.53 per 100 prospects per six-months in Segment A.
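The same profit calculation applies to every policy, so it can be written once and reused. A small Python sketch (the function name is ours), checked against the figure above:

```python
PROFIT_PER_UNIT = 10.00  # dollars per extra unit sold per 100 prospects (question 4)


def profit_per_100_prospects(predicted_sales, ad_dollars):
    """Six-month profit per 100 prospects, net of advertising cost."""
    return predicted_sales * PROFIT_PER_UNIT - ad_dollars


print(round(profit_per_100_prospects(40.153, 8.00), 2))  # 393.53
```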
A similar analysis for Segment B territories also suggests that “rational” copy and $8.00 ad spending is best. Observe that with the more complete specification, we have also rendered moot the question of whether Ms. Silk's or Mr. Skamarycz's hypothesis about competitors' behavior is correct. Under either assumption, the data suggest that using the "rational" copy and a high level of ad spending is best.
5. Instead of a single national campaign, Ms. Silk knows that it would be possible (though more costly) to have one campaign for the East and West Coast states and another for the middle of the country. Comment on the desirability of splitting up the campaign.
It only makes sense to split up the campaign if the best strategy for the coasts is different from the best strategy for the middle of the country. Does this seem to be the case here? The analysis and calculations done in question 4 above suggest that it doesn’t make any difference: in both regions you would want to run the “rational” copy and spend $8.00 per 100 prospects.
Note: If we consider only the regressions given in the case write-up (Regressions 1 and 3), the segment information cannot change the ordering of the predicted performance of the different strategies, since the segment dummy variable does not interact with any of the policy variables. Why? Because the segment dummy is an intercept dummy variable, and its only effect is to move the regression up or down by a constant. If $8.00 spending and "rational" copy performs 0.35 units better on the coasts than in the middle, then the regression would also predict that $2.50 spending and "emotional" copy would perform 0.35 units better on the coasts than in the middle. Therefore, the regressions in the case write-up do not allow us to answer this question.
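The point about intercept dummies can be checked numerically. In the Python sketch below, all coefficients are hypothetical values chosen for illustration (they are not the case estimates); only the 0.35 shift and the three spending levels come from the text.

```python
from itertools import product

# Hypothetical coefficients, for illustration only (NOT the case estimates).
INTERCEPT, B_AD, B_COPY, B_SEGMENT = 20.0, 1.5, -3.0, 0.35


def predicted_sales(ad_dollars, dum_copy, dum_segment):
    """Model with an intercept dummy only: no segment interactions."""
    return INTERCEPT + B_AD * ad_dollars + B_COPY * dum_copy + B_SEGMENT * dum_segment


# Every combination of spending level and copy (0 = "rational", 1 = "emotional").
policies = list(product([2.50, 4.75, 8.00], [0, 1]))


def ranking(dum_segment):
    """Policies ordered from highest to lowest predicted sales in one segment."""
    return sorted(
        policies,
        key=lambda p: predicted_sales(p[0], p[1], dum_segment),
        reverse=True,
    )


print(ranking(0) == ranking(1))  # True
```

Whatever coefficients are used, the segment dummy adds the same constant to every policy, so the ordering is identical in both segments.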
The specifications used in question 4, which include variables that interact the region with the strategy (Nopane Ad spending * Dum Segment and Dum Copy * Dum Segment), did let us address this issue. If the best policy is going to change from region to region, the change must be captured by these slope dummy (interaction) terms.
Additional class question: Do the estimates of the coefficients on the interactive segment terms given above indicate a significant regional difference in relative strategy effectiveness?
Looking at the two regressions, the preliminary answer seems to be no. The t-statistics and p-values for these two variables indicate that neither can be shown to differ from zero at a reasonable level of significance. Since we are considering the joint significance of two variables, we should calculate an F-test as well. Kstat does this for us and gives the following output for the regression model with competitors' ad spending included:
Similarly, for the model without competitors' ad spending we get the following output:
The p-values of the F-tests in the two models are 0.3142 and 0.3086 respectively. So the F-tests are not significant at any reasonable level.
Thus, we don't see any strong evidence that the region of the country makes a statistically significant difference in the relative merits of different advertising strategies (copy and/or ad spending).
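For readers who want to reproduce the joint test outside Kstat, the partial F statistic compares the sum of squared errors with and without the two segment interaction terms. A minimal Python sketch; the function name is ours and the inputs in the example are hypothetical, not values from the case output:

```python
def partial_f_statistic(sse_restricted, sse_full, num_restrictions, df_resid_full):
    """F statistic for dropping `num_restrictions` variables from the full model.

    sse_restricted: SSE of the model without the tested variables
    sse_full:       SSE of the model that includes them
    df_resid_full:  residual degrees of freedom of the full model
    """
    numerator = (sse_restricted - sse_full) / num_restrictions
    denominator = sse_full / df_resid_full
    return numerator / denominator


# Hypothetical sums of squares, for illustration only.
print(partial_f_statistic(120.0, 100.0, 2, 20))  # 2.0
```

The statistic is referred to an F distribution with (number of restrictions, residual degrees of freedom of the full model) degrees of freedom; here the number of restrictions is 2, one for each interaction term.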
(Additional comment: In what sense did the competitors interfere with the experiment?
According to Regression 1, the estimated coefficient on Dum Copy is 2.134 with a standard error of 2.027 (p-value about 0.306). Thus the regression provides only weak evidence in favor of the "emotional" copy. Why such an insignificant coefficient estimate? It could be that the true coefficient is close to zero. However part of the explanation may well lie in the "interference" run by the competitors. How did they interfere with the experiment? By varying their advertising as a function of the Nopane ad copy and spending levels, they created correlation among the X variables in the regression. This can be seen in Regression 2, which shows that about 57% of the variation in competitors' ad spending can be explained by the variables manipulated in Ms. Silk's experiment. It can also be seen in the correlation matrix,
or by examining the variance inflation factors (VIFs) calculated by Kstat for the regression. Recall from the course packet that a VIF for a regression coefficient tells you how many times smaller the estimated variance of the coefficient would have been if there were no correlation in the X variables.
We see that Dum Copy has a VIF of 2.1, meaning that the variance of the coefficient estimate is more than twice as large (standard error is about 1.4 times as large) as it would have been with uncorrelated X's. So the standard error of Dum Copy has been inflated from 1.43 to 2.027 by multicollinearity. This is the sense in which the competitors' behavior "interfered" with the experiment: the correlation it created makes the coefficient estimates less accurate (i.e., have larger standard deviations) than they otherwise would have been.)
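The link between a VIF and the underlying correlation can be sketched in a few lines of Python. The VIF of a variable is 1 / (1 - R-squared), where the R-squared comes from regressing that variable on the other X variables, and the standard error is inflated by the square root of the VIF. The function names are ours; the example assumes that Regression 2 regresses competitors' ad spending on exactly the other explanatory variables of Regression 1, which the text suggests but does not state outright.

```python
import math


def vif_from_r_squared(r_squared):
    """VIF of a variable, given the R-squared from regressing it on the other X's."""
    return 1.0 / (1.0 - r_squared)


def std_err_without_collinearity(std_err, vif):
    """Standard error the coefficient would have had with uncorrelated X's."""
    return std_err / math.sqrt(vif)


# R-squared of Regression 2 as quoted in the text (competitors' ad spending
# regressed on the experimental variables).
print(round(vif_from_r_squared(0.571), 2))  # 2.33
```

Under that assumption, the variance of the coefficient on competitors' ad spending in Regression 1 is a little more than twice what it would have been with uncorrelated X's.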