An alternative way to specify conditional contrasts or comparisons is through the use of the argument to or , which amounts to specifying which factors are used as variables. For example, consider:
Then is the same as .
One may specify a list for , in which case separate runs are made with each element of the list. Thus, returns two sets of contrasts: comparisons of for each combination of the other two factors; and comparisons of combinations for each .
A shortcut that generates all simple main-effect comparisons is to use . In this example, the result is the same as obtained using .
Ordinarily, when is a list (or equal to ), a list of contrast sets is returned. However, if the additional argument is set to , they are all combined into one family:
The dots () in this result correspond to which simple effect is being displayed. If we re-run this same call with or omitted, these twenty comparisons would be displayed in three broad sets of contrasts, each broken down further by combinations of variables, each separately multiplicity-adjusted (a total of 16 different tables).
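Here is a minimal base-R sketch of conditional comparisons that does not use emmeans. It starts from a hypothetical table of cell means for two factors, `A` and `B`. The values are invented, not data from this vignette. It forms every pairwise difference among the levels of one factor separately within each level of the other, in both directions. This mirrors the idea of running simple comparisons for each factor in turn. Unlike emmeans, it produces point estimates only, with no standard errors, degrees of freedom, or multiplicity adjustments.

```r
# Hypothetical cell means: rows are levels of A, columns are levels of B
means <- matrix(c(10, 12, 15,
                  11, 11, 18),
                nrow = 3,
                dimnames = list(A = c("a1", "a2", "a3"), B = c("b1", "b2")))

# All pairwise differences among the elements of a named vector
pairwise_diffs <- function(x) {
  idx <- combn(length(x), 2)
  setNames(x[idx[1, ]] - x[idx[2, ]],
           paste(names(x)[idx[1, ]], "-", names(x)[idx[2, ]]))
}

# Comparisons of A levels within each level of B
A_within_B <- lapply(colnames(means), function(b) pairwise_diffs(means[, b]))
names(A_within_B) <- colnames(means)

# Comparisons of B levels within each level of A
B_within_A <- lapply(rownames(means), function(a) pairwise_diffs(means[a, ]))
names(B_within_A) <- rownames(means)

print(A_within_B)
print(B_within_A)
```

Each list element is one "family" of comparisons. Running both directions and binding them together corresponds to combining the sets into one family.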
An interaction contrast is a contrast of contrasts. For instance, in the auto-noise example, we may want to obtain the linear and quadratic contrasts of separately for each , and compare them. Here are estimates of those contrasts:
The comparison of these contrasts may be done using the argument in as follows:
The practical meaning of this is that there isn’t a statistical difference in the linear trends, but the quadratic trend for Octel is greater than for standard filter types. (Both quadratic trends are negative, so in fact it is the standard filters that have more pronounced curvature, as is seen in the plot.) In case you need to understand more clearly what contrasts are being estimated, the method helps:
Note that the 4th through 6th contrast coefficients are the negatives of the 1st through 3rd – thus a comparison of two contrasts.
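The coefficient pattern just described can be reproduced with base R's `contr.poly()`. This sketch assumes a three-level ordered factor crossed with a two-level factor such as filter type, with the three-level factor varying fastest in the cell ordering. The cell means are hypothetical, not the auto-noise estimates. The Kronecker product of `c(1, -1)` with a polynomial contrast gives the "contrast of contrasts", so the last three coefficients are the negatives of the first three.

```r
# Orthogonal polynomial coefficients for a 3-level ordered factor
poly3 <- contr.poly(3)                 # columns ".L" (linear) and ".Q" (quadratic)
lin  <- poly3[, ".L"]
quad <- poly3[, ".Q"]

# Contrast of contrasts: polynomial contrast in group 1 minus the same contrast in group 2.
# With the 3-level factor varying fastest, kronecker(c(1, -1), x) equals c(x, -x).
lin_diff  <- as.vector(kronecker(c(1, -1), lin))
quad_diff <- as.vector(kronecker(c(1, -1), quad))
print(round(rbind(linear = lin_diff, quadratic = quad_diff), 3))

# Hypothetical cell means: levels 1-3 in group 1, then levels 1-3 in group 2
cell_means <- c(82, 83, 77, 81, 82, 78)
est <- c(linear    = sum(lin_diff  * cell_means),
         quadratic = sum(quad_diff * cell_means))
print(est)
```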
By the way, “type III” tests of interaction effects can be obtained via interaction contrasts:
This result is exactly the same as the test of in the output.
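Why does this agreement hold? The F test for an interaction is the joint test that every interaction contrast is zero. In a linear model, that is the same as comparing the additive model against the full model. The base-R sketch below checks this on simulated data, using hypothetical factors `A` and `B`. The model comparison reproduces the interaction line of `anova()` for the full model. It covers only the highest-order term of a two-factor model, where sequential and Type III tests coincide. It is not a general Type III implementation.

```r
set.seed(1)
# Balanced two-factor layout with 4 replicates per cell (hypothetical factors A and B)
dat <- expand.grid(A = factor(c("a1", "a2", "a3")),
                   B = factor(c("b1", "b2")),
                   rep = 1:4)
dat$y <- rnorm(nrow(dat), mean = as.integer(dat$A) * (dat$B == "b2"))

full     <- lm(y ~ A * B, data = dat)
additive <- lm(y ~ A + B, data = dat)

# Joint test that all (3 - 1) * (2 - 1) = 2 interaction contrasts are zero
cmp <- anova(additive, full)
# Interaction line from the ANOVA table of the full model
tab <- anova(full)

F_comparison <- cmp$F[2]
F_table      <- tab["A:B", "F value"]
print(c(comparison = F_comparison, table = F_table))
```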
The three-way interaction may be explored via interaction contrasts too:
One interpretation of this is that the comparison by of the linear contrasts for is different on the left side than on the right side; but the comparison of that comparison of the quadratic contrasts, not so much. Referring again to the plot, this can be discerned as a comparison of the interaction in the left panel versus the interaction in the right panel.
Finally, emmeans provides a function that obtains and tests the interaction contrasts for all effects in the model and compiles them in one Type-III-ANOVA-like table:
You may even add variable(s) to obtain separate ANOVA tables for the remaining factors:
Consider the dataset included with the package. These data concern the sales of two varieties of oranges. The prices ( and ) were experimentally varied in different stores and different days, and the responses and were observed. Let’s consider three multivariate models for these data, with additive effects for days and stores, and different levels of fitting on the prices:
Because this is a multivariate model, emmeans methods will distinguish the responses as if they were levels of a factor, which we will name “variety”. An interesting way to view these models is to look at how they predict sales of each variety at each observed value of the prices:
The trends portrayed here are quite sensible. In the left panel, as we increase the price of variety 1, sales of that variety will tend to decrease, and the decrease will be faster when the other variety of oranges is low-priced. In the right panel, as the price of variety 1 increases, sales of variety 2 will increase when variety 2 is low-priced, but could also decrease at high prices because oranges in general are just too expensive. A plot like this for will be similar but all the curves will be straight lines; and the one for will have all lines parallel. In all models, though, there are implied and interactions, because we have different regression coefficients for the two responses.
Which model should we use? They are nested models, so they can be compared by :
It seems like the full-quadratic model has little advantage over the interaction model. Strict .05-significance people would, I suppose, settle on the additive model. I like , but what follows could be done with any of them.
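Here is a base-R sketch of what such a nested comparison looks like. It uses simulated data with hypothetical predictors `p1` and `p2` and a single response. It fits additive, interaction, and full-quadratic models in the spirit of the three orange models. The real models are multivariate and also include day and store effects, which this sketch leaves out. Passing nested `lm` fits to `anova()` gives a sequence of F tests, each testing the terms added at that step.

```r
set.seed(42)
n  <- 60
p1 <- runif(n, 20, 60)   # hypothetical price of item 1
p2 <- runif(n, 20, 60)   # hypothetical price of item 2
y  <- 10 - 0.1 * p1 + 0.05 * p2 + 0.002 * p1 * p2 + rnorm(n)

m_add  <- lm(y ~ p1 + p2)                         # additive
m_int  <- lm(y ~ p1 * p2)                         # adds the p1:p2 product
m_quad <- lm(y ~ p1 * p2 + I(p1^2) + I(p2^2))     # full quadratic

# Each row after the first tests the terms added relative to the previous model
comparison <- anova(m_add, m_int, m_quad)
print(comparison)
```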
To summarize and test the results compactly, it makes sense to obtain estimates of a representative trend in each of the left and right panels, and perhaps to compare them. In turn, that can be done by obtaining the slope of the curve (or line) at the average value of . The function is designed for exactly this kind of purpose. It uses a difference quotient to estimate the slope of a line fitted to a given variable. It works just like except for requiring the variable to use in the difference quotient. Using the model:
From this, we can say that, starting with and both at their average values, we expect to decrease by about .75 per unit increase in (statistically significant since the confidence interval excludes zero); meanwhile, a slight increase (but not significant) of may occur. Marginally, the first variety has a .89 disadvantage relative to sales of the second variety.
Other analyses (not shown) with set at a higher value will reduce these effects, while setting lower will exaggerate all these effects. If the same analysis is done with the quadratic model, the trends are curved, and so the results will depend somewhat on the setting for . The graph above gives an indication of the nature of those changes.
Similar results hold when we analyze the trends for :
At the averages, increasing the price of variety 2 has the effect of decreasing sales of variety 2 while slightly increasing sales of variety 1 – a marginal difference of about .92.
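The difference-quotient idea behind these trend estimates can be sketched in base R. This is an illustration under stated assumptions, not the emmeans implementation. The sketch simulates two responses, `y1` and `y2`, from hypothetical prices `p1` and `p2`, and fits a multivariate `lm` with a `p1 * p2` interaction. It then estimates the slope in `p1` for each response with a central difference quotient, holding `p2` at its mean. Both the central form and the step `h = 1e-3` are choices made for this sketch. Because the model is linear in `p1`, the result matches the exact slope, the `p1` coefficient plus the `p1:p2` coefficient times mean(`p2`). The difference between the two slopes is the analogue of the marginal difference between varieties discussed above.

```r
set.seed(7)
n  <- 80
p1 <- runif(n, 20, 60)   # hypothetical price 1
p2 <- runif(n, 20, 60)   # hypothetical price 2
y1 <- 20 - 0.6 * p1 + 0.3 * p2 - 0.004 * p1 * p2 + rnorm(n)
y2 <-  5 + 0.2 * p1 - 0.5 * p2 + 0.003 * p1 * p2 + rnorm(n)

# Multivariate fit: one set of coefficients per response column
fit <- lm(cbind(y1, y2) ~ p1 * p2)

# Central difference quotient in p1 at (x1, x2); returns one slope per response
slope_p1 <- function(fit, x1, x2, h = 1e-3) {
  up   <- predict(fit, newdata = data.frame(p1 = x1 + h, p2 = x2))
  down <- predict(fit, newdata = data.frame(p1 = x1 - h, p2 = x2))
  (up - down) / (2 * h)
}

slopes <- slope_p1(fit, mean(p1), mean(p2))
print(slopes)
print(slopes[1, "y1"] - slopes[1, "y2"])   # difference between the two trends

# Exact slope for this model: coefficient of p1 plus the p1:p2 coefficient times p2
b     <- coef(fit)
exact <- b["p1", ] + b["p1:p2", ] * mean(p2)
print(exact)
```

Re-running `slope_p1()` with `x2` set above or below `mean(p2)` shows how the trends shift with the other price, as described above.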
Interactions, by nature, make things more complicated. One must resist pressures and inclinations to try to produce simple bottom-line conclusions. Interactions require more work and more patience; they require presenting more cases (more than are presented in the examples in this vignette) in order to provide a complete picture.
I sure wish I could ask some questions about how these data were collected; for example, are these independent experimental runs, or are some cars measured more than once? The model is based on the independence assumption, but I have my doubts.
This (a) doesn’t change the sample size ratio very much at all and (b) would be considered a very strong effect in most fields.
Something crazy is going on: if I swap the labels for Male and Female (resulting in a different 0/1 dummy coding), my *power* drops from 70% to 35%:
The basic function is this:
sim1 = function(n=40, mu=0.6343) {
  a = rnorm(n, 0.0, 1)
  b1 = rnorm(n/2, mu - mu/4, 1)
  b2 = rnorm(n/2, mu + mu/4, 1)  # interaction is mu/2
  dat = data.frame(Group = c(rep("C", n), rep("T", n)),
                   Sex = rep(c(rep("M", n/2), rep("F", n/2)), 2),
                   Val = c(a, b1, b2))
  fit = lm(Val ~ Group*Sex, dat)
  return(summary(fit)$coefficients)
}
Where is my error? It can’t be that an arbitrary label (and resulting positive or negative coefficient) halves my power from 70% to 35%!
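The sketch below shows one common way to turn a function like `sim1` into a power estimate. It repeats the simulation and records how often the interaction's two-sided p-value falls below .05. Several details are assumptions of this sketch, not the commenter's exact procedure: the number of replications (`n_sims`), the seed, and reading the fourth coefficient row, which is the interaction under R's default treatment coding.

```r
# sim1 repeated here (straight quotes, ASCII minus signs) so the sketch runs on its own
sim1 <- function(n = 40, mu = 0.6343) {
  a  <- rnorm(n, 0.0, 1)
  b1 <- rnorm(n/2, mu - mu/4, 1)
  b2 <- rnorm(n/2, mu + mu/4, 1)   # interaction is mu/2
  dat <- data.frame(Group = c(rep("C", n), rep("T", n)),
                    Sex   = rep(c(rep("M", n/2), rep("F", n/2)), 2),
                    Val   = c(a, b1, b2))
  fit <- lm(Val ~ Group * Sex, dat)
  summary(fit)$coefficients
}

set.seed(2024)
n_sims <- 500
# Row 4 of the coefficient table is the interaction under default treatment coding
p_int <- replicate(n_sims, sim1()[4, "Pr(>|t|)"])
power_interaction <- mean(p_int < 0.05)
print(power_interaction)
```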
Thanks for taking a look at this. I’m not sure what’s going on; it seems to have something to do with how lm is treating the factors. You can change only the ordering of the factor levels and see that the estimate of the effect (and associated test statistics) for the other factor level changes:
set.seed(1234)
n = 4
mu = 1
a = rnorm(n, 0.0, 1)
b1 = rnorm(n/2, mu - mu/4, 1)
b2 = rnorm(n/2, mu + mu/4, 1)  # interaction is mu/2

dat1 = dat2 = data.frame(Group = c(rep("C", n), rep("T", n)),
                         Sex = rep(c(rep("F", n/2), rep("M", n/2)), 2),
                         Val = c(a, b1, b2))

# Now male is the reference
dat2$Sex = factor(dat2$Sex, levels = c("M", "F"))

fit1 = lm(Val ~ Group*Sex, dat1)
fit2 = lm(Val ~ Group*Sex, dat2)

# Results
dat1
dat2

summary(fit1)
summary(fit2)
> dat1
  Group Sex        Val
1     C   F -1.2070657
2     C   F  0.2774292
3     C   M  1.0844412
4     C   M -2.3456977
5     T   F  1.1791247
6     T   F  1.2560559
7     T   M  0.6752600
8     T   M  0.7033681
> dat2
  Group Sex        Val
1     C   F -1.2070657
2     C   F  0.2774292
3     C   M  1.0844412
4     C   M -2.3456977
5     T   F  1.1791247
6     T   F  1.2560559
7     T   M  0.6752600
8     T   M  0.7033681
>
> summary(fit1)

Call:
lm(formula = Val ~ Group * Sex, data = dat1)

Residuals:
       1        2        3        4        5        6        7        8
-0.74225  0.74225  1.71507 -1.71507 -0.03847  0.03847 -0.01405  0.01405

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.4648     0.9346  -0.497    0.645
GroupT        1.6824     1.3218   1.273    0.272
SexM         -0.1658     1.3218  -0.125    0.906
GroupT:SexM  -0.3625     1.8692  -0.194    0.856

Residual standard error: 1.322 on 4 degrees of freedom
Multiple R-squared:  0.4079,  Adjusted R-squared:  -0.03622
F-statistic: 0.9184 on 3 and 4 DF,  p-value: 0.5081

> summary(fit2)

Call:
lm(formula = Val ~ Group * Sex, data = dat2)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.6306     0.9346  -0.675    0.537
GroupT        1.3199     1.3218   0.999    0.374
SexF          0.1658     1.3218   0.125    0.906
GroupT:SexF   0.3625     1.8692   0.194    0.856
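The same point can be checked across many simulated data sets. The base-R sketch below uses a hypothetical helper, `fit_both`, and illustrative simulation settings. It fits each data set twice, once with F and once with M as the reference level of Sex, and tallies how often each coefficient has p < .05. Under treatment coding, the `GroupT` coefficient is the Group effect at the reference level of Sex. In these data the Group effect differs between the sexes by construction (that difference is the interaction), so the `GroupT` test depends on which sex is the reference. The interaction t statistic only changes sign, so its two-sided p-value is the same under either labeling. This is one way a relabeling can move an estimated power figure.

```r
# Hypothetical helper: simulate one data set like sim1 and fit it under both reference levels
fit_both <- function(n = 40, mu = 0.6343) {
  a  <- rnorm(n, 0, 1)
  b1 <- rnorm(n/2, mu - mu/4, 1)
  b2 <- rnorm(n/2, mu + mu/4, 1)
  dat <- data.frame(Group = factor(c(rep("C", n), rep("T", n))),
                    Sex   = rep(c(rep("M", n/2), rep("F", n/2)), 2),
                    Val   = c(a, b1, b2))
  dat_F <- transform(dat, Sex = factor(Sex, levels = c("F", "M")))  # F as reference
  dat_M <- transform(dat, Sex = factor(Sex, levels = c("M", "F")))  # M as reference
  pF <- summary(lm(Val ~ Group * Sex, dat_F))$coefficients[, "Pr(>|t|)"]
  pM <- summary(lm(Val ~ Group * Sex, dat_M))$coefficients[, "Pr(>|t|)"]
  # Row 4 is the GroupT:Sex interaction in both fits
  c(group_F = pF[["GroupT"]], group_M = pM[["GroupT"]],
    inter_F = pF[[4]],        inter_M = pM[[4]])
}

set.seed(99)
res   <- t(replicate(500, fit_both()))
power <- colMeans(res < 0.05)   # share of runs with p < .05, per coefficient and coding
print(round(power, 3))
```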
I absolutely agree as well. He should also take responsibility for all of the reckless posts that include cat pictures. These could be easily misleading to cat lovers who don’t carefully read the blog posts.
I absolutely agree. As if this whole hysterical obsession with cats hasn’t caused enough damage already!
That’s right! Now every time I want to publish a picture of my cat, the editor refuses to do so because he wants it from at least 100 angles. No wait, it’s actually a bad example that has nothing to do with the subject. I guess that means that you’re right again, of course. Yes, I need 10000 people in my sample for an interaction.
Think:
As I wrote above, I understand that my statistical advice can be upsetting because I’m a bearer of bad tidings. All I can tell you is that these issues have confused a lot of people for a long time, which is one reason why the replication crisis is a crisis and is one reason why people keep being surprised that “p less than 0.05” results aren’t getting replicated. Getting angry is easy but it won’t help you understand the world better in a replicable way.
Think again: what does “effect size of statistical significance” mean? Are you saying that too much emphasis on power is resulting in students confusing effect size with significance?
How do you think power should be taught, relative to how it is currently taught?
“Rather, my objection (1) was that the title is misleading because it’s easily misinterpreted by readers who don’t carefully read the blog post, which is almost certainly a large majority”
Seems a bit harsh, and a bit hard to believe. Who goes to a technical statistical methods blog and skims the titles?
Another nitpick: the order of the arguments in the pnorm functions is wrong, isn’t it? It should be q, mean, sd, not mean, q, sd.
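The nitpick is easy to check. R's signature is `pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)`. For a normal distribution, swapping `q` and `mean` returns the complementary tail probability, so the mistake silently turns a lower-tail probability into an upper-tail one. The numbers below are arbitrary.

```r
q <- 1.2
m <- 0.5
s <- 0.8
right   <- pnorm(q, mean = m, sd = s)   # P(X <= q) for X ~ N(m, s^2)
swapped <- pnorm(m, q, s)               # arguments in the wrong order: mean, q, sd
print(c(right = right, swapped = swapped, sum = right + swapped))
```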
“He’s an angry person”; “I’m a very anxious person.” We’ve all made statements like these. They point towards the belief that emotions are hardwired in our brains or automatically triggered by events. But after decades of research at Northeastern University, neuroscientist Lisa Feldman Barrett has come to a different conclusion: “Your brain’s most important job is not thinking or feeling or even seeing, but keeping your body alive and well so that you survive and thrive … How is your brain able to do this? Like a sophisticated fortune-teller, your brain constantly predicts. Its predictions ultimately become the emotions you experience and the expressions you perceive in other people.” (For an overview of her theory, watch her TED Talk.) And that’s good news: since our brain essentially constructs our emotions, we can teach it to label them more precisely and then use this detailed information to help us take the most appropriate actions, or none at all. Here, she explains how to do this.
One of the best things you can do for your emotional health is to beef up your concepts of emotions. Suppose you knew only two emotion concepts: “Feeling Awesome” and “Feeling Crappy.” Whenever you experienced an emotion or perceived someone else as emotional, you’d categorize only with this broad brush, which isn’t very emotionally intelligent. But if you could distinguish finer meanings within “Awesome” (happy, content, thrilled, relaxed, joyful, hopeful, inspired, prideful, adoring, grateful, blissful . . .), and fifty shades of “Crappy” (angry, aggravated, alarmed, spiteful, grumpy, remorseful, gloomy, mortified, uneasy, dread-ridden, resentful, afraid, envious, woeful, melancholy . . .), your brain would have many more options for predicting, categorizing and perceiving emotions, providing you with the tools for more flexible and useful responses. You could predict and categorize your sensations more efficiently and better suit your actions to your environment.
People who can construct finely-grained emotional experiences go to the doctor less frequently, use medication less frequently, and spend fewer days hospitalized for illness.
What I’m describing is emotional granularity, the phenomenon that some people construct finer-grained emotional experiences than others do. People who make highly granular experiences are emotion experts: they issue predictions and construct instances of emotion that are finely tailored to fit each specific situation. At the other end of the spectrum are young children who haven’t yet developed adult-like emotion concepts and who use “sad” and “mad” interchangeably. My lab has shown that adults run the whole range from low to high emotional granularity. So, a key to real emotional intelligence is to gain new emotion concepts and hone your existing ones.