Wednesday, May 13, 2009

Nonresponse


People who respond to surveys and people who don't are sometimes very different, including on the key variables researchers are studying. Let's say you're studying people's opinions about gun control. People for whom this issue is very important – say, National Rifle Association members (let's say they're strongly against gun control) and relatives of gun violence victims (let's say they're strongly for it) – may be much more likely to respond to questions about gun control. Because of this, surveys ostensibly based on samples of the whole U.S. population may artificially suggest the public is polarized on the issue. There may, in fact, be lots of people who don't have strong feelings for or against gun control. That's hard to notice if response rates among NRA members and relatives of gun violence victims are much higher than response rates among other people.

But how do you figure this out, and what do you do about it? The first paid workshop I went to at the American Association for Public Opinion Research conference on the beach in Hollywood (FL) this week dealt with exactly this.

One of the key would-be presenters skipped out because the Obama Administration had just appointed him to direct the U.S. Census Bureau. A University of Nebraska professor substituted for him, joining a researcher from a private research firm.

A key point Michael Brick and Kristen Olson made was that many survey researchers focus on response rates (the percentage of people you asked to respond to a survey who actually did so) as a proxy for nonresponse bias, the error due to nonresponse. But they cited a 2008 study showing that studies with all different kinds of response rates can suffer from large nonresponse bias. Even in surveys with response rates of 60, 70, 80, or 90 percent, the people who respond and the people who don't may be very different, even on the key variables the researchers are interested in. Conversely, in surveys with response rates of 10, 20, or 30 percent, respondents and nonrespondents may not be very different from each other on the relevant key variables. In general, that study showed that response rates (or, inversely, nonresponse rates) are a very imperfect indicator of nonresponse bias.
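
To make that concrete, here's a quick sketch in Python, with numbers I made up, using the deterministic bias approximation that comes up again near the end of this post: a survey with an 80 percent response rate can carry more nonresponse bias than one with a 25 percent response rate if its respondents differ more from its nonrespondents.

```python
# Invented numbers: response rate alone does not determine nonresponse bias.
# Deterministic approximation:
#   bias ~= (1 - response_rate) * (respondent mean - nonrespondent mean)

surveys = {
    "A (80% response)": {"rr": 0.80, "y_resp": 0.70, "y_nonresp": 0.40},
    "B (25% response)": {"rr": 0.25, "y_resp": 0.52, "y_nonresp": 0.50},
}

for name, s in surveys.items():
    bias = (1 - s["rr"]) * (s["y_resp"] - s["y_nonresp"])
    print(f"{name}: estimated bias = {bias:+.3f}")
# A: +0.060, B: +0.015 -- the high-response-rate survey is the more biased one here.
```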


A strategy one of my colleagues (Jack) often suggests for an initial assessment of nonresponse bias is comparing results for a key variable on one survey with results from another, perhaps better, survey. The workshop leaders suggested comparing results on, for example, age for a general survey of the U.S. population that one might be working on with the age distribution that comes out of the Census Bureau's American Community Survey. In my office we can compare results for surveys of Presbyterian congregations (as answered by their leaders) with Office of the General Assembly data (which actually comes from another survey of congregations, the Session Annual Statistical Report).
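
As a sketch of what such a benchmark comparison might look like (all of these percentages are invented, not real ACS or survey figures):

```python
# Hypothetical check of a survey's age distribution against an external
# benchmark such as the American Community Survey. All percentages invented.
survey_pct    = {"18-34": 14, "35-54": 31, "55-74": 38, "75+": 17}
benchmark_pct = {"18-34": 27, "35-54": 35, "55-74": 27, "75+": 11}

for age in survey_pct:
    diff = survey_pct[age] - benchmark_pct[age]
    print(f"{age:>6}: survey {survey_pct[age]:>2}%, benchmark "
          f"{benchmark_pct[age]:>2}%, difference {diff:+d} points")
# Big gaps on a benchmark variable hint that respondents differ from the
# population -- and may differ on the survey's key variables too.
```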

I was recently looking at responses to yet another survey of Presbyterian congregations (this one from last year) and comparing them with responses to some similar questions on a survey from 2000. I was interested in change, but I was a little suspicious of some of the changes that showed up (partly because the implications seemed so bleak: more financial problems now, fewer volunteers, fewer staff, weaker vision for the future, etc.). I wondered if too many of the responses to the current survey came from small, struggling Presbyterian congregations. I went to the OGA/SASR data and found that the median average worship attendance for the congregations surveyed was pretty similar to that for all PC(USA) congregations (around 70 worshipers on a Sunday). That surprised me. What may have exaggerated the 2000-2008 changes, however, was that the median worship attendance for congregations whose leaders responded to the 2000 survey was significantly larger than median attendance for Presbyterian congregations in general at that time. Median worship attendance has decreased a little, but not as much as you'd think from just looking at these two surveys.

There are at least two possibilities (and it could be both). The first, that the initial sample for the 2000 survey (around 700 congregations) was less representative attendance-wise than the smaller sample for the 2008 survey (200 congregations), seems intuitively implausible because of the size difference (though it could be a factor). More plausible is differential response by congregational size leading to nonresponse bias. With the 2000 survey in particular, leaders of a smaller percentage of small congregations responded to the survey. If smaller congregations are struggling more, this gave an exaggerated picture of how financially secure and loaded with paid staff and volunteers Presbyterian congregations were in 2000. In turn it exaggerated the change in financial security and people resources among Presbyterian congregations between 2000 and 2008. For whatever reason, this bias may not have occurred with the smaller 2008 sample, yielding a more accurate picture now (which still isn't that pretty).

If further analysis confirmed that nonresponse bias was at work, we might adjust the 2000 survey results by weighting: counting more heavily the responses of leaders of smaller congregations that DID respond, and counting less the responses of leaders of larger congregations. We could also recalculate the response rate to reflect our estimate of the proportions in which leaders of congregations of all sizes responded, rather than reporting a general, unweighted response rate. In other words, we could weight the response rate by size to account for how much response rates varied among congregations of different sizes.
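
Here's a rough sketch of what that kind of size-based adjustment and weighted response rate could look like; the size classes, frame shares, and counts below are hypothetical, not the actual 2000 sample:

```python
# Sketch of a weighting-class adjustment by congregation size, plus a
# size-weighted response rate. Size classes, frame shares, and counts are
# hypothetical, not the real 2000 sample.
classes = {
    "small (<100)":     {"frame_share": 0.55, "sampled": 300, "responded": 105},
    "medium (100-249)": {"frame_share": 0.30, "sampled": 250, "responded": 150},
    "large (250+)":     {"frame_share": 0.15, "sampled": 150, "responded": 105},
}

total_responded = sum(c["responded"] for c in classes.values())
for name, c in classes.items():
    c["class_rr"] = c["responded"] / c["sampled"]
    respondent_share = c["responded"] / total_responded
    # Boost classes underrepresented among respondents, shrink the rest.
    c["adjustment"] = c["frame_share"] / respondent_share
    print(f"{name:>18}: response rate {c['class_rr']:.0%}, "
          f"adjustment weight {c['adjustment']:.2f}")

# Response rate weighted by each class's share of all congregations,
# rather than by its share of the sample.
weighted_rr = sum(c["frame_share"] * c["class_rr"] for c in classes.values())
unweighted_rr = total_responded / sum(c["sampled"] for c in classes.values())
print(f"unweighted response rate:    {unweighted_rr:.0%}")
print(f"size-weighted response rate: {weighted_rr:.0%}")
```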

The workshop leaders also suggested finding out more about the sampled cases to learn more about nonresponse bias. One strategy is to try to match survey respondents with other information we have about them. We actually already did this with the 2000 and 2008 survey data. Instead of using the Sunday worship attendance figures the respondents reported in the survey, my colleague Ida and I matched the survey data to the OGA/SASR data and used the average Sunday worship attendance for those sampled congregations from the SASR survey. Using these data, we could then compare survey response rates among sampled congregations of different sizes, to assess the theory I laid out above that, among congregations sampled in 2000, fewer small congregations had leaders respond to the survey compared with large congregations.
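
A minimal sketch of that matching-and-comparing step, with invented congregation IDs and attendance figures standing in for the real SASR data:

```python
# Sketch of the matching step: link each sampled congregation to its SASR
# worship attendance figure, then compare respondents with the full sample.
# Congregation IDs, attendance numbers, and response flags are invented.
from statistics import median

sample = {                       # id: (SASR attendance, responded?)
    "C001": (45,  False), "C002": (60,  True),  "C003": (75,  False),
    "C004": (90,  True),  "C005": (120, True),  "C006": (150, True),
    "C007": (35,  False), "C008": (210, True),  "C009": (80,  True),
    "C010": (55,  False),
}

all_attendance  = [a for a, _ in sample.values()]
resp_attendance = [a for a, responded in sample.values() if responded]
print("median attendance, full sample:", median(all_attendance))
print("median attendance, respondents:", median(resp_attendance))

small = [responded for a, responded in sample.values() if a < 100]
large = [responded for a, responded in sample.values() if a >= 100]
print("response rate, attendance under 100:  ", sum(small) / len(small))
print("response rate, attendance 100 or more:", sum(large) / len(large))
```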

But there are also other sources of data about congregations. We could link to census data to see, for example, if leaders of congregations from different regions of the country responded at different rates. If we were e-mailing invitations to people in a group to participate in a Web-based survey, we could assess whether people who apparently had personal e-mail accounts (say, addresses at popular Internet service providers like Google mail, America On-Line, Earthlink, Insight Communications, and so on) responded at different rates from people who appeared to have organizational e-mail addresses. If we were surveying people on the Presbyterian Panel or part of our hymnal study, we would likely have information from previous surveys (including the Panel background survey) or short screening surveys (the hymnal study). In that case we could use information from these earlier surveys to assess, for example, whether women and men responded at different rates, or whether leaders of congregations that use the existing "Presbyterian Hymnal" respond at different rates from leaders of congregations that don't.
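
For the e-mail example, a toy version of that check might look like this; the "personal" domain list and the addresses are made up, and any real check would use the actual invitation list:

```python
# Toy check of response rates by e-mail address type for a Web survey.
# The "personal" domain list and the addresses are made up for illustration.
PERSONAL_DOMAINS = {"gmail.com", "aol.com", "earthlink.net", "insightbb.com"}

invited = [                        # (address, responded?)
    ("pastor@gmail.com", True),    ("clerk@firstpres.org", False),
    ("elder@aol.com", False),      ("office@secondpres.org", True),
    ("deacon@earthlink.net", True),
]

def is_personal(address):
    return address.split("@")[1].lower() in PERSONAL_DOMAINS

for label, personal in (("personal addresses", True),
                        ("organizational addresses", False)):
    group = [responded for addr, responded in invited
             if is_personal(addr) == personal]
    print(f"{label}: {sum(group)} of {len(group)} responded")
```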

Again, there’s always the chance to re-weight the results and the response rates to try to counteract apparent biases in the results due to differential nonresponse to surveys.

The workshop leaders also talked about extraordinary efforts to try to persuade nonrespondents to participate in a survey. My colleague Jack helped get us a grant several years ago to try this with the Panel, by calling Presbyterians who weren't participating in the Panel. (A cheaper way to do this is to call only a sample of nonrespondents, but then weighting is even more complex.) Using incentives, like money sometimes sent with a survey (and offered to would-be respondents as a sign of trust and an indicator of a survey's importance), is another strategy. (I once got a $5 bill in an envelope with a blank survey from the magazine "Entertainment Weekly.") Another similar strategy we've talked about using is sending reply envelopes with real stamps instead of business reply envelopes, for which the U.S. Postal Service only charges us if people actually send the survey back. As with the cash incentives, placing a real stamp on the return envelope is a sign of trust that the would-be respondent will indeed reply. It also makes the process look more official and professional. And would-be respondents might feel guilty if they've cost the church a stamp and haven't completed and returned the survey.

But in general, even the reminder e-mail messages, postcards, and letters with duplicate surveys we routinely send out are a form of "extraordinary effort." So we might compare the responses on key questions of respondents who replied before reminders were sent out with those of people who responded only after receiving reminders, or the responses of those who replied only after final reminders with the responses of those who replied earlier. We might then extrapolate that responses by pure nonrespondents would in fact be similar to those of late respondents and adjust our reported results accordingly. In practice, this would amount to weighting the responses of late respondents more heavily (maybe a lot, if the overall response rate was low).
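
A small sketch of that early-versus-late comparison and the crude extrapolation it suggests, with invented responses and an assumed 40 percent response rate:

```python
# Early vs. late respondents on a key yes/no question (1 = yes), then a
# crude extrapolation that treats remaining nonrespondents like the late
# group. Responses and the 40% response rate are invented.
early = [1, 1, 0, 1, 1, 0, 1, 1]   # replied before any reminder
late  = [0, 1, 0, 0, 1, 0]         # replied only after reminders

mean_early = sum(early) / len(early)
mean_late  = sum(late) / len(late)
print(f"early respondents: {mean_early:.2f}")
print(f"late respondents:  {mean_late:.2f}")

response_rate = 0.40
respondent_mean = (sum(early) + sum(late)) / (len(early) + len(late))
adjusted = response_rate * respondent_mean + (1 - response_rate) * mean_late
print(f"respondent-only estimate: {respondent_mean:.2f}")
print(f"adjusted estimate:        {adjusted:.2f}")
```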

In general, the workshop leaders said that nonresponse bias occurs with statistics, not with surveys. Although they did not talk much about item nonresponse (people who respond to surveys but skip some questions, as many people do), they said that even if people in different groups in your survey – let's say women and men – respond at very different rates, this alone does not produce nonresponse bias if people in those two groups don't disagree or have different experiences on the key topics of the survey. Let's say 60 percent of women elders, but only 30 percent of men elders, of congregations in a presbytery responded to a survey. But let's say that, in the actual congregations, women and men didn't disagree at all about whether the presbytery should make a particular personnel or policy change. If the centerpiece of the survey is to assess attitudes about that proposed change, and women and men elders in the presbytery's congregations agree on it, the differential response rates by gender will not produce nonresponse bias that is substantively significant.
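
A toy calculation makes the point: with the made-up numbers below, women and men respond at very different rates but hold the same view, and the respondent-only estimate still hits the true population value.

```python
# Toy check: very different response rates by gender, identical views on
# the key question, and the respondent-only estimate is still on target.
# All numbers are invented.
groups = {
    "women elders": {"N": 500, "response_rate": 0.60, "true_support": 0.55},
    "men elders":   {"N": 500, "response_rate": 0.30, "true_support": 0.55},
}

respondents = sum(g["N"] * g["response_rate"] for g in groups.values())
supporters  = sum(g["N"] * g["response_rate"] * g["true_support"]
                  for g in groups.values())
estimate = supporters / respondents

population = sum(g["N"] for g in groups.values())
truth = sum(g["N"] * g["true_support"] for g in groups.values()) / population

print(f"respondent-only estimate: {estimate:.3f}")
print(f"true population value:    {truth:.3f}")
# Change the men's true_support to, say, 0.30 and the two numbers diverge.
```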

Although the workshop leaders urged survey researchers to keep studying and counteracting nonresponse bias in mind when designing surveys, they spent only a little time at the end talking about research studies that include experimental designs to assess nonresponse bias. Let's say you propose a study that employs different survey "modes": phone surveys, regular mailed printed surveys, Web surveys with e-mail invitations, and door-to-door canvassing for in-person surveys. If it's already apparent that, for example, door-to-door surveys usually work best, most clients aren't going to say, "Let's do this four different ways, even though I know we'd get the best results if we did them all in person. I'm willing to have you consign three-quarters of our would-be respondents to survey methods that will persuade fewer of them to participate, for the sake of generating more research about nonresponse and improving future surveys." Most clients won't go for that. (Keep in mind that there is some evidence that people are less truthful about sensitive topics – urinary incontinence was a topic of one study the workshop leaders covered – in person than they are in more anonymous modes, like printed or Web surveys.) Nevertheless, a few researchers have been able to do these kinds of experimental studies.



One key formula from today: base weights should be the inverse of the probabilities of selection. So, for example, if you stratify a sample and for whatever reason you sample 20 percent of men and 50 percent of women, the base weight for responses by men should be 5 (1 divided by 1/5) and the base weight for responses by women should be 2 (1 divided by 1/2).
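
In code, that's a one-line dictionary of weights, using the sampling fractions from the example above:

```python
# Base weights as the inverse of the selection probabilities, using the
# sampling fractions from the example above.
selection_prob = {"men": 0.20, "women": 0.50}
base_weight = {stratum: 1 / p for stratum, p in selection_prob.items()}
print(base_weight)   # {'men': 5.0, 'women': 2.0}
```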

Another formula: an estimate of nonresponse error is equal to the nonresponse rate multiplied by the difference between the mean response of the respondents and the mean response of the nonrespondents. The trick is to gauge that latter mean. How much nonresponse error is too much? Too much, the workshop leaders said, occurs when the nonresponse error is more than 10 percent of the sampling error. (Sampling error, another source of error, depends on the sample size, the distribution of responses to a key question, and the confidence with which you want to say that responses by a sample represent responses by everyone in a population.)
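
Here's how I'd sketch those two quantities side by side; the response rate, means, and sample size are illustrative numbers, not anything from the workshop:

```python
import math

def nonresponse_error(response_rate, mean_respondents, mean_nonrespondents):
    # Deterministic approximation: nonresponse rate times the gap between
    # respondent and nonrespondent means.
    return (1 - response_rate) * (mean_respondents - mean_nonrespondents)

def sampling_error(p, n, z=1.96):
    # Margin of error for a proportion at roughly 95 percent confidence.
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative numbers only.
bias = nonresponse_error(response_rate=0.35,
                         mean_respondents=0.62,
                         mean_nonrespondents=0.55)
moe = sampling_error(p=0.62, n=400)
print(f"estimated nonresponse error: {bias:.3f}")
print(f"sampling error (95% margin): {moe:.3f}")
print("over the 10-percent-of-sampling-error threshold?", abs(bias) > 0.10 * moe)
```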

On to the topic of questionnaire construction Thursday morning.

-- Perry
