Spot Duplicate Submissions of Individuals and Decide What to Keep

Two responses from the same person tend to turn up when a survey is kept open for multiple months or when the sponsor publicizes the survey through several channels. Inevitably, people forget that they have completed the survey (ouch, not a memorable survey!). It is important to cull duplicate submissions because they will distort findings, especially if you subset your data into relatively small groups.

You can cut down on the number of duplicate responses by barring more than one response from the same IP address. The hosting software may be able to ensure only one submission per person (but a person who takes the survey from home and also from the office circumvents this guard). A thank-you email will also help respondents recall later that they already took the survey.

The first sign of a doubleton is usually the last name of the respondent. Last names, however, are not unique, although first and last name combinations usually do the trick. Once I joined first and last names so that my software could spot submissions by the same person. The old-fashioned way is to sort a name column and scan down it for repeated entries, which might be duplicates.

But other clues can surface a suspect pair. Knowing the organization of the respondent, for example, helps detect duplicate responses. Likewise, an email address affords a way to determine if someone has entered more than one response to a survey. Email addresses have the advantage of being unique. Software can identify duplicates once you tell it which variable to inspect.
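As a rough illustration of telling software which variable to inspect, here is a minimal Python sketch. The field names (first, last, email), the sample rows, and the function name find_duplicates are all made up for the example; your survey export will have its own column names. The sketch lowercases and trims each value before comparing, on the assumption that "Ana.Lopez@example.com" and "ana.lopez@example.com" belong to the same person.

```python
from collections import defaultdict

def find_duplicates(responses, key_fields=("email",)):
    """Group row indexes by a normalized key; return only groups with 2+ rows."""
    groups = defaultdict(list)
    for i, row in enumerate(responses):
        # Normalize so that case and stray spaces do not hide a match.
        key = tuple(str(row.get(f, "")).strip().lower() for f in key_fields)
        if all(key):  # skip rows where any key field is blank
            groups[key].append(i)
    return {k: v for k, v in groups.items() if len(v) > 1}

# Hypothetical sample data, not from any real survey.
responses = [
    {"first": "Ana", "last": "Lopez", "email": "ana.lopez@example.com"},
    {"first": "Ben", "last": "Smith", "email": "ben@example.com"},
    {"first": "ana", "last": "LOPEZ ", "email": "Ana.Lopez@example.com"},
]

by_email = find_duplicates(responses, ("email",))
by_name = find_duplicates(responses, ("first", "last"))
print(by_email)
print(by_name)
```

Both calls flag rows 0 and 2 as a suspect pair. A flagged pair is only a candidate: two different people can share a name, so a match on first and last name deserves a look before you delete anything.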

You can deal with duplicates by keeping the latest submission, keeping the one that has the fullest set of answers, keeping the one that answered the most crucial questions, or writing to the person to ask which one to keep. My default is to take the most recent response because that may reflect the most up-to-date data and opinions. Time has passed and facts have changed.
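The keep-the-latest default can be sketched in a few lines of Python. This assumes each response carries a submission timestamp and an email address; the field names (email, submitted, reports), the sample rows, and the function name keep_latest are hypothetical.

```python
from datetime import datetime

def keep_latest(responses, key="email", timestamp="submitted"):
    """Return one row per respondent: the one with the newest timestamp."""
    latest = {}
    for row in responses:
        k = row[key].strip().lower()
        # Keep this row if it is the first seen or newer than the stored one.
        if k not in latest or row[timestamp] > latest[k][timestamp]:
            latest[k] = row
    return list(latest.values())

# Hypothetical sample data.
responses = [
    {"email": "ana@example.com", "submitted": datetime(2023, 3, 1), "reports": 5},
    {"email": "ben@example.com", "submitted": datetime(2023, 3, 4), "reports": 2},
    {"email": "Ana@example.com", "submitted": datetime(2023, 5, 9), "reports": 7},
]

cleaned = keep_latest(responses)
for row in cleaned:
    print(row["email"], row["submitted"].date(), row["reports"])
```

Here the May submission replaces the March one for the same respondent, and the unduplicated response passes through untouched.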

In a couple of instances I have looked at the responses and chosen the one that seemed the most comprehensive, in the sense of answering the most questions. It is possible that the person turned in two surveys because they knew they had not finished the first. A variation on this solution is to combine the data of the two responses to create one response with the most answers. You don’t have to pick one or the other.
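Both ideas, choosing the fuller response and combining the two, can be sketched as follows. The question names (q1 through q4), the sample answers, and the function names are invented for the illustration. The merge shown here is one possible rule, not the only one: it starts from the newer response and fills its blanks from the older one, which assumes that a newer answer should win whenever both exist.

```python
BLANK = (None, "")

def answered(row, questions):
    """Count the questions that have a non-blank answer."""
    return sum(1 for q in questions if row.get(q) not in BLANK)

def most_complete(pair, questions):
    """Pick the response that answered the most questions."""
    return max(pair, key=lambda r: answered(r, questions))

def merge(older, newer, questions):
    """Start from the newer response and fill its blanks from the older one."""
    merged = dict(newer)
    for q in questions:
        if merged.get(q) in BLANK:
            merged[q] = older.get(q)
    return merged

# Hypothetical pair of submissions from one respondent.
questions = ["q1", "q2", "q3", "q4"]
older = {"q1": 5, "q2": "", "q3": "yes", "q4": "weekly"}
newer = {"q1": 6, "q2": "no", "q3": None, "q4": ""}

fuller = most_complete([older, newer], questions)
combined = merge(older, newer, questions)
print(answered(older, questions), answered(newer, questions))
print(combined)
```

In this made-up pair the older response is the fuller one, yet the combined record keeps the newer answers to q1 and q2 and borrows q3 and q4 from the older submission.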

If you write to somebody about their duplicate responses, their reply should guide you, but they may ask you to send them their data back, which can be a pain. It is also possible to encourage them to take the survey a third time, and then you delete the first two responses. I find that possibility slim.

As a side note, I have been surprised at how much the two responses sometimes differ. For a question as seemingly fixed as “How many people report to you?” you find the duplicate responses give different numbers! Sure, the set of direct reports might have changed, but has corporate revenue in the latest fiscal year? Also, if you ask people to estimate percentages of time, you see fluctuations from one response to the next. Shifts in estimates are to be expected; shifts in fixed numbers trouble me.