State Respondents per Question with N =
You can’t make every question a required question, or respondents will likely feel bullied, vexed, and tempted to drop out. People leave answers blank for many reasons: they don’t want to give that piece of data, can’t track it down quickly, only want to spend a few minutes on the survey, don’t have an opinion, don’t know the answer, never finished the questionnaire, or are not allowed to share the information. At other times what they enter isn’t usable because it isn’t credible or legible. The result is that each substantive question will almost certainly have a fluctuating number of analyzable responses. If you persuaded 100 people to venture into your survey, any given non-demographic question may yield only 70 to 90 usable answers.
It is not good enough to state how many respondents your survey collected. Readers can’t judge from that overall figure how many respondents gave a usable answer to each individual question.
A precept of reproducible research (reporting survey results so that readers can fully understand the methodology and judge the credibility of the findings) is to make generous use of “**N =**” for the number of data points reported for each individual question. That conventional shorthand for “how many solid answers are we talking about?” appears in or near every commendable graphic or table. Whether in the subtitle of a plot, in the text that relates to it, on the table itself, or in a footnote, a reader should be able to learn quickly how many respondents answered each question, how many documents were reviewed, how many law departments had a given benchmark, or whatever foundational answers underlie the findings discussed in the report.
You conventionally see “N =” in the caption below a graphic, such as in the lower right. But it could be stated in the text or in the Appendix, especially if the Appendix reproduces the questions. No rules govern where to place this important information.
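If you build your graphics programmatically, stating the N takes one extra line. Here is a minimal sketch in Python with matplotlib; the question wording, answer counts, and file name are invented for illustration:

```python
import matplotlib.pyplot as plt

# Hypothetical usable answers to a single survey question
responses = {"Yes": 41, "No": 22, "Not sure": 9}
n = sum(responses.values())  # usable answers for this question only

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(list(responses.keys()), list(responses.values()))
ax.set_title("Does your law department track this benchmark?")
ax.set_ylabel("Respondents")

# Caption in the lower right of the figure stating the N
fig.text(0.98, 0.02, f"N = {n}", ha="right", va="bottom", fontsize=9)

fig.tight_layout()
fig.savefig("benchmark_question.png", dpi=150)
```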
The larger the N, the more reliable the statistics, such as averages or medians, calculated from the data. For example, if the claim is that “average base compensation of general counsel” rose 2% from one year to the next, it makes a huge difference whether that change rests on N = 38 general counsel or N = 188. Changes in small numbers of observations have much less credibility than changes in large numbers.
Stated in statistical terms, the margin of error for any set of numbers depends largely on how many numbers there are. You should especially disclose the N if you impute values for blank answers, a practice that has limits and introduces artificiality into your data.
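As a rough sketch of why the count matters (assuming a simple random sample and a 95% confidence level), the margin of error around a sample mean shrinks with the square root of N:

$$\text{margin of error} \approx 1.96 \times \frac{s}{\sqrt{N}}$$

where $s$ is the standard deviation of the answers. Holding the spread constant, the N = 188 figure carries a margin of error roughly $\sqrt{188/38} \approx 2.2$ times smaller than the N = 38 figure.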
The ubiquitous “N = “ also signals that the sponsor cares about precision and disclosure. No one is trying to exaggerate the basis for their findings. N is a vital piece of transparency and honesty.
This subject also touches on the question of year-over-year comparisons. Not only do scrupulous readers deserve to know how many respondents answered each year’s question and whether the sponsor reworked the question or its instructions, they also deserve to know the overlap of respondents. If 60 answered last year (N = 60) and 65 answered this year (N = 65), ideally the report would disclose how many of the 65 took part in the previous survey. Unless the comparison includes a relatively high proportion of the same respondents, any claim about a trend is irresponsible.
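If respondents carry stable identifiers from one year’s survey to the next, the overlap takes only a few lines to compute. A minimal sketch in Python; the identifier scheme and the 40-respondent overlap below are invented to mirror the 60/65 example:

```python
def overlap_report(last_year_ids, this_year_ids):
    """Summarize year-over-year respondent overlap for one question."""
    last, this = set(last_year_ids), set(this_year_ids)
    repeat = last & this
    return {
        "N last year": len(last),
        "N this year": len(this),
        "repeat respondents": len(repeat),
        "share of this year's N who also answered last year": round(len(repeat) / len(this), 2),
    }

# Illustrative, made-up IDs: 60 answered last year, 65 this year, 40 answered both
last_year = {f"resp{i:03d}" for i in range(60)}       # resp000 .. resp059
this_year = {f"resp{i:03d}" for i in range(20, 85)}   # resp020 .. resp084
print(overlap_report(last_year, this_year))
# {'N last year': 60, 'N this year': 65, 'repeat respondents': 40,
#  "share of this year's N who also answered last year": 0.62}
```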