Supplement Your Data with External Data
The data analyzed in a survey project comes in three kinds: attitudes and answers that only the respondents know; extensions of that data, such as index variables, categorized continuous variables, coded comments, or calculated variables (such as fully loaded internal costs per hour); and supplemental information that the sponsor stirs into the mix. In short: human data, software data (so to speak), and external data.
Like Christmas trees padded with attached branches, supplemental data fills out the analyses. The new variables permit finer tabulations, graphics with more distinctions of shape or color, or regression models with greater explanatory power. More data can bring with it more powerful and nuanced interpretations.
What kinds of supplemental, external data might one see when law firms, law departments of corporations, or legal vendors want to bolster the data their survey has collected? Almost always they enrich their survey demographics with supplemental information from elsewhere.
• If you know who submitted the survey responses, which means that the response cannot be anonymous, Human Resources might allow you to include the employee's age, gender, or date of employment. For a law firm timekeeper, you could include billable hours during the previous fiscal year, the number of matters worked on, partners worked with, and much more.
It is a step too far to say that you should not ask for information from the respondent that you as the survey sponsor can provide yourself. Even in a confidential survey of employees, if you know their names, you can add their title, office, compensation, and other information, but it is much easier (and likely more accurate) to ask them to fill in the appropriate demographic data. [Merging data sets sounds simple, but software can bare its fangs!]
• If you know a company name or an office location, to give two examples, the survey analysts might add the stock market capitalization of the company or the cost of living of a city, respectively.
• If a corporate legal team sends a survey to its major law firms, it might supplement what it learns from the firms with the amount of fees paid to them over the past two years, the number of matters worked on, a count of offices of the firm, profits per partner, total revenue, and other information about the firm.
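To make the merging step concrete, here is a minimal sketch in Python of attaching sponsor-held data to survey responses by firm name. Everything in it is hypothetical: the field names (firm, satisfaction, fees_paid, matters), the sample values, and the helper functions normalize_key and left_join are illustrations, not a prescribed method. It keeps every survey response and lists the names that found no match, since inconsistent spelling or spacing is one common way a merge goes wrong.

```python
def normalize_key(name):
    """Lowercase, trim, and collapse whitespace so 'beta  llp ' matches 'Beta LLP'."""
    return " ".join(name.lower().split())


def left_join(responses, external, key):
    """Attach external fields to each response, keeping every response.

    Returns (merged_rows, unmatched_keys). Unmatched responses are kept
    as-is, without the external fields.
    """
    lookup = {normalize_key(row[key]): row for row in external}
    merged, unmatched = [], []
    for resp in responses:
        row = dict(resp)  # copy so the original response is not modified
        match = lookup.get(normalize_key(resp[key]))
        if match is None:
            unmatched.append(resp[key])
        else:
            for field, value in match.items():
                if field != key:
                    row[field] = value
        merged.append(row)
    return merged, unmatched


# Hypothetical survey responses from law firms (made-up values).
responses = [
    {"firm": "Alpha LLP", "satisfaction": 4},
    {"firm": "beta  llp ", "satisfaction": 5},
    {"firm": "Gamma LLP", "satisfaction": 3},
]

# Hypothetical data the law department already holds (made-up values).
external = [
    {"firm": "Alpha LLP", "fees_paid": 1200000, "matters": 14},
    {"firm": "Beta LLP", "fees_paid": 450000, "matters": 6},
]

merged, unmatched = left_join(responses, external, "firm")
print(unmatched)
```

Reviewing the unmatched list before analysis is a cheap check; real firm names usually need more cleanup than this simple normalization handles (abbreviations, punctuation, name changes).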
Here we have another trade-off: asking respondents is easier, yet you don’t want to waste their valuable time or bog them down with supplying simple facts that you can already find out yourself. You want from them thoughts you cannot generate yourself. The ideal supplemental data consists of contextual information that the respondent does not know off the top of their head, such as the rating of a lawyer in a law department in terms of succession planning or the number of years a law department has retained a law firm.
As with so many aspects of surveys, the methodology section needs to explain the sources of any supplemental data that you have added beyond what the participants provided. For example, if you ask for the headquarters country of a company, and you add a ranking for that country on a corruption index, that addition and source need to be fully described. The description will likely need to address missing data, which often happens because your external source doesn’t cover every demographic item you want to append to, and how your analysis coped with those holes (e.g., the form of imputation, if any).
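As one illustration of handling those holes, the sketch below appends an external score by headquarters country, fills gaps with the median of the matched scores, and flags every filled value so the methodology section can report how many were imputed. Median imputation is only one option, chosen here for simplicity. The field names, the country codes X, Y, and Z, the scores, and the helper append_with_imputation are all hypothetical.

```python
from statistics import median


def append_with_imputation(rows, source, key, new_field):
    """Append source[row[key]] to each row as new_field.

    Rows whose key is missing from the source get the median of the
    matched values and a True flag in new_field + '_imputed'.
    Returns (enriched_rows, number_of_imputed_rows).
    """
    matched_values = [source[r[key]] for r in rows if r[key] in source]
    fill = median(matched_values) if matched_values else None
    enriched, imputed_count = [], 0
    for r in rows:
        row = dict(r)  # copy so the original row is not modified
        if r[key] in source:
            row[new_field] = source[r[key]]
            row[new_field + "_imputed"] = False
        else:
            row[new_field] = fill
            row[new_field + "_imputed"] = True
            imputed_count += 1
        enriched.append(row)
    return enriched, imputed_count


# Hypothetical responses; the country codes are placeholders.
rows = [
    {"company": "A", "hq_country": "X"},
    {"company": "B", "hq_country": "Y"},
    {"company": "C", "hq_country": "X"},
    {"company": "D", "hq_country": "Z"},
]

# Hypothetical external index that does not cover country Z (made-up scores).
index_scores = {"X": 70, "Y": 40}

enriched, imputed_count = append_with_imputation(
    rows, index_scores, "hq_country", "corruption_score"
)
print(f"{imputed_count} of {len(rows)} scores were imputed")
```

Keeping the imputed flag alongside the value lets the analysis be rerun without the filled rows, and gives the methodology section an exact count to disclose.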