Extract Ideas from Text Responses -- Manual Coding

A respectable survey conducted by a law firm, law department, or legal vendor gives respondents ample opportunity not only to pick or prioritize items from selections but also to expand on their ideas. They add their own thoughts in text boxes, which often accompany multiple-choice questions that have an “Other” choice. Or free text might emanate from open-ended questions like “How do you feel about the training you have been offered for confrontation management?” When survey responses include free text, it is incumbent on sponsors to figure out what has been written and conclude something from it.

The old-fashioned way consultants and survey sponsors developed a sense of what was written in the comments was for someone to read them all and then list the “key ideas” they noticed. The subjective, bespoke, and biased nature of such loose interpretation is evident.

A more thoughtful, but still traditional, method to extract ideas from text responses is to code them by hand. All the text responses to a given question are copied into a spreadsheet that has one row per response. Whoever is tasked with parsing those comments (we will dub that person the “coder”) reads them and enters the key idea or term, briefly, in the column to the right of the comment. If the comment expresses a second idea, that goes in the next column to the right, and so on. The key terms try to capture the essence of the comment’s message. Once coding has finished, the report can state summarized findings along the lines of “More than half the respondents expressed concerns with how well law department lawyers understand dealing with conflict.”
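
To make that layout concrete, here is a minimal sketch of the coding spreadsheet as a table in Python with pandas; the comments, column names, and codes are all invented for illustration.

```python
import pandas as pd

# Hypothetical layout: one row per response, coded terms in the columns
# to the right of the comment.
coded = pd.DataFrame({
    "comment": [
        "Training never covered real conflicts with clients.",
        "Sessions were engaging, but too short and too infrequent.",
        "I want role-playing exercises and follow-up coaching.",
    ],
    "code1": ["conflict", "format", "role-play"],
    "code2": [None, "frequency", "coaching"],
})

# Tally how many comments mention each coded term, across all code columns.
code_cols = [c for c in coded.columns if c.startswith("code")]
print(coded[code_cols].stack().value_counts())
```

Keeping the codes in adjacent, consistently named columns is what makes the later tallying, sorting, and filtering mechanical.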

The choice of what word or term concisely characterizes a comment will challenge anyone. As I do manual coding, I keep revising what I think is the most representative phrase or word. I keep sorting the worksheet by the code columns and making them consistent by dragging my favored term up or down (Beware! Every time you sort in Excel you must include all the columns.). At times I consolidate two similar terms, break a term up to reflect nuances of meaning, or create a new term category.
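
Consolidating near-synonyms can also be done programmatically once you settle on a canonical term. A minimal sketch, again assuming pandas; the synonym map is hypothetical.

```python
import pandas as pd

codes = pd.Series(["conflict", "strong disagreement", "dispute", "conflict"])

# Hypothetical synonym map settled on while reviewing and sorting the codes.
canonical = {"strong disagreement": "conflict", "dispute": "conflict"}

codes = codes.replace(canonical)
print(codes.value_counts())  # all four entries now count as "conflict"
```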

Manual coding takes time, attention, and thoughtfulness. Time because you must read each comment, which can vary in length from a single word to more than 3,000 characters, and then tease out its central point and whether it addresses more than one point. Concentrated attention because people express themselves with different degrees of clarity. Thoughtfulness because whoever is coding needs to have the intelligence and desire to formulate expressive summaries and note unusual ideas. Ideally, a subject matter expert would do the coding or review the assignment of codes as the process trundles along. In the most scrupulous analyses, more than one coder follows the same procedure, and the coders periodically reconcile the phrases and terms they use (“Let’s use ‘conflict’ rather than ‘strong disagreement’.”). Confirming the coding with double coders takes time and costs money.
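
Reconciliation between two coders can start from a mechanical comparison of their codes. A minimal sketch, assuming each coder's primary code sits in its own column; the codes shown are invented.

```python
import pandas as pd

# Hypothetical primary codes assigned independently by two coders.
pair = pd.DataFrame({
    "coder_a": ["conflict", "training", "workload", "conflict"],
    "coder_b": ["conflict", "training", "morale", "dispute"],
})

# Share of comments on which the two coders agree outright.
print(f"Raw agreement: {(pair['coder_a'] == pair['coder_b']).mean():.0%}")

# The disagreements are the rows to reconcile in discussion.
print(pair[pair["coder_a"] != pair["coder_b"]])
```

Raw agreement overstates reliability when one or two codes dominate; chance-corrected measures such as Cohen's kappa (scikit-learn offers cohen_kappa_score) are a standard refinement, at the cost of a little more machinery.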

Having coded the answers to the text questions, you are better positioned to quote telling messages verbatim in the report because you can easily gather related comments together and pick the best one. You never want to disclose that a specific individual wrote a specific comment, but unattributed quotations give readers of the report the voice of the people and are compelling.
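
Gathering the related comments becomes a one-line filter once the codes are in place. Continuing the hypothetical coded table from the first sketch:

```python
# Every comment tagged "conflict" in any code column; a human then picks
# the most quotable one. Nothing printed identifies the respondent.
code_cols = [c for c in coded.columns if c.startswith("code")]
mask = coded[code_cols].eq("conflict").any(axis=1)
for comment in coded.loc[mask, "comment"]:
    print(comment)
```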

Let’s consider four steps that might improve the manual coding sketched above.

• Coding can create columns that identify a comment’s primary message, secondary message, tertiary, and so on. But ranking message ideas by importance or strength of expression complicates the process even more. It’s an enticing goal, but in my experience a chimera.

• Categorizing explanations in an “Other” text box into the most similar of the given selections represents a crude form of converting free-form comments into standardized terms or phrases, but it can only succeed with a few comments (the first sketch after this list shows one crude way to automate the match).

• Coders can also record their impressions of a comment’s characteristics, such as creativity (unusualness, out-of-the-box possibilities), sentiment (positivity, negativity, neutrality), or intensity (number of words, vocabulary, or tone). For each of these characteristics, a scale of 1 to 5 suffices. Even though this step is highly impressionistic, it lets you summarize text responses numerically.

• Powerful complements and extensions to manual coding flow from the tools of natural language processing. Examples include word clouds, word associations, and topic modeling, which extracts latent topics from survey comments. A limitation on algorithmic coding is that short comments make it harder for the software to figure out a topic. Plus, a small survey may yield too few comments for the NLP tools of topic modeling to gain purchase. One source explains that “Topic modeling can work on relatively small texts … You do need a large number of documents. At least 100, ideally over 1000.” The second sketch after this list shows the mechanics.
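
Two sketches follow. The first addresses the “Other”-box matching from the second bullet, using only the Python standard library; the selections and the comment are invented, and the whole-string similarity it relies on is exactly the crudeness noted above.

```python
import difflib

# Hypothetical selections offered by the multiple-choice question.
selections = ["negotiation training", "mediation skills", "litigation support"]

other_text = "more courses on how to negotiate with opposing counsel"

# Whole-string similarity; cutoff=0.0 forces a best guess even when the
# overlap is weak, which is why this method is crude.
print(difflib.get_close_matches(other_text, selections, n=1, cutoff=0.0))
```

The second sketches topic modeling with scikit-learn's LatentDirichletAllocation. The four invented comments fall far below the 100-document floor quoted above, so the output demonstrates only the mechanics.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Invented comments; a real run needs on the order of 100+ documents.
comments = [
    "training on conflict was too theoretical for daily practice",
    "more role playing would help us handle client conflict",
    "the schedule made attendance nearly impossible",
    "sessions collide with billable work and court dates",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(comments)  # document-term matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)

# Show the five heaviest words in each latent topic.
terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[::-1][:5]]
    print(f"Topic {i}: {', '.join(top)}")
```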