I am currently putting together a survey to assess my department's climate. In preparation, I've been looking at existing surveys that other departments have used and, no offense, most of them are terrible. It occurred to me that questionnaire construction is, of course, extremely difficult in all sorts of non-obvious ways, and that most people are never trained in it. This post shares some thoughts on constructing climate surveys that might be helpful to people who want to build their own. I will share my questionnaire at the end of this post, although, for reasons explained below, I strongly recommend that you customize it, even if only a little.
Length
One major reason I decided to write a survey from scratch was that the one previously used, like many I found online, was too long. This was mostly because they tried to test for too many different things, and sometimes that's unavoidable. However, lengthy questionnaires decrease both response rate and response validity, because people grow increasingly impatient and unwilling to think their answers through. Since valid answers to some questions are better than invalid answers to all of them, I made controlling survey length my top priority. As we will see, this means making sacrifices in pretty much every other aspect.
Question wording
It’s obvious that one should avoid ambiguous wording, but it’s hard to pinpoint exactly what that means. First, wording matters not only because we don’t want respondents to misunderstand the question, but also because we want to reduce cognitive load as much as possible, so that it is easier for respondents to answer accurately. One way to do that is to use wording appropriate to the respondents, which is why you should always customize your questionnaire. For example, the general advice is to avoid complex sentence structure, but I opted for complex sentences in order to express precise meanings, because I believe they won’t pose a problem for my fellow philosophy graduate students.
Another key aspect is the level of generality of the questions. Some theorists believe that questionnaires should always start with short, general questions explicitly on the topic at hand, for example, “How do you feel about the overall climate of the department?” I disagree with this advice for two reasons. 1) I don’t think anything useful is measured by such a question, so it lengthens the questionnaire unnecessarily. 2) An overarching question at the start might induce an ordering effect: people may arbitrarily answer one way or another, and then commit to that answer for the rest of the questionnaire.
Finally, one should carefully balance perception questions against factual questions, and make the distinction clear (especially for more sophisticated respondents). The reason is this. On the one hand, asking about specific incidents (e.g., how many times have you heard an inappropriate remark made by a faculty member?) helps gather information that is more objective. On the other hand, overall impressions in the absence of concrete incidents also matter for climate. Ideally, we would cover both aspects, but that increases questionnaire length. I think the right balance depends on context: if the problem is serious and outside intervention is on the table, objective measures are more appropriate.
What to measure
Since I prefer specific questions to general ones, it’s hard to narrow down the aspects I want to measure. It’s tempting to just think up good questions and ask them, but that destroys construct validity. If we were constructing a “real” questionnaire (for widespread use), we would think up hundreds of questions, give them to as many people as possible, and then perform a factor analysis to see whether the questions relate to each other in the ways we expect. The assumption is that, if two questions with similar surface content do not correlate highly, then at least one of them is probably not being interpreted by respondents as intended. Since I can’t really do that for my climate survey, I need to be very careful about what I try to measure with each question. I recommend deciding on a few themes you want to measure, and then sorting questions accordingly: each question should measure only one theme, and each theme should have at least a handful of questions measuring it.
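Even without hundreds of pilot respondents, you can run a crude version of this correlation check on whatever responses you do collect. Here is a minimal sketch in Python; the question names, the theme grouping, and the data are all hypothetical, and it assumes a 1–5 scale with responses stored one respondent per row.

```python
# Crude stand-in for a factor analysis: check that items meant to
# measure the same theme actually correlate. All names/data are made up.
import pandas as pd

# Hypothetical 1-5 Likert responses: two "faculty" items, two "peer" items.
responses = pd.DataFrame({
    "faculty_approachable": [4, 5, 3, 4, 2, 5],
    "faculty_dismissive":   [2, 1, 3, 2, 4, 1],  # negatively worded
    "peers_supportive":     [5, 4, 4, 5, 3, 4],
    "peers_competitive":    [1, 2, 2, 1, 3, 2],  # negatively worded
})

# Reverse-key negatively worded items so all items point the same way
# (on a 1-5 scale, 6 - x flips the response).
for col in ["faculty_dismissive", "peers_competitive"]:
    responses[col] = 6 - responses[col]

# Items within a theme should correlate highly; a low correlation
# suggests at least one item is being read differently than intended.
print(responses.corr().round(2))
```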
Validity measures
It’s generally a good idea to include some validity measures in your questionnaire, even though they lengthen it. There are a few ways of doing this. One is to measure the exact same thing from two angles: for example, have one question read “I feel comfortable talking to faculty about climate issues” and another “I do not feel comfortable talking to faculty about climate issues”. If a person’s answers to these questions are not consistent with each other, the person might not be answering truthfully. It’s usually recommended that the wording of such question pairs be as similar as possible, so that we can be sure they measure the same thing and that the answers are supposed to be consistent. However, I’ve found that trying to keep the wording consistent often 1) makes the sentence sound very unnatural, increasing cognitive load; and 2) makes it too obvious to respondents that the surveyor does not trust them, harming rapport. Also because of point 2, I recommend not asking the reversal questions back-to-back.
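To make the check concrete, here is one way you might score such a pair after the fact. This is a hedged sketch, not part of the survey itself: the column names, the flagging threshold, and the data are all made up, and it assumes a 1–5 scale on which an item and its reversal should sum to roughly 6.

```python
# Flag respondents whose answers to a reversed question pair diverge
# too much to be believable. All names, data, and thresholds are made up.
import pandas as pd

df = pd.DataFrame({
    "comfortable_talking":     [5, 4, 2, 5, 3],
    "not_comfortable_talking": [1, 2, 4, 5, 3],
})

# On a 1-5 scale, an item plus its reversal should sum to about 6;
# the absolute deviation from 6 measures inconsistency.
inconsistency = (df["comfortable_talking"]
                 + df["not_comfortable_talking"] - 6).abs()
df["suspect"] = inconsistency >= 3  # threshold is a judgment call
print(df)
```

In this toy data, the respondent who strongly agrees with both the item and its reversal gets flagged, while small deviations (which can reflect honest ambivalence) do not.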
A second way to check validity is to include a question that has an “obviously right” answer that almost certainly isn’t true. A typical example is “I have never told a lie”. That’s not true of anyone, but it sounds good. The idea is that people who answer “yes” to such a question are trying to make themselves appear better than they actually are, or lack insight, and so the validity of their other answers is questionable as well. The problem with this strategy is that it’s really hard to do right, and more often than not it just confuses people.
Scales, scales
I decided on a 5-point Likert scale: 1 (strongly disagree) through 3 (neutral) to 5 (strongly agree). I decided not to include a “don’t know / not applicable” option, because I think all my questions apply to all my intended respondents. I also opted for a 5-point scale rather than a 4-point forced-choice scale (i.e., one with no “neutral” option, so you have to at least slightly agree or disagree with each question). I’m generally a fan of forced-choice questions, but, from first-hand experience and introspection, I do think forced choices harm validity and frustrate respondents.
Likert-type scales are usually interpreted as interval scales for the purpose of correlation analysis; that is, the “distance” between any two adjacent points is treated as the same. While there are criticisms of this practice, I haven’t encountered anything suggesting it is actively harmful. That said, I do think 7 points are too many. With 7 or more points, you run the risk that some people will be unwilling to use the extreme ends of the scale for cultural reasons, and the extra options increase cognitive load.
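To see what the interval assumption amounts to in practice, you can compare Pearson correlation, which treats the 1–5 codes as equally spaced, with Spearman correlation, which uses only their rank order. A small sketch with made-up data:

```python
# Pearson treats Likert codes as interval data (equal spacing between
# points); Spearman assumes only ordinal data (rank order). Data are
# invented purely for illustration.
from scipy.stats import pearsonr, spearmanr

item_a = [1, 2, 2, 3, 4, 4, 5, 5]
item_b = [2, 1, 3, 3, 4, 5, 4, 5]

print("Pearson: ", round(pearsonr(item_a, item_b)[0], 2))
print("Spearman:", round(spearmanr(item_a, item_b)[0], 2))
```

If the two coefficients are close, as they typically are for well-behaved Likert data, the interval interpretation isn’t doing much damage.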
The survey
…is here. I might update it in the future. It’s free to use, but, as always, it’s nice to receive credit.