Introduction to Validity

Validity is the best available approximation to the truth of a given proposition, inference, or conclusion.

The first thing we have to ask is: “validity of what?” When we think about validity in research, most of us think about research components. We might say that a measure is a valid one, or that a valid sample was drawn, or that the design had strong validity. But all of those statements are technically incorrect. Measures, samples and designs don’t ‘have’ validity – only propositions can be said to be valid. Technically, we should say that a measure leads to valid conclusions or that a sample enables valid inferences, and so on. It is a proposition, inference or conclusion that can ‘have’ validity.

We make lots of different inferences or conclusions while conducting research. Many of these are related to the process of doing research and are not the major hypotheses of the study. Nevertheless, like the bricks that go into building a wall, these intermediate process and methodological propositions provide the foundation for the substantive conclusions that we wish to address. For instance, virtually all social research involves measurement or observation. And whenever we measure or observe, we are concerned with whether we are measuring what we intend to measure or with how our observations are influenced by the circumstances in which they are made. We reach conclusions about the quality of our measures – conclusions that will play an important role in addressing the broader substantive issues of our study. When we talk about the validity of research, we are often referring to the many conclusions we reach about the quality of different parts of our research methodology.

We subdivide validity into four types. Each type addresses a specific methodological question. In order to understand the types of validity, you have to know something about how we investigate a research question. Because all four validity types are really only operative when studying causal questions, we will use a causal study to set the context.

The figure shows that there are really two realms that are involved in research. The first, on the top, is the land of theory. It is what goes on inside our heads as researchers. It is where we keep our theories about how the world operates. The second, on the bottom, is the land of observations. It is the real world into which we translate our ideas – our programs, treatments, measures and observations. When we conduct research, we are continually flitting back and forth between these two realms, between what we think about the world and what is going on in it. When we are investigating a cause-effect relationship, we have a theory (implicit or otherwise) of what the cause is (the cause construct). For instance, if we are testing a new educational program, we have an idea of what it would look like ideally. Similarly, on the effect side, we have an idea of what we are ideally trying to affect and measure (the effect construct). But each of these, the cause and the effect, has to be translated into real things, into a program or treatment and a measure or observational method. We use the term operationalization to describe the act of translating a construct into its manifestation. In effect, we take our idea and describe it as a series of operations or procedures. Now, instead of it only being an idea in our minds, it becomes a public entity that anyone can look at and examine for themselves. It is one thing, for instance, for you to say that you would like to measure self-esteem (a construct). But when you show a ten-item paper-and-pencil self-esteem measure that you developed for that purpose, others can look at it and understand more clearly what you intend by the term self-esteem.

Now, back to explaining the four validity types. They build on one another, with two of them (conclusion and internal) referring to the land of observation on the bottom of the figure, one of them (construct) emphasizing the linkages between the bottom and the top, and the last (external) being primarily concerned about the range of our theory on the top. Imagine that we wish to examine whether use of a World Wide Web (WWW) Virtual Classroom improves student understanding of course material. Assume that we took these two constructs, the cause construct (the WWW site) and the effect (understanding), and operationalized them – turned them into realities by constructing the WWW site and a measure of knowledge of the course material. Here are the four validity types and the question each addresses:

Conclusion Validity: In this study, is there a relationship between the two variables?

In the context of the example we’re considering, the question might be worded: in this study, is there a relationship between the WWW site and knowledge of course material? There are several conclusions or inferences we might draw to answer such a question. We could, for example, conclude that there is a relationship. We might conclude that there is a positive relationship. We might infer that there is no relationship. We can assess the conclusion validity of each of these conclusions or inferences.

Internal Validity: Assuming that there is a relationship in this study, is the relationship a causal one?

Just because we find that use of the WWW site and knowledge are correlated, we can’t necessarily assume that WWW site use causes the knowledge. Both could, for example, be caused by the same factor. For instance, it may be that wealthier students who have greater resources would be more likely to have access to a WWW site and would excel on objective tests. When we want to make a claim that our program or treatment caused the outcomes in our study, we can consider the internal validity of our causal claim.
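The wealth example can be made concrete with a small simulation. The sketch below is illustrative only: the variable names, effect sizes, and sample size are assumptions, not data from any real study. It generates data in which wealth drives both site use and test scores while site use has no effect of its own on scores, then compares the raw correlation with the partial correlation that holds wealth constant.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear influence of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n_students = 5000        # hypothetical sample size

# Hypothetical data: wealth is a common cause; site use does NOT affect scores.
wealth = [rng.gauss(0, 1) for _ in range(n_students)]
site_use = [w + rng.gauss(0, 1) for w in wealth]
test_score = [w + rng.gauss(0, 1) for w in wealth]

raw_r = pearson(site_use, test_score)
adjusted_r = partial_corr(site_use, test_score, wealth)

print(f"raw correlation (site use, score):       {raw_r:.2f}")
print(f"partial correlation (wealth held fixed): {adjusted_r:.2f}")
```

Under these assumptions the raw correlation should come out near 0.5 even though site use has no causal effect, and the partial correlation should fall close to zero. Adjusting for one known factor does not by itself establish internal validity, since unmeasured common causes remain possible.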

Construct Validity: Assuming that there is a causal relationship in this study, can we claim that the program reflected our construct of the program well and that our measure reflected our idea of the construct of the measure well?

In simpler terms, did we implement the program we intended to implement and did we measure the outcome we wanted to measure? In yet other terms, did we operationalize well the ideas of the cause and the effect? When our research is over, we would like to be able to conclude that we did a credible job of operationalizing our constructs – we can assess the construct validity of this conclusion.

External Validity: Assuming that there is a causal relationship in this study between the constructs of the cause and the effect, can we generalize this effect to other persons, places or times?

We are likely to make some claims that our research findings have implications for other groups and individuals in other settings and at other times. When we do, we can examine the external validity of these claims.

Notice how the question that each validity type addresses presupposes an affirmative answer to the previous one. This is what we mean when we say that the validity types build on one another. The figure shows the idea of cumulativeness as a staircase, along with the key question for each validity type.

For any inference or conclusion, there are always possible threats to validity – reasons the conclusion or inference might be wrong. Ideally, one tries to reduce the plausibility of the most likely threats to validity, thereby leaving as most plausible the conclusion reached in the study. For instance, imagine a study examining whether there is a relationship between the amount of training in a specific technology and subsequent rates of use of that technology. Because the interest is in a relationship, it is considered an issue of conclusion validity. Assume that the study is completed and no significant correlation between amount of training and adoption rates is found. On this basis it is concluded that there is no relationship between the two. How could this conclusion be wrong – that is, what are the “threats to validity”? For one, it’s possible that there isn’t sufficient statistical power to detect a relationship even if it exists. Perhaps the sample size is too small or the measure of amount of training is unreliable. Or maybe assumptions of the correlational test are violated given the variables used. Perhaps there were random irrelevancies in the study setting or random heterogeneity in the respondents that increased the variability in the data and made it harder to see the relationship of interest. The inference that there is no relationship will be stronger – have greater conclusion validity – if one can show that these alternative explanations are not credible. The distributions might be examined to see if they conform with assumptions of the statistical test, or analyses conducted to determine whether there is sufficient statistical power.
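The power concern can be illustrated with a rough calculation. The sketch below is a minimal approximation, assuming a two-sided test of a Pearson correlation based on the Fisher z transformation; the function names, the true correlation of 0.3, and the reliability of 0.5 are hypothetical values chosen for illustration. It also applies the classical attenuation formula, in which the expected observed correlation shrinks by the square root of the product of the two measures' reliabilities.

```python
import math
from statistics import NormalDist


def attenuated_r(true_r, reliability_x, reliability_y=1.0):
    """Expected observed correlation when measures are imperfectly reliable."""
    return true_r * math.sqrt(reliability_x * reliability_y)


def approx_power(r, n, alpha=0.05):
    """Approximate power of a two-sided test that a correlation is zero.

    Uses the Fisher z transformation, whose standard error is 1/sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = math.atanh(r) * math.sqrt(n - 3)
    # Probability that the test statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


true_r = 0.3  # hypothetical true correlation between training and use
for n in (20, 200):
    observed_r = attenuated_r(true_r, reliability_x=0.5)
    print(
        f"n={n}: power with a reliable measure = {approx_power(true_r, n):.2f}, "
        f"with reliability 0.5 = {approx_power(observed_r, n):.2f}"
    )
```

With these assumed values, a sample of 20 gives power of only about 0.25, so a nonsignificant result says little about whether a relationship exists. An unreliable training measure lowers power further at any sample size, which is why both threats need to be ruled out before concluding that there is no relationship.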

The theory of validity, and the many lists of specific threats, provide a useful scheme for assessing the quality of research conclusions. The theory is general in scope and applicability, well-articulated in its philosophical suppositions, and virtually impossible to explain adequately in a few minutes. As a framework for judging the quality of evaluations it is indispensable and well worth understanding.