# Threats to Conclusion Validity

A threat to conclusion validity is a factor that can lead you to reach an incorrect conclusion about a relationship in your observations. You can essentially make two kinds of errors about relationships:

- Conclude that there is no relationship when in fact there is (you missed the relationship or didn’t see it)
- Conclude that there is a relationship when in fact there is not (you’re seeing things that aren’t there!)

Most threats to conclusion validity have to do with the first problem. Why? Perhaps because in most research it is so hard to find relationships in the data at all that seeing a relationship that isn’t there is the less frequent problem. We tend to have more trouble finding the needle in the haystack than seeing things that aren’t there! So, I’ll divide the threats by the type of error they are associated with.

## Finding no relationship when there is one (or, “missing the needle in the haystack”)

When you’re looking for the needle in the haystack you essentially have two basic problems: the tiny needle and too much hay. You can view this as a signal-to-noise ratio problem. The “signal” is the needle: the relationship you are trying to see. The “noise” consists of all of the factors that make it hard to see the relationship.

There are several important sources of noise, each of which is a threat to conclusion validity. One important threat is **low reliability of measures** (see reliability). This can be due to many factors including poor question wording, bad instrument design or layout, illegibility of field notes, and so on. In studies where you are evaluating a program you can introduce noise through **poor reliability of treatment implementation**. If the program doesn’t follow the prescribed procedures or is inconsistently carried out, it will be harder to see relationships between the program and other factors like the outcomes. Noise that is caused by **random irrelevancies in the setting** can also obscure your ability to see a relationship. In a classroom context, the traffic outside the room, disturbances in the hallway, and countless other irrelevant events can distract the researcher or the participants.

The types of people you have in your study can also make it harder to see relationships. The threat here is due to **random heterogeneity of respondents**. If you have a very diverse group of respondents, they are likely to vary more widely on your measures or observations. Some of their variety may be related to the phenomenon you are looking at, but at least part of it is likely to just constitute individual differences that are irrelevant to the relationship being observed.
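To make the signal-to-noise idea concrete, here is a minimal simulation sketch of the first threat, low reliability of measures. Everything in it is an illustrative assumption rather than part of the discussion above: the function names `pearson_r` and `simulate`, the true correlation of about 0.5, the sample size, and the size of the measurement error are all made up for the example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(measurement_noise_sd, n=2000, seed=1):
    """Correlate two measures whose true scores correlate about 0.5.

    measurement_noise_sd is the standard deviation of the random error
    added to each observed score (0 means perfectly reliable measures).
    All numbers here are hypothetical choices for illustration.
    """
    rng = random.Random(seed)
    observed_x, observed_y = [], []
    for _ in range(n):
        true_x = rng.gauss(0, 1)
        # true_y has unit variance and correlates 0.5 with true_x
        true_y = 0.5 * true_x + math.sqrt(0.75) * rng.gauss(0, 1)
        # the "noise": unreliable measurement of both variables
        observed_x.append(true_x + rng.gauss(0, measurement_noise_sd))
        observed_y.append(true_y + rng.gauss(0, measurement_noise_sd))
    return pearson_r(observed_x, observed_y)


r_reliable = simulate(measurement_noise_sd=0.0)
r_noisy = simulate(measurement_noise_sd=2.0)
print(f"reliable measures: r = {r_reliable:.2f}")
print(f"noisy measures:    r = {r_noisy:.2f}")
```

The underlying relationship is identical in both runs; only the measurement error differs. The observed correlation shrinks toward zero as the measures become less reliable, which is exactly how noise makes a real relationship easier to miss.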

All of these threats add variability into the research context and contribute to the “noise” relative to the signal of the relationship you are looking for. But noise is only one part of the problem. We also have to consider the issue of the signal, that is, the true strength of the relationship. There is one broad threat to conclusion validity that tends to subsume or encompass all of the noise-producing factors above and also takes into account the strength of the signal, the amount of information you collect, and the amount of risk you’re willing to take in making a decision about whether a relationship exists. This threat is called **low statistical power**. Because this idea is so important in understanding how we make decisions about relationships, we have a separate discussion of statistical power.
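The four ingredients of statistical power named above (signal, noise, amount of information, and risk) can be sketched with a rough calculation. The function below is a hypothetical helper, not a standard routine: it assumes a two-sided comparison of two group means, equal group sizes, and a normal approximation with a known standard deviation, none of which comes from the discussion above.

```python
import math
from statistics import NormalDist


def approx_power(effect, noise_sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    effect       true difference between group means (the "signal")
    noise_sd     standard deviation within each group (the "noise")
    n_per_group  respondents per group (the amount of information)
    alpha        significance level (the risk you are willing to take)

    Assumption: normal (z-test) approximation with known, equal noise_sd.
    """
    std_normal = NormalDist()
    standard_error = noise_sd * math.sqrt(2 / n_per_group)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / standard_error
    # probability of landing in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


baseline = approx_power(effect=0.5, noise_sd=1.0, n_per_group=20)
more_data = approx_power(effect=0.5, noise_sd=1.0, n_per_group=100)
more_noise = approx_power(effect=0.5, noise_sd=2.0, n_per_group=20)
weaker_signal = approx_power(effect=0.2, noise_sd=1.0, n_per_group=20)
less_risk = approx_power(effect=0.5, noise_sd=1.0, n_per_group=20, alpha=0.01)

for label, value in [
    ("baseline", baseline),
    ("more data", more_data),
    ("more noise", more_noise),
    ("weaker signal", weaker_signal),
    ("stricter alpha", less_risk),
]:
    print(f"{label:15s} power = {value:.2f}")
```

Under these assumptions, collecting more data raises power, while more noise, a weaker signal, or a stricter significance level each lowers it. Low power from any of these sources means a real relationship is more likely to be missed.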

## Finding a relationship when there is not one (or “seeing things that aren’t there”)

In anything but the most trivial research study, the researcher will spend a considerable amount of time analyzing the data for relationships. Of course, it’s important to conduct a thorough analysis, but most people are well aware of the fact that if you play with the data long enough, you can often “turn up” results that support or corroborate your hypotheses. In more everyday terms, you are “fishing” for a specific result by analyzing the data repeatedly under slightly differing conditions or assumptions.

In statistical analysis, we attempt to determine the probability that the finding we get is a “real” one or could have been a “chance” finding. In fact, we often use this probability to decide whether to accept the statistical result as evidence that there is a relationship. In the social sciences, researchers often use the rather arbitrary value known as the `0.05` level of significance to decide whether their result is credible or could be considered a “fluke.” Essentially, the value `0.05` means that, if there were no real relationship, a result like the one you got could be expected to occur by chance about 5 times out of every 100 times you ran the statistical analysis.

Most statistical analyses rest on the assumption that each analysis is “independent” of the others. But that may not be true when you conduct multiple analyses of the same data. For instance, let’s say you conduct 20 statistical tests and for each one you use the `0.05` level criterion for deciding whether you are observing a relationship. For each test, the chances are 5 out of 100 that you will see a relationship even if there is not one there (that’s what it means to say that the result could be “due to chance”). A chance of 5 out of 100 is the fraction 5/100, which is equal to 1 out of 20. Now, in this example, you conduct 20 separate analyses. Let’s say that you find that of the twenty results, only one is statistically significant at the `0.05` level. Does that mean you have found a statistically significant relationship? If you had only done the one analysis, you might conclude that you’ve found a relationship in that result. But if you did 20 analyses, you would expect to find one of them significant by chance alone, even if there is no real relationship in the data.

We call this threat to conclusion validity **fishing and the error rate problem**. The basic problem is that you were “fishing” by conducting multiple analyses and treating each one as though it were independent. Instead, when you conduct multiple analyses, you should adjust the error rate (i.e., significance level) to reflect the number of analyses you are doing. The bottom line here is that you are more likely to see a relationship when there isn’t one when you keep reanalyzing your data and don’t take that fishing into account when drawing your conclusions.
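The arithmetic behind the 20-test example can be written out as a short sketch. Two things here are assumptions that go beyond the text: the chance of at least one false positive is computed as if the 20 tests were independent and no real relationship existed, and the adjustment shown is the Bonferroni correction, which is one common way to adjust the error rate and not necessarily the one you should use. The function names are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests tests.

    Assumption: the tests are independent and there is no real relationship.
    """
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


alpha = 0.05
n_tests = 20

# On average, how many of the 20 tests come out "significant" by chance alone
expected_false_positives = alpha * n_tests

unadjusted = familywise_error_rate(alpha, n_tests)
adjusted_alpha = bonferroni_alpha(alpha, n_tests)
adjusted = familywise_error_rate(adjusted_alpha, n_tests)

print(f"expected chance findings:          {expected_false_positives:.1f}")
print(f"chance of at least one (alpha):    {unadjusted:.2f}")
print(f"Bonferroni per-test alpha:         {adjusted_alpha:.4f}")
print(f"chance of at least one (adjusted): {adjusted:.3f}")
```

With 20 tests at the `0.05` level you expect one chance finding on average, and under the independence assumption the chance of at least one is about 0.64. Dividing the significance level by the number of tests (0.05 / 20 = 0.0025 per test) brings the overall chance of a false positive back to just under 0.05.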

## Problems that can lead to either conclusion error

Every analysis is based on a variety of assumptions about the nature of the data, the procedures you use to conduct the analysis, and the match between these two. If you are not sensitive to the assumptions behind your analysis you are likely to draw erroneous conclusions about relationships. In quantitative research we refer to this threat as the **violated assumptions of statistical tests**. For instance, many statistical analyses assume that the data are distributed normally — that the population from which they are drawn would be distributed according to a “normal” or “bell-shaped” curve. If that assumption is not true for your data and you use that statistical test, you are likely to get an incorrect estimate of the true relationship. And, it’s not always possible to predict what type of error you might make — seeing a relationship that isn’t there or missing one that is.
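As one illustration of a violated assumption, the sketch below runs a one-sample t-test many times on small samples when the null hypothesis is actually true, once with normally distributed data and once with strongly skewed (lognormal) data. The setup is entirely hypothetical: the sample size of 10, the choice of distributions, the number of trials, and the function name `false_positive_rate` are assumptions made for the example. It shows only one direction of error (seeing a relationship that isn’t there); other violations can push the other way.

```python
import math
import random

SAMPLE_SIZE = 10
T_CRIT_DF9 = 2.262  # two-sided 0.05 critical value of Student's t, 9 df


def false_positive_rate(draw, true_mean, trials=4000, seed=7):
    """Share of t-tests that reject a null hypothesis that is actually true.

    draw(rng) returns one observation; true_mean is the population mean,
    so every rejection counted here is a false positive.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(SAMPLE_SIZE)]
        mean = sum(sample) / SAMPLE_SIZE
        variance = sum((x - mean) ** 2 for x in sample) / (SAMPLE_SIZE - 1)
        standard_error = math.sqrt(variance / SAMPLE_SIZE)
        t_stat = (mean - true_mean) / standard_error
        if abs(t_stat) > T_CRIT_DF9:
            rejections += 1
    return rejections / trials


# Assumption met: normal data with population mean 0
rate_normal = false_positive_rate(lambda rng: rng.gauss(0, 1), true_mean=0.0)

# Assumption violated: lognormal data, whose population mean is exp(0.5)
rate_skewed = false_positive_rate(
    lambda rng: rng.lognormvariate(0, 1), true_mean=math.exp(0.5)
)

print(f"normal data: false positive rate = {rate_normal:.3f}")
print(f"skewed data: false positive rate = {rate_skewed:.3f}")
```

With normal data the test rejects close to the nominal 5% of the time. With the skewed data and this small a sample, it rejects noticeably more often, so the `0.05` level no longer means what you think it means.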

I believe that the same problem can occur in qualitative research as well. There are assumptions, some of which we may not even realize, behind our qualitative methods. For instance, in interview situations we may assume that the respondent is free to say anything they wish. If that is not true, for example if the respondent is under covert pressure from supervisors to respond in a certain way, you may erroneously see relationships in the responses that aren’t real and/or miss ones that are.

The threats listed above illustrate some of the major difficulties and traps that are involved in one of the most basic of research tasks — deciding whether there is a relationship in your data or observations. So, how do we attempt to deal with these threats? The researcher has a number of strategies for improving conclusion validity through minimizing or eliminating the threats described above.