Other Quasi-Experimental Designs

There are many different types of quasi-experimental designs that have a variety of applications in specific contexts. Here, I’ll briefly present a number of the more interesting or important quasi-experimental designs. By studying the features of these designs, you can come to a deeper understanding of how to tailor design components to address threats to internal validity in your own research contexts.

The Proxy Pretest Design

The proxy pretest design looks like a standard pre-post design. But there’s an important difference. The pretest in this design is collected after the program is given! But how can you call it a pretest if it’s collected after the program? Because you use a “proxy” variable to estimate where the groups would have been on the pretest. There are essentially two variations of this design. In the first, you ask the participants to estimate where their pretest level would have been. This can be called the “Recollection” Proxy Pretest Design. For instance, you might ask participants to complete your measures “estimating how you would have answered the questions six months ago.” This type of proxy pretest is not very good for estimating actual pre-post changes because people may forget where they were at some prior time or they may distort the pretest estimates to make themselves look better. However, there may be times when you are interested not so much in where they were on the pretest but rather in where they think they were. The recollection proxy pretest would be a sensible way to assess participants’ perceived gain or change.

The other proxy pretest design uses archived records to stand in for the pretest. We might call this the “Archived” Proxy Pretest design. For instance, imagine that you are studying the effects of an educational program on the math performance of eighth graders. Unfortunately, you were brought in to do the study after the program had already been started (a too-frequent case, I’m afraid). You are able to construct a posttest that shows math ability after training, but you have no pretest. Under these circumstances, your best bet might be to find a proxy variable that would estimate pretest performance. For instance, you might use the student’s grade point average in math from the seventh grade as the proxy pretest.

The proxy pretest design is not one you should ever select by choice. But if you find yourself in a situation where you have to evaluate a program that has already begun, it may be the best you can do, and it would almost certainly be better than relying on a posttest-only design.

The Separate Pre-Post Samples Design

The basic idea in this design (and its variations) is that the people you use for the pretest are not the same as the people you use for the posttest. Take a close look at the design notation for the first variation of this design. There are four groups (indicated by the four lines) but two of these groups come from a single nonequivalent group and the other two also come from a single nonequivalent group (indicated by the subscripts next to N). Imagine that you have two agencies or organizations that you think are similar. You want to implement your study in one agency and use the other as a control. The program you are looking at is an agency-wide one and you expect that the outcomes will be most noticeable at the agency level. For instance, let’s say the program is designed to improve customer satisfaction. Because customers routinely cycle through your agency, you can’t measure the same customers pre-post. Instead, you measure customer satisfaction in each agency at one point in time, implement your program in one of them, and then measure customer satisfaction in each agency at another point in time after the program. Notice that the customers will be different within each agency for the pretest and posttest. This design is not a particularly strong one. Because you cannot match individual participant responses from pre to post, you can only look at the change in average customer satisfaction. Here, you always run the risk of nonequivalence not only between the agencies but also, within each agency, between the pre and post groups. For instance, if you have different types of clients at different times of the year, this could bias the results. You could also look at this as having a proxy pretest on a different group of people.

The second example of the separate pre-post sample design is shown in design notation at the right. Again, there are four groups in the study. This time, however, you are taking random samples from your agency or organization at each point in time. This is essentially the same design as above except for the random sampling. Probably the most sensible use of this design would be in situations where you routinely do sample surveys in an organization or community. For instance, let’s assume that every year two similar communities do a community-wide survey of residents to ask about satisfaction with city services. Because of costs, you randomly sample each community each year. In one of the communities you decide to institute a program of community policing and you want to see whether residents feel safer and have changed in their attitudes towards police. You would use the results of last year’s survey as the pretest in both communities, and this year’s results as the posttest. Again, this is not a particularly strong design. Even though you are taking random samples from each community each year, it may still be the case that the community changes fundamentally from one year to the next and that the random samples within a community cannot be considered “equivalent.”

The Double Pretest Design

The Double Pretest is a very strong quasi-experimental design with respect to internal validity. Why? Recall that the Pre-Post Nonequivalent Groups Design (NEGD) is especially susceptible to selection threats to internal validity. In other words, the nonequivalent groups may be different in some way before the program is given and you may incorrectly attribute posttest differences to the program. Although the pretest helps to assess the degree of pre-program similarity, it does not tell us if the groups are changing at similar rates prior to the program. Thus, the NEGD is especially susceptible to selection-maturation threats.

The double pretest design includes two measures prior to the program. Consequently, if the program and comparison group are maturing at different rates you should detect this as a change from pretest 1 to pretest 2. Therefore, this design explicitly controls for selection-maturation threats. The design is also sometimes referred to as a “dry run” quasi-experimental design because the double pretests simulate what would happen in the null case.
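The “dry run” logic can be sketched in a few lines of Python. This is only a minimal illustration, not a full analysis: the scores are made up, the names (program, comparison, mean_gain) are hypothetical, and a real study would test these differences statistically rather than just comparing means.

```python
from statistics import mean

# Hypothetical scores: each tuple is (pretest 1, pretest 2, posttest) for one participant.
program = [(50, 52, 60), (48, 51, 58), (55, 56, 66), (52, 54, 61)]
comparison = [(49, 51, 53), (51, 53, 56), (47, 50, 51), (53, 55, 57)]


def mean_gain(group, start, end):
    """Average change between two measurement waves (column indexes) for a group."""
    return mean(row[end] - row[start] for row in group)


# "Dry run": no program has been given yet, so the two groups' gains from
# pretest 1 to pretest 2 should be similar if they are maturing at the same rate.
dry_run_difference = mean_gain(program, 0, 1) - mean_gain(comparison, 0, 1)

# Program phase: gains from pretest 2 to posttest.
program_phase_difference = mean_gain(program, 1, 2) - mean_gain(comparison, 1, 2)

print("Difference in gains before the program:", dry_run_difference)
print("Difference in gains across the program:", program_phase_difference)
```

If the first difference is near zero and the second is not, differential maturation becomes a less plausible explanation for the posttest difference.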

The Switching Replications Design

The Switching Replications quasi-experimental design is also very strong with respect to internal validity. And, because it allows for two independent implementations of the program, it may enhance external validity or generalizability. The design has two groups and three waves of measurement. In the first phase of the design, both groups are pretested, one is given the program, and both are posttested. In the second phase of the design, the original comparison group is given the program while the original program group serves as the “control”. This design is identical in structure to its randomized experimental version, but lacks the random assignment to group. It is certainly superior to the simple pre-post nonequivalent groups design. In addition, because it assures that all participants eventually get the program, it is probably one of the most ethically feasible quasi-experiments.

The Nonequivalent Dependent Variables (NEDV) Design

The Nonequivalent Dependent Variables (NEDV) Design is a deceptive one. In its simple form, it is an extremely weak design with respect to internal validity. But in its pattern matching variations, it opens the door to an entirely different approach to causal assessment that is extremely powerful. The design notation shown here is for the simple two-variable case. Notice that this design has only a single group of participants! The two lines in the notation indicate separate variables, not separate groups.

The idea in this design is that you have a program designed to change a specific outcome. For instance, let’s assume you are doing training in algebra for first-year high-school students. Your training program is designed to affect algebra scores. But it is not designed to affect geometry scores. And, pre-post geometry performance might be reasonably expected to be affected by other internal validity factors like history or maturation. In this case, the pre-post geometry performance acts like a control group – it models what would likely have happened to the algebra pre-post scores if the program hadn’t been given. The key is that the “control” variable has to be similar enough to the target variable to be affected in the same way by history, maturation, and the other single group internal validity threats, but not so similar that it is affected by the program. The figure shows the results we might get for our two-variable algebra-geometry example. Note that this design only works if the geometry variable is a reasonable proxy for what would have happened on the algebra scores in the absence of the program. The real allure of this design is the possibility that we don’t need a control group – we can give the program to all of our sample! The problem is that in its simple two-variable version, the assumption about the control variable is a difficult one to meet. (Note that a double-pretest version of this design would be considerably stronger.)
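To make the comparison concrete, here is a rough sketch of the two-variable case in Python. The scores are invented and the variable names are hypothetical. It assumes the geometry gain is a fair stand-in for what the algebra gain would have been without the program, which is exactly the assumption that is hard to meet.

```python
from statistics import mean

# Hypothetical pre and post scores for the SAME students on two different variables.
algebra_pre = [40, 45, 50, 42, 48]
algebra_post = [55, 58, 66, 54, 62]
geometry_pre = [41, 44, 52, 43, 47]
geometry_post = [44, 46, 55, 45, 50]


def mean_gain(pre, post):
    """Average pre-to-post gain across students (scores are paired by position)."""
    return mean(after - before for before, after in zip(pre, post))


algebra_gain = mean_gain(algebra_pre, algebra_post)      # target variable
geometry_gain = mean_gain(geometry_pre, geometry_post)   # "control" variable

# The geometry gain models history, maturation, and similar threats;
# whatever algebra gained beyond that is the tentative program effect.
net_gain = algebra_gain - geometry_gain

print("Algebra gain:", algebra_gain)
print("Geometry gain:", geometry_gain)
print("Gain beyond the control variable:", net_gain)
```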

The Pattern Matching NEDV Design. Although the two-variable NEDV design is quite weak, we can make it considerably stronger by adding multiple outcome variables. In this variation, we need many outcome variables and a theory that tells how affected (from most to least) each variable will be by the program. Let’s reconsider the example of our algebra program above. Now, instead of having only an algebra and geometry score, we have ten measures that we collect pre and post. We expect that the algebra measure would be most affected by the program (because that’s what the program was most designed to affect). But here, we recognize that geometry might also be affected because training in algebra might be relevant, at least tangentially, to geometry skills. On the other hand, we might theorize that creativity would be much less affected, even indirectly, by training in algebra and so our creativity measure is predicted to be least affected of the ten measures.

Now, let’s line up our theoretical expectations against our pre-post gains for each variable. The graph we’ll use is called a “ladder graph” because if there is a correspondence between expectations and observed results we’ll get horizontal lines and a figure that looks a bit like a ladder. You can see in the figure that the expected order of outcomes (on the left) is mirrored well in the actual outcomes (on the right).
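One way to put a number on how well the two sides of the ladder line up is a rank-order correlation between expected and observed effects. The sketch below uses Spearman’s rank correlation, which is my choice for illustration rather than a method the text prescribes. The ten measure names (other than algebra, geometry, and creativity), the expected-impact scores, and the observed gains are all hypothetical.

```python
# Hypothetical measures, ordered from expected most affected to least affected.
measures = ["algebra", "geometry", "m3", "m4", "m5",
            "m6", "m7", "m8", "m9", "creativity"]
expected_impact = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]   # 10 = expected to change most
observed_gain = [12.0, 9.5, 10.1, 7.2, 6.8, 4.0, 4.5, 2.1, 1.0, 0.3]


def ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


rho = spearman(expected_impact, observed_gain)
print("Rank correlation between expected and observed order:", round(rho, 3))
```

A value near 1 corresponds to nearly horizontal rungs on the ladder graph; crossing lines pull the correlation down.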

Depending on the circumstances, the Pattern Matching NEDV design can be quite strong with respect to internal validity. In general, the design is stronger if you have a larger set of variables and you find that your expectation pattern matches well with the observed results. What are the threats to internal validity in this design? Only a factor (e.g., an historical event or maturational pattern) that would yield the same outcome pattern can act as an alternative explanation. And, the more complex the predicted pattern, the less likely it is that some other factor would yield it. The problem is that the more complex the predicted pattern, the less likely it is that your observed data will match it.

The Pattern Matching NEDV design is especially attractive for several reasons. It requires that the researcher specify expectations prior to institution of the program. Doing so can be a sobering experience. Often we make naive assumptions about how our programs or interventions will work. When we’re forced to look at them in detail, we begin to see that our assumptions may be unrealistic. The design also requires a detailed measurement net – a large set of outcome variables and a detailed sense of how they are related to each other. Developing this level of detail about your measurement constructs is liable to improve the construct validity of your study. Increasingly, we have methodologies that can help researchers empirically develop construct networks that describe the expected interrelationships among outcome variables (see Concept Mapping for more information about how to do this). Finally, the Pattern Matching NEDV is especially intriguing because it suggests that it is possible to assess the effects of programs even if you only have a treated group. Assuming the other conditions for the design are met, control groups are not necessarily needed for causal assessment. Of course, you can also couple the Pattern Matching NEDV design with standard experimental or quasi-experimental control group designs for even more enhanced validity. And, if your experimental or quasi-experimental design already has many outcome measures as part of the measurement protocol, the design might be considerably enriched by generating variable-level expectations about program outcomes and testing the match statistically.

One of my favorite questions to my statistician friends goes to the heart of the potential of the Pattern Matching NEDV design. “Suppose,” I ask them, “that you have ten outcome variables in a study and that you find that all ten show no statistically significant treatment effects when tested individually (or even when tested as a multivariate set). And suppose, like the desperate graduate student who finds in their initial analysis that nothing is significant, that you decide to look at the direction of the effects across the ten variables. You line up the variables in terms of which should be most to least affected by your program. And, miracle of miracles, you find that there is a strong and statistically significant correlation between the expected and observed order of effects even though no individual effect was statistically significant. Is this finding interpretable as a treatment effect?” My answer is “yes.” I think the graduate student’s desperation-driven intuition to look at order of effects is a sensible one. I would conclude that the reason you did not find statistically significant effects on the individual variables is that you didn’t have sufficient statistical power. Of course, the results will only be interpretable as a treatment effect if you can rule out any other plausible factor that could have caused the ordering of outcomes. But the more detailed the predicted pattern and the stronger the correlation to observed results, the more likely it is that the treatment effect is the most plausible explanation. In such cases, the expected pattern of results is like a unique fingerprint – and the observed pattern that matches it can only be due to that unique source pattern.

I believe that the pattern matching notion implicit in the NEDV design opens the way to an entirely different approach to causal assessment, one that is closely linked to detailed prior explication of the program and to detailed mapping of constructs. It suggests a much richer model for causal assessment than one that relies only on a simplistic dichotomous treatment-control model. In fact, I’m so convinced of the importance of this idea that I’ve staked a major part of my career on developing pattern matching models for conducting research!

The Regression Point Displacement (RPD) Design

The Regression Point Displacement (RPD) design is a simple quasi-experimental strategy that has important implications, especially for community-based research. The problem with community-level interventions is that it is difficult to do causal assessment, to determine if your program made a difference as opposed to other potential factors. Typically, in community-level interventions, program costs preclude our implementing the program in more than one community. We look at pre-post indicators for the program community and see whether there is a change. If we’re relatively enlightened, we seek out another similar community and use it as a comparison. But, because the intervention is at the community level, we only have a single “unit” of measurement for our program and comparison groups.

The RPD design attempts to enhance the single program unit situation by comparing the performance on that single unit with the performance of a large set of comparison units. In community research, we would compare the pre-post results for the intervention community with a large set of other communities. The advantage of doing this is that we don’t rely on a single nonequivalent community. Instead, we use results from a heterogeneous set of nonequivalent communities to model the comparison condition, and then compare our single site to this model. For typical community-based research, such an approach may greatly enhance our ability to make causal inferences.

I’ll illustrate the RPD design with an example of a community-based AIDS education program. We decide to pilot our new AIDS education program in one particular community in a state, perhaps a county. The state routinely publishes annual HIV positive rates by county for the entire state. So, we use the remaining counties in the state as control counties. But instead of averaging all of the control counties to obtain a single control score, we use them as separate units in the analysis. The first figure shows the bivariate pre-post distribution of HIV positive rates per 1000 people for all the counties in the state. The program county – the one that gets the AIDS education program – is shown as an X and the remaining control counties are shown as Os. We compute a regression line for the control cases (shown in blue on the figure). The regression line models our predicted outcome for a county with any specific pretest rate. To estimate the effect of the program we test whether the displacement of the program county from the control county regression line is statistically significant.
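The displacement calculation can be sketched as follows. The county rates are invented for illustration and all variable names are hypothetical. The t statistic here is the ordinary prediction-error statistic for a new point under a simple regression fit to the controls; I am assuming it is a reasonable stand-in for the formal analysis, and it should be compared against a t distribution with n - 2 degrees of freedom.

```python
from math import sqrt

# Hypothetical HIV positive rates per 1000 people for the control counties.
control_pre = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
control_post = [2.2, 2.9, 4.1, 5.0, 6.2, 6.8, 8.1, 9.1]

# Hypothetical rates for the single program county (the X on the plot).
program_pre, program_post = 6.0, 4.5

# Fit the regression line using the control counties only.
n = len(control_pre)
mean_pre = sum(control_pre) / n
mean_post = sum(control_post) / n
sxx = sum((x - mean_pre) ** 2 for x in control_pre)
sxy = sum((x - mean_pre) * (y - mean_post)
          for x, y in zip(control_pre, control_post))
slope = sxy / sxx
intercept = mean_post - slope * mean_pre

# Residual standard error of the control regression.
df = n - 2
residuals = [y - (intercept + slope * x)
             for x, y in zip(control_pre, control_post)]
resid_se = sqrt(sum(r ** 2 for r in residuals) / df)

# Displacement: observed posttest minus the posttest the line predicts.
predicted_post = intercept + slope * program_pre
displacement = program_post - predicted_post

# Standard error for predicting one new unit at the program county's pretest value.
se_displacement = resid_se * sqrt(1 + 1 / n + (program_pre - mean_pre) ** 2 / sxx)
t_stat = displacement / se_displacement

print("Predicted posttest rate:", round(predicted_post, 3))
print("Displacement:", round(displacement, 3))
print("t statistic:", round(t_stat, 2), "with", df, "degrees of freedom")
```

A negative displacement means the program county ended up with a lower rate than the control counties would lead us to expect for a county starting at that pretest rate.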

The second figure shows why the RPD design was given its name. In this design, we know we have a treatment effect when there is a significant displacement of the program point from the control group regression line.

The RPD design is especially applicable in situations where a treatment or program is applied in a single geographical unit (e.g., a state, county, city, hospital, hospital unit) instead of an individual, where there are lots of other units available as control cases, and where there is routine measurement (e.g., monthly, annually) of relevant outcome variables.

The analysis of the RPD design turns out to be a variation of the Analysis of Covariance model (see the Statistical Analysis of the Regression Point Displacement Design). I had the opportunity to be the co-developer with Donald T. Campbell of the RPD design. You can view the entire original paper entitled “The Regression Point Displacement Design for Evaluating Community-Based Pilot Programs and Demonstration Projects”.