Evaluating a program’s “research base” Part 1: importance of a comparison group

October 26th, 2018 | Leigh Mingle

Every program, curriculum, and even plenty of books claim to be research-based, but what does that really mean? Why do “research-based” programs still fail kids if they are proven effective? The truth is that the quality of program evaluation research varies widely, and even when high-quality evaluation work is done, the findings from that study may not extend to your students, classroom, or school. Having some basic knowledge of evaluation methodology can help you identify the programs that will work for your kids and avoid wasting money on those that may not. In this three-part series, I’ll discuss what factors you’ll want to look for when making a decision.

This week, we’ll talk about the importance of the study design. Namely, was the study correlational, quasi-experimental, or experimental? Lots of programs will tout a correlational study as proof that the program works, but that is NOT what that study tells us. A correlational study looks at the relationship between two factors; it does not control for outside forces that may be acting on the outcome (there are some statistical ways to rule out factors, but they get a little complicated). The hallmark of a study using this design is language that says “after a year of use, student scores went up by X points.” This is a red flag that there was no comparison group and that the researchers weren’t able to control for other factors that may have influenced the result. There are a lot of places where you can read about silly correlations (check out this Buzzfeed list!), but the take-home message is that the old adage “correlation doesn’t imply causation” is true. You cannot tell whether a program works from a correlational study alone. Always demand a comparison group.

Source: https://xkcd.com/552/

Comparison groups are important because they help us “control” for other factors that may be influencing the relationship. For instance, you may give students a program and a year later they’re all doing better, but without a comparison group you won’t know whether the program you used caused the growth or whether they would have grown that much with any program. Maybe they all got a lot out of the start-of-the-year seminar and decided to study more. Or maybe they realized college was becoming more expensive and all decided they needed to work harder to get a scholarship. My point is that a correlational study leaves countless possible explanations for growth. When you add a comparison group, you get to rule a lot of those explanations out.
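To make that concrete, here is a minimal sketch in Python of the arithmetic a comparison group makes possible. The scores are made up for illustration (they are not from any real study), and the function and variable names are my own. The idea is simply to look at how much the program group grew beyond what the comparison group grew over the same year.

```python
from statistics import mean

# Hypothetical test scores, invented for illustration only.
program_before = [60, 62, 58, 65, 61]
program_after = [70, 71, 69, 74, 72]
comparison_before = [61, 63, 57, 64, 60]
comparison_after = [68, 71, 63, 71, 67]

def average_gain(before, after):
    """Mean of each student's change from the first score to the second."""
    return mean(a - b for a, b in zip(after, before))

# What a vendor might report on its own: "scores went up by X points."
program_gain = average_gain(program_before, program_after)

# What students gained over the same year without the program.
comparison_gain = average_gain(comparison_before, comparison_after)

# The part of the growth that the comparison group does not account for.
gain_beyond_comparison = program_gain - comparison_gain

print(f"Program group gain:     {program_gain}")
print(f"Comparison group gain:  {comparison_gain}")
print(f"Gain beyond comparison: {gain_beyond_comparison}")
```

In this made-up example most of the program group's growth also shows up in the comparison group, so the headline gain on its own would overstate what the program did. A real evaluation would also report how uncertain that difference is, which this sketch leaves out.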

There are two major study designs that employ a comparison group: experimental and quasi-experimental. Experimental studies are the gold standard. In an experimental study, participants are randomly assigned to either treatment or control, so confounding factors should, on average, be spread evenly across the two groups. This is ideal. However, there is often a practical reason why random assignment won’t work. For instance, if students’ schedules are fixed, you may not be able to randomly assign them to a classroom, but perhaps you can “match” students on as many demographic factors as possible and create two mostly equivalent groups. In doing so, you are taking away a lot of the common confounding factors that may influence the results. This isn’t the ideal situation, but it is a decent alternative when random assignment just isn’t possible. The key to a quasi-experimental design is that both groups (treatment and control) are roughly equivalent demographically AND have similar starting points on the outcome measure. So if you are using the state standardized test as your outcome measure, you want the two groups to have similar average test scores for the previous year.
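If it helps to see both ideas side by side, here is a rough Python sketch, under assumptions of my own, of random assignment and a simple check of whether two groups started in a similar place. The student IDs and prior-year scores are hypothetical, the helper names are invented, and expressing the gap in pooled standard deviation units is just one common way to describe "pretty similar"; any cutoff for how small that gap should be is a judgment call, not something this post prescribes.

```python
import random
from statistics import mean, variance

def random_assign(student_ids, seed=0):
    """Shuffle the students and split them into two groups of (nearly) equal size."""
    shuffled = list(student_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split can be reproduced
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def standardized_difference(group_a, group_b):
    """Difference between the group means, in pooled standard deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_variance ** 0.5

# Hypothetical prior-year test scores, invented for illustration only.
prior_scores = {
    "s01": 410, "s02": 455, "s03": 390, "s04": 470, "s05": 430,
    "s06": 445, "s07": 400, "s08": 460, "s09": 425, "s10": 440,
}

# Experimental design: chance decides who gets the program.
treatment_ids, control_ids = random_assign(prior_scores)

# Baseline check: did the two groups start out with similar scores?
baseline_gap = standardized_difference(
    [prior_scores[s] for s in treatment_ids],
    [prior_scores[s] for s in control_ids],
)
print(f"Baseline gap (in standard deviations): {baseline_gap:.2f}")
```

The same baseline check applies to a quasi-experimental study. The only difference is that the two groups come from matching rather than from a random shuffle, which is exactly why checking their starting points matters so much there.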

This was a lot of information, but the takeaway is that when you are evaluating a program’s research base, you want to ask for data showing both the treatment and comparison groups. You’ll want to ask what the comparison group was doing instead (nothing? a different program?) and whether the groups were initially equivalent. If data meeting these requirements can be provided, the program passes my “comparison group” test.

In the next three weeks, I’ll present three more “tests” that evaluation research should meet before you consider buying a program or curriculum that declares itself “research-based.”