• Communicate often and better: How to make education research more meaningful

    by Jay Lynch, PhD and Nathan Martin, Pearson


    Question: What do we learn from a study that shows a technique or technology likely has affected an educational outcome?

    Answer: Not nearly enough.

    Despite widespread criticism, the field of education research continues to emphasize statistical significance—rejecting the conclusion that chance is a plausible explanation for an observed effect—while largely neglecting questions of precision and practical importance. Sure, a study may show that an intervention likely has an effect on learning, but so what? Even researchers’ recent efforts to estimate the size of an effect don’t answer key questions. What is the real-world impact on learners? How precisely is the effect estimated? Is the effect credible and reliable?

    Yet it’s the practical significance of research findings that educators, administrators, parents and students really care about when it comes to evaluating educational interventions. This has led to what Russ Whitehurst has called a “mismatch between what education decision makers want from the education research and what the education research community is providing.”

    Unfortunately, education researchers are not expected to interpret the practical significance of their findings or acknowledge the often embarrassingly large degree of uncertainty associated with their observations. So, education research literature is filled with results that are almost always statistically significant but rarely informative.

    Early evidence suggests that many edtech companies are following the same path. But we believe that they have the opportunity to change course and adopt more meaningful ways of interpreting and communicating research that will provide education decision makers with the information they need to help learners succeed.

    Admitting What You Don’t Know

    For educational research to be more meaningful, researchers will have to acknowledge its limits. Although published research often projects a sense of objectivity and certainty about study findings, accepting subjectivity and uncertainty is a critical element of the scientific process.

    On the positive side, some researchers have begun to report what are known as standardized effect sizes, a calculation that helps compare outcomes in different groups on a common scale. But researchers rarely interpret the meaning of these figures, and the figures can be confusing. A ‘large’ effect may actually be quite small when compared to available alternatives or when factoring in the length of treatment, and a ‘small’ effect may be highly impactful because it is simple to implement or cumulative in nature.
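
    To make the idea concrete, here is a minimal sketch, in Python and with invented classroom scores, of the most commonly reported standardized effect size, Cohen's d: the difference between group means divided by their pooled standard deviation.

    ```python
    import numpy as np

    def cohens_d(treatment, control):
        """Standardized mean difference: (mean_t - mean_c) / pooled standard deviation."""
        t = np.asarray(treatment, dtype=float)
        c = np.asarray(control, dtype=float)
        nt, nc = len(t), len(c)
        pooled_sd = np.sqrt(((nt - 1) * t.var(ddof=1) + (nc - 1) * c.var(ddof=1)) / (nt + nc - 2))
        return (t.mean() - c.mean()) / pooled_sd

    # Invented post-test scores for two small classrooms
    math_game_scores = [78, 85, 74, 90, 81, 88, 79, 83]
    control_scores = [72, 80, 70, 84, 77, 75, 69, 78]
    print(f"Cohen's d = {cohens_d(math_game_scores, control_scores):.2f}")
    ```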

    Confused? Imagine the plight of a teacher trying to decide, based on evidence, which products to use—an issue of increased importance now that the Every Student Succeeds Act (ESSA) ties the use of federal funds for certain programs to evidence of effectiveness. The newly launched Evidence for ESSA admirably tries to support that process, complementing the What Works Clearinghouse and pointing to programs that have been deemed “effective.” But when that teacher starts comparing products, say Math in Focus (effect size: +0.18) and Pirate Math (effect size: +0.37), the best choice isn’t readily apparent.
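
    One way to give such numbers a more intuitive reading is to translate them into expected percentile gains. The sketch below does this for the two published effect sizes mentioned above, under the common assumption of roughly normal outcomes; it is our own illustrative translation and says nothing about cost, study quality, or how precisely either figure was estimated.

    ```python
    from scipy.stats import norm

    # Assuming normally distributed outcomes, an effect size d places the average
    # treated student at the norm.cdf(d) percentile of the untreated group.
    for program, d in [("Math in Focus", 0.18), ("Pirate Math", 0.37)]:
        percentile = norm.cdf(d) * 100
        print(f"{program} (d = {d:+.2f}): average student moves from the 50th "
              f"to roughly the {percentile:.0f}th percentile")
    ```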

    It’s also important to note that every intervention’s observed “effect” is associated with a quantifiable degree of uncertainty. By glossing over this fact, researchers risk promoting a false sense of precision and making it harder to craft useful data-driven solutions. While acknowledging uncertainty is likely to temper excitement about many research findings, in the end it will support more honest evaluations of an intervention’s likely effectiveness.
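
    As a rough illustration of what quantifying that uncertainty looks like, the sketch below computes an approximate 95% confidence interval around a hypothetical effect size using a standard large-sample formula; the effect and sample sizes are invented, not taken from any real study.

    ```python
    import math

    def effect_size_ci(d, n_treatment, n_control, z=1.96):
        """Approximate 95% confidence interval for a standardized mean difference
        (large-sample standard error, as in Hedges & Olkin)."""
        se = math.sqrt((n_treatment + n_control) / (n_treatment * n_control)
                       + d ** 2 / (2 * (n_treatment + n_control)))
        return d - z * se, d + z * se

    # An apparently 'promising' effect from a modest study is compatible with
    # anything from a small negative effect to a large positive one.
    low, high = effect_size_ci(d=0.30, n_treatment=40, n_control=40)
    print(f"d = 0.30, 95% CI [{low:.2f}, {high:.2f}]")
    ```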

    Communicate Better, Not Just More

    In addition to faithfully describing the practical significance of a finding and the uncertainty around it, researchers need to communicate information about research quality in ways that are accessible to non-specialists. The broader educational research community has been notably unwilling to tackle the challenge of helping educators and other non-specialists distinguish high-quality research from quackery. There is a long overdue need for educational researchers to be forthcoming about the quality and reliability of evidence for interventions in ways that educational practitioners can understand and trust.

    Trust is the key. Whatever issues might surround the reporting of research results, educators are suspicious of people who have never been in the classroom. If a result or debunked academic fad (e.g. learning styles) doesn’t match their experience, they will be tempted to dismiss it. As education research becomes more rigorous, relevant, and understandable, we hope that trust will grow. Even simply categorizing research as either “replicated” or “unchallenged” would be a powerful initial filtering technique given the paucity of replication research in education. The alternative is to leave educators and policy-makers intellectually adrift, susceptible to whatever educational fad is popular at the moment.

    At the same time, we have to improve our understanding of how consumers of education research understand research claims. For instance, surveys reveal that even academic researchers commonly misinterpret the meaning of common concepts like statistical significance and confidence intervals. As a result, there is a pressing need to understand how those involved in education interpret (rightly or wrongly) common statistical ideas and decipher research claims.

    A Blueprint For Change

    So, how can the education technology community help address these issues?

    Despite the money and time edtech companies spend conducting efficacy studies on their products, surveys reveal that research often plays a minor role in edtech consumer purchasing decisions. The opaqueness and perceived irrelevance of edtech research studies, which mirror the reporting conventions typically found in academia, no doubt contribute to this unfortunate fact. Educators and administrators rarely possess the research and statistical literacy needed to interpret the meaning and implications of research built around claims of statistical significance and indirect proxies for learning. This might help explain why even well-meaning educators fall victim to “learning myths.”

    And when nearly every edtech company is amassing troves of research studies, all ostensibly supporting the efficacy of their products (with the quality and reliability of this research varying widely), it is understandable that edtech consumers treat them all with equal incredulity.

    So, if the current edtech emphasis on efficacy is going to amount to more than a passing fad and avoid devolving into a costly marketing scheme, edtech companies might start by taking the following actions:

    • Edtech researchers should interpret the practical significance and uncertainty associated with their study findings. The researchers conducting an experiment are best qualified to answer interpretive questions about the real-world value of their findings, and we should expect them to do so.
    • As an industry, edtech needs to work toward adopting standardized ways to communicate the quality and strength of evidence as it relates to efficacy research. The What Works Clearinghouse has made important steps, but it is critical that relevant information is brought to the point of decision for educators. This work could resemble something like food labels for edtech products.
    • Researchers should increasingly use data visualizations to make complex findings more intuitive while making additional efforts to understand how non-specialists interpret and understand frequently reported statistical ideas.
    • Finally, researchers should employ direct measures of learning whenever possible rather than relying on misleading proxies (e.g., grades or student perceptions of learning) to ensure that the findings reflect what educators really care about. This also includes using validated assessments and focusing on long-term learning gains rather than short-term performance improvement.

    This series is produced in partnership with Pearson. EdSurge originally published this article on April 1, 2017, and it was re-posted here with permission.

     

  • Can Edtech support - and even save - educational research?

    by Jay Lynch, PhD and Nathan Martin, Pearson


    There is a crisis engulfing the social sciences. What was thought to be known about psychology—based on published results and research—is being called into question by new findings and by the efforts of groups like the Reproducibility Project. What we know is in question, and so is how we come to know it. Long-institutionalized practices of scientific inquiry in the social sciences are being actively questioned, and proposals put forth for needed reforms.

    While the fields of academia burn with this discussion, education research has remained largely untouched. But education is not immune to the problems endemic in fields like psychology and medicine. In fact, there’s a strong case that the problems emerging in other fields are even worse in educational research. External and internal critical scrutiny has been lacking. A recent review of the top 100 education journals found that only 0.13% of published articles were replication studies. Education waits for its own crusading Brian Nosek to disrupt the canon of findings. Winter is coming.

    This should not be breaking news. Education research has long been criticized for its inability to generate a reliable and impactful evidence base. It has been derided for problematic statistical and methodological practices that hinder knowledge accumulation and encourage the adoption of unproven interventions. For its failure to communicate the uncertainty and relevance of research findings, like Value-Added Measures for teachers, in ways that practitioners can understand. And for its struggle to change educational habits (at least in the US): how we develop, buy, and learn from (see Mike Petrilli’s summation) the best practices and tools.

    Unfortunately, decades of withering criticism have done little to change the methods and incentives of educational research in ways necessary to improve the reliability and usefulness of findings. The research community appears to be in no rush to alter its well-trodden path—even if the path is one of continued irrelevance. Something must change if educational research is to meaningfully impact teaching and learning. Yet history suggests the impetus for this change is unlikely to originate from within academia.

    Can edtech improve the quality and usefulness of educational research? We may be biased (as colleagues at a large and scrutinized edtech company), but we aren’t naïve. We know it might sound farcical to suggest that technology companies may play a critical role in improving the quality of education research, given almost weekly revelations about corporations engaging in concerted efforts to distort and shape research results to fit their interests. It’s shocking to read about efforts to warp public perception of the effects of sugar on heart disease or the effectiveness of antidepressants. It would be foolish not to view research conducted or paid for by corporations with a healthy degree of skepticism.

    Yet we believe there are signs of promise. The last few years have seen a movement of companies seeking to research and report on the efficacy of educational products. The movement has benefited from the leadership of the Office of Educational Technology, the Gates Foundation, the Learning Assembly, Digital Promise, and countless others. Our own company has been on this road since 2013. (It’s not been easy!)

    These efforts represent opportunities to foment long-needed improvements in the practice of education research. A chance to redress education research’s most glaring weakness: its historical inability to appreciably impact the everyday activities of learning and teaching.

    Incentives for edtech companies to adopt better research practices already exist and there is early evidence of openness to change. Edtech companies possess a number of crucial advantages when it comes to conducting the types of research education desperately needs, including:

    • access to growing troves of digital learning data;
    • close partnerships with institutions, faculty, and students;
    • the resources necessary to conduct large and representative intervention studies;
    • in-house expertise in the diverse specialties (e.g., computer scientists, statisticians, research methodologists, educational psychologists, UX researchers, instructional designers, ed policy experts, etc.) that must increasingly collaborate to carry out more informative research;
    • a research audience consisting primarily of educators, students, and other non-specialists.

    The real worry with edtech companies’ nascent efforts to conduct efficacy research is not that they will fail to match the quality and objectivity typical of most educational research, but that they will fall into the same traps that currently plague such efforts. Rather than looking for what would be best for teachers and learners, entrepreneurs may focus on the wrong measures (p-values, for instance) that confuse people rather than enlighten them.

    If this growing edtech movement repeats the follies of the current paradigm of educational research, it will fail to seize the moment to adopt reforms that can significantly aid our efforts to understand how best to help people teach and learn. And we will miss an important opportunity to enact systemic changes in research practice across the edtech industry with the hope that academia follows suit.

    Our goal over the next three articles is to hold up a mirror, highlighting several crucial shortcomings of educational research: institutionalized practices that significantly limit its impact and informativeness.

    We argue that edtech is uniquely incentivized and positioned to realize long-needed research improvements through its efficacy efforts.

    Independent education research is a critical part of the learning world, but it needs improvement. It needs a new role model, its own George Washington Carver, a figure willing to test theories in the field, learn from them, and then communicate them back to practitioners. In particular, we will be focusing on three key ideas:

    Why ‘What Works’ Doesn’t: Education research needs to move beyond simply evaluating whether or not an effect exists; that is, whether an educational intervention ‘works’. The ubiquitous use of null hypothesis significance testing in educational research is an epistemic dead end. Instead, education researchers need to adopt more creative and flexible methods of data analysis, focus on identifying and explaining important variations hidden under mean scores, and devote themselves to developing robust theories capable of generating testable predictions that are refined and improved over time.

    Desperately Seeking Relevance: Education researchers are rarely expected to interpret the practical significance of their findings or report results in ways that are understandable to non-specialists making decisions based on their work. Although there has been progress in encouraging researchers to report standardized mean differences and correlation coefficients (i.e., effect sizes), this is not enough. In addition, researchers need to clearly communicate the importance of study findings within the context of alternative options and in relation to concrete benchmarks, openly acknowledge uncertainty and variation in their results, and refuse to be content measuring misleading proxies for what really matters.

    Embracing the Milieu: For research to meaningfully impact teaching and learning, it will need to expand beyond an emphasis on controlled intervention studies and prioritize the messy, real-life conditions facing teachers and students. More energy must be devoted to the creative and problem-solving work of translating research into useful and practical tools for practitioners, an intermediary function explicitly focused on inventing, exploring, and implementing research-based solutions that are responsive to the needs and constraints of everyday teaching.

    Ultimately, education research is about more than just publication. It’s about improving the lives of students and teachers. We don’t claim to have the complete answers, but as we expand on these key principles over the coming weeks, we want to offer steps edtech companies can take to improve the quality and value of educational research. These are things we’ve learned and things we are still learning.

    This series is produced in partnership with Pearson. EdSurge originally published this article on January 6, 2017, and it was re-posted here with permission.

     

  • Why 'what works' doesn't: False positives in education research

    by Jay Lynch, PhD and Nathan Martin, Pearson


    If edtech is to help improve education research it will need to kick a bad habit—focusing on whether or not an educational intervention ‘works’.

    Answering that question through null hypothesis significance testing (NHST), which explores whether an intervention or product has an effect on the average outcome, undermines the ability to make sustained progress in helping students learn. It provides little useful information and fails miserably as a method for accumulating knowledge about learning and teaching. For the sake of efficiency and learning gains, edtech companies need to understand the limits of this practice and adopt a more progressive research agenda that yields actionable data on which to build useful products.

    How does NHST look in action? A typical research question in education might be whether average test scores differ for students who use a new math game and those who don’t. Applying NHST, a researcher would assess whether a positive—i.e. non-zero—difference in scores is significant enough to conclude that the game has had an impact, or, in other words, that it ‘works’. Left unanswered is why and for whom.
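
    For readers who have not seen the recipe spelled out, here is a minimal sketch of that analysis with simulated score data (the group means and sizes are invented). Note how little it returns: a p-value for the hypothesis that the true difference is exactly zero.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=1)
    # Hypothetical post-test scores (0-100) for students who used the math game and those who did not
    game_scores = rng.normal(loc=74, scale=10, size=120)
    no_game_scores = rng.normal(loc=71, scale=10, size=120)

    t_stat, p_value = stats.ttest_ind(game_scores, no_game_scores)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
    # If p < .05, the researcher declares the game 'works' -- saying nothing about
    # why it worked, for whom, or how large the effect is in practical terms.
    ```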

    This approach pervades education research. It is reflected in the U.S. government-supported initiative to aggregate and evaluate educational research, aptly named the What Works Clearinghouse, and frequently serves as a litmus test for publication worthiness in education journals. Yet it has been subjected to scathing criticism almost since its inception, criticism that centers on two issues.

    False Positives And Other Pitfalls

    First, obtaining statistical evidence of an effect is shockingly easy in experimental research. One of the emerging realizations from the current crisis in psychology is that rather than serving as a responsible gatekeeper ensuring the trustworthiness of published findings, reliance on statistical significance has had the opposite effect of creating a literature filled with false positives, overestimated effect sizes, and grossly underpowered research designs.

    If a proposed intervention involves students doing virtually anything more cognitively challenging than passively listening to lecturing-as-usual (the typical straw-man control in education research), a researcher is very likely to find a positive difference as long as the sample size is large enough. Showing that an educational intervention has a positive effect is a feeble hurdle to clear. It isn’t at all shocking, therefore, that in education almost everything seems to work.
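
    A small simulation of our own makes the point: hold a trivially small true effect fixed, and the share of studies clearing the p < .05 bar climbs toward certainty as samples grow. The numbers below are illustrative, not drawn from any real study.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=0)
    true_effect = 0.05  # a trivially small standardized effect
    n_simulations = 200

    for n in (50, 500, 5000, 50000):
        significant = 0
        for _ in range(n_simulations):
            treatment = rng.normal(true_effect, 1.0, n)
            control = rng.normal(0.0, 1.0, n)
            _, p = stats.ttest_ind(treatment, control)
            significant += p < 0.05
        print(f"n = {n:>6} per group: {100 * significant / n_simulations:.0f}% of studies 'work'")
    ```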

    But even if these methodological concerns with NHST were addressed, there is a second serious flaw undermining the NHST framework upon which most experimental educational research rests.

    Null hypothesis significance testing is an epistemic dead end. It obviates the need for researchers to put forward testable models or theories to predict and explain the effects that interventions have. In fact, the only hypothesis evaluated within the NHST framework is a caricature, a hypothesis the researcher doesn’t believe: that the intervention has zero effect. A researcher’s own hypothesis is never directly tested. And yet, with almost universal aplomb, education researchers falsely conclude that a rejection of the null hypothesis counts as strong evidence in favor of their preferred theory.

    As a result, NHST encourages and preserves hypotheses so vague, so lacking in predictive power and theoretical content, as to be nearly useless. As researchers in psychology are realizing, even well-regarded theories, ostensibly supported by hundreds of randomized controlled experiments, can start to evaporate under scrutiny because reliance on null hypothesis significance testing means a theory is never really tested at all. As long as educational research continues to rely on testing the null hypothesis of no difference as a universal foil for establishing whether an intervention or product ‘works,’ it will fail to improve our understanding of how to help students learn.

    As analysts Michael Horn and Julia Freeland have noted, this dominant paradigm of educational research is woefully incomplete and must change if we are going to make progress in our understanding of how to help students learn:

    “An effective research agenda moves beyond merely identifying correlations of what works on average to articulate and test theories about how and why certain educational interventions work in different circumstances for different students.”

    Yet for academic researchers concerned primarily with producing publishable evidence of interventions that ‘work,’ the vapid nature of NHST has not been recognized as a serious issue. And because the NHST approach to educational research is relatively straightforward and safe to conduct (researchers have an excellent chance of getting the answer they want), a quick perusal of the efficacy pages at leading edtech companies shows that it remains the dominant paradigm in edtech as well.

    Are there, however, reasons to think edtech companies might be incentivized to abandon the current NHST paradigm? We think there are.

    What About The Data You’re Not Capturing?

    Consider a product owner at an edtech company. Although evidence that an educational product has a positive effect is great for producing compelling marketing brochures, it provides little information regarding why a product works, how well it works in different circumstances, or really any guidance for how to make it more effective.

    • Are some product features useful and others not? Are some features actually detrimental to learners but masked by more effective elements?
    • Is the product more or less effective for different types of learners or levels of prior expertise?
    • What elements should be added, left alone or removed in future versions of the product?

    Testing whether a product works doesn’t provide answers to these questions. In fact, despite all the time, money, and resources spent conducting experimental research, a company learns very little about its product’s efficacy when that research is evaluated using NHST. There is minimal ability to build on research of this sort. So product research becomes a game of efficacy roulette, with the company just hoping that findings show a positive effect each time it spins the NHST wheel. Companies truly committed to innovation and improving the effectiveness of their products should find this a very bitter pill to swallow.

    A Blueprint For Change

    We suggest edtech companies can vastly improve both their own product research and our collective understanding of how to help students learn by modifying their approach to research in several ways.

    • Recognize the limited information NHST can provide. Used as the primary statistical framework for advancing our understanding of learning and teaching, it is misapplied: it ultimately tells us little of what we actually want to know. Furthermore, it contributes to the proliferation of spurious findings in education by encouraging questionable research practices and the reporting of overestimated intervention effects.
    • Instead of relying on NHST, edtech researchers should focus on putting forward theoretically informed predictions and then designing experiments to test them against meaningful alternatives. Rather than rejecting the uninteresting hypothesis of “no-difference,” the primary goal of edtech research should be to improve our understanding of the impact that interventions have, and the best way to do this is to compare models that compete to describe observations that arise from experimentation.
    • Rather than dichotomous judgments about whether an intervention works on average, greater evaluative emphasis should be devoted to exploring the impact of interventions across subsets of students and conditions. No intervention works equally well for every student and it’s the creative and imaginative work of trying to understand why and where an intervention fails or succeeds that is most valuable.

    Returning to our original example, rather than relying on NHST to evaluate a math game, a company will learn more by trying to improve its estimates and measurements of important variables, looking beneath group mean differences to explore why the game worked better or worse for sub-groups of students, and directly testing competing theoretical mechanisms proposed to explain the game’s influence on learner achievement. It is in this way that practical, problem-solving tools will develop and evolve to improve the lives of all learners.
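
    As a hedged illustration of that shift in emphasis, the sketch below uses invented student-level data to estimate the game’s effect separately for sub-groups defined by prior knowledge, rather than reporting a single average difference; the variable names and the built-in sub-group pattern are hypothetical.

    ```python
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(seed=42)
    n = 600

    # Invented student-level data: the game helps students with low prior knowledge
    # far more than those who already know the material.
    prior = rng.choice(["low prior knowledge", "high prior knowledge"], size=n)
    used_game = rng.integers(0, 2, size=n)
    boost = np.where(prior == "low prior knowledge", 8.0, 1.0) * used_game
    score = 70 + 5 * (prior == "high prior knowledge") + boost + rng.normal(0, 10, n)

    df = pd.DataFrame({"prior": prior, "used_game": used_game, "score": score})

    # The single average effect hides the variation that matters for practice
    overall = df.groupby("used_game")["score"].mean()
    print(f"Overall difference: {overall[1] - overall[0]:.1f} points")

    # Sub-group estimates tell a more actionable story
    for group, sub in df.groupby("prior"):
        means = sub.groupby("used_game")["score"].mean()
        print(f"{group}: difference = {means[1] - means[0]:.1f} points")
    ```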

    This series is produced in partnership with Pearson. EdSurge originally published this article on February 12, 2017, and it was re-posted here with permission.

     
