The study is beset by serious methodological shortcomings, missing data issues, and statistical reporting errors and omissions. The conclusion that individuals with ADHD have smaller brains is contradicted by the "effect-size" calculations that show individual brain volumes in the ADHD and control cohorts largely overlapped. The authors also failed to discuss the fact that the ADHD cohort had higher IQ scores.
Despite such scientific missteps, the study made headlines in many countries around the world. Yahoo News suggested that the study was "proving the reality" of ADHD. Lancet Psychiatry should immediately retract the study and new media headlines must be aired to inform clinicians and parents of the true results from this study, including the IQ data.
The Study's Claims and Headlines
In the study, Martine Hoogman and her 81 co-authors conducted a secondary data analysis of MRI scans used to measure brain volumes in 1713 patients diagnosed with ADHD and 1529 individuals who did not have this diagnosis. This data was gathered from 23 sites around the world. The authors wrote that their study, "using the largest dataset to date," documented that "the volumes of the accumbens, amygdala, caudate, hippocampus, putamen, and intracranial volume were smaller in individuals with ADHD compared with controls" (p.1).
There are many similar statements in the paper suggesting that this study provides evidence that smaller brain volumes are specific to individuals with an ADHD diagnosis. In their analysis, the authors also stated that they had investigated the ADHD cohort's exposure to stimulant medication and determined that the drugs played no role as a possible cause of the smaller volumes. "We . . . refute medication effects on brain volume suggested by earlier meta-analyses," they wrote (p. 1).
This was a large international study, funded by the National Institutes of Health. Their results, the authors concluded, contained important messages for clinicians: "The data from our highly powered analysis confirm that patients with ADHD do have altered brains and therefore that ADHD is a disorder of the brain. This message is clear for clinicians to convey to parents and patients, which can help to reduce the stigma that ADHD is just a label for difficult children and caused by incompetent parenting. We hope this work will contribute to a better understanding of ADHD in the general public" (p. 7).
The press releases sent to the media reflected the conclusions in the paper, and the headlines reported by the media, in turn, accurately summed up the press releases. Here is a sampling of headlines:
Given the implications of this study's claims, it deserves to be closely analyzed. Does the study support the conclusion that children and adults with ADHD have "altered brains," as evidenced by smaller volumes in different regions of the brain? And did the authors present data that convincingly "refutes" earlier studies suggesting that medication exposure may be a cause of smaller brain volumes?
To begin this review, we'll start with a surprising finding tucked away in an unusual place—the study's appendix. We can then imagine what media headlines might have looked like if the authors had focused on this data.
Alternative Headline: Large Study Finds Children with ADHD Have Higher IQs!
To discover this finding, you need to spend $31.50 to purchase the article, and then make a special request to Lancet Psychiatry to send you the appendix. Then you will discover, on pages 7 to 9 in the appendix, a "Table 2" that provides IQ scores for both the ADHD cohort and the controls.
Although there were 23 clinical sites in the study, only 20 reported comparative IQ data. In 16 of the 20, the ADHD cohort had higher IQs on average than the control group. In the other four clinics, the ADHD and control groups had the same average IQ (with the mean IQ scores for both groups within two points of each other). Thus, at all 20 sites, the ADHD group had a mean IQ score that was equal to, or higher than, the mean IQ score for the control group.
Now the usual assumption is that ADHD children, suffering from a "brain disorder," are less able to concentrate and focus in school, and thus are cognitively impaired in some way. The authors of this study told of findings that show ADHD is a disorder of the brain. But if the mean IQ score of the ADHD cohort is higher than the mean score for the controls, doesn't this basic assumption need to be reassessed? If the participants with ADHD have smaller brains that are riddled with "altered structures," then how come they are just as smart as, or even smarter than, the participants in the control group?
The authors, however, chose to bury the IQ data in an appendix, which isn't easily obtained. Even after you purchase the paper, you have to make a special request to obtain the appendix. Why? And why didn't the authors discuss the IQ data in their paper, or utilize it in their analyses? When a scientific investigation leads to a surprising result that basically contradicts the study's main claim, authors are duty-bound—in terms of adhering to the ethical values supposed to govern science—to present those results. But the authors of this study didn't do this, and this is a principal reason that the study needs to be retracted.
Indeed, if the IQ data had been promoted in the study's abstract and to the media, the public would now be having a new discussion: Is it possible that children diagnosed with ADHD are more intelligent than average? Maybe we are drugging millions of bright children because they are more easily prone to boredom and schools aren't providing them with stimulating learning environments.
The authors claim that their study should reduce the stigma of ADHD. If they were truly interested in reducing the stigma associated with ADHD, then reporting that the IQ scores of children so diagnosed were equal to, or higher than, the IQ scores for the controls at all 20 sites would have done just that.
They Did Not Find That Children Diagnosed with ADHD Have Smaller Brain Volumes
While the summary statement in the study and the associated press release tells of robust, definitive findings, leading to media headlines that "Study Finds Brains of ADHD Sufferers Are Smaller," a review of the reported "effect sizes" reveals that they found no such thing.
When the public reads that a study proved that children diagnosed with ADHD have smaller brain volumes, most people will naturally assume this is a characteristic found in all children so diagnosed. The assumption is that the researchers must have established a "normal" volume (which would be the mean brain volume for a control group), and then determined that most, if not all, of those diagnosed with ADHD have smaller brain volumes than the norm.
But that was not the case here.
In this study, the authors pooled together MRI brain-scan data for the 3,242 participants in the study (which had been collected and archived at the 23 sites), and then calculated, for each cohort, mean intracranial volumes and mean volumes of specific brain regions. They reported the differences for each of these comparisons and the "effect size" of the differences. This is the critical aspect of the results to consider and understand: effect sizes reveal the true strength of the findings and how much overlap there is between the individual brain volumes in both groups, and thus establish the likelihood that an individual in the ADHD group has a smaller brain volume than an individual in the control group.
For instance, the authors reported a Cohen's d effect size of .19 for differences in the mean volume of the accumbens in children under 15. According to the authors, "the accumbens, with its prominent role in reward processing, is central to motivational and emotional dysfunction in patients with ADHD" (p. 7). Cohen's d has no fixed upper limit, but by convention a value of about .2 is considered small, .5 medium, and .8 large, and thus .19 is understood to reflect a small effect. Yet, in this study, for youth under 15, it was the largest effect size of all the brain volume comparisons that were made. (To learn more about what an effect size is, access this article by Robert Coe: It's the effect size, stupid.) Specifically, as to what this effect size of .19 means:
- Approximately 58% of the ADHD youth in this convenience sample had an accumbens volume below the average in the control group, while 42% of the ADHD youth had an accumbens volume above the average in the control group.
- Also, if you knew the accumbens volume of a child picked at random, you would have a 54% chance that you could correctly guess which of the two cohorts—ADHD or healthy control—the child belonged to.
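The two percentages above can be reproduced from the effect size alone. The following is a minimal sketch, assuming both cohorts are normally distributed with equal variances and equal group sizes; the function names are illustrative and do not come from the study.

```python
from statistics import NormalDist

def share_below_control_mean(d: float) -> float:
    """Share of the lower-scoring group that falls below the other
    group's mean (often called Cohen's U3), assuming normal curves."""
    return NormalDist().cdf(abs(d))

def chance_of_correct_guess(d: float) -> float:
    """Chance of naming the right cohort for a randomly chosen person,
    using the midpoint between the two means as the cutoff and
    assuming equal group sizes."""
    return NormalDist().cdf(abs(d) / 2)

d = 0.19  # effect size reported for the accumbens in children under 15
print(f"ADHD youth below the control mean: {share_below_control_mean(d):.0%}")
print(f"Chance of guessing the cohort:     {chance_of_correct_guess(d):.0%}")
```

Under these assumptions the first figure rounds to 58% and the second to 54%, only a little better than the 50% expected from a coin flip.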
There are ways to visualize the overlap of this data. If you plotted the individual accumbens measurements for all 1,637 children under age 15 in this study, and used a red dot to mark the ADHD participants and a black dot to mark the controls, you would see a mishmash of red and black dots. There would be a slightly higher percentage of red dots located in the lower half of the scale, and a slightly higher percentage of black dots in the upper half, but you could immediately see—from the mixed jumble of dots—that "small brain volume" was not a distinctive characteristic of individuals within the ADHD cohort. The individual brain volumes varied greatly, and that was true for both cohorts, and all the pooled data showed was that there was a slightly greater chance that any individual child diagnosed with ADHD, compared to a child in the control cohort, would have an accumbens measurement that plotted into the lower-volume half of the graph.
Indeed, if you drew a distribution curve plotting the individual accumbens scores for the two groups, the two curves would only be slightly offset. By rounding the .19 effect size up to .2 for illustration purposes, you can see there is a 92% overlap between the two curves.
With a Cohen's d effect size of .1, as was the case for the pallidum brain-volume comparisons in children 15 and under, there would be a 96% overlap between the two groups.
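The overlap percentages can be checked with a short calculation. This sketch assumes two normal curves with the same standard deviation whose means sit d standard deviations apart; the shared area under both curves is then twice the normal tail area below -d/2. The function name is illustrative.

```python
from statistics import NormalDist

def overlap(d: float) -> float:
    """Shared area under two equal-variance normal curves whose
    means are d standard deviations apart."""
    return 2 * NormalDist().cdf(-abs(d) / 2)

for d in (0.2, 0.1):
    print(f"d = {d}: overlap = {overlap(d):.0%}")
```

With d = .2 the overlap comes to about 92%, and with d = .1 about 96%, matching the figures given above.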
The Medication Effect
As noted above, the authors' findings show there were small differences in the mean brain volumes for children with ADHD and the control group. Previous studies had suggested that ADHD medications could reduce brain volumes, and thus Hoogman and collaborators assessed whether the small differences in mean brain volumes might be due to exposure to such psychostimulants.
To do so, they compared the mean brain volumes of two groups in the ADHD cohort: 82 who said they had never used stimulant medication (medication-naïve), and 637 who said they "had used stimulant medication somewhere in their lifetime for a period of more than four weeks" (medication-exposed). The authors reported that there were "no differences in any of the volumes" between the medication-naïve and medication-exposed groups, and thus concluded that their study "refuted" the earlier studies (p. 5).
But there were notable shortcomings in their performing and reporting of this analysis. Specifically:
- They didn't publish the mean volume data for the two groups. They simply declared that the volumes were the same.
- They didn't report how many of the medication-naïve and medication-exposed patients were children and how many were adults. Given that it was mainly in children under age 15 that there were "statistically significant" differences in mean brain volumes between ADHD and controls, their effort to look at whether medication exposure was a factor in those differences should have isolated medication use in that age group.
- They didn't provide any dosage-related information for the medication-exposed group, or information on how long they took the drugs. If a 30-year-old had taken a stimulant for four weeks as a child, could that really be expected to have a long-term effect on brain volume? And more to the point: were there volume differences between the "ADHD" children who had been on the drugs for several years and the children in the ADHD cohort who had never used them? That is the type of comparison that needed to be made.
- There is a missing group of patients in this comparison. At one point in their paper, the authors stated that they had information about medication use for 1254 of the 1713 participants in the ADHD group. Yet their comparison involved only 719 patients (82 plus 637). Why did they exclude 535 patients (1254 minus 719) from this comparison? [See footnote for a possible explanation for this.]
Individual Site Data Also Belies the Stated Conclusion
The authors reported that the "volumes of the accumbens, amygdala, caudate, hippocampus, putamen, and intracranial volume were smaller in individuals with ADHD compared with controls in the mega-analysis" (p. 1). If this is true, then smaller brain volumes should show up in the data from most, if not all, of the 21 sites that had a control group. But that was not the case.
Here are summaries of individual site results:
- Mean accumbens volumes: At 4 sites, the volume for the ADHD cohort was larger than for the control, and at another 6 sites, the mean volumes were basically of equal size.
- Mean amygdala volumes: At 5 sites, the mean volume for the ADHD cohort was larger than for the controls, and of equal size at 4 others.
- Mean caudate volumes: At 5 sites, the mean volume for the ADHD cohort was larger than for the controls, and of equal size at 2 others.
- Mean hippocampus volumes: At 7 sites, the volume for the ADHD cohort was larger than for the controls, and of equal size at 4 others.
- Mean putamen volumes: At 5 sites, the volume for the ADHD cohort was larger than for the controls, and of equal size at 1 other.
- Mean intracranial volumes: At 5 sites, the volume for the ADHD cohort was larger than for the controls.
But once again, this reveals the flawed science—one might say absurd science—present in this "mega-analysis." The authors used pooled data that ignored the conflicting findings at individual sites, and yet these pooled results are assumed to be representative of all the ADHD patients in the study. For example, the authors report that the accumbens region is smaller in ADHD patients, when at 10 of 21 sites, the mean volumes of the ADHD patients were the same as the controls or larger. The ADHD cohorts in those 10 sites don't fit into the "pooled" finding at all, and yet the authors still write that "individuals with ADHD compared with controls" have smaller accumbens.
The Study is Riddled with Scientific Flaws
The diagnosis and assignment to cohort problem
For this study, it is explained and understood that there is one group that has ADHD, and a control group that does not. But given that there is no biological marker that can be used to make this diagnosis, how was this distinction made?
The methods section in the published paper does not provide any information about this critical question. Instead, the authors simply write that "diagnostic procedures for each site are listed in the appendix" (p. 3). So turn again to Table 2 in the appendix, and you find that there was no standardized diagnostic method applied at all sites. Instead, this critical distinction—ADHD versus no ADHD—was made in a haphazard manner.
First, two of the 23 sites didn't even have a control group. So it's hard to understand why the ADHD measurements from these two sites were included in the pooled data.
Second, it appears that none of the participants in the control groups at the remaining 21 sites were given a diagnostic assessment for ADHD. There is no report of any ADHD symptom scores for the controls. The participants labeled "healthy controls"—and thus seen as not having ADHD—were apparently never tested to see if they displayed the behaviors associated with this diagnosis.
Third, the authors didn't test nearly a thousand of the participants in the control cohort to determine if they were "healthy." They listed 867 in the control cohort as unknown related to comorbidity issues such as depression, anxiety, and substance abuse. Without such testing, it would not seem that this "not ADHD" group could be described as "healthy controls."
There is, in fact, very limited information about the controls. Why would these individuals have agreed to participate in this study? Were they recruited via advertisements that promised them payment? Or were they patients at the clinics who were getting an MRI for other medical reasons? In the appendix, the authors did state that 30 controls were diagnosed with depression, 11 with anxiety and 39 with substance use disorders. But with so little comprehensive information provided, it's impossible to know how representative of "healthy controls" this group is.
Fourth, at seven of the 23 sites, there aren't any ADHD symptom scores listed for the ADHD cohort. One can only guess how the diagnosis at those sites was made. Did the authors have records from the participants' doctors? Or did they rely on the participants' own self-diagnosis or self-report that they had ADHD? There is no way to know.
Fifth, even when symptom severity scores were reported, there was no standardization of the "instrument" used to assess symptoms, or the classification system used to make the diagnosis (either DSM IV or ICD 10). In other words, the authors at a clinic in Brazil might have had one standard for diagnosing ADHD, and the authors in China a second standard, and the authors in the UK a third, and so forth.
Yet, despite this lack of diagnostic and methodological rigor, the authors still stated that "the brain differences we have reported are not caused by any comorbid disorders, medication effects or ADHD symptom severity, but are exclusively related to the ADHD diagnosis" (p. 7). This is a puzzling conclusion to make, given that a large percentage of the participants were not tested for comorbid disorders, or for severity of ADHD symptoms, or—in the case of the controls—even for ADHD.
The fact that symptom severity didn't show any relationship to brain volume differences also presented the authors with an obvious conundrum. At 16 sites, they theoretically used symptom severity to assign participants into the ADHD cohort, and if the ADHD cohort had smaller brain volumes than the controls, then symptom severity should seemingly be linked to smaller brain volumes as well. But that was not the case. The authors' explanation for this confounding result is quite revealing: "Not finding effects of symptoms scores might also be due to the heterogeneity of the [differing] instruments used for different cohorts in our study or difference in raters (i.e. clinicians, teachers, and parents)" (p. 8).
In other words, they explained away this confounding result by suggesting that the tests used for assessing ADHD symptoms at the various sites were too different to provide meaningful results. They also suggested that the diagnoses of ADHD were often made by unqualified adults, e.g. parents and teachers, who have no expertise in the use of the DSM or making an ADHD diagnosis (and also lack the legal authority to do so).
Yet, as statisticians well know, there is an easy way to put data on a common scale when the tests or data measures differ (as was the case here, given the lack of standardization in diagnosing ADHD). This is called "standardizing," and it is easily accomplished by converting the scores from each instrument into z-scores (subtracting the mean and dividing by the standard deviation). But the authors of this study did not standardize the data, even though this lack of standardization may have thrown a wrench into their results.
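To illustrate what converting to z-scores does, here is a minimal sketch. The site names, instruments, and scores are entirely hypothetical and are not taken from the study; they only show how scores from instruments with different ranges end up on one common scale.

```python
from statistics import mean, stdev

# Hypothetical symptom scores from two sites using different instruments.
site_scores = {
    "site_A": [12, 18, 24, 30, 36],        # hypothetical scale, 0 to 54
    "site_B": [1.0, 1.5, 2.0, 2.5, 3.0],   # hypothetical scale, 0 to 3
}

def z_scores(values):
    """Convert raw scores to z-scores: subtract the mean, then divide
    by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Standardize within each site so the scores become comparable.
standardized = {site: z_scores(vals) for site, vals in site_scores.items()}

for site, zs in standardized.items():
    print(site, [round(z, 2) for z in zs])
```

In this made-up example the two sites produce identical z-scores even though their raw scales differ, which is the property that lets such scores be pooled in one regression.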
The MRI reliability problem
There is a gee-whiz sense to MRI scans that leads one to think that brain volume measurements made with this technology must be very precise. The assumption is that this modern technology allows authors to see into the brain and distinguish with great clarity one brain region from another. But that was not the case in this study.
The first concern in a multi-site MRI study is that different MRI machines may be used, with different imaging powers, which most likely was true in this study. The second concern is that the threshold, color, contrast and ordinates a technician chooses to use for an MRI scan may vary greatly from site to site. The machines used to image the brain and measure brain volumes may not be standardized to measure the same thing consistently from place to place.
Typically, in order to account for such site-to-site variations in MRI measurements, the authors must make adjustments that "normalize" the results. In this study, the authors did report that "data for all sites were newly analyzed with harmonized methods." However, their "harmonizing" of the data amounted to simply checking what version of software was being used by the machines, which does not account for differences in threshold, color, contrast and ordinate settings at each site.
A quick comparison of volume findings at different sites reveals how imprecise the measuring methods were, even after this harmonizing effort. For instance, at the ADHD-WUE clinic in Würzburg, Germany, the mean brain volume for the accumbens region, for the two cohorts together, was 455.6 mm³. Meanwhile, for the same region of the brain at the MGH-ADHD clinic in New York City, the mean was 814.8 mm³ for the two cohorts together. This was so even though the authors had adjusted these results for "age and sex." We either have to assume that the accumbens region in children and adults in New York is about 79% larger than the same region for children and adults in Germany, or conclude that the measurements of brain volumes in this study were remarkably imprecise.
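The size of the gap between the two site means can be checked directly from the two figures quoted above. This short sketch expresses it in two ways; the variable names are illustrative.

```python
# Mean accumbens volumes (both cohorts together), as quoted in the text.
wurzburg_mm3 = 455.6   # ADHD-WUE site
new_york_mm3 = 814.8   # MGH-ADHD site

# How much larger the second site's mean is, relative to the first.
larger_by = (new_york_mm3 - wurzburg_mm3) / wurzburg_mm3

# The first site's mean as a share of the second site's mean.
share_of_larger = wurzburg_mm3 / new_york_mm3

print(f"MGH-ADHD mean is {larger_by:.0%} larger than the ADHD-WUE mean")
print(f"ADHD-WUE mean is {share_of_larger:.0%} of the MGH-ADHD mean")
```

Either way, the between-site gap dwarfs the small case-control differences that the pooled analysis reports.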
But if the MRI scans did not produce consistent measurements across the 23 sites, how can such measurements be deemed reliable and, most important, valid? And with such large differences in volume measurements between sites, how can the authors lay claim to having found meaningful differences in the averages of pooled volumes from the two cohorts, when those mean differences were so small?
Indeed, just as the authors acknowledged the lack of standardization in diagnosis, so too they acknowledged that "acquisition of imaging data . . . differed between sites, a limitation contributing to heterogeneity across samples" (p. 8). Again, these are methodological shortcomings that should lead researchers to refrain from making definitive claims of proof.
And there are still more problems
There are many more scientific problems with this study that could be identified. But in order to keep this critique of reasonable length, here are just a few more.
(1) Errors: In several instances, the statistics do not appear to have been reported correctly. For example, in Table 3, which details the small yet somehow "robust" Cohen's d effect sizes for youth under 15 years of age, the difference in mean accumbens brain volumes for the ADHD cohort and controls is declared significant for diagnosis with p = .0001 and with a Cohen's d effect size of -.19. Yet, the confidence interval (CI) for the effect size goes from -.29 to .10. If a finding is significant, its confidence interval usually doesn't cross zero by going from a negative number to a positive number. This signals an uncertainty of whether the mean volume of the accumbens region is smaller (negative number for effect size), or larger in the ADHD cohort (positive number for effect size). In addition, there are numerous errors within the appendix. Were these errors due to typos, misunderstanding of the results, or, worse, inaccurate results being reported?
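One way to probe the accumbens figures is to ask what 95% confidence interval would be consistent with the reported effect size and p-value. The sketch below is a rough check, not the authors' method: it assumes a two-sided test and a normal approximation, and it backs out the standard error from the reported p = .0001.

```python
from statistics import NormalDist

d = -0.19    # reported effect size
p = 0.0001   # reported two-sided p-value

std_normal = NormalDist()

# z statistic implied by the two-sided p-value (normal approximation).
z_implied = std_normal.inv_cdf(1 - p / 2)

# Standard error implied by the effect size and that z statistic.
se_implied = abs(d) / z_implied

# 95% confidence interval built from the implied standard error.
z_95 = std_normal.inv_cdf(0.975)
lower = d - z_95 * se_implied
upper = d + z_95 * se_implied

print(f"implied 95% CI: {lower:.2f} to {upper:.2f}")
```

Under these assumptions the implied interval lies entirely below zero, so the reported interval of -.29 to .10 and the reported p-value cannot both be correct as printed.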
(2) Missing data: As we discussed above, there are many instances of missing data, including missing ADHD symptom severity scores at some sites, no controls at two sites, no comorbidity information for nearly 1000 participants in the control cohort, and so forth. Textbooks regularly warn about how such missing data can lead to a misinterpretation of results and inaccurate findings. Yet it would appear their dataset was not cleaned appropriately for any of the analyses they performed.
(3) Omissions: The authors stated that they performed more than 10 analyses to come to the conclusions they did. But in the published study, they present limited results from fewer than one-third of the analyses. Without the results for each analysis, one cannot fully check their results for accuracy. Usually, in the peer-review process, such omissions would be identified, and the authors would be directed to provide the data that would enable readers to verify and better understand the stated findings. This did not happen in this study.
(4) Assumptions not met: There are multiple required assumptions that must be met for a researcher to perform specific inferential analyses, such as the regression analysis that the authors of this study claim to have performed. For a regression analysis, there needs to be random sampling procedures, normal distribution of the sample, and verification of the reliability and validity of the measurements in order to ensure that the results are not being misinterpreted. It would appear that the authors did not meet any of the required assumptions needed to perform any of the 10-plus analyses. This is a critical point to consider.
For example, reliability represents a measure of how consistently an assessment measures the same thing over and over. Research documents that the less reliable your measurements are, the more likely your study's statistical findings will be inflated. This can lead to what is called Type I error, when your findings appear to be significant but in reality they are not. Given the lack of reliability for assessing brain volume and ADHD in this study, the results were most likely inflated and reported inaccurately. And given that an assessment cannot be valid if it is not also reliable, then the authors cannot state with confidence that they indeed measured what they said they measured.
(5) Non-random convenience sample: As mentioned above, in an analysis that seeks to make generalizable claims of "fact" like this study does, the participants should come from "random sampling" of a larger population. For example, if you put the names of 250 patients with the same diagnosis into a hat, and drew 25 names for the study, then you could say you had a representative sample of the larger population being studied. This "sampling" allows researchers to feel more comfortable in generalizing their findings to the larger population.
But there was no random sampling in this study. Instead, the datasets that were pooled together could best be described as a collection of "convenience" samples. A convenience sample basically represents a group of people who were easy to find, as opposed to being representative of the larger group. This dataset consists of MRI scans of individuals who conveniently were clients that had signed off on allowing their assessments to be used in research, or who, for some unknown reason, agreed to participate in the study.
There are other statistical reporting errors and omissions that could be highlighted. But suffice it to say, the scientific shortcomings of this study are many: a hiding of the IQ data; small effect sizes that belie any finding that small brain volume is a defining characteristic of ADHD; a lack of data presented regarding the confound of medication exposure; no consistency of mean-volume findings across sites; no standard method for diagnostic assessment; unreliable measuring tools; no representative sampling of patients; and a remarkable lack of information about the cases and controls.
Lancet Psychiatry: Do the Right Thing
The media gave this study a great deal of attention. They presented this study—which was written by an authors' group that included many who had close ties to pharmaceutical companies that sell ADHD medications—as proving that ADHD is a brain disorder, and that children so diagnosed have smaller brains. But this was not the media's fault. Reporters were basically repeating what Lancet Psychiatry promoted to the media and what the authors wrote in the abstract and summary sections of their published paper. Their data, the authors wrote, confirmed that patients "with ADHD have altered brains; therefore ADHD is a disorder of the brain."
But, as the effect size findings reveal, that is not true. The distribution curves of individual brain volumes in the two cohorts mostly overlapped (and that isn't even taking into account the many scientific problems that provide reason to question the validity of even the small differences in mean volumes that were reported). As such, it is grossly misleading for the authors to present their results as definitive evidence that individual children with ADHD have smaller brains, or suffer from "altered brains."
There is also this haunting question: Why did the authors hide the finding that the ADHD youth had higher IQ scores at 16 of the 20 sites? The hiding of this finding is, in its own way, as egregious as pretending that the pooled mean volume data, with its small effect sizes, showed that individuals diagnosed with ADHD have smaller brains.
The publication of this study, with its bottom-line message that ADHD children have smaller brains, does a great disservice to those children and to their parents, and ultimately to all of society. It essentially tells a lie, wrapped in the gauze of science, about those children. Lancet Psychiatry needs to retract this study, and inform the media that this has been done.
If you agree, please sign our petition at change.org.
- One possible reason for the exclusion of the 535 patients, in the researchers' inquiry into whether the small difference in mean brain volumes was due to the medication, is that the analysis software they used eliminated all data for the participants with missing data related to stimulant use. If so, the missing data here is a sign that the authors did not adequately clean their dataset to account for this problem in the first place. Failure to account for missing data greatly increases the chance of what is known as a Type II error in reporting results. In other words, due to missing data weakening the analysis, they might have assumed the stimulants showed no significant effect on brain volume size when in reality the drugs did.