Study reveals flaws in popular genetic method

Lund University
Tue, 30 Aug 2022 20:14 UTC

The most common analytical method within population genetics is deeply flawed, according to a new study from Lund University in Sweden. This may have led to incorrect results and misconceptions about ethnicity and genetic relationships. The method has been used in hundreds of thousands of studies, affecting results within medical genetics and even commercial ancestry tests. The study is published in Scientific Reports.

The rate at which scientific data can be collected is rising exponentially, leading to massive and highly complex datasets, dubbed the "Big Data revolution." To make these data more manageable, researchers use statistical methods that aim to compact and simplify the data while still retaining most of the key information. Perhaps the most widely used method is called PCA (principal component analysis). By analogy, think of PCA as an oven with flour, sugar and eggs as the data input. The oven may always do the same thing, but the outcome, a cake, critically depends on the ingredients' ratios and how they are combined.

"It is expected that this method will give correct results because it is so frequently used. But it is neither a guarantee of reliability nor produces statistically robust conclusions," says Dr. Eran Elhaik, Associate Professor in molecular cell biology at Lund University.

According to Elhaik, the method helped create old perceptions about race and ethnicity. It plays a role in manufacturing historical tales of who and where people come from, not only by the scientific community but also by commercial ancestry companies. A famous example is when a prominent American politician took an ancestry test before the 2020 presidential campaign to support their ancestral claims. Another example is the misconception of Ashkenazic Jews as a race or an isolated group driven by PCA results.

"This study demonstrates that those results were unreliable," says Eran Elhaik.

PCA is used across many scientific fields, but Elhaik's study focuses on its usage in population genetics, where the explosion in dataset sizes is particularly acute, which is driven by the reduced costs of DNA sequencing.

The field of paleogenomics, where we want to learn about ancient peoples and individuals such as Copper age Europeans, heavily relies on PCA. PCA is used to create a genetic map that positions the unknown sample alongside known reference samples. Thus far, the unknown samples have been assumed to be related to whichever reference population they overlap or lie closest to on the map.

However, Elhaik discovered that the unknown sample could be made to lie close to virtually any reference population just by changing the numbers and types of the reference samples (see illustration), generating practically endless historical versions, all mathematically "correct," but only one may be biologically correct.

In the study, Elhaik has examined the twelve most common population genetic applications of PCA. He has used both simulated and real genetic data to show just how flexible PCA results can be. According to Elhaik, this flexibility means that conclusions based on PCA cannot be trusted since any change to the reference or test samples will produce different results.

Between 32,000 and 216,000 scientific articles in genetics alone have employed PCA for exploring and visualizing similarities and differences between individuals and populations and based their conclusions on these results.

"I believe these results must be re-evaluated," says Elhaik.

He hopes that the new study will develop a better approach to questioning results and thus help to make science more reliable. He spent a significant portion of the past decade pioneering such methods, like the Geographic Population Structure (GPS) for predicting biogeography from DNA and the Pairwise Matcher to improve case-control matches used in genetic tests and drug trials.

"Techniques that offer such flexibility encourage bad science and are particularly dangerous in a world where there is intense pressure to publish. If a researcher runs PCA several times, the temptation will always be to select the output that makes the best story", adds Prof. William Amos, from the University of Cambridge, who was not involved in the study.

Reference:

Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated

Reader Comments

codis · 2022-09-02T09:32:49Z

The most common analytical method within population genetics is deeply flawed, according to a new study from Lund University in Sweden. This may have led to incorrect results and misconceptions about ethnicity and genetic relationships. The method has been used in hundreds of thousands of studies, affecting results within medical genetics and even commercial ancestry tests.

Another example is the misconception of Ashkenazic Jews as a race or an isolated group driven by PCA results.

Am I supposed to understand this as the main motivation for a(nother) scientific fraud in service of an agenda and a group ?!?

JTF Truth ·

codis LOL

· 2022-09-02T14:58:27Z

Or has been shown so many times in the past, there will be another big distraction. Consider the flu and pneumonia-disappeared with k0v1d which...

Gus

News should be informative not obvious. Shlock disguised as news. Its 42 C in the Okanagan, my fuse is short.

3.1415926535

I was asking Codie ( ex army) but super glad for your input which is accurate and nuanced. Get out of town if you can. For others bugging in seems...

Giant Meteor 24

Some of us were doing this back in the 70's and 80's...wonderful healthy lifestyle...Best to keep it low key...no need to advertise your success...

keith9876

This is good news but how does this.... On a couple of acres, a farmer might raise 1,200 pastured chickens, selling them as whole birds or cut-up...

Bsotted

Science & Technology

Study reveals flaws in popular genetic method

Reader Comments

Latest News

Picture of the Day

Quote of the Day

Recent Comments

Quantum Quirk