Phenome-Wide CNV Analysis in 470,727 UK Biobank Genomes Reveals Disease Links

UK Genomics Milestone: Copy Number Variants and Human Health

UK-Led Genomic Breakthrough Illuminates Copy Number Variants' Hidden Impact

In a groundbreaking study published in Nature, researchers have conducted the largest-ever phenome-wide association study (PheWAS) on copy number variants (CNVs) using data from 470,727 UK Biobank whole-genome sequences. This massive analysis, led by scientists at AstraZeneca's Centre for Genomics Research in Cambridge, reveals how these large-scale genomic deletions and duplications influence a vast array of human traits, proteins, and diseases. CNVs, which involve the gain or loss of DNA segments spanning thousands of base pairs, represent a significant yet understudied source of genetic diversity compared to smaller single nucleotide variants.

The UK Biobank, a flagship UK resource based in Stockport with deep ties to institutions like the Wellcome Sanger Institute and University of Cambridge, provided the foundation for this work. Whole-genome sequencing (WGS) efforts, involving partnerships with deCODE genetics, AstraZeneca, and others, enabled high-confidence CNV calling for over 470,000 unrelated participants across multiple ancestries. This positions UK higher education and research ecosystems at the forefront of global genomics, showcasing collaborative prowess between industry, public funders, and academia.

Understanding Copy Number Variants: Building Blocks of Genetic Diversity

Copy number variants are structural variations where sections of the genome are duplicated or deleted, often spanning tens to hundreds of kilobases. Unlike point mutations, CNVs can affect multiple genes simultaneously, altering gene dosage and regulatory elements. In the study, researchers filtered for high-quality CNVs greater than 10 kb on autosomes, identifying 102,717 unique deletions and 80,147 duplications after rigorous quality control, including Mendelian violation checks.
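
To make the size and autosome filter concrete, here is a minimal Python sketch. It is not the study's pipeline: the `CnvCall` record, the `chr1` to `chr22` naming convention, and the helper names are illustrative assumptions. Only the 10 kb cut-off and the restriction to autosomes come from the description above, and the Mendelian violation checks are not modelled.

```python
from dataclasses import dataclass

# Assumed naming convention: autosomes are "chr1" .. "chr22".
AUTOSOMES = frozenset(f"chr{i}" for i in range(1, 23))
MIN_LENGTH_BP = 10_000  # the 10 kb cut-off described in the article


@dataclass(frozen=True)
class CnvCall:
    """Hypothetical record for one CNV call (half-open interval [start, end))."""
    chrom: str
    start: int
    end: int
    kind: str  # "DEL" or "DUP"

    @property
    def length(self) -> int:
        return self.end - self.start


def keep_call(call: CnvCall) -> bool:
    """True for autosomal calls strictly longer than 10 kb."""
    return call.chrom in AUTOSOMES and call.length > MIN_LENGTH_BP


def filter_calls(calls):
    """Return only the calls that pass the size and autosome filter."""
    return [c for c in calls if keep_call(c)]
```

A call on `chrX`, or one exactly 10,000 bp long, would be dropped by this filter, while a 19 kb deletion on `chr1` would be kept.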

This scale surpasses previous array-based CNV studies, offering unprecedented resolution from WGS read depth using DRAGEN software. For UK universities training the next generation of geneticists, such methods highlight the evolution from genotyping arrays to sequencing, emphasizing computational biology skills essential for PhD programs at places like Cambridge and Oxford.

The UK Biobank: A Cornerstone of British Biomedical Research

Housed in the UK and supported by UK Research and Innovation (UKRI), the UK Biobank has sequenced half a million genomes, creating a treasure trove for phenome-wide studies. This CNV analysis leveraged phenotypes including 2,941 plasma proteins (from 49,736 individuals), 13,336 binary clinical outcomes, and 1,911 quantitative traits, enabling comprehensive PheWAS.

Institutions like the Wellcome Sanger Institute, affiliated with the University of Cambridge, played key roles in sequencing and data curation. Such resources not only drive discoveries but also train thousands of UK students in bioinformatics and epidemiology through access programs and internships.

Methods Mastery: From CNV Calling to Multiancestry PheWAS

CNVs were called using DRAGEN v3.7.8, with post-hoc filters applied to reduce calling errors. The PheWAS employed dominant and recessive models for deletions and duplications, gene-level collapsing for burden tests, and integration with protein-truncating variants (PTVs). A multiancestry meta-analysis across European (94.8%), African (1.8%), South Asian (2.1%), and other ancestry groups used a significance threshold of P < 10⁻⁸.
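
Gene-level collapsing can be pictured as follows. This is a simplified sketch under stated assumptions, not the authors' implementation: it treats a participant as a carrier for a gene if any of their CNVs overlaps the gene's coordinates at all, and the tuple layouts, function names, and coordinates are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def collapse_to_gene_carriers(cnvs, gene):
    """Collapse CNV calls to the set of samples carrying any CNV over a gene.

    cnvs: iterable of (sample_id, chrom, start, end) tuples (assumed layout)
    gene: (chrom, start, end) tuple (assumed layout)
    """
    g_chrom, g_start, g_end = gene
    return {
        sample
        for sample, chrom, start, end in cnvs
        if chrom == g_chrom and overlaps(start, end, g_start, g_end)
    }
```

The resulting carrier set can then be compared against a phenotype in a burden test, so that many distinct rare CNVs hitting the same gene contribute to one signal.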

This rigorous pipeline, developed by Cambridge-based teams, exemplifies advanced statistical genetics taught in UK MSc/PhD courses, blending linear mixed models, logistic regression, and proteomics integration.
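
The article does not say which meta-analysis method was used. As one common possibility, a fixed-effect inverse-variance-weighted meta-analysis combines per-ancestry effect estimates as sketched below. The function names are hypothetical, and only the 10⁻⁸ significance threshold is taken from the text.

```python
import math

SIGNIFICANCE_THRESHOLD = 1e-8  # threshold reported in the article


def inverse_variance_meta(estimates):
    """Fixed-effect meta-analysis of (beta, standard_error) pairs.

    Each ancestry-specific estimate is weighted by 1 / se^2.
    Returns the pooled beta, its standard error, and a two-sided p-value
    from a normal approximation.
    """
    weights = [1.0 / (se ** 2) for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value for a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value


def is_significant(p_value, threshold=SIGNIFICANCE_THRESHOLD):
    """True if the pooled association passes the study-wide threshold."""
    return p_value < threshold
```

Weighting by inverse variance means that the largest ancestry group dominates the pooled estimate, which is one reason ancestry-specific results are usually reported alongside the meta-analysis.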

Proteomic Revelations: Cis and Trans Effects of CNVs

The study validated 98% of rare cis-protein quantitative trait loci (pQTLs) where deletions lowered and duplications raised nearby protein levels, uncovering 142 rare and 175 common CNV-protein links. Trans-pQTLs revealed protein-protein interactions, with hotspots like 16p11.2.
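
The direction check behind that validation can be expressed in a few lines. The sketch below assumes each cis association is summarised by a CNV type and a signed effect on the nearby protein's level. The function names and record layout are hypothetical, not taken from the study.

```python
def is_concordant(cnv_type, beta):
    """Expected direction: deletions lower, duplications raise the cis protein."""
    if cnv_type == "DEL":
        return beta < 0
    if cnv_type == "DUP":
        return beta > 0
    raise ValueError(f"unknown CNV type: {cnv_type!r}")


def concordance_rate(associations):
    """Fraction of (cnv_type, beta) associations with the expected sign."""
    associations = list(associations)
    if not associations:
        raise ValueError("no associations supplied")
    hits = sum(is_concordant(kind, beta) for kind, beta in associations)
    return hits / len(associations)
```

Computing the rate separately for deletions and duplications would show whether one class follows the expected dosage direction more reliably than the other.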

Combining CNVs with PTVs boosted detection, identifying novel candidates like TMPRSS5 for Charcot-Marie-Tooth disease. For UK medical researchers, this underscores proteomics' role in functional genomics, fueling PhD projects at Sanger and Cambridge.

Disease Associations: New Links to Gout, Telomeres, and More

189 CNV-binary phenotype associations emerged, including a rare ZNF451 deletion boosting leukocyte telomere length and a SLC2A9 enhancer deletion cutting gout risk. Hotspots at 17p12 (PMP22 duplication for CMT1A) and 21q22 highlighted oligogenic effects.

A further 892 associations with quantitative traits spanned height, BMI, and lipids. These findings, richer in non-European ancestries, inform studies of health in the UK's diverse population at universities like UCL and Edinburgh.

Multiancestry Perspectives: Broadening Genomic Equity in the UK

With samples from six ancestries, the analysis revealed African-specific sickle-cell links and East Asian α-thalassemia associations, addressing Eurocentric biases. UK universities, through their access to UK Biobank, are pivotal in inclusive genomics training, preparing students for global health challenges.

Drug Discovery Horizons: From Protective Deletions to Targets

Duplications increasing disease risk (e.g., NME7 for thrombophilia) suggest inhibition strategies, while protective deletions guide agonists. This resource aids therapeutic prioritization, aligning with UKRI's focus on translational research at hubs like Cambridge Biomedical Campus.

Cambridge's Genomics Ecosystem: Academia-Industry Synergy

AstraZeneca's Cambridge team, collaborating with the University of Cambridge's Haematology Department and the Wellcome Sanger Institute, drove this work. Partnerships like the Milner Therapeutics Institute exemplify how UK higher education fosters innovation, offering students jointly supervised PhDs and industry placements.

Future Outlook: CNVs in Precision Medicine and Training

The authors anticipate that this dataset will fuel mechanistic studies and biomarker discovery. For UK higher education, it signals demand for genomics expertise, with programs at Cambridge, Oxford, and Imperial expanding. As CNVs explain more heritability, expect curriculum shifts toward structural variant analysis.

Career Pathways in UK Genomics Research

This study highlights opportunities in bioinformatics, statistical genetics, and multiomics at UK universities. From postdocs at Sanger to lectureships in Cambridge, the field offers rewarding paths amid £2bn UKRI investments.

Prof. Isabella Crowe

Contributing Writer

Advancing interdisciplinary research and policy in global higher education.

Frequently Asked Questions

🔬 What is a phenome-wide association study (PheWAS)?

A PheWAS tests genetic variants against thousands of phenotypes at once, whereas a genome-wide association study (GWAS) typically tests many variants against a single trait. This CNV PheWAS drew on UK Biobank data for broad coverage of traits, proteins, and diseases.

🧬 How many genomes were analyzed in this UK Biobank CNV study?

The study analyzed 470,727 unrelated whole-genome sequences from diverse ancestries, enabling robust multiancestry findings on CNV effects.

🧪 What are the key CNV-protein associations discovered?

Among rare cis associations, 98% of deletions reduced and 91% of duplications increased levels of the nearby protein. Novel trans-pQTLs suggest protein-protein interactions; see the Nature paper for details.

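⚕️ Which diseases showed new CNV links?

Examples include a rare ZNF451 deletion associated with longer leukocyte telomeres, an SLC2A9 enhancer deletion associated with reduced gout risk, and the PMP22 duplication linked to CMT1A. In total, 189 associations with binary clinical outcomes were identified.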

🏛️ What role did UK institutions play in this research?

AstraZeneca's Cambridge team, the University of Cambridge's Haematology Department, and the Wellcome Sanger Institute drove the study, building on UK Biobank sequencing partnerships.

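🌍 Why does multiancestry analysis matter?

It revealed ancestry-specific associations, such as sickle-cell links in participants of African ancestry, helping to address Eurocentric bias in genomics and making the findings more relevant to the UK's diverse population.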

💊 What are the implications for drug discovery?

Risk-increasing duplications suggest inhibition strategies, while protective deletions can guide agonist development. CNVs can also point to candidate genes and biomarkers, such as TMPRSS5 for neuropathies.

🎓 How does this advance UK higher education research?

It supports bioinformatics training at universities such as Cambridge and Oxford and highlights industry-academia collaborations that create PhD and postdoctoral opportunities.

🔮 What is the future of CNV research after this study?

The dataset is expected to serve as a reference for mechanistic and therapeutic studies. Expect a growing focus on non-coding CNVs and oligogenic disease models.

📊 How can the study data be accessed?

The data are available via UK Biobank (applications 24898/68574), supporting further investigations led by UK universities.

📈 CNV vs SNV: what are the key differences in impact?

CNVs affect larger genomic regions and tend to have larger effect sizes. The study shows that they contribute to traits in ways distinct from single nucleotide variants.