Data from: Linkage disequilibrium clustering-based approach for association mapping with tightly linked genome-wide data

dc.contributor.affiliation: University of Helsinki - Li, Zitong
dc.contributor.affiliation: University of Helsinki - Kemppainen, Petri
dc.contributor.affiliation: University of Helsinki - Rastas, Pasi
dc.contributor.affiliation: University of Helsinki - Merilä, Juha
dc.contributor.author: Li, Zitong
dc.contributor.author: Kemppainen, Petri
dc.contributor.author: Rastas, Pasi
dc.contributor.author: Merilä, Juha
dc.date.accessioned: 2025-03-24T15:11:14Z
dc.date.issued: 2018-04-12
dc.description: Genome-wide association studies (GWAS) aim to identify genetic markers strongly associated with quantitative traits by utilizing linkage disequilibrium (LD) between candidate genes and markers. However, because of LD between nearby genetic markers, the standard GWAS approaches typically detect a number of correlated SNPs covering long genomic regions, making corrections for multiple testing overly conservative. Additionally, the high dimensionality of modern GWAS data poses considerable challenges for GWAS procedures such as permutation tests, which are computationally intensive. We propose a cluster-based GWAS approach that first divides the genome into many large non-overlapping windows, and uses linkage disequilibrium network analysis in combination with principal component (PC) analysis as dimensional reduction tools to summarize the SNP data to independent PCs within clusters of loci connected by high LD. We then introduce single- and multi-locus models that can efficiently conduct the association tests on such high dimensional data. The methods can be adapted to different model structures, and used to analyse samples collected from the wild or from bi-parental F2 populations, which are commonly used in ecological genetics mapping studies. We demonstrate the performance of our approaches with two publicly available data sets from a plant (Arabidopsis thaliana) and a fish (Pungitius pungitius), as well as with simulated data.
dc.identifier: https://doi.org/10.5061/dryad.16g72gk
dc.identifier.uri: https://hydatakatalogi-test-24.it.helsinki.fi/handle/123456789/9037
dc.rights: Open
dc.rights.license: cc-zero
dc.subject: quantitative trait loci
dc.subject: Multi-locus method
dc.subject: Four way cross
dc.subject: Principal component regression
dc.title: Data from: Linkage disequilibrium clustering-based approach for association mapping with tightly linked genome-wide data
dc.type: dataset