Psychiatric Genomics Consortium (PGC) Major Depressive Disorder (MDD) genome-wide association study meta-analysis removing individual overlap with UK Biobank.
Many uses of genome-wide summary statistics require that there be no sample overlap between the discovery and testing datasets. UK Biobank (UKBB) is an open-access health resource that was included in previous PGC Major Depressive Disorder GWAS (Wray et al. 2018; Howard et al. 2019). Because UKBB is used by so many researchers, we have conducted and released GWAS summary statistics in which overlap with UKBB has been minimised.
The datasets used are individual-level data from the MDD Wave2 cohorts and summary statistics from the additional MDD cohorts (deCODE, GenScot, GERA, iPSYCH, 23andMe).
The analysis excludes 335 participants from 12 PGC MDD cohorts and 622 participants from the Generation Scotland cohort. It retains two individuals overlapping with UK Biobank from one cohort (shp0) whom we are currently unable to exclude.
Data for this project are held on LISA in the directories listed in the README.mddw2sum and README.mdd00001 files in your LISA home directory. Pre-imputation QC and imputation were performed previously using the RICOPILI modules.
Checksums were used to identify individuals potentially present in both UKBB and the PGC MDD samples. See Section 2 of GWAS.
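To illustrate the idea behind checksum matching, here is a minimal Python sketch. It assumes that each individual's genotypes at a shared, fixed SNP panel are available as allele dosages (0/1/2, with -1 for a missing call). The function names, the choice of SHA-256 and the toy data are hypothetical; this is not the exact checksum procedure used in this project, which is described in Section 2 of GWAS.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def genotype_checksum(dosages: List[int]) -> str:
    """Hash one individual's dosages (0/1/2, -1 = missing) at a fixed SNP panel."""
    encoded = ",".join(str(d) for d in dosages).encode("ascii")
    return hashlib.sha256(encoded).hexdigest()


def find_overlap(
    cohort: Dict[str, List[int]], ukbb: Dict[str, List[int]]
) -> Set[Tuple[str, str]]:
    """Return (cohort_id, ukbb_id) pairs whose genotype checksums are identical."""
    # Index UKBB individuals by checksum so each cohort member is a dict lookup.
    ukbb_index: Dict[str, List[str]] = {}
    for uid, dosages in ukbb.items():
        ukbb_index.setdefault(genotype_checksum(dosages), []).append(uid)

    pairs: Set[Tuple[str, str]] = set()
    for cid, dosages in cohort.items():
        for uid in ukbb_index.get(genotype_checksum(dosages), []):
            pairs.add((cid, uid))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy genotypes at a four-SNP panel.
    cohort = {"c1": [0, 1, 2, 2], "c2": [1, 1, 0, 2]}
    ukbb = {"u9": [1, 1, 0, 2], "u7": [2, 2, 2, 0]}
    print(find_overlap(cohort, ukbb))  # {('c2', 'u9')}
```

Because exact hashing only flags individuals whose genotypes agree at every SNP in the panel, a sketch like this assumes a panel that is well genotyped in both datasets; a single discordant or missing call would hide a true duplicate.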
Phenotypes were prepared by copying the case/control status from each PGC MDD cohort's .fam file and setting the phenotype of individuals overlapping with UKBB to -9 (missing). See Section 3 of GWAS.
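The masking step can be sketched in a few lines of Python, assuming the standard six-column PLINK .fam layout (FID, IID, father, mother, sex, phenotype) and that the overlapping individuals are supplied as (FID, IID) pairs. The function name `mask_overlap` is hypothetical; the project's own code for this step is described in Section 3 of GWAS.

```python
from typing import Iterable, List, Set, Tuple


def mask_overlap(fam_lines: Iterable[str], overlap: Set[Tuple[str, str]]) -> List[str]:
    """Copy PLINK .fam lines, setting the phenotype (6th column) to -9 for overlapping individuals.

    `overlap` holds (FID, IID) pairs flagged as also present in UKBB.
    """
    out: List[str] = []
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        fid, iid = fields[0], fields[1]
        if (fid, iid) in overlap:
            fields[5] = "-9"  # PLINK missing-phenotype code
        out.append(" ".join(fields))
    return out


if __name__ == "__main__":
    fam = ["fam1 id1 0 0 1 2", "fam2 id2 0 0 2 1"]
    print(mask_overlap(fam, {("fam2", "id2")}))
    # ['fam1 id1 0 0 1 2', 'fam2 id2 0 0 2 -9']
```

Setting the phenotype to missing, rather than deleting rows, keeps the sample list aligned with the existing genotype files while dropping the overlapping individuals from the association test.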
GWAS (rmUKBB) was performed on the updated phenotype files using the RICOPILI postimp_navi command. See Section 4 of GWAS.
Meta-analysis was first conducted on the 29 PGC MDD cohorts using the rmUKBB summary statistics. These meta-analytic results were then meta-analysed with the additional cohorts (deCODE, GenScot, GERA, iPSYCH, 23andMe). See Section 5 of GWAS.
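The two-stage design can be illustrated for a single SNP with a fixed-effect, inverse-variance-weighted (IVW) meta-analysis. This is an assumption made for illustration: the method and software used at each stage are documented in Section 5 of GWAS, and the per-cohort effect sizes below are made-up numbers, not project results. Under IVW, combining the stage-1 PGC result with the additional cohorts gives the same estimate and standard error as pooling every cohort in one step, because the stage-1 weight equals the sum of the PGC cohort weights; the sketch checks this.

```python
import math
from typing import List, Tuple

Stat = Tuple[float, float]  # (beta, standard error) for one SNP


def ivw_meta(stats: List[Stat]) -> Stat:
    """Fixed-effect inverse-variance-weighted meta-analysis of (beta, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in stats]
    total_w = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, stats)) / total_w
    return beta, math.sqrt(1.0 / total_w)


def z_to_p(beta: float, se: float) -> float:
    """Two-sided p-value from a Wald z statistic."""
    return math.erfc(abs(beta / se) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical per-cohort log-odds ratios and standard errors for one SNP.
    pgc_cohorts = [(0.05, 0.03), (0.02, 0.04), (0.04, 0.05)]
    additional_cohorts = [(0.03, 0.01), (0.01, 0.02)]

    stage1 = ivw_meta(pgc_cohorts)                          # PGC cohorts only
    stage2 = ivw_meta([stage1] + additional_cohorts)        # stage 1 + additional cohorts
    one_step = ivw_meta(pgc_cohorts + additional_cohorts)   # all cohorts at once

    print(stage2, z_to_p(*stage2))
    assert math.isclose(stage2[0], one_step[0])
    assert math.isclose(stage2[1], one_step[1])
```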
Meta-analysed summary statistics excluding 23andMe will be available for download from the PGC as "PGC MDD No UKB / No 23andMe". Results including 23andMe will be available on request from the PGC Data Access Committee.
The code for this project can be cloned from https://github.com/psychiatric-genomics-consortium/mdd-rmUKBB
R packages used: readr, dplyr, stringr, tidyr
This project is licensed under the MIT License; see the LICENSE.md file for details.