Genomewide Association Study of Fracture Nonunion Using Electronic Health Records

Nonunion is a clinically significant complication of fracture associated with worse outcomes, including increased pain, disability, and higher healthcare costs. The risk for nonunion is likely to be complex and multifactorial, and as such, the biology underlying such risk remains poorly understood. Genetic studies represent one approach to identify implicated biology for further investigation, but to date the lack of large cohorts for study has limited such efforts. We utilized the electronic health records of two large academic medical centers in Boston to identify individuals with fracture nonunion and control individuals with fracture but no evidence of nonunion. We conducted a genomewide association study among 1760 individuals of Northern European ancestry with upper or lower extremity fracture, including 131 with nonunion, to examine whether common variants were associated with nonunion in this cohort. In all, one locus in the Calcyon (CALY) gene exceeded a genomewide threshold for statistical significance (p = 1.95e–8), with eight additional loci associated with p < 5e–7. Previously reported candidate genes were not supported by this analysis. Electronic health records should facilitate identification of common genetic variations associated with adverse orthopedic outcomes. The loci we identified in this small cohort require replication and further study to characterize mechanism of action, but represent a starting point for the investigation of genetic liability for this costly outcome. © 2018 American Society for Bone and Mineral Research.


Introduction
N onunion, or delayed union, following skeletal fracture is a failure of healing to occur over the expected time course. (1) Nonunion is estimated to occur in 5% to 10% of fractures. (2)(3)(4)(5) In the United States, approximately 100,000 fractures result in nonunion each year. (6) Fractures which progress to nonunion are difficult to treat, with many leading to marked disability from pain or functional impairment. (4,(7)(8)(9)(10) In addition to individual patient disability, nonunion is associated with increased healthcare utilization and cost. (11)(12)(13)(14)(15)(16)(17)(18) Although nonunion is relatively common and both clinically and economically consequential, the cause of nonunion is incompletely understood. Injury and treatment each have a demonstrated role in nonunion rate; however, neither wholly explains the observed variation in nonunion rate. (19)(20)(21) Multiple biologically relevant patient-specific risk factors have been identified, including age, sex, weight, tobacco use, and a range of medication exposures. (2,22) Nevertheless, the biology of nonunion risk is poorly understood. (20,23) Despite limited understanding of the multifactorial biology of nonunion, biomarkers and relevant candidate genes have been proposed. (23,24) These biologically informed risk stratification tools are potentially important means of developing and targeting therapeutic intervention strategies aimed at reducing nonunion risk.
Genomewide association studies have provided new insight into a range of diseases and adverse outcomes across medicine because their unbiased nature allows discovery of unanticipated disease biology. However, for relatively rare outcomes, such studies are challenging and costly. To date, although multiple small case-control association studies have examined risk associated with individual candidate genes, no genomewide study has been reported. (25)(26)(27) To address this gap, we utilized data from a large biobank linked to the electronic health records of two academic medical centers. Although these records provide a more limited characterization of individual patients, they make genetic studies of rare outcomes, such as fracture nonunion, feasible at scale for the first time. The aim of this study was to identify genes associated with fracture nonunion and establish precedent for further genomewide association studies of adverse orthopedic outcomes in biobanks.

Subjects and Methods
Overview and data set generation We drew on three waves of participants in the Partners Biobank from the Brigham and Women's Hospital network as well as the Massachusetts General Hospital network, representing the first $15,000 individuals genotyped as part of the Partners HealthCare Biobank initiative. (28) Structured data, including sociodemographic data as well as inpatient and outpatient diagnostic codes (International Classification of Diseases, Ninth Revision [ICD9]), were extracted from the longitudinal electronic health record (EHR) of these two academic medical centers. (28) We included individuals age 18 years or older with biobank data, and at least one of the following ICD9 codes representing nonunion (733.81 or 733.82) (identifying cases), or fracture of upper (810-819) or lower limb (820-829) in the absence of nonunion (identifying controls). For secondary analysis examining the potential confounding effect of cigarette smoking, self-reported lifetime use was obtained from a standard biobank questionnaire.
A datamart containing all clinical data was generated with the Informatics for Integrating Biology and the Bedside (i2b2) server software, version 1.6 (i2b2, Boston, MA, USA; https://www.i2b2. org/index.html), a computational framework for managing human health data. (29)(30)(31) The Partners Institutional Review Board approved the biobank study protocol, and deidentified data were accessed using a Data Use Agreement with Partners HealthCare.

Study design and analysis
We utilized a standard case-control association design, contrasting individuals with nonunion (cases) with individuals with at least one fracture but no nonunion (controls), adjusting for ancestry as described in the "Genotyping and quality control" section below. Sensitivity analysis examined smoking status as a potential confounder. All individuals age 18 years and older with DNA available in the Partners HealthCare biobank were eligible for inclusion.
Genotyping and quality control DNA was extracted from buffy coat, and genotyping was done using three versions of the Illumina Multi-Ethnic Global (MEG) array (MEGA n ¼ 4927, MEGA EX n ¼ 5353, and MEG n ¼ 4784; mappable variants available for each were as follows: 1,411,334; 1,710,339; and 1,747,639; respectively.) These common variant arrays all incorporate content from the 1000 Genomes Project Phase 3 (1000G Phase 3). Single nucleotide polymorphism (SNP) coordinates were remapped based on the TopGenomicSeq provided from Illumina (MEGA_Consortium_v2_15070954_A2. csv); all rsIDs correspond to build 142 of dbSNP. To determine the forward strand of the SNP, we aligned both SNP sequences (alleles A and B) to hg19 using BLAT with default parameters set by the UCSC Genome Browser. (32) Each cohort was cleaned, imputed, and analyzed separately to avoid batch effects. In each batch, we included subjects with genotyping call rates exceeding 99%; no related individuals based on identity by descent (IBD) were included. (33) From these individuals, any genotyped SNP with a call rate of at least 95% and Hardy-Weinberg equilibrium p value <1 Â 10 À6 was included. Imputation used the Michigan Imputation Server implementing Minimac3. (34)(35)(36) Imputation used all population subsets from 1000G Phase 3 v5 as reference panel; haplotype phasing was performed using SHAPEIT. (37) For each batch, we applied principal components analysis (PCA) of a linkage-disequilibrium-pruned set of genotyped SNPs to characterize population structure, based on EIGENSTRAT as implemented in PLINK v1.9. We then plotted these components with superimposition of HapMap samples to confirm location of Northern European individuals, and only included those individuals to minimize risk for confounding by ancestry (ie, population stratification). (38)(39)(40) Analysis We examined single-locus associations in each cohort, then combined these results in inverse-variance-weighted fixedeffects meta-analysis. In all analyses, only biallelic SNPs with minor allele frequencies of at least 1% were retained. Tests for association used logistic regression assuming an additive allelic effect, with adjustment for the first 10 principal components of ancestry a priori. (In prior studies using the biobank, analyses incorporating five or 20 components did not yield meaningfully different results.) Association results are presented in terms of independent loci after pruning using the clump command in PLINK 1.9, with a 250-kilobase (kb) window and r 2 ¼ 0.2. (40) Locus plots were generated using locuszoom. (41)

Results
In total, we examined 1760 individuals of Northern European ancestry across the three batches (wave 1: 58 cases and 587 controls; wave 2: 29 cases and 514 controls; wave 3: 44 cases and 528 controls), with meta-analysis of 893,900 SNPs with minor allele frequency (MAF) of 1% or greater. Table 1  One locus reached the threshold for genomewide significance with fracture nonunion and is illustrated (Fig. 1A, B; Table 2). The one associated locus was on chr10, spanning $116 kb and nine genes (Fig. 2). The most strongly-associated SNP was intronic and lies in Calcyon (CALY). A total of eight additional loci were associated with p < 5e-7; among these is the locus coding for tachykinin receptor-1 (TACR1), implicated in neurotransmission of pain. Full meta-analytic results for p < 5e-6, including tests for heterogeneity and cohort-specific effects, are included (Supporting Table 1). Among previously reported candidate genes, we identified no associations with p < 1e-3 (Supporting Table 2). As a common variant in the gene coding for CALY, rs2298122 (chr10:135141822), was associated with adolescent smoking initiation in one prior study, and smoking is a risk factor for nonunion, it is possible the observed genomic association was confounded; thus, we performed a secondary analysis controlling for smoking status. (42) In regression models controlling for patient self-reported smoking, the association between CALY and nonunion persisted (p ¼ 7.978e-07). Similarly, examining proxy SNPs for rs2298122 with r 2 > 0.8 in models without smoking identified no SNPs associated with nonunion, again supporting an absence of confounding.

Discussion
In this analysis of 1760 individuals drawn from a biobank spanning two academic medical centers, we identified a novel locus associated with fracture nonunion, along with multiple other regions deserving follow-up. The most strongly-associated SNP is located in Calcyon (CALY), a transmembrane protein not previously associated with fracture or bone disease, and most strongly expressed in the brain and pituitary gland. (43) Specifically, Calcyon interacts with dopamine D1 receptors at the synapse. (44) However, given the modest number of associated SNPs at this locus, characterization of additional cohorts will be required to better localize the association signal. In one prior study, a common variant in the CALY coding gene was associated with adolescent smoking initiation, although subsequent studies have not confirmed such association. (42,45) Our sensitivity analysis indicates this association does not confound our results.
Among the loci associated with nonunion (p < 5e-7), one notable region spans the tachykinin receptor-1 (TACR1) gene, also referred to as the neurokinin or substance P receptor. (46) This gene product has been shown to play a central role in pain response. (47) Substance P activation and an overabundance of profibrotic sensory nerve fibers have been found to be associated with fibrous tissue formation and specifically arthrofibrosis of the knee. (48,49) If the association with nonunion Fracture location headings include more specific subheadings as well as general codes. is confirmed, it could suggest the importance of pain as a feedback mechanism in fracture healing; ie, individuals with change in pain sensitivity may, for example, engage in behavior that disrupts healing. Alternatively, substance P has been shown to be an important contributor to the bone formation process in fracture healing, with activation of osteoclastic activity playing a central role, making its potential implication in fibrous union development complex. (50,51) Our work extends three previous case-control studies examining nonunion in candidate genes. In one of the first genetic studies of nonunion, 62 individuals with atrophic nonunion were contrasted with 47 individuals with normal healing at a set of common variants in bone morphogenetic protein (BMP) pathway genes (BMP-2, BMP-7, NOGGIN, and SMAD6). (26) Another investigation characterized 33 individuals with nonunion and 29 without, at 144 SNPs across 30 genes. (52) A third study examined 101 individuals with uneventful healing and 66 with nonunion, across five genes (AM5C, BMP4, FGF3, FGF10, and FGFR1). (27) In the largest study to date, one SNP in a single gene, CYR61, was examined among 250 fracture nonunion cases and 250 healthy controls. (25) Our study does not provide further support for any of these genes, consistent with the known challenges in candidate gene studies across other areas of medicine.
Several limitations in interpreting our results should be considered. First, although the use of coded data allows for efficient ascertainment of cases, it may decrease the precision with which we can identify or exclude nonunion. Similarly, the paucity of clinical data available to us precludes consideration of covariates beyond ancestry. In both cases, these limitations should bias us toward the null; ie, they would diminish our power to detect association rather than diminishing confidence in the associations we do identify. More broadly, enlarging these cohorts or investigating independent ones will likely allow identification of additional loci. Rather than holding out a third cohort for replication, we elected to meta-analyze all data available to us, as is generally advised for such studies, and present our results in the hopes they will spur others to examine these phenotypes in other cohorts. We emphasize the importance of replication in other cohorts as they become available, particularly for loci that do not reach a traditional genomewide threshold for association. We also note that genomewide studies can only identify association, and establishing the mechanism of action will require follow-up investigation using other methods. Despite these challenges, our results indicate the utility of coded data in biobanks for genomic discovery in fracture nonunion cases. Such orthopedic phenotypes for which understanding of biology has been elusive may benefit from the unbiased screens afforded by genomewide studies.

Disclosures
RHP reports no relevant conflicts and notes grants from the National Human Genome Research Institute, National Institute of Mental Health, and Telefonica Alpha; serves on the scientific advisory board for Perfect Health, Genomind, and Psy Therapeutics; and consults for RID Ventures. THM reports no relevant conflicts and notes grants from The Stanley Center at The Broad Institute, Brain and Behavior Research Foundation, and Telefonica Alpha. ATF reports no relevant conflicts and discloses: speakers bureau and consultant for NuVasive, Smith & Nephew, Synthes. KAR reports no relevant conflicts and discloses: consultant for Onkos. KLH and AMP report no relevant conflicts nor disclosures. None of the work for this manuscript was compensated or supported by industry and no funder was involved in authorship or the choice to submit.