Genome-wide mapping identifies beta-1,4-N-acetyl-galactosaminyl-transferase as a novel determinant of sclerostin levels and bone mineral density

In bone, sclerostin is mainly osteocyte-derived and plays an important local role in adaptive responses to mechanical loading. Sclerostin is also present at detectable concentrations within the circulation. Our genome wide association study (GWAS) meta-analysis of 10,584 European-descent individuals identified two novel serum sclerostin loci, B4GALNT3 (standard deviation (SD) change in sclerostin per A allele β=0.20, P=4.6x10-49), and GALNT1 (β=0.11 per G allele, P=4.4x10-11), of which the former is a known locus for BMD estimated by heel ultrasound (eBMD). Common variants across the genome explained 16% of the phenotypic variation of serum sclerostin. Mendelian randomization revealed an inverse causal relationship between serum sclerostin and femoral neck BMD and eBMD, and a positive relationship with fracture risk. Colocalization analysis demonstrated common genetic signals within the B4GALNT3 locus for higher sclerostin, lower BMD, and greater B4GALNT3 expression in arterial tissue (Probability>99%). Renal and cortical bone tissue, and osteoblast cultures, were found to express high levels of B4GALNT3, an N-acetylgalactosaminyltransferase which adds a terminal LacdiNAc disaccharide to target glycocoproteins. Together, these findings raise the possibility that sclerostin is a substrate for B4GALNT3, such that its modification leads to higher levels, possibly through greater stability. GALNT1, an enzyme causing mucin-type O-linked glycosylation, may act in a similar capacity. We conclude that genetic variation in glycosylation enzymes represents a novel determinant of BMD and fracture risk, acting via alterations in levels of circulating sclerostin.


Introduction
Sclerostin is a glycoprotein produced by osteocytes, which is thought to play an important role in bone's adaptive response to mechanical loading, acting within the local bone microenvironment to suppress bone formation (1). The sclerostin antibody romosozumab has recently been found to increase bone mineral density (BMD) and reduce fracture risk (2)(3), establishing sclerostin as an important drug target for osteoporosis. Sclerostin is also detectable in the systemic circulation, the functional role of which remains unclear. Serum sclerostin exchanges with the bone micro-environment, and may simply mirror the prevalent conditions within bone. Consistent with this suggestion, serum sclerostin has been found to respond to physiological stimuli such as estrogen in parallel to alterations in local bone expression (4).
Several studies suggest that serum sclerostin exerts other roles outside the skeleton. For example, serum sclerostin has been reported to increase in chronic kidney disease (CKD), and may contribute to the pathogenesis of Chronic Kidney Disease-Mineral and Bone Disorder (CKD-MBD) (5). Serum sclerostin levels are also higher in patients with cardiovascular disease and may predict cardiovascular mortality (6). A relationship with glucose metabolism has been suggested in light by reports of higher sclerostin levels in association with type I and type II diabetes mellitus in adolescents (7) and adults (8) respectively. While these changes may represent 'off target' effects of sclerostin originating from bone, alternatively, sclerostin might act as an endocrine hormone, subject to regulation by as yet unidentified additional factors. Adding to this complexity, several extra skeletal tissues have been found to express sclerostin, including the liver, chondrocytes, kidney and vascular smooth muscle cells (9). Aside from extra skeletal target tissues, perturbations in serum sclerostin could conceivably affect the bone itself through consequent alterations of sclerostin levels within the local bone environment.
In order to identify novel factors affecting bone metabolism and other systems through altered sclerostin levels, we performed a genome wide association study (GWAS) metaanalysis of serum sclerostin. We then applied our results in a Mendelian randomization (MR) framework (10) to explore putative causal relationships between serum sclerostin and bone metabolism (as reflected by BMD).

Genome-wide association study meta-analysis of serum sclerostin
GWAS results were available in 10,584 participants across four cohorts (see Table 1).

Supplementary Figures 1 and 2 show the Manhattan plot and QQ plot of association results
from the fixed-effect meta-analysis of sclerostin in Europeans, respectively (see Supplementary Figure 3 for the QQ plots for each study). There was little evidence of inflation of the association results as the genomic inflation factors λ is 1.051 and the LD score regression intercept is 1.015 (11). Therefore, genomic control correction was not applied to any of the meta-analysis results. LD score regression revealed that all common variants we included in the meta-analysis explained 16 Two loci were identified to be associated with serum sclerostin at a genome-wide level. 69 SNPs in the B4GLANT3 gene region were associated with sclerostin level ( Figure 1A). The top hit rs215226 at this locus (β = 0.205 SD change in serum sclerostin per A allele, SE = 0.014, P = 4.60 x 10 -49 , variance explained = 1.99%) was previously reported to be associated with BMD estimated by heel ultrasound (eBMD) (12). The second locus was in the GALNT1 gene region ( Figure 1B) with the top hit rs7241221 (β = 0.109 SD per G allele, SE = 0.017, P = 4.40 x 10 -11 , variance explained = 0.39%). In addition, one SNP near the TNFRSF11B region encoding OPG, rs1485303, was marginally associated with sclerostin (β = 0.074 SD per G allele, SE = 0.014, P = 7.70 × 10 −8 , variance explained = 0.26%) ( Figure   1C). The latter SNP is in moderate LD with a previously reported BMD associated SNP rs1353171 (r 2 =0.277 in the 1000 Genome Europeans) (13). The association signals of the GWAS meta-analysis can be found in Table 2 Table 2). The latter appeared to reflect a relatively strong association in 4D, suggesting as possible interaction with the presence of CKD. Conditional analyses on the lead SNP in each association locus yielded no additional independent signals reaching genome-wide significance.
Only minor genetic influences on serum sclerostin were observed within the SOST gene (Chromosome 17, 41731099 to 41936156), which produces sclerostin. In total, four SOST SNPs have previously been reported in association with BMD (12) (14), and one with fracture (15). However, proxy SNPs for these SNPs showed little or no association with serum sclerostin (Supplementary Table 3). We used the GWAS results to estimate the genetic correlation between sclerostin and 11 bone phenotypes using LD score regression via LD Hub (16). We observed no strong evidence of genetic correlation between sclerostin SNPs and the bone phenotypes we tested. (Supplementary Table 4).

Mendelian randomization and colocalization analysis of sclerostin and bone phenotypes
Using two sample Mendelian Randomization (MR), we applied our GWAS results to larger datasets in order to examine putative causal relationship between sclerostin and bone phenotypes. The two SNPs robustly associated with sclerostin (rs215226 in B4GALNT3 region and rs7241221 in GALNT1 region) were used as genetic instruments of the exposure.
DXA-derived BMD and eBMD results from the GEFOS consortium (17)  We also undertook bidirectional MR (19) to evaluate the possibility of reverse causality between femoral neck BMD (or lumbar spine BMD) and serum sclerostin level, based on SNPs identified in GEFOS (17). We modelled femoral neck BMD and lumbar spine BMD as our exposure and serum sclerostin level as our outcome. The Steiger filtering analysis (20) suggested that 28 of the 32 femoral neck BMD SNPs and all 27 lumbar spine BMD SNPs exerted their primary effect on BMD as opposed to sclerostin (Supplementary Table 7 Table 8 Table 8).

Functional follow-up
Predicted regulatory elements of the top association signals We identified DNA features and regulatory elements of SNPs in high LD with the top association signals (r 2 >0.8), using Regulomedb (23). Among the 24 tested SNPs, we found three proxy SNPs showing high Regulomedb score (Supplementary Table 9): rs1872426 (r 2 =0.87 with the leading SNP rs1485303 in the TNFRSF11B region) had a score of 1f (eQTL + TF binding / DNase peak); rs215224 and rs4980826 (r 2 =1 and 0.85 with the leading SNP rs215226 respectively in the B4GALNT3 region) had a score of 2b (TF binding + any motif + DNase Footprint + DNase peak) (see Supplementary Figure 6 for rs215224). We also intersected variants with ATAC-seq data (24) (25) generated from the proximal and distal femur (26). In the TNFRSF11B locus, one SNP, rs2073618, overlapped with an ATAC-seq peak, while in the B4GALNT3 locus, four SNPs (rs12318530, rs215223, rs215224 and rs215225) fell in the same ATAC-Seq and Chip-Seq peak (Supplementary Figure 7 and   Supplementary Table 10).

Expression QTLs lookups for sclerostin signals
To evaluate if the sclerostin association signals influence transcription of cis genes, we crossreferenced the sclerostin SNPs with cis-expression data in tissues measured in the GTEx consortium v7 (27), primary osteoblast cell lines (28) and iliac crest bone biopsies (29). The top association signal within the B4GALNT3 region, rs215226, showed a strong positive association with B4GALNT3 gene expression in arterial and ovarian tissue (Supplementary  Table 14). In trans-eQTL analyses, the GALNT1 signal was related to lower SOST mRNA levels in iliac crest bone biopsies, particularly in Affymetrix chip analyses, whereas no association was observed for B4GALNT3 or TNFRSF11B (Supplementary Table   15).

Methylation QTLs lookups for sclerostin signals
To evaluate if our top association signals have the potential to influence DNA methylation, we cross-referenced the sclerostin SNPs with cis-methylation data from blood cells obtained at five different time points (ALSPAC children at Birth, Childhood and Adolescence; ALSPAC mothers during Pregnancy and at Middle age (30)). We found that our top hit rs215226, within B4GALNT3, was consistently associated with cg20907806 and cg26388816 across four time points (Supplementary Table 16). These two CpG sites are located in a CpG island that overlapped with one of the four ATAC-seq peaks within the B4GALNT3 region (Supplementary Figure 7, upper plot). The other potential signal near TNFRSF11B, rs1485303, was associated with cg13268132 and cg17171407 at all five time points.

B4galnt3 mRNA expression pattern
In murine gene expression studies, B4galnt3 mRNA was expressed at highest levels in kidney, with relatively high levels of expression also observed in bone, particularly cortical bone ( Figure 3A). B4galnt3 mRNA was subsequently found to be expressed at relatively high levels in osteoblast cultures derived from neonatal mouse calvariae. The expression in osteoblast cultures was similar after 2 and 4 days of culture in osteogenic media ( Figure 3B).
After 7 days culture there was a suggestive increase in B4galnt3 expression. No expression was detected in in vitro cultured mouse bone marrow macrophages or osteoclasts ( Figure 3B).

DISCUSSION
Having performed a GWAS meta-analysis of serum sclerostin, we identified two genomewide significant loci (B4GALNT3 and GALNT1), and a further locus close to genome wide significance (TNFRSF11B). Together, common HapMap3 variants explained 16% of the variance of serum sclerostin. By using rs215226 in B4GALNT3 and rs7241221 in GALNT1 to examine causal relationships between serum sclerostin and BMD, we found evidence of an inverse relationship between sclerostin and BMD, and a positive relationship with fracture risk, in line with randomized control trial findings that the sclerostin inhibitor romosozumab increases BMD and reduces fracture risk (2)(3). Moreover, association signals with sclerostin at these two loci colocalized with those for eBMD, consistent with the possibility that genetic variation in B4GALNT3 and GALNT1 alters BMD via changes in serum sclerostin. Although our analyses were primarily intended to identify loci associated with serum sclerostin, there is some evidence to suggest that rs215224 was responsible for the underlying genetic signal at the B4GALNT3 locus. This SNP is in perfect LD with our top B4GALNT3 hit, rs215226.
Rs215224 shows strong evidence of alteration of transcriptional activity from RegulomeDB, supported by the finding that this SNP falls within an ATAC-seq site of E15.5 mouse femurs.
That said, though rs215224was also associated with differential methylation at two distinct CPG sites within B4GALNT3, one of which coincided with an ATAC-seq site, these were at distinct locations within the gene.
The B4GALNT3 locus that we observed to be strongly related to serum sclerostin expresses a Golgi enzyme, beta-1,4-N-acetyl-galactosaminyltransferase, which transfers a GalNac residue to a nonreducing terminal GlcNAc-β, producing the disaccharide structure LacdiNAc.
B4GALNT3 has previously been reported to be expressed in multiple human tissues, including arterial, gastrointestinal tract and testis (31). In the present study, in mice, sclerostin from degradation and clearance from the circulation (see Figure 4). This interpretation is analogous to the role of GALNT3, a Golgi enzyme which acts to Oglycosylate another osteocyte-derived protein, FGF23, protecting it from degradation (33).
Given our finding that B4GALNT3 is expressed at high levels in the kidney, a major source of extra-skeletal sclerostin, the kidney may represent the principle source of glycosylated sclerostin. To the extent that activity of renal sclerostin glycosylation influences BMD via changes in circulating sclerostin levels, rather than the latter simply reflecting 'spill over' from sclerostin within the bone micro environment, a two-way exchange may exist whereby perturbations in circulating sclerostin levels alter concentrations within bone. The GALNT1 locus, which was also associated with serum sclerostin, expresses an enzyme that initiates Oglycosylation (34), and may act similarly to alter sclerostin levels by reducing sclerostin clearance.
In addition, we found that B4GALNT3 is expressed at relatively high levels in bone, particularly cortical bone, where sclerostin is also preferentially expressed, reflecting its production by osteocytes. Furthermore, the B4GALNT3 SNP did not appear to act as a trans-eQTL signal for SOST in bone. However, eQTL studies in bone had considerably less power than those in other tissues analysed (285 and 78 individuals for tibial artery and bone biopsy studies respectively).
Genetic variation within the SOST gene appeared to have relatively weak associations with serum sclerostin, including SOST SNPs previously reported to be associated with BMD or hip fracture. This suggests that previously identified SOST SNPs influence BMD and hip fracture by altering local sclerostin activity in bone, independently of circulating sclerostin levels. That said, further studies with a larger sample size are needed to establish the contribution of genetic variation within the SOST locus to serum sclerostin. In terms of other genetic influences on serum sclerostin, the association between TNFRSF11B and serum sclerostin was just below our threshold. The latter contains a well-established BMD locus (13), and the protein product, OPG, plays a major role in regulating bone resorption (35).
However, against the suggestion that altered OPG production mediates this genetic association, eQTL studies in heart tissue revealed that rs1485303 (the most strongly associated SNP at this locus) is strongly associated with expression of the adjacent gene,

COLEC10.
Whereas MR analyses using B4GLANT3 and GALNT1 as genetic instruments supported a  (29). One explanation for this latter relationship is that individuals with higher BMD have a relatively large amount of bone tissue, and therefore, produce more sclerostin as result of having greater numbers of osteocytes. Additionally, these causal relationships may represent components of a regulatory feedback pathway, such that greater bone formation leading to increased BMD stimulates sclerostin production, which then feed backs to reduce bone formation and hence limit BMD gains.

Strengths and weaknesses
This study represents the first GWAS of serum sclerostin. We identified two loci, and one further suggestive locus, which were previously unknown to be related to serum sclerostin, including B4GALNT3, encoding a galactosaminyl transferase, which had a relatively strong association for a common variant (i.e. 0.2SD change in serum sclerostin per allele). In terms of weaknesses, the total study sample of around 11,000 was relatively small. In addition, different methods were used to measure sclerostin in the participating cohorts; as well as different ELISAs, the MANOLIS cohort used the OLINK proteomics platform. Nonetheless, similar genetic associations were observed across all four cohorts, suggesting different sclerostin measurement methods are unlikely to have importantly impacted on our results.

Conclusions
Having performed a GWAS meta-analysis for serum sclerostin, we identified two genome wide significant loci encoding glycosylation enzymes, namely B4GALNT3 and GALNT1.
Genetic association signals for sclerostin at both loci co-localised with BMD, raising the possibility that variation in sclerostin glycosylation represents a hitherto unrecognised determinant of osteoporosis risk. Given the interest in sclerostin as a target for new osteoporosis treatments (2)(3), and as a genetic basis for skeletal dysplasia (37)(38), understanding the contribution of glycosylation to sclerostin activity may yield new opportunities for medical application.

The Avon Longitudinal Study of Parents and Children (ALSPAC)
Cohort details ALSPAC is a prospective birth cohort which recruited pregnant women with expected delivery dates between April 1991 and December 1992 from Bristol UK. The initial number of pregnancies enrolled is 14,541 (for these at least one questionnaire has been returned or a "Children in Focus" clinic had been attended by 19/07/1999 protocol, the standard range for the array kit was 0 -240 pmol/l and the lower detection limit was 3.2 pmol/l. Each kit was run with an internal quality control. All samples were run in duplicate and if the sclerostin measurement differed more than 20% between the duplicates, we removed the individual from further analysis. Outliers four SDs away from the mean value were excluded. After these exclusions, 8772 individuals with linked genetic and demographic data were available for analysis (see Table 1).

GOOD Cohort
Cohort details The Gothenburg Osteoporosis and Obesity Determinants (GOOD) study was initiated to determine both environmental and genetic factors involved in the regulation of bone and fat mass (42). Male study subjects were randomly identified in the greater

Conditional analysis
To detect multiple independent association signals at each of the genome-wide significant sclerostin loci, we carried out an approximate conditional and joint genome-wide association analysis using the software package GCTA-COJO (52). SNPs with high collinearity (Correlation R2 > 0.9) were ignored, and those situated more than 10 Mb away were assumed to be in complete linkage equilibrium. A reference sample of 8890 unrelated individuals of from ALSPAC mothers was used to model patterns of LD between variants. The reference genotyping data set consisted of the same 5 million variants assessed in our GWAS.
Conditionally independent variants that reached GWAS significance were annotated to the physically closest gene with the Hg19 Gene range list available in dbSNP (https://www.ncbi.nlm.nih.gov/SNP/).

Estimation of SNP heritability using Linkage disequilibrium (LD) score regression
To estimate the amount of genomic inflation in the data due to residual population stratification, cryptic relatedness and other latent sources of bias, we used LD score regression (11). LD scores were calculated for all high-quality SNPs (i.e., INFO score > 0.9 and MAF > 0.1%) from the meta-analysis data set consisting of 10,584 individuals from the 4 cohorts. We further quantified the overall SNP-based heritability's with LD score regression using a subset of 1.2 million HapMap SNPs.

Estimation of genetic correlations using LD Hub
To estimate the genetic correlation between sclerostin and bone phenotypes related to osteoporosis, we used a recent method based on LD score regression as implemented in the online web utility LD Hub (16). This method uses the cross-products of summary test statistics from two GWASs and regresses them against a measure of how much variation each SNP tags (its LD score). Variants with high LD scores are more likely to contain more true signals and thus provide a greater chance of overlap with genuine signals between GWASs. The LD score regression method uses summary statistics from the GWAS meta-analysis of sclerostin and the bone phenotypes, calculates the cross-product of test statistics at each SNP, and then regresses the cross-product on the LD score.

Mendelian randomization of sclerostin on bone phenotypes
We undertook two sample MR (53) to evaluate evidence of a causal relationship between sclerostin measured in plasma and bone phenotypes. In this initial analysis, sclerostin was treated as the exposure and bone traits as the outcomes, using sclerostin associated SNPs as the instrumental variables. The multiple testing p-value threshold was calculated as p = 0.05 divided by the derived number of independent tests. We used the random effect inverse variance weighted (IVW) (53) and Wald ratio (54) methods to obtain MR effect estimates.
Heterogeneity analysis of the instruments were conducted using Cochran Q test. Results were plotted as forest plots using code derived from the ggplot2 package in R (55). All MR analysis were conducted using the MR-Base TwoSampleMR R package (github.com/MRCIEU/TwoSampleMR, (56).

Genetic colocalization analysis between sclerostin and bone phenotypes
We used a stringent Bayesian model (coloc) to estimate the posterior probability (PP) of each genomic locus containing a single variant affecting both serum sclerostin level and bone phenotypes (57). A lack of evidence (i.e. PP < 80%) in this analysis would suggest that one of the causal variants for protein is simply in linkage disequilibrium (LD) with the putative causal variant for the trait (thus introducing genomic confounding into the association between sclerostin level and bone phenotypes). We treated colocalised findings (PP>=80%) as "Colocalised", and other findings that did not pass colocalization as "Not colocalised".

Bidirectional MR
To investigate the possibility that BMD causally affects levels of serum sclerostin, we used summary results data from 49 conditionally independent autosomal variants reported in a femoral neck BMD GWAS using 32,744 GEFOS individuals (17). We looked up these variants in summary results GWAS data on sclerostin from our meta-analysis and found 32 instruments for femoral neck BMD and 27 instruments for lumbar spine BMD. When applying bidirectional MR, we assume that SNPs used to proxy BMD exert their primary association on BMD, and that any correlation with sclerostin levels is a consequence of a causal effect of BMD on sclerostin. However, if in reality sclerostin levels exert a causal effect on BMD, then it is possible that some sclerostin SNPs might also reach genome-wide 0 significance in a large GWAS of BMD. These sclerostin SNPs could then mistakenly be used as instruments for BMD, when in fact they should be used as instruments for sclerostin levels.
This was the case in the GEFOS and UKBB studies as some of the BMD associated SNPs were also strongly associated with sclerostin (i.e. at genome-wide levels of significance). We therefore applied MR Steiger filtering (20) as implemented in the TwoSampleMR R package (56)

Functional annotation
Predicted regulatory elements of the top association signals

ATAC-seq lookup in the top association regions
The Assay for Transposase Accessible Chromatin with high-throughput sequencing (ATACseq) is a method for mapping chromatin accessibility genome-wide (25). We intersected the sclerostin associated SNPs (or proxy SNPs with LD r 2 > 0.8) with ATAC-seq data generated from the proximal and distal femur of an E15.5 mouse (24) (25) lifted over to the orthologous positions in human genome build hg19).

Gene expression quantitative trait loci (eQTLs) lookups for sclerostin signals
We investigated whether the SNPs influencing serum sclerostin level were driven by cisacting effects on transcription by evaluating the overlap between the sclerostin associated SNPs and eQTLs within 500kb of the gene identified, using data derived from all tissue types from the GTEx consortium v7 (27). Evidence of eQTL association was defined as P < 1x10 -4 and evidence of overlap of signal was defined as high LD (r2 osteoblast cells (28), and cells derived from iliac crest bone from 78 postmenopausal women (29). In the latter, mRNA was quantified both by Affymetrix RNA chip and RNA-seq. RNA-Seq analyses were evaluating using a pipeline previously described (58)  mice by sequential digestion as described previously (59)   Regional association plots and ENCODE annotation of the loci that reached or marginally reached genome-wide significance (P < 5 × 10 −8 ) in the meta-analysis. The X-axis indicates the physical position of each SNP on the chromosome specified, whereas the Y-axis denotes the evidence of association shown as −log(P-value). A) the B4GLANT3 region; 2) the GALNT1 region and 3) the TNFRSF11B region.

Figure 2
Forest plot of putative causal relationship between serum sclerostin and bone phenotypes using Mendelian randomization. X-axis represents the causal estimates and 95% confidence intervals of SD change in BMD/eBMD and OR for fracture, per SD increase in sclerostin, as calculated by inverse variance weighted method. Y-axis lists the four bone phenotypes used in the MR analysis.