Test–Retest Reliability and Correlates of Vertebral Bone Marrow Lipid Composition by Lipidomics Among Children With Varying Degrees of Bone Fragility

ABSTRACT The reliability of lipidomics, an approach to identify the presence and interactions of lipids, to analyze the bone marrow lipid composition among pediatric populations with bone fragility is unknown. The objective of this study was to assess the test–retest reliability, standard error of measurement (SEM), and the minimal detectable change (MDC) of vertebral bone marrow lipid composition determined by targeted lipidomics among children with varying degrees of bone fragility undergoing routine orthopedic surgery. Children aged 10 to 19 years, with a confirmed diagnosis of adolescent idiopathic scoliosis (n = 13) or neuromuscular scoliosis and cerebral palsy (n = 3), undergoing posterior spinal fusion surgery at our institution were included in this study. Transpedicular vertebral body bone marrow samples were taken from thoracic vertebrae (T11, T12) or lumbar vertebrae (L1 to L4). Lipid composition was assessed via targeted lipidomics and all samples were analyzed in the same batch. Lipid composition measures were examined as the saturated, monounsaturated, and polyunsaturated index and as individual fatty acids. Relative and absolute test–retest reliability was assessed using the intraclass correlation coefficient (ICC), SEM, and MDC. Associations between demographics and index measures were explored. The ICC, SEM, and MDC were 0.81 (95% CI, 0.55–0.93), 1.6%, and 4.3%, respectively, for the saturated index, 0.66 (95% CI, 0.25–0.87), 3.5%, and 9.7%, respectively, for the monounsaturated index, and 0.60 (95% CI, 0.17–0.84), 3.6%, and 9.9%, respectively, for the polyunsaturated index. For the individual fatty acids, the ICC showed a considerable range from 0.04 (22:2n‐6) to 0.97 (18:3n‐3). Age was positively correlated with the saturated index ($r^2$ = 0.36; p = 0.014) and negatively correlated with the polyunsaturated index ($r^2$ = 0.26; p = 0.043); there was no difference in index measures by sex (p > 0.58). 
The test–retest reliability was moderate‐to‐good for index measures and poor to excellent for individual fatty acids; this information can be used to power research studies and identify measures for clinical or research monitoring. © 2020 The Authors. JBMR Plus published by Wiley Periodicals LLC on behalf of American Society for Bone and Mineral Research.

mineralization, and survival, (20,22,23) and inhibit osteoclast differentiation and function. (24,25) Given the association with bone fragility, (26,27) the bone marrow saturated index, which is the proportion of saturated fatty acids to total fatty acids, has been suggested as a biomarker for osteoporosis. (28) To date, many of the studies examining the interplay between bone and bone marrow adipose tissue have focused on adults, animal models, or in vitro experiments. (13)(14)(15)(16)(17)(18) Growth and development is an essential window for maximizing bone health. Factors that impede typical bone acquisition during growth and development will not only diminish peak bone mass attainment, but can have lasting adverse ramifications on bone health throughout the lifespan. Accordingly, recent attention is shining the light on bone marrow adipose tissue as a potential untapped source involved in bone acquisition during development, especially for pediatric patients with varying degrees of bone fragility, (18,29) such as mild osteoporosis noted in the spine in adolescent idiopathic scoliosis (AIS) and the profound fragility in cerebral palsy (CP). (30)(31)(32) Individuals with AIS and CP can have progressive scoliosis and may undergo spinal fusion surgery for correction of scoliosis. This provides a unique resource for acquiring pediatric bone and marrow samples to facilitate therapeutic and mechanistic research in the bone marrow adipose tissue arena with direct clinical relevance.
Lipidomics is a branch of metabolomics that seeks to identify the presence, abundance, diversity, and interactions of lipids in biological systems, and is a rapidly developing discipline making significant biomedical advances in the area of lipid biology. (33) Using lipidomics to assess bone marrow lipid composition from children with AIS and CP is particularly attractive as it presents the opportunity to comprehensively analyze the bone marrow lipidome to drive research directions and address specific biological questions. To date, the reliability of lipidomics to analyze the bone marrow lipid composition among pediatric populations with bone fragility has yet to be determined. This is particularly important for bone marrow as this tissue has a highly heterogeneous cellular pool with unknown cellular and tissue-type distribution for pediatric populations with bone fragility. Therefore, to help interpret bone marrow lipid composition measures in the clinical and research setting and to provide novel information to help power research studies where pediatric bone marrow lipid composition is a measure of interest, reliability assessment is imperative. Accordingly, the purpose of this study was to assess the test-retest reliability, SEM, and the minimal detectable change (MDC) of vertebral bone marrow lipid composition determined by targeted lipidomics among children with AIS and CP following routine orthopedic surgical care.

Participants
With institutional review board approval, participants aged 10 to 19 years, with a confirmed diagnosis of AIS or CP, and who were undergoing routine posterior spinal fusion surgery in the same pediatric orthopedic clinic at the University of Michigan were eligible for this study. Parental consent and, where appropriate, patient assent was obtained to use the transpedicular vertebral-body bone-marrow samples collected from surgery, and to obtain diagnosis and basic demographic information, including date of birth, sex, and race for all participants. Additional information was collected from children with CP, including type of CP and gross motor function classification system (GMFCS) ranking. The GMFCS ranks the severity of motor impairment on a Roman numeral scale from I to V, coinciding with mild to severe motor impairment, respectively. (34) Bone marrow samples were obtained from children with spastic-type CP and classified as GMFCS IV and V.

Bone marrow sample extraction
Bone marrow was extracted from the vertebral body via the pedicle in the lumbar and lower thoracic spine, based on the planned spinal fusion instrumentation construct, using an 18-gauge needle and syringe. Bone marrow aspirates were immediately placed on wet ice and kept cold until processing. Samples for lipidomics were divided into 1-mL aliquots and stored at −80 °C until analysis. Of the 34 children who met eligibility criteria for the study and had at least 1 mL of vertebral bone marrow, 16 children had two 1-mL aliquots of bone marrow collected from the same site (vertebrae T11 to L4) during the same surgery and by the same physician (one of two: co-author MSC, n = 15; co-author YL, n = 1), and were analyzed in this study. These two 1-mL bone marrow aliquots are referred to as sample 1 and sample 2 hereafter. Because bone marrow tissue is heterogeneous, the research question centers on the reliability of extracting bone marrow during routine orthopedic surgery and how much variability exists in adjacent marrow tissue. Therefore, the goal of this study was to assess the reliability of different segments of the extracted marrow for lipidomics, rather than the precision or repeatability of the lipidomics technique using the same exact sample (eg, intrasample variability).

Lipidomics
Samples were sent to the Michigan Nutrition Obesity Research Center Lipidomics Core (Ann Arbor, MI, USA) for targeted lipidomics, which was performed by an experienced technician using established procedures and the Total Lipid Extraction and Thin-Layer Chromatography (Merck, Germany) Cleanup assay. Briefly, for each bone marrow sample, lipids were extracted using a modified Bligh-Dyer method of solvent partition. (35) An aliquot of 100 μL was taken and 200 μL of water was added. Total lipids were extracted after adding 2.25 mL of chloroform-methanol (1:2) containing 0.01% butylated hydroxytoluene and 10 μL of 4 mM heptadecanoic acid (C17:0) as an internal standard. The mixtures were thoroughly homogenized on a vortex, treated with 0.75 mL of chloroform and 0.75 mL of NaCl (0.9%) solution, and then mixed and centrifuged on a table-top centrifuge at 3,000 rpm for 15 min, after which a clear separation between the two layers was observed. The lower organic layer (chloroform) contained the lipids and was transferred into another set of tubes and saved at −20 °C until further analysis.
The fatty acids were broken down into their methyl esters via transesterification with BF₃-methanol using a modified method, as previously described. (36) The fatty acid methyl esters (FAMEs) were extracted by adding 2 mL of hexane and 1 mL of water, then mixing and centrifuging, and were collected in the upper hexane layers. The solvents were then removed under nitrogen and the crude FAMEs were redissolved in a small volume of chloroform and purified on a thin-layer chromatographic plate (20 × 20 cm, silica gel 60; Merck, Darmstadt, Germany). The plate was developed with a solvent mixture of hexane-diethyl ether-acetic acid. Methyl ester bands were then identified by comparing the retention flow of the authentic standard and the contents from the thin-layer chromatographic powders, and were extracted with chloroform. The solvents were removed under nitrogen, the methyl esters were redissolved in a small volume of hexane (100 to 200 μL), and the fatty acid compositions of the lipids were analyzed by gas chromatography.
Analyses of FAMEs were performed by gas chromatography on an Agilent Gas Chromatography model 6890 N (Agilent Technologies, Santa Clara, CA, USA) with a flame-ionization detector, an auto sampler, and ChemStation software (Agilent) for analysis. The gas chromatography column used was Agilent HP 88 of 30 m, with 0.25-mm id and 0.20-μm film thickness. Hydrogen was used as a carrier gas as well as for the flame-ionization detector. Nitrogen was used as a makeup gas. Analyses were carried out with a temperature program of 125 °C to 220 °C. One μL of sample was injected by the auto sampler, and each sample was analyzed in 20 min. A calibration curve was prepared by running known amounts of methyl heptadecanoate and other commercially available standard methyl ester mixtures containing saturated and unsaturated carbon chain lengths from 12 through 24 carbons on gas chromatography, and using the peak area ratio response of each methyl ester with respect to methyl heptadecanoate. A mixture of authentic methyl esters was also run side-by-side to identify the components in unknown samples by comparing their retention times. The fatty acids were quantified with respect to the amounts of C17:0 internal standard added and the calibration curve prepared. The coefficient of variation for gas chromatography analyses was found to be within 2.3% to 3.7%.
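The internal-standard quantification described above can be illustrated with a minimal sketch. It assumes one response factor per fatty acid, fitted as a least-squares slope through the origin of peak-area ratio against amount ratio; the study does not report the form of its calibration fit, and all function names here are hypothetical.

```python
def internal_standard_nmol(volume_ul, conc_mm):
    """Amount of internal standard added: uL x mM (mmol/L) gives nmol directly."""
    return volume_ul * conc_mm


def response_factor(area_ratios, amount_ratios):
    """Slope through the origin of (analyte area / IS area) vs (analyte amount / IS amount).

    Assumption: a linear, zero-intercept calibration; a real calibration may differ.
    """
    numerator = sum(a * m for a, m in zip(area_ratios, amount_ratios))
    denominator = sum(m * m for m in amount_ratios)
    return numerator / denominator


def quantify_nmol(peak_area, is_peak_area, rf, is_nmol):
    """Analyte amount from its peak-area ratio to the internal standard."""
    return (peak_area / is_peak_area) / rf * is_nmol


# 10 uL of 4 mM C17:0, as stated in the extraction protocol
is_nmol = internal_standard_nmol(10, 4)
# Illustrative calibration points (made-up values, not study data)
rf = response_factor([0.5, 1.0, 2.0], [0.5, 1.0, 2.0])
amount = quantify_nmol(2000.0, 1000.0, rf, is_nmol)
```

With 10 μL of 4 mM heptadecanoic acid, each sample carries 40 nmol of internal standard, so a peak twice the area of the C17:0 peak corresponds to 80 nmol when the response factor is 1.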
Measures were converted to relative abundance (%) for each fatty acid and the saturated, monounsaturated, and polyunsaturated indices were calculated as the proportion of the fatty acid type (eg, saturated) to the total fatty acid amount. The nomenclature used for individual fatty acids included the number of carbons, the number of double bonds, and, for unsaturated fatty acids, the position of the first double bond counted from the omega end and the cis (c) or trans (t) configuration. For example, stearic acid, 18:0, is a saturated fatty acid with 18 carbons and 0 double bonds; docosahexaenoic acid, 22:6n-3, is a polyunsaturated fatty acid with 22 carbons, six double bonds, and the first double bond at the third carbon from the omega end.
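The index calculation can be sketched as follows. This is an illustration only: the function names and the example amounts are hypothetical, and the parser assumes the shorthand always begins with `carbons:double_bonds` (eg, `18:0`, `22:6n-3`).

```python
import re


def classify_fatty_acid(name):
    """Classify by the number of double bonds in shorthand such as '18:1n-9'."""
    match = re.match(r"(\d+):(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized fatty acid shorthand: {name!r}")
    double_bonds = int(match.group(2))
    if double_bonds == 0:
        return "saturated"
    if double_bonds == 1:
        return "monounsaturated"
    return "polyunsaturated"


def lipid_indices(amounts):
    """Return each index as a percentage of the total fatty acid amount."""
    totals = {"saturated": 0.0, "monounsaturated": 0.0, "polyunsaturated": 0.0}
    for name, amount in amounts.items():
        totals[classify_fatty_acid(name)] += amount
    grand_total = sum(amounts.values())
    return {kind: 100.0 * value / grand_total for kind, value in totals.items()}


# Made-up amounts for illustration (not study data)
example = {"16:0": 30.0, "18:0": 10.0, "18:1n-9": 35.0, "18:2n-6": 20.0, "22:6n-3": 5.0}
indices = lipid_indices(example)
```

Because the three indices are proportions of the same total, they sum to 100%, so a rise in one index necessarily lowers at least one of the others.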

Statistical analysis
Test-retest reliability was assessed using relative and absolute estimates, including the intraclass correlation coefficient (ICC; relative estimate), SEM (absolute estimate), and MDC (absolute estimate). ICC reflects the degree of correlation and agreement between measures, which is a superior method for test-retest reliability assessment. (37) Following guidelines for selecting and reporting ICC for test-retest reliability, (37) the ICC and 95% CI were estimated using SPSS statistical package version 24 (SPSS Inc, Chicago, IL, USA) based on a two-way mixed-effects model and absolute agreement. As a general rule, ICC <0.50 indicates poor reliability, ICC between 0.50 and 0.75 indicates moderate reliability, ICC between 0.75 and 0.90 indicates good reliability, and ICC >0.90 indicates excellent reliability. (37) The SEM provides an estimate of the discrepancy between repeated measures or intraindividual variability, (38) and was calculated as follows: $\mathrm{SEM} = \mathrm{SD}_{\mathrm{pooled}} \times \sqrt{1 - \mathrm{ICC}}$, where $\mathrm{SD}_{\mathrm{pooled}}$ is the pooled SD of both sets of measures. The MDC provides an estimate of the minimal magnitude of change that is required for the measure to be considered a real change rather than the result of random variation or measurement error at the 95% CI level. (39,40) The MDC was calculated as follows (41): $\mathrm{MDC}_{95} = 1.96 \times \mathrm{SEM} \times \sqrt{2}$, where 1.96 comes from the z-score corresponding to the 95% CI and $\sqrt{2}$ accounts for the underlying uncertainty during measurement.
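The three reliability estimates can be sketched in a few lines. The study used SPSS; the code below is an independent illustration that assumes the single-measure form of the two-way, absolute-agreement ICC and an equal-n pooled SD, neither of which is stated explicitly in the text. All function names are hypothetical.

```python
import math
import statistics


def icc_absolute_agreement(sample1, sample2):
    """Single-measure, two-way, absolute-agreement ICC for two repeated measures."""
    n, k = len(sample1), 2
    grand = (sum(sample1) + sum(sample2)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(sample1, sample2)]
    col_means = [sum(sample1) / n, sum(sample2) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between samples 1 and 2
    ss_total = sum((v - grand) ** 2 for v in list(sample1) + list(sample2))
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def pooled_sd(sample1, sample2):
    """Pooled SD of two equally sized sets of measures (assumed definition)."""
    return math.sqrt((statistics.variance(sample1) + statistics.variance(sample2)) / 2)


def sem(sd_pooled, icc):
    """SEM = SD_pooled x sqrt(1 - ICC)."""
    return sd_pooled * math.sqrt(1 - icc)


def mdc95(sem_value):
    """MDC at the 95% level = 1.96 x SEM x sqrt(2)."""
    return 1.96 * sem_value * math.sqrt(2)


def classify_icc(icc):
    """Apply the general rule cited in the text; boundary handling is an assumption."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"
```

As a rough check against the reported values, entering the rounded SEM of 1.6% for the saturated index gives an MDC of about 4.4%, close to the reported 4.3%; the small gap is presumably a consequence of rounding the SEM before the calculation.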

Sensitivity analysis
The reliability estimates are vulnerable to outliers and influential observations because of the small sample size. Outliers and influential observations were assessed using agreement and correlation methods for each lipid composition measure. Bland-Altman plots were constructed to evaluate the agreement between the two samples by plotting the difference between sample 1 and sample 2 measures against the mean of the two sample measures. (42) The mean of the difference between bone marrow samples provides an estimate of the measure's fixed bias, which was tested statistically using a one-sample t test. Normality of the mean difference was assessed using skewness, kurtosis, and the Shapiro-Wilk test.
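A minimal sketch of the agreement quantities behind a Bland-Altman plot is given below. The 95% limits of agreement are the conventional addition to such plots and are not reported in this study; the function name and example values are hypothetical. Only the t statistic is computed here, since the p value requires a t distribution (available, for example, through `scipy.stats.ttest_1samp`).

```python
import math
import statistics


def bland_altman_summary(sample1, sample2):
    """Fixed bias, limits of agreement, and one-sample t statistic against zero."""
    differences = [a - b for a, b in zip(sample1, sample2)]
    means = [(a + b) / 2 for a, b in zip(sample1, sample2)]  # x-axis of the plot
    n = len(differences)
    bias = statistics.mean(differences)       # estimate of fixed bias
    sd_diff = statistics.stdev(differences)   # SD of the paired differences
    return {
        "means": means,
        "differences": differences,
        "bias": bias,
        "limits_of_agreement": (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff),
        "t_statistic": bias / (sd_diff / math.sqrt(n)),
        "degrees_of_freedom": n - 1,
    }


# Made-up paired measures for illustration (not study data)
summary = bland_altman_summary([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 4.0, 4.0])
```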
Correlations between lipid composition measures from the two bone marrow samples were visually inspected using scatter plots. Outliers were assessed by Cook's distance ($D_i$), which identifies the strength of the influence of each data point (via residuals and leverage) on the regression between measures (the higher the $D_i$ value, the more influential the data point). (43) Although a cutoff threshold of $D_i$ > 1.0 has been proposed in the presence of a large sample size, (44) we used a more conservative threshold of $D_i$ > 0.50 because of the small sample size for this study. It is important to note that ICC reflects both the agreement and correlation. (37) The use of Bland-Altman (agreement) and $D_i$ (correlation) as model diagnostics assesses each, but not both simultaneously. Nevertheless, for lipid composition measures that displayed potential outliers or had data points with $D_i$ > 0.50 (influential observations), ICC was reassessed after removing the data point(s) to determine if these values influenced the reliability estimates (assessed qualitatively).
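Cook's distance for a simple linear regression can be sketched as below. The sketch assumes sample 2 is regressed on sample 1 with an intercept, which the text does not specify; the function names and the example data are hypothetical.

```python
def cooks_distance(x, y):
    """Cook's distance for each point in a simple linear regression of y on x."""
    n, p = len(x), 2  # p = number of fitted parameters (intercept and slope)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance
    if s2 == 0:
        return [0.0] * n  # perfect fit: no point is influential
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [(e * e / (p * s2)) * h / (1 - h) ** 2 for e, h in zip(residuals, leverage)]


def influential_points(x, y, threshold=0.50):
    """Indices whose Cook's distance exceeds the threshold used in this study."""
    return [i for i, d in enumerate(cooks_distance(x, y)) if d > threshold]


# Made-up data with one clearly discrepant final pair (not study data)
distances = cooks_distance([1, 2, 3, 4, 5], [1, 2, 3, 4, 10])
flagged = influential_points([1, 2, 3, 4, 5], [1, 2, 3, 4, 10])
```

In this toy example the discrepant last pair also pulls the fitted line away from the first pair, so both are flagged at the 0.50 threshold. This illustrates why flagged points were treated as potential, not confirmed, outliers and why the ICC was reassessed after their removal.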
The mean (± SD), ICC, SEM, and MDC of the lipid composition measures from the 16 bone marrow samples are presented in Table 2, which is organized based on index versus individual fatty acids, type of fatty acid (eg, saturated), and the ICC point estimate from high to low. Based on the average of the first measurement (average of second measurement), 31.1% (31.4%) of the bone marrow lipids consisted of saturated fatty acids, while 33.9% (34.6%) and 34.9% (34.0%) consisted of monounsaturated and polyunsaturated fatty acids, respectively. 18:2n-6, 18:1n-9, and 16:0 were the most abundant fatty acids and 13:0, 12:0, and 22:2n-6 were the least abundant fatty acids.
For the index measures, the ICC was 0.81 (95% CI, 0.55-0.93) for the saturated index, suggesting good (moderate-to-excellent) reliability, and 0.66 (95% CI, 0.25-0.87) and 0.60 (95% CI, 0.17-0.84) for the monounsaturated and polyunsaturated indices, respectively, suggesting moderate (poor-to-good) reliability. The SEM was 1.6%, 3.5%, and 3.6% for the saturated, monounsaturated, and polyunsaturated index, respectively, and the MDC was 4.3%, 9.7%, and 9.9%, respectively. Figure 1 shows the scatter plots demonstrating the relationship between the index measures (Fig. 1A-C) and Bland-Altman plots for the agreement (Fig. 1D-F) between bone marrow samples for the index measures. Based on the Bland-Altman plots, the average bias was minimal (near zero) for each index measure, and there was no evidence of fixed bias (all p > 0.45 using a one-sample t test). The relative abundance of 13:0 was too low to detect and assess reliability. For all other fatty acids, the ICC exhibited a considerable range from 0.04 (95% CI, −0.49 to 0.52) to 0.97 (95% CI, 0.92-0.99). Based on the ICC point estimate of the 29 fatty acids where reliability estimates were obtained (ie, excluding 13:0), 2 had excellent reliability (6.9%), 7 had good reliability (24.1%), 12 had moderate reliability (41.4%), and 8 had poor reliability (27.6%). Figure 2 shows the scatter plots (Fig. 2A-C) and Bland-Altman plots (Fig. 2D-F) of three fatty acids corresponding to the lowest (22:2n-6), middle (14:0), and highest (18:3n-3) ICC to represent the range of correlation and agreement among individual fatty acids. Based on the Bland-Altman plots, the average bias for these three representative fatty acids was minimal (near zero) and there was no evidence of fixed bias (all p > 0.27 using a one-sample t test). However, the mean difference was not normally distributed for 22:2n-6 (p = 0.013) or 18:3n-3 (p < 0.001), and showed evidence of an outlier.

Sensitivity analysis
In addition to 22:2n-6 and 18:3n-3, there was evidence of an outlier or influential observation for 12 other individual fatty acids, but not for the index measures. However, only 1 or 2 out of the 16 data points suggested a potential outlier or influential observation per fatty acid.
The mean (± SD), ICC, SEM, and MDC for the 14 fatty acids after removing the potential outliers or influential observations are presented in Table 3. The ICC value increased in six of the fatty acids, decreased in six of the fatty acids, and did not change or exhibited negligible change in two of the fatty acids. Some of the ICC values changed more drastically than others, eg, ICC increased from 0.67

Exploratory analysis
Age was significantly and positively correlated with the saturated index (Fig. 3A, $r^2$ = 0.36; p = 0.014), not significantly correlated with the monounsaturated index (Fig. 3B), and significantly and negatively correlated with the polyunsaturated index (Fig. 3C, $r^2$ = 0.26, p = 0.043; results are presented for sample 1, but are consistent for both samples). There was no statistical difference between girls and boys for the saturated index ( 46) for this small sample (index results are presented for sample 1, but are consistent for both samples); however, all nine girls had AIS whereas three out of seven of the boys had CP, and bone marrow was mostly extracted from the thoracic spine for girls (n = 6) and from the lumbar spine for boys (n = 5).

Discussion
The findings from this study suggest moderate-to-good test-retest reliability for the vertebral bone marrow saturated and unsaturated index via targeted lipidomics from a small sample of children with AIS and CP using bone marrow collected under routine orthopedic surgical conditions. When we examined individual fatty acids, we observed a considerable range from poor-to-excellent test-retest reliability with approximately one third having good to excellent (34.5%), moderate (37.9%), or poor (27.6%) test-retest reliability. In the exploratory analysis, we found that age was positively correlated with the saturated index and negatively correlated with the polyunsaturated index. These findings have important logistical implications for future research and clinical endeavors because delineating the bone In the current study, we observed that the test-retest reliability of some fatty acids was sensitive to potential outliers or influential observations. Specifically, in the sensitivity analysis, we observed one or two potential outliers or influential observations for four saturated fatty acids, four monounsaturated fatty acids, and six polyunsaturated fatty acids exhibiting a range in relative abundance (<0.01% to 25.48%). It is important to note that these potential outliers or influential observations came from 10 different participants rather than the same or few participants, which may suggest the potential for considerable biological variability in bone marrow lipids between and within participants. Although bone marrow consists of multiple cellular lineages, the cellular and tissue heterogeneity may be even greater or unique for individuals with bone fragility, such as AIS and CP. Factors related to AIS and CP that may influence the status of bone marrow lipid composition, and its variability within a given site, include nutrition, hormonal factors, comorbidities (eg, cardiometabolic disease, inflammation), surgeries, and medications. 
(18,45,46) However, it is important to note that the variability may be attributed solely to or in conjunction with the small sample size; we therefore urge caution in interpretation. Further, the current study found that age during this adolescent period was associated with a higher bone marrow saturated index and a lower polyunsaturated index, but none of these indices was associated with sex. These findings are consistent with studies that found higher marrow fat with age in the distal femur of 11- to 18-year-olds with anorexia nervosa (47) and in L4 of typically developing newborns to 18-year-olds with no evidence of sex differences. (48) The reasons for the suboptimal reliability performance for some individual fatty acids are likely multifactorial. The relative abundance may be one of these reasons. Although not presented in the Results section, when the relationship between the relative abundance of sample 1 and the ICC value was examined for all 30 individual fatty acids, the $r^2$ was 0.01 (p = 0.530). However, when only fatty acids with less than 1% abundance were included (n = 22), there was a stronger and positive correlation, with an $r^2$ of 0.32 (p = 0.006). This may be because of logistical considerations for the fatty acids with very low abundance, such as detection sensitivity of the lipidomics technique or sample collection or processing methods that may elicit degradation of these fatty acids. For example, bone marrow fatty acid integrity may be impacted by collection methods (eg, needle extraction may damage cell membranes), storage temperature (eg, colder environments may limit enzyme activity and lipid degradation), duration of storage, and the number of freeze-thaw cycles, (49)(50)(51)(52)(53)(54)(55) especially for fatty acids with very low abundance. None of the samples in the current study underwent previous freeze-thaw cycles. 
Nevertheless, study findings have important implications for research and clinical investigation where biological effects of specific fatty acids from bone marrow may be of interest. Based on our findings using bone marrow collected during routine orthopedic surgical conditions, which may not be generalizable to other clinical populations or methods that are designed specifically for bone marrow extraction for analysis, we urge caution when assessing specific fatty acids with very low abundance from bone marrow based on the potential for poor reliability in measurement assessment.
We observed that the SEM was 1.6%, 3.5%, and 3.6% for the saturated, monounsaturated, and polyunsaturated index, respectively, and that the SEM ranged from <0.1% to 5.8% for the individual fatty acids. These observations indicate the range of values around the point estimate for each measure for intraindividual variability. (38) We observed that the MDC was 4.3%, 9.7%, and 9.9% for the saturated, monounsaturated, and polyunsaturated index, respectively, and that the MDC ranged from <0.1% to 16.0% for the individual fatty acids. These findings suggest that a change between repeated measurements larger than the MDC values could indicate a real change with 95% certainty among a similar sample. Given the considerable range of MDC, the degree of change needed to be considered a true change depends on the specific bone marrow lipid composition measure. These values can be used for power analysis and sample-size calculation for research studies. A further consideration for longitudinal assessment is that the absolute abundance of some individual fatty acids may remain the same, whereas others change drastically, thus impacting the relative abundance measure. How the interplay between relative and absolute measures impacts biological outcomes remains to be determined.
It is important to note that there are other approaches to examine the composition of bone marrow that are less invasive. For example, bone marrow fat content using magnetic resonance spectroscopy has been used as a biomarker to distinguish skeletal fragility and altered metabolic states, such as type 2 diabetes mellitus, (27,56) and standard MRI has been used to quantify the extent of bone marrow fat infiltration in pediatric populations. (30) A major advantage of in vivo imaging is, in general, the relatively high reproducibility, such as an ICC of 0.96 for bone marrow fat content in children with and without CP. (30) However, the in vivo imaging techniques that are currently available for human research lack more granular information that tissue extraction can provide, such as the degree of saturation and unsaturation using magnetic resonance spectroscopy (27,47,56) compared with quantifying individual fatty acids using tissue lipidomics.
A major strength of this study is that, to our knowledge, this is the first study to assess the bone marrow lipid composition in children with varying degrees of bone fragility, and the first study to assess the test-retest reliability of bone marrow lipid profiles among children and adolescents. Further, the interest in the bone marrow adipose field is growing, yet the acquisition of bone marrow tissue from children and adolescents for research is rare. Therefore, this work provides novel and fundamental insights about the interpretability of bone marrow lipidomics among pediatric populations with bone fragility, and the use of human tissue facilitates data acquisition with direct clinical relevance. Although meaningful patterns did emerge, one limitation of this study is the small sample size. However, a large sample size is not feasible because of the invasive nature of bone marrow acquisition. Nevertheless, the findings from this sample size have practical implications, as research studies that include clinical pediatric populations, such as AIS or CP, often have few participants because of logistical challenges, such as a low recruitment pool. Therefore, a sample size of 16 is not uncommon for such studies, especially for studies that require tissue extraction. Another limitation is the unknown site-specific effect on reliability measures as we collected tissue from T11 to L4. However, even if there are differences in bone marrow lipid profiles along the vertebral column, the potential variability is less for reliability assessment in this study considering the tissue samples were collected from the same site (as opposed to intersite), immediately adjacent to one another, and during the same surgery. In this study, site-specific variation of reliability would indicate vertebral differences in the degree of bone marrow tissue heterogeneity. 
Although it is well-documented that bone marrow tissue is highly heterogeneous in terms of a diverse cellular pool, we are unaware of any studies that have shown that the degree of heterogeneity differs by neighboring vertebral sites. Further, we examined the correlation and agreement by lumbar and thoracic sites (eg, all figures), which provides little evidence of site-specific effects. However, the sample size is small; therefore, we are unable to conclusively determine the presence or absence of site-specific effects on the reliability measures, SEM, and MDC. Finally, our exploratory analyses examined a small set of potential variables, which may be confounded by other age-related factors, such as height.
In conclusion, the test-retest reliability for vertebral bone marrow saturated and unsaturated index was moderate-to-good and poor-to-excellent for individual fatty acids, using targeted lipidomics from children with bone fragility in which bone marrow was collected during routine orthopedic surgery. Further, age, but not sex, was associated with a higher and lower saturated and polyunsaturated index, respectively, but the lack of sex difference may have been caused by differences in characteristics (eg, CP). Future studies are needed to determine the interplay among bone, marrow, physiology, physical function, and disease for pediatric populations with bone fragility, not just for reliability assessment of the bone marrow lipidome, but also for clinical management, intervention, and optimizing peak bone mass attainment for these skeletally vulnerable populations.

Disclosures
All authors state that they have no conflicts of interest.