A Statistical Approach Regarding the Diagnosis of Osteoporosis and Osteopenia From DXA: Are We Underdiagnosing Osteoporosis?

ABSTRACT Osteoporosis and osteopenia are diagnosed most commonly by evaluating the lowest T‐score of BMD measurements, typically taken at three sites: the L1‐L4 lumbar spine, femoral neck, and total hip. This study aimed to evaluate the effect of using all three BMD measurements and multivariate statistical theory to evaluate how the diagnoses of osteoporosis and osteopenia change in simulation studies and in real data. First, it was found that the T‐scores from these three BMD measurements rarely give concordant diagnoses using the same World Health Organization (WHO) and International Society for Clinical Densitometry (ISCD) guidelines, so that the diagnosis strongly depends on the BMD sites measured. Next, strong correlations were found between the BMD measurements at different sites within the same person, which resulted in increased congruence/concordance between the diagnoses obtained from the BMD T‐scores. Multivariate statistical theory was used to show that the joint distribution of the BMD T‐scores at different sites follows a multivariate t distribution and found that the marginal distribution of any BMD T‐score follows a univariate t distribution. Confidence ellipsoids were derived that are equivalent to the univariate WHO/ISCD thresholds for osteoporosis (T‐score ≤−2.5) and osteopenia (−2.5 < T‐score <−1). The study found that more patients are diagnosed with osteoporosis using the multivariate version of the WHO/ISCD guidelines rather than the current WHO/ISCD guidelines in both real data and simulation studies. Diagnoses of osteoporosis using the statistics derived method were also associated with higher FRAX (fracture risk assessment tool) probabilities of major osteoporotic (p = 0.001) and hip fractures (p = 2.2 × 10−6). In conclusion, this study shows that considering all three BMD T‐scores is potentially more informative than using the single lowest BMD T‐score. © 2020 The Authors. JBMR Plus published by Wiley Periodicals LLC. on behalf of American Society for Bone and Mineral Research.


B one Mineral Density (BMD) is influenced by several factors
including genetics, activity, nutrition, and medical comorbidities. (1,2,3) BMD naturally decreases after midadulthood and low BMD is associated with increased risk of fractures of the spine, forearm, and hip. (1,3) Hip fractures, for example, occur in approximately 400 of 100,000 people in the United States. (4) The one-year mortality following a hip fracture is approximately 21.9% for women and 32.5% for men. (4) Therefore, identifying and treating patients with low BMD, who are at risk for fractures, especially hip fractures, is important. Dual-energy X-ray absorptiometry (DXA) is the most commonly used screening imaging test for the diagnosis of low BMD. (5,6) The BMD is typically measured at three sites: the lumbar spine, specifically, the L1-L4 vertebral bodies; the femoral neck; and the total hip. (6) Osteoporosis in the elderly is diagnosed using the World Health Organization (WHO) and International Society for Clinical Densitometry (ISCD) guidelines when the BMD measured at any site is 2.5 standard deviations (SDs) or more below that of sex-matched young adults (lowest T-score ≤−2.5); osteopenia in the elderly is diagnosed when the BMD measured at any site is between 1 and 2.5 SDs below that of sex-matched young adults (−2.5 < lowest T-score <−1). (6,7,8) Thus, the WHO/ISCD method uses only one BMD measurement to diagnose osteopenia and osteoporosis.
These three BMD measurements (L1-L4 BMD, femoral neck BMD, and the total hip BMD) are likely correlated because they are measured within the same individual and they are measuring overall bone homeostasis within that individual. This correlation structure has been largely ignored in clinical practice. For example, a patient with L1-L4, femoral neck, and total hip BMD T-scores of −1.0, −1.0, and − 1.0 respectively, is normal under the current WHO clinical guidelines, whereas a patient with L1-L4, femoral neck, and total hip BMD T-scores of −1.1, 1.5, and 1.5 respectively, has osteopenia. (7,8) This results in patients with three borderline low-BMD measurements having a more severe diagnosis than patients with a single marginally low-BMD measurement.
The current WHO DXA BMD T-score threshold values for the diagnosis of osteoporosis were determined by finding the individual areal BMD T-scores that would give a proportion of individuals osteoporosis that matched the known prevalence of patients with osteoporosis while simultaneously matching the lifetime fracture risk. (7,8,9,10) Kanis and colleagues (7) noted that a BMD T-score threshold value of −2.5 identifies approximately 30% of postmenopausal women as having osteoporosis using BMD measurements, and this 30% is approximately equivalent to the lifetime fracture risk (39.7%) at these sites in postmenopausal women (Tables 1 and 2). This is why the threshold T-score of −2.5 was used to define osteoporosis. A prior report noted that this WHO/ISCD definition makes osteoporosis population-based/statistics-based rather than BMD-based. (11) The aim of this study was to use statistical theory to account for the correlations between BMD measurements to create joint thresholds using all three BMD measurements to categorize patients into being osteoporotic or osteopenic. Simulated and real data are used to show the potential change in the number of osteoporosis and osteopenia diagnoses when the multivariate method in which all three BMD measurements are used in the diagnosis of osteopenia and osteoporosis as compared with the current WHO/ISCD method in which only one BMD measurement is used for this diagnosis.

Statistical theory
First, we show that the joint distribution of the L1-L4 lumbar spine, femoral neck, and total hip BMD T-scores from DXA studies follow a multivariate T-distribution. (12)(13)(14)(15)(16) We calculate 3-dimensional confidence intervals (3D CIs) from the multivariate T-distribution to be used for thresholds to diagnose osteopenia and osteoporosis. We show that the marginal distribution of each BMD (L1-L4 lumbar spine, femoral neck, and total hip) remains a univariate T distribution. We then derive the conditional distribution of any single BMD T-score when the other two BMD T-scores are known (Appendix).

Simulated data
We simulated random samples of 1000 postmenopausal white women with DXA studies, evaluating the BMD of the L1-L4 lumbar spine, femoral neck, and total hip (see the Appendix for full details). We assumed that the BMD of the L1-L4 lumbar spine, the BMD of the femoral neck, and the BMD of the total hip were normally distributed.
We simulated the DXA data of 1000 postmenopausal white women from a trivariate normal distribution with mean, μ O ,and variance covariance matrix: where ρ 12 is the correlation between the L1-L4 BMD and the femoral neck BMD, ρ 13 is the correlation between the L1-L4 BMD and the total hip BMD, and ρ 23 is the correlation between the femoral neck BMD and the total hip BMD. σ O. 1 is the SD of the L1-L4 BMD, σ O. 2 is the SD of the femoral neck BMD, and σ O. 3 is the SD of the total hip BMD. The simulated L1-L4 BMD, femoral neck BMD, total hip BMD, and the mean and SDs of the young-adult population L1-L4 BMD, femoral neck BMD, and total hip BMD (available from the National Health and Nutrition Examination Survey III [NHANES III] cohort for the young-adult total hip and femoral neck BMD, and from GE Lunar Prodigy DXAs (10) for the young-adult L1-L4 BMD) were used to calculate the BMD Tscores in the simulated data. There were three pairwise correlations between the L1-L4 BMD, femoral neck BMD, and total hip BMD: (i) the correlation between the L1-L4 BMD and the femoral neck BMD (ρ 12 ), (ii) the correlation between the L1-L4 BMD and the total hip BMD (ρ 13 ), and (iii) the correlation between the femoral neck BMD and the total hip BMD (ρ 23 ). We considered five sets of values for these three pairwise correlations, ρ 12 , ρ 13 , and ρ 23 :

Congruence of simulated BMD measurements
We calculated the proportion of times all three simulated BMD measurements (L1-L4 BMD, femoral neck BMD, and total hip  (7) and Melton 3 rd et al. (9) (9) BMDs) were all congruent and were consistent with a diagnosis of osteoporosis, osteopenia, and normal BMD, respectively, using the WHO guidelines.

Diagnoses of osteoporosis and osteopenia in the simulated data
From the simulated data, we calculated the proportion of patients that would be diagnosed with osteoporosis, osteopenia, and normal BMD, respectively, using the WHO/ISCD guidelines, for the five different pairwise correlation structures examined. (17) We compared these proportions to the proportion of patients that would be diagnosed with osteoporosis, osteopenia, and normal BMD, respectively, using the statistics-based method.

Real data
This retrospective study was approved by the local institutional review board (IRB) and the need for signed informed consent from each patient was waived ( The L1-L4 BMD, femoral neck BMD, and total hip BMD, as well as the L1-L4 BMD T-scores, femoral neck BMD T-scores, and total hip BMD T-scores were obtained and recorded from a sample of 1000 white women 65 years of age who underwent routine screening DXA studies at a single tertiary-care academic medical center. Patient age, sex, height, weight, and BMD measurements (BMD and BMD T-scores), history of diabetes mellitus, history of adult fracture, and history of calcium/vitamin D supplementation at the time of the DXA study were recorded. We also recorded the patients who started pharmacologic treatment of osteoporosis.

Congruence of BMD T-score measurements
The study sample mean and SDs of the L1-L4 BMD, femoral neck BMD, and total hip BMD, as well as the mean and SDs of the L1-L4 BMD T-scores, femoral neck BMD T-scores, and total hip BMD Tscores were calculated. Next, we calculated the proportion of instances where all three BMD T-score measurements were congruent and were consistent with diagnoses of osteoporosis, osteopenia, and normal BMD, respectively, using the WHO/ISCD guidelines.

Diagnoses in the study sample
The prevalence of osteoporosis, osteopenia, and normal BMD among 65-year-old white women in the sample data was estimated using the WHO/ISCD guidelines. (17) Then, the prevalence of osteoporosis, osteopenia, and normal BMD among these 65-year-old white women in the sample data was estimated using the statistics-based guidelines.

Patients considered for osteoporotic therapy
The fracture risk assessment tool (FRAX) estimates the 10-year risk of major osteoporosis-related fractures and hip fractures based on clinical risk factors and BMD measurements obtained at a single site: the femoral neck. Currently, treatment of osteoporosis is considered for patients with osteoporosis based on the WHO guidelines (lowest T-score at any site ≤−2.5) and patients with osteopenia based on the WHO guidelines (lowest T-score at any site between −1.0 and −2.5) with a 10-year FRAX risk of major osteoporosis-related fracture of ≥20% or a 10-year FRAX risk of hip fracture ≥3%. We were asked to evaluate how the number/proportion of patients considered for osteoporotic treatment under the current guidelines would change using our statistics-based method utilizing all three T-scores. To answer this question, the 10-year FRAX risk major osteoporosis-related fracture and 10-year FRAX risk of hip fracture from the DXA output when performed was collected.
All test statistics were two-sided. R statistical software was used (R Foundation for Statistical Computing, Vienna, Austria; https:// www.r-project.org/). (18) The proportions of patients with each diagnosis (normal, osteopenia, and osteoporosis) calculated using the current WHO method and the statistics-based method were compared using McNemar's test. The type I error rate was set at 0.05, and p values <0.05 were considered statistics significant.

Congruence of BMD T-scores
The mean and SD of each BMD measurement were the same for each site measured, regardless of the correlation structure between BMD measurements at different sites. However, the congruence of the BMD T-score measurements was dependent on the correlation between BMD measurements at different sites. The congruence of the WHO/ISCD diagnoses using BMD T-score measurements increased with higher correlations between the BMD measurements at different sites (Table 3). We found that there was poor congruence between the diagnoses obtained from the BMD measurements at each site using the WHO/ISCD guidelines. Assuming a correlation r = 0 between BMD measurements at different sites, 0% (0 of 14) of patients had all three BMD measurements that were congruent for osteoporosis and 1.0% (4of 391) of patients had all three BMD measurements that were congruent for osteopenia. Assuming a correlation r = 0.8 between BMD measurements at different sites, 20% (2 of 10) of patients had all three BMD measurements that were congruent for osteoporosis and 21.2% (52 of 245) of patients had all three BMD measurements that were congruent for osteopenia (Table 3). This shows that the congruence of the diagnoses at each site was poor but increases with increasing correlation between the BMD measurements at different sites.

Diagnoses
The prevalence of osteoporosis using the WHO/ISCD guidelines decreased with higher correlations between the BMD measurements in the simulated data. We found that there was also an increase in the prevalence of patients with normal BMD using the WHO/ISCD guidelines when there were high correlations between the BMD measurements. However, when using the statistics-based method, we found that there was no change in the numbers/proportions of patients with osteoporosis and osteopenia regardless of the correlation structure between the BMD measurements (Table 3). Confidence ellipsoids for the diagnosis of osteoporosis were generated assuming the correlations    (Figure 1C), (0.6, 0.6, 0.6) ( Figure 1D), and (0.8, 0.8,0.8) (Figure 1E), respectively. Simulated data points are superimposed to show the clustering of these data points. The planes corresponding to T-score thresholds of −2.5 in each of the three axes are included to illustrate the difference between using the WHO/ISCD method and the statistics-based method. Figure 2 shows the confidence ellipsoids for the diagnosis of osteopenia with superimposed simulated data points assuming the correlations between BMD measurements (ρ12, ρ13, ρ23) are (0, 0, 0) (Figure 2A (Figure 2E), respectively. The planes corresponding to T-score thresholds of −1 in each of the three axes are included to illustrate the difference between using the WHO/ISCD guidelines and the statistics-based method.

Real data
Most patients (60.3%) were already taking a multivitamin/calcium/vitamin D supplement. There were 24 patients (2.4%) with a prior history of adult fracture before the DXA study and 71 patients (7.1%) with a history of diabetes mellitus. One hundred patients (10.0%) were started on pharmacologic therapy for osteoporosis after the DXA screening test.

Congruence of BMD T-scores
The BMD measurements were congruent at all three sites for 41.1% (411 of 1000) of the patients in the study sample. There were 26 patients (2.6%) who had a consistent diagnosis of osteoporosis using the WHO/ISCD guidelines using all three of their BMD measurements (L1-L4 BMD T-score, femoral neck BMD Tscore, and total hip BMD T-score), 18.3% (183 of 1000) of patients had a consistent WHO/ISCD diagnosis of osteopenia using all three of their BMD measurements, whereas 20.2% (202 of 1000) of patients had a consistent diagnosis of normal BMD using all three of their BMD measurements (Table 4). Of the patients with WHO/ISCD osteoporosis, 11.5% (26/226) had all three BMD measurements that were congruent; of the patients with WHO/ISCD osteopenia, 32.0% (183 of 572) had all three BMD measurements that were congruent.

Diagnoses
Of the study sample, 22.6% (226 of 1000) had osteoporosis and 57.2% (572 of 1000) had osteopenia using the WHO/ISCD guidelines. However, when using the statistics-based method, we found that approximately 373 patients would be diagnosed with osteoporosis, a 65% increase; and 468 patients would be diagnosed with osteopenia, a 19.2% decrease. Table 4 gives a summary of the clinical and demographic statistics for the 1000 65-year-old white women in the study. All patients with WHO/ISCD osteoporosis were diagnosed as having osteoporosis using the statistics-based method. We found that 79.8% of the sample were osteopenic/osteoporotic using WHO/ISCD guidelines, whereas 84.1% of the sample were osteopenic/osteoporotic using the statistics-based method. The greatest difference between the statistics-based method and the WHO/ISCD guidelines was that more patients would be classified as having osteoporosis under the statistics-based method.
McNemar's test showed that the statistics-based metric identified more patients with osteopenia/osteoporosis than the WHO/ISCD guidelines (χ 2 = 24.16, p < 8.8 × 10 −7 ) and the statistics-based metric identified more patients with osteoporosis than the WHO/ISCD guidelines (χ 2 = 145.01, p < 2.2 × 10 −16 ). Confidence ellipsoids for the diagnosis of osteopenia ( Figure 3A) and osteoporosis ( Figure 3B) with superimposed data points and WHO/ISCD guidelines using the real data. The planes corresponding to T-score thresholds of −2.5 and −1.0 in each of the three axes are included to illustrate the difference between using the WHO/ISCD guidelines and the statistics-based method on each respective plot.

Patients considered for osteoporotic therapy
Of the 572 patients with a WHO/ISCD diagnosis of osteopenia, 523 (91.4%) had 10-year FRAX risk scores. Of these 523 patients with osteopenia and FRAX risk scores, 57 had 10-year FRAX risk of major osteoporosis-related fracture of ≥20% or a 10-year FRAX risk of hip fracture ≥3%; of these 57 patients, 26 (45.6%) would have been classified as osteoporotic using the statistics-based method. Of the 523 patients with osteopenia, 133 (25.4%) would have been classified as osteoporotic using the statistics-based method. The patients who would have been classified as osteoporotic using the statisticsbased method had a significantly higher 10-year FRAX risk of major osteoporosis-related fracture (13.36%) than those who were not classified as osteoporotic using the statistics-based method (11.58%, p = 0.001). Similarly, the patients who would have been classified as osteoporotic using the statistics-based method had a significantly higher 10-year FRAX risk of hip fracture (1.96%) than those who were not classified as osteoporotic using the statisticsbased method (1.42%, p = 2.3 × 10 −6 ).

Discussion
DXA studies are routinely used for the measurement of BMD. BMD measurements are typically obtained at three sites: the lumbar spine (L1-L4 BMD), the femoral neck, and total hip. The current WHO/ISCD guidelines are based on the lowest BMD Tscore at any site, and the threshold BMD T-scores for the diagnosis of osteoporosis and osteopenia were determined based on the population prevalence of osteoporosis and osteopenia. Instead of using only the lowest BMD T-score for diagnosis, we considered using all three BMD T-scores in a joint BMD T-score threshold.
The T-score thresholds were constrained to −2.5 (for osteoporosis) and −1.0 (for osteopenia) when generating the confidence ellipsoids. Because of this, all patients diagnosed with osteoporosis using WHO/ISCD guidelines would be diagnosed with osteoporosis using the statistics-based guidelines. We found that the BMD measurements at different sites were highly correlated. The results of the simulated study suggest that 37 (370%) more patients would be diagnosed with osteoporosis, and 98 (40%) more patients would be diagnosed with osteopenia assuming the correlation between BMD measurements was 0.8, which is closest to what the correlation is likely to be in practice. Our findings suggest that there are individuals with osteoporosis from a statistical viewpoint, who are untreated because they do not satisfy current WHO/ISCD criteria for diagnosis. This potentially has significant clinical implications because the number of individuals eligible for clinical treatment could be much larger than currently thought. Further research is required to determine whether the patients, who would have been diagnosed as osteopenic or osteoporotic using the joint threshold BMD Tscore method but normal using the WHO/ISCD guidelines, would have had fewer fragility fractures if they were treated.
The T-scores at all three sites do not always give the same WHO/ISCD diagnosis, and because only the lowest T-score is used, there is information in the other two discarded T-scores that may be clinically relevant. In clinical practice, a patient with L1-L4, femoral neck, and total hip BMD T-scores of −1.5, 1.5, and 1.5, respectively, on their first study and on a subsequent study has BMD T-scores of −1.5, −1.4, −1.4, has the same diagnosis on both studies despite the changes in the femoral neck and total hip BMD T-scores because only the lowest T-score is considered. Therefore, there is merit in using all three BMD T-scores. The multivariate model can be extended to include the forearm BMD T-scores or any combination of one or more T-scores, including trabecular bone T-scores.
Although the FRAX scores are based on clinical data and only a single BMD measurement, we showed that there was a strong association between FRAX scores and the statistics-based method for the diagnosis of osteoporosis. This shows that all BMD measurements are more predictive of fracture risk rather than just the BMD at a single site: the femoral neck for patients with osteopenia using the WHO/ISCD guidelines. Further research is required to evaluate how the FRAX scores could be improved using all three BMD measurements.
There are a few limitations regarding this analysis. The true correlation structure between DXA measurements in young adults is unknown and was simulated in our simulated data. The correlation structure between BMD measurements in the real patient data may not generalize to all ages and sexes. Our method captures information that is likely clinically relevant but ignored in typical interpretation of DXA studies. In particular, the correlation structure between DXA measurements of BMD at different sites (L1-L4, femoral neck, and total hip) are likely clinically relevant. For example, it is known that corticosteroid use preferentially affects and decreases lumbar spine BMD relative to the femoral neck and total hip BMD. (19,20) Therefore, the correlation structure between BMD measurements at different sites for patients treated with corticosteroids is likely different from that in the young, untreated adult population. Other studies have shown that the DXA studies provide valuable data that can be used to identify osteoblastic metastases and spinal fractures through changes in this correlation structure along with other DXA measurements. (20)(21)(22) The mean and SD BMDs for the femoral neck and total hip were obtained from the NHANES III cohort, but each DXA manufacturer uses slightly different mean and SD L1-L4 BMDs because this measurement was not obtained in the NHANES III cohort. This may influence the correlation structure noted between the L1-L4 BMD and the femoral neck BMD, and total hip BMD measurements. A limitation of the real data analysis was that patients were evaluated using different DXA machines. Although the DXA machines were calibrated according to the International Society for Clinical Densitometry (ISCD) guidelines, there is the possibility that variability between machines may have affected the results. Finally, we acknowledge that the WHO/ISCD guidelines use the lowest BMD T-score, which has been validated in clinical practice; however, it remains unclear whether the site with the lowest BMD T-score is the site at highest risk for a fragility fracture and whether the lowest BMD measurement is more globally informative than all three BMD measurements.
Although we use the term "statistics-based model," it is important to note that the current WHO/ISCD guidelines are in fact statistics-based but univariate because they use only one BMD T-score, as noted by Elandt-Johnson and colleagues. (11) Our model is a more sophisticated multivariate statistical method used to diagnose osteoporosis and osteopenia.
In summary, the proportion of patients diagnosed with osteoporosis from DXA studies using one BMD T-score under current WHO/ISCD guidelines is less than the proportion of patients diagnosed with osteoporosis using a statistics-based multivariate method in which all three BMD T-scores are used. Further research needs to be performed to assess whether the multivariate method predicts fracture risk better than the univariate method.

Disclosures
The authors declare no conflicts of interest.

Data accessibility
Data are freely available upon request from the corresponding author. À Á = (ρ 12 , ρ 13 , ρ 23 ). The values of μ Y and Σ Y are unknown for young white women aged 20-to 29-years-old, and were estimated from the NHANES III sample of young white women 20-to 29-years-old (409 subjects) for the femoral neck and the total hip BMD. The NHANES III sample mean femoral neck BMD, Y Y2 = 1 N P N i = 1 Y Y2i was 1.052 g/cm 2 , and the sample mean total hip BMD is Y Y3 = 1 N P N i = 1 Y Y3i , (1.016 g/cm 2 ). The estimated mean L1-L4 BMD, Y Y1 from GE Lunar is 1.176 g/cm 2 .
Þ , and let W is a Wishart distribution, and S is independent of X O .
is the sum of the diagonal elements of the matrix; j(N − 1) Sj is the determinant of the square matrix, Consider a random variable X that has a gamma distribution, Gam (α, β), with shape parameter α, and scale parameter β, so that α > 0 and β > 0. A chi-squared distribution with ν degrees of freedom (ν > 0) can be written as χ 2 υ = Gam υ 2 , 1 2 À Á : Under the null hypothesis, Consider, The threshold statistic (d * threshold p ) that corresponds to a T-score of −2.5 is 6.25. The threshold statistic that corresponds to a T-score of −1 is 1. The axes of the confidence thresholds are the eigenvectors of the variance-covariance matrix. An online tool to evaluate the statistics-based diagnosis based on user-provided BMD T-scores is available at http://www.ronniesebro.com/dxa/.

A.2. Marginal distribution of the multivariate T distribution
Let T = (T 1 , T 2 , T 3 ), where T has a multivariate distribution with υ degrees of freedom. Therefore, the marginal distribution of a multivariate T distribution is a univariate T-distribution.

A.3. Conditional distribution of the multivariate T distribution
The variance-covariance matrix of the femoral neck BMD and total hip BMD in the NHANES III young adult female cohort and the L1-L4 BMD, Σ Y can be partitioned into a 2 × 2 matrix: where,