
The reproducibility and sensitivity of the 6-min walk test in elderly patients with chronic heart failure

Lee Ingle, Rhidian J. Shelton, Alan S. Rigby, Samantha Nabb, Andrew L. Clark, John G.F. Cleland
DOI: http://dx.doi.org/10.1093/eurheartj/ehi259; pages 1742–1751; first published online: 14 April 2005


Aims The 6-min walk test (6-MWT) is used to estimate functional capacity. However, in elderly patients with chronic heart failure (CHF), the following have not been described: (i) the 1 year reproducibility of the 6-MWT; (ii) the sensitivity of the 6-MWT to self-perceived changes in symptoms of heart failure; and (iii) the implications for the number of patients required in studies using the 6-MWT as an endpoint.

Methods and results One thousand and seventy-seven patients with CHF, aged >60 years and in NYHA Class ≥II, were recruited. Heart failure symptoms were assessed using a questionnaire on aspects of physical function, and patients performed a 6-MWT at baseline and again 1 year later. In the 74 patients with unchanged symptoms, 6-MWT distance was also unchanged, with an overall intraclass correlation coefficient of 0.80 (95% CI=0.69–0.87). Four hundred and twenty-three patients reported an improvement in symptoms during follow-up; in these patients there was a negative correlation (r=−0.55; P=0.0001) between Δ symptoms and Δ 6-MWT (i.e. an increased 6-MWT distance was associated with reduced symptom severity at follow-up). Five hundred and sixteen patients reported worsening symptoms of heart failure; in these patients a moderate inverse correlation (r=−0.53; P=0.0001) was found between Δ symptoms and Δ 6-MWT. For all patients, irrespective of symptom status, a strong inverse correlation (r=−0.75; P=0.0001) was evident. On the basis of the data for patients with unchanged symptoms, we calculated that approximately 120 patients per group are required to detect an increase in 6-MWT of 50 m with 90% power.

Conclusion In elderly patients with CHF, the 6-MWT shows satisfactory agreement when repeated 1 year later. Change in 6-MWT distance is sensitive to change in self-perceived symptoms of heart failure.

  • Symptoms of heart failure
  • Elderly patients
  • 1 year follow-up
  • Power curves


In patients with chronic heart failure (CHF), the 6-min walk test (6-MWT) is a simple, low-cost method of estimating exercise capacity; only a pre-measured level surface and a timing device are needed.1–4 The mode of exercise is familiar to patients, although for some it may represent a maximal test.5,6 The test appears useful for assessing some interventions, such as cardiac resynchronization,7,8 and has strong predictive power for both mortality and morbidity.4,6,7 Despite the routine inclusion of the 6-MWT in CHF studies,3,4,9,10 few have focused on test–re-test reproducibility. O'Keeffe et al.10 recruited 60 elderly patients (mean age 82 years) who completed the 6-MWT and were re-tested 3–8 weeks later. An intraclass correlation coefficient (ICC) of 0.91 was reported for the 24 patients with no overall change in cardiac status, indicating satisfactory agreement. Although there is considerable interest in using the 6-MWT to assess treatment, and it is an important outcome measure in intervention studies,11,12 only one study in patients with CHF has examined reproducibility with a test–re-test interval >3 months9 and none has reported data after 1 year.

A major aim of health care is to reduce symptom severity within the physical limits imposed by disease.13–16 It is not clear whether objective measures of functional capacity are sensitive to self-perceived changes in symptoms of heart failure. The aims of the current study were therefore to determine, in a representative population of elderly patients with CHF: (i) the long-term (1 year) reproducibility of the 6-MWT; (ii) the sensitivity of the 6-MWT to self-perceived changes in symptoms of heart failure; and (iii) the implications for the number of patients required in studies using the 6-MWT as an endpoint.


The Hull and East Riding Ethics Committee approved the study, and all patients provided informed consent. Patients were recruited from a local community heart failure clinic. Inclusion criteria were: age >60 years; evidence of left ventricular systolic dysfunction (LVSD); and symptoms of heart failure (NYHA Class ≥II). In total, 68% of patients had heart failure of ischaemic aetiology and had had the condition for at least 6 months before the study. Patients with co-morbidities such as hypertension and diabetes mellitus of moderate or lesser severity were included, in accordance with the National Institute for Clinical Excellence guidelines.16 Patients were excluded if they were unable to walk without assistance from another person (mobility aids were permitted) or were unable to exercise because of non-cardiac limitations, including osteoarthritis and chronic obstructive pulmonary disease of at least moderate severity.17 A history of smoking was present in 74.2% of patients, and 11.8% were current smokers.

Heart failure was defined in accordance with the National Institute for Clinical Excellence guidelines16 and the European Society of Cardiology guidelines.17 Left ventricular function was determined by 2D echocardiography or magnetic resonance imaging. Echocardiography was carried out by one of three trained operators. Left ventricular function was graded as normal or mildly, moderately, or severely impaired, and was then graded independently by a second operator blinded to the first assessment; where the two disagreed on the severity of left ventricular dysfunction, the echocardiogram was reviewed jointly with the third operator and a consensus reached. Left ventricular ejection fraction (LVEF) was calculated by Simpson's method from end-diastolic and end-systolic volumes measured on apical 2D views, following the recommendations of Schiller et al.,18 and LVSD was diagnosed if LVEF was ≤40%. When the echocardiogram was of poor quality, patients underwent cardiac magnetic resonance imaging to determine left ventricular volumes and function.
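The LVEF calculation described above can be illustrated with a minimal sketch of the biplane method of discs (modified Simpson's rule) as it is commonly described; it is not the software used in the study, and the function names and input values are hypothetical. Each disc contributes π/4 × a × b × (L/n), where a and b are the disc diameters in the apical four- and two-chamber views, L is the long-axis length and n the number of discs (usually 20). LVEF is then (EDV − ESV)/EDV × 100, and LVSD is flagged at LVEF ≤40% as in the text.

```python
from math import pi


def biplane_disc_volume(diam_4ch_cm, diam_2ch_cm, long_axis_cm):
    """Left ventricular volume (mL) by the biplane method of discs.

    diam_4ch_cm, diam_2ch_cm: disc diameters (cm) from the apical four- and
    two-chamber views (hypothetical inputs; usually 20 discs per view).
    long_axis_cm: long-axis length (cm); each disc has height L / n.
    """
    if len(diam_4ch_cm) != len(diam_2ch_cm) or not diam_4ch_cm:
        raise ValueError("need the same, non-zero number of discs in both views")
    disc_height = long_axis_cm / len(diam_4ch_cm)
    # Elliptical disc area = pi/4 * a * b; 1 cm^3 equals 1 mL.
    return pi / 4 * sum(a * b for a, b in zip(diam_4ch_cm, diam_2ch_cm)) * disc_height


def ejection_fraction(edv_ml, esv_ml):
    """LVEF (%) from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def has_lvsd(lvef_percent, threshold=40.0):
    """LVSD as defined in the study: LVEF <= 40%."""
    return lvef_percent <= threshold


# Hypothetical ventricle idealized as a stack of 20 identical discs.
edv = biplane_disc_volume([5.0] * 20, [5.0] * 20, long_axis_cm=9.0)
esv = biplane_disc_volume([4.2] * 20, [4.2] * 20, long_axis_cm=8.0)
lvef = ejection_fraction(edv, esv)
print(f"EDV={edv:.0f} mL, ESV={esv:.0f} mL, LVEF={lvef:.1f}%, LVSD={has_lvsd(lvef)}")
```

Real tracings give discs of varying diameter; the uniform inputs here only keep the arithmetic easy to check.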

Baseline visit

Patients were studied when clinically stable, with no change in medication during the previous 3 weeks. They underwent a clinical history and physical examination, together with an ECG and echocardiogram. Symptoms of heart failure were assessed using the methodology of the EuroHeart Failure Survey. Patients answered six questions, each scored from 1 (unimpaired) to 6 (very much impaired), giving a total score of 6–36 points. The questions related to perceived symptoms of heart failure during physical activity14 (see Appendix).

6-MWT protocol

The 6-MWT was conducted according to a standardized protocol between 10 a.m. and 4 p.m., after patients had taken their usual medication.3 A flat, obstacle-free 15 m corridor with chairs placed at either end was used. Patients were instructed to walk as far as possible in the allotted 6 min, turning 180° at the end of each 15 m length. Patients were allowed to rest if needed, and the time remaining was called out every 2 min.19 Patients walked unaccompanied so as not to influence walking speed. Standardized verbal encouragement was given after 2 and 4 min. After 6 min, patients were instructed to stop, and the total distance covered was recorded to the nearest metre.
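With a 15 m corridor, the total distance reduces to simple arithmetic: completed lengths × 15 m, plus the part of the final length covered when the patient is told to stop. The hypothetical helper below illustrates this, assuming the observer records the number of completed lengths and measures the final partial length; it is not part of the published protocol.

```python
def six_mwt_distance(completed_lengths, final_partial_m, corridor_m=15.0):
    """Total 6-MWT distance, rounded to the nearest metre.

    completed_lengths: full corridor lengths walked (each ends with a 180-degree turn).
    final_partial_m: distance covered in the last, unfinished length.
    """
    if completed_lengths < 0 or not 0 <= final_partial_m < corridor_m:
        raise ValueError("partial length must lie within one corridor length")
    # Note: Python's round() sends exact .5 ties to the nearest even integer.
    return round(completed_lengths * corridor_m + final_partial_m)


# Hypothetical patient: 18 full lengths plus 7.4 m = 277.4 m, recorded as 277 m.
print(six_mwt_distance(18, 7.4))
```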

Patients returned for follow-up at 1 year, when the assessments of symptom severity and 6-MWT performance were repeated. Patients were divided into three prospectively defined groups according to the change in symptom score between baseline and 1 year. Group 1 reported unchanged symptoms, defined as a score within ±3 points of baseline; because these patients perceived no change in heart failure symptoms, their results were used for the reproducibility analysis.20 Group 2 reported worsening symptoms, defined as a rise of ≥4 points, and group 3 reported improved symptoms, defined as a fall of ≥4 points.
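The scoring and grouping rules can be sketched as follows. The function names are hypothetical and integer scores are assumed; the thresholds follow from the definitions in the text: a change within ±3 points is "unchanged", so a rise or fall of 4 or more points marks worsening or improvement.

```python
def symptom_score(responses):
    """Total symptom score: six items, each 1 (unimpaired) to 6 (very much impaired)."""
    if len(responses) != 6 or any(r not in range(1, 7) for r in responses):
        raise ValueError("expected six integer responses between 1 and 6")
    return sum(responses)  # possible range 6-36


def symptom_group(baseline_score, followup_score):
    """Classify the 1 year change; a positive change means deterioration."""
    change = followup_score - baseline_score
    if change >= 4:
        return "worse"       # group 2
    if change <= -4:
        return "improved"    # group 3
    return "unchanged"       # group 1: within +/-3 points of baseline


baseline = symptom_score([3, 3, 2, 2, 3, 2])   # hypothetical responses, total 15
follow_up = symptom_score([4, 4, 3, 3, 4, 2])  # hypothetical responses, total 20
print(baseline, follow_up, symptom_group(baseline, follow_up))
```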

Statistical analysis

Data were analysed using SPSS for Windows, version 11.5 (SPSS Inc., Chicago, IL, USA). To assess reproducibility, ICCs with 95% CIs were calculated. Several investigators have suggested that an ICC ≥0.75 is satisfactory when studying groups of patients, so this threshold was defined as acceptable in the current study.10,21 Bland–Altman plots with 95% limits of agreement were also derived.22 Heart failure symptom scores are presented as medians and inter-quartile ranges (IQRs). A χ² test was used to determine differences in heart failure symptoms between baseline and 1 year. Spearman correlation coefficients were used to examine the relation between changes in 6-MWT performance and changes in symptoms of heart failure. Group differences at baseline were assessed by analysis of variance (ANOVA). Rather than adjusting for the inflation of the experiment-wise type I error owing to multiple testing, we followed previous recommendations and report unadjusted P-values.23 As Perneger24 concluded, ‘simply describing what tests of significance have been performed, and why, is generally the best way of dealing with multiple comparisons’. We also performed subgroup analyses to examine the consistency (or otherwise) of the findings: in patients with more severe symptoms, the 6-MWT may be limited predominantly by cardiorespiratory disease, whereas in patients with milder disease other factors may be important. Data are presented as mean±SD; all tests were two-sided, and P<0.05 was taken as statistically significant.
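The article does not state which ICC form was computed. As a minimal sketch, assuming the two-way random-effects, absolute-agreement, single-measure form (ICC(2,1) in Shrout and Fleiss's notation), the coefficient can be obtained from a patients × occasions ANOVA as below. The function name and data are hypothetical.

```python
from statistics import fmean


def icc_2_1(pairs):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    pairs: one tuple of repeated measurements per patient, e.g. (baseline, 1 year).
    """
    n = len(pairs)       # patients (rows)
    k = len(pairs[0])    # measurement occasions (columns)
    grand = fmean(x for row in pairs for x in row)
    row_means = [fmean(row) for row in pairs]
    col_means = [fmean(row[j] for row in pairs) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in pairs for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Shrout and Fleiss formula for ICC(2,1).
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical (baseline, 1 year) 6-MWT distances in metres.
walks = [(250, 240), (310, 330), (180, 170), (400, 385), (290, 300)]
print(f"ICC(2,1) = {icc_2_1(walks):.2f}")
```

Because this form measures absolute agreement, a systematic shift between occasions (for example, everyone walking 10 m further at 1 year) lowers the ICC even when the ranking of patients is preserved; a consistency form such as ICC(3,1) would ignore that shift.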

We used the standard deviation (SD) of the 6-MWT at baseline to construct power curves for a proposed intervention study. Power was defined as the probability of detecting a difference between two (or more) groups when a difference actually exists.25 The curves show the sample size required per group (equal allocation) for gains in 6-MWT distance in steps of 10 m; the larger the gain to be detected, the smaller the required sample size. When planning an intervention study, an estimate of the potential loss to follow-up would also be required. Nomograms and power curves have been produced both for general medical use26,27 and for more specialized problems, such as those posed by reliability studies.28,29


Of the initial population of 1077 patients, 64 died before follow-up (46 males; age 77.7±7.2 years; body mass 74.6±17.9 kg) (Figure 1). At baseline, 6-MWT distance was significantly lower in patients who subsequently died than in groups 1, 2, and 3 (P=0.002), although symptom severity did not differ.

Figure 1 Flow chart showing number of patients in each group.

Data from the remaining 1013 patients were analysed. Seventy-four patients (52 males) reported no change in symptoms over 1 year. Baseline clinical characteristics are shown in Table 1. Baseline 6-MWT distance did not differ between patients with unchanged symptoms and those with worsening symptoms (P=0.086), but patients whose symptoms subsequently improved walked further at baseline than the other groups (P=0.032).

Table 1

Clinical characteristics

Classification according to change in symptom score over 1 year: No change, Worse, Unable to repeat 6-MWT, Better.

Clinical characteristics (mean±SD) | No change | Worse | Unable to repeat 6-MWT | Better | Dead | ANOVA (P-value)
n | 74 | 516 | 132 | 423 | 64 |
Male/female | 52/22 | 388/128 | 62/70 | 301/122 | 48/16 |
Stature (m) | 1.71±0.14 | 1.66±0.10 | 1.65±0.09 | 1.72±0.13 | 1.70±0.1 | 0.39
Body mass (kg) | 75.3±16.8 | 79.4±15.1 | 82.3±14.0 | 78.4±15.2 | 74.6±17.9 | 0.01
Age (years) | 72.4±6.7 | 70.1±9.1 | 72.4±6.4 | 75.8±8.1 | 77.7±7.2 | 0.01
LVEF (%)a | 33.4±7.6 | 29.9±5.3 | 28.3±4.8 | 34.6±7.8 | 32.8±8.3 | 0.12
Hypertension (%) | 39.2 | 38.4 | 42.3 | 40.5 | 41.8 | 0.21
Diabetes (%) | 24.6 | 25.3 | 23.8 | 19.8 | 19.7 | 0.08
Treatment at baseline
Warfarin (%) | 29.7 | 27.5 | 29.2 | 33.6 | 32.4 | 0.36
Loop diuretic (%) | 75.7 | 72.5 | 70.3 | 74.4 | 74.6 | 0.21
Beta-blockers (%) | 42.7 | 45.5 | 43.0 | 48.4 | 41.4 | 0.34
Digoxin (%) | 25.7 | 28.9 | 27.1 | 27.7 | 26.6 | 0.45
ACE-I (%) | 52.7 | 58.6 | 55.8 | 56.6 | 55.8 | 0.38
Statin (%) | 32.4 | 38.4 | 40.2 | 36.5 | 40.2 | 0.34
Baseline 6-MWT (m) | 285±122 | 279±127 | 263±95 | 342±117 | 208±103b | 0.032
Follow-up 6-MWT (m) | 276±118 | 195±130 | – | 396±126 | – | 0.0001
Δ Mean 6-MWT (m) | −9±77 | −84±63 | – | 54±46 | – | 0.0001
Baseline symptom score (median±IQR) | 15±4 | 15±7 | 15±7 | 16±5 | 16±8 | 0.23
Follow-up symptom score (median±IQR) | 15±4 | 20±6 | 26±6 | 10±4 | – | 0.0001
Δ Mean symptom scorec | 0 | 5 | 11 | −6 | – | 0.0001

aQuantitative assessment of LVEF was obtained in >80% of patients.

bSignificant difference between deceased patients and other groups (P=0.002).

cPositive values indicate deterioration and negative values indicate improvement.

Long-term (1 year) reproducibility of the 6-MWT

In patients with unchanged symptoms, 6-MWT distance fell slightly but not significantly over 1 year, from 285±122 m at baseline to 276±118 m (P=0.07). The ICC for all 74 patients was 0.80 (95% CI=0.69–0.87), a high level of agreement by our criteria.10,19 When patients were stratified by beta-blocker use at baseline (ICC=0.80; 95% CI=0.69–0.87) and at 1 year (ICC=0.81; 95% CI=0.69–0.89), reproducibility was unchanged. We then analysed patients by sex. At baseline, males walked 301±115 m and females 246±133 m (P=0.07); females (mean age 73.4±5.4 years) were older than males (71.6±7.4 years), although not significantly (P=0.086). After 1 year, the difference in distance walked between males (307±107 m) and females (205±112 m) was significant (P=0.0015). Reproducibility was higher in males (ICC=0.85; 95% CI=0.75–0.91) than in females (ICC=0.65; 95% CI=0.33–0.84). The Bland–Altman plot for the 6-MWT is shown in Figure 2. There was no relation between the differences (1 year − baseline) and the means (average of baseline and 1 year values). The mean difference was −8.6 m, with 95% limits of agreement of −162.1 to 144.8 m. Patients in NYHA Class II showed only moderate 1 year reproducibility (ICC=0.52; 95% CI=−0.09 to 0.85).
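The Bland–Altman quantities reported above can be computed as in the minimal sketch below (hypothetical function name and data; differences are taken as 1 year − baseline, as in the text). The 95% limits of agreement are the mean difference ± 1.96 SD of the differences. Working backwards from the reported limits, the half-width is (144.8 + 162.1)/2 ≈ 153.5 m, which implies an SD of the differences of about 153.5/1.96 ≈ 78 m.

```python
from statistics import fmean, stdev


def bland_altman(baseline, follow_up):
    """Return (mean difference, lower LoA, upper LoA, means, differences).

    Differences are follow-up minus baseline; the means (average of the two
    occasions) form the x-axis of the Bland-Altman plot.
    """
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    means = [(b + f) / 2 for b, f in zip(baseline, follow_up)]
    bias = fmean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd, means, diffs


base = [250, 310, 180, 400, 290]   # hypothetical baseline distances (m)
year = [240, 330, 170, 385, 300]   # hypothetical 1 year distances (m)
bias, lower, upper, _, _ = bland_altman(base, year)
print(f"mean difference {bias:.1f} m, 95% limits of agreement {lower:.1f} to {upper:.1f} m")
```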

Figure 2 Bland–Altman plot for 6-MWT in 74 patients with no change in symptoms.

Sensitivity to change of the 6-MWT based on changes in symptoms

Figure 3 shows no relation between baseline symptom severity and baseline 6-MWT distance in groups 1, 2, and 3 (r=0.00; P=0.74), whereas Figure 4 shows a strong inverse association between Δ symptom severity and Δ 6-MWT in all patients (r=−0.75; P=0.00001). In the 516 patients (327 males; 63%) with worsening symptoms of heart failure, mean 6-MWT distance fell from 279±127 to 192±165 m, and there was an inverse correlation between Δ symptoms and Δ 6-MWT (r=−0.53; P=0.0001). Of these patients, 132 (62 males) declined to perform the follow-up 6-MWT; their symptoms had deteriorated more than those of the other groups (Table 1). In patients with improved symptoms, there was also an inverse correlation between Δ symptoms and Δ 6-MWT (r=−0.55; P=0.0001). Across all groups, there was no association between baseline symptom severity and Δ symptoms after 1 year (r=0.01; P=0.109; Figure 5); however, in patients with worsening symptoms a strong inverse correlation was evident (r=−0.67; P=0.00045). There was no relation between baseline 6-MWT and Δ 6-MWT at 1 year (r=0.2; P=0.125; Figure 6).
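The correlations in this section are Spearman coefficients (see Statistical analysis): the Pearson correlation of the ranks, with tied values given their average rank. A minimal standard-library sketch follows; the function names and data are hypothetical and only illustrate the sign convention, in which a rising symptom score (deterioration) paired with a falling walk distance gives a negative coefficient.

```python
from statistics import fmean


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


d_symptoms = [5, 8, -6, 0, 11, -3]     # hypothetical change in symptom score (+ = worse)
d_walk = [-60, -95, 40, -5, -120, 30]  # hypothetical change in 6-MWT distance (m)
print(f"Spearman rho = {spearman(d_symptoms, d_walk):.2f}")
```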

Figure 3 Baseline symptom severity vs. baseline 6-MWT.

Figure 4 Changes in symptoms vs. change in 6-MWT after 1 year.

Figure 5 Baseline symptom severity vs. changes in symptoms after 1 year.

Figure 6 Baseline 6-MWT vs. change in 6-MWT after 1 year.

Implications for study size using the 6-MWT as an endpoint

We constructed power curves to estimate the sample size required for an intervention study based on the 6-MWT (Figure 7), at powers from 50 to 90% in 10% steps. There is no minimum acceptable power, and higher power is desirable, but it comes at a cost: the higher the power, the larger the required sample size. Constructing the curves also required the type I error rate (the significance level, α), i.e. the probability of a false-positive result. Typically 5% is chosen, although this threshold for statistical significance is arbitrary, and we assumed a two-tailed test. We also required the SD of the outcome measure; the SD of the 6-MWT at baseline was 120 m. Finally, we specified the gains in distance walked to be detected. Power curves were then constructed over a range of distances from 30 to 100 m (Figure 7) using the formulae outlined by Altman.25 The reader can then read off the sample size required for a given power (5% significance, two-tailed) under our assumed SD. For a gain of 10 m, over 3000 patients per group would be required (90% power, 5% significance); conversely, for a 100 m gain, just over 30 patients per group would be required (90% power, 5% significance). Note that the SD will vary with the heterogeneity of the population studied (Table 3).
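The curves can be approximated with the standard normal-approximation formula for comparing two means with equal allocation, n per group = 2σ²(z₁₋α/₂ + z₁₋β)²/δ², which we assume corresponds to the Altman formulae referred to above. The sketch below uses SD = 120 m, a two-sided 5% significance level and gains in 10 m steps; the function name is hypothetical. With these inputs it gives just over 3000 patients per group for a 10 m gain and 31 per group for a 100 m gain at 90% power, in line with the figures quoted in the text.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta_m, sd_m=120.0, power=0.90, alpha=0.05):
    """Patients per group to detect a mean difference delta_m (two-sided, equal groups)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)             # about 1.28 for 90% power
    return ceil(2 * (sd_m * (z_alpha + z_beta) / delta_m) ** 2)


if __name__ == "__main__":
    # One row per 10 m gain, one column per power level (50-90%).
    powers = (0.5, 0.6, 0.7, 0.8, 0.9)
    print("gain (m) " + " ".join(f"{p:>6.0%}" for p in powers))
    for delta in range(10, 101, 10):
        print(f"{delta:>8} " + " ".join(f"{n_per_group(delta, power=p):>6}" for p in powers))
```

Rounding up with ceil() is deliberate: a fractional patient must become a whole one, and rounding down would leave the study slightly underpowered.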

Figure 7 Power curves showing the sample size required for an intervention study based on gained 6-MWT distance (based on SD=120 m; two-tailed test; P<0.05).


At baseline, 6-MWT distance (208±103 m) was significantly lower in elderly patients with LVSD who died before follow-up, although symptom severity did not differ. Previous studies have shown that a 6-MWT distance <300 m is associated with a significantly increased mortality risk,7,30 and the current data support this finding. It is also possible that baseline 6-MWT performance is prognostically more sensitive than assessment of baseline symptoms. We found no relation between baseline symptom severity and 6-MWT performance; however, a novel aspect of our study is the demonstration that changes in 6-MWT performance between baseline and 1 year track changes in symptoms. Deceased patients could not, of course, be followed up, and the prognostic value of symptom severity remains to be determined.

Long-term reproducibility of the 6-MWT

The current study shows that after 1 year, the 6-MWT has acceptable reproducibility (ICC=0.80; 95% CI=0.69–0.87) in a population of elderly patients with CHF due to systolic dysfunction, with associated co-morbidities of hypertension and diabetes mellitus. However, these findings cannot readily be generalized to patients with other co-morbidities, such as osteoarthritis and chronic obstructive pulmonary disease. To control for changes in clinical condition, we assessed only patients whose symptom severity remained unchanged (7.3% of the total). Although the sample size was limited (n=74), it was larger than in previous studies (n≤26) of patients with lung disease,4,31 fibromyalgia,18 brain injury,32 and CHF.10 Furthermore, patients in those studies were followed up after <8 weeks. Demers et al.9 assessed 768 patients at baseline, 18 weeks, and 43 weeks in a trial of the effects of candesartan, enalapril, and metoprolol on LVEF (the RESOLVD study). The authors reported high reproducibility after 43 weeks (ICC=0.91; CI not reported), but did not control for changes in patients' clinical condition. To our knowledge, the current study is the first to assess the reproducibility of the 6-MWT after 12 months while controlling for clinical condition.

A habituation period, in which the test is practised repeatedly, reduces variability in 6-MWT performance.33 A learning effect of 6% was reported in a cardiac rehabilitation population completing the 6-MWT on non-consecutive days,34 and the effect was maintained for up to 2 months in healthy subjects.35 Some have argued that studies should include at least one, or even two, practice sessions. However, the tests would need to be administered on separate days, which would be cumbersome in clinical trials. The results of the current study suggest that satisfactory reproducibility can be achieved without repeating the 6-MWT. Stratifying by beta-blockade did not change reproducibility, suggesting that beta-blockers did not dissociate 6-MWT performance from symptom severity in this cohort; this gives us added confidence that our data reflect a true association between 6-MWT performance and symptom severity. We also found a clear difference in reproducibility between males (ICC=0.85; 95% CI=0.75–0.91) and females (ICC=0.65; 95% CI=0.33–0.84). Although females often walk shorter distances,2 there is little evidence that they provide less stable data on repeated measurement. Differences in symptom severity between males and females have been proposed as an explanation,36,37 but this was not found in the current study. Females (mean age 73.4±5.4 years) were older than males (71.6±7.4 years), albeit not significantly (P=0.086), so it is conceivable that age differences contributed.

We used a 15 m corridor for the 6-MWT, whereas others have used corridors of 20 m38 and >30 m.3 Although this has not been formally tested, shorter corridors may affect 6-MWT performance because patients have to turn more often. The current study indicates that stringent standardization of test procedures does not guarantee a low SD in a heterogeneous heart failure population, in accordance with other studies including RESOLVD.9

Our data show relatively low reproducibility after 1 year in patients with NYHA Class II symptoms (ICC=0.52; 95% CI=−0.09 to 0.85; Table 2). It is possible that in patients with more severe symptoms of heart failure (Class III/IV; ICC=0.74; 95% CI=0.50–0.86), the 6-MWT better reflects cardiorespiratory function, whereas in patients with milder symptoms other, as yet unidentified, factors (possibly mood) may be important.

Table 2

Reproducibility data for patients with unchanged symptom status

Variable | Number of patients | ICC (95% CI)
Overall | 74 | 0.80 (0.69–0.87)
Male | 52 | 0.85 (0.75–0.91)
Female | 22 | 0.65 (0.33–0.84)
Age at baseline (years)
 60–64 | 8 | 0.78 (0.68–0.96)
 65–69 | 13 | 0.78 (0.43–0.92)
 70–74 | 16 | 0.91 (0.79–0.97)
 75–79 | 18 | 0.72 (0.40–0.88)
 80+ | 19 | 0.57 (0.46–0.85)
NYHA classification
 II | 62 | 0.52 (−0.09 to 0.85)
 III/IV | 12 | 0.74 (0.50–0.86)
Stratified by beta-blockade
 Baseline | 60 | 0.80 (0.69–0.87)
 1 year | 73 | 0.81 (0.69–0.89)

Sensitivity to changes in the 6-MWT based on changes in symptoms

We found that change in 6-MWT distance is sensitive to changes in symptoms of heart failure in a representative sample of patients with CHF. To our knowledge, the current study is the first to focus on self-perceived symptoms of heart failure. Other studies have found no association between scores on generic quality of life (QoL) instruments and the 6-MWT.19,28,39 Steptoe et al.11 assessed health-related QoL and psychological well-being in 99 patients with dilated cardiomyopathy and reported no association between functional capacity and QoL in patients with NYHA Class I and II symptoms. The current study shows a similar finding for symptom severity at baseline (Figure 3).

A comparative investigation13 of 205 patients with heart failure reported findings similar to those of the current study. Our data suggest that, in patients with a range of heart failure symptoms (NYHA II–IV), the 6-MWT is sensitive to changes in symptoms of heart failure. The sensitivity of a test to changes in symptoms is an important but often neglected clinical property.40 Many factors contribute to these changes, including pathophysiological and psychological alterations.11 Patients with CHF are prone to episodes of depression, with a resulting deterioration in symptoms.41 The extent to which objective measures of functional capacity predict self-reported mental health status has yet to be determined. We did not measure depression or depressive symptoms, so we cannot say whether symptom severity, or indeed the reproducibility of the 6-MWT, was affected by this variable at follow-up. Future studies should examine how changes in 6-MWT and symptom severity influence prognosis in patients with CHF; to provide adequate statistical power, careful consideration should be given to sample size and study design.

We found that only 7% of patients had symptoms that remained unchanged over 1 year. Few studies have examined mid- to long-term changes in symptoms and QoL without intervention. O'Keeffe et al.10 reported that, of 45 elderly patients with heart failure followed up after 3–8 weeks, 53% had no change in QoL, a much higher proportion than in our study. However, O'Keeffe's study10 had a smaller sample size and a short follow-up period, used a different QoL inventory and method of analysis, and did not focus specifically on changes in symptoms, so direct comparison is very difficult. To our knowledge, our study is the first to report changes in symptoms over a 12 month period in a large cohort of patients with LVSD. Future studies are required to corroborate or refute these findings.

Incomplete data sets due to attrition or non-compliance represent a major challenge for researchers.42–49 In particular, it is important to recognize the pattern of missingness, because this can determine the appropriate statistical analysis. According to Little and Rubin,45 the missing-data mechanism is called ‘missing completely at random’ (MCAR) when missingness is independent of the responses (e.g. a patient misses an appointment because of bad weather). It is called ‘missing at random’ when missingness depends only on the observed responses (e.g. a patient stays in hospital for a few weeks but then skips an appointment). Otherwise, the mechanism is called non-ignorable. We tested the MCAR assumption for the follow-up 6-MWT data using Little's test.43 The test statistic is based on the pattern-specific means and the pooled estimates of the population mean and covariance. The missing-data mechanism was not MCAR (χ²=782; P<0.0001). Recognizing the missing-data mechanism is important in selecting an appropriate method of analysis, because methods that disregard the missing-data process may lead to biased estimates of effect size and unrealistic estimates of power,42,47,49 although the latter may be overcome to some extent by including more patients.49
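To show how Little's statistic is formed, the sketch below computes d² = Σⱼ nⱼ (ȳⱼ − μⱼ)ᵀ Σⱼ⁻¹ (ȳⱼ − μⱼ) over the missingness patterns j, where ȳⱼ is the mean of the variables observed in pattern j and μⱼ, Σⱼ are the matching parts of the pooled mean vector and covariance matrix. It assumes those pooled maximum-likelihood estimates are already available (for example from the EM algorithm, which is omitted here). Under MCAR, d² is approximately χ² with Σⱼ pⱼ − p degrees of freedom, pⱼ being the number of observed variables in pattern j; a P-value could then be obtained from, for example, scipy.stats.chi2.sf. Function names and data are hypothetical.

```python
from statistics import fmean


def _solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def little_mcar_statistic(rows, mu, sigma):
    """Little's d^2 and its degrees of freedom.

    rows: records with None marking a missing value.
    mu, sigma: pooled ML mean vector and covariance matrix (assumed given, e.g. from EM).
    """
    p = len(mu)
    patterns = {}
    for row in rows:
        key = tuple(v is not None for v in row)
        patterns.setdefault(key, []).append(row)

    d2, df = 0.0, 0
    for key, group in patterns.items():
        observed = [j for j in range(p) if key[j]]
        if not observed:  # records with nothing observed carry no information
            continue
        diff = [fmean(r[j] for r in group) - mu[j] for j in observed]
        sub_sigma = [[sigma[i][j] for j in observed] for i in observed]
        d2 += len(group) * sum(d * w for d, w in zip(diff, _solve(sub_sigma, diff)))
        df += len(observed)
    return d2, df - p


# Hypothetical records: (baseline 6-MWT, 1 year 6-MWT); None = follow-up test not done.
records = [(300, 310), (250, 240), (200, None), (180, None), (350, 360)]
mu_hat = [256.0, 252.0]                            # assumed pooled estimates
sigma_hat = [[4000.0, 3500.0], [3500.0, 4200.0]]
d2, df = little_mcar_statistic(records, mu_hat, sigma_hat)
print(f"d2 = {d2:.2f} on {df} df")
```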

A wide variety of statistical methods is available for handling missing data; the interested reader is directed to Engels and Diehr.46 Some methods use information from the patient whose data are missing; others use the values of other patients. A commonly used method, and one that is easy to apply in practice, is last observation carried forward. This technique leads to a more conservative estimate of the treatment effect, but it also yields a smaller SD, so 95% CIs may be unrealistically narrow. There is consensus that no single method is appropriate for all situations.46 More generally, it is recommended that ‘…in longitudinal studies, where the overall trend is for worse health over time and where missing data can be assumed to be primarily related to worse health, missing data should be imputed from the available longitudinal data for that person’.46 In the context of a randomized controlled trial, we recommend that researchers follow the advice of Houck et al.,48 who contended that ‘…attention to the missing-data mechanism should be an integral part of clinical trial data’.
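A minimal sketch of last observation carried forward for one patient's sequence of visits is shown below (hypothetical function name and data; None marks a missed assessment). It makes the limitation noted above concrete: the imputed value simply repeats the earlier measurement, so any later decline is hidden and the spread of the data is understated.

```python
def locf(series):
    """Last observation carried forward; None marks a missing visit."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)   # stays None until the first observation
    return filled


# Hypothetical patient: baseline 6-MWT recorded, 1 year test missing.
print(locf([285, None]))
```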

Implications for study size using the 6-MWT as an endpoint

Guyatt et al.3 suggested that the minimum clinically significant difference in 6-MWT distance is 30 m. On the basis of our calculations (Figure 7), a gain of 30 m would require 250 patients per group with 80% power, or 340 patients per group with 90% power, at 5% significance. O'Keeffe et al.10 reported a 6-MWT distance of 239±52 m at baseline and 275±103 m 3–8 weeks later in patients with ‘much better’ symptoms. Therefore, for a gain of 47 m, our power analysis indicates that a sample size of 150 patients is required with 90% power, or 120 with 80% power, at 5% significance. That study10 recruited 60 patients and, on the basis of our findings, was therefore underpowered. Our curves can be used to help plan group sizes for intervention studies in which the 6-MWT is an outcome measure. When designing such studies, an estimate of the potential loss to follow-up is required and should be factored into the planning process.
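One common way of factoring in loss to follow-up, assumed here rather than taken from the article, is to divide the required number per group by the expected proportion of patients who complete the study. A minimal sketch, with a hypothetical helper and an assumed dropout rate:

```python
from math import ceil


def inflate_for_dropout(n_required, dropout_rate):
    """Patients to recruit per group so that about n_required complete follow-up."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return ceil(n_required / (1 - dropout_rate))


# e.g. 250 per group (30 m gain, 80% power) with an assumed 15% loss to follow-up
print(inflate_for_dropout(250, 0.15))
```

This simple inflation keeps the number of completers on target but does nothing about bias if dropout is not MCAR.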

Table 3 lists a selection of non-interventional CHF trials in which the 6-MWT was used as an endpoint. These data can be used to judge whether the SD of 120 m found in the current study applies to other heart failure populations. Table 3 indicates a mean SD of 110 m, which is similar to that in the current study; however, the mean age (59 years) is much lower than in our cohort (>70 years). Walking performance may be more variable in older patients, which may explain the difference in SD. Care should be taken when applying these power curves because of the heterogeneity of different subgroups.

Table 3

Non-interventional CHF trials employing the 6-MWT

Authors | Mean age (years) (range or SD) | Sex (male/female) | n | NYHA class | Distance walked (m) | SD (m)
Rostagno et al.7 | 57 (29–70) | 119/95 | 214 | III–IV | 229 | 112
O'Keeffe et al.10 | 81 (74–92) | 38/22 | 60 | I–IV | 239 | 52
Roul et al.31 | 59 (11) | – | 121 | II–III | 448 | 92
Zugck et al.53 | 54 (12) | 90/23 | 113 | I–III | 423 | 104
 | | | | | 485 | 91
Morales et al.54 | 53 (11) | 37/9 | 46 | II–IV | 408 | 91
Opasich et al.39 | 53 (9) | 274/41 | 315 | II–III | 390 | 88
Hulsmann et al.55 | 57 (8) | 79/17 | 96 | I–IV | 221 | 145
Martensson et al.56 | 61 (9) | 48/0 | 48 | I–IV | 242 | 100
Hauptman et al.57 | 61 (13) | 363/121 | 484 | – | 315 | 112
Cahalin et al.38 | 49 (8) | – | 45 | II–IV | 310 | 100

A limitation of this study was that 132 patients with worsening symptoms declined to perform the follow-up 6-MWT; this loss was very similar to that in the smaller study by O'Keeffe et al.10 Although medication may have been optimized, it is possible that ACE inhibitors15 and beta-blockers50,51 do not improve symptom severity despite their well-known benefits on mortality risk and LVEF. Furthermore, subgroup numbers (Table 2) are small because the reproducibility analysis is based on only 74 patients. The sensitivity of the 6-MWT to perceived changes in symptom severity was determined without measuring changes in anxiety and depression with validated inventories. We acknowledge that these factors may contribute to changes in functional capacity over time and should be included in future studies. An unexpected observation was that patients whose symptoms improved walked further at baseline and were older than patients whose symptoms worsened or did not change (Table 1). These findings are difficult to explain, and future studies may wish to address this issue.

We did not carry out an a priori power calculation, and to our knowledge, there is little written about power for reproducibility studies, with perhaps the work of Donner28,29 the most well known. Lack of power becomes a (possible) problem if no significant differences are found. With the exception of NYHA Class II (ICC=0.52), all the ICCs were significant at the 5% level (Table 2). Using the criteria of Landis and Koch,52 an ICC of 0.52 would be classified as having only a ‘moderate’ level of reliability. Using the power curves of Donner28 to show ‘moderate’ reliability would require about 50 patients (assuming two measurements on each), if the actual reliability was 0.8 (80% power, 5% significance, two-tailed). If the actual reliability was <0.8, then we would require more patients or more than two measurements per patient.28
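The Landis and Koch bands, proposed originally for kappa and applied here to the ICC, can be written as a small lookup. The helper below is hypothetical and uses ICCs from Table 2; note that these verbal labels are separate from the ICC ≥0.75 threshold adopted as acceptable in this study.

```python
def landis_koch(coefficient):
    """Verbal label for an agreement coefficient using Landis and Koch's bands."""
    if coefficient < 0:
        return "poor"
    bands = ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect"))
    for upper, label in bands:
        if coefficient <= upper:
            return label
    raise ValueError("coefficient cannot exceed 1")


for name, icc in [("overall", 0.80), ("NYHA II", 0.52), ("NYHA III/IV", 0.74)]:
    print(name, icc, landis_koch(icc))
```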


We have shown satisfactory long-term (1 year) reproducibility of the 6-MWT in elderly patients with heart failure due to systolic dysfunction, suggesting that the 6-MWT may be an appropriate test of functional capacity in these patients. Reproducibility was higher in males than in females. Change in 6-MWT distance is sensitive to self-perceived changes in symptoms of heart failure. When the 6-MWT is an endpoint in a clinical trial, approximately 500 patients in total (about 250 per group, 80% power, 5% significance) are needed to detect a 30 m difference between interventions, assuming an SD of 120 m. However, SDs will vary with the heterogeneity of the population studied. Researchers should expect some missing data, especially in longitudinal studies; attention to the missing-data problem should become an integral part of the clinical trial protocol.


We wish to thank the referees for their constructive comments.


Appendix

For each question, patients chose one of six responses: (1) no; (2) very little; (3) a little; (4) some; (5) a lot; (6) very much.

The six questions relating to symptoms are listed below.

In the last month, how much did the following affect you?

  1. breathlessness limiting daily activities;

  2. fatigue limiting daily activities;

  3. inability to do normal daily activities due to health;

  4. inability to do hobbies/sports due to health;

  5. inability to work due to health;

  6. chest pain during normal activity.

