KDIGO Clinical Practice Guideline for the Diagnosis, Evaluation, Prevention, and Treatment of Chronic Kidney Disease-Mineral and Bone Disorder (CKD-MBD)

Guideline 2: Methodological approach

Kidney International (2009) 76, S9-S21; doi:10.1038/ki.2009.190

This clinical practice guideline contains a set of recommendations for the diagnosis, evaluation, prevention, and treatment of chronic kidney disease-mineral and bone disorder (CKD-MBD). The aim of this chapter is to describe the process and methods by which the evidence review was conducted and the recommendations and statements were developed.

The members of the Work Group and of the Evidence Review Team (ERT) collaborated closely in an iterative process of question development, evidence review, and evaluation, culminating in the development of recommendations that have been graded according to an approach developed by the GRADE (Grading of Recommendations Assessment, Development and Evaluation) Working Group (Table 2).14 This grading scheme with two levels for the strength of a recommendation was adopted by the KDIGO (Kidney Disease: Improving Global Outcomes) Board in December 2008. The Board also approved the option of an ungraded statement instead of a graded recommendation. This alternative allows a Work Group to issue general advice on the basis of what it considers a reasonable approach for clinical practice. We ask the users of this guideline to include the grades with each recommendation and consider the implications of the respective grade (see detailed description below). The explicit details provided in this chapter serve the transparency required of this process and are intended to instill confidence in the reader about the methodological rigor of the approach.

OVERVIEW OF THE PROCESS

The development of the guideline included concurrent steps to:

  • appoint the Work Group and ERT, which were responsible for different aspects of the process;
  • confer to discuss process, methods, and results;
  • develop and refine topics;
  • define specific populations, interventions or predictors, and outcomes of interest;
  • create and standardize quality assessment methods;
  • create data extraction forms;
  • develop literature search strategies and run searches;
  • screen abstracts and retrieve full articles on the basis of predetermined eligibility criteria;
  • extract data and perform a critical appraisal of the literature;
  • grade the quality of the outcomes of each study;
  • tabulate data from articles into summary tables;
  • grade the quality of evidence for each outcome and assess the overall quality of bodies of evidence with the aid of evidence profiles;
  • write recommendations and supporting rationale;
  • grade the strength of the recommendations on the basis of the quality of evidence and other considerations;
  • write the narrative; and
  • respond to peer review by the KDIGO Board of Directors in December 2007 and again in early 2009, and public review in 2008 before publication.

The KDIGO Co-Chairs appointed the Co-Chairs of the Work Group, who then assembled the Work Group to be responsible for the development of the guideline. The Work Group consisted of domain experts, including individuals with expertise in adult and pediatric nephrology, bone disease, cardiology, and nutrition. The Tufts Center for Kidney Disease Guideline Development and Implementation at Tufts Medical Center in Boston, MA, USA was contracted to provide expertise in guideline development methodology and systematic evidence review. One Work Group member (Alison MacLeod) also served as an international methodology expert. KDIGO support staff provided administrative assistance and facilitated communication.

The ERT consisted of physicians/methodologists with expertise in nephrology and internal medicine, and research associates and assistants. The ERT instructed and advised Work Group members in all steps of literature review, in critical literature appraisal, and in guideline development. The Work Group and the ERT collaborated closely throughout the project. The Work Group, KDIGO Co-Chairs, ERT, liaisons, and KDIGO support staff met five times for 2-day meetings in Europe and in North America. The meetings included a formal instruction in the state of the art and science of guideline development, and training in the necessary process steps, including the grading of evidence and the strength of recommendations, as well as in the formulation of recommendations. Meetings also provided a forum for general topic discussion and consensus development with regard to both evidence appraisal and specific wording to be used in the recommendations.

The first task was to define the overall topics and goals for the guideline. The Work Group Chairs drafted a preliminary list of topics. The Work Group then identified key clinical questions. The Work Group and ERT further developed and refined each topic specified for a systematic review of treatment questions, and summarized the literature for nontreatment topics.

The ERT performed literature searches, and abstract and article screening. The ERT also coordinated the methodological and analytical process of the report. It defined and standardized the method for performing literature searches and data extraction, and for summarizing evidence. Throughout the project, ERT offered suggestions for guideline development, and led discussions on systematic review, literature searches, data extraction, assessment of quality and applicability of articles, evidence synthesis, and grading of evidence.

The ERT provided suggestions and edits on the wording of recommendations, and on the use of specific grades for the strength of the recommendations and the quality of evidence.

The Work Group took on the primary role of writing the recommendations and rationale, and retained final responsibility for the content of the recommendations and for the accompanying narrative.

 

Table 2

 

Figure 2 | Evidence model. Arrows represent relationships and correspond to a question or questions of interest. Solid arrows represent well-established associations. Dashed arrows represent associations that need to be established with greater certainty. The relationships between laboratory abnormalities and organ diseases other than bone and cardiovascular diseases are not depicted here. In addition to the laboratory abnormalities shown, there are other factors that are determinants of bone and cardiovascular health, which are not depicted. CKD, chronic kidney disease; CVD, cardiovascular disease; DXA, dual-energy X-ray absorptiometry; EBCT, electron beam computed tomography; IMT, intimal-medial thickness; MSCT, multislice computed tomography; PTH, parathyroid hormone; (q)CT, (quantitative) computed tomography; (q)US, (quantitative) ultrasound; QOL, quality of life.

DEVELOPMENT OF AN EVIDENCE MODEL

With the initiation of the evidence review process of the KDIGO CKD-MBD guideline, the ERT developed an evidence model and refined it with the Work Group (Figure 2). This was carried out to conceptualize what is known about epidemiological associations, hypothesized causal relationships, and the clinical importance of different outcomes. Ultimately, this model served to clarify the questions for evidence review and to weigh the evidence for different outcomes. The model depicts laboratory abnormalities as a direct consequence of CKD, and bone disease and cardiovascular disease (CVD) as consequences of laboratory abnormalities as well as of other direct consequences of CKD. Bone disease and CVD are defined as abnormalities in structure and function, which can be seen on imaging tests or tissue examination. Bone disease and CVD are then shown as factors that--together with other direct consequences of CKD--lead to clinical outcomes, such as fractures, pain, and disability on the one hand, and clinical CVD events on the other. All of these contribute to morbidity and mortality. The arrows represent relationships and correspond to a question or questions of interest. Solid arrows represent well-established associations. Dashed arrows represent associations that need to be established with greater certainty. The model suggests a hierarchy with the clinical importance of each condition increasing from top to bottom. The model is incomplete in that it does not show other factors or disease processes that may contribute to, or directly result in, abnormalities at every level. For example, bone abnormalities in a patient with CKD may also be the result of aging and osteoporosis, and abnormalities of CVD may also be a result of other traditional and nontraditional CVD risk factors.

Thus, the model does not reflect the complexity of the multifactorial processes that result in clinical disease, nor the uncertainty with regard to the relative and absolute risk attributable to each risk factor. However, it does highlight the complexity of the issues facing the Work Group, which evaluated the evidence to make recommendations for the care of patients, but found that the majority of outcomes from clinical trials in this field studied laboratory outcomes.

REFINEMENT OF TOPICS, QUESTIONS, AND DEVELOPMENT OF MATERIALS

The Work Group Co-Chairs prepared the first draft of the scope-of-work document as a series of open-ended questions to be considered by Work Group members. At their first 2-day meeting, members added further questions until the initial working document included all topics of interest to the Work Group. The inclusive, combined set of questions formed the basis for the deliberation and discussion that followed. The Work Group strove to ensure that all topics deemed clinically relevant and worthy of review were identified and addressed.

For questions of treatments, systematic reviews of the literature, which met prespecified criteria, were undertaken (Table 3). For these topics, the ERT created forms to extract relevant data from articles, and extracted information for baseline data on populations, interventions, and study design. Work Group experts extracted the results of included articles and provided an assessment of the quality of evidence. The ERT reviewed and revised data extraction for results and quality grades performed by Work Group members. In addition, the ERT tabulated studies in summary tables, and assigned grades for the quality of evidence in consultation with the Work Group.

For nontreatment questions, that is, questions related to prevalence, evaluation, natural history, and risk relationships, the ERT conducted systematic searches, screened the yield for relevance, and provided lists of citations to the Work Group (Table 4). The Work Group took primary responsibility for reviewing and summarizing this literature in a narrative format.

On the basis of the list of topics, the Work Group and ERT developed a list of specific research questions for which systematic review would be performed. For each systematic review topic, the Work Group Co-Chairs and the ERT formulated well-defined systematic review research questions using a well-established system.4 For each question, clear and explicit criteria were agreed upon for the population, intervention or predictor, comparator, and outcomes of interest (Table 3). Each criterion was defined as comprehensively as possible. A list of outcomes of interest was generated and the Work Group was advised to rank patient-centered clinical outcomes (such as death or cardiovascular events) as being more important than intermediate outcomes (such as bone mineral density) or laboratory outcomes (such as phosphorus level), and not to include experimental biomarkers. In addition, study eligibility criteria were decided on the basis of study design, minimal sample size, minimal follow-up duration, and year of publication, as indicated (Table 3). The specific criteria used for each topic are explained below in the description of review topics. In general, eligibility criteria were determined on the basis of clinical value, relevance to the guideline and clinical practice, a determination on whether a set of studies would affect recommendations or the quality of evidence, and practical issues such as available time and resources.

LITERATURE SEARCH

A MEDLINE search was carried out to capture all abstracts and articles relevant to the topic of CKD and mineral metabolism, bone disorders, and vascular/valvular calcification. This search encompassed original articles, systematic reviews, and meta-analyses. The entire search was updated through 17 December 2007; the search for randomized controlled trials (RCTs) was updated through November 2008, and articles (including RCTs in press) identified by Work Group members were included through December 2008. The starting point of the literature search was the reference lists from the KDOQI (the Kidney Disease Outcomes Quality Initiative) Bone Guidelines for Adults and Children,5, 6 which were based on a systematic search of MEDLINE (1966-31 December 2000). This was supplemented by a MEDLINE search for relevant terms, including kidney, kidney disease, renal replacement therapy, bone, calcification, and specific treatments. The search was limited to English language publications since 1 January 2001 (Supplementary Table 1). Additional pertinent articles were added from the reference lists of relevant meta-analyses and systematic reviews.7, 8, 9, 10 and 11

During citation screening, journal articles reporting original data were used. Editorials, letters, abstracts, unpublished reports, and articles published in non-peer-reviewed journals were not included. The Work Group also decided to exclude publications from journal supplements because of potential differences in the process of how they get solicited, selected, reviewed, and edited compared with peer-reviewed publications in main journals. However, one article published in a supplement12 was used for the clarification of adverse events (AEs) related to a study for which primary results were reported elsewhere.13 Selected review articles and key meta-analyses were retained from the searches for background material. An attempt was made to build on or use existing Cochrane or other systematic reviews on relevant topics (Supplementary Table 2).

EXCLUSION/INCLUSION CRITERIA FOR ARTICLE SELECTION FOR TREATMENT QUESTIONS

Search results were screened by members of the ERT for relevance, using the predefined eligibility criteria described in the following paragraphs. For questions related to treatment, the systematic search aimed at identifying RCTs with sample sizes and follow-up periods as described in Table 3.

Restrictions by sample size and duration of follow-up were based on methodological and clinical considerations. Generally, trials with fewer than 25 people per arm would be unlikely to have sufficient power to find significant differences in patient-centered outcomes in individuals with CKD. This is especially true for dichotomous outcomes, such as deaths, cardiovascular clinical events, or fractures. However, for specific topics in which little data were available, lower sample-size thresholds were used to provide some information for descriptive purposes.

The minimum mean duration of follow-up of 6 months was chosen on the basis of clinical reasoning, accounting for the hypothetical mechanisms of action. For treatments of interest, the proposed effects on patient-centered outcomes require long-term exposure and typically would not be evident before several months of follow-up.

Any study not meeting the inclusion criteria for a detailed review could nevertheless be cited in the narrative.
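The sample-size and follow-up restrictions described above can be sketched as a simple screening check. This is a minimal illustration only, assuming the general thresholds stated in this section (RCT design, at least 25 individuals per arm, a mean follow-up of at least 6 months). The `Trial` record and the `meets_default_criteria` function are hypothetical names introduced here, and the topic-specific criteria of Table 3 are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    """Hypothetical record for one screened study (not part of the guideline)."""
    is_rct: bool
    arm_sizes: List[int]            # number of individuals in each study arm
    mean_follow_up_months: float


def meets_default_criteria(trial: Trial,
                           min_per_arm: int = 25,
                           min_follow_up_months: float = 6.0) -> bool:
    """Apply the general thresholds described in the text.

    min_per_arm may be lowered for topics in which little data were
    available; the value to use per topic is an assumption left to the caller.
    """
    if not trial.is_rct or not trial.arm_sizes:
        return False
    # Every arm must reach the minimum size, so test the smallest arm.
    if min(trial.arm_sizes) < min_per_arm:
        return False
    return trial.mean_follow_up_months >= min_follow_up_months
```

A study that fails this check is excluded only from the detailed review; as noted above, it could still be cited in the narrative.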

Interventions of interest are listed in Table 3. For dietary phosphate restriction, the literature search identified no RCTs comparing assignment to different levels of dietary phosphate intake and outcomes of CKD-MBD. There were studies that compared assignment to different levels of protein restriction, and some of them quantified phosphate intake as a result of the dietary protein intervention. The question of dietary protein restriction, however, has been systematically reviewed previously.5 Thus, the Work Group chose a narrative format to review this topic. For the question of how alternative dialysis schedules affect serum calcium, phosphorus, and parathyroid hormone, the Work Group chose to restrict itself to describing only the effects on these laboratory outcomes reported in RCTs comparing different dialysis schedules. A complete review of all outcomes from these studies was deemed to be beyond the scope of this guideline.

Interventions of interest for children included all interventions reviewed in the adult population as well as growth hormone.

The use of observational studies for questions on the efficacy of interventions is a topic of ongoing methodological debate, given the many potential biases in the observational studies of treatment effects. The decision on how to incorporate this type of evidence in the development of this guideline was guided by concepts outlined in the GRADE approach.14 Observational studies of treatment effects start off as 'low quality'. Their quality, however, can be upgraded if they show a consistent and independent, strong association. For the strength of the association, GRADE defines two arbitrary thresholds: one for a relative risk of >2 or <0.5 to upgrade the quality of evidence by one level, and the second for a relative risk of >5 or <0.2 to upgrade by two levels.14 As the quality of observational studies can be downgraded for methodological limitations or indirectness, they can yield high- or moderate-quality evidence only if they have no serious methodological limitations and show a strong or very strong association for a patient-relevant clinical outcome. Thus, the Work Group was asked to identify the observational studies of treatment effects that were relevant to the guideline questions and that showed a relative risk of >2.0 or <0.5 for patient-relevant clinical outcomes. This process for identifying observational studies was used instead of systematic searches on the basis of the assumption that high-quality observational studies of patient-relevant clinical outcomes with large effect sizes would be well known to experts in the field. No observational studies meeting these criteria were identified. Observational studies with smaller estimates of treatment effects for clinical outcomes could be discussed and referenced in the rationale. The ERT cautioned against interpreting observational studies with smaller effect sizes for treatments as high-quality evidence, especially in areas in which RCTs are feasible.
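The upgrading rule for observational studies can be written out as a short sketch. It assumes the four GRADE quality levels (very low, low, moderate, high) and the two relative-risk thresholds given above; the function names and the `downgrade_steps` parameter are hypothetical, and the further GRADE requirement that the association be consistent and independent is a judgment that is not modeled here.

```python
GRADE_LEVELS = ["very low", "low", "moderate", "high"]


def upgrade_steps(relative_risk: float) -> int:
    """Levels to upgrade for strength of association (thresholds from the text)."""
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    if relative_risk > 5 or relative_risk < 0.2:
        return 2  # very strong association
    if relative_risk > 2 or relative_risk < 0.5:
        return 1  # strong association
    return 0


def observational_quality(relative_risk: float, downgrade_steps: int = 0) -> str:
    """Quality of evidence for an observational study of a treatment effect.

    Starts at 'low', moves up for a strong association, and moves down by
    downgrade_steps for methodological limitations or indirectness.
    """
    level = GRADE_LEVELS.index("low") + upgrade_steps(relative_risk) - downgrade_steps
    level = max(0, min(level, len(GRADE_LEVELS) - 1))  # stay within the scale
    return GRADE_LEVELS[level]
```

Under these assumptions, a relative risk of 3.0 with no downgrading gives 'moderate', while the same relative risk with one downgrade returns to 'low', which is why only studies without serious limitations can reach moderate or high quality.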

 

Table 3

 

Table 4

EXCLUSION/INCLUSION CRITERIA FOR ARTICLE SELECTION FOR NONTREATMENT QUESTIONS

For studies related to questions of diagnosis, prevalence, and natural history (Table 4), the ERT completed a search in March 2007, screened the literature yield, and screened abstracts for relevance on the basis of the list of topics and questions. The yield of abstracts was tabulated by citation, population, number of individuals, follow-up time, study design (cross-sectional or longitudinal, prospective or retrospective), and by predictors and outcomes of interest. These lists were reviewed by the Work Group at the second Work Group meeting on 6 March 2007. The Work Group, in subgroups, made decisions to eliminate studies for a number of reasons (including publication prior to 1995, study size, poor study design, or not contributing pertinent information). The Work Group, with the assistance of the ERT, made the final decision for the inclusion or exclusion of all articles. These articles were either reviewed in a narrative form by the Work Group members or were tabulated into overview tables by the ERT and interpreted by the Work Group members. Articles pertinent to these nontreatment questions could be added by the Work Group members after the literature search date of March 2007. This hybrid process of a systematic search and selection of pertinent articles by experts was used to find information that was relevant and deemed important by the Work Group for the specific questions. The final yield of studies for these topics cannot be considered to be comprehensive and thus does not constitute a systematic review. The articles were not data extracted or graded.

The following sections apply to studies included in the systematic reviews of treatment questions.

LITERATURE YIELD FOR SYSTEMATIC REVIEW TOPICS

The literature searches up to December 2007 yielded 15,921 citations. For treatment topics, 92 articles were reviewed in full, of which 49 publications of 38 trials were extracted and included in summary tables. The remaining 43 articles were rejected by the ERT after a review of the full text. Details of the yield can be found in Table 5. An updated search for RCTs was conducted in November 2008. It yielded an extension study of an earlier RCT,15 which was added as an annotation to the respective summary table. Two other RCTs in press were added by the Work Group.

There were no RCTs comparing treatment to different targets of phosphorus or parathyroid hormone levels. Thus, observational studies were reviewed for data on risk relationship to define extreme ranges of risk, rather than treatment targets.

For the question related to parathyroidectomy vs medical management for secondary or tertiary hyperparathyroidism, a search was run for 'parathyroidectomy' and 'kidney disease' published from 2001 to 2008. These dates were used to capture citations published after the final search for the 2003 KDOQI bone guidelines. This search did not reveal any RCTs. Observational studies also did not meet criteria in terms of relative risk or odds ratio; therefore, a list of potential observational studies comparing these two modalities was provided to the Work Group as references for a narrative review.

For the question of calcium supplementation vs other active or control treatments for preventing the development of hyperparathyroidism, the search did not yield any RCTs that met the inclusion criteria. This question had not been specifically addressed in the 2003 KDOQI Bone Guidelines; thus, the literature search with key words pertaining to 'kidney', 'calcium', and 'parathyroid hormone' was not limited to a specific publication year (i.e., 1950 onward).

For the question of bisphosphonates as a treatment for CKD-MBD, one RCT was identified that evaluated the use of bisphosphonates for the prevention of glucocorticoid-induced bone loss in patients with glomerulonephritis.16 As this study predominantly included patients with CKD stages 1-2, and therefore, by definition, did not evaluate CKD-MBD, it was not included in the systematic review table of this topic.

For treatment topics in the pediatric population, 30 articles were reviewed in full. A total of 11 RCTs were identified. If treatment studies in children met the same criteria as those for adult studies, including sample size and follow-up, they were added to adult summary tables.17, 18 Otherwise, they were described in the corresponding section in the narrative. Separate evidence profiles for studies in children were not generated.

For the topic of growth hormone, a Cochrane meta-analysis update (published in January 2007)19 was found to include all studies identified by the ERT through to 16 July 2007. In this meta-analysis, RCTs were identified from the Cochrane Central Register of Controlled Trials, MEDLINE, and EMBASE through to July 2005, as well as from article reference lists, and through contact with local and international experts in the field. The screening criteria were similar to the criteria established by the ERT and Work Group, but were more inclusive in that studies with fewer than five individuals per arm were included. The ERT and the Work Group decided that a summary of this meta-analysis was adequate for the question of growth hormone treatment in children with CKD.

Table 5

DATA EXTRACTION

The ERT designed data extraction forms to capture information on various aspects of primary studies. Data fields for all topics included study setting, patient demographics, eligibility criteria, stage of kidney disease, numbers of individuals randomized, study design, study-funding source, description of mineral bone disorder parameters, descriptions of interventions, description of outcomes, statistical methods, results, quality of outcomes (as described in the following paragraphs), limitations to generalizability, and free-text fields for comments and assessment of biases.

The ERT extracted the baseline data. The Work Group extracted results, including AEs, graded the quality of the data, and listed the limitations to generalizability. Training of the Work Group members to extract data from primary articles occurred during Work Group meetings and by e-mail. The ERT reviewed and checked the data extraction carried out by the Work Group. Discrepancies in grading were resolved with the relevant Work Group members or with the entire Work Group during Work Group meetings. The ERT subsequently condensed the information from the data extraction forms. These condensed forms as well as the original articles were posted on a shared web site that all Work Group members could access to review the evidence. Data extraction of bone histology outcomes was carried out by two Work Group members specialized in that field (Susan Ott and Vanda Jorgetti). The ERT could not proof the results or evidence grades for this outcome. The method applied for assessing bone histomorphometry data by the Work Group experts is described in detail in the next section.

DATA EXTRACTION AND METHODS FOR CATEGORIZING BONE HISTOMORPHOMETRY DATA

The KDIGO position statement about renal osteodystrophy2 recommended that bone biopsy results should be reported on a unified classification system that includes parameters of turnover, mineralization, and volume. The clinical trials with bone histology outcomes reviewed for this guideline, however, were written before this statement, and the bone histomorphometry results were presented in a wide variety of ways. After reviewing the studies that met the inclusion criteria, two Work Group members chose a method that could be applied to most of the reported data. Most reports presented enough information to determine whether patients had changed from one category to another; sometimes this required extrapolation from figures or graphs. The categories are defined in Chapter 3.1, page S34.

The Work Group defined an improvement in turnover as a change from any category to normal, from adynamic or osteomalacia to mild or mixed, from osteitis fibrosa to mild, or from mixed to mild. Worsening bone turnover was defined as a change from normal to any category, from any category to adynamic or osteomalacia, from adynamic or osteomalacia to osteitis fibrosa, or from mild to osteitis fibrosa. These changes are shown in Figure 3, left side.
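The turnover rules above amount to a lookup over pairs of categories, which can be sketched as follows. This is an illustrative reading of the definitions in the preceding paragraph, not the Work Group's own tool: the function name and the 'unchanged' and 'unclassified' return values are assumptions, and transitions the text does not name (for example, mild to mixed) are returned as 'unclassified'.

```python
ADYNAMIC_OR_OM = {"adynamic", "osteomalacia"}
CATEGORIES = ADYNAMIC_OR_OM | {"normal", "mild", "mixed", "osteitis fibrosa"}


def turnover_change(before: str, after: str) -> str:
    """Classify a change in bone turnover category between two biopsies."""
    for category in (before, after):
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
    if before == after:
        return "unchanged"
    if after == "normal":                      # any category -> normal
        return "improved"
    if before == "normal":                     # normal -> any category
        return "worsened"
    if after in ADYNAMIC_OR_OM:                # any -> adynamic or osteomalacia
        return "worsened"
    if before in ADYNAMIC_OR_OM:
        # -> mild or mixed is an improvement; -> osteitis fibrosa is worsening
        return "improved" if after in {"mild", "mixed"} else "worsened"
    if after == "mild" and before in {"osteitis fibrosa", "mixed"}:
        return "improved"
    if before == "mild" and after == "osteitis fibrosa":
        return "worsened"
    return "unclassified"                      # not named in the text
```

Read literally, the rule 'from any category to adynamic or osteomalacia' also makes a change from adynamic to osteomalacia a worsening, and the sketch follows that reading.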

The average change in the bone formation rate could not be used to determine improvement, because a patient with a high bone-formation rate improves when it decreases, whereas a patient with adynamic bone disease must increase bone-formation rate to show improvement. A categorical approach, however, is also not ideal, because a patient could have substantial improvement but remain within a category, whereas another patient with a baseline close to the threshold between categories may change into another category with a small change. Another problem is variable definitions of the mixed category. A better method would be to report the mean change toward normal.20, 21 Most of the reports, however, did not provide enough detail to analyze biopsies in this manner.
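The argument about average change can be made concrete with a small sketch. It is not the method of the cited reports, whose details are not given in this chapter; it is one simple way to formalize 'change toward normal', assuming a single reference value stands in for the normal range. The function names and all numbers are made up for illustration.

```python
from typing import Iterable, Tuple


def change_toward_normal(baseline: float, follow_up: float, normal: float) -> float:
    """Reduction in distance from the reference value.

    Positive when the follow-up value is closer to `normal` than the
    baseline value, regardless of the direction of the change.
    """
    return abs(baseline - normal) - abs(follow_up - normal)


def mean_change_toward_normal(pairs: Iterable[Tuple[float, float]],
                              normal: float) -> float:
    """Average change toward normal over (baseline, follow_up) pairs."""
    changes = [change_toward_normal(b, f, normal) for b, f in pairs]
    if not changes:
        raise ValueError("at least one biopsy pair is required")
    return sum(changes) / len(changes)
```

With a hypothetical reference value of 4.0, a patient with a high bone-formation rate going from 10.0 to 6.0 and a patient with adynamic disease going from 1.0 to 3.0 have a plain mean change of -1.0, which hides the fact that both improved. The change toward normal is 4.0 and 2.0, respectively, for a mean of 3.0.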

With some treatments, an overall index of improvement does not convey all the important information, because the results have to be interpreted in the context of the original disease. For example, a medicine that decreased bone turnover could be beneficial if the original disease was osteitis fibrosa, but harmful if the patient had adynamic disease.

Assessing mineralization was more straightforward. An increase in mean osteoid volume, osteoid thickness, or mineralization lag time indicates a worsening of mineralization. Using categories, an improvement would be a change from mixed or osteomalacia to normal, adynamic, or osteitis fibrosa; worsening would be a change to the osteomalacia or mixed categories (Figure 3, right side).
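The categorical rule for mineralization can be sketched in the same way. The sketch assumes that the categories split into two groups, those with a mineralization defect (mixed, osteomalacia) and those without (normal, adynamic, osteitis fibrosa), as the paragraph above implies. The function name is hypothetical, and reporting transitions within one group as 'no change' is an assumption rather than something the text states.

```python
DEFECT = {"mixed", "osteomalacia"}
NO_DEFECT = {"normal", "adynamic", "osteitis fibrosa"}


def mineralization_change(before: str, after: str) -> str:
    """Classify a change in mineralization from the biopsy categories."""
    for category in (before, after):
        if category not in DEFECT | NO_DEFECT:
            raise ValueError(f"unknown category: {category!r}")
    if before in DEFECT and after in NO_DEFECT:
        return "improved"
    if before in NO_DEFECT and after in DEFECT:
        return "worsened"
    return "no change"  # both biopsies fall in the same group (assumption)
```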

Figure 3 | Parameters of bone turnover, mineralization, and volume. Ady, adynamic bone disease; O. Fib, osteitis fibrosa; OM, osteomalacia.

For bone volume, an increase is usually an indication of improvement. Exceptions would be when patients develop osteosclerosis, but this is unusual. Most reports did not take bone volume into account. The studies also did not usually report differences in cortical vs cancellous bone, or report other structural parameters.

SUMMARY TABLES

Summary tables were developed to tabulate data from studies pertinent for each treatment question. Each summary contains three sections: a 'Baseline Characteristics Table', an 'Intervention and Results Table', and an 'Adverse Events Table'. Baseline Characteristics Tables include a description of the study size, the study population at baseline, demographics, country of residence, duration on dialysis, calcium concentration in the dialysis bath, diabetes status, previous use of aluminum-based phosphate binders, and findings on baseline MBD laboratory, bone, and calcification tests. Intervention and Results Tables describe the studies according to four dimensions: study size, follow-up duration, results, and methodological quality. Adverse Events Tables include study size, type of AEs, numbers of patients who discontinued treatment because of AEs, number of patients who died, and those who changed modality (including those who received a kidney transplant). The Work Group specified AEs of interest for each particular intervention (for example, hypercalcemia). Work Group members proofed all summary table data and quality assessments.

To provide consistency throughout the summary tables, data were sometimes converted or estimated. When follow-up times were reported in weeks, the results were converted into months by estimating 1 month as 4 weeks. Conventional units were converted into SI units, with the exception of creatinine clearance.
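The follow-up conversion is a single division, sketched below with the 4-weeks-per-month convention stated above. The function name is hypothetical, and the conversion of conventional units to SI units is not shown because the conversion factors are not given in this chapter.

```python
WEEKS_PER_MONTH = 4  # convention used for the summary tables


def weeks_to_months(weeks: float) -> float:
    """Convert a follow-up time in weeks to months, estimating 1 month as 4 weeks."""
    if weeks < 0:
        raise ValueError("follow-up time cannot be negative")
    return weeks / WEEKS_PER_MONTH
```

Under this convention a 24-week trial is tabulated as 6 months of follow-up, and a 52-week trial as 13 months rather than 12, which shows that the converted values are estimates.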

EVIDENCE PROFILES

Evidence profiles were constructed by the ERT to record decisions with regard to the estimates of effect, quality of evidence for each outcome, and quality of overall evidence across all outcomes. These profiles serve to make transparent to the reader the thinking process of the Work Group in systematically combining evidence and judgments. Each evidence profile was reviewed by Work Group experts. Decisions were taken on the basis of data and results from the primary studies listed in corresponding summary tables, and on judgments of the Work Group.

Judgments with regard to the quality, consistency, and directness of evidence were often complex, as were judgments regarding the importance of an outcome or the net effect and overall quality of evidence across all outcomes. The evidence profiles provided a structured approach to grading, rather than a rigorous method of quantitatively summing up grades. When the body of evidence for a particular question or for a comparison of interest consisted of only one study, the summary table provided the final level of synthesis and an evidence profile was not generated.

EVIDENCE MATRICES

Evidence matrices were generated for each systematic review for a treatment question. The matrix shows the quantity and quality of evidence reviewed for each outcome of interest. Each study retained in the systematic review is tabulated with the description of its authors, year of publication, sample size, mean duration of follow-up, and the quality grade for the respective outcome. Conceptually, information in the upper left corner shows high-quality evidence for outcomes of high importance. Information in the lower right corner shows low-quality evidence for outcomes of lesser importance. Evidence for AEs was not graded for quality, but still tabulated in one column in the matrices.
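The layout of an evidence matrix can be sketched as a nested mapping. The sketch assumes that rows are outcomes ordered from most to least important and that columns are quality grades ordered from best to worst, so that the upper left cell holds high-quality evidence for the most important outcome. The grade labels 'A', 'B', and 'C', the function name, and the study labels are hypothetical; the actual three-level classification is defined in Table 6, and the separate AE column is not modeled.

```python
from typing import Dict, Iterable, List, Tuple

QUALITY_COLUMNS = ["A", "B", "C"]  # hypothetical labels, best to worst


def build_evidence_matrix(
    entries: Iterable[Tuple[str, str, str]],
    outcomes_by_importance: List[str],
) -> Dict[str, Dict[str, List[str]]]:
    """Arrange (study, outcome, quality grade) entries into a matrix.

    Rows follow outcomes_by_importance (most important first); each row
    has one list of study labels per quality column.
    """
    matrix = {
        outcome: {grade: [] for grade in QUALITY_COLUMNS}
        for outcome in outcomes_by_importance
    }
    for study, outcome, grade in entries:
        if outcome not in matrix:
            raise ValueError(f"outcome not listed: {outcome!r}")
        if grade not in QUALITY_COLUMNS:
            raise ValueError(f"unknown quality grade: {grade!r}")
        matrix[outcome][grade].append(study)
    return matrix
```

Because quality was graded per outcome, the same study can appear in different columns in different rows, for example under 'A' for a laboratory outcome and under 'C' for mortality.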

An evidence matrix was not generated for a systematic review topic when the yield for the topic was only one study that met inclusion criteria, as the entire study is summarized in the summary table that contains all relevant information.

An overall evidence matrix was generated to show the yield of all studies included in summary tables for all interventions of interest. This overall evidence matrix shows the entire yield for all treatment questions, both in terms of outcomes reviewed and the quality of evidence for each outcome in each study. Single studies that did not warrant an individual evidence matrix (that is, they were the only studies for a specific intervention question) were still included in the overall evidence matrix.

 

Table 6

GRADING OF QUALITY OF EVIDENCE FOR OUTCOMES IN INDIVIDUAL STUDIES

Study size and duration

The study (sample) size is used as a measure of the weight of evidence. In general, large studies provide more precise estimates. Similarly, longer-duration studies may be of better quality and more applicable, depending on other factors.

Methodological quality

Methodological quality (or internal validity) refers to the design, conduct, and reporting of the outcomes of a clinical study. A three-level classification of study quality was previously devised (Table 6). Given the potential differences in the quality of a study for its primary and other outcomes, study quality was assessed for each outcome.

The evaluation of questions of interventions included RCTs. The grading of the outcomes of these studies included a consideration of the methods (for example, duration, type of blinding, and number and reasons for dropouts), population (that is, does the population studied introduce bias?), outcome definition/measurement, and thoroughness/precision of reporting and statistical methods (that is, was the study sufficiently powered and were the statistical methods valid?).

Results

The type of results used from a study was determined by the study design, the purpose of the study, and the Work Group's question(s) of interest. Decisions were based on screening criteria and outcomes of interest (Table 3).

Approach to grading

A structured approach, modeled after GRADE,14, 22, 23, 27 and facilitated by the use of Evidence Profiles and Evidence Matrices, was used to determine a grade that described the quality of the overall evidence and a grade for the strength of a recommendation. For each topic, the discussion on grading of the quality of evidence was led by the ERT, and the discussion regarding the strength of the recommendations was led by the Work Group Chairs.

Grading the quality of evidence for each outcome

The 'quality of a body of evidence' refers to the extent to which our confidence in an estimate of effect is sufficient to support a particular recommendation (GRADE Working Group, 2008).24 Following GRADE, the quality of a body of evidence pertaining to a particular outcome of interest is initially categorized on the basis of study design. For questions of interventions, the initial quality grade is 'High' if the body of evidence consists of RCTs, or 'Low' if it consists of observational studies, or 'Very Low' if it consists of studies of other study designs. For questions of interventions, the Work Group graded only RCTs. The grade for the quality of evidence for each intervention/outcome pair was then decreased if there were serious limitations to the methodological quality of the aggregate of studies; if there were important inconsistencies in the results across studies; if there was uncertainty about the directness of evidence including a limited applicability of findings to the population of interest; if the data were imprecise or sparse; or if there was thought to be a high likelihood of bias. The final grade for the quality of evidence for an intervention/outcome pair could be one of the following four grades: 'High', 'Moderate', 'Low', or 'Very Low' (Table 7).
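The two-step logic described above (an initial grade by study design, followed by downgrading) can be sketched as follows. This is a hedged illustration, not the Work Group's procedure: the text states only that the grade was decreased for each type of limitation, so the assumption that each concern lowers the grade by a whole number of levels, that concerns add up, and that the result is floored at 'Very Low' is ours. The names `grade_outcome` and `INITIAL_GRADE` and the dictionary keys are hypothetical.

```python
from typing import Mapping

# The four final grades named in the text, ordered from best to worst.
GRADES = ["High", "Moderate", "Low", "Very Low"]

# Initial grade by study design, as stated in the text.
INITIAL_GRADE = {"rct": "High", "observational": "Low", "other": "Very Low"}


def grade_outcome(study_design: str, downgrades: Mapping[str, int]) -> str:
    """Return the final quality grade for one intervention/outcome pair.

    `downgrades` maps a concern (for example "inconsistency") to the number
    of levels to subtract. ASSUMPTION: levels are additive and floored at
    'Very Low'; the source does not specify the arithmetic.
    """
    if any(levels < 0 for levels in downgrades.values()):
        raise ValueError("downgrade levels must be non-negative")
    start = GRADES.index(INITIAL_GRADE[study_design])
    final = min(start + sum(downgrades.values()), len(GRADES) - 1)
    return GRADES[final]


print(grade_outcome("rct", {}))  # High
print(grade_outcome("rct", {"methodological_limitations": 1, "imprecision": 1}))  # Low
```

In practice, as the text notes, these judgments were often complex, and the evidence profiles recorded them rather than computed them.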

Grading the overall quality of evidence

The quality of the overall body of evidence was then determined on the basis of the quality grades for all outcomes of interest, taking into account explicit judgments about the relative importance of each outcome. The resulting four final categories for the quality of overall evidence were 'A', 'B', 'C', or 'D' (Table 8).14 This grade for overall evidence is indicated after the strength of each recommendation. The summary of the overall quality of evidence across all outcomes proved to be very complex. Thus, as an interim step, the evidence profiles recorded the quality of evidence for each of three outcome categories: patient-centered outcomes, other bone and vascular surrogate outcomes, and laboratory outcomes. The overall quality of evidence was determined by the Work Group and is based on an overall assessment of the evidence. It reflects that, for most interventions and tests, there is no high-quality evidence for net benefit in terms of patient-centered outcomes.

Assessment of the net health benefit across all important clinical outcomes

Net health benefit was determined on the basis of the anticipated balance of benefits and harm across all clinically important outcomes. The assessment of net health benefit was affected by the judgment of the Work Group and ERT. The assessment of net health benefit is summarized in one of the following statements: (i) there is net benefit from intervention when benefits outweigh harm; (ii) there is no net benefit; (iii) there are tradeoffs between benefits and harm when harm does not altogether offset benefits, but requires consideration in decision making; or (iv) uncertainty remains regarding net benefit (Table 9).

 

Table 7

 

Table 8 (ref. 23)

GRADING THE STRENGTH OF THE RECOMMENDATIONS

The 'strength of a recommendation' indicates the extent to which one can be confident that adherence to the recommendation will do more good than harm. The strength of a recommendation is graded as Level 1 or Level 2.23

Table 10 shows the nomenclature for grading the strength of a recommendation and the implications of each level for patients, clinicians, and policy makers. Recommendations can be for or against doing something.

Table 11 shows that the strength of a recommendation is determined not just by the quality of evidence, but also by other, often complex judgments regarding the size of the net health benefit, values and preferences, and costs. Formal decision analyses, including cost analysis, were not conducted. Where there is doubt regarding the balance of benefits and harm with respect to patient-centered outcomes, or when the quality of evidence is too low to assess balance, the recommendation is necessarily a 'Level 2'.
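The constraint in the last sentence above can be written as a small rule. This is a sketch only: the function name and parameter names are hypothetical, and the rule captures only the stated restriction to Level 2, not the broader judgments (net benefit, values and preferences, costs) that determine the final strength.

```python
def allowed_strengths(balance_in_doubt: bool, quality_too_low_to_assess: bool):
    """Return the recommendation levels still open to the Work Group.

    Per the text, doubt about the balance of benefits and harm, or evidence
    too weak to assess that balance, restricts the recommendation to Level 2.
    """
    if balance_in_doubt or quality_too_low_to_assess:
        return (2,)
    # Otherwise either level remains possible; the choice is a judgment.
    return (1, 2)


print(allowed_strengths(balance_in_doubt=True, quality_too_low_to_assess=False))   # (2,)
print(allowed_strengths(balance_in_doubt=False, quality_too_low_to_assess=False))  # (1, 2)
```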

UNGRADED STATEMENTS

The Work Group felt that having a category that allows it to issue general advice would be useful. For this purpose, the Work Group chose the category of a recommendation that was not graded. Typically, this type of ungraded statement met the following criteria: it provides guidance on the basis of common sense; it provides reminders of the obvious; and it is not sufficiently specific to allow an application of evidence to the issue, and therefore it is not based on a systematic evidence review. Common examples include recommendations regarding the frequency of testing, referral to specialists, and routine medical care. The ERT and Work Group strove to minimize the use of ungraded recommendations.

 

Table 9 (ref. 23)

FORMULATION AND VETTING OF RECOMMENDATIONS

The selection of specific wording for each of the statements was a time-intensive process. In addition to striving for the recommendations to be clear and actionable, the wording also considered grammar, proper English word usage, and the ability of concepts to be translated accurately into other languages. The final wording of recommendations and the corresponding grades for the strength of the recommendations and the quality of evidence were voted upon by the Work Group, and required a majority to be accepted. The process of peer review was a serious undertaking. It included an internal review by the KDIGO Board of Directors and an external review by the public to ensure widespread input from numerous stakeholders, including patients, experts, and industry and national organizations, and then another internal review by the KDIGO Board of Directors.

 

Table 10

 

Table 11 (ref. 23)

FORMAT FOR CHAPTERS

Each chapter contains one or more specific 'recommendations'. Within each recommendation, the strength of the recommendation is indicated as Level 1 or Level 2, and the quality of the overall supporting evidence is shown as A, B, C, or D. The recommendations are followed by a section that describes the chain of logic, which consists of declarative sentences summarizing the key points of the evidence base and the judgments supporting the recommendation. This is followed by a narrative that provides the supporting rationale and includes data tables where appropriate. In relevant sections, research recommendations suggest future research to resolve current uncertainties.
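Users who record the grades alongside each recommendation may find it convenient to split a grade into its two parts. The sketch below assumes a compact label such as "2C" (strength level followed by the letter for the quality of overall evidence); this label format, the function name `parse_grade`, and the pattern are illustrative assumptions rather than a format specified in this chapter. Ungraded statements carry no such label and are rejected.

```python
import re

# Strength is Level 1 or 2; quality of overall evidence is A, B, C, or D.
_GRADE_PATTERN = re.compile(r"^([12])([ABCD])$")


def parse_grade(label: str):
    """Split a label such as '2C' into (strength_level, evidence_quality)."""
    match = _GRADE_PATTERN.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a graded recommendation label: {label!r}")
    return int(match.group(1)), match.group(2)


print(parse_grade("2C"))  # (2, 'C')
print(parse_grade("1A"))  # (1, 'A')
```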

COMPARISON WITH OTHER GUIDELINES

The reconciliation of a guideline with other guidelines reduces potential confusion related to variability or discrepancies in guideline recommendations. At the beginning of the guideline process, the ERT searched for other current guidelines on CKD-MBD and compiled them by topic. This information was submitted to the Work Group to highlight those topics that other guidelines had addressed and what recommendations had been issued. However, given the global nature of the KDIGO guidelines, it was felt that judging how any guideline might be applicable in a particular setting would require a process of 'guideline adoption', and that it would be the task of a local 'guideline adoption group' to review and reconcile the recommendations of the KDIGO guideline with those of other guidelines pertinent and applicable to its country or context. Thus, this KDIGO guideline does not contain a comparison of how the recommendations from this KDIGO Work Group differ from those of other existing guidelines.

LIMITATIONS OF APPROACH

Although the literature searches were intended to be comprehensive, they were not exhaustive. MEDLINE was the only database searched, and the search was limited to English language publications. Hand searches of journals were not performed, and review articles and textbook chapters were not systematically searched. However, important studies known to domain experts, which were missed by the electronic literature searches, were added to the retrieved articles and reviewed by the Work Group. Nonrandomized studies were not systematically reviewed. The majority of the ERT and Work Group resources were devoted to a detailed review of randomized trials, as these were deemed most likely to provide data to support treatment recommendations with higher quality evidence. Where randomized trials were lacking, it was deemed to be sufficiently unlikely that studies previously unknown to the Work Group would result in higher quality evidence. The quality of evidence for patient-relevant clinical outcomes was low. Usually, low-quality evidence required a substantial use of expert judgment in deriving a recommendation from the evidence reviewed.

SUMMARY OF THE PROCESS

Several tools and checklists have been developed to assess the quality of the guideline development process and to enhance the quality of guideline reporting. These include the Appraisal of Guidelines for Research and Evaluation (AGREE) criteria25 and the Conference on Guideline Standardization (COGS) checklist.26 Supplementary Table 3 shows the key features of the guideline development process according to the COGS checklist.

SUPPLEMENTARY MATERIAL

Supplementary Table 1. Literature search strategy.
Supplementary Table 2. Use of other relevant systematic reviews and meta-analyses.
Supplementary Table 3. Key features of the guideline.
Supplementary material is linked to the online version of the paper at http://www.nature.com/ki