Validity and Relevance

How Do We Evaluate Validity and Relevance?

Not all research is good or applicable to medical practice. Once we have selected the studies on which to base the Daily POEMs, we present the information in a reader-friendly format that provides a thorough validity assessment of the research. Our reviewers write concise, structured reviews of articles and assessments of research design using criteria developed by the Evidence-Based Medicine (EBM) Working Group (Oxman AD, Sackett DL, Guyatt GH. Users' guides to the medical literature: I. How to get started. JAMA 1993;270:2093-5).

Specific Criteria for Evaluating Validity

Basic Criteria: We only review original research articles and systematic reviews that provide POEMs, Patient-Oriented Evidence that Matters. Preliminary results or evaluations reporting on intermediate or surrogate outcomes usually are not reviewed.

Criteria for Relevance

  1. Did the authors study an outcome that patients would care about?
    We do not include results that require extrapolation to outcomes that truly matter to patients.
  2. Is the problem studied one that is common to primary care and is the intervention feasible?
    Only information that can be implemented in primary care practice is reviewed.
  3. Will the information, if true, require a change in current practice?
    Information that confirms existing standards of practice generally will not be reviewed.

All studies must meet all three criteria or they will not be reviewed.

Studies of Treatments

Studies of treatments, whether a drug, device, or other intervention, must be randomized, controlled trials. Since most new, relevant medical information involves advances in treatment, these studies must withstand rigorous review.

Validity questions:

  1. Was it a controlled trial and were the subjects randomly assigned?
    Studies not meeting both criteria are not reviewed.
  2. Are the patients in the study so dissimilar to typical primary care patients that the results will not apply?
    Studies performed on patients enrolled in settings markedly different from primary care will not be reviewed.
  3. Were steps taken to conceal the treatment assignment from personnel entering patients into the study?
    "Concealed allocation," through the use of opaque envelopes, centralized randomization, or other methods, prevents selective enrollment of patients into a study. It is not the same as blinding, which occurs after the study has started. Our concern here is with the people enrolling subjects into the trial: before the trial starts, they should not know the group to which the next patient will be allocated, because this knowledge might bias them and affect the way they enroll patients. The use of concealed allocation generally will be noted in POEMs reviews but not in Evidence-Based Practice. If the allocation concealment is unclear, we will still include the study unless there is a good chance that unconcealed allocation could produce a systematic bias (e.g., when popular opinion favors one treatment over another or when a skewed distribution of disease severity may affect the study outcome).
  4. Were all patients who entered the trial properly accounted for at its conclusion?
    Follow-up of patients entering the trial will be assessed. Studies with incomplete follow-up or large dropout rates (>20%) will not be reviewed.
  5. Was intention-to-treat analysis performed? Were patients analyzed in the group to which they were randomized?
    Keeping patients in their initial groups, allowing noncompliance or treatment failure to affect the outcomes, better reflects the effectiveness of a therapy in actual practice. Lack of intention-to-treat analysis ("efficacy analysis") will be reported.
  6. Were patients and study personnel blind to treatment assignment?
    Lack of blind (masked) assessment will be noted if outcome measures are subjective (e.g., pain relief, TIA, general assessment).
  7. Were the intervention and control groups similar?
    The effectiveness of the randomization process will be assessed. Large imbalances between the groups may invalidate the study, in which case it will not be reviewed. Worrisome, though not large, imbalances will be reported.
  8. If a negative trial, was the power of the study adequate?
    The power of a study is its ability to find a difference between two therapies if one truly exists. It depends on the magnitude of the difference in effect between the two therapies and the number of patients enrolled in the study. Sample size calculations and effect difference estimates will be evaluated.
  9. Were there other factors that might have affected the outcome?
    Potential confounders will be mentioned.
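
The relationship between power, effect size, and enrollment described in question 8 can be sketched numerically. The example below is an illustration only, not the method our reviewers use: it assumes a two-arm trial with a binary outcome, a two-sided test, and the usual normal approximation. The function names and the event rates in the usage note are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate number of patients needed in each arm.

    Assumes a binary outcome, equal arm sizes, a two-sided test,
    and the normal approximation with unpooled variances.
    """
    if p_control == p_treatment:
        raise ValueError("The two event rates must differ.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


def approximate_power(p_control, p_treatment, n_per_group, alpha=0.05):
    """Approximate power of a trial that enrolled n_per_group patients per arm."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    standard_error = (variance / n_per_group) ** 0.5
    return NormalDist().cdf(abs(p_control - p_treatment) / standard_error - z_alpha)
```

For example, `sample_size_per_group(0.20, 0.10)` estimates the enrollment needed to detect a drop in event rate from 20% to 10%, and `approximate_power(0.20, 0.10, 50)` shows how far a trial with only 50 patients per arm falls short of 80% power. A smaller expected difference requires a larger trial.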

Studies of Diagnosis

Studies of diagnostic tests, whether in a lab or as part of the physical exam, must demonstrate that:

  • the test is accurate at identifying the disease when it is present
  • the test does not identify the disease when it isn't present
  • it works well over a wide spectrum of patients with and without the disease

Validity questions:

  1. What is the disease being addressed?
    Studies evaluating a diagnostic test that identifies an abnormality but not a disease generally are not reviewed.
  2. Is the test compared with an acceptable "gold standard"?
    The characteristics of the new test should be compared with the best available method for identifying the disease.
  3. Were both tests applied in a uniformly blind manner?
    This question determines whether every patient received both tests and whether either test was performed with knowledge of the other's results, which could introduce bias.
  4. Is the new test reasonable?
    Studies evaluating diagnostic tests that cannot be readily implemented by primary care clinicians will not be reviewed.
  5. What is the prevalence of disease in the study population?
    The prevalence of disease in the study population will be reported so that readers can compare it with their own practice.
  6. What are the test characteristics?
    The sensitivity, specificity, predictive values, and likelihood ratios will be reported. These will be calculated from data in the study if not reported by the authors.
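
When authors do not report the test characteristics listed in question 6, they can be derived from the 2x2 table of test results against the gold standard. The sketch below shows the standard definitions. The class and field names are hypothetical, and it assumes no row or column of the table is empty and that specificity and sensitivity are below 100% (otherwise a likelihood ratio would divide by zero).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of new-test results cross-classified against the gold standard."""
    true_pos: int   # test positive, disease present
    false_pos: int  # test positive, disease absent
    false_neg: int  # test negative, disease present
    true_neg: int   # test negative, disease absent

    @property
    def prevalence(self):
        total = self.true_pos + self.false_pos + self.false_neg + self.true_neg
        return (self.true_pos + self.false_neg) / total

    @property
    def sensitivity(self):
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self):
        return self.true_neg / (self.true_neg + self.false_pos)

    @property
    def positive_predictive_value(self):
        return self.true_pos / (self.true_pos + self.false_pos)

    @property
    def negative_predictive_value(self):
        return self.true_neg / (self.true_neg + self.false_neg)

    @property
    def positive_likelihood_ratio(self):
        return self.sensitivity / (1 - self.specificity)

    @property
    def negative_likelihood_ratio(self):
        return (1 - self.sensitivity) / self.specificity
```

With made-up counts such as `TwoByTwo(true_pos=90, false_pos=20, false_neg=10, true_neg=80)`, sensitivity is 0.90, specificity is 0.80, and the positive likelihood ratio is 4.5. The predictive values depend on the study's prevalence (0.50 here), which is why prevalence is reported alongside them.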


Systematic Reviews

Only systematic reviews and meta-analyses are included; traditional, non-systematic review articles are not.

Validity questions:

  1. Were the methods used to locate relevant studies comprehensive and clearly stated?
    Reviews not stating the method of locating studies will not be reviewed.
  2. Were explicit methods used to select studies to include in the overview?
    Reviews not stating methods of including or excluding studies will not be reviewed.
  3. Was the validity of the original studies included in the overview appropriately assessed?
    Reviews not stating the method used to assess the validity of the original studies will not be reviewed. Reviews can include or exclude studies based on quality scores. Reviews including all studies irrespective of their quality scores should present the validity evaluation; reviews eliminating studies based on low quality should explicitly describe how these studies were eliminated.
  4. Was the assessment of the relevance and validity of the original studies reproducible and free from bias?
    Published methods developed by others for assessing relevance or validity can be referenced, or new criteria can be described. Generally, validity assessment should be performed independently by at least two investigators.
  5. Was variation between the results of the relevant studies analyzed?
    Heterogeneity in study results should be evaluated and, if present, explained.
  6. Were the results combined appropriately?
    When results from different studies are combined, only similar outcomes should be combined. Reviews that attempt to convert study results from one scale to another generally won't be considered.
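
Question 5 asks whether variation between study results was analyzed. One common way to quantify that variation, which we offer only as an illustration since reviews may use other methods, is Cochran's Q with the derived I² statistic under fixed-effect inverse-variance pooling. The function name and inputs below are hypothetical; each study contributes an effect estimate on a common scale and its variance.

```python
def heterogeneity(effects, variances):
    """Return (pooled_effect, cochran_q, i_squared) for a set of studies.

    effects   -- per-study effect estimates on the same scale (e.g., log odds ratios)
    variances -- the corresponding within-study variances
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("Need matching effects and variances for at least two studies.")
    weights = [1 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    degrees_of_freedom = len(effects) - 1
    # I^2: share of total variation attributed to between-study differences
    i_squared = max(0.0, (q - degrees_of_freedom) / q) if q > 0 else 0.0
    return pooled, q, i_squared
```

Studies with identical effects give Q = 0 and I² = 0. Widely scattered effects with small variances push I² toward 1, which signals that the review should explain the differences rather than rely on a single pooled number.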

Studies About Prognosis

The main threats to studies of prognosis are initial patient identification and loss to follow-up. We include only prognosis studies that identify patients before they have the outcome of importance and are able to follow up with at least 80% of them.

Validity questions:

  1. Was an "inception cohort" assembled? Did the investigators identify a specific group of people and follow them forward in time?
    Studies that do not assemble an inception cohort are not reviewed.
  2. Were the criteria for entry into the study objective and reasonable?
    Entry criteria must be reproducible and not too restrictive or too broad.
  3. Was follow-up of subjects adequate (at least 80%)?
  4. Were the patients similar to those in primary care in terms of age, sex, race, severity of disease, and other factors that might influence the course of the disease?
  5. Where did the subjects come from--was the referral pattern specified?
    The source of subjects will be noted in the review.
  6. Were outcomes assessed objectively and blindly?

Decision Analysis

Decision and cost-effectiveness analyses are included if they consider all relevant strategies and perform a comprehensive review of the literature to determine costs, benefits, and harms. Though all clinical decisions are made under conditions of uncertainty, this uncertainty decreases when the medical literature includes directly relevant, valid evidence. When the published evidence is scant, or less valid, uncertainty increases. Decision analysis allows clinicians to compare the expected consequences of pursuing different strategies under conditions of uncertainty. In a sense, decision analysis is an attempt to artificially construct POEMs out of DOEs (disease-oriented evidence).

Validity questions:

  1. Were all important strategies and outcomes included?
    Analyses evaluating only some outcomes or strategies will not be reviewed.
  2. Was an explicit and sensible process used to identify, select, and combine the evidence into probabilities? Is the evidence strong enough?
  3. Were the utilities obtained in an explicit and sensible way from credible sources?
    Specifically, we will determine whether utilities were obtained from small samples or from groups not afflicted with the disease or outcome.
  4. Was the potential impact of any uncertainty in the evidence determined?
    We will determine whether a sensitivity analysis was performed to determine how robust the analysis is under different conditions.
  5. How strong is the evidence used in the analysis? Could the uncertainty in the evidence change the result?
    We will report on whether a given variable unduly influences the analysis.
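
The sensitivity analysis mentioned in question 4 can be illustrated with a toy decision model. The sketch below is a simplified assumption, not a model from any reviewed study: two strategies, a single uncertain variable (the probability of disease), and invented utilities on a 0-to-1 scale. All names and numbers are hypothetical.

```python
def expected_utility(p_disease, u_if_disease, u_if_no_disease):
    """Probability-weighted average utility of one strategy."""
    return p_disease * u_if_disease + (1 - p_disease) * u_if_no_disease


# Hypothetical utilities: (utility if diseased, utility if not diseased).
STRATEGIES = {
    "treat": (0.95, 0.98),         # treatment helps the diseased, has a small cost for others
    "do not treat": (0.70, 1.00),  # untreated disease is costly, the healthy are unaffected
}


def preferred_strategy(p_disease, strategies=STRATEGIES):
    """Name of the strategy with the highest expected utility at this probability."""
    return max(
        strategies,
        key=lambda name: expected_utility(p_disease, *strategies[name]),
    )


def one_way_sensitivity(p_values, strategies=STRATEGIES):
    """Vary the probability of disease and record the preferred strategy each time."""
    return [(p, preferred_strategy(p, strategies)) for p in p_values]
```

Calling `one_way_sensitivity([0.01, 0.05, 0.10, 0.20])` shows the preferred strategy switching from "do not treat" to "treat" as the probability of disease rises. If the switch happens within the plausible range of the uncertain variable, the conclusion of the analysis is not robust, and that variable is one we would flag as unduly influencing the result.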

Qualitative Research

Qualitative research uses non-quantitative methods to answer questions. While this type of research is able to investigate questions that quantitative research cannot, it is at risk for bias and error on the part of the researcher. Qualitative research findings will be reported if they are highly relevant, although specific conclusions will not be drawn from the results.

Validity questions:

  1. Was the appropriate method used to answer the question?
    Interviews or focus groups should be used to study perceptions. Observation is required to evaluate behaviors. Studies not using the appropriate method will not be reviewed.
  2. Was appropriate and adequate sampling used to get the best information?
    Random sampling is not used in qualitative research. Instead, subjects are selected with the idea that they are best suited to provide appropriate information. Assurance that enough people were studied to provide sufficient information should be found in the description.
  3. Was an iterative process of collecting information used?
    In qualitative research, the researcher learns about the topic as the research progresses. The study design should consist of data collection and analysis, followed by more data collection and analysis, in an iterative fashion, until no more information is obtained.
  4. Was a thorough analysis presented?
    A good qualitative study not only presents the findings but also provides a thorough analysis of the data.
  5. Are the background and training of the investigators described?
    Since the investigator is being relied on for analysis of the data, we must know their training and biases. Knowing these characteristics, we can use them to evaluate their conclusions.