About Prevalence and Incidence Statistics


Statistical information such as prevalence, incidence, and death counts comes from numerous sources and is subject to numerous provisos. It is intended to be useful, though it may not be completely accurate.

Prevalence versus incidence: Prevalence and incidence are different measures of a disease's occurrence. The "prevalence" of a condition is the number of people who currently have the condition, whereas "incidence" is the number of new cases of the condition that arise each year. These two measures can differ greatly. A chronic incurable disease like diabetes can have a low incidence but high prevalence, because cases accumulate: the prevalence reflects new cases from all past years, less the people who have since died. A short-duration curable condition such as the common cold can have a high incidence but low prevalence, because many people get a cold each year, but few people actually have a cold at any given time.
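The relationship between the two measures can be sketched with a common steady-state approximation: prevalence is roughly incidence multiplied by the average duration of the condition. This approximation is an assumption added here for illustration, not a method this site states it uses, and all figures below are hypothetical.

```python
def steady_state_prevalence(annual_incidence: float, mean_duration_years: float) -> float:
    """Approximate the number of people who have a condition at any one time.

    Assumes a stable situation where prevalence ~= incidence x average duration.
    """
    return annual_incidence * mean_duration_years


# Hypothetical figures for an imaginary population (illustrative only).
# A chronic condition: few new cases per year, but each lasts about 20 years.
chronic = steady_state_prevalence(annual_incidence=5_000, mean_duration_years=20)

# A short illness: many new cases per year, but each lasts about a week.
short_illness = steady_state_prevalence(annual_incidence=2_000_000,
                                        mean_duration_years=7 / 365)

print(f"chronic condition: {chronic:,.0f} people at any given time")
print(f"short illness:     {short_illness:,.0f} people at any given time")
```

With these made-up inputs, the chronic condition has a prevalence far above its annual incidence, while the short illness has a prevalence far below its annual incidence.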

Maximum of prevalence or incidence: Taking the larger of the prevalence and incidence numbers for a disease gives a reasonably useful indicator. It is a kind of "people affected" measure that approximates the number of people who would have to deal with a condition in any given year.
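A minimal sketch of this "people affected" indicator follows. The function name and the handling of missing values are assumptions made for illustration; the text above only says that the larger of the two numbers is taken.

```python
from typing import Optional


def people_affected(prevalence: Optional[float],
                    incidence: Optional[float]) -> Optional[float]:
    """Return the larger of prevalence and incidence.

    Assumption: if only one figure is known, that figure is used;
    if neither is known, the result is None.
    """
    known = [value for value in (prevalence, incidence) if value is not None]
    return max(known) if known else None


# Hypothetical figures (illustrative only).
print(people_affected(prevalence=100_000, incidence=5_000))     # chronic condition
print(people_affected(prevalence=40_000, incidence=2_000_000))  # short illness
print(people_affected(prevalence=None, incidence=2_000))        # only incidence known
```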

Problems with prevalence data: Prevalence attempts to measure the number of people affected by a condition at any given time. Two estimates of prevalence are not necessarily comparable. Some estimates attempt to quantify the number of diagnosed people. Other prevalence estimates attempt to include undiagnosed people who unknowingly have the condition. Some prevalence numbers include only symptomatic conditions whereas others may include latent infections. Prevalence numbers may also have been computed via various estimation methods ranging from research studies to phone surveys. Conditions that go into "remission" but are not necessarily "cured", such as cancer, cause problems for prevalence data. Some such estimates use 5-year or 10-year prevalence, which includes only people diagnosed with cancer within the previous 5 or 10 years. This effectively assumes that a remission becomes a cure after 5 or 10 years.

Problems with incidence data: Incidence data attempts to measure the number of people who become affected with a condition each year. Incidence includes only new conditions, not ongoing treatment of existing conditions. The actual number of people affected by a condition in a year can be lower than the reported incidence when people have multiple cases in the same year (e.g. common cold). Two incidence rates are not necessarily comparable. Some incidence data is based on government notifications, some on physician or hospital diagnoses, and some on various other methods. Some estimates of incidence for under-diagnosed conditions attempt to justify a larger incidence rate than is reported by doctors or medical authorities, whereas other rates may use only the official reported rates.
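The gap between cases and people can be shown with a small sketch. The episode list and person labels below are invented for illustration; real incidence data rarely identifies individuals this way.

```python
def cases_and_people(episode_person_ids: list) -> tuple:
    """Return (number of cases, number of distinct people) for one year.

    Each entry in the list is one new case, labeled by who had it.
    """
    return len(episode_person_ids), len(set(episode_person_ids))


# Hypothetical year of common-cold episodes: some people catch several colds.
episodes = ["ann", "bob", "ann", "cy", "ann", "bob"]
cases, people = cases_and_people(episodes)
print(f"{cases} cases among {people} people")
```

Here the incidence counts every episode, so it exceeds the number of people who were actually affected.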

Rates of incidence/prevalence calculations: This site attempts to convert prevalence and incidence data into more relevant forms, such as the percent of the population affected, the total number of people affected nationally, or the odds in a "1 in 1000" format. These computations are based on population data for the relevant reporting region (usually the USA as a whole). Some computed rates use different base data: prevalence, incidence, or the maximum of prevalence/incidence. In some cases where the data is reported as a word such as "common", "rare", "uncommon" or a similar phrase, an arbitrary numerical percentage has been applied to this information. Data that is reported based on births, such as 1-in-3000 births, has either been left as is (for chronic conditions) or scaled by an estimate of the number of births. Data reported as a percent of pregnancies or pregnant women has been calculated using an estimate of the number of pregnancies annually.
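The conversions described above amount to simple arithmetic, sketched below. The function names, the population of 300 million, and the figure of 4 million births per year are round placeholder values assumed for illustration, not the base data this site uses.

```python
def describe_rate(count: float, population: float) -> dict:
    """Express a count of affected people as a percent and as '1 in N' odds."""
    if population <= 0 or count <= 0:
        raise ValueError("count and population must be positive")
    fraction = count / population
    return {
        "percent": 100 * fraction,
        "one_in": round(1 / fraction),
    }


def count_from_percent(percent: float, population: float) -> float:
    """Convert a percent of the population into a number of people."""
    return population * percent / 100


def cases_from_birth_rate(one_in_births: float, annual_births: float) -> float:
    """Convert a '1 in N births' rate into an estimated number of cases per year."""
    return annual_births / one_in_births


# Placeholder inputs (assumed round numbers, illustrative only).
POPULATION = 300_000_000
ANNUAL_BIRTHS = 4_000_000

rate = describe_rate(count=3_000_000, population=POPULATION)
print(f"{rate['percent']:.2f}% of the population, or 1 in {rate['one_in']}")
print(f"{count_from_percent(3, POPULATION):,.0f} people for a 3% rate")
print(f"{cases_from_birth_rate(3000, ANNUAL_BIRTHS):,.0f} cases per year at 1 in 3000 births")
```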

Lifetime risk data: For some conditions, sources report the risk of having the condition at some point in your lifetime. For example, cancer is widely reported to affect about 1 in 3 people in their lifetime. These rates are naturally much higher than either prevalence or incidence data, because they are effectively the cumulative risk of incidence/prevalence accumulated over many years.
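The way a small annual risk compounds into a large lifetime risk can be sketched as follows. This assumes the same independent risk every year, which is a deliberate simplification (real risk varies with age), and the 0.5% annual figure is hypothetical rather than a cancer statistic.

```python
def lifetime_risk(annual_risk: float, years: int) -> float:
    """Cumulative risk of at least one occurrence over a number of years.

    Simplifying assumption: the annual risk is constant and independent
    from year to year, so the chance of never being affected is
    (1 - annual_risk) ** years.
    """
    return 1 - (1 - annual_risk) ** years


# Hypothetical: a 0.5% chance each year, over an 80-year lifetime.
risk = lifetime_risk(annual_risk=0.005, years=80)
print(f"annual risk 0.5% -> lifetime risk {risk:.0%}")
```

Under these assumptions, an annual risk of 1 in 200 grows to a lifetime risk of roughly 1 in 3.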

General problems with the data: In addition to the above discussion, there are various general qualifiers with regard to prevalence, incidence, and any of the other types of data. Use of the data may incur the old apples-and-oranges comparison problem because of data differences. Problems with using the data include:

  • Unclear sources: there are numerous statistics reported in articles and on the internet, and determining the actual study or survey on which an estimate is based is often difficult, even for statistics reported by health authorities or government agencies.
  • Data ranges: where a rate is reported as a range, such as "3 to 5 million people", the lower number is arbitrarily chosen and used here. This is a conservative assumption, but may cause some estimates to be lower than they should be.
  • Different definitions of prevalence: some prevalence numbers use estimates of people diagnosed, others try also to include estimates of undiagnosed people, and some use different values like 5-year prevalence or 10-year prevalence data.
  • Different sources: data has been collected from numerous sources, and the reputability and accuracy of each source cannot reasonably be completely confirmed.
  • Different study methodologies: the data comes from various studies that used different methodologies. Some data comes from government notification bodies, some from patient phone surveys, some from various methods of estimation, and so on. Many estimates are computed from a small sample and then extrapolated to a larger population group, and this method has various inherent limitations to its accuracy.
  • Different disease categories: some data may use different categorization arrangements to determine who has a particular disease. Some studies use the ICD categories, others do not, and there are actually small variations in the different ICD categorizations in any case. For example, should wheezing be part of asthma or separate?
  • Different years: data may come from numerous different years.
  • Different locations: data may come from different countries, states, or areas.
  • Different age groups: data may refer to a particular age group, such as "3% of adults", and may not necessarily reflect the overall prevalence in the entire population of all ages.
  • Different racial factors: some data may reflect a particular race more accurately and not apply to the entire population.
  • Inherent reporting bias: some organizations tend to quote higher numbers to make the conditions they monitor seem more important and to justify funding levels.
  • Country-specific information: Most of the data is reported from USA sources, and may be of limited value to other countries. For example, certain conditions have a much higher prevalence worldwide, especially in developing countries, than in industrialized nations like the USA.
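The "data ranges" rule in the list above, taking the lower number of a reported range, can be sketched as below. The parsing pattern and the function name are assumptions made for illustration; the sketch handles only simple phrases where one unit word applies to both numbers.

```python
import re

# Unit words recognized by this sketch (an assumed, minimal set).
_MULTIPLIERS = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}

_RANGE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(?:to|-)\s*(\d+(?:\.\d+)?)\s*(thousand|million|billion)?",
    re.IGNORECASE,
)


def lower_bound(text: str) -> float:
    """Return the lower end of a range such as '3 to 5 million people'."""
    match = _RANGE.search(text)
    if match is None:
        raise ValueError(f"no numeric range found in {text!r}")
    low, high, unit = match.groups()
    scale = _MULTIPLIERS.get((unit or "").lower(), 1)
    return min(float(low), float(high)) * scale


print(lower_bound("3 to 5 million people"))
```

Phrases with mixed units, such as "500 thousand to 2 million", are not handled by this pattern and would need extra parsing.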

CureResearch.com™ Copyright © 2010 Health Grades, Inc. All rights reserved.