Published in Skeptical Inquirer Vol 44, No. 1 (Jan/Feb 2020)
Dr. H. Gilbert Welch is an American
physician and cancer screening researcher. A former professor at the Dartmouth
Institute for Health Policy and Clinical Practice, he has published many
peer-reviewed papers about the harms of early detection, specifically
cancer screening: the systematic search for cancer before it causes
symptoms.
Welch is also a science writer. His
first book, published in 2004, is Should
I Be Tested for Cancer? Maybe Not and Here's Why. Welch, along with
researchers Lisa Schwartz and Steven Woloshin, wrote Overdiagnosed: Making People Sick in the Pursuit of Health, which
deals with screening and other cases in which medicine does too much, probably
causing more harm than good. His latest book was published in 2015 and is
titled Less Medicine, More Health: 7
Assumptions That Drive Too Much Medical Care.
In this interview, Welch and I
discussed why diagnosing a cancer early is not
always a good thing.
Welch: I am a conventionally trained
physician, and I believe medical care can do a lot of good, particularly for
people who are sick and injured. Making a timely diagnosis in people who are
sick is really important. What I am worried about is when medical care expands
to the population that is well, because it is hard to make a well person better,
but it is not that hard to make them worse.
We might involve a thousand people
in a screening program for ten years, and one person is helped. This is good,
but an important question is: What happened to the other 999? That question has
been the focus of my career for the last 20 years.
Nogueira: What is the main idea behind screening and its problems?
Welch: In the past, doctors waited for
problems to develop in a population and then diagnosed and treated the fraction
who became sick. The idea of screening, or early detection, is to move the
moment of diagnosis earlier in time in that same population. The assumption behind
screening is that the people diagnosed early will be those destined to develop problems.
However, the reality has been
different: whenever we look hard for early forms of disease, we find that more
people have them, and not all of them
will develop problems. Because we do not know who is going to develop problems, we
tend to treat all of them. This means we are treating some people for whom the
disease would never have been a problem: the overdiagnosed and needlessly
treated fraction. They cannot be helped, but they can be harmed.
Overdiagnosis happens to
relatively few individuals. A more common problem of screening is the disease
scare: a false-positive result. Many individuals require multiple visits and
multiple tests before we are sure they don't have cancer. Patients understand
that medications can have harms, but they cannot imagine how a test could have
harms. They think that it is always good to know, but they do not recognize the
cascade of events that a test can trigger. Even a perfectly safe test can lead
to a series of events that can harm people.
Finally, to promote screening we
need to scare people about the disease ("that's why you need to be
screened"). In other words, we are
making everybody more worried about the future. Ironically, part of being
healthy is not being too worried about health. Screening is responsible for
injecting some "dis-ease" into the population.
Nogueira: What is the effect of screening/early detection on survival
statistics?
Welch: With more detection, the typical
patient now does better: patients diagnosed with the disease appear to
survive longer. This happens because people who are overdiagnosed or who have less severe
forms of disease are now included in the "disease" group. The effect of screening on
survival is really misleading: the harder you look, the more you find, and everyone appears
to be doing better. It is related to the popularity paradox of screening: the more
overdiagnosis screening causes, the more popular screening becomes.
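The arithmetic behind this survival illusion can be sketched with a hypothetical cohort. The numbers below are invented for illustration and do not come from the interview; the sketch assumes overdiagnosed patients are all alive at five years and deliberately ignores lead-time bias so that only the effect of overdiagnosis is shown.

```python
def five_year_survival(alive_at_5y: int, diagnosed: int) -> float:
    """Fraction of diagnosed patients still alive five years after diagnosis."""
    return alive_at_5y / diagnosed


# Hypothetical cohort (illustrative numbers only): without screening,
# 100 people are diagnosed with a progressive cancer and 50 die of it
# within five years.
progressive_cases = 100
cancer_deaths = 50

# Hypothetical: screening also finds 100 "turtle" cancers that would never
# have caused symptoms; all of these people are alive at five years.
overdiagnosed_cases = 100

survival_without = five_year_survival(
    progressive_cases - cancer_deaths, progressive_cases
)
survival_with = five_year_survival(
    progressive_cases - cancer_deaths + overdiagnosed_cases,
    progressive_cases + overdiagnosed_cases,
)

print(f"5-year survival without screening: {survival_without:.0%}")  # 50%
print(f"5-year survival with screening:    {survival_with:.0%}")     # 75%
print(f"Cancer deaths in both scenarios:   {cancer_deaths}")         # 50
```

Survival rises from 50% to 75% even though exactly the same number of people die of the cancer, which is why survival statistics alone cannot show that screening works.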
Nogueira: What have we learned about cancer progression and its
relationship with screening?
Welch: Cancer is much more heterogeneous
than we thought. Abnormalities that meet the pathological definition of cancer
can have very different natural histories, with variable growth rates.
It has been described using the analogy of a
barnyard pen. There are three animals in the barnyard: the birds,
the rabbits, and the turtles. The goal of screening is to fence them in, that is, to catch
them early. However, we cannot catch the birds, because they are already gone.
Birds are the most aggressive cancers; they have already spread by the time
they are detectable. Screening does not help with those cancers. Sometimes we
can treat them, but they are the worst type.
It is possible to catch the rabbits
if you build enough fences. The rabbits are the cancers that can be detected
earlier and would eventually cause problems for patients. So screening may help in these cases. For
screening to help, however, treatment needs to be more effective early than
late. Sometimes this is not true. In the case of breast cancer, a
two-centimeter tumor can be treated as well as a one-centimeter tumor.
Finally, we don't need any fences
for the turtles, because they are not going anywhere. Turtles meet the
pathological definition of cancer. However, they are either not growing or
growing so slowly that they never cause problems before the patient dies
of something else. Or they are regressing: some cancers start and then
disappear, perhaps recognized by a well-functioning immune system.
The unfortunate reality is that
screening is very good at finding turtles. Doctors are not able to distinguish
turtles from rabbits, so we treat everybody, which creates the major harm of
early detection: overdiagnosis and overtreatment.
Nogueira: How has screening affected the incidence of prostate cancer?
Welch: Note how the incidence of prostate
cancer in the US bounces around (see Figure 1). There is no known tumor biology or
carcinogenic process that can explain this graph. It looks more like a financial chart than a
cancer incidence chart. And this is not a small-numbers problem; it is the most
common cancer in the database.
The graph can be divided into four
phases. It begins in 1975 with the
growth of transurethral resection of the prostate (TURP), which at the time was
a common surgery to help men with enlarged prostates. With more
pieces of prostate being sent to pathologists, the incidence of prostate
cancer slowly increased. The second phase is PSA promotion, when hospitals
started to offer free PSA tests, knowing they would make their money back on the subsequent
blood tests, biopsies, and treatments. Around 1995, the retrenchment era began,
with urologists recognizing that they should not offer PSA screening to men
with less than ten years of life expectancy, since those men cannot be helped by
screening. Finally, the discouragement phase came after the US Preventive
Services Task Force recommended against PSA screening. It is remarkable that the
present incidence is almost the same as in 1975. In other words, this is a
scrutiny-dependent cancer. I do not know of a more powerful example of how the
health care system affects the apparent amount of cancer.
Nogueira: Among common cancer screening programs (for cervical, colorectal, breast, and prostate cancer), what are their effects on mortality from those cancers?
Welch: We never had a randomized trial of
cervical cancer screening; it was implemented before we thought about randomized
trials. There is a lot of observational data suggesting it is helpful, but
screening does not explain the whole 80% reduction in cervical cancer mortality. For
instance, we have also seen an 80% reduction in stomach cancer mortality, and that is a
cancer we do not screen for. Colon cancer mortality is also declining, and
the fall started before the introduction of screening.
Screening for cervical cancer and
colorectal cancer has had some effect on mortality from those cancers. Breast
cancer screening has had only a small effect on breast cancer mortality. The
big effect in breast and prostate cancer has come from better treatment: we learned that those
cancers are hormonal diseases.
Nogueira: How do you see the risk-benefit
ratio of those cancer screening programs?
Welch: In general, people consider
colorectal and cervical cancer screening to be on the side of more benefit than harm.
I think this is largely because the problem of cancer overdiagnosis is less
evident in those cases. Since these programs detect precancerous lesions, overdiagnosis
takes place at an earlier step: dysplastic polyps or cervical dysplasia. In
colorectal cancer screening, there are complications from colonoscopy and from
polypectomies (e.g., bleeding, perforations). In cervical cancer screening,
there are complications from cryotherapy and excisions of precancerous
lesions (e.g., bleeding, preterm birth).
Cancer screening has a mix of
effects. Most screening, including PSA and mammography, does help a few people,
but it also harms others. This is the conundrum we must be clear about. So
screening is not a public health imperative; it's a choice.
And it can distract people from more
important things they can be doing for their health. It can also divert
resources from other, more important interventions. There are two very different
meanings of the word prevention. One
is health promotion through behavioral advice, such as don't smoke, eat real food,
move regularly, and find meaningful relationships. These are not sexy or
technological, but they are very important to health. But when the prevention
movement got medicalized, it became a technological imperative to look for
early forms of disease.
We also have to be sensible about the
overdiagnosis problem. We have to stop thinking of the best test as the one that
finds the most cancers. Typically that is how tests are promoted: "this test finds
more cancer than that one." That is not a good test; we are not looking to find
more cancers; we want to find the few cancers that matter.
Nogueira: How can we make screening better, for instance to find those
cancers we can make a difference on?
Welch: This is best exemplified in the case
of lung cancer screening. In the US, lung cancer is the most common cause of
cancer death; it is a big problem. There is a really well-defined risk group,
which can be identified by a single question: "Do you smoke?" We have a really
common cause of death and an easy way to find a high-risk group; it is a
perfect situation for screening.
It was the first cancer studied for
screening, in the 1980s, using chest x-rays. The results were
terribly disappointing: screening led to more deaths, not fewer. This happened
because screening triggered operations, and some people died from those operations. The
idea of overdiagnosis in lung cancer seemed crazy, but it happened. Then spiral
CT came along. Importantly, the investigators responsible for the spiral CT trial
knew about overdiagnosis. What they did was groundbreaking: when the spiral CT
found a small lesion that looked worrisome, they did not act or biopsy
immediately; they waited three months to see whether the lesion was growing.
They were making use of the diagnostic value of time. Time provides information
both about the genetics of the tumor and about the body's reaction to it. I think
that is a step forward.
Everything changes when you move to
a genuinely high-risk population (recall that regular cigarette smokers are 20
times more likely than nonsmokers to die from lung cancer). They are much less
likely to be overdiagnosed and much more likely to be helped. But there are not
many risk factors as common and powerful as cigarette smoking. Most cancers
are sporadic: not the result of some obvious risk factor.
Nogueira: All-cause mortality is not reduced in population-wide cancer
screening trials. Could you explain why that matters?
Welch: It begins with what counts as a
cancer death. In the context of evaluating a screening program, I want cancer deaths to include not
only deaths from cancer but also deaths due to interventions
performed as part of looking for and treating the cancer. That is not what
happens. That is why all-cause mortality is important. If we are going to tell
people that screening "saves lives," I would like to know whether it changes their
risk of death. Unless you want to play a game in which you care more about one type
of death than another.
A good example is a classic study,
the Minnesota Colon Cancer Control Study. It now has 30 years of follow-up.
There are three arms in the study: annual screening, biennial screening, and a control
group. After 30 years, 2% of the annual group and 3% of the control group had died from
colon cancer. This is the benefit: 1 percentage point or, in relative terms, a 33%
reduction in colon cancer death. However, all-cause mortality was the same in
all groups (Figure 2). It is hard to say that is saving lives; it may be
trading one form of death for another.
Figure 2. The Minnesota Colon Cancer Control Study: All-cause mortality was the same in the three groups: control (non-screened), annual screening, and biennial screening (Shaukat et al. 2013).
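The difference between the absolute and relative framing of the Minnesota result can be made explicit with a short calculation. This sketch uses only the rounded figures quoted in the interview (3% colon cancer deaths in the control group, 2% in the annual group), not the exact values from the published paper; the function name `risk_reduction` is hypothetical, and the calculation covers cancer-specific deaths only, not all-cause mortality.

```python
from fractions import Fraction


def risk_reduction(control_risk: Fraction, screened_risk: Fraction):
    """Return absolute reduction, relative reduction, and number needed to screen.

    Uses exact fractions to avoid floating-point rounding in the percentages.
    """
    absolute = control_risk - screened_risk
    relative = absolute / control_risk
    number_needed_to_screen = 1 / absolute
    return absolute, relative, number_needed_to_screen


# Rounded 30-year colon cancer death risks as quoted in the interview.
control = Fraction(3, 100)
annual = Fraction(2, 100)

arr, rrr, nns = risk_reduction(control, annual)
print(f"Absolute reduction: {float(arr):.0%}")  # 1%
print(f"Relative reduction: {float(rrr):.0%}")  # 33%
print(f"Number needed to screen: {nns}")        # 100
```

The same 1-percentage-point benefit sounds far larger when stated as a 33% relative reduction, and it implies that roughly 100 people would need annual screening for 30 years to avoid one colon cancer death, while the unchanged all-cause mortality is not captured by any of these three numbers.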
Nogueira: Since screening benefits are not large and there are harms,
what are the reasons for the heavy promotion of screening?
Welch: The first is a genuine belief that
early detection must help, as a solution to every bad disease. Money is another
part, because screening is a great way to recruit new patients. It is good for pharma,
for test manufacturers, and increasingly for our hospitals. It is a powerful
idea to look for diseases early: if you can argue that everyone should do
something, you have a huge market.
Nogueira: What about clinical breast examination and breast
self-examination, which are often advertised to women?
Welch: The data is clear that clinical
breast exams and teaching women to self-examine their breasts do not seem to
help. But if a woman becomes aware of a new breast lump, she should have it
evaluated. Part
of the attention to breast cancer has been good. Ironically, screening
mammography could be the best way to do the clinical breast exam, if the
threshold were to look only for lesions 1 cm or bigger. I think a lot of the harm
from mammography could be reduced if the thresholds for further investigation
were much higher.
The general conundrum of screening
is that we have to involve a whole bunch of people to potentially help a very few.
We have to take care not to disturb the rest of them.
Nogueira: What
do you make of the paper that claimed an increase in advanced cases of prostate
cancer after the USPSTF's 2012 recommendation against screening?
Welch: That report of an increased number
of late-stage prostate cancers was highly flawed. They were only talking
about "counts"; they never had a denominator.
In the US data so far (Figure 3),
the incidence of metastatic prostate cancer at first presentation (cancer that
was already metastatic at the moment of diagnosis) remains stable.
But I expect it
will go up.
What you see is that the implementation
of PSA screening really had an effect on that incidence, almost cutting it in
half. This is a sign that the bad cancers are being found early. It has been fairly stable since, but I
wouldn't be surprised if it goes back up, because PSA screening is declining. But
whether that changes death rates is a separate question, because for that, early
treatment must matter.
Notice, in comparison, that the incidence
of metastatic breast cancer at first presentation has barely changed; it is pretty
stable. Mammography screening has not
been able to reduce the amount of breast cancer diagnosed at this very late
stage. That's not the mammographers'
fault; it's the fault of the aggressive cancers (the birds in the barnyard analogy).
Figure 3. Incidence of cancer that was metastatic at first presentation in the United States, 1975–2012 (Welch et al. 2015).
References:
Shaukat, A., S.J. Mongin, M.S. Geisser, et al. 2013. Long-term mortality after screening for colorectal cancer. N Engl J Med 369(12): 1106–1114. doi:10.1056/NEJMoa1300720.
Welch, H.G., and O.W. Brawley. 2018. Scrutiny-dependent cancer and self-fulfilling risk factors. Ann Intern Med 168(2): 143–144. doi:10.7326/M17-2792.
Welch, H.G., D.H. Gorski, and P.C. Albertsen. 2015. Trends in metastatic breast and prostate cancer — lessons in cancer dynamics. N Engl J Med 373: 1685–1687. doi:10.1056/NEJMp1510443.