Monday, April 27, 2020

Skepticism about cancer screening: An Interview with Dr. H. Gilbert Welch



Published in Skeptical Inquirer Vol 44, No. 1 (Jan/Feb 2020)


Dr. H. Gilbert Welch is an American physician and cancer screening researcher. A former professor at the Dartmouth Institute for Health Policy and Clinical Practice, he has published many peer-reviewed papers about the harms of early detection and, specifically, of cancer screening: the systematic search for cancer before it causes symptoms.


Welch is also a science writer. His first book, published in 2004, is Should I Be Tested for Cancer? Maybe Not and Here’s Why. Welch, along with researchers Lisa Schwartz and Steven Woloshin, wrote Overdiagnosed: Making People Sick in the Pursuit of Health, which deals with screening and other cases where medicine has gone too far, probably causing more harm than good. His latest book, published in 2015, is titled Less Medicine, More Health: 7 Assumptions That Drive Too Much Medical Care.

In this interview, Welch and I discussed why diagnosing a cancer early is not necessarily a good thing.

Dr. H Gilbert Welch
Nogueira: When we discuss the problems of screening, how can we make the message clear so that people do not think all medical care is being criticized?
Welch: I am a conventionally trained physician and I believe medical care can do a lot of good, particularly for people who are sick and injured. Making a timely diagnosis in people who are sick is really important. What I am worried about is when medical care expands to the population that is well, because it is hard to make a well person better, but it is not that hard to make them worse.

We might involve a thousand people in a screening program for ten years and one person is helped. This is good, but an important question is: What happened to the other 999? That is where I have been in my career for the last 20 years.

Nogueira: What is the main idea behind screening and its problems?
Welch: In the past, doctors waited for problems to develop in a population and then diagnosed and treated that fraction. The idea of screening, or early detection, is to advance the moment of diagnosis in that same population. The assumption behind screening is that the people diagnosed early will be those destined to develop problems.

However, the reality has been different: whenever we look hard for early forms of disease, we find that more people have them than would ever develop problems. Since we do not know who is going to develop problems, we tend to treat all of them. This means we are treating some people for whom the disease would never be a problem: the overdiagnosed and needlessly treated fraction. They cannot be helped, but they can be harmed.

Overdiagnosis happens to relatively few individuals. A more common problem of screening is the disease scare: a false positive result. Many individuals require multiple visits and multiple tests before we are sure they don’t have cancer. Patients understand that medications can have harms, but they cannot imagine how a test could have harms. They think that it is always good to know, but they do not recognize the cascade of events that a test can trigger. Even a perfectly safe test can lead to a series of events that can harm people.

Finally, to promote screening we need to scare people about the disease (“that’s why you need to be screened”).  In other words, we are making everybody more worried about the future. Ironically, part of being healthy is being not too worried about health. Screening is responsible for injecting some “dis-ease” into the population.

Cover - Skeptical Inquirer 44.1

Nogueira: What is the effect of screening and early detection on survival statistics?
Welch: With more detection, the typical patient now does better. Patients with the disease appear to survive longer. This happens because people who are overdiagnosed or have less severe forms of disease are included in the “disease” group. The effects of screening on these statistics are really misleading: the harder you look, the more you find, and everyone appears to do better. It is related to the popularity paradox of screening: the more overdiagnosis screening causes, the more popular screening becomes.
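
The arithmetic behind this effect can be sketched in a few lines of Python. The numbers below are hypothetical and chosen only for illustration; they do not come from the interview or from any study. The sketch assumes that overdiagnosed patients never die of the disease, so they enlarge the group of people diagnosed without adding any deaths.

```python
def five_year_survival(progressive_cases, deaths_within_5y, overdiagnosed_cases=0):
    """Share of diagnosed patients still alive five years after diagnosis.

    Assumption: overdiagnosed cases never die of the disease, so they are
    added to the number of people diagnosed but not to the deaths.
    """
    diagnosed = progressive_cases + overdiagnosed_cases
    return (diagnosed - deaths_within_5y) / diagnosed


# Hypothetical cohort: 1,000 progressive cancers, 600 deaths within five years.
before = five_year_survival(progressive_cases=1000, deaths_within_5y=600)

# Same cohort after screening also labels 1,000 harmless "turtles" as cancer.
after = five_year_survival(progressive_cases=1000, deaths_within_5y=600,
                           overdiagnosed_cases=1000)

print(f"Five-year survival without overdiagnosis: {before:.0%}")  # 40%
print(f"Five-year survival with overdiagnosis:    {after:.0%}")   # 70%
print("Deaths in both scenarios: 600")
```

In this toy example survival rises from 40% to 70% although exactly the same number of people die, which is why survival statistics alone cannot show that screening helps.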

Nogueira: What have we learned about cancer progression and its relationship with screening?
Welch: Cancer is much more heterogeneous than we thought. Abnormalities that meet the pathological definition of cancer could have very different natural histories; they have variable growth rates.

It has been described as the barnyard pen of cancers. There are three animals in the barnyard: the birds, the rabbits and the turtles. The goal of screening is to fence them in, to catch them early. However, we cannot catch the birds, because they are already gone. Birds are the most aggressive cancers; they have already spread by the time they are detectable. Screening does not help with those cancers. Sometimes we can treat them, but they are the worst type.

It is possible to catch the rabbits if you build enough fences. The rabbits are the cancers that can be detected earlier and will bother patients. So screening may help in these cases. For screening to be of help, treatment needs to be more effective early than it is late. Sometimes this is not true. In the case of breast cancer, a two-centimeter tumor can be treated as well as a one-centimeter tumor.

Finally, we don’t need any fences for the turtles, because they are not going anywhere. Turtles meet the pathological definition of cancer. However, they are either not growing or growing so slowly that they will not cause problems before the patient dies from something else. Or they are regressing: some cancers start and then disappear, perhaps recognized by a well-functioning immune system.

The unfortunate reality is that screening is very good at finding turtles. Doctors are not able to distinguish turtles from rabbits, thus we treat everybody – creating the major harm of early detection: overdiagnosis and overtreatment.

Nogueira: How has screening affected the incidence of prostate cancer?
Welch: Note how the incidence of prostate cancer in the US bounces around (see Figure 1). There is no known tumor biology or carcinogenic process that can explain this graph. It looks more like a financial chart than a cancer incidence chart. And this is not a small-number problem; it is the most common cancer in the database.

The graph can be divided into four phases. It begins in 1975 with the growth of transurethral resection of the prostate (TURP), which at the time was a common prostate surgery done to help men with large prostates. With more pieces of prostate being sent to pathologists, the incidence of prostate cancer slowly increased. The second phase is PSA promotion, when hospitals started to offer free PSA tests, knowing they would make their money back in subsequent blood tests, biopsies and treatments. Around 1995, the retrenchment era began, with urologists recognizing that they should not offer PSA screening to men with less than ten years of life expectancy, since they cannot be helped by screening. Finally, the discouragement phase took place after the US Preventive Services Task Force argued against PSA screening. It is remarkable that the incidence at present is almost the same as in 1975. In other words, this is a scrutiny-dependent cancer. I do not know of a more powerful example of how the health care system affects the apparent amount of cancer.

Figure 1 - Age-adjusted incidence of prostate cancer in the United States during 1975–2014 (Welch and Brawley 2018).

Nogueira: Among the common cancer screening programs (for cervical, colorectal, breast and prostate cancer), what are their effects on the mortality of those cancers?
Welch: We never had a randomized trial of cervical cancer screening; it was implemented before we considered randomized trials. There is a lot of observational data that suggests it is helpful, but it does not explain the 80% reduction in cervical cancer mortality. For instance, we have seen an 80% reduction in stomach cancer mortality and it is a cancer that we do not screen for. Colon cancer mortality is also declining and the fall started before the introduction of screening.

Screening for cervical cancer and colorectal cancer has had some effect on the mortality of those cancers. Breast cancer screening has had only a little effect on breast cancer mortality. The big effect in breast and prostate cancer comes from better treatment: we learned those cancers are hormonal diseases.

Nogueira: How do you see the balance of risks and benefits of those cancer screening programs?
Welch: In general, people consider colorectal and cervical cancer screening to be on the side of more benefit than harm. I think this is largely because the problem of cancer overdiagnosis is less evident in those cases. Since they detect precancerous lesions, overdiagnosis takes place at a prior step: dysplastic polyps or cervical dysplasia. In colorectal cancer screening, there are complications from colonoscopy and from polypectomies (e.g., bleeding, perforations). In cervical cancer screening, there are complications from cryotherapy and excisions for precancerous lesions (e.g., bleeding, preterm birth).

Cancer screening has a mix of effects. Most screening, including PSA and mammography, does help a few people, but also harms others. This is the conundrum we must be clear about. So, screening is not a public health imperative; it’s a choice.

And it can distract people from more important things they can be doing for their health. It can also divert resources from other, more important interventions. There are two very different aspects to the word prevention. One is health promotion through behavioral advice, such as do not smoke, eat real food, move regularly, and find meaningful relationships. These things are not sexy or technological, but they are very important to health. The other appeared when the prevention movement got medicalized: it became a technological imperative to look for early forms of disease.

We also have to be sensible about the overdiagnosis problem. We have to stop thinking of the best test as the one that finds the most cancers. Typically that is how tests are promoted: “this test finds more cancer than that one.” That is not a good test; we are not looking to find more cancers; we want to find the few cancers that matter.

Nogueira: How can we make screening better, for instance, to find those cancers where we can make a difference?
Welch: This is best exemplified in the case of lung cancer screening. In the US, lung cancer is the most common cause of cancer death; it is a big problem. There is a really well-defined risk group, which can be identified by a single question: “Do you smoke?” We have a really common cause of death and an easy way to find a high-risk group; it is a perfect situation for screening.

It was the first cancer studied for screening, and that happened in the 1980s using chest x-ray. The results were terribly disappointing: screening led to more deaths, not fewer. This happened because screening triggered operations and some people died from those operations. The idea of overdiagnosis in lung cancer seemed crazy, but it happened. Then spiral CT came along. Importantly, the investigators responsible for the spiral CT trial knew about overdiagnosis. What they did was groundbreaking: when the spiral CT found a small lesion that looked worrisome, they did not act and did not biopsy immediately; they waited three months to see whether the lesion was growing. They were making use of the diagnostic value of time. Time provides information both about the genetics of the tumor and the body’s reaction to it. I think that is a step forward.

Everything changes when you move to a genuine high-risk population (recall that regular cigarette smokers are 20 times more likely than non-smokers to die from lung cancer). They are much less likely to be overdiagnosed and much more likely to be helped. But there are not a lot of risk factors as common and powerful as cigarette smoking. Most cancers are sporadic – not the result of some obvious risk factor.

Nogueira: All-cause mortality is not reduced in population-wide cancer screening trials. Could you explain why it matters?
Welch: It begins with what counts as a cancer death. In the context of evaluating screening, I want cancer death to include not only deaths from cancer but also deaths due to interventions performed as part of looking for and treating the cancer. That is not what happens. That is why all-cause mortality is important. If we are going to tell people that screening “saves lives”, I would like to know if it changes their risk of death. Unless you want to play a game in which you care more about one type of death than another.

A good example is a classic study, the Minnesota Colon Cancer Control Study. It now has 30 years of follow-up. There are three arms in the study: annual screening, biennial screening, and a control group. After 30 years, 2% of the annual group and 3% of the control group had died from colon cancer. This is the benefit: a difference of 1 percentage point or, to put it in relative terms, a 33% reduction in colon cancer death. However, all-cause mortality was the same in all groups (Figure 2). It is hard to say that this is saving lives; it may be trading one form of death for another.
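
The difference between the absolute and the relative figure quoted above can be checked with a short calculation. This is only a sketch: it uses the rounded 3% and 2% from the interview, expressed as deaths per 1,000 people (the group size of 1,000 is an assumption made for convenience), and it covers colon cancer deaths only, not all-cause mortality. The function name `screening_effect` is hypothetical.

```python
def screening_effect(control_deaths, screened_deaths, group_size):
    """Return (absolute reduction, relative reduction, number needed to screen).

    Both groups are assumed to contain `group_size` people.
    """
    prevented = control_deaths - screened_deaths
    absolute_reduction = prevented / group_size        # share of everyone screened
    relative_reduction = prevented / control_deaths    # share of expected deaths
    number_needed_to_screen = group_size / prevented   # people screened per death avoided
    return absolute_reduction, relative_reduction, number_needed_to_screen


# 3% of controls vs. 2% of the annual group, per 1,000 people over 30 years.
arr, rrr, nns = screening_effect(control_deaths=30, screened_deaths=20, group_size=1000)

print(f"Absolute reduction: {arr:.1%}")       # 1.0%
print(f"Relative reduction: {rrr:.0%}")       # 33%
print(f"Number needed to screen: {nns:.0f}")  # 100
```

The same result sounds very different depending on how it is reported: a 33% reduction in colon cancer death, or one colon cancer death avoided for every 100 people screened annually over 30 years.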

Figure 2. The Minnesota Colon Cancer Control Study: all-cause mortality was the same across the three groups: control (non-screened), annual screening, and biennial screening (Shaukat et al. 2013).
Nogueira: Since screening benefits are not large and there are harms, what are the reasons for the heavy promotion of screening?
Welch: The first is a true belief that early detection must help, as a solution to every bad disease. Money is another part, because screening is a great way to recruit new patients. It is good for Pharma, for test manufacturers and, increasingly, for our hospitals. Looking for diseases early is a powerful idea: if you can argue that everyone should do something, it is a huge market.

Nogueira: What about clinical breast examination and breast self-examination, often advertised to women?
Welch: The data are clear that clinical breast exams and teaching women to self-examine their breasts do not seem to help. But if a woman becomes aware of a new breast lump, she should have it evaluated. Part of the attention to breast cancer has been good. Ironically, it is possible that screening mammography could be the best way to do the clinical breast exam, if the threshold were set at looking for things 1 cm or bigger. I think a lot of harm from mammography could be reduced if the thresholds for further investigation were much higher.
The general conundrum of screening is that we have to involve a whole bunch of people to potentially help a very few. We have to take care not to disturb the rest of them.

Nogueira: How do you see the paper that claimed an increase in advanced cases of prostate cancer after the 2012 USPSTF recommendation against screening?
Welch: That report, of an increased number of late-stage prostate cancers, was highly flawed. They were only talking about "counts"; they never had a denominator.
In the US data so far (Figure 3), the incidence of metastatic prostate cancer at first presentation (the cancer was already metastatic at the moment of diagnosis) continues to stay stable. But I expect it will go up.
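
The point about counts and denominators can be illustrated with a small calculation. The figures below are hypothetical, invented only to show the mechanism, and are not taken from the report being criticized or from Figure 3: if the population at risk grows, the number of cases can rise while the incidence rate stays flat.

```python
def rate_per_100k(cases, population):
    """Incidence expressed as cases per 100,000 people at risk."""
    return cases * 100_000 / population


# Hypothetical years: the count rises by 20%, but so does the population at risk.
earlier = {"cases": 2000, "population": 10_000_000}
later = {"cases": 2400, "population": 12_000_000}

count_change = (later["cases"] - earlier["cases"]) / earlier["cases"]
earlier_rate = rate_per_100k(**earlier)
later_rate = rate_per_100k(**later)

print(f"Change in count: {count_change:+.0%}")         # +20%
print(f"Earlier rate: {earlier_rate:.1f} per 100,000")  # 20.0
print(f"Later rate:   {later_rate:.1f} per 100,000")    # 20.0
```

A report based on counts alone would show a 20% increase here, even though the risk for any individual is unchanged.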

What you see is that the implementation of PSA screening really had an effect on that incidence: it almost cut it in half. This is a sign that the bad cancers are being found early. It has been fairly stable since, but I wouldn't be surprised if it goes back up, because PSA screening is going down. Whether that changes death rates is a separate question, because for that to happen early treatment must matter.

Notice, in comparison, that the incidence of metastatic breast cancer at first presentation never changes; it is pretty stable. Mammography screening has not been able to reduce the amount of breast cancer diagnosed at this very late stage. That’s not the mammographers’ fault; that’s the fault of the aggressive cancers (the birds in the barnyard analogy).

Figure 3. Incidence of cancer that was metastatic at first presentation in the United States, 1975–2012 (Welch et al. 2015).

References:

Shaukat, A., S.J. Mongin, M.S. Geisser, et al. 2013. Long-term mortality after screening for colorectal cancer. N Engl J Med. 369(12):1106-14. doi: 10.1056/NEJMoa1300720.

Welch, H.G., O.W. Brawley. 2018. Scrutiny-Dependent Cancer and Self-fulfilling Risk Factors. Ann Intern Med. 168(2):143-144. doi: 10.7326/M17-2792.

Welch, H.G., D.H. Gorski, P.C. Albertsen. 2015. Trends in Metastatic Breast and Prostate Cancer: Lessons in Cancer Dynamics. N Engl J Med. 373:1685-1687. doi: 10.1056/NEJMp1510443.
