Monday, April 27, 2020

Skepticism about cancer screening: An Interview with Dr. H. Gilbert Welch



Published in Skeptical Inquirer Vol 44, No. 1 (Jan/Feb 2020)


Dr. H. Gilbert Welch is an American physician and cancer screening researcher. A former professor at the Dartmouth Institute for Health Policy and Clinical Practice, he has published many peer-reviewed papers about the harms of early detection and, specifically, of cancer screening: the systematic search for cancer before it causes symptoms.


Welch is also a science writer. His first book, published in 2004, is Should I Be Tested For Cancer? Maybe Not and Here’s Why. Welch, along with researchers Lisa Schwartz and Steven Woloshin, wrote Overdiagnosed: Making People Sick in the Pursuit of Health, which deals with screening and other cases where too much medicine probably causes more harm than good. His latest book, published in 2015, is titled Less Medicine, More Health: 7 Assumptions That Drive Too Much Medical Care.

In this interview, Welch and I discussed why diagnosing a cancer early is not always a good thing.

Dr. H Gilbert Welch
Nogueira: When we discuss the problems of screening, how can we make the message clear so that people do not think all medical care is being criticized?
Welch: I am a conventionally trained physician and I believe medical care can do a lot of good, particularly for people who are sick and injured. Making a timely diagnosis in people who are sick is really important. What I am worried about is when medical care expands to the population that is well, because it is hard to make a well person better, but it is not that hard to make them worse.

We might involve a thousand people in a screening program for ten years and one person is helped. This is good, but an important question is: What happened to the other 999? That is where I have been in my career for the last 20 years.

Nogueira: What is the main idea behind screening and its problems?
Welch: In the past, doctors waited for problems to develop in a population and then diagnosed and treated that fraction. The idea of screening, or early detection, is to advance the moment of diagnosis in that same population. The assumption behind screening is that the people diagnosed early will be those destined to develop problems.

However, the reality has been different: whenever we look hard for early forms of disease, we find that more people have them.  Thus, not all of them will develop problems. As we do not know who is going to develop problems, we tend to treat all of them. This means we are treating some people for whom the disease would never be a problem — it is the overdiagnosed and needlessly treated fraction. They cannot be helped, but they can be harmed.  

Overdiagnosis happens to relatively few individuals. A more common problem of screening is the disease scare: a false positive result. Many individuals require multiple visits and multiple tests before we are sure they don’t have cancer. Patients understand that medications can have harms, but they cannot imagine how a test could have harms. They think that it is always good to know, but they do not recognize the cascade of events that a test can trigger. Even a perfectly safe test can lead to a series of events that can harm people.

Finally, to promote screening we need to scare people about the disease (“that’s why you need to be screened”).  In other words, we are making everybody more worried about the future. Ironically, part of being healthy is being not too worried about health. Screening is responsible for injecting some “dis-ease” into the population.

Cover - Skeptical Inquirer 44.1

Nogueira: What is the effect of screening/early detection on survival statistics?
Welch: With more detection, the typical patient now does better. Among patients with the disease, they appear to have survived longer. This happens because people overdiagnosed or with less severe forms of disease are included in the “disease” group. Screening effects are really misleading: the harder you look, the more you find and everyone appears to be better. It is related to the popularity paradox of screening: the more overdiagnosis screening causes, the more popular screening becomes.

Nogueira: What have we learned about cancer progression and its relationship with screening?
Welch: Cancer is much more heterogeneous than we thought. Abnormalities that meet the pathological definition of cancer could have very different natural histories; they have variable growth rates.

It has been described as the barnyard pen of cancers. There are three animals in the barnyard: the birds, the rabbits and the turtles. The goal of screening is to fence them in, to catch them early. However, we cannot catch the birds, because they are already gone. Birds are the most aggressive cancers; they have already spread by the time they are detectable. Screening does not help with those cancers. Sometimes we can treat them, but they are the worst type.

It is possible to catch the rabbits if you build enough fences. The rabbits are the cancers that can be detected earlier and will bother patients. So screening may help in these cases. For screening to be of help, treatment needs to be more effective early than it is late. Sometimes this is not true. In the case of breast cancer, a two-centimeter tumor can be treated as well as a one-centimeter tumor.

Finally, we don’t need any fences for the turtles – because they are not going anywhere. Turtles meet the pathological definition of cancer. However, they are either not growing or growing so slowly that they will never cause problems until the patient dies from something else. Or they are regressing—some cancers start and they disappear; perhaps recognized by a well-functioning immune system.

The unfortunate reality is that screening is very good at finding turtles. Doctors are not able to distinguish turtles from rabbits, thus we treat everybody – creating the major harm of early detection: overdiagnosis and overtreatment.

Nogueira: How has screening affected the incidence of prostate cancer?
Welch: Note how the incidence of prostate cancer in the US bounces around (see Figure 1). There is no known tumor biology or carcinogenic process that can explain this graph. It looks more like a financial chart than a cancer incidence chart. And this is not a small-number problem; it is the most common cancer in the database.

The graph can be divided into four phases. It begins in 1975 with the growth of transurethral resection of the prostate (TURP), which at the time was a common prostate surgery done to help men with large prostates. With more pieces of prostate being sent to pathologists, the incidence of prostate cancer slowly increased. The second phase is PSA promotion, when hospitals started to offer free PSA tests, knowing they would make their money back in subsequent blood tests, biopsies and treatments. Around 1995, the retrenchment era began, with urologists recognizing that they should not offer PSA screening to men with less than ten years of life expectancy, since they cannot be helped by screening. Finally, the discouragement phase took place after the US Preventive Services Task Force argued against PSA screening. It is remarkable that the incidence at present is almost the same as in 1975. In other words, this is a scrutiny-dependent cancer. I do not know of a more powerful example of how the health care system affects the apparent amount of cancer.

Figure 1 - Age-adjusted incidence of prostate cancer in the United States during 1975–2014 (Welch and Brawley 2018).

Nogueira: Among common cancer screening programs (for cervical, colorectal, breast and prostate cancer), what are their effects on the mortality of those cancers?
Welch: We never had a randomized trial of cervical cancer screening; it was implemented before we considered randomized trials. There is a lot of observational data that suggests it is helpful, but it does not explain the 80% reduction in cervical cancer mortality. For instance, we have seen an 80% reduction in stomach cancer mortality and it is a cancer that we do not screen for. Colon cancer mortality is also declining and the fall started before the introduction of screening.

Screening for cervical cancer and colorectal cancer has had some effect in the mortality of those cancers. Breast cancer screening has had only a little effect on breast cancer mortality. The big effect in breast and prostate cancer is better treatment—we learned those cancers are hormonal diseases. 

Nogueira: How do you see the risk and benefit ratio of those cancer screening programs?
Welch: In general, people consider colorectal and cervical cancer screening to be on the side of more benefit than harm. I think this is largely because the problem of cancer overdiagnosis is less evident in those cases. Since they detect precancerous lesions, overdiagnosis takes place at a prior step: dysplastic polyps or cervical dysplasia. In colorectal cancer screening, there are complications from colonoscopy and from polypectomies (e.g., bleeding, perforations). In cervical cancer screening, there are complications from cryotherapy and excisions for precancerous lesions (e.g., bleeding, preterm birth).

Cancer screening has a mix of effects. Most screening, including PSA and mammography, does help a few people, but also harms others. This is the conundrum we must be clear about. So, screening is not a public health imperative; it’s a choice.

And it can distract people from more important things they can be doing for their health. It can also divert resources from other, more important interventions. There are two very different aspects to the word prevention. One is health promotion through behavioral advice, such as do not smoke, eat real food, move regularly, and find meaningful relationships. These are not sexy or technological, but they are very important to health. But when the prevention movement got medicalized, it became a technological imperative to look for early forms of disease.

We also have to be sensitive to the overdiagnosis problem. We have to stop thinking of the best test as the one that finds the most cancers. Typically that is how tests are promoted: “this test finds more cancer than that one.” That is not a good test; we are not looking to find more cancers; we want to find the few cancers that matter.

Nogueira: How can we make screening better, for instance to find those cancers we can make a difference on?  
Welch: This is best exemplified in the case of lung cancer screening. In the US, lung cancer is the most common cause of cancer death; it is a big problem. There is a really well-defined risk group, which can be identified by a single question: “Do you smoke?” We have a really common cause of death and an easy way to find a high-risk group. It is a perfect situation for screening.

It was the first cancer studied for screening, and that happened in the 1980s using chest x-ray. The results were terribly disappointing: screening led to more deaths, not fewer. This happened because screening triggered operations and some people died from those operations. The idea of overdiagnosis in lung cancer seemed crazy, but it happened. Then spiral CT came along. Importantly, the investigators responsible for the spiral CT trial knew about overdiagnosis. What they did was groundbreaking: when the spiral CT found a small lesion that looked worrisome, they did not act and did not biopsy immediately; they waited three months to see whether the lesion was growing. They were making use of the diagnostic value of time. Time provides information both about the genetics of the tumor and the body’s reaction to it. I think that is a step forward.

Everything changes when you move to a genuine high-risk population (recall that regular cigarette smokers are 20 times more likely than non-smokers to die from lung cancer). They are much less likely to be overdiagnosed and much more likely to be helped. But there are not a lot of risk factors as common and powerful as cigarette smoking. Most cancers are sporadic – not the result of some obvious risk factor.

Nogueira: All-cause mortality is not reduced in population-wide cancer screening trials. Could you explain why it matters?
Welch: It begins with what counts as a cancer death. In the context of evaluating screening, I want cancer death to include not only deaths from cancer but also deaths due to interventions performed as part of looking for and treating the cancer. That is not what happens. That is why all-cause mortality is important. If we are going to tell people that screening “saves lives,” I would like to know if it changes their risk of death. Unless you want to play a game in which you care more about one type of death than another.

A good example is a classic study: the Minnesota Colon Cancer Control Study. It now has 30 years of follow-up. There are three arms in the study: annual screening, biennial screening, and a control group. After 30 years, 2% of the annual group and 3% of the control group had died from colon cancer. This is the benefit: one percentage point or, in relative terms, a 33% reduction in colon cancer death. However, all-cause mortality was the same in all groups (Figure 2). It is hard to say that this is saving lives; it may be trading one form of death for another.
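
As a quick check on the arithmetic in this answer, here is a minimal Python sketch. The function name risk_reduction is made up for illustration, and the inputs are simply the rounded 2% and 3% figures quoted above, written as deaths per 100 people; it is not a reanalysis of the trial data.

```python
def risk_reduction(control_deaths_per_100, screened_deaths_per_100):
    """Return (absolute, relative) risk reduction.

    Absolute reduction is in deaths per 100 people (percentage points).
    Relative reduction is a fraction of the control group's risk.
    """
    absolute = control_deaths_per_100 - screened_deaths_per_100
    relative = absolute / control_deaths_per_100
    return absolute, relative


# Rounded colon cancer death figures quoted in the interview.
absolute, relative = risk_reduction(control_deaths_per_100=3,
                                    screened_deaths_per_100=2)

print(f"Absolute reduction: {absolute} percentage point(s)")
print(f"Relative reduction: {relative:.0%}")
```

The same one-point difference sounds small as an absolute number and large as a relative one, which is why it helps to report both.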

Figure 2. The Minnesota Colon Cancer Control Study: All-cause mortality was the same among the three groups: control (non-screened), annual screening, and biennial screening (Shaukat et al. 2013).
Nogueira: Since screening benefits are not large and there are harms, what are the reasons for the heavy promotion of screening?
Welch: The first is a true belief that early detection must help, as a solution to every bad disease. Money is another part, because screening is a great way to recruit new patients. It is good for Pharma, for test manufacturers, and increasingly for our hospitals. Looking for diseases early is a powerful idea: if you can argue that everyone should do something, it is a huge market.

Nogueira: What about the clinical breast examination and breast self-examination often advertised to women?
Welch: The data is clear that clinical breast exams and teaching women to self-examine their breasts do not seem to help. But if a woman becomes aware of a new breast lump, she should have it evaluated. Part of the attention to breast cancer has been good. Ironically, it is possible that screening mammography could be the best way to do the clinical breast exam, if the threshold were set at looking for things 1 cm or bigger. I think a lot of harm from mammography could be reduced if the thresholds for further investigation were much higher.
The general conundrum of screening is we have to involve a whole bunch of people to potentially help a very few. We have to pay attention to not disturb the rest of them.

Nogueira: How do you see the paper that claimed an increase in advanced cases of prostate cancer after the 2012 USPSTF recommendation against screening?
Welch: That report — an increased number of late stages of prostate cancer — was highly flawed. They were only talking about "counts"; they never had a denominator.
In the US data so far (Figure 3), the incidence of metastatic prostate cancer at first presentation (the cancer was already metastatic at the moment of diagnosis) continues to stay stable. But I expect it will go up.

What you see is that the implementation of PSA screening really had an effect on that incidence: it almost cut it in half. This is a sign that the bad cancers are being found early. It has been fairly stable since, but I wouldn't be surprised if it goes back up, because PSA screening is going down. Whether that changes death rates is a separate question, because for that, early treatment must matter.

Notice, in comparison, that the incidence of metastatic breast cancer at first presentation never changes; it is pretty stable. Mammography screening has not been able to reduce the amount of breast cancer diagnosed at this very late stage. That’s not the mammographers’ fault; that’s the fault of the aggressive cancers (the birds in the barnyard analogy).

Figure 3. Incidence of cancer that was metastatic at first presentation in the United States, 1975–2012 (Welch et al. 2015).

References:

Shaukat, A., S.J. Mongin, M.S. Geisser, et al. 2013. Long-term mortality after screening for colorectal cancer. N Engl J Med. 369(12):1106-14. doi: 10.1056/NEJMoa1300720.

Welch, H.G., O.W. Brawley. 2018. Scrutiny-Dependent Cancer and Self-fulfilling Risk Factors. Ann Intern Med. 168(2):143-144. doi: 10.7326/M17-2792.

Welch, H.G., D.H. Gorski, P.C. Albertsen. 2015. Trends in Metastatic Breast and Prostate Cancer — Lessons in Cancer Dynamics. N Engl J Med 373:1685-1687 doi: 10.1056/NEJMp1510443

Thursday, April 23, 2020

How Much Longer Will Cancer Screening Myths Survive?


Published in Skeptic Vol 24, No. 4



It has been 20 years since Dr. Angela Raffle published an article in the prestigious medical journal The Lancet with the provocative title “How long will screening myths survive?” [1]. Although screening myths have been discussed extensively in peer-reviewed articles since Raffle’s publication, I think the public still needs balanced information. This article will review important concepts and myths of screening.

Screening is the systematic search for disease, through medical tests, in people without symptoms of the disease being screened for. Common population screening programs include screenings for breast, cervical, colorectal and prostate cancers.

One of the myths discussed in Raffle’s article concerned the comment “A recent analysis…has shown a reduction in the risk of cervical cancer by 95% for at least 8 years.” Another example is a statement made by Rudy Giuliani, the former New York City mayor who had been diagnosed with prostate cancer. When he was running for president in 2008, Giuliani tried to argue that the American health system was much better than the “socialized” medicine of England, claiming he had an 82% chance of surviving prostate cancer in the United States, compared with only 44% in England [2].

Probably without knowing it, Giuliani compared patients who were diagnosed in different ways, making his comparison invalid. Screening in the United States was much more common than in England. It is not possible to compare screened patients with non-screened patients because of the healthy screenee effect: people who get screened tend to be healthier and more physically fit, to be non-smokers, and to have fewer social problems than those who do not get screened [3].

That is why randomization is used in clinical trials: to ensure the groups are equal, with the only difference being the intervention under analysis. In a screening context, however, those numbers could have come from randomized clinical trials and they still would have been misleading.

The numbers Giuliani referred to are the 5-year survival rate for prostate cancer in the United States and England in 2000. Five-year survival rate is the proportion of patients with a specific cancer who are still alive five years after the diagnosis. It is probably the most common statistic used to measure cancer prognosis [4]. The problem happens when the survival rates include screened patients.

The idea of screening is to advance the moment of diagnosis to allow early detection: a cancer that would have been diagnosed due to symptoms in advanced stages is now detected years earlier, in the asymptomatic phase. Imagine that a group of patients without screening is diagnosed due to symptoms at age 63, but dies from the cancer at 65. Now, consider that screening detects the tumor at age 59 and the patients still die from the cancer at 65. Thus, without screening the 5-year survival rate was 0%, but with screening it is 100%, even though screening did not make any of them live longer: both groups of patients died at the same age. This is called lead-time bias [2, 3] and it is illustrated in Figure 1.
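
The lead-time example can be reproduced with a few lines of Python. This is only a sketch of the hypothetical scenario described above: the function name five_year_survival_rate and the group size of 100 are assumptions chosen for illustration, and the ages are the ones used in the text.

```python
def five_year_survival_rate(patients, horizon=5):
    """Fraction of patients still alive `horizon` years after diagnosis.

    Each patient is an (age_at_diagnosis, age_at_death) pair.
    """
    survivors = sum(1 for diagnosed, died in patients
                    if died - diagnosed > horizon)
    return survivors / len(patients)


# Hypothetical groups of 100 patients each, using the ages from the text.
without_screening = [(63, 65)] * 100  # diagnosed by symptoms at 63
with_screening = [(59, 65)] * 100     # detected by screening at 59

print(five_year_survival_rate(without_screening))  # 0.0
print(five_year_survival_rate(with_screening))     # 1.0

# Age at death is identical in both groups: nobody lived longer.
print({died for _, died in without_screening} ==
      {died for _, died in with_screening})        # True
```

Only the starting point of the clock moved; the end point did not.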



Another problem with screening is length-time bias. This happens because screening is done periodically, and cancer, contrary to what most people think, is a heterogeneous disease with different progression rates [2, 3]. Really aggressive and more lethal cancers tend not to be detected by screening because they grow fast and cause symptoms between screening rounds. Conversely, screening tends to detect slowly progressing cancers. As a result, a group of patients whose cancers were detected by screening will live longer than those diagnosed clinically simply because screening selected a group of patients with a better prognosis. In fact, some of those cancers would not have progressed or would have regressed spontaneously. That means that screening detects abnormalities that meet the pathological criteria of cancer, but would not have caused symptoms or death in the patient’s lifetime. This is called overdiagnosis [5]. The problem of overdiagnosis is that at the time of diagnosis it is not possible to know which cases will progress and which will not, so almost everyone is treated, leading to overtreatment.
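
A rough way to see length-time bias is to compute which cancers a periodic screen is likely to catch. The Python sketch below rests entirely on made-up assumptions that do not come from the cited sources: a cohort of 500 fast and 500 slow cancers, a detectable period before symptoms of 0.5 years for the fast ones and 4 years for the slow ones, a perfectly sensitive test every 2 years, and a detectable period that begins at a random moment between screens.

```python
def detection_probability(detectable_years, interval_years):
    """Chance that at least one screen falls inside the detectable period.

    Simplifying assumptions: screens are perfectly sensitive, evenly
    spaced, and the detectable period starts at a random point in the
    screening cycle.
    """
    return min(1.0, detectable_years / interval_years)


SCREENING_INTERVAL = 2.0  # years between screens (assumed)

# Hypothetical cohort: name -> (number of cancers, detectable years)
cohort = {"fast": (500, 0.5), "slow": (500, 4.0)}

screen_detected = {
    name: count * detection_probability(years, SCREENING_INTERVAL)
    for name, (count, years) in cohort.items()
}

total_cancers = sum(count for count, _ in cohort.values())
slow_share_all = cohort["slow"][0] / total_cancers
slow_share_screened = screen_detected["slow"] / sum(screen_detected.values())

print(f"Slow cancers among all cancers:       {slow_share_all:.0%}")
print(f"Slow cancers among screen-detected:   {slow_share_screened:.0%}")
```

Under these assumptions, slow cancers make up half of all cancers but most of the screen-detected ones, so the screen-detected group has a better prognosis before any treatment is given.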

Overdiagnosis also inflates the survival statistic [2, 3]. Figure 2 shows a hypothetical example. Imagine that without screening, of 1000 people with a specific cancer, after five years 900 are dead and 100 are alive. Now imagine that screening correctly identifies those 1000 patients, but also identifies 4000 patients whose cancer would not have progressed: they were overdiagnosed. With screening, 4100 of 5000 patients are alive after five years. Screening increased the 5-year survival rate from 10% to 82%, but the number of deaths was the same in both scenarios.
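
The same hypothetical numbers can be checked in a short Python sketch. The function name survival_rate is an assumption made for illustration; the counts are the ones from the example above.

```python
def survival_rate(alive, dead):
    """Share of diagnosed patients who are alive at five years."""
    return alive / (alive + dead)


DEAD_AT_FIVE_YEARS = 900   # identical with and without screening
ALIVE_PROGRESSIVE = 100    # patients with progressive cancer still alive
OVERDIAGNOSED = 4000       # found only by screening, none die of the cancer

without_screening = survival_rate(ALIVE_PROGRESSIVE, DEAD_AT_FIVE_YEARS)
with_screening = survival_rate(ALIVE_PROGRESSIVE + OVERDIAGNOSED,
                               DEAD_AT_FIVE_YEARS)

print(f"Without screening: {without_screening:.0%}")  # 10%
print(f"With screening:    {with_screening:.0%}")     # 82%
```

The denominator grows from 1000 to 5000 while the 900 deaths stay where they are, which is the whole effect.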



Had Rudy Giuliani compared the prostate cancer mortality rates in both countries, he would have seen that they were roughly the same in 2000: 26 and 27 per 100,000 in the US and England, respectively [2]. Contrary to his claim of a superior healthcare system in the US, higher survival rates with unchanged mortality actually show the opposite: Americans are more likely to be overdiagnosed and overtreated for prostate cancer, with no benefit from the extra diagnoses and treatments.

An inflated survival rate without screening saving lives is not just speculation. In a randomized trial of chest X-ray screening for lung cancer in smokers, the 5-year survival rate was 35% for the screened group and 19% for the control group, but mortality was slightly higher in the screened group [6]. Moreover, another study [7] found no correlation between changes in 5-year survival rates and changes in mortality rates for 20 different types of cancer in the US. Between 1950 and 1995 in the US, the most drastic change in 5-year survival rate was for prostate cancer (43% to 93%) [7], a period during which the incidence of prostate cancer increased substantially, especially after screening started.

Screening for neuroblastoma is another illuminating case [3]. Neuroblastoma is a cancer that occurs in children and usually has a better prognosis when it appears before age one, and a worse prognosis after that. In 1985, screening children for neuroblastoma started nationwide in Japan. By 1988, screening had detected 337 cases with a 97% survival rate, much higher than the 5-year survival rate of 50-55% in unscreened children. Even though screening considerably increased the incidence of neuroblastoma, the number of children diagnosed after age one did not change. More critically, neuroblastoma mortality in Japan was similar to that of other countries that had not introduced screening. Due to considerable overdiagnosis and the lack of clear benefits, neuroblastoma screening in children ended in Japan in 2004.

Although biased survival statistics are widely used to promote screening, many medical doctors are not aware of the bias. In a 2012 survey [8], 76% of American physicians wrongly thought that better 5-year survival rates are evidence of screening benefits. Furthermore, 47% wrongly considered that finding more cancers in screened than in non-screened populations is evidence that screening saves lives. Only mortality can be used as evidence for screening efficacy. But it is also not that simple.

The main outcome used to measure screening efficacy is the death rate caused by the screened cancer, known as cancer-specific mortality. There are two problems with cancer-specific mortality. First, it is more common to be overdiagnosed and overtreated than to avoid a death caused by the screened cancer. For example, in the ERSPC study, which reported a reduction in prostate cancer mortality, 27 men had to be treated to avoid one prostate cancer death [9]. As another example, screening for breast cancer leads to more mastectomies [10]. Moreover, treatment can increase mortality from other causes. For instance, radiotherapy for breast cancer increases mortality from lung cancer and heart disease [10]. Thus, looking only at cancer-specific mortality could miss deaths caused by treatment. Second, it has been documented that misclassification of the cause of death is another source of bias in favor of screening [10]. A good example of how biased cancer-specific mortality might be is the Swedish trials of mammography screening: for every 1,000 women screened every other year for 12 years, one breast cancer death was avoided, while the total number of deaths increased by six [11].

Thus, the only unbiased outcome is overall mortality. Only a reduction in overall mortality actually shows what we want to know: whether screening saves lives. The best evidence of screening reducing overall mortality was for lung cancer using low-dose CT in smokers, but a subsequent systematic review did not find that effect [12]. Since we are dealing with healthy people, a large sample is required to detect a difference in overall mortality. As H. G. Welch wrote in his 2015 book Less Medicine, More Health [13], “we should dump ‘screening saves lives’ language. We should publicly acknowledge that we cannot be sure whether early detection lengthens, shortens, or has no effect on how long people live. And we should be clear that if it takes so many people to find out for sure, then the benefit must be, at best, small.”

The only way for people who get screened to understand its harms is, as Raffle ended her article, to “avoid contributing myths and fallacies to the debate.”  

References
1. Raffle, A. 1999. “How long will screening myths survive?” Lancet, 354:431-43
2. Gigerenzer, G., Gaissmaier, W., Kurz-Milcke, E., et al. 2007. “Helping Doctors and Patients Make Sense of Health Statistics.” Psychological Science in The Public Interest. 8(2):53-96.
3. Raffle, A.E., Gray, J.A.M. 2007. Screening: Evidence and Practice. Oxford: Oxford University Press. 
4. Wegwarth, O., Gaissmaier, W., Gigerenzer, G. 2010. “Deceiving Numbers.” Medical Decision Making 3: 386-394. 
5. Brodersen, J., Schwartz, L.M., Heneghan, C., et al. 2018. “Overdiagnosis: what it is and what it isn’t.” BMJ Evid Based Med. 23:1-3.
6. Woloshin, S., Schwartz, L.M., Welch, H.G. 2008. Know Your Chances: Understanding Health Statistics. Berkeley (CA): University of California Press. 
7. Welch, H.G., Schwartz, L.M., Woloshin, S. 2000. “Are increasing 5-year survival rates evidence of success against cancer?” JAMA 283: 2975-2978.
8. Wegwarth, O., L.M. Schwartz, S. Woloshin, et al. 2012. “Do physicians understand cancer screening statistics? A national survey of primary care physicians in the United States.” Annals of Internal Medicine 156:340-349.
9. Fenton, J.J., Weyrich, M.S., Durbin, S., et al. 2018. “Prostate-specific antigen–based screening for prostate cancer: A systematic evidence review for the U.S. Preventive Services Task Force.” Agency for Healthcare Research and Quality, Evidence Synthesis No. 154. AHRQ Publication No. 17-05229-EF-1. 
10. Gøtzsche PC, Jørgensen KJ. 2013. “Screening for breast cancer with mammography.” Cochrane Systematic Review.
Due to overdiagnosis, radiotherapy is used more in screened than non-screened groups. A radiotherapy meta-analysis reported 78% and 27% excess mortality from lung cancer and heart disease, respectively. Bias in cause of death is another issue.
First, determining the cause of death when patients have multiple diagnoses is a source of error. And, in many mammography screening trials, cause of death was not assessed on blind review, increasing the chance of bias. Furthermore, as radiotherapy reduces the chance of breast cancer local recurrence, it makes it more likely that deaths in screened women with breast cancer will be misclassified as from other causes.  
11. Gøtzsche P.C., Olsen O. 2000. “Is screening for breast cancer with mammography justifiable?” Lancet. 355:129-34.
12. Prasad, V., Lenzer, J., Newman, D.H. 2016. “Why cancer screening has never been shown to ‘save lives’—and what we can do about it.” BMJ, 352.
13. Welch, H.G. 2015. Less Medicine, More Health: 7 Assumptions That Drive Too Much Medical Care. Beacon Press.