MISUSE OF STATISTICS
IN MEDICAL RESEARCH

V.I. Fabrikant
Prisoner # 167 932 D
Archambault jail
Ste-Anne-des-Plaines
Québec, Canada J0N 1H0


Summary.  We analyse here several recent cardiology studies, as well as some cancer research results, to show that their use of statistics was fundamentally flawed and led to unwarranted conclusions.  Some recommendations are made with respect to the directions of further research in these fields.

INTRODUCTION

A typical study of the effectiveness of a new medication follows this scheme.  The researchers take two groups of consenting patients and assign one group to receive the medication, while the other group receives a placebo.  The study is usually called “blind”, since the patients do not know whether they receive the medication or the placebo.  Sometimes the study is called “double-blind”, which means that the researchers are also unaware of which patient belongs to which group.  This creates in the public an impression of the impartiality of the researchers.

After several years of observation of both groups of patients, a statistical analysis of the results leads the researchers to a conclusion as to whether the medication was effective and whether it should be recommended for general use.  Reading several research publications in cardiology revealed that the difference between the medicated group and the placebo group was just a couple of percentage points; nevertheless, these medications were declared effective and recommended for universal use.

Yet another kind of medical research uses statistics to establish a causal connection between individual lifestyle (diet included) and the incidence of a certain disease.  Sometimes the conclusions seem to make sense, and sometimes they make no sense at all.  As an example of the first, we may recall that for about 100 years we strongly believed that ulcers were caused by stress, spicy food and alcohol; none of this was correct, but it did make sense and it was supported by proper statistics.  As an example of the second, one can find in a publication of the American Cancer Society (1997) [1] higher education listed as a risk factor for breast cancer.  This makes no sense and leads us in a wrong direction.

All the above was compelling enough to undertake an in-depth analysis of the use of statistics in medical research, in order to understand whether the use of statistics was proper and whether the conclusions reached were well founded.  The results of such an analysis are presented below.
 

SOME FUNDAMENTALS OF STATISTICS

We need to remind the reader of some fundamentals and to show how they are used in medical research.  Bertrand Russell allegedly defined mathematics as the science which does not know what it is talking about and does not know whether what it is saying is correct.  What he meant was that every mathematical statement is conditional: if statement  A  is correct, then statement  B  is correct; otherwise it is not.  Therefore, we first have to analyse whether the presumptions of statistics are fulfilled in a specific medical research study.

First, we introduce the term ideal medication.  This term refers to a medication which cures 100% of patients.  Though no medication is exactly ideal, some come pretty close.  In the case of an ideal medication, there is no room for statistical research as we know it: the process of cure is not random at all.  The better the medication, the less random the process, and the less applicable the contemporary statistical apparatus, which is based on the so-called normal distribution.  An effective medication, by definition, should affect all (or at least a majority) of patients.  When a studied medication affects only a minority of the patients, and when we presume the effect of the medication to be governed by a normal distribution, we effectively admit that it does not work as it should.  Not every statistical method requires the distribution to be normal, but those that do not rely on other, equally questionable presumptions.

Now we give a layman’s description of the statistics behind the medical research.  We want to find out whether a certain medication helps patients affected by a specific disease.  We imagine an abstract experiment: we take at random all possible samples of patients from the total population of patients.  We take two samples of  n1  and  n2  patients at a time, assign one sample to receive a placebo, while the other sample receives the medication under study.

We observe both samples for several years and register the number of endpoint events (death, heart attack, etc.) in each group.  Suppose the number of endpoints in the placebo group was N1, while the medicated group registered N2 endpoints.  We compute the percentage of endpoints in each group as  p1=(N1/n1)*100%   and   p2=(N2/n2)*100%.  Though the distributions of  p1  and  p2  are not normal, the distribution of their difference  p1–p2  will be sufficiently close to normal for a large sample of, say, 5000 patients.  We emphasize once again that for an ideal medication the distribution of  p1–p2  will never be normal, no matter how big the sample is; so, in a way, the closer the distribution is to normal, the more ineffective the medication.

We called the experiment described above abstract, because it is humanly impossible to actually take  every  possible sample out of a population of many millions and to observe them all for a number of years.  We need to be able to take just one pair of samples, observe them for several years, and conclude whether the medication actually helped the patients and whether it should be recommended for general use.  Statistics provides us with such an apparatus.

Mark Twain allegedly quipped that there are three kinds of lies: lies, damned lies and statistics (some sources attribute this quote to Disraeli, but we would not be surprised if neither of them actually said it).   The serious side of this joke tells us that we have to be very careful while processing any statistical data and must not jump to conclusions which are not warranted.  The usual pitfalls of statistics include the  post hoc ergo propter hoc  syndrome (Latin for  after this, therefore because of this).  Even if statistics tells us that women with higher education are more likely to get breast cancer, it is nonsense to quote education as a risk factor, because depriving women of education will certainly not make them healthier.  Yet another pitfall is the application of normal-distribution laws to quantities which do not follow a normal distribution.  As we shall see further, contemporary medical research is deficient in all the above items, plus a whole lot more.

Returning to our experiment: it is obvious that if the percentage of endpoints in the medicated group  p2  is greater than the percentage of endpoints in the placebo group  p1, the medication does not work and the matter is closed.  In the opposite case, we need to decide whether the difference  p1–p2  is  statistically significant.  This term means that we must be reasonably sure that the difference between  p1  and  p2  is not just due to chance.  Presuming that  p1–p2  has a normal distribution, the following formula can be used to estimate the probability that the difference between  p1  and  p2  cannot be attributed to chance [2]:

                    C = (p1 – p2) / sqrt[ p1(100–p1)/n1 + p2(100–p2)/n2 ]                    (1)

Here  C  is called the critical parameter.  Once this parameter is computed, we can look into a table [2] and estimate the probability that the difference  p1–p2  is not just due to chance.  In reality, medical research uses a more complicated apparatus (the Kaplan-Meier procedure, the logrank test, etc.) rather than formula (1).  We introduced formula (1) for our analysis because of its simplicity and because it leads us to essentially the same conclusions as the more sophisticated procedures.

Let us analyse formula (1).  The product  p1(100–p1)  has a minimum equal to zero for  p1=0  or  p1=100; it has a maximum of 2500 for  p1=50.  In the analysed studies, the parameter  p1  is of the order of 10% and the sample size is of the order of 5000.  This means that the denominator in formula (1) will be of the order of 0.6.  For a normal distribution, the critical parameter  C  should be greater than 1.96 to guarantee a 95% confidence level.  This means that a difference between  p1  and  p2  of just 1.2% becomes statistically significant.  The degree of statistical significance increases dramatically with the growth of the difference between  p1  and  p2: when this difference is just 2%, the confidence level becomes greater than 99.8%; a difference of 2.5% guarantees a confidence level greater than 99.99%.  This creates in us the impression that we just cannot go wrong with our conclusions, and this impression is false.
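
The arithmetic above is easy to reproduce.  Below is a short sketch of our own (the function names are invented for illustration) that computes the critical parameter of formula (1) and the corresponding two-sided confidence level via the normal error function:

```python
import math

def critical_parameter(p1, p2, n1, n2):
    # Formula (1): p1, p2 are endpoint percentages, n1, n2 are sample sizes.
    denom = math.sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)
    return abs(p1 - p2) / denom

def confidence_level(c):
    # Two-sided probability that a difference with critical parameter c
    # is not due to chance, under the normal-distribution presumption.
    return math.erf(c / math.sqrt(2))

# A difference of just 1.2 percentage points with samples of 5000:
c = critical_parameter(10.0, 8.8, 5000, 5000)
print(round(c, 2))          # about 2.06, above the 1.96 threshold

# A 2-point difference already pushes the confidence level above 99.8%:
print(confidence_level(critical_parameter(10.0, 8.0, 5000, 5000)) > 0.998)
```

Note that the denominator comes out near 0.58 here, in line with the 0.6 estimate above.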

There are several pitfalls here.  First, as we mentioned before, for a really effective medication the distribution is not normal.  Second, we can never be sure that our data are absolutely correct; humans are prone to errors, so we should allow at least a ±(5–10)% variation and check whether such a variation would lead us to the opposite conclusion.  In the cases of the cardiac studies discussed below, it does.  Third, the critical parameter tells us that the two samples are probably different; it does not tell us that they are different because of the medication under study.  Now we are ready to discuss some specific studies.
 
 
 

CARDIAC RESEARCH STUDIES

The EUROPA study [3].  Over 12000 patients were randomised into a placebo group (n1=6108) and a perindopril group (n2=6110).  After 4 years of study, 603 patients (p1=9.9%) in the placebo group and 488 patients (p2=8%) in the perindopril group had a primary endpoint.  The authors of the study claim that perindopril reduced the relative risk of a primary endpoint by 20%, and they recommended it for general use.  How did they get 20% out of the difference  p1–p2=1.9%?  The trick is in the introduction of the term relative risk: 1.9% constitutes 20% of 9.9%.  Exactly the same result could be expressed in different wording, namely: if a patient takes perindopril, his chance of not having a primary endpoint is 92%, while if he does not take perindopril, his chance is 90%.  In this wording, does the recommendation to use perindopril sound convincing?

Now let us see how the study results deal with the pitfalls mentioned in the previous section.  We do not even have to test the stability of the basic data, because on page 784 of the publication [3] it is mentioned that at three years of study, 81% of the patients in the perindopril group were actually taking the medication.  This means that 19% of the perindopril group did not take the medication and de facto belonged to the placebo group.  The number of patients in the perindopril group has to be re-calculated as n2=6110*0.81≈4950, and the difference of 1160 patients added to the placebo group, so that n1=6108+1160=7268.  The percentages are then computed as  p1=603/7268=8.3%  and  p2=488/4950=9.9%.  Now the placebo group is safer than the perindopril group, and their difference p1–p2=–1.6%, according to formula (1), is statistically significant with a confidence level greater than 99.7%.  Does this prove that perindopril is dangerous to health?  Of course not, but it does show just how shaky the basis is on which the authors of the study made their conclusions.
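
The re-allocation just described takes only a few lines to reproduce.  The sketch below is our own reading of the worst-case re-allocation, with variable names invented for illustration:

```python
# EUROPA numbers from [3]: worst-case re-allocation of the 19% of
# non-compliant patients from the perindopril group to the placebo group.
n_perindopril, n_placebo = 6110, 6108
events_perindopril, events_placebo = 488, 603
compliance = 0.81                                  # reported at three years

n_compliant = round(n_perindopril * compliance)    # about 4949 patients
n_placebo_new = n_placebo + (n_perindopril - n_compliant)

p_placebo = events_placebo / n_placebo_new * 100
p_perindopril = events_perindopril / n_compliant * 100

print(round(p_placebo, 1), round(p_perindopril, 1))    # 8.3 9.9
```

The placebo group now looks safer, exactly as stated above (the one-patient discrepancy with the 4950/7268 figures in the text is just rounding).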

Of course, one can argue that among the 488 endpoint cases in the perindopril group, some should also be transferred to the placebo group.  If this is the case, it proves only one thing: the study was extremely poorly organized and no reliable conclusion can be drawn from its results.  A responsible researcher always errs on the side of caution.

Just out of curiosity, we made computations to check the stability of the results under a ±5% variation in the basic data.  Subtracting 5% from the perindopril group and adding 5% to the placebo group gave  p1=9%  and  p2=8.9%.  Now their difference is not statistically significant, so the data failed the stability test.
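
One plausible way to run such a stability test is to perturb the raw endpoint counts by 5% in the unfavourable direction and re-apply formula (1).  The sketch below is our own reconstruction, not necessarily the exact computation behind the figures above, but it shows the significance disappearing all the same:

```python
import math

def critical_parameter(p1, p2, n1, n2):
    # Formula (1) from the text; p1, p2 in percent.
    denom = math.sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)
    return abs(p1 - p2) / denom

# EUROPA raw counts, perturbed by 5% against the reported effect:
placebo_events = round(603 * 0.95)        # 573
perindopril_events = round(488 * 1.05)    # 512

p1 = placebo_events / 6108 * 100          # about 9.4%
p2 = perindopril_events / 6110 * 100      # about 8.4%

c = critical_parameter(p1, p2, 6108, 6110)
print(c > 1.96)                           # False: no longer significant at 95%
```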

If all the above did not convince the reader that there was no basis to declare perindopril useful for patients, here is one more argument.  Presume that all the data are exact.  Can we then conclude that perindopril saves lives?  No, because all that the statistical test tells us is that one group was different from the other; it does not tell us why.  It is the study authors’ own conclusion that the reason for the difference was perindopril.  A careful scientist surely has to ask whether other reasons for a 2% difference can be found.

Let us look at Table 1 of [3], where the baseline characteristics of both groups are given.  We see that the average age and the standard deviation in both groups are the same.  Does this mean that both groups have the same age distribution?  If the distribution were normal, then yes; otherwise, the answer is no.  Human age cannot possibly have a normal distribution; therefore, we cannot say that the same mean age and standard deviation guarantee the equivalency of the groups.  The same argument is valid for other parameters, like heart rate, blood pressure, etc.  In addition, in every international study, due to different life expectancies, a 60-year-old from one country might be equivalent to a 70-year-old from another.

We note that the placebo group had 1% more diabetics than the perindopril group, 0.7% more patients with a positive stress test, 0.3% more patients with peripheral vascular disease, 0.2% more with hypertension, etc.   In addition, almost all patients were receiving other medications (6 different medications are listed in the table), which certainly could influence the final results.  A total of 21 items are listed in Table 1 (alcohol and tobacco consumption are not listed, and they could influence the outcome of the trial as well), so it is quite plausible to presume that those tiny differences between the groups added up to the 2% final difference between them, and that this 2% difference has nothing to do with perindopril.

Last, but not least: we read in [3] that 50 (fifty!) patients need to be treated for 4 (four!) years to prevent 1 (one!) endpoint event.  It is surprising that nobody seems to ask: “What about the remaining 49 patients?  Why don’t they get some measurable benefit?”  Indeed, perindopril belongs to a group of medications called angiotensin-converting enzyme (ACE) inhibitors.  We read in [3] that “in addition to lowering blood pressure, ACE inhibitors possess direct cardiovascular protective effects”.  If this is so, then these protective effects should be felt by all (or at least a majority) of patients, not by a tiny minority.  Since there is no explanation as to why perindopril does not act on everybody, one should adopt the more logical opinion that perindopril does no good to anyone, and that the tiny difference of 2% between the perindopril and placebo groups should be attributed to numerous other factors.

The Heart Protection Study [4].  In this study, 5963 adults aged 40–80 with diabetes and 14573 patients with occlusive arterial disease but no diabetes were randomly allocated 40 mg of simvastatin daily or a matching placebo.  The total simvastatin group consisted of 10269 patients, while the placebo group had 10267 patients.  After 5 years of observation, there were 898 (p2=8.7%) major coronary events in the simvastatin group and 1212 (p1=11.8%) in the placebo group.  The authors of the study declared that the use of simvastatin resulted in a significant decrease of 24% in the number of major coronary events.

They also mentioned that the average compliance in the simvastatin group was 85%.  This means that 15% of the patients in the simvastatin group did not take the medication and should be added to the placebo group.  If we perform the re-calculation, the simvastatin group becomes 10269*0.85≈8729, the placebo group becomes 10267+1540=11807, and  p1=1212/11807=10.3%,  p2=898/8729=10.3%.  Now there is no difference between the two groups.
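
Sketching this re-calculation in a few lines (our own arithmetic, using the compliance figure reported in [4]):

```python
# Heart Protection Study numbers: move the 15% non-compliant patients
# from the simvastatin group to the placebo group and recompute.
n_simvastatin, n_placebo = 10269, 10267
events_simvastatin, events_placebo = 898, 1212
compliance = 0.85

n_compliant = round(n_simvastatin * compliance)            # about 8729
n_placebo_new = n_placebo + (n_simvastatin - n_compliant)  # about 11807

p_placebo = events_placebo / n_placebo_new * 100
p_simvastatin = events_simvastatin / n_compliant * 100

print(round(p_placebo, 1), round(p_simvastatin, 1))        # 10.3 10.3
```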

The situation is further complicated by the fact that 17% of the patients in the placebo group were taking non-study statins.  The authors play this in their favour, claiming that if everyone in the simvastatin group had taken the medication, the difference would have been even stronger.  This logic presumes, as given, that simvastatin is beneficial.  It goes against the concept of the null hypothesis: we must presume simvastatin to be useless until proven otherwise.  As was shown above, when proper adjustments are made to the numbers of patients in each group, simvastatin is, indeed, useless.

Of course, one may argue that not only the patients but also the number of major events in each group has to be re-distributed, according to whether or not a patient was taking simvastatin.  We are not sure that this can be done reasonably accurately; and again, a cautious scientist should consider the worst possibility, namely, that all outcomes in the simvastatin group came from the patients who took simvastatin.  The main point, though, has been proven: the source data are unreliable, and a slight variation leads to the opposite conclusion; in this situation, the only reasonable conclusion is that simvastatin is useless.

The HOPE vitamin E study [5].  A total of 2545 women and 6996 men 55 years of age or older at high risk of cardiovascular events were enrolled to see whether a daily dose of 400 IU of vitamin E might be beneficial for them.  A total of 772 of the 4761 patients assigned to vitamin E (16.2 percent) and 739 of the 4780 assigned to placebo (15.5 percent) had a primary outcome event.  The conclusion was that vitamin E had no statistically significant effect on cardiovascular outcomes.  In this study, about 90% of the patients were actually taking vitamin E.  A cautious scientist should investigate the worst-case scenario: taking the 10% of patients who did not use vitamin E out of the medicated group and placing them in the placebo group.  The re-calculated percentages then become 18% for the vitamin E group and 14% for the placebo group.  Now their difference becomes highly statistically significant, and we may formally conclude that vitamin E is harmful.  Would this be a well-founded conclusion?  Of course not, but we just wanted to show how a small variation in the data leads to totally different conclusions.
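
The same worst-case arithmetic applies here.  This is a sketch of our own, assuming (as the worst case requires) that all 772 events stay with the patients who actually took vitamin E:

```python
# HOPE vitamin E numbers from [5]: worst-case shift of the 10% of
# patients who did not take vitamin E into the placebo group.
n_vitamin, n_placebo = 4761, 4780
events_vitamin, events_placebo = 772, 739

moved = round(n_vitamin * 0.10)             # 476 patients
p_vitamin = events_vitamin / (n_vitamin - moved) * 100
p_placebo = events_placebo / (n_placebo + moved) * 100

print(round(p_vitamin), round(p_placebo))   # 18 14
```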

The HOPE ramipril study [6].  A total of 9297 high-risk patients (55 years of age or older) who had evidence of vascular disease or diabetes plus one other cardiovascular risk factor were randomly assigned to receive ramipril (10 mg once per day orally) or a matching placebo for a mean of five years.  At the end of the study, 651 of the 4645 patients assigned to receive ramipril (14.0 percent) had reached the primary endpoint, as compared with 826 of the 4652 patients assigned to receive placebo (17.8 percent).  The conclusion was that ramipril was highly beneficial, and it was recommended for general use.  As before, moving just 10% of the patients from the ramipril group to the placebo group changes these numbers to 15.6% and 16.1% respectively; now the difference between them is no longer statistically significant.  Yet another 10% change in the group allocations makes the percentages 17.5% and 14.8%, and now ramipril becomes dangerous.
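
The two shifts described above can be sketched as follows (our own arithmetic, again assuming all 651 events stay with the ramipril takers):

```python
# HOPE ramipril numbers from [6]: shift 10%, then 20%, of the ramipril
# group into the placebo group and recompute the endpoint percentages.
n_ramipril, n_placebo = 4645, 4652
events_ramipril, events_placebo = 651, 826

results = []
for fraction in (0.10, 0.20):
    moved = round(n_ramipril * fraction)
    p_ramipril = events_ramipril / (n_ramipril - moved) * 100
    p_placebo = events_placebo / (n_placebo + moved) * 100
    results.append((round(p_ramipril, 1), round(p_placebo, 1)))

print(results)   # [(15.6, 16.1), (17.5, 14.8)]
```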

The authors explain the beneficial action of ramipril as that of an ACE inhibitor: “…it is likely that angiotensin-converting-enzyme inhibitors exert additional direct mechanisms on the heart or the vasculature that are important. These may include antagonizing the direct effects of angiotensin II on vasoconstriction, the proliferation of vascular smooth-muscle cells, and rupture of plaques; improving vascular endothelial function; reducing left ventricular hypertrophy; and enhancing fibrinolysis”.  It has been our experience that when so many words are used to describe the benefits of a medication, the plain English equivalent is: “We do not know whether ACE inhibitors do any good, and if they do, we have no idea why”.

At the same time, the authors write: “Treating 1000 patients with ramipril for four years prevents about 150 events in approximately 70 patients”.   Well, we decided to check these numbers.  In the placebo group,  826/4.652=178 patients per thousand had a primary endpoint; in the ramipril group,  651/4.645=140 patients per thousand; the difference is 38, so how did they arrive at the number of 70 patients?  Whatever the right number is, one thing is obvious: only a tiny minority of patients is claimed to have benefited from the medication, and nobody asks the obvious question: if ramipril is an effective medication with so many beneficial factors, why did the majority of patients not benefit from it?
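
Our check of the quoted figure is simple arithmetic:

```python
# Primary endpoints per 1000 patients over the study period, from [6]:
placebo_per_1000 = round(826 / 4652 * 1000)     # 178
ramipril_per_1000 = round(651 / 4645 * 1000)    # 140
prevented = placebo_per_1000 - ramipril_per_1000
print(prevented)                                # 38, not 70
```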

The patients in this study were recruited from December 1993 to June 1995 at 129 centers in Canada, 27 centers in the United States, 76 centers in 14 western European countries, 30 centers in Argentina and Brazil, and 5 centers in Mexico.  Human beings are prone to errors.  There is no way that so many participants from so many countries would not make mistakes accounting for the actual difference of 3.8% (17.8% – 14.0%) between the ramipril group and the placebo group.  Again, due to different life expectancies, a 60-year-old from North America is not equivalent to a 60-year-old from South America.  In addition, ramipril is one of the ACE inhibitors, so all the criticism levied above against the EUROPA study is valid here as well.
 

CANCER RESEARCH

There is a proverb stating that if it walks like a duck and quacks like a duck, it is a duck.  It sounds like a common-sense approach, but it has been shown on numerous occasions to be plain wrong in scientific research.   One such example is ulcer research.  It did make sense to blame stress, diet and alcohol, and there was plenty of statistics to support it; yet it was proven wrong when H. pylori was discovered to be the real cause of ulcers.  Regretfully, medical research does not learn from past mistakes: it limits itself to investigating whether something “walks like a duck and quacks like a duck” and then immediately declares it to be “a duck”, while neglecting all the evidence to the contrary.  Here are some examples.

Lung cancer.  It has long been accepted that lung cancer is caused mainly by smoking.  There is plenty of statistics to support this: the majority of lung cancer cases are smokers.  When a person is not a smoker, we attribute the cancer to second-hand smoke.  Are there facts which contradict this notion?  Of course there are, but they are being ignored.  For example, from 1930 to 1990 the percentage of smokers in the US decreased by half; at the same time, the number of people dying of lung cancer per 100,000 of population increased more than 10 (ten!) times [1].

If smoking were the real cause of lung cancer, a decrease in the number of smokers should certainly decrease the number of deaths.  Not only did this not happen; on the contrary, the number of deaths increased ten times.  Of course, one may argue that lung cancer takes many years to develop, so we cannot expect an immediate reaction.  Very well, we are prepared to allow a 40-year delay.  This would mean a tenfold increase in the number of smokers from 1890 to 1950.  For example, if we presume that only 10% of the population smoked in 1890, then by 1950, 100% of the population should have become smokers, which was certainly not the case.
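
The lag argument reduces to one line of arithmetic (the 10% starting figure is purely illustrative):

```python
# If lung-cancer deaths track smoking with a 40-year lag, a tenfold rise
# in deaths from 1930 to 1990 implies a tenfold rise in smokers 1890-1950.
smokers_1890 = 0.10                   # illustrative assumption
implied_smokers_1950 = smokers_1890 * 10
print(implied_smokers_1950)           # 1.0, i.e. 100% of the population
```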

Every rigorous science rejects a hypothesis when there is just one contradicting fact.  Regretfully, medical research does not.  There is more statistics contradicting a causal relationship between smoking and lung cancer.  For example, the vast majority of smokers do not die of lung cancer: over 25% of the population in the US smokes, while only 6% of deaths are due to lung cancer, which means that even if we presume that all who died of lung cancer were smokers (which is not the case), 3 out of 4 smokers do not die of lung cancer.

The greatest harm of the smoking theory of lung cancer is that all efforts are being directed at stopping people from smoking, while the statistics show clearly that the number of deaths not only does not decrease, but rather increases.  It is time to have the courage to admit the mistakes and start looking for the real causes of lung cancer from square one.  It is very important to ask the right questions.  In this case, the right question would be: why can some people smoke all their lives and live to 99 years old without any ill effects of smoking?  The answer to this question would help us to identify the people who are vulnerable to smoke and save their lives.

Breast cancer research.  There have been numerous claims of various medications “protecting” against breast cancer, but all these claims are as unfounded as the above-mentioned “heart protection” studies.  The real picture becomes clear from a consideration of the number of deaths from breast cancer during 1930–1990 [1].  The curve goes a little bit up and a little bit down, but mainly remains unchanged over 60 years, which leads to a clear conclusion: none of the medications really worked.

If one looks at the information in [1], practically everything becomes a risk factor for breast cancer: early menarche, late menopause, use of contraceptives, even higher education.  Such numerous risk factors are the best proof of the truth: we know nothing about the causes of breast cancer, and all the highly advertised medical treatments do not work.  Those who claim to have been cured of breast cancer most probably never had it and lost their breasts for nothing.

Skin cancer.  Older people remember that about 20 years ago all the commercials praised Coppertone for helping to get a suntan.  Now we have gone to the other extreme: the sun has all of a sudden become enemy number one, and we all need protection, otherwise we will get skin cancer.  People are spending billions on sunblock.  Let us see what the evidence is that the sun causes skin cancer.  As usual, there is some statistics to support it: Southern states have more cases than Northern ones, and white people have skin cancer more often than blacks.

On the other hand, skin cancer often appears on parts of the body which are never exposed to the sun.  Fewer than 10,000 people die of skin cancer in the US each year.  To place this in proper perspective, people in the US are 10 times more likely to be killed in a hospital due to a medical error, or twice as likely to be shot to death by a gun.  People are now spending billions on so-called sunblocks, trying to save themselves from something which does not threaten them at all: out of 300 sunbathers, 299 never get skin cancer.  If the sun were really responsible for skin cancer, should not that number be much greater?  Last, but not least: if the sun were the cause of skin cancer, the introduction of sunblocks should have significantly reduced the number of cases of skin cancer.  Since it is claimed that skin cancer takes many years to develop, it will be quite a long time before anything can be proven.
 

DISCUSSION

It might sound paradoxical, but the greater the number of medications against a specific disease, the less effective they all are.  The proof is quite simple: if just one medication were really effective, there would be no need for any other.  In the same vein, if a patient with a deadly incurable disease is prescribed five or more different medications to take every day, we may safely state that the majority of them (if not all) do the patient no good.  The deadlier the disease and the less effective the medication, the greater the profit for a drug company.  From the profit point of view, the worst medication is the one which cures the disease.

Of course, it would be preposterous even to think that our noble drug companies might be profit-driven in designing their medical research policies; they are totally devoted to curing all patients of all diseases, even if such a goal would mean total bankruptcy for all of them.  It would be equally preposterous to suppose that our noble medical doctors might be corrupted by the millions of dollars showered on them by the drug companies and might recommend for use a drug which they know to be useless or even harmful.

Here are some common features of the reviewed medical research.  It deals with a serious incurable disease, like heart disease.  It investigates events whose incidence is of the order of 10% or less (heart attacks, strokes, deaths).  The usual observed difference between the medicated group and the placebo group is 2–4%, but the introduction of the term relative risk allows the researchers to claim a 20–40% improvement, which sounds much more impressive.  In order to make this 2–4% difference statistically significant, the researchers take a big sample of patients, of the order of 5,000.

The researchers do not seem to realise that all that statistics tells them is that the two samples are probably different; it does not tell them that this difference is due to the medication under study.  The researchers blindly believe that the randomization procedure gives them two groups which are the same except for the medication.  This is just not so: a human being is too complicated a creature to guarantee that a randomization process would not leave differences responsible for a 2–4% deviation in outcomes.

In any scientific research, it is very important to ask the right questions.  This does not seem to be the case in medical research.  Take, for example, cardiac research.  Here, cholesterol is considered the main culprit.  What is the evidence?  It has been noted that the plaque in coronary arteries consists mainly of cholesterol.  There is significant statistics showing that people with higher cholesterol are more likely to die.  So, “it walks like a duck and it quacks like a duck”, but is this enough to declare it “a duck”?  Not at all.  We know of numerous cases of heart disease in people with perfectly normal cholesterol; at the same time, there is a village in Italy where everyone has an extremely high level of cholesterol and nobody has heart disease.  For any responsible scientist, this would be enough to “rehabilitate” cholesterol.

Yet another question to be asked in cardiac research: blood with the same content of cholesterol circulates all over the human body; how come it creates plaques only at specific places in specific arteries and not elsewhere, for example, never in the veins?  When stomach acid was deemed responsible for ulcers, medications were used to inhibit acid production, without considering that the acid was needed for digestion.  It did not help to eliminate ulcers.  The same efforts are now being made to decrease cholesterol.  Our critique of the EUROPA study showed this to be a useless exercise.

It is one thing to establish a statistical correlation between the amount of cholesterol and the number of endpoints; it is a totally different issue to establish a causal relationship between them.  In this case, the proper question is: if we artificially reduce the amount of cholesterol in the blood, will the number of endpoints decrease?  There are several studies on this subject which claim a positive answer, but a really good analysis of these studies shows that they prove the opposite: the artificial change of cholesterol has no influence on the outcomes, and the noticed change should be attributed to the errors of randomization.

Three recent studies look indicative to us.  The first one [7] (PROVE IT) randomly assigned 4162 patients with an ACS (Acute Coronary Syndrome) to pravastatin (40 mg/day) or atorvastatin (80 mg/day).  The median LDL-cholesterol at initiation of therapy was 106 mg/dL (2.74 mmol/L).  At a mean follow-up of two years, the median LDL-cholesterol achieved was significantly lower with atorvastatin (62 versus 95 mg/dL [1.60 versus 2.46 mmol/L]).  The primary endpoint was claimed to be significantly reduced with atorvastatin (22.4 versus 26.3 percent).
In the second study [8] (MIRACL), 3086 adults with a non-ST elevation ACS were randomly assigned to atorvastatin (80 mg/day) or placebo between 24 and 96 hours after hospital admission.  Atorvastatin was associated with a reduction in mean LDL-cholesterol concentration from 124 to 72 mg/dL (3.2 to 1.9 mmol/L).  At 16-week follow-up, the primary endpoint was less frequent with atorvastatin (14.8 versus 17.4 percent for placebo).

The third study [9] (A to Z) randomized two groups of patients with ACS.  The first group (n1 = 2232) received a placebo for 4 months followed by 20 mg/d of simvastatin; the second group (n2 = 2265) received 40 mg/d of simvastatin for 1 month followed by 80 mg/d thereafter.  During the first 4 months, no difference was noticed between the groups in the number of primary endpoints.  After 2 years, a total of 343 patients (16.7%) in the placebo plus simvastatin group and 309 patients (14.4%) in the simvastatin-only group had experienced the primary endpoint.  The difference was not judged statistically significant.

Let us compare some of the results across the studies.  At 4 months, the LDL-cholesterol difference between the medicated group and placebo in the A to Z and MIRACL trials was about the same (62 mg/dL vs 63 mg/dL), but the A to Z trial showed no risk reduction (strictly speaking, the placebo group did somewhat better), while MIRACL showed a 16% reduction.  On the other hand, PROVE IT compared two statin drugs and achieved the same 16% risk reduction with a much smaller LDL-cholesterol differential of 33 mg/dL.  Clearly, an artificial reduction of cholesterol does no good to the number of outcomes.
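How large must such between-arm differences be before they clear ordinary sampling noise?  A rough check is a two-proportion z-test on the quoted endpoint rates.  The sketch below assumes an even split of the 4162 PROVE IT patients between arms; the paper's exact arm sizes and time-to-event methods differ, so this is only an illustration, not a reproduction of the published analysis.

```python
import math

def two_prop_z(p1, n1, p2, n2):
    """Two-proportion z-statistic for the difference p1 - p2,
    under the pooled null hypothesis of equal event rates."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)          # pooled event rate
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# PROVE IT: 22.4% (atorvastatin) vs 26.3% (pravastatin),
# assuming roughly 2081 patients per arm
z = two_prop_z(0.224, 2081, 0.263, 2081)
print(round(z, 2))  # -2.93: nominally beyond the usual 1.96 threshold
```

A |z| above 1.96 is what such papers call "significant"; note that this addresses only sampling noise, not the randomization errors discussed in the text.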

We may also compare the placebo group of the A to Z trial, which at 4 months had an 8.1% rate of primary endpoints, with the medicated group of the MIRACL trial, which had a 14.8% rate of primary endpoints, and conclude that atorvastatin is highly harmful; this, of course, would not be correct, but it is equally incorrect to claim it to be of any use.  One may argue that the sample of patients in the MIRACL trial was somewhat older, but on the other hand, the A to Z sample had significantly more smokers.  The observed difference in each trial should be attributed to errors of randomization.  This hypothesis explains well why the difference was the same whether the comparison was with a placebo group or with a medicated one.

We quote below from [10], because we think it proves our point best:

To evaluate the effects of intercessory prayer in a coronary care unit (CCU) population, a randomized double blind protocol was followed. Over a 10-month period, 393 CCU admissions were randomized, after signing informed consent, to an intercessory prayer group (IPG), 192 patients, or to a no prayer group (NPG), 201 patients.  The IPG, while hospitalized, received intercessory prayer by participating Christians, praying outside the hospital; the NPG did not.  On entry into the study, there was no statistical difference between the groups on any of the 34 entry variables.  Logistic analysis failed to separate the groups on the entry variables.  After entry, IPG had statistically less pulmonary oedema, 6 patients vs 18 patients (p<0.03); was intubated less frequently, none vs 12 patients (p<0.002), and received less antibiotics, 3 patients vs 16 patients (p<0.007).

Wow, the effect is statistically at least as good as in all the “medicated” studies; in addition, the prayer is free, and God is a specialist in all fields, so why not extend the prayer to reducing cholesterol levels, reopening clogged arteries, etc.?  If we take these studies seriously (and from a purely statistical point of view, we should), let us make the next step: we can find out whether God is Jewish, Christian or Muslim.  After that, we can compare the healing strength of Yahweh, Krishna and Buddha.  We are sure there will be a statistical difference between them.
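The pulmonary-oedema comparison in the quoted passage (6 of 192 vs 18 of 201 patients) can indeed be checked with a one-sided Fisher exact test.  The sketch below implements the hypergeometric tail from scratch; the quoted paper does not state which exact method it used, so this is only an illustrative recomputation.

```python
from math import comb

def fisher_one_sided(a, n1, b, n2):
    """One-sided Fisher exact p-value: the probability of observing
    a or fewer events in group 1, given a+b events in total among
    n1+n2 patients (hypergeometric tail)."""
    events, total = a + b, n1 + n2
    denom = comb(total, n1)
    p = 0.0
    for k in range(a + 1):
        if events - k <= n2:  # remaining events must fit into group 2
            p += comb(events, k) * comb(total - events, n1 - k) / denom
    return p

# Pulmonary oedema: 6 of 192 prayer-group patients vs 18 of 201 controls
p = fisher_one_sided(6, 192, 18, 201)
print(round(p, 3))
```

The result comes out comfortably below the paper's quoted p<0.03, which is exactly the point: the arithmetic is sound, and the question is what such significance is worth.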

Yet another approach is to honestly admit that, since we do not know the real causes of heart disease, cancer, etc., we cannot really randomize the samples of patients; therefore, a difference of the order of 5% should always be considered an error of randomization, except in studies of adverse effects, where even smaller quantities should not be disregarded.  In a case where the probability of a bad event is around 10%, a difference of 5% looks like 50% in relative terms, which makes it so attractive to claim saving lives.  Regretfully, the claim is false.
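The absolute-versus-relative distinction above is simple arithmetic; a minimal sketch, using the illustrative rates of 10% and 5% from the paragraph:

```python
def risk_summary(control_rate, treated_rate):
    """Absolute risk reduction, relative risk reduction,
    and number needed to treat."""
    arr = control_rate - treated_rate  # absolute risk reduction
    rrr = arr / control_rate           # relative risk reduction
    nnt = 1 / arr                      # number needed to treat to avoid one event
    return arr, rrr, nnt

arr, rrr, nnt = risk_summary(0.10, 0.05)
print(arr, rrr, nnt)  # 5 percentage points absolute, 50% relative, NNT of 20
```

The same 5-percentage-point difference can thus be advertised as "cuts risk in half", while 20 patients must be treated for one of them to avoid the event.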

The best way to prove saving a life is to re-live the life twice: the first time without the medication and the second time with it.  If without the medication the patient dies, while with the medication the same patient stays alive, we may claim saving a life; regretfully, such an experiment is not possible.  The second best is to see whether there was a dramatic reduction in deaths after the introduction of a certain medication.

Statins were introduced in 1994, so we checked whether there was any significant change in death rates during the 4 years prior to 1994, compared to the 4 years after.  We used the data from [11].  The general long-term trend is a reduction of the proportion of cardiac deaths in the total annual number of deaths.  From 1989 to 1993, the proportion of cardiac deaths went from 34.1% to 32.7%, a total difference of 34.1% − 32.7% = 1.4%.  The same quantities were 32.2% in 1994 and 31% in 1998, a difference of 32.2% − 31% = 1.2%.  Clearly, not only was there no dramatic change in the trend, but the rate of reduction seems to have slowed down.  We do not blame statins for that, but the “miracle drug” obviously did not produce any miracle.

CONCLUSIONS AND RECOMMENDATIONS

It is time to introduce some scientific rigour into medical research.  Researchers should recall the fundamental requirement of every field of knowledge that wants to call itself a science: the experimental results should be reproducible.  Where there is a causal relationship, the experiments are reproducible.  Where the experiments cannot be reproduced, the causal relationship should be considered absent, and the research should start from square one.

A large sample of patients has a certain statistical strength, but one has to realize that a multi-center study involving over 500 researchers in over 50 countries is much more prone to human error than a small one; one should therefore always take the data accuracy with a grain of salt and check whether the conclusion would change with a ±5% variation of the data.
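The ±5% check suggested above can be automated: perturb the reported event counts by ±5% in the two directions that most move the difference and see whether the nominal significance survives.  The sketch below uses the A to Z counts quoted earlier (343 of 2232 vs 309 of 2265) purely as an illustration; the trial's own analysis used time-to-event methods, so this raw-count z-test is not the published computation.

```python
import math

def z_stat(e1, n1, e2, n2):
    """Two-proportion z-statistic from raw event counts."""
    p1, p2 = e1 / n1, e2 / n2
    p = (e1 + e2) / (n1 + n2)  # pooled event rate under the null
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Reported counts, then the two +/-5% perturbations that most
# shrink and most inflate the between-arm difference
base = z_stat(343, 2232, 309, 2265)
worst = z_stat(343 * 0.95, 2232, 309 * 1.05, 2265)
best = z_stat(343 * 1.05, 2232, 309 * 0.95, 2265)
print(round(base, 2), round(worst, 2), round(best, 2))
```

With these numbers the z-statistic swings from well below the 1.96 threshold to well above it, i.e. a ±5% data error is enough to flip the verdict either way.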

A real medication works on a majority of patients.  If only a minority is claimed to be affected, the observed “effectiveness” is nothing but an error of randomization.  Several polls concerning Bush and Kerry, published prior to the elections, differed from one another by 6%.  Each claimed to have taken a correct random sample, but obviously at least one of the pollsters was in error.
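For reference, the sampling margin of error of a single poll of size n on an estimated proportion p is about 1.96·sqrt(p(1−p)/n).  A minimal sketch; the poll size of 1000 is an assumption, since the quoted polls' sample sizes were not given:

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% sampling margin of error for an estimated proportion p
    from a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# A roughly 50/50 race polled with an assumed n = 1000 respondents
moe = margin_of_error(0.5, 1000)
print(round(100 * moe, 1))  # about 3.1 percentage points either way
```

Any gap between polls beyond what two such margins allow must come from methodological error, not sampling.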

Here are some recommendations of a general kind:

• It is OK to say that one patient out of 50 may have an adverse effect during 4 years of treatment.  It is not OK to say that one patient out of 50 treated for 4 years might benefit from a medication.  This is not how a medication is supposed to work.

• If you need a blind or double-blind study to find out whether your medication works, you have your answer: it does not.

• If you plan to develop a new medication where you expect just a couple of percentage points of difference from a placebo, do something else, since a prayer gives much better results, requires little effort and is free.

• It sounds impressive when you call your work a “Heart Protection Study”, and it also makes a great commercial (“It is your future – be there!”), but it gives the science you represent a bad name when your results can be reworded as: if a patient takes the medication for 4 years, his probability of avoiding an endpoint is 92%, while if he does not take it, his probability is 90%.  The MIRACL study produced no miracles, and PROVE IT actually proved nothing.

• Just because there is a statistical correlation between two parameters does not mean that one causes the other; both might be caused by some third or even fourth factor.  They might also be totally independent.  For example, since 1932 there was a 100% correlation between the results of the Redskins game and the results of the presidential elections.  This can happen by pure chance of 1 against over 260,000.

• Ask proper questions.  Without asking proper questions, you will never get the answers.  Some examples of proper questions were given above.
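The odds quoted in the correlation example above are easy to check.  Assuming 18 presidential elections from 1932 through 2000 (our reading of "since 1932"; the original does not state the count), and treating each agreement with the Redskins result as a fair coin flip:

```python
# 18 presidential elections from 1932 through 2000 (assumed count);
# if each agreement were an independent fair coin flip, the chance
# that all 18 agree is (1/2)**18
elections = 18
p = 0.5 ** elections
odds = round(1 / p)
print(odds)  # 262144, i.e. roughly 1 against over 260,000
```

The lesson is not that such a coincidence is impossible, but that with enough parameter pairs under scrutiny, some one-in-260,000 coincidence is bound to turn up.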

Statistical research as such has not cured a single disease.  If one takes two groups of people, one of which drinks red wine while the other drinks white, and discovers that one group has 17% more (or less) lung cancer, it does not mean that red (or white) wine has anything to do with the increase (or decrease) of lung cancer (regretfully, this is not an anecdote; we described a real study).  In the same vein, there is no point in wasting time and money to study 60,000 women who drank 4 servings of milk against another group who drank 2 servings or less, to discover a doubling of cervical cancer incidence.  First, a woman’s probability of dying of cervical cancer is 1 out of 250; even if this rate doubles, it will still be quite small, 2 out of 250.  In addition, milk has nothing to do with cervical cancer.
Human beings do so many other things, in addition to drinking some kind of wine or certain quantities of milk, that you can always find yet another study with totally opposite results.  Do not say in this case that additional research is needed: not only is additional research not needed, the initial research was obviously useless.  Scientific results should be reproducible; if they are not, they are not scientific results.

Heart disease, as complicated as it might look, is essentially a “plumbing” problem: a clogged “pipe” results in a heart attack and possible death.  It does not happen overnight and is not a random or accidental event at all.  If we could look inside the coronary arteries, we could see it coming and prevent it almost completely.  We should stop wasting time and money on the development of useless drugs and useless statistical research; we should use the saved money to buy a sufficient number of high-quality CT scanners, examine all heart patients every 6 months and perform revascularisation whenever the scans show a dangerous narrowing.  This way, we shall really start saving lives.

References

1. American Cancer Society.  Cancer Facts & Figures – 1997.  American Cancer Society, 1997.

2. Sanders, D.H. and Allard, F.  Statistics: A Fresh Approach.  McGraw-Hill, 1990.

3. Fox, K.M.  Efficacy of perindopril in reduction of cardiovascular events among patients with stable coronary artery disease: randomised, double-blind, placebo-controlled, multicentre trial (the EUROPA study).  The Lancet, 2003; 362: 782-788.

4. MRC/BHF Heart Protection Study of cholesterol-lowering with simvastatin in 5963 people with diabetes: a randomised placebo-controlled trial.  The Lancet, 2003; 361: 2005-2016.

5. The Heart Outcomes Prevention Evaluation Study Investigators.  Vitamin E supplementation and cardiovascular events in high-risk patients.  New England Journal of Medicine, 2000; 342(3): 154-160.

6. The Heart Outcomes Prevention Evaluation Study Investigators.  Effects of an angiotensin-converting-enzyme inhibitor, ramipril, on cardiovascular events in high-risk patients.  New England Journal of Medicine, 2000; 342(3): 145-153.

7. Cannon, C.P., Braunwald, E., McCabe, C.H., et al.  Intensive versus moderate lipid lowering with statins after acute coronary syndromes.  New England Journal of Medicine, 2004; 350: 1495.

8. Schwartz, G.G., Olsson, A.G., Ezekowitz, M.D., et al.  Effects of atorvastatin on early recurrent ischemic events in acute coronary syndromes.  The MIRACL study: a randomized controlled trial.  JAMA, 2001; 285: 1711.

9. de Lemos, J.A., Blazing, M.A., et al.  Early intensive vs a delayed conservative simvastatin strategy in patients with acute coronary syndromes.  JAMA, 2004; 292(11): 1307-1316.

10. Byrd, R.C.  Positive therapeutic effects of intercessory prayer in a coronary care unit population.  Southern Medical Journal, 1988; 81(7): 826-829.

11. Leading causes of death, 1900-1998.  National Center for Health Statistics.