Posted on May 12, 2017
Could it be that most healthcare research is wrong? In this blog, Saul explores multiple fundamental issues affecting research before discussing possible solutions to these issues…
As advancements in medicine become ever more sophisticated, our care of patients continues to improve. A reliable and solid evidence base enables this progression to continue. Research is, as a result, the very backbone of our clinical practice. It is therefore unsettling to consider medical research as a whole and propose that most of it could be wrong…
There are a number of reasons why much of the medical literature out there may be misleading. Firstly, it’s important to look at the vast contradictions among published findings. It seems like every other day we see an article in the news saying things like ‘Tomatoes reduce cancer risk!’ or ‘Tomatoes increase cancer risk!’ or ‘Chocolate causes weight loss!’ All of this confusion is disconcerting for both clinicians and patients. Researchers recognised this issue and carried out a large-scale meta-analysis assessing claims about an array of foods and their association with cancer risk (1). It found that many foods are claimed to increase or decrease the risk of cancer, but these claims are typically based on small studies, and once tested via meta-analysis, the effects shrink substantially.
The current hierarchy of publication outlets based on impact factor has been seen as a good way of sorting the good from the bad. Nevertheless, just because a study is published in a journal with a high impact factor, this shouldn’t automatically imply the study is flawless. For example, Cumming et al. highlighted the issue of error bars within experimental biology (2). Error bars can display standard deviations, standard errors of the mean, confidence intervals and more, so it is crucial they are labelled appropriately. In a particular issue of Nature, a well-respected, high-impact journal, seven items were found to have error bars that were simply unlabelled (3). This is worrying given the importance of accurately understanding original research. Such potentially misleading or careless errors can make readers question which other areas of the research might also be inaccurate.
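To see why an unlabelled error bar matters, consider how different the three common quantities can be for the same data. The sketch below (illustrative values only, not from any study in the article) computes the standard deviation, standard error of the mean and an approximate 95% confidence-interval half-width for a hypothetical sample:

```python
import statistics

# Hypothetical sample of 25 measurements (illustrative values only)
sample = [9.2, 10.1, 10.8, 9.5, 10.3, 9.9, 10.6, 9.7, 10.0, 10.4,
          9.3, 10.7, 9.8, 10.2, 9.6, 10.5, 9.4, 10.9, 9.1, 10.0,
          9.9, 10.1, 9.8, 10.3, 9.7]
n = len(sample)

sd = statistics.stdev(sample)   # sample standard deviation
sem = sd / n ** 0.5             # standard error of the mean = SD / sqrt(n)
ci95 = 1.96 * sem               # approximate 95% CI half-width (normal approx.)

print(f"SD   = {sd:.3f}")
print(f"SEM  = {sem:.3f}")
print(f"95% CI half-width = {ci95:.3f}")
```

Since SEM = SD/√n, with n = 25 the SEM bar is five times shorter than the SD bar. A reader who assumes one when the figure shows the other will badly misjudge the precision of the result, which is exactly why unlabelled bars are a problem.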
A 2015 reproducibility project, ‘Estimating the reproducibility of psychological science’, aimed to replicate 100 studies published in three psychology journals (4). Of the original 100 studies, 97% reported a p-value of less than 0.05. Among the repeat studies, only 36% found a p-value of less than 0.05. As p-value thresholds are arbitrarily set, the project also looked at effect sizes and still concluded that fewer than half of the results were adequately replicated. These findings don’t necessarily mean the original studies were wrong; the repeats could be flawed, or both could be incorrect. What can be established is that there was great difficulty in replicating the original work. Research is always on the fringe of scientific knowledge, and although researchers try their best to control for various factors, the process of new discovery is a challenging one.
Another catalyst to the issue of reproducibility is the lack of incentives to perform repeat studies. A strong pressure not to perform these types of studies exists, with many journals only publishing original work. Even if a journal accepts repeat studies, they are less likely to publish these ‘boring’ studies. The attitude of ‘we already know this’ or ‘this won’t attract excitement and readership’ still looms.
A 2003 review of ‘promising’ basic scientific discoveries assessed their translation from clinical research to clinical use (5). Over 20 years, only 27 of the 101 studies resulted in a clinical trial, and only 5 led to a licensed clinical use. Of these, just 1 achieved widespread use in medical practice: a translation rate of under 1% for these ‘pivotal’ discoveries. This emphasises the problem of research being labelled incorrectly and the exaggeration of ‘promising’ findings.
Medical reversal is the process whereby a methodologically improved randomised controlled trial (RCT) contradicts the standard practice of the time (6). For example, the COURAGE trial contradicted the routine stenting of stable coronary disease. A 2013 review identified that medical reversal is common across all areas of medical research (7). Reversal is dangerous because patients may have been unknowingly harmed by the previous treatment, new trials can cause conflicts with practitioners who swear by traditional methods, and patients may lose faith in the medical system. It also raises the important question: which practices that we use today will later be proven ineffective or harmful?
Systematic bias is defined as an inherent problem in the study design that reduces that study’s internal validity. Examples include selection bias, attrition bias, response bias and many more. The definitions of these concepts are beyond the scope of this article, but useful information about these can be found on the Students 4 Best Evidence (S4BE) website. Sometimes, bias is unavoidable, such as in the cases of rare diseases (small samples) or during RCTs of certain surgical interventions (given an inability to fully blind surgeons or patients).
Publication bias is also an important issue when assessing the evidence base as a whole, such as in systematic reviews and meta-analyses. This is the type of bias whereby positive studies (where a significant difference is observed) are more likely to be published than negative studies (where no significant difference is observed). Journals prefer positive studies because they increase interest and readership. The underrepresentation of negative results makes it difficult to assess the counter-arguments to various interventions. For example, there could be 10 robust RCTs that contradict the use of intervention X but 1 positive RCT. The positive RCT gets published, the 10 ‘uninteresting’ negative RCTs do not, and hence intervention X is adopted.
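A quick simulation makes the distortion concrete. The toy sketch below (my own illustration, not from the article) runs 2,000 two-arm trials of an intervention with no true effect, then ‘publishes’ only the trials that cross the conventional significance threshold. The published trials show a much larger average effect than the full set:

```python
import random
import statistics

random.seed(42)

def run_trial(n=30):
    """Simulate one two-arm trial of an intervention with NO true effect.
    Returns the observed mean difference and a crude z-based significance flag."""
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(0, 1) for _ in range(n)]  # true effect = 0
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(control) / n + statistics.variance(treated) / n) ** 0.5
    significant = abs(diff / se) > 1.96  # roughly p < 0.05
    return diff, significant

results = [run_trial() for _ in range(2000)]
all_effects = [abs(d) for d, _ in results]
published = [abs(d) for d, sig in results if sig]  # journals keep only 'positive' trials

print(f"{len(published)} of 2000 null trials came out 'significant' (~5% expected)")
print(f"mean |effect| across all trials:      {statistics.mean(all_effects):.2f}")
print(f"mean |effect| among published trials: {statistics.mean(published):.2f}")
```

Even though the true effect is zero, roughly 5% of trials come out ‘significant’ by chance, and because only extreme results clear the threshold, the published literature suggests a sizeable effect that does not exist.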
Clinical research is founded upon probabilistic statistics. It is impossible to say with 100% certainty that a treatment works, so instead we rely on strong statistical evidence. The widely accepted threshold of statistical significance is arbitrarily set at a p-value of 0.05 (or 5%), which permits false positives (type 1 errors). A 5% false-positive rate may sound small, but many results can still be due to chance, especially when multiple endpoints are assessed. Testing many endpoints, or manipulating the data until the result drops below the golden 0.05, is known as data-dredging or p-hacking. This practice is frowned upon and should not be the basis of high-quality evidence.
A successful research career is judged by seniority, which is obtained by publishing high volumes of work in high-impact journals with striking ‘new’ results. This can shift the emphasis away from the quality of the research produced.
Pharmaceutical and medical device companies also often sponsor studies of their own products. The problem is that these companies have an interest in their profit margins, and consequently they need studies that support the products they are trying to sell. Occasionally, companies have been known to hire ghost-writers so that the studies produced appear to be independent. Pharmaceuticals and medical devices form a massive, billion-pound industry. An unfathomable quantity of money rides on a few studies, so the researchers have to ‘make it work’! This is usually achieved by doing the bare minimum, with extensive exclusion criteria that inflate the effect in a small, non-generalisable sample. Moreover, negative results about these products are unlikely to see the light of day for doctors, and certainly not for the public. This is a double-edged sword, however: without these companies and their investment and technology, modern medicine would not have been able to take the steps forward that it has over the past century…
Additionally, a topic that is rarely discussed in relation to medical research is fraud. It does exist, and unethical practices are out there. These include falsifying data, conducting research without participants’ informed consent, researchers un-blinding themselves to treatment allocations, and much more. Sacrificing the pursuit of truth and reliable clinical evidence for personal gain is clearly detrimental.
The problems discussed above may be small when assessing individual papers. It may be that many of the points raised are simply irrelevant to certain areas of research. However, when these problems are taken collectively and applied across the entirety of clinical research, it’s easy to see how much of the literature may produce false conclusions.
Negative studies and replication studies should be published more. An important step towards this ideal was the creation of the ‘Journal of Negative Results in Biomedicine’ (JNRBM) in 2002 (8). There is an increasing trend of journals committed to publishing negative results, which is fantastic for reducing publication bias. It is hoped these measures will help to produce an evidence base that is more representative of the science conducted.
Following on from the idea of publishing more negative studies, the same should be done for replication studies. This could be in the form of a dedicated journal or a new policy for existing journals to accept a certain quota of repeat studies. This will enable interventions to have a higher volume of evidence to support their use and improve the reliability of their findings. In turn, this will also improve the robustness and completeness of systematic reviews and meta-analyses.
Academic success should be assessed on the quality of an individual’s research rather than the volume produced or where it was published. This would incentivise researchers to produce high-quality, tangible evidence, which is in the best interests of clinicians and patients alike.
All studies should be prospectively registered with a reputable body, requiring approval. Just like systematic reviews and randomised controlled trials, all studies should be registered before the work commences. This helps prevent researchers from deviating from their original methods, as well as from p-hacking and similar practices. In addition, a study that produces negative results may never be written up for publication; registration can bring these negative results to light and encourage their dissemination in outlets such as the JNRBM. This should go hand in hand with post-publication review, allowing other researchers to scrutinise papers and flag flaws that may have been missed in peer review.
Research skills are often not taught comprehensively in medical schools. Medical schools and postgraduate training programs should focus more heavily on improving research literacy to minimise common errors seen in clinical studies and improve the overall competence amongst all practitioners.
Retraction Watch is a blog created to feature important study retractions. It was founded on the principle that science should be self-correcting, eliminating mistakes that take us away from the real answers to health-related questions. It does a great job of publicising retractions, which otherwise often go unnoticed, so that practice can be altered in light of them. The website also raises awareness of the issues discussed in this article and holds journals to high standards, which should, in theory, improve the objectivity of scientific publications across the board.
With the continual, rapid advancement of medical research and the importance placed upon evidence-based medicine, it is crucial, now more than ever, that we remain sceptical of what we read.