Discussing your study's limitations - International Science Editing examples

Why include a limitations section?

Including a section on the limitations of your findings will demonstrate command over your research. A reviewer may look negatively upon your study if they spot a limitation that you failed to acknowledge. If you discuss each limitation in the context of future research—i.e., suggest ways to improve the validity of the research in future studies—your article is more likely to be cited, as it will inform the research questions of other researchers.

How to identify the limitations of your study

You should think about your study from two angles – internal validity and external validity.

Internal validity refers to the strength of the inferences from the study, i.e., how confident you are that the outcome observed was caused by the test variable. Could other factors have affected the outcome? If so, the internal validity of your study may be threatened.

External validity refers to the degree to which the results can be generalised to a more universal population. If you were to re-do the study in a different context, e.g., with different subjects or in a different setting, would you get a similar outcome? If not, the external validity of your study may be questionable.

Limitations should not be feared

It is important to remember that all studies have limitations of one kind or another. Therefore, a study does not have to be limitation-free to be deemed acceptable.

In this post…

…we list the most commonly seen limitations in STEM studies and provide real-world examples. However, please be advised that this is not a comprehensive list. In addition, please note that these limitations are not mutually exclusive; many can overlap.

Examples of study limitations

Selection bias

Selection bias occurs when the selection of individuals, groups, or data for analysis is not randomised.

For example, imagine a study in which different surgical procedures are retrospectively compared in relation to mortality risk [e.g., 1]. One of the procedures is newer than the others. Surgeons typically choose ideal surgery candidates when testing new procedures. Therefore, the outcome of the study could be affected by surgeons selectively assigning a particular type of individual (ideal candidates for surgery) to only one of the treatment groups.
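To make the mechanism concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration rather than data from the cited study: the function name mortality_gap, the "frailty" scale, and all probabilities are hypothetical. Both procedures are given identical true mortality, so any gap between the groups comes purely from how patients are allocated.

```python
import random

def mortality_gap(selective, n=20000, seed=4):
    """Return mortality(new) - mortality(old) when both procedures are equally safe.

    All numbers are hypothetical and chosen only to illustrate selection bias.
    """
    rng = random.Random(seed)
    deaths = {"new": 0, "old": 0}
    counts = {"new": 0, "old": 0}
    for _ in range(n):
        frailty = rng.random()  # 0 = ideal candidate, 1 = very frail
        if selective:
            # Surgeons favour the new procedure for ideal candidates.
            p_new = 0.8 if frailty < 0.5 else 0.2
        else:
            # Randomised allocation: frailty plays no role.
            p_new = 0.5
        arm = "new" if rng.random() < p_new else "old"
        # Mortality depends on frailty only, never on the procedure.
        died = rng.random() < 0.02 + 0.10 * frailty
        counts[arm] += 1
        deaths[arm] += died
    return deaths["new"] / counts["new"] - deaths["old"] / counts["old"]

if __name__ == "__main__":
    print("selective allocation gap: ", round(mortality_gap(True), 3))
    print("randomised allocation gap:", round(mortality_gap(False), 3))
```

Under these assumptions, selective allocation makes the new procedure look safer even though it is not, whereas randomised allocation gives a gap close to zero.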

Confounding

A confounder is a third, sometimes hidden, variable that is related to the test variable and also affects the dependent variable. If a confounder is not accounted for, any relationships detected between the test variable and outcome could be inaccurate.

For example, imagine a study in which the use of eye-tracking applications to measure cognitive performance is examined [e.g., 2]. Cognitive performance is known to decrease with age. Therefore, if age is not included as a confounder in the study, the effect size could be under- or overestimated.

In another example, imagine a study in which the association between osteoarthritis and cardiovascular (CV) events is examined [e.g., 3]. CV events have been linked to many factors, including smoking status, abdominal obesity, and family history of CV events, all of which could confound the outcome if not controlled for.
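The effect of an unadjusted confounder can be sketched with a small simulation. The set-up below is hypothetical and not taken from the cited studies: age drives both an "exposure" and an "outcome", while the exposure itself has no effect on the outcome. The helper names (slope, residuals, simulate_confounding) and all coefficients are assumptions chosen for illustration. Adjustment is done by removing the linear effect of age from both variables before regressing one on the other.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    b = slope(x, y)
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]

def simulate_confounding(n=5000, seed=0):
    """Return (naive slope, age-adjusted slope) for a null exposure effect."""
    rng = random.Random(seed)
    age = [rng.uniform(20, 80) for _ in range(n)]
    # Hypothetical: the exposure drifts upwards with age ...
    exposure = [0.05 * a + rng.gauss(0, 1) for a in age]
    # ... and the outcome declines with age, but NOT with the exposure.
    outcome = [100 - 0.5 * a + rng.gauss(0, 5) for a in age]
    naive = slope(exposure, outcome)
    adjusted = slope(residuals(age, exposure), residuals(age, outcome))
    return naive, adjusted

if __name__ == "__main__":
    naive, adjusted = simulate_confounding()
    print("naive slope:   ", round(naive, 2))
    print("adjusted slope:", round(adjusted, 2))
```

With these assumed values the naive slope is strongly negative, suggesting an effect that does not exist, while the age-adjusted slope is close to zero.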

Survivorship bias

Survivorship bias occurs when inferences are based only on the subjects that made it past some selection process, while those that did not are overlooked, typically because they are less visible.

For example, imagine a study in which the link between cycling and sexual dysfunction is examined [e.g., 4]. It is possible that a person who experiences sexual dysfunction due to cycling would quit the activity. Therefore, if only active cyclists were recruited in the study, such a person would be overlooked, constituting a bias that could affect the study outcome.
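A short simulation sketch shows how recruiting only "survivors" can distort an estimate. The prevalence and quitting probabilities below are hypothetical, as is the function name simulate_survivorship; none of these values come from the cited study.

```python
import random

def simulate_survivorship(n=20000, seed=1):
    """Return (true prevalence, prevalence observed among active cyclists)."""
    rng = random.Random(seed)
    true_cases = 0
    active = 0
    active_cases = 0
    for _ in range(n):
        # Hypothetical prevalence among everyone who ever took up cycling.
        affected = rng.random() < 0.20
        # Hypothetical: affected people are far more likely to quit.
        quit_probability = 0.60 if affected else 0.10
        still_cycling = rng.random() >= quit_probability
        true_cases += affected
        if still_cycling:
            active += 1
            active_cases += affected
    return true_cases / n, active_cases / active

if __name__ == "__main__":
    true_rate, observed_rate = simulate_survivorship()
    print("true prevalence:         ", round(true_rate, 3))
    print("observed (active only):  ", round(observed_rate, 3))
```

Under these assumptions, a study restricted to active cyclists would observe roughly half the true prevalence, because many affected people have already left the population being sampled.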

Study scope limitations

Unreliable or unavailable data can limit the scope of a study and thus the overall outcome.

For example, imagine a study in which heat generation in different world regions is examined [e.g., 5]. The researchers do not have data on the use of firewood in households. In some regions, e.g., developing countries, household firewood use contributes greatly to the total heat produced. Therefore, the heat generation for such regions could be underestimated.

Sample size limitations

A small sample size may make it difficult to determine whether a particular outcome is a true finding. In some cases a type II error may occur, i.e., the study fails to reject a false null hypothesis and reports no difference between the study groups when one actually exists.

For example, imagine a study in which the efficacy of thrombolysis in treating acute myocardial infarction (AMI) is examined. Thrombolysis has an important but very small effect on AMI. Therefore, a study with a relatively small sample size may not have the (statistical) power to expose such a small effect, possibly resulting in a type II error [6].
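The link between sample size and power can be illustrated by simulation. The sketch below is not based on thrombolysis data: it assumes a hypothetical standardised effect of 0.2 standard deviations, normally distributed outcomes with known unit variance, and a two-sided z-test at the 5% level. The function name estimated_power is likewise an assumption. Power is estimated as the fraction of simulated studies that detect the effect.

```python
import random
from statistics import NormalDist

def estimated_power(n_per_group, effect=0.2, trials=1000, alpha=0.05, seed=2):
    """Estimate power of a two-sided z-test by simulating many studies."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference in means (known unit variance).
    se = (2 / n_per_group) ** 0.5
    rejections = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, 1.0) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(diff / se) > z_crit:
            rejections += 1
    return rejections / trials

if __name__ == "__main__":
    for n in (20, 500):
        print(f"n per group = {n}: estimated power = {estimated_power(n):.2f}")
```

With these assumptions, most studies with 20 participants per group miss the effect (a type II error), whereas most studies with 500 per group detect it.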

Experimenter bias

Experimenter bias occurs when the individuals running the experiment inadvertently affect the outcome by unconsciously behaving differently towards participants in the different treatment groups.

For example, imagine a study in which gamers are tested for their ability to know whether they are playing against a human or an AI avatar [e.g., 7]. The facilitator stands behind the participant and observes gameplay. If the facilitator is aware of the nature of the avatar, there is a chance that they could unintentionally influence the participant.

Referral bias

Referral bias refers to the phenomenon whereby patients who are referred from one clinic to another, often to specialised units, tend to be sicker than non-referred patients. In studies including many referrals, risk factors are likely to be overestimated.

For example, imagine a study in which the clinical characteristics of neurosarcoidosis are evaluated in a specialised referral centre [e.g., 8]. Chronic aseptic meningitis is found to be the most frequently reported pathology (37% of cases), a higher frequency than that reported in other studies. The centre is known to have specific expertise in chronic meningitis. Therefore, cases of this kind are more likely to be referred to the centre, constituting a referral bias.

Self-reported data

Self-reported data is subject to various biases, e.g., selective memory and exaggeration, and cannot be independently verified.

For example, imagine a study in which the effectiveness of typing pressure in determining stress in smartphone users is examined [e.g., 9]. Participants are asked to recall a stressful experience and rank their stress on a scale, after which typing pressure is measured. For whatever reason, participants may over- or underestimate their stress levels, affecting the outcome of the study.

Limitations of exploratory studies

If there has been little or no prior research on a topic, researchers may be required to establish a benchmark in relation to the research question and study design. As there is no benchmark for comparison, the validity of the outcome is disputable.

For example, imagine an exploratory study in which TV users are tested for usability of a new type of remote controller [e.g., 10]. Rather than the typical pressing of buttons, actions can be performed by squeezing or puffing on the remote. Findings from this study cannot be deemed conclusive until the results are replicated.

Methodological limitations

This refers to limitations in relation to the methodology used in a study.

For example, imagine a study in which the utility of telomere length as a diagnostic parameter for dyskeratosis congenita (DC) is tested [e.g., 11]. The data of DC patients from two different hospitals are used in the study. Each hospital uses its own method for DNA extraction, one of which has been shown to extract shorter DNA, a limitation which could affect the study outcome.

In another example, imagine a study in which a novel technology is tested for its ability to monitor damage in structures known to be difficult to monitor (e.g., beneath bridges) [e.g., 12]. The study suggests that the new technology is promising; however, its coverage area is only 30 × 30 m, meaning it is only suitable for short-distance applications.

Systematic literature reviews

In a systematic literature review (SLR), researchers use a well-defined search strategy to search for literature relevant to a particular research question. However, depending on the search criteria, there is no guarantee that all relevant literature will be retrieved. Grey literature (e.g., theses and technical reports) is often excluded, and SLRs often include only studies published in one language, typically English.

Hawthorne effect

This refers to the phenomenon whereby participants behave differently when they are aware that they are being observed.

For example, imagine a study in which fear appeal messages are tested for their ability to promote security behaviour online [e.g., 13]. Participants are shown a fear appeal message detailing the prevalence and effects of cyber-attacks, after which they are surveyed on their behaviour online. Participants are surveyed again 4 weeks later to see whether the effect of the fear appeal lasted over time and whether intentions were acted upon. Because they know they are being studied, participants may falsely claim to have improved their behaviour, either to diminish shame at not having altered it or to please the study conductors.

Regression toward the mean

This refers to the phenomenon whereby a variable that is extreme (i.e., far away from the average) the first time it is measured will tend to be closer to the average the next time it is measured. This typically happens with asymmetric sampling, e.g., only the very worst or the very best performers are used in a study. However, it can occur by chance as well (see the example given).

For example, imagine a study in which the effects of haematocrit (the ratio of the volume of red blood cells to the total volume of blood) on avian flight performance are examined [e.g., 14]. In the pre-test, i.e., before their haematocrit is manipulated, birds in one of the treatment groups have considerably better flight performance compared to the other groups. Even without manipulation, the flight performance of these birds would likely be reduced if the test were repeated, due to the regression toward the mean effect. Therefore, the results of the post-test, i.e., after manipulation, may be influenced by this effect and may not be reflective of the true effects of the manipulation.
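Regression toward the mean is easy to reproduce in a simulation where nothing is manipulated at all. The sketch below is hypothetical and unrelated to the data in the cited study: each subject has a stable "ability" plus independent measurement noise on each test, and the function name simulate_regression_to_mean and the noise levels are assumptions made for illustration.

```python
import random

def simulate_regression_to_mean(n=10000, top_fraction=0.1, seed=3):
    """Return (pre-test mean, post-test mean) for the top pre-test performers."""
    rng = random.Random(seed)
    # Stable underlying ability for each subject.
    ability = [rng.gauss(0, 1) for _ in range(n)]
    # Two measurements with independent noise and no intervention in between.
    pre = [a + rng.gauss(0, 1) for a in ability]
    post = [a + rng.gauss(0, 1) for a in ability]
    # Select the subjects with the most extreme (best) pre-test scores.
    k = int(n * top_fraction)
    top = sorted(range(n), key=lambda i: pre[i], reverse=True)[:k]
    pre_mean = sum(pre[i] for i in top) / k
    post_mean = sum(post[i] for i in top) / k
    return pre_mean, post_mean

if __name__ == "__main__":
    pre_mean, post_mean = simulate_regression_to_mean()
    print("top group, pre-test mean: ", round(pre_mean, 2))
    print("top group, post-test mean:", round(post_mean, 2))
```

Under these assumptions, the group selected for its high pre-test scores still performs above average on the second test, but noticeably less so, even though no treatment was applied. A control group measured at both time points is one way to separate this effect from a genuine treatment effect.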

Repeated testing

Repeatedly testing participants may lead to bias. A pre-test may sensitise participants in unanticipated ways, influencing the results of the post-test.

For example, imagine a study in which the anxiety induced from different eye tests used to diagnose glaucoma is measured [e.g., 15]. Almost all of the participants have already experienced one of the tests. This could lead to an underestimation of the magnitude by which anxiety increases with this test.

Population validity

This refers to how representative the sample used in a study is of the target population.

For example, imagine a study in which the target population is all U.S. Internet users. It would not be representative to only use data from Twitter users, as U.S. adult Twitter users are younger and more likely to be Democrats compared to the general public [16].

How to present limitations

Study limitations are generally presented towards the end of the discussion section in the past tense (see our post on Verb Tenses in Scientific Manuscripts). Start by stating the limitation. Mention if you took any steps to circumvent the issue. Describe any evidence that might lessen the effect of the limitation. Discuss how the limitation could affect the study outcome. Finally, if applicable, discuss the steps that could be taken to overcome the limitation in future studies.

References

1. Stiles ZE, Behrman SW, Glazer ES, Deneve JL, Dong L, Wan JY, Dickson PV. Predictors and implications of unplanned conversion during minimally invasive hepatectomy: an analysis of the ACS-NSQIP database. HPB. 2017 Nov 1;19(11):957–65.
2. Rosa PJ, Gamito P, Oliveira J, Morais D, Pavlovic M, Smyth O. Show me your eyes! The combined use of eye tracking and virtual reality applications for cognitive assessment. In Proceedings of the 3rd 2015 workshop on ICTs for Improving Patients Rehabilitation Research Techniques 2015 Oct 1 (pp. 135–138). ACM.
3. Kendzerska T, Jüni P, King LK, Croxford R, Stanaitis I, Hawker GA. The longitudinal relationship between hand, hip and knee osteoarthritis and cardiovascular events: a population-based cohort study. Osteoarthr Cartilage. 2017 Nov 1;25(11):1771–80.
4. Gaither TW, Awad MA, Murphy GP, Metzler I, Sanford T, Eisenberg ML, Sutcliffe S, Osterberg EC, Breyer BN. Cycling and female sexual and urinary function: results from a large, multinational, cross-sectional study. J Sex Med. 2018 Apr 1;15(4):510–8.
5. Mekonnen MM, Gerbens-Leenes PW, Hoekstra AY. The consumptive water footprint of electricity and heat: a global assessment. Environ Sci-Water Res Technol. 2015;1(3):285–97.
6. Jones SR, Carley S, Harrison M. An introduction to power and sample size estimation. Emerg Med J. 2003 Sep 1;20(5):453–8.
7. Wehbe RR, Lank E, Nacke LE. Left Them 4 Dead: Perception of humans versus non-player character teammates in cooperative gameplay. In Proceedings of the 2017 Conference on Designing Interactive Systems 2017 Jun 10 (pp. 403–415). ACM.
8. Leonhard SE, Fritz D, Eftimov F, van der Kooi AJ, van de Beek D, Brouwer MC. Neurosarcoidosis in a tertiary referral center: a cross-sectional cohort study. Medicine. 2016 Apr;95(14).
9. Exposito M, Picard RW, Hernandez J. Affective keys: towards unobtrusive stress sensing of smartphone users. In Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct 2018 Sep 3 (pp. 139–145). ACM.
10. Bernhaupt R, Desnos A, Pirker M, Schwaiger D. TV interaction beyond the button press. In IFIP Conference on Human-Computer Interaction 2015 Sep 14 (pp. 412–419). Springer, Cham.
11. Gadalla SM, Khincha PP, Katki HA, Giri N, Wong JY, Spellman S, Yanovski JA, Han JC, De Vivo I, Alter BP, Savage SA. The limitations of qPCR telomere length measurement in diagnosing dyskeratosis congenita. Mol Genet Genomic Med. 2016 Jul;4(4):475–9.
12. Kang D, Cha YJ. Autonomous UAVs for structural health monitoring using deep learning and an ultrasonic beacon system with geo-tagging. Comput Aided Civ Inf. 2018 Oct;33(10):885–902.
13. Jansen J, van Schaik P. The design and evaluation of a theory-based intervention to promote security behaviour against phishing. Int J Hum Comput Stud. 2019 Mar 1;123:40–55.
14. Yap KN, Dick MF, Guglielmo CG, Williams TD. Effects of experimental manipulation of hematocrit on avian flight performance in high-and low-altitude conditions. J Exp Biol. 2018 Nov 15;221(22):jeb191056.
15. Chew SS, Kerr NM, Wong AB, Craig JP, Chou CY, Danesh-Meyer HV. Anxiety in visual field testing. Br J Ophthalmol. 2016 Aug 1;100(8):1128–33.
16. Pew Research Center. Sizing Up Twitter Users. 2019 Apr 24. Available from: Visit Twitter page [Accessed 6 June 2019].
