"

5 Data Collection, Management and Reporting

5.1 Why are data collection, management and reporting ethical concerns?

Every scientific experiment generates data. In fact, every stage of an experiment deals with data, from experimental design to analysis to interpreting and reporting results. Data is the empirical evidence for scientific observations (Kalichman, 2016). Without data, there is no science. Most people think of data as numbers in a notebook or on a spreadsheet, but there are many types of data, including images, samples, recordings and observations. Regardless of the type of data, all researchers have a responsibility to do their utmost to “protect the integrity of the research record” (Kalichman, 2016).

Good data management begins in the initial stages of planning an experiment. Researchers must carefully select the data they wish to collect as well as the method they will use to do so. Thorough planning at this stage can prevent resources from being wasted, reduce the effects of potential biases and improve the quality of the data. It is also important to establish good record keeping protocols (Kalichman, 2016). Files, samples and any other kinds of data should be clearly labelled with information such as the date, the name of the individual who worked with it, the location and what was done to it at each stage. It is also best if data is compiled in the same place and backed up frequently to prevent data from being misplaced or lost (Kalichman, 2016).
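As a rough illustration of the record-keeping practices described above, the following Python sketch logs who handled an item, when, where it is stored and what was done to it, and checks that a backup copy matches the original. The field names, the JSON Lines log format and the helper functions are assumptions made for this example; they are not a protocol prescribed by the cited sources.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class RecordEntry:
    """One log line: who did what to which item, where and when."""
    item_id: str      # hypothetical label, e.g. a sample ID or file name
    entry_date: str   # date the work was done, e.g. "2025-01-15"
    researcher: str   # person who worked with the item
    location: str     # where the item is stored
    action: str       # what was done to the item at this stage
    sha256: str = ""  # checksum of the file, if the item is a file


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def append_entry(log_path: Path, entry: RecordEntry) -> None:
    """Append one entry to the log; earlier lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(entry)) + "\n")


def backup_matches(original: Path, backup: Path) -> bool:
    """Check that a backup copy has the same contents as the original."""
    return file_checksum(original) == file_checksum(backup)
```

Appending to the log rather than editing it keeps a running history of each item, and comparing checksums is one simple way to confirm that a backup has not been corrupted or altered.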

When reporting the gathered data, it is vital that the information is unbiased and accurately represented. If the data is manipulated inappropriately, for instance to produce a specific result, the integrity and validity of the data can be damaged (Cushman & Bell, 2022). Data does not lose its value once it has been reported or published. Retaining the data allows researchers to demonstrate ownership of the research and its intellectual property, and it is vital in scientific misconduct investigations. The length of time data is retained depends on the type of research, since some samples degrade over time. At minimum, researchers should retain and store enough data to recreate the research (Kalichman, 2016).

Another ethical issue of data management to consider is ownership of data. Although we might assume that researchers own the data they have collected and worked with, that is not always the case. Often, generated data belongs to the institution employing the scientist (Kalichman, 2016; Cushman & Bell, 2022). It is important for researchers to know their intellectual property rights, especially if there are patents or marketable products involved (Kalichman, 2016).

Science is becoming increasingly transparent, which is of great benefit because it encourages collaboration and new ideas and allows researchers to build on existing work (Cushman & Bell, 2022). However, there are instances where it is not permissible or beneficial for a scientist to share data. First, some data is confidential, particularly when human subjects are involved. Second, sharing data before it is published leaves the scientist open to intellectual property theft or loss of credit (Kalichman, 2016). Finally, some research can be dangerous or used for dangerous purposes (see society and social responsibility: dual-purpose research) and therefore must be kept confidential. Confidentiality infringement is an ethical violation and can result in severe penalties.

While all the above concerns have important ethical implications, fabrication and falsification of data, otherwise known as scientific fraud (Fanelli, 2009), are among the most prevalent ethical violations resulting in research misconduct investigations, particularly in the biomedical field (Sivasubramaniam et al., 2021). In a 2009 meta-analysis of surveys, nearly 2% of scientists admitted to fabricating, falsifying or modifying data at least once, while 14% reported having witnessed the same acts from colleagues. These are likely conservative estimates of the actual prevalence of this type of misconduct, as individuals are less likely to admit to their own unethical actions (Fanelli, 2009). There are many reasons why scientists engage in this kind of behaviour, including pressure to publish results, opportunities for personal advancement and ignorance of good data management practices (Burnham et al., 2021). Regardless of the motivation behind the act, engaging in scientific fraud has significant consequences. Fraudulent work wastes resources, including the time co-contributors spent doing legitimate work on a project, and it damages the reputation and career of the culprit, sometimes resulting in loss of employment or even criminal convictions (Burnham et al., 2021; National Academies of Sciences (b), 2018; Oransky & Marcus, 2023). Perhaps most significantly, it taints the scientific knowledge base, sometimes irrevocably, as papers based on fraudulent work are cited and used as foundational information in new research. This can be very dangerous if, for example, medical treatments are based on fraudulent information (National Academies of Sciences (b), 2018).

5.2 Example: Ranjit Chandra

One of the most striking examples of data fraud occurred at Memorial University in Newfoundland. Ranjit Chandra, the self-proclaimed “father of nutritional immunology”, was a highly regarded nutrition researcher who authored over 200 papers and received the Order of Canada in 1989 (Koziol, 2016). His first fraudulent paper studied the effects of a breast-feeding mother’s diet on infant allergies compared to formula. This paper was cited over 100 times before a nurse who participated in the study reported him to the university for misconduct. She told the committee that he had completed the statistical analysis before the data was even collected (Canadian Broadcast Company, 2015; Koziol, 2016). Although the report identified many red flags, including an inability to produce raw data, a lack of medical records and little input from co-authors, the university concluded that the investigation was flawed and thus could not be relied upon. No penalties were levied against Chandra. He then attempted to publish another study, which found that his patented vitamin (his ownership was undisclosed at the time) could improve cognition in elderly patients. It was rejected by one journal before being accepted elsewhere. A paper supporting his research findings was published in a journal edited by Chandra. The author, Amrit Jain, could never be reached, and the name was later found to be a pseudonym for Chandra, an anagram of “I am Ranjit”. After repeated requests from journals for raw data, and after one journal found the results of one study to be “statistically impossible” (Koziol, 2016), his papers began to be retracted. He resigned from Memorial University and left Canada for India. After a CBC documentary covered his fraudulent behaviour, Memorial University opened another inquiry, which was disclosed only after Chandra filed a lawsuit against CBC for libel. His lawsuit failed and he was stripped of his Order of Canada in 2015 (Canadian Broadcast Company, 2015). More papers were retracted, and he was charged with healthcare fraud against the Ontario Health Insurance Plan (OHIP) in 2016. As of 2020, four of his papers have been retracted from various journals (Koziol, 2016).

Ranjit Chandra’s papers have been cited hundreds of times across various disciplines. Given the outcome of the investigations into his misconduct, it is impossible to say whether his papers are reliable and factual. His work is so widespread that there is no telling how many researchers and experiments have been affected by his misconduct.

5.3 Practice Questions

1. Smeared gels and acceptable margin of error

You are a master’s student running an experiment investigating protein sequences. One step of your experiment requires you to run gel electrophoresis to isolate your protein of interest. You have been having a lot of trouble running the gels; they keep coming out smeared. Eventually, you manage to run a clean gel. Upon examination, you notice that your protein of interest migrated very little. Comparing it to the molecular weight ladder you selected during your experimental design, you conclude that the protein is only slightly larger than the largest marker on the ladder. You consider using a molecular weight ladder with a broader range of weights but worry that you would be unable to get another clean gel, and so you choose to proceed to the next step of the experiment.

2. Model selection and conflicting results

 

Adapted from: Research Cases for Use by the NIH Community. 2007. Theme 7 – Data management and Scientific Misconduct. Case study 3. https://oir.nih.gov/system/files/media/file/2021-08/case_studies-2007.pdf

You are studying whether CBD oil supplements, which have anti-inflammatory properties, can prevent heart disease in dogs. A preliminary study showed a strong positive correlation between the use of CBD oil and reduced incidence of heart disease. Your study will require several years to complete, and so your supervisor suggests that you also collect data on the incidence of other illnesses, such as cancer and arthritis, to maximize your funding. You reluctantly agree but do not alter your hypothesis, experimental protocol or statistical analysis.

After analyzing the data, including the incidence of the other conditions suggested by your supervisor, you find that the use of CBD oil increased the rate of heart disease, which conflicts with prior research. The results also indicate that CBD oil strongly reduced the incidence of arthritis.

 

3. Statistical analysis and communication

You are co-authoring a paper with a lab mate focusing on the frequency of recessive albinism alleles in a community in Botswana compared to a community in Malaysia. The collaboration thus far has been very successful, and you have both contributed new ideas and insights to the project. You collected your data together and now must perform statistical analysis. You were under the impression that you would be responsible for this component and so analyzed the data using a chi-square analysis to compare predicted to observed phenotypic frequencies in both communities. In a meeting with your co-author the next week, you discover that they too thought they were responsible for the statistical analysis, but they used a Student’s t-test to assess differences in frequencies between the two communities. Your results do not indicate the same significance, but you have each drawn your own conclusions based on your individual analyses. Neither of you wishes to defer to the other’s results, each believing that your own conclusions are most impactful.

 

Your supervisor helps you decide which analysis to use and chooses the one that indicates the most novel or significant implications.
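For readers unfamiliar with the first analysis mentioned in this case, a chi-square goodness-of-fit test compares observed counts in each category with the counts expected under some prediction, whereas a t-test compares the means of continuous measurements. The sketch below computes the chi-square statistic by hand. The phenotype counts are invented purely for illustration and are not data from the case; the critical value 3.841 is the standard threshold for one degree of freedom at a significance level of 0.05.

```python
# Standard chi-square critical value for 1 degree of freedom, alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for two phenotype categories in one community:
# [unaffected, albinism phenotype]. These numbers are made up.
observed_counts = [180, 20]
predicted_counts = [190, 10]

statistic = chi_square_statistic(observed_counts, predicted_counts)
differs_from_prediction = statistic > CRITICAL_VALUE_DF1_ALPHA_05
```

With two categories there is one degree of freedom, so a statistic above the critical value suggests that the observed counts differ from the prediction by more than chance alone would explain.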

 

4. Reporting non-significant data

You recently finished the data analysis for your latest experiment and are writing a paper to report your findings. You create a table to present your data but, to save space, choose to leave out data from the portion of the experiment that showed no significant results.

License

Ethical Case Studies for Biological Laboratories Copyright © 2025 by Annie Grigg-Branchflower, Dr. Kerrianne Ryan, Debra Grantham and Dr. Jen Frail-Gauthier is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.