Bayesian hypothesis testing reveals that reproducible models in systems biology get more citations

This study compares the citation counts of reproducible and non-reproducible papers. It analyzes 328 published models that Tiwari et al. classified by reproducibility. Hypothesis testing is performed using a flexible Bayesian approach that provides a complete assessment of the posteriors. The approach handles outliers via a non-central t distribution. Results show that reproducible papers were cited significantly more often between 2013 and 2020, i.e. 10 years after the introduction of SBML. This trend also persists for later periods with more than 95% credibility. In conclusion, this statistical analysis demonstrates long-term benefits of reproducible modeling for the individual researcher and the scientific community.

DOI: 10.15490/fairdomhub.1.study.1103.2

Zenodo URL: None

Created at: 11th Jan 2023 at 16:22

Contents

Statistical analysis and BEST method of Kruschke for Python applied to citation data in Systems Biology

The statistical analysis was performed in a Jupyter notebook.
This notebook contains the commands for all performed analyses (Statistical_analysis_of_FAIR_citations.ipynb).

The Bayesian Estimation Supersedes the t Test (BEST) method of Kruschke (2013) was used for the Bayesian significance testing.
The method was implemented in a Python class together with visualization and distributional analysis methods (BEST_method_python_Kruschke2012.py).
Also the Bayesian multiple comparison analysis can be
...
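
To illustrate why a t-distributed likelihood is robust to outliers, the following minimal sketch compares how strongly a single extreme value is penalized under a normal and under a Student t likelihood with location and scale. It is not taken from BEST_method_python_Kruschke2012.py; the citation counts and the parameter values (mu, sigma, nu) are hypothetical and chosen only for illustration.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def student_t_logpdf(x, mu, sigma, nu):
    """Log density of a Student t distribution with location mu and scale sigma."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

# Hypothetical citation counts with one very highly cited paper (assumption).
citations = [12, 15, 9, 20, 14, 11, 400]

# Hypothetical parameter values describing the bulk of the data (assumption).
mu, sigma, nu = 14.0, 4.0, 3.0

for x in citations:
    print(f"x={x:>4}  normal: {normal_logpdf(x, mu, sigma):10.1f}  "
          f"t: {student_t_logpdf(x, mu, sigma, nu):8.1f}")
```

Under the normal likelihood the extreme value dominates the total log likelihood and would pull the estimated mean and standard deviation toward it. Under the t likelihood its contribution stays moderate, so the estimates are driven by the bulk of the data.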

Curated citation data

The classification into reproducible and non-reproducible models was made by Tiwari et al.

Citations were looked up in Scopus, Web of Science and Google Scholar.

The following journals had to be excluded, as Journal Impact Factors (JIF) were missing or the journals were discontinued:
* Experientia was closed in 1996 and continued as Cellular and Molecular Life Sciences in 1997
* The American Journal of Physiology – split into fields in 1977, with further splits in 1980 and 1989
* IFAC Proceedings Volumes – last issue
...

  • Citation data.zip

Posterior traces and visualizations

The results of the analysis are structured in three parts:
1. The results of the main analysis
2. The results with a broader prior (sensitivity analysis)
3. The results of the multiple period comparison

For each part, full posterior traces for all analyses and the visualizations of the paper are available.

Furthermore, the diagnostics and traces were added for the different analyses.
The trace for the multiple comparison was too large to upload and is available on request.
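
As a rough illustration of how a posterior trace can be turned into a credibility statement, the sketch below computes the share of posterior draws above zero and a highest density interval (HDI). It does not read the files in Results.zip; the draws are simulated, and the function names and the simulated effect size are assumptions made for this example.

```python
import math
import random

def prob_greater_than_zero(samples):
    """Fraction of posterior draws above zero."""
    return sum(s > 0 for s in samples) / len(samples)

def hdi(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the draws."""
    ordered = sorted(samples)
    n = len(ordered)
    width = max(1, math.ceil(mass * n))  # number of draws inside the interval
    # Slide a window of `width` draws over the sorted values, keep the narrowest.
    start = min(range(n - width + 1),
                key=lambda i: ordered[i + width - 1] - ordered[i])
    return ordered[start], ordered[start + width - 1]

# Simulated stand-in for a posterior trace of the difference in group means
# (hypothetical values, not results of the study).
random.seed(0)
difference_of_means = [random.gauss(0.3, 0.15) for _ in range(4000)]

low, high = hdi(difference_of_means, mass=0.95)
print("P(difference > 0):", prob_greater_than_zero(difference_of_means))
print("95% HDI:", (low, high))
```

If the share of draws above zero exceeds 0.95, the difference is positive with more than 95% credibility in the sense used here.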

  • Results.zip

BEST method and executable notebook

The folder contains the Jupyter notebook for executing all analyses of the study.
The BEST method is used in the notebook and is provided in a separate Python script.

There is a class for the BEST method according to Kruschke and a class for the BEST multiple comparison.

A conda environment file listing all libraries needed to perform the analysis, including their package versions, was created.
It can be installed via
conda env create -f pymc_env.yml

  • Statistical analysis and python BEST method.zip
Fingerprints

These checksums let you verify that a snapshot you have downloaded has not been modified.

MD5: 420604823ed43ccff2ecbd7d6b890cba

SHA1: ae6b4ac3ade99ebbe71d16d5719380792f742c33
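
One way to run this check is with Python's standard hashlib module, as in the minimal sketch below. The file name snapshot.zip is a placeholder for wherever the downloaded snapshot was saved, and the helper names are assumptions made for this example.

```python
import hashlib

# Fingerprints published for this snapshot.
EXPECTED = {
    "md5": "420604823ed43ccff2ecbd7d6b890cba",
    "sha1": "ae6b4ac3ade99ebbe71d16d5719380792f742c33",
}

def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so that large archives need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def verify(path):
    """Return, per algorithm, whether the file matches the published fingerprint."""
    return {alg: file_digest(path, alg) == expected
            for alg, expected in EXPECTED.items()}

# Example call with a placeholder path (assumption):
# print(verify("snapshot.zip"))
```

A result of True for both algorithms indicates that the downloaded file is identical to the published snapshot.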

Citation
Höpfl, S. (2023). Bayesian hypothesis testing reveals that reproducible models in systems biology get more citations. FAIRDOMHub. https://doi.org/10.15490/FAIRDOMHUB.1.STUDY.1103.2
Snapshots
Snapshot 2 (11th Jan 2023) DOI
Snapshot 1 (3rd Oct 2022) DOI
Last updated: 11th Jan 2023 at 16:31