Bootcamp

What does reproducibility mean?

An Overview of ICDS Research Computing Services

Justin Petucci

This presentation will be recorded and shared publicly.

This session gives an overview of the services that ICDS provides to PSU researchers through our Advanced CyberInfrastructure (ACI). ICDS-ACI serves as the backbone of our traditional HPC system, including our web-based graphical portal, our High Performance Research Cloud (HPRC), scientific gateways, and storage services. In addition, non-ACI services, such as advanced technical consulting with the Research Innovations with Scientists and Engineers (RISE) team, will be discussed.

Kate Stankiewicz

'Data is available online': the challenges of reanalyzing published data

Kate Stankiewicz

This presentation will be recorded and shared publicly.

Slides

Paul Medvedev

Understanding how bioinformatics work is published and incentivized: a computer science perspective

Paul Medvedev

Arun Srinivasan

Out of Tune: Tuning Parameters and Reproducible Machine Learning

Arun Srinivasan

This presentation will be recorded and shared publicly.

Slides

With the increasing availability of data, machine learning has emerged as a powerful tool for studying questions in biology. While much attention has been paid to the misinterpretation of correlation and p-values, the choice of tuning parameters when running a model can also drastically affect the outcome. This talk will focus on why reporting the reasoning behind the selection of tuning parameters in machine learning models is important when striving for reproducible data analysis.

Qunhua Li

Statistical issues on reproducibility in genomic research

Qunhua Li

This presentation will be recorded and shared with PSU students only.

Proper use of statistics is important for maintaining the robustness and reproducibility of scientific findings. I will talk about good practice for maintaining statistical reproducibility and go over some statistical concepts that are important for reporting scientific findings but are easily misinterpreted.

Matthew Jensen

Constructing and maintaining reproducible bioinformatics pipelines in your research

Matthew Jensen

This presentation will be recorded and shared publicly.

Slides

The talk will cover how to assemble bioinformatics pipelines for genomic data analysis from different software tools, maintain the pipelines for future use by yourself or other lab mates, and prepare your pipelines for posting on GitHub (e.g. code commenting, README pages, example data) upon publication of your work. The talk will be based on my own experience analyzing genomics datasets (e.g. whole-genome sequencing, RNA-Sequencing), and includes an example from my early PhD career where my lack of experience led to some irreproducible RNA-Seq results from using two different pipelines.


Speakers

István Albert

What does reproducibility mean?

István Albert

Slides

Justin Petucci

An Overview of ICDS Research Computing Services

Justin Petucci

Slides

Kate Stankiewicz

'Data is available online': the challenges of reanalyzing published data

Kate Stankiewicz

Slides

Paul Medvedev

Understanding how bioinformatics work is published and incentivized: a computer science perspective

Paul Medvedev

Arun Srinivasan

Out of Tune: Tuning Parameters and Reproducible Machine Learning

Arun Srinivasan

Slides

Qunhua Li

Statistical issues on reproducibility in genomic research

Qunhua Li

Matthew Jensen

Constructing and maintaining reproducible bioinformatics pipelines in your research

Matthew Jensen

Slides
