Bootcamp

INFORMATION

The first day of the boot camp will be held in 108 Business Building. On Monday we begin with coffee and light breakfast at 8:45 AM.

Days 2-5 will be in 107 Business Building, with the following daily program:

8:45-9:00 AM: Coffee and Snacks
10:15-10:35 AM: Coffee Break
12:05-1:15 PM: Lunch
3:30-3:50 PM: Break (Snacks, Juices)

Day 5 ends at noon; lunch is provided.

Please bring your laptop. You may need to install new software (see the installation instructions).

Teaching Assistants

Hillary Koch, hillary.koch01@gmail.com, Graduate Student, Statistics Program, CBIOS Trainee

Molly Rathbun, molly.rathbun@gmail.com, Graduate Student, BMMB Program, CBIOS Trainee

Funding and Support

The boot camp is organized by the Computation, Bioinformatics and Statistics (CBIOS) Predoctoral Training Program at Penn State, which is supported by the NIH T32 program. The boot camp is funded by participating colleges at Penn State.

The boot camp was first conceived and supported in June 2016 by the Administrative Supplement to NIGMS Predoctoral Training Grants (PA-15-136).

DAY 1

Day 1: Perspectives on Data Reproducibility - Monday June 3, 2019

Rm 108 Business Building

Coordinator: Cooduvalli Shashikant

8:45-9:00, Coffee and Snacks
9:00-9:15, TAs introduction, Hillary Koch and Molly Rathbun
9:15-10:00, Cooduvalli Shashikant: Is there a reproducibility Crisis?
10:00-10:25, Orfeu Buxton: Data Reproducibility as a social system of trust
10:25-10:45, Break
10:45-11:05, Mingfu Shao: Reproducibility in methods publications: an example
11:05-11:50, Istvan Albert: Reproducibility Fallacies
11:50- 1:15, Lunch
1:15-2:00, Charles Cole: Good luck finding that plot again - data reproducibility in field ecology
2:00-2:45, Michael Hallquist (Psychology): How good programming practices support scientific reproducibility
2:45-3:30, Cheryl Keller: Reproducibility begins at the bench
3:30-3:50, Break
3:50-4:30, Vasant Honavar: Computational reproducibility and data sharing (CANCELED)
4:30-5:00PM, Instructions for Tuesday, TAs, Shaun Mahony

DAY 2

Day 2: Software Carpentry and Data Analysis - Tuesday June 4, 2019

The basics of computational reproducibility: version control, documentation, automation.

Rm 107 Business Building
Instructors:

Dr. Shaun Mahony, Assistant Professor, Biochemistry & Molecular Biology
Prashant Kumar Kuntala, Computational Scientist (Pugh Lab), Biochemistry & Molecular Biology

Schedule

8:45 - 9:00 - Coffee and snacks
9:00 - 10:15 - Intro & Shell scripting basics
10:15 - 10:30 - Coffee break
10:30 - 11:45 - Shell scripting (continued) & Markdown
11:45 - 12:05 - Best Practices in Scientific Research
12:05 - 1:15 - Lunch
1:15 - 3:00 - Git basics
3:00 - 3:20 - Coffee break
3:20 - 5:00 - GitHub for team projects

Class Material

Install software

Install the required software

Presentation

Software carpentry and reproducible data analysis by Shaun Mahony
Using GitHub for team projects by Prashant Kumar Kuntala

Lesson

Best Practices for Scientific Computing

Shell scripting

Shell scripting basics

Markdown

Git

Version control with git

Additional Reading Materials

Running scripts on Penn State clusters (ICS-ACI)

Advanced Bash scripting

Advanced Bash-Scripting Guide

Make

Automation with make
Snakemake: a different kind of automation

Alternatives to Git/GitHub

Bitbucket: a different type of "GitHub"
Fossil: a different type of source code management

Alternatives to Markdown

Asciidoc and Asciidoctor: a different type of text markup

DAY 3

Day 3: Essential Pieces of Reproducibility - Wednesday June 5, 2019

Rm 107 Business Building

Instructor: Anton Nekrutenko

Agenda

9:00 - 10:30 Session 1
10:30 - 10:50 Coffee Break
10:50 - 12:00 Session 2
12:00 - 1:30 Lunch break
1:30 - 3:30 Session 3
3:30 - 3:50 Coffee Break
3:50 - 5:00 Session 4

DAY 4

Day 4: Proper Statistical Inference, Effective Plotting and Reproducible Reporting - Thursday June 6, 2019

Rm 108 Business Building

Instructor: Qunhua Li

Schedule:

8:45 - 9:00 Breakfast
9:00 - 10:30 Statistical inference I
10:30 - 10:50 Coffee break
10:50 - 12:05 Statistical inference II
12:05 - 1:15 Lunch
1:15 - 3:00 Reproducible reporting I - R Markdown and Batch effects
3:00 - 3:20 Coffee break
3:20 - 5:00 Hillary Koch: Reproducible reporting II - workflowr

Materials:

Full schedule with links and resources here
The html version of workflowr demo.
The RMarkdown source of the workflowr demo.

DAY 5

Day 5: Reproducibility Issues - Friday June 7, 2019

Rm 108 Business Building

Coordinators: Hillary Koch & Molly Rathbun

8:45-9:00 Coffee and Snacks
9:00-10:15 Reproducibility Cases
- Vijay Kumar: Providing reproducible user experience using Docker containers
- Scott Eckert: Reproducibility issues while using publicly available metagenomic datasets
10:15-10:35 Coffee Break
10:35-11:20 Dajiang Liu: Statistical and practical issues for replicating GWAS signals in meta-analysis
11:20-12:05 Wrap-up: TAs

DISCUSSIONS

Discussion from Boot Camp

Day 2

some favorite shell commands

cat <filename> | wc -l
- counts lines in a file
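find <directory_to_search_from> -name <pattern> -exec <command> {} \;
- <pattern> is a shell glob such as "*.pdf", not a regular expression; quote it so the shell does not expand it before find sees it
- start with find <directory_to_search_from> -name <pattern> on its own to safely list all matching files before executing anything on them (modifying them, deleting them, etc.)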
- example: find . -name "*.pdf" -exec rm {} \; removes all pdf files in your current working directory and nested directories
probably never type rm -r *
- deletes everything in your current directory and in all nested directories
more/less
- preview files from the command line (go through with the space bar)
head/tail -<number of lines>
- see the first/last however many lines of a file
- head -1 <filename> | wc -w gives the number of columns in a file (assuming the columns are separated by whitespace and no field contains whitespace itself)
sort -k<column number>
- sort by a given column in a file
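
The commands above can be tried safely on throwaway data. The following is a minimal sketch, not part of the original class material: the scratch directory, the file names (table.txt, a.pdf, keep.txt) and the table contents are all made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory with hypothetical demo data; removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A small whitespace-delimited table: one header line plus three rows.
printf '%s\n' \
  'sample reads score' \
  's1 120 0.9' \
  's2 45 0.4' \
  's3 300 0.7' > "$workdir/table.txt"

# Count lines. $(( ... )) strips the padding some wc versions add.
n_lines=$(( $(wc -l < "$workdir/table.txt") ))

# Count columns from the header line (assumes no whitespace inside a field).
n_cols=$(( $(head -n 1 "$workdir/table.txt" | wc -w) ))

# Skip the header, sort numerically on column 2, report the sample with most reads.
largest=$(tail -n +2 "$workdir/table.txt" | sort -k2,2n | tail -n 1 | cut -d' ' -f1)

echo "lines=$n_lines columns=$n_cols largest=$largest"

# Dry run first: list what find matches before attaching any -exec.
mkdir -p "$workdir/nested"
touch "$workdir/a.pdf" "$workdir/nested/b.pdf" "$workdir/keep.txt"
find "$workdir" -name '*.pdf'

# Only after checking that list, run the destructive command.
find "$workdir" -name '*.pdf' -exec rm {} \;
remaining=$(( $(find "$workdir" -type f | wc -l) ))
echo "files remaining: $remaining"
```

Because everything happens inside a directory created by mktemp, the rm step can only touch the two demo PDF files; table.txt and keep.txt are left in place until the script exits.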

big reasons to use Git/GitHub

descriptive log of project development
version control with or without groups
sharing reproducible workflows
branching for major changes
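
A minimal sketch of the log and branching points above, assuming git is installed. The repository lives in a scratch directory, and the file name, branch name, commit messages and user identity are placeholders chosen for this example, not taken from the class material.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository; removed when the script exits.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# Placeholder identity, set only for this demo repository.
git config user.name "Demo User"
git config user.email "demo@example.com"

# Remember the default branch name (master or main, depending on configuration).
main_branch=$(git symbolic-ref --short HEAD)

# First commit, with a descriptive message.
echo "raw counts" > analysis.txt
git add analysis.txt
git commit -q -m "Add raw counts placeholder"

# Try a major change on its own branch.
git checkout -q -b normalize
echo "normalized counts" >> analysis.txt
git commit -q -am "Append normalized counts"

# Bring the change back once it works.
git checkout -q "$main_branch"
git merge -q normalize

# The log is the descriptive history of the project.
git log --oneline
n_commits=$(git rev-list --count HEAD)
```

Working on a separate branch keeps the default branch usable while the change is in progress; the merge at the end is a fast-forward because nothing else was committed in the meantime.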

other resources

dabble in the course "Git Essential Training" on Lynda (your PSU ID gives you a Lynda subscription)

Day 3

Galaxy 101
Make a proposal for a computing allocation with XSEDE

other resources

Make Markdown easy with a Markdown Cheatsheet
Get started with easy rendering of Markdown locally, e.g. in Chrome or Sublime Text (there are other ways)
Jupyter and Galaxy: Easing entry barriers into complex data analyses for biomedical researchers
Do you like JavaScript? Make attractive online notebooks with ObservableHQ

Day 4

A guide to reproducible code in ecology and evolution


2019 PSU Bootcamp on Reproducible Research
