The first day of the boot camp will be held in 108 Business Building. On Monday we begin with coffee and light breakfast at 8:45 AM.
Days 2-5 will be in 107 Business Building with the following program:
- 8:45-9:00AM: Coffee and Snacks
- 10:15-10:35: Coffee Break
- 12:05-1:15: Lunch
- 3:30-3:50: Break (snacks, juices)
Day 5 ends at noon; lunch is provided.
Please bring your laptop. You may need to install new software (see the instructions).
Teaching Assistants
Hillary Koch, hillary.koch01@gmail.com, Graduate Student, Statistics Program
CBIOS Trainee
Molly Rathbun, molly.rathbun@gmail.com, Graduate Student, BMMB Program
CBIOS Trainee
Funding and Support
The boot camp is organized by the Computation, Bioinformatics and Statistics Predoctoral Training Program at Penn State, supported by the NIH T32 program. The boot camp is funded by participating colleges at Penn State.
The boot camp was first conceived and supported in June 2016 by the Administrative Supplement to NIGMS Predoctoral Training Grants (PA-15-136).
Day 1: Perspectives on Data Reproducibility - Monday June 3, 2019
Rm 108 Business Building
Coordinator: Cooduvalli Shashikant
- 8:45-9:15, Coffee and Snacks
- 9:00-9:15, TAs introduction, Hillary Koch and Molly Rathbun
- 9:15-10:00, Cooduvalli Shashikant: Is there a reproducibility Crisis?
- 10:00-10:25, Orfeu Buxton: Data Reproducibility as a social system of trust
- 10:25-10:45, Break
- 10:45-11:05, Mingfu Shao: Reproducibility in methods publications: an example
- 11:05-11:50, Istvan Albert: Reproducibility Fallacies
- 11:50-1:15, Lunch
- 1:15-2:00, Charles Cole: Good luck finding that plot again - data reproducibility in field ecology
- 2:00-2:45, Michael Hallquist (Psychology): How good programming practices support scientific reproducibility
- 2:45-3:30, Cheryl Keller: Reproducibility begins at the bench
- 3:30-3:50, Break
- 3:50-4:30, Vasant Honavar: Computational reproducibility and data sharing. - CANCELED
- 4:30-5:00PM, Instructions for Tuesday, TAs, Shaun Mahony
Day 2: Software Carpentry and Data Analysis - Tuesday June 4, 2019
The basics of computational reproducibility: version control, documentation, automation.
Rm 107 Business Building
Instructors:
- Dr. Shaun Mahony, Assistant Professor, Biochemistry & Molecular Biology
- Prashant Kumar Kuntala, Computational Scientist (Pugh Lab), Biochemistry & Molecular Biology
Schedule
- 8:45 - 9:00 - Coffee and snacks
- 9:00 - 10:15 - Intro & Shell scripting basics
- 10:15 - 10:30 - Coffee break
- 10:30 - 11:45 - Shell scripting (continued) & Markdown
- 11:45 - 12:05 - Best Practices in Scientific Research
- 12:05 - 1:15 - Lunch
- 1:15 - 3:00 - Git basics
- 3:00 - 3:20 - Coffee break
- 3:20 - 5:00 - Github for team projects
Class Material
Install software
Presentation
Lesson
Best Practices for Scientific Computing
Shell scripting
Markdown
Git
Additional Reading Materials
Running scripts on Penn State clusters (ICS-ACI)
Advanced Bash scripting
Make
Alternatives to Git/GitHub
- Bitbucket: a different type of "GitHub"
- Fossil: a different type of source code management
Alternatives to Markdown
Day 3: Essential Pieces of Reproducibility - Wednesday June 5, 2019
Rm 107 Business Building
Instructor: Anton Nekrutenko
Agenda
- 9:00 - 10:30 Session 1
- 10:30 - 10:50 Coffee Break
- 10:50 - 12:00 Session 2
- 12:00 - 1:30 Lunch break
- 1:30 - 3:30 Session 3
- 3:30 - 3:50 Coffee Break
- 3:50 - 5:00 Session 4
Day 4: Proper Statistical Inference, Effective Plotting and Reproducible Reporting - Thursday June 6, 2019
Rm 108 Business Building
Instructor: Qunhua Li
Schedule:
- 8:45 - 9:00 Breakfast
- 9:00 - 10:30 Statistical inference I
- 10:30 - 10:50 Coffee break
- 10:50 - 12:05 Statistical inference II
- 12:05 - 1:15 Lunch
- 1:15 - 3:00 Reproducible reporting I - R Markdown and Batch effects
- 3:00 - 3:20 Coffee break
- 3:20 - 5:00 Hillary Koch: Reproducible reporting II - workflowr
Materials:
Day 5: Reproducibility Issues - Friday June 7, 2019
Rm 108 Business Building
Coordinators: Hillary Koch & Molly Rathbun
Discussion from Boot Camp
Day 2
some favorite shell commands
cat <filename> | wc -l
- counts the number of lines in a file
find <directory_to_search_from> -name <pattern> -exec <command> {} \;
- -name takes a shell-style wildcard pattern (such as "*.pdf"), not a regular expression
- start with
find <directory_to_search_from> -name <pattern>
to first safely find all your files before executing anything (modifying them, deleting them, etc.)
- example:
find . -name "*.pdf" -exec rm {} \;
removes all PDF files in your current working directory and nested directories
- probably never type
rm -r *
- deletes everything in your current directory and in all nested directories
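The preview-then-execute habit for find described above can be sketched in a throwaway directory. This is only an illustration: the directory layout and file names below are made up, and the scratch directory comes from mktemp so nothing real is touched.

```shell
# Work in a scratch directory (all names here are illustrative).
workdir=$(mktemp -d)
mkdir -p "$workdir/reports/old"
touch "$workdir/notes.txt" "$workdir/reports/a.pdf" "$workdir/reports/old/b.pdf"

# Step 1: preview. List what matches without executing anything.
find "$workdir" -name "*.pdf"

# Step 2: once the list looks right, rerun the same search with -exec.
find "$workdir" -name "*.pdf" -exec rm {} \;

# Count the regular files that are left; only notes.txt should remain.
remaining=$(find "$workdir" -type f | wc -l | tr -d ' ')
echo "files remaining: $remaining"
```

Running the search once without -exec costs a few seconds and shows exactly which files the second command will act on.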
more/less
- preview files from the command line (go through with the space bar)
head/tail -<number of lines>
- see the first/last however many lines of a file
head -1 <filename> | wc -w
- gives the number of columns in a file (assuming no column entry contains white space, e.g. multi-word values)
sort -k<column number>
- sort by a given column in a file
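A small sketch of how the counting and sorting commands above combine on a whitespace-delimited table. The file contents are invented for illustration. Two details go slightly beyond the notes and are assumptions about what is wanted here: `tail -n +2` is used to skip the header row, and `-k2,2n` restricts the sort key to column 2 and compares it numerically.

```shell
# Build a small whitespace-delimited table (contents are made up).
datafile=$(mktemp)
printf 'gene count score\n' > "$datafile"
printf 'abc 12 0.5\nxyz 3 0.9\nlmn 7 0.1\n' >> "$datafile"

# Number of lines, header included.
n_lines=$(wc -l < "$datafile" | tr -d ' ')

# Number of columns, taken from the header row.
n_cols=$(head -1 "$datafile" | wc -w | tr -d ' ')

# Skip the header, then sort numerically on column 2 only.
sorted=$(tail -n +2 "$datafile" | sort -k2,2n)

echo "lines: $n_lines, columns: $n_cols"
echo "$sorted"
```

Without the trailing `n`, sort compares the column as text, so 12 would sort before 3.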
big reasons to use Git/GitHub
- descriptive log of project development
- version control with or without groups
- sharing reproducible workflows
- branching for major changes
other resources
- dabble in course "Git Essential Training" on Lynda (you have a subscription to Lynda with your PSU ID)
Day 3
other resources
Day 4