Adapted by Istvan Albert from Software Carpentry [Best Practices in Scientific Computing][best]
[best]: http://swcarpentry.github.io/slideshows/best-practices/index.html
---
## Background
* Software is lab equipment for the 21st Century
* Scientists spend a lot of time writing it
* But over 90% are self-taught
* They don't know what "good" looks like
* So we describe 24 practices in 8 groups
**Good programmers are 10X more productive than average**
**Good practices are 10X more productive than average**
---
## Rule 1: Write Programs for People not Computers
* Hard to tell if code that's difficult to understand is doing what it's supposed to
* Hard for other scientists to re-use it...
* ...including your future self
### Rule 1.1: Keep it simple
* Short-term memory can hold 7±2 items
* So break programs into short, readable functions, each taking only a few parameters
### Rule 1.2: Make names consistent, distinctive, and meaningful.
* `p` doesn't help the reader's short-term memory as much as `pressure`
* Don't use `temp` for both "temporary" and "temperature"
* `i`, `j` are OK for indices in small scopes
### Rule 1.3: Make code style and formatting consistent.
* _Which_ rules don't matter -- _having_ rules does
* Brain assumes all differences are significant
* Every inconsistency slows comprehension
---
## Rule 2: Let the Computer Do the Work
* Computers exist to repeat things quickly
* 99% accuracy ⇒ 63% chance of at least one error per hundred repetitions
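
The 63% figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes each repetition succeeds independently with the same accuracy (an assumption the slide does not state); the function name is made up for illustration.

```python
def chance_of_any_error(accuracy: float, repetitions: int) -> float:
    """Probability that at least one of `repetitions` independent tries fails."""
    # All tries succeed with probability accuracy ** repetitions,
    # so at least one fails with the complement of that.
    return 1.0 - accuracy ** repetitions


print(f"{chance_of_any_error(0.99, 100):.0%}")  # prints 63%
```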
### Rule 2.1: Make the computer repeat tasks.
* Write little programs for everything
* Even if they're called scripts, macros, or aliases
* Easier to do this with text-based programming systems than with GUIs
### Rule 2.2: Save recent commands in a file for re-use.
* Most text-based interfaces do this automatically
* Repeat recent operations using `history`
* "Reproducibility in the small"
* Saving history supports "reproducibility in the large"
* An accurate record of how a result was produced
* _If_ everything can be captured
### Rule 2.3: Use a build tool to automate workflows.
* Originally developed for compiling programs
* Can be used whenever some files depend on others
* Makes workflow explicit
---
## Rule 3: Make Incremental Changes
* Most scientists don't have "requirements"
* They are their own users
* Code evolves in tandem with research
* Closest fit from industry is _agile development_
### Rule 3.1: Small steps with frequent feedback
* People can concentrate for 45-90 minutes without a break
* So size each burst of work to fit that
* Longer cycle should be a week or two
### Rule 3.2: Use a version control system.
* Tracks changes
* Allows them to be undone
* Supports independent parallel development
* Essential for collaboration
### Rule 3.3: Version control EVERYTHING
* Not just software: papers, raw images, ...
* Not gigabytes...
* ...but metadata _about_ those gigabytes
* Leave out things generated by the computer
* Use build tools to reproduce those instead
* Unless they take a very long time to create
---
## Rule 4: Don't Repeat Yourself (or Others)
* Anything repeated in two or more places will eventually be wrong in at least one
* If it's faster to re-create than to discover or understand, _fix it_
### Rule 4.1: There can be only one
* Every piece of data must have a single authoritative representation in the system.
* Define constants exactly once
* Ditto file formats, geographical locations, ...
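
One way to read "define constants exactly once" is sketched below. The constant names and values are hypothetical, chosen only for illustration; the point is that every function derives its numbers from the single definition rather than repeating the literal.

```python
# Hypothetical acquisition settings, defined in exactly one place.
SAMPLE_RATE_HZ = 250
WINDOW_SECONDS = 4


def samples_per_window() -> int:
    """Number of samples in one analysis window."""
    return SAMPLE_RATE_HZ * WINDOW_SECONDS


def duration_seconds(sample_count: int) -> float:
    """Length of a recording, derived from the same sample rate."""
    return sample_count / SAMPLE_RATE_HZ
```

If the sample rate changes, only `SAMPLE_RATE_HZ` needs editing, and both functions stay consistent with each other.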
### Rule 4.2: Modularize code rather than copying and pasting.
* Reducing code cloning reduces error rates
* Cuts the amount of testing needed
* And increases comprehension
### Rule 4.3: Re-use code instead of rewriting it.
* It takes experts years to build high-quality numerical or statistical software
* Your time is better spent doing science on top of that
---
## Rule 5: Plan for Mistakes
* No single practice catches everything
* So practice _defense in depth_
_Note: improving quality increases productivity_
### Rule 5.1: Don't trust. Verify
* Add assertions to programs to check their operation.
* "This must be true here or there is an error"
* Like diagnostic circuits in hardware
* No point proceeding if the program is broken...
* ...and they serve as _executable documentation_
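
A minimal sketch of assertions used as checks and as executable documentation. The `normalize` function is a made-up example, not something from the original slides.

```python
from __future__ import annotations


def normalize(weights: list[float]) -> list[float]:
    """Scale non-negative weights so that they sum to 1."""
    # Preconditions: "this must be true here or there is an error".
    assert weights, "need at least one weight"
    assert all(w >= 0 for w in weights), "weights must be non-negative"
    total = sum(weights)
    assert total > 0, "at least one weight must be positive"

    scaled = [w / total for w in weights]

    # Postcondition: documents what the caller may rely on.
    assert abs(sum(scaled) - 1.0) < 1e-9, "normalized weights must sum to 1"
    return scaled
```

Note that Python skips `assert` statements when run with the `-O` flag, so checks on untrusted external input are usually better written as explicit exceptions.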
### Rule 5.2: Use an off-the-shelf unit testing library.
* Manages setup, execution, and reporting
* Re-run unit tests after every change to the code to check for _regression_
Testing is Hard
* "If I knew what the right answer was, I'd have published by now."
* Compare to experimental data
* Or to analytic solutions of simple problems
* Or to old (trusted) programs
* If nothing else, forces scientists to document what "errors" are acceptable
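
As an illustration of comparing against an analytic solution, the sketch below tests a simple trapezoid-rule integrator with Python's standard `unittest` library. The integrator and the tolerance are assumptions made for this example; the tolerance is where the acceptable "error" gets written down.

```python
import math
import unittest


def trapezoid(f, a, b, steps=1000):
    """Approximate the integral of f over [a, b] with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * width)
    return total * width


class TestTrapezoid(unittest.TestCase):
    def test_sine_over_half_period(self):
        # Analytic result: the integral of sin(x) from 0 to pi is exactly 2.
        result = trapezoid(math.sin, 0.0, math.pi)
        self.assertAlmostEqual(result, 2.0, delta=1e-5)
```

Saved in a file, the test can be run with `python -m unittest`, which handles discovery, execution, and reporting.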
### Rule 5.3: Turn bugs into test cases.
* Write a test that fails when the bug is present
* Then work on the code until that test passes...
* ...and no others are failing
Test-Driven Development
* Why wait? Always write the tests, then the code
* Improves focus
* Encourages writing testable code
* And ensures tests actually get written...
* "Red, green, refactor"
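
A sketch of turning a bug into a test case. Suppose a hypothetical `median` helper once returned the upper middle element for even-length input: the first test below would have failed while that bug was present ("red"), and passes with the corrected code shown here ("green").

```python
import unittest


def median(values):
    """Median of a non-empty sequence of numbers."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    # The hypothetical buggy version returned ordered[middle] here as well.
    return (ordered[middle - 1] + ordered[middle]) / 2


class TestMedianRegression(unittest.TestCase):
    def test_even_length_input(self):
        # Captures the bug: fails if the upper middle element is returned.
        self.assertEqual(median([1, 2, 3, 4]), 2.5)

    def test_odd_length_input_still_works(self):
        # Guards against breaking the case that already worked.
        self.assertEqual(median([3, 1, 2]), 2)
```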
### Rule 5.4: Use a symbolic debugger.
* Explore the program as it runs
* Better than print statements
* You don't have to re-run...
* ...or guess in advance what you'll need to know
* Use _breakpoints_ to stop program at particular points or when particular things are true
---
## Rule 6: Optimize Software Only After It Works Correctly
* Even experts find it hard to predict performance bottlenecks
* Small changes to code often have dramatic impact on performance
* So get it right, _then_ make it fast
### Rule 6.1: Use a profiler to identify bottlenecks.
* Reports how much time is spent on each line of code
* Re-check on new computers or when switching libraries
* Summarize across unit tests
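
A minimal profiling sketch using Python's standard `cProfile` and `pstats` modules. Note that `cProfile` reports time per function rather than per line; line-level timing needs a separate tool. The function being profiled is a made-up stand-in for real work.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately simple workload to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_report(func, *args):
    """Run func(*args) under cProfile and return the stats table as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return stream.getvalue()


print(profile_report(slow_sum_of_squares, 100_000))
```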
### Rule 6.2: Write code in the highest-level language possible.
* People write the same number of lines of code per hour regardless of language
* So use the most expressive language available to get the "right" version...
* ...then rewrite core pieces (possibly in a lower-level language) to get the "fast" version
---
## Rule 7: Document Design and Purpose not Mechanics
* Goal is to make the next person's life easier
* Focus on things the code _doesn't_ say
* Or doesn't say clearly
* E.g., file formats
* An example is worth a thousand words...
### Rule 7.1: Document interfaces and reasons not implementations.
* Interfaces and reasons change more slowly than implementation details, so documenting them is better economics
* And most people care about using code more than understanding it
### Rule 7.2: Refactor code in preference to explaining how it works.
* Good code can be understood when read aloud
* Good programmers build libraries so that solving their problem is straightforward
* Again, "red, green, refactor"
### Rule 7.3: Embed the documentation for a piece of software in that software.
* Specially-formatted comments or strings
* More likely to be kept up to date
* More accessible to interactive help
* Many modern tools embed code in documentation rather than vice versa
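
In Python, embedded documentation usually takes the form of a docstring. The sketch below uses a made-up conversion function; the examples inside the docstring follow the standard `doctest` format, so they double as tests.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin.

    >>> celsius_to_kelvin(0.0)
    273.15
    >>> celsius_to_kelvin(-273.15)
    0.0
    """
    return celsius + 273.15
```

The docstring is what `help(celsius_to_kelvin)` shows in an interactive session, and `python -m doctest` on the containing file re-runs the embedded examples.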
---
## Rule 8: Collaborate
* Computers were invented to calculate
* The web was invented to collaborate
* Science is more fun when it's shared
### Rule 8.1: Use pre-merge code reviews.
* Have someone else review changes _before_ merging in version control
* Significantly reduces errors
* Good way to share knowledge
* It's what makes open source possible
### Rule 8.2: Use pair programming
* Code in pairs when bringing someone new up to speed and when tackling particularly tricky problems.
* Two people, one keyboard, one screen
* An extreme form of code review
* Can get a bit tiring if done all the time...
### Rule 8.3: Use an issue tracking tool.
* A shared to-do list
* Items can be assigned to people
* Supports comments, links to code and papers, etc.
* "Version control is where we've been, the issue tracker is where we're going"
---
## Gosh, That's a Lot
One step at a time.
1. Use text-based interfaces
2. Turn history into scripts
3. Put everything in version control
4. Use test-driven development
Citation: ["Best Practices for Scientific Computing", PLOS Biology, Jan. 2014](http://dx.doi.org/10.1371/journal.pbio.1001745).