Sunday, November 24, 2013

Free Resources I'd Gladly Have Paid For

In this post I hope to compile a list of free resources I frequently use.  The range of topics mostly includes statistics, population and evolutionary genetics, and programming.  As the title says, to me these resources have been so useful that I'd have paid for them, but since they are free I guess I have received infinite value!

Stats

John McDonald's Handbook of Biological Statistics (free PDF):

-- I am a huge fan of this book. If John McDonald has done one thing here it's making sure that the easy stuff is EASY! He explains the basics like chi-square, correlation, t-test, and regression with such clarity. This is my go-to resource for making sure I'm doing the simple stuff right. The other thing that is brilliant about this book is that for each statistical test he has a section about when to use it, a section about what the null hypothesis is, an intuitive section about how the test works, and some great (biologically relevant) examples using the test. My only tiny gripe (if one can gripe about anything this good and free!) is that his code examples typically use SAS, and I'm now thoroughly entrenched in R. That said, his explanations are so clear I have little trouble translating the concepts into R.

Christian Walck's Handbook on Statistical Distributions for Experimentalists (free PDF)

-- This is a pretty straightforward guide to a whole little bestiary of statistical distributions. For this type of information I refer to this handbook and Wikipedia about equally. I find that I need examples to understand how distributions work. For some distributions I more easily grasp Wikipedia's examples and for others I like the examples in this handbook.

Course Notes from U. Wisc. Statistics 571 by Brets Hanlon and Larget (website)

-- There are some great examples in the course slides from the two Brets.  All the examples I've seen use R.  Hopefully this site will stay up if they stop teaching the course.  Here's a great example of how to do power analyses in R.  I love how they marry the intuitive images with the math and the R code so you can translate between all three mental activities. 
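The same kind of calculation can be sketched in Python. This is only a minimal example under simplifying assumptions: a two-sided, one-sample z-test with a known standard deviation, which is not the course's own R code. The function name z_test_power and its parameters are mine, and it needs Python 3.8 or later for statistics.NormalDist.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (known sd).

    effect: true difference between the mean and its null value
    sd:     known population standard deviation
    n:      sample size
    alpha:  significance level
    """
    std_normal = NormalDist()                      # mean 0, sd 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)     # two-sided critical value
    shift = effect * sqrt(n) / sd                  # how far the test statistic moves
    # Probability of landing in either rejection tail under the alternative.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Power climbs toward 1 as the sample size grows.
    for n in (8, 16, 32, 64):
        print(n, round(z_test_power(effect=0.5, sd=1.0, n=n), 3))
```

With an effect of zero the function returns alpha itself, which is a handy sanity check: the "power" of a test when the null is true is just its false positive rate.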

Allen Downey's Think Bayes and Think Stats (Bayes PDF, Stats PDF)

-- Both of these are useful, especially if you program in Python, as all the examples are given through Python code.  I also found that both are right at the boundary of where I can readily follow the math.  Downey teaches at a really good engineering school, so I think his typical student is probably very good at math! 

Think Stats is OK, especially regarding intuitive descriptions of various distributions and their real-world applications. I'm more in favor of John McDonald's book (above) when it comes to understanding statistical tests and knowing which test to choose. Regarding Think Bayes, it's a nice introduction to Bayesian thinking. He avoids dragging the reader through the "why frequentist stats are wrong and need to be replaced by Bayesian stats" zealotry that many other Bayesian texts start with, and I appreciate that. In the future I intend to blog more about my dabblings in Bayesian inference. I'm currently working my way through John Kruschke's book, which is not free, but seems to me well worth the money (and does indulge in a little zealotry).

UPDATE: Added the tutorial below on PCA by Lindsay Smith
Lindsay Smith's A Tutorial on Principal Components Analysis (free PDF)

-- Just like the title says, this is a nice gentle explanation of the inner workings of the PCA. Ever wonder what's happening under the hood when you run a PCA?  Read this and you'll have a working understanding of what's going on.
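To make the "under the hood" part concrete, here is a minimal sketch of the usual steps: center the data, build the covariance matrix, find its eigenvectors, and project the data onto them. It is pure Python and handles two-dimensional data only, using the closed-form eigen-decomposition of a symmetric 2x2 matrix. The function name pca_2d and the toy data are my own, not taken from the tutorial.

```python
import math


def pca_2d(points):
    """PCA for a list of (x, y) pairs.

    Returns (eigenvalues, (pc1, pc2), scores), with components sorted
    from largest to smallest variance.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Step 1: subtract the mean from each dimension.
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Step 2: sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Step 3: eigenvalues and eigenvectors of the symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    eigenvalues = (half_trace + spread, half_trace - spread)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)   # angle of the main axis
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))

    # Step 4: project the centered data onto the components.
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1])
              for x, y in centered]
    return eigenvalues, (pc1, pc2), scores


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data on y = 2x
    eigenvalues, components, scores = pca_2d(data)
    print("variance along each component:", eigenvalues)
    print("first component direction:", components[0])
```

For data lying exactly on a line, all of the variance ends up on the first component and the second eigenvalue is zero (up to rounding), which is the whole point of PCA in miniature.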

And one I don't use

-- A couple of different folks with heavy-duty computational backgrounds have pointed me to David MacKay's Information Theory, Inference, and Learning Algorithms (free PDF). I've only spent a little time with it. It's not my cup of tea, but I know people who think it's the bee's knees so I'll list it here. I think the problem is that I have no background in and no current need for machine learning (or maybe I do have a need but know so little about the topic that I don't realize it).

Population Genetics - Four Awesome Free Resources for the Price of Nothing!

Kent Holsinger's Lecture Notes in Population Genetics (free PDF)

-- I met Kent once a few years ago when I interviewed for a position at UConn. He was a super nice guy and among the UConn grad students his population genetics course is something of a legend. These are the course notes from that course. Like John McDonald in his stats handbook, Holsinger does a really nice job of explaining what you are trying to do, when you'd want to do it, and then works you through some great examples.

Now with 3X Pop. Gen. Power per Page
Graham Coop's Notes on Population Genetics (free PDF) 
UPDATE: Per Graham's suggestion, I've linked to GitHub, where he keeps an up-to-date copy of his notes. You'll want the file called popgen_notes.pdf.

-- This is a super concentrated form of population genetics knowledge. It's like that laundry detergent where you add a thimbleful for a whole load of wash. It has no Intro, no Table of Contents, and no Appendices or Indices, but you get 169 equations in 55 pages and just enough prose to connect all the equations! I reference this often, especially when I think I know what I'm doing but I want to make sure I'm right and don't want to read much fluff in the process.

Joe Felsenstein's Theoretical Population Genetics (free PDF)

-- This baby will probably retail for big bucks if/when Felsenstein ever decides to publish it (so download your free copy today!). It's a tome. It has pretty much everything (though according to Felsenstein's website it is unfinished). I read this when I want to make sure I really get something. Sometimes it helps, and sometimes it makes me realize that what I get is the tip of the iceberg!

Magnus Nordborg's Chapter on Coalescent Theory (free PDF)

-- Several years ago I was really confused about coalescent theory. I think the problem was that I was reading a bunch of really sophisticated uses of it, and lacked the necessary background information. Then I found this book chapter and realized that 1) probably the most daunting thing about coalescent theory is its fancy name, and 2) it's pretty intuitive to anyone who has done a fair bit of "tree thinking".

UPDATE: Population Genetics for Non-Model Taxa (free videos and content)

-- The American Genetics Association is hosting videos and other materials for a course they offered in the summer of 2013 on Population Genetics for Non-Model Taxa. I really like the videos, especially the five videos by Alex Buerkle (about halfway down the page at the link above). He does a great job of explaining Bayesian statistics and demonstrating how they can be useful for estimating allele frequencies and Fst from genomic data sets. There are also several other helpful videos detailing things like RAD and GBS methods and transcriptomics. As the title of the course suggests, these methods are great for researchers working on non-model taxa (i.e. species with few existing genomic resources).
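As a rough illustration of what "Bayesian estimation of an allele frequency" can mean in its simplest form, here is a textbook beta-binomial sketch. It is not the model from the videos: I'm assuming a Beta prior on the frequency and a binomial likelihood for the observed allele counts, and the function name, the uniform Beta(1, 1) default prior, and the example counts are all made up for illustration.

```python
def allele_frequency_posterior(allele_count, total_copies,
                               prior_a=1.0, prior_b=1.0):
    """Posterior for an allele frequency under a Beta prior.

    allele_count: copies of the focal allele observed
    total_copies: total gene copies sampled (2 per diploid individual)
    prior_a, prior_b: Beta prior parameters (1, 1 is uniform)

    Returns (post_a, post_b, posterior_mean, posterior_variance).
    """
    if not 0 <= allele_count <= total_copies:
        raise ValueError("allele_count must be between 0 and total_copies")

    # Beta prior + binomial likelihood gives a Beta posterior (conjugacy).
    post_a = prior_a + allele_count
    post_b = prior_b + (total_copies - allele_count)

    total = post_a + post_b
    mean = post_a / total
    variance = (post_a * post_b) / (total ** 2 * (total + 1))
    return post_a, post_b, mean, variance


if __name__ == "__main__":
    # Hypothetical sample: 10 diploid individuals, 7 copies of the allele.
    post_a, post_b, mean, variance = allele_frequency_posterior(7, 20)
    print(f"posterior: Beta({post_a:g}, {post_b:g})")
    print(f"posterior mean frequency: {mean:.3f}")
    print(f"posterior sd: {variance ** 0.5:.3f}")
```

The appeal for small or low-coverage samples is that you get a whole distribution for the frequency rather than a single point estimate, so the uncertainty carries forward into whatever you compute next.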

Programming

Allen Downey's Think Python (free PDF)

-- I didn't use this book to learn Python. Instead I used Dive Into Python (also free).  I had come from programming in Perl and Java before finding Python, so the Dive In approach worked for me.  But Dive In pretty much assumes you have some programming background.  Rather than start with the classic "Hello World", the very first program is this one:
def buildConnectionString(params):
    """Build a connection string from a dictionary of parameters.
    Returns string."""
    return ";".join(["%s=%s" % (k, v) for k, v in params.items()])
if __name__ == "__main__":
    myParams = {"server":"mpilgrim", \
                "database":"master", \
                "uid":"sa", \
                "pwd":"secret" \
                }
    print buildConnectionString(myParams)

Looking at that now, after using Python nearly every day for about 7 years, I still can't tell you at a glance what it's doing! The book should be called Dive Into the Deep End of Python.
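Unpacked step by step, though, the listing is less mysterious: it turns each key/value pair of a dictionary into a "key=value" string and glues the pieces together with semicolons. Here is a minimal sketch of the same idea in Python 3. The snake_case names and the explicit loop are my rewording, not the book's code.

```python
def build_connection_string(params):
    """Join a dict of settings into 'key=value;key=value' form."""
    pairs = []
    for key, value in params.items():
        # Each setting becomes one "key=value" piece.
        pairs.append(f"{key}={value}")
    # The pieces are glued together with semicolons.
    return ";".join(pairs)


if __name__ == "__main__":
    my_params = {
        "server": "mpilgrim",
        "database": "master",
        "uid": "sa",
        "pwd": "secret",
    }
    # Dicts keep insertion order in Python 3.7+, so this prints:
    # server=mpilgrim;database=master;uid=sa;pwd=secret
    print(build_connection_string(my_params))
```

The original packs the loop into a list comprehension and uses the older % string formatting, which is a lot to take in on page one if you have never programmed before.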

Think Python, on the other hand, takes you through a much gentler route.  This is the book I now recommend to others, especially those who are new to programming.


R Reference Card (free PDF)
UPDATE: see the comment below by Mary M, which includes a link to a nice set of introductory R videos.


-- I've never found a good free tutorial for R.  Maybe somebody out there knows of one?  I used this book by Peter Dalgaard and a great course by Dan Nettleton to learn R.  I would generally recommend both, but you don't have to move to Iowa to get Dalgaard!  


I do occasionally refer to the R Reference Card (above), although I think I lost my printed copy in my most recent move. Once you get a toe-hold with R, you can use its handy little built-in search to figure out stuff pretty easily. For example, ??binomial shows all the help materials that contain the term "binomial". Based on the descriptions, I think I want to see this one: Binomial. So then to get the documentation just type in help(Binomial), and it gives me a pretty good working description of what a Binomial is and some examples of how to use it in R.

8 comments:

  1. Glad the notes are helpful. I keep an updated copy of my notes on github (e.g. it has a contents page ;-) ). Feel free to correct typos or suggest changes via git.

    Graham

  2. Sorry meant to include a link to the github: https://github.com/cooplab/popgen-notes .

    1. Thanks Graham. I do find your notes really helpful. It looks like there are some updates in GitHub. I'll update the link above and my own copy.

  3. Have a look at this intro to R and see if you think it's useful. http://www.extension.org/pages/60427/introduction-to-r-statistical-software#.UpYi6uL3PTo

    1. Thanks for the link Mary M. I had a quick look and the video series seems to be a pretty good intro to R!

      I think investing the time to learn R is one of the best things I did in grad school. I now work in industry and at least where I work a knowledge of R is viewed as a big positive for job candidates. In part because R seems to get the latest techniques first, and also because industry pays staggering prices to license R's commercial competitors.

  4. Lex - this is great! I didn't know about some of these resources. Now downloaded!
    - Jannice

  5. Found this after looking for one of Holsinger's lectures. Thanks for assembling this list!
