Y chronology awry

Dienekes links to and discusses a current paper by George Busby and colleagues (2011) on the Y chromosome chronology for the settlement of Europe: "Back to the drawing board for R-M269 (Busby et al. 2011)." The main idea is that microsatellite loci on the Y chromosome have supplied most of our information about biogeography using this marker, but the mutation rates of these loci have been badly misapplied.

A bad clock is not useless: it gives you some information about time. Moreover, you can often combine several bad clocks to iron out the inaccuracy of any single one of them.

Unfortunately, better estimation through averaging of bad estimators works only in one case: when the estimators are unbiased.
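To make the point about bias concrete, here is a minimal Python sketch. It is purely illustrative: the true age, the noise level, and the bias factor of 0.6 are made-up numbers, and `average_of_clocks` is a hypothetical helper, not anything taken from the paper.

```python
import random
import statistics

def average_of_clocks(true_age, n_clocks, bias, noise_sd, seed=0):
    """Average n_clocks noisy readings of true_age.

    Each reading is bias * true_age plus Gaussian noise, so bias=1.0
    models unbiased clocks and bias<1.0 models clocks that all read young.
    """
    rng = random.Random(seed)
    readings = [bias * true_age + rng.gauss(0.0, noise_sd)
                for _ in range(n_clocks)]
    return statistics.fmean(readings)

TRUE_AGE = 10_000  # years; an arbitrary illustrative value

# Noisy but unbiased clocks: the average homes in on the true age.
unbiased_mean = average_of_clocks(TRUE_AGE, 1000, bias=1.0, noise_sd=3000)

# Equally noisy clocks that all run 40% slow: the average homes in
# on the wrong age, and adding more clocks does not help.
biased_mean = average_of_clocks(TRUE_AGE, 1000, bias=0.6, noise_sd=3000)

print(round(unbiased_mean), round(biased_mean))
```

Averaging removes the noise in both cases. What it cannot remove is a shared error in one direction, which is the situation the paper describes for fast-mutating STRs.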

The inclusion of some fast-mutating STR loci tends to make all estimates too young. The paper finds that this problem is general, affecting most commonly used datasets. In the authors' words:

Our analysis confirms that this phenomenon is not specific to the R-M269 haplogroup nor to methods using ASD. Figure 4b shows that STRs with high D produce larger estimates of T. What is clear is that estimates of T implicitly depend on the STRs that are selected to make this inference. Using BATWING on an HGDP population for which 65 Y-STRs are available, we have shown that the median estimate of TMRCA can differ by over five times when STRs are selected on the basis of the expected duration of linearity (electronic supplementary material, figure S4). While researchers take into account STR mutation rates when estimating divergence time with ASD, commonly used STRs do not have the specific attributes that allow linearity to be assumed further into the past. The majority of haplogroup dates based on such sets of STRs may therefore have been systematically underestimated.
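The loss of linearity described in that passage can be illustrated with a toy simulation. This is a sketch under stated assumptions, not BATWING and not the paper's calculation: it assumes a symmetric stepwise mutation model in which allele lengths cannot move more than `half_range` repeats from the ancestral length, and it converts ASD to an age by dividing by the mutation rate. The function name, mutation rates, range, and true age are all hypothetical choices made for illustration.

```python
import random

def asd_age_estimate(mu, true_gens, half_range, n_lineages=1000, seed=1):
    """Estimate age (in generations) of a single locus as ASD / mu.

    Simulates n_lineages independent descendants of one ancestor under a
    bounded, symmetric, single-step mutation model (an assumption).
    """
    rng = random.Random(seed)
    total_sq = 0
    for _ in range(n_lineages):
        allele = 0  # repeat count relative to the ancestral allele
        for _ in range(true_gens):
            if rng.random() < mu:
                step = rng.choice((-1, 1))
                # Reflecting bound: a step that would leave the allowed
                # range is discarded.
                if abs(allele + step) <= half_range:
                    allele += step
        total_sq += allele * allele
    asd = total_sq / n_lineages  # average squared distance to the ancestor
    return asd / mu

TRUE_GENS = 1000  # the "real" age in this toy example

# Slow locus: few mutations, the bound is rarely reached, ASD stays
# roughly linear in time.
slow_estimate = asd_age_estimate(mu=0.002, true_gens=TRUE_GENS, half_range=4)

# Fast locus: many mutations, ASD saturates against the bound, and the
# implied age comes out far too young.
fast_estimate = asd_age_estimate(mu=0.02, true_gens=TRUE_GENS, half_range=4)

print(round(slow_estimate), round(fast_estimate))
```

In this toy setup the fast locus cannot report an age much beyond a few hundred generations however old the lineage really is, because its squared distance has a ceiling set by the allowed range. Mixing such a locus into a set of slower ones pulls the combined estimate downward.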

One weakness of the study is that its reliance on the geographic patterns of haplotypes assumes that the haplotypes have evolved neutrally relative to each other. Selection might radically alter these patterns.