john hawks weblog

paleoanthropology, genetics and evolution

Africa

  • Neandertal introgression, 1000 Genomes style

    Sat, 2011-12-10 18:16 -- John Hawks

    For our project to understand pigmentation genetics in archaic humans, we had to find a good comparative sample of sequence data from recent humans. The original publication on the draft Neandertal genomes compared them to five low-coverage genomes from different Old World populations, along with the publicly available genomes from Craig Venter and others [1]. The first publication on the Denisova genome added an additional handful of genomes to these comparisons [2].

    Some genomes in this handful from living people are more similar to the Neandertal and Denisova genomes than others. That simple fact is the proof that some living people have Neandertal and Denisovan ancestors.

    But until now, the comparison has been limited to a very small number of human genomes. That became a focus for critics of the Neandertal and Denisovan results. How could three or four genome sequences possibly provide an adequate representation of human variability? We could imagine scenarios in which the similarities between Neandertal and humans could be explained by some unsampled population, for example, northeast Africans [3]. Denisova does not present the same problem, because African population structure cannot possibly explain its resemblance to populations in Wallacea, Australia, and Oceania [2] [4]. But to compare either of these genomes, we should seek a broader sampling of genomes from living people.

    As I wrote yesterday, my students and I have been working to understand pigmentation genetics of the archaic human genomes ("Pigmentation of archaic humans: introduction"). I've emphasized the need to break the analysis into small steps. For this question, we need to examine whether the pattern of introgression around pigmentation genes is characteristic of the genome as a whole. If genes involved in pigmentation have systematically higher or lower levels of Neandertal ancestry, that will tell us a lot about the evolutionary history of pigmentation in recent and archaic humans. For this, we need a good comparative sample, and the 1000 Genomes Project provides the best sample available.

    The first step in assessing the pattern of introgression for pigmentation genes is to characterize the pattern of introgression across the whole genome.

    Yes, a whole-genome introgression analysis sounds awfully big for my "small steps" concept. But actually this is simpler than it might sound. Here's a teaser:

    The figures in this post are not from a whole-genome analysis; they include data from eight chromosomes that we prioritized because of our pigmentation analysis. I am licensing all of them under a Creative Commons ShareAlike license so that anyone can use them anywhere.

    UPDATE (2011-12-10): I finished the whole genome analysis and am updating this post and figures accordingly. The results are the same throughout, with the exception of the Europe-East Asia comparison, which now shows these populations to be significantly different across the genome as a whole. I have partially updated the figures and will finish these later today.

    The value of sequences

    The 1000 Genomes Project data have been updated several times in the last year, as both sequencing and analysis of the genomes have progressed (more information on 1000 Genomes Project website). We downloaded a release of SNP genotype calls from 1094 individuals, based on the low-coverage (average 4x) sequencing that has been carried out on the sample.

    A SNP (single nucleotide polymorphism) is a nucleotide site with at least two alleles present in the global human sample. These sites represent only one kind of genetic variation in today's populations. Many of the differences between people's genes are caused by insertions, duplications, deletions, transpositions, or inversions. But those kinds of polymorphisms can be challenging to study in low-coverage genomes, and we already understand quite a lot about SNPs in human populations from the earlier HapMap project [5] [6]. The HapMap provided the data underlying our 2007 paper on the acceleration of recent human evolution ("Why human evolution accelerated") [7].

    The drawback of earlier SNP variation projects is that they examined only a subset of SNP variation in a sample of people. To design a microchip that could provide a million or more SNP genotypes from a saliva sample, somebody first had to discover where in the genome SNPs could be found. So they took small samples of people, sometimes only a single person's two copies of the genome, and sequenced. Adding together SNPs found by several methods, they could get a representation of SNP variation across the whole genome in a population. But this process introduced a bias: the SNPs were ascertained in a sample that inevitably could not represent humans in other samples with the same accuracy. Initially, SNP samples were heavily biased toward people of European ancestry (upon whom most genetic work was originally done), and the HapMap project went to great efforts to increase the representation of other populations. But even with the best possible ascertainment, interpreting SNP variation requires us to jump through some theoretical hoops.

    Sequence data make life much easier for the population geneticist. Seriously, working on this stuff on the whiteboard is fun instead of a constant nightmare of sampling biases and spaces between markers. I have a bias myself, in that I find recombination hard to deal with. I love reticulation among populations, but I'd rather work with genealogies that look like proper trees instead of a liana-strewn mess. So looking at sequence data over short intervals makes me happy. Not as happy as beer aged in bourbon barrels, but happy.

    The 1000 Genomes Project SNP files represent every SNP mutation observed in the sample. In other words, these are sequence data, just with all the fixed (and therefore redundant) sites removed. Even so, these sequence data are not perfect. Low coverage means that some rare mutations in the sampled individuals will go unreported. We aren't typically interested in singleton mutations in the sample, except that missing them will introduce a bias upon our estimates of the time that common ancestors lived. Next-gen sequence reads are usually fairly riddled with errors. High coverage allows these errors to be removed with some confidence, but low-coverage genomes risk throwing out real SNPs along with the spurious ones. The publicly available files represent some analytical steps that we do not here control, so we have to work with the understanding that the data are not perfect.

    The 1000 Genomes SNP files have had a phasing algorithm applied to them, which attempts to assign genotypes to chromosomes. In essence, phasing tries to figure out whether adjacent SNP alleles belong to the same copy or to different copies of the same chromosome. The details of this phasing are not yet apparent, and for many reasons I am cautious about using phased data. The inference is often inaccurate for rare mutations, and the whole process tends to sneak assumptions about population history into the resulting dataset. I hate being forced to live with someone else's assumptions about human population history, and I typically try to avoid needing phased data. In this case, it looks like the data over short intervals are as accurate as they can be, given the limitations on coverage and sampling. We have moved forward by applying methods that make a bare minimum of assumptions.

    Counting derived SNP alleles

    David Reich and colleagues came up with an appealingly simple test of introgression, which they applied to both the Neandertal and Denisovan genomes. Eric Durand, Reich, Nick Patterson and Monty Slatkin formally described the method, which they call the D-statistic, this year [8]. Informally, this has become known as the ABBA-BABA test, after their labels for the discordant genealogies that the test compares. By and large, across the genome, humans living today share many more new mutations with each other than they do with an archaic human like a Neandertal. But sometimes two genomes are different from each other, and one of them shares a new mutation with the Neandertal.

    A human might share a mutation with a Neandertal because it actually isn't very new, and both inherited the mutation from some much more ancient population of humans. This scenario is called "incomplete lineage sorting", because humans today have multiple gene lineages that existed within some very ancient population, instead of these having been "sorted" cleanly into the different human and Neandertal populations. Incomplete lineage sorting does happen a lot between humans, Neandertals, and Denisovans. ILS is the normal mode of variation among recent human populations, who trace their genealogical histories back much further than the earliest "modern" humans. So if one human has a Neandertal allele, and another human has a different allele, it's probably no big deal. They both just inherited gene variants that already existed in our distant common ancestors.

    You can probably see already that if we had a way to estimate the age of an allele, we could tell whether incomplete lineage sorting is a credible explanation for any particular site. I'll leave that point for another post.

    In the meantime, if we pretend that we know nothing at all about the ages of alleles, we must find some other way to tell whether incomplete lineage sorting can explain Neandertal similarities. Reich and colleagues recognized that incomplete lineage sorting from ancient pre-Neandertal ancestors ought to be distributed equally among living people. If we look at every site in the genome where we have data from Neandertals, we should find that one living human genome should look like the Neandertal just as often as another.

    This insight led to their test. Take a pair of humans, count the number of times sequence A is like the Neandertal and sequence B is like a chimpanzee, and then do the inverse — B then A. ABBA-BABA.

    Why a chimpanzee? In most cases the chimpanzee allele will represent the ancestral state for humans. Living people can inherit ancestral alleles from Neandertals as well as derived ones, but the derived ones tend to be rarer and younger within human populations. If one living genome shares an ancestral allele with the Neandertal genome, we don't need incomplete lineage sorting or introgression to explain the pattern. For all we know, such a mutation originated after Neandertals were already gone. So we need to pay attention to the derived mutations, ones that are present in Neandertals but not in chimpanzees. Do a count of these across the genome, and if you find a living genome with significantly more than another, you've found evidence for introgression.
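    The counting behind ABBA-BABA can be sketched in a few lines of Python. This is only an illustration of the idea as described here, not the code used by Reich and colleagues or by us: the function name, the use of plain strings for aligned alleles, the "N" code for missing data, and the toy sequences are all assumptions made for the example. A real analysis would also need a standard error (for instance from a block jackknife) before calling any difference significant.

```python
def d_statistic(h1, h2, archaic, outgroup):
    """Count ABBA and BABA sites for two human sequences (h1, h2),
    an archaic sequence and an outgroup (chimpanzee) sequence.

    All four arguments are equal-length strings of aligned alleles;
    "N" marks missing data (an assumption of this sketch).
    Returns (D, n_abba, n_baba).
    """
    abba = baba = 0
    for a, b, n, c in zip(h1, h2, archaic, outgroup):
        # Use only sites where all four sequences have data.
        if "N" in (a, b, n, c):
            continue
        # Only sites where the archaic allele is derived
        # (differs from the chimpanzee allele) are informative.
        if n == c:
            continue
        if a == c and b == n:
            abba += 1  # h2 shares the derived allele with the archaic genome
        elif a == n and b == c:
            baba += 1  # h1 shares the derived allele with the archaic genome
    total = abba + baba
    d = (abba - baba) / total if total else 0.0
    return d, abba, baba


# Toy sequences, invented for illustration only.
human_1 = "ACGTACG"
human_2 = "ATGAACA"
neandertal = "ATGAGCG"
chimp = "ACGTGTA"

d, n_abba, n_baba = d_statistic(human_1, human_2, neandertal, chimp)
print(n_abba, n_baba, round(d, 3))  # 2 1 0.333
```

    With this labeling, a positive D means the second human sequence shares more derived alleles with the archaic genome than the first does. Under incomplete lineage sorting alone, D should hover around zero.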

    Ed Green, David Reich and colleagues [1] [2] did a comparison of every possible pair of genomes in their modern human sample. These sequence data were gappy, so that sequence A might share different coverage with B than with sequence C. So it was necessary to consider each pair separately, counting all the sites where both human sequences and the Neandertal and chimpanzee sequences had data.

    The 1000 Genomes Project sample reports genotypes for every SNP for every sampled individual. So in principle, every pair of sequences should have data for every one of these sites. Again, we have to be cautious about the nature of the sequencing, attending to the possibility of systematic biases due to low coverage. But we really don't have to take the time-consuming step of comparing every possible pair of the 2188 resulting haploid genomes. We can just find the derived SNP alleles that are present in Neandertals and count how many of them are in each of the human sequences. If one sequence has significantly more Neandertal derived alleles than another, it had to get them somehow.
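    The per-genome count can be sketched as follows. The names, the dictionary of haplotype strings and the toy alleles are hypothetical, chosen only to make the logic concrete; the real input would be the phased SNP calls, which this sketch does not attempt to parse.

```python
def count_shared_derived(haplotypes, archaic, ancestral):
    """For each haploid genome, count sites where it carries the
    derived allele seen in the archaic genome.

    haplotypes: dict mapping a label to a string of alleles, one per SNP.
    archaic, ancestral: strings of the same length ("N" = missing data).
    """
    # Sites where the archaic genome has data and differs from the
    # ancestral (chimpanzee) state, i.e. carries a derived allele.
    informative = [
        i
        for i, (n, c) in enumerate(zip(archaic, ancestral))
        if "N" not in (n, c) and n != c
    ]
    return {
        label: sum(1 for i in informative if seq[i] == archaic[i])
        for label, seq in haplotypes.items()
    }


# Toy data, invented for illustration only.
neandertal = "TGAGC"
chimp = "CGTGT"
haplotypes = {
    "YRI_1_a": "CGTGT",
    "CEU_1_a": "TGAGT",
    "CHB_1_a": "TGTGC",
}

counts = count_shared_derived(haplotypes, neandertal, chimp)
print(counts)  # {'YRI_1_a': 0, 'CEU_1_a': 2, 'CHB_1_a': 2}
```

    Because every haplotype is scored against the same set of sites, the counts are directly comparable and can be put straight into a histogram, with no need for pairwise comparisons.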

    That magic three percent

    The figure at the top of the post represents that count. Every individual in the 1000 Genomes Project dataset has two copies of the autosomal genome. Separating these two copies of the genome (basically arbitrarily) and counting up the shared derived features between each of those copies and the genome of Vindija 33.16, we obtain the histogram. Here it is again:

    The African genomes in the 1000 Genomes sample include Yoruba from Nigeria and Luhya from Kenya. The Asian populations sampled are Japanese and Chinese, including people of Han Chinese ethnicity in Beijing and southern China. The European ancestry samples include the CEU sample from Utah, as well as British, Tuscan, Spanish and Finn samples.

    The histogram shows that Asian and European genomes have significantly more Neandertal derived SNP alleles than do the African genomes. The averages for the Asian and European samples are around 3% higher than the average for the African samples. Whatever gave Africans some degree of similarity to Neandertals, non-Africans seem to have gotten around 3% more of it.

    Green and colleagues [1] assumed conservatively that Africans share derived SNP alleles with Neandertals only because of incomplete lineage sorting from the human-Neandertal ancestral population. This fraction should be the same in all human populations, under the assumption that Africans were mostly isolated from Neandertals for some period of time. The 3% Neandertal bonus outside Africa should then represent introgression from Neandertals into recent populations outside Africa.

    Both previous studies noted that genomes outside Africa are not significantly different in the fraction of derived SNP alleles shared with Neandertals. A genome from China and a genome from France carried the same fraction of shared derived SNP alleles with Neandertals. Here, we've confirmed that basic identity in the level of introgression in these populations.

    I have told several people now that I find the distributions in China and Europe spookily similar. On parts of the genome, the two distributions have means that are not significantly different. Indeed, I worked for a week with an analysis of eight chromosomes, in which the East Asian and European means were fewer than 100 SNP alleles apart. Even across the whole genome, Europeans average only 700 derived SNP alleles more than the East Asian sample. This small difference (a bit more than a tenth of a percent) is strongly significant with these sample sizes. A t-test yields a p-value of 1.1 × 10^-26 for the difference in means. Even so, the distributions of these two populations overlap across most of their ranges.
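    For readers who want to see why a tiny difference in means can be so strongly significant, here is a sketch of a two-sample t statistic. I am assuming Welch's unequal-variance form here purely for illustration, and the numbers are made up rather than taken from the 1000 Genomes counts.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom
    for the difference in means of two samples."""
    # Squared standard error of each sample mean.
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Made-up counts, for illustration only.
sample_a = [10, 12, 14, 16]
sample_b = [9, 10, 11, 12]

t, df = welch_t(sample_a, sample_b)
print(round(t, 3), round(df, 2))  # 1.732 4.41
```

    The standard error in the denominator shrinks with the square root of the sample size, so with hundreds of haploid genomes per region even a difference of a tenth of a percent can produce a very large t and a very small p-value while the two histograms still overlap almost completely.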

    Seeing these hundreds of genomes arrayed on a histogram provides much more information than we had from a handful of genomes. It is remarkable how much dispersion there is among genomes from a single population. Although the means of these two samples are nearly the same, you can see that each of them has a large range of variation in the shared derived SNP alleles with Neandertals. This variation means that people within a single population have very different proportions of Neandertal ancestry.

    This is not a graph of people, but a separation of the two copies of SNP alleles carried by these people. That separation is phased at short scales but arbitrary on the scale of a whole chromosome, so the histogram likely understates the variance among single genomes while it overestimates to some extent the variation among people with their diploid genomes. Still, it looks likely from these comparisons that some people in Europe carry more than a percent higher Neandertal ancestry than the average, and some carry a percent less. We can use statistical methods to test this hypothesis directly as applied to individuals in the sample.

    Neandertal genes in recently admixed populations

    A sample of hundreds of people allows us to demonstrate significant differences among the genomes of different populations. Some of the 1000 Genomes Project samples are from populations that represent historically recent admixture of people who trace their ancestry to different parts of the world.

    For example, the "ASW" population sample includes African-American people who live in the Southwest United States. We know from many other genetic studies that African-Americans vary in the fraction of ancestry they derive from Europeans and from Africans. The average proportions of African and European ancestry vary among African-Americans who live in different parts of the U.S., with the European fraction as low as 3% in some parts of the country and as high as 20% or more in others. The proportion among individuals varies even more. So when we consider the ASW sample, we should expect to see a lot of variation in the number of shared derived SNP alleles with Neandertals, with a mean higher than African populations.

    Which is exactly what we do see:

    The ASW sample overlaps substantially with the Yoruba sample from West Africa (Nigeria) and slightly with the CEU sample, which includes people of European ancestry in Utah. The total in the ASW genomes is more variable than either the Yoruba or CEU population samples. If the higher mean in the ASW genomes reflects European ancestry from a population like CEU, the proportion of European ancestry would be around 17% for that sample of people. It would be hard to tell from these numbers alone how much of the variation in ASW is attributable to variation in ancestry fraction, and how much is expected within a population of homogeneous ancestry. As we'll see in some other populations, there are some appreciable differences among populations within a given region, and ancestry differences may add to the variation among individuals within populations.
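    The ancestry estimate comes from simple linear interpolation between the two source means. Here is a minimal sketch, assuming the admixed mean is a weighted average of the two source means; the function name and the counts are hypothetical, chosen only so that the toy answer lands on 17%.

```python
def ancestry_fraction(mean_admixed, mean_source_a, mean_source_b):
    """Fraction of ancestry from source B, assuming the admixed mean is
    a linear mixture: admixed = (1 - f) * A + f * B."""
    if mean_source_a == mean_source_b:
        raise ValueError("source means are identical; fraction is undefined")
    return (mean_admixed - mean_source_a) / (mean_source_b - mean_source_a)


# Hypothetical mean counts of Neandertal-shared derived alleles.
yri_mean = 100_000
ceu_mean = 103_000
asw_mean = 100_510

f_european = ancestry_fraction(asw_mean, yri_mean, ceu_mean)
print(round(f_european, 2))  # 0.17
```

    The same calculation works in the other direction for a sample such as PUR, by treating the West African sample as source B. It uses only the means, so it says nothing about how ancestry varies from one individual to the next.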

    We see a similar pattern when we look at the Puerto Rican sample. Individuals in this sample have some ancestry from European, Native American and African ancestors. The comparisons by Reich and colleagues [2] and Green and colleagues [1] suggested that Native American populations have the same fraction of Neandertal ancestry as other people outside Africa. In the comparison with YRI and CEU samples, Puerto Rican (PUR) genomes are intermediate, with a mean suggesting around 15% ancestry from the West African population.

    The two outlier points in the Puerto Rican sample are the two genome copies from one individual, who we would hypothesize had much higher African ancestry than the average in the sample.

    Next...

    This post has taken me much longer than I expected to get to the point of talking about variation among samples within continental regions. It turns out that, despite the similarity of European and East Asian samples in their averages, there are substantial differences between samples within each of these regions.

    For example, here's a comparison of north and south Chinese samples:

    People of Han Chinese ethnicity sampled in Beijing appear to have on average a half percent more Neandertal ancestry than people of the same ethnicity sampled in southern China. I found these kinds of differences almost everywhere I looked within regions. More later...


    References

    1. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH, et al. 2010. A Draft Sequence of the Neandertal Genome. Science [Internet] 328:710–722. Available from: http://dx.doi.org/10.1126/science.1188021
    2. Reich D, Green RE, Kircher M, Krause J, Patterson N, Durand EY, Viola B, Briggs AW, Stenzel U, Johnson PLF, et al. 2010. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature [Internet] 468:1053–1060. Available from: http://dx.doi.org/10.1038/nature09710
    3. Hodgson JA, Bergey CM, and Disotell TR. 2010. Neandertal genome: the ins and outs of African genetic diversity. Current biology 20:R517–R519.
    4. Reich D, Patterson N, Kircher M, Delfin F, Nandineni MR, Pugach I, Ko AM-S, Ko Y-C, Jinam TA, Phipps ME, et al. 2011. Denisova admixture and the first modern human dispersals into Southeast Asia and Oceania. American journal of human genetics 89:516–528.
    5. The International HapMap Consortium. 2005. A Haplotype Map of the Human Genome. Nature [Internet] 437:1299–1320. Available from: http://dx.doi.org/10.1038/nature04226
    6. McVean G, Spencer CCA, and Chaix R. 2005. Perspectives on human genetic variation from the HapMap Project. PLoS genetics 1:e54.
    7. Hawks J, Wang ET, Cochran G, Harpending HC, and Moyzis RK. 2007. Recent acceleration of human adaptive evolution. Proceedings of the National Academy of Sciences, U. S. A. [Internet] 104:20753–20758. Available from: http://dx.doi.org/10.1073/pnas.0707650104
    8. Durand EY, Patterson N, Reich D, and Slatkin M. 2011. Testing for ancient admixture between closely related populations. Molecular biology and evolution [Internet]. Available from: http://dx.doi.org/10.1093/molbev/msr048
    Synopsis: 
    We're quantifying the amount of Neandertal ancestry in whole genome data from living people.
  • The risk gradient

    Wed, 2011-11-09 23:58 -- John Hawks

    Ann Gibbons reports [1] from the International Congress of Human Genetics, on papers that examine GWAS risk alleles for type 2 diabetes: "Diabetes Genes Decline Out of Africa" (paywall).

    At the poster session, Stanford graduate student Erik Corona stood in front of a Google Earth map of the world that he finds surprising. On this map he had plotted the frequency of 12 gene variants known to be associated with type 2 diabetes in 51 populations from Australia to Zaire. It shows “a clear gradient of red to green from west to east, from Africa to Asia,” Corona says (see map). “Something strange is going on with type 2 diabetes.”

    This is of course a challenging problem because risk alleles identified in one population may not replicate in other populations. The most well-known example is ApoE4, strongly associated with Alzheimer's Disease in Europeans, but not in Africans. More generally, looking at a set of risk variants that are identified in one population introduces an ascertainment bias that constrains their likely frequencies in other populations. An allele is more likely to yield a statistically significant association with a trait if the allele is not too rare. If we take many alleles associated with a trait, we're likely to see some gradient across populations due to this bias alone.

    Hidden ascertainment bias is a problem we run up against quite a lot. It may not apply in this case, depending on where the risk alleles were identified, in particular since many risk alleles for type 2 diabetes appear to be linked to recent positive selection (explaining why I got interested).


    References

    1. Gibbons A. 2011. Diabetes Genes Decline Out of Africa. Science 334:583.
  • African Homo erectus

    Tue, 2011-11-08 00:14 -- John Hawks
    Synopsis: 
    African specimens from the Early Pleistocene are compared

    This station includes several casts of early fossil Homo erectus, from the Early Pleistocene of Africa. These include:

    • OH 9, from Olduvai Gorge, Tanzania, around 1.2 million years old.
    • KNM-ER 3733, from Ileret, Kenya, 1.65 million years old.
    • KNM-ER 3883, from Koobi Fora, Kenya, 1.6 million years old.
    • KNM-WT 15000, from Nariokotome, Kenya, 1.5 million years old.

    In addition to these specimens, the station has a few comparative casts from earlier hominid species and from other parts of the world.

    What to do: First, consider the issue of sexual dimorphism in these specimens. Which are male and which are female? What features lead you to that conclusion?

    Second, why do the differences between these specimens and Homo habilis (for example, KNM-ER 1813) reflect a species distinction rather than a difference between the sexes?

  • Announcing the Malapa Soft Tissue Project

    Sat, 2011-09-03 17:34 -- John Hawks

    I am pleased to announce a new open science initiative, focused on a discovery that is unique in paleoanthropology. Together we are going to find out if the Malapa site has preserved evidence of soft tissue from an ancient hominin species.

    If you've arrived at this page from outside the site, here's a link to the main project headquarters.

    In the August, 2011 issue, National Geographic reported on the Malapa fossils, including a teaser that the site may preserve skin from two hominin individuals. (I pointed to the article last month.)

    The suggestion is obviously surprising. Many readers will remember how much controversy surrounded claims about soft tissue preservation from dinosaurs several years ago. Yet extraordinary preservation contexts do exist in the fossil record. Indeed, a few years ago Lee Berger's team, including several of the people now working on the Malapa hominins, identified hair preserved inside hyena coprolites from Gladysvale cave, more than 200,000 years old and only a short distance from Malapa [1].

    Could Malapa present the first evidence of soft tissue from a fossil hominin? If so, what can it tell us about human evolution?

    The day the National Geographic article was published online, I was standing with Lee in his lab looking at what might be australopithecine skin. I'm not talking about an imprint of skin, like a skin cast. These appear to be thinly layered, possibly mineralized tissue.

    Suppose it's really skin, or some other soft tissue, I thought. How would you go about testing the hypothesis? Extraordinary claims require extraordinary evidence. Even if you could demonstrate it to your own satisfaction, what would it take to convince the doubters? How many distinct observations would be possible from these objects? What instruments would you use, and what comparative samples would you need?

    Lee said this was his problem as well. He has access to some of the most sophisticated technology in the world. Some kinds of observations are obvious. He can micro-CT the apparent soft tissue evidence, look within the rock at its structure. He can sample the chemical content, and use scanning and confocal microscopes to examine it. He could sacrifice a small sample to be microscopically dissected. At the end, he would have an answer involving all these comparisons. But would it be convincing?

    Lee then made an inspired proposal: What if the process itself were an experiment?

    Much of the criticism of other surprising fossil discoveries has been fueled by their secrecy. Science done by a closed process means fewer eyes looking at data, and too many chances for errors to pass unnoticed. Unnoticed, that is, until publication. Then, a firestorm of controversy may erupt as the scientific community at last examines the methods and results closely. In anthropology, the most critical errors are often missed comparisons -- sometimes simple things that a research team could have looked at, if they had only thought of it.

    An open process has the chance of improving research by broadening it. We want stronger, clearer results, and we want to anticipate every important criticism. If a significant comparison can be added by people who have the right tools, why not get those people involved? If we stand a chance of finding those people by making the process more open, why not do it?

    Lee suggested that this soft tissue evidence could be the basis of a true experiment in whether paleoanthropology could be done as open science. I've been agitating about open science for years, and I volunteered right away to host the experiment and work to make it a success. We went immediately to Rachelle Keeling, the graduate student who will be coordinating the project, and described how we thought it could work. She was enthusiastic about the idea of a truly new kind of scientific project, one that had the potential to involve so many people in the process of discovery.

    And so, after a month of putting things into order, here we are. How can you participate in the project, or at least follow its progress?

    I have set up a home page for the project, here as a special category page on the blog. This page is the online headquarters of the work, and includes a feed that will have all project updates. As the project proceeds, it will generate suggestions, results, and press. I'll be tracking all of these and updating as we learn more.

    The project has an official e-mail address hosted here: skin@johnhawks.net. We want to hear from anyone with the expertise or ideas to solve this problem. Rachelle and I will be reading through the e-mails, discussing them with other project members, and following up on them.

    We don't know what to expect, but I hope we get hundreds of responses. We can't promise a personal reply to everyone, but everyone will receive an automatic acknowledgement that we've received their messages, and we will follow up personally with those who have suggestions or proposals we can take action on. We're going to ask people to participate in the project, perform research, and coauthor the scientific work: this is real open science.

    Members of the Malapa team are biologists who know comparative skin and hair biology. I'll be posting quite a lot about these biological topics for people following the project.

    We know that there are many researchers who have been working with methods that would be useful on these unique samples of possible soft tissue. People working with the trace chemistry of organic compounds in mineral samples, people working with the microscopic structure of other ancient soft tissue samples, people who study preservation of organic materials in forensic contexts. There are many others that I don't even know I should be listing.

    If you know a person with the right expertise to help, please share this information and encourage her to write.

    Most important to the success of the project is showing that we can produce top quality science by this open process. That means we need journals to acknowledge the value of open science instead of penalizing it for not being secret and embargoed. If you're a journal editor reading this, I'm calling you out. And if you're a reviewer or editorial board member, you can support this project and encourage more like it by encouraging the submission of open manuscripts.

    And if you don't have a suggestion right now, keep watching. This project will develop and I expect it to become more interesting as it becomes broader. I can't predict how it will end, and that's pretty exciting!


    References

    Synopsis: 
    I announce and describe a project to study possible soft tissue evidence from a 2-million-year-old fossil hominin site.
  • The Malapa Soft Tissue Project FAQ

    Sat, 2011-09-03 17:07 -- John Hawks

    These are a few of the questions that I think are essential to understand our aims with the project and how we expect it will unfold. The future depends on what we hear from people with their ideas about how to analyze this unique evidence. I'll be updating this FAQ as we learn more about the samples. This is an open science project, and we'll be reporting on some results as they occur. But it all depends on people's participation.

    If you've arrived at this page from outside the site, here's a link to the main project headquarters.

    How did the project come about?

    When I was in South Africa in July, Lee Berger gave me an extraordinary overview of the discoveries from the new Malapa site. Embedded in the breccia that surrounded the cranial remains of MH1 and MH2 are some relatively small, thin layers that visually appear to be organic (relative to the surrounding matrix). Under a light microscope, they look like they could be mineralized or preserved soft tissue. They do not appear to be skin impressions within the matrix; they appear to be thin layers that are a different substance from the surrounding matrix.

    Naturally these are incredibly interesting. But it is not obvious what will be the best way to establish what they are, and what we can learn from them.

    Lee suggested that this would be an ideal test case to see if open science can help solve a problem in paleoanthropology. We want to reach the people with the best ideas and ability to test hypotheses about these objects, and we don't know in advance where the answers will come from. That's the nature of the project: finding the right people and making the science happen.

    What do we want people to do?

    We want the best suggestions about how to evaluate this unique evidence and how it can test hypotheses about human evolution. We're reading all the suggestions sent to skin@johnhawks.net.

    We're especially keen to make contact with people who have the ability to make their suggestions happen. Some people out there have the knowledge to apply highly specialized analytical methods to samples like this. We want people like that to get involved with this project.

    Some people out there may have comparative samples that will be key to interpreting this evidence. How can tissue be preserved in a context where breccia is forming? Was there natural mummification or some kind of anoxic environment? To answer those questions, we need people who study the response of tissue to those contexts and who know the right samples to examine.

    Berger's team working on the Malapa hominins has access to much of the best technology. Micro-CT, microscopy, virtual dissection, chemical analysis: any of these things and more can be brought to bear.

    There's a lot more to this project than simply verifying (or refuting) that this stuff is soft tissue evidence. We need to know how it formed. If it's not soft tissue, we want to identify what it is, because there will almost certainly be more of it as the site is excavated and more specimens are prepared. If it is soft tissue, we need to know how it may have been changed as it was preserved, whether through drying, soaking in anoxic conditions, mineralization, or some combination of processes.

    We think the process of finding this out is even more exciting than knowing the result. We hope many of you see it the same way.

    If you write to us, you can expect that we may make your suggestion part of the website. This is an open project, and while we will be posting selectively, we will be sharing information as it progresses.

    Why would somebody want to participate in an open science project like this?

    We want to do the science right. We hope many people out there share this goal. It's a tremendous chance for people who don't normally operate within paleoanthropology to help us discover something fundamentally new about our evolution.

    People who perform analyses or contribute samples as part of this project will be full participants in the science and coauthors of any resulting publications. We want people to work together on this, and we think the best science will result from bringing together the best ideas and comparisons.

    How will the project work?

    That depends on what great ideas we hear from people. Lee's team will be carrying out analyses on these samples.

    Rachelle Keeling is coordinating the study, doing the research on what should be done and what it will tell us about the samples. She and I will be reviewing the e-mails that the project receives, and will try to determine which approaches are feasible and in which order they should be carried out.

    As you send in ideas about what should be done, the more detail you can include about the analytical methods you can provide, the better. How much material (if any) does the method require? What hypotheses can the method test, or what information can it provide about the samples? How much time and preparation is required?

    If you have comparative samples that may be useful, what kinds of observations can you make on them? Can you point to references that have also used these samples?

    In other words, we want a bit of a plan if you can provide it. If you need more information from us to see if it's feasible, let us know -- we may be able to answer it, or have some team members carry out steps in advance.

    The project will be carried out over the next year, so the sooner we hear from you, the better!

    What is the Malapa site?

    Malapa is a cave site outside Johannesburg, South Africa, in the area where many other sites preserving remains of early hominins have been found. I have a Malapa page that gives a short introduction and links to many stories here about the fossils found at the site. I visited the site in July, 2011, and posted a narrative of the visit ("A visit to Malapa") that gives a good overview and several photos of the general area.

    Two of the most complete hominin skeletons ever described, both dating to 2 million years ago, have been discovered and described at the site. The site additionally includes further fossil materials that are still undergoing preparation and study. It is one of the most important fossil discoveries ever made in paleoanthropology, and will continue to produce new evidence about our origins for many years to come.

    How was the possible soft tissue evidence discovered?

    So far, the team at Wits has been working on breccia blocks recovered from the surface at Malapa. There has been no excavation yet at the site. The possible soft tissue evidence was discovered during the course of scanning and preparing these breccia blocks.

    The blocks are packed with bones. Many recognizable bones jut from the surfaces of the breccia, from antelopes, carnivores, small baboons and hominins. In several cases, hominin bones were recognizable at the surface, and these blocks were CT-scanned very early in the process of study and preparation. Scanning gives the preparators knowledge of what lies beneath their drill bits. In some cases, the best course of action is to leave the bones embedded within the breccia matrix, for further study by micro-CT.

    CT scan of Malapa MH1 cranium

    Initial CT scan of the MH1 cranium embedded in matrix block.

    In the initial CT-scanning of the MH1 cranium, team members noticed an area where the matrix surrounding the skull appeared irregular. As they prepared this out, it became clear that the breccia itself had pulled away from the cranium across a small region, and the breccia had a thin layer of material at its surface there. This is not the outer table of the bone (which is intact in the corresponding area), nor is it apparently an impression of the bone.

    Malapa MH1 breccia block with possible soft tissue

    Photo of breccia block including MH1 cervical vertebra (top). The smooth area, center, is a thin layer of candidate soft tissue on the surface of the breccia.

    An additional section of possible soft tissue emerged as the female MH2 mandible was prepared.

    Upon magnification, these pieces do appear to have a structure. As yet, no dissection or further sampling has been attempted. The team has no committed opinion about what these represent or how they were formed, other than that they do not appear to be simple impressions in the surface of the breccia. Disproving that they represent soft tissue may be just as interesting as demonstrating it, because either way we will discover important facts about the preservation and formation processes of this unique site.

    How could soft tissue possibly be preserved from 2 million years ago?

    Like other South African cave sites, the Malapa fossil hominins were preserved within a breccia, a cemented stone material packed with fossils, rock fragments, and other material. The Malapa breccia represents a remarkable snapshot of time, when hominins and other animals fell into a "death trap" and their complete skeletons were preserved.

    It is clear that Malapa preserves an extraordinary density of hominin remains, with nearly complete skeletons and articulated parts. These skeletons do not appear to have been disturbed after the bodies entered the site. Some plant and insect remains are preserved in the breccia as well.

    Beyond this, any explanation so far is speculative. If there was water in the site, which seems likely, it may have included an anoxic layer that preserved some of this material. A major goal of the project will be testing different hypotheses about the preservation environment of these fossils, to try to explain what these substances may be.

    Are you telling us everything?

    :)

    Synopsis: 
    The Malapa Soft Tissue Project is an experiment in open science, trying to uncover new facts about a unique discovery.
  • Digging deeper into the earliest Acheulean

    Thu, 2011-09-01 01:00 -- John Hawks

    I've been ranting on Twitter all day about the new paper on the "earliest Acheulean" by Christopher Lepre and colleagues [1], published in Nature today. The first time I read through the paper, I really thought they'd muffed it. I mean, really, they published a paper on the earliest Acheulean artifacts without putting a picture of them in the paper.

    What actually bothered me more was the lack of any discussion at all about why the assemblage is Acheulean as opposed to, say, Developed Oldowan. The word Oldowan appears only in the context of saying that many localities within the same Kokiselei site complex have only Oldowan-typical assemblages. This started bothering me less as I ran through the citations to earlier work on the Kokiselei localities. But that raised another point of irritation: This Acheulean locality was briefly described already, a long time ago. Why is this news? And given that both descriptions are so superficial, where's the fuller account?

    I had to stop and think about why I was finding this all so irritating. I mean, it's a paper about dating an archaeological locality. It's a perfectly good paper about dating an archaeological locality, full of details about the local geology, methods of sampling and analysis. My reactions weren't a criticism of the paper, really -- although if you're going to write a high-profile paper about your site, maybe you should actually feature the archaeology of the site?

    I've been digging through references all afternoon, trying to get straight exactly why this paper doesn't mention the Developed Oldowan at all. I'm not saying I favor the Developed Oldowan -- just that we deserve some kind of thoughtful review of what constitutes an "earliest Acheulean" site. Is it a purely typological definition based on the presence of bifaces made on large flakes, or is there something more here? That's going to take me a bit longer to review, so I'll just report on some of what I found.

    This isn't news. Hélène Roche and colleagues reported on this locality in 2003, in Comptes Rendus [2], including a date range between 1.79 and 1.65 million years ago. They describe it as "without doubt, one of the oldest Acheulean assemblages in Africa." That's right, if you can read French, you're eight years ahead of Nature.

    This paper adds precision to the earlier estimate, and it's really important to do this well. But if you've been reading about the archaeology of Plio-Pleistocene Africa, finding a date of 1.76 million years for this locality with an Acheulean assemblage is totally expected.

    Roche and colleagues [2] provided only a short description of the KS4 assemblage. Even so, it's more than provided in the current paper by Lepre and colleagues [1]. Here is what the current paper includes about the assemblage:

    The KS4 assemblage (Supplementary Fig. 2) is characterized by the presence of pick-like tools with a trihedral or quadrangular section, unifacially or bifacially shaped crude hand-axes, and a few cores and flakes, all derived from the same mudstone bed. A single subsurface, in situ origin for KS4 is ensured by excavations at the main test trench that recovered several spectacular sets of refitted lithic artefacts (Supplementary Fig. 3). To the exception of a few cores made on basalt, the rest of the assemblage has been knapped from large cobbles or tabular clasts of locally available aphiric phonolite.

    The supplementary information does include photos of three bifacial artifacts and two refits. But there is no technical analysis of the artifacts beyond the paragraph above. There's not even a summary of the number of artifacts found at the site.

    Roche and colleagues added more details (my translation of the French):

    Kokiselei 4 is a highly eroded site in which a series of more or less extensive trenches (total 19 m2) were dug. Among these only one (KS4A) yielded in situ artifacts in sufficient numbers to form an archaeological horizon, with a vertical dispersion limited to only fifteen centimeters, and no faunal remains. Some objects, distributed in a more diffuse fashion, were found in two other test pits (KS4B and KS4C); these are lower in elevation than the main horizon. In parallel to the test pits, a systematic surface collection across 104 m2 (metric grid) was performed, which comprises the total sample of lithic material from KS4 (n = 167). It is characterized by robust, rough pieces of varying sizes, often very large, some scrapers and notches made on cobbles or flakes, by very large cores, by proto-bifaces or bifaces, and by picks with a trihedral section. Two thirds of the proto-bifaces or bifaces are manufactured on oblong pebbles, relatively flat, some quite large, whole or broken into two in the middle according to the major axis and very few retouched. Only a few are free of cortex and / or shaped enough to be called bifaces, the proto-bifaces in turn are made more coarsely, as if the concept of an elongated shape and sharp point was well integrated, but the operating scheme was inadequately implemented. All the tools characterizing a very early Acheulian are present, and it is to this cultural period that we attribute KS4.

    Roche and colleagues also described the other localities, all Oldowan, at a similar superficial level of detail. The conclusion that Acheulean and Oldowan were two industries overlapping at the same time in this area was suggested in that paper.

    That, obviously, leads to the real scientific story here. How could there be two different stone tool traditions overlapping across some fairly large area for more than 300,000 years? If we count Developed Oldowan, that makes three. Some people would count two Developed Oldowans, A and B!

    I'm inclined to think that the scenario is false. These really aren't distinct cultural traditions. Archaeologists have created definitions of archaeological assemblages, and the definitions have changed over time. Initially the definitions were entirely typological -- you have a handaxe, you've got Acheulean. Over time, the definitions have become less typological and more inclusive of technical elements -- you make bifacial artifacts on very large flakes, you've got Acheulean. But these technical categories are not unique or necessarily difficult to invent, and may have been repeatedly invented in different groups, just in the way that different groups of chimpanzees have invented nutcracking and termite fishing methods. For these early assemblages, we don't have any way of telling who made what -- the only hominin fossils from Kokiselei, for example, are teeth of A. boisei. We don't know how many different kinds of hominins there were. Maybe there was only one.

    Early Homo is a bundle of mysteries, in other words, and the archaeology doesn't help. Can we make any sense of the development of early stone tool technology, from its initial beginnings to the handaxe-dominated assemblages? What does it mean that both Oldowan-like and Acheulean-like industries dispersed widely throughout the Old World? This is a really interesting scientific problem, involving information transfer, emergent sets of behaviors, invention and creativity, and their effects on survival.

    The paper by Lepre and colleagues discusses the problem of Oldowan and Acheulean coexistence briefly, reviewing the idea that Homo erectus may be tied to Acheulean, leaving open the question of whether more than one toolmaking species existed before 1.5 million years ago. The paper is noncommittal, but I would frame the question very differently. It's self-evident that Acheulean cannot have been a culture, because no human or animal culture exhibits its spatial and temporal properties -- appearing episodically across three continents over a span of 1.5 million years. The real question is whether we can make sense of the many different Acheuleans, and whether other Oldowans (possibly Developed Oldowans) might have similar heterogeneity. Asking whether an Oldowan-bearing population in Africa first dispersed to Dmanisi is begging the question.

    Finding these answers is surely a lot more interesting than what the press has done with this article.

    That's probably what irritates me the most about this: how boring the article and reporting seem to make this topic. When I did the Google News search this afternoon, there were no fewer than 165 news articles worldwide. Nature made its cover image this week a photo of one of the bifaces. You can't get much more of a press push than that for an archaeology story. None of the stories go beyond the very simple "oldest Acheulean" story. Now, I'm used to seeing the "oldest X" storyline a lot in paleoanthropology; it's a perennial favorite of journalists who can't think of anything more interesting to write. But in this case, it's the worst angle -- because it's the part that isn't actually news!


    References

    Synopsis: 
    A paper reports on the earliest evidence of the Acheulean, but misses the key story.
  • Ugandapithecus skull found

    Fri, 2011-08-19 08:30 -- John Hawks

    A brief report earlier this month from Agence France-Presse describes a new discovery of Ugandapithecus, worked on by Brigitte Senut and Martin Pickford: "20-million-year-old ape skull unearthed in Uganda".

    "This is the first time that the complete skull of an ape of this age has been found ... it is a highly important fossil and it will certainly put Uganda on the map in terms of the scientific world," Martin Pickford, a paleontologist from the College de France in Paris, told journalists in Kampala.

    Ugandapithecus is a large Early Miocene ape, probably related to Proconsul. A 2009 paper by Pickford and colleagues [1] (open access) does a nice job of showing the anatomy with photographs and describing how the different samples of Ugandapithecus, some of which represent different species, differ from Proconsul. It will be very interesting to see how the new skull adds to the record of this ape genus.


    References

    1. Pickford M, Senut B, Gommery D, and Musiime E. 2009. Distinctiveness of Ugandapithecus from Proconsul. Estudios Geológicos 65:183 - 241.
  • Views from Rhino Cave, Tsodilo Hills, Botswana

    Tue, 2011-05-31 03:55 -- John Hawks

    Sheila Coulson, Sigrid Staurset and Nick Walker [1] (doi isn't working yet, so here's a PDF link, 12 MB) have published a long summary of findings from Rhino Cave, in the Tsodilo Hills of northern Botswana. These hills are huge isolated rock formations, or inselbergs, that jut out of relatively flat surrounding countryside. That makes them highly visible in the folklore and traditions of local people, and some caves in them have been used by people for tens of thousands of years.

    Coulson and colleagues describe the setting of Rhino Cave, named for a rock painting within it.

    It is...easy to understand how the site evaded detection, as it is perched high on the northernmost ridge of Female Hill and can only be approached by scrambling over, or squeezing between, massive boulders. Gaining entry to the cave is only slightly less arduous. On the western side of the ridge there is a raised, narrow, crawl space that ends with a considerable drop into the site. Alternatively, the wider eastern entrance offers two options: a two-meter jump or a slide down a steep boulder face, followed by a scramble over a rock-strewn opening near the present day floor.

    I love these kinds of sites where you know that every lithic was brought in by people. That can tell you a lot about how people used the site, and the authors use that to advantage. But Coulson and colleagues do not yet have new dates for the deposits. The old dates appear too recent and are problematic because of their mismatch with other local MSA sites such as White Paintings Shelter and ≠Gi, both between 66,000 and 95,000 years old. The Rhino Cave assemblage may be comparable to these in age. The paper reports that substantial amounts of exotic stone materials including silcrete and chalcedony must come from more than 50 or 100 km away, respectively.

    MSA exotic flakes from Rhino Cave, figure 5 from Coulson et al. 2011

    Some of the flakes made from exotic raw materials, Figure 5 from the paper. All photos in the article copyrighted and used courtesy of Sheila Coulson.

    There is a lot to the lithic collection besides the points, but I think these are interesting and certainly visually striking.

    MSA points from Rhino Cave, figure 4 from Coulson et al. 2011

    Coulson and colleagues noticed that many of the points were burned, and this was not easily explained by the incidental presence of fires in the cave, nor did it particularly appear to be explained in terms of the "heat-treating" that people used to make silcrete more suitable for artifact production at some other MSA sites.

    In summary, there is a very distinctive pattern of burning at this site. A group of 26 MSA points and their associated debitage are heat damaged to the point of destruction. However, they have not been exposed to long-term burning of the type that is commonly found when an artifact is discarded into a hearth -- a common feature on any number of Stone Age sites. It is suggested that these MSA points and their associated manufacturing debitage were selectively and intentionally burnt in short-term restricted fires that caused their coloring to change to various reddish hues.

    This is part of what Coulson and colleagues tentatively call evidence of symbolic or ritual behavior at the site. People climbed up to this out-of-the-way place with colorful stone from far away. The manufacturing debris shows that they made stone points in the cave. And they then left many of those points in the cave, some of them burned and destroyed or abandoned. At a minimum, it's curious. Adding everything together (including the cupules discussed below), it seems clear that the site was not used for purely utilitarian purposes. What that means about ancient social or cognitive systems is not obvious.

    The article is open access, and full of amazing full-size color photos. I don't know why everyone doesn't publish their sites this way. For example, here's a photo of the site at night, where Coulson and colleagues experimented with flickering light against the carved wall:

    Rhino Cave, night-time flicker light, figure 17 from Coulson et al. 2011

    This stone face, which has been pecked at and scooped out for many thousands of years, is the most distinctive aspect of the site. Coulson and colleagues believe that some of the existing marks reflect very great antiquity, and they have natural spalls of the rock face that broke off in MSA times and integrated themselves into MSA layers. Some (maybe most) of the cupules are recent, and the pictorial art inside the cave is also late. But at least some of the surface carving appears to have been MSA in age.

    Rhino Cave, cupules in rock wall, figure 12 from Coulson et al. 2011

    Cupules in rock face, detail.

    The paper discusses some evidence for pigment grinding at the site, including smooth-edged pieces of specularite and several small striated sandstone slabs (say that fast five times) presumably used as grindstones. Color goes together with the burning (to enhance color?), but this combination is not found elsewhere. Rhino Cave is in that way unique.

    They indicate that the rock face is exposed to flickering daylight through a shaft at certain times, which they attempted to simulate with the flickering light photograph. Really I can't think of any better way to give readers an impression of what it would be like to visit the site.


    References

    Synopsis: 
    Summary of Sheila Coulson and colleagues' richly illustrated work on this MSA-era site
  • Agriculture, population expansion and mtDNA variation

    Mon, 2011-05-23 11:50 -- John Hawks

    Earlier this spring, I wrote about a paper by Brenna Henn and colleagues that presented new data on SNP variation in recent African hunter-gatherer populations [1] ("Population structure within Africa: has 'modern human origins' become a non sequitur?").

    Another paper that came out this spring from the same research group is also very interesting. Christopher Gignoux, Henn and Joanna Mountain [2] examined the evidence for Holocene population growth in Europe, Africa and Southeast Asia, from within-haplogroup variability of mtDNA haplogroups. The idea is that earlier samples were not finely resolved enough to examine events of the last few thousand years, either because they included only small sequences (e.g., control region) with limited variation, or because they included whole mtDNA genomes with too few individuals to look at within-haplogroup coalescents. So here they add more individuals. It is still a small sample (425 total), so I expect that we will see larger ones in the next few years.

    The results are nonetheless useful because they provide some nice matches for the archaeology of early agriculture. For example, in Africa:

    We find two periods of population expansion within our sample of lineages originating during the Holocene in western Africa. Although the majority of coalescent events occur during the Holocene, a number of lineages from this sample also coalesce during the Upper Paleolithic. The earliest growth begins at ≈38,000 ya (CI: 33,500–45,000 ya) (Table 1 and Fig. S1) and the second period begins at ≈4,600 ya (CI: 3,000–10,000 ya) (Table 1 and Fig. 1B). The correspondence between the timing of genetic evidence for a sharp increase in population size at 4,600 ya in our Holocene sample of sub-Saharan Africans and the archaeological evidence for origins of agriculture in western Africa is quite close (Fig. 1B and Table 1). In contrast, our southern African Upper Paleolithic sample representative of hunter-gatherers shows no growth over the past 20,000 y. We suggest Bantu-speaking farmers and other pastoralist groups migrated throughout southern Africa 2,000 ya (27) without impacting southern African mtDNA lineages (Fig. 1B).

    We can't really understand the pattern of genetic variation within Africa without understanding when the population grew. In Africa, Middle Stone Age genetic variation must have been more extensive than that in other regions of the world. But the survival of that MSA variation to the present day depends on the demography of populations over the past 50,000 years. In a growing population, fewer lineages will be lost by random genetic drift. So if Gignoux, Henn and Mountain are right about the growth of West African populations by 35,000 years ago, we might expect that region to preserve some extensive variation from MSA times. That might explain why that population preserves very deep Y chromosome lineages [3]. Regarding only mtDNA, one might conclude that a historical paucity of migration between hunter-gatherer and agricultural groups would be the most important reason why MSA variation remains in the present-day African population. This has been the explanation for survival of deep mtDNA lineages in southern Africa, for example. The Y chromosome result and the current paper remind us that population growth can also preserve variation from earlier time periods.
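
    The link between population growth and the retention of old lineages can be illustrated with a toy simulation. The sketch below is not the method used by Gignoux and colleagues; it is a minimal neutral Wright-Fisher model of a haploid, mtDNA-like locus, and the founder count, growth rate, number of generations and function names are all arbitrary assumptions chosen for illustration.

```python
import random

def surviving_lineages(n0, growth, generations, rng):
    """Count founder lineages left after neutral Wright-Fisher reproduction."""
    # Each founder starts as its own lineage, labeled 0..n0-1.
    population = list(range(n0))
    size = float(n0)
    for _ in range(generations):
        size *= growth  # growth = 1.0 keeps the population size constant
        # Every individual in the next generation inherits the lineage
        # label of a parent drawn at random from the current generation.
        population = rng.choices(population, k=max(1, round(size)))
    return len(set(population))

def mean_surviving(n0, growth, generations, replicates=20, seed=1):
    """Average the surviving-lineage count over independent replicates."""
    rng = random.Random(seed)
    counts = [surviving_lineages(n0, growth, generations, rng)
              for _ in range(replicates)]
    return sum(counts) / replicates

# Hypothetical parameters: 100 founders followed for 40 generations.
constant = mean_surviving(100, 1.0, 40)   # constant population size
growing = mean_surviving(100, 1.1, 40)    # 10% growth per generation
print(f"constant size: {constant:.1f} lineages; growing: {growing:.1f} lineages")
```

    Under these assumptions the growing population should retain noticeably more of the founding lineages than the constant one, because most of the loss happens while the population is still small. Real populations add migration, selection and structure, none of which this sketch models.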

    I think this proposal of African population history matches very well the model that we assumed in our acceleration paper [4], which we based on the archaeological record. We suggested early population growth in Africa by 35,000 years ago followed by an agricultural expansion after 5000 years ago. The evidence for relatively late agricultural intensification, within the last 4000-5000 years in sub-Saharan Africa, is very clear archaeologically. Less clear: How big was the earlier, pre-agricultural human population? The LSA might correspond to a demographic intensification, generally after 45,000 years ago. Genetics has certainly seemed to support such a view, and we found it consistent with the evidence that positive selection had increased in rate much earlier in Africa than in other regions. Still, the more detailed study by Gignoux and colleagues helps to clarify this picture.

    The results also show agricultural population growth to have been late in Southeast Asia.

    Direct archaeological evidence for rice agriculture in southeastern Asia dates to only ≈4,400 ya in Thailand (28). Agriculture spread throughout Island Southeast Asia, with evidence of rice in Taiwan again dating to ≈4,400 ya. Our Southeastern Asian Holocene population size curve indicates expansion beginning ≈4,700 ya (CI: 3,000–5,700 ya) (Fig. 1C and Table 1).

    Again, useful. I think we need to exert some effort making sure that the initial dispersal of people into South/Southeast Asia can be differentiated from the post-agricultural history. But assuming that Gignoux and colleagues are correct, it makes sense in an overall picture of slowly adapting early crops to tropical climate regimes, or replacing early domesticates with different ones in those areas.

    I am less sanguine about their results for Europe. They show a gradual period of growth associated in time with the Younger Dryas (around 12,000 years ago), which could make sense in the archaeology. But I am not convinced that the "European" haplogroups here are really European to that time depth. We know that the Neolithic and post-Neolithic saw some large-scale shifts in the frequencies of mtDNA haplogroups in Central and Western Europe. Some Upper Paleolithic Europeans probably contributed mtDNA to this later population, but I have no confidence that the proportion was great enough to accurately infer the demography of that pre-Neolithic population. (This is also a problem with the current paper in Current Anthropology by Peter Rowley-Conwy. I'll discuss this sometime soon.)

    The next frontier in reconstructing the population history of Europe will be ancient DNA. A good sample of Neolithic and pre-Neolithic whole mtDNA genomes would settle this question and allow inferences about the kind of demographic recovery Europe underwent after the Last Glacial Maximum.

    An open question is to what extent the other populations have similar problems. The European population of today reflects West Asian population dynamics 10,000 years ago. The East African population today reflects West African population dynamics from before the Bantu expansion, possibly to a similar extent. The population of Southeast Asia reflects the population dynamics of early rice agriculturalists in South China. And so on.

    Adding large-scale migration and partial population replacement to this kind of demographic analysis is not easy, but it will be essential if we want a better picture of how agriculture affected human populations. Considering these problems, I think it's easy to see why I started working on Holocene population dynamics. Evidence about Late Pleistocene populations, like MSA Africans and Neandertals, still lies within our genomes. But we see it through a lens. Holocene population dynamics -- movements and population growth -- distort that lens. If we don't account for those Holocene dynamics, we will conclude wrongly about the earlier dynamics.

    I like this a lot, because this is what anthropology is really good for. We can bring a lot of archaeological and historical knowledge to bear on the question of post-agricultural population dynamics. But it's a deep, deep field with a lot of specialized literature.


    References

    Synopsis: 
    A study of mtDNA variation attempts to find the times and magnitudes of population expansions in early agriculturalists.
  • Population structure within Africa: has "modern human origins" become a non sequitur?

    Tue, 2011-03-15 16:33 -- John Hawks

    When I wrote about the Denisova genome late last year, I claimed that "A large-scale reorganization of the science of human origins is upon us."

    I'm glad I had the sense to write that. A lot of people have pointed back to that quote over the last few months. Still, I know that the full implications of the Denisova and Neandertal genomes haven't really sunk in. "Large-scale reorganization" takes time.

    A new paper by Brenna Henn and colleagues in PNAS [1] shows how the shifting landscape has caught many geneticists off their footing. Submitted before the Denisova genome, but long after the Neandertal, the paper is titled, "Hunter-gatherer genomic diversity suggests a southern African origin for modern humans". In today's landscape, with only one instance of the word "Neanderthal" in the paper, the conclusions are obviously incomplete.

    The "southern African origins" conclusion of the paper comes out of a simple analysis that assumes that the best-fit maximum for genetic diversity (as assessed by linkage) is the most likely point of origin of the population. That would be true if the African population emerged by a series of founder effects from a single small ancestral population -- the "serial founder effect" model that I have criticized here before. But of course in 2011, we know that model is false, because it is predicated on a lack of ancient mixture with Neandertals or other populations. If the serial founder model can't work outside Africa, it certainly can't work inside Africa, where populations were larger and regionally diversified by the beginning of the Late Pleistocene. Without that false assumption, the "southern African origin" evaporates. The primary observation, a cline of linkage disequilibrium within sub-Saharan Africa, can be explained with reference to mixture of populations without assuming an origin and expansion from one geographic location.

    I don't want to criticize overmuch. Many ongoing research projects are casualties of our new knowledge of ancient genomics, and we'll see more papers like this before the fallout has settled. Simplistic founder models, acceptable only a year ago when these projects were conceived, are now unquestionably false. Ancient population mixture is the order of the day, and we don't have any simple, plug-in-the-data models to apply to data like these.

    Instead, I want to consider the power of the data in this article to answer some fundamental questions about African population history. Henn and colleagues report on SNP genotyping of several Bushman groups from southern Africa and Sandawe and Hadza people from eastern Africa. These data are on the 550k SNP platform that was used by 23andMe before the recent increase to 1M SNPs. That means the data are comparable to many other studies. They are not entirely comparable with other samples of African genetic variation, and the authors cut the total number of SNPs down to the 55,000 that overlap among all the genotyping platforms used in their analysis. For this reason, the paper presents a genome-wide set of 55,000 SNPs across many African populations.
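
    To make the overlap step concrete, here is a minimal sketch of reducing several genotyping platforms to the SNPs they all share. It is not the authors' pipeline; the platform names and rsIDs are invented for illustration, and a real analysis would also have to reconcile strand and allele coding.

```python
def shared_snps(platforms):
    """Return the sorted SNP IDs present on every genotyping platform.

    `platforms` maps a platform name to an iterable of SNP IDs.
    """
    id_sets = [set(ids) for ids in platforms.values()]
    return sorted(set.intersection(*id_sets))


# Hypothetical platforms and rsIDs, invented for this example only.
platforms = {
    "array_550k": ["rs1", "rs2", "rs3", "rs4", "rs5"],
    "array_other": ["rs2", "rs3", "rs5", "rs9"],
    "array_third": ["rs3", "rs5", "rs2", "rs7"],
}

common = shared_snps(platforms)
print(common)  # ['rs2', 'rs3', 'rs5']
```

    Every platform added to the comparison can only shrink the shared set, which is how a 550k array ends up contributing a much smaller number of SNPs to the combined analysis.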

    It's far from the perfect sample. I expect we'll be able to do much more with the full 550k dataset from the hunter-gatherer populations. The data have been made publicly available for download, and here we're already starting to investigate them.

    Within the current paper there is a very useful analysis of the broader dataset using the ADMIXTURE software. ADMIXTURE assumes that the current samples represent a mixture of ancient populations that were more distinct than today's. I went through this algorithm with my students in class Wednesday and Friday, which I'm sure was an intimidating process for most of them. The math is not too daunting; it's just hard to picture how all the possible interactions relate to gene frequencies when you are assuming more than a few putative ancestral populations. Razib Khan gives an impressive step-by-step guide to performing an ADMIXTURE analysis, including some of these samples.
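
    For readers who want to see the core idea in code, here is a minimal sketch of the kind of binomial likelihood that admixture models of this type work with. It does not reproduce the ADMIXTURE program or its optimizer; it only scores a given set of ancestry fractions against toy genotypes. The variable names (`Q`, `F`, `genotypes`) and all the numbers are my own assumptions for illustration.

```python
import math


def admixture_log_likelihood(genotypes, Q, F):
    """Log-likelihood of genotypes under a simple admixture model.

    genotypes[i][j]: copies (0, 1 or 2) of a chosen allele at SNP j in individual i.
    Q[i][k]: assumed fraction of individual i's ancestry from ancestral population k.
    F[k][j]: assumed frequency of that allele at SNP j in ancestral population k.
    Constant binomial terms are dropped, and every mixed frequency must lie
    strictly between 0 and 1.
    """
    total = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            # Expected allele frequency for this individual: a weighted
            # average of the ancestral population frequencies.
            p = sum(Q[i][k] * F[k][j] for k in range(len(F)))
            total += g * math.log(p) + (2 - g) * math.log(1 - p)
    return total


# Toy data: two ancestral populations, three SNPs, three individuals.
F = [[0.9, 0.8, 0.1],
     [0.1, 0.2, 0.9]]
genotypes = [[2, 2, 0],
             [0, 0, 2],
             [1, 1, 1]]

Q_good = [[0.95, 0.05], [0.05, 0.95], [0.5, 0.5]]  # matches the genotypes
Q_flat = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]      # everyone half-and-half

print(admixture_log_likelihood(genotypes, Q_good, F))
print(admixture_log_likelihood(genotypes, Q_flat, F))
```

    The ancestry fractions that match the toy genotypes score higher than the flat ones. A real analysis has to estimate both the fractions and the ancestral frequencies at once, and with more than a few ancestral populations that is where the intuition gets difficult.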

    I'm not in love with this analytical method -- there's no reality check on its assumptions. But its output can be informative about many aspects of population structure. Here are some first approximations:

    1. The genetic diversification of African populations was once much greater than today. Razib Khan points out the homogenizing effect that agricultural populations have had on the African continent, particularly during and after the Bantu expansion. I think the current data suggest that earlier processes involving LSA hunter-gatherers also tended to homogenize populations.

    For example, when eight initial clusters are assumed, the ADMIXTURE analysis constructs them in such a way that most of the ancestors of today's Bushmen were in a population with a high degree of genetic divergence from the other seven ancestral populations. The FST between the Bushman ancestral population and others ranges from 0.1 (for forest pygmies) to a high of 0.25 (for Europeans). That estimate is nearly double the equivalent statistic in today's populations.
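
    As a reminder of what an FST value measures, here is a minimal sketch using a textbook heterozygosity-based definition, FST = (HT - HS) / HT, summed across SNPs for two populations. I am not claiming this is the estimator used in the paper or by ADMIXTURE; the allele frequencies are invented for illustration.

```python
def fst(freqs_a, freqs_b):
    """Heterozygosity-based FST between two populations.

    freqs_a[j], freqs_b[j]: frequency of the same allele at SNP j in each
    population. Uses a ratio of sums across SNPs: sum(HT - HS) / sum(HT).
    """
    numerator = 0.0
    denominator = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        h_total = 2 * p_bar * (1 - p_bar)                      # pooled heterozygosity
        h_within = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2  # mean within-population
        numerator += h_total - h_within
        denominator += h_total
    if denominator == 0:
        raise ValueError("no variable SNPs in the pooled sample")
    return numerator / denominator


# Invented frequencies: identical populations versus fixed differences.
print(fst([0.3, 0.6], [0.3, 0.6]))
print(fst([0.0, 1.0], [1.0, 0.0]))
print(fst([0.2, 0.5, 0.7], [0.6, 0.3, 0.9]))
```

    Identical frequencies give zero and fixed differences give one, so values like 0.1 or 0.25 describe populations that share most variation but differ noticeably in allele frequencies.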

    Again, we don't have to believe the assumptions underlying the ADMIXTURE algorithm, but it does highlight the basic partitioning of diversity in the African population. Today there is high diversity within African population samples, and some of that diversity can be traced back to populations of 100,000 years ago or more. Some of the diversity that once existed among these populations has now been spread within them instead. The populations got genetically closer over time.

    A model of successive population expansions, bringing ancient populations genetically closer and closer together, is also what we may see in other places. As we have learned more about the mtDNA of ancient Europeans, it has become clear that successive expansions and migrations of people into Europe have radically reshaped the gene pool.

    2. Click languages have no genealogical unity. Over the years, many linguists and anthropologists have proposed that Hadza, Sandawe, and Bushmen are closely related to each other, despite their geographic distance, because they all speak languages that use click sounds. No historical linguist has ever successfully demonstrated a system of sound changes or detailed correspondences among these languages, but people promoting the hypothesis seem immune to these kinds of facts.

    The genetics show a very clear and ancient differentiation of these hunter-gatherer peoples. In the ADMIXTURE analysis, some of the largest genetic distances are among these peoples. By itself, that may not be surprising; these are the populations that have most evaded the homogenization that followed the spread of farming. The Hadza themselves are strikingly distinctive, and their genetics may reflect a history of small population size during the last several hundred years. The potential for genetic drift in this population was very high. Still, the genetic relations are just the opposite of what would be expected if speakers of these click languages had shared a common origin.

    Seems to me that this could have been the lede of the paper, if it had been written differently. A bit more exploration of the hunter-gatherer data (probably incorporating some haplotype-level analysis to give a better estimate of the ages of events) would demonstrate this point very well.

    3. By the time we find "modern" humans in West Asia, the African population had long since diversified into regional populations. This is not news; the mtDNA evidence has suggested for several years that southern Africa and the remainder of sub-Saharan Africa were already regionally differentiated before 120,000 years ago. There have also been hints of this diversification from whole-genome evidence (including the supplement of the Neandertal genome paper last year). Here we have a clear indication that the regionality extends to every African hunter-gatherer population.

    4. Hunter-gatherers have relatively little evidence for recent positive selection. The supplementary data of the current paper includes a short discussion of selection and a list of candidate loci in the hunter-gatherer samples. There is relatively little overlap in candidate regions for selection among these samples. Different genes have been selected in different populations, and not all that many of them. This is not surprising if the selection is relatively new -- the last 20,000 years or maybe more, given the distances and amount of historical population structure estimated for the data. It's also consistent with the demography of these populations. It will be interesting to check, but I would speculate that the signature of selection will on average appear older in these samples than in populations that have historically been agriculturalists.

    5. Where's the Aterian? North Africa is relatively depauperate in variation in the large combined dataset. That may stem mostly from Holocene events, including the spread of West Asian populations across North Africa. But the low variation there doesn't readily fit the idea that an out-of-Africa dispersal of genes came from a North African source. I don't think the observations in the paper (centered around linkage disequilibrium with a very low SNP count) are enough to settle anything about this question, but I'd be nervous if I were busy trying to make the Aterian seem important to the modern human origins issue.

    Bottom line

    As interesting as these assertions look, I don't think that a lot of African prehistory is about to be rewritten. Obviously, geneticists need to get serious about reading some African archaeology. We already know that African regional populations were large and diverse during the Middle Stone Age, and that's a very good fit to the kind of genetic diversity we are seeing in these samples.

    The barrier is Holocene population history. Agricultural populations grew, spread, mixed with and absorbed hunter-gatherers, and what we have left are the shattered remnants of ancient African population structure. Linkage may be the most powerful way we have to consider historical hypotheses using these SNP data, but if we're going to rely on it we have to control for recent demography and selection.

    And of course, it will be interesting to see a model that can integrate both Neandertal-African and within-African population histories. I don't really have a bang-up finish for this post, because there is immediately more work to be done with these data.


    References
