JREF: Taxonomy as a Rigorous Science

Someone in another thread made comments about the lack of rigor in paleontology, specifically with regard to taxonomy. I made a brief post in that thread, and I'd like to continue the discussion here without derailing it.

If I may jump in where I more or less left off:

Scientists have an almost pathological aversion to the subjective, and taxonomists are no different. They have established several methods for ensuring objectivity in their analysis.

First and most important is the type specimen. This is a specimen (or, more rarely, a series of specimens) that defines the species in question. The concept is a bit Platonic/"shadows on a cave wall" for my taste, but even I acknowledge that the function it serves is critical: it gives a universal and almost always unvarying starting point for species determination. Everyone uses the same type specimens for each species; you can't NOT use the same type specimens. These specimens are carefully maintained in very high-quality collections. The existence of type specimens does not, I hasten to emphasize, mean that a species concept is immutable once established; any time spent reading the taxonomic literature will turn up innumerable discussions that amount to "This species should be redefined" (particularly now, with the whole ceratopsian thing going on). Type specimens make the discussion objective because they give us all a starting point, and the same starting point.

One of the great tragedies of World War II was the bombing of so many museums. I'm not trying to diminish the astounding loss of life; I'm just saying that a tremendous amount of scientific data was irrevocably lost. Unfortunately, that loss was what it took to spur people into routinely including photographs with their species descriptions. Sadly, many pre-war species no longer have type specimens because those specimens were simply blown up and reduced to rubble. I've had personal experience with the tremendous, and sometimes insurmountable, difficulties this causes even today.

The species description is another way to add objectivity. As I said previously, these descriptions are long, dense, and extremely technical. That's because they need to be. They give us something against which to test our specimens; in a very real sense, they are the formal experiment in taxonomy. Once a description is written, anyone (once trained to comprehend it) can determine whether a new specimen belongs to that species, using the same basic logic physicists used to determine whether they'd found the Higgs boson. This means that taxonomy is an empirical, experimental science.
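To make the "description as experiment" idea concrete, here is a deliberately oversimplified Python sketch. It assumes a description can be reduced to a set of allowed states per diagnostic character, which real descriptions (prose, measurements, ranges, comparisons with related species) cannot; the character names and states are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical encoding: a diagnosis maps each diagnostic character to the set
# of states the species description allows; a specimen maps characters to the
# states actually observed on it.
Diagnosis = Dict[str, Set[str]]
Specimen = Dict[str, str]


def check_against_diagnosis(diagnosis: Diagnosis, specimen: Specimen) -> Dict[str, List[str]]:
    """Sort each diagnostic character into consistent, inconsistent, or unobservable."""
    result: Dict[str, List[str]] = {"consistent": [], "inconsistent": [], "unobservable": []}
    for character, allowed in diagnosis.items():
        if character not in specimen:
            # Fossils are often incomplete, so some characters cannot be scored.
            result["unobservable"].append(character)
        elif specimen[character] in allowed:
            result["consistent"].append(character)
        else:
            result["inconsistent"].append(character)
    return result


def could_belong(diagnosis: Diagnosis, specimen: Specimen) -> bool:
    """One observed mismatch is enough to exclude the specimen from the species."""
    return not check_against_diagnosis(diagnosis, specimen)["inconsistent"]


diagnosis = {"rib_count": {"12", "13"}, "suture": {"ammonitic"}, "umbilicus": {"narrow"}}
specimen = {"rib_count": "13", "suture": "ammonitic"}
print(check_against_diagnosis(diagnosis, specimen))
# {'consistent': ['rib_count', 'suture'], 'inconsistent': [], 'unobservable': ['umbilicus']}
print(could_belong(diagnosis, specimen))  # True
```

The point of the sketch is the asymmetry: matching characters only keep the hypothesis alive, while a single clear mismatch rejects it.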

Finally, all of the rules for naming species have been codified in the International Code of Zoological Nomenclature. This is the framework within which taxonomists are required to operate. The Code defines the different kinds of type specimens (holotype, paratype, and a few more that I've never seen in person), as well as how to handle synonymies, contradictions, and other issues that will inevitably arise when thousands of researchers contribute to a conversation that's been going on for nearly three centuries. These rules are set by researchers themselves, for the purpose of ensuring objectivity.

There are some problems with this method, yes. As I said before, it's very Platonic. The type specimen was supposed to represent the big-T Truth of the species, at least until fairly recently. That obviously contradicts evolution, and most researchers abandoned the assumption a long time ago. At this point, as far as I can tell, most researchers view type specimens as vital conveniences, not as Truth. I know two researchers (three? depends on how you count them; one person is only intermittently part of that team) who are attempting to redefine ammonite species based on population statistics rather than a type specimen.

Secondly, and far more significantly, the literature is extremely widely scattered. A common type of taxonomic paper is "A revision of the genus ____", in which the researcher pulls together everything they can find about that genus and proposes a new understanding of it. A common criticism of such papers is "But you haven't examined THIS paper!" It's a valid criticism, and drawing such criticisms is one of the important functions of these papers. Unfortunately, there is no database we can go to for this material. Several groups have tried to establish one (I think the Tree of Life project is ongoing, but I could be wrong), but the task is staggering. Even putting together all the information on each family within one order is an enormous task, beyond all but the most dedicated research teams. What that means is that, unfortunately, when we describe a new species it has often already been described. Nothing's more disheartening than realizing that your species was already named back when scientific publications were in Latin (though it does give you a real sense of connection with the great names in science; referencing Linnaeus is both disheartening and profoundly satisfying).

Another major issue with taxonomy, and in fact the biggest one, is that it does not presume to define evolutionary history. Linnaeus had no intention of discussing evolution, nor indeed any knowledge of it. The book "Darwin's Century" discusses some doubts he had later in life, but at least when he started, the nested hierarchy was simply a matter of convenience. This is why a group like Arthropoda could be treated as polyphyletic and still be a valid taxonomic name: taxonomy, at its start, was merely about putting things in categories. That has a curious side benefit: until very recently, taxonomy served as a truly independent test of evolution. The fact that taxonomy largely matched evolutionary history, despite having been built without reference to it, is very clear evidence that evolution is right.

Still, taxonomy is a powerful tool. And what strikes me about each of these problems is that the people identifying and addressing them are researchers in the field. As with Creationists, those who attack paleontology for lacking rigor have not, to my knowledge, presented a single criticism that my colleagues and I haven't addressed first, better, and in more detail, and found ways to fix.

-------------------------------

In the other thread I mentioned that the math behind cladograms is the same whether you use morphological characters or genetic characters (characters = traits; don't ask me why, the best explanation I ever got was "Cladists wanted to set themselves apart, and jargon was an easy way to do that"). I'm not going to go into too much depth here; if you really want to know more about the math, here is a very good resource. You can also download a freeware program at that site and actually play with some data yourself. Apparently a new version is out; hopefully they've made it possible to copy and paste from Excel now, since not being able to was one of the most annoying aspects of that program. I'd recommend it to anyone who wants to toy with statistics: it can do all kinds of things, from cluster analysis to PCAs to cladograms to simple Gaussian statistics. You just have to play around with it until you figure out how. It's for serious researchers; they didn't design it to be intuitive to non-experts.

Anyway, here's how the math works, at the 1:1,000,000 scale. You pick your taxonomic groups, and you pick your characters. Genetic taxonomy is easy: you pick the gene, and each codon or base pair represents a character. Morphological data are trickier, but by the time you're doing this stuff you've generally gained a pretty good understanding of taxonomy, so it's not too hard. What you want to focus on are shared, derived traits, meaning traits a group has because their common ancestor evolved them and passed them on. Traits that only individual species have are pretty useless for determining evolutionary history; you already know where they evolved, so they tell you nothing. You should also have an "outgroup", a species that is outside the group you're interested in but close enough to share some primitive traits with it (bacteria are a horrible outgroup for primate cladistics; bats aren't too bad, actually). Then you run the program (it may tie up your computer for a while; on a really big dataset I saw it tie one up for three days).
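The character-selection step can be sketched in Python. This is a minimal illustration, assuming a hypothetical matrix where each taxon's row is a string of state symbols: it flags which characters are parsimony-informative (at least two states each shared by two or more taxa, which is why traits unique to one species drop out) and uses the outgroup to polarize a character into primitive and derived states.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical character matrix: taxon -> one state symbol per character.
# "?" marks a character that could not be scored (e.g. a missing bone).
Matrix = Dict[str, str]


def informative_characters(matrix: Matrix) -> List[int]:
    """Indices of parsimony-informative characters.

    A character can only favour one tree over another if at least two different
    states each occur in at least two taxa. A state found in a single taxon
    (a trait unique to one species) adds the same number of steps to every tree.
    """
    n_chars = len(next(iter(matrix.values())))
    informative = []
    for i in range(n_chars):
        counts = Counter(row[i] for row in matrix.values() if row[i] != "?")
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative.append(i)
    return informative


def derived_taxa(matrix: Matrix, outgroup: str, i: int) -> List[str]:
    """Taxa whose state for character i differs from the outgroup's, treating the
    outgroup state as primitive (ancestral) and any other state as derived."""
    primitive = matrix[outgroup][i]
    return [t for t, row in matrix.items() if t != outgroup and row[i] not in (primitive, "?")]


matrix = {
    "outgroup": "0000",
    "A": "1100",
    "B": "1100",
    "C": "1010",
    "D": "0001",
}
print(informative_characters(matrix))      # [0, 1]
print(derived_taxa(matrix, "outgroup", 0))  # ['A', 'B', 'C']
```

Characters 2 and 3 are each derived in a single taxon, so they are dropped: whatever tree you draw, each costs exactly one change.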

What the computer does is construct every possible cladogram and count the number of evolutionary changes necessary to arrive at each tree. The program gives you the trees that require the fewest changes, on the assumption that the fewer changes a tree requires, the more likely it is to represent the true evolutionary history (as my old professor put it, "We assume that evolution is hard").
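A toy version of the search, assuming unweighted characters scored with Fitch parsimony (one standard way of counting changes; real programs offer other options). It literally builds every rooted tree with the outgroup as sister to the ingroup and keeps the shortest. The number of rooted trees for n taxa is (2n - 3)!! (15 for four ingroup taxa, over 34 million for ten), so for more than a dozen or so taxa real programs switch to branch-and-bound or heuristic searches; treat this as an illustration only. The matrix is hypothetical.

```python
from typing import Dict, Iterator, List, Tuple, Union

# A rooted binary tree is either a taxon name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]
Matrix = Dict[str, str]


def insert_everywhere(tree: Tree, taxon: str) -> Iterator[Tree]:
    """Yield every tree made by attaching `taxon` as sister to some node of `tree`."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, taxon):
            yield (new_left, right)
        for new_right in insert_everywhere(right, taxon):
            yield (left, new_right)


def all_rooted_trees(taxa: List[str]) -> List[Tree]:
    """Every rooted binary tree on `taxa`; there are (2n - 3)!! of them for n taxa."""
    trees: List[Tree] = [taxa[0]]
    for taxon in taxa[1:]:
        trees = [t for tree in trees for t in insert_everywhere(tree, taxon)]
    return trees


def fitch_length(tree: Tree, matrix: Matrix, i: int) -> int:
    """Minimum number of state changes for character i on `tree` (Fitch parsimony)."""
    def visit(node: Tree):
        if isinstance(node, str):
            return {matrix[node][i]}, 0
        (left_set, left_cost), (right_set, right_cost) = visit(node[0]), visit(node[1])
        shared = left_set & right_set
        if shared:
            return shared, left_cost + right_cost
        # No state in common: one change is needed on one of the two branches below.
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]


def most_parsimonious(matrix: Matrix, outgroup: str) -> Tuple[int, List[Tree]]:
    """Score every tree (rooted on the outgroup) and keep the shortest ones."""
    ingroup = [t for t in matrix if t != outgroup]
    n_chars = len(matrix[outgroup])
    scored = []
    for tree in all_rooted_trees(ingroup):
        rooted = (outgroup, tree)
        length = sum(fitch_length(rooted, matrix, i) for i in range(n_chars))
        scored.append((length, rooted))
    best = min(length for length, _ in scored)
    return best, [tree for length, tree in scored if length == best]


matrix = {"outgroup": "0000", "A": "1100", "B": "1100", "C": "1010", "D": "0001"}
print(most_parsimonious(matrix, "outgroup"))
# (4, [('outgroup', ((('A', 'B'), 'C'), 'D'))])
```

Here one tree needs only four changes (one per character); every other arrangement of A to D needs more, so it is the single most parsimonious tree.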

What you will usually find right away is that there are many equally short trees. The reason is that the heavy math works by assuming that only pairs are possible: one species splits into two species, but never into three or more. Obviously this doesn't represent biological reality; however, the consensus tree, which keeps only the groupings that all (or most) of the shortest trees agree on, typically collapses many of the clades, meaning that it groups multiple species together at a single branching point. There is some debate as to what this means. Personally, while I haven't run any serious tests of it (I'm still toying with how to design such a test), I lean towards the idea that those collapsed clades represent evolutionary reality: they show organisms that arose from the same stock, if not necessarily at the same time.
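Here is a sketch of how a consensus summarizes several equally short trees, assuming trees are nested Python tuples of taxon names (a hypothetical representation). The strict consensus keeps only the clades present in every tree; majority-rule keeps those in more than half. Clades that the trees disagree about disappear, which is exactly the collapsing into multi-species branching points described above.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]
Clade = FrozenSet[str]


def clades(tree: Tree) -> Set[Clade]:
    """Every group of two or more taxa that forms a clade in a nested-tuple tree."""
    found: Set[Clade] = set()

    def visit(node: Tree) -> Clade:
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        found.add(members)
        return members

    visit(tree)
    return found


def clade_counts(trees: List[Tree]) -> Counter:
    """How many of the trees contain each clade."""
    return Counter(c for tree in trees for c in clades(tree))


def strict_consensus(trees: List[Tree]) -> Set[Clade]:
    """Keep only clades found in every tree; everything else collapses into polytomies."""
    return {c for c, n in clade_counts(trees).items() if n == len(trees)}


def majority_rule(trees: List[Tree]) -> Set[Clade]:
    """Keep clades found in more than half of the trees."""
    return {c for c, n in clade_counts(trees).items() if n > len(trees) / 2}


# Three hypothetical equally short trees that disagree about A, B and C.
t1 = ("O", ((("A", "B"), "C"), "D"))
t2 = ("O", ((("A", "C"), "B"), "D"))
t3 = ("O", ((("B", "C"), "A"), "D"))
print(sorted(sorted(c) for c in strict_consensus([t1, t2, t3])))
# [['A', 'B', 'C'], ['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'D', 'O']]
```

In the consensus, A, B and C come off a single node: the trees agree those three belong together, but not on how they are arranged.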

Now you can test it. There are multiple ways; the ones I'm most familiar with are bootstrapping and jackknifing. In jackknifing, you simply remove random characters, re-run the analysis, and see what pops out. The program does this a thousand times (the default setting; you can make it do more). More robust trees can handle the loss of some data; if, however, some structure relies on a single character, it's more likely that it's not real. Often there's a reporting limit of 50% or so, meaning that if a structure doesn't appear in at least 50% of the resulting trees, you treat it as if it's not there. In bootstrapping, characters are resampled at random with replacement, so some are dropped and others duplicated: each replicate has the same number of characters, but not the same characters. The results are handled the same way.
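The two resampling schemes differ only in how the pseudo-replicate matrix is built. This Python sketch assumes a hypothetical `analyse` function standing in for the full tree search plus consensus; the 37% jackknife deletion fraction is an assumption here (values around that figure are commonly used), and the 1,000 replicates and 50% cutoff follow the defaults mentioned above.

```python
import random
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Set

Matrix = Dict[str, str]
Clade = FrozenSet[str]


def take_columns(matrix: Matrix, columns: List[int]) -> Matrix:
    """Pseudo-replicate matrix built from the given character indices (repeats allowed)."""
    return {taxon: "".join(row[i] for i in columns) for taxon, row in matrix.items()}


def bootstrap_replicate(matrix: Matrix, rng: random.Random) -> Matrix:
    """Bootstrap: draw n characters *with* replacement, so the replicate is the same
    size as the original but some characters appear twice and others not at all."""
    n = len(next(iter(matrix.values())))
    return take_columns(matrix, [rng.randrange(n) for _ in range(n)])


def jackknife_replicate(matrix: Matrix, rng: random.Random, delete_fraction: float = 0.37) -> Matrix:
    """Jackknife: delete a random subset of characters; nothing is duplicated."""
    n = len(next(iter(matrix.values())))
    keep = sorted(rng.sample(range(n), n - round(n * delete_fraction)))
    return take_columns(matrix, keep)


def resampling_support(
    matrix: Matrix,
    analyse: Callable[[Matrix], Set[Clade]],
    method: str = "bootstrap",
    replicates: int = 1000,
    seed: int = 0,
) -> Dict[Clade, float]:
    """Percentage of replicate analyses in which each clade turns up.

    `analyse` (hypothetical) must return the set of clades found in one replicate."""
    rng = random.Random(seed)
    make = bootstrap_replicate if method == "bootstrap" else jackknife_replicate
    counts: Counter = Counter()
    for _ in range(replicates):
        counts.update(analyse(make(matrix, rng)))
    return {clade: 100.0 * k / replicates for clade, k in counts.items()}


def reported(support: Dict[Clade, float], cutoff: float = 50.0) -> Set[Clade]:
    """Clades at or above the reporting cutoff; the rest are treated as unresolved."""
    return {c for c, pct in support.items() if pct >= cutoff}
```

Plugging in a real tree search for `analyse` turns this into the test described above: a clade that survives most replicates is supported by many characters, not just one.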

This is how the math works for ANY set of characters: genetic or morphological, even behavioral or stratigraphic (yes, I've seen it; no, I do not approve of it). The kind of character doesn't matter to the math. So, in essence, genetic cladograms are not inherently superior to morphological ones.

One thing that's really cool about these trees is that at each point where branches come together (called a node), the math will tell you what the traits of that node are. This forms a hypothesis about the nature of the ancestor of those two species. I've found, through analyzing numerous cladograms, that the cladogram is often saying "This species directly gave rise to this other one", which by itself is a really cool concept (I toyed around with somehow addressing the distance from nodes, but haven't done much with it since grad school; I haven't had the time or a reason to get back into it). If the node matches a species you analyzed, you are arguing that that species gave rise to the other(s) branching off that node. If the node's characters don't match anything you've analyzed, you are hypothesizing the existence of an organism ancestral to the ones you studied, and (here's the part that I find unbelievably awesome) you are predicting the nature of that organism. That means someone can go out, find it, and say "Hey, this matches what they predicted", which makes cladistics indispensable for making sense of the past. (The discoverer would still have to draft a species description and curate the type specimen; the honor of naming the species goes to the discoverer, not to the person who hypothesized its existence.)
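A minimal sketch of how node states can be reconstructed, assuming Fitch parsimony on a rooted binary tree: a down-pass computes the candidate states at each node, and an up-pass picks one state per node. When several reconstructions are equally parsimonious, this sketch breaks ties by taking the smallest symbol, an arbitrary choice (real programs offer options such as ACCTRAN and DELTRAN). It then compares each node with the sampled taxa; the tree and matrix are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]
Matrix = Dict[str, str]


def reconstruct_nodes(tree: Tree, matrix: Matrix) -> Dict[Tree, str]:
    """One most-parsimonious set of character states for every internal node."""
    n_chars = len(next(iter(matrix.values())))
    states: Dict[Tree, List[str]] = {}
    for i in range(n_chars):
        fitch_sets: Dict[Tree, Set[str]] = {}

        def down(node: Tree) -> Set[str]:
            # Fitch down-pass: intersection of the children's sets if non-empty, else union.
            if isinstance(node, str):
                fitch_sets[node] = {matrix[node][i]}
            else:
                left, right = down(node[0]), down(node[1])
                fitch_sets[node] = (left & right) or (left | right)
            return fitch_sets[node]

        def up(node: Tree, parent_state: Optional[str]) -> None:
            # Up-pass: keep the parent's state if it is allowed here, otherwise
            # pick one (ties broken by the smallest symbol, an arbitrary choice).
            if isinstance(node, str):
                return
            options = fitch_sets[node]
            chosen = parent_state if parent_state in options else min(options)
            states.setdefault(node, []).append(chosen)
            up(node[0], chosen)
            up(node[1], chosen)

        down(tree)
        up(tree, None)
    return {node: "".join(s) for node, s in states.items()}


def ancestor_matches(tree: Tree, matrix: Matrix) -> Dict[Tree, List[str]]:
    """For each node, the sampled taxa whose states it matches exactly. An empty
    list means the node is a predicted, not-yet-sampled ancestor."""
    nodes = reconstruct_nodes(tree, matrix)
    return {node: [t for t, row in matrix.items() if row == s] for node, s in nodes.items()}


tree = ("O", (("A", "B"), "C"))
matrix = {"O": "000", "A": "110", "B": "111", "C": "100"}
print(ancestor_matches(tree, matrix))
# {('O', (('A', 'B'), 'C')): ['O'], (('A', 'B'), 'C'): ['C'], ('A', 'B'): ['A']}
```

In this toy case the node joining A and B is reconstructed with exactly A's states, so the tree is consistent with A giving rise to B; a node with an empty match list would be a prediction of an unsampled ancestor.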

These mathematical methods have been tested. For example, a scientist created a fictional clade of organisms based on a known evolutionary history, then presented them to other scientists and challenged them to reconstruct the history of that clade. Eventually they succeeded, and we've gotten much better at it over time. As an aside, I want to make some of these organisms as stuffed animals for my impending child.

A second test is to compare the hypothesized evolutionary history against what we see in the rock record. We expect some range extensions: FADs and LADs (first and last appearance data) usually do not mark the first or last organism of a species to have lived, only the first and last APPEARANCE of that species in the fossil record. Still, the fossil record is useful. If you hypothesize a huge number of organisms and we find only three, or hypothesize traits we just don't find, that's cause to re-examine the cladogram, and specifically the characters and assumptions that went into it.
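One way to quantify the fit between a cladogram and the rock record is to count ghost lineages: sister lineages must have split by the first appearance of the older sister, so the younger sister's range has to be extended back to that point. The Python sketch below sums those implied extensions for a given tree; the FAD values are invented for illustration, and real stratigraphic-congruence measures are more elaborate.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def ghost_lineages(tree: Tree, fad: Dict[str, float]) -> Tuple[float, List[Tuple[Tree, float]]]:
    """Total ghost-lineage duration implied by a cladogram, plus the branches that need one.

    `fad` gives each taxon's first appearance datum in millions of years ago
    (larger = older). Sister lineages must have split no later than the older
    sister's first appearance, so the younger sister's range is extended back to it.
    """
    ghosts: List[Tuple[Tree, float]] = []

    def split_age(node: Tree) -> float:
        if isinstance(node, str):
            return fad[node]
        child_ages = [split_age(child) for child in node]
        split = max(child_ages)
        for child, age in zip(node, child_ages):
            if split > age:
                ghosts.append((child, split - age))
        return split

    split_age(tree)
    return sum(gap for _, gap in ghosts), ghosts


fad = {"O": 400, "A": 380, "B": 350, "C": 390}  # hypothetical ages, Ma
print(ghost_lineages(("O", (("A", "B"), "C")), fad))
# (50, [('B', 30), (('A', 'B'), 10), ((('A', 'B'), 'C'), 10)])
print(ghost_lineages(("O", (("A", "C"), "B")), fad)[0])  # 60
```

Other things being equal, a cladogram that demands less unsampled history fits the rock record better; one that demands a lot is the cue to re-examine its characters and assumptions.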

As I said, this is all at the 1:1,000,000 scale, meaning I'm glossing over a huge amount of data and only barely scratching the surface of the topics I'm addressing. Still, I think it's sufficient to demonstrate the rigorous nature of taxonomy and of our methods for understanding evolutionary history. If anyone would like me to expand on any of these topics, please feel free to ask. And Jodie, please point to specific areas where you think subjectivity is a problem.

via JREF Forum http://forums.randi.org/showthread.php?t=265927&goto=newpost

Friday, 27 September 2013
