Wednesday, March 16, 2016

Decimal marks

I have been terribly sick recently. When not dealing with a fever, having a hard time breathing, or just passing out from tiredness, I've been working on a review paper looking at the history of allometry. One of the figures I want to include is from Louis Lapicque’s 1907 paper Tableau general des poids somatiques et encéphaliques dans les espéces animales ("General picture of body and brain weight in animal species"). The figure is interesting in a couple of ways. First, it's a figure. Creating and publishing a figure in 1907 was a pretty unusual move. Most of the time, data were presented in tables because it was simply too time consuming to create and then reproduce figures in publications. Second, the figure might be the first one to show graphically how (what we now call) allometric relationships appear linear when plotted in log-log space.
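As a quick illustration of that second point: a power law of the form brain = a * body^b becomes a straight line with slope b once both axes are log-transformed. The sketch below uses made-up values for a and b (they are assumptions for illustration, not Lapicque's numbers) and checks that the slope between successive points in log-log space is constant.

```python
import math

# Hypothetical power-law parameters, chosen only for illustration.
a, b = 0.1, 0.5

# Made-up body masses (g) spanning several orders of magnitude.
body = [10.0, 100.0, 1000.0, 10000.0]
brain = [a * m ** b for m in body]

# Log-transform both axes.
log_body = [math.log10(m) for m in body]
log_brain = [math.log10(e) for e in brain]

# Slope between each pair of neighbouring points in log-log space.
slopes = [
    (log_brain[i + 1] - log_brain[i]) / (log_body[i + 1] - log_body[i])
    for i in range(len(body) - 1)
]

for s in slopes:
    print(round(s, 6))  # every slope equals the exponent b
```

Because log(brain) = log(a) + b * log(body), every printed slope is the exponent b, which is why the points fall on a line.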

The copyright has expired on the original 1907 paper and figure, so there wouldn't be an impediment to reprinting the original figure, but I was curious whether I could recreate it from the data provided in the appendix of the paper. In the process, I discovered a small error related to commas and decimal points. In the original 1907 figure, there is a datapoint for the European conger, labeled with the French name "Congre" (Figure 1a, lower right *). But when I replotted the data from the appendix, the conger datapoint was missing. A new point appeared to the left, however (Figure 1b, red *).

Figure 1a: Showing the 1907 plotted data for Conger vulgaris
Figure 1b: Showing the re-plotted data for Conger vulgaris

After confirming that I had transcribed the numbers correctly, I started looking more closely at the figure and the raw data. So what's going on? It might be confusion about whether a "." or a "," is a thousands separator or a decimal mark.

The original data table that has the data in question looks like



A large part of the world uses a "," as a decimal mark and a "." as a thousands separator. In the United States, if we want to write out ten thousand numerically, we write "10,000". In France, the same number would appear as "10.000". To be clear, for the rest of this entry, I will use "." as a decimal mark, so when I say "99.99" I mean 99 and 99/100.
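Once you know which convention a source uses, converting is mechanical. Here is a minimal sketch, assuming a hypothetical helper named parse_number (it is not from any library) that is told which symbol is the decimal mark and treats the other symbol as a thousands separator.

```python
def parse_number(text, decimal_mark):
    """Parse a numeric string given its decimal mark ("," or ".").

    Hypothetical helper for illustration: the symbol that is NOT the
    decimal mark is assumed to be a thousands separator and is dropped.
    """
    if decimal_mark not in (",", "."):
        raise ValueError("decimal_mark must be ',' or '.'")
    thousands = "." if decimal_mark == "," else ","
    cleaned = text.replace(thousands, "").replace(decimal_mark, ".")
    return float(cleaned)


us_value = parse_number("10,000", decimal_mark=".")  # US-style ten thousand
fr_value = parse_number("10.000", decimal_mark=",")  # French-style ten thousand
print(us_value == fr_value)
```

The catch, as the conger entry shows, is that the function has to be told the convention. The string "10.000" alone cannot say whether it means ten or ten thousand.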

Here, we see that the data for the conger is written as [10.000 1.05]. The paper is in French, and other entries in the table use a comma as the decimal mark, so each "." should be a thousands separator, which would make the values 10,000 and 1,050.

But wait. "1.05" doesn't make sense as 1,050. It would have been written as "1.050" if that was what was meant. I think what we are seeing here is a mixed decimal mark error. The point marked in red in Figure 1b matches [10, 1.05]. That would mean the eel has a body mass of 10 g and a brain mass of 1.05 g. A pretty remarkable eel, and also clearly wrong. Similarly, if we decide that the actual data should be [10,000 1,050], we have an eel with a body mass of 10,000 g and a brain mass of about 1 kg, putting the brain mass close to that of humans. Again, extraordinary.

The actual plotted data corresponds to [10,000 1.05], which passes the sanity test: an eel weighing 10,000 g with a brain weighing 1.05 g. Thus, a 109-year-old typo is resolved.
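The same sanity test can be written down as a quick check. This is only a sketch: the three candidate readings are the ones discussed above, but the cutoff MAX_BRAIN_FRACTION is an arbitrary value assumed here for illustration, not a biological constant.

```python
# Candidate readings of the table entry "10.000 1.05",
# given as (body mass in g, brain mass in g).
readings = {
    "both dots are decimal marks": (10.0, 1.05),
    "both dots are thousands separators": (10000.0, 1050.0),
    "mixed: thousands separator, then decimal mark": (10000.0, 1.05),
}

# Assumed, arbitrary cutoff: flag any reading where the brain is
# more than 1% of body mass as implausible for a fish.
MAX_BRAIN_FRACTION = 0.01

plausible = []
for label, (body_g, brain_g) in readings.items():
    fraction = brain_g / body_g
    ok = fraction <= MAX_BRAIN_FRACTION
    if ok:
        plausible.append(label)
    print(f"{label}: brain is {fraction:.4%} of body mass, plausible={ok}")
```

Note that the first two readings give the same brain-to-body fraction (about 10.5%), so both fail for the same reason. Only the mixed reading leaves the conger with a fish-sized brain.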

So kids, double check your decimal marks, lest you crash things into Mars.

The reworking of the old data has been illuminating, and has given me a better appreciation of the tools I use on a regular basis. In redoing the figure, I have identified minor errors in the locations of points in the original figure, and can see that some of Lapicque’s lines are off. For example, the line fit to the blue whale in the upper right corner just peters out near the datapoint for "antelope", but the recreated figure shows that it fits a line he had drawn for lions, pumas, and house cats. Weird grouping, but whatever. A type of robin (10) also fits on the same line as swans (6), mallards (7), and the garganey (a duck, 8), which makes much more sense.

Time to have a coughing fit and lie down.

An afterword

Completely coincidentally, there is an odd point on the original plot that is vaguely in the same area as the incorrectly plotted [10, 1.05] data. I had initially thought Lapicque had plotted the conger the same way my recreation did, with a point at [10, 1.05].
Figure 3: What is this?
Now I'm left wondering: is this an artifact of how the figure was created or copied, or a data point that was removed from the analysis? If it's a datapoint, it looks like it would fall near the lines for either monkeys and tamarins, or gibbons and orangutans. Maybe a pygmy tarsier?