Accurate paleogeographic reconstructions are required to test a wide range of hypotheses in the geosciences and they are a critical foundation upon which to build next-generation 4D Earth systems models. In order to produce the best possible reconstructions, no data can be left behind. This paper represents an important step towards a general method by which paleogeographically sensitive proxy data (e.g., fossils, sediments) can be used to test and improve existing paleogeographic reconstructions. This type of reconstruction model-data comparison is critical and the scale of the problem is such that algorithmic solutions and automation are of great added value. The paper is well written, the examples given are clear and powerful, and the potential of the approach and scientific outcomes are substantial. I have only a few suggestions, detailed below.
1) Discrete timescales of reconstructions and the mapping of proxy data therein.
Table 1 outlines the timescale used for the Golonka reconstructions and this represents the first major cull of Paleobiology Database (PBDB) data. Only those collections that are temporally resolved in the PBDB to fall entirely within the bins specified by the reconstruction timescale are included. An example bin used is the “late Cenomanian-early Campanian.” According to the methods description, this means that any PBDB fossil collection assigned to a time interval of “Cenomanian” or “Campanian” would be omitted from the analysis and only fossil collections resolved to a finer biostratigraphic zone within these two international ages would be included. This is an understandable convention given the discrete bins of the original Golonka reconstructions, but this protocol results in a large cull of PBDB data. International ages (e.g., Cenomanian) are generally defined by our ability to consistently correlate globally. It is certainly the case that biostratigraphic zonation can be more precise, particularly in the marine record, but a stage-level age assignment for a given collection is something that PBDB data enterers would be very satisfied with in the majority of cases. Indeed, in the example of the Cenomanian and Campanian above, there are more than 2,200 collections resolved to these two intervals and these would not be included. Similar numbers of collections are probably omitted from all such divided international ages (this is somewhat conveyed in the differences between curves in Fig. 11). Does this cull matter? Given that the Cenomanian-Turonian sea level highstand is likely captured by at least some collections that are resolved only to “Cenomanian,” it is possible that it does. There are a few statistical approaches one could take to overcoming this problem (in addition to improving the PBDB age constraints, see below).
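As a rough illustration of one such statistical approach (my own sketch, not something taken from the manuscript), a collection whose age range straddles a bin boundary could be retained and assigned a fractional weight in each reconstruction bin, proportional to the share of its age range that falls within that bin. The function name, the bin labels, and all ages in the example are hypothetical placeholders, and the weighting assumes the true age is uniformly distributed across the assigned interval.

```python
def overlap_weights(coll_max_ma, coll_min_ma, bins):
    """Weight a collection across time bins by fractional temporal overlap.

    coll_max_ma, coll_min_ma: oldest and youngest age bounds of the
        collection, in Ma (oldest > youngest).
    bins: dict mapping a bin label to (bin_max_ma, bin_min_ma).

    Returns a dict of bin label -> fraction of the collection's age range
    falling in that bin. Assumes (hypothetically) a uniform probability of
    the true age across the collection's assigned interval.
    """
    duration = coll_max_ma - coll_min_ma
    if duration <= 0:
        raise ValueError("coll_max_ma must be older (larger) than coll_min_ma")

    weights = {}
    for label, (bin_max_ma, bin_min_ma) in bins.items():
        # Length of the intersection of the two age ranges, in Myr.
        overlap = min(coll_max_ma, bin_max_ma) - max(coll_min_ma, bin_min_ma)
        if overlap > 0:
            weights[label] = overlap / duration
    return weights


if __name__ == "__main__":
    # Placeholder bins and ages for illustration only; these are NOT the
    # boundaries used in the Golonka reconstructions.
    example_bins = {
        "older bin": (100.0, 90.0),
        "younger bin": (90.0, 80.0),
    }
    print(overlap_weights(95.0, 85.0, example_bins))
```

Downstream tests could then sum these weights rather than counting whole collections, so that stage-resolved collections contribute to every bin they might belong to instead of being discarded.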
Finally, it is noted that PBDB data were “downloaded” from the database on a given date. This is a useful description, but the PBDB API allows for very specific definition of download protocols in the form of a URL. I strongly recommend that these details either be specified or, better still, that the URL for the API request be provided (it might also be good to include citations that describe these PBDB resources). Requesting an official PBDB publication number, should the manuscript be accepted, and including that in the Acknowledgements would be appropriate as well. John Alroy and company’s original vision and the many PBDB data contributors should be recognized in this capacity.
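To make the recommendation concrete, the following is a minimal sketch of how a download could be recorded as a reproducible URL. It only builds the request string and does not contact the server. The endpoint path and the parameter names (`interval`, `show`) reflect my understanding of the PBDB data service v1.2 and should be checked against its documentation; the helper name `build_pbdb_url` and the particular choice of output blocks are illustrative assumptions, not the authors' actual query.

```python
from urllib.parse import urlencode

# Assumed base endpoint for PBDB collection listings (data service v1.2).
PBDB_COLLECTIONS_CSV = "https://paleobiodb.org/data1.2/colls/list.csv"


def build_pbdb_url(interval, show=("loc", "paleoloc", "time"), **extra):
    """Return a PBDB collections request URL that can be quoted in a paper.

    interval: name of a geologic time interval, e.g. "Cenomanian".
    show: output blocks to request (assumed block names).
    extra: any further query parameters, passed through unchanged.
    """
    params = {"interval": interval, "show": ",".join(show)}
    params.update(extra)
    # Keep commas literal so the URL stays human-readable.
    return PBDB_COLLECTIONS_CSV + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    print(build_pbdb_url("Cenomanian"))
```

Reporting a string of this kind, together with the access date, would let readers repeat the exact query.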
2) The “500 km test”
Fossil collections deviating by more than 500 km from previously interpreted paleoshorelines are excluded herein. Why 500 km? Why not 397 km? Or 193 km? With few exceptions, whenever there is some value arbitrarily defined as a threshold there is a more interesting and principled approach. I would find the distribution of deviations between “previously interpreted” shorelines and PBDB collections fascinating, both in aggregate and on an interval-by-interval and/or a plate-by-plate basis. These distributions would have statistical utility of several different types, including defining a principled threshold criterion for data rejection. To me, this would be the single most interesting analytical addition to the paper.
One benefit of an approach that leverages the statistical distribution of deviations is that it could yield quantitative estimates of the error (mostly of a random nature) that is introduced at every step, from the original coordinates of PBDB collections (some of which are most certainly not accurately located) to the assignment of an age.
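As a hedged sketch of what a distribution-based criterion might look like, the code below derives a rejection threshold from the deviations themselves, using the median plus a multiple of the scaled median absolute deviation (MAD). This is one robust option among several (an empirical quantile would be another) and is my suggestion rather than anything in the manuscript. The function names, the multiplier `k = 3`, and the input values are assumptions for illustration; the deviations would in practice come from the authors' own shoreline-distance calculation.

```python
import statistics
from collections import defaultdict

# Scale factor that makes the MAD comparable to a standard deviation
# for normally distributed data.
MAD_TO_SIGMA = 1.4826


def robust_threshold(deviations_km, k=3.0):
    """Return median + k * scaled MAD of a sample of deviations (km)."""
    values = list(deviations_km)
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    return median + k * MAD_TO_SIGMA * mad


def thresholds_by_group(records, k=3.0):
    """Compute a threshold per group (e.g. per time interval or plate).

    records: iterable of (group_label, deviation_km) pairs.
    """
    grouped = defaultdict(list)
    for label, deviation_km in records:
        grouped[label].append(deviation_km)
    return {label: robust_threshold(vals, k) for label, vals in grouped.items()}


if __name__ == "__main__":
    # Made-up deviations for illustration only.
    sample = [("interval 1", 10.0), ("interval 1", 20.0), ("interval 1", 30.0),
              ("interval 2", 100.0), ("interval 2", 150.0), ("interval 2", 400.0)]
    print(thresholds_by_group(sample))
```

Computing the threshold separately per interval or per plate would also show directly whether a single global cutoff such as 500 km is reasonable across the whole Phanerozoic.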
3) Minor points/questions: Use of fossils in pre-Triassic reconstructions, plates that appear during the Phanerozoic, and next steps.
For geologically obvious reasons, not all plates can be tracked backwards in time to the Paleozoic. Are such plates excluded here? Does this matter to any of the results presented herein?
Prior to the constraints on rotations provided by sea floor data, there is considerable uncertainty in the paleopositions of the continents, particularly with respect to longitude. At such times, the fossil record becomes increasingly important, not just for shoreline reconstructions but also for continent positions. I have previously detected a quantitative signature of PBDB biogeographic patterns in Paleozoic reconstructions and have concluded that at least some of the signal reflects the fact that the fossil record was relied on more heavily in the Paleozoic than in the post-Paleozoic. Does this matter? Is a potentially changing role for fossils in deriving paleopositions detectable in any way here?
I’m excited by the possibility of completely closing the loop, from aggregation of paleogeographically-useful data in PBDB and Macrostrat-type resources to testing and improving paleogeographic reconstructions, in a largely automated, algorithmic fashion. Certainly there are things we are working on now to improve the temporal resolution of PBDB fossil collections, notably by integrating them with stratigraphic models such as those in Macrostrat. I’d be very interested in discussing how to improve this process and to fully capitalize on paleogeographic reconstructions in a way that avoids circularity problems and that makes the data as useful as possible to the GPlates team, and vice versa. This paper is an important and welcome first step.