Data and how to deal with it

Computer databases provide an essential tool for investigating large-scale spatial and temporal problems in the Earth Sciences. But, although advances in both software and hardware have made the logistics of building a database much easier, fundamental problems remain concerning the representation and qualification of the data. In many ways databases have made the extraction of large datasets a little too easy, and the danger is that information is extracted without really understanding what the data actually represent. For example, a reported "Maastrichtian" locality in one place may not mean exactly the same thing as a "Maastrichtian" locality in another: one may represent a single channel sand with a vertebrate assemblage that represents almost an instant in geological time; another may be the composite fauna from a whole formation that spans the Maastrichtian; or, of course, the "Maastrichtian" age assignment may just be wrong! Geological data are highly heterogeneous and databases must be designed to account for this, including variations in scale (grain, resolution), inconsistency in the data, and potential errors (inaccuracy).

These issues vary with the scope of the study (extent), the biological group, and the nature and scale-dependence of supplementary, non-biological, datasets (e.g. climate and ocean parameters). With the application of desktop geographic information systems (GIS) to global earth systems science, and the ability to efficiently integrate and query large, diverse datasets, the need to ensure robust qualification of data, especially scale, has become all the more essential.
Figure: The uncertainty in the spatial and temporal position of a geological record also changes with geological time (Markwick & Lupia 2002).

In compiling my own databases much of the time has been spent including information that was solely used to constrain the data I was actually interested in. The inclusion of specimen information provided a check on how reliable a taxonomic allocation might be; age dates were always qualified to reflect their provenance and nature. This is outlined in my thesis and summarised in Markwick & Lupia (2002), to which anyone interested in this issue should refer. In industry, qualifying data is all the more critical since it can have major financial repercussions, and so I spend much of my professional time deriving methods for dealing with uncertainty and heterogeneity using GIS-based models.

The important lesson is to ensure that all data in a database are checked for errors during entry: this is simple in GIS, where carefully designed queries can easily be made to highlight errors such as an incorrect location. "Uncertainties" need to be highlighted, either by a simple qualifying code that allows "good" and "poor" data to be differentiated, or by a comments field that the compiler can use to draw attention to any problems. As such, data entry is not something that should be done by inexperienced staff ("on the cheap") but by people familiar with the data. Once the data are entered it is invariably very difficult, and certainly time-consuming, to go back and re-check everything.
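
The kind of entry-time check described above might be sketched as follows. This is only an illustration in Python, not the GIS queries actually used: the record layout, the field names, the two quality codes and the example localities are all hypothetical, and the only test applied is whether the coordinates fall within the valid latitude and longitude ranges.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical quality codes. The text only asks that 'good' and 'poor'
# data can be told apart, so this two-value scheme is an assumption.
GOOD, POOR = "good", "poor"


@dataclass
class LocalityRecord:
    """Illustrative record layout; all field names are assumptions."""
    name: str
    latitude: float   # decimal degrees, positive north
    longitude: float  # decimal degrees, positive east
    age: str          # reported age assignment, e.g. "Maastrichtian"
    quality: str = GOOD
    comments: List[str] = field(default_factory=list)


def check_location(record: LocalityRecord) -> LocalityRecord:
    """Mark a record as poor, with a comment, if its coordinates are impossible."""
    if not -90.0 <= record.latitude <= 90.0:
        record.quality = POOR
        record.comments.append(f"latitude {record.latitude} out of range")
    if not -180.0 <= record.longitude <= 180.0:
        record.quality = POOR
        record.comments.append(f"longitude {record.longitude} out of range")
    return record


# Made-up example entries; the second has a mistyped latitude.
records = [
    LocalityRecord("Locality A", 45.2, -110.5, "Maastrichtian"),
    LocalityRecord("Locality B", 145.2, -110.5, "Maastrichtian"),
]
checked = [check_location(r) for r in records]
flagged = [r.name for r in checked if r.quality == POOR]
print(flagged)
```

A range check like this catches only gross typing errors. A locality plotted in the wrong country or basin would still need a spatial query against reference geography, and judging whether an age assignment is trustworthy still depends on someone familiar with the data.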

This page last modified: 1st January, 2006
© Paul Markwick 2000-2006