Database Management System

Requirements

The selected spatial object-relational database management system is PostgreSQL 9.2+ with PostGIS 2.0+.

Rationale

The database management system must comply with the Constraints set on licence compatibility, portability and acquisition cost.

Compliance with ANSI-SQL:2008 [ISO.IEC_9075:2008] is a basic technical interoperability requisite.

  • In Portugal, this is also a legal requisite for Public Administration IT systems development and procurement as established by the National Regulation on Digital Interoperability (Open Standards). [RNID12]

Project specific functional requirements (further discussed in the analysis of alternatives) are:

  • Support for spatial data types and spatial analysis
  • Support for (or coupling to) statistical computing tools
  • Support for key-value data storage and/or XML data storage

Analysis of alternatives

Progressive filtering is based on maturity and sustainability criteria that are not specific to this project, but guarantee that the chosen solution can be used at enterprise level in distinct projects, indirectly reducing the cost of support, database administration, etc.

Detailed information on the nature, characteristics and market share of major RDBMS products is available in Section B (Databases) of the Competitive Assessment in the European Commission’s Report on the Oracle/Sun Microsystems merger.

The proprietary DBMS market share estimates vary. The following ranking is based on annual revenue:

Three additional open source RDBMS are identified in the EC Report:

MySQL (and its various clones and forks, such as MariaDB, Drizzle and Percona Server) has the largest market share of open source RDBMS.

Oracle’s evaluation of the relative technical merits of the three open source RDBMS is quoted in the EC Report:

The notifying party argues that the database products Ingres and PostgreSQL are also available under open source licenses and are technically superior to MySQL, in particular with regard to higher-end enterprise usage targeting existing Oracle customers. Therefore if any open source database product were able to exercise a competitive constraint on Oracle, it would be Ingres or PostgreSQL rather than MySQL.

—§662

Presently, Ingres could not be included in the candidate set, as the community version does not appear to be under active development. Ingres utilises a dual licensing model: the open source (community) version is available without charge to the end users under GPLv2; a proprietary version is commercialised by Actian. The information available at the Ingres Community site does not provide a clear separation between the the community and the enterprise versions. Furthermore, an analysis the Ingres commit activity on the Ingres source code repository indicates that no commits have been made to the main branch in the past 12 months. Regarding the IngresGeospatial component, the available roadmap also indicates that there is no recent activity on the project.

The EC report provides an overview of the total cost of ownership (TCO), based on 3rd party evaluations performed in 2009, stating that:

MySQL and PostgreSQL are comparable in terms of price for the smallest deployments, and have by far the lowest prices for the largest. While the price levels of MySQL and PostgreSQL are directly comparable with each other, none of the other [proprietary] vendors comes close to either of these software packages for a large deployment.

—§243 et seq.

OpenVirtuoso, another object relational database management system, meets the minimum maturity criteria and is included in the candidate set of alternatives (OpenVirtuoso is not mentioned in the EC report that focuses on direct competitors to Oracle and MySQL).

The candidate set includes:

Spatial data support

Support for the spatial components is a project-specific requirement that, in practice, allows a decision between the 3 candidates.

The selected RDBMS must support spatial data:

MySQL currently lacks a full implementation of the spatial components.

The spatial components of OpenVirtuoso are only available in the proprietary version of the program.

PostgreSQL + PostGIS is the selected option:

  • PostGIS is the extension to PostgreSQL that allows spatial objects to be stored in the database, using the OGC WKT and WKB formats. (PostgreSQL is an object relational database that allows the definition of complex user datatypes).
  • Spatial data can be exported to GML, the XML grammar defined by OCG as a data modelling and open interchange format for geographic information. GML is used in the EU INSPIRE data models and data exchange specifications. [Inspire].
  • Support for the Dimensionally Extended 9 Intersection Model (DE-9IM) basically guarantees that all relevant topological predicates (functions to check topological relations between two geometries) are implemented.
  • PostGIS also supports GeoJSON and TopoJSON, that may be useful for map visualisation purposes (e.g. using Javascript libraries such as D3.js).
  • Finally, PostGIS includes support for GiST-based R-Tree spatial indexes.

Note

If or where only a simple SQL database engine is required, while maintaining support for spatial data and operators, the recommended option is SQLite + SpatiaLite.

SpatiaLite is available under an MPLv1.1 licence, which if fully compatible with EUPLv1.1.

SQLite is under a Public Domain licence, that is not an OSI-approved licence: that may require the acquisition of licence from the Hwaci (depending on the specific legal requirements of a Member State).

Statistical computing support

Another project-specific requirement is also supported by PostgreSQL: the support for statistical analysis.

At least two alternatives exist (see [Conw11] for examples):

  • Using PL/R, a loadable procedural language extension that supports writing PostgreSQL functions and triggers in the GNU R statistical computing language;
  • Accessing the database using RPostgreSQL, the GNU R interface to the PostgreSQL database system.

Basically this means that, if required:

  • The statistical capabilities provided by GNU R can be called by the PostgreSQL database in a way that database programmers are familiar with;
  • The data stored in the PostgreSQL database can be accessed by GNU R in a way that statisticians are familiar with.

Implicit in the above discussion in the adoption of GNU R (GPLv2) should any ‘out-of-the-ordinary’ statistical procedures be required (e.g. for statistical disclosure control). [1]

Key-value and/or XML data storage

Support for either key-value data storage or XML data storage is a (plausible) functional project requirement.

A common data model at national level can be specified based on well-known a priori requirements, namely the need to produce the Eurostat THB indicators data cubes and, consequently, to store the appropriate variables at microdata level.

However, it is likely that different countries will need/want to store additional variables regarding victims, traffickers, routes, etc., to comply with national reporting requirements. These needs can not be foreseen and accommodated into a single and canonical relational database model. Two alternatives exist (which may complement each other):

  • a NoSQL-like approach, whereby the additional data is stored as a semi-structured set of key-value pairs;
  • an SDMX-like approach, whereby the structure of the data is described using an XML schema and the data itself is stored in XML structured text.

PostgreSQL provides basic functionality to support either option: key-value data storage using the hstore module (that defines the hstore datatype and functions) and XML data storage using the native XML datatype and functions (that use the xmllib2 parser and toolkit).

Note on portability and support

  1. PostgreSQL is included by default in the main Linux distributions (Debian/Ubuntu and RedHat/CentOS/Fedora/Scientific), although the version maintained in each Linux distribution’s official repository may not be the last one. However, all PostgreSQL releases are available in the PostgreSQL Apt Repository (for the Debian family) and the PostgreSQL Yum Repository (for the Red Hat family).

    PostgreSQL is also available for Microsoft Windows and Mac OS X.

  2. As with the selected FOSS operating systems, both PostgreSQL and PostGIS are free of charge, but commercial versions and technical support are available for PostgreSQL (inter alia, by one of main developers, EnterpriseDB) and for PostGIS (inter alia, by one of the main developers, Refractions Research).

Real-world use

The following table lists a sample of projects running on PostgreSQL + PostGIS solutions.

The selection is (completely) biased toward spatial data systems: it focuses on national mapping and cadastral (land surveying) agencies, spatial data infrastructures, and security and emergency response systems.

However, spatial data types are more complex than text, numeric or date data types and spatial databases generally support a heavier workload than non-spatial databases. So, even if a given system will not have a major spatial component, these examples can be seen as ‘stress testing’ results for the proposed solution.

Organization, system or project References
Eurogeographics (MCA) [2]
Gestione Integrata e Interoperativa dei Dati Ambientali (SDI), Italy [3] [4]
Geodatastyrelsen (MCA), Denmark [5]
Global Disaster Alert and Coordination System (SDI), UN & EC-JRC [6]
Global Monitoring for Environment and Security, EC-JRC [7]
Institut national de l’information géographique et forestière (MCA), France [8]
Instituto Geográfico Nacional (MCA), Spain [9]
Landesamt für Vermessung und Geoinformation Bayern (MCA), Germany [10]
Maanmittauslaitos (MCA), Finland [11]
Ordnance Survey (MCA), UK [12]
Paikkatietoikkuna (SDI), Finland [13]
Police Crime Map, UK [14]
Publieke Dienstverlening op de Kaart (SDI), Netherlands [15]
Statens Kartverk (MCA), Norway [16] [17]

MCA: Mapping/Cadastral Agency

SDI: Spatial Data Infrastructure


Footnotes

[1]A discussion/comparison of statistical software packages is beyond the scope of this document. In this context, it is also deemed unnecessary as the GNU R Project is the de facto standard in statistical computing.
[2]URL: http://inspire.jrc.ec.europa.eu/events/conferences/inspire_2011/presentations/workshops/269/ESDIN_D5_2_EuroGeographics_Technical_Architecture.pdf
[3]URL: http://www.minambiente.it/export/sites/default/archivio/allegati/INSPIRE_state_of_play_2011_ITALY.pdf
[4]URL: http://www.gdmc.nl/zlatanova/Gi4DM2010/gi4dm/Pdf/p116.pdf
[5]URL: http://www.eurosdr.net/workshops/PostGIS/7_Rasmussen_de_Martino_Nielsen_KMS_Denmark.pdf
[6]URL: http://meetingorganizer.copernicus.org/EGU2012/EGU2012-7404.pdf
[7]URL: http://rslab.disi.unitn.it/papers/R72-JSTAR-Brunner.pdf
[8]URL: http://postgis.net/2012/10/18/ign
[9]URL: http://inspire.jrc.ec.europa.eu/events/conferences/inspire_2012/presentations/143.pdf
[10]URL: http://www.fig.net/pub/fao/floss_cadastre.pdf.
[11]URL: http://www.paikkatietoikkuna.fi/web/en/open-spatial-data
[12]URL: http://www.eurosdr.net/workshops/PostGIS/8_Bennett_Ordnance_Survey_UK.pdf
[13]URL: http://www.oskari.org/trac/wiki/DocumentationBackend
[14]URL: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/78964/Open_Source_Options_v2_0.pdf
[15]URL: http://www.tudelft.nl//fileadmin/Faculteit/LR/Documenten/Onderwijszaken/lecture_postgis.pdf.
[16]URL: http://joinup.ec.europa.eu/sites/default/files/studies/IDABC-case-study-Norwegian%20Mapping%20Authority-final-version.pdf
[17]URL: http://www.eurosdr.net/workshops/PostGIS/6_Pedersen_Statkart_Norway.pdf
[18]URL: http://norgedigitalt.no/norge_digitalt/engelsk/about_norway_digital/

References

[Conw11]Conway (2011) - PL/R - The Fast Path to Advanced Analytics. URL: http://bunsen.credativ.com/~jco/2011/plr-PostgresOpen-2011.pdf http://bunsen.credativ.com/~jco/2011/plr-PGEast-2011.pdf
[Inspire]Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE) URL: http://inspire.jrc.ec.europa.eu
[ISO.IEC_13249-3:2011](1, 2) “Information technology – Database languages – SQL multimedia and application packages – Part 3: Spatial” URL: http://webstore.iec.ch/preview/info_isoiec13249-3%7Bed4.0%7Den.pdf
[ISO.IEC_9075:2008]“Information technology – Database languages – SQL”
[ISO_19125-1:2004]“Geographic information – Simple feature access – Part 1: Common architecture.” URL: http://portal.opengeospatial.org/files/?artifact_id=25355
[ISO_19125-2:2004]“Geographic information – Simple feature access – Part 2: SQL option” URL: http://portal.opengeospatial.org/files/?artifact_id=25354
[ISO_19136:2007]“Geographic information – Geography Markup Language (GML)” URL: http://www.opengeospatial.org/standards/gml