MonetDB

Introduction

MonetDB is an open-source database management system (DBMS) for high-performance applications in data mining, business intelligence, OLAP, scientific databases, XML Query, and text and multimedia retrieval. It has been developed at CWI since 1993. MonetDB often achieves a significant speed improvement for both relational/SQL and XML/XQuery databases over other open-source systems. It achieves this through innovations at all layers of a DBMS, e.g., a storage model based on vertical fragmentation (column store), a modern CPU-tuned query execution architecture, automatic and self-tuning indexes, run-time query optimization, and a modular software architecture.

Since the invention of relational database technology more than 30 years ago, both the hardware systems that database management systems (DBMSs) run on and the applications that use DBMSs to manage their data have changed significantly, and they will keep changing in the future. However, most publicly available general-purpose DBMSs, both commercial and open-source, still follow the principal architectural design as it was created 30 years ago. The long-term research mission of the database architectures group at CWI is to reconsider all aspects of database architecture in the light of changing hardware architectures and application requirements, developing novel techniques that exploit modern hardware efficiently and fulfill the ever more challenging needs of contemporary and future data management applications [1].

MonetDB is the vehicle to disseminate the results of this pioneering database architecture research in a publicly available full-fledged open-source DBMS such that other researchers, application developers and end-users can exploit and benefit from them.

Technical features

From a user's point of view, MonetDB is a full-fledged relational DBMS that supports the SQL:2003 standard and provides standard client interfaces such as ODBC and JDBC, as well as application programming interfaces for various programming languages, including C, Python, Java, Ruby, Perl, and PHP.

MonetDB is designed to exploit the large main memories of modern computer systems effectively and efficiently during query processing, while the database is persistently stored on disk. With respect to performance, MonetDB mainly focuses on analytical and scientific workloads that are read-dominated and where updates mostly consist of appending new data to the database in large chunks at a time. However, MonetDB also provides complete support for transactions in compliance with the SQL:2003 standard.

Internally, the design, architecture and implementation of MonetDB reconsider all aspects and components of classical database architecture and technology. The aim is to achieve the aforementioned performance benefits by effectively exploiting the potential of modern hardware, and to enable extensibility in support of new application requirements.

MonetDB is one of the first publicly available DBMSs designed to exploit column-store technology. Traditionally, relational database systems store their data row-wise, i.e., per data tuple, the respective values of all attributes are stored together in a database record. In contrast, a column-store stores the data column-wise, i.e., per attribute, the respective values of all tuples are stored together in one array. The first benefit of column-wise over row-wise storage is reduced cost for I/O and data transport. In particular, in scientific and analytical workloads, queries often access the attribute values of many or all tuples of a table, but only for a small subset of the table's attributes. Traditional transactional workloads are the opposite: queries mostly access only one or at most very few tuples, but then all attribute values of those tuples.
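The difference between the two layouts can be illustrated with a minimal Python sketch. It is not MonetDB code: the table contents, the column names, and the helper functions (to_columns, avg_age_row_store, avg_age_column_store) are assumptions made up for illustration, and plain Python lists stand in for on-disk records and arrays. The sketch counts how many attribute values an analytical query over a single attribute has to touch in each layout.

```python
# Illustrative only: a tiny table held row-wise (one tuple per record).
rows = [
    (1, "alice", 30, 52000.0),
    (2, "bob", 41, 61000.0),
    (3, "carol", 25, 48000.0),
]
COLUMN_NAMES = ("id", "name", "age", "salary")


def to_columns(rows, names):
    """Vertically fragment the rows into one list (array) per attribute."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


columns = to_columns(rows, COLUMN_NAMES)


def avg_age_row_store(rows):
    """Average age over row-wise data; whole records are read."""
    # Every record is fetched in full, although only 'age' is needed.
    values_touched = sum(len(row) for row in rows)
    average = sum(row[2] for row in rows) / len(rows)
    return average, values_touched


def avg_age_column_store(columns):
    """Average age over column-wise data; only the 'age' array is read."""
    ages = columns["age"]
    return sum(ages) / len(ages), len(ages)


print(avg_age_row_store(rows))        # (32.0, 12)
print(avg_age_column_store(columns))  # (32.0, 3)
```

Both functions return the same average, but the row-wise version touches all 12 stored values while the column-wise version touches only the 3 values of the one attribute the query needs.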

MonetDB consistently exploits the column-store architecture beyond data storage and I/O efficiency. In MonetDB, all query processing internally happens on a columnar data representation. Multi-attribute tuples are reconstructed only just before the final query result is returned to the client. This approach enables a very lean query evaluation architecture that is highly tuned to minimize computational (CPU) costs. Moreover, carefully designed cache-conscious data structures and algorithms make optimal use of hierarchical memory systems [2].
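The idea of processing whole columns and reconstructing tuples only at the very end can be sketched as follows. This is a simplified illustration in Python, not MonetDB's actual operators: the functions select_gt, fetch, and names_and_salaries_older_than, as well as the sample data, are hypothetical. The sketch evaluates a query of the form "name and salary of everyone older than a threshold" one column at a time.

```python
# Illustrative column-wise table: one list per attribute, aligned by position.
columns = {
    "name": ["alice", "bob", "carol"],
    "age": [30, 41, 25],
    "salary": [52000.0, 61000.0, 48000.0],
}


def select_gt(column, threshold):
    """Scan a single column and return the positions of qualifying values."""
    return [pos for pos, value in enumerate(column) if value > threshold]


def fetch(column, positions):
    """Project a single column onto the given positions."""
    return [column[pos] for pos in positions]


def names_and_salaries_older_than(columns, threshold):
    # Step 1: the selection looks only at the 'age' column.
    positions = select_gt(columns["age"], threshold)
    # Step 2: each output attribute is fetched as a separate column.
    names = fetch(columns["name"], positions)
    salaries = fetch(columns["salary"], positions)
    # Step 3: tuples are stitched together only for the final result.
    return list(zip(names, salaries))


print(names_and_salaries_older_than(columns, 28))
# [('alice', 52000.0), ('bob', 61000.0)]
```

Each intermediate step is a tight loop over a single array of values, which is the kind of access pattern that suits CPU caches well.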

The design of MonetDB also supports extensibility of the whole system at various levels. Via extension modules, implemented in C or MonetDB's MAL language, new data types and new algorithms can be added to the system to support special application requirements that go beyond the SQL standard, or enable efficient exploitation of domain-specific data characteristics. Additionally, opening the traditionally closed and monolithic query optimization and execution engine, MonetDB provides a modular multi-tier query optimization framework. Optimizer pipelines can be configured and extended to effectively exploit domain-specific data and workload characteristics.

In addition, MonetDB offers novel techniques that efficiently support a priori unknown or rapidly changing workloads over large data volumes. Both the fine-grained, flexible intermediate-result caching technique "recycling" [3] and the adaptive incremental indexing technique "database cracking" [4] require minimal overhead and investment to provide maximal benefit for the actual workload and the portion of the data that is actually accessed.
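The basic idea behind adaptive incremental indexing can be sketched in a few lines of Python. The following is a toy model written for this overview, under simplifying assumptions: it is not the algorithm or code from [4], the class name CrackedColumn and its methods are hypothetical, and it partitions by building new lists rather than swapping values in place. Each range query splits ("cracks") only the pieces of the column that contain its bounds and records the split points, so the column becomes gradually more ordered in exactly the value ranges that queries touch.

```python
import bisect


class CrackedColumn:
    """Toy cracker column: a copy of the data, reorganized by the queries."""

    def __init__(self, values):
        self.data = list(values)  # copy of the column, reordered over time
        self.pivots = []          # sorted pivot values seen so far
        self.bounds = []          # bounds[i]: first position with value >= pivots[i]

    def _crack(self, pivot):
        """Ensure all values < pivot precede all values >= pivot."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.bounds[i]  # already cracked here, nothing to do
        # Only the piece between the neighbouring split points is touched.
        lo = self.bounds[i - 1] if i > 0 else 0
        hi = self.bounds[i] if i < len(self.bounds) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        split = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.bounds.insert(i, split)
        return split

    def range_query(self, low, high):
        """Return all values v with low <= v < high (assumes low <= high)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


col = CrackedColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
print(sorted(col.range_query(5, 13)))  # [7, 9, 12]
print(sorted(col.range_query(2, 8)))   # [2, 3, 4, 7]
print(col.pivots)                      # [2, 5, 8, 13]
```

No index is built up front. The first query pays for partitioning the whole column once, and later queries only reorganize the smaller pieces that their bounds fall into.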

Finally, the core architecture of MonetDB has proved to provide efficient support not only for the relational data model and SQL, but also for, e.g., XML and XQuery [5]. Along the same lines, support for RDF and SPARQL, as well as arrays, will be developed within the context of the TELEIOS project.

References

[1] S. Manegold, M. L. Kersten, and P. A. Boncz.
Database Architecture Evolution: Mammals Flourished long before Dinosaurs became Extinct.
In Proceedings of the International Conference on Very Large Data Bases (VLDB), pages 1648–1653, Lyon, France, August 2009.
10-year Best Paper Award paper for [6].
[2] P. A. Boncz, M. L. Kersten, and S. Manegold.
Breaking the Memory Wall in MonetDB.
Communications of the ACM, 51(12):77–85, December 2008.
[3] M. Ivanova, M. L. Kersten, N. J. Nes, and R. A. Goncalves.
An Architecture for Recycling Intermediates in a Column-store.
ACM Transactions on Database Systems, 35(4):1–41, December 2010.
[4] S. Idreos, M. L. Kersten, and S. Manegold.
Database Cracking.
In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR), pages 68–78, Asilomar, CA, USA, January 2007.
[5] P. A. Boncz, T. Grust, M. van Keulen, S. Manegold, J. Rittinger, and J. Teubner.
MonetDB/XQuery: A Fast XQuery Processor Powered by a Relational Engine.
In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 479–490, Chicago, IL, USA, June 2006.
[6] P. A. Boncz, S. Manegold, and M. L. Kersten.
Database Architecture Optimized for the New Bottleneck: Memory Access.
In Proceedings of the International Conference on Very Large Data Bases (VLDB), pages 54–65, Edinburgh, Scotland, UK, September 1999.
Received 10-year Best Paper Award at VLDB 2009 [1].