Efficient Integration of Query Algebra Modules into an Extensible Database Framework

Stefan Dieker

Praktische Informatik IV, FernUniversität Hagen
D-58084 Hagen

Abstract: In recent years, there has been a growing demand for non-standard database systems. CAD systems; geographical, spatial, and spatio-temporal information systems; multimedia applications; and management systems for semistructured data typically perform data-intensive operations, offer concurrent access to their data, and provide query capabilities. Implementing them on top of a database management system therefore seems natural. Conventional relational systems, however, do not meet the requirements of such applications because of the restrictions of the relational data model.

This thesis presents SECONDO, a new generic environment supporting the implementation of database systems for a wide range of data models and query languages. On the one hand, this framework is more flexible than common extensible and object-relational systems, because it offers the full extensibility of second-order signature, the formal basis for data and query language definitions in SECONDO. On the other hand, it is much more complete and structured than database system toolkits. Extensibility is provided by algebra modules, which define and implement new types and operators and register them with the system frame by means of support functions.
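
The division of work between algebra modules and the system frame can be sketched as follows. This is a minimal Python illustration, not SECONDO's actual interface: the classes `TypeConstructor`, `Operator`, `Algebra` and `SystemFrame`, the support functions `in_func`, `out_func`, `type_map` and `value_map`, and the example `PointAlgebra` are hypothetical names chosen for this sketch. The point it tries to show is that the frame only stores and dispatches to functions supplied by the modules, so it needs no knowledge of the types themselves.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# All names below are hypothetical; they only mimic the idea of support functions.

@dataclass
class TypeConstructor:
    name: str
    in_func: Callable[[list], object]    # external (list) representation -> internal value
    out_func: Callable[[object], list]   # internal value -> external (list) representation

@dataclass
class Operator:
    name: str
    type_map: Callable[[List[str]], Optional[str]]  # argument types -> result type, or None
    value_map: Callable[..., object]                 # computes the result from argument values

@dataclass
class Algebra:
    name: str
    types: List[TypeConstructor] = field(default_factory=list)
    operators: List[Operator] = field(default_factory=list)

class SystemFrame:
    """Holds what the registered algebras contribute; knows nothing about any data model."""
    def __init__(self) -> None:
        self.types: Dict[str, TypeConstructor] = {}
        self.operators: Dict[str, List[Operator]] = {}

    def register(self, algebra: Algebra) -> None:
        for tc in algebra.types:
            self.types[tc.name] = tc
        for op in algebra.operators:                 # overloading: several variants per name
            self.operators.setdefault(op.name, []).append(op)

    def _variant(self, op_name: str, arg_types: List[str]) -> Operator:
        for op in self.operators.get(op_name, []):
            if op.type_map(arg_types) is not None:   # first variant whose type mapping accepts
                return op
        raise TypeError(f"no variant of {op_name} accepts {arg_types}")

    def result_type(self, op_name: str, arg_types: List[str]) -> str:
        return self._variant(op_name, arg_types).type_map(arg_types)

    def apply(self, op_name: str, arg_types: List[str], *args: object) -> object:
        return self._variant(op_name, arg_types).value_map(*args)

# A tiny hypothetical algebra with one type constructor and one operator.
point_algebra = Algebra(
    "PointAlgebra",
    types=[TypeConstructor("point",
                           in_func=lambda lst: (float(lst[0]), float(lst[1])),
                           out_func=lambda p: [p[0], p[1]])],
    operators=[Operator("distance",
                        type_map=lambda ts: "real" if ts == ["point", "point"] else None,
                        value_map=lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1]))],
)

frame = SystemFrame()
frame.register(point_algebra)
```

With the module registered, `frame.result_type("distance", ["point", "point"])` yields `"real"`, and `frame.apply("distance", ["point", "point"], (0.0, 0.0), (3.0, 4.0))` evaluates to `5.0`.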

The evaluation of a query plan is controlled by the query processor, which constructs an operator tree for the plan and processes it by calling the appropriate support functions. This seemingly simple strategy is complicated by parameter expressions and stream processing. We show how the SECONDO design enables the query processor to handle both in a completely data-model-independent manner, without sacrificing clarity and simplicity, even at the implementation level.
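
The interplay of operator tree, parameter expressions, and streams might look roughly like the following Python sketch. It assumes a demand-driven stream protocol in which the query processor sends `open`, `request`, and `close` messages to a stream operator's evaluation function, and it binds the formal parameter of a parameter function before each evaluation of the function body. All class and operator names (`Node`, `Fun`, `Param`, `feed`, `filter_`, `count`, `gt`) are hypothetical and are not taken from the thesis.

```python
from typing import Any, Callable

OPEN, REQUEST, CLOSE = "open", "request", "close"   # hypothetical stream messages
END = object()                                      # marks the end of a stream

class Const:
    def __init__(self, value: Any) -> None:
        self.value = value

class Param:
    """Formal parameter of a parameter function; bound by the query processor."""
    def __init__(self) -> None:
        self.value = None

class Fun:
    """Parameter expression: a subtree evaluated once per binding of its parameter."""
    def __init__(self, param: Param, body: Any) -> None:
        self.param, self.body = param, body

class Node:
    """Operator-tree node: an evaluation function plus its argument subtrees."""
    def __init__(self, vm: Callable, *args: Any) -> None:
        self.vm, self.args, self.state = vm, args, None

class QueryProcessor:
    def eval(self, t: Any) -> Any:
        if isinstance(t, (Const, Param)):
            return t.value
        return t.vm(self, t, None)          # ordinary operator: compute its value

    def call(self, fun: Fun, arg: Any) -> Any:
        fun.param.value = arg               # bind the parameter, then evaluate the body
        return self.eval(fun.body)

    def send(self, node: Node, msg: str) -> Any:
        return node.vm(self, node, msg)     # stream operator: open / request / close

def feed(qp, node, msg):                    # list -> stream
    if msg == OPEN:
        node.state = iter(qp.eval(node.args[0]))
    elif msg == REQUEST:
        return next(node.state, END)
    elif msg == CLOSE:
        node.state = None

def filter_(qp, node, msg):                 # stream x (elem -> bool) -> stream
    src, pred = node.args
    if msg == OPEN:
        qp.send(src, OPEN)
    elif msg == REQUEST:
        while (x := qp.send(src, REQUEST)) is not END:
            if qp.call(pred, x):            # evaluate the parameter expression for x
                return x
        return END
    elif msg == CLOSE:
        qp.send(src, CLOSE)

def count(qp, node, msg):                   # stream -> int, consumes its input
    src = node.args[0]
    qp.send(src, OPEN)
    n = 0
    while qp.send(src, REQUEST) is not END:
        n += 1
    qp.send(src, CLOSE)
    return n

def gt(qp, node, msg):                      # int x int -> bool
    return qp.eval(node.args[0]) > qp.eval(node.args[1])

# count(filter(feed([1..10]), fun(x) x > 5))
x = Param()
tree = Node(count,
            Node(filter_, Node(feed, Const(list(range(1, 11)))),
                 Fun(x, Node(gt, x, Const(5)))))
```

Here the predicate `x > 5` is a parameter expression whose subtree is evaluated once per stream element, and `QueryProcessor().eval(tree)` returns 5 (the elements 6 to 10). Elements are pulled through the tree one at a time, so no intermediate list is materialized.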

The application programming interface (API) is another important criterion for the quality of an extensible database framework. Modern database APIs usually provide a large object abstraction. Very often, however, large objects are not used as standalone entities but are embedded within an aggregate of values of different types. Query performance then depends strongly on how each large object is represented: either inlined within the aggregate or swapped out to a separate object. We present a sound and general extension of the large object interface that computes an efficient representation of large objects automatically.
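
A very simple version of such an automatic choice is sketched below in Python. It is a hypothetical greedy heuristic, not the interface extension developed in the thesis: all large objects of an aggregate start out inlined, and the largest ones are swapped out, each leaving behind a fixed-size reference, until the aggregate fits into one page. `PAGE_SIZE`, `REF_SIZE`, and `choose_representation` are assumptions of this sketch.

```python
from typing import Dict

PAGE_SIZE = 4096  # hypothetical page budget for one aggregate (bytes)
REF_SIZE = 16     # hypothetical size of a reference to a swapped-out object (bytes)

def choose_representation(fixed_size: int, lob_sizes: Dict[str, int],
                          page_size: int = PAGE_SIZE) -> Dict[str, str]:
    """Decide 'inline' or 'external' for each large object of one aggregate.

    Start with every large object inlined; while the aggregate exceeds one page,
    swap out the largest remaining inlined object and replace it by a reference."""
    decision = {name: "inline" for name in lob_sizes}
    size = fixed_size + sum(lob_sizes.values())
    for name in sorted(lob_sizes, key=lob_sizes.get, reverse=True):
        if size <= page_size:
            break
        decision[name] = "external"
        size += REF_SIZE - lob_sizes[name]   # object leaves, a reference stays behind
    return decision
```

For a record with 100 bytes of fixed-size attributes and large objects `name` (40 bytes), `polygon` (20000 bytes), and `thumbnail` (3000 bytes), only `polygon` is swapped out. Swapping out the largest objects first keeps the number of swapped-out objects, and hence of extra accesses when all attributes are needed, as small as possible.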

Furthermore, complex data models often require nested large objects, and access performance is strongly influenced by the clustering strategy used to store the resulting tree of large objects. In the next step, we therefore extend our work on large object interfaces with a general mechanism that clusters nested large objects automatically. Unlike other approaches, it makes only very weak assumptions about the implemented data model.
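
How a tree of nested large objects might be clustered is illustrated by the Python sketch below. It uses a generic depth-first packing heuristic, which is an assumption of this example and not necessarily the mechanism of the thesis: nodes are visited in pre-order and placed on the current page while they fit, so that a parent tends to share a page with the first of its descendants. `LobNode` and `cluster` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LobNode:
    """One large object in a tree of nested large objects (hypothetical)."""
    name: str
    size: int
    children: List["LobNode"] = field(default_factory=list)

def cluster(root: LobNode, page_size: int) -> List[List[str]]:
    """Assign the nodes of the tree to pages by pre-order (depth-first) packing.

    A node goes onto the current page if it still fits, otherwise a new page is
    started. Assumes every single node fits into one page."""
    pages: List[List[str]] = [[]]
    free = page_size
    stack = [root]
    while stack:
        node = stack.pop()
        if node.size > page_size:
            raise ValueError(f"{node.name} exceeds the page size")
        if node.size > free:                    # current page full: open a new one
            pages.append([])
            free = page_size
        pages[-1].append(node.name)
        free -= node.size
        stack.extend(reversed(node.children))   # visit children left to right
    return pages
```

For a root of 100 bytes with children `a` (300 bytes, itself containing `a1` of 200 bytes) and `b` (400 bytes), and 512-byte pages, the result is `[["root", "a"], ["a1"], ["b"]]`.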

Above the physical level, where queries combine objects and operators as terms over an executable algebra, SECONDO offers a more user-friendly descriptive level. Each descriptive object or operator has one or more physical variants. The last issue addressed in this thesis is the transformation of descriptive terms into executable ones, referred to as evaluation plans. Our approach provides both a cost-based, bottom-up plan generation strategy and efficient handling of physical plan properties.
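
Cost-based, bottom-up plan generation with physical properties can be illustrated by the following Python sketch in the style of the interesting-orders technique known from System R: for every descriptive subterm, the cheapest physical plan is kept per physical property (here only `sorted` and `unsorted`), and a sort enforcer can establish the property when needed. The physical variants (`feed`, `indexscan`, `hashjoin`, `mergejoin`, `sort`), the cost formulas, and the cardinalities in `CARD` are all hypothetical and serve only to show the mechanism.

```python
import math
from typing import Dict, Tuple

Plan = Tuple[float, int, str]   # (cost, estimated cardinality, plan text)
Plans = Dict[str, Plan]         # cheapest plan per physical property

CARD = {"R": 1000, "S": 100}    # hypothetical relation cardinalities

def add(plans: Plans, prop: str, cost: float, card: int, text: str) -> None:
    """Keep only the cheapest plan for each physical property."""
    if prop not in plans or cost < plans[prop][0]:
        plans[prop] = (cost, card, text)

def enforce_sorted(plans: Plans) -> None:
    """Sort enforcer: derive a 'sorted' plan from the cheapest 'unsorted' one."""
    if "unsorted" in plans:
        cost, n, text = plans["unsorted"]
        add(plans, "sorted", cost + n * math.log2(max(n, 2)), n, f"sort({text})")

def best_plans(term: tuple) -> Plans:
    """Generate physical plans for a descriptive term, bottom-up."""
    plans: Plans = {}
    if term[0] == "scan":                            # descriptive: scan a relation
        n = CARD[term[1]]
        add(plans, "unsorted", n, n, f"feed({term[1]})")
        add(plans, "sorted", 1.5 * n, n, f"indexscan({term[1]})")
    elif term[0] == "join":                          # descriptive: equi-join
        left, right = best_plans(term[1]), best_plans(term[2])
        for lc, ln, lt in left.values():             # hash join accepts any input order
            for rc, rn, rt in right.values():
                add(plans, "unsorted", lc + rc + 2 * (ln + rn), max(ln, rn),
                    f"hashjoin({lt}, {rt})")
        if "sorted" in left and "sorted" in right:   # merge join needs sorted inputs
            lc, ln, lt = left["sorted"]
            rc, rn, rt = right["sorted"]
            add(plans, "sorted", lc + rc + ln + rn, max(ln, rn),
                f"mergejoin({lt}, {rt})")
    enforce_sorted(plans)
    return plans

plans = best_plans(("join", ("scan", "R"), ("scan", "S")))
cheapest = min(plans.values())                       # tuples compare by cost first
```

For `("join", ("scan", "R"), ("scan", "S"))`, the sketch keeps a hash join at cost 3300 for unsorted output and `mergejoin(indexscan(R), indexscan(S))` at cost 2750 for sorted output. The latter is the cheapest plan overall even though `indexscan(R)` is not the cheapest plan for its own subterm, which is why plans with a useful property must be retained during bottom-up generation.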

Keywords: extensible database systems, algebraic specifications, extensible query processing, extensible query optimization, extension packages, application programming interface, large objects, clustering

Published: Mensch und Buch Verlag, Berlin 2001, ISBN 3-89820-226-7