Profile: Prof. Dr. Bernhard Seeger

Posted by Bernhard Seeger on 5 February 2009 - 9:16

Publication Type:

Researcher

Authors:

Source:

Datenbanksysteme, Universität Marburg (2009)

URL:

http://dbs.mathematik.uni-marburg.de/Home/People/Professor

Summary:

Our research group is mainly interested in techniques that support efficient query processing on large databases. For the last 15 years, our focus has been on developing techniques for object-relational databases, with a strong emphasis on spatial, temporal and spatio-temporal index structures. Recent improvements in network technology make it possible to query massive remote data sources. In response, we have broadened the focus of our work in two directions. On the one hand, we investigate the management of data streams produced by large numbers of small sensors. On the other hand, we consider the entire WWW as a geospatial database and explore techniques for querying it in a warehouse-like fashion.

 

PIPES - stream processing

Over the next years, a tremendous number of sensors will be installed in our environment. These devices continuously deliver more and more data as streams. In general, a large number of streams is required to provide the desired information, and each stream outputs a large number of data items. Ideally, users pose ad-hoc queries on streams, much as they would in a traditional DBMS. There are, however, fundamental differences: a query runs until the user explicitly stops it, and streaming data items are generally valid for only a short period of time. This leads to a substantial change in query processing. Systems for streams are therefore primarily designed for the management of queries, of which there might be millions running simultaneously, whereas data items of the streams are kept in the system only temporarily.
Our research group addresses the following issues in stream processing:

  • Based on a temporal interval semantics, we are concerned with the implementation of data-driven query operators, i.e., operators whose processing step is triggered whenever a data item arrives.
  • Optimizing queries on streams is a real challenge when millions of rather complex queries run simultaneously. For this scenario, scalable multi-query optimization has become the central issue.
  • Even though there are heuristics for constructing good query plans, the performance of a plan may drop at runtime. This can be caused by fundamental changes in the system, e.g., changes in the quantity and quality of arriving data items. We therefore need a mechanism to dynamically adapt a query plan at runtime. In general, it is sufficient to redistribute resources among the operators, but in the worst case a rearrangement of the entire query plan might be necessary.
     

We address all of these research issues in a project called PIPES, which is currently supported by the German Research Foundation (DFG).
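To illustrate what a data-driven operator over interval-stamped items might look like, the following is a minimal sketch in Python. It is not taken from PIPES: the names Element, Sink and IntervalJoin are hypothetical, and the sketch assumes that every item carries a half-open validity interval [start, end) and that items arrive in non-decreasing order of their start timestamps across both inputs.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Element:
    """A stream item that is valid during the half-open interval [start, end)."""
    value: Any
    start: int
    end: int


class Sink:
    """Terminal operator that simply collects whatever is pushed into it."""

    def __init__(self) -> None:
        self.results: List[Element] = []

    def process(self, element: Element, port: int = 0) -> None:
        self.results.append(element)


class IntervalJoin:
    """Symmetric join: every arriving item triggers one processing step."""

    def __init__(self, predicate: Callable[[Any, Any], bool], downstream: Sink) -> None:
        self.predicate = predicate
        self.downstream = downstream
        # One buffer of still-relevant items per input port (0 = left, 1 = right).
        self.state: Tuple[List[Element], List[Element]] = ([], [])

    def process(self, element: Element, port: int) -> None:
        other = self.state[1 - port]
        # Assumption: start timestamps never decrease, so items that ended
        # at or before element.start cannot overlap any later arrival.
        other[:] = [e for e in other if e.end > element.start]
        for stored in other:
            left, right = (element, stored) if port == 0 else (stored, element)
            if self.predicate(left.value, right.value):
                # A result is valid only while both of its inputs are valid.
                start = max(left.start, right.start)
                end = min(left.end, right.end)
                if start < end:
                    self.downstream.process(Element((left.value, right.value), start, end))
        self.state[port].append(element)


sink = Sink()
join = IntervalJoin(lambda a, b: a == b, sink)
join.process(Element("room-1", 0, 10), port=0)  # arrives on the left input
join.process(Element("room-1", 5, 20), port=1)  # arrives on the right input
```

No component ever asks the join for its next result; the arrival of an item is what drives the computation, and the join state shrinks as validity intervals expire.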

 

Spatially aware querying of the WWW

The World Wide Web is the largest collection of geospatial data, yet this resource goes almost unexploited. Using the Internet as a reliable and fast geospatial database requires considerable effort. However, little work has been done in this area so far, and a general direction of research and development has not yet been established. This project addresses essential questions in this field. First, a suitable architecture is required for an efficient and effective mapping of Internet resources to geographic locations. Second, this mapping architecture serves as the foundation for designing a geospatial search engine that differs fundamentally from its traditional counterparts, particularly with respect to selecting and ranking search results. Third, we examine the problem of geospatial analyses based on data gathered by localized web crawls. Such analyses support a new class of queries and offer a substantial cost reduction compared to traditional analyses based on conventional data-collection techniques.

 

Advanced query processing and optimization

Though database systems are considered a mature technology, demanding new applications still pose research challenges. The efficient processing of joins has long been a major issue, but surprisingly little work has been done on supporting complex join predicates such as similarity. Similarity joins are important when users want to integrate different data sources. Another application of similarity joins arises in data mining, where they are used to detect similar patterns. We are particularly interested in efficiently supporting such unusual joins, especially when the input consists of more than two relations and the output is produced progressively.
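To make the notion of a progressive multi-way similarity join concrete, here is a minimal Python sketch. It is a naive nested-loop baseline with early pruning, not one of our published algorithms. The function name similarity_join, the use of Euclidean distance, and the rule that all pairs within a result tuple must lie within a threshold eps are assumptions chosen for illustration.

```python
from math import dist
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def similarity_join(relations: Sequence[Sequence[Point]],
                    eps: float) -> Iterator[Tuple[Point, ...]]:
    """Yield one point per relation such that all pairs are within eps.

    Results are yielded as soon as they are found, so a consumer can
    start working before the whole join has been computed.
    """

    def extend(prefix: List[Point], depth: int) -> Iterator[Tuple[Point, ...]]:
        if depth == len(relations):
            yield tuple(prefix)
            return
        for candidate in relations[depth]:
            # Prune early: a candidate that is too far from any point
            # already chosen cannot be part of a valid result.
            if all(dist(candidate, chosen) <= eps for chosen in prefix):
                prefix.append(candidate)
                yield from extend(prefix, depth + 1)
                prefix.pop()

    yield from extend([], 0)


r = [(0.0, 0.0), (5.0, 5.0)]
s = [(0.5, 0.0), (5.0, 5.5)]
t = [(0.0, 0.5), (9.0, 9.0)]
first_match = next(similarity_join([r, s, t], eps=1.0))
```

Because the join is written as a generator, asking for the first match does only as much work as is needed to find it. A practical implementation would replace the inner scans with index lookups.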

 

Index structures

One of the subjects we are best known for is index structures. Our R*-tree and MVBT (multiversion B-tree) are index structures that are already available in commercial systems such as Oracle. For many years we have been studying the design and evaluation of heuristics for improving the R*-tree. Moreover, bulk operations such as loading a tree from a given set of objects have been an important topic for our research group. Indexing and storing XML data is another subject we are working on. One major focus has been on native storage structures for XML and on supporting bulk loading in our XML storage. Recently, we have extended our studies to new application fields such as preference databases and to demanding new technologies such as location-based services and peer-to-peer systems.
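The idea of bulk loading, i.e., building a tree bottom-up from a given set of objects rather than inserting them one by one, can be sketched as follows. This is a deliberately simple sort-and-pack scheme for an R-tree-like structure. It is neither the R*-tree with its insertion heuristics nor one of our bulk-loading algorithms, and the names Node, bulk_load and search, as well as the choice to sort by the x-coordinate of the center, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def union(rects: Sequence[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty set of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Rect
    children: list  # Rects in a leaf, Nodes in an inner node
    leaf: bool


def chunks(items: list, size: int) -> List[list]:
    return [items[i:i + size] for i in range(0, len(items), size)]


def bulk_load(rects: Sequence[Rect], capacity: int = 4) -> Node:
    """Build the tree level by level from sorted input."""
    if not rects or capacity < 2:
        raise ValueError("need at least one rectangle and capacity >= 2")
    # Sort by the x-coordinate of the center so that neighbours share a leaf.
    ordered = sorted(rects, key=lambda r: (r[0] + r[2]) / 2)
    level = [Node(union(c), c, True) for c in chunks(ordered, capacity)]
    while len(level) > 1:
        level = [Node(union([n.mbr for n in c]), c, False)
                 for c in chunks(level, capacity)]
    return level[0]


def search(node: Node, query: Rect) -> List[Rect]:
    """Return all stored rectangles that intersect the query window."""
    if not intersects(node.mbr, query):
        return []
    if node.leaf:
        return [r for r in node.children if intersects(r, query)]
    hits: List[Rect] = []
    for child in node.children:
        hits.extend(search(child, query))
    return hits


data = [(float(i), float(i), i + 1.0, i + 1.0) for i in range(10)]
tree = bulk_load(data, capacity=3)
hits = search(tree, (2.5, 2.5, 4.5, 4.5))
```

Packing sorted input fills every node except possibly the last one on each level, which is the main attraction of bulk loading over repeated insertion. Sorting on a single coordinate, as done here, tends to produce elongated bounding rectangles, so the query performance of this sketch should not be taken as representative.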

 

XXL (eXtensible and fleXible Library)

Though researchers in the database area are often interested in developing a prototype database system, we followed a different approach and developed a library called XXL, which may of course serve very well as a platform for building database systems. XXL provides the query processing functionality required for a database system, such as a set of demand-driven operators, a rich collection of index structures, and a rule-based optimizer. It supports the processing of both relational and XML data. All packages of XXL come with full documentation, so people outside our group can quickly familiarize themselves with its functionality. It is very important to us that we generally use XXL to implement the new techniques presented in our research papers. The reference implementations available in XXL allow for quick experimental comparisons, for example. We have found that XXL improves the quality and speed of our coding when implementing new ideas, since it provides a rich infrastructure of low- and high-level components. XXL is a living library to which new functionality is continuously added. The library is publicly available under the GNU LGPL.
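The term demand-driven means that an operator computes a result only when its consumer asks for one. The following Python sketch illustrates this with generators. It does not use or reproduce the XXL API, and the names select, project and sensor_readings are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def select(source: Iterable[T], predicate: Callable[[T], bool]) -> Iterator[T]:
    """Pass on only the items that satisfy the predicate."""
    for item in source:
        if predicate(item):
            yield item


def project(source: Iterable[T], mapping: Callable[[T], U]) -> Iterator[U]:
    """Transform each item as it is requested."""
    for item in source:
        yield mapping(item)


pulled: List[int] = []  # records which input items were actually read


def sensor_readings() -> Iterator[int]:
    """Hypothetical input; logs every item it hands out."""
    for i in range(1000):
        pulled.append(i)
        yield i


# Building the plan reads no input; work happens only on demand.
plan = project(select(sensor_readings(), lambda x: x % 2 == 0), lambda x: x * x)
read_before_demand = list(pulled)
first = next(plan)
second = next(plan)
```

Requesting two results causes only three input items to be read, because each operator pulls from its input just far enough to produce the next output. This is the opposite control flow of a data-driven operator, where the arrival of an item triggers the processing.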