

full record

The Researcher’s Guide to the Data Deluge: Querying a Scientific Database in Just a Few Seconds [ 2011 ]


Rights: IPR: CWI
DB ID: 18546
File upload: Author's version
Digital object: http://oai.cwi.nl/oai/asset/18546/18546B.pdf (author version)
Persistent Identifier: urn:NBN:nl:ui:18-18546 (resolver: http://persistent-identifier.org/, which leads to the CWI repository; use
http://persistent-identifier.org/?identifier=urn:nbn:nl:ui:18-18546 to refer to this repository item)
Open Access: A publicly accessible file is available in the CWI repository
Type: BibTeX type: inproceedings
OAI type: Article in monograph or in proceedings
CWI Research Group: INS1
Authors: Kersten, M.L. (1)
Idreos, S. (2)
Manegold, S. (3)
Liarou, E. (4)
Title: The Researcher’s Guide to the Data Deluge: Querying a Scientific Database in Just a Few Seconds
Abstract: There is a clear need nowadays for processing extremely large data sets.
This is especially true in the area of scientific data management, where we soon expect
data inputs on the order of multiple Petabytes.
However, current data management technology is not suitable for such data sizes.

In the light of such new database applications, we can rethink some of the strict
requirements database systems adopted in the past.
We argue that correctness is one such requirement: a critical property that is also responsible for performance degradation.
In this paper, we propose a new paradigm towards building database kernels
that may produce \emph{wrong but fast, cheap and indicative} results.
Fast response times are an essential component of data analysis for exploratory applications;
allowing for fast queries enables
the user to develop a ``feeling'' for the data through a series of ``painless'' queries, which eventually leads
to more detailed analysis in a targeted data area.

We propose a research path where a database kernel autonomously and on-the-fly
decides to reduce the processing requirements of a running query
based on workload, hardware and
environmental parameters.
It requires a complete redesign of database operators
and query processing strategy.
For example, typical and very common scenarios where query processing performance degrades significantly
are cases where a database operator has to spill data
to disk, is forced to perform random access, or has to follow long linked lists.
Here we ask the question: What if we simply avoid these steps, ``ignoring'' the side effect
on the correctness of the result?
Abstract format: latex
Language: en
Conference data:
title:    International Conference on Very Large Data Bases (VLDB) (37)
date(s):    2011, August 29 - September 1
location:    Seattle, WA, USA
Size: 13p.
Booktitle: Proceedings of International Conference on Very Large Data Bases 2011 (VLDB)
Year: 2011
Pages: 585 - 597
Note: Challenges & Visions Track Best Paper Award.
Non scientific stakeholder: n
Pages number: 13p.
Refereed: y
ORA creation date: 20110819025000
ORA modification date: 20160809122256
Mutation date: 2016-08-01
Darenet: x
Related preprint:
Other relations:
This publication is related to 1 not financed project
  • MonetDB

