PhD Position (CIFRE): Relational query optimization for multidimensional data
Context
Query performance on analytic workloads is heavily influenced by the quality of the query optimizer, a fact that has prompted several decades of research and advancement on the optimizer's different components. When it comes to query optimization and plan cost estimation, no one size fits all [2]. Contributing factors include relational operators and enumeration algorithms that are often vendor-specific, the complexity of the analytic workload, and the underlying system architecture.
Multidimensional data models are very popular in scientific processing and machine learning workloads [3], as they help capture variable data. Prevalent data types include arrays and dictionaries (or maps), which are now also becoming an important part of financial workloads. Query evaluation on multidimensional workloads often involves multiple levels of aggregation over these sets of data. A relevant work in this area [4] provides a technique for lazily evaluating aggregates by fusing the group-by and join operators. In [5], the authors propose a set of low-level plan operators for SQL-style statistical expressions that modularizes aggregate implementations whenever multiple aggregates are combined.
Work in this area provides the foundation for query evaluation on multidimensional data. The goal of this research is to go further by applying more robust optimization techniques and analyzing their cost.
Research Objectives
The primary focus of this work is to develop optimization techniques for queries on multidimensional workloads. During the first phase of this research, the candidate will conduct a thorough state-of-the-art study focusing on cost models and optimization techniques applicable to this domain. In the next phase, the candidate will investigate the query constructs that are a source of bottlenecks for query evaluation. This means, in particular, identifying the most important logical constructs and the combinations of them that occur frequently in practice and are the most interesting to optimize.
Candidate’s Profile
The ideal candidate for this role must possess an MSc in Computer Science or a closely related field. The candidate must be proactive and highly motivated to carry out advanced research, with well-developed analytical problem-solving ability. A good understanding of functional programming in Scala (or a willingness to learn) is required.
About the Team
This research will be carried out jointly by Opensee and the Tyrex team at Inria Grenoble Rhône-Alpes.
Opensee (opensee.io) is a fintech company with headquarters in Paris, offering instant and self-service analytics to financial institutions, helping them better respond to regulatory and business requirements and turn their big data challenges into competitive advantage. The Core Engine team at Opensee researches, experiments with, maintains, and develops features for the query engine, which is at the core of data processing within Opensee.
The Tyrex team (tyrex.inria.fr) is affiliated with CNRS LIG, Inria, UGA, and Grenoble INP, and is located in Montbonnot, near Grenoble, France. The Tyrex research group focuses on the foundations of the next generation of data analytics and data-centric programming systems and has produced many outputs in research and industrial applications. The candidate will graduate from the University of Grenoble Alpes (UGA).
Type of position: 3-year contract.
How to apply
The candidate should send an email to Pierre Genevès (pierre.geneves@inria.fr) and Nabil Layaïda (nabil.layaida@inria.fr) with an application file composed of:
- a detailed CV
- a motivation letter
- an academic transcript (relevé de notes)
References