CLEAR: Compilation of Intermediate Languages into Efficient Big Data Runtimes

CLEAR is a research project funded by ANR (the French National Research Agency). It started in January 2017 and runs for 48 months.

Project Overview

This project addresses a fundamental challenge of our time: the construction of effective programming models and compilation techniques for the correct, efficient and scalable exploitation of large amounts of data. We study high-level specifications of data transformation and extraction pipelines that produce valuable knowledge from rich, heterogeneous data. We investigate how to synthesize code optimized for distributed big data platforms, with applications to smart cities, finance and healthcare in particular.

Major Updates

October 2019: We are making progress on algebraic reasoning that facilitates the optimization and synthesis of efficient distributed code; see our papers and forthcoming submissions for details. New code and papers have been delivered.

October 2018: Our work has an application in healthcare: we analyze very large amounts of electronic health records and train machine learning models that predict the risk of important clinical outcomes. See more in our Big Data Research article and in our follow-up paper presented at the DSAA conference in October 2018.

June 2018: The intermediate report is available, along with new deliverables.