Paris-Saclay Center for Data Science kick-off meeting

Start: 2014-06-30T09:00:00+02:00
End: 2014-06-30T18:45:00+02:00

Monday 30 June 2014, 09:00 → 18:45 Europe/Paris

Description

The goal of this meeting is to officially launch the Paris-Saclay Center for Data Science. We gather data providers and data analysts around the common theme of data science. The three external keynote talks and the seven talks given by members of the CDS cover a wide spectrum of topics on both domain sciences and data science.

The event will take place in the main auditorium of the Linear Accelerator Laboratory, building 200 on the UPSud Orsay campus. Information on getting to LAL is available here.

The event will be webcast here.

Contact: balazs.kegl@gmail.com

Participants

97

- 09:00
  
  Welcome
  
  Arrival of the participants with coffee and pastry.
- Introduction: Paris-Saclay and CDS
  
  Session chair: Balázs Kégl (LAL)
  - 1
    
    FCS and Université Paris-Saclay
    
    Speaker: Patricio Leboeuf (FCS)
    
    Slides
  - 2
    
    Center for Data Science
    
    Speaker: Balázs Kégl (LAL)
    
    Slides
- Session 1
  
  Session chair: Alexandre Gramfort (Telecom ParisTech, CNRS)
  - 3
    
    Machine learning for personalized genomics
    
    Rapid technological developments in biology, in particular in DNA sequencing technologies, allow us to collect large amounts of molecular data about the genome of each individual, and open the possibility of predicting drug response or evaluating the risk of various diseases from one's molecular identity. In this talk I will discuss some regularization-based approaches we have developed to estimate complex, high-dimensional predictive models from relatively few samples, in particular in cancer prognosis and toxicogenetics.
    
    Speaker: Jean-Philippe Vert (Mines ParisTech / Institut Curie)
    
    Slides
  - 10:50
    
    Coffee break
  - 4
    
    Enhancing functional neuroimaging with meta-analytic approaches
    
    Functional brain imaging offers a unique view on brain functional organization, which is broadly characterized by two features: the segregation of brain territories into functionally specialized regions, and the integration of these regions into networks of coherent activity. Among other observation modalities, magnetic resonance imaging yields a spatially resolved, yet noisy view of this organization. In this talk, I will discuss how the use of multiple datasets and machine learning tools can enhance the inference procedures that are necessary to go from data to knowledge on the brain.
    
    Speaker: Bertrand Thirion (INRIA / Neurospin)
    
    Slides
  - 5
    
    Data science in planetary science
    
    Remote sensing is the major technique for studying planetary environments in order to decipher the structure and evolution of solar system bodies. For a decade, spacecraft have acquired high-resolution spectra, high-resolution images, hyperspectral images, and multi-angular hyperspectral images. Both processing the raw data into high-level science results and visualizing such large amounts of data require innovative tools. Here I review some aspects of data science projects in planetary science, focusing on multi-angular hyperspectral imaging (~500 wavelengths), digital terrain models built with stereoscopic techniques from high-resolution images (~0.5 m/pixel), and data visualization.
    
    Speaker: Frédéric Schmidt (GEOPS / UPSud)
    
    Slides
  - 6
    
    Cosmology: from fundamental questions to computing challenges
    
    The Big Bang cosmological model provides a powerful framework to describe the evolution of the Universe. Despite tremendous theoretical and observational progress in the field, profound mysteries such as the nature of dark matter and dark energy remain to be unveiled. After a brief introduction to cosmology, I will give an overview of some of the large projects in astrophysics and cosmology planned for the next decade. These projects cover a broad range of the electromagnetic spectrum, from optical surveys (LSST, eBOSS, EUCLID), to future Cosmic Microwave Background (CMB) missions (CORE2) and next-generation radio interferometers (SKA). Some of the computing challenges faced by these projects will be highlighted, focusing on the data management and processing case of the LSST (Large Synoptic Survey Telescope).
    
    Speaker: Prof. Reza ANSARI (LAL-Univ.ParisSud, IN2P3-CNRS)
    
    Slides
- Lunch
- Session 2
  
  Session chair: Mr. Arnak Dalalyan (ENSAE CREST)
  - 7
    
    The autonomous search engine
    
    In this talk I will discuss work on learning to rank (LTR) for information retrieval, in which the goal is to automatically construct a model that ranks documents in response to a query. In traditional supervised machine learning approaches to the LTR problem, one selects a set of manually engineered ranking features and then learns the best way of combining them to obtain the most powerful ranking model that those features are capable of producing. In ongoing work on truly autonomous search engines, we are moving evaluation, learning and feature engineering to a weakly supervised paradigm, learning from the implicit feedback that naturally emerges as part of users' interactions with the search engine. I will discuss recent progress in each of these three dimensions: evaluation, learning and feature engineering.
    
    Speaker: Maarten de Rijke (University of Amsterdam)
  - 8
    
    The digital transition: applications of machine learning to marketing, engineering sciences, and medicine
    
    In every sector of human activity, the pervasiveness of sensors and the accumulation of digital information have raised novel intellectual challenges, dreams and fears. Recently, intensive research in the field of high-dimensional statistics, progress in the description and modeling of networks, and the second life of optimization theory have generated concepts and algorithms that make it possible to perform inference on complex data and also to think about new perspectives on interactions between experts or scientists from different fields. A major tension when addressing such issues from the viewpoint of applications is the balance between customization and reproducibility and, in my opinion, these two criteria should drive future innovations in the field of machine learning. In the talk, I will illustrate these ideas by going through a few recent achievements arising from interdisciplinary projects in the fields of digital marketing, fluid mechanics, and ethomics.
    
    Speaker: Nicolas Vayatis (ENS Cachan)
    
    Slides
  - 9
    
    Designing and learning features for music information retrieval
    
    This talk discusses a mix of concepts, problems and techniques at the crossroads of signal processing, machine learning and music. I will start by introducing content-based music information retrieval (MIR) as an important and challenging data science problem. Then, I will discuss recent work done at my lab on a variety of MIR problems such as automatic chord recognition, music structure analysis, cover song identification and instrument recognition. In the process of doing so, I'll review the impact of feature design for specific MIR tasks, suggest that existing feature extraction methods in audio can be re-conceptualized as deep, multi-layer and trainable systems combining affine transforms and subsampling operations, and show a few examples where deep learning matches or outperforms the current state of the art in music and sound classification. Finally, I’ll discuss open challenges and opportunities in the field.
    
    Speaker: Juan Pablo Bello (New York University / Telecom ParisTech)
  - 10
    
    Direct-touch interaction for scientific visualization
    
    Since the size and complexity of scientific datasets are growing at a very high rate, people are working on developing techniques to effectively depict and visualize them. However, it is frequently not sufficient to produce a single static visualization; instead, we have to support scientists in discovering aspects of the data that they did not know about. That means that we have to develop effective interactive visualization tools that support scientists in exploring their data. In my talk I will address the problem of interactively visualizing data that has an inherent mapping to the 3D spatial domain, such as MRI scans, physical simulations, or molecular models. Specifically, I use interfaces on large, touch-sensitive displays because they tend to give people the feeling of "being in control of their data." That means we face the problem of providing input on a two-dimensional surface which needs to be mapped to manipulations of the three-dimensional data space. I will talk about FI3D, a technique to navigate in 3D datasets and control 7 degrees of freedom with only one or two fingers being used simultaneously. Next, I will discuss the problem of spatial data selection, which is fundamental to further data analysis and also requires defining a 3D selection space using only input on a 2D plane. Finally, I will discuss a case study in which we integrated several different interaction techniques into a tool for fluid mechanics experts to explore their data. I will end my talk by pointing out some open problems and research challenges that we are currently facing.
    
    Speaker: Dr Tobias Isenberg (Inria)
- Coffee break
- Session 3
  
  Session chair: Balázs Kégl (LAL)
  - 11
    
    The data science challenges of particle physics
    
    Particle physics poses several unique challenges for data science, with multi-petabyte datasets, complex particle detectors, and the search for exceedingly rare signals in the data. The field is characterized by large, international collaborations, which require a high level of coordination. I will give an overview of our data science challenges and discuss the statistical aspects of the recent discovery of the Higgs boson, including the collaborative statistical modeling techniques that are transforming the field. I will identify places where our tools and techniques are quickly evolving or beginning to fail, as well as opportunities for fruitful collaboration with the nascent field of data science.
    
    Speaker: Kyle Cranmer (New York University)
    
    Slides
  - 12
    
    Challenges for data science initiatives – an innovation management perspective
    
    Center for Data Science (CDS) initiatives seem to be popping up all around the globe at the moment. Considering the data deluge phenomenon, the motivation behind such initiatives may seem trivial. However, a closer look reveals that the purpose, success conditions and managerial principles of CDS initiatives are much less clear. CDSs are neither private companies nor traditional research entities. What would be a suitable organizational model and philosophy, designed to avoid the pitfalls that other science-based movements have faced in the past? Beyond the seemingly trivial purpose of being (analytical) service providers, each such initiative needs to build its own strategy for survival, success and long-term impact. It also needs to accomplish this feat in a way that differentiates it from other initiatives. To this end, the body of knowledge produced by management science in the form of methods, organizational models and best practices can be helpful. This talk will focus on some potential pitfalls and the potential contribution of design theory and innovation management methods to CDS initiatives.
    
    Speaker: Akin Kazakci (Mines ParisTech)
    
    Slides
- Panel discussion
  
  Session chairs: Mr. Erwan Le Pennec (Ecole Polytechnique), Prof. Michalis Vazirgiannis (LIX Ecole Polytechnique)
  
  Slides