The goal of this second annual CDS pitching day is to review progress on ongoing CDS projects, to prepare the next batch of projects, and to continue matching domain-science demand with data-science supply.
We gather data providers and data analysts around the common theme of data science. There will be reminder talks presenting the tools and platforms built by the CDS, review talks on ongoing projects, and pitching talks preparing new projects. The aim is to draw an overall picture of the demand for CDS tools and expertise in Saclay and to help PIs build their projects. More information is available in the call for contributions, in our previous project calls, and in the CDS proposal.
The event will take place in the main auditorium of the Digiteo Shannon building 660 (LRI). Information on getting to the venue is available here.
The future of nuclear energy is very uncertain and actively discussed. To support these discussions and allow informed choices, we need to simulate electro-nuclear scenarios with a high level of precision. I will present the process used to simulate such scenarios and describe the two key models needed: the irradiation model and the fuel creation model. For each of these two models, I will present the physics problem it has to solve, why standard numerical meta-models are not sufficient, and what we would need in order to continue the development of our studies.
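To make the role of a meta-model concrete, here is a minimal sketch of the general idea: replace an expensive simulation by a cheap surrogate fitted on a few evaluations. The function and sizes below are hypothetical toys, not the actual irradiation or fuel creation models.

```python
import numpy as np

# Toy stand-in for an expensive simulation (hypothetical function;
# the real irradiation and fuel creation models are far more complex).
def expensive_simulation(burnup):
    return np.exp(-0.5 * burnup) + 0.1 * burnup

# Sample the simulator at a few training points...
x_train = np.linspace(0.0, 4.0, 9)
y_train = expensive_simulation(x_train)

# ...and fit a cheap polynomial meta-model (surrogate) on them.
coeffs = np.polyfit(x_train, y_train, deg=3)
surrogate = np.poly1d(coeffs)

# The surrogate can then be evaluated thousands of times per scenario.
x_new = 1.7
error = abs(surrogate(x_new) - expensive_simulation(x_new))
print(f"surrogate error at x={x_new}: {error:.4f}")
```

The talk's point is precisely that such off-the-shelf surrogates break down for these physics problems, which is what motivates the request for new tools.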
We investigate the possibility of flexible and adaptive data acquisition settings for nuclear physics experiments (gamma-ray spectroscopy). The goal is to rapidly adapt the event-selection settings for up to a few hundred detectors, knowing that the configuration changes with the experimental setup and that the behaviour of the electronics changes with the experimental conditions: we need to implement a feedback loop by learning from experimental data. This feedback should at least update the parameters of the online calculations, and possibly the algorithms themselves.
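As a minimal illustration of such a feedback loop, the sketch below tracks a drifting noise baseline with an exponential moving average and keeps a per-channel selection threshold a fixed margin above it. All constants and the simulated drift are hypothetical, not taken from the actual experiment.

```python
import random

random.seed(0)

# Hypothetical per-channel trigger threshold, updated online from data:
# an exponential moving average (EMA) of the observed noise baseline keeps
# the event-selection threshold a fixed margin above the drifting baseline.
ALPHA = 0.05      # learning rate of the feedback loop
MARGIN = 5.0      # threshold margin above baseline, in arbitrary ADC counts

baseline = 0.0
for step in range(2000):
    # Simulated noise sample whose mean drifts with experimental conditions.
    drift = 10.0 + 0.002 * step
    sample = random.gauss(drift, 1.0)
    baseline += ALPHA * (sample - baseline)   # EMA update from data

threshold = baseline + MARGIN
print(f"adapted threshold: {threshold:.2f}")
```

Updating algorithm choices, rather than just parameters like this threshold, is the harder part the abstract alludes to.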
The Large Hadron Collider at CERN, where the Higgs boson was discovered, is poised for a major upgrade aimed at the possible discovery of new particles: super-symmetric particles, dark matter, or signs of extra dimensions of space. The increase in the yearly number of recorded proton collisions comes at the cost of a large increase in the complexity of the recorded events. Preliminary studies show that traditional algorithms suffer from a combinatorial explosion of CPU time.
To reach out to computer science specialists, a Tracking Machine Learning challenge (trackML) is being set up for 2018.
In planetary science, impact craters are used to date and characterize planetary surfaces and to study the geological history of planets. This is therefore an important task, which has traditionally been carried out by visual inspection of images. This talk will present a currently ongoing RAMP challenge whose goal is to predict the location and size of craters on Mars from a satellite image, and the pipeline that was set up to tackle this problem.
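Since predictions here are circles (location and size), scoring requires matching predicted craters to ground-truth ones. The sketch below shows one possible matching criterion; it is purely illustrative and not the official metric of the RAMP.

```python
import math

def match_craters(predicted, truth, max_dist_ratio=0.5, max_rad_ratio=0.5):
    """Greedily match predicted craters (x, y, r) to ground-truth ones.

    A prediction matches a true crater when the centre distance and the
    radius difference are both small relative to the true radius.
    (Illustrative criterion, not the official challenge metric.)
    """
    matched = 0
    remaining = list(truth)
    for (px, py, pr) in predicted:
        for i, (tx, ty, tr) in enumerate(remaining):
            dist = math.hypot(px - tx, py - ty)
            if dist <= max_dist_ratio * tr and abs(pr - tr) <= max_rad_ratio * tr:
                matched += 1
                del remaining[i]   # each true crater can be matched once
                break
    return matched

# Two true craters; one good prediction and one spurious detection.
truth = [(100, 100, 20), (300, 250, 8)]
predicted = [(103, 98, 22), (500, 500, 10)]
print(match_craters(predicted, truth))  # 1 of 2 true craters recovered
```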
Current and future generations of large-scale astronomical surveys will have to deal with an increasing number of crowded fields due to their sensitivity. In such fields, a high number of objects (mostly galaxies) are "blended" together, which poses a challenge for both photometry (measuring individual fluxes) and morphology (shape measurements), both heavily related to the main science goals. This RAMP would explore the use of deep learning techniques to tackle the deblending of galaxies (detection + segmentation) and, ideally, the measurement of their morphologies (regression).
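As a point of reference for what "detection + segmentation" means here, the sketch below is the crudest possible baseline: threshold the image and label connected components, one label per detected object. The image and sizes are toy assumptions; this baseline is exactly what fails on blended (touching) galaxies, which motivates the deep-learning approach.

```python
import numpy as np

def segment(image, threshold=0.5):
    """Baseline detection + segmentation: threshold the image, then label
    4-connected components (each component = one detected object)."""
    mask = image > threshold
    labels = np.zeros(image.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        current += 1
        stack = [start]               # flood-fill the new component
        labels[start] = current
        while stack:
            y, x = stack.pop()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < image.shape[0] and 0 <= nx < image.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    stack.append((ny, nx))
    return labels, current

# Two well-separated toy "galaxies" on a small image.
img = np.zeros((8, 8))
img[1:3, 1:3] = 1.0
img[5:7, 5:8] = 1.0
labels, n_objects = segment(img)
print(n_objects)  # 2
```

When the two bright regions touch, this labeling merges them into a single object; a learned deblender is expected to separate them.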
Fake news and alternative facts are a recent phenomenon. No one deliberately and consciously desires false information, but paradoxically, people pervasively consume fake information. It is essential to restore people's confidence in reliable, fact-checking sources and to reduce media bias, whether perceived or real. We believe artificial intelligence technologies could be leveraged to combat fake news and partly automate fact checking. We dive into American politics and propose a starting point for fake news detection.
The study of turbulence using numerical tools such as Direct Numerical Simulations, Large Eddy Simulations, etc. leads to the analysis of large amounts of data to discover the underlying physical mechanisms. Estimating a turbulent flow from a few sensor measurements is yet another important problem in fluid mechanics, which also involves big data analysis. In this project, we analyze turbulent channel flow using a sufficiently large numerical dataset (around 9 TB) close to the wall region, with the aim of improving predictability and also understanding the underlying physics.
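The "estimation from a few sensors" problem can be sketched in its simplest linear form: learn a least-squares map from sensor readings to the full field on a set of snapshots, then reconstruct an unseen snapshot from its sensors alone. The synthetic data below (a low-dimensional snapshot subspace, as modal analyses of wall turbulence suggest) and all sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for channel-flow snapshots (hypothetical sizes):
# fields live in a low-dimensional subspace spanned by a few modes.
n_snapshots, n_field, n_modes = 200, 50, 3
basis = rng.standard_normal((n_modes, n_field))
fields = rng.standard_normal((n_snapshots, n_modes)) @ basis

# A few fixed sensor locations sample each snapshot.
sensor_idx = [3, 12, 25, 33, 47]
sensors = fields[:, sensor_idx]

# Learn a linear map sensors -> full field by least squares, then
# reconstruct an unseen snapshot from its sensor readings alone.
W, *_ = np.linalg.lstsq(sensors, fields, rcond=None)
new_field = rng.standard_normal(n_modes) @ basis
estimate = new_field[sensor_idx] @ W
error = np.linalg.norm(estimate - new_field) / np.linalg.norm(new_field)
print(f"relative reconstruction error: {error:.2e}")
```

Real wall turbulence is of course not confined to three linear modes; nonlinear estimators trained on the 9 TB dataset are where the project goes beyond this sketch.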
Atherosclerosis is an inflammatory disease of the arterial wall caused by the formation of an atheroma plaque in the vessel wall. Data analysis of single-point spectra and images led to the identification of spectral changes due to the enrichment of murine J774 macrophages with fatty acids. Data processing was performed in the Matlab environment, and we are looking to re-process this dataset in a Python environment.
The aim of (scientific) data ranking is to help users choose between alternative pieces of information, especially when they are faced with huge amounts of data. However, ranking scientific data is a difficult task: various alternative quality criteria can be defined to order data items, depending on the data origin or even on the way the data have been obtained. As a consequence, it is very difficult to determine which ranking method (or which ranking criteria) to use. We present here a family of solutions, known as rank aggregation techniques, able to compute a consensus ranking from a set of input rankings. We will present the results obtained in our current applications.
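To make "consensus ranking from a set of input rankings" concrete, here is one classical rank aggregation technique, the Borda count; the abstract does not say which techniques the speakers use, so this is only a representative example.

```python
from collections import defaultdict

def borda_consensus(rankings):
    """Aggregate input rankings into a consensus ranking via Borda count:
    an item at position p in a ranking of n items scores n - p points;
    items are ordered by total score (ties broken alphabetically)."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += n - pos
    return sorted(scores, key=lambda item: (-scores[item], item))

# Three alternative quality criteria rank the same four data items.
rankings = [
    ["a", "b", "c", "d"],
    ["b", "a", "c", "d"],
    ["a", "c", "b", "d"],
]
print(borda_consensus(rankings))  # ['a', 'b', 'c', 'd']
```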
We are open to any new collaboration with domain scientists who need to make the most of alternative rankings.
During fertilization in mammals, the egg emits a series of calcium oscillations that are specific to each individual and whose frequency and amplitude are modulated by the culture medium. We have developed advanced microfluidic techniques to record, stimulate and analyze the calcium response during the first hours of in-vitro fertilization (IVF), and we have hundreds of individual recordings obtained with different compositions of the culture medium. The construction, as part of a collaborative project with the CDS, of a prediction tool based on algorithms and a mathematical formalism of the functioning of the egg would open new perspectives for developmental biology.
INRA/MaIAGE and INRA/BDR are joining their efforts to create efficient software for the early evaluation of embryo viability from time-lapse observation of embryos. Specifically, we are investigating the early development of bovine embryos, which can be observed in 2D+time light microscopy at different stages of development. We have a database of hundreds of expert-annotated embryos. More than one hundred qualitative and quantitative measures, as well as the original movies, are available. A preliminary exploratory statistical analysis has been conducted, and classification and regression trees revealed discriminant features with respect to viability. Considering a restricted number of these features, we aim at their automatic evaluation from the movies.
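The building block of the classification trees mentioned above is a one-feature threshold rule (a "decision stump"). The sketch below searches for the best such rule exhaustively; the feature names and values are invented toys, not the project's actual embryo measures.

```python
def best_stump(features, labels):
    """Exhaustively search one-feature threshold rules (decision stumps)
    and return the (feature index, threshold, accuracy) of the best one."""
    n_features = len(features[0])
    best = (0, 0.0, 0.0)
    for j in range(n_features):
        for t in sorted({row[j] for row in features}):
            preds = [1 if row[j] > t else 0 for row in features]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            acc = max(acc, 1 - acc)  # the rule may be used either way round
            if acc > best[2]:
                best = (j, t, acc)
    return best

# Toy data: hypothetical cell-division timing (hours) and a symmetry
# score per embryo; label 1 = viable.
X = [[24.1, 0.9], [25.0, 0.8], [30.2, 0.4], [31.5, 0.3]]
y = [1, 1, 0, 0]
print(best_stump(X, y))  # (0, 25.0, 1.0): split on division timing
```

A full tree stacks such stumps recursively; the hard part in this project is computing the discriminant features automatically from the movies.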
This talk will present the ongoing preparation of a RAMP aiming at distinguishing subjects with Autism Spectrum Disorder (ASD) from typical control subjects. This analysis will use the Autism Brain Imaging Data Exchange (ABIDE I & II) database and data from Robert Debre Hospital based on R-fMRI and anatomical MRI. We will particularly focus on presenting the problem, the typical pipeline used to address it, and the current status of this RAMP.
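A typical step in such R-fMRI pipelines is to turn each subject's regional time series into connectivity features: compute a correlation matrix and keep its upper triangle as the feature vector fed to a classifier. The sketch below assumes random data and illustrative sizes, not the actual ABIDE preprocessing.

```python
import numpy as np

rng = np.random.default_rng(0)

# One subject's (random stand-in) regional time series: sizes are
# illustrative, not those of the ABIDE derivatives.
n_regions, n_timepoints = 10, 120
timeseries = rng.standard_normal((n_regions, n_timepoints))

# Functional connectivity = correlation between regional time series;
# the strictly upper-triangular part removes the redundant symmetric
# half and the trivial diagonal.
connectivity = np.corrcoef(timeseries)
iu = np.triu_indices(n_regions, k=1)
features = connectivity[iu]
print(features.shape)  # (45,) = 10 * 9 / 2 edge weights
```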
This work is in collaboration with the Pasteur Institute (Neuroanatomy group of the Unit of Human Genetics and Cognitive Functions).
Recent methods for demographic history inference have achieved good results, circumventing the complexity of raw genomic data by summarizing them into handcrafted features called summary statistics. We developed a new approach based on deep learning that takes as input the variant sites found within a sample of individuals from the same population, and infers demographic descriptor values without relying on these predefined summary statistics. By letting our model choose how to handle raw data and learn its own way to embed them, we were able to outperform a method frequently used in population genetics for the inference of three out of seven demographic descriptor values of a scenario with a bottleneck and two expansions. This is still preliminary work, and we are hopeful that future developments will allow us to tackle a broader range of demographic scenarios and outperform previous methods by developing more flexible artificial neural network architectures.
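For readers unfamiliar with "summary statistics": a classic example is the site frequency spectrum (SFS), computed below from a toy genotype matrix. The proposed network consumes the raw matrix directly instead of such hand-crafted summaries. All sizes and the allele frequency are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy genotype matrix: rows = sampled haploid individuals, columns =
# variant sites, entry 1 where the individual carries the derived allele.
n_individuals, n_sites = 10, 200
genotypes = (rng.random((n_individuals, n_sites)) < 0.3).astype(int)

# The site frequency spectrum (SFS), a classic handcrafted summary
# statistic: sfs[k] counts the sites where exactly k individuals carry
# the derived allele. The deep model skips this step and embeds the raw
# `genotypes` matrix itself.
counts = genotypes.sum(axis=0)
sfs = np.bincount(counts, minlength=n_individuals + 1)
print(sfs.sum())  # every site falls in exactly one frequency class
```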
In this article, we demonstrate, through an experiment, an approach for proposing completions of a query while it is being written, by exploiting numerous types of autocompletion in a multi-service context. This experiment builds on a SPARQL editor to which we have added autocompletion mechanisms that support a continuously evolving ontology, here the collaborative knowledge base Wikidata.
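The simplest of the completion mechanisms one could plug into such an editor is plain prefix matching over the ontology's vocabulary, sketched below; the vocabulary entries are hypothetical stand-ins for Wikidata property labels, and the real editor combines several richer autocompletion types.

```python
def complete(prefix, vocabulary, limit=5):
    """Minimal prefix-based autocompletion over an ontology vocabulary.
    Illustrative only: real SPARQL editors combine several mechanisms
    (properties, classes, entity labels, query context)."""
    matches = sorted(term for term in vocabulary if term.startswith(prefix))
    return matches[:limit]

# Hypothetical property labels from an evolving ontology such as Wikidata's.
vocabulary = ["instance of", "inception", "influenced by", "author", "image"]
print(complete("in", vocabulary))  # ['inception', 'influenced by', 'instance of']
```

Because the ontology evolves continuously, the interesting engineering problem is refreshing this vocabulary from live services rather than the matching itself.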
In the setting of the CDS2 of University Paris Saclay, the Data IT platform is a prominent initiative for building a linked open data cloud dedicated to data science. To contribute to this challenging goal, we propose in this project to take part in enriching the datasets available on the Data IT platform, and to propose tools that improve the quality of the data and knowledge already available, as well as of those that will become accessible through the platform.