Defence of dissertation in the field of computer and information science, Aleksi Kallio, M.Sc.


Sampling from scarcely defined distributions: Methods and applications in data mining

19.02.2016 / 12:00
in lecture hall T2, Konemiehentie 2, 02150, Espoo, FI

Aleksi Kallio, M.Sc., will defend the dissertation "Sampling from scarcely defined distributions: Methods and applications in data mining" on 19 February 2016 at 12 noon at the Aalto University School of Science, lecture hall T2, Konemiehentie 2, Espoo.

Reliability and reproducibility of discoveries are essential for scientific progress. In his dissertation, Aleksi Kallio, M.Sc., studied difficult cases of scientific data analytics and developed new methods and approaches to assess the statistical significance of discoveries. Improved methods are needed because of the rapidly growing volumes of data and the more complex analytical questions faced in modern research.

The dissertation introduces the term "scarcely defined distributions" to describe difficult statistical distributions that are common in modern data analytics. It discusses methods and applications of data mining in which scarcely defined distributions emerge, and puts forth several strategies for analyzing complex datasets. Applications are reviewed from several fields, including bioinformatics, paleontology and ecology. A common factor among the application areas is the complexity of the underlying processes and error sources.

The work concludes that the development of new and flexible analytical methods is crucial for all fields that wish to use data to support decision making and prediction. If testing for significance and reliability is not on par with the rest of the data processing machinery, the future of data-driven discovery will be plagued by false interpretations. The applicability of the research extends beyond the fields discussed. The generic methods and approaches can be adapted to many use cases where complex data sources are relevant, including major societal questions related to medicine, climate and social networks.

Dissertation release (pdf)

Opponent: Dr. Pauli Miettinen, Max-Planck-Institut für Informatik, Germany 

Custos: Professor Aristides Gionis, Aalto University School of Science, Department of Computer Science

Electronic dissertation: http://urn.fi/URN:ISBN:978-952-60-6654-7

School of Science, electronic dissertations: https://aaltodoc.aalto.fi/handle/123456789/52