Torsten Möller, head of the research group on Visualization and Data Analysis, is the spokesperson of the research platform for data science at the University of Vienna. In his blog post he explains why data science is an exciting area not only for computer scientists like himself, but also for researchers from a broad variety of fields.
I am so excited about data science because, to me, it is all about modelling. And modelling is the key to understanding the world around us. But let me explain …
Data has been around for a long time, but it is only a means to an end. Galileo recorded data on the moons of Jupiter in his notebook; Mendel carefully observed, collected and counted different types of peas. Many people before and after them collected data in the form of observations as well. The importance (and the excitement) comes with what we can do with it, and that has everything to do with the principled approach we call the scientific method. Driven by curiosity, we want to understand our physical world or societal phenomena by means of the natural or social sciences. Using the scientific method, we start by collecting observations of the phenomenon we want to study. These observations help us form a hypothesis, a possible model that explains that phenomenon. “All” we have left to do now is to validate or reject this hypothesis. Ultimately, this leads us to a working model of the phenomenon we are studying.
Modelling is more ubiquitous than you might think
When you are buying a car or looking for a new apartment, there are many different objectives to weigh, from the price to the distance from work and so on. Weighing these objectives, all of which you want to optimise, already amounts to a very simple model. But we all know that the space of options it spans is large and often difficult to understand. Even here, visual tools can help. One of my favourite projects along these lines is LineUp, developed by colleagues of mine at the Johannes Kepler University Linz. Together with colleagues from VRVis, we have developed it further into a tool called WeightLifter, and car manufacturers have been using it to better understand their models of car engines.
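To make the idea of "weighing objectives" concrete, here is a minimal sketch of such a simple model: a weighted sum over normalised criteria. The apartments, the two criteria and the weights are made up for illustration, and this is not meant to describe how LineUp or WeightLifter work internally.

```python
from dataclasses import dataclass


@dataclass
class Apartment:
    # Hypothetical example data; lower is better for both criteria.
    name: str
    rent_eur: float
    commute_min: float


def normalise(values):
    """Rescale values to [0, 1] so that 1 is best (lowest raw value)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(hi - v) / (hi - lo) for v in values]


def rank(apartments, weights):
    """Return (name, score) pairs sorted from best to worst weighted score."""
    total = sum(weights.values())
    rent = normalise([a.rent_eur for a in apartments])
    commute = normalise([a.commute_min for a in apartments])
    scores = {}
    for a, r, c in zip(apartments, rent, commute):
        scores[a.name] = (weights["rent"] * r + weights["commute"] * c) / total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


options = [
    Apartment("A", 900, 45),
    Apartment("B", 1200, 15),
    Apartment("C", 1000, 30),
]

# Someone who cares mostly about rent ...
print(rank(options, {"rent": 0.7, "commute": 0.3}))
# ... and someone who cares mostly about a short commute.
print(rank(options, {"rent": 0.3, "commute": 0.7}))
```

With these made-up numbers, the two weightings put different apartments at the top. Even with three options and two criteria the outcome depends on the weights, which is why it helps to explore such a space interactively rather than fixing the weights up front.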
One of our past successes has been the development of Vismon – a tool that allows fisheries managers to reason about harvesting policies by visually understanding the consequences of their decisions. Based on a large collection of historical data, fisheries scientists have developed the models driving these decisions.
Modelling has advanced with technological progress
Of course, dealing with difficult decision processes in politics and business, or cracking the secrets of biology or physics, is not just about weighing some objectives. Over the years, powerful modelling techniques have been developed. Jim Gray put this technological development, in the service of science and the scientific method, into perspective. In what he calls the “Four Paradigms of Science” (detailed in “eScience: A Transformed Scientific Method”, 2007), he proclaimed that there are really four principled ways of modelling.
The empirical approach best reflects the simplicity of the scientific method, turning a string of empirical observations into a hypothesis that needs to be tested. The advancement of mathematical abstraction and mathematical modelling allowed scientists like Albert Einstein to predict physical phenomena that could only be validated decades later (for example, gravitational waves). With the advent of computers, scientists used the growing computational power to perform experiments on the computer (as opposed to in the physical world) to create understanding. This is what we call computational science today. Finally, the collection of massive amounts of data, which has become commonplace only recently, allows us to train statistical models that have great predictive qualities. This is what we call data-driven science or data science.
The famous statistician and mathematician David Donoho chronicles the history of data science and argues that it has been around for 50 years. It is not always just about “Big Data”, but about incorporating data into the modelling process.
Tools to make the use of data more democratic
In my view, the need to democratise modelling is where all these different modelling processes come together. Not only most scientific disciplines, but also businesses and governmental institutions have data and use it to gain insights for making important decisions. Even we as individuals collect data about ourselves, for example through health apps on our smart devices or by collecting our bank statements. Understanding how we or others can use this data is important – not just for reasons of privacy. Most of all, it is about enabling us to find ways to live a better and longer life in this digital world. Just as engineers have built microscopes to help people “see” without requiring a degree in physics, computer scientists like myself aspire to build tools that help people model without requiring a degree in mathematics or statistics.
For these reasons, providing sophisticated modelling techniques for scientists, business people, decision makers and individuals is the most exciting aspect of data science to me. Just as Galileo became famous for his theory on the movement of planets and Mendel is celebrated as the founder of genetics, data science will rise or fall by its ability to help people model in new ways.
Data Science @ Uni Vienna starts with the lecture series “What is data science?”
I am humbled to bring together mathematicians, statisticians and computer scientists on the one hand, as well as astronomers, digital humanists and economists on the other hand to build the research platform Data Science @ Uni Vienna. I hope this will only be the beginning of an effort to build a bridge between methodological approaches and applied problems at the University of Vienna. We are open to all people who share our vision.
Besides doing fun research, we are working on the development of three master’s programmes in data science that we will hopefully be able to offer in the next few years.