Danilo Bzdok heads the section for “Social and Affective Neurosciences” at the Department of Psychiatry, Psychotherapy and Psychosomatics at RWTH Aachen University in Germany. Using his dual background in neuroscience and data science, Danilo tries to reframe psychological questions as statistical-learning questions to generate new insights. His work on social cognition and psychiatry has led to innovative data-led perspectives on how humans navigate the social world and its neural substrates. In 2017, he was designated a “Rising Star” by the Association for Psychological Science (APS) in the USA. He is also a self-proclaimed potato-chip gourmet and an excessive consumer of music, especially electronic and classical.
My first encounter with Danilo was unilateral, over the pages of Nature Methods’ Points of Significance section, where he published several introductory pieces on machine learning. His way of boiling down a complex topic into an accessible explanation was also at the heart of our next meeting. At ICM in Paris, he gave an institute lecture about the relation between mainstream statistics and emerging pattern-learning techniques in brain-imaging neuroscience. This led to a longer discussion afterwards, this time face to face... And this discussion is revealed here, where Danilo gives his views on big data, the changes in how we answer questions with data in everyday science, and some speculations on the future of neuroscience.
Tal Seidel-Malkinson (TSM): First, can you tell us about your career path. You started in medicine, then moved into basic research. What made you shift?
Danilo Bzdok (DB): It actually started when I was in middle school – my first intellectual passion was programming and computer science. I really liked composing logic using computer code. At roughly 15 I was fluent in half a dozen programming languages, such as 32-bit assembler, Pascal, and C++. But I felt early on that I was also intrigued by various other, completely different things like philosophy, foreign languages, social sciences and neuroscience.
At that time, being mostly focused on natural sciences felt like somewhat of a limitation to me. One thing I really liked about the way of thinking in philosophy, and still appreciate very much, is the close interplay between logic and language. I was, however, not fully convinced this was a very pragmatic career choice. At least in Germany, a degree in philosophy is not always something that keeps many doors open for the next steps in life. That’s why I eventually decided to study a conservative area that would give me a solid foundation. Medicine seemed to be a safe choice: it provides a general education and also gives you a lot of options. You go through an intense learning experience that shapes your work ethic. I wanted to become active in scientific research and was determined to move into brain science in particular. I therefore spent my early university years concentrating on neuroscience and psychiatry.
In the middle of my studies I wanted to get involved with research as soon as possible. This led me to work with Simon Eickhoff at the Institute of Neuroscience & Medicine at the Research Centre Juelich, who was an incredible mentor to me, and I also reached out to the department of psychiatry at the RWTH Aachen. I was lucky enough to be funded by the German Research Foundation (DFG) and to be part of an international research and training group (IRTG1328 on “Schizophrenia and autism”) with UPENN, USA. This particular department of psychiatry at RWTH Aachen University turned out to be active in brain-imaging research. Due to a series of lucky coincidences, I had the opportunity to go through an authentic research experience.
During the second half of Medical School I spent less and less time attending lectures, and instead tried to min-max the exams. Towards the end of my medical studies I was barely studying anymore. It then felt like a smooth transition into being a full-time researcher. At that point, I wasn’t ready to commit another >5 years to clinical specialization in psychiatry, which takes ~50-60 hours of your time per week and leaves less time for research.
I also learned a lot during a fantastic research stay working with Peter Fox and Angela Laird in San Antonio, Texas, USA, and I launched several collaboration projects with social cognition enthusiasts here in Germany, including Kai Vogeley, Leonhard Schilbach, and Denis Engemann. Together we conducted a series of neuroimaging studies on whether or not there are brain regions that may be uniquely devoted to social-affective processing - a direction of research which later pushed me to pursue ever more general systems neuroscience questions. By 2013, I had become convinced that whether human-specific neural systems exist, particularly ones that might be devoted to human social interaction, was at its heart a methodological and statistical question. Whether or not scientists can go beyond the cognitive terms that we have used for decades in social and affective neuroscience, such as “theory of mind”, “affect”, or “empathy”, is a question that can be more readily wrestled with using certain data-analysis toolkits than others.
TSM: It’s clear that neuroscience nowadays increasingly requires an interdisciplinary set of skills. In your unique path you have acquired a broad set of skills from your degrees in medicine and maths and your PhDs in neuroscience and in computer science. How did you choose this path and, given this can’t be common training, what do you think early career researchers should focus on?
DB: I went through a journey of sometimes unconnected interests. It wasn’t always a conscious choice at a particular point. Essentially, I just went through several bouts of intense interest, getting absorbed in specific topics. That’s why, in retrospect, I am happy I somehow made it all the way through medicine. Despite changing areas of interest, at least I have an official degree that could help me give something back to society.
For years, I was not really sure how to cultivate and usefully combine my skills in language, logic and algorithms. When neuroscience later turned out to be such a vibrant interdisciplinary field, it was quite a relief to me. I found an opportunity to combine several different, what I like to call, thought styles. In neuroscience you can interface between diverging thought styles and approaches, and really get something out of it. That’s perhaps why I have a weakness for fuzzy topics like higher-order cognition, what domain-general function the TPJ may subserve, and what the “dark matter” of brain physiology - the default-mode network - may tell us about the nature of the human species. Several of these topics have a decent amount of soft-scienciness, at least to me – I then try to be principled and get at these research questions with algorithmic approaches that “let the data dominate”.
One thing that appears obvious to me in my activities as a supervisor, mentor and speaker: the data science revolution will depend on better quantitative literacy of the next generation of ambitious neuroscientists. We live in an increasingly quantified world. There are more quantifiable aspects about how we live and what we do; in normal life as well as when things go awry. There is a rapidly increasing opportunity to use algorithmic and computational tools, to generate quantitative insight and reach rigorous conclusions from the increasing amount of data at our hands.
Such modern regimes of data-analysis may look disturbingly different from the traditional goals of statistics and how statistics is taught at the university for many empirical sciences. In the data-rich setting, some traditional methods may have difficulty approximating the truth. That’s why I tried to structure my scientific education not only towards a solid neuroanatomical and neurophysiological understanding, in which I was much influenced by Karl Zilles and Katrin Amunts, but also a sense of probabilistic reasoning and quantitative methodology, in which I was much influenced by Bertrand Thirion, Gaël Varoquaux, and Olivier Grisel.
As almost every PI will tell you, most of their students will ultimately not end up in academia. I therefore believe that, at a more pragmatic level, getting an education with a solid data-analysis component can avoid pigeonholing PhD students or Post-Docs for a career as a scientist, and offer a broader portfolio of options to find jobs in industry and government after leaving academia.
TSM: Big data is a new opportunity for neuroscience, but equally it’s a new challenge. How do you see this development?
DB: In general, many scientific disciplines show a tendency to diversify into ever more specialized subdisciplines over time. So just because there are new opportunities doesn’t mean that the more established ways to conduct research and older techniques are rendered obsolete. Meticulously designed, hypothesis-guided experiments in carefully recruited participant samples will most likely remain the workhorse to generate new insight in neuroscience. What appears to be happening right now is that we are extending the repertoire of questions that can be asked and are quantifiable.
Let me give one particular example. The increasing availability and quality of brain measurements will soon allow learning description systems of mental operations in health directly from data themselves - a cognitive taxonomy directly extracted from brain measurements, and nomenclatures of disturbed thinking in mental disease. Such goals are likely to require combinations of massive amounts of richly annotated brain data and innovative pattern-learning approaches.
TSM: There’s a tendency towards moving from group analyses to predicting outcomes for individual participants. Are our current tools reliable enough for that?
DB: Broadly, I can see two distinct and promising trends – on the one hand, scientists bring a small number of subjects into the lab several times and acquire hours of brain scanning, which allows accessing a finer granularity of neural processes at the level of densely sampled single individuals. There are several well-known labs that now seriously go in this direction with a lot of success...
TSM: Do we need so much data on individuals because of variability of cognition or the SNR of fMRI?
DB: There are several aspects at play. Often, resting-state scans are still just 5-10 minutes. I think that may not be enough to robustly describe *all* aspects of neural activity changes in the brain that investigators may find interesting. This is the first trend: one pocket of the brain-imaging community now tries to go ever deeper in terms of subject specificity. It nicely complements the dominant agenda of conducting statistical tests on differences between pairs of experimental conditions or participant groups.
The other, completely different way that I see to go beyond binary comparisons is progress towards population-scale neuroscience. There is an increasing tendency towards extensive data collections with hundreds or thousands of indicators, like demographic, neuropsychological and health-related items, from as many individuals as possible. Such population neuroscience approaches will probably shed new light on variability patterns of brain biology, across distinct brain-imaging modalities, and bring into contact previously unconnected research streams. Researchers in this area try to acquire as much information as possible that characterizes as many people as possible. The approach avoids strict a-priori choices as to the type of person or disease category to be distinguished and studied. One hopes that coherent clusters of individuals emerge in massive data. That again is a completely different perspective. This is a good setting, for example, to discover, quantify, and ultimately predict subclinical phenotypes in people - individuals who deviate from the normative population in some coherent way, without being “dysfunctional” in society.
It is my impression that highly sampled single participants and richly phenotyped participant populations are two exciting upcoming directions that hold a lot of promise. Both these research agendas can probably complement and inform experimental studies of ~30 people with well-chosen hypotheses and dedicated experimental designs.
From a more statistical perspective, there is an orthogonal aspect. For the majority of the 20th century, researchers in biomedicine have acquired and analyzed “long data”, with fewer variables than individuals. Today neuroscientists increasingly need to tackle “wide data”, which some call “fat data”, with sometimes a much greater number of variables than individuals. Having extensive “found” or observational data from general-purpose databases is where machine-learning algorithms and data science come into play. Such more recently emerged statistical tools offer new strategies to search through abundant yet messy data. It is an exciting future perspective to integrate both – the highly sampled subjects and population neuroscience.
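As an editorial illustration of the “long” versus “wide” distinction (not something shown in the interview itself), here is a minimal sketch on simulated data. The sample sizes, the number of informative variables, the noise level, and the helper names `make_data` and `ridge_fit` are all assumptions chosen for the example. It shows that with more variables than individuals an ordinary least-squares fit can reproduce the training data exactly, and that a penalized estimator such as ridge regression is one common way to still obtain a unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n_samples, n_features, n_informative=5, noise=0.5):
    """Simulate a linear outcome driven by only the first few variables."""
    X = rng.standard_normal((n_samples, n_features))
    true_coef = np.zeros(n_features)
    true_coef[:n_informative] = 1.0
    y = X @ true_coef + noise * rng.standard_normal(n_samples)
    return X, y


def ridge_fit(X, y, alpha):
    """Ridge regression in closed form: (X'X + alpha * I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# "Long" data: many more individuals (rows) than variables (columns).
X_long, y_long = make_data(n_samples=200, n_features=20)
# "Wide" data: many more variables than individuals.
X_wide, y_wide = make_data(n_samples=20, n_features=200)

# The rank tells us how many coefficients the data can pin down uniquely.
rank_long = np.linalg.matrix_rank(X_long)
rank_wide = np.linalg.matrix_rank(X_wide)

# Minimum-norm least squares on wide data fits the training outcomes exactly,
# noise included, so a perfect in-sample fit says little about generalization.
ols_wide, *_ = np.linalg.lstsq(X_wide, y_wide, rcond=None)
train_residual = np.max(np.abs(X_wide @ ols_wide - y_wide))

# The ridge penalty makes the problem well-posed even when p > n.
ridge_wide = ridge_fit(X_wide, y_wide, alpha=1.0)

print("rank (long, wide):", rank_long, rank_wide)
print("largest OLS training residual on wide data:", train_residual)
print("number of ridge coefficients:", ridge_wide.shape[0])
```

In the long case the rank equals the number of variables, so classical estimation has a unique answer. In the wide case the rank is capped by the number of individuals, which is one reason regularized pattern-learning methods become attractive there.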
TSM: As you said both approaches require collecting, logging and archiving big datasets – this requires a lot of resources. Do you think this might increase the gap between well-funded and less well-funded labs?
DB: That’s a bit political, I’ll try to give a neutral answer. When you look at the Human Connectome Project (HCP) – there was a lot of excitement when it established itself as a trusted reference dataset for the brain-imaging community. That allowed new methodological approaches to be compared against each other in a more principled fashion. Yet, looking at the many thousands of imaging neuroscientists on the planet, how many of those have really published a paper with HCP data? Actually, not that many.
Many of the existing HCP publications often appear to be methods-focused papers. I’m not saying that’s not interesting. But I think many scientists would perhaps have expected more discoveries on brain structure and function based on this unique data resource. One reason why this is surprising to me is that many of the classical software libraries still scaled fairly well to the HCP 500 release; you just had to wait a bit longer for the results. Even with the full 1,200 subjects you could still scale to the higher sample size using essentially the same software and analysis pipelines that were already set up in the lab.
We now have the UK Biobank Imaging, CamCAN, ENIGMA, and many other rich datasets. Given that HCP data were not primarily used by labs to answer cognitive neuroscience or neurobiological questions on brain connectivity, I expect that there will probably be an even bigger gap between the majority of imaging neuroscientists and those people who capitalize on the new generation of complex datasets. There will be even fewer labs that have a vested interest in and a daily exposure to methodological techniques needed to leverage these burgeoning data repositories.
TSM: This transition to big data requires a change in our methodologies and ways of thinking. How do you think this cultural shift should be achieved?
DB: Let’s go back to the two larger trends we discussed before – using densely sampled participants and population neuroscience to understand the healthy and diseased brain. Big-data methodologies are likely to play an important role in gaining this insight. We’ll need a shift in our everyday data-analysis practices and how we design and run our labs. We’ll need more computational savoir-faire and more people from STEM backgrounds. But that’s not enough. There also needs to be a more organic and fluid conversation between analysts and the PIs who have these people on their payrolls. More exchange in both directions will help us to negotiate between the research questions and optimal algorithmic methods.
A big issue, for instance, already is and will increasingly become the “big-data brain drain”: many people with quantitative aptitude and a proven data-analysis skill-set are highly sought after and may be aggressively headhunted by companies for several times higher salaries than what we in academia can offer. For instance, one of my students with a background in physics recently got recruited by McKinsey Analytics in London.
To tackle some of the ambitious questions we mentioned, we’ll also need better infrastructure than many universities today offer us neuroscientists. We simply need more money for this expensive computational architecture and its sustained maintenance. Now, some people may ask why we don’t just use cloud computing. And sure, Amazon Web Services (AWS) and other cloud-based solutions are attractive options. But it’s worth considering two problems: first, you have data-privacy issues whenever you hold personal data from individuals. In many research institutions, researchers may not be allowed to upload detailed information about individuals to servers in a different country. Second, there is a bureaucratic problem: you cannot easily estimate in advance how much money you need for your particular cloud-computing jobs. Many finance departments, however, allocate money on a per-year basis, at least at German universities.
Last but not least, there’s the educational issue: how should we train young neuroscientists? It’s not clear how in this already very interdisciplinary teaching schedule, with theory of neuroscience, molecular biology, anatomy, physiology, classical statistics, genetics, brain diseases, and so forth, we could add multi-core processing, high performance programming, and so on. There are so many things that a 21st century neuroscientist is expected to absorb. It’s not clear where you’ll find people with such a multi-faceted mind who can be incentivized to, and are able to, embrace this breadth.
TSM: So perhaps we need to be collaborative? It’s perhaps not realistic to expect single people to have all these skills.
DB: It’s probably not realistic, but still, we will need some of these “glue people”. It’s not clear to me where we should expect them to come from. That’s why my feeling is that the shape and form of scientific education may play an increasingly important role in neuroscience.
TSM: Big data has been seen by some as a solution to the replication crisis – and another approach has been to use meta-analysis. You’ve recently published a meta-analysis on theory of mind. What did you learn from this, and what should we be careful about when applying meta-analysis?
DB: Several decades ago, the social sciences went through a crisis similar to the replication crisis we are experiencing now. Many people weren’t sure how to go forward, as there was a lot of uncertainty about how robust and valuable the abstract constructs were that these empirical scientists were studying. An important contribution towards justifying these mental and social constructs came from quantitative meta-analysis.
Quantitative meta-analysis is a very useful tool to identify convergence across isolated findings and thus solidify scientific areas, especially if you know you will be facing small effects and a lot of noise, which is true for the social and psychological sciences, and probably not wrong for brain-imaging. So you can either shift to a different area of research with more tractable problems or adapt to the situation that we have, where meta-analysis is one key solution to cope with the idiosyncrasies of a broad range of studies. It will unavoidably mask some subtle effects from single experiments. But you can see through the noise – distinguish the forest from the trees.
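To make the idea of “seeing through the noise” concrete, here is a minimal editorial sketch of one textbook form of quantitative meta-analysis: fixed-effect pooling with inverse-variance weights. This is a generic illustration and not the method used in the theory-of-mind meta-analysis mentioned in the question (brain-imaging meta-analyses typically work on activation coordinates rather than scalar effect sizes). The four effect sizes and standard errors are made-up numbers, and `fixed_effect_meta` is a hypothetical helper name.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool study-level effects using inverse-variance weights.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Returns the pooled effect and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect sizes from four small, noisy studies.
effects = [0.10, 0.30, 0.20, 0.40]
std_errors = [0.20, 0.20, 0.20, 0.20]

pooled, pooled_se = fixed_effect_meta(effects, std_errors)

# Compare each single study's 95% interval with the pooled interval.
for e, se in zip(effects, std_errors):
    print(f"single study: {e:.2f} [{e - 1.96 * se:.2f}, {e + 1.96 * se:.2f}]")
print(
    f"pooled:       {pooled:.2f} "
    f"[{pooled - 1.96 * pooled_se:.2f}, {pooled + 1.96 * pooled_se:.2f}]"
)
```

With equal standard errors the pooled estimate is simply the mean of the study effects, and its standard error shrinks by a factor of two relative to any single study. That is the sense in which pooling trades the detail of individual experiments for a clearer overall picture.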
TSM: Presumably it also helps to collaborate with multi-centre studies.
DB: Sure. Many young students getting into neuroscience may perhaps still envision the lonely genius who is knowledgeable about so many areas of neuroscience. The biggest steps forward may come from *teams*. Sets of people who learned to genuinely work together; not despite but because they are drastically different in their knowledge and thought styles. If they succeed in aligning their thinking and efforts towards a common goal in neuroscience research, non-linear progress probably becomes much more likely.
In terms of data collection, it’s worth comparing brain-imaging to genetics or genomics. Several trends in imaging neuroscience today were preceded in similar form 5-10 years earlier in genomic research. There, many data-collection collaborations were foundational and helped the research community to see through the noise more clearly. Imaging neuroscience is becoming larger and more international with increasing numbers of labs, so there is greater potential for people to work together. Intense and bidirectional collaboration between drastically different disciplines may be a prerequisite to render actionable some of the ambitious questions that we had the pleasure to discuss today. It also means you need people skills, on top of everything else!
TSM: I want to thank you for the nice chat – and it’s definitely an exciting, interesting era in neuroscience!