By Elizabeth DuPre
The Open Science Special Interest Group (SIG) is a relatively new organization within OHBM; however, it is responsible for several increasingly popular community initiatives including the hackathon and the open science room. As the Open Science SIG assumes new leadership this month, I sat down with the incoming chair, Kirstie Whitaker, to hear about her hopes for the upcoming year.
Elizabeth DuPre (ED): Today I’m here with Kirstie Whitaker, Chair of the OHBM Open Science SIG. Kirstie, can you first tell us about yourself?
Kirstie Whitaker (KW): I’m a research fellow at the Alan Turing Institute – the UK’s national research institute for data science and artificial intelligence. There’s a lot of research going on there, but one of the projects I work on is trying to incentivise reproducible research across all of data science. I’m a neuroscientist by training: I did my PhD at UC Berkeley, followed by a postdoc at the Department of Psychiatry in Cambridge. I then had a one-year fellowship with the Mozilla Science Lab before I moved to the Turing Institute.
ED: It sounds like you’ve seen many aspects of neuroscience and data science, both in academia as well as in industry through your fellowship with Mozilla. Those can all lend very different perspectives on the thing we’re both passionate about: open science. Can you tell us your thoughts about open science following from those experiences?
KW: Open science, as you’ve said, can mean different things to different people. You can imagine our friends in the library sciences are extremely passionate about open access. We should all be passionate about open access and being able to read our colleagues’ work. There’s also a lot of work going on at OHBM using open data. That’s making science more efficient and allowing us to answer more interesting questions with different types of techniques – by harnessing different people’s data and sharing that with our colleagues.
There’s another aspect which is pretty prominent in neuroscience, with huge influence around the world, which is open source code. I write some analysis code and, importantly, I allow other people to use it – so in that sense it’s similar to open data – but they’re also able to see it and interrogate it. So instead of building a black box we’re building tools that you can look inside.
There’s also an additional angle of making sure that science is open to all people. This includes citizen science – and one of our hackathon organisers this year is Anisha Keshavan, who’s one of the coolest and most exciting citizen science people that I’ve ever worked with – which means breaking out of the ivory tower, and allowing everyone who’s interested in helping us understand the brain to productively take part.
It also means making sure that there are scientific career paths for people with diverse experiences and opinions. That means we allow women to succeed as well as men. We ensure that people from different cultural backgrounds, different races, different countries who speak different languages, are all given a fair shot at expressing their goals, and completing the analyses that they want to do.
So for me, open science is just doing science, and doing science well. But my particular passion is to ensure we are being diverse and inclusive.
ED: Over this past year you’ve served as chair elect while I’ve been secretary elect – and we’ve gotten to see the leadership do some amazing things. Anisha was the co-organiser for our hackathon. And this was the first time that the hackathon has sold out – so it was really exciting to see all the enthusiasm that the open science events are generating. We also had Felix Hoffstaedter organizing the open science room at the annual meeting, where we even decided we needed a bigger space.
And of course our current chair Chris Gorgolewski and secretary Matteo Visconti di Oleggio Castello have done a great job of communicating to the community what we’re so excited about. Given all this, now that you’re taking over as chair, where would you like to take the SIG?
KW: I know, it’s such a brilliant and terrible problem to have sold out the hackathon! The other person we should mention is Greg Kiar who co-organised the hackathon. He liaised with ethnographic researchers who specifically do research on hackathons to create a survey that asked attendees what they gained from the event, how they felt it accommodated more junior members, and importantly, how these events could be improved in the future. I’m so glad Greg conducted that survey - before we closed out the room on Saturday we all had 30 minutes to fill in our survey and answer our questions - and we’ll see the fruits of that survey in next year’s hackathon. He gave a brief overview and one of the biggest themes was people being so excited and grateful that there were so many skills available – and that there were so many different levels of people that were there.
I think that the event selling out reflected that excitement. But selling out means we’ll have to confront some issues; in particular, we’re going to have to figure out if we want to keep the hackathon small and intimate or let everyone who wants to come attend. One of the big sells of a small event is that you can easily make some connections with individual people who can share their expertise with you or point you in the right direction. Once you get larger you effectively start building OHBM [laughs]. I mean, we’re the hackathon, we’re not trying to take over the entire conference, so we’ll have some interesting challenges about how we include everyone.
My goal is to think about culture change, and making sure we give credit to early career researchers who are doing excellent work that supports others. Historically, the incentive structure in academia has encouraged very sharp elbows, with the attitude “To get to the top I’ve got to be number one. I’ve got to be uniquely better than everyone else.” One thing that really impressed me at this year’s OHBM conference was a presentation by JB Poline where he talked about the work the community has put into a publishing platform where you don’t just publish traditional papers, but might also publish code, data or tutorials. These are things that we all know are very useful, but that aren’t fully recognised. I’d love to see early career researchers get a bit more credit for that sort of thing.
I also think we should carry the spirit of the hackathon, and of those really helpful conversations in the open science room, out into the wider OHBM community all year long. We have a Slack channel where you can get in touch with people by pinging questions out. But I think it would be really interesting to see if we can solicit ideas from our community and actually get our members involved. It doesn’t have to be the SIG that puts on an event – it could be that we help our members make the connections (and perhaps help out with a little funding).
One of the initiatives [Elizabeth] and I have been doing is the demo calls. There, we reach out to people and I sit on YouTube live and I ask people about their experiences with open source and their projects, and how others can get involved. Maybe those demo calls are useful and we can take them forward and keep them going. But maybe there are better ideas and that’s what I’d love to explore – how we can generate more ideas and bring them to light.
ED: I’m really excited to see where that goes. That leads into our recent round of elections…
KW: Yes! Traditionally there were just two members of the committee and they’ve done a lot of work. Thank you to the previous leadership of the OHBM hackathon, the Open Science Room, Brainhack and everything else – all the people who have run so many of these initiatives. It was a lot of work! I was very happy that we created quite a few more positions to bring in more people who were passionate and wanted to help nurture the open science community. For example, this year we realised that we didn’t have a treasurer position, and keeping track of all this money and paying for these things was a lot of work, so we’re introducing a new role to cover this need.
We’ve talked about my vision and my passion for open science. But one of the things that is so fun, and frightening, about open science and diversity is that you have to eat your own dog food; that is, to practice what you preach. The success of open science in general and the SIG in particular relies on bringing in new people, new points of view, and I’m looking forward to it.
ED: Yes, I’m looking forward to seeing everything that happens and our new initiatives. Thanks so much!
After our conversation took place, we concluded the most recent round of elections. We’re now excited to announce the new leadership joining the Open Science SIG:
Greg Kiar - Treasurer
Camille Maumet - Chair elect
Ana Van Gulick - Secretary elect
Sara Kimmich - Treasurer elect
Roberto Toro and Katja Heuer - Hackathon co-chairs
Tim van Mourik - Open science room organizer
Cameron Craddock - Council liaison
Look for a follow-up post where we find out more about their pathways into open science!
Danilo Bzdok heads the section for “Social and Affective Neurosciences” at the Department of Psychiatry, Psychotherapy and Psychosomatics at RWTH Aachen University in Germany. Using his dual background in neuroscience and data science, Danilo tries to reframe psychological questions as statistical-learning questions to generate new insights. His work on social cognition and psychiatry has led to innovative data-led perspectives on how humans navigate the social world and its neural substrates. In 2017, he was designated a “Rising Star” by the Association for Psychological Science (APS) in the USA. He is also a self-proclaimed potato-chip gourmet and an excessive consumer of music, especially electronic and classical.
My first encounter with Danilo was one-sided, through the pages of Nature Methods’ Points of Significance section, where he published several introductory pieces on machine learning. His way of boiling down a complex topic into an accessible explanation was also at the heart of our next meeting. At ICM in Paris, he gave an institute lecture about the relation between mainstream statistics and emerging pattern-learning techniques in brain-imaging neuroscience. This led to a longer discussion afterwards, this time face to face. That discussion is reproduced here: Danilo gives his views on big data, the changes in how we answer questions with data in everyday science, and some speculations on the future of neuroscience.
Tal Seidel-Malkinson (TSM): First, can you tell us about your career path. You started in medicine, then moved into basic research. What made you shift?
Danilo Bzdok (DB): It actually started when I was in middle school – my first intellectual passion was programming and computer science. I really liked composing logic using computer code. At roughly 15 I was fluent in half a dozen programming languages, such as 32-bit assembler, Pascal, and C++. But I felt early on that I was also intrigued by various other, completely different things like philosophy, foreign languages, social sciences and neuroscience.
At that time, focusing mostly on the natural sciences felt like somewhat of a limitation to me. One thing I really liked about the way of thinking in philosophy, and still appreciate very much, is the close interplay between logic and language. I was, however, not fully convinced it was a very pragmatic career choice. At least in Germany, a degree in philosophy does not always keep many doors open for the next steps in life. That’s why I eventually decided to study a conservative subject that would give me a solid foundation. Medicine seemed to be a safe choice: it provides a broad general education and gives you a lot of options. You go through an intense learning experience that shapes your work ethic. I wanted to become active in scientific research, and was determined to move into brain science in particular. I therefore spent my early university years concentrating on neuroscience and psychiatry.
In the middle of my studies I wanted to get involved with research as soon as possible. This led me to work with Simon Eickhoff at the Institute of Neuroscience & Medicine at the Research Centre Juelich, who was an incredible mentor to me, and I also reached out to the department of psychiatry at the RWTH Aachen. I was lucky enough to be funded by the German Research Foundation (DFG) and to be part of an international research and training group (IRTG1328 on “Schizophrenia and autism”) with UPENN, USA. This particular department of psychiatry at RWTH Aachen University turned out to be active in brain-imaging research. Due to a series of lucky coincidences, I had the opportunity to go through an authentic research experience.
During the second half of medical school I spent less and less time attending lectures, and instead tried to min-max the exams. Towards the end of my medical studies I was barely studying anymore. It then felt like a smooth transition into being a full-time researcher. At that point, I wasn’t ready to commit more than another five years to clinical specialization in psychiatry, which takes around 50-60 hours of your time per week and leaves less time for research.
I also learned a lot during a fantastic research stay working with Peter Fox and Angela Laird in San Antonio, Texas, USA, and I launched several collaboration projects with social cognition enthusiasts here in Germany, including Kai Vogeley, Leonhard Schilbach, and Denis Engemann. Together we conducted a series of neuroimaging studies on whether or not there are brain regions that may be uniquely devoted to social-affective processing, a direction of research which later pushed me towards increasingly general systems neuroscience questions. By 2013, I had become convinced that whether human-specific neural systems exist, particularly ones that might be devoted to human social interaction, was at its heart a methodological and statistical question. Whether or not scientists can go beyond the cognitive terms that we have used for decades in social and affective neuroscience, such as “theory of mind”, “affect”, or “empathy”, is a question that can be wrestled with more readily using certain data-analysis toolkits than others.
TSM: It’s clear that neuroscience nowadays increasingly requires an interdisciplinary set of skills. In your unique path you have acquired a broad set of skills from your degrees in medicine and maths and your PhDs in neuroscience and in computer science. How did you choose this path and, given this can’t be common training, what do you think early career researchers should focus on?
DB: I went through a journey of sometimes unconnected interests. It wasn’t always a conscious choice at a particular point. Essentially, I just went through several bouts of intense interest, getting absorbed in specific topics. That’s why, in retrospect, I am happy I somehow made it all the way through medicine. Despite changing areas of interest, at least I have an official degree that could help me give something back to society.
For years, I was not really sure how to cultivate and usefully combine my skills in language, logic and algorithms. When neuroscience later turned out to be such a vibrant interdisciplinary field, it was quite a relief to me. I found an opportunity to combine several different, what I like to call, thought styles. In neuroscience you can interface between diverging thought styles and approaches, and really get something out of it. That’s perhaps why I have a weakness for fuzzy topics like higher-order cognition, what domain-general function the TPJ may subserve, and what the “dark matter” of brain physiology - the default-mode network - may tell us about the nature of the human species. Several of these topics have a decent amount of soft-scienciness, at least to me – I then try to be principled and get at these research questions with algorithmic approaches that “let the data dominate”.
One thing that appears obvious to me in my activities as a supervisor, mentor and speaker: the data science revolution will depend on better quantitative literacy of the next generation of ambitious neuroscientists. We live in an increasingly quantified world. There are more quantifiable aspects about how we live and what we do; in normal life as well as when things go awry. There is a rapidly increasing opportunity to use algorithmic and computational tools, to generate quantitative insight and reach rigorous conclusions from the increasing amount of data at our hands.
Such modern regimes of data-analysis may look disturbingly different from the traditional goals of statistics and how statistics is taught at the university for many empirical sciences. In the data-rich setting, some traditional methods may have difficulty approximating the truth. That’s why I tried to structure my scientific education not only towards a solid neuroanatomical and neurophysiological understanding, in which I was much influenced by Karl Zilles and Katrin Amunts, but also a sense of probabilistic reasoning and quantitative methodology, in which I was much influenced by Bertrand Thirion, Gaël Varoquaux, and Olivier Grisel.
As almost every PI will tell you, most of their students will ultimately not end up in academia. I therefore believe that, at a more pragmatic level, getting an education with a solid data-analysis component can avoid pigeonholing PhD students or Post-Docs for a career as a scientist, and offer a broader portfolio of options to find jobs in industry and government after leaving academia.
TSM: Big data is a new opportunity for neuroscience, but equally it’s a new challenge. How do you see this development?
DB: In general, many scientific disciplines show a tendency to diversify into ever more specialized subdisciplines over time. So new opportunities don’t mean that the more established ways of conducting research and older techniques are rendered obsolete. Meticulously designed, hypothesis-guided experiments in carefully recruited participant samples will most likely remain the workhorse for generating new insight in neuroscience. What appears to be happening right now is that we are extending the repertoire of questions that can be asked and quantified.
Let me give one particular example. The increasing availability and quality of brain measurements will soon allow us to learn descriptive systems of mental operations in health directly from the data themselves: a cognitive taxonomy extracted directly from brain measurements, and nomenclatures of disturbed thinking in mental disease. Such goals are likely to require combinations of massive amounts of richly annotated brain data and innovative pattern-learning approaches.
TSM: There’s a tendency to move from group analyses towards predicting outcomes for individual participants. Are our current tools reliable enough for that?
DB: Broadly, I can see two distinct and promising trends. On the one hand, scientists bring a small number of subjects into the lab several times and acquire hours of brain scanning, which gives access to a finer granularity of neural processes at the level of densely sampled single individuals. There are several well-known labs that now seriously go in this direction with a lot of success...
TSM: Do we need so much data on individuals because of variability of cognition or the SNR of fMRI?
DB: There are several aspects at play. Often, resting-state scans are still just 5-10 minutes long. I think that may not be enough to robustly describe *all* aspects of neural activity changes in the brain that investigators may find interesting. This is the first trend: one pocket of the brain-imaging community is now trying to go ever deeper in terms of subject specificity. It nicely complements the dominant agenda of conducting statistical tests on differences between pairs of experimental conditions or participant groups.
The other, completely different way I see to go beyond binary comparisons is progress towards population-scale neuroscience. There is an increasing tendency towards extensive data collections with hundreds or thousands of indicators, such as demographic, neuropsychological and health-related items, from as many individuals as possible. Such population neuroscience approaches will probably shed new light on variability patterns of brain biology across distinct brain-imaging modalities, and bring previously unconnected research streams into contact. These researchers try to acquire as much information as possible that characterizes as many people as possible. The approach avoids strict a-priori choices as to the type of person or disease category to be distinguished and studied. One hopes that coherent clusters of individuals emerge in massive data. That again is a completely different perspective. This is a good setting, for example, to discover, quantify, and ultimately predict subclinical phenotypes in people: individuals who deviate from the normative population in some coherent way, without being “dysfunctional” in society.
It is my impression that highly sampled single participants and richly phenotyped participant populations are two exciting upcoming directions that hold a lot of promise. Both research agendas can probably complement and inform experimental studies of ~30 people with well-chosen hypotheses and dedicated experimental designs.
From a more statistical perspective, there is an orthogonal aspect. For most of the 20th century, researchers in biomedicine acquired and analyzed “long data”, with fewer variables than individuals. Today neuroscientists increasingly need to tackle “wide data”, some call it “fat data”, sometimes with a much greater number of variables than individuals. Extensive “found” or observational data from general-purpose databases is where machine-learning algorithms and data science come into play. These more recently emerged statistical tools offer new strategies to search through abundant yet messy data. It is an exciting future perspective to integrate both – the highly sampled subjects and population neuroscience.
TSM: As you said both approaches require collecting, logging and archiving big datasets – this requires a lot of resources. Do you think this might increase the gap between well-funded and less well-funded labs?
DB: That’s a bit political; I’ll try to give a neutral answer. When you look at the Human Connectome Project (HCP), there was a lot of excitement when it established itself as a trusted reference dataset for the brain-imaging community. That allowed new methodological approaches to be compared against each other in a more principled fashion. Yet, looking at the many thousands of imaging neuroscientists on the planet, how many of those have really published a paper with HCP data? Actually, not that many.
Many existing HCP publications appear to be methods-focused papers. I’m not saying that’s not interesting. But I think many scientists would perhaps have expected more discoveries on brain structure and function from this unique data resource. One reason this surprises me is that many of the classical software libraries still scaled fairly well to the HCP 500 release; you just had to wait a bit longer for the results. Even with the full 1,200 subjects you could still scale to the larger sample size using essentially the same software and analysis pipelines that were already set up in the lab.
We now have the UK Biobank Imaging, CamCAN, ENIGMA, and many other rich datasets. Given that HCP data were not primarily used by labs to answer cognitive neuroscience or neurobiological questions on brain connectivity, I expect that there will probably be an even bigger gap between the majority of imaging neuroscientists and those people who capitalize on the new generation of complex datasets. There will be even fewer labs that have a vested interest in and a daily exposure to methodological techniques needed to leverage these burgeoning data repositories.
TSM: This transition to big data requires a change in our methodologies and ways of thinking. How do you think this cultural shift should be achieved?
DB: Let’s go back to the two larger trends we discussed before – using densely sampled participants and population neuroscience to understand the healthy and diseased brain. Big-data methodologies are likely to play an important role in gaining this insight. We’ll need a shift in our everyday data-analysis practices and how we design and run our labs. We’ll need more computational savoir-faire and more people from STEM backgrounds. But that’s not enough. There also needs to be a more organic and fluid conversation between analysts and the PIs who have these people on their payrolls. More exchange in both directions will help us to negotiate between the research questions and optimal algorithmic methods.
A big issue, for instance, already is and will increasingly become the “big-data brain drain”: many people with quantitative aptitude and a proven data-analysis skill-set are highly sought after and may be aggressively headhunted by companies for several times higher salaries than what we in academia can offer. For instance, one of my students with a background in physics recently got recruited by McKinsey Analytics in London.
To tackle some of the ambitious questions we mentioned, we’ll also need better infrastructure than many universities today offer us neuroscientists. We simply need more money for this expensive computational architecture and its sustained maintenance. Now, some people may ask why we don’t just use cloud computing. And sure, Amazon AWS and other cloud-based solutions are attractive options. But it’s worth considering two problems: first, there are data-privacy issues when you hold personal data from individuals. In many research institutions, researchers may not be allowed to upload detailed information about individuals to servers in a different country. Second, there is a bureaucratic problem: you cannot easily estimate in advance how much money you will need for your particular cloud-computing jobs, yet many finance departments allocate money on a per-year basis, at least at German universities.
Last but not least, there’s the educational issue: how should we train young neuroscientists? It’s not clear how in this already very interdisciplinary teaching schedule, with theory of neuroscience, molecular biology, anatomy, physiology, classical statistics, genetics, brain diseases, and so forth, we could add multi-core processing, high performance programming, and so on. There are so many things that a 21st century neuroscientist is expected to absorb. It’s not clear where you’ll find people with such a multi-faceted mind who can be incentivized to, and are able to, embrace this breadth.
TSM: So perhaps we need to be collaborative? It’s perhaps not realistic to expect single people to have all these skills.
DB: It’s probably not realistic, but still, we will need some of these “glue people”. It’s not clear to me where we should expect them to come from. That’s why my feeling is that the shape and form of scientific education may play an increasingly important role in neuroscience.
TSM: Big data has been seen by some as a solution to the replication crisis – and another approach has been to use meta-analysis. You’ve recently published a meta-analysis on theory of mind. What did you learn from this, and what should we be careful about when applying meta-analysis?
DB: Several decades ago, the social sciences went through a crisis similar to the current replication crisis. Many people weren’t sure how to go forward, as there was a lot of uncertainty about how robust and valuable the abstract constructs under study were. An important contribution to justifying these mental and social constructs came from quantitative meta-analysis.
Quantitative meta-analysis is a very useful tool to identify convergence across isolated findings and thus solidify scientific areas, especially if you know you will be facing small effects and a lot of noise, which is true for the social and psychological sciences, and probably not wrong for brain-imaging. So you can either shift to a different area of research with more tractable problems, or adapt to the situation we have, where meta-analysis is one key solution to cope with the idiosyncrasies of a broad range of studies. It will unavoidably mask some subtle effects from single experiments. But you can see through the noise – distinguish the forest from the trees.
TSM: Presumably it also helps to collaborate with multi-centre studies.
DB: Sure. Many young students getting into neuroscience may perhaps still envision the lonely genius who is knowledgeable about so many areas of neuroscience. The biggest steps forward may come from *teams*. Sets of people who learned to genuinely work together; not despite but because they are drastically different in their knowledge and thought styles. If they succeed in aligning their thinking and efforts towards a common goal in neuroscience research, non-linear progress probably becomes much more likely.
In terms of data collection, it’s worth comparing brain-imaging to genetics or genomics. Several trends in imaging neuroscience today may have been foreshadowed in similar form 5-10 years ago in genomic research. There, many data-collection collaborations were foundational and helped the research community to see through the noise more clearly. Imaging neuroscience is becoming larger and more international, with increasing numbers of labs, so there is greater potential for people to work together. Intense and bidirectional collaboration between drastically different disciplines may be a prerequisite to make actionable some of the ambitious questions we had the pleasure to discuss today. It also means you need people skills, on top of everything else!
TSM: I want to thank you for the nice chat – and it’s definitely an exciting, interesting era in neuroscience!
By Danka Jandric, Jeanette Mumford & Ilona Lipp
Planning a resting state study and analysing resting state data can feel overwhelming. There seems to be an endless number of options regarding all stages of the experiment. Decisions need to be made about how to acquire data in an optimal way, what preprocessing and noise correction pipelines to employ and how to extract the most meaningful metrics. Many strategies have been published and are available in software packages. However, there seems to be little consensus about what works best and, even more importantly, about how to judge whether something “works” or not. The choice of method often depends on the specifics of the data and the research question being addressed, but can equally often seem arbitrary. To help guide you through this jungle of rs-fMRI, we walk you through all stages of a resting state experiment. We do this by addressing questions that researchers are likely to have… or should have! While we do not provide definite answers to these questions, we try to point out the most important considerations, outline some of the available methods, and offer some valuable video resources from recent OHBM education courses, to help you make informed decisions.
What do I need to consider when planning my experiment?
Running a rs-fMRI experiment seems easy enough. Technically, all you need is to put your participant in the scanner, tell them to rest and run a standard BOLD sequence. However, it may be worth thinking about your analysis strategy beforehand, so that once you start analysing your data you do not suddenly wish you had…
How do I know my data quality is good?
One of the most common questions asked when evaluating data is how to tell whether the data are “good” or not. The answer to this question, regardless of the data, is to actually look at your data. Although this is relatively easy with behavioral data, it is less clear how to do it when faced with hundreds of thousands of time series for a single subject. Luckily, Jonathan Power has not only developed tools we can use with our own data but also takes us through data inspection in his educational talk from 2017, “How to assess fMRI noise and data quality”.
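To show what one of these quality checks actually computes, here is a minimal sketch of framewise displacement (FD). This is not Jonathan Power's own code. FD is the per-volume head-motion summary defined by Power et al. (2012): the absolute volume-to-volume changes in the six realignment parameters are summed, with rotations converted to millimetres on a sphere of assumed 50 mm radius. The function name, the column order (translations in mm, then rotations in radians) and the 0.5 mm flagging threshold are assumptions made for illustration. Check the conventions and units of your own realignment software.

```python
import numpy as np


def framewise_displacement(motion, radius_mm=50.0):
    """Framewise displacement per volume from six realignment parameters.

    motion: array of shape (n_volumes, 6). Columns 0-2 are assumed to be
    translations in mm and columns 3-5 rotations in radians.
    """
    motion = np.asarray(motion, dtype=float)
    deltas = np.abs(np.diff(motion, axis=0))
    # Convert rotational changes (radians) to arc length (mm) on a sphere.
    deltas[:, 3:] *= radius_mm
    fd = deltas.sum(axis=1)
    # The first volume has no predecessor, so its FD is set to 0.
    return np.concatenate([[0.0], fd])


motion = np.zeros((4, 6))
motion[1, 0] = 1.0    # 1 mm translation in x at volume 1
motion[1, 3] = 0.01   # 0.01 rad rotation at volume 1 (0.5 mm at 50 mm radius)
fd = framewise_displacement(motion)   # -> 0, 1.5, 1.5, 0 (mm)
high_motion = fd > 0.5                # illustrative cut-off; a judgement call
```

Plotting FD above a "carpet" or grey plot of voxel time series is one way to look at your data as the talk recommends. Large FD spikes that line up with banding in the plot are a warning sign.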
How do I improve my data quality?
fMRI data are noisy and this is not going to change any time soon, so we have to deal with it somehow. Acceptance and hoping for the best is a strategy, but it could lead to problems further on in your analysis. If there is a lot of noise compared to the signal of interest, then individual subjects’ resting state networks will not look clean and the power to detect group-level effects may be low, so you might not find anything interesting in your group-level analysis. Just as importantly, if there are systematic differences in noise sources, such as head motion, between the cohorts you are studying, then seemingly interesting effects can simply be a result of group differences in noise. Having ignored the noise problem, you might end up spending days writing a paper with a game-changing title, only to be hit by reality when the annoying reviewer asks you to quantify group differences in your noise. Better to be aware of and account for noise from the start, right? But this is easier said than done…
What causes noise in rs-fMRI data?
Resting state analysis generally deals with correlations in time courses between voxels. If a noise source affects several voxels in similar ways, it can produce temporal correlations that are independent of neural co-fluctuations. For this reason, the aim of noise correction is to remove as much of the noise-related variance in the BOLD signal as possible. To figure out the best possible noise correction strategy, we first have to be aware of the sources of noise in BOLD time series.
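A toy simulation makes this concrete. Two voxels whose neural fluctuations are completely independent still look strongly correlated when they share a noise source. The data are synthetic and the amplitudes arbitrary. With a shared noise variance of 4 and a neural variance of 1, the expected observed correlation is 4/5 = 0.8, even though the true neural correlation is zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_volumes = 1000

# Two voxels with independent "neural" fluctuations...
neural_a = rng.standard_normal(n_volumes)
neural_b = rng.standard_normal(n_volumes)
# ...plus one shared noise source (a stand-in for e.g. head motion).
shared_noise = 2.0 * rng.standard_normal(n_volumes)

voxel_a = neural_a + shared_noise
voxel_b = neural_b + shared_noise

r_neural = np.corrcoef(neural_a, neural_b)[0, 1]    # close to 0
r_observed = np.corrcoef(voxel_a, voxel_b)[0, 1]    # close to 0.8
print(f"neural r = {r_neural:.2f}, observed r = {r_observed:.2f}")
```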
In his video, Cesar Caballero Gaudes gives a comprehensive overview of the most common sources of noise, such as head motion (from minute 05:11), respiratory and cardiac variation (from minute 05:53), and hardware (from minute 11:11), and their effects on the data. Cesar also gives an overview of some of the denoising strategies that are available to tackle different types of noise.
How can I correct for noise when I have information about the noise sources? The nuisance regression approach:
One denoising approach is to record information about some of the potential noise sources during the scan, such as physiological recordings or head motion parameters. These can then be used to figure out to what extent our BOLD time series can be explained by the noise sources, by including nuisance regressors in a general linear model. Generally, we probably all agree that the more high-quality information we have on what happened during our scan, the better. One may also think that the more nuisance regressors we employ to regress out from our BOLD time series, the better our clean-up… but is that so? In her video, Molly Bright gives us deeper insight into the nuisance regression approach to clean up noise.
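At its core, nuisance regression is an ordinary least-squares fit of the nuisance regressors to each voxel's time series, keeping the residuals. The minimal sketch below shows only that core step, on synthetic data. The function name is hypothetical, and the single random "motion" regressor stands in for real physiological or realignment regressors.

```python
import numpy as np


def regress_out(bold, nuisance):
    """Return BOLD residuals after removing nuisance regressors with OLS.

    bold: (n_volumes, n_voxels); nuisance: (n_volumes, n_regressors).
    An intercept is included, so the residuals have zero mean.
    """
    bold = np.asarray(bold, dtype=float)
    design = np.column_stack([np.ones(bold.shape[0]), nuisance])
    beta, *_ = np.linalg.lstsq(design, bold, rcond=None)
    return bold - design @ beta


rng = np.random.default_rng(1)
n_volumes = 200
signal = rng.standard_normal((n_volumes, 3))          # "true" voxel fluctuations
motion_trace = rng.standard_normal((n_volumes, 1))    # stand-in nuisance regressor
bold = 100.0 + signal + 3.0 * motion_trace
cleaned = regress_out(bold, motion_trace)
```

Each extra column in `nuisance` uses up a degree of freedom and can absorb some genuine signal. This is the trade-off Molly discusses below.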
In some smart simulation analyses (from minute 12:30), Molly shows that simply adding as many nuisance regressors as possible may not be the best strategy, as we may accidentally remove a lot of signal. Also, we need to be careful about time-lagging our regressors in order to account for the delay between a physiological change and the BOLD response. Molly explains why trying to identify that delay using the rs-fMRI data can be tricky, and why adding a breath-hold at the end of your acquisition may be a good idea (from minute 20:16).
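Time-lagging a regressor simply means shifting it relative to the BOLD data before it enters the model. Here is a minimal sketch. The function name is hypothetical, and the lag is in volumes, so the physiological trace is assumed to have already been resampled to one value per TR. Padding with the edge value is one of several possible choices. The lag must be shorter than the time series.

```python
import numpy as np


def shift_regressor(x, lag):
    """Shift a regressor by `lag` volumes.

    Positive lag delays the regressor (the BOLD response is assumed to
    follow the physiological change); negative lag advances it. Samples
    shifted in from outside the recording are padded with the edge value.
    """
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    if lag > 0:
        out[lag:] = x[:-lag]
        out[:lag] = x[0]
    elif lag < 0:
        out[:lag] = x[-lag:]
        out[lag:] = x[-1]
    else:
        out[:] = x
    return out


resp = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
delayed = shift_regressor(resp, 2)     # -> 1, 1, 1, 2, 3
advanced = shift_regressor(resp, -1)   # -> 2, 3, 4, 5, 5
```

Choosing the lag is the hard part. As Molly explains, estimating it from the rs-fMRI data themselves is tricky, which is why an independent event such as a breath-hold can help.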
Molly also demonstrates that very commonly applied preprocessing steps, such as bandpass filtering, can have effects on our data that we might not have predicted (from minute 16:30). While introducing a few strategies to make the nuisance regression approach to noise correction more valid, such as prewhitening (from minute 12:00), she stresses that there is no single optimal strategy and that it is very difficult to tell whether noise removal “has worked”. The take-home message here is probably that, as a field, we need to work towards a better understanding of the BOLD profiles of different noise sources. Additionally, integrated strategies are needed to deal with the complicated interplay between different noise sources, such as between head motion and physiological noise.
How can I correct for noise when I do not have information about the noise sources? The ICA approach:
While the success of nuisance regression depends on having good quality nuisance regressors in the first place, data-driven approaches are available that can be applied to any dataset, the most common being independent component analysis (ICA). ICA for noise removal decomposes the BOLD data into spatially independent components, each with its own time course, and each component is then classified as signal or noise. This is typically done on a subject-by-subject basis. The time courses of the noise components can then be regressed out or accounted for during further analyses.
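Once components have been labelled, the removal step itself is another regression. The sketch below contrasts two variants that you will meet in tools such as FSL's FIX and ICA-AROMA. "Aggressive" clean-up regresses out the noise time courses alone. "Non-aggressive" clean-up fits all component time courses jointly and subtracts only the part attributed to noise, so variance shared with signal components is kept. The function name and the synthetic data are illustrative assumptions, and the data are assumed to be demeaned.

```python
import numpy as np


def remove_noise_components(data, mixing, noise_idx, aggressive=False):
    """Remove ICA noise components from (demeaned) data.

    data: (n_volumes, n_voxels); mixing: (n_volumes, n_components), the
    component time courses; noise_idx: indices of components labelled noise.
    """
    data = np.asarray(data, dtype=float)
    mixing = np.asarray(mixing, dtype=float)
    noise_idx = list(noise_idx)
    if aggressive:
        # Regress out the noise time courses on their own.
        noise_tc = mixing[:, noise_idx]
        beta, *_ = np.linalg.lstsq(noise_tc, data, rcond=None)
        return data - noise_tc @ beta
    # Non-aggressive: fit all components jointly, then subtract only the
    # part of the fit attributed to the noise components.
    beta, *_ = np.linalg.lstsq(mixing, data, rcond=None)
    return data - mixing[:, noise_idx] @ beta[noise_idx]


rng = np.random.default_rng(2)
mixing = rng.standard_normal((150, 3))   # component time courses
maps = rng.standard_normal((3, 40))      # component spatial maps
data = mixing @ maps
cleaned = remove_noise_components(data, mixing, noise_idx=[1, 2])
```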
Ludovica Griffanti gives a comprehensive introduction to ICA for noise removal and highlights the difficulty that often lies in the signal vs noise classification that is performed by “experts”. Whilst semi-automated and automated approaches are under development in order to make this classification more objective, Ludovica makes the strong point that ultimately these algorithms or at least their validation are based on “gold-standard” manually labelled data. While there is no clear consensus yet on what signal and noise components look like, Ludovica provides us with some guidance and rules that can help with classification and are a first step towards this consensus.
How can multi echo data help with noise correction?
The vast majority of BOLD data has been acquired with a single echo time, optimised to the average T2* across grey matter. However, if you have not started your experiment yet, you might want to acquire data with several echo times. Prantik Kundu explains why: BOLD and non-BOLD signals depend differently on echo time, so having information about the actual signal decay can help distinguish the signal of interest from noise (from minute 05:10).
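The "actual decay" can be estimated per voxel by fitting a mono-exponential model, S(TE) = S0 · exp(-TE / T2*), to the signal at each echo. A simple log-linear least-squares fit is sketched below, followed by a T2*-weighted combination of echoes with weights proportional to TE · exp(-TE / T2*). Multi-echo tools such as tedana offer this kind of combination. This sketch is not the ME-ICA classification Prantik describes. The function names, echo times and T2* values are illustrative assumptions, and real implementations handle noise, masking and fit failures far more carefully.

```python
import numpy as np


def fit_t2star(mean_signal, tes):
    """Log-linear fit of S(TE) = S0 * exp(-TE / T2*) for each voxel.

    mean_signal: (n_echoes, n_voxels) time-averaged signal; tes: echo times.
    """
    tes = np.asarray(tes, dtype=float)
    design = np.column_stack([np.ones_like(tes), -tes])
    coef, *_ = np.linalg.lstsq(design, np.log(mean_signal), rcond=None)
    s0 = np.exp(coef[0])
    t2star = 1.0 / coef[1]
    return s0, t2star


def combine_echoes(data, tes, t2star):
    """Weighted echo combination with weights TE * exp(-TE / T2*).

    data: (n_echoes, n_volumes, n_voxels). Weights are normalised to sum to 1.
    """
    tes = np.asarray(tes, dtype=float)[:, None]
    weights = tes * np.exp(-tes / t2star[None, :])   # (n_echoes, n_voxels)
    weights /= weights.sum(axis=0, keepdims=True)
    return np.einsum("ev,etv->tv", weights, data)


tes = np.array([14.0, 30.0, 46.0])        # ms, illustrative values
true_s0 = np.array([1000.0, 800.0])
true_t2s = np.array([40.0, 25.0])         # ms
mean_signal = true_s0 * np.exp(-tes[:, None] / true_t2s)   # (3 echoes, 2 voxels)
s0, t2s = fit_t2star(mean_signal, tes)
data = np.repeat(mean_signal[:, None, :], 5, axis=1)       # 5 identical volumes
combined = combine_echoes(data, tes, t2s)                  # (5 volumes, 2 voxels)
```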
Prantik provides a few beautiful examples of how multi-echo fMRI data can be combined with ICA-based approaches for noise clean-up, by calculating parameters that objectively indicate how similar each component’s behaviour is to BOLD vs non-BOLD signal (from minute 11:43). In the grand scheme of things, the multiple echo times used are still quite short, so acquiring this extra information would not necessarily increase your total acquisition time. On a side note, even data from one additional short echo time can provide information about some noise sources, as described in a study by Bright and Murphy (2013). Be aware that certain noise sources, such as slow physiological changes, yield ‘BOLD-like’ noise (which we can treat as noise or as signal of interest, depending on our perspective), as they interact with the cerebrovascular system. Multi-echo data does not help with correcting for this type of noise.
Why go through all that pain? Can I not just do a simple global signal regression for noise correction?
A cheap and easy (and still very widely used) way of performing ‘noise correction’ is global signal regression. Here, the average signal across the whole brain (or across all grey matter voxels or all cortical voxels) is calculated and regressed out of each voxel time series, with the underlying assumption that the global signal mostly reflects combined noise from various sources. The advantage of this approach is that it can remove artifacts that are hard to get rid of with other noise correction methods. However, global signal regression is highly controversial in the field. The main points of criticism are that the global signal has neuronal contributions, and that global signal regression shifts the distribution of correlation coefficients and induces negative functional connectivity. In her video, Molly Bright briefly touches on this (from minute 24:43), and refers to a recent 'consensus paper'. An alternative to regressing out the global signal is to use the signal from white matter or CSF, as briefly described in Cesar’s video (from minute 20:00). If you are interested, also see his recent paper.
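Global signal regression is just nuisance regression with one data-derived regressor, as the minimal sketch below shows (function name and synthetic data are assumptions). It also illustrates one side of the controversy. After GSR, the residuals average to zero across the voxels used to compute the global signal, so their covariances must sum to a negative number once the variances are excluded. Some voxel pairs are therefore forced towards negative correlation regardless of the underlying physiology.

```python
import numpy as np


def global_signal_regression(bold, mask=None):
    """Regress the mean time series over `mask` voxels out of every voxel.

    bold: (n_volumes, n_voxels); mask: boolean array over voxels (e.g. a
    whole-brain or grey-matter mask). Returns residuals and the global signal.
    """
    bold = np.asarray(bold, dtype=float)
    if mask is None:
        mask = np.ones(bold.shape[1], dtype=bool)
    gs = bold[:, mask].mean(axis=1)
    design = np.column_stack([np.ones_like(gs), gs])
    beta, *_ = np.linalg.lstsq(design, bold, rcond=None)
    return bold - design @ beta, gs


rng = np.random.default_rng(3)
# 50 voxels that share one common fluctuation on top of independent noise.
bold = rng.standard_normal((300, 50)) + rng.standard_normal((300, 1))
residuals, gs = global_signal_regression(bold)
cov = np.cov(residuals, rowvar=False)
off_diagonal_sum = cov.sum() - np.trace(cov)   # negative after GSR
```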
How should rs-fMRI data be preprocessed?
Resting state fMRI data can largely be preprocessed in the same way as data from a task-based fMRI acquisition (for a refresher on steps we recommend the slides from the educational course from OHBM 2016). As Molly pointed out, some of the “standard” preprocessing steps, such as bandpass filtering, can have unexpected effects on rs-fMRI data. As rs-fMRI data does not have strong task-driven signal changes, it is generally more susceptible to noise and probably to anything we do to the data, so be wary of that.
As described above, there are strategies for tackling noise, such as physiological artifacts, in the preprocessing pipeline. Some good pointers, including Cesar Caballero Gaudes’s video on denoising, have been outlined in the previous section. In addition, in 2016 Rasmus Birn, an expert on the influence of physiological noise on the BOLD signal, gave a thorough overview of physiological noise and approaches to remove it.
How can I analyse the data to find meaningful resting state networks?
Once your data is preprocessed and denoised, and you are confident that it is in good shape, you will want to get on with the exciting part: identifying resting state networks. When done properly, resting state analyses can reveal large-scale networks in the ‘brain at rest.’ These networks are defined by correlated temporal patterns across spatially distinct regions. Each network has a time course that is distinct from those of other resting state networks but consistent across its own regions.
The aim of rs-fMRI analysis approaches is to use the time courses of brain regions to decompose the brain into resting state networks. Several techniques exist, the two most common being seed-based correlation analysis (SCA) and independent component analysis (ICA).
In his video, Carl Hacker gives a nice overview of both SCA and ICA. He introduces the two methods (from minute 1:12) and identifies the main differences between the approaches (from minute 4:15). Carl also discusses how to identify RSNs from seed-based mapping (from minute 6:25), and how the brain can be parcellated using ICA (from minute 13:35). While SCA uses the time series of an a priori selected seed region in order to identify whole brain functional connectivity maps of that region, ICA decomposes data from the whole brain into the time courses and spatial maps of the resting state signals, called independent components (ICs). SCA is a useful method to answer questions about the functional connectivity of one specific region. However, the drawback is that it only informs about connectivity of this region. On the other hand, the numerous ICs that you get from ICA are defined as a collection of regions which have maximal spatial independence but co-varying time courses, thus showing networks across the whole brain that have synchronous BOLD fluctuations when the brain is not performing a task.
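To make SCA concrete, here is a minimal sketch: average the time series within the seed, correlate that seed time course with every voxel, and apply Fisher's r-to-z transform (arctanh). The transform is a common step before group statistics. The function name and the synthetic data are assumptions. In practice the input would be preprocessed, denoised data, and constant (zero-variance) voxels would need masking out, since they yield NaN here.

```python
import numpy as np


def seed_correlation_map(bold, seed_mask):
    """Correlate the mean seed time series with every voxel.

    bold: (n_volumes, n_voxels); seed_mask: boolean array over voxels.
    Returns Pearson r and Fisher z = arctanh(r) for each voxel.
    """
    bold = np.asarray(bold, dtype=float)
    seed_ts = bold[:, seed_mask].mean(axis=1)
    b = bold - bold.mean(axis=0)
    s = seed_ts - seed_ts.mean()
    r = (s @ b) / (np.linalg.norm(s) * np.linalg.norm(b, axis=0))
    z = np.arctanh(np.clip(r, -0.999999, 0.999999))  # avoid infinite z at r = 1
    return r, z


rng = np.random.default_rng(4)
bold = rng.standard_normal((240, 6))
bold[:, 1] += bold[:, 0]          # voxel 1 shares signal with the seed voxel
seed_mask = np.zeros(6, dtype=bool)
seed_mask[0] = True
r, z = seed_correlation_map(bold, seed_mask)
```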
In healthy subjects, SCA and ICA have been shown to produce moderately corresponding functional connectivity information, and the choice between them is likely to be guided by the specific research question. Note that the focus of Carl’s video is parcellation of the brain. However, many concepts and principles also apply to other types of analyses. Read more about these two methods in Cole et al. (2010) and Smith et al. (2013).
How do I interpret ICA components?
If you have run ICA on your resting state data, your next task will be to interpret the output. The output consists of a number of spatial maps showing regions with spatial independence but co-varying time courses, called independent components (ICs). How many ICs you get depends on the parameters you set when you run the ICA, but it is typically a few dozen.
The first step when interpreting the ICs is to determine whether they are signal or noise. Because ICA is data-driven, it does not ‘filter out’ noise, but it can separate neural signal from non-neural signal, i.e. noise, so it is important to classify the components correctly as either signal or noise.
So how do I distinguish between signal and noise in extracted ICs?
In her video, Ludovica Griffanti discusses how RSNs and noise can be distinguished. She provides an overview of component classification approaches, including manual and automatic classification (from minute 2:58). Importantly, Ludovica describes the characteristics of signal and noise components and gives examples of both (from minute 5:20). Ludovica’s key message is that the aim of classification is to retain as much signal as possible, so if you are unsure whether a component is signal or noise, keep it in as signal. She also makes the point (from minute 19:00) that a number of factors relating to participants, MR acquisition and preprocessing affect IC characteristics, and discusses these briefly. The classification approach discussed in Ludovica’s video is very similar for single-subject and group-level ICA outputs, but there are some differences. For an outline of these and for a more thorough discussion of manual classification of ICA components, please see Griffanti et al. (2017).
How do I identify RSNs from ICs classified as signal?
There are a few approaches to determining which networks the signal components correspond to. Some ICA toolboxes have spatial templates that can be compared to the ICs. But perhaps the most common approach is manual labelling based on known anatomy. The spatial patterns and time courses of many common resting state networks (RSNs) have been described (e.g. for labelling RSNs from group-level data, see Beckmann et al. (2005) and De Luca et al. (2006)).
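Template comparison usually comes down to spatial correlation between each IC map and each template map, with both maps in the same space. Here is a minimal sketch. The function name is hypothetical, the "templates" are random synthetic maps rather than a real RSN atlas, and absolute correlations are used because the sign of an IC is arbitrary. A low best-match correlation suggests that no template fits, and the assignment should still be checked visually.

```python
import numpy as np


def match_to_templates(ic_maps, template_maps):
    """Spatially correlate each IC map with each template map.

    ic_maps: (n_ics, n_voxels); template_maps: (n_templates, n_voxels),
    both in the same space. Returns the best template per IC and the full
    matrix of absolute correlations (ICA sign is arbitrary).
    """
    def standardise(m):
        m = np.asarray(m, dtype=float)
        m = m - m.mean(axis=1, keepdims=True)
        return m / np.linalg.norm(m, axis=1, keepdims=True)

    r = np.abs(standardise(ic_maps) @ standardise(template_maps).T)
    return r.argmax(axis=1), r


rng = np.random.default_rng(5)
templates = rng.standard_normal((3, 500))
# ICs that are sign-flipped, reordered, noisy copies of the templates.
ics = -templates[[2, 0, 1]] + 0.1 * rng.standard_normal((3, 500))
best, r = match_to_templates(ics, templates)
```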
A further option for IC classification is the use of automated RSN classification techniques. In his video, Abraham Snyder gives an overview of how machine learning can be used to classify RSNs based on pattern recognition (minutes 28:50-33:00).
What is this thing called dual regression?
ICA is typically done on group data and produces spatial maps that reflect the group-average functional connectivity. However, the individual variability of IC topography is often of interest, for example to make comparisons between groups of individuals. A process called back-reconstruction is therefore used to obtain individual time courses for the ICs obtained from the group-level ICA, which are then correlated with each voxel’s time series to obtain subject-specific spatial maps. Dual regression is one available back-reconstruction method. In his video, Carl Hacker gives a brief overview of how it works (from minute 19:38).
If you are interested, Erhardt et al. (2011) describe the principles of several back-reconstruction methods, including dual regression.
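The two stages of dual regression can be written as two least-squares fits. In stage 1, the group maps are used as spatial regressors to get one time course per component for each subject. In stage 2, those time courses are used as temporal regressors to get subject-specific maps. The sketch below shows only this core. The function name and synthetic data are assumptions, and it omits options such as the demeaning and variance normalisation of time courses that FSL's dual_regression script provides.

```python
import numpy as np


def dual_regression(subject_data, group_maps):
    """Two-stage regression of group ICA maps onto one subject's data.

    subject_data: (n_volumes, n_voxels); group_maps: (n_components, n_voxels).
    Returns subject time courses (n_volumes, n_components) and
    subject-specific spatial maps (n_components, n_voxels).
    """
    data = np.asarray(subject_data, dtype=float)
    maps = np.asarray(group_maps, dtype=float)
    # Stage 1 (spatial regression): data.T ~ maps.T @ tcs.T, per volume.
    tcs_t, *_ = np.linalg.lstsq(maps.T, data.T, rcond=None)
    tcs = tcs_t.T
    # Stage 2 (temporal regression): data ~ tcs @ subject_maps, per voxel.
    subject_maps, *_ = np.linalg.lstsq(tcs, data, rcond=None)
    return tcs, subject_maps


rng = np.random.default_rng(8)
true_maps = rng.standard_normal((4, 300))
true_tcs = rng.standard_normal((120, 4))
tcs, subject_maps = dual_regression(true_tcs @ true_maps, true_maps)
```

With noise-free synthetic data, both stages recover the generating time courses and maps exactly. With real data, the subject maps differ from the group maps, and those differences are what group comparisons test.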
What metrics can I extract from the rs-fMRI analyses?
Local activity metrics:
Even before running a network analysis on the rs-fMRI data, such as SCA or ICA (see above), two useful metrics can be derived from the data: ALFF and ReHo.
Amplitude of Low Frequency Fluctuations (ALFF) measures the magnitude of low frequency oscillations (0.01-0.1 Hz) in the BOLD signal of a given region. The fractional ALFF (fALFF), a complementary metric, measures the contribution that these low frequency oscillations make to the whole frequency range recorded. Both metrics give a measure of the amplitude of brain activity in specific regions. However, the interpretation of these measures is difficult. Fractional ALFF has been shown to depend on the vascularisation of the brain. The same has been shown for the resting-state fluctuation amplitude (RSFA), a measure very similar to ALFF that is available from any rs-fMRI scan but has often been interpreted differently. Physiological mechanisms in rs-fMRI, including vascular effects, are still not fully understood, so measures linked to cerebrovascular characteristics remain particularly difficult to interpret.
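Both metrics come from the amplitude spectrum of a voxel's time series. Here is a minimal sketch: ALFF is taken as the mean amplitude within the band, and fALFF as the summed in-band amplitude divided by the summed amplitude over all non-zero frequencies. Implementations differ in amplitude scaling, exact band edges and normalisation (for example, dividing by the whole-brain mean), so the function name and conventions here are assumptions. The example uses pure sinusoids rather than real data.

```python
import numpy as np


def alff_falff(ts, tr, band=(0.01, 0.1)):
    """ALFF and fALFF for a single voxel time series.

    ts: 1-D time series; tr: repetition time in seconds.
    ALFF = mean amplitude within `band`; fALFF = summed amplitude within
    `band` divided by summed amplitude over all frequencies above 0 Hz.
    """
    ts = np.asarray(ts, dtype=float)
    ts = ts - ts.mean()
    freqs = np.fft.rfftfreq(ts.size, d=tr)
    amplitude = np.abs(np.fft.rfft(ts)) / ts.size
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    alff = amplitude[in_band].mean()
    falff = amplitude[in_band].sum() / amplitude[freqs > 0].sum()
    return alff, falff


tr = 2.0
t = np.arange(200) * tr
slow = np.sin(2 * np.pi * 0.05 * t)   # oscillation inside the 0.01-0.1 Hz band
fast = np.sin(2 * np.pi * 0.2 * t)    # oscillation outside the band
print(alff_falff(slow, tr), alff_falff(fast, tr))
```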
The other common rs-fMRI metric is that of regional homogeneity, or ReHo. ReHo is a voxel-based measure of regional brain activity, based on the similarity of the time-series of a given voxel and its nearest neighbours. It quantifies the homogeneity of adjacent regions, to provide information about the coherence of neural activity of a specific spatial region.
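ReHo is usually computed as Kendall's coefficient of concordance (W) across the time series of a voxel and its neighbours. A common neighbourhood is the voxel plus its 26 nearest neighbours (27 voxels), but the neighbourhood size is a parameter. Here is a minimal sketch of W itself. The function name is hypothetical, ties in the ranks are not corrected for, and the time series are synthetic. W is 1 when all time series rank the time points identically, and it is low for unrelated time series.

```python
import numpy as np


def kendalls_w(neighbourhood_ts):
    """Kendall's coefficient of concordance W for a set of time series.

    neighbourhood_ts: (n_volumes, k) time series of a voxel and its
    neighbours. Ties are not corrected for in this sketch.
    """
    x = np.asarray(neighbourhood_ts, dtype=float)
    n, k = x.shape
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1   # ranks 1..n per series
    rank_sums = ranks.sum(axis=1)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (k ** 2 * (n ** 3 - n))


rng = np.random.default_rng(6)
common = rng.standard_normal(100)
# Seven voxels sharing one fluctuation, versus seven unrelated voxels.
coherent = np.column_stack([common + 0.1 * rng.standard_normal(100) for _ in range(7)])
independent = rng.standard_normal((100, 7))
print(kendalls_w(coherent), kendalls_w(independent))
```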
Thus, both ALFF and ReHo give information about regional neural activity and have been shown to have high values in, for example, the default mode network regions during rest, indicating that they can point to the regions that play central roles in resting state networks. Because they provide information about regional neural activity at rest, both ALFF and ReHo can be used to determine an ROI for SCA.
Functional network metrics:
However, ALFF and ReHo are metrics of local neural activity, and are thus limited in their ability to provide information about large resting state networks. Network analyses therefore tend to focus on functional connectivity measures.
SCA and ICA, discussed above, both offer measures of functional connectivity within the brain. Both calculate the correlation of time series between voxels in the brain to produce spatial maps of Z-scores for each voxel. These scores reflect how well the time series of each voxel is correlated with the time series of other voxels and are a measure of functional connectivity. In SCA, the Z-scores reflect the correlation of each voxel with the average time course of the seed region, while in ICA the Z-scores reflect the correlation of each voxel with the time course of the respective IC. Dual regression can be run with both SCA and ICA to enable the investigation of individual and group level differences of functional connectivity.
A good overview of the metrics described above is provided in Lv et al. (2018).
A more recent metric derived from rs-fMRI data is that of functional homotopy. Functional homotopy shows the synchrony of spontaneous neural activity between geometrically corresponding, i.e. homotopic, regions in the two hemispheres. It provides a measure of connectivity between corresponding interhemispheric regions, and can be used to determine regional versus hemispheric information processing.
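One common implementation of functional homotopy is voxel-mirrored homotopic connectivity (VMHC): the correlation between each voxel's time series and that of its mirror voxel in the other hemisphere, usually Fisher z-transformed. Here is a minimal sketch. It assumes the data are already in a left-right symmetric template and that the first array axis is the left-right axis; both are assumptions to check against your own pipeline. The function name and synthetic data are illustrative. With an odd number of slices along x, midline voxels are compared with themselves and should be excluded.

```python
import numpy as np


def homotopic_connectivity(bold4d):
    """Correlation between each voxel and its left-right mirror voxel.

    bold4d: array of shape (x, y, z, n_volumes) in a left-right symmetric
    template, with the first axis assumed to be the left-right axis.
    Returns Fisher z values; zero-variance voxels give NaN.
    """
    data = np.asarray(bold4d, dtype=float)
    mirrored = data[::-1]                       # flip along the left-right axis
    a = data - data.mean(axis=-1, keepdims=True)
    b = mirrored - mirrored.mean(axis=-1, keepdims=True)
    r = (a * b).sum(axis=-1) / np.sqrt((a ** 2).sum(axis=-1) * (b ** 2).sum(axis=-1))
    return np.arctanh(np.clip(r, -0.999999, 0.999999))


rng = np.random.default_rng(7)
left = rng.standard_normal((2, 3, 3, 60))
symmetric = np.concatenate([left, left[::-1]], axis=0)   # right half mirrors left
unrelated = rng.standard_normal((4, 3, 3, 60))
z_sym = homotopic_connectivity(symmetric)     # very high z everywhere
z_rand = homotopic_connectivity(unrelated)    # z scattered around 0
```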
Chao-Gan Yan asks whether these different measures of resting state functional connectivity show unique variance, and discusses the concordance among some of these metrics and also global connectivity (a graph theory measure, please see the next section), by drawing on work from his research group.
It is important to remember that most measures of resting state functional connectivity are based on correlational analyses and thus do not tell us how regions of the brain influence the activity of other regions. It is possible to model observed patterns of functional connectivity in order to draw inferences about such neural influences. This approach is called effective connectivity and can be estimated with Dynamic Causal Modelling. In his video, Karl Friston describes how we can use effective connectivity to infer causality from observed connectivity (minutes 0:57 to 23:07).
How can graph theory be applied to resting state data?
More advanced metrics can be derived from rs-fMRI data using graph theoretical analysis approaches. Graph theory is a branch of mathematics that describes networks as graphs made up of nodes and edges. Applied to the brain, it can be used to map the brain’s connections. When graph theory is applied to rs-fMRI data, the nodes are often large-scale brain regions, and the edges represent the functional connectivity between them. The great advantage of graph theory over other measures of functional connectivity is that it offers a way to quantify the properties of large, complex networks.
Alex Fornito gives an excellent introduction to graph theory in his video. He discusses the rationale for using graph theory (minutes 0:55 to 3:39), before going on to give a history of graph theory (minutes 3:39 to 11:54). Then, Alex describes how network models can be created and shown as graphs (minutes 11:54 to 16:53), with a focus on defining nodes and edges. He describes how edges can be defined using fMRI data, including the potential problem of relying on the time series correlations that underpin functional connectivity (minutes 19:16 to 24:43). Finally, the construction of the graph is described (minutes 24:43 to 28:55).
Alex Fornito discusses several approaches to defining the nodes of a network. One of these is parcellation of the brain. The brain can be parcellated from rs-fMRI data through either SCA or ICA, as described by Carl Hacker.
Once a functional connectivity matrix has been created, either from brain parcellation or the components obtained from ICA, there are two options for deriving metrics. The first is to simply compare the functional connectivity matrices between two or more groups of participants. This approach can provide useful information about how the variable of interest, such as a disease, affects the connectivity between or within resting state networks, and has been used to characterise functional connectivity in diseases such as schizophrenia and autism. The other option is to create a graph from the functional connectivity matrix and study it with graph theory.
However, because functional connectivity matrices show correlations between the time series of defined brain regions, either approach is potentially susceptible to spurious or weak connections, for instance due to noise. One way to address this is to apply a threshold that removes the connections that fall below that threshold. Andrew Zalesky gives an introduction to network thresholding and an overview of how it is performed between 0:00 and 16:40 minutes of his video. He also provides an overview of the type of measures that can be extracted from brain graphs, with a focus on comparisons of edge strength (minutes 16:40 to 19:36).
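Here is a minimal sketch of building a connectivity matrix from regional time series and thresholding it. Proportional (density) thresholding is used for illustration. It keeps the strongest fixed fraction of possible edges, so every subject's graph has a similar number of connections. Absolute thresholds, and approaches that retain negative edges, are alternatives, and the choice matters for the downstream metrics. The function names and example values are assumptions, and negative edges are simply discarded here.

```python
import numpy as np


def connectivity_matrix(region_ts):
    """Pearson correlation between region time series (n_volumes, n_regions)."""
    conn = np.corrcoef(np.asarray(region_ts, dtype=float), rowvar=False)
    np.fill_diagonal(conn, 0.0)   # self-connections are not edges
    return conn


def proportional_threshold(conn, density):
    """Keep at most the top `density` fraction of possible edges, ranked by
    weight; negative or zero edges are discarded. Returns a symmetric matrix."""
    n = conn.shape[0]
    rows, cols = np.triu_indices(n, k=1)
    weights = conn[rows, cols]
    n_keep = int(round(density * weights.size))
    keep = np.argsort(weights)[::-1][:n_keep]     # strongest edges first
    keep = keep[weights[keep] > 0]
    adj = np.zeros_like(conn)
    adj[rows[keep], cols[keep]] = weights[keep]
    return adj + adj.T


conn = np.array([[0.0, 0.9, 0.1, 0.5],
                 [0.9, 0.0, 0.3, 0.2],
                 [0.1, 0.3, 0.0, 0.7],
                 [0.5, 0.2, 0.7, 0.0]])
adj = proportional_threshold(conn, 0.5)   # keeps the 0.9, 0.7 and 0.5 edges

rng = np.random.default_rng(9)
conn_from_data = connectivity_matrix(rng.standard_normal((200, 10)))
```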
Some regions of the brain are more strongly or more widely connected to the rest of the brain than others, and tend to be considered network hubs. Metrics related to network hubs are among the most commonly used in graph theoretical analysis. Martijn van den Heuvel discusses network hubs and the metrics associated with them (from about minute 1:30).
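The simplest hub-related metrics are node degree (the number of edges a node has) and node strength (the sum of its edge weights). The minimal sketch below computes both from a thresholded adjacency matrix. It flags as candidate hubs those nodes whose degree exceeds the mean by more than one standard deviation. That cut-off is a heuristic chosen here for illustration, not a standard. Other hub definitions (for example betweenness centrality or the participation coefficient) are described in Rubinov and Sporns (2010) and implemented in the Brain Connectivity Toolbox. The function names are hypothetical.

```python
import numpy as np


def degree_and_strength(adj):
    """Node degree (number of edges) and strength (sum of edge weights)."""
    adj = np.asarray(adj, dtype=float).copy()
    np.fill_diagonal(adj, 0.0)
    degree = (adj != 0).sum(axis=1)
    strength = adj.sum(axis=1)
    return degree, strength


def candidate_hubs(degree):
    """Nodes whose degree exceeds the mean by more than one standard deviation."""
    return np.flatnonzero(degree > degree.mean() + degree.std())


star = np.zeros((5, 5))
star[0, 1:] = star[1:, 0] = 0.5    # node 0 is connected to every other node
degree, strength = degree_and_strength(star)
hubs = candidate_hubs(degree)       # only node 0 qualifies
```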
An extensive list of graph theory metrics and what they tell us about neural networks can be found in Rubinov and Sporns (2010).
For those interested, there is a small collection of videos on graph theory from last year’s presentations at the OHBM conference, including those discussed in this post.
What do the resting state networks actually show?
How do you interpret findings from your resting state analysis? Well, first, it is important to consider the biological function of the correlated temporal patterns. Unfortunately, it is not as simple as defining it as ‘activity during rest.’ RSNs are collections of brain regions that have synchronous BOLD fluctuations, but the source of the signal has not been unequivocally established. While there is strong evidence to suggest that the signal is neural, there is still ongoing debate about the extent to which it may be influenced by non-neuronal noise, such as respiratory and cardiac oscillations. However, the fact that rs-fMRI analysis results have been reproduced even when applying conservative physiological corrections across both individual subjects and groups points to a largely neural basis of the rs-fMRI signal.
So what does the functional connectivity mean? In purely methodological terms it is the statistical correlation of two time series. It has been suggested that such correlations have arisen as a result of neural populations that are active together to perform a task and have therefore ‘wired’ together. The rs-fMRI signal reflects their spontaneous neural activity in the absence of a specific task. There may be direct anatomic connections between networks derived from rs-fMRI analyses, or another joint source of the signal. This is currently not well understood, and rs-fMRI findings should be interpreted with caution.
A short but good outline of the origin of the rs-fMRI signal is provided in van den Heuvel et al. (2010).