OHBM - What is it that interests you about neuroimaging?
William Seeley (WS) - Neuroimaging has the potential to address three key issues in neurodegeneration research. First, brain imaging can tell us when and where neurodegeneration begins in living patients. This critical information provides the “treasure maps” we can use to guide our search for the cellular-molecular mechanisms of disease within the right neuroanatomical context. Second, functional imaging can help us understand changes in network physiology underlying patient symptoms. Finally, the dawn of “connectomic” imaging has allowed us to test competing models of network-based disease progression.
OHBM - What difficulties have you faced balancing a research and clinical career? What benefits has it brought?
WS - My clinical life frames and motivates everything we do in research, and I continue to be impressed by how much clinical science can teach us about healthy brain organization. For me, the major challenges relate to time and not having enough of it to do everything I would like to do in my career. Overwhelmingly, though, my life as a clinician has added great meaning to my life as a researcher.
OHBM - What draws you to OHBM and how does it differ from other similar, large conferences?
WS - I’ve been coming to OHBM for about 10 years and I keep coming back because of the enthusiasm of the membership for this field. It feels like a cohesive group of people - they each bring a different perspective and a different set of tools. It’s also a good place to learn about those tools – the very front line of methodological advances is reported here first. That always makes it exciting.
OHBM – In your keynote talk you laid out a number of different variants of the dementias, then focused on frontotemporal dementia. The dementias vary based on both behavior and symptoms. Could networks and the connectivity between networks aid differential diagnoses? Do you envision scanning patients in order to distinguish between different types of dementias?
WS - Neuroimaging of brain structure and function can help us refine our assessment of a patient’s clinical syndrome. In my talk you heard me discuss syndromic diagnoses and pathological diagnoses as distinct and separate concepts. Structural and functional imaging can help with syndrome refinement, but when it comes to the underlying neuropathological cause of that syndrome, I think those strategies are going to fall just a bit short. Take the example of behavioural variant frontotemporal dementia (bvFTD): it has 15 different neuropathological causes, and I doubt we could use neuroimaging alone to decide which of those 15 underlying histopathologies is the actual cause of a patient’s bvFTD. It’s more likely that we’ll need a molecular technique, whether that’s biomarker analysis from spinal fluid or molecular imaging using PET scanning, to decide which of those various underlying histopathologies is the cause. Alternatively, we’ll use some kind of a merger, where the structural and functional imaging refines the syndrome to the point where the differential diagnosis gets shorter. Then we use molecular imaging to nail the final diagnosis.
OHBM - Some of your recent research centres on selective vulnerability. Can you tell us what this is, and why it might be relevant to many neurological conditions?
WS - All neurological diseases are selective in some way. In neurodegenerative disease, we can see that progression occurs in a selective manner that is governed by network connections. Where (in which cell type), how (in what manner), and most importantly why a disease begins where it does remains far more mysterious, but may be a key to developing early-stage treatment or prevention.
OHBM – What would it take to get to the point where we have screening for different vulnerabilities? Would that goal be a priority in the absence of effective neuroprotective recommendations – or are effective recommendations available?
WS – That’s already the reality in Alzheimer’s dementia. Alzheimer’s is a common disease. You can screen a healthy older population for amyloid-beta deposition using molecular imaging and then triage patients for experimental treatment trials based on that result. To imagine doing that for some of the less common dementias, such as frontotemporal dementia, is a little more daunting because of the lower population prevalence. FTD has a population prevalence of about 1 in 5000 in those aged over 45, so it would have to be either a very inexpensive test or a very powerful therapy to justify that kind of screening program.
OHBM – It’s been proposed that the salience network may switch activity between the default mode and the central executive networks – and impaired switching has been hypothesized to play a role in a number of psychiatric disorders. Do you see that playing a role in the frontotemporal dementias?
WS – I don’t think we really know the answer, as I don’t think there’s been a study yet that went straight after the switching concept. From a phenomenological standpoint the patients are pretty poor switchers. Sometimes they get stuck in ruts and perseverate on the same behavioral response over and over. Other times they fail to switch when switching would be helpful. Sometimes they switch too much, distractedly moving from task to task rather than finishing. I do think that behavioral switching is a deficit – whether that correlates with network switching is an open question, and I think it’ll be an important one to address at some stage.
OHBM - What do you see as the next major goals of neuroimaging in dementia research?
WS - Neuroimaging can play a critical role by providing short-term interval biomarkers of disease progress for use in early-stage drug development. To accomplish this goal may require that we develop better models to predict progression, and then use those models as a way of assessing whether a drug has had a meaningful impact.
OHBM: Thank you Prof Seeley!
Prof Seeley's keynote talk on 'Network-based neurodegeneration' will soon be available to view on the OHBM OnDemand portal. Keep checking for this and other great talks from OHBM 2016.
Thanks to Sarabeth Fox for video recording.
BY: JEANETTE MUMFORD, CYRIL PERNET, THOMAS YEO, LISA NICKERSON, NILS MUHLERT, NIKOLA STIKOV, RANDY GOLLUB, & OHBM COMMUNICATIONS COMMITTEE (IN CONSULTATION WITH THOMAS NICHOLS)
In recent weeks a lot of attention has been given to the paper “Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates”, by Eklund, Nichols and Knutsson, in the Proceedings of the National Academy of Sciences. This work highlights an important concern; however, some of the media attention has been based on a misunderstanding and an ‘inflated’ interpretation of the results. Specifically, too much weight has been given to the numbers “40,000 impacted studies” and “70% false positives”, an unfortunate side effect of reducing a study rich in information to a few soundbites. We respect the views of this paper and the effort put forth by the authors who, like the leadership of OHBM, understand there is a growing concern for validity and reproducibility in our field. The purpose of this post is to put these numbers in context and clarify how these findings impact our view of past and future fMRI results.
In task-based fMRI studies we are often interested in looking for systematic differences between experimental conditions or cognitive states across upwards of 100,000 voxels in the brain. It is widely known that this large number of statistical tests, typically one per voxel, requires correction for multiplicity. The most common approaches focus on control of the family-wise error (FWE), which is the probability that a given study will produce any false positives. The most common approaches for FWE control are voxel-wise and cluster-wise thresholding. Voxel-wise thresholding draws conclusions about specific voxels, whereas cluster-wise thresholding allows one to conclude whether a group (or cluster) of adjacent voxels shows an effect based on a feature, most often its size (e.g. only groups of voxels bigger than size N are significant). Eklund et al. consider both voxel-wise and cluster-wise FWE control in an exercise that tests whether the thresholding methods and their implementation by various software packages control the FWE as advertised. The innovation in this work is that they used resting-state fMRI data rather than computer-generated simulation data to estimate noise (see below for more on this); they analyzed this resting-state data as if it were actually task fMRI data.
Eklund et al. find voxel-wise results are always correct, i.e. control FWE below a requested 5% level, and are thus safe; we won't discuss these further. They also find that, depending on the exact methods and tools used, cluster-wise results can be invalid, i.e. have FWE in excess of the traditionally accepted 5% level. Understanding the specifics of when these methods are invalid is the focus of the article.
Figure 1. Cartoon example of how cluster-based thresholding works. The orange line represents the uncorrected, voxelwise p-values over a row of voxels (space). First, the cluster-defining threshold, CDT, is used to define the clusters, which are indicated by the boxes on the x-axis. Second, using the cluster size as the statistic, a threshold of cluster size k is used to assess the two clusters, concluding only the red cluster is large enough to be significant.
A cartoon example of the cluster-wise based strategy is illustrated in Figure 1. First, a primary threshold is required to define clusters (in Eklund et al. this is called a cluster-defining threshold, CDT). The CDT is typically based on the uncorrected voxelwise p-values. SPM and FSL use random field theory to obtain FWE-corrected p-values, which requires an estimate of the spatial smoothness of the image that is being thresholded, typically a map of t-statistics that quantifies the effect size at each voxel. AFNI uses a simulation-based procedure that also relies on a smoothness estimate. In contrast, another choice is to use a permutation approach, which is based on randomly permuting data labels to generate a null distribution for cluster size that is used to compute a p-value. The approaches in the 3 widely used fMRI data analysis packages, SPM, FSL and AFNI, are variations of parametric methods, and are based on specific assumptions about the data, while the permutation method is nonparametric and requires minimal assumptions.
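To make the two-step procedure concrete, here is a minimal Python sketch of cluster-extent thresholding on a single row of voxels, mirroring the cartoon in Figure 1. The function name and toy p-values are purely illustrative and are not taken from any of the packages discussed.

```python
import numpy as np

def cluster_threshold(p_values, cdt=0.01, k=3):
    """Toy 1-D version of cluster-wise thresholding.

    p_values : uncorrected voxelwise p-values along one row of voxels
    cdt      : cluster-defining threshold (the primary threshold)
    k        : minimum cluster extent (number of contiguous voxels)
    Returns a list of (start, end) index pairs for surviving clusters.
    """
    supra = p_values < cdt                        # voxels passing the CDT
    clusters, start = [], None
    for i, on in enumerate(list(supra) + [False]):  # sentinel closes last run
        if on and start is None:
            start = i                             # a new cluster begins
        elif not on and start is not None:
            if i - start >= k:                    # keep only clusters of extent >= k
                clusters.append((start, i - 1))
            start = None
    return clusters

# two clusters pass the CDT, but only the first is large enough to survive
p = np.array([0.5, 0.005, 0.004, 0.003, 0.2, 0.008, 0.6, 0.3])
print(cluster_threshold(p, cdt=0.01, k=3))  # → [(1, 3)]
```

In the real packages the extent threshold k is not fixed by hand but derived (parametrically or by permutation) so that the chance of any null cluster exceeding it is 5%.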
What is unique about this work?
This paper is an example of a simulation study, an evaluation of a method based on ‘made up’ data. Simulations are used because the FWE can only be quantified when the ground truth is known. Specifically, we must ensure there is no signal in the data. A simulation is most useful when the simulated data reflect what we would find in real data as closely as possible. This has been a limitation of previous studies, which generated synthetic data with software and used this synthetic data to test the performance of the analysis algorithms (Friston et al. (1994) and Hayasaka and Nichols (2003) are examples). This work uses a large pool of real human resting state fMRI data as a source of null data, or data that do not contain any task-related signal. Fitting a model of a task to these data should not find any activation. The advantage of using actual fMRI data is that the spatial and temporal structure of the noise is real, in contrast to previous simulation studies that used computer-generated null data. In the simulations in Eklund et al., random samples of subjects from the resting state data set are taken, and these samples are analyzed with a fake task design. The subject-specific task activation estimates are then entered into either a 1-sample test (to test the hypothesis that there is an effect of this task in this group) or a 2-sample test between two groups of subjects (to test the hypothesis that the effect of the task differs between the groups). Each result is assessed in the usual way, looking for FWE-corrected p-values that fall below p=0.05, and the occurrence of significant clusters (cluster-wise approach) is recorded. The authors repeat this a total of 1000 times and the FWE is computed as the number of simulated studies with any false positives divided by 1000. In theory, using p=0.05 should result in a FWE of 5%.
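The simulation logic above can be sketched in a few lines. This toy Python example estimates the FWE empirically by analyzing many pure-noise “studies”; note that it uses independent Gaussian noise and a voxelwise Bonferroni threshold purely for illustration, whereas Eklund et al. used real resting-state data and cluster-wise thresholds.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_subjects, n_voxels, alpha = 1000, 20, 100, 0.05

false_positive_studies = 0
for _ in range(n_studies):
    # null data: no true effect at any voxel (the analogue of analyzing
    # resting-state scans with a fake task design)
    betas = rng.standard_normal((n_subjects, n_voxels))
    _, p = stats.ttest_1samp(betas, popmean=0.0, axis=0)
    # a study counts as a family-wise false positive if ANY voxel survives
    if (p < alpha / n_voxels).any():
        false_positive_studies += 1

# fraction of simulated studies with at least one false positive
fwe = false_positive_studies / n_studies
print(f"empirical FWE: {fwe:.3f}")  # should land near the nominal 0.05
```

When the noise model matches the test’s assumptions, as here, the empirical FWE sits near 5%; the whole point of Eklund et al. is that with real fMRI noise and cluster-wise parametric inference, it often does not.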
Brief Summary: Four study designs, two blocked and two event-related, were studied across multiple degrees of spatial smoothing, different cluster-forming thresholds and different software packages: SPM, FLAME 1 from FSL, OLS from FSL, 3dttest from AFNI, 3dMEMA from AFNI, and a permutation-based approach implemented using the BROCCOLI software. The main result, highlighted in the first figure of the paper, shows that when using a parametric approach, a cluster-defining threshold of p=0.01 leads to poor control of FWE (FWE from approximately 4-50%). FWE control improves when a cluster-defining threshold of p=0.001 is used instead, regardless of the software package (FWE ranges from approximately 0-25%). The more conservative nonparametric approach controls FWE regardless of cluster-defining threshold in most cases, although elevated FWE rates were observed for the one-sample t-test in some cases due to skewed data. The second result, which is the source of the 70% FWE that has appeared in many other blog posts, arises when simply using a cluster size of 10 as an ad hoc inference procedure. In this case, a cluster-defining threshold of p=0.001 was used and clusters of 10 or more voxels were deemed significant. The high FWE of this approach indicates that it should not be thought of as controlling FWE. More details, including an explanation of why FLAME1 appears conservative in both of these results, are in the next section. The general conclusion is that when using cluster-based thresholding, a cluster-defining threshold of p=0.001 controls FWE better than p=0.01 for SPM, FSL and AFNI. The nonparametric approach controlled FWE better in the scenarios tested here.
AFNI problem identified. The results presented in this manuscript include the use of a pre-May 2015 version of AFNI, specifically the 3dClustSim function used to implement parametric FWE control. One of the discoveries made during this project was that the smoothness estimate used in this older version of 3dClustSim had a flaw that inflated the FWE. This was fixed by the AFNI developers in versions after May 2015. Although the new version reduces the FWE, it is still inflated above the target of 5%: with 3dClustSim, the FWE for the p=0.01 and p=0.001 cluster-defining thresholds dropped from 31.0% to 27.1% and from 11.5% to 8.6%, respectively.
Is FLAME1 superior? Some results appear to support the claim that the FLAME1 option in FSL has better FWE control, even in the ad hoc case, but this is due to a known problem whereby FLAME1 sometimes overestimates the variance. To clarify, FLAME1 differentially weights the contribution of each subject according to the subject-specific mixed-effects variance, which is the sum of the within- and between-subject variances. The result is that more variable subjects contribute less to the statistic estimate. In comparison, the OLS option in FSL treats all subjects equally (as do SPM, AFNI’s 3dttest and permutation tests). When the true between-subject variance is small, FLAME1 overestimates it, causing an increase in p-values, which reduces the FWE. When the true between-subject variance is not close to 0, FLAME1 estimates the variance more accurately, but the FWE can then be inflated, with results similar to FSL’s OLS. The resting-state data have a low true between-subject variance, leading to lower FWE than we might see with task data, where systematic differences in task performance might indeed yield large between-subject differences. This is supported by a secondary simulation using task fMRI data with randomly assigned groups, which found FLAME1 to have error rates comparable to FSL’s OLS. Overall, this implies that the FWE will be controlled if the true between-subject variance is small and will be elevated, similarly to OLS, if the variance is larger than 0.
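To illustrate the weighting idea, here is a toy sketch of inverse-variance weighting versus equal (OLS-style) weighting. This is not FLAME’s actual Bayesian estimation procedure, and the numbers are made up; it only shows the mechanism by which an inflated between-subject variance shrinks the group statistic and raises p-values.

```python
import numpy as np

# toy subject-level effect estimates; the last subject is an outlier
# with a large within-subject variance
betas = np.array([1.0, 1.2, 0.8, 1.1, 0.9, 3.0])
within_var = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 2.0])

def weighted_group_stat(betas, within_var, between_var):
    # FLAME1-style mixed-effects weighting: each subject's weight is the
    # inverse of its total (within + between) variance, so noisier subjects
    # contribute less to the group estimate
    w = 1.0 / (within_var + between_var)
    mean = np.sum(w * betas) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return mean / se        # z-like group statistic

# OLS-style: all subjects weighted equally
# (as in SPM, AFNI's 3dttest and permutation tests)
ols_t = betas.mean() / (betas.std(ddof=1) / np.sqrt(len(betas)))

# the statistic shrinks as the assumed between-subject variance grows,
# which is why overestimating it makes FLAME1 conservative
print(weighted_group_stat(betas, within_var, between_var=0.01))
print(weighted_group_stat(betas, within_var, between_var=1.00))
print(ols_t)
```

With a small assumed between-subject variance the weighted statistic is large; inflate that variance and the same data yield a much smaller statistic, hence larger p-values and a lower FWE.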
Why do parametric methods fail? The assumptions of random field theory include that the spatial smoothness is constant across the brain and that the spatial autocorrelation follows a squared exponential form. The spatial autocorrelation was not found to follow the squared exponential very well; instead, the accuracy of this assumption varied with distance. Simply put, for voxels close together there was strong agreement between the empirical and theoretical spatial correlation, but the two do not match for voxels that are far apart. This explains why results improve for more stringent cluster-forming thresholds: clusters are smaller, so the voxels involved are closer together and the assumptions are more closely met.
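The distance dependence can be seen by comparing two candidate autocorrelation functions. In this sketch (the length scale and distances are arbitrary choices for illustration), the squared-exponential fall-off assumed by random field theory is set against a heavier-tailed exponential fall-off: the two are of similar order at short distances, but at long range the squared exponential predicts vastly less correlation than a heavy-tailed process actually exhibits.

```python
import numpy as np

d = np.array([1.0, 2.0, 5.0, 10.0, 20.0])   # inter-voxel distance (voxels)
sigma = 3.0                                  # arbitrary length scale

# RFT's assumed autocorrelation: correlation decays as exp(-d^2 / (2 sigma^2))
sq_exp = np.exp(-d**2 / (2 * sigma**2))

# a heavier-tailed alternative (exponential decay), standing in for the
# long-range correlation observed in real fMRI noise
heavy = np.exp(-d / sigma)

for di, a, b in zip(d, sq_exp, heavy):
    print(f"d={di:4.0f}  squared-exp={a:.2e}  heavier-tailed={b:.2e}")
```

At d=20 the squared-exponential model puts the correlation many orders of magnitude below the heavy-tailed one, so any inference that leans on the model at large spatial scales (i.e. large clusters formed by a lenient CDT) inherits that mismatch.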
Should we all panic and give up on fMRI? Are all 40,000 fMRI studies of the past worthless? Of course not.
The blog post by Tom Nichols refines this estimate to a more reasonable number of studies from the past that may be impacted: closer to 3,500. (Note: PNAS has accepted an Erratum from the authors that revises the sentences that led to the sensationalized press articles.) The study shows that (a) FWE control does not work properly in the parametric tests using an ad hoc threshold of 10 voxels; (b) FWE is often controlled by permutation-based testing; (c) cluster inference for SPM, FSL, and AFNI using a cluster-defining threshold of 0.01 is likely problematic; (d) although improvements would be expected if a cluster forming threshold of 0.001 was used, FWE is still not controlled at the nominal level of 5% under all conditions.
How shall we proceed to analyze fMRI data?
Both parametric and nonparametric inference have pros and cons, and both work well when their assumptions are met. Prior work has highlighted the assumptions of the parametric cluster-based thresholding approach, including the need for a small p-value as the cluster-defining threshold (see Friston et al. (1994) and Hayasaka and Nichols (2003) for examples). Although it was clear the threshold needed to be low, without knowing the true spatial covariance structure it wasn’t clear how low for real fMRI data. Since the Eklund et al. work used real fMRI data in the simulations, we now know that p=0.01 is not low enough and p=0.001 is a better option. Generally, the permutation test has fewer assumptions and tends to have better FWE control, but Eklund et al. did find some cases with the 1-sample t-test where the nonparametric approach had elevated FWE, due to skew in the data. Permutation-based options can be implemented on any NIfTI file using SnPM in SPM, randomise in FSL, PALM (also affiliated with FSL), Eklund’s BROCCOLI package and mri_glmfit-sim in FreeSurfer.
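For readers curious what a permutation test involves under the hood, here is a minimal one-sample sign-flipping sketch using the max-statistic trick that underlies tools like randomise and SnPM. The function is illustrative only; real implementations add refinements (variance smoothing, a +1 correction that prevents zero p-values, cluster-level statistics) that are omitted here.

```python
import numpy as np

def sign_flip_max_stat(betas, n_perm=5000, seed=0):
    """Sketch of a one-sample permutation (sign-flipping) test.

    betas : (subjects, voxels) array of subject-level effect estimates.
    Returns FWE-corrected p-values from the null distribution of the
    maximum t-statistic across voxels.
    """
    rng = np.random.default_rng(seed)
    n = betas.shape[0]

    def tmap(x):
        # voxelwise one-sample t-statistic
        return x.mean(0) / (x.std(0, ddof=1) / np.sqrt(n))

    t_obs = tmap(betas)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        # under the null (symmetric, zero-mean errors) each subject's sign
        # is exchangeable, so flip signs at random and record the maximum
        flips = rng.choice([-1.0, 1.0], size=(n, 1))
        max_null[i] = tmap(betas * flips).max()

    # corrected p-value: fraction of permutations whose maximum statistic
    # meets or exceeds the observed statistic at each voxel
    return (max_null[:, None] >= t_obs[None, :]).mean(0)
```

Cluster-extent inference works the same way, except the null distribution is built from the maximum cluster size after applying the cluster-defining threshold to each permuted map. Because the null is built from the data themselves, no parametric model of the spatial autocorrelation is needed.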
Importantly, AFNI users should update their versions to ensure use of either the repaired 3dClustSim (after May 2015), or the new 3dFWHMx function which uses a more accurate spatial smoothness estimate and will improve FWER control. Also, using the ad hoc cluster size of 10 voxels has the largest FWE and is not recommended as a method for controlling FWE.
The work of Eklund et al. supplies important information to those who choose to control the multiple comparison problem according to the FWE. In future work, researchers intending to use FWE correction can make better choices to ensure the true level of FWE is closer to the goal FWE. Although some previously published studies may have not used as stringent FWE control when they had intended to, the results can still be interpreted, but with more caution. Multiple comparison correction is just one element of neuroimaging practice, and there are countless choices in the design, acquisition, analysis and interpretation of any study. We encourage everyone to consult the OHBM Committee on Best Practice in Data Analysis and Sharing (COBIDAS) report on MRI, and review the detailed checklists for every stage of a study. The report is available directly on the OHBM website http://www.humanbrainmapping.org/COBIDASreport and on bioRxiv.
BY CYRIL PERNET
During the annual OHBM meeting in Geneva I had fun making word clouds from the Twitter feed of the hashtag #OHBM2016. Attendees could see the word clouds in between every presentation, and I think they made the welcome screen look pretty cool (you can find them on the @OHBM_members channel and on the OHBM Facebook page). If you thought some information was missing, that is simply because it was either not discussed that frequently on Twitter or http://www.wordclouds.com/ did not show it (not all words appear, depending on design and size). There was no censoring, and you can blame me if something was not to your liking.
The exciting stuff
The most discussed lectures were those of Tim Behrens and Fernando Lopes da Silva, closely followed by the talk from Gael Varoquaux. The main topics that engaged attendees were connectivity analyses, machine learning, power analyses, BIDS and yes, Brexit.
Thomas Yeo aptly summarized the current state of connectivity analyses on the OHBM blog so no need to talk more about it.
Machine learning is increasingly used in neuroimaging these days, and Gael Varoquaux received many comments about his talk, 'Cross-validation to assess decoder performance: the good, the bad, and the ugly'. In this talk he showed that leave-one-out strategies are biased and that N-fold cross-validation, provided the data structure is respected, works much better.
I am really glad that use of power analyses is now at the forefront of neuroimagers’ discussions. During Sunday’s reproducibility workshop I discussed and presented two of the main tools used to carry out power analyses on full maps: fMRIpower and neuropower. These tools were then presented by Jeanette Mumford (creator of fMRIpower) and Joke Durnez (creator of neuropower) in the Open Science SIG room.
Another favorite topic of mine: data sharing. BIDS, or to give it its full name - Brain Imaging Data Structure - is driven by Chris Gorgolewski, and describes how to structure and store your data in an easily shareable way. It provides advice on how to name files and how to create simple metadata text files (tsv and json). Using BIDS doesn’t require programming knowledge, and does substantially improve data sharing, by allowing machines to read data easily.
A paper from Anders Eklund et al. about failure to control the family-wise error rate (FWER) using cluster size was recently published in PNAS and elicited many comments, not just online but also from the floor. The paper suggests that cluster-extent correction may substantially inflate false-positive rates; it addresses the extremely important issue of controlling the FWER and is a must-read, along with the comment from Glass Brain awardee Karl Friston and Guillaume Flandin.
From our community, gender imbalance and diversity were frequently discussed and added to the #GenderAvenger hashtag. It was often commented that committee members and awardees were predominantly white males from wealthy countries. The Organization is well aware of this and has actively sought to reflect the geographic diversity of our membership, as well as to balance the number of male and female session and keynote speakers. Council takes seriously this feedback that more needs to be done, and is actively working to address these issues and to push for all aspects of OHBM to become as diverse as the members it represents.
Top 5 twitter users
During the conference, the OHBM and OHBM_SciNews accounts retweeted posts from, or mentioning, other Twitter users. Thanks to the top 5: @kirstie_j, @NKriegeskorte, @pierre_vanmedge, @ChrisFiloG, @ten_photos.
Note: A version of this post previously appeared on Cyril Pernet's personal blog: