From BRAIN 2025: A Scientific Vision
Concurrent with the emergence of integrated optical approaches, it is essential to develop computational approaches for analyzing and managing the enormous datasets that the optical techniques will yield (see also section 5). Calcium imaging studies in mice produce ~1 Gbit/s of data; anatomical datasets will readily grow to the ~10 petabyte scale and beyond. Sustained efforts will be necessary to develop sophisticated analytic tools for these experiments.
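To give a sense of scale, the following back-of-the-envelope sketch converts the ~1 Gbit/s figure quoted above into storage volumes. It assumes decimal SI units and uninterrupted recording, neither of which is stated in the report, and the function names are illustrative only. The 10-petabyte figure in the text refers to anatomical datasets; it is used here purely as a yardstick.

```python
# Rough storage estimate for a sustained ~1 Gbit/s data stream.
# Assumptions (not from the report): decimal SI units, continuous recording.

BITS_PER_BYTE = 8
RATE_BITS_PER_S = 1e9      # ~1 Gbit/s, the figure quoted in the text
PETABYTE = 1e15            # bytes (decimal SI)


def bytes_recorded(seconds, rate_bits_per_s=RATE_BITS_PER_S):
    """Bytes produced by recording continuously for `seconds`."""
    return rate_bits_per_s / BITS_PER_BYTE * seconds


def seconds_to_reach(target_bytes, rate_bits_per_s=RATE_BITS_PER_S):
    """Seconds of continuous recording needed to accumulate `target_bytes`."""
    return target_bytes * BITS_PER_BYTE / rate_bits_per_s


per_hour_gb = bytes_recorded(3600) / 1e9                  # 450 GB per hour
per_day_tb = bytes_recorded(86400) / 1e12                 # 10.8 TB per day
days_to_10_pb = seconds_to_reach(10 * PETABYTE) / 86400   # about 926 days

print(f"{per_hour_gb:.0f} GB/hour, {per_day_tb:.1f} TB/day, "
      f"{days_to_10_pb:.0f} days to reach 10 PB")
```

Under these assumptions a single rig recording around the clock would take roughly two and a half years to reach 10 petabytes, so volumes of that size in practice reflect many instruments or much denser anatomical imaging.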
Policies and methods for data sharing will also need to be developed to fully exploit the value of these datasets. Well-curated, public data platforms with common data standards, seamless user accessibility, and central maintenance would make it possible to preserve, compare, and reanalyze valuable datasets that have been collected at great expense. This would be of great benefit to neuroscience, just as the availability of public genomic and protein structure databases has transformed genetics and biochemistry. Analysis tools and user interfaces should be developed that can be run remotely. Creating and maintaining such data platforms would entail a major effort by the community to decide what data and metadata to include, what controls to place on the use of data, and how to support users. Valuable lessons and best practices can be learned from existing public datasets, which include the Allen Brain Atlas, the Mouse Connectome Project, the Open Connectome Project, the CRCNS data sharing project, ModelDB, and the Human Connectome Project, as well as datasets generated by the physics, astronomy, climate science, and technology communities. A first unifying attempt, the Neuroscience Information Framework (NIF) sponsored by NIH, provides a portal to track and coordinate multiple sites, but the myriad genetic, anatomical, physiological, behavioral, and computational datasets are difficult to manage because of their heterogeneous nature. The NIH Big Data to Knowledge (BD2K) Initiative offers neuroscientists opportunities to develop new standards and approaches.
Methods and software, as well as data, should be shared. Some neural simulators such as Genesis, NEURON, and MCell are well-established, open source, and well documented, but the software for many models and simulations in published papers is undocumented or unavailable. The description of a model in a published paper is often insufficient to reproduce the simulations; it is essential that software be made available so that all models in published papers are reproducible. As datasets become larger and as new types of data become available, there is increasing need for public, validated methods for analyzing and presenting these data. As an example, microelectrode recordings often pick up spikes from several neurons that need to be separated into single units, a procedure known as “spike sorting”. Many individual laboratories have created their own custom spike-sorting programs, but there are no widely accepted standards for rigorous spike sorting, and it can therefore be difficult to compare data precisely across laboratories. The community would benefit from common standards for spike sorting and for other common data analysis procedures.
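The comparability problem can be illustrated with a minimal sketch of only the first stage of spike sorting, threshold-based spike detection. It is not any laboratory's actual pipeline. The function name `detect_spikes`, the multiplier `k`, and the synthetic trace are hypothetical. The noise estimate (median absolute value divided by 0.6745) is a common robust convention that assumes roughly zero-centered Gaussian noise.

```python
from statistics import median


def detect_spikes(trace, k):
    """Return sample indices where the trace first drops below -k * noise_sd.

    noise_sd is a robust estimate: median(|x|) / 0.6745, which assumes
    roughly zero-centered Gaussian background noise.
    """
    noise_sd = median(abs(x) for x in trace) / 0.6745
    threshold = -k * noise_sd
    # A spike is counted at the sample where the trace crosses the threshold
    # going downward (previous sample at or above it, current sample below).
    return [i for i in range(1, len(trace))
            if trace[i] < threshold <= trace[i - 1]]


# Synthetic recording: repeating low-amplitude background plus three
# negative deflections, two large and one small (values are made up).
trace = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5] * 20
trace[30] = -20.0   # large spike
trace[60] = -7.0    # small spike, e.g. from a more distant neuron
trace[90] = -20.0   # large spike

lab_a = detect_spikes(trace, k=4.0)   # hypothetical lab A's threshold
lab_b = detect_spikes(trace, k=5.0)   # hypothetical lab B's threshold

print(lab_a)  # [30, 60, 90]
print(lab_b)  # [30, 90]
```

The two threshold choices disagree about whether the small deflection is a spike at all, before any clustering into single units has taken place. Differences of this kind, compounded across later stages, are one reason results from custom programs can be hard to compare.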
New data platforms would also encourage changes in the culture of neuroscience to promote increased sharing of primary data and tools. We heard from many researchers about the value of sharing data, and their desire for stable, easily interconvertible data formats that could accelerate the field. Data and data analysis tools that emerge in the BRAIN Initiative should be freely shared to the extent possible, no later than the date of their first publication and in some cases prior to that date. Some areas of neuroscience, such as human brain imaging (the Human Connectome Project; the International Neuroimaging Data-Sharing Initiative), are already sharing data on a large scale despite the enormous datasets involved. That said, extending this model to all fields is a difficult problem and cannot be solved in one step. Based on the history of data sharing in many fields of biology, the solution will come from the engagement of sophisticated, motivated members of the scientific community from the bottom up, not from a directive from above.