Boosting geoscience data sharing in China
Data flow and information sharing are important for developing a borderless scientific community. Open sharing of scientific data in line with the findability, accessibility, interoperability and reusability (FAIR) principles is essential for making better use of data resources.
On August 5, a group of researchers published a Comment in Nature Geoscience. The authors include Prof. LI Xin, director of the National Tibetan Plateau/Third Pole Environment Data Center at the Institute of Tibetan Plateau Research, Chinese Academy of Sciences; Prof. CHENG Guodong, academician at the Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences; and researchers from many domestic geoscience data centres and Shanghai Normal University. They argue that supporting policies for open data sharing and incentive mechanisms for data contributors are the key to realizing wider geoscience data sharing in China.
Build confidence in data-sharing practices
Prof. LI Xin said that data flow and information sharing are important for developing a borderless scientific community. However, policy, cultural and technical barriers have obstructed the free flow of scientific data.
Data centres are an important vehicle for promoting the open sharing of scientific data. Internationally, developed countries attach great importance to building scientific data centres. As early as the 1990s, the National Aeronautics and Space Administration of the United States launched twelve distributed active archive centres to store data and information for climate research. The World Data System currently has 86 data centres, 57% of which are in the Earth sciences.
As China plays an increasingly important role in tackling these challenges, the country has in recent years adopted a more proactive policy on data sharing and transparency. For example, in 2018, the Chinese Ministry of Science and Technology released a policy that sets data sharing as a principle for government-funded research. This policy has now brought online the first generation of national data centres, ten of which focus on the Earth and environmental sciences.
But to what extent will these actions really boost the practice of data sharing in China? A recent survey of more than 2,000 Chinese researchers reveals both opportunities and challenges. The survey showed that while researchers in China are willing to share research data, they are concerned about misuse of data and violation of copyright and licensing. Instead of wider public sharing, private sharing of data with immediate colleagues and collaborators is more common in China. This suggests that a lot of work is still needed to increase the visibility of new data centres and to build confidence in data-sharing practices more broadly among Chinese researchers.
Crediting data contributors
To promote the wide sharing of scientific data in China, the National Natural Science Foundation of China added new requirements to two major research programmes in the geosciences, making data sharing mandatory. All data obtained from the programmes had to be deposited in the foundation’s geoscience data centre for public access and reuse. Data submission and data quality were assessed during the annual, interim and final evaluations. Most importantly, data contributors were given credit: their contribution was clearly acknowledged through data citations via data DOIs and the associated paper publications. To date, more than 2,500 scientific papers have cited these datasets. "These two major research programmes emphasize mandatory data sharing, recognition of data contributors and respect for intellectual property rights," Prof. LI Xin said. Recently, the Chinese Academy of Sciences (CAS) launched another project, called CASEarth, which aims to build a cyberinfrastructure for data on the Earth, environmental, ecological and biological sciences. Drawing on data collected from CAS institutions, the CASEarth repository now stores more than 5 PB of data, which have been downloaded more than 500,000 times.
Strengthen the incentive mechanism
The success of these pioneering projects suggests that policies that support public data sharing from the top down, and bottom-up incentives that credit data contributors, are key to enabling wider data-sharing practices in China. More specific actions are needed in policy, management and technological aspects to roll out data-sharing mandates more broadly in the big data era in China.
Specifically, in terms of policy, it is important to have a clear definition of sensitive data and specific rules on the limitations and restrictions that apply to geoscience data sharing. In terms of management, the evaluation mechanism should be changed so that the success of a researcher or grant is credited not only on the basis of publications but also on data availability and the quality of the shared data. Data centres should also incentivize data contributors by promoting data publication and citation, and should track data use by quantifying the impact of each dataset with data-reuse metrics. In terms of technology, the role of data centres needs to change from data warehouses to smart information providers, such as platforms for analysing geospatial information. In terms of internationalization, data centres should be encouraged to publish metadata and data in both Chinese and English and to participate actively in international certification, so as to enhance the international influence of China's data centres. As hubs in the data-sharing system, data centres are key to implementing many of these proposed actions.
The data explosion in the big data era has posed both challenges and opportunities to the global geoscience community.
Prof. LI Xin said that strengthening and standardizing scientific data management is of great significance for making fuller use of national financial investment and its outputs, improving the capacity for scientific and technological innovation, and promoting economic and social development. "While progress has been made in public data sharing in China, vigorous efforts from government, researchers and data centres are still needed to achieve a paradigm shift. The more we honor the data and the data creators, the more we benefit science and society."
Data centres are the mediators that link policy makers, data contributors, data, and data users in the ecosystem of data sharing. In this system, the purpose of data centres is to provide good management that can turn the loop into a reinforcing feedback that eventually benefits science and society. (Photo by Institute of Tibetan Plateau Research, Chinese Academy of Sciences)
Relevant paper information: https://doi.org/10.1038/s41561-021-00808-y