Recommendations for the Handling of Research Data and Research Software
I. Working with Research Data and Research Software
The requirements of good research practice, namely trustworthy, methodologically sound, thorough, reliable, well-documented and verifiable work, also apply to the handling of research data and research software. The essential cross-disciplinary elements of reliable and well-documented work are listed below.
For the planning and structured documentation of a research project in which data and/or software are created and/or reused, it is recommended that data management plans (DMPs) and software management plans (SMPs) be used as project and quality management tools and be updated regularly. DMPs and SMPs support the sustainable handling of research data and software.
For projects that place special demands on centrally operated infrastructures, the university’s central institutions may require the submission of a DMP or SMP.
The University of Potsdam also recommends a detailed description of the planned research process and research methods (see Open Methods, Open Science Guidelines[i]). The planned research process, including research questions, research methods, data collection, data processing and analysis can, for example, be published as part of a study pre-registration. This contributes to improved transparency, verifiability and reproducibility.
It is recommended that project heads include the resources needed for data and software management (e.g. personnel costs, required hardware and software, usage and licensing fees) in their funding applications.
- Data Selection. It is recommended that researchers determine at an early stage which data will be published or archived, and which data will continue to be stored locally. Defining retention periods is useful. Data that are no longer required and are worth neither publishing nor archiving should be deleted routinely. In particular, research data that can be linked to a specific or identifiable person are subject to the principle of storage limitation: they may be stored only for as long as is strictly necessary for the respective purpose. The data should be anonymized as soon as the research process permits.[ii] If anonymization is not possible, pseudonymization is required.
- Software Selection. It is recommended that researchers determine at an early stage which research software and/or code will be reused or newly developed, and which will be published or archived. Selection for publication and/or archiving can be based on various criteria; candidates include, e.g., scripts written to perform analyses with research software, scripts for executing workflows, software implementing a new algorithm, and software with ongoing research value.
- Data Ownership Rights. Research data are usually not covered by copyright or related rights, but in certain cases, rights may arise which belong to different individuals. Because of this, ownership of, and usage rights to, research data are often unclear, which can restrict their reuse. It is therefore recommended that projects with multiple participating researchers document legal ownership, the intention to publish the data and the mutual granting of the relevant usage rights at an early stage.
- Software Ownership Rights. Computer programs and software are protected by copyright. Rights to use software can be granted to others by means of software licenses; copyright protection remains in place when a license is granted. Permissive open source licenses include the MIT License (https://mit-license.org/) and the Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0.html).
- Compliance with Ethical Standards. Researchers must observe ethical standards for research projects. In keeping with good research practice, the structured processing and publication of research results should adhere to discipline-specific research ethics guidelines and protect privacy and related individual rights. Where relevant, the CARE principles (Collective Benefit, Authority to Control, Responsibility and Ethics)[iii] should also be observed. In specific cases, the involvement of the university’s Ethics Committee (Ethikkommission) may be appropriate.
- Secure Storage. Suitable storage services or storage media and appropriate backups should be used to guard against data loss. This also applies to research software and analysis scripts developed in the project. The use of the university’s storage services or of storage services operated by the scholarly community (see Open Infrastructure, Open Science Guidelines) is recommended; the use of local storage media and of commercial storage services in a private consumer capacity is discouraged. Version control is valuable in many cases. The necessary level of data security must be ensured through suitable technical and organizational measures, e.g. access restrictions or the pseudonymization of personal data that cannot be anonymized. Researchers must observe the university’s guidelines on information security (Leitlinie zur Informationssicherheit).[iv]
- Documentation and Use of Standards. For the sharing and reusability of research data and research software and the reproducibility of research results, it is necessary to document the context in which they were created and the tools used (cf. Open Methods, Open Science Guidelines).
- Research Data. In the interests of interoperability and long-term usability, it is recommended that free, standardized data formats be used. In the interests of verifiability, it is recommended that researchers define and document conventions for file naming and folder structures, use metadata standards (discipline-specific where possible), and capture relevant metadata from the beginning of the research process.
- Research Software. In the interests of interoperability, research software should be able to exchange data or metadata, e.g. via an application programming interface (API). Documentation and the use of software-specific metadata and versioning are recommended to ensure verifiability.
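The pseudonymization required under Data Selection and Secure Storage can be implemented in several ways. The following is a minimal Python sketch of one common technique, a keyed hash (HMAC-SHA256), offered as an illustration rather than a procedure endorsed by the university. It assumes a hypothetical record layout in which `participant_name` and `email` are the direct identifiers; the secret key plays the role of the separately stored pseudonymization key.

```python
import hashlib
import hmac
import secrets

# Hypothetical field names for the direct identifiers in a study record.
DIRECT_IDENTIFIERS = ("participant_name", "email")


def pseudonymize(record: dict, key: bytes) -> dict:
    """Return a copy of `record` with direct identifiers replaced by a pseudonym.

    The pseudonym is an HMAC-SHA256 of the identifiers under a secret key,
    so it cannot be recomputed from a known identity without the key.
    """
    identity = "|".join(str(record[field]) for field in DIRECT_IDENTIFIERS)
    pseudonym = hmac.new(key, identity.encode("utf-8"), hashlib.sha256).hexdigest()
    # Keep the non-identifying fields and drop the direct identifiers entirely.
    result = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    result["pseudonym"] = pseudonym
    return result


if __name__ == "__main__":
    # Assumption: the key is stored separately from the data, with restricted
    # access, and destroyed once the research process permits anonymization.
    key = secrets.token_bytes(32)
    record = {"participant_name": "Jane Doe", "email": "jane@example.org", "age_group": "30-39"}
    print(pseudonymize(record, key))
```

Because the same identifiers always map to the same pseudonym under a given key, records remain linkable across files. Destroying the key removes this route to re-identification, but remaining attributes such as age group or location may still identify individuals, which is why footnote [ii] notes that additional measures may be necessary.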
II. Publication and Citation of Research Data and Software
The rules of good research practice for publishing apply to both data and software publications. Research data and software that form the basis for published findings should be made available in a timely manner and be linked to the text publication.
Research data and software with a high potential for reuse should be published independently of any text publication. To improve verifiability and to acknowledge the work of those who produced them, reused research data and research software should be cited.
Whenever possible, text, data and software publications should be published open access in order to ensure unrestricted and free access to scholarly publications (see Open Access, Open Science Guidelines).
It is not permissible to limit data publication to those data that support the authors’ hypotheses, to split data and software publications with the aim of increasing the number of publications, or to publish the same data or software multiple times without disclosing the earlier publication. When publishing and citing research data and software, the following points should be kept in mind:
- Publication Venue. Established discipline-specific repositories and data centers for research data or software should be given preference when publishing. Databases may also be used for research data. The infrastructures used should store data or research software in a way that allows them to be referenced independently of any text publication.
- Preparation and Availability.
- Preparation and Availability of Research Data. Research data should be made available at a processing stage (raw or processed data) that enables research results to be reproduced and third parties to reuse the data. When preparing data for publication and selecting a publication venue, it is recommended to consistently observe the FAIR data principles, preparing and storing research data in a findable, accessible, interoperable and reusable manner.[v] At their core, the four principles stipulate the following requirements:
- Findability: Data are sufficiently described with relevant metadata and referenced by a unique persistent identifier (e.g., a DOI).
- Accessibility: Data are human- and machine-readable and are stored in a trusted repository.
- Interoperability: Metadata use a formalized, freely available, widely used, and contextually appropriate vocabulary for knowledge representation.
- Reusability: Data are clearly licensed, contain correct provenance information, and are documented in a transparent manner.[vi]
- Preparation and Availability of Research Software. Self-programmed research software should be made publicly available by publishing the source code. The source code should be findable via a persistent identifier, citable, and accompanied by detailed documentation.[vii]
When preparing research software for publication and selecting the publication venue, it is recommended to consistently observe the FAIR principles for research software, and to prepare and store software in a findable, accessible, interoperable and reusable way.[viii] At their core, the four principles stipulate the following requirements:
- Findability: Research software is findable by humans and machines. It is sufficiently described with relevant metadata, versioned and referenced by a unique persistent identifier (e.g. a DOI).
- Accessibility: Research software is accessible to humans and machines via open, free communication protocols. Even when the research software is no longer accessible, metadata remain available.
- Interoperability: Research software can exchange data or metadata with other software, e.g. via an application programming interface (API).
- Reusability: Research software is both executable and reusable (well documented, modifiable, integrable). It is clearly licensed and contains correct information about its provenance.
- Authorship: Anyone who makes a genuine, verifiable contribution to the content of a research data or software publication qualifies as an author. This applies in particular to substantive scholarly contributions to the development, collection, acquisition or provision of data, software, or sources.[ix]
- Free Licensing and Open Access. Research data and software should be made available under established and standardized licenses that are as open as possible. Conditions for access and, where applicable, embargo periods should adhere to the principle of being “as open as possible and as closed as necessary”. The following points provide specific guidance on licensing:
- Licensing and Attribution: The obligation to credit scholarly contributions that are reused is part of good research practice. Licenses and releases for data and software that do not contractually require the attribution of the authors do not relieve scholars of this obligation. The preferred way to encourage data and software citation is not through restrictive licensing but by providing a recommended citation.
- Public Research Data: Creative Commons licenses have become the established licenses for publicly available research data. Licenses can only be granted by rights holders. The University of Potsdam recommends Creative Commons licenses for research data.
Examples of recommended Creative Commons licenses are:
- CC0 (“Creative Commons Zero 1.0 Universal”): The rights holders waive their rights to the data to the extent permitted by law, dedicating them to the public domain. Data may be copied, modified and redistributed without attribution to the original authors, even for commercial purposes.
- CC BY 4.0 (“Attribution 4.0 International”): The original authors must be named, and the data may be copied, modified and redistributed.
Creative Commons licenses that impose requirements beyond simple attribution are generally not recommended, as their stricter licensing conditions limit reuse. These include CC BY-NC (only for “noncommercial use”), CC BY-SA (“share alike”: redistribution only under the same conditions) and CC BY-ND (“no derivatives”: no modifications permitted). Further information on Creative Commons licenses is available at https://creativecommons.org/.
- Research Data with Restricted Access: Research data to which only limited access can be granted should be licensed in consultation with the responsible data center, using the center’s typical licenses.
- Research Software: The University of Potsdam recommends licensing software publications under the most permissive license possible. Established free software licenses include the permissive MIT and BSD licenses and the copyleft GNU General Public License (GPL).
The website choosealicense.com helps users choose a suitable software license. The licenses of reused third-party software may restrict which licenses can be granted.
- Publication Reporting. Those belonging to and affiliated with the University of Potsdam should report their quality-assured, citable data and software publications that are recognized by their respective discipline-specific communities to the University Library for inclusion in the university’s bibliography.
- Citation. Where no discipline-specific standards for the citation of data or software exist and no publisher guidelines are available, the University of Potsdam recommends the following guidance:
- Citation of Research Data. The Data Citation Principles from the FORCE11 Data Citation Synthesis Group can be used as a guide for citing research data.[x]
- Citation of Research Software. The Software Citation Principles from the FORCE11 Software Citation Working Group can be used as a guide for citing software.[xi]
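Both sets of FORCE11 principles stress crediting creators, giving a persistent identifier and, for software, naming the exact version used. The following minimal Python sketch assembles a citation from such elements. The field set and the output pattern "Creators (Year). Title (Version). Publisher. Identifier" are assumptions chosen for illustration, not a format prescribed by FORCE11 or by the university; the `CitableResource` class and `format_citation` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CitableResource:
    """Minimal metadata for citing a dataset or software release (hypothetical schema)."""
    creators: List[str]       # e.g. "Surname, Given name"
    year: int                 # publication year
    title: str
    publisher: str            # repository or data center
    identifier: str           # persistent identifier, e.g. a bare DOI or a DOI URL
    version: Optional[str] = None


def format_citation(res: CitableResource) -> str:
    """Build 'Creators (Year). Title (Version X). Publisher. https://doi.org/...'."""
    authors = "; ".join(res.creators)
    title = res.title if res.version is None else f"{res.title} (Version {res.version})"
    # Turn a bare DOI into a resolvable URL so the citation is actionable.
    if res.identifier.startswith("http"):
        link = res.identifier
    else:
        link = f"https://doi.org/{res.identifier}"
    return f"{authors} ({res.year}). {title}. {res.publisher}. {link}"


if __name__ == "__main__":
    # All values below, including the DOI, are hypothetical placeholders.
    release = CitableResource(
        creators=["Doe, Jane", "Roe, Richard"],
        year=2024,
        title="Survey analysis scripts",
        publisher="Example Repository",
        identifier="10.1234/example",
        version="1.2.0",
    )
    print(format_citation(release))
```

Keeping the version as a separate field makes it harder to omit, which matters for software, where results can depend on the exact release.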
III. Contracts and Collaboration
- When negotiating cooperation, license and funding agreements, especially those involving private funding, those belonging to and affiliated with the University of Potsdam should ensure that these agreements adhere as far as possible to the principles of the University of Potsdam’s Research Data and Research Software Policy, particularly with regard to the openness and reusability of research data and software. When transferring rights for reuse, publication or commercial use, care should be taken to ensure that data or research software remain freely available for research purposes; in particular, commercial actors should not be granted exclusive rights.
- Cross-institutional research collaborations should align their practices with the University of Potsdam’s Research Data and Research Software Policy, unless the other parties mandate equivalent or stricter requirements. As part of their governing structures, project consortia should establish clear and binding rules at an early stage for joint research data and software management as well as for the openness and reusability of their research outputs.
IV. Institutional Responsibility
- All faculties are advised to consider whether theses and dissertations should include data availability statements and statements about the availability of research software and, if necessary, to issue appropriately binding regulations.
- Study commissions are advised to review curricula to ensure that the practical handling of research data and software is adequately covered as a cross-disciplinary subject in undergraduate programs and, where necessary, to ensure that it is given greater emphasis.
- Where necessary, departments and institutes are advised to appoint responsible persons to enable the institutional archiving of research data and software at the University of Potsdam.
[i] Open Science Guidelines of the University of Potsdam (2023). https://doi.org/10.25932/publishup-59490
[ii] To this end, all directly identifying characteristics must be removed, or the key created during pseudonymization must be destroyed. Additional measures may also be necessary. Data are only considered anonymous once “individual details about personal or factual circumstances can no longer be linked to a specific or identifiable person or can only be linked with a disproportionate amount of time, cost and effort” (§ 3 BbgDSG).
[iii] Carroll, S. R. et al. (2020). The CARE Principles for Indigenous Data Governance. Data Science Journal, 19(1), 43. https://doi.org/10.5334/dsj-2020-043
[iv] University of Potsdam (2023). Leitlinie zur Informationssicherheit der Universität Potsdam. https://www.uni-potsdam.de/fileadmin/projects/ambek/Amtliche_Bekanntmachungen/2023/Ausgabe_12/ambek-2023-12-587-588.pdf [last accessed 10.09.2025; only available in German]
[v] Wilkinson, M. D. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18
[vi] Ligue des Bibliothèques Européennes de Recherche (2017). Implementing FAIR Data Principles: The Role of Libraries. https://libereurope.eu/wp-content/uploads/2017/12/LIBER-FAIR-Data.pdf [last accessed 29.08.2025]
[vii] Version-control software and repositories commonly used in software development do not usually meet these requirements, as they neither guarantee availability nor offer persistent identifiers (such as DOIs). Cited releases should therefore also be deposited in a suitable research repository. GitHub offers an easy-to-use interface for this purpose: guides.github.com/activities/citable-code [last accessed 10.09.2025]
[viii] Chue Hong, N. P. et al. (2022). FAIR Principles for Research Software (FAIR4RS Principles) (1.0). https://doi.org/10.15497/RDA00068
[ix] Criteria for whether something is a genuine, verifiable and substantive scholarly contribution vary by discipline. Within the framework of a discipline-specific publication culture, the authors of a text publication may differ from the authors of the data and software publications on which the text publication is based.
[x] Data Citation Synthesis Group (2014). “Joint Declaration of Data Citation Principles – FINAL”. Martone, M. (ed.). San Diego, CA: FORCE11. https://doi.org/10.25490/a97f-egyk
[xi] Smith, A. M. et al. (2016). Software citation principles. PeerJ Computer Science, 2, e86. https://doi.org/10.7717/peerj-cs.86