How to best handle data storage and archiving after the project is finished?
What is Data Collection?
Data collection is the process of gathering, measuring, and analyzing accurate information for research using standard, validated techniques. To collect data, we must first identify what information we need and how we will collect it. The collected data can then be used to evaluate a hypothesis. In most cases, data collection is the first and most important step in research. The approach to data collection differs between fields of study, depending on the information required.
Research Data Management (RDM) is present in all phases of research and encompasses the collection, documentation, storage and preservation of data used or generated during a research project. Data management helps researchers organize, locate, preserve and reuse data. Additionally, data management allows you to:
- Save time and make efficient use of available resources: you will be able to find, understand and use data whenever you need it.
- Facilitate the reuse of the data you have generated or collected: correct management and documentation of data throughout its life cycle keeps it accurate, complete, authentic and reliable, so that other people can understand and use it.
- Comply with the requirements of funding agencies: more and more agencies require data management plans and/or the deposit of data in repositories as conditions for research funding.
- Protect and preserve data: by managing and depositing data in appropriate repositories, you can safeguard it over time, protecting your investment of time and resources and allowing it to serve new research and discoveries in the future.
Research data can take many forms:
- Numerical files, spreadsheets, tables, etc.
- Text documents in different versions
- Images, graphics, audio files, video, etc.
- Software code or records, databases, etc.
- Geospatial data, georeferenced information
Joint Statement on Research Data from STM, DataCite and Crossref
- When publishing their results, researchers deposit the related research data and results in a trusted data repository that assigns persistent identifiers (DOIs when available). Researchers link to research data using persistent identifiers.
- When using research data created by others, researchers provide attribution by citing the data sets in the references section using persistent identifiers.
- Data repositories facilitate the sharing of research results in a FAIR manner, including support for metadata quality and completeness.
- Publishers establish appropriate data policies for journals, outlining how data will be shared along with the published article.
- Publishers establish instructions for authors to include Data Citations with persistent identifiers in the references section of articles.
- Publishers include Data Citations and links to data in Data Availability Statements with persistent identifiers (DOIs when available) in the article metadata registered with Crossref.
- In addition to Data Citations, Data Availability Statements (human and machine readable) are included in published articles where applicable.
- Repositories and publishers connect articles and data sets through persistent identifier connections in metadata and reference lists.
- Funders and research organizations provide researchers with guidance on open science practices, track compliance with open science policies where possible, and promote and incentivize researchers to openly share, cite, and link research data.
- Funders, policy-making institutions, publishers, and research organizations collaborate to align FAIR research data policies and guidelines.
- All stakeholders collaborate to develop tools, processes and incentives throughout the research cycle to facilitate the sharing of high-quality research data, making all steps in the process clear, easy and efficient for researchers through provision of support and guidance.
- Stakeholders responsible for research evaluation factor data sharing and data citation into their reward and recognition system structures.
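As an illustration of citing a data set with a persistent identifier, here is a minimal Python sketch. The `DatasetRecord` fields, the citation layout (creators, year, title, repository, DOI link) and the example DOI are assumptions made for this example; follow the citation style required by your journal or repository.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical minimal description of a deposited data set."""
    creators: list[str]
    year: int
    title: str
    publisher: str  # the repository that holds the data
    doi: str        # bare DOI without the resolver prefix


def format_data_citation(record: DatasetRecord) -> str:
    """Build a reference-list entry that ends with a resolvable DOI link."""
    creators = "; ".join(record.creators)
    doi_url = "https://doi.org/" + record.doi
    return f"{creators} ({record.year}). {record.title}. {record.publisher}. {doi_url}"


# Made-up record used only to show the output shape.
example = DatasetRecord(
    creators=["Doe, J.", "Roe, R."],
    year=2024,
    title="Example survey responses",
    publisher="Example Repository",
    doi="10.1234/example.5678",
)
citation = format_data_citation(example)
print(citation)
```

Keeping the DOI as a full link means readers, repositories and publishers can all follow the same identifier from the article to the data.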
Before the project...
- Know the requirements and programs of the funding agencies
- Search for existing research data
- Prepare a Data Management Plan.
- If your research involves working with humans, informed consent must be obtained.
- If you are involved in a collaborative research project with other academic institutions, industry partners or citizen science partners, you will need to ensure that your partners agree to the data sharing.
- Think about whether you are going to work with confidential personal or commercial data.
- Think about which systems or tools you will use to make data accessible and which people will need access to it.
During the project...
- Update the Data Management Plan
- Organize and document data
- Process the data
- Store data for security and preservation
Data documentation should cover:
- The context: history of the project, objectives and hypotheses.
- Origin of the data: whether the data is generated within the project or collected from elsewhere (in the latter case, indicate the source from which it was extracted).
- Collection methods, instruments used.
- Typology and format of data (observational, experimental, computational data, etc.)
- Description standards: what metadata standard to use.
- Structure of data files and relationships between files.
- Data validation, verification, cleaning and procedures carried out to ensure its quality.
- Changes made to the data over time since its original creation and identification of the different versions.
- Information about access, conditions of use or confidentiality.
- Names, labels and description of variables and values.
- README file template for a data set (Cornell University)
- README file template (Madroño consortium)
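A README can be started from a simple skeleton that lists the documentation points above. The following Python sketch is one possible way to do it, not one of the linked templates: the section names are shortened paraphrases, and `build_readme` and `unfilled_sections` are hypothetical helper names.

```python
# Section titles paraphrased from the documentation checklist (assumed wording).
README_SECTIONS = [
    "Context",
    "Origin of the data",
    "Collection methods and instruments",
    "Data types and formats",
    "Metadata standard",
    "File structure and relationships",
    "Validation and quality procedures",
    "Versions and changes",
    "Access and conditions of use",
    "Variables and values",
]


def build_readme(title, answers):
    """Return README text with one underlined heading per section."""
    lines = [title, "=" * len(title), ""]
    for section in README_SECTIONS:
        # Sections without an answer are marked TODO so gaps stay visible.
        lines += [section, "-" * len(section), answers.get(section, "TODO"), ""]
    return "\n".join(lines)


def unfilled_sections(answers):
    """List the sections that still need to be written."""
    return [s for s in README_SECTIONS if not answers.get(s, "").strip()]


answers = {
    "Context": "Pilot study on river temperature, 2021 to 2022.",
    "Origin of the data": "Generated within the project.",
}
readme_text = build_readme("Example data set", answers)
todo = unfilled_sections(answers)
print(readme_text)
print("Still to document:", todo)
```

Saving the result as a plain-text file next to the data keeps the documentation readable without special software.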
In tidy tabular data:
- Each variable forms a column
- Each observation forms a row
- Each cell holds a single measurement
- Structure the data in tidy (vertical) format, in which each value occupies its own row, rather than in non-tidy (horizontal) format.
- Use columns for variables; variable names can be up to 8 characters long, without spaces or special characters.
- Avoid text values to encode variables; encode them with numbers instead.
- Put a single value in each cell.
- If a value is not available, use a missing value code.
- Provide data tables that list all the data encodings and names used.
- Use a data dictionary or a separate list of the short variable names and their full meanings.
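The recommendations above can be sketched in Python using only the standard library. The sample table, the column names (`site`, `year`, `temp_c`), the missing value code `-999` and the helper `wide_to_tidy` are all assumptions made for this illustration.

```python
import csv
import io

MISSING = "-999"  # assumed missing value code; record it in the data dictionary

# Short variable names (8 characters or fewer) with their full meaning.
DATA_DICTIONARY = {
    "site": "Sampling site identifier",
    "year": "Year of measurement",
    "temp_c": "Mean water temperature in degrees Celsius",
}

# Hypothetical non-tidy (horizontal) table: one column per year.
WIDE_CSV = """site,2021,2022
A,3.1,3.4
B,,2.9
"""


def wide_to_tidy(wide_text, id_column, variable_name, value_name):
    """Turn each non-identifier cell of a wide table into its own row."""
    tidy_rows = []
    for row in csv.DictReader(io.StringIO(wide_text)):
        for column, value in row.items():
            if column == id_column:
                continue
            tidy_rows.append({
                id_column: row[id_column],
                variable_name: column,
                # Empty cells get the documented missing value code.
                value_name: value if value != "" else MISSING,
            })
    return tidy_rows


tidy = wide_to_tidy(WIDE_CSV, "site", "year", "temp_c")

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(DATA_DICTIONARY), lineterminator="\n")
writer.writeheader()
writer.writerows(tidy)
tidy_csv = out.getvalue()
print(tidy_csv)
```

In the tidy output every row is one observation (a site in a given year), and the data dictionary travels with the file so the short column names remain understandable.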
- Practical Guide to Publishing Tabular Data to CSV Files. Content prepared by Carlos de la Fuente García, expert in open data, within the framework of the Aporta Initiative of the Ministry of Economic Affairs and Digital Transformation.
- Data Quality Review Checklist
Repositories for depositing data include:
- Consolidated thematic data repository for the discipline. Re3data or Data Repositories; for example, World Values Survey in Social Sciences or Wellcome Library in History of Medicine.
- Institutional data repository. Digital CSIC, Harvard Dataverse, or Dehesa (repository of the University of Extremadura).
For research data during the life of the project, it is recommended to:
- Store data in formats that will remain readable in the long term.
- Check the files from time to time.
- Clearly organize and label stored data so that it is easily findable and accessible.
- Take into consideration the physical degradation of optical and magnetic media in case it is necessary to copy or migrate data.
- Store data on different media, even for a short-term project, for example on the hard drive and on CD.
- Create digital versions of paper documentation, in PDF/A Format for long-term preservation and storage.
- Take into account factors that affect the conservation of storage media, such as changes in temperature, relative humidity, light, etc.
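One way to check files from time to time is to record a checksum for each file when it is stored and compare it again later, or after copying the data to another medium. The Python sketch below assumes SHA-256 checksums and uses hypothetical helper names (`build_manifest`, `changed_files`); it is an illustration, not a complete preservation workflow.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing or whose content no longer matches."""
    current = build_manifest(data_dir)
    return sorted(name for name in manifest if current.get(name) != manifest[name])


# Demonstration in a temporary folder with a made-up file.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    data_file = data_dir / "results.csv"
    data_file.write_text("site,year,temp_c\nA,2021,3.1\n", encoding="utf-8")
    manifest = build_manifest(data_dir)
    unchanged = changed_files(data_dir, manifest)
    data_file.write_text("corrupted", encoding="utf-8")  # simulate damage
    damaged = changed_files(data_dir, manifest)

print(unchanged, damaged)
```

Storing the manifest alongside each copy of the data makes it possible to confirm that copies on different media are still identical.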
Storage media are complementary to each other and some of the most common options are:
- Personal or project data storage (using USB drives, computer hard drives or network drives within the institution), recommended only for use in the course of research.
- Institutional repository (Dehesa or that of your university).
- National data archiving services.
- Cloud data storage.
- Repositories (RIO, Zenodo, disciplinary repositories).
According to the OpenAIRE Research Data Management Briefing Paper, data should be deposited in a data repository according to the following order of preference:
- Multidisciplinary data repository. Zenodo, Dryad, Dataverse, Figshare, Mendeley Data.
- Other data repositories
- Together with the scientific publications ODYSSEY
Repository search engines could also help you:
- Re3data: global registry of research data repositories from different academic disciplines, managed and maintained by DataCite.
- FAIRsharing: search engine for standards, data repositories and open access policies in all disciplines.
- Repository Finder: allows researchers to search for the most suitable repository in which to deposit data.
Data sharing may need to be restricted or delayed in the following cases:
- When the data constitutes or contains sensitive information. National and even institutional data protection regulations may apply. In these cases, the data must be anonymized so that it can be accessed and reused without breaching the ethical use of the information.
- When the data is not the property of those who collected it, or when it is shared by more than one party, whether people or institutions. In these cases, you must have the necessary permissions from the owners to share and/or reuse the data.
- When the data has a financial value associated with its intellectual property, which makes it unwise to share the data early. Before sharing it, verify whether such limits exist and, in each case, determine how long must pass before the restrictions cease to apply.
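For sensitive data, a common first step is to drop direct identifiers and replace them with pseudonyms before sharing. The Python sketch below assumes a keyed hash (HMAC-SHA-256) for the pseudonyms; the field names, the `pid` column and the helper names are hypothetical. Note that this is pseudonymization only: the remaining fields may still allow re-identification, so it does not by itself satisfy data protection rules.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a private key."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def strip_direct_identifiers(records, id_field, drop_fields, secret_key):
    """Remove identifying fields and add a pseudonymous 'pid' to each record."""
    cleaned = []
    for record in records:
        kept = {
            key: value
            for key, value in record.items()
            if key != id_field and key not in drop_fields
        }
        kept["pid"] = pseudonymize(record[id_field], secret_key)
        cleaned.append(kept)
    return cleaned


# Made-up records; the key must be kept private and never shared with the data.
records = [
    {"email": "a@example.org", "name": "Ann", "age": 34},
    {"email": "b@example.org", "name": "Bob", "age": 41},
    {"email": "a@example.org", "name": "Ann", "age": 35},
]
shareable = strip_direct_identifiers(
    records, "email", {"name"}, b"replace-with-a-private-key"
)
print(shareable)
```

Because the same identifier always maps to the same pseudonym, repeated observations of one participant can still be linked without exposing who the participant is.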