RESEARCH DATA
In empirical research, data is an essential component. Modern technologies make it possible to create much larger amounts of data than before, and many new tools are emerging that enable data to be analyzed, stored, managed, and shared.
Research data are any information collected through observation, experiments, surveys or other means whose analysis is intended to confirm (or refute) the hypothesis of a study.
With the rapid development and practical application of Open Science, researchers applying for project funding are increasingly required to open or publish not only the results of their research, but also the data used to obtain them. To this end, the data collected in the course of the research must be made publicly available by depositing them in open access repositories or publishing them in data journals. In this way, the opened data can be reused by other researchers. Furthermore, data can be cited separately, which gives the researcher the added benefit of visibility and influence.
Data types
Research data include not only the final results of experiments, but also primary and intermediate data (such as spreadsheets, laboratory notes, diaries, questionnaires, transcripts, codes, audio recordings, videos, photographs, test answers, slides, artifacts, samples, digital objects, data sets, data files, databases, algorithms, research methodologies, workflows, software content (input/output data, schemas, data files), protocols, etc.). Different institutions distinguish different types of data depending on the discipline and the context of the study:
Data types distinguished by the NSF (National Science Foundation):
- Data (numerical or qualitative)
- Publications
- Samples (e.g. blood, soil etc.)
- Physical collections (e.g. herbarium, archeological finds etc.)
- Software
- Research models
Data types distinguished by the NEH (National Endowment for the Humanities):
- Citations
- Software codes
- Algorithms
- Digital tools
- Geospatial coordinates
- Documentation
- Reports
- Articles
To be findable, data must be described with metadata. The components of a data description (metadata) are similar to those of the bibliographic description of research publications.
Metadata components for describing data:
- Author(s) – name(s) of the data creator(s) (individuals or organizational units).
- Publication date – indicate the year in which the data was published/placed in the repository.
- Name – the full name of the dataset, including the version number (if applicable).
- Source / Publisher – indicate where the data were published (repository or data journal).
- Link / Identifier – a link (URL) for direct access to the data on the Internet or a unique, permanent identifier (e.g., DOI*).
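As an illustration, the five components above can be held in a small record and combined into a data citation. The sketch below shows only one possible arrangement, not a prescribed citation style; the class name `DatasetMetadata`, the function `format_citation`, and all field values (including the DOI, which is a made-up placeholder) are assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """The five descriptive components listed above (hypothetical record)."""
    authors: list[str]   # Author(s): individuals or organizational units
    year: int            # Publication date: year of publication/deposit
    name: str            # Name: full dataset name, with version if any
    publisher: str       # Source / Publisher: repository or data journal
    identifier: str      # Link / Identifier: URL or persistent identifier


def format_citation(meta: DatasetMetadata) -> str:
    """Join the components into one citation line (illustrative order only)."""
    authors = "; ".join(meta.authors)
    return f"{authors} ({meta.year}). {meta.name}. {meta.publisher}. {meta.identifier}"


# Placeholder values, invented purely for illustration.
example = DatasetMetadata(
    authors=["Surname, N.", "Example Research Group"],
    year=2021,
    name="Example survey dataset, version 1.2",
    publisher="Example Data Repository",
    identifier="https://doi.org/10.1234/example",
)
print(format_citation(example))
```

Keeping the components as separate fields, rather than as one free-text string, makes it easier to export the same description later in whatever form a repository or journal asks for.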
Data descriptions (metadata) are compiled according to established metadata standards, such as:
- CERIF (Common European Research Information Format) – a common standard for the description of metadata in various disciplines for the European Union countries.
- Dublin Core (Dublin Core Metadata) – a standard for describing metadata for various disciplines.
- DDI (Data Documentation Initiative) – an international standard for the description of survey data in social, behavioral, economic and health sciences.
- DICOM (Digital Imaging and Communications in Medicine) – a standard for describing biomedical data.
- CIF (Crystallographic Information File) – a metadata standard for describing data from crystallographic and other structural studies.
*DOI (Digital Object Identifier) – a global, unique, and persistent digital name of an object. It does not change over time even if the object’s location changes, and is intended for the accurate identification of an object published online.
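To make the idea of a metadata standard more concrete, the sketch below writes a description using Dublin Core element names (`title`, `creator`, `publisher`, `date`, `identifier`) with Python's standard `xml.etree.ElementTree` module. The wrapper element `metadata`, the function name `to_dublin_core` and all values are assumptions made for this example; a real repository will prescribe its own schema and required fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(fields: dict[str, list[str]]) -> str:
    """Serialize {element name: [values]} as a simple Dublin Core XML record."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, values in fields.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values, invented purely for illustration.
record = to_dublin_core({
    "title": ["Example survey dataset, version 1.2"],
    "creator": ["Surname, N."],
    "publisher": ["Example Data Repository"],
    "date": ["2021"],
    "identifier": ["https://doi.org/10.1234/example"],
})
print(record)
```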
Background data are needed to understand and correctly interpret the data and to reuse them for secondary analysis.
The following can be considered background data:
- code descriptions
- questionnaires
- description of methodologies
- reports
- conference poster presentations
- articles
- information on websites, blogs, etc.
Research Data Management
In order for the data to be published, they must be properly collected, formatted, described and stored throughout the study. These processes of data collection, curation, and preservation, together with data dissemination, are collectively known as Research Data Management (RDM).

Effective RDM must be performed at all stages of the data management cycle.
The planning stage involves:
- Initial decision on whether new data will be collected, or already existing data sets will be used in the study
- Selection of data repository
- Selection of data formats and metadata standards
- Identification of confidentiality, privacy and other ethical issues
- Identification of potential data users
- Assessment of the possible costs surrounding data management
- Determination of the procedures needed for file organizing, backup creation and data storing
- Creation of quality assurance protocols
- Establishing data security measures and setting access restrictions
Primary data may need to be refined, standardized or otherwise processed to enable further analysis. At this stage it is therefore particularly important that all manipulations of the primary data are documented. It is essential to describe all analysis procedures and models used, as well as the specifications of the hardware and software.
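One lightweight way to document such manipulations is to append an entry to a processing log each time the primary data are transformed. The following is a minimal sketch under that assumption: the function `log_step`, the log structure and the example step descriptions are hypothetical, and only the software and hardware details that Python's standard `platform` module can report are recorded.

```python
import json
import platform
from datetime import datetime, timezone


def log_step(log: list[dict], description: str, parameters: dict) -> None:
    """Append one documented processing step to an in-memory log."""
    log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "description": description,
        "parameters": parameters,
        # Environment details, so the step can be interpreted later.
        "python_version": platform.python_version(),
        "operating_system": platform.platform(),
        "machine": platform.machine(),
    })


# Invented example steps.
processing_log: list[dict] = []
log_step(processing_log, "Removed rows with missing measurements", {"column": "value"})
log_step(processing_log, "Converted temperatures to kelvin", {"offset": 273.15})

# The log can be stored next to the data as part of its documentation.
print(json.dumps(processing_log, indent=2))
```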
Data preservation stage:
At this stage, the format of the data to be made public is determined according to the requirements of the selected repository and / or funder, and further data curation / cleaning (e.g. depersonalization) and documentation are carried out. All documentation describing the data must be reviewed to ensure that it is comprehensive enough to enable the discovery and reuse of the published data.
Data Accessibility and Dissemination stage:
The processed and described data are published in scientific articles, data journals and reports and, together with additional information, are stored in data repositories or archives.
Who is involved in the RDM process?
Scientists and researchers – data creators and users. They plan the studies, anticipate what data will be collected, collect and process the data, and determine how the data will be analyzed and what conclusions they may support.
Universities and research institutes – set internal RDM policies. They can provide the resources required to implement RDM in practice, such as RDM training, support in developing RDM plans, hardware and / or software, and related consulting (IT departments), as well as data archiving services (institutional repositories).
Data repositories – curate the data by ensuring their long-term preservation and access. Data repositories work with data creators to ensure the long-term usefulness of the data, impose necessary access restrictions (e.g., embargo periods or other restrictions related to the requirements of the institution or the funder), and ensure data security and respect for intellectual property rights.
Users – representatives of various fields who use the published data. Data re-users can be: the data producers themselves and other researchers, who examine the data from aspects other than those of the original research, compare similar data obtained at later stages, and / or seek to verify the reliability of the research results; teachers who use the data for teaching purposes; students who use the data to prepare their graduation theses; business, political or private sector representatives whose decisions are often based on the data; journalists aiming to make their published information more reliable; and interested members of the general public.
Funders – provide the necessary resources for the research. Today, more and more funding institutions require an RDM plan in order to increase the transparency of the projects they fund. By encouraging the re-use of data collected during the project, funders also seek to increase their return on investment.
Publishers and journals – publish research results and the scientific discoveries based on them. Publishers and editorial boards of journals increasingly encourage authors to cite data (both self-produced data and external data reused in the study). Some journals, e.g., PLOS (Public Library of Science), require that the data used in the research be published in data repositories.
Due to the large number of stakeholders involved in the RDM process, effective RDM requires close cooperation between all participants in the process.
Why is it important to manage data?
RDM helps researchers to organize the research more efficiently:
- In the case of a large project that generates different types of data, even the researchers themselves may get lost in the variety and abundance of data. RDM helps them optimize the use of data in the active research phase.
- Data collection, processing, or analysis tasks are often performed by graduate students or other project participants (staff members or even representatives of other departments or institutions). RDM therefore makes it easier to cooperate with other participants in the research and facilitates knowledge transfer when staff change.
RDM ensures that the data will be preserved and made available for future research, interpretation and reuse:
- At the end of a study or project, the data may be used to answer additional questions that were not addressed in the original research.
- If similar research is carried out in the future, the data collected in the original research can easily be used to compare the results, which would be practically impossible if the data had not been processed systematically.
- RDM also increases the transparency of the research and the reliability of the data, as properly prepared data can be used to validate the research results.
- Properly collected and described data can be discovered and reused by third parties (users). This not only ensures a higher return on investment in projects through the reuse of data for innovation and progress, but also promotes the development of citizen science and makes it possible to use the data for study and teaching purposes.
Documents regulating research data management in Lithuania
Research Data Management (RDM) principles in Europe are based on the European Commission’s (EC) Guidelines to the Rules on Open Access to Scientific Publications and Open Access to Research Data in Horizon 2020 of 17 March 2017. More information on the European Commission’s open science and research data management policies can be found on the OpenAIRE website.
The data management principles for projects financed by the Research Council of Lithuania (RCL) are defined in the Guidelines for Open Access to Scientific Publications and Data, adopted by Resolution No. VIII-2 of 29 February 2016.
The key provisions in the EC and RCL guidelines are similar and emphasize:
- the importance of presenting a research data management plan;
- data storage in research data repositories;
- ensuring open access to data through open content licenses;
- ensuring long-term data preservation after the end of the project;
- a comprehensive description of the data, providing sufficient information and tools to verify the research results;
- linking data with relevant publications.
At VILNIUS TECH, research data management is carried out in accordance with the “Guidelines for Open Access to Data and Scientific Publication of Vilnius Gediminas Technical University”, approved by Rector’s Order No. 1231 of 6 December 2016.
Highlights:
- research data obtained from publicly funded project activities must be made openly accessible in accordance with the requirements of the funding institution;
- at the end of the project, data storage and submission to an open access repository must be ensured;
- timely open access to the data and their metadata must be ensured, coinciding with the release of the relevant publication.
FAIR principles
All these regulations require that research data be collected, formatted, described and stored in accordance with the FAIR principles (a set of guiding principles to make data Findable, Accessible, Interoperable and Reusable).
Data Management Plans (DMPs)
Applicants for project funding are increasingly required to submit a research data management plan.
A Research Data Management Plan (DMP) is a formal document that describes how the created or reused data will be handled, identifying the specific measures and strategies to be applied at all stages of the data management cycle.
A DMP should describe all key aspects of data management throughout the research life cycle. The plan is built on a common framework that answers a number of key questions:
- What data will be created, generated or collected?
- In what format will the data be collected, and what is the expected volume?
- How will the data be processed during and after the project?
- What methods and metadata standards will be used? (Different science fields and/or disciplines may have different/specific metadata standards that need to be defined before data collection)
- How will long-term data storage be ensured: in which data repository will the data be deposited, and how much will it cost?
- What access to the data is anticipated: will the access be open, and how will the data be shared?
The Digital Curation Centre (DCC) provides a list of questions (Checklist for a Data Management Plan) that should be addressed in the DMP:
What data will be collected or created?
A description of the data must be provided, describing its type, format, and volume, together with a justification for the choice of the specific format. If there are any storage implications due to the format or volume of the data, they should be included as well.
How will the data be collected or created?
Data collection methods to be used, including the standards that dictate these methods, should be described.
Plans for organizing data files and applying version control, as well as how quality assurance protocols will be implemented during and after data collection, should also be provided.
What documentation and metadata will accompany the data?
A proper description of the data set (metadata) and provision of additional (background) information is essential to ensure that the data will be findable and reusable.
It is important to determine what information will be required to make the data understandable to secondary users and to the data creators themselves. In other words, it is necessary to describe the types of metadata, as well as the documentation / background data, that will need to accompany the data in order to interpret and use them in the future.
These may include descriptions of methodologies or codes, definitions of variables and their values, questionnaires, descriptions of hardware or software, analytical procedures used along with their application conditions, and more.
How will any ethical issues be managed?
The DMP should demonstrate that you have considered critical issues related to laws and guidelines regarding the protection of human subjects. If the project will conduct research involving human participants, provisions for protecting their confidentiality should be described. This should include strategies for handling and storing sensitive data, restricting access to sensitive data, and preparing depersonalized datasets for sharing.
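One common step towards a depersonalized dataset is to replace direct identifiers with pseudonyms before sharing. The sketch below uses a keyed hash (HMAC-SHA256 from Python's standard library) for this purpose; the function name `pseudonymize`, the field names, the key and the records are all invented for illustration. Pseudonymization alone does not guarantee anonymity, since other fields may still identify a participant, so this should not be read as a complete depersonalization procedure.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability (assumption)


# Hypothetical key: in practice it would be stored separately from the data.
KEY = b"project-secret-key"

# Invented records; "name" stands for a direct identifier.
records = [
    {"name": "Participant A", "answer": 4},
    {"name": "Participant B", "answer": 2},
    {"name": "Participant A", "answer": 5},
]

# The shared version keeps the answers but drops the direct identifier.
shared = [
    {"participant_id": pseudonymize(r["name"], KEY), "answer": r["answer"]}
    for r in records
]
print(shared)
```

Because the same identifier always maps to the same pseudonym, responses from one participant can still be linked in the shared dataset without revealing the name.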
How will copyright and intellectual property rights issues be managed?
This is where legal issues of data ownership need to be addressed. It is necessary to specify the owner of the data and the conditions of their use. If third-party data is reused, the data producer’s permission to use the data must be obtained and provided.
How will the data be stored and backed up during research?
It is necessary to describe the data storage provisions and the extent to which they are adequate for the type and scope of your data. Provisions for storage should also include plans for systematic backups of the data files.
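Systematic backups are only useful if the copies can be verified. A frequently used check, sketched below, is to compare a checksum of each original file with that of its backup copy. The function names and the temporary files are assumptions for this example, and SHA-256 is one reasonable choice of hash function rather than a requirement stated in the guidelines.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(original: Path, backup: Path) -> bool:
    """Return True when the backup has the same checksum as the original."""
    return sha256_of(original) == sha256_of(backup)


# Demonstration with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "measurements.csv"
    backup = Path(tmp) / "measurements_backup.csv"
    original.write_text("id,value\n1,3.14\n", encoding="utf-8")
    shutil.copyfile(original, backup)
    intact = backup_is_intact(original, backup)
    backup.write_text("id,value\n1,3.15\n", encoding="utf-8")  # simulate corruption
    intact_after_change = backup_is_intact(original, backup)

print(intact, intact_after_change)
```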
How will access and security be managed?
If the data are sensitive or should otherwise be restricted, the access may be granted only to authorized project personnel. It is also necessary to describe in detail the security measures that will be applied to protect the data, along with the standards with which those security measures comply.
Which data should be retained, shared, and/or preserved?
This decision should take into account how the data may be used in the future by others or by the data producers themselves, the potential value of the data, and the effort required to prepare the datasets for preservation and long-term access.
What is the long-term preservation plan for the dataset?
It is necessary to choose a data repository for archiving the data, ensuring that they will be stored and available for use both in the near future and in the long run.
How will the data be shared?
It is necessary to define the mechanisms that will be used to share data by describing how others might find the data and how data files will be delivered to them. Preferences for how the data should be acknowledged or cited by other users should also be specified.
Are any restrictions on data sharing required?
If privacy concerns affect the ability to share project data, it is necessary to discuss them and how they may be resolved to enable data sharing (by providing anonymized versions of the data, requiring a data use agreement, or by other mechanisms). There may also be an option to place an embargo on the data, so that only authorized personnel have access to the data for a period of time. In this case, it is necessary to explain the reasons for applying the embargo period.
Who will be responsible for data management?
Identify, by name if possible, who will oversee the implementation of the data management plan.
What resources will be required to implement the DMP?
Some projects that rely on or generate complex data may require special knowledge or special equipment to ensure proper data management, due to the type or size of the data or the fact that project activities are geographically distributed among different institutions.
The costs associated with performing ongoing data management tasks, as well as the resources required for long-term preservation of the data in a repository, should also be accounted for.
Answering these questions will not only ensure compliance with data management requirements, but will also help to prepare better for implementing data management strategies.
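When drafting a plan, it can help to track which checklist questions are still unanswered. The sketch below does this for a subset of the questions above; the dictionary layout, the function `unanswered` and the sample answers are assumptions for illustration and are not taken from any DMP tool.

```python
# A subset of the checklist questions discussed above.
CHECKLIST = [
    "What data will be collected or created?",
    "How will the data be collected or created?",
    "What documentation and metadata will accompany the data?",
    "How will the data be stored and backed up during research?",
    "What is the long-term preservation plan for the dataset?",
    "Who will be responsible for data management?",
]


def unanswered(plan: dict[str, str]) -> list[str]:
    """Return checklist questions with no answer or only whitespace."""
    return [q for q in CHECKLIST if not plan.get(q, "").strip()]


# Hypothetical draft plan with invented answers.
draft_plan = {
    "What data will be collected or created?": "Survey responses in CSV, about 50 MB.",
    "How will the data be stored and backed up during research?": "  ",
    "Who will be responsible for data management?": "The principal investigator.",
}

for question in unanswered(draft_plan):
    print("Still to answer:", question)
```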
There are several online tools for creating data management plans that can be used for free. The most widely used are DMPTool and DMPonline, developed respectively by the University of California Curation Center of the California Digital Library and the UK Digital Curation Centre (DCC).
Both tools are designed to facilitate the preparation of a DMP. They provide step-by-step wizards that allow researchers to create customized data management plans. Guidelines and templates adjusted to the requirements of various funding agencies are also made available to researchers. When a funding agency's policy changes, the information in the provided templates is updated accordingly. It is also possible to draw up plans that are not tied to a specific funder, following the general structure of a DMP. Institutions can also tailor these tools to their individual needs.
For Lithuanian researchers, preparing a DMP is relevant when applying for Horizon Europe and RCL funded projects. To use these tools, researchers have to create free personal accounts. Registered users can create DMPs and view and download publicly available examples of DMPs prepared by other institutions and / or researchers.
Research data collection and storage
The suitability of a data format depends first of all on the data type and on how the data are generated, including the equipment used. When preparing data for publication in repositories or data journals, the formats supported by the repositories and those recommended by publishers should also be considered, as they may vary. Where possible, however, it is recommended to choose the data format based on several key criteria:
- Is the format widely used?
- Is the format suitable for long-term storage?
- Is the format open, so that no licensed software is required to open it?
- What is the complexity of the format? Simpler formats are recommended.
- Is the format suitable for archiving without loss of data quality?
Data type | Recommended formats |
Text | PDF (preferably PDF/A); without formatting: TXT; editable: ODT, RTF, HTML; text with formulas: LaTeX (TEX) |
Tables | CSV / TSV; numerical data: HDF5 |
Graphics | Raster: PNG, TIFF; vector: SVG, EPS |
Multimedia | Container: MKV, WebM; video: AV1, VP9; audio: FLAC, WAV, Vorbis, Opus |
Linked and / or structured data | SIARD, Dump, XML, CSV / TSV, HDF5, JSON, YAML |
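The table above can be turned into a simple check of the files in a project folder. The sketch below is only an illustration: the table lists formats rather than file extensions, so the extension sets, the function names and the example file names are all assumptions that should be adapted to the repository or publisher requirements that actually apply.

```python
from pathlib import Path

# Assumed mapping from the data types in the table to typical file extensions.
# The table names formats, not extensions, so these sets are illustrative only.
RECOMMENDED_EXTENSIONS = {
    "Text": {".pdf", ".txt", ".odt", ".rtf", ".html", ".tex"},
    "Tables": {".csv", ".tsv", ".h5", ".hdf5"},
    "Graphics": {".png", ".tif", ".tiff", ".svg", ".eps"},
    "Multimedia": {".mkv", ".webm", ".flac", ".wav", ".ogg", ".opus"},
    "Linked and/or structured data": {
        ".siard", ".xml", ".csv", ".tsv", ".h5", ".hdf5", ".json", ".yaml", ".yml",
    },
}


def matching_data_types(filename: str) -> list[str]:
    """Return the data types whose recommended formats include this extension."""
    suffix = Path(filename).suffix.lower()
    return sorted(
        data_type
        for data_type, extensions in RECOMMENDED_EXTENSIONS.items()
        if suffix in extensions
    )


def files_to_convert(filenames: list[str]) -> list[str]:
    """List the files whose extension matches none of the recommended formats."""
    return [name for name in filenames if not matching_data_types(name)]


if __name__ == "__main__":
    # Hypothetical file names, used only to show the call.
    print(files_to_convert(["survey.xlsx", "survey.csv", "figure.PNG"]))
```

A file flagged by `files_to_convert` is not necessarily unsuitable; it only means that its extension is not in the assumed list, so the format should be reviewed against the criteria above.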
Data Repositories
Repository registries can be used both to find the most suitable repository for storing data and to search for existing data sets.
re3data – the Registry of Research Data Repositories – is the most widely used search engine for research data repositories. It helps to find discipline-specific repositories among more than 2000 registered ones. You can search by topic, country, content type, and more. The registry is a service of DataCite, a non-profit organization that provides persistent digital identifiers (DOIs) for research data.
Other data repository registries:
- FAIRsharing (databases catalogue)
- Open Access Infrastructure for Research in Europe (OpenAIRE / Explore)
- Directory of Open Access Repositories (OpenDOAR)
- Master Data Repository List (Clarivate Analytics)
The following repositories accept research data and/or results from all disciplines. Data deposition, archiving and access are free. Each data set receives a persistent digital identifier (DOI).
- Zenodo – this repository is linked to Horizon 2020 and OpenAIRE projects. The repository is funded by the European Commission.
- DRYAD
- FigShare
- 4TU.ResearchData
- B2SHARE
- Mendeley.Data
Data journals
The number of journals dedicated to publishing data is growing rapidly. The most popular data journals are listed below.
- Biodiversity Data Journal (Pensoft)
- Biomedical Data Journal (Procon)
- BMC Research Notes (Springer Nature)
- Data (MDPI)
- Data in Brief (Elsevier)
- Earth System Science Data (Copernicus Publications)
- Ecology / Ecological Archives (Wiley)
- Geoscience Data Journal (Wiley)
- F1000Research (Taylor & Francis Group)
- Genomics Data (Elsevier)
- Geoscientific Model Development (EGU Publications)
- GigaScience (Oxford Academic)
- International Journal of Robotics Research (SAGE Journals)
- Journal of Chemical and Engineering Data (ACS Publications)
- Journal of Physical and Chemical Reference Data (AIP Publishing)
- Nuclear Data Sheets (Elsevier)
- Research Data Journal for the Humanities and Social Sciences (Brill)
- Scientific Data (Springer Nature)
Search and citation of research data
When conducting research or preparing term papers and graduation theses, existing published data may be reused instead of collecting or creating new data, which saves both time and resources. This approach is becoming more and more convenient, since researchers, government organizations, and other institutions are increasingly making their data sets freely available.
Research data sets can be found in data repositories and data journals, where you can also store and publish your own research data. Search methods may differ from one repository to another. To search efficiently and find the data sets you need, we recommend reading the repository's user guide, available on the repository's website, before you start searching.
To make data search more convenient, several additional tools are available that allow you to search for data sets on a specific research topic across multiple repositories at once, for example:
- DataCite – the DataCite search helps to find data sets on the topic you are interested in across several repositories.
- Mendeley.Data – a research data search engine where researchers can search more than 25 million data sets in both thematic and multidisciplinary data repositories.
- Data Observation Network for Earth (DataONE) – enables searching for Earth observation research data across a wide network of subject-related repositories.
- CESSDA Data Catalogue – allows searching for social science data sets in the repositories of the Consortium of European Social Science Data Archives.
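Cross-repository search tools such as DataCite can also be queried programmatically. The sketch below only builds a search URL for the public DataCite REST API and does not send any request. The endpoint and the `query`, `resource-type-id` and `page[size]` parameters are given here as assumptions to be checked against the current DataCite API documentation; the function name and the example topic are hypothetical.

```python
from urllib.parse import urlencode

# Assumed public endpoint of the DataCite REST API for DOI records.
DATACITE_DOIS_ENDPOINT = "https://api.datacite.org/dois"


def build_dataset_search_url(topic: str, page_size: int = 10) -> str:
    """Build a URL that searches DataCite records of type 'dataset' for a topic."""
    params = {
        "query": topic,                 # free-text search term
        "resource-type-id": "dataset",  # assumed filter limiting results to data sets
        "page[size]": page_size,        # assumed parameter for results per page
    }
    # urlencode percent-encodes the brackets and replaces spaces with '+'.
    return f"{DATACITE_DOIS_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical topic, used only to show the call.
    print(build_dataset_search_url("soil moisture", page_size=5))
```

The resulting URL can be opened in a browser or passed to any HTTP client; the response format and any usage limits are described in the DataCite documentation.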
Research data is also a scientific output, so when you use data created by others in your work, you must cite it like any other source of information: the data must be cited in the text and included in the bibliography. The citation and bibliographic description of data are compiled in the same way as for scientific publications, by indicating the main metadata components.
General structure of the bibliographic data description:
Creator. (Year of publication). Name. Version. Publisher. Source type. Identifier.
As in the case of publications, the structure of a bibliographic record of data depends on the citation style used. Therefore, the order and formatting of individual components in bibliographic descriptions may vary.
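The general structure above can be expressed as a small formatting helper. This is a minimal sketch that follows only the generic pattern "Creator. (Year of publication). Name. Version. Publisher. Source type. Identifier."; it does not implement any particular citation style. The class, its method and the example values (including the DOI) are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Metadata components of the generic bibliographic description of a data set."""
    creator: str
    year: int
    name: str
    version: str
    publisher: str
    source_type: str
    identifier: str

    def to_reference(self) -> str:
        """Join the components in the generic order, each followed by one period."""
        parts = [
            self.creator,
            f"({self.year})",
            self.name,
            self.version,
            self.publisher,
            self.source_type,
            self.identifier,
        ]
        # Strip an existing trailing period (e.g. after initials) to avoid "..".
        return " ".join(part.rstrip(".") + "." for part in parts)


if __name__ == "__main__":
    # Fictional example values; the DOI below does not refer to a real data set.
    citation = DataCitation(
        creator="Doe, J.",
        year=2020,
        name="Example survey data",
        version="Version 1",
        publisher="Example Repository",
        source_type="Data set",
        identifier="https://doi.org/10.1234/example",
    )
    print(citation.to_reference())
```

For a specific citation style, the order and punctuation in `to_reference` would need to be changed to match that style's rules.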
Additional information resources
- FOSTER – an organization that provides comprehensive information on open science in order to fill existing knowledge gaps in the academic community. On research data management, it offers training materials and free courses (Open and FAIR Research Data; Managing and Sharing Research Data) and links to the Open Science Training Handbook, which covers a variety of open science topics, including open research data.
- OpenAIRE – an organization funded by the European Commission that is actively involved in shaping open science policy. It also provides comprehensive information and guides on research data management and other open science topics. Summarized information on research data management is provided in A Research Data Management Handbook.
- Open Knowledge Foundation – a not-for-profit organization whose mission is “to create a more open world – a world where all non-personal information is open, free for everyone to use, build on and share; and creators and innovators are fairly recognized and rewarded”. It has developed the Open Data Handbook, which is publicly available in over 20 languages.
- Research Data Alliance (RDA) – an international organization whose main objective is to provide comprehensive information and support regarding the opening and management of research data. The RDA website provides information and recommendations to help in discovering the most appropriate data management solutions. On the site, you can search for information by specific topic or by research area.
- Digital Curation Centre (DCC) – provides a range of data management information, guides and tools.
- Data Observation Network for Earth (DataONE) – an organization that seeks to ensure open, continuous and secure access to Earth observation research data. It also provides comprehensive methodological material on data management topics.
- CESSDA – Consortium of European Social Science Data Archives, which provides extensive training (in video format, downloadable) on data management topics in the context of Social Sciences, and a Data Management Expert Guide.
- Page administrators:
- Jolanta Juršėnė
- Asta Katinaitė-Griežienė
- Orinta Sajetienė
- Indrė Ereminė