FAQs | POLiS

What are research data?

Research data are as diverse as the methodologies by which they are obtained. Although there is no universal definition, research data can be understood as all data generated and/or used in the context of scientific research. This includes data obtained from observations, surveys, measurements, experiments or simulations, as well as processed data. Methodological procedures such as algorithms, software or workflows can be a central result of scientific work and therefore also count as research data.

What is research data management?

Research data management addresses the handling of research data, from their creation to their storage and reuse. The different stages that research data go through can be explained by the research data life cycle.

Research data management (RDM) begins with the systematic planning of which data are used and how they are handled during a research project. A data management plan (DMP) helps to clarify critical questions in advance. During data collection, the sources and tools used must be documented comprehensibly, and the research data must be supplied with relevant metadata. The same applies during the analysis of the research data. An electronic lab notebook (ELN) supports researchers in this process. Together with this documentation, the raw and processed data are stored on a storage infrastructure, with the possibility of long-term archiving. Subject-specific repositories or scientific journals enable research data to be shared with the community. In order to uniquely identify and reference the generated data, a persistent identifier (PID) is assigned during publication. Other researchers then have access to the research data and can reuse them to build upon the current state of knowledge.

Why is research data management so important?

There are many reasons for this! Good research data management saves time, costs and resources in the long term. In addition, handling your research data systematically helps you to keep an overview of them. This reduces errors and can improve the quality of your work. Accessible and well-documented data can accelerate scientific progress, since other scientists can validate results or generate new findings. For a list of benefits, see the next question.

Why should I worry about the appropriate handling of my research data?

With the appropriate handling of your research data, you will benefit from:

Efficiency: Thoughtful handling of your research data saves time, increases reproducibility and reduces the risk of losing your data.
Credibility: Giving other researchers access to your data increases the acceptance and credibility of your research.
Recognition: Not only papers can be cited, but also well-documented data sets.
Reusability: Other researchers can generate new knowledge based on your data - and vice versa.

What are FAIR data?

To exploit the full potential of collected research data, they should be sustainably reusable. To describe the requirements for this, the FAIR data principles were proposed. FAIR stands for Findable, Accessible, Interoperable and Reusable. A detailed description can be found here. In its proposal, the Cluster has decided to implement these principles.

What does FAIR mean in concrete terms?

This is a complex question, since data, workflows and starting positions are quite heterogeneous. In simple words, FAIR means:

Findable: Put your data into a repository or comparable data infrastructure and have a PID assigned. Describe your data with rich metadata.
Accessible: The repository or comparable data infrastructure should allow access to the data using standardised communication protocols. Accessible does not necessarily mean openly available to everyone; access may be restricted. Metadata should remain available, even if the data are no longer stored. If you are using established infrastructures, you do not have to worry about this point any further.
Interoperable: The data and metadata should be stored in such a way that both computers and humans can read and handle the content easily. This can be achieved by providing the data and metadata in widely used data formats.
Reusable: Here, two points are of special importance: First, the data should be described in such a way that other researchers are able to understand how such data were generated and, as a consequence, are potentially able to replicate them. Therefore, rich metadata are necessary. Second, a license should be given, which defines the reusability of the data.

How can I make my research data FAIR?

Looking at the details, this can be a complex question. However, the following list, which is taken from here (or in German), provides a first guideline:

Document the data: Always document your data from the beginning of the research process in order to make the research data reproducible. Provide relevant metadata as well. Stick to discipline-specific metadata standards.
Grant a license: Choose a license suitable for your research data (e.g. Creative Commons). Try to keep the terms of use as open as possible and as restricted as necessary.
Legal aspects: The publication of research data may be opposed by various legal and/or ethical aspects. Check this before publication. Further information: here
Choose a repository: Search for a suitable subject-specific repository relevant for your research community. If you do not find what you are looking for, choose a multidisciplinary or institutional repository.
Persistent identifier: Make sure that your data get a DOI so that they remain findable in the long term. Create an ORCID for yourself so that your scientific work can be clearly assigned to you.
Publishing: Upload your research data in a suitable file format to the selected repository and let the world know about it! If you have any questions, the staff of the repository will be happy to help you. Further information: here

Important: When publishing a paper, it is of utmost importance to link the associated research data in the paper via a data availability statement (an embargo period is possible).

E.g.: The datasets generated and analyzed during the current study are available in the [NAME] repository [PERSISTENT IDENTIFIER].
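A statement of this form can also be generated from a small template, which helps to keep the wording consistent across publications. The following Python sketch is only an illustration: the function name is hypothetical, and the repository name and DOI in the example call are placeholders, not references to real data.

```python
def data_availability_statement(repository: str, pid: str) -> str:
    """Fill the template sentence with a repository name and a persistent identifier."""
    repository = repository.strip()
    pid = pid.strip()
    if not repository or not pid:
        raise ValueError("Both a repository name and a persistent identifier are required.")
    return (
        "The datasets generated and analyzed during the current study "
        f"are available in the {repository} repository, {pid}."
    )


# Placeholder values for illustration only; 10.1234/example is not a real DOI.
statement = data_availability_statement("Zenodo", "https://doi.org/10.1234/example")
print(statement)
```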

Best practice examples for data availability statements:

The DFG addresses research data management within the Guidelines for Safeguarding Good Research Practice. If I have no time to read these guidelines, what are the basic statements in this code?

The most central statement is written in guideline 13: "As a rule, researchers make all results available as part of a scientific/academic discourse."

In the explanation below, the following is stated: "In order to maintain transparency and enable research to be referrable and reused by others, researchers make the research data and principal materials, on which a publication is based, available in recognised archives and repositories, which is in accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable)." So, the implementation of the FAIR principles is on its way to becoming an integral part of everyday scientific work.

A second point concerns the obligation to archive research data. The explanation of guideline 17 specifies a period of ten years.

What is research software?

In simple terms, research software can be understood as software that is created in the context of scientific work or that contributes significantly to scientific knowledge. Even a simple input via the command line can be research software, if specialised knowledge is condensed in it.

Does research software also count as research data?

Yes! Especially when considering the FAIR principles, it is of central importance to make the research software used available to ensure traceability and reproducibility. A way of interpreting the Guidelines for Safeguarding Good Research Practice, with respect to research software, can be found here. See also here for current discussions.

What does the Cluster offer to support me in handling my research data?

A lot! Research data management plays an important role in achieving the goals of POLiS. The POLiS research data handling officer coordinates all RDM activities of the Cluster.

The Cluster offers a one-day workshop called "Introduction to Research Data Management".
The Cluster provides a central data server to enable efficient collaborations with large amounts of data.
In cooperation with other research projects, the Cluster is developing the data exchange and analysis platform Kadi4Mat.
The Cluster supports researchers in introducing and using electronic lab notebooks.
The Cluster offers advanced modules, e.g. for data analysis and processing.
The Cluster works closely with libraries, computer centres and other research data infrastructures.
The Cluster advises researchers on how to implement the FAIR principles.
The Cluster initiates and supports pilot projects in RDM to enable completely traceable research chains from synthesis to post-mortem analysis.

I do not have time and do not want to deal intensively with my research data. What are the minimum requirements within POLiS?

In a scientific institution, the proper handling of research data is part of everyday life and is therefore also included in the Guidelines for Safeguarding Good Research Practice, published by the DFG. In POLiS, we demand and promote their implementation by applying the FAIR principles. This includes adding a data statement to POLiS publications, which describes how access is granted to the respective FAIR research data and under which license they can be reused. Exceptions are possible in individual cases, e.g. in the case of patent applications. In such cases, please consult the POLiS research data handling officer. Going beyond the DFG guidelines, reliable raw and processed data shall be made accessible in a curated form in Kadi4Mat. This applies in particular, but not only, to data on which publications are based. The owner of these data decides how they can be reused, either on a case-by-case basis or by granting a license. This is of fundamental importance for three reasons: the Cluster's project structure, in which some projects build on the data from other projects; the long funding period; and the work packages D.3 and D.4, which require standardised and automated access to a large database.

Why should I plan the handling of research data, before starting research?

Because it makes sense! Planning allows possible weak points to be identified and eliminated at an early stage. Especially when writing a research proposal, funding can be requested for research data management, e.g. costs for ensuring the reusability of research data for third parties. Many funding organisations now also require a section on how research data will be treated in the proposed research project. If the handling of the research data is clear from the beginning, you no longer have to worry about data security etc. and can concentrate on the most important point in your research project: the research itself.

What is a data management plan (DMP)?

A data management plan describes which data will be produced or used and what happens with this data during and after the research project. It is recommended to write the data management plan before you start your research project. It may also be appropriate to update it during the project. A data management plan is required by many funding organisations.

How can I write a data management plan?

Writing a data management plan is quite easy. Either the organisation to which you are applying for funding will provide you with a template, or you can find templates on the internet, as a first orientation. Tools like DMPonline or RDMO might help you while writing a DMP.

Everyone talks about electronic lab notebooks (ELN). What are they and how can I use them?

An electronic lab notebook can be understood as "a digital version of the conventional paper lab notebook in which the entire scientific process can be captured. Unlike conventional notebooks, which are difficult for others to access, electronic notebooks make it easier for researchers to organise, manage and share the many components of their work." [A. Mascarelli, Nature, Research Tools: Jump Off The Page, 2014, doi.org/10.1038/nj7493-523a]

Compared to a paper notebook, the electronic character of the ELN offers several advantages. For example, it is quite easy to work collaboratively with your colleagues and discuss results. The risk of losing data is generally lower. Depending on the ELN, reports can be generated automatically. As you can see, there are many good reasons to work with an ELN!

The idea of using an ELN within Kadi4Mat is to define and use well-described and documented series of sequential or parallel steps, which can be processed in a highly automated manner and are called workflows. If you create or use workflows in Kadi4Mat, you are actually using an ELN. This goes beyond a simple replacement of a paper lab notebook and can be understood as an ELN 2.0. However, since not all needs can be covered with this approach, other ELNs like Chemotion, eLabFTW or Jupyter can be an alternative.

What is data documentation?

Good data documentation is essential to find, reproduce and reuse data. Data documentation includes, for example, the description of variables, the relationship of the data to each other or information on the collection of data. Certain documentation information can also be stored in the form of metadata. Metadata are therefore a structured form of data documentation.

What are metadata?

Metadata describe data and are as important as the data themselves. Data are often worthless if it is not documented under which circumstances they were created. As a rule, the more metadata are available, the better!
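One simple way to keep metadata together with the data is a machine-readable "sidecar" file stored next to the data file. The following Python sketch illustrates the idea under stated assumptions: the file naming scheme (.meta.json), the field names and all values are made up for illustration and do not follow a particular metadata standard.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def write_metadata_sidecar(data_file: Path, metadata: dict) -> Path:
    """Store metadata as JSON next to the data file (run1.csv -> run1.csv.meta.json)."""
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar


# Example with made-up values, written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "cycling_test.csv"
    data_file.write_text("cycle,capacity_mAh\n1,120.5\n", encoding="utf-8")
    sidecar = write_metadata_sidecar(
        data_file,
        {
            "title": "Galvanostatic cycling test",
            "creator": "Jane Doe",
            "date_collected": date(2024, 1, 15).isoformat(),
            "instrument": "Battery cycler (hypothetical)",
            "units": {"capacity_mAh": "mAh"},
        },
    )
    print(sidecar.name)  # cycling_test.csv.meta.json
    loaded = json.loads(sidecar.read_text(encoding="utf-8"))
```

Where a discipline-specific metadata standard exists, its field names should be preferred over ad hoc ones like those above.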

What is important during the analysis step?

The analysis step often involves further processing of raw data and plotting of the processed data. The analysis steps and the methods and tools used should be documented as accurately as possible. This increases the reproducibility of the processed data and of the conclusions drawn.

The ELN part of Kadi4Mat offers solutions for a well-documented analysis step that is reproducible and, not least, executable.

Where and how should I store my data?

In general, the generated data should be stored securely, with regard to both access security and hardware security. Depending on your institution or university, software and hardware solutions are offered to meet your needs. Regular backups reduce the risk of data loss. A common recommendation, which can be challenging to follow in practice, is to keep at least three copies of a file on at least two different media, with one copy stored at a different location.
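Backups only help if the copies are actually identical to the original. A minimal Python sketch for checking this with checksums is given below; it is an illustration, not a replacement for the backup services of your institution. The function names and the example files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_are_identical(original: Path, copies: Iterable[Path]) -> bool:
    """True if every copy exists and has the same checksum as the original."""
    reference = sha256_of(original)
    return all(copy.is_file() and sha256_of(copy) == reference for copy in copies)


# Demonstration in a temporary folder: one good copy and one corrupted copy.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "raw_data.csv"
    original.write_bytes(b"time,voltage\n0,3.7\n")
    backup_a = root / "backup_a.csv"
    backup_b = root / "backup_b.csv"
    backup_a.write_bytes(original.read_bytes())
    backup_b.write_bytes(b"time,voltage\n0,3.6\n")  # deliberately different
    ok_result = copies_are_identical(original, [backup_a])
    bad_result = copies_are_identical(original, [backup_a, backup_b])
    print(ok_result, bad_result)  # True False
```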

How should I name my files and folders?

A lot can be done to ensure that data can still be found years later. It is recommended to define a consistent method of naming your files. This method should be meaningful to you and your colleagues. To avoid problems during data processing, avoid umlauts (ö, ü, ä) and ß, as well as special characters such as { } [ ] < > ( ) * % # ; , : ? ! & $. Avoid spaces as well. Use the underscore (_) or a capitalisation of the first letter to separate different parts. As an example, a file could be named as follows: [project]_[content]_[person]_[date/version].doc
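The naming scheme above can be applied automatically. The following Python sketch builds a file name of the form [project]_[content]_[person]_[date] and removes the characters listed above. The function names, the transliteration of umlauts (ä to ae, etc.) and the date format YYYYMMDD are assumptions made for this example, not requirements.

```python
import re
import unicodedata
from datetime import date

_UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def clean_part(text: str) -> str:
    """Replace umlauts, drop accents, and turn spaces/special characters into underscores."""
    for umlaut, replacement in _UMLAUTS.items():
        text = text.replace(umlaut, replacement)
    # Decompose remaining accented letters and keep only their ASCII base letters.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Every run of characters other than letters, digits and hyphens becomes one underscore.
    text = re.sub(r"[^A-Za-z0-9-]+", "_", text)
    return text.strip("_")


def build_filename(project: str, content: str, person: str, day: date, extension: str) -> str:
    """Join the cleaned parts with underscores and append the file extension."""
    parts = [clean_part(project), clean_part(content), clean_part(person), day.strftime("%Y%m%d")]
    return "_".join(parts) + "." + extension.lstrip(".")


print(build_filename("POLiS", "Zyklisierung Zelle #3", "Müller", date(2024, 1, 15), "csv"))
# POLiS_Zyklisierung_Zelle_3_Mueller_20240115.csv
```

Writing the date as year, month, day has the side effect that files sort chronologically when sorted by name.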

Everyone talks about repositories for research data. What are they and how can I use them?

A repository is a managed directory for storing and describing (digital) objects like publications, software or research data. Repositories make searching and finding data easier. With a sophisticated permission management, repositories can be hosted publicly or locally. Repositories can be divided into discipline-specific, interdisciplinary or institutional repositories. Repositories can be used to search and share data.

How can I safely store my data for a long period of time?

Data on which papers are built should be made publicly accessible. In the next section you will learn how to share the data. In order to save the storage space available for daily work, other important data that are no longer actively required can be archived. Please contact your supervisor to find out about the possibilities of data archiving at your institute.

KIT employees can use the service bwDataArchive for data archiving.

It is a lot of work for me to handle my data carefully and publish them in a curated form. What are my advantages?

Datasets can be cited like publications and can therefore contribute to measures of scientific reputation, such as the h-index. In addition, your research will be more reliable if the underlying data can be accessed. Just as you benefit from the data of others in your research, others can benefit from your data and accelerate scientific progress.

Where can I publish my research data, which a paper is based on?

There are many possibilities. Discipline-specific repositories are the first choice, because your data can be easily found by your community. Re3data might help to find an appropriate repository. Interdisciplinary repositories can be used if no appropriate discipline-specific repository can be found. Zenodo is most likely the most prominent one. Sometimes, journals offer the possibility to upload research data during the publication process, for example by using Mendeley Data. Furthermore, university libraries often offer the possibility to publish data. See KITOpen for KIT employees, OPARU for employees of Ulm University and JLUdata for employees of University of Giessen.

I want to share my research data. Which license is recommended?

Unfortunately, no universal answer is available to this question. For possible re-users, it is advantageous if the data are easy to reuse, for example by having few restrictions. The Creative Commons licenses are widely used for data publications. The Public Licence Selector can serve as an aid in decision-making and is easy to use.

What is a persistent identifier (PID)?

A persistent identifier is a long-lasting reference to an object like a document, file or web page. A digital object identifier (DOI) is commonly used to identify scientific publications.
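A DOI is resolved by appending it to https://doi.org/. Since DOIs appear in several notations (with or without "doi:" or a URL prefix), the following Python sketch normalises them into a resolvable link. The function name is hypothetical and the plausibility check is deliberately loose; it does not verify that the DOI is actually registered. The example uses the DOI of the Nature article cited in the ELN answer above.

```python
import re

# Loose pattern: the prefix "10.", a registrant code, a slash and a non-empty suffix.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Normalise common ways of writing a DOI into an https://doi.org/ link."""
    cleaned = doi.strip()
    for prefix in _KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not _DOI_PATTERN.match(cleaned):
        raise ValueError(f"Not a plausible DOI: {doi!r}")
    return "https://doi.org/" + cleaned


print(doi_to_url("doi.org/10.1038/nj7493-523a"))
# https://doi.org/10.1038/nj7493-523a
```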

What is an ORCID?

The Open Researcher and Contributor ID (ORCID) is a PID for persons. Even a full name with middle name might not be sufficient to uniquely identify a person. Some journals require the authors' ORCID to be provided during the submission process. Every researcher is highly encouraged to create and use an ORCID.
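An ORCID consists of 16 characters, the last of which is a check character computed with the ISO 7064 MOD 11-2 algorithm. This makes it possible to catch typing errors before an ORCID is stored in metadata. The following Python sketch shows such a check; the function names are hypothetical, and the check only tests the format, not whether the ORCID is registered.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the last character of an ORCID from its first 15 digits (ISO 7064 MOD 11-2)."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check length, digits and check character of an ORCID such as 0000-0002-1825-0097."""
    compact = orcid.strip().replace("-", "").upper()
    if len(compact) != 16 or not compact.isascii() or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the sample ORCID of a fictitious researcher used in ORCID's documentation.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```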

What is an embargo period for research data?

During the publication of research data, a period of time may be specified in which no access to the data is provided. This time is called an embargo period. If the analysis of the data is not yet finished at the time of a paper publication, assigning an embargo can be appropriate. Although the data are not yet accessible, the metadata can be found and the data can be cited via a PID.

What is a data journal?

Data journals publish a detailed description of a data set, for example how the data were collected and how they can be reused. Examples are Scientific Data or Chemical Data Collections.

How can I find data from my research field that I can reuse and benefit from?

Data on which papers are based can best be obtained directly via the information in the data statement. Search engines help you to find existing data. To find subject-specific repositories, you can use re3data. Other search engines like B2FIND, BASE, Google Dataset Search and DataCite Metadata Search can be used directly for a data search.

What is Kadi4Mat?

Kadi4Mat is the Karlsruhe Data Infrastructure for Materials Science. This software is developed for managing research data, with the objective of combining new concepts with established technologies and existing solutions. It contains both aspects of an electronic lab notebook (ELN) and a repository. More information can be found here.

Who is developing Kadi4Mat?

As part of several research projects, Kadi4Mat is being developed at Karlsruhe Institute of Technology (KIT) under the supervision of Dr.-Ing. Michael Selzer (Institute of Nanotechnology). Major contributions have been made by members of the Institute of Nanotechnology and the Institute of Applied Materials of KIT, as well as the Institute for Digital Materials of the Hochschule Karlsruhe, among others.

What is the vision of Kadi4Mat?

Kadi4Mat is developed as a state-of-the-art platform for research data management. Starting from materials science, the concept is left generic enough to also cover the needs of a variety of other communities. One basic feature is the data storage and web-based data exchange with fine-grained, user-defined access permissions, so as to give you full control over your data. Since user-specific workflows play a central part in the development, Kadi4Mat will offer interfaces to run your workflow locally as well as on the Kadi4Mat server, interacting with the Kadi4Mat database. Kadi4Mat is developed as the platform to facilitate the handling of your research data and to help you make your data FAIR!

Why should I use Kadi4Mat?

You should use Kadi4Mat because you benefit from it! First of all, you can easily manage your (meta-) data and keep a good overview of your collected data. Using advanced search functionalities, you can quickly find the data you need. If you collaborate with other projects, you can share your data conveniently. Kadi4Mat offers different permission roles, so you can decide whether other persons only get read access or are also able to update the content. Kadi4Mat paves the way to advanced data analysis, which is a central part of RU D. Only when data are accessible and equipped with rich metadata can data science work and new findings or descriptors be found. Last but not least, thorough and consistent data management enables a completely traceable research chain from synthesis to post-mortem analysis.

In which parts of the research data life cycle can I use Kadi4Mat?

Kadi4Mat is mainly developed to work with "warm" data. Therefore, typical application areas are located in the "collect" and "analyse" sections. For this purpose, you can store your warm data and, if you wish, share them with your community. In order to publish the data, get a DOI and share them with the world, they have to be transferred to another service. Kadi4Mat offers interfaces for this purpose. For example, your Kadi4Mat account can be linked to your Zenodo account and data can be copied directly from Kadi4Mat to Zenodo if they are to be published.

How can I get in touch with people of the Kadi4Mat developer team?

There are several ways to get in contact with the Kadi4Mat team. The most obvious way is to contact the research data handling officer via e-mail or Rocket.Chat. Furthermore, you can send feedback to feedback-kadi4mat@lists.kit.edu or use the issue system on GitLab to report bugs, feature requests, etc.

Is Kadi4Mat open source?

Yes! Kadi4Mat is developed under Apache License, Version 2.0. You can get access to the code here. In some cases, special features might be in private repositories until their release.

How can I use Kadi4Mat?

POLiS operates its own data server with Kadi4Mat here. You can access it with your POLiS account.

How is the repository part of Kadi4Mat structured?

You can manage your data using resources called records, collections, groups and templates. With a role management system, you have full control over your data. You decide who can see and use your data.

What are records?

Records are the basic components of Kadi4Mat, as they contain data and connect them with metadata. The data of a record can either consist of a single file or of multiple files (e.g. a series of multiple images), all sharing the same metadata. Records can also be grouped into collections and linked to other records.

What are collections?

Collections represent simple, logical groupings of multiple records.

What are groups?

Groups can be used to group multiple users together, making access management easier.

What are templates?

Templates can be used to create blueprints for records and metadata. When creating a new record, templates can serve as a starting point.
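The relationship between these four resources can be pictured with a small data model. The following Python sketch is purely conceptual: the class and attribute names are chosen for this illustration and do not represent the Kadi4Mat source code, database schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Data plus metadata; one or several files share the same metadata."""
    identifier: str
    metadata: dict = field(default_factory=dict)
    files: list = field(default_factory=list)
    linked_records: list = field(default_factory=list)  # identifiers of related records


@dataclass
class Collection:
    """A simple, logical grouping of records."""
    identifier: str
    records: list = field(default_factory=list)


@dataclass
class Group:
    """Several users bundled together to simplify access management."""
    name: str
    members: list = field(default_factory=list)


@dataclass
class Template:
    """A blueprint that serves as a starting point for new records."""
    metadata: dict = field(default_factory=dict)

    def new_record(self, identifier: str) -> Record:
        # Copy the blueprint so that each record gets its own metadata dictionary.
        return Record(identifier=identifier, metadata=dict(self.metadata))


# Made-up example values.
template = Template(metadata={"sample": "", "temperature_K": None})
record = template.new_record("my_record")
record.metadata["sample"] = "sample A"
record.files.extend(["image_001.tif", "image_002.tif"])
collection = Collection("my_collection", records=[record])
group = Group("my_project_team", members=["user_a", "user_b"])
print(record.metadata["sample"], repr(template.metadata["sample"]))  # sample A ''
```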

What is Kadi-APY?

The Kadi-APY software is a library to be used in tandem with Kadi4Mat. The REST-like API of Kadi4Mat makes it possible to programmatically interact with most of the resources that can be used through the web interface. The library makes using this interface especially easy, by offering both an object-oriented approach in Python and a command line interface (CLI). It is written in Python 3 and works under both Linux and Windows.

Why should I use the Kadi-APY library?

Because it facilitates the interaction with the Kadi4Mat database, by integrating it into your workflow. If you repeat particular working steps several times, the Kadi-APY library helps to automate your interaction with Kadi4Mat, which saves time. Especially when processing large data sets, the Kadi-APY library enables a high degree of automation. Furthermore, your data become more reliable and traceable, since fewer steps are done by hand.

I cannot program. Does it even make sense to use the Kadi-APY library?

Yes, absolutely! There are several points of entry to use the Kadi-APY library. Experienced programmers might use the Python library directly, since it gives them full control and maximum flexibility. However, through CLI commands or within the graphical workflow editor, even inexperienced users can use the Kadi-APY library easily, without any programming skills.

How can I install the Kadi-APY library?

The Kadi-APY library is available via pip. Run

$ pip3 install kadi-apy

to install the Kadi-APY library.

How can I upgrade the Kadi-APY library?

Run

$ pip3 install --upgrade kadi-apy

How can I use the CLI of the Kadi-APY?

After installation, run

$ kadi-apy --help

or see here for more information. E.g. the command

$ kadi-apy records add-files -R my_record -n folder

will upload all files in the folder called folder into the record with the identifier my_record.
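If several folders have to be uploaded to several records, the same command can be assembled in a loop. The following Python sketch only builds and prints the commands (a dry run); it reuses the subcommand and the options -R and -n exactly as shown in the command above, which may differ between Kadi-APY versions, so please check kadi-apy --help first. The function name and the record and folder names are hypothetical.

```python
import shlex


def add_files_command(record_identifier: str, folder: str) -> list:
    """Build the argument list for uploading all files of a folder into a record."""
    return ["kadi-apy", "records", "add-files", "-R", record_identifier, "-n", folder]


# Hypothetical mapping of record identifiers to local folders.
uploads = {
    "my_record": "folder",
    "my_second_record": "other folder",
}

commands = [add_files_command(record, folder) for record, folder in uploads.items()]
for command in commands:
    # shlex.join quotes arguments that contain spaces, so the line can be pasted into a shell.
    print(shlex.join(command))
# kadi-apy records add-files -R my_record -n folder
# kadi-apy records add-files -R my_second_record -n 'other folder'
```

To actually execute a command instead of printing it, the argument list could be passed to subprocess.run(command, check=True) from the Python standard library.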

How can I get access to the Kadi-APY source code?

The source code is located at gitlab.com and can be accessed here.

Do I have to use the Kadi-APY library?

Of course not! All functionalities of Kadi4Mat can be accessed through the website. However, if you want to automate your workflow, you are encouraged to use it.

I want to use the Kadi-APY library within my Python code. How can I do that?

It is quite easy to use the Kadi-APY library within your Python code. After installation, you can import Kadi-APY functions into your Python code. An example is included in the Kadi-APY project, which can be found here. Furthermore, the programmatic API of Kadi4Mat may also be used directly with other suitable clients. Details about the API can be found in the developer documentation of Kadi4Mat, which can be found here.

Why should I use Linux?

In general, Linux has some strong advantages. It is open source, free to use and generally considered less vulnerable to attacks than Windows, just to mention a few. It is also very well suited for programmers in a scientific environment! Even if you do not program, the use of a command line like the Linux terminal might be interesting for you to execute some CLI commands or run your workflow in a Bash script. These could be simple, but powerful steps to automate your data processing, make your workflow more reliable and reproducible and save time! In combination with Bash scripting, the Linux terminal is more convenient to use than the Windows command line. And the great thing is: Even within Windows, you can use a Linux terminal!

What is a command line interface (CLI)?

A command line interface provides a text-based user interface and is the simplest way for human-computer interaction. To interact, a command is necessary, which can be enriched with additional parameters. This command usually leads to an action which might also have an output.

The Linux terminal and the Windows PowerShell are two examples of command line interpreters that are able to handle CLIs. The alternative to a CLI is the graphical user interface (GUI).

I have no experience with Linux. Does it even make sense to start at all?

Yes, especially if you want to use some CLI commands! Kadi4Mat provides powerful CLI tools (Kadi-APY CLI and workflow nodes) which can be used directly. After installation, users only have to define the text input parameters, which is very easy.

I only have a Windows PC. How can I use Linux?

That is no problem at all! You can install Linux alongside Windows on your PC and use dual booting. Windows also offers a powerful Linux terminal through the so-called Windows Subsystem for Linux (WSL), which can be installed directly via the Microsoft Store. This might be sufficient for your needs.

What are the most important commands for the Linux terminal?

A few commands are enough to get you started. These include:

ls - shows content of folder
ls -l - shows content of folder with more information
cd dir - changes directory to dir
cd - changes directory to home
cd .. - navigates one folder up
mkdir folder - creates a folder called folder
cp file1 file2 - copies file1 to file2
pwd - shows current directory
rm file - removes (deletes) the file called file
chmod +x file - makes the file called file executable

More useful commands can be found here.

What is the Windows Subsystem for Linux?

The Windows Subsystem for Linux allows Windows users to run a Linux terminal on a Windows PC. More information can be found here. The WSL 1 was released in 2016 and the WSL 2 in 2020. Using the WSL is probably the easiest way to install a Linux terminal on a Windows computer.

How can I install the WSL 1?

The WSL is available in the Microsoft Store. Before you can install it, the Windows Subsystem for Linux feature has to be enabled. Furthermore, a 64-bit version of Windows 10, version 1607 or higher, is required. An easy-to-understand installation guide can be found here (English) and here (German).

Which distribution should I use?

A Debian-based distribution is highly recommended. For example, Ubuntu, one of the most widely used distributions, is a good choice.

How can I use graphical applications, using the WSL 1?

In Windows, you need to install an X server like Xming. After starting the X server in Windows, you can start a GUI application via the Linux terminal.

If the graphical application is not able to connect to the X server, use the command

$ export DISPLAY=:0.0

before starting the application. You can also add this command to your ~/.bashrc file, so that you do not have to re-enter it every time:

$ echo 'export DISPLAY=:0.0' >> ${HOME}/.bashrc

How can I transfer data from Windows to the WSL?

There are two possible ways: Via the terminal, the path to Windows is given with

/mnt/<letter_hard_drive>/Users/<username>/Desktop/

Using

$ cp /mnt/<letter_hard_drive>/Users/<username>/Desktop/file .

you can transfer a file called file from your Windows Desktop to the WSL. An alternative is to use the Windows File Explorer. The WSL file system can be reached by entering the following in the address bar:

\\wsl$
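If you handle Windows paths in scripts that run inside the WSL, the translation to the /mnt/<letter_hard_drive>/ form can be automated. The following Python sketch uses only the standard library; the function name and the example path are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl_path(windows_path: str) -> str:
    """Translate an absolute Windows path with a drive letter into its /mnt/<letter>/ form."""
    path = PureWindowsPath(windows_path)
    drive = path.drive  # e.g. "C:"
    if not path.is_absolute() or len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"Expected an absolute path with a drive letter: {windows_path!r}")
    # path.parts[0] is the drive with its root; the remaining parts are folder and file names.
    return str(PurePosixPath("/mnt", drive[0].lower(), *path.parts[1:]))


print(windows_to_wsl_path(r"C:\Users\jane\Desktop\data.csv"))
# /mnt/c/Users/jane/Desktop/data.csv
```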

How can I set a link from the WSL to my Windows desktop?

Use the command ln -s to create a (symbolic) link. The command

$ ln -s /mnt/<letter_hard_drive>/Users/<username>/Desktop/ $HOME/desktop

creates a link called desktop in your $HOME folder, which points to your Windows desktop.

Which editor is recommended for editing text files?

You can use a Windows text editor to edit and save text files located in the WSL. A useful option is Notepad++.

How can I copy and paste text in the WSL terminal?

You have to manually activate this feature in the option panel of the terminal (which is located in the top right corner of the terminal). After activating it, use [CTRL] + [SHIFT] + [C] to copy and [CTRL] + [SHIFT] + [V] to paste text within the terminal. Instructions can also be found here.

How can I scroll up in the WSL terminal?

You have to change the Screen Buffer Size to a higher number. Change it to 1000, for example, via Properties → Layout → Screen Buffer Size → Height. Follow this link to a description.

What is Git?

Git is a powerful version-control system designed to track changes during software development. Especially when many programmers work together, Git helps to keep an overview. Besides software, Git can be used to track any kind of file. Git is highly recommended for anyone developing software. More information can be found on Wikipedia.

Do I need to install Git?

No. The most important packages can be installed without Git.

What is GitLab?

GitLab is a Git-based platform for software development. Besides GitHub, GitLab is one of the most used Git-based platforms available. The Kadi4Mat source code is hosted at www.gitlab.com.

Do I need to create a GitLab account?

No. Public projects can be downloaded without a gitlab.com account. If you want to create an issue on gitlab.com, to get in contact with the Kadi4Mat developer team, you have to create an account first.

How do I install Git?

Just run the following command:

$ sudo apt install git

If you want to actively use Git for software development, the graphical interfaces provided by Git GUI and Gitk are recommended:

$ sudo apt install git-gui

$ sudo apt install gitk

How can I download code from gitlab.com?

If you want to get a copy of an existing project from GitLab, you can retrieve the respective URL from gitlab.com. You can choose between cloning with SSH or HTTPS. Open a terminal, navigate to the folder where you want to save the project and run

$ git clone <url>

If you just want to download source code from a public repository, cloning with HTTPS is recommended.

How can I update the code from GitLab?

Navigate into your Git project folder and run

$ git pull

If you want to update the Kadi-APY library, for example, you should navigate into the folder called kadi-apy.
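The two commands git clone and git pull can be combined so that a project is downloaded on the first run and updated on every later run. The following is only a sketch. The function name clone_or_pull is hypothetical, and it assumes that Git is installed and that the folder, if it exists, is a Git project:

```shell
# Hypothetical helper: clone a project if the folder is not a Git
# project yet, otherwise update the existing copy.
clone_or_pull() {
    url="$1"
    dir="$2"
    if [ -d "$dir/.git" ]; then
        # -C runs the Git command inside the given folder.
        git -C "$dir" pull
    else
        git clone "$url" "$dir"
    fi
}

# Example call (replace <url> with the URL retrieved from gitlab.com):
# clone_or_pull "<url>" "$HOME/kadi-apy"
```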

What is a workflow?

A workflow is nothing more than a series of individual steps during your research. For example, a simple workflow is the transformation of raw data into processed data, followed by plotting them. As part of good research data management, it is important to document everything that a third party needs in order to understand and repeat the steps taken. This is especially easy to establish for steps which can be executed by a computer.

What are workflow nodes?

A workflow node is simply a single CLI command. Kadi4Mat provides a powerful library of workflow nodes for transforming, converting or plotting data. The Kadi4Mat workflow nodes can be installed via

$ pip3 install workflow-nodes

or upgraded via

$ pip3 install --upgrade workflow-nodes

However, to cover your specific workflow, it is also possible to create your own workflow node with your code!

What is the workflow editor?

The workflow editor is the graphical representation of single workflow nodes, their input parameters and how they are connected logically. In simple terms, the workflow editor is a graphical user interface of an advanced Bash script. The main advantage is that no scripting skills are required to use the workflow editor. Kadi4Mat will provide both a desktop and a web version of the workflow editor. Using the web version of the workflow editor, you can easily work collaboratively on a workflow, share it and run it on the server.

Could you please present an example workflow?

In the following figure you can see an example.

It starts with the situation where metadata of a real or virtual experiment is documented in an Excel sheet. Additionally, raw data are given. Each elliptical node represents an action within the workflow. The workflow shows that metadata are read from the Excel sheet and stored in Kadi4Mat. The raw data are processed and plotted. All data are uploaded into Kadi4Mat. To interact with the Kadi4Mat database, workflow nodes using the Kadi-APY library are used. A comparable, executable example, written in Bash, can be found here.

How can Kadi4Mat help me with my workflows?

The Kadi4Mat framework provides a powerful collection of workflow nodes, which can be used directly. By integrating workflow nodes that communicate with the Kadi4Mat database into your workflow, the systematic storage of your data happens almost on its own. That’s a great thing, isn’t it?

Where can I run my workflow?

At the moment, the workflows can only be executed locally. The desktop version of the workflow editor will be deployed as soon as possible. In the meantime, you can use a simple Bash script to describe and run your workflow.
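As a rough illustration of what such a Bash script can look like, the following sketch runs two steps and writes a log entry for each of them. It uses only standard tools (printf and awk) and no Kadi4Mat workflow nodes. The function name, the file names and the data values are made up for this example:

```shell
# Sketch of a two-step workflow: write raw data, then process them.
# Every step is recorded in a log file so that the run can be retraced.
run_workflow() {
    workdir="$1"
    raw="$workdir/raw.csv"
    processed="$workdir/processed.csv"
    log="$workdir/workflow.log"
    : > "$log"    # start with an empty log file

    # Step 1: stand-in for data acquisition (made-up values).
    printf 'time_s,voltage_mV\n0,3000\n1,3200\n2,3400\n' > "$raw"
    echo "step 1: wrote raw data to $raw" >> "$log"

    # Step 2: voltage change relative to the first sample.
    awk -F, '
        NR == 1 { print "time_s,delta_mV"; next }   # replace the header
        NR == 2 { first = $2 }                      # remember first sample
        { printf "%s,%d\n", $1, $2 - first }
    ' "$raw" > "$processed"
    echo "step 2: wrote processed data to $processed" >> "$log"
}

# Example call, using a fresh temporary folder:
# run_workflow "$(mktemp -d)"
```

In a real workflow, each step would be replaced by the CLI command that produces or transforms your data.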

How can I use my own code as a workflow node?

That’s quite easy! In general, every CLI command can be a workflow node. To make your own code available within the workflow editor, the xmlhelpy library has to be used, which defines the input and output parameters. If you have any questions as to how to do that, contact the POLiS research data handling officer.

I have installed the WSL. What are the next steps?

Use the installation script to install the Kadi-APY library, the workflow nodes and the necessary dependencies. After downloading, copy the script into the home folder of your WSL and run it via

$ ./install.sh

If the script cannot be executed, run

$ chmod +x install.sh

to make it executable. After installation, you should source the .bashrc file via

$ source ~/.bashrc

to make sure that the installation directory is located in $PATH. Now, the workflow nodes and the Kadi-APY CLI can be used! It is recommended to store the information of the Kadi4Mat instance you are working with in environment variables. For this purpose, you need a personal access token (PAT) and the URL of your instance. The PAT can be created here.

Please note that the link only works if you are logged in. For example, the URL of the POLiS instance of Kadi4Mat is https://kadi4mat.postlithiumstorage.org/. Run the following two commands, replacing <url> and <your_PAT> with the correct values:

$ echo 'export KADI_HOST=<url>' >> ${HOME}/.profile

$ echo 'export KADI_PAT=<your_PAT>' >> ${HOME}/.profile

After sourcing

$ source ${HOME}/.profile

the Kadi-APY library is connected to your Kadi4Mat account.
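To check whether both variables are actually set in the current terminal, a small test like the following can help. This is only a sketch. The function name check_kadi_env is hypothetical, and the function deliberately does not print the values, so that the PAT does not end up on the screen:

```shell
# Hypothetical helper: report which of the two variables is missing.
# Returns 0 if both are set and non-empty, 1 otherwise.
check_kadi_env() {
    missing=0
    for name in KADI_HOST KADI_PAT; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call:
# check_kadi_env && echo "Both variables are set."
```

If a variable is reported as missing, open a new terminal or source the .profile file again.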

I use Kadi code (Kadi-APY or workflow nodes) and receive an error message from it. What can I do?

Kadi4Mat is updated regularly to integrate new features or fix bugs. Therefore, you have to make sure that you are using the latest code version via