Getting started

Installation

We recommend installing cheminfopy in a dedicated virtual environment or conda environment. Note that we currently support Python 3.7 and 3.8.

To install the latest stable release, use:

pip install cheminfopy

The latest development version of cheminfopy can be installed from GitHub using:

pip install git+https://github.com/cheminfo-py/cheminfopy.git

High-level overview

The idea behind this library is to provide an easy way to interact with the cheminfo electronic lab notebook (ELN) programmatically, for example, from a Jupyter notebook.

Use cases include:

  • Retrieving samples for further analysis that is not currently implemented in the ELN

  • Retrieving many samples for a machine learning project

  • Programmatically adding spectra or other data to entries in the ELN

To this end, the library is organized around managers that let you interact with the different “kinds” of objects stored in the ELN:

  • Sample can be used to retrieve information about a single sample and to add new data to it

  • User can be used to retrieve information at the user level, e.g., to list all samples that the user has access to

  • Experiment can be used to interact with the reaction entries in the ELN

Basic interactions with a sample

Initialization of Sample

Before you can perform any query, you need to initialize a Sample.

from cheminfopy import Sample

# you need to initialize the sample manager with the ELN instance, the UUID of a sample and a token
my_sample_manager = Sample(instance='https://mydb.cheminfo.org/', sample_uuid='ca5915318397af313e55b3181f7b3a1c', token='TJyOgqRYyDusBmbGytvbNhTvgC3q5mfdg')

There are a few pieces of information that you need to get from the ELN for this:

  • The token: you can generate tokens in the ELN in a view that looks similar to the following (on c6h6.org, it is in the “Tools” tab)

    [Image: token view in the ELN]
  • The sample UUID: the unique identifier of the sample. It is included in the links that the token view shows you, and you can also find it in the sample table.

  • The instance: shown under the heading “Your database instance”.

Conveniently, the token view also shows a snippet that you can copy and paste to initialize the Sample. For entry tokens, it also fills in the UUID automatically.
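Rather than hard-coding the token in a notebook that you might later share, you may prefer to read the connection details from environment variables. The sketch below is one way to do that; the variable names (CHEMINFO_INSTANCE, CHEMINFO_SAMPLE_UUID, CHEMINFO_TOKEN) and the helper sample_kwargs_from_env are hypothetical conventions of our own, and only the keyword arguments instance, sample_uuid and token come from the Sample example above.

```python
import os


def sample_kwargs_from_env(prefix: str = "CHEMINFO") -> dict:
    """Collect keyword arguments for Sample() from environment variables.

    Hypothetical convention: cheminfopy does not read these variables itself;
    this helper only builds the instance/sample_uuid/token arguments.
    """
    mapping = {
        "instance": f"{prefix}_INSTANCE",
        "sample_uuid": f"{prefix}_SAMPLE_UUID",
        "token": f"{prefix}_TOKEN",
    }
    missing = [var for var in mapping.values() if var not in os.environ]
    if missing:
        # Fail early with a clear message instead of sending an empty token
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {kwarg: os.environ[var] for kwarg, var in mapping.items()}
```

With the variables set, you could then write Sample(**sample_kwargs_from_env()) instead of pasting the token into the notebook.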

Retrieving information

Many core properties of a sample are accessible as properties of the Sample object. For example, to get the molecular formula, all you need is my_sample_manager.mf.

A common use case is retrieving a file. For this, use the get_data() method, which expects the type of spectrum (e.g., “ir”, “isotherm”, …) and the filename.

my_sample_manager.get_data('isotherm', 'BET.jdx')

This returns the content of the JCAMP-DX file. To convert JCAMP-DX files to Python dictionaries, you can use the jcamp library.
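For a quick look at what a file contains, note that the header of a JCAMP-DX file consists of labeled data records of the form ##LABEL=value. The sketch below is a deliberately minimal, standard-library-only parser for those header records; it does not decode the data tables themselves (e.g., compressed ##XYDATA), so for real work the jcamp library remains the better choice. The function name read_jcamp_header is our own.

```python
def read_jcamp_header(content: str) -> dict:
    """Parse the ##LABEL=value header records of a JCAMP-DX text into a dict.

    Minimal sketch: labels are upper-cased with spaces, dashes, slashes and
    underscores removed (JCAMP-DX treats these as insignificant in labels),
    inline "$$" comments are dropped, and data-table records are skipped.
    """
    header = {}
    for line in content.splitlines():
        line = line.strip()
        if not line.startswith("##") or "=" not in line:
            continue  # data lines and continuation lines are ignored
        label, _, value = line[2:].partition("=")
        value = value.split("$$", 1)[0].strip()  # remove inline comment
        key = "".join(ch for ch in label.upper() if ch not in " -/_")
        if key in ("XYDATA", "XYPOINTS", "PEAKTABLE", "END"):
            continue  # table headers and the end marker carry no metadata
        header[key] = value
    return header
```

Assuming get_data() returns the file content as a string, you could call read_jcamp_header(my_sample_manager.get_data('isotherm', 'BET.jdx')) to inspect labels such as TITLE or DATATYPE.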

You might now ask: what if I don't know the filename? In that case, you can get a list of all available spectra from the spectra property of the Sample object (my_sample_manager.spectra).

Adding information

If you performed some analysis (e.g., your computational colleagues performed a structure optimization), you might want to add some data back to the ELN. For this, you can use the cheminfopy.managers.sample.Sample.put_data() method. Please keep our data schema in mind when you use this method. For instance, you can only use the types that are implemented in the schema, and we recommend that you only upload JCAMP-DX files for spectral data. To convert Python dictionaries into JCAMP-DX files, you can use the pytojcamp library.

source_info = {
    "uuid": "34567896rt54ery546788969870890",
    "url": "https://aiidalab-demo.materialscloud.org/hub/login",
    "name": "Isotherm simulated using the isotherm app on AiiDAlab"
}
metadata = {
    "gas": "N2",
    "temperature": 200
}
my_sample_manager.put_data(data_type='isotherm', name='BET.jdx', filecontent='<your_file_content>', metadata=metadata, source_info=source_info)

Note that we also provided source_info as a dictionary. We save this information in the database so that you can later trace where the data came from. In this case, the new attachment came from a simulation in AiiDAlab, so we use a description of that simulation as the source name and use the uuid to point to the corresponding node in the AiiDA database.
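To give an idea of what filecontent could contain, here is a minimal sketch that serializes x/y pairs into JCAMP-DX text using the uncompressed ##XYPOINTS=(XY..XY) form, one "x, y" pair per line. The function name to_jcamp_xypoints and the selection of header labels are our own; it writes only a handful of labels and is not a full implementation of the JCAMP-DX standard, so pytojcamp, mentioned above, remains the recommended way to produce files that match our schema.

```python
from typing import Sequence


def to_jcamp_xypoints(
    title: str,
    x: Sequence[float],
    y: Sequence[float],
    x_units: str = "",
    y_units: str = "",
    data_type: str = "",
) -> str:
    """Serialize x/y pairs as minimal JCAMP-DX text (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    lines = [
        f"##TITLE={title}",
        "##JCAMP-DX=4.24",
        f"##DATA TYPE={data_type}",
        f"##XUNITS={x_units}",
        f"##YUNITS={y_units}",
        f"##NPOINTS={len(x)}",
        "##XYPOINTS=(XY..XY)",
    ]
    # One "x, y" pair per line keeps the table readable and easy to parse
    lines += [f"{xi}, {yi}" for xi, yi in zip(x, y)]
    lines.append("##END=")
    return "\n".join(lines) + "\n"
```

The resulting string could then be passed as filecontent to put_data().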

Global access to the ELN

If you have a user token, you can use the User class. Its get_sample() method returns a Sample object for a given sample UUID.
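For the machine learning use case mentioned above, you will typically want the same property for many samples. The sketch below assumes only what is described here: a User-like object whose get_sample() method takes a sample UUID and returns a Sample with an mf property. The helper name collect_formulas is hypothetical, and to keep it testable without an ELN it accepts any object with that interface.

```python
from typing import Any, Dict, Iterable


def collect_formulas(user: Any, sample_uuids: Iterable[str]) -> Dict[str, Any]:
    """Map each sample UUID to the molecular formula of that sample.

    Assumption: `user` behaves like cheminfopy's User, i.e.
    user.get_sample(uuid) returns a Sample-like object with an `mf` property.
    """
    formulas = {}
    for uuid in sample_uuids:
        sample = user.get_sample(uuid)  # one Sample object per UUID
        formulas[uuid] = sample.mf
    return formulas
```

In practice, user would be a User instance created with your user token, and the resulting dictionary could serve as a starting point for a dataset.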

Basic interactions with a reaction

Initialization of Experiment

Before you can perform any query, you need to initialize an Experiment.

from cheminfopy import Experiment

# you need to initialize the experiment manager with the ELN instance, the UUID of an experiment and a token
my_experiment_manager = Experiment(instance='https://mydb.cheminfo.org', experiment_uuid='ca5915318397af313e55b3181f7b3a1c', token='TJyOgqRYyDusBmbGytvbNhTvgC3q5mfdg')

You can then access the main properties of the reaction as properties of the Experiment object, e.g., my_experiment_manager.reactionRXN.

What reactions and samples do I have access to?

To get a “table of contents” overview of the samples and reactions that you have access to, you can use the get_sample_toc() and get_experiment_toc() methods.