These use cases show how you can use EDB Postgres Advanced Server with Chemaxon JChem PostgreSQL Cartridge.

Create table with MOLECULE datatype column

  1. Using SQL, create the table in which a column has the MOLECULE datatype to store and process chemical data:

    CREATE TABLE public.chemical_data
    (
                 empno 	INT4 Primary Key,
        	ename 	VARCHAR(50),
        	location  CLOB,
        	mol   	MOLECULE('sample')
    );
  2. View the structure of this table:

    DESC public.chemical_data;
                   	Table "public.chemical_data"
      Column  |     	Type      	| Collation | Nullable | Default
    ----------+-----------------------+-----------+----------+---------
     empno	| integer           	|       	| not null |
     ename	| character varying(50) |       	|      	|
     location | clob              	|       	|      	|
     mol  	| molecule('sample')	|       	|      	|
    Indexes:
    	"chemical_data_pkey" PRIMARY KEY, btree (empno)

Insert/update/delete chemical data

  1. Using SQL statements, insert data, including chemical data, into the table:

    INSERT INTO public.chemical_data VALUES (10,'EMP1','COUNTRY1','c1ccccc1');
    INSERT INTO public.chemical_data VALUES (20,'EMP2','COUNTRY2','O=Cc1ccccc1');
    INSERT INTO public.chemical_data VALUES (30,'EMP3','COUNTRY3','C=CCOC=O');
    INSERT INTO public.chemical_data VALUES (40,'EMP4','COUNTRY4','c1nc2cncnc2n1');
  2. Use a SELECT statement to retrieve the data from the table, including the chemical data in the column mo1:

    SELECT * FROM public.chemical_data;
     empno | ename | location |  	mol 	 
    -------+-------+----------+---------------
    	10 | EMP1  | COUNTRY1 | c1ccccc1
    	20 | EMP2  | COUNTRY2 | O=Cc1ccccc1
    	30 | EMP3  | COUNTRY3 | C=CCOC=O
    	40 | EMP4  | COUNTRY4 | c1nc2cncnc2n1
    (4 rows)
  3. Update and delete data from the table, including the chemical data:

    UPDATE public.chemical_data SET ename = 'EMP#1',mol='C=CCOC=O' WHERE empno = 20;
    
    SELECT * FROM public.chemical_data;
     empno | ename | location |  	mol 	 
    -------+-------+----------+---------------
    	10 | EMP1  | COUNTRY1 | c1ccccc1
    	20 | EMP#1  | COUNTRY2 | O=Cc1ccccc1
    	30 | EMP3  | COUNTRY3 | C=CCOC=O
    	40 | EMP4  | COUNTRY4 | c1nc2cncnc2n1
    (4 rows)
    
    DELETE FROM public.chemical_data WHERE empno = 30;
    
    SELECT * FROM public.chemical_data;
     empno | ename | location |  	mol 	 
    -------+-------+----------+---------------
    	10 | EMP1  | COUNTRY1 | c1ccccc1
    	30 | EMP3  | COUNTRY3 | C=CCOC=O
    	40 | EMP4  | COUNTRY4 | c1nc2cncnc2n1
    (3 rows)

Create indexes in molecule columns

For indexing a column containing chemical structures, the following indextypes are provided:

- chemindex  
- sortedchemindex

This example creates indexes in the Molecule columns on the table:

CREATE TABLE public.chemical_data
(
            empno 	INT4,
    	ename 	VARCHAR(50),
    	location  CLOB,
    	mol1   	MOLECULE('sample') NOT NULL,
    	mol2   	MOLECULE('sample')
);

CREATE INDEX chmcal_index1 ON public.chemical_data  USING chemindex(mol1);
CREATE INDEX chmcal_index2 ON public.chemical_data  USING sortedchemindex(mol2);

DESC public.chemical_data;
               	Table "public.chemical_data"
  Column  |     	Type      	| Collation | Nullable | Default
----------+-----------------------+-----------+----------+---------
 empno	| integer           	|       	|      	|
 ename	| character varying(50) |       	|      	|
 location | clob              	|       	|      	|
 mol1 	| molecule('sample')	|       	| not null |
 mol2 	| molecule('sample')	|       	|      	|
Indexes:
	"chmcal_index1" chemindex (mol1)
	"chmcal_index2" sortedchemindex (mol2)

Searching for data from the Molecule column

You can perform substructure, duplicate, superstructure, and full-fragment searches.

Substructure search finds all structures that contain the query structure as a subgraph. Sometimes the chemical subgraph isn't the only thing that's provided. Certain query features that further restrict the structure are also provided. If special molecular features are present on the query, such as stereochemistry or charge, then only those targets match the ones that also contain the feature. However, if a feature is missing from the query, it doesn't need to be missing, by default.

Perform substructure search using the symmetrical sub-/super-structure search operator |<|.

This SQL statement creates a table and inserts data:

CREATE TABLE public.chemical_data
(
            empno 	INT4,
    	ename 	VARCHAR(50),
    	location  CLOB,
    	mol1   	MOLECULE('sample') NOT NULL,
    	mol2   	MOLECULE('sample')
);

INSERT INTO public.chemical_data VALUES (10,'EMP1','COUNTRY1','c1ccccc1','C=CCOC=O');
INSERT INTO public.chemical_data VALUES (20,'EMP2','COUNTRY2','O=Cc1ccccc1','C=CCOC=O');
INSERT INTO public.chemical_data VALUES (30,'EMP3','COUNTRY3','C=CCOC=O','c1nc2cncnc2n1');
INSERT INTO public.chemical_data VALUES (40,'EMP4','COUNTRY4','c1nc2cncnc2n1',NULL);

These SQL statements search data from the Molecule column:

SELECT * FROM public.chemical_data WHERE 'C=CCOC=O'|<| mol1 AND mol2 IS NOT NULL ORDER BY mol1;
 empno | ename | location |   mol1   | 	mol2 	 
-------+-------+----------+----------+---------------
	30 | EMP3  | COUNTRY3  	| C=CCOC=O | c1nc2cncnc2n1
(1 row)

SELECT * FROM public.chemical_data WHERE 'cncn'|<| mol1 ORDER BY mol1 LIMIT 2;
 empno | ename | location | 	mol1  	| mol2
-------+-------+----------+---------------+------
	40 | EMP4  | COUNTRY4   	| c1nc2cncnc2n1 |

Use duplicate searches mainly before database inserts to check whether the given molecule is already in the database.

Perform duplicate searches using the |=| operator.

This SQL statement searches data from the Molecule column:

SELECT * FROM public.chemical_data WHERE 'C=CCOC=O' |=| mol1;
 empno | ename | location |   mol1   | 	mol2 	 
-------+-------+----------+----------+---------------
	30 | EMP3  | COUNTRY3  	| C=CCOC=O | c1nc2cncnc2n1
(1 row)

Superstructure searches find all molecules for which the query is the superstructure of the target. Perform superstructure searches using the sub-/super-structure search operator |>|.

This SQL statement searches data from the Molecule column:

SELECT * FROM public.chemical_data WHERE 'C=CCOC=O'|>| mol1;
 empno | ename | location |   mol1   | 	mol2 	 
-------+-------+----------+----------+---------------
	30 | EMP3  | COUNTRY3  	| C=CCOC=O | c1nc2cncnc2n1
(1 row)

Full-fragment searches are between substructure searches and full searches. The query must fully match a whole fragment of the target. Other fragments might be present in the target, but they are ignored.

This SQL statement searches data from the Molecule column:

SELECT * FROM public.chemical_data WHERE 'C=CCOC=O'|=>| mol1 ORDER BY empno LIMIT 2;
WARNING:  am_functions.cpp:458 Index scan of reverse full fragment search is not supported.
 empno | ename | location |   mol1   | 	mol2 	 
-------+-------+----------+----------+---------------
	30 | EMP3  | USA  	| C=CCOC=O | c1nc2cncnc2n1
(1 row)

Import data using COPY

You can import data into the table that has a Molecule column using the COPY command. For example:

  1. Create a .csv file containing comma-separated data:

    cat /home/edb/Desktop/m_data.csv
    10,EMP1, COUNTRY1, c1ccccc1
    20,EMP2, COUNTRY2, O=Cc1ccccc1
    30,EMP3, COUNTRY3, C=CCOC=O
    40,EMP4, COUNTRY4, c1nc2cncnc2n1
    
  2. Using this SQL statement, create the table:

    CREATE TABLE public.chemical_data
    (
                empno 	INT4,
        	ename 	VARCHAR(50),
     	   	location  CLOB,
        	mol1   	MOLECULE('sample')
    );
    
  3. Load data into the table:

    COPY public.chemical_data FROM '/home/edb/Desktop/m_data.csv' (FORMAT csv);

You can then fetch data from the table using the following SQL statement:

SELECT * FROM public.chemical_data ORDER BY empno;
 empno | ename | location | 	mol1 	 
-------+-------+----------+---------------
	10 | EMP1  | COUNTRY1 | c1ccccc1
	20 | EMP2  | COUNTRY2	| O=Cc1ccccc1
	30 | EMP3  | COUNTRY3  	| C=CCOC=O
	40 | EMP4  | COUNTRY4   	| c1nc2cncnc2n1
(4 rows)

Could this page be better? Report a problem or suggest an addition!