SMILES (Simplified Molecular Input Line Entry System)

Definition

SMILES is a specification for describing the structure of chemical molecules as short ASCII strings. These strings encode atoms, bonds, and molecular topology in a form that computers can parse easily and that remains reasonably readable to humans. For example, ethanol is written as CCO and benzene as c1ccccc1. Developed in the late 1980s, SMILES has become a standard format for representing molecular structures in cheminformatics and computational chemistry.
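To make the encoding concrete, the sketch below splits a SMILES string into its atom, bond, branch, and ring-closure tokens using only the Python standard library. The regular expression is a simplified assumption for illustration and does not cover the full SMILES grammar (for example, it does not validate the contents of bracket atoms); the name tokenize is hypothetical.

```python
import re

# Simplified token pattern: an illustrative assumption, not the full SMILES grammar.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"      # bracket atoms such as [Na+] or [C@@H]
    r"|Br|Cl"          # two-letter atoms, listed before the one-letter B and C
    r"|[BCNOPSFI]"     # one-letter aliphatic atoms
    r"|[bcnops]"       # aromatic atoms
    r"|%\d{2}|\d"      # ring-closure labels
    r"|[-=#$:/\\.()]"  # bonds, stereo bonds, disconnections, branches
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, rejecting unrecognized characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    # If anything was skipped by the pattern, the tokens will not rebuild the input.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


print(tokenize("CC(=O)O"))      # acetic acid
print(tokenize("c1ccccc1"))     # benzene
print(tokenize("[Na+].[Cl-]"))  # sodium chloride
```

Tokenizing on chemically meaningful units rather than single characters keeps two-letter atoms such as Cl and Br intact, which matters for any downstream processing.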

Importance in Computational Drug Discovery

Standardization: SMILES provides a common notation for chemical structures. Because one molecule can be written as several valid SMILES strings, databases and tools typically rely on a canonical form to keep records consistent.
Compatibility: SMILES strings are compatible with various cheminformatics software and tools, facilitating data exchange and interoperability.
Efficiency: SMILES strings are compact and efficient, making them suitable for large-scale data storage and processing.
Searchability: SMILES strings can be indexed and searched as text in databases, enabling rapid retrieval of chemical structures and their associated information.
Algorithmic Processing: SMILES strings can be used as input for various computational algorithms, including molecular modeling, virtual screening, and property prediction.
Machine Learning: SMILES can be used to train machine learning models for predicting molecular properties, activities, and other drug discovery-related tasks.
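As a minimal sketch of how SMILES strings can be turned into numeric input for a machine learning model, the example below builds a character-level vocabulary and encodes each string as a fixed-length list of integer indices. The function names build_vocab and encode are hypothetical, and character-level splitting is a deliberate simplification: it breaks two-letter atoms such as Cl into separate characters, so a real pipeline would more likely use a chemistry-aware tokenizer.

```python
def build_vocab(smiles_list: list[str]) -> dict[str, int]:
    """Map each distinct character to an index; 0 is reserved for padding."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    return {ch: index for index, ch in enumerate(chars, start=1)}


def encode(smiles: str, vocab: dict[str, int], max_len: int) -> list[int]:
    """Encode a SMILES string as integer indices, right-padded with zeros."""
    if len(smiles) > max_len:
        raise ValueError(f"{smiles!r} is longer than max_len={max_len}")
    ids = [vocab[ch] for ch in smiles]  # raises KeyError for unseen characters
    return ids + [0] * (max_len - len(ids))


training_smiles = ["CCO", "C=O"]  # ethanol and formaldehyde
vocab = build_vocab(training_smiles)
encoded = [encode(s, vocab, max_len=5) for s in training_smiles]
print(vocab)
print(encoded)
```

Integer sequences like these are a common starting point for embedding layers in sequence models; the padding index lets strings of different lengths share one batch shape.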

Key Tools

RDKit:
- An open-source cheminformatics library that supports the generation, manipulation, and analysis of SMILES strings. It provides tools for molecular fingerprinting, substructure searching, and property prediction.
Open Babel:
- An open-source chemical toolbox designed to interconvert chemical file formats, including SMILES. It offers functionalities for molecular data manipulation and analysis.
DeepChem:
- An open-source library that uses deep learning models for various cheminformatics tasks, including molecule generation and property prediction using SMILES strings.
ChemAxon Marvin:
- A comprehensive cheminformatics suite that includes tools for drawing, converting, and analyzing chemical structures, supporting SMILES notation.
Deep Origin Tools Available in Balto:
- SMILESToWeight: For calculating the molecular weight of compounds using SMILES strings.
- FuncGroups: For determining functional groups in molecules represented by SMILES.
- QED: For evaluating the drug-likeness of molecules using SMILES.
- LogP, LogS, LogD: For predicting the partition coefficient (LogP), aqueous solubility (LogS), and distribution coefficient (LogD) of molecules from SMILES.
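As a rough open-source illustration of the kinds of SMILES-based calculations listed above, the sketch below uses RDKit to canonicalize a SMILES string and compute a molecular weight, a Crippen logP estimate, and a QED drug-likeness score. It is not the Balto tooling, whose interfaces are not described here; the function name describe is hypothetical, and the example assumes RDKit is installed.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, QED


def describe(smiles: str) -> dict:
    """Parse a SMILES string with RDKit and return a few computed properties."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None when the SMILES cannot be parsed
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    return {
        "canonical_smiles": Chem.MolToSmiles(mol),  # RDKit's canonical form
        "mol_wt": Descriptors.MolWt(mol),           # average molecular weight
        "logp": Crippen.MolLogP(mol),               # Wildman-Crippen logP estimate
        "qed": QED.qed(mol),                        # drug-likeness score in [0, 1]
    }


# Two different but valid SMILES strings for ethanol.
for smiles in ("CCO", "OCC"):
    print(smiles, describe(smiles))
```

Both inputs describe the same molecule, so they should yield the same canonical SMILES, which is what makes canonical forms useful as database keys.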

Literature

Application of Monte Carlo Algorithm to Explore Simplified Molecular-Input Line-Entry System based Molecular Descriptors of BACE1 inhibitors for Therapeutic Application in Alzheimer’s Disease

Publication Date: 2018-08-14
DOI: 10.5120/ijca2018917745
Summary: This study uses Monte Carlo (MC) algorithms to develop QSAR models with molecular descriptors derived from SMILES for BACE1 inhibitors, revealing the significant role of cyclic rings in inhibition.

Evaluating the feasibility of SMILES-based autoencoders for drug discovery

Publication Date: N/A
DOI: 10.59720/22-248
Summary: Investigates the use of SMILES-based generative autoencoders for de novo molecular generation, highlighting the model's tendency to overfit and its capability to distinguish molecules based on molecular weight.

Drug Discovery in the Age of Artificial Intelligence: Transformative Target-Based Approaches

Publication Date: 2024-11-01
DOI: 10.3390/ijms252212233
Summary: Explores the impact of machine learning on target-based drug discovery, emphasizing the use of SMILES for virtual screening and de novo drug design.

Accelerating Drug Safety Assessment using Bidirectional-LSTM for SMILES Data

Publication Date: 2024-07-08
DOI: 10.48550/arXiv.2407.18919
Summary: Proposes a bidirectional long short-term memory (BiLSTM) model for predicting toxicity and solubility from SMILES data, reporting high accuracy.

Computational drug discovery on human immunodeficiency virus with a customized long short‐term memory variational autoencoder deep‐learning architecture

Publication Date: 2023-11-27
DOI: 10.1002/psp4.13085
Summary: Uses an LSTM variational autoencoder for generating potential HIV drugs, demonstrating a promising approach in computational drug discovery.

Molecular Joint Representation Learning via Multi-Modal Information of SMILES and Graphs

Publication Date: 2022-11-25
DOI: 10.1109/TCBB.2023.3253862
Summary: Proposes a framework for joint representation learning using SMILES and molecular graphs, improving feature correspondence and prediction accuracy.

Exploration of Chemical Space Guided by PixelCNN for Fragment-Based De Novo Drug Discovery

Publication Date: 2022-12-01
DOI: 10.1021/acs.jcim.2c01345
Summary: Introduces a novel framework using PixelCNN for fragment-based molecular design with SMILES, capturing periodicity and enabling effective exploration of chemical space.

DuBloQ: Blockchain and Q-Learning Based Drug Discovery in Healthcare 4.0

Publication Date: 2022-05-30
DOI: 10.1109/iwcmc55113.2022.9824216
Summary: Proposes DuBloQ, a novel methodology integrating Q-Learning and blockchain for secure and efficient drug discovery using SMILES strings.

The application of combination of Monte Carlo optimization method based QSAR modeling and molecular docking in drug design and development

Publication Date: 2020-02-11
DOI: 10.2174/1389557520666200212111428
Summary: Reviews the use of Monte Carlo optimization and molecular docking with SMILES-based descriptors for drug design, demonstrating robust correlation with biological activity.

AI-Based Drug Discovery of TKIs Targeting L858R/T790M/C797S-Mutant EGFR in Non-small Cell Lung Cancer

Publication Date: 2021-07-28
DOI: 10.3389/fphar.2021.660313
Summary: Uses AI to discover drug candidates for NSCLC, leveraging SMILES datasets for ligand generation and virtual screening to overcome drug resistance.
