
SMILES (Simplified Molecular Input Line Entry System)

Definition

SMILES is a specification for describing the structure of chemical molecules using short ASCII strings. These strings encode atoms, bonds, and molecular topology in a form that computers can parse easily and that remains reasonably readable to humans. For example, ethanol is written as CCO and benzene as c1ccccc1. Developed in the late 1980s, SMILES has become a standard format for representing molecular structures in cheminformatics and computational chemistry.
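
As a rough illustration of how atoms, bonds, branches, and ring closures appear in a SMILES string, the sketch below splits a few well-known strings into tokens using only the Python standard library. The regular expression covers a simplified subset of the syntax, and the names `tokenize` and `count_atoms` are hypothetical helpers introduced for this example; a cheminformatics toolkit should be used for real parsing.

```python
import re
from typing import List

# Simplified token pattern (an illustrative subset, not a full SMILES grammar).
TOKEN_RE = re.compile(
    r"\[[^\]]+\]"      # bracket atom, e.g. [NH4+]
    r"|Br|Cl"          # two-letter organic-subset atoms
    r"|[BCNOPSFI]"     # one-letter organic-subset atoms
    r"|[bcnops]"       # aromatic atoms (lowercase)
    r"|%\d{2}|\d"      # ring-closure labels
    r"|[-=#:/\\]"      # bond symbols
    r"|[().]"          # branch parentheses and fragment separator
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; reject anything the subset misses."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported or malformed SMILES: {smiles!r}")
    return tokens


def count_atoms(tokens: List[str]) -> int:
    """Count atom tokens (explicitly written atoms; implicit hydrogens are not counted)."""
    return sum(1 for t in tokens if t[0].isalpha() or t[0] == "[")


# Ethanol, acetic acid, and benzene.
for smiles in ["CCO", "CC(=O)O", "c1ccccc1"]:
    tokens = tokenize(smiles)
    print(smiles, tokens, count_atoms(tokens))
```

In the acetic acid string, the parentheses mark a branch and `=` a double bond; in benzene, the matching `1` labels close the ring and the lowercase letters denote aromatic atoms.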

Importance in Computational Drug Discovery:

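  1. Standardization: SMILES provides a widely supported way to represent chemical structures across databases and computational tools. Because one molecule can be written as several valid SMILES strings, a canonical form is typically used where consistency matters.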
  2. Compatibility: SMILES strings are compatible with various cheminformatics software and tools, facilitating data exchange and interoperability.
  3. Efficiency: SMILES strings are compact and efficient, making them suitable for large-scale data storage and processing.
  4. Searchability: SMILES allows for easy searching and indexing of chemical structures in databases, enabling rapid retrieval of information.
  5. Algorithmic Processing: SMILES strings can be used as input for various computational algorithms, including molecular modeling, virtual screening, and property prediction.
  6. Machine Learning: SMILES can be used to train machine learning models for predicting molecular properties, activities, and other drug discovery-related tasks.
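
To make the algorithmic and machine learning points above more concrete, here is a minimal sketch of one common preprocessing step: turning SMILES strings into fixed-length integer sequences that a model can consume. It assumes a character-level vocabulary, which is a simplification (it splits two-letter atoms such as Cl into two symbols), and the helper names `build_vocab` and `encode` are hypothetical.

```python
from typing import Dict, List

PAD = "<pad>"  # padding symbol, reserved as id 0


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map every character seen in the data set to a positive integer id."""
    chars = sorted({ch for s in smiles_list for ch in s})
    vocab = {PAD: 0}
    for i, ch in enumerate(chars, start=1):
        vocab[ch] = i
    return vocab


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert a SMILES string to ids and right-pad with 0 up to max_len."""
    ids = [vocab[ch] for ch in smiles]  # raises KeyError on unseen characters
    if len(ids) > max_len:
        raise ValueError(f"{smiles!r} is longer than max_len={max_len}")
    return ids + [vocab[PAD]] * (max_len - len(ids))


data = ["CCO", "CC(=O)O", "c1ccccc1"]
vocab = build_vocab(data)
encoded = [encode(s, vocab, max_len=8) for s in data]
for s, ids in zip(data, encoded):
    print(s, ids)
```

Sequences of this kind can then be fed to an embedding layer or one-hot encoded. Atom-level tokenization is often preferred in practice so that multi-character atoms stay intact.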

Key Tools

  1. RDKit:
    • An open-source cheminformatics library that supports the generation, manipulation, and analysis of SMILES strings. It provides tools for molecular fingerprinting, substructure searching, and property prediction.
  2. Open Babel:
    • An open-source chemical toolbox designed to interconvert chemical file formats, including SMILES. It offers functionalities for molecular data manipulation and analysis.
  3. DeepChem:
    • An open-source library that uses deep learning models for various cheminformatics tasks, including molecule generation and property prediction using SMILES strings.
  4. ChemAxon Marvin:
    • A comprehensive cheminformatics suite that includes tools for drawing, converting, and analyzing chemical structures, supporting SMILES notation.
  5. DeepOrigin Tools Available in Balto:
    • SMILESToWeight: For calculating the molecular weight of compounds using SMILES strings.
    • FuncGroups: For determining functional groups in molecules represented by SMILES.
    • QED: For evaluating the drug-likeness of molecules using SMILES.
    • LogP, LogS, LogD: For predicting the partition coefficient (LogP), aqueous solubility (LogS), and distribution coefficient (LogD) of molecules from SMILES.
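
As an illustration of the kind of workflow these tools support, the sketch below uses RDKit to parse a SMILES string and compute molecular weight, an estimated LogP, and a QED drug-likeness score. It is an assumption-laden example rather than a description of how the Balto tools are implemented: it relies only on public RDKit functions, the `describe` helper is hypothetical, and it returns `None` when RDKit is not installed.

```python
try:
    from rdkit import Chem
    from rdkit.Chem import Crippen, Descriptors, QED
    HAVE_RDKIT = True
except ImportError:  # RDKit is an optional dependency for this sketch
    HAVE_RDKIT = False


def describe(smiles):
    """Return basic descriptors for a SMILES string, or None without RDKit."""
    if not HAVE_RDKIT:
        return None
    mol = Chem.MolFromSmiles(smiles)  # returns None if parsing fails
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return {
        "canonical_smiles": Chem.MolToSmiles(mol),  # RDKit's canonical form
        "mol_weight": Descriptors.MolWt(mol),       # average molecular weight
        "logp": Crippen.MolLogP(mol),               # Crippen LogP estimate
        "qed": QED.qed(mol),                        # drug-likeness, 0 to 1
    }


# "OCC" and "CCO" are two valid spellings of ethanol.
for smiles in ["OCC", "CCO"]:
    print(smiles, describe(smiles))
```

Both inputs describe the same molecule, so the canonical SMILES and the descriptor values should match; this is why canonicalization is commonly applied before storing or comparing structures.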

Literature

Application of Monte Carlo Algorithm to Explore Simplified Molecular-Input Line-Entry System based Molecular Descriptors of BACE1 inhibitors for Therapeutic Application in Alzheimer’s Disease

  • Publication Date: 2018-08-14
  • DOI: 10.5120/ijca2018917745
  • Summary: This study uses Monte Carlo (MC) algorithms to develop QSAR models with molecular descriptors derived from SMILES for BACE1 inhibitors, revealing the significant role of cyclic rings in inhibition.

Evaluating the feasibility of SMILES-based autoencoders for drug discovery

  • Publication Date: N/A
  • DOI: 10.59720/22-248
  • Summary: Investigates the use of SMILES-based generative autoencoders for de novo molecular generation, highlighting the model's tendency to overfit and its capability to distinguish molecules based on molecular weight.

Drug Discovery in the Age of Artificial Intelligence: Transformative Target-Based Approaches

  • Publication Date: 2024-11-01
  • DOI: 10.3390/ijms252212233
  • Summary: Explores the impact of machine learning on target-based drug discovery, emphasizing the use of SMILES for virtual screening and de novo drug design.

Accelerating Drug Safety Assessment using Bidirectional-LSTM for SMILES Data

  • Publication Date: 2024-07-08
  • DOI: 10.48550/arXiv.2407.18919
  • Summary: Proposes a bidirectional long short-term memory (BiLSTM) model for predicting toxicity and solubility from SMILES data, achieving high accuracy.

Computational drug discovery on human immunodeficiency virus with a customized long short‐term memory variational autoencoder deep‐learning architecture

  • Publication Date: 2023-11-27
  • DOI: 10.1002/psp4.13085
  • Summary: Uses an LSTM variational autoencoder for generating potential HIV drugs, demonstrating a promising approach in computational drug discovery.

Molecular Joint Representation Learning via Multi-Modal Information of SMILES and Graphs

  • Publication Date: 2022-11-25
  • DOI: 10.1109/TCBB.2023.3253862
  • Summary: Proposes a framework for joint representation learning using SMILES and molecular graphs, improving feature correspondence and prediction accuracy.

Exploration of Chemical Space Guided by PixelCNN for Fragment-Based De Novo Drug Discovery

  • Publication Date: 2022-12-01
  • DOI: 10.1021/acs.jcim.2c01345
  • Summary: Introduces a novel framework using PixelCNN for fragment-based molecular design with SMILES, capturing periodicity and enabling effective exploration of chemical space.

DuBloQ: Blockchain and Q-Learning Based Drug Discovery in Healthcare 4.0

  • Publication Date: 2022-05-30
  • DOI: 10.1109/iwcmc55113.2022.9824216
  • Summary: Proposes DuBloQ, a novel methodology integrating Q-Learning and blockchain for secure and efficient drug discovery using SMILES strings.

The application of combination of Monte Carlo optimization method based QSAR modeling and molecular docking in drug design and development

  • Publication Date: 2020-02-11
  • DOI: 10.2174/1389557520666200212111428
  • Summary: Reviews the use of Monte Carlo optimization and molecular docking with SMILES-based descriptors for drug design, demonstrating robust correlation with biological activity.

AI-Based Drug Discovery of TKIs Targeting L858R/T790M/C797S-Mutant EGFR in Non-small Cell Lung Cancer

  • Publication Date: 2021-07-28
  • DOI: 10.3389/fphar.2021.660313
  • Summary: Uses AI to discover drug candidates for NSCLC, leveraging SMILES datasets for ligand generation and virtual screening to overcome drug resistance.