Gene-Disease-Drug Association

Introduction

LitGENE is an interpretable transformer-based model that integrates textual information through contrastive learning to refine gene representations. This approach leverages over a century of genetic research, employing summary data to predict the functions of genes.

Figure 1: Overview of LitGENE Model

Technical Details

LitGENE incorporates a pre-trained language model augmented with Gene Ontology (GO) knowledge through contrastive learning. This infusion of information enriches gene representations, helping the model achieve superior performance in gene-related tasks.

The model uses BiomedBERT, a BERT-based text encoder specifically trained on biomedical text corpora. Contrastive learning draws the representations of semantically similar genes closer together in the embedding space while pushing apart those of dissimilar genes.
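The pull-together and push-apart behaviour can be illustrated with a small, self-contained sketch. The page does not state which contrastive objective LitGENE uses, so the example below assumes the classic pairwise margin loss, and the function names, the two-dimensional toy vectors, and the margin and learning-rate values are all hypothetical choices made for illustration only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, similar, margin=1.0):
    """Pairwise margin loss (an assumed objective, not necessarily LitGENE's).

    Similar pairs are penalised by their squared distance; dissimilar pairs
    are penalised only while they sit closer together than `margin`.
    """
    d = euclidean(u, v)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


def step(u, v, similar, margin=1.0, lr=0.1):
    """One gradient-descent update of u on the loss above, with v held fixed."""
    d = euclidean(u, v)
    if similar:
        scale = 2.0                          # gradient of d^2 is 2 * (u - v)
    elif 0.0 < d < margin:
        scale = -2.0 * (margin - d) / d      # gradient of (margin - d)^2
    else:
        scale = 0.0                          # already far enough apart
    return [a - lr * scale * (a - b) for a, b in zip(u, v)]


# Hypothetical 2-D "gene embeddings"; real embeddings have many more dimensions.
gene_a = [0.0, 0.0]
gene_b = [0.3, 0.4]

pulled = step(gene_a, gene_b, similar=True)    # treated as a similar pair
pushed = step(gene_a, gene_b, similar=False)   # treated as a dissimilar pair

print("before:", euclidean(gene_a, gene_b))
print("after pull:", euclidean(pulled, gene_b))
print("after push:", euclidean(pushed, gene_b))
```

In a full model the same kind of update would be applied to the text encoder's weights rather than directly to the vectors, so that the encoder learns to place related genes near one another.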

Key Features

Webserver Functionality

The LitGENE webserver provides an interactive platform for users to explore gene-disease-drug associations. The webserver is built using Flask, a lightweight WSGI web application framework in Python, which makes it easy to scale and extend.
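For readers unfamiliar with Flask, the sketch below shows roughly what a lookup endpoint in such an application could look like. It is not the LitGENE webserver's code: the `/query` route, the `term` parameter, and the `TOY_SUMMARIES` table with its placeholder strings are all assumptions introduced for illustration, standing in for whatever model backend the real server calls.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory table standing in for the real model backend.
# The values are placeholders, not actual LitGENE output.
TOY_SUMMARIES = {
    "TP53": "placeholder summary for TP53",
    "BRCA1": "placeholder summary for BRCA1",
}


@app.route("/query")
def query():
    """Return the stored summary for the entity named in ?term=..."""
    term = request.args.get("term", "").strip()
    if not term:
        return jsonify(error="missing 'term' parameter"), 400
    summary = TOY_SUMMARIES.get(term.upper())
    if summary is None:
        return jsonify(error=f"no entry for {term!r}"), 404
    return jsonify(term=term, summary=summary)


if __name__ == "__main__":
    # Development server only; a production deployment would sit behind
    # a WSGI server instead.
    app.run(debug=True)
```

With the development server running, a request such as `/query?term=TP53` would return a small JSON object containing the term and its placeholder summary.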

Users can input a gene, disease, or drug of interest and receive a summary based on extensive biomedical literature. The webserver also features advanced functionalities such as:

Applications

LitGENE has been evaluated on a range of gene-related tasks, including solubility prediction, chromatin state prediction, dosage sensitivity, subcellular localization, and transcription factor target identification. It has shown superior performance on these tasks, demonstrating its broad applicability in genomics and biomedical research. For these predictions, we recommend using the LitGENE software (link); the webserver currently supports only the prediction of related biological entities from input prompts.

Research and Development

LitGENE was developed by a team of researchers from the Department of Computer Science and the Comprehensive Cancer Center at The University of New Mexico. The team includes Ala Jararweh, Oladimeji Macaulay, David Arredondo, Olufunmilola M Oyebamiji, Luis Tafoya, Kushal Virupakshappa, and Avinash Sahu (Principal Investigator).

The findings affirm that unstructured text complements structured databases in improving biomedical predictions, and the work addresses interpretability and bias as prerequisites for deploying AI in healthcare.

Contact

For more information, visit our website or contact us at info@avisahuai.com.