a machine learning model for Accurate Identification of m6A RNA modification Sites
Abstract:
Recent developments in sequencing techniques have enabled the transcriptome-wide detection of RNA modifications, which post-transcriptionally alter RNA splicing, translational activity, and stability. N6-methyladenosine (m6A), the most well-studied human RNA modification, has been associated with various diseases, such as cancers, neurological disorders, and cardiovascular diseases. Current research investigates the role of m6A regulators as therapeutic targets and diagnostic markers, and a deeper understanding of m6A modifications could significantly advance the field of personalised medicine. Several computational models have been used to predict RNA modification sites from validated experimental datasets, accelerating the discovery of m6A applications. Nevertheless, a more comprehensive characterisation of RNA modification sites could further improve prediction accuracy. This study presents AI-m6ARS, a novel interpretable machine learning method for accurately identifying m6A RNA modification sites using high-resolution experimental data. AI-m6ARS encodes RNA sequences through four types of variables: one-hot encoding, iFeatures, conservation scores, and geographic features, which together improve site characterisation. Using two robust feature selection methods and an ensemble-based machine learning algorithm, the resulting AI-m6ARS model demonstrated generalisable predictive performance across different validation sets.
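As a brief illustration of the first of the four variable types, one-hot encoding turns each nucleotide in a sequence window into a four-element indicator vector. The sketch below is a minimal example and is not the AI-m6ARS implementation: the function name `one_hot_encode`, the A/C/G/U column order, the 5-nt example window, and the all-zero treatment of unknown bases are all assumptions made for illustration.

```python
# Hypothetical helper for illustration only; not taken from AI-m6ARS.
NUCLEOTIDES = "ACGU"  # assumed column order


def one_hot_encode(sequence: str) -> list[list[int]]:
    """Return one 4-element indicator vector per nucleotide (order A, C, G, U).

    DNA-style 'T' is treated as 'U'. Any other symbol (e.g. 'N') maps to an
    all-zero vector, which is an assumption rather than a documented choice.
    """
    index = {nt: i for i, nt in enumerate(NUCLEOTIDES)}
    encoded = []
    for nt in sequence.upper().replace("T", "U"):
        row = [0] * len(NUCLEOTIDES)
        if nt in index:
            row[index[nt]] = 1
        encoded.append(row)
    return encoded


# Illustrative 5-nt window with the candidate adenosine at the centre.
window = "GGACU"
# Flatten the per-position vectors into one feature vector for a classifier.
features = [bit for row in one_hot_encode(window) for bit in row]
```

A window of length n therefore contributes 4n binary variables, which would then be combined with the other variable types before feature selection.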
Important Information: