New AI Model Forecasts Academic Paper Citations with High Accuracy
WASHINGTON, DC, May 27, 2025 - A new study titled "ForeCite: Adapting Pre-Trained Language Models to Predict Future Citation Rates of Academic Papers" presents a breakthrough in using artificial intelligence to forecast the academic impact of scientific work. Developed by Gavin Hull and Alex Bihlo, the model adapts pre-trained language models to predict how often a scientific paper will be cited, offering an automated tool for gauging research significance.
Posted to arXiv on May 13, 2025, the study introduces ForeCite, which attaches a simple linear regression head to large causal language models (CLMs) and fine-tunes them to predict citation rates. Tested on a dataset of more than 900,000 biomedical papers published between 2000 and 2024, ForeCite achieved a Pearson correlation of 0.826 with actual citation data, outperforming earlier approaches such as gradient boosting and BioBERT-based predictors.
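As a rough illustration of the setup described above, the sketch below applies a linear head to a language model's hidden states and scores the resulting predictions with Pearson correlation. It is a minimal sketch in plain Python, not the authors' code: the function names `linear_head` and `pearson`, the choice of the final token's hidden state as the paper representation, and every number shown are assumptions made for illustration.

```python
import math

def linear_head(hidden_states, weights, bias):
    """Map a sequence of hidden-state vectors to one scalar prediction.

    Assumption: the last token's hidden state stands in for the whole
    paper, a common choice for causal language models. The study itself
    may pool the hidden states differently.
    """
    last = hidden_states[-1]
    return sum(h * w for h, w in zip(last, weights)) + bias

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy hidden states for three hypothetical papers (2 tokens, 3 dimensions each).
papers = [
    [[0.1, 0.0, 0.2], [0.5, 0.1, 0.3]],
    [[0.0, 0.3, 0.1], [0.2, 0.4, 0.1]],
    [[0.2, 0.2, 0.2], [0.9, 0.0, 0.6]],
]
weights, bias = [1.0, -0.5, 2.0], 0.1  # made-up head parameters
actual = [1.2, 0.3, 2.1]               # made-up citation rates

predicted = [linear_head(p, weights, bias) for p in papers]
print(round(pearson(predicted, actual), 3))
```

In a real system the hidden states would come from the pre-trained model and the head's weights would be learned during fine-tuning. Here both are hard-coded so the example stays self-contained.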
The authors conducted rigorous temporal and domain-based validation to ensure the model’s robustness across disciplines and years. Notably, ForeCite maintained strong performance when tested on papers from years not included in training data, showing promise for forward-looking citation predictions.
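The idea behind testing on years absent from the training data can be sketched as a simple temporal holdout split. This is an illustrative sketch only: the `temporal_split` helper, the `year` field, and the cutoff value are hypothetical and are not taken from the study.

```python
def temporal_split(papers, cutoff_year):
    """Train on papers published up to cutoff_year, test on later ones."""
    train = [p for p in papers if p["year"] <= cutoff_year]
    test = [p for p in papers if p["year"] > cutoff_year]
    return train, test

# Made-up records; a real dataset would also carry text and citation rates.
records = [
    {"title": "Paper A", "year": 2005},
    {"title": "Paper B", "year": 2015},
    {"title": "Paper C", "year": 2021},
    {"title": "Paper D", "year": 2023},
]

train_set, test_set = temporal_split(records, cutoff_year=2019)
print(len(train_set), len(test_set))
```

Because every test paper is newer than every training paper, a model that scores well on such a split has shown some ability to generalize forward in time rather than merely interpolating within familiar years.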
The study also raises questions about model interpretability. Saliency analysis suggests that titles and abstracts have outsized influence on citation predictions—potentially reflecting biases in academic publishing or discoverability trends.
By automating citation prediction at scale, ForeCite could have significant implications for funding decisions, hiring, and publication strategies, though the authors caution against over-reliance without human oversight.
Sources:
https://arxiv.org/abs/2505.08941