Information was scraped from this page on 2023-06-28.
Over 5000 abstract titles from 2023 were used as training data.
Embeddings for these titles were generated through the OpenAI API using the
text-embedding-ada-002 model.
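
As a rough illustration of the embedding step, the sketch below builds a request to the OpenAI embeddings endpoint and extracts the vectors from the JSON response. It is a minimal sketch in Python using only the standard library, not the code actually used here; the function names (build_request, parse_embeddings, embed_titles) are hypothetical, and batching, retries, and rate limiting are left out.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"
MODEL = "text-embedding-ada-002"


def build_request(titles, api_key):
    """Build (but do not send) a POST request for a batch of titles."""
    payload = json.dumps({"model": MODEL, "input": list(titles)}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def parse_embeddings(response_body):
    """Return one embedding vector per input title, in input order."""
    items = json.loads(response_body)["data"]
    # Each item carries the index of the input it belongs to.
    return [item["embedding"] for item in sorted(items, key=lambda d: d["index"])]


def embed_titles(titles, api_key):
    """Send the request; needs network access and a valid API key."""
    with urllib.request.urlopen(build_request(titles, api_key)) as resp:
        return parse_embeddings(resp.read())
```

Each title then becomes one row of the feature matrix, with one column per embedding dimension.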
A penalized logistic regression model was fit using the glmnet
R package.
The tuning parameter was selected using cross-validation.
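
To make the fitting and tuning steps concrete, here is a small Python sketch of penalized logistic regression with the penalty strength chosen by k-fold cross-validation. It is a deliberate simplification and not what glmnet does internally: glmnet fits an elastic-net penalty by coordinate descent along a path of penalty values, whereas this sketch assumes a plain ridge (L2) penalty, fits by gradient descent, and picks the value with the lowest held-out log-loss. All function names, the candidate grid, and the step size are illustrative assumptions, and the pure-Python loops are only practical for small toy data.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_ridge_logistic(X, y, lam, steps=500, lr=0.5):
    """Gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The intercept b is not penalized. Assumes lr * lam stays well below 2,
    otherwise the updates diverge.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def log_loss(X, y, w, b):
    """Mean negative log-likelihood, with probabilities clipped away from 0 and 1."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        prob = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
        prob = min(max(prob, eps), 1 - eps)
        total -= yi * math.log(prob) + (1 - yi) * math.log(1 - prob)
    return total / len(X)


def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Return (best_lambda, scores): lowest mean held-out log-loss over k folds.

    Assumes len(X) >= k so that no fold is empty.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = {}
    for lam in lambdas:
        losses = []
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            w, b = fit_ridge_logistic([X[i] for i in train], [y[i] for i in train], lam)
            losses.append(log_loss([X[i] for i in fold], [y[i] for i in fold], w, b))
        scores[lam] = sum(losses) / k
    return min(scores, key=scores.get), scores
```

Calling cv_select_lambda(X, y, [0.001, 0.1, 1.0]) on a list of feature rows X and 0/1 labels y returns the selected penalty together with the per-value cross-validated losses. In R, the glmnet package provides cv.glmnet for this kind of selection.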
The area under the ROC curve was 0.83 on the training data.
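
For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The Python sketch below computes it from ranks under that definition. The function name roc_auc is hypothetical, labels are assumed to be coded 0/1, and this is not the code that produced the figure reported above.

```python
def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) identity; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

Here labels would be the observed 0/1 outcomes and scores the fitted probabilities (or linear predictors) from the model.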