1. Title and abstract

1-1. Title
Use broad AI terms like “artificial intelligence” or “machine learning” in the title, reserving specific model details for the abstract.

1-2. Abstract
Provide a structured summary including the study design, population details, AI algorithm type, statistical analysis methods, primary and secondary outcomes, main results, conclusions, and any public availability of the software, data, or model.
2. Introduction
Provide the scientific and clinical background, including pre-existing evidence for AI use, the related clinical problems, and the clinical role the AI plays in solving them; the research questions answered using the AI model; and the study objectives and hypotheses.
3. Methods

3-1. Study design, setting, and population

3-1-1. Study design
Clearly define the study design (prospective or retrospective), the specific objectives (feasibility, superiority, prediction), and the reference standard used to benchmark the AI model’s performance.
3-1-2. Study settings
Provide comprehensive details on the study settings, including the type, size, and specific location of the environment (e.g., hospital type, clinic area); the availability of relevant facilities or technologies; the representativeness of real-world clinical conditions; and any technical configurations or site-specific adaptations. These details clarify the limits of generalizing the AI model.

3-1-3. Study population
Describe the participant recruitment process, including the inclusion and exclusion criteria, and illustrate the number of participants at each stage with a flow diagram, if possible.
3-2. Description of the AI system

3-2-1. Study data
· Data source and collection: specify the origin of the input data and the collection period. Include any devices or software used, along with detailed acquisition protocols.
· Data structuring: indicate whether the data are structured or unstructured.
· Data preprocessing and transformation: describe the preprocessing steps used to standardize and format the data for the AI system. Include the criteria for minimum data quality and how outliers or missing data were handled.
· Investigator expertise: if data collection relied on investigator expertise, list the number of investigators and their qualifications, any training provided, and the methods used to address inter- and intra-investigator variability.
· De-identification and privacy protection: provide details on the methods used for de-identifying data and protecting personal health information.
· Ground truth annotation: clearly define the ground truth reference (e.g., gold-standard clinical measurements) used for model validation. Ensure the definition is precise and reproducible.
· Data processing status: state whether the data were pre-processed before analysis or processed during AI system application, and specify whether the data were generated during or before AI use.
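The inter- and intra-investigator variability mentioned above is commonly quantified with an agreement statistic such as Cohen’s kappa. A minimal, standard-library sketch (the two annotators and their labels are hypothetical):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items:
    chance-corrected agreement, 1.0 = perfect, 0.0 = chance level."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Expected chance agreement from each annotator's label frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical annotations from two investigators on ten cases
rater1 = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg", "neg"]
rater2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "neg"]
kappa = cohens_kappa(rater1, rater2)
```

Reporting the statistic alongside the number of raters and their training makes the variability claim verifiable.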
3-2-2. Study output
· Definition of AI output: specify the AI system’s output in relation to the clinical problem. Ensure that the formats of the AI output and the ground truth are uniform.
· Impact on clinical management: describe how the AI output guides clinical management, especially if it influences clinical outcomes. If researchers perform standardized clinical actions based on the AI output, document this.
· Interpretability: confirm that the researchers understand and can interpret the AI output, which determines whether standardized actions are required (e.g., administering medication based on a probability threshold generated by the AI).
· Reference protocol comparison: if comparing AI performance to a reference clinical protocol, provide details on how the AI system and the reference are used to guide clinical decision-making. Explain the rationale for using the reference standard, including its limitations, errors, and potential biases.
3-2-3. Data separation
· Dataset splitting: describe how the dataset was initially split (e.g., into training, validation, and test sets), including the proportions and rationale for each subset. Preferably, use an external test set for validation; if internal validation is used, explain and justify the method.
· Population representation: ensure that the test set represents the target population, using methods such as stratified sampling if necessary. Report and statistically compare the distribution of key variables across the training, validation, and test sets, and investigate any systematic differences.
· Prevention of overfitting and information leakage: outline the methods used to prevent overfitting, such as k-fold cross-validation. Ensure that the dataset was split at the outset and that the sets remained separate to prevent information leakage. Describe the measures taken against both issues to ensure the model generalizes beyond the training data.
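The splitting and stratification steps above can be sketched as follows. This is an illustrative, standard-library example (the cohort size, label mix, split fractions, and seed are assumptions), not a prescribed implementation; note that the split is made once, up front, so no case can leak between sets:

```python
import random
from collections import defaultdict

def stratified_split(items, labels, fractions=(0.7, 0.15, 0.15), seed=42):
    """Split items into train/validation/test sets while preserving the
    label proportions within each set (stratified sampling)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    train, val, test = [], [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test

# Hypothetical cohort: 100 cases with 20% positives
cases = list(range(100))
labels = [1 if i < 20 else 0 for i in cases]
train, val, test = stratified_split(cases, labels)
```

In practice, library routines (e.g., a stratified split utility in a machine learning framework) do the same job; the point is to fix and document the partition before any modeling begins.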
3-2-4. Concise description of the AI system
· Model selection and task specification: identify the AI model type, the intended task (e.g., classification or regression), and any specific beneficiaries. Include the scientific rationale for the model selection.
· Algorithm and supporting environment: describe the mathematical algorithm, hardware, and software (with versions) supporting the AI system, as well as developer or manufacturer details and configuration settings.
· Model versions from previous studies: cite prior development or validation studies, presenting the AI system information as it was used in those studies to facilitate performance comparison. If the AI was modified, provide the scientific rationale for the changes and describe the modifications clearly.
· Supplemental data for new models: provide unpublished models as supplementary materials or register them in a public database with accession details to ensure version history and transparency.
· AI model architecture: fully document the model architecture to enable replication, detailing the inputs, outputs, components, and the scientific rationale for each. Include architectural elements such as layers, activation functions, pooling and normalization types, dropout layers, optimization algorithms, and hyperparameters.
· Version history and identification: for clinical trials, provide a unique device identifier or regulatory marker for the AI model. If the model has multiple versions, document the modifications and the rationale behind each change.
· Reporting standards: use a standardized reporting format for the concise AI model description, when possible.
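One possible standardized format for the concise model description is a machine-readable record stored with the study artifacts. Everything in this sketch (the model name, version, architecture fields, and hyperparameter values) is an illustrative assumption, not a mandated schema:

```python
import json
import platform

# Hypothetical model record; field names and values are illustrative only.
model_record = {
    "model_name": "example-classifier",
    "version": "1.2.0",
    "task": "binary classification",
    "architecture": {
        "type": "feed-forward network",
        "hidden_layers": [64, 32],
        "activation": "relu",
        "dropout": 0.2,
    },
    "training": {"optimizer": "adam", "learning_rate": 0.001, "batch_size": 32},
    # Capturing the runtime environment supports reproducibility
    "environment": {"python": platform.python_version()},
}

record_json = json.dumps(model_record, indent=2, sort_keys=True)
```

Keeping one such record per model version gives reviewers an unambiguous version history.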
3-2-5. Model training
· Detailed training process: document all training processes in detail to ensure reproducibility, ideally providing the code.
· Data augmentation: describe any data augmentation techniques used (e.g., geometric transformations, paraphrasing, noise introduction) if required for specific data types, such as images or text.
· Parameter initialization: explain the initialization method for the model parameters. For random initialization, describe the distributions from which the parameters are drawn (e.g., uniform, normal) and the key parameters of those distributions. For transfer learning, provide the source of the initial parameters. Specify any combination of random initialization and transfer learning, indicating unmodifiable parameters where applicable.
· Convergence monitoring: provide details on the methods used to monitor model convergence, including pre-defined stopping criteria and hyperparameters. If convergence was not achieved, describe any adjustments considered (e.g., feature scaling, learning rate, batch size, architectural changes).
· Validation metrics: specify the metrics used to validate model performance (e.g., sensitivity, specificity, precision, area under the receiver operating characteristic curve, mean squared error).
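A pre-defined stopping criterion for convergence monitoring is often implemented as early stopping on a validation metric. A minimal sketch, assuming a patience-based rule and a hypothetical validation-loss trace (the `patience` and `min_delta` values are illustrative hyperparameters that should be reported):

```python
def train_with_early_stopping(val_losses, patience=3, min_delta=1e-4):
    """Return the (0-indexed) epoch at which training would stop:
    after `patience` consecutive epochs without an improvement
    of at least `min_delta` in validation loss."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1  # ran to the end without triggering

# Hypothetical validation-loss trace: improves, then plateaus
trace = [0.90, 0.70, 0.55, 0.54, 0.54, 0.54, 0.54, 0.54]
stop = train_with_early_stopping(trace, patience=3)
```

Reporting the rule, its hyperparameters, and the epoch at which it fired makes the training endpoint reproducible.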
3-2-6. Model evaluation
· Performance metrics: specify the primary performance metric for model selection (e.g., area under the receiver operating characteristic curve, accuracy, or precision for classification; mean squared error for regression). Present metrics with statistical uncertainty (e.g., 95% CIs) and compare them between models using appropriate statistical tests.
· Model evaluation methodology: provide a rationale for the chosen model evaluation method, allowing for flexibility in method selection rather than adherence to a predefined protocol.
· Multiple model selection: if more than one model is chosen, justify the selection. For ensemble models, provide 1) the training data allocation to each model, 2) the combination function used to resolve disagreements among models, and 3) a full description of each model in the ensemble.
· Comparison with existing models: if applicable, compare the final model with previously published models addressing the same clinical problem.
· Sensitivity analysis: assess model robustness through sensitivity analysis, testing the model under various assumptions or initial conditions.
· Result interpretation: provide clear guidance on interpreting the AI model results to prevent misapplication in clinical settings.
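Reporting a metric with statistical uncertainty can be done, for example, with a percentile bootstrap. A standard-library sketch for a 95% CI around the AUC (the example labels and scores below are hypothetical):

```python
import random

def auc(y_true, scores):
    """AUC via the Mann-Whitney statistic: the probability that a random
    positive case scores higher than a random negative case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in sample]
        if 0 < sum(yt) < n:  # resample must contain both classes
            stats.append(auc(yt, [scores[i] for i in sample]))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi

# Hypothetical test-set labels and model scores
ci = bootstrap_auc_ci([0, 0, 0, 1, 1, 1], [0.2, 0.3, 0.6, 0.4, 0.7, 0.9])
```

The percentile bootstrap is one common choice; bias-corrected (BCa) intervals or analytic formulas are reasonable alternatives, and whichever is used should be named in the report.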
3-3. Miscellaneous aspects of the AI model description

3-3-1. Defining features and response variables
Use common data elements to ensure standardized, consistent, and compatible variable definitions and formats across study settings.
3-3-2. Sample size estimation
Calculate the required sample size, if possible, based on results from a pilot or previous study, to achieve the pre-determined statistical power at an acceptable type I error rate.
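When the primary outcome is a comparison of two proportions, the sample size can be approximated with the usual normal-approximation formula. A standard-library sketch (the pilot proportions of 70% vs. 85% are hypothetical):

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided comparison of two
    proportions, using the normal-approximation formula:
    n = (z_{1-a/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical pilot estimates: 70% success under standard care vs. 85% with the AI model
n = n_per_group(0.70, 0.85)
```

For continuous or time-to-event outcomes, the corresponding power formula or dedicated power-analysis software should be used instead, and the assumed effect size and its source should be stated.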
4. Results

4-1. Study data
· Flowchart for inclusion/exclusion: use a flowchart or diagram to illustrate the inclusion and exclusion of participants or data at each stage, based on the specified criteria. Include the number of participants or data points included or excluded and the corresponding criteria. If a flowchart is already provided in the Methods section, disregard this item.
· Dataset preparation: confirm that the dataset was prepared as planned in the Methods section. If statistical comparisons among the partitioned data are already reported in the Methods section, disregard this item.
· Population characteristics: describe and compare the characteristics of the training and test sets to detect any dataset shifts.
· Selective reporting of baseline characteristics: report the baseline characteristics relevant to the AI model’s task or the study outcomes.
· Missing data: clearly report the amount of missing data across the dataset’s features.
4-2. Model performance
· Reporting metrics with statistical uncertainty: report performance metrics on the training, validation, and test sets, including statistical uncertainty and significance, as outlined in the Methods section. Provide the scientific rationale for each selected metric.
· Clinical translation of model performance: evaluate and justify how the model performance metrics relate to clinical outcomes. Statistically compare the performance of the final model to that of standard techniques or baseline models.
· Feature contribution analysis: describe the contribution of each feature to model performance, with relevant plots for visualizing feature contributions if applicable.
· Sub-group performance: report sub-group analyses, identifying the groups for which the model performed best and worst, as well as any critical sub-group performance.
· Sensitivity analysis: for classification models, describe cases based on model confidence and prediction correctness; for regression models, report the cases with the largest and smallest differences between predicted and actual values.
· Unsupervised model assessment: assess the accuracy and relevance of unsupervised model outputs through expert review.
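Sub-group performance reporting can be as simple as computing the chosen metric separately per subgroup. A minimal sketch for per-group sensitivity (the predictions and the grouping variable are hypothetical):

```python
from collections import defaultdict

def subgroup_sensitivity(y_true, y_pred, groups):
    """Sensitivity (true-positive rate) computed separately for each
    subgroup; large gaps between groups warrant investigation."""
    tp = defaultdict(int)
    fn = defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        if yt == 1:  # only positive cases contribute to sensitivity
            if yp == 1:
                tp[g] += 1
            else:
                fn[g] += 1
    return {g: tp[g] / (tp[g] + fn[g]) for g in set(tp) | set(fn)}

# Hypothetical test-set predictions stratified by sex
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
groups = ["F", "F", "F", "F", "F", "M", "M", "M", "M", "M"]
rates = subgroup_sensitivity(y_true, y_pred, groups)
```

Pair each per-group estimate with its denominator (and, ideally, a confidence interval), since small subgroups produce unstable rates.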
4-3. Use of the AI model in clinical practice
· Adherence to protocols: report the investigators’ adherence or non-adherence to the study protocols for AI model use in clinical practice. Include details on any cases of non-adherence, particularly if the AI model was not used as planned.
· Impact on medical practice and patient experience: document any unexpected changes in medical practice or patient experience caused by AI model usage. Note any additional procedures, manual data handling, or increased workload associated with AI integration.
· External influences on model performance: report any external changes or influences, beyond the AI model implementation itself, that may affect model performance or the conduct of the study.
· Modifications to the AI algorithm: describe any modifications made to the AI algorithm during the study.
· Error handling: document instances of both agreement and disagreement between the AI model’s recommendations and the investigators’ decisions, relative to the ground truth, to assess the model’s reliability in clinical decision-making.
5. Discussion
· Summary of study results: summarize the study findings, their contribution to advancing knowledge, their clinical implications, and their impact on the relevant academic field. Indicate whether the study results support the use of the AI model compared with previous studies or current standards, referring to performance metrics when applicable.
· Study limitations: identify the study limitations related to materials and methods, unexpected results, statistical uncertainty, biases, generalizability, challenges in clinical application, and unanswered questions. Balance these limitations against the study’s strengths to assess the extent of the evidence supporting the AI model’s potential benefits.
· Human factors: discuss the effects of human factors on model performance.
· Future actions: outline potential future actions based on the study results.
· Safety issues: address any safety concerns and propose specific strategies to mitigate these issues in future studies, along with the rationale for each approach.
6. Public accessibility of the AI system, source code, and raw data
· Source code and training data availability: ensure that the algorithm, source code, and training data, as well as the data collected during the study, are shared publicly.
· Code documentation: provide the code in well-documented, easily understandable scripts or notebooks, with clear explanations and annotations.
· Formatted data and software/hardware requirements: share the formatted raw data used as model input, along with the versions of the libraries, packages, modules, and software components, and any specific computer system configuration requirements necessary for the code to function.
· Repository access information: include accessible links to repositories, contact information, or instructions for obtaining access to all relevant files.
· Intermediate results: generate and share as many intermediate outputs as possible at each stage of model development.
· Access restrictions: if access to the AI system or data is restricted due to proprietary or licensing issues, clearly state the reason. If privacy concerns limit access to the training data, release at least the model’s source code for public access.
· Levels of sharing: refer to the existing categorization of levels of sharing and adhere to model repository or journal-specific policies on data and code sharing.
7. Other information

7-1. Pre-registration of AI research to prevent p-hacking
Pre-register the study design, the data handling (including strict separation of the training, validation, and test datasets), the model configurations, and the parameter tuning procedures.
7-2. Safety issues related to errors in AI model use in medical practice
· Misleading recommendations: indicate whether the AI model’s recommendations were found to mislead clinical practice, affecting patient safety and clinical outcomes. Specify who is responsible for clinical decisions at each step of the clinical pathway.
· Reporting errors: document AI algorithm errors, errors external to AI model use, and human errors, including their occurrence rates, causes, and impacts on the clinical pathway, the study outcomes, and patient safety.
· Error detection and management: describe how errors were detected, managed, and corrected, noting whether AI algorithm or human errors were identified before they jeopardized patient safety.
· Risk reduction efforts: outline any efforts made to reduce the risks caused by AI-related and external errors.
· Reporting adverse events: report all direct and indirect, expected and unexpected adverse events related to AI model use, misuse, or even correct use, along with strategies to mitigate these events.
· Risk assessment for patient safety: identify and assess the relevant risks to patient safety associated with AI model use.
· Learning curve metrics: provide learning-curve metrics for the investigators involved in data collection for AI model development. Present these metrics chronologically, with a graphical representation, if possible.
7-3. Human errors in AI model use
· Data preparation method: specify whether the data were prepared manually or by an automated algorithm. If automated, describe the tools, algorithms, and parameters used in the process.
· Researcher training for data selection: if the input data were selectively acquired by the researchers, confirm that they were fully trained in a standardized data selection protocol. Clarify whether this protocol can be accommodated in real clinical practice.
· Researcher training for AI model use in clinical practice: confirm that the researchers using the AI model in clinical settings received adequate training.
7-4. Errors external to the AI system
Document external factors that could influence AI system performance in clinical settings.
7-5. Ethical considerations regarding equity
· Fairness and equity assessment: describe the efforts made to assess and promote fairness and equity in the AI model, acknowledging any existing inequities in current healthcare standards.
· Inclusion of underrepresented groups: ensure the input data adequately include underrepresented groups (e.g., racial or ethnic populations) and relevant features to support fair prediction. |
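One simple check along these lines is to compare each group’s share of the study data with its share of the target population; large negative gaps flag under-representation. All numbers in this sketch are hypothetical:

```python
from collections import Counter

def representation_gap(sample_groups, population_shares):
    """For each group, return (share in study data) - (share in target
    population); negative values indicate under-representation."""
    counts = Counter(sample_groups)
    total = sum(counts.values())
    return {
        group: counts[group] / total - share
        for group, share in population_shares.items()
    }

# Hypothetical dataset of 10 records vs. assumed population shares
sample = ["A"] * 7 + ["B"] * 2 + ["C"] * 1
population = {"A": 0.5, "B": 0.3, "C": 0.2}
gaps = representation_gap(sample, population)
```

A representation check is only a starting point; pairing it with per-group performance metrics gives a fuller picture of fairness.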