Learning curves for three specific procedures by anesthesiology residents using the learning curve cumulative sum (LC-CUSUM) test
Abstract
Background
The learning curve cumulative sum (LC-CUSUM) test is an innovative tool that allows quantitative monitoring of individual medical performance during the learning process by determining when a predefined acceptable level of performance is reached. This study used the LC-CUSUM test to monitor the learning process and failure rate of anesthesia residents training for specific subspecialty anesthesia procedures.
Methods
The study included 490 tracheal punctures (TP) for jet ventilation, 340 thoracic epidural analgesia (TEA) procedures, and 246 fiberoptic nasal intubations (FONI) performed by 18 residents during their single 6-month rotation.
Results
Overall, 27 (14–52), 19 (5–41), and 14 (6–33) TP, TEA, and FONI procedures were performed, respectively, by each resident. In total, 2 of 18 residents achieved an acceptable failure rate for TEA according to the literature and 4 of 18 achieved an acceptable failure rate for FONI, while none of the residents attained an acceptable rate for TP.
Conclusions
A single 6-month rotation in a reference teaching center may not be sufficient to train residents to perform specific or sub-specialty procedures as required. A regional learning network may be useful. More patient-based data are necessary to conduct a risk adjustment analysis for such specific procedures.
Introduction
As assessments of healthcare quality become more important in medical practice [1], statistical process-control methods initially developed to monitor the quality of manufactured goods are being increasingly used to monitor clinical performance [2]. The cumulative sum (CUSUM) is one of the methods used in the medical field. The learning curve CUSUM (LC-CUSUM) test is an alternative to the CUSUM test that was developed to focus specifically on the learning period of a procedure [3,4]. Statistical process-control methods were first used to monitor surgical performance in pediatric cardiac surgery [5] and are still used to monitor cardiac surgeons' performance and their patients' outcomes [6]. Statistical process-control methods have been used in some studies to construct learning curves for anesthesiology procedures, such as peripheral vein cannulation [7], tracheal intubation and mask ventilation [7,8,9], epidural analgesia [8], transverse abdominal plane block [10], or ultrasound assessments for epidural analgesia [11,12].
The French anesthesia residency program has certain mandatory rotations, such as pediatric and obstetric anesthesia, but there is no obligation for the subspecialty rotation. Other anesthesia training programs offer sub-specialty rotations scheduled for managing specific procedures [13]. However, technical competency for specific procedures in France improves only during an elective subspecialty rotation.
Oncologic abdominal and cervicofacial surgery comprises a large part of our activities, and our department is considered a reference center for fiberoptic nasotracheal intubation (FONI) (> 250 procedures/year), thoracic epidural analgesia (TEA), and tracheal puncture (TP) for high-frequency jet ventilation (HFJV) (> 500 procedures/year) by French academics and residents. We have been searching for an objective and individual performance measurement method to assess the learning process for these procedures correctly. Thus, this study assessed the learning curve of residents performing FONI, TEA, and TP procedures during a 6-month rotation using a dedicated statistical method (LC-CUSUM test).
Materials and Methods
During three consecutive rotations, data from 18 residents (six per rotation) were collected for all performed procedures. After local ethics committee agreement to waive informed consent, the residents were recruited for the LC-CUSUM test evaluation at the beginning of their 6-month rotation in our department. Our current practices remained unchanged.
Protocol and measurements
This study focused on TP, TEA, and FONI. As the participants had already completed most of their residency program, they had already completed the obstetrics rotation and had practiced more than 150 lumbar epidural analgesia procedures. Despite this experience, all residents had performed fewer than 10 TEA procedures before our departmental rotation. They were new to TP for HFJV and most were new to FONI (four FONI procedures for the most experienced resident). The residents were informed about the assessment and the criteria for failure and success of each procedure in our department on the first day of the rotation. The residents received formal teaching about each of the procedures during the first week of the rotation and assisted a staff practitioner with one procedure. Then, the first procedure was performed under active supervision and residents were advised appropriately. The first attempt at each of the three procedures was considered a rehearsal, and no data were collected for the final analysis. Data were collected in a personal paper log book, in which the failure criteria were recorded.
Anesthesia was induced prior to HFJV using a total intravenous anesthesia technique with remifentanil and propofol. Then, the trachea was punctured through the cricothyroid membrane with an arterial Leader Cath 14-gauge catheter (Vygon, Ecouen, France) using the Seldinger method. The criteria for failure were two punctures or more and catheter misplacement after checking during direct hypolaryngoscopy with or without a complication (intraoperative subcutaneous emphysema or hypoxia with SpO2 < 90% were considered) [14].
TEA was performed by locating the epidural space using a blind loss of resistance technique to saline with an 18-gauge Tuohy needle (Portex Epidural minipack; Smiths Medical, Ashford, UK). The catheter was inserted 5 cm into the epidural space. A test dose was administered before anesthesia was induced, and epidural analgesia was used during surgery following the departmental analgesic protocol [15]. The procedure was considered a failure if epidural analgesia was insufficient in the postoperative care unit, if more than two punctures were needed, if the resident was unable to insert the catheter, or if the dura mater was perforated when the catheter was inserted. Fluoroscopy and ultrasound were not used to help guide the needle tip, as all of our final-year residents had already completed the obstetrics rotation and were proficient in obstetric anesthesia, including epidural analgesia, and were fully aware of the complications related to epidural puncture.
FONI was performed under sedation with a remifentanil target-controlled infusion, which is our standard practice. Endoscopy was performed with a video feedback system in all cases to allow confirmation of placement under supervision by a senior practitioner. The FONI procedure was considered a failure if the resident was unable to perform the entire procedure without any intervention from a tutor, if sedation was inadequate (patient expressed unacceptable pain with a numerical verbal scale score > 3, movement, or agitation), or if a complication occurred (hypoxia with SpO2 < 90%, bleeding in airway, or bronchospasm) [16].
All procedures were considered successful if no failure criteria were met.
Data analysis
The LC-CUSUM test was developed to determine when a trainee has reached a predefined level of performance [3]. The LC-CUSUM sequentially tests the null hypothesis that “performance is inadequate” against the alternative that “performance is adequate.” The LC-CUSUM test computes the score $S_t$ from successive outcomes; a success increases the score and a failure decreases it. Numerically, $S_t = \max(0, S_{t-1} + W_t)$ with $S_0 = 0$, where $W_t = \log[(1 - p_0)/(1 - p_0 - \Delta)]$ if the procedure is a success and $W_t = \log[p_0/(p_0 + \Delta)]$ if the procedure is a failure. Here, $p_0$ is the proportion of failure under an adequate performance level, $\Delta$ is an acceptable deviation from adequate performance, and $t > 0$ is the number of attempts. Once the score reaches the predefined limit “h,” the test rejects the null hypothesis in favor of the alternative, and performance is deemed adequate.
The LC-CUSUM score was plotted along the y-axis against successive procedures along the x-axis. As long as the score remained in the continuation region, namely, between the x-axis and the decision limit, “h,” performance was not considered adequate and monitoring was continued. The score increases as the number of failures decreases until it crosses the limit “h” where proficiency is declared. A particular feature of the LC-CUSUM test is that it incorporates a holding barrier at zero that cannot be crossed. Therefore, the “St” score remains zero if the trainee accumulates numerous successive failures and, consequently, the LC-CUSUM remains responsive to a decrease in the number of failures at all times. For example, if failures occur due to a technical misunderstanding, the trainee does not have to compensate for all accumulated failures once the problem is overcome; from that point on, the score builds on successes only.
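The scoring rule and the holding barrier described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' software: the function name `lc_cusum` is hypothetical, the natural logarithm is assumed, and the update is read as "add the weight, then truncate at zero." The example uses the TEA values reported in this section (p0 = 0.10, Δ = 0.05, h = 0.9).

```python
import math

def lc_cusum(outcomes, p0, delta, h):
    """Return (scores, signal) for a sequence of outcomes.

    outcomes: iterable of booleans, True = success, False = failure.
    p0: failure proportion under adequate performance.
    delta: acceptable deviation from adequate performance.
    h: decision limit.
    signal is the 1-based attempt at which the score first reaches h,
    or None if the limit is never reached.
    """
    # Assumption: natural logarithm; the text only writes "log".
    w_success = math.log((1 - p0) / (1 - p0 - delta))  # positive weight
    w_failure = math.log(p0 / (p0 + delta))            # negative weight
    score, scores, signal = 0.0, [], None
    for t, ok in enumerate(outcomes, start=1):
        # Holding barrier: the score can never drop below zero.
        score = max(0.0, score + (w_success if ok else w_failure))
        scores.append(score)
        if signal is None and score >= h:
            signal = t
    return scores, signal

# Hypothetical trainee: three early failures followed by 16 successes.
outcomes = [False] * 3 + [True] * 16
scores, signal = lc_cusum(outcomes, p0=0.10, delta=0.05, h=0.9)
```

In this made-up sequence the score stays at zero through the three initial failures, so the later successes are not spent "paying back" earlier failures, and the limit is reached at attempt 19.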
Based on a consensus from published data [14,15,16], acceptable failure rates (defined proficiency) were 1, 10, and 18% for TP, TEA, and FONI, respectively; deviations from these levels that were considered acceptable were 1, 5, and 9%, and the defined equivalence zones were 2, 15, and 27%, respectively. Computer simulations were performed to obtain limits and their corresponding true and false discovery rates. Tables 1, 2, and 3 show the TP, TEA, and FONI computer simulations. The true discovery rate (TDR) is the proportion of alarms emitted under the acceptable failure rate (alternative hypothesis), and the false discovery rate (FDR) is the proportion of alarms emitted under the non-acceptable failure rate; both rates are defined for a given number of procedures. A high TDR and low FDR are the objectives, but these rates vary together; thus, there is a necessary trade-off between a test that is sensitive and a test that will yield few false alarms. The decision limits were chosen to provide a TDR of 80%, regardless of the FDR; test performance was calculated arbitrarily for 50 of each of the three procedures, as that is the maximum number of procedures that could be performed by each resident according to our database. The number of procedures during which the test is run is also critical to test performance. Namely, the more procedures a trainee is given, the more likely the test will be able to separate the null from alternative hypotheses. For example, if the trainee has a true failure rate of 10% for TEA (adequate level) and the TDR is 80%, there is an 80% chance of the test signaling (indicating an acceptable failure rate) within 50 procedures. If the FDR is 15% and the true failure rate is 20% (inadequate level), there is a 15% chance of emitting a false alarm within 50 procedures.
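The kind of simulation described above can be sketched as a small Monte Carlo experiment. The sketch below is an assumption-laden illustration, not the authors' simulation code: the function name `signal_rate`, the number of simulated trainees, and the random seed are hypothetical, the natural logarithm is assumed, and h = 0.9 is the TEA limit reported by the authors. Each simulated trainee has a fixed true failure rate, and the function returns the proportion of trainees for whom the score reaches h within the given number of procedures.

```python
import math
import random

def signal_rate(true_failure_rate, p0, delta, h, n_procedures,
                n_trainees=5000, seed=0):
    """Estimate the proportion of simulated trainees who signal within n_procedures."""
    rng = random.Random(seed)
    w_success = math.log((1 - p0) / (1 - p0 - delta))
    w_failure = math.log(p0 / (p0 + delta))
    signals = 0
    for _ in range(n_trainees):
        score = 0.0
        for _ in range(n_procedures):
            failed = rng.random() < true_failure_rate
            score = max(0.0, score + (w_failure if failed else w_success))
            if score >= h:
                signals += 1
                break  # stop monitoring this trainee once the limit is reached
    return signals / n_trainees

# TEA: true failure rate 10% (adequate) versus 20% (inadequate), 50 procedures.
tdr = signal_rate(0.10, p0=0.10, delta=0.05, h=0.9, n_procedures=50)
fdr = signal_rate(0.20, p0=0.10, delta=0.05, h=0.9, n_procedures=50)
print(f"estimated TDR = {tdr:.2f}, estimated FDR = {fdr:.2f}")
```

The estimates depend on the simulation settings and need not match the values in Tables 1, 2, and 3 exactly. Rerunning the function with a larger `n_procedures` or a different `h` shows the trade-off between the two rates.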
The decision limits “h” providing an 80% TDR were 0.9 for TEA, 0.375 for TP, and 1.4 for FONI.
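To give a feel for what these limits mean in practice, the following sketch computes the smallest number of consecutive successes that would carry the score from zero to the decision limit for each procedure. It is a back-of-the-envelope illustration under stated assumptions: the natural logarithm is assumed, the names `PARAMS` and `min_consecutive_successes` are hypothetical, and the (p0, Δ, h) triples are the values given in this section.

```python
import math

# (p0, delta, h) per procedure, as reported in the text.
PARAMS = {
    "TP": (0.01, 0.01, 0.375),
    "TEA": (0.10, 0.05, 0.9),
    "FONI": (0.18, 0.09, 1.4),
}

def min_consecutive_successes(p0, delta, h):
    """Smallest run of successes that lifts the score from 0 to at least h."""
    # Assumption: natural logarithm for the success weight.
    w_success = math.log((1 - p0) / (1 - p0 - delta))
    return math.ceil(h / w_success)

min_runs = {name: min_consecutive_successes(*values)
            for name, values in PARAMS.items()}
print(min_runs)
```

Under these assumptions the sketch gives 37 consecutive successes for TP, 16 for TEA, and 13 for FONI. Any failure along the way lowers the score and so raises the number of procedures needed.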
Results
Eighteen residents performed a total of 1,049 procedures during their rotation.
Tracheal puncture
The LC-CUSUM results for TP are presented in Fig. 1. A total of 490 TP procedures were performed. Residents performed a median of 27 procedures (interquartile range [IQR], 21–31). In total, 451 (92%) of the procedures were successful regarding the chosen criteria. Among failures, 25 (5%) were misplacements of the catheter, 11 (2%) resulted in subcutaneous intraoperative emphysema, and 5 (1%) resulted in hypoxic episodes. Two patients had concomitant emphysema and misplacement. The decision limit was never crossed; thus, no trainee was declared proficient with TP at the end of the rotation.
Thoracic epidural analgesia
The LC-CUSUM results for TEA are presented in Fig. 2. A total of 340 procedures were performed (median, 19/resident; IQR, 14–23). According to the criteria, 285 (83%) procedures were successful and the remaining 55 were failures: the resident was unable to insert the catheter in 27 cases (8%), more than two attempts were needed in 13 cases (4%), analgesia was inefficient in 12 cases (4%), and the dura mater was punctured in three cases (1%). The decision limit was crossed by two (11%) residents after 17 and 34 procedures.
Fiberoptic nasotracheal intubation
The LC-CUSUM results for FONI are presented in Fig. 3. A total of 246 procedures were performed with 226 (92%) successes; in two procedures (0.7%) the resident did not complete the entire procedure alone, in one (0.3%) sedation was not managed adequately, and 17 (7%) complications occurred, mostly hypoxia. Residents handled a median of 14 (IQR, 9–16) intubations during their rotation. Two residents performed more than 22 procedures. The decision limit was crossed by four (22%) residents after 13 (n = 3) and 17 procedures.
Discussion
The LC-CUSUM results revealed that all but four residents failed to demonstrate competency during their rotation and therefore were not considered competent in the procedures after their rotation in our department. Our results are not consistent with previously published studies. For example, de Oliveira Filho and Komatsu [7,9] reported that ≥ 50% of residents demonstrate competency for lumbar epidural analgesia or orotracheal intubation procedures. Some explanations for this discrepancy should be considered. The adequate and inadequate performance rates chosen may not have been appropriate. We chose published data [14,15,16] to set performance rates and compare results from studies in other departments. de Oliveira Filho [7] determined the acceptable performance rate for learning lumbar epidural analgesia arbitrarily from samples of procedures performed by instructors. They used 20 and 40% as acceptable and unacceptable failure rates, respectively, and 45% of the residents attained an acceptable failure rate with these limits. Kestin [8] used rates for the same procedure taken from the consensus among staff anesthesiologists in their department (5 and 10% for acceptable and unacceptable failure rates, respectively). They reported that 41% of residents exceeded the chosen failure rate and concluded that the failure rate might be too stringent. They reported that 39–67 procedures were needed to achieve competency in lumbar epidural analgesia, according to the CUSUM test. No information on adequate performance rates for TP and FONI is available in the literature.
The choice of TDR and FDR plays an important part in monitoring efficiency, as our simulations showed. Test performance depends on the difference between the adequate performance level and the acceptable deviation from that level and on the number of procedures during which the trainee is observed. The larger the difference and the longer the observation period, the better the test performance. Our simulations showed that trainees needed to perform approximately 65 FONI (h = 1.67), 140 TEA (h = 1.82), and 520 TP procedures (h = 1.88), for which the difference between performance under the null and alternative hypotheses is extremely low (1%), to obtain a TDR of 80% with an FDR of 5%. If a trainee is denied the appropriate number of procedures, then we are at risk of not detecting proficiency in due time by setting the limit “h” too high or we are at risk of having too many false alarms by setting the limit “h” too low. Instructors could decide not to wait for the required number of procedures when assessing competency. In practice, the test properties are mediocre due to the very stringent level of performance required for some procedures, such as TP. Therefore, a dilemma arises: either the test properties are improved by allowing the trainee to complete a relatively large number of training procedures, or the number of procedures is limited but the properties of the test remain mediocre. Only four residents achieved adequate FONI performance. Using an efficient test to monitor learning, such as the LC-CUSUM, for a 6-month rotation in a reference center may be insufficient to consider that a resident has reached proficiency. This could be an effective approach if a regular collection is paired with an attentive survey from residency heads at the beginning of the residency. However, completing and interpreting charts could be difficult due to discontinuous activity [8,17] and the effectiveness of such an undertaking has not been evaluated.
Another possibility is that the residency is organized differently, so proficiency might not be achieved but performance is acceptable. Rotations could be made to fulfill specific goals rather than fulfill a predefined time period in a department [13]. Residents would have to continue training as long as these goals were not met. An efficient statistical process-control method is mandatory for such an approach to prevent excessive training time on a specific topic [13].
Few studies have investigated teaching specific procedures for airway management. Teaching HFJV is difficult, as this procedure is not in general use, despite its validation as part of airway management algorithms [18]. Although the rate of complications is directly related to the number of procedures performed [19], minimal skill during residency has not been claimed from a simulator setting [13]. Cricothyroidotomy has only been studied on mannequins or cadavers [20,21]. Most FONI studies included either sleeping patients with no difficult airways or mannequins [22,23], and the time to complete these procedures was the main outcome studied, rather than efficacy or safety. Furthermore, such conditions for carrying out the procedures have nothing to do with patients undergoing head and neck surgery. Real-life training seems unavoidable until an acceptable level of training is reached for technical and non-technical skills during complex procedures, such as TEA or FONI, as the use of mannequins is considered inappropriate [23,24].
Using a risk adjustment for a procedure would be of great benefit. Unlike some authors [9,25], we did not use a risk-adjusted test, mainly because we were unable to find external data to quantify patient risk level, except for epidural analgesia [26,27]. This was a limitation of our study and is consistent with the requirement for specific data, as reported by Komatsu et al. [9]. We have started collecting demographic data in a new clinical database to fulfill this requirement concerning airway management.
In conclusion, the LC-CUSUM test enabled learning of unusual and specific procedures by residents to be assessed. Reference centers performing subspecialty procedures may not provide sufficient training for these procedures during a single 6-month rotation. An arbitrary number of attempts or training time may not adequately respond to individual training variations, so consideration should be given to monitoring the learning process from the beginning or redefining the residency curriculum.