Systematic review of learning curves in robot‐assisted surgery

Background: Increased uptake of robotic surgery has led to interest in learning curves for robot‐assisted procedures. Learning curves, however, are often poorly defined. This systematic review was conducted to identify the available evidence investigating surgeon learning curves in robot‐assisted surgery.
Methods: MEDLINE, Embase and the Cochrane Library were searched in February 2018, in accordance with PRISMA guidelines, alongside hand searches of key congresses and existing reviews. Eligible articles were those assessing learning curves associated with robot‐assisted surgery in patients.
Results: Searches identified 2316 records, of which 68 met the eligibility criteria, reporting on 68 unique studies. Of these, 49 assessed learning curves based on patient data across ten surgical specialties. All 49 were observational, largely single‐arm (35 of 49, 71 per cent) and included few surgeons. Learning curves exhibited substantial heterogeneity, varying between procedures, studies and metrics. Standards of reporting were generally poor, with only 17 of 49 (35 per cent) quantifying previous experience. Methods used to assess the learning curve were heterogeneous, often lacking statistical validation and using ambiguous terminology.
Conclusion: Learning curve estimates were subject to considerable uncertainty. Robust evidence was lacking, owing to limitations in study design, frequent reporting gaps and substantial heterogeneity in the methods used to assess learning curves. The opportunity remains for the establishment of optimal quantitative methods for the assessment of learning curves, to inform surgical training programmes and improve patient outcomes.


Introduction
Learning curves describe the rate of progress in gaining experience or new skills and are widely reported in surgery. Surgeons typically exhibit improvements in performance over time, often followed by a plateau where little additional improvement is observed 1. Generally, surgical learning curves are measured as a change in an operative variable (which can be considered a surrogate for surgeon performance) over a series of procedures. Studies investigating learning curves for surgical procedures are becoming increasingly important, as learning curves can have a substantial impact on surgical metrics, clinical outcomes and cost-benefit decisions.
There has been particular interest in learning curves in robot-assisted surgery, especially in gynaecology and urology 2,3 . Despite the reported operative benefits and improved hospital experience provided by robot-assisted surgery compared with traditional minimally invasive approaches 4,5 , uptake of robotic technology has been slow, largely due to high capital and maintenance costs, and uncertainty regarding the potential benefits of robot-assisted approaches over conventional laparoscopic approaches. For example, robot-assisted approaches have been associated with longer operating times for many procedure types 6 . A large proportion of these comparative studies, however, may have been generated from surgeons who were still learning the robotic technology in question 4 , potentially underestimating the full benefits of robotic assistance. Robot-assisted approaches have the potential to expedite surgeon learning, but methods used to measure and define learning curves seem inconsistent 1 . Studies evaluating the learning curve for surgical procedures often aim to determine the number of sequential procedures that comprise the learning curve, or that are required to 'overcome' the learning curve (sometimes referred to as the learning curve length). To achieve this aim, studies often define a particular threshold in surgeon performance. A common threshold includes reaching a plateau in performance, yet the performance thresholds used are highly inconsistent 1 . The way learning curves are described can lead to misinterpretation. Terms such as 'overcome', for instance, could be considered misnomers, implying surgeons have mastered a procedure, for which certain performance thresholds may not provide sufficient evidence. For example, a plateau in performance does not necessarily equate to high-quality performance; it only implies that a surgeon is no longer improving 1 . 
There remains a need to understand better the learning curve of robot-assisted surgery and broadly characterize how learning curves are defined and reported. This systematic review was performed to characterize the current evidence base and appraise the methods used to define and measure learning curves for surgeons performing robot-assisted surgery, taking a holistic, panspecialty view.
*Surgeon experience only stated in studies with multiple robotic study arms. ORL, otorhinolaryngology; TORS, transoral robot-assisted surgery; RKT, robot-assisted kidney transplantation; n.r., not reported; RDP, robot-assisted distal pancreatectomy; LDP, laparoscopic distal pancreatectomy; LRRYGB, laparoscopic robot-assisted Roux-en-Y gastric bypass; TRRYGB, totally robot-assisted Roux-en-Y gastric bypass; RPD, robot-assisted pancreatoduodenectomy; RARP, robot-assisted radical prostatectomy; ORP, open radical prostatectomy; HAI, hepatic artery infusion; REVUR, robot-assisted extravesical ureteral reimplantation; RATS, robot-assisted thoracic surgery; RSC, robot-assisted sacrocolpopexy; LRP, laparoscopic radical prostatectomy; RAT, robot-assisted thymectomy; RRS, robot-assisted rectal cancer surgery; LESS, laparoendoscopic single-site; RC, robot-assisted cholecystectomy; TME, total mesorectal excision; RAG, robot-assisted gastrectomy; RAPN, robot-assisted partial nephrectomy; SSRC, single-site robot-assisted cholecystectomy; LSC, laparoscopic sacrocolpopexy; RALP, robot-assisted laparoscopic prostatectomy; RA-GPEHR, robot-assisted giant para-oesophageal hernia repair; OKT, open kidney transplantation; LND, lymph node dissection.

Methods
of robotic technologies. As such, database searches were limited to the period from 1 January 2012 to 5 February 2018, in order to capture studies investigating learning curves in the context of training relevant to current practice. The search terms used are provided in Tables S1 and S2 (supporting information). The two most recent abstract books of relevant surgical congresses were also searched from 1 January 2016 to 14 February 2018. This review considered only primary research, and excluded review articles. Supplementary hand searches of the bibliographies of relevant systematic reviews were conducted to identify any primary studies not identified elsewhere. The review process was performed by two independent reviewers, who assessed the titles and abstracts of all search results (stage 1), as well as the full texts of all potentially eligible studies identified in the first stage (stage 2). In the event of discrepancies, the two reviewers came to a consensus for each decision. In the absence of a consensus, a third independent reviewer resolved any disagreements.
Eligible publications included any randomized or non-randomized, comparative or observational studies involving wet- or dry-lab testing, simulations, patients or registry/economic analyses that performed a learning curve analysis (consisting of a graph and/or reported data for at least 4 time points) of surgeons performing robot-assisted surgery in any specialty. Studies were required to report learning curve results from more than one surgeon, operating alone or as part of a surgical team, of any specialization (robot-assisted, laparoscopic or open). Where the number of surgeons involved was not reported, included studies were required to have multiple authors. Only studies that included at least 20 surgical procedures in the analysis of the learning curve were considered. At least one of the following metrics had to be reported: time to plateau/number of 'phases' in the learning curve; statistical differences in metrics assessed over time; or learning percentages. Detailed eligibility criteria are shown in Table S3 (supporting information). The studies reported in this review are restricted to learning curve analyses of procedures on patients.
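The headline eligibility rules above can be expressed as a simple screening function. The following is a minimal sketch only: the `StudyRecord` fields and the `is_eligible` function are hypothetical names introduced for illustration, the "graph and/or reported data for at least 4 time points" criterion is simplified to a count of time points, and the full criteria in Table S3 are not reproduced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyRecord:
    """Hypothetical extraction record; field names are illustrative only."""
    n_time_points: int             # time points in the reported learning curve data
    n_surgeons: Optional[int]      # None when the number of surgeons was not reported
    n_authors: int
    n_procedures: int              # procedures included in the learning curve analysis
    reports_required_metric: bool  # time to plateau/phases, change over time, or learning percentage


def is_eligible(study: StudyRecord) -> bool:
    """Apply the headline eligibility rules (simplified; Table S3 holds the full criteria)."""
    if study.n_time_points < 4:
        return False
    if study.n_surgeons is None:
        # Fallback stated in the text: multiple authors when the surgeon count is unreported.
        if study.n_authors < 2:
            return False
    elif study.n_surgeons < 2:
        return False
    if study.n_procedures < 20:
        return False
    return study.reports_required_metric


# Made-up record, for illustration only.
example = StudyRecord(n_time_points=6, n_surgeons=3, n_authors=5,
                      n_procedures=120, reports_required_metric=True)
print(is_eligible(example))  # True
```

In practice, screening was performed by two independent reviewers reading titles, abstracts and full texts; a function of this kind would only document the decision rules, not replace that judgement.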

No. of studies by no. of robotic surgeons per study. One study enrolled only a single robot-assisted surgeon, but was considered eligible because the total number of enrolled surgeons was greater than one (the study also evaluated a surgeon who performed procedures laparoscopically).

Data extraction and quality assessment
For each eligible study, data were extracted into a prespecified grid by one reviewer, with verification by a second, independent reviewer. Where there was a discrepancy, the two reviewers attempted to come to a consensus; a third reviewer resolved any disagreements in the absence of a consensus. Captured data included study design, methodology, surgeon experience, robotic technology used and the metric measured to evaluate the learning curve. Information relating to the learning curve itself was captured, including the number of phases of the curve, the number of operations per phase and the number of procedures to overcome the learning curve (denoted in this review as the point where the chosen performance threshold was considered to have been overcome). Where reported, the specific performance threshold used was captured. If the learning curve had not been overcome within the study period, the number of procedures to overcome the learning curve was reported to be greater than the total number included in the study period.
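The conventions used to record the number of procedures to overcome the learning curve can be summarized as a small lookup. This is a sketch under stated assumptions: the function name and the `status` labels are hypothetical, and the rules for curves overcome before study initiation (recorded as zero) and for studies that did not report the outcome (left out of the tables) are taken from the table notes rather than from the paragraph above.

```python
from typing import Optional


def procedures_to_overcome(status: str, n_reported: Optional[int], n_in_study: int) -> Optional[str]:
    """Return the table entry for one surgeon or study, or None if it is left out of the table.

    status is a hypothetical label: "overcome", "not_overcome",
    "overcome_before_study" or "not_reported".
    """
    if status == "not_reported":
        return None  # not included in the summary tables
    if status == "overcome_before_study":
        return "0"
    if status == "not_overcome":
        # Reported as greater than the total number of procedures in the study period.
        return f">{n_in_study}"
    if status == "overcome":
        if n_reported is None:
            raise ValueError("an overcome learning curve needs a reported number of procedures")
        return str(n_reported)
    raise ValueError(f"unknown status: {status}")


print(procedures_to_overcome("not_overcome", None, 83))  # >83
print(procedures_to_overcome("overcome", 40, 100))       # 40 (100 is a made-up study size)
```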
The quality of each eligible study was assessed using either the UK National Institute for Health and Care Excellence (NICE) RCT checklist 8 or a modified version of the Downs and Black checklist for non-randomized studies 9 .

Results
A total of 2316 records from electronic database searches, conference abstract searches and hand searches were identified. Of these, 281 full-text articles were assessed for eligibility, of which 213 were excluded (Table S4, supporting information). The remaining 68 records (reporting on 68 unique studies) were found to meet the eligibility criteria, 49 of which reported on patient data and are presented here (Fig. 1).

Characteristics of included articles
Characteristics of the 49 eligible studies presenting learning curves derived from patient procedures are presented in Table 1. All were observational in design. Data were analysed retrospectively in 40 of 49 studies (82 per cent), and the remaining nine studies (18 per cent) were prospective. The studies generally included few robotic surgeons (Fig. 2). The captured studies spanned ten surgical specialties (Fig. 3). Learning curves were reported most frequently for urology, general surgery and gynaecology.

Learning curve metrics
Time-based metrics were the most commonly reported variables used to assess the learning curve, reported by 42 of the 49 studies (86 per cent). Other measures, including length of hospital stay, morbidity and mortality rates, and procedure-specific metrics, were reported less commonly (Fig. 4). Of the categories of metrics captured, duration of surgery, length of stay (LOS) and complication rate were reported most frequently within each category. The number of procedures required to overcome the learning curve for these metrics is shown in Table 2.

Duration of surgery
Across 33 studies that investigated the learning curve based on duration of surgery, 27 reported whether the learning curve had been overcome. In 21 of these 27 studies (78 per cent), at least one of the included surgeons was reported to have overcome the learning curve, with the remaining six studies (22 per cent) stating that the learning curve had not been overcome by any surgeon within the number of procedures in the study period (Table 2).
Among the 27 studies that reported whether the learning curve had been overcome, learning curve analyses were conducted for 22 unique procedures, of which only five (23 per cent) were supported by multiple studies. In such instances, the number of patients required to overcome the learning curve for these procedures varied substantially between studies. For example, three studies evaluated the learning curve for duration of surgery for robot-assisted distal pancreatectomy. In one study 47 the learning curve was overcome after 40 patients, whereas in the other two 12,58 it had not been overcome within the study period (of 11 and 83 patients).
Studies that did not report whether the learning curve had or had not been overcome within the study period are not included in this table. *For studies that reported a consistent improvement in metrics across the course of the study, it was assumed that the learning curve had not been overcome within the study period. If the learning curve had not been overcome, the number of procedures to overcome the learning curve was reported to be greater than (>) the total number of procedures in the study period. If the learning curve was reported to have been overcome (or surgeons were reported to be proficient/competent) before study initiation, the number of procedures to overcome the learning curve was recorded as zero. Where results were reported separately for individual surgeons with no clear differences in previous experience, learning curve estimates are reported separately, separated by a semicolon; where the experience of surgeons was intentionally different, individual experience is reported as separate rows and experience level is stated in brackets.
RARP, robot-assisted radical prostatectomy; n.r., not reported; RAPN, robot-assisted partial nephrectomy; RALP, robot-assisted laparoscopic prostatectomy; LRP, laparoscopic radical prostatectomy; HAI, hepatic artery infusion; CUSUM, cumulative sum; RPD, robot-assisted pancreatoduodenectomy; RDP, robot-assisted distal pancreatectomy; RAG, robot-assisted gastrectomy; RC, robot-assisted cholecystectomy; SSRC, single-site robot-assisted cholecystectomy; RSC, robot-assisted sacrocolpopexy; TME, total mesorectal excision; RRS, robot-assisted rectal cancer surgery; RAT, robot-assisted thymectomy; REVUR, robot-assisted extravesical ureteral reimplantation; LND, lymph node dissection.
UC, urinary continence; RARP, robot-assisted radical prostatectomy; RKT, robot-assisted kidney transplantation; OKT, open kidney transplantation; CUSUM, cumulative sum; LND, lymph node dissection; RALP, robot-assisted laparoscopic prostatectomy; TORS, transoral robotic surgery; RPD, robot-assisted pancreatoduodenectomy; TME, total mesorectal excision.

Length of stay
Of five studies reporting LOS, two reported on whether the learning curve had been overcome by at least one robotic surgeon (Table 2). One study 38 estimated that the number of patients required to overcome the learning curve was between 0 and 15. In the other study 35, the learning curve was not overcome within a study period of 185 patients.

Complications
Of nine studies assessing complications, eight reported whether the learning curve was overcome for complication rate. Of these, five 26,31,34,52,58 found that the learning curve for complications had not been overcome for at least one robotic surgeon within the study period. With the exception of one study 31, which included a surgeon with a study period of only 24 patients, these studies generally involved large patient numbers (132-404), and spanned urology, general surgery, gynaecology and cardiovascular specialties (Table 2). Only four studies reported that the learning curve for complications had been overcome. The numbers of procedures were estimated as: 0-84 for robot-assisted sacrocolpopexy 31,36, 12-14 for robot-assisted hysterectomy 55 and 0-15 for robot-assisted total mesorectal excision 38.

Clinical metrics
Of the 49 studies, eight (16 per cent) evaluated whether the learning curve for clinical metrics had been overcome (Table 3). Metrics included oncology-specific metrics such as surgical margin status and recurrence rate, and urology-specific metrics such as urinary continence. The number of procedures to overcome the learning curve varied substantially. Of the two studies assessing urinary continence after robot-assisted radical prostatectomy, one 25 reported that 100 procedures were required to overcome the learning curve, whereas in the other 23 the learning curve was not overcome by any of the four robotic surgeons, with study periods ranging from 112 to 541 patients. For all other clinical metrics, the learning curve was overcome by at least one surgeon during the study period, with a wide range of 0-300 patients to achieve this target.

Within-study comparison between metrics
Some studies in the review evaluated the learning curve using more than one metric. The number of patients to overcome the learning curve was sometimes inconsistent between metrics. For example, of the two studies 35,38 that reported whether or not the learning curve was overcome for both duration of surgery and LOS, one 35 indicated that substantially greater procedural experience was required for LOS, with more than 170 additional patients required to overcome the learning curve based on this metric.

Standards of reporting
The overall standard of reporting and level of detail provided in the included studies were low, often lacking sufficient information to interpret the learning curve for robot-assisted procedures. For example, although 34 of the 49 studies (69 per cent) made some acknowledgement relating to the previous experience of included surgeons (Table 4), only 17 (35 per cent) quantified this experience.
Variability was observed in the performance thresholds used to measure the learning curve. For example, among the 27 studies that defined whether the learning curve for duration of surgery had been overcome (Table 2), the most common performance threshold used was the number of procedures needed to reach a plateau in performance, but other thresholds included the number of procedures to reach a change in phase, or the number to achieve a predetermined skill threshold set by an expert surgeon, and some studies did not specify the performance thresholds used (Fig. 5a). In nine of the 27 studies (33 per cent), no statistical or quantitative assessment of the learning curve was reported beyond visual fit or a qualitative description (data not reported). Similar variation in learning curve definitions was observed for analyses of LOS and complication rates (Fig. 5b,c).
Several studies used these methods to define whether surgeons had achieved a high level of performance, characterized by terms such as proficiency or competency. However, these terms were used inconsistently. In 13 of 14 studies (93 per cent) that employed performance terms, at least one term was used to describe the point where the learning curve had been overcome; 'proficiency' was used for this purpose in ten studies 22,27,28,31,36,38,39,42,50,55 , 'competency' in five studies 10,27,33,36,49 and 'expertise' in one study 50 . In the four studies that reported more than one performance term, the terms were either used interchangeably 36,50 or assigned divergent definitions 33,49 .
Although quality assessment of included studies (Table S5, supporting information) revealed a relatively low risk of bias for several quality assessment items, risk of bias was unclear for a large proportion of the questions, suggesting poor reporting of methodology. In particular, the risk of bias with respect to the blinding of subjects, external validity of included populations and study centres, and statistical power was either high or could not be determined (at least 45 of the 49 studies, 92 per cent).

Discussion
This review identified substantial variation in the lengths of learning curves, included metrics and methods employed to assess the learning curve, as well as the reporting of the analyses and terminology used across ten surgical specialties. Reported learning curve estimates are therefore subject to substantial uncertainty, and the generalizability of these findings is limited.
The results of the 49 eligible studies suggested that surgeon learning curves were complex. They varied substantially between studies, procedures and the specific metrics assessed. A variety of factors could account for much of the variation in reported learning curve length.
The surgeon's previous experience may have been a significant factor; in three of five studies comparing the operating time learning curves of robotic surgeons, those with greater experience required fewer procedures to overcome their learning curve 16,29,38 . Although the captured studies often compared surgeons with different experience levels, such as trainees versus those who had completed training or robotic versus laparoscopic surgeons, studies generally did not report the participants' specific grade or training experience.
Robotic training programmes are becoming increasingly common to enable surgeons to overcome the learning curve faster 59,60 . In addition to previous experience, participation in specific training programmes may influence the learning process. Although based on a small sample size, Guend and colleagues 27 reported that a lower procedure volume was required to overcome the learning curve for robot-assisted colorectal resections for three surgeons who had participated in an institutional training programme (25-30 procedures each), compared with the volume required for an earlier surgeon who joined the institution before the programme was established (74 procedures). Few studies, however, provided details of their training programmes, where these existed. Recent training programmes have considered innovations such as feedback loops that aim to provide specific recommendations for improvement, shortening the time required to achieve adequate performance 61 .
Differences in procedural complexity may also have contributed to variation in the learning curves observed. In surgical practice, following initial improvement and subsequent stabilization of performance, a decline in performance is often observed 62. This decline is thought to reflect the point at which, following mastery of simpler procedures, surgeons take on more challenging, technically complex procedures, which in turn affects the learning curve 63.
Some procedures are inherently more complex and challenging than others. Studies 64-66 of simulated robotic training tasks have observed learning curves of different duration for tasks of varying complexity. Learning curves are likely to be influenced by numerous observed and unobserved confounders. To account better for such differences, and to permit comparisons between studies, enhanced reporting of surgeon baseline characteristics, experience and procedure complexity is required.
This review has highlighted a number of limitations associated with reporting learning curves for robot-assisted surgery. Many studies failed to describe the characteristics of the surgeons, patients or methods of assessment in sufficient detail to make valid comparisons between studies or to enable a study to be reproduced.
Although the majority of captured studies were determined to be of reasonable methodological quality, the studies were observational and usually included few surgeons. These study designs are associated with significant drawbacks, particularly with respect to confounding and selection bias 67,68, although their use is to be expected given that they are better suited to measuring learning curves. Regarding sources of bias in the included studies, the risk of bias was unclear for a large proportion of the quality assessment items, particularly in relation to blinding, external validity and statistical power, suggesting poor reporting of methodology.
There was little consistency in the performance thresholds used to measure the learning curve, making between-study comparisons challenging. A large proportion of studies measured the number of procedures required to reach a plateau in surgeon performance. Although a variety of methods can define quantitatively the point at which a plateau is reached 69-72, there is currently no widely accepted and validated method, and some studies used visual fit alone 1. The reported numbers of procedures required to overcome learning curves are therefore subject to considerable uncertainty. Several studies measured the number of procedures to achieve a threshold set by experts. These were sometimes based on the performance of expert robotic surgeons 50, whereas others 38,56 included expert laparoscopic surgeons. Many studies did not report the specific performance thresholds used to measure the learning curve, precluding any ability to make comparisons.
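As an illustration of one quantitative approach, cumulative sum (CUSUM) analysis, which is listed among the abbreviations in the summary tables, can be sketched as follows. The formulation below (cumulative deviation of each case from the series mean, with the peak of the curve taken as the transition point) is one common variant and is an assumption here, not the method of any particular included study. The function names are hypothetical and the operating times are invented for illustration.

```python
from typing import List, Sequence


def cusum(values: Sequence[float]) -> List[float]:
    """Cumulative sum of deviations from the series mean, one entry per consecutive case."""
    mean = sum(values) / len(values)
    running = 0.0
    curve = []
    for value in values:
        running += value - mean
        curve.append(running)
    return curve


def cusum_peak_case(values: Sequence[float]) -> int:
    """1-based case number at which the CUSUM curve peaks (first peak if tied)."""
    curve = cusum(values)
    return max(range(len(curve)), key=curve.__getitem__) + 1


# Invented operating times (minutes) for ten consecutive cases by one surgeon.
times = [240, 230, 235, 220, 210, 180, 175, 170, 172, 168]
print(cusum_peak_case(times))  # 5
```

While operating times remain above the series mean the curve rises, and it falls once times drop below the mean, so the peak marks a change in trend. As noted elsewhere in this review, such a change point indicates only that performance has shifted, not that it has reached a high-quality level.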
These methods were frequently used to define the points at which surgeons reached 'proficiency', 'competency' or other related terms. These terms, however, were used inconsistently or interchangeably 36,50, or used with distinct definitions 33,49. In one study 49 competency was used to describe performance that reached a steady state or plateau, whereas proficiency described further improvement after plateau and mastery described the achievement of outcomes better than the set target value. These definitions are not well aligned with guidelines for assessing surgical competence 73,74, recommended by the US Accreditation Council for Graduate Medical Education and the American Board of Medical Specialties, or with the criteria developed for procedure-based assessment in the Intercollegiate Surgical Curriculum Programme in the UK.
The mismatch between the performance thresholds used to measure the learning curve and the terminology used to describe the results of the analyses can lead to misinterpretation. The term 'overcome' was commonly used to describe a point when surgeons reached a given performance threshold, implying a high level of performance. These thresholds, particularly time to plateau, are often simplistic and may not capture sufficient evidence about the learning process to support this implication. A plateau in performance does not always equate with high-quality performance, as surgeons will not necessarily plateau at the same level 1 . Likewise, using thresholds of performance to define terms such as competence and proficiency could be considered inappropriate, as these terms also imply a specific level of performance. A recent study 61 investigated proficiency-based progression training programmes in which residents who failed to show progressive improvement (reached a plateau in performance) were not considered proficient unless they had achieved predetermined proficiency benchmarks set by experienced surgeons.
The lack of consistency in methods used to describe surgical performance and the use of simplistic and inappropriate methods add to the complexity of interpreting learning curves. Using thresholds that provide meaningful measures of surgeon performance alongside standardized terminology seems vital to realize the full potential of learning curve analyses for optimization of surgical training programmes.
Time-based variables were the most common metrics used to assess learning curves, as is the case for other systematic reviews assessing surgical learning curves in other contexts 63,75. Although time-based metrics are common across learning curve analyses, the present review suggests that variation can exist between the learning curve profiles of different metrics for a given procedure, with recovery and safety metrics (LOS, complications) exhibiting substantially longer learning curves than those for operating time, often with continued improvement for extended periods of time after the learning curve for operating time has been overcome. Given that improvements in clinical outcomes may be important drivers for the uptake of robot-assisted approaches, the value of comparisons based on learning curves for operating time alone is unclear. In addition, the metrics captured may not directly measure surgical performance, with surrogate markers, such as operating time and LOS, reported frequently. Real-time automated performance metrics, coupled with machine learning algorithms to process automatically collected data, may enable more direct measurement of surgeon performance 76.
Several gaps were identified in the reported data. Many of the identified studies enrolled fully trained surgeons, investigating the transferability of skills for conversion from laparoscopic or open surgery to robot-assisted procedures. These studies may be of limited value for informing the design of surgical training programmes. The limited data investigating the learning curves of trainee surgeons may result in missed opportunities for the optimization of programmes to accelerate the training of surgeons who are novices with robot-assisted devices. No study reported data related to the economic impact of the learning curve, such as training costs or financial impacts of suboptimal outcomes.
This systematic review was a broad, exploratory search of the literature reporting on the surgeon learning curve for robot-assisted surgery, with broad search terms and eligibility criteria. Incomplete and variable reporting created challenges for data synthesis, especially given some of the exploratory and subjective outcomes this review intended to identify. The exploratory nature of the review may have introduced a number of limitations, which may have resulted in relevant data being overlooked. For example, to include evidence of suitable quality, studies that involved fewer than 20 surgical procedures in total (across surgical approaches or surgeons) were excluded, regardless of the number of robot-assisted procedures completed by any one surgeon or included within the learning curve analysis. Studies were captured only if they reported actual learning curve data (as a graph or table presenting at least 4 time points), so that studies reporting potentially relevant data (for example the economic impact of the learning curve) could have been excluded if they did not meet these criteria. Studies that included just a single surgeon were also excluded, as the innate differences in technical ability that may exist between surgeons were anticipated to limit the reliability of the data reported in these studies. Only the learning curves of robotic procedures were considered in this review, and although the review did not set out to compare robot-assisted procedures with other surgical approaches, this prevented any conclusions from being drawn regarding lengths of the learning curve for robot-assisted versus non-robotic procedures. Only studies originally published in English were included, and database searches were limited to 2012-2018 in order to capture evidence most relevant to present-day training practices.
Although comparisons between robot-assisted and other surgical approaches are warranted, studies with appropriate evaluation methods, standardized terminology and necessary context are essential for robust comparisons to be made. These kinds of study should provide better estimates of learning curves for robot-assisted procedures, enhance surgical training programmes and improve patient outcomes.