**Key Contributions:** This paper proposes confidence estimation as a metric for quantifying the trustworthiness of root causes recommended by RCA tools. The authors identify two requirements for such a confidence-estimation method: 1) it must be highly adaptable and broadly applicable, since RCA predictors vary widely across dimensions such as time, services, and background knowledge; and 2) it must assume no access to the LLM's parameters (weights) or the probabilities and logits of output tokens, because that information is typically unavailable (GPT-3.5-Turbo and GPT-4 are given as examples). Given these two requirements, the paper introduces LM-PACE, a general framework that performs black-box confidence calibration via a retrieval-augmented, two-step prompting procedure.
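To make the pipeline concrete, here is a minimal sketch of what a retrieval-augmented, two-step prompting flow for black-box confidence estimation could look like. All function names, prompts, and the stubbed LLM responses are illustrative assumptions, not LM-PACE's actual implementation; the only property carried over from the paper is that nothing beyond the model's text output is used.

```python
# Hypothetical sketch: retrieval-augmented two-step prompting for
# black-box confidence estimation. Names and prompts are illustrative,
# not the paper's actual implementation.

def retrieve_reference_incidents(incident, history, k=2):
    # Toy retrieval: rank past incidents by keyword overlap with the
    # new incident (a real system would use embeddings or BM25).
    def overlap(past):
        return len(set(incident.split()) & set(past["description"].split()))
    return sorted(history, key=overlap, reverse=True)[:k]

def llm(prompt):
    # Stand-in for a black-box LLM call (e.g. GPT-4). Only the text
    # completion is available -- no weights, logits, or probabilities.
    if "confidence" in prompt.lower():
        return "0.7"
    return "Root cause: connection-pool exhaustion"

def estimate_confidence(incident, history):
    refs = retrieve_reference_incidents(incident, history)
    context = "\n".join(r["description"] for r in refs)
    # Step 1: elicit a candidate root cause, grounded in retrieved cases.
    root_cause = llm(
        f"Similar past incidents:\n{context}\n"
        f"New incident: {incident}\nWhat is the root cause?"
    )
    # Step 2: ask the model to score its own answer; this verbalized
    # score is the only signal, keeping the procedure black-box.
    raw_score = float(llm(
        f"On a scale of 0 to 1, state your confidence that "
        f"'{root_cause}' is the correct root cause."
    ))
    return root_cause, raw_score

history = [
    {"description": "API latency spike traced to connection-pool exhaustion"},
    {"description": "Disk-full alert caused by unrotated logs"},
]
cause, conf = estimate_confidence("API timeouts under load", history)
print(cause, conf)
```

In practice the raw verbalized score would still need calibration (e.g. against held-out incidents with known root causes) before it can be read as a trustworthy probability, which is the gap the paper's framework targets.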