Poster in Workshop: 2nd Workshop on Generative AI and Law (GenLaw ’24)
L-FRESco: Factual Recall Evaluation Score for Legal Analysis Generation
Abe Hou · Zhengping Jiang · Guanghui Qin · Orion Weller · Andrew Blair-Stanek · Benjamin Van Durme
Existing automatic tools evaluate the factuality of generated text based on factual precision, which measures the fraction of generated information that is factually accurate. However, comprehensiveness and precision are both crucial to reliable and verifiable text generation. In this work, we show that precision-based factuality metrics are limited in their ability to evaluate the comprehensiveness of generated text in certain domains, especially legal text. We propose L-FRESco, a Factual Recall Evaluation Score for Legal analysis generation. Inspired by FActScore, which decomposes generated text into atomic facts and then verifies their factuality, L-FRESco follows a Decompose-Then-Compare framework that computes the similarity between reference atomic claims and generated atomic claims. Moreover, we explore a generalized variant, FRESco, and discuss its potential for application across text domains.
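To make the contrast between precision and recall concrete, the following is a minimal sketch of a Decompose-Then-Compare style score. It is not the authors' implementation. Every name in it (`decompose`, `similarity`, `coverage`, `recall_score`, `precision_score`, the `threshold` parameter) is hypothetical, sentence splitting stands in for atomic-claim decomposition, and token-level Jaccard overlap stands in for whatever claim similarity the actual metric uses. The precision analog here is also computed against the reference text rather than against an external knowledge source.

```python
import re


def decompose(text):
    """Stand-in decomposition (assumption): treat each sentence as one atomic claim."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokens(claim):
    """Lowercased alphanumeric tokens of a claim, as a set."""
    return set(re.findall(r"[a-z0-9]+", claim.lower()))


def similarity(a, b):
    """Stand-in similarity (assumption): Jaccard overlap of token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def coverage(targets, candidates, threshold=0.5):
    """Fraction of target claims matched by at least one candidate claim."""
    if not targets:
        return 0.0
    hits = sum(
        1 for t in targets
        if any(similarity(t, c) >= threshold for c in candidates)
    )
    return hits / len(targets)


def recall_score(reference, generated, threshold=0.5):
    """Share of reference claims that the generation covers."""
    return coverage(decompose(reference), decompose(generated), threshold)


def precision_score(reference, generated, threshold=0.5):
    """Share of generated claims that the reference supports."""
    return coverage(decompose(generated), decompose(reference), threshold)


reference = (
    "The contract was signed in 2019. "
    "The defendant breached the contract. "
    "The court awarded damages."
)
generated = "The contract was signed in 2019."

print(precision_score(reference, generated))  # 1.0
print(recall_score(reference, generated))     # one of three reference claims covered
```

In this toy case the generation is fully precise yet covers only one of the three reference claims, which is the kind of gap a precision-only metric would not surface.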