

Poster in Workshop: Workshop on Human-Machine Collaboration and Teaming

Elicit: A Framework for Human-in-the-Loop High-Precision Information Extraction from Text Documents

Bradley Butcher


Abstract:

Extracting information from unstructured text can help build new datasets and facilitate valuable research. Weak supervision methods can produce impressive results but may not be sufficiently reliable for high-stakes applications where precision is essential. We present a framework for information extraction which adds a human-in-the-loop element to weak supervision labelling. We demonstrate our approach by creating two new datasets with information on criminal trials from publicly available legal documents and news articles. We show that our approach requires much less human effort than manual information extraction while achieving comparable precision.
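The abstract describes the human-in-the-loop step only at a high level. The sketch below shows one possible arrangement, and it is not the Elicit implementation. Several keyword-based labelling functions vote on each document. Any document where the functions all abstain, or agree less than a chosen threshold, is sent to a human reviewer. Everything else is accepted automatically. The names `lf_guilty`, `lf_not_guilty`, `weak_label`, `extract`, `ask_human` and `min_agreement` are hypothetical, as is the majority-vote rule. The verdict labels were chosen only to match the criminal-trial setting.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical labelling functions: each returns a label, or None to abstain.
LabelFn = Callable[[str], Optional[str]]


def lf_guilty(text: str) -> Optional[str]:
    return "guilty" if "found guilty" in text.lower() else None


def lf_not_guilty(text: str) -> Optional[str]:
    t = text.lower()
    return "not guilty" if "not guilty" in t or "acquitted" in t else None


def weak_label(text: str, lfs: List[LabelFn]) -> Tuple[Optional[str], float]:
    """Majority vote over non-abstaining labelling functions.

    Returns (label, agreement), where agreement is the fraction of
    non-abstaining votes that chose the winning label.
    """
    votes = [v for v in (lf(text) for lf in lfs) if v is not None]
    if not votes:
        return None, 0.0
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def extract(
    docs: Dict[str, str],
    lfs: List[LabelFn],
    ask_human: Callable[[str, Optional[str]], str],
    min_agreement: float = 1.0,
) -> Dict[str, str]:
    """Accept confident weak labels and send the rest to a human reviewer."""
    results: Dict[str, str] = {}
    for doc_id, text in docs.items():
        label, agreement = weak_label(text, lfs)
        if label is None or agreement < min_agreement:
            # The reviewer sees the text plus the weak suggestion (possibly None)
            # and returns the final label.
            label = ask_human(text, label)
        results[doc_id] = label
    return results


# Usage: a stub reviewer stands in for the human and records what it saw.
reviewed: List[str] = []


def stub_reviewer(text: str, suggestion: Optional[str]) -> str:
    reviewed.append(text)
    return "no verdict"


docs = {
    "a": "The defendant was found guilty of burglary.",
    "b": "She was acquitted of all charges.",
    "c": "The trial was adjourned until March.",
}
labels = extract(docs, [lf_guilty, lf_not_guilty], stub_reviewer)
print(labels)  # {'a': 'guilty', 'b': 'not guilty', 'c': 'no verdict'}
```

In this sketch, only document `c` reaches the reviewer, because no labelling function fires on it. Raising coverage of the labelling functions reduces human work. Raising `min_agreement` trades more review effort for higher precision.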
