Poster in Workshop: 3rd Workshop on Interpretable Machine Learning in Healthcare (IMLH)
What Works in Chest X-Ray Classification? A Case Study of Design Choices
Evan Vogelbaum · Logan Engstrom · Aleksander Madry
Keywords: [ Chest x-rays ] [ comparative analysis ] [ design choices ] [ deep-learning ] [ clinical ]
Public competitions and datasets have yielded increasingly accurate chest x-ray prediction models; the best such models now match even human radiologists on benchmarks. These models go beyond "standard" image classification techniques, instead employing design choices specialized for the chest x-ray domain. As a result, however, each model uses a different, non-standardized training setup, making it unclear how individual design choices (the choice of model architecture, data augmentation type, or loss function, for example) actually affect performance. So, which design choices should we use in practice? Examining a wide range of model design choices on three canonical chest x-ray benchmarks, we find that a (properly tuned) model composed of standard image classification design choices can often match the performance of even the best domain-specific models. Moreover, starting from a "barebones," generic ResNet-50 with cross-entropy loss and no data augmentation, we discover that none of the proposed design choices, including broadly used ones like the DenseNet-121 architecture and basic data augmentation, consistently improves performance over that generic learning setup.