Reimagining Missing Data Solutions

CRILM (Contextually Relevant Imputation leveraging Language Models) handles missing data in tabular datasets by replacing traditional numerical estimates with contextually relevant descriptors.

Key Innovations:

Aligns tabular data with language models' natural strengths
Uses large LMs to generate contextual descriptors for missing values
Enables small LMs to effectively utilize these descriptors
Bridges the gap between structured data analysis and natural language processing
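
As a rough illustration of the first two points, the sketch below serializes one table row into text and substitutes a descriptor wherever a value is missing. It is a minimal sketch under stated assumptions, not the CRILM implementation: the function names, the sentence template, and the descriptor wording are all hypothetical, and the large-LM call is replaced by a fixed stub so the example runs offline.

```python
import math
from typing import Callable, Mapping, Optional, Union

Value = Optional[Union[int, float, str]]


def is_missing(value: Value) -> bool:
    # Treat both None and float NaN as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def describe_missing(feature: str) -> str:
    # Hypothetical stand-in for a large-LM call that would generate a
    # contextual descriptor; here it returns a fixed phrase instead.
    return f"the {feature} was not recorded"


def row_to_text(
    row: Mapping[str, Value],
    describe: Callable[[str], str] = describe_missing,
) -> str:
    # Serialize each cell as a short clause; missing cells get a
    # descriptor in place of a numerical estimate.
    parts = []
    for feature, value in row.items():
        if is_missing(value):
            parts.append(describe(feature))
        else:
            parts.append(f"the {feature} is {value}")
    return "; ".join(parts) + "."


# Hypothetical example row with one missing value.
row = {"age": 42, "blood pressure": None, "smoking status": "non-smoker"}
text = row_to_text(row)
print(text)
```

The resulting sentence is the kind of input a smaller LM could then be fine-tuned on for a downstream task. Passing a different `describe` callable is where a real descriptor generator would plug in.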

Engineering Impact: For data preprocessing in engineering applications, this approach may improve downstream analysis quality by preserving contextual relationships rather than forcing numerical approximations.

A Context-Aware Approach for Enhancing Data Imputation with Pre-trained Language Models