Reimagining Missing Data Solutions

Using Language Models for Contextual Data Imputation

CRILM (Contextually Relevant Imputation leveraging Language Models) handles missing data in tabular datasets by replacing traditional numerical estimates with contextually relevant descriptors.

Key Innovations:

  • Aligns tabular data with language models' natural strengths
  • Uses large LMs to generate contextual descriptors for missing values
  • Enables small LMs to effectively utilize these descriptors
  • Bridges the gap between structured data analysis and natural language processing
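
To make the idea of a contextual descriptor concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each table row is serialized into a sentence and that a missing cell is replaced by a short phrase instead of a number. The names `is_missing`, `placeholder_descriptor`, and `row_to_text` are hypothetical, and `placeholder_descriptor` is only a stand-in for the large-LM call that would generate the descriptor.

```python
import math
from typing import Callable, Mapping, Optional

Row = Mapping[str, Optional[object]]


def is_missing(value: object) -> bool:
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def placeholder_descriptor(feature: str) -> str:
    """Hypothetical stand-in for a large-LM call that returns a descriptor."""
    return f"the {feature} was not recorded"


def row_to_text(
    row: Row,
    describe: Callable[[str], str] = placeholder_descriptor,
) -> str:
    """Serialize one row as text, using a descriptor for each missing cell."""
    parts = []
    for feature, value in row.items():
        if is_missing(value):
            # No numerical estimate is produced; the gap is described in words.
            parts.append(describe(feature))
        else:
            parts.append(f"the {feature} is {value}")
    return "; ".join(parts) + "."


example = {"age": 42, "blood pressure": None, "smoker": "no"}
print(row_to_text(example))
```

Under these assumptions, the resulting sentences would serve as the text a smaller LM is fine-tuned on for the downstream task. Passing a different `describe` callable is how a real descriptor generator could be plugged in.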

Engineering Impact: This approach changes how missing data is handled during preprocessing in engineering applications. By preserving contextual relationships rather than forcing numerical approximations, it may improve the quality of downstream analysis.

A Context-Aware Approach for Enhancing Data Imputation with Pre-trained Language Models
