Smarter Data Interpretation via Language Models

This research introduces dataset featurization, an unsupervised technique that enables precise extraction of natural language features from diverse datasets while maintaining interpretability.

Provides controlled granularity over feature extraction, unlike simple prompting approaches
Applies a dataset reconstruction objective to ensure features accurately represent the underlying data
Successfully tested across domains including text classification, tabular data, and security (jailbreak attack modeling)
Enables more interpretable data analysis by expressing complex patterns in natural language

For security applications, this method can help identify and characterize attack patterns in jailbreak attempts against language models, potentially improving defensive mechanisms.

Dataset Featurization: Uncovering Natural Language Features through Unsupervised Data Reconstruction