Unlocking Data Lakes with LLMs

An intelligent approach to semantic data discovery and organization

LEDD (Large Language Model-Empowered Data Discovery) is an end-to-end system that uses large language models to help users find relevant data in massive data lakes.

  • Enables semantic search of tables beyond traditional keyword matching
  • Generates hierarchical global catalogs to better organize datasets
  • Features an extensible architecture designed specifically for data lake environments
  • Demonstrates practical application of LLMs for complex data management tasks

This research shows how organizations can more efficiently discover and use valuable data hidden in their growing repositories, which could change how enterprises manage their data assets.

LEDD: Large Language Model-Empowered Data Discovery in Data Lakes
