Robotics and Automation

LLM integration in robotic systems for improved control, planning, decision-making, and human-robot interaction

Research on Large Language Models in Robotics and Automation

Foundation Models for Smarter Robots

Transferring AI knowledge to enhance robotic manipulation without extensive data collection

LLMs as Autonomous Driving Decision-Makers

Leveraging language models to solve complex driving scenarios

Intelligent Robotic Planning with LLMs

Using Language Models to Correct Robot Task Execution

Smart Robots That Learn on the Fly

Flexible Task Planning with Language Models for Adaptive Robotics

Teaching Robots to Understand Human Language

Bridging the gap between natural language commands and robotic actions

Unified Robot Intelligence

A New Vision-Language-Action Model for Quadruped Robots

Making Robots Smarter and Safer

Aligning AI Uncertainty with Task Ambiguity

Smart Robot Learning with AI Assistance

Merging LLMs with Reinforcement Learning for Better Robot Exploration

AI-Powered Social Navigation for Robots

Integrating LLMs with Reinforcement Learning for Human-Interactive Robot Navigation

Smarter Robot Analysis using LLMs

Automating the evaluation of robotic tasks with language models

Language-Powered Robot Formations

Teaching robots to form patterns using only natural language

Smarter Robot Planning with LLMs

Decomposed planning improves efficiency and feasibility for long-term robot tasks
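
As a rough illustration of the decompose-then-validate pattern this entry describes, the sketch below has an LLM split a goal into skill calls and rejects any step that falls outside the robot's skill library. `call_llm` is a hypothetical stand-in for a chat-completion client, and its canned reply takes the place of real model output.

```python
# Minimal sketch: decompose a long-horizon goal into subtasks, then check
# each subtask against the skills the robot actually has before executing.
from typing import List

SKILLS = {"navigate_to", "pick", "place", "open_gripper", "close_gripper"}

def call_llm(prompt: str) -> str:
    # Stand-in: a real system would query an LLM API here.
    return "navigate_to(shelf); pick(mug); navigate_to(table); place(mug)"

def decompose(goal: str) -> List[str]:
    reply = call_llm(f"Split this task into robot skill calls: {goal}")
    return [step.strip() for step in reply.split(";")]

def feasible(step: str) -> bool:
    # A step is feasible only if it names a known skill.
    return step.split("(")[0] in SKILLS

def plan(goal: str) -> List[str]:
    steps = decompose(goal)
    bad = [s for s in steps if not feasible(s)]
    if bad:
        raise ValueError(f"infeasible steps: {bad}")
    return steps

print(plan("put the mug on the table"))
```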

LLMs Powering Quadrupedal Robots

Enabling complex problem-solving beyond simple motion tasks

Vision-Language-Action Models: The Future of Embodied AI

Bridging the gap between perception, comprehension, and robotic action

Supercharging Robot Intelligence

Combining Behavior Trees with LLM Reasoning
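
A minimal sketch of how a behavior tree can defer open-ended judgments to an LLM: standard Sequence/Fallback semantics, with a condition node that queries a language model instead of a hand-coded predicate. `ask_llm` is a hypothetical yes/no oracle, not an API from the paper.

```python
# Tiny behavior tree whose condition nodes ask an LLM for a verdict.
class Node:
    def tick(self) -> bool: ...

class Sequence(Node):                       # succeeds if all children succeed
    def __init__(self, *children): self.children = children
    def tick(self): return all(c.tick() for c in self.children)

class Fallback(Node):                       # succeeds if any child succeeds
    def __init__(self, *children): self.children = children
    def tick(self): return any(c.tick() for c in self.children)

class Action(Node):
    def __init__(self, name): self.name = name
    def tick(self):
        print(f"executing {self.name}")
        return True

class LLMCondition(Node):
    def __init__(self, question): self.question = question
    def tick(self): return ask_llm(self.question)

def ask_llm(question: str) -> bool:
    # Stand-in: a real system would prompt an LLM and parse a yes/no answer.
    return "clear" in question

tree = Fallback(
    Sequence(LLMCondition("is the path clear?"), Action("drive_forward")),
    Action("replan_route"),
)
tree.tick()
```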

Language-Powered Robots

Bridging Natural Language and Robotic Control Systems

Breaking Ground in 3D-Language Models

Combating hallucination through robust 3D environment grounding

Teaching Robots with Language

Using LLMs to accelerate hierarchical reinforcement learning

Smart Dual-Arm Robot Planning

Coordinating robot arms for complex tasks with dependency-aware planning

Bridging the Human-Robot Divide

Improving robotic manipulation through enhanced visual pre-training

LLaRA: Teaching Robots with VLMs

Enhancing robot learning with efficient vision-language models

Collision-Aware Robotic Manipulation

Language-guided diffusion models for adaptable robot control

LLM-A*: Large Language Model Enhanced Incremental Heuristic ...

By Silin Meng, Yiwei Wang...
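
The sketch below shows one way LLM-proposed waypoints can shape an A* heuristic so the search is pulled along a promising corridor. The hard-coded waypoint list stands in for LLM output, and treating waypoints as ordered subgoals is an illustrative reading, not necessarily the paper's exact cost formulation.

```python
# A* on a grid, with LLM-suggested waypoints folded into the heuristic.
import heapq

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def llm_a_star(start, goal, blocked, waypoints, size=20):
    targets = waypoints + [goal]          # visit LLM waypoints in order

    def h(node, k):
        # Lower bound: through the next unreached waypoint, then onward.
        return manhattan(node, targets[k]) + sum(
            manhattan(targets[i], targets[i + 1])
            for i in range(k, len(targets) - 1))

    frontier = [(h(start, 0), 0, start, 0)]   # (f, g, node, waypoint index)
    seen = set()
    while frontier:
        f, g, node, k = heapq.heappop(frontier)
        if node == targets[k]:
            k += 1
            if k == len(targets):
                return g                       # reached the goal
        if (node, k) in seen:
            continue
        seen.add((node, k))
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in blocked:
                heapq.heappush(frontier, (g + 1 + h(nxt, k), g + 1, nxt, k))
    return None

wall = {(5, i) for i in range(15)}
print(llm_a_star((0, 0), (19, 19), wall, waypoints=[(6, 16)]))
```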

AI-Powered Robot Evolution

Using LLMs to automatically design and optimize robot morphology

Smart Robots That Reason

Enhancing Robot Control through Embodied Chain-of-Thought Reasoning
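
A toy version of the idea: prompt the model to reason about the scene before committing, then execute only the final ACTION line. The canned reply stands in for a real model call, and the prompt wording is invented for the example.

```python
# Embodied chain-of-thought, reduced to its parsing skeleton: reason first,
# then emit exactly one machine-readable ACTION line.
import re

PROMPT = """Scene: {scene}
Task: {task}
Think step by step about object positions and preconditions,
then answer with a single line: ACTION: <skill>(<args>)"""

def choose_action(scene: str, task: str) -> str:
    prompt = PROMPT.format(scene=scene, task=task)
    # Stand-in for call_llm(prompt):
    reply = ("The drawer is closed, so the cup cannot be placed yet. "
             "I should open the drawer first.\n"
             "ACTION: open_drawer(top)")
    match = re.search(r"^ACTION:\s*(.+)$", reply, re.MULTILINE)
    if match is None:
        raise ValueError("model produced no ACTION line")
    return match.group(1)

print(choose_action("cup on table, drawer closed", "put the cup in the drawer"))
```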

Teaching Robots to See Possibilities

Guiding Reinforcement Learning with Visual Cues for Better Robotics

Revolutionizing Robot Learning

A Therblig-Based Framework for Predictable, Generalizable Robot Tasks

RoboTwin: Revolutionizing Dual-Arm Robotics

Using generative models to create synthetic training data for complex manipulation tasks

Smarter Robot Planning with AI

Using LLMs to simplify complex robotic task planning

Smarter Social Navigation for Robots

Using LLMs for Natural Conversations in Human-Robot Interactions

LLMs as Intelligent Navigation Copilots

Enhancing Robot Navigation with External Information & Semantic Understanding

LLM-Guided Bipedal Robots

An end-to-end framework for autonomous robotic deployment

Breaking Language Barriers in Robotics

Using Formal Logic to Clarify Natural Language Commands for Robots

Self-Reflective Robot Navigation

Enhancing LLMs with Experience-Based Adaptation

3D-TAFS: Bridging Language and Robotic Action

A Training-Free Framework for 3D Affordance Understanding

Smarter Robot Planning with LLMs

Automating Behavior Tree Generation for Complex Assembly Tasks

Multi-Modal Learning for Robotic Manipulation

Enhancing LLMs with vision and force feedback for precision tasks

Smarter Robot Planning with LLMs

Leveraging language models to bootstrap object-level planning for robots

Smart Social Robot Navigation

Teaching robots to navigate human spaces with lifelong learning

KARMA: Enhancing AI with Human-like Memory

Dual-memory system for improved embodied AI performance
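
A minimal dual-memory sketch in this spirit: a bounded short-term buffer of recent observations plus a long-term store queried by similarity. The bag-of-words score is a deliberately crude stand-in for learned embeddings, and the class layout is illustrative, not KARMA's actual architecture.

```python
# Short-term working memory with FIFO eviction, consolidated into a
# persistent long-term store that is retrieved by word overlap.
from collections import deque

class DualMemory:
    def __init__(self, short_capacity=8):
        self.short_term = deque(maxlen=short_capacity)  # recent observations
        self.long_term = []                             # persistent episodes

    def observe(self, event: str):
        self.short_term.append(event)

    def consolidate(self):
        # Move everything from working memory into long-term storage.
        self.long_term.extend(self.short_term)
        self.short_term.clear()

    def recall(self, query: str, k=2):
        def score(event):
            return len(set(query.split()) & set(event.split()))
        return sorted(self.long_term, key=score, reverse=True)[:k]

mem = DualMemory()
mem.observe("mug seen on kitchen counter")
mem.observe("keys placed in hallway drawer")
mem.consolidate()
print(mem.recall("where are the keys"))
```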

Enhancing Visual Adaptability in Robotic AI

Overcoming visual domain limitations in robotic foundation models

Smarter Robots for Complex Tasks

Overcoming logical errors and hallucinations in embodied AI planning

Smart Self-Learning Robots

Using LLMs to Design Automated Training Curricula for Robotic Skills

Efficient Language-Guided Robot Grasping

A Parameter-Efficient Framework for Robot Vision-Language Integration

Safe & Efficient Robot Planning with LLMs

Creating constraint-aware task plans for robot agents
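
As a rough illustration, the sketch below screens every LLM-proposed step against explicit safety predicates before the plan is accepted; the constraint names and the `propose_plan` stub are invented for the example, not an actual API.

```python
# Constraint-aware plan filtering: reject any plan whose steps violate
# declared safety predicates, and surface the violations for repair.
CONSTRAINTS = [
    ("no sharp objects handed to humans",
     lambda step: not ("knife" in step and "handover" in step)),
    ("payload under 2 kg",
     lambda step: "lift_heavy" not in step),
]

def propose_plan(task: str):
    # Stand-in for an LLM planner.
    return ["pick(knife)", "handover(knife, human)", "pick(cup)"]

def safe_plan(task: str):
    plan, violations = propose_plan(task), []
    for step in plan:
        for name, ok in CONSTRAINTS:
            if not ok(step):
                violations.append((step, name))
    if violations:
        # A full system would feed the violations back into the prompt so
        # the LLM can repair the plan; here we just report them.
        raise ValueError(f"rejected plan, violations: {violations}")
    return plan

try:
    safe_plan("hand the knife to the user")
except ValueError as err:
    print(err)
```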

3D Robot Manipulation Revolution

Using Grounded Spatial Value Maps to Enhance Robotic Performance

Smart Robots That Understand Vague Instructions

SPINE: How AI enables robots to plan missions with incomplete natural language

Faster, Smarter Robots with HiRT

Hierarchical Robot Transformers: Balancing Intelligence with Speed

Teaching Robots Through Conversation

Enabling LLMs to Predict Robot Actions Without Training

Language-Based Negotiation for Multi-Robot Systems

Enhancing safety and efficiency in collaborative robot learning

Smart Safety for Robot Manipulation

Teaching Robots Common-Sense Safety Constraints

Transforming Robot Learning

Diffusion Transformers for Flexible Action Control

Teaching Robots Through Language

Making Robotics Accessible Through Natural Language Commands

Zero-shot Object-Centric Instruction Following: Integrating ...

By Sonia Raychaudhuri, Duy Ta...

Smart Multi-Robot Coordination

Using LLMs to Handle Complex Task Dependencies

Smarter Robotic Grasping with AI

Combining LLMs and Quality Diversity for Zero-Shot Task-Aware Manipulation

Enhancing Robot Spatial Intelligence

Teaching 2D and 3D Vision-Language Models to Understand Space

AI Navigation at Sea: LLMs for Maritime Decision-Making

Using Large Language Models to Enable COLREGs-Compliant Autonomous Surface Vehicles

Smart Robot Hands That Learn & Adapt

Advancing dexterous manipulation through interaction-aware diffusion planning

Teaching Robots to Generalize

Improving Robot Policy through Human Preferences

RoboMatrix: Revolutionizing Robot Task Execution

A skill-centric approach for adaptive robot capabilities in open-world environments

AI-Powered Clay Sculpting Robots

From Text Instructions to 3D Physical Shapes

Teaching Robots Through Video Observation

Using Latent Motion Tokens to Bridge Human Motion and Robot Action

Enhancing 3D Spatial Understanding in AI

Making MLLMs Better at Object Disambiguation in Complex Environments

Adaptive LLMs for Smarter Human-Robot Teams

Enhancing collaboration through dynamic communication support

Smart Robot Tool Use

Teaching Robots to Manipulate Objects Like Humans

Revolutionizing Robot Grasping with AI

Teaching robots to grasp objects intelligently using natural language

RoboMIND: Advancing Multi-Embodiment Robotics

A benchmark dataset for diverse robot manipulation tasks

Real-Time Robot Intelligence

Eliminating Latency in Multimodal LLMs for Quadruped Robots

Making Robots Understand Human Intent Naturally

Combining Voice Commands with Pointing Gestures through LLM Integration

Control Engineering for LLMs

Using Predictive Control to Enhance LLM Planning Capabilities

Self-Correcting Robots

Teaching robots to reflect on and fix their own mistakes

Smarter Robots via Language Models

Using LLMs to Revolutionize Robot Navigation in Dynamic Environments

Smart Mission Planning for Robot Teams

Using LLMs to orchestrate diverse robot capabilities through hierarchical task trees

Enhancing Robot Learning with Vision-Language Models

Improving robotic control through online reinforcement learning

Smart Robots that Make Better Decisions

Enhancing on-device LLM capabilities for robotics in specific domains

Mobile Manipulation Instruction Generation from Multiple Ima...

By Kei Katsumata, Motonari Kambara...

Smart Object Rearrangement for Robots

Using Large Language Models to Enhance Robot Precision and Adaptability

Bridging AI Vision and Robotic Action

How LMM-3DP Enables Robots to Plan and Execute Complex Tasks

Making Robots Understand Language on Everyday Hardware

A modular framework that runs on consumer GPUs without retraining

Reasoning First, Actions Later

Enhancing Robot Generalization Without Action Labels

STRIDE: AI-Powered Humanoid Robotics

Automating Reward Design for Robot Locomotion

Hierarchical Intelligence for Robot Manipulation

Bridging the Gap Between Foundation Models and Robotics

AI-Powered Robot Task Planning

Combining LLMs with Genetic Programming for Better Automation

RoboBERT: AI-Powered Robotic Manipulation

A more efficient end-to-end model for embodied intelligence

3D-Grounded Robotics Planning

Enhancing Robotic Precision with 3D Vision-Language Integration

Smart Navigation for Autonomous Delivery

Combining Foundation Models with Classical Navigation Methods

Teaching Robots Through YouTube

Scaling Manipulation Tasks Using Internet Videos

3D Flow: The Missing Link for Robotic Language Control

Using motion prediction to bridge language commands and robot actions

Visual Aids for Human-Robot Communication

Enhancing task collaboration through generative AI interfaces

Teaching Robots to Understand Implicit Requests

Bridging the Gap Between Human Speech and Robot Understanding

Teaching Robots to Learn from Human Preferences

Combining Vision and Language to Improve Embodied Manipulation

Teaching Robots to Understand Human Language

A breakthrough framework for natural language to robot motion translation

RobotIQ: Making Robots Understand Human Language

Enabling mobile robots to interpret and execute natural language commands

Magma: The Next Generation AI Agent

Bridging verbal intelligence with spatial-temporal capabilities

Bridging the Gap in Robotic Spatial Intelligence

Teaching robots to understand object orientations for precise manipulation

Smart Navigation for Home Robots

Using Visual Predictors for Zero-Shot Navigation in Unfamiliar Environments

Unified Robot Intelligence: Vision, Language & Action

Overcoming challenges in multimodal robot learning

Vision-Guided Humanoid Robots

Integrating Vision, Language, and Motion for Autonomous Robot Control

Mojito: Revolutionizing Motion Analysis with IMUs

Harnessing LLMs to process jitter-reduced inertial data for human movement analysis

Smarter Navigation Systems with AI

Using Large Language Models to Solve Complex Navigation Challenges

Teaching Robots Through Natural Language

A new approach to robot path planning using human instructions

Intelligent Convoy Management with LLMs

Using AI to enable dynamic multi-lane convoy control for autonomous vehicles

Teaching Robots Without Showing Them How

Object-Focused Manipulation Without Demonstration Data

Smarter Robots, Fewer Training Examples

Pre-training world models enables sample-efficient reinforcement learning

RoboBrain: Making Robots Smarter

A unified brain model bridging abstract thinking and concrete manipulation

Smart Underwater Vehicles in Rough Seas

LLM-Enhanced AI Control Systems for Extreme Ocean Conditions

Closed-Loop Intelligence for Embodied Systems

Enhancing robotic task execution in dynamic environments

Smarter Robot Planning Through Language

Using LLMs to Transform Vague Instructions into Clear Task Plans

Smart Path Planning for Robots

LLMs for Cost-Efficient Navigation Across Multiple Terrains

Code as Robot Planner

Leveraging LLMs to generate symbolic code for robot planning
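
A minimal sketch of the code-as-planner pattern: the model emits a short Python program over a whitelisted skill API, and the program runs in a namespace that exposes nothing else. The generated string is canned, and stripping `__builtins__` is only a light barrier, not a real sandbox.

```python
# Run LLM-generated plan code against a whitelisted set of robot skills.
def make_skills(log):
    return {
        "move_to": lambda target: log.append(f"move_to({target})"),
        "grasp":   lambda obj:    log.append(f"grasp({obj})"),
        "release": lambda:        log.append("release()"),
    }

GENERATED = """
move_to("shelf")
grasp("book")
move_to("desk")
release()
"""  # stand-in for LLM output

def run_plan(code: str):
    log = []
    # Empty __builtins__ keeps the generated code from importing modules or
    # opening files; only the whitelisted skills are callable.
    exec(code, {"__builtins__": {}, **make_skills(log)})
    return log

print(run_plan(GENERATED))
```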

AI as a Design Partner for Robotics

How Large Language Models Can Select Optimal Soft Robot Designs

Smarter Robots: Understanding Object Functionality

A generalizable, lightweight approach to affordance reasoning

OTTER: Revolutionizing Robotic Decision-Making

Text-Aware Visual Processing for Enhanced Robotic Control

Securing LLM-Powered Robots

Combining AI capabilities with formal safety guarantees

Autonomous Robots Using LLMs

Bridging Language Models and Reinforcement Learning for Smarter Robotic Systems

From Versatile to Virtuoso Robots

Refining generalist AI into specialized robotic experts

Revolutionizing Robotic Assembly

Zero-Shot Peg Insertion Using Vision-Language Models

Trinity: Next-Generation Humanoid Robot Intelligence

A modular AI system integrating language, vision, and motion control

Teaching Robots to Understand Physical Limits

Enhancing visual language models with spatial awareness for robotics

Supercharging Lightweight LLMs for Robotics

Enhancing reasoning capabilities for complex task planning

Bridging the Gap in Robot Learning

Foundation Models for Physical Agency Through Procedural Generation

Benchmarking Home Robots

A comprehensive framework for language-controlled mobile manipulation robots

3D Foundation Policies for Robotics

Advancing Robotic Manipulation Through 3D-Aware Foundation Models

Natural Voice & Gesture Robot Interaction

Zero-shot HRI system using LLMs to understand natural human commands

Smarter Robots for Cluttered Environments

Efficient language-guided pick and place using unconditioned action priors

LLM-Powered Social Robot Navigation

Multi-Agent LLMs for Smarter, More Adaptive Robot Movement in Human Environments

Smarter Robot Navigation with Vi-LAD

Teaching robots social navigation skills using vision-language models

Unifying Vision and Dynamics for Robotic Manipulation

Using keypoints to enable open-vocabulary robotic tasks

Hybrid Robots: Smarter Actions Through AI

Combining diffusion and autoregression for improved robotic manipulation

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-...

By Hongxiang Zhao, Xingchen Liu...

Smart Robots That Understand Instructions

Teaching Robots to Process Language, Images, and Maps Together

Bridging Intelligence and Physical Capabilities

A Modular Approach to Humanoid Robotics with Vision-Language Models

HybridGen: Smarter Robots Through Imitation

VLM-Guided Planning for Scalable Robotic Learning

Mobile Robots that Understand Human Instructions

Extending Vision-Language-Action Models to Mobile Manipulation

Smarter Robot Controllers with AI

Using LLMs to Fix Robot Logic Faster and More Efficiently

V-Droid: Revolutionizing Mobile Task Automation

Using LLMs as Verifiers Instead of Generators for More Reliable Mobile GUI Agents
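
A toy version of the verify-rather-than-generate loop: enumerate the candidate actions visible on screen and score each one, then pick the argmax. The keyword score is a crude stand-in for prompting an LLM verifier with the task, screen state, and candidate action.

```python
# Verifier-driven action selection over an enumerated candidate set.
def score(task: str, action: str) -> float:
    # Stand-in: a real verifier would be an LLM returning a quality score.
    return sum(word in action for word in task.lower().split())

def pick_action(task: str, candidates: list[str]) -> str:
    return max(candidates, key=lambda a: score(task, a))

ui_actions = ["tap(settings)", "tap(wifi_toggle)", "scroll(down)"]
print(pick_action("turn on wifi", ui_actions))   # -> tap(wifi_toggle)
```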

Smart Robotic Grasping with Language

Teaching Robots to Understand Physical Properties Through Language Models

Making Robots Understand Human Intent

Integrating Gaze and Speech for Intuitive Human-Robot Interaction

Recovering from the Unknown

How Large Language Models Can Rescue Reinforcement Learning Agents

Smarter Two-Handed Robots

Using LLMs to Enable Complex Bimanual Task Planning

AI-Powered Aerial Manufacturing

Using LLMs to Enable Precision Drone Construction

RAIDER: LLM-Powered Robotic Problem Solving

Enhancing robots' ability to detect, explain and recover from action issues

Enhancing Robotic Intelligence with LLMs

A hierarchical reinforcement learning approach for complex tasks

Optimizing Robot Training Data

Enhancing robotic manipulation with strategically collected data

Bridging AI and Physical Robotics

How Gemini 2.0 is enabling robots to interact with the real world

Smart Navigation for Robots Using LLMs

Improving object-finding capabilities with language model reasoning

Efficient Robot Control with Dynamic Layer-skipping

Reducing computational demands in vision-language models for robotics
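
One plausible shape for dynamic layer-skipping, sketched in PyTorch: a tiny learned gate reads the hidden state and decides whether a transformer block can be bypassed for a given input. Dimensions and the gating rule are illustrative assumptions, not the paper's design.

```python
# Transformer block with a learned skip gate: gated-off examples pass
# through unchanged, saving the attention and feed-forward compute.
import torch
import torch.nn as nn

class SkippableBlock(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                nn.Linear(4 * dim, dim))
        self.gate = nn.Linear(dim, 1)   # predicts "is this layer needed?"

    def forward(self, x, threshold=0.5):
        # One keep/skip decision per example, from the pooled hidden state.
        keep = torch.sigmoid(self.gate(x.mean(dim=1))) > threshold  # (B, 1)
        if not keep.any():
            return x                    # the whole batch skips this layer
        y, _ = self.attn(x, x, x)
        y = y + self.ff(y)
        # Blend per example: gated-off rows pass through unchanged.
        return torch.where(keep.unsqueeze(-1), x + y, x)

x = torch.randn(2, 10, 64)
print(SkippableBlock()(x).shape)        # torch.Size([2, 10, 64])
```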

Smarter Robots Through Vision-Language Feedback

Enabling robots to execute complex tasks without specialized training data

Visual Reasoning for Smarter Robots

Enhancing Robotic Decision-Making with Chain-of-Thought Reasoning

Smart Robots That Adapt to Obstacles

Using AI language models to help robots navigate challenging environments

Grounding LLM Knowledge for Robot Manipulation

Teaching robots physical common sense through language models

Revolutionizing Soft Robotics with Meta-Origami

A novel approach to programmable nonlinear inflatable actuators

GPT-4 Powered Robotics

Real-Time Reactive Framework for Safer Robotic Behavior

GenSwarm: AI-Powered Robot Policy Generation

Automating multi-robot control with language models

Intelligent Robot Planning with LLMs

Enabling Smooth Human-Robot Interaction through Contextual Planning

Teaching Robots Common Sense

A New Framework to Help Robots Understand Their Environment

Smart Robot Decision-Making

LLMs Autonomously Selecting Optimal Control Strategies

Decoding Self-Handover Behaviors

First Systematic Taxonomy of How Humans Transfer Objects Between Their Hands

Teaching Robots to Understand Human Language

A framework for converting verbal commands into precise robot movements

Smarter Shelf-Picking Robots

Multimodal Human-Robot Collaboration for Warehouse Efficiency

Structured Scene Understanding for Robotic Planning

Improving state grounding capabilities through domain-specific scene graphs

Building Smarter 3D Vision for Robots

Enhancing robotic understanding through diverse semantic maps

Robust Robotic Rearrangement

Using Language Models to Handle Task Disruptions

Language-Driven Robot Adaptation

Using LLMs to Transform Robotic Trajectory Planning

RoboTwin: Advancing Dual-Arm Robotics

Creating realistic digital twins for complex manipulation tasks

Key Takeaways

Summary of Research on Robotics and Automation