You know that feeling when you want to train a drone to land on a moving car, but the car simulator and the drone simulator are two different programs talking over a network, and your sensor timestamps are always slightly off? Every paired aerial-ground frame you collect carries an interpolation error, and the synchronization tax grows with every extra sensor you bolt on. Bridge-based co-simulation between CARLA and AirSim adds 1,000–5,000 µs (1–5 ms) of cross-process sync per frame, and weather and lighting drift between the two renderers corrupts cross-view perception data.
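To get a feel for what a few milliseconds can mean for paired frames, here is a back-of-the-envelope sketch. It assumes the 1,000–5,000 µs figure can be read as a rough bound on timestamp skew between the aerial and ground frames, which is an interpretation rather than something the source states. The relative speeds are hypothetical illustration values, and `misregistration_m` is a made-up helper name.

```python
def misregistration_m(relative_speed_mps: float, skew_us: float) -> float:
    """Distance a target moves relative to the observer during a timestamp skew.

    Assumption: constant relative velocity over the skew interval.
    """
    return relative_speed_mps * skew_us * 1e-6  # µs -> s


if __name__ == "__main__":
    # Skew values come from the 1,000-5,000 µs range quoted above;
    # the speeds are hypothetical, chosen only for illustration.
    for skew_us in (1_000, 5_000):
        for speed_mps in (5.0, 15.0):
            err_cm = 100 * misregistration_m(speed_mps, skew_us)
            print(f"{speed_mps:>4} m/s, {skew_us} µs skew: {err_cm:.1f} cm")
```

Under these assumptions, a 15 m/s relative speed and a 5,000 µs skew work out to 7.5 cm of apparent displacement between the two views, which matters for something like a precision-landing target.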
Unreal Engine 4 enforces one rule the authors had to dance around: each world can have exactly one active GameMode, and CARLA's GameMode is welded to its traffic and weather subsystems. AirSim's flight controller, fortunately, is just a regular Actor — not a GameMode. So the authors write a new class CARLAAirGameMode that inherits CARLA's GameMode (keeping all its ground machinery), then spawns AirSim's flight actor at BeginPlay as a normal world entity. Two RPC servers run side-by-side in the same process — CARLA on TCP 2000, AirSim on TCP 41451 — so existing CARLA and AirSim Python clients connect without modification. Every sensor read happens on the same physics tick, so all 18 sensor streams share one timestamp.
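A minimal sketch of what "connect without modification" could look like from the client side. The port numbers come from the text. The helper names (`port_open`, `check_endpoints`, `connect_clients`) are hypothetical, and the host address is an assumption. The `carla` and `airsim` packages are imported lazily, so the port check runs even where they are not installed.

```python
import socket

CARLA_PORT = 2000    # CARLA RPC port, as stated above
AIRSIM_PORT = 41451  # AirSim RPC port, as stated above


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoints(host: str = "127.0.0.1", ports=None) -> dict:
    """Probe both RPC endpoints of the (assumed local) simulator process."""
    ports = ports or {"carla": CARLA_PORT, "airsim": AIRSIM_PORT}
    return {name: port_open(host, port) for name, port in ports.items()}


def connect_clients(host: str = "127.0.0.1"):
    """Open the stock CARLA and AirSim clients against the same process."""
    import carla   # imported lazily: only needed when a simulator is running
    import airsim

    ground = carla.Client(host, CARLA_PORT)
    ground.set_timeout(5.0)
    air = airsim.MultirotorClient(ip=host, port=AIRSIM_PORT)
    air.confirmConnection()
    return ground, air


if __name__ == "__main__":
    print(check_endpoints())
```

Running `check_endpoints()` before `connect_clients()` gives a quick signal as to whether both servers are up, without waiting on client-side timeouts.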
If you are a robotics or embodied-AI researcher who needs paired aerial/ground sensor data — VLN/VLA dataset construction, cross-view perception, cooperative landing or escort policies — and you have been duct-taping CARLA and AirSim across processes, this gives you back your timestamps. Not useful yet if you need GPU-parallel multi-environment RL throughput (Isaac Lab / OmniDrones still win on sample efficiency), if you need >2 drones in one scene (functional but not validated), or if you can't tolerate ~20 FPS under joint load.
Yes, if your work specifically needs spatially and temporally co-registered aerial-ground sensor data — there is no other open-source single-process option in this niche today. Treat it as beta: the authors validate exactly five workflows (precision landing, VLN data, 12-stream capture, cross-view perception, RL env), explicitly flag high-density traffic and >2-drone scaling as not-yet-validated (§6), and the engine is locked to UE 4.26 while Microsoft's Project AirSim has moved to UE5. A reasonable bet for a research team, a risky one for a production roadmap.