After looking at this for 2 hours I think they did train a lot on Unreal Engine.
Like, there is something Blender-esque to them. If they had trained primarily on movies, you'd expect more cinematic, analog-grain looks. If it was a lot of YouTube, you'd expect a more iPhone-esque look.
I wonder if they have another internal AI that creates Unreal Engine setups programmatically. Like generative adversarial networks, but 3-way.