I have been using more than one PC at a time for a while now, thanks to the awesome Barrier software KVM and a stretch of having two laptops and no monitor, so I was already prepared for some of the caveats of doing GPU passthrough.
For work it's honestly a bit of a hassle. I ended up writing a bunch of neat little SSH-based scripts that do builds and test runs on my code remotely (something like the sketch just below), though it was nicer using IntelliJ's built-in test-runner interface. Even so, I swear the graphics run better when you skip the PRIME/etc syncing hooey and just use the display directly: on these common, relatively cheap AMD Ryzen 5500M-series laptops, framerate and latency improve dramatically when the host and a VM each monopolise one of the two video outputs, with the VM getting the discrete GPU to itself.
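The remote-build scripts are nothing clever; here is a minimal sketch of the idea, assuming a hypothetical build host called workbox and a Gradle project (swap in whatever your build actually runs):

#!/usr/bin/bash
# Sync the working tree to the build box, then run the tests there over SSH.
HOST=workbox                                   # hypothetical hostname
rsync -a --delete --exclude .git ./ "$HOST":build/myproject/
ssh "$HOST" "cd build/myproject && ./gradlew test"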
I mean, seriously: for the common case of doing this with, say, an Intel CPU and an NVIDIA card, the performance is usually reported as only a little slower than bare metal, and you can run it all from a Linux base, meaning Windows is the guest.
You can expect a few wrinkles with Windows drivers refusing to function because they detect they are inside a VM (to stop you buying the cheaper gaming card to do compute), but once you fool them it runs great, and on this AMD Ryzen hybrid laptop I have seen a definite performance rise of at least 20%. This is definitely for those of you who, like me, have a laptop not for travel but because it was the cheapest way to get an adequate rig for work and play at the lower price point.
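For what it's worth, the widely documented trick (at least for the classic NVIDIA "Code 43" case) is hiding the hypervisor from the guest in the libvirt domain XML; I'm not claiming this exact snippet is all an AMD setup needs, but it's the usual starting point. Assuming a guest named win10 (a placeholder), run virsh edit win10 and add inside <features>:

<hyperv>
  <vendor_id state='on' value='whatever'/>
</hyperv>
<kvm>
  <hidden state='on'/>
</kvm>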
One thing that I think helps is a bunch of kernel tweaks for memory and scheduler behaviour around the PCI passthrough; even with Windows none the wiser, the host's memory management and I/O scheduling still sit underneath the VM, so Linux-side tuning pays off. This is the set I use:
#!/usr/bin/bash
# Memory management: no proactive compaction, a large free reserve, multi-gen LRU on, no zone reclaim.
echo 0 > /proc/sys/vm/compaction_proactiveness
echo 1048576 > /proc/sys/vm/min_free_kbytes
echo 5 > /sys/kernel/mm/lru_gen/enabled
echo 0 > /proc/sys/vm/zone_reclaim_mode
# Transparent hugepages off across the board.
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/shmem_enabled
echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
# Scheduler knobs (the last few live under debugfs on newer kernels).
echo 0 > /proc/sys/kernel/sched_child_runs_first
echo 1 > /proc/sys/kernel/sched_autogroup_enabled
echo 500 > /proc/sys/kernel/sched_cfs_bandwidth_slice_us
echo 1000000 > /sys/kernel/debug/sched/latency_ns
echo 500000 > /sys/kernel/debug/sched/migration_cost_ns
echo 500000 > /sys/kernel/debug/sched/min_granularity_ns
echo 0 > /sys/kernel/debug/sched/wakeup_granularity_ns
echo 8 > /sys/kernel/debug/sched/nr_migrate
I forget where I gathered them from, but they definitely help.
I don't know exactly what the standard hybrid graphics management is doing, but it sure seems like it could be done better. Really, you'd think the GPU could dump frames straight into the memory that serves as the buffer for the next frame shown on the display, but there seem to be steps in between, at least one extra blit beyond just copying the frame over the PCI Express bus.
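If you're curious what the offload path looks like before you bypass it, you can poke at it from the host (glxinfo ships in mesa-utils on many distributions):

xrandr --listproviders                          # lists both GPUs and their source/sink capabilities
glxinfo | grep "OpenGL renderer"                # default renderer, normally the integrated GPU
DRI_PRIME=1 glxinfo | grep "OpenGL renderer"    # PRIME render offload onto the discrete GPU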
The only thing that would make this better is if there were an easy VM image I could just download and run that exposes a remote, accelerated X server, plus a simple script to configure my base system: create the fake display and wire up the shared memory so all the windows get painted by the dGPU.
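The shared-memory piece of that wish is at least possible to plumb today: libvirt can expose an ivshmem region to the guest that the host can map as well, which is the usual basis for shuttling frames out of a passthrough guest. A minimal sketch of the device definition, assuming a 32 MB region named frames (both the name and the size are placeholders; the size depends on resolution), added inside <devices>:

<shmem name='frames'>
  <model type='ivshmem-plain'/>
  <size unit='M'>32</size>
</shmem>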
The other thing this has opened me up to is the idea of compartmentalising my applications inside VMs and containers. A bit like Qubes does, but not quite so all-singing-all-dancing: deploy the app in a container and connect it to another container that controls a display or window for showing X apps...
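The simplest version of that already half works with plain containers sharing the X socket. A rough sketch with podman, assuming a hypothetical image called some-x-app and the X server living on the host (or in whichever container owns the display):

xhost +local:                        # coarsely allow local connections to the X server
podman run --rm \
  -e DISPLAY="$DISPLAY" \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  some-x-app                         # hypothetical image that starts an X client
# On SELinux hosts you may also need --security-opt label=disable for the socket mount.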
Any virtualisation experts or junkies with 2 cents to drop in the hat about their experiences with VMs and tools and distributions, please let us know your own pro tips!