Good summary. Looking forward to seeing more experimentation around reward policies too.