> Basically the focus of this project is its ability to know which part of an image you're referring to. They show a few examples, but the reference can be a single point, a box, or a freeform trace around the object.

> It then uses the model to convert (ground) that reference into a specific part of the image; they visualise this as a bounding box around it.

> Currently you'd have to ask "What's that thing on the table?" or trust a service like ChatGPT to correctly understand the part of the image you've circled. This is meant to make that way more accurate.

https://www.reddit.com/r/apple/comments/18pa866/apple_ferret_llm/
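
To make the three kinds of reference in the quote concrete, here is a rough Python sketch of how a point, a box, and a freeform trace might be represented, plus a purely geometric helper that draws an enclosing box around each. All names here (`PointRef`, `BoxRef`, `TraceRef`, `enclosing_box`) are hypothetical and are not Ferret's API. The helper is also not the model's grounding step: in the project the model itself decides which part of the image is meant, whereas this only shows the shape of the inputs and of the box used for visualisation.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Point = Tuple[float, float]


@dataclass
class PointRef:
    """A single clicked point (hypothetical type, not from Ferret)."""
    x: float
    y: float


@dataclass
class BoxRef:
    """An axis-aligned box given by its two corners."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass
class TraceRef:
    """A freeform trace: the points the user drew around the object."""
    points: List[Point]


Region = Union[PointRef, BoxRef, TraceRef]


def enclosing_box(region: Region) -> BoxRef:
    """Return the smallest axis-aligned box containing the reference.

    This is plain geometry for display purposes, not model grounding.
    """
    if isinstance(region, PointRef):
        # A point collapses to a zero-area box.
        return BoxRef(region.x, region.y, region.x, region.y)
    if isinstance(region, BoxRef):
        return region
    if not region.points:
        raise ValueError("trace has no points")
    xs = [p[0] for p in region.points]
    ys = [p[1] for p in region.points]
    return BoxRef(min(xs), min(ys), max(xs), max(ys))
```

For example, `enclosing_box(TraceRef([(1, 2), (4, 0), (3, 5)]))` gives a box spanning x from 1 to 4 and y from 0 to 5.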