
Yes I would, but I don't think I'd like it in the way you're describing. Instead I'd want to augment specific processes: not give the model access to everything, but instead allow everything to access the model. This may feel counter-intuitive, but I see this as "multi-modal LLM" being a (permissioned) API with a service worker behind it, just like the camera or microphone.
For example:
  • Amber doesn't need an LLM, so it doesn't need the permission.
  • Obsidian could use an LLM, so it does need the permission, optionally, and when I enable it, it will use it.
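The per-app permission model above could be sketched as a broker that mediates model access the way an OS mediates camera access. This is a minimal illustration, not a real API: `LLMBroker`, `PermissionDenied`, and `run_model` are all hypothetical names, and the app names are taken from the examples above.

```python
class PermissionDenied(Exception):
    """Raised when an app without the LLM permission tries to query."""


def run_model(prompt: str) -> str:
    # Stand-in for the actual on-device model behind the service worker.
    return f"response to: {prompt}"


class LLMBroker:
    """Hypothetical OS-side broker: apps never touch the model directly."""

    def __init__(self) -> None:
        self._granted: set[str] = set()  # apps the user has granted access

    def grant(self, app: str) -> None:
        """User-side action: allow a specific app to reach the model."""
        self._granted.add(app)

    def query(self, app: str, prompt: str) -> str:
        """App-side call: refused unless the permission was granted."""
        if app not in self._granted:
            raise PermissionDenied(f"{app} has no LLM permission")
        return run_model(prompt)
```

So a query from Obsidian succeeds only after the user grants the permission, while one from Amber is refused because it never asked for it.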
This can then be extended to also have a knowledge cache in the same way, so that an app (not a centralized process) can submit new knowledge (for processing and then caching) and query it, much like your "second-brain" idea:
You took a picture of a menu and allowed the "knowledge" it contained to be added to that cache last year, and then when you take a picture of the menu for the same place this year, it will tell you that your fish taco is now only 3k sats instead of 10k, but you also get only 1 instead of 2 for that money.
not give the model access to everything, but instead, allow everything to access the model. This may feel counter-intuitive, but I see this as "multi-modal LLM" being a (permissioned) API with a service worker behind it
Interesting.
I mean, both approaches would be behind a "safe" API. But I hadn't thought about whether it would be nicer for the application to use the LLM as an API, or for the LLM to retrieve information through an API.
Is one of these inherently more powerful than the other? Is one inherently safer than the other? If so, which way and why?
38 sats \ 0 replies \ @optimism 9h
It feels to me like the llm-to-app interface is both more powerful and riskier than app-to-llm, but, app-to-llm is easier to both standardize and optimize. I think it really depends on what you want to achieve.
There was a nice post that came in via HN this Monday, #1057610, that basically argues that chatbot interfaces suck. I subscribe to that thought and feel that prompt writing equals inefficiency, but it's how LLMs are trained: to be a chatbot, a companion.
However, I believe, like the author of that article, that the better application of the technology is not interactive, but a background task incorporated into the process rather than running beside it.
If you want a chatbot, the mechanism I propose will probably hinder adoption, because it requires per-app adoption. It's always cheaper to just circumvent everything and not ask for permission, but then you will quickly run into shenanigans like #1052744. I'd really not want any unchecked capability that can do this on any of my devices, so the slower adoption is imho worth it. 1

Footnotes

  1. One of my favorite things nowadays is that I get a "DCL attempted by <bad app> and prevented" message from GrapheneOS, just like I've always loved SELinux, despite its complexity. It's always nice to have OS-level (and hardware) protections against naughty software.