100 sats \ 1 reply \ @aljaz OP 8h \ parent \ on: dawn of the token based economy econ
have you looked into combining large action models with llms for orchestration?
there was a research paper explaining the methodology of using large-context frontier models for planning and smaller locally hosted models for execution. it was also used for working with sensitive data, since no data was shared with the cloud-based model, only processed by the local ones. but i can't find it now...
I've been looking into something like that: I envision a work queue where:
- I spin up AWS Inf2 instances on demand, packed with a large 100B+ instruct-tuned reasoning LLM or maybe a LAM (but I'd have to learn more about that first). These do decomposition, review, and maybe even prompt tuning?
- Local M4 box(es) then run smaller models like devstral or codellama for actual operations.
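A minimal sketch of that split, just to make the shape concrete: the function names (`plan_with_frontier_model`, `run_local_model`) and the task strings are hypothetical stand-ins, not real APIs — in practice the planner call would hit the remote instance and the executor would invoke a locally hosted model.

```python
import queue

def plan_with_frontier_model(task: str) -> list[str]:
    # Stand-in for the big cloud-hosted planner: it only ever sees the
    # high-level task description, never the sensitive payload.
    return [f"{task}: step {i}" for i in range(1, 4)]

def run_local_model(subtask: str, sensitive_data: str) -> str:
    # Stand-in for a small local model (devstral, codellama, etc.);
    # the sensitive data is only ever touched on this machine.
    return f"done: {subtask} (processed {len(sensitive_data)} bytes locally)"

def orchestrate(task: str, sensitive_data: str) -> list[str]:
    work: queue.Queue[str] = queue.Queue()
    for sub in plan_with_frontier_model(task):  # cloud: decomposition only
        work.put(sub)
    results = []
    while not work.empty():
        results.append(run_local_model(work.get(), sensitive_data))  # local: execution
    return results

print(orchestrate("refactor auth module", "secret-config"))
```

The point of the queue in the middle is that the planner and executors can scale independently — the Inf2 instance can be torn down once the plan is queued, while the local boxes drain the queue at their own pace.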