@ryanramage I'm giving the agent a tool call to use LLaVA to "see". I'm going to allow it access to the screen, an image file, or the camera.
I want a flow along the lines of:
- "Hey, save this text as a reminder for later"
- tool('see', 'extract the text from the image') => take a pic and run it through LLaVA
- tool('save', "{summarized text}", ["reminder"]) => save to local database for later
- response: "Saved!"
- "What was the last reminder you saved?"
- tool('load', 'reminder', {limit: 1})
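Rough sketch of that flow in Python, just to make the shape concrete. Everything here is an assumption for illustration, not how mind-goblin actually does it: the `tool` dispatcher, the `notes` table, and the function signatures are made up, the LLaVA call is stubbed with a canned string, and an in-memory SQLite database stands in for the local database.

```python
import json
import sqlite3

# Hypothetical storage: in-memory SQLite standing in for the "local database".
# Passing a file path instead of ":memory:" would keep notes across runs.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE notes (id INTEGER PRIMARY KEY AUTOINCREMENT, text TEXT, tags TEXT)"
)


def see(prompt):
    # Stub: a real version would grab the screen, an image file, or the camera
    # and send the image plus the prompt to LLaVA. Here it returns canned text.
    return "Buy oat milk on Friday"


def save(text, tags):
    # Tags are stored as a JSON array so a note can carry several of them.
    db.execute("INSERT INTO notes (text, tags) VALUES (?, ?)", (text, json.dumps(tags)))
    db.commit()
    return "Saved!"


def load(tag, limit=1):
    # Newest first, filtered by tag in Python to keep the SQL trivial.
    rows = db.execute("SELECT text, tags FROM notes ORDER BY id DESC").fetchall()
    matches = [text for text, tags in rows if tag in json.loads(tags)]
    return matches[:limit]


# Hypothetical registry the agent's tool calls get dispatched through.
TOOLS = {"see": see, "save": save, "load": load}


def tool(name, *args, **kwargs):
    return TOOLS[name](*args, **kwargs)


# "Hey, save this text as a reminder for later"
text = tool("see", "extract the text from the image")
print(tool("save", text, ["reminder"]))

# "What was the last reminder you saved?"
print(tool("load", "reminder", limit=1))
```

The summarizing step between `see` and `save` is skipped here; in the real flow the agent would condense the extracted text before saving it.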
@mauve cool project. Would like to follow along
@ryanramage yeah feel free to follow the main repo: https://github.com/RangerMauve/mind-goblin
Gonna push my latest version in the next week or so.
@ryanramage All within a few seconds, with as little power or RAM usage as possible.