A few weeks ago Charlie Holtz (@charliebholtz) showed a pretty cool demo that used GPT-4’s multimodal abilities to caption and narrate a screenshot of you. The narration was then passed to ElevenLabs, whose voice cloning service spoke it in the voice of David Attenborough.
This demo required you to run the code on your own machine using Python. I wanted to see if it could be done in the browser instead, since all the generative AI capabilities were accessed through APIs.
All that was needed was the ability to capture a stream from a user’s webcam and take a snapshot at the required intervals. This is fairly trivial in the browser these days.
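To make the capture step concrete, here is a minimal sketch of how it could look using the standard `getUserMedia` and canvas APIs. This is not the code from the original demo: the function names (`startWebcam`, `takeSnapshot`, `captureEvery`, `dataUrlToBase64`), the JPEG quality of 0.8, and the idea of handing each frame to a callback are all assumptions made for illustration.

```typescript
// Hypothetical helpers for illustration; none of these names come from the original demo.

// Ask for webcam access and attach the resulting stream to a <video> element.
function startWebcam(video: HTMLVideoElement): Promise<MediaStream> {
  return navigator.mediaDevices
    .getUserMedia({ video: true, audio: false })
    .then((stream) => {
      video.srcObject = stream;
      // play() resolves once playback has started, so frames are available to draw.
      return video.play().then(() => stream);
    });
}

// Draw the current video frame onto a canvas and return it as a JPEG data URL.
function takeSnapshot(video: HTMLVideoElement, canvas: HTMLCanvasElement): string {
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext("2d");
  if (ctx === null) {
    throw new Error("2D canvas context is not available");
  }
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  return canvas.toDataURL("image/jpeg", 0.8); // 0.8 quality is an arbitrary choice
}

// Strip the "data:image/jpeg;base64," prefix, leaving only the base64 payload.
function dataUrlToBase64(dataUrl: string): string {
  const comma = dataUrl.indexOf(",");
  return comma === -1 ? dataUrl : dataUrl.slice(comma + 1);
}

// Take a snapshot every intervalMs milliseconds and pass it to onFrame.
// Returns a function that stops the capture loop.
function captureEvery(
  video: HTMLVideoElement,
  canvas: HTMLCanvasElement,
  intervalMs: number,
  onFrame: (base64Jpeg: string) => void
): () => void {
  const timer = setInterval(() => {
    // Skip ticks where the video has no decoded frame yet.
    if (video.readyState >= video.HAVE_CURRENT_DATA) {
      onFrame(dataUrlToBase64(takeSnapshot(video, canvas)));
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

In a page with a `<video>` and a `<canvas>` element, you would call `startWebcam(video)` and, once it resolves, something like `captureEvery(video, canvas, 5000, sendToApi)`, where `sendToApi` is a placeholder for whatever forwards the image to the captioning API. Note that browsers only expose `getUserMedia` in secure contexts (HTTPS or localhost), and the user has to grant camera permission.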