
SCAI Autonomous VTuber
So, I've been given a task: work on a new feature for everyone, one that's hugely popular among streamers on every platform right now.
VTubing.
I touched on this a while back with some free assets from itch.io and managed to get a PNGTuber reacting to a bunch of test prompts, flicking between numerous different poses.
The project took a back seat while the team worked on other features we felt were more important to our users.
--
We revisited the idea, and I was given the task of reviving the old project.
However, this time it isn't tracking your webcam and/or your voice; it's autonomous. Initially I thought the job would be designing tracking software and the like, but the boss quickly corrected me on that.
So, in theory, it could support you while you stream. It could talk to your chat, and now it would have a face.
It hasn't been an easy task. It started with researching the idea and deciding whether we build our own Browser Source/application to run alongside your OBS, or whether we could use the APIs of current VTubing software.
Both have their pros and cons. Building our own Browser Source would mean we could do everything in house and rely entirely on our back end to support the new feature. However, it would also mean a scope of work I wasn't ready for, and a standalone application would mean outsourcing to a contractor, which might take a while.
Using the APIs of currently available software, on the other hand, means we don't need to do a lot of the hard work ourselves; we can write scripts that do what we need within the scope of the software already developed.
--
It was quite daunting at first. It's one thing to use a free asset to flick between poses; it's another to find a way to animate, or hook into animation software that can drive, a fully rigged and functioning 2D or 3D VTuber.
The initial options I found were VMagicMirror and VTube Studio. Both have their obstacles, but both are available for free, with VTube Studio on Steam and easy to use. VMagicMirror was suggested to me, and it supported the 3D Skye model we had commissioned a while back. However, it did not support API calls, so we couldn't use it.
VTube Studio was where I began, as it was the easiest 2D software to work with and highly regarded among the VTubers I found on YouTube and Twitch.
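
To give a concrete idea of what "using the API" means here: VTube Studio exposes a public WebSocket API (ws://localhost:8001 by default) that speaks JSON. A plugin first asks for an authentication token, which the streamer approves inside VTube Studio, then authenticates with that token whenever it connects. Below is a minimal sketch in Python using the websockets library; the plugin name and developer strings are placeholders I've made up for illustration.

```python
import asyncio
import json

import websockets  # pip install websockets

VTS_URL = "ws://localhost:8001"  # VTube Studio's default API port


def make_request(message_type, data=None):
    """Wrap a payload in VTube Studio's standard request envelope."""
    return json.dumps({
        "apiName": "VTubeStudioPublicAPI",
        "apiVersion": "1.0",
        "requestID": "scai-vtuber",  # arbitrary ID, echoed back in the response
        "messageType": message_type,
        "data": data or {},
    })


async def authenticate(ws):
    # Ask for a token; the streamer has to approve a popup in VTube Studio.
    await ws.send(make_request("AuthenticationTokenRequest", {
        "pluginName": "SCAI Autonomous VTuber",  # placeholder plugin name
        "pluginDeveloper": "SCAI",               # placeholder developer name
    }))
    token = json.loads(await ws.recv())["data"]["authenticationToken"]

    # Authenticate this session with the token.
    await ws.send(make_request("AuthenticationRequest", {
        "pluginName": "SCAI Autonomous VTuber",
        "pluginDeveloper": "SCAI",
        "authenticationToken": token,
    }))
    print(json.loads(await ws.recv())["data"])  # "authenticated": true if accepted


async def main():
    async with websockets.connect(VTS_URL) as ws:
        await authenticate(ws)


asyncio.run(main())
```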
As I got started, I had to think about mouth movement, blinking, and hand movement for full-body VTubers. It wasn't as simple as jagged frames between different .png poses. So my first focus was the mouth: making it move convincingly was important, since we want the model to come across as real as possible. It needs to emote, it needs to express, and it needs to look like it's talking.
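
For the mouth specifically, VTube Studio's API lets a plugin inject values into the model's tracking parameters with an InjectParameterDataRequest, and MouthOpen is one of the default input parameters. A crude but serviceable approach is to map the loudness of whatever audio the VTuber is "speaking" onto MouthOpen, frame by frame. This sketch assumes the authenticated ws connection from the previous snippet and fakes the loudness with a sine wave in place of real TTS amplitude.

```python
import asyncio
import json
import math
import time


async def flap_mouth(ws, seconds=3.0, fps=30):
    """Drive the model's MouthOpen parameter with a fake loudness curve.

    In the real prototype this value would come from the amplitude of the
    TTS audio; a sine wave stands in here so the sketch is self-contained.
    """
    start = time.monotonic()
    while (elapsed := time.monotonic() - start) < seconds:
        loudness = abs(math.sin(elapsed * 8.0))  # fake amplitude in [0, 1]
        await ws.send(json.dumps({
            "apiName": "VTubeStudioPublicAPI",
            "apiVersion": "1.0",
            "requestID": "scai-mouth",
            "messageType": "InjectParameterDataRequest",
            "data": {
                "mode": "set",  # overwrite the tracked value this frame
                "parameterValues": [{"id": "MouthOpen", "value": loudness}],
            },
        }))
        await ws.recv()  # consume the acknowledgement
        await asyncio.sleep(1 / fps)
```

Injected values revert to the model's tracked defaults if they aren't refreshed for a moment, which is why the loop keeps sending frames rather than setting MouthOpen once. Calling await flap_mouth(ws) after authenticate(ws) from the previous sketch is enough to see the model's mouth move.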
After a bit of digging and some prototyping, I managed to put together a proof of concept, enough to show that we should initially run with the API calls, using software such as VTube Studio, and build from there.