Love seeing this area being developed. If anything, I think this is too ambitious. If you focus on professional VFX artists and follow their day-to-day workflows, it's quite evident that there is a lot of room to improve just the basic tools.
So far my feeling with deep learning tools is that they are too heavy-handed to be usable. I'd just need small helper tools that I could integrate into existing software. No need to try to do everything for me; just make the existing stuff "smarter".
A keyer that actually recognizes the actor against the green screen. A color corrector that suggests default settings to match two elements together. A paint tool that fills an area based on its surroundings, etc. All of this, but in a way that lets me take the result and continue from there.
Or maybe I'm seeing a car for the first time and keep hoping for a faster horse.
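To make the color-matching idea concrete, here's roughly the kind of small helper I mean - a minimal sketch of Reinhard-style statistics transfer in LAB space (OpenCV and NumPy assumed; a real tool would expose the result as adjustable defaults rather than a final grade):

```python
import cv2
import numpy as np

def match_color(fg_bgr, bg_bgr):
    """Shift the foreground's per-channel LAB mean/std toward the
    background's - a crude stand-in for 'default settings that match
    two elements'. An artist would treat this as a starting point."""
    fg = cv2.cvtColor(fg_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    bg = cv2.cvtColor(bg_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    for c in range(3):  # L, a, b channels
        f_mean, f_std = fg[..., c].mean(), fg[..., c].std()
        b_mean, b_std = bg[..., c].mean(), bg[..., c].std()
        fg[..., c] = (fg[..., c] - f_mean) * (b_std / max(f_std, 1e-6)) + b_mean
    out = np.clip(fg, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)
```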
Very well worded, and it helped me clarify my thinking on these models - thanks! Definitely agree, though I don't have experience in this particular domain. "Replace humans entirely" seems like a foolish goal for professional art: not only would audiences have an aversion, but these generative models by definition aren't deterministic/symbolic, so you're going to end up fixing small details by hand anyway.
All of the examples are extremely easy shots. Anything complicated is likely still going to need real mocap-type work.
Also look at the faces of all the characters - they have very little facial movement. Lip sync is much easier to visually match than something like subtle facial expressions.
What this may do well is automate the low-hanging fruit (easy shots) and provide a nice starting point. It also might very well end up being mostly a temp tool before a final high-quality pass by a competent artist (and temp VFX is a whole sub-industry that is quite important!).
These are the videos made to show the promise of a future enabled by investments in advanced R&D today.
Such concept videos are usually made for clients, exec presentations, or trade shows. E.g. a healthcare company makes a video of 'Hospitals 2050' showing a sci-fi environment with nurse robots, transparent monitors, etc., and how it would still be a leading player then because it is investing in some 'xyz advanced med-tech' today.
Judging any "AI will disrupt X" claim based on what you're being shown right now misses the point of the claim. Is there enough here to cause disruption with continued development? Yeah, the faces are bad - but have you seen how much the state of the art has improved in only two years? How much longer do you think those faces are going to keep looking bad? Two major versions? One? Not even one?
When it comes to graphics AI, don't look at right now; look at the trajectory. Is this something final, or is this just the minimum viable release that will be the starting point for (radical) improvement?
Are you sure? Facial mocap (motion capture) and human dialogue are pretty hard to get right in animation, much harder than body mocap. I agree with other comments saying this is something of an advance over existing body animation tools, but it doesn't show how it solves face mocap.
For the current tech Wonder Studio is competing with, you'd want to compare against Rokoko AI MoCap [1] and something like Plask [2], but I don't see anything that solves face mocap this easily just yet.
You're not seeing it as a commercial product because facial mocap solutions are already extremely widespread in-house. Back in the '90s I remember software like Softimage's Face Robot having wide use at animation studios, with that procedural soft-body style popular because it allowed easy transitions between mocap, animator, and procedural controls. About 15 years ago FaceShift inspired many VFX houses to start their own ML groups to build facial-capture pipelines. The human face has had the most attention of anything, and because of that, the in-house facial animation pipelines are very hard to duplicate as commercial products: they are sophisticated enough to make less sophisticated methods look pale where it matters, in the facial performance.
You guys say "no way it's real". I actually don't see what's difficult here (it's a big shortcut). It's like motion capture, but the AI finds and tracks the landmarks, the "ping pong balls", on the actor for you. To be clear, the AI isn't rendering; the engine renders the character the AI helped to pose. The AI essentially just says where the bones are.
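The landmark step really is off-the-shelf at this point. A minimal sketch using MediaPipe's pose model as the detector (what Wonder Studio actually runs is unknown, and the input file name here is just a placeholder):

```python
import cv2
import mediapipe as mp

# MediaPipe Pose stands in for whatever landmark detector they use.
pose = mp.solutions.pose.Pose(static_image_mode=False)

cap = cv2.VideoCapture("performance.mp4")  # placeholder input clip
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # 33 body landmarks per frame - the virtual "ping pong balls"
    # that would drive the bones of a character rig.
    result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.pose_landmarks:
        for lm in result.pose_landmarks.landmark:
            print(lm.x, lm.y, lm.z, lm.visibility)
cap.release()
pose.close()
```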
Former VFX developer/artist/analyst & VFX patent holder here. This is a combination of known techniques integrated with a nice UI. I expect current VFX studios are already past this point in their internal tools.
Posted this in another comment, but to re-word it: yeah, this would be similar to something like Rokoko AI MoCap [1] and Plask [2], which are very close to this in terms of motion capture (mocap), but Wonder Studio seems to have added awareness of lighting and other features on top of basic mocap.
Looks cool and would be a neat tool for independent filmmakers on small budgets or creators on things like YouTube. But I don't see how consolidating motion tracking, rotoscoping, animation, lighting, compositing, etc. into a single web app (!) would really help any serious VFX studio.
Seems like it'd be better as a set of distinct tools: e.g. a camera motion tracker, an actor motion tracker, a clean plate generator, an HDRI generator, etc. That way it would be easier for a studio to add them to their existing VFX pipeline.
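The clean plate generator in particular is well understood on its own. A minimal sketch of the classic median-stack approach, assuming a locked-off (static camera) shot and placeholder file names; a real tool would also have to handle moving cameras:

```python
import cv2
import numpy as np

def clean_plate(video_path, max_frames=200):
    """Median-stack a locked-off shot: anything that moves (the actor)
    gets voted out per pixel, approximating the empty background."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    # Per-pixel median across time removes transient foreground.
    return np.median(np.stack(frames), axis=0).astype(np.uint8)

cv2.imwrite("clean_plate.png", clean_plate("shot.mp4"))  # placeholder paths
```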
I'll believe it when I see it - that is, when the results are verified by third parties. Right now it's just a company pitch with some cherry-picked results put together for a demo.
Seems way too good to be true, but I guess we'll see. IF it does end up working as advertised, I can see a very bright future for indie filmmaking, as well as a lot of shovelware-style CGI-heavy movies and shows due to the lower barrier to entry - not that that's a bad thing.
1. You can only choose from pre-built characters (the poster-style thumbnails of the characters imply something different)
2. I don't think you can actually move the camera (or only wiggle it a bit)
They just carefully chained a bunch of things (a sketch of the removal step follows the list):
- segmentation to isolate the character
- inpainting for character removal, plus a 360° HDR photo sphere for lighting
- pose estimation (and something equivalent for the face, though that felt a bit weak)
- camera position estimated from tracking points on the floor
- render a hand-built 3D model from the poses, lit with the light sphere
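The removal step, for instance, is just a mask fed into an inpainter. A minimal sketch, assuming the binary person mask comes from some segmentation model (step one above) and using OpenCV's Telea inpainting - far cruder than whatever they actually run, but the same shape of pipeline:

```python
import cv2
import numpy as np

def remove_character(frame_bgr, person_mask):
    """Paint out the actor given a binary segmentation mask.

    person_mask: uint8 array, 255 where the actor is; assumed to come
    from a separate segmentation model."""
    # Dilate so stray hairs and soft edges fall inside the inpaint
    # region - exactly the failure mode visible in their last shot.
    mask = cv2.dilate(person_mask, np.ones((15, 15), np.uint8))
    # Fill the hole from surrounding pixels.
    return cv2.inpaint(frame_bgr, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
```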
So the shortcomings for now are:
- you can only use pre-built humanoid 3D models
- you still need to film it with real actors
- face tracking doesn't seem that good yet
- lighting is semi-convincing
- no interaction with the environment (not even shadows)
- some visual artifacts from the character removal (I think mostly segmentation failing and hair going out of bounds; you can see it in the last shot)
At 00:37 the robot's hand is way thinner than the original actor's, yet the missing parts of the background are perfectly reconstructed. I realize Photoshop could do this a few years ago, but here it looks a bit too flawless.
Perhaps I've misunderstood, but it looks to me like the actors are green-screened into a [CG] scene. I don't think the scene is real actors in a real setting, later modified to replace the actors with characters. In that case there's no problem with the in-painting: the whole actor is on a different layer?
I'm seeing it on my phone. Like I said, maybe I misunderstood, but the scenes didn't seem like real places (e.g. shots cropped so foot shadows aren't included, since those are hard to get right in the composite).
If you look closely at the "Clean Plate" demo right at the end, there is a shimmering where the character was removed. This isn't a criticism of the product - I certainly can't do better myself, and they may be pushing the limits of what is possible right now, so kudos, it's amazing stuff. But it's not perfect, and perhaps it just has to be "good enough" for the target market (which, based on previous comments, is not big VFX houses with the resources to build their own internal tools).