Another scenario is that if you open the domain in a browser, they do a 301 redirect, but for traffic coming from Google or other search engines, they serve their actual content.
I would check out React Three Fiber if you want to see how people are building things like this. It essentially brings a component model to Three.js development and creates good standards for shareable code, since things are just React hooks.
Rapier was brand new when I was making things in R3F 2 years ago. Glad to see how mature it’s gotten!
For such models, is it possible to fine-tune with multiple images of the main actor?
Sorry if this question sounds dumb, but I am comparing it with regular image models, where the more images you have, the better the outputs you can generate from the model.
It is possible to fine-tune the model with videos of a specific actor, but not images. You need videos to train the model.
We actually did this in early overfitting experiments (to confirm our code worked!), and it worked surprisingly well. This is exciting to us, because it means we can have actor-specific models that learn the idiosyncratic gestures of a particular person.
No, not related. We just took some of Loopy's demo images + audios since they came out 2 days ago and people were aware of them. We want to do an explicit side-by-side at some point, but in the meantime people can make their own comparisons, i.e. compare how the two models perform on the same inputs.
Loopy is a Unet-based diffusion model, ours is a diffusion transformer. This is our own custom foundation model we've trained.
This took me a minute - your output demos are your own, but you included some of their inputs, to make for an easy comparison? Definitely thought you copied their outputs at first and was baffled.
Exactly. Most talking avatar papers re-use each other's images + audio in their demo clips. It's just a thing everyone does... we never thought people would take it to mean we didn't train our own model!
Anyone who wants to can re-make all the videos themselves with our model by extracting the 1st frame and the audio.
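If it helps, here is a minimal sketch of that extraction with ffmpeg (this is not our tooling, and the file names are just placeholders):

```python
# Sketch: pull the first frame and the audio track out of a demo clip with ffmpeg,
# so the same inputs can be fed back into the model for a side-by-side comparison.
import subprocess

def extract_inputs(video_path: str,
                   frame_out: str = "first_frame.png",
                   audio_out: str = "audio.wav") -> None:
    # Grab only the first video frame.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-frames:v", "1", frame_out],
        check=True,
    )
    # Drop the video stream and decode the audio to WAV.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", audio_out],
        check=True,
    )

extract_inputs("demo_clip.mp4")  # placeholder file name
```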
Yes, exactly! We just wanted to make it easy to compare. We also used some inputs from other famous research papers for comparison (EMO and VASA). But all videos we show on our website/blog are our own. We don't host videos from any other model on our website.
Also, Loopy is not available yet (they just published the research paper). But you can try our model today, and see if it lives up to the examples : )
Examples are very impressive, here's hoping we get an implementation of it on huggingface soon so we can try it out, and even potentially self-host it later.
I know these guys in real life, they've been working on this for months and, unlike the ByteDance paper, have actually shipped something you can try yourself.
Our transformer model was trained to generate videos that are up to 8s in length. However, we can make longer videos by using it in an autoregressive manner, taking the last N frames of output i to seed output (i+1). It is important to use more than just 1 frame; otherwise, the direction of movement can suddenly change, which looks very uncanny. Admittedly, the autoregressive approach tends to accumulate errors with each generation.
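As a rough sketch of that windowed rollout (generate_chunk, the audio slicing, and the window size are placeholders, not our actual code):

```python
# Sketch of autoregressive long-video generation: each chunk is seeded with the
# last N frames of the previous chunk so motion stays continuous across chunks.
N_SEED_FRAMES = 8      # placeholder; using more than 1 frame avoids sudden motion reversals
CHUNK_SECONDS = 8      # one forward pass covers up to ~8 s

def generate_long_video(first_frame, audio, total_seconds, generate_chunk):
    frames = [first_frame]
    seconds_done = 0
    while seconds_done < total_seconds:
        seed = frames[-N_SEED_FRAMES:]  # last N frames of the previous output
        audio_window = audio.slice(seconds_done, seconds_done + CHUNK_SECONDS)  # placeholder API
        chunk = generate_chunk(seed_frames=seed, audio=audio_window)  # one forward pass
        frames.extend(chunk)            # errors accumulate chunk by chunk
        seconds_done += CHUNK_SECONDS
    return frames
```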
It is also possible to fine-tune the model so that single generations (one forward pass of the model) are longer than 8s, and we plan to do this. In practice, it just means our batch sizes have to be smaller when training.
Right now, we've limited the public tool to only allow videos up to 30s in length, if that is what you were asking.
Video compression algorithms use key frames. So can’t you do the same thing? Essentially, generate five seconds. Then pull out the last frame. Use some other AI model to enhance it (upscale, consistency with the original character, etc.). Then use that as the input for the next five seconds?
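Roughly, the loop I have in mind looks like this (generate_segment and enhance_frame are hypothetical stand-ins, not any specific tool):

```python
# Sketch of the proposed keyframe approach: generate a few seconds, clean up the
# last frame with a separate enhancement model, then seed the next segment with it.
def keyframe_loop(start_frame, audio, total_seconds,
                  generate_segment, enhance_frame, segment_seconds=5):
    frames = []
    seed = start_frame
    for t in range(0, total_seconds, segment_seconds):
        segment = generate_segment(seed, audio[t:t + segment_seconds])  # placeholder slicing
        frames.extend(segment)
        # Re-anchor the next segment on a cleaned-up keyframe so identity drift
        # and upscaling artifacts don't compound across segments.
        seed = enhance_frame(segment[-1], reference=start_frame)
    return frames
```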
This is a good idea. We have discussed incorporating an additional "identity" signal to the conditioning, but simply enforcing consistency with the original character as a post-processing step would be a lot easier to try. Are there any tools you know of that do that?