Cartoon Scenes to 3D Models: This AI is Controversial
I came across Toon3D, an AI system that turns hand-drawn cartoon scenes into 3D models.
My YouTube video: https://www.youtube.com/watch?v=Rt4D5RtnbFE&list=PLowcSKlQikQF3aty50SatxA6sBuaJIkux&index=15
The project page: https://toon3d.studio
Motivation
Why is it impressive? The challenge is that a 3D object in a cartoon never actually exists. Artists have to imagine how it looks from different angles, so drawings of the same object can be inconsistent with one another.
This is why the authors built Toon3D. They want to correct these inconsistencies in 2D drawings and recover a plausible 3D structure. It lets us experience cartoon scenes from entirely new viewpoints.
What do I think of the work? Well, first let’s talk about the good stuff. I think it’s an interesting research effort.
It could potentially help in the pre-production stage of making cartoons. Say an artist draws a house or a car from a single angle and then wants the scene from a different angle. They can use Toon3D to generate a reference and draw on top of it.
Method
Before I share more about its limitations, let’s first take a closer look at their research methods. Don’t worry, I’ll try to make it easy to understand.
In general, here is how 3D reconstruction works. Given a few photos of a REAL object taken from different angles, we can use NeRF or Gaussian Splatting to combine the information into a 3D model. This requires 3D consistency: the object must keep its shape and color as the camera moves around.
Unfortunately, 3D consistency is almost impossible for hand-drawn images. So is there a way to associate parts across different frames, to tell the AI that two points or objects are the same thing? For this purpose, the authors developed a data labeling tool.
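To make "3D consistency" a bit more concrete, here is a minimal sketch in Python. It is not taken from Toon3D, NeRF, or Gaussian Splatting. The `Camera` class, the `project` and `reprojection_error` helpers, and all the numbers are assumptions made up for illustration, using a simple pinhole camera that only slides sideways. The idea: if one 3D point is seen by two cameras, geometry dictates where it must appear in each image. A photo satisfies this automatically, while a hand-drawn frame can be off by several pixels.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical pinhole camera looking down +z, with no rotation."""
    focal: float     # focal length in pixels
    cx: float        # principal point, x
    cy: float        # principal point, y
    tx: float = 0.0  # camera position along the x axis


def project(cam, point):
    """Project a 3D point (x, y, z) to pixel coordinates (u, v)."""
    x, y, z = point
    u = cam.focal * (x - cam.tx) / z + cam.cx
    v = cam.focal * y / z + cam.cy
    return (u, v)


def reprojection_error(cam, point, observed):
    """Pixel distance between where the point should be and where it was drawn."""
    u, v = project(cam, point)
    return ((u - observed[0]) ** 2 + (v - observed[1]) ** 2) ** 0.5


# One 3D point seen by two cameras that are one unit apart.
left = Camera(focal=100.0, cx=50.0, cy=50.0, tx=0.0)
right = Camera(focal=100.0, cx=50.0, cy=50.0, tx=1.0)
point = (0.0, 0.0, 4.0)

# A 3D-consistent image places the point exactly at the projection.
consistent_error = reprojection_error(right, point, project(right, point))

# A hand-drawn frame might place it a few pixels away (made-up observation).
drawn_error = reprojection_error(right, point, (31.0, 53.0))
```

With real photos the error stays near zero for every camera, which is what reconstruction methods rely on. With drawings, errors like `drawn_error` show up everywhere, and that is the gap the labeled correspondences are meant to help close.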
Then, the authors built their algorithm on top of COLMAP, a popular Structure-from-Motion and Multi-View Stereo pipeline for 3D reconstruction. They claim the original COLMAP fails to reconstruct cartoon scenes.
Well, if I remember correctly, COLMAP performs well on 3D reconstruction for Sora videos. Those videos are generated by OpenAI’s text-to-video model and are not necessarily 3D consistent, yet COLMAP can tolerate the inconsistency to some extent. I guess hand-drawn cartoons are just a lot harder to reconstruct: they deviate so much that existing methods can’t fix them anymore.
Therefore, the authors modified the pipeline. The NEW version has three steps:
- Step 1: use the manually labeled correspondences to align sparse points in 3D and estimate camera poses.
- Step 2: adjust the RGB images and depth maps to produce a consistent, aligned point cloud.
- Step 3: initialize a 3D Gaussian Splatting representation from that point cloud, creating an immersive visual experience.
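The alignment idea behind the steps above can be loosely sketched in Python. This is not the authors' implementation. It assumes a simple pinhole camera, uses made-up function names (`backproject`, `align_translation`, `mean_residual`), and only solves for a translation between two sets of labeled points. A real pipeline would also have to handle rotation, depth scale, and the image warping mentioned in Step 2.

```python
def backproject(u, v, depth, focal, cx, cy):
    """Lift a labeled pixel (u, v) with a depth value to a 3D point (pinhole model)."""
    x = (u - cx) * depth / focal
    y = (v - cy) * depth / focal
    return (x, y, depth)


def centroid(points):
    """Average position of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def align_translation(src, dst):
    """Translation that moves the centroid of src onto the centroid of dst."""
    cs, cd = centroid(src), centroid(dst)
    return tuple(cd[i] - cs[i] for i in range(3))


def mean_residual(src, dst, t):
    """Average distance between translated src points and their dst partners."""
    total = 0.0
    for s, d in zip(src, dst):
        total += sum((s[i] + t[i] - d[i]) ** 2 for i in range(3)) ** 0.5
    return total / len(src)


# Hypothetical labeled correspondences: the same three scene points,
# back-projected from two different drawings.
points_a = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]
points_b = [(0.5, -0.25, 3.0), (1.5, -0.25, 3.0), (0.5, 0.75, 4.0)]

t = align_translation(points_a, points_b)
residual = mean_residual(points_a, points_b, t)
```

In this toy case the two point sets differ by a pure shift, so the residual is essentially zero. With hand-drawn frames a residual would remain after alignment, which is one way to see why the images and depth maps need adjusting before they can form a clean point cloud.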
Criticism
This research has sparked a lot of discussion, including criticism.
- Many people feel the current results look quite poor, with lots of artifacts, noise, and geometric inconsistencies.
- Some also wonder why we need 3D reconstruction for cartoons at all: “…artists stylize 3D scenes to emphasize things for aesthetic reasons. This is especially true for something surreal like SpongeBob. The artists are trying to make things look good, not realistic. And they aren’t trying to make humans reconstruct a perfect 3D image — they are trying to evoke our 3D imaginations.”
- What if PRECISE 3D structures are needed for a cartoon? Remember that Toon3D requires tons of manual labeling of corresponding components. With the same amount of effort, direct 3D modeling is likely easier and gives better results.
Despite the criticism and limitations, I still think it’s an interesting piece of research. It shows how far the technology has come and hints at future possibilities, which is why I wanted to share it with you.
Concerns
You may wonder: what if AI improves in the future and creates better cartoons than humans? Here’s a 360° video handmade by artists from the SpongeBob team. What if someday an AI can create 3D cartoon scenes like this?
Honestly, I don’t think AI will ever replace humans in making cartoons. That is not the intent of this research paper, and its results even serve as reassurance. Toon3D shows that artistic expression is too rich and subtle to be captured by a direct, precise 3D representation of the real world.
I’m personally a fan of Mr. Miyazaki from Studio Ghibli. When somebody showed him a clip of AI-generated animation, he looked so upset.
In his heart, and in the hearts of many other passionate people, human creativity is the most precious thing, something no machine can possess. I hope we find a way for AI not to replace us, but to help us fulfill our potential.