If you haven’t seen the cranium-scorching rendering research of Kevin Karsch, watch it, we’ll wait… Now that your brain is charred and crumbling like so much burnt popcorn, you can well understand why, after seeing this video, we had to get the low-down from Kevin himself. So… we blindfolded him, spun him around, drugged him, packed him in a shipping container, and shipped him off to SolidSmack HQ for questioning (not true). While image-based modeling isn’t unheard of, Kevin and team are approaching new ways of inserting 3D objects into 2D scenes, and as Kevin explains, images are just the beginning.

Creating a Scene

Kevin Karsch is currently a PhD candidate at the University of Illinois and will be presenting the research this December at SIGGRAPH Asia 2011. As the abstract states, this is “a method to realistically insert synthetic objects into existing photographs without requiring access to the scene or any additional scene measurements.” While some would chalk this up to simple photo-editing, the technology and the possibilities extend far beyond.

SolidSmack: What’s your background, and how did you start writing graphics software?

Kevin Karsch: I think I wrote my first graphics program in a high school course; it was an embarrassingly simple variant of Pong :). This course actually led me to pursue computer science and video game development for my undergraduate degree at the University of Missouri, which is where I first started doing research in graphics and computer vision (i.e. extracting high-level knowledge from pictures and/or videos). Now, I’m a PhD candidate at the University of Illinois working with Prof. David Forsyth and Prof. Derek Hoiem, and my research focus is a mix of both graphics and vision.

SS: What first interested you in the possibility of rendering objects directly into 2D photographs?

KK: The concept of being able to interact with an image the same way you would interact with the physical scene was very exciting to us. Plenty of applications arise if this is possible; not only inserting objects, but removing existing objects, modifying material properties, adjusting lights, and so on. We decided to attack the insertion problem first.

People are pretty good at determining the physical makeup of a scene by just looking at a picture, but performing these edits can be extremely difficult and tedious with current tools. Our goal was to enable users to do so quickly and intuitively, which we were able to achieve by piecing together many state-of-the-art research algorithms (some of our own, and some existing).

SS: Can you explain some of the technology behind the process in layman’s terms?

KK: In order to insert a 3D model into the picture, we need a 3D representation of the scene, including geometry, lights, material properties, and camera parameters (focal length, camera position, etc.). The user provides some high-level knowledge, such as scene boundaries and the location of the light sources, by marking in the image. With this information, we can then automatically compute a rough model of the scene. The geometry and camera parameters are computed using a technique to obtain 3D structure from 2D points in an image; this is commonly referred to as single view metrology (see the IJCV paper by Criminisi et al.). Using the geometry and camera, we choose (through numerical optimization) the best material parameters and light source positions so that the rendered image of our reconstructed 3D scene best matches the original image. There are also some details for dealing with shafts of light and occluding object boundaries, and we are able to estimate models for these as well after a small amount of user markup.
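The optimization Kevin describes is a render-and-compare loop: propose lighting parameters, render the reconstructed scene, and score the result against the original photo. As a rough illustration only (a toy sketch, not the paper’s actual method or code: a single Lambertian surface with known normals, a brute-force search over light directions, and a closed-form least-squares fit for intensity), it might look like this:

```python
# Toy inverse-rendering sketch: recover the light direction and intensity
# whose rendering best matches an "observed" image. This is a drastically
# simplified stand-in for the kind of optimization described above.
import numpy as np

def render(normals, albedo, light_dir, intensity):
    """Lambertian shading: pixel = albedo * intensity * max(0, n . l)."""
    shade = np.clip(normals @ light_dir, 0.0, None)
    return albedo * intensity * shade

rng = np.random.default_rng(0)

# Synthetic "scene": 500 pixels of a bumpy surface with known unit normals.
normals = rng.normal(size=(500, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = 0.7

# Ground-truth light we pretend not to know, used to fake the "photo".
true_dir = np.array([0.3, 0.5, 0.81])
true_dir /= np.linalg.norm(true_dir)
observed = render(normals, albedo, true_dir, intensity=2.0)

# Render-and-compare: sample candidate light directions, solve for the
# best intensity in closed form, keep whichever rendering has the
# lowest squared pixel error against the observed image.
best_err, est_dir, est_intensity = np.inf, None, None
for _ in range(2000):
    d = rng.normal(size=3)
    d /= np.linalg.norm(d)
    basis = render(normals, albedo, d, intensity=1.0)
    denom = basis @ basis
    if denom < 1e-12:
        continue
    k = (basis @ observed) / denom  # optimal intensity for this direction
    err = np.sum((k * basis - observed) ** 2)
    if err < best_err:
        best_err, est_dir, est_intensity = err, d, k
```

The real system optimizes far more (light source positions, per-surface materials) over a full rendering of the reconstructed room geometry, and uses proper numerical optimization rather than random search, but the core idea — adjust scene parameters until the re-rendered image matches the photograph — is the same.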

SS: A paper on image-based modeling and photo editing was published in 2001 that also uses the idea of a single photo as input. How does your research differ, and when might this be applied directly in 3D rendering software?

KK: The referred paper offers a great set of tools for getting relatively accurate models of geometry from some user annotation, and also shows how to determine material properties such as reflectance using the geometry. Our paper actually uses a very similar technique to estimate material properties, but we require a much less detailed model of geometry, and thus require much less annotation. The primary difference is that we also estimate camera parameters and lighting information, which is key in inserting synthetic objects. We’ve found that simple models of geometry are sufficient for most insertions, since flaws only arise when inserted objects interact with geometry that has been modeled incorrectly (and it’s been mentioned in psychophysical literature that people aren’t very good at picking up on these inconsistencies). However, the suite of tools presented in this paper would wonderfully complement our technique if one requires more accurate geometry, perhaps for physical simulations.

We’d love to see this technology incorporated into 3D modeling and rendering software as soon as possible. We’re working with our university to make this happen.

SS: In the battle of CPU vs GPU rendering, complex global illumination algorithms become even more complex when adding the estimation of lighting and physical objects. As rendering technology progresses, will hardware develop more toward multi-threaded processing or the graphics processor?

KK: To me, from a research perspective, this really depends on how steep the learning curve is for parallelism on upcoming CPUs and GPUs. Right now, it seems that CPUs are winning this battle, and I believe this is why a majority of rendering software today is written for CPUs. Given this trend, it seems that multi-threaded processing will dominate over the next few years. However, this could change if writing code and debugging on GPUs can be done with the same simplicity as on CPUs. A third solution that may also arise is specialized hardware (some mix of what exists today) that is built specifically for rendering.

SS: An obvious next step is for 2D geometry to be automatically extracted and rendered in a real-time ray traced environment. Do you see this happening, and what challenges need to be solved to get there?

KK: I agree; this would have great ramifications for augmented reality and a number of other applications! I imagine it will become a reality a few years down the road, but many technologies need to advance and come together. I think now that depth cameras are widely accessible (e.g. Microsoft’s Kinect), it’s only a matter of time before it will be possible to automatically infer light sources and materials, and then depth/geometry comes for free from the Kinect. The biggest challenge may be in making the system real-time, which would likely require efficiency improvements in the estimation and rendering stages, and probably faster hardware as well.

SS: As you develop any fundamental technology like this, you’re bound to discover things you didn’t expect along the way. There are some really clearly defined use-cases for this technology as outlined in your video, but are there any less-obvious applications you’ve come across along the way? Any results you didn’t expect, but turned out to be useful?

KK: One interesting application we’ve heard about is using this technology to insert objects into historical photos for educational purposes. We’ve also found that it may be useful for quickly creating backdrops for advertisements, and it could potentially allow low-budget films to compete with high-end effects companies. We’ve also had a great many questions about extending our technique to videos (rather than still imagery), which seems to have applications in real estate, architectural design, and home redecoration, among others.

We’re currently putting together a proof-of-concept video showing a simple extension to our technique that allows objects to be inserted accurately into videos with no extra input from the user, so we may see these applications sooner than we expected. We have a bunch of ideas for future research, and hopefully this work will encourage other groups to explore new ideas on this topic as well!

A big thanks to Kevin for discussing his research with us. If you would like to find out more, you can visit the project page, where the abstract and publication go into greater detail about inserting objects into scenes and how it’s being done.


Josh is founder and editor at SolidSmack.com, founder at Aimsift Inc., and co-founder of EvD Media. He is involved in engineering, design, visualization, the technology making it happen, and the content developed around it. He is a SolidWorks Certified Professional and excels at falling awkwardly.