Technology - Visual Hull Motion Graphs

(This article is sponsored by The Boston Group)

Recently, I had the good fortune of graduating from MIT with an M.Eng. degree in electrical engineering and computer science. My Master's thesis was the culmination of almost two years of research at MIT's artificial intelligence lab in the field of computer vision.

What is computer vision? The field is quite broad, but one main research problem is to infer properties of a 3D world from 2D images or video. The hope is to model how humans see the world – how we track and recognize objects in an intelligent fashion. For a computer, vision comes through the camera lens and intelligence comes from artificial intelligence (i.e., algorithms).

Over the past decade, vision research has moved quickly with the digital age. Interfaces between people and technology have become more advanced and intelligent. Diverse applications include 3D teleconferencing, agents that recognize speech, faces, and gestures, and 3D laser-guided surgery. As devices become more complex and gain access to information databases, intuitive human-computer interaction is essential in bridging the gap. Vision research has helped to create these superior interfaces.

For my own research at MIT, I studied the pose and motion of people. First, building on past research, I engineered a system to acquire the 3D shape of people and objects from 2D images taken from multiple cameras. The idea was to place eight synchronized cameras in a room, point them toward a subject, and capture sequences of motion. From the 2D images of all cameras at different angles, I reconstructed the 3D shape of the person in the scene using a shape-from-silhouette estimation technique. At each time frame, I stored the resulting 3D shape – the Visual Hull.
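As a rough illustration of the shape-from-silhouette idea, the sketch below carves a small voxel grid: a voxel survives only if it projects inside the silhouette in every view. Everything here is an assumption made for the example, not the system described above: the function name `carve_visual_hull`, the 9x9x9 grid, the three axis-aligned orthographic "cameras" (instead of eight calibrated ones), and the disc-shaped silhouettes of a sphere.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int, int]        # voxel centre on a toy integer grid
Pixel = Tuple[int, int]             # (row, col) in a silhouette image
Silhouette = List[List[bool]]       # True where the subject covers the pixel


def carve_visual_hull(
    voxels: Sequence[Point],
    silhouettes: Sequence[Silhouette],
    projectors: Sequence[Callable[[Point], Pixel]],
) -> List[Point]:
    """Keep the voxels whose projection lands inside every silhouette."""
    hull = []
    for voxel in voxels:
        inside_all = True
        for silhouette, project in zip(silhouettes, projectors):
            row, col = project(voxel)
            in_image = 0 <= row < len(silhouette) and 0 <= col < len(silhouette[0])
            if not (in_image and silhouette[row][col]):
                inside_all = False  # this view carves the voxel away
                break
        if inside_all:
            hull.append(voxel)
    return hull


# Toy scene (assumed): a sphere of radius 3 centred at (4, 4, 4),
# seen by three orthographic cameras looking along the x, y and z axes.
N, CENTER, RADIUS_SQ = 9, 4, 9
disc = [[(r - CENTER) ** 2 + (c - CENTER) ** 2 <= RADIUS_SQ for c in range(N)]
        for r in range(N)]
voxels = [(x, y, z) for x in range(N) for y in range(N) for z in range(N)]
projectors = [
    lambda p: (p[1], p[2]),  # camera along x sees the (y, z) plane
    lambda p: (p[0], p[2]),  # camera along y sees the (x, z) plane
    lambda p: (p[0], p[1]),  # camera along z sees the (x, y) plane
]
hull = carve_visual_hull(voxels, [disc] * 3, projectors)
```

In this toy scene the voxel (6, 6, 6) stays in the hull even though it lies outside the sphere, which shows that a Visual Hull is a conservative bound on the true shape. Adding more views from different angles tightens it.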

In general, 3D shape acquisition from 2D images is highly useful. In visualizing and grasping information, humans often do best in the 3D domain. While a picture is worth a thousand words, a model or replica of the objects in the picture is even better. Medical scans of tissue reveal details over cross sections, but a 3D model offers more precision and information. Watson and Crick inferred the 3D double-helix structure of DNA from 2D X-ray diffraction images, work for which they later shared the Nobel Prize.

The idea of Visual Hulls is applicable to low-cost object digitization and CAD modeling. In addition, the Visual Hull is popular in graphics applications. A 3D model may be placed in various dynamic scenes, lit, and viewed from different directions. Movies like The Matrix show slow panoramic video scenes using multi-view data from many cameras, similar to the image data I captured from my lab setup.

As 3D information is important, the next generation of complex scanners and modeling tools is being built. A side problem that I worked on was to simplify 3D models. Essentially, a 3D model is a mesh surface composed of thousands or even millions of triangles. The increasing complexity of meshes has created the need for compression algorithms and multi-resolution analysis. My research included ways to represent meshes with fewer triangles while maintaining their structural integrity.
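To make the idea of "fewer triangles" concrete, here is a minimal sketch of vertex clustering, one of the simplest mesh simplification techniques. The article does not say which method the thesis used, so this is a stand-in rather than the author's algorithm. The function name, the `cell_size` parameter, and the flat test mesh are all assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]
Triangle = Tuple[int, int, int]     # indices into the vertex list


def simplify_by_vertex_clustering(
    vertices: List[Vertex], triangles: List[Triangle], cell_size: float
) -> Tuple[List[Vertex], List[Triangle]]:
    """Merge vertices that share a grid cell, then drop collapsed triangles."""
    cell_to_new: Dict[Tuple[int, ...], int] = {}
    members: List[List[Vertex]] = []
    old_to_new: List[int] = []
    for v in vertices:
        cell = tuple(math.floor(c / cell_size) for c in v)
        if cell not in cell_to_new:
            cell_to_new[cell] = len(members)
            members.append([])
        index = cell_to_new[cell]
        members[index].append(v)
        old_to_new.append(index)

    # Represent each cluster by the centroid of its original vertices.
    new_vertices = [
        tuple(sum(axis) / len(group) for axis in zip(*group)) for group in members
    ]

    new_triangles = []
    seen = set()
    for a, b, c in triangles:
        tri = (old_to_new[a], old_to_new[b], old_to_new[c])
        if len(set(tri)) < 3:
            continue                # two corners merged: triangle collapsed
        key = frozenset(tri)        # ignores orientation when de-duplicating
        if key not in seen:
            seen.add(key)
            new_triangles.append(tri)
    return new_vertices, new_triangles


# Toy mesh (assumed): a flat 5x5 grid of vertices split into 32 triangles.
SIDE = 5
vertices = [(float(x), float(y), 0.0) for y in range(SIDE) for x in range(SIDE)]
triangles = []
for y in range(SIDE - 1):
    for x in range(SIDE - 1):
        i = y * SIDE + x
        triangles.append((i, i + 1, i + SIDE + 1))
        triangles.append((i, i + SIDE + 1, i + SIDE))

new_vertices, new_triangles = simplify_by_vertex_clustering(
    vertices, triangles, cell_size=2.0
)
```

A larger `cell_size` merges more vertices and gives a coarser mesh. Methods that preserve shape better, such as edge collapse guided by an error metric, follow the same principle of trading triangles for fidelity.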

The first goal was to develop a system to acquire and process 3D shapes. Using this information, what further analysis would I want to do? From a vision standpoint, knowing the geometric shape of an object, I could use it for object recognition or motion tracking. For example, I could learn human walking patterns or build classifiers for certain hand gestures. From a graphics standpoint, I could use the geometric shape in animation applications.

For the second half of my thesis, I chose to analyze patterns of pose and motion by constructing motion graphs. A motion graph captures a sequence of motion, and if the motion is repetitive, the graph contains cycles. I could synthesize new realistic motion by randomly traversing different cycles in the graph. In addition, by matching patterns in the graph, I could detect whether a sequence of motion included a specific action. Most importantly, my motion graph included a Visual Hull shape at each node, so I had the option of changing the camera view as well as changing the motion. One major goal of this line of research is to match and transfer motion seamlessly from one character to another, even in the presence of complex clothing.
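The following sketch shows one way a motion graph could be built and traversed. It rests on stated assumptions rather than the thesis implementation: each frame is reduced to a hypothetical numeric pose descriptor, two frames count as "similar" when their Euclidean distance falls below a chosen threshold, and nodes are plain frame indices (in the system described above, each node would also carry its Visual Hull).

```python
import math
import random
from typing import Dict, List, Sequence

Pose = Sequence[float]              # hypothetical per-frame pose descriptor


def build_motion_graph(poses: Sequence[Pose], threshold: float) -> Dict[int, List[int]]:
    """Link each frame to its successor and to the successors of similar frames."""
    n = len(poses)
    graph: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i in range(n - 1):
        graph[i].append(i + 1)      # original playback order
    for i in range(n):
        for j in range(n - 1):
            if i != j and math.dist(poses[i], poses[j]) <= threshold:
                # Frame i looks like frame j, so whatever followed j
                # is a plausible continuation of i as well.
                if j + 1 not in graph[i]:
                    graph[i].append(j + 1)
    return graph


def random_walk(graph: Dict[int, List[int]], start: int, steps: int,
                rng: random.Random) -> List[int]:
    """Synthesize a new frame sequence by following random outgoing edges."""
    path = [start]
    for _ in range(steps):
        successors = graph[path[-1]]
        if not successors:
            break                   # dead end: no continuation recorded
        path.append(rng.choice(successors))
    return path


# Toy capture (assumed): a four-pose motion repeated twice, 8 frames total.
poses = [(0, 0), (1, 0), (1, 1), (0, 1)] * 2
graph = build_motion_graph(poses, threshold=0.1)
walk = random_walk(graph, start=0, steps=50, rng=random.Random(0))
```

Because the repeated poses create edges that jump back to earlier frames, the graph contains cycles, and the random walk can run for far more steps than the eight captured frames while only ever following transitions between similar poses.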

Research of this kind touches on automatic camera calibration, multi-view geometry, Visual Hull acquisition, 3D modeling, mesh compression, rendering, shape matching, and motion graphs. Many of these are active areas of research, and groups around the world present new results at each conference.

Completing my Master's research gave me the confidence to pursue my own ideas. Having just read "Surely You're Joking, Mr. Feynman!" by Nobel laureate physicist Richard Feynman and "Relativity" by Albert Einstein, I realize how much inspirational vision top researchers possess. In the future, if I feel some of that spark again, I may go back to get my Ph.D. in vision, optics, space navigation, or a field yet undiscovered.