Inside the Robosapien V2 toy robot: A talk with its creator (Mark Tilden)

I wrote this blog post for the now-defunct eebeat.com technology site. It appeared on Monday, December 4, 2006, but I think it's still an interesting and fun read.

At the most recent Digital Life Show in New York City, I was accosted by a two-foot-tall toy robot who was barking out commands and strutting his stuff for me and the rest of the gathering crowd. I moved up close to where the brash robot confidently stood, and was introduced to the Robosapien V2, a humanoid robot put out by WowWee Toys that impressed me as being quite a feat of modern engineering.



So I contacted physicist/roboticist Mark Tilden, the developer of this robot (and others), to find out what makes this bold bot tick:

EEBEAT: How many and what types of sensors are being used in the Robo V2?

TILDEN: We have CMOS XVGA color visual tracking and recognition, IR radar reflection, sharp audio detection and location microphones, as well as touch and tilt sensors throughout his body. All of it is custom built and aligned, so he runs the field from simple to complex just so he can interact with his user intelligently.

EEBEAT: How does the camera work? What is the vision system?

TILDEN: Actually it’s an old vision technology used for about 2 billion years. It’s a biologically inspired radial-retina arrangement where centroids of colorful objects generate direction vectors toward the visual center. Such an arrangement reduces data down to one byte per object, simplifying the work for our overclocked toy-grade processors. The eye does all the processing in real time.
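For readers who want to see the centroid idea in code, here is a minimal Python sketch. It is not WowWee's implementation: the function names, the red-color threshold, and the byte layout (four bits of angle, four bits of distance from the visual center) are all assumptions made for illustration. The interview only says that each object's centroid yields a direction vector that fits in one byte.

```python
import math

def is_red(rgb):
    """Hypothetical color test: treat strongly red pixels as the target."""
    r, g, b = rgb
    return r > 200 and g < 80 and b < 80

def centroid(pixels, is_target):
    """Return the (x, y) centroid of matching pixels, or None if none match."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(pixels):
        for x, rgb in enumerate(row):
            if is_target(rgb):
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return sum_x / count, sum_y / count

def direction_byte(cx, cy, width, height, sectors=16, rings=16):
    """Pack the offset from the visual center into one byte.

    Assumed layout: high nibble = angular sector, low nibble = radial ring.
    """
    dx = cx - (width - 1) / 2
    dy = cy - (height - 1) / 2
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = int(angle / (2 * math.pi) * sectors) % sectors
    max_radius = math.hypot((width - 1) / 2, (height - 1) / 2)
    ring = min(rings - 1, int(math.hypot(dx, dy) / max_radius * rings))
    return (sector << 4) | ring

# A 5x5 black frame with one red pixel at the right edge of the middle row.
BLACK, RED = (0, 0, 0), (255, 0, 0)
frame = [[BLACK] * 5 for _ in range(5)]
frame[2][4] = RED

cx, cy = centroid(frame, is_red)
code = direction_byte(cx, cy, width=5, height=5)
```

Whatever encoding the real eye uses, the point of the sketch is the data reduction: a whole frame collapses to a single small number per object before the main processor ever sees it.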

EEBEAT: How is signal processing being accomplished?

TILDEN: For the vision system, we have a custom DSP running a series of standard vision algorithms. It also handles the camera module. The actual decision processing happens in his slow mainbrain though.

EEBEAT: What kinds of motors are involved?

TILDEN: We build our own custom motors and gearboxes, pushing the concept of “toygrade” to the limits while still keeping costs low. In the RS2 we use 12 standard M130 motors for all actions, with a single M21 motor for the eye action. This type of modularity keeps costs low while encouraging high reliability. It’s always amazing how consistent a component gets when you order 27 million of them.

EEBEAT: How many engineers worked on this? What was the engineering effort like?

TILDEN: The first prototype was designed and built myself from parts of the original Robosapien. I use adaptive analog Nervous Nets to find the natural resonances for walking and action, then set these patterns in digital silicon for reliability. Once it’s working fast and efficient, we add the playspec from a variety of sources: (“You know what it should do?” is always the first thing I hear.) Then it goes to the designers who have to make it look great. There’s then a team of about seven that spends 6 months melding the function and the aesthetic before it is presented to the world buyers. If they like it then the prototype moves on to “tool” where factories will set up assembly lines. In the RS2’s case, these were up to 3,000 girls long to handle and test all 1,500 parts, not including all the managers, injectors, painters, packagers, quality control, etc. So in the end RS2 involved at least 10,000 people from two huge factories. As for the engineering effort involved, it was sweat, coffee, and a lot of late-night Chinese food.

EEBEAT: How much memory? Can you store content/data in the robot?

TILDEN: The two custom processors used in the RS2 feature a whopping 128 bytes of RAM each. Through hierarchical memory indexing though, you can link together his six programming modes to generate modular programs up to 4000 steps long - one of his more interesting play secrets. As for multimedia type content storage though, that’s been left to our next-generation LINUX-based humanoid robot, the RSmedia.
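Tilden does not spell out how the indexing works, so the following Python sketch is only one plausible reading: each mode stores a short list of steps, and a step may be a reference to another mode rather than a primitive action. The mode names, action names, and the `expand` helper are all hypothetical.

```python
def expand(modes, name, depth=0, max_depth=8):
    """Flatten a mode's program, following references to other modes."""
    if depth > max_depth:
        raise RecursionError("mode references nested too deeply")
    steps = []
    for step in modes[name]:
        if step in modes:
            # The step is an index into another mode: splice in its program.
            steps.extend(expand(modes, step, depth + 1, max_depth))
        else:
            # The step is a primitive action.
            steps.append(step)
    return steps

# Hypothetical programs: only 8 entries are stored in total.
modes = {
    "A": ["step_forward", "turn_left"],
    "B": ["A", "A", "wave"],
    "MAIN": ["B", "B", "A"],
}

stored_entries = sum(len(program) for program in modes.values())
flat = expand(modes, "MAIN")
```

Here 8 stored entries unfold into 12 executed steps. Deeper nesting widens that gap quickly, which is the general way a few short linked programs can behave like one long one in very little RAM.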

EEBEAT: How does he talk?

TILDEN: The RS2 processors are modified toy-sound-modules, so they come standard with a large masked-ROM array and DAC amplifier to render high quality speech and music. It’s the default mode for these chips before we turned them into stand-alone RISC microcontrollers.