So, how exactly was this done?
The system includes four main components: (1) cameras to track the movements of the ball, (2) the robot with the paddle attached, (3) the ball (or multiple balls), and (4) the software that runs the control loop and does all the calculations.
The cameras do all the difficult work of tracking the ball in the workspace. There are seven of them mounted around the workspace (mostly on the ceiling, which is why you don't see them in the video), positioned so that they get a good view of the workspace and the tracked objects even when the view is obstructed for several of the cameras (for example, by me moving around). The processing done by the cameras is very fast (up to 240Hz); for this particular experiment the frequency was set to 120Hz (which means we get new data every ~8.3ms). The cameras, produced by Vicon, are frequently used for motion capture experiments (when you want to record and analyze how humans/robots move or use various objects). They are sufficiently good for our purposes, but unfortunately require markers for tracking. The ping-pong ball we wanted to work with has a surface very different from a marker and doesn't reflect enough light to be tracked reliably. To fix that, I covered it with reflective tape, which was an interesting problem to solve.
Covering the ball
The crux of the problem is that you need to take a flat sheet of reflective tape and wrap it around a curved sphere. To do that you have several choices (other than covering the ball with more or less random patches): (1) use a truncated icosahedron (soccer ball projection), (2) use a geodesic dome or geodesic sphere projection, or (3) use a sinusoidal projection with a large number of segments. Random patches didn't work: they were poorly detected by the cameras and, because of uneven edges, the ball bounced randomly. The soccer ball projection didn't work because the tape does not stretch enough to conform to the sphere. The geodesic dome projection was too much work. The sinusoidal projection looked like the best option, but there was still the question of how many segments are enough (use too few and you get a rough surface and edges; use too many and you get too much work to do). After a bit of experimenting I settled on 24 segments.
It turned out that there was no ready-to-use projection for a ping-pong ball with the number of segments I wanted, so I ended up writing a simple script that draws the projection I need so that I can print it. The tape I had was too narrow to fit the entire projection, so I stacked the projections to fill as much of the tape as possible. Here is the sinusoidal sphere projection for 24 segments (may not work in IE without canvas support) and the code (you can change the parameters to get a different pattern: s sets the number of segments and r sets the radius in mm). After that the process was simple: print the projection, transfer it to the tape, carefully cut the tape, attach the tape to the ball. After a short practice I was able to cover a ball in 3-4 hours (this is what graduate students like me are being paid to do).
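The original script drew the projection on an HTML canvas; here is a rough Python equivalent of the underlying math, computing the outline of one gore (segment). The function name and defaults are mine; a standard 40mm ping-pong ball gives r = 20mm.

```python
import math

def gore_outline(r=20.0, s=24, steps=50):
    """Outline of one gore (segment) of a sinusoidal sphere projection.

    r: sphere radius in mm, s: number of segments.
    Returns a list of (half_width, y) points in mm; y runs pole-to-pole
    along the meridian, and the gore is mirrored around its centerline.
    """
    points = []
    for i in range(steps + 1):
        lat = -math.pi / 2 + math.pi * i / steps   # latitude, -90 to +90 deg
        y = r * lat                                # arc length along the meridian
        half_w = (math.pi * r / s) * math.cos(lat) # gore half-width at this latitude
        points.append((half_w, y))
    return points
```

Printing s of these side by side gives the full projection; each gore is widest at the equator and tapers to a point at the poles, which is what lets the flat tape bend around the sphere without stretching.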
Now, with the ball in hand and the paddle attached to the robot (Alex Simpkins, a postdoc in our research group, helped with the latter part and with many other things on this project), I was ready to do some bouncing.
The robot that we used is a Delta Haptic robot, which is a small, fast and powerful robot. Its main disadvantage for ball bouncing is that it has a limited workspace (a cylinder approximately 1ft/32cm in diameter and about 8in/20cm high) and the end effector on which the paddle is mounted has no tilt. One of its advantages is that, while it is possible to break this robot by applying force in the wrong way or at the wrong time, the worst thing that can happen is that it breaks apart or the belts get stretched. Both of these things happened several times before I got the robot to move as I'd like it to.
The software that gets the data about ball position, controls the robot, and does all the calculations was written in MATLAB. MATLAB provides a way to integrate components written in C, which we used to get access to the Delta robot; the Vicon software already came with a MATLAB-friendly interface, for which I wrote a simple wrapper.
The control loop for the system is simple: (1) get the ball position data from the cameras, (2) get the robot position data from the robot, (3) do the calculations to figure out what the robot should do next, (4) send the control signal (something that tells the system how to behave; for example, setting 70 on a thermostat) to the robot to cause the desired action for the next time step; repeat this process every 10ms. This fits the paradigm of model-predictive control (MPC) well, which can be summarized as a method of process control that generates output predictions using a process model and computes control signals by optimizing some measure of predicted performance (a cost function). The control signals are computed at each time step (in our case every 10ms) and sent to the robot for execution.
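The four steps above can be sketched as a single loop. This is an illustration only (the real system was MATLAB); `read_ball`, `read_robot`, `plan_controls`, and `send_control` are placeholders for the real camera, robot, and optimizer interfaces.

```python
import time

DT = 0.010  # control period: 10 ms

def control_loop(read_ball, read_robot, plan_controls, send_control, steps):
    """Minimal sketch of the MPC control loop described above."""
    for _ in range(steps):
        t0 = time.monotonic()
        ball = read_ball()                     # (1) ball position from the cameras
        robot = read_robot()                   # (2) paddle position from the robot
        controls = plan_controls(ball, robot)  # (3) optimize until the next contact
        send_control(controls[0])              # (4) apply only the first control
        # sleep out the remainder of the 10 ms period, if any
        elapsed = time.monotonic() - t0
        if elapsed < DT:
            time.sleep(DT - elapsed)
```

Note that only the first control of the optimized sequence is executed; the rest is discarded because the whole optimization is redone on the next tick with fresh data.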
The process model defines how the state of the system changes with time. The state of the system (in our case) includes 3 paddle positions (x,y,z) and 3 paddle velocities (as the paddle doesn't rotate these 6 variables are enough to fully define its state), 3 ball positions and 3 ball velocities (for each of the balls tracked), so 18 variables in total for two balls. The model describes how these state variables change with time. For example, the change in paddle velocity depends on paddle acceleration, which is caused by the gravity force and the force applied to the paddle, whereas the change in ball velocity depends on ball acceleration, which depends on the gravity force and air resistance.
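As an illustration of such a process model, here is a crude Euler-integrated version of the dynamics just described. The drag coefficient `KD` and the function names are made up for this sketch, not values from the project.

```python
G = 9.81   # gravity, m/s^2
KD = 0.12  # illustrative air-drag coefficient (not the project's value)

def step_ball(pos, vel, dt):
    """Euler step for a ball: gravity plus simple velocity-proportional drag."""
    ax, ay = -KD * vel[0], -KD * vel[1]
    az = -G - KD * vel[2]
    new_vel = (vel[0] + ax * dt, vel[1] + ay * dt, vel[2] + az * dt)
    new_pos = (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt, pos[2] + vel[2] * dt)
    return new_pos, new_vel

def step_paddle(pos, vel, force, mass, dt):
    """Euler step for the paddle: gravity plus the commanded force."""
    acc = (force[0] / mass, force[1] / mass, force[2] / mass - G)
    new_vel = tuple(v + a * dt for v, a in zip(vel, acc))
    new_pos = tuple(p + v * dt for p, v in zip(pos, vel))
    return new_pos, new_vel
```

Chaining these steps forward from the current 18-variable state is what lets the controller predict where everything will be a few hundred milliseconds from now.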
Optimal control algorithm
Knowing the model allows us to predict the behavior of the system, but that behavior depends on the control applied to it; the challenge is that not just any control will do, as we want control that optimizes our cost function (in our case, minimizes the cost of hitting the ball correctly). So how do we do that? We apply the following algorithm (it's called the "iterative linear quadratic Gaussian" algorithm -- you don't need to pronounce it -- and was invented by Emo Todorov and Weiwei Li several years ago):
1. start with a control sequence and a corresponding state trajectory (a sequence of states in time produced by the state transitions and control signals)
2. approximate the system dynamics and cost along the trajectory; this provides an estimate of how good/bad the trajectory is (according to the cost function) and how that cost changes along the trajectory
3. compute the optimal control law for these approximate dynamics and adjust the cost estimates according to the new control law
4. simulate the linearized system to compute a new sequence of controls, a new trajectory, and its corresponding cost
5. check whether the new cost improves on the old one by a sufficient margin; if so, accept the new trajectory and go back to step 2.
As a result of this iterative process we get a sequence of controls, one per time step, that would produce an approximately optimal trajectory for the system up to the next contact. The controls in our case are the forces along each of the coordinate axes applied to the paddle. We send the first set of controls (for the first time step) to the robot and discard all the others (as the whole process is repeated at the next time step). Those familiar with MPC will notice that in our case we run the calculations until the contact, whereas usually they are done for a fixed number of time steps (the so-called receding horizon). For this project, which involves contacts, I modified the original algorithm (as described above) to abort the simulation (step 4) when a contact condition is satisfied. This allowed us to completely ignore contact dynamics in the model of the system and instead fold the assessment of how good the contact is into the cost function, which significantly simplified the calculations and the algorithm.
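The contact-abort idea can be sketched as a forward rollout that stops when a contact condition fires rather than after a fixed horizon. The condition used here (ball at or below paddle height while falling) and all names are my simplification for illustration.

```python
G = 9.81

def rollout_until_contact(ball_z, ball_vz, paddle_z, dt=0.010, max_steps=200):
    """Simulate the ball forward and stop at the (approximate) contact,
    instead of running for a fixed receding horizon.

    Returns the number of steps simulated before the contact condition
    (ball at or below paddle height while falling) is satisfied.
    """
    for k in range(max_steps):
        ball_vz -= G * dt
        ball_z += ball_vz * dt
        if ball_z <= paddle_z and ball_vz < 0:  # contact condition: abort here
            return k + 1
    return max_steps
```

Because the rollout ends exactly at the contact, the model never has to describe what happens during the bounce itself; the quality of the contact is judged by the final cost instead.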
So what is that cost function? The cost function defines how well the system behaves according to some set of criteria; in our case it describes how well the system juggles the ball. It comes in two parts: the final cost, which defines how good the contact is, and the running cost, which assesses how good the current (non-final) state is. More specifically, these cost functions include things like: how far the paddle is from the center of the workspace (we want that distance to be small to keep the paddle in the center), how well the center of the paddle follows the xy projection of the ball (we want the paddle to follow the ball to avoid fast jerky movements), how good the velocity of the paddle at the contact is (we want the paddle to have the right velocity to hit the ball so that it flies where needed, which is defined differently for one- and two-ball juggling), and so on. It also includes a control cost (the squared values of the forces sent to the robot) to penalize applying large forces where small ones will do.
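A toy version of the running cost, combining three of the terms just listed, could look like this; the weights and the function name are invented for illustration, not the values we actually used.

```python
def running_cost(paddle_pos, ball_pos, force, center=(0.0, 0.0, 0.0),
                 w_center=1.0, w_track=2.0, w_force=0.001):
    """Illustrative running (non-final) cost: keep the paddle near the
    workspace center, follow the ball's xy projection, and penalize
    large forces. Weights are made-up values."""
    # squared distance of the paddle from the workspace center
    c_center = sum((p - c) ** 2 for p, c in zip(paddle_pos, center))
    # squared xy distance between paddle center and ball projection
    c_track = (paddle_pos[0] - ball_pos[0]) ** 2 + (paddle_pos[1] - ball_pos[1]) ** 2
    # control cost: squared forces
    c_force = sum(f ** 2 for f in force)
    return w_center * c_center + w_track * c_track + w_force * c_force
```

The optimizer minimizes the sum of this running cost over the trajectory plus the final (contact) cost, so all the competing desires are traded off in one number.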
The "right" velocity to hit the ball is calculated differently for the 1- and 2-ball cases. If the system only paddles one ball (and it "knows" that it only has one ball, as it gets data from the cameras) it will try to send the ball to the center of the workspace. For two balls the situation is more complex. We can't send both of them to the center (as they are likely to collide) and can't give each of them its own "zone", as the contacts are noisy and the balls wander around the workspace. What we did instead was come up with a simple heuristic that, when considering where to send ball A, first calculates where the other ball (ball B) is going to land (where it crosses the z=0 plane; this is not necessarily where it will be hit by the paddle, but it's good enough for this heuristic), and then sends ball A to a point on the line that connects the center of the workspace and ball B's landing point, some distance away from that landing point. I varied that distance between 10cm and 6cm: you can see in the first segment with 2 balls that the balls were flying off the workspace (it was set to 10cm), and in the later segments they were colliding (it was set to 6cm). Ideally, this distance should be a parameter to be optimized, but for the results on the video it was set to one of those two values.
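That heuristic is easy to sketch. The function name, the default distance, and the tie-breaking for the degenerate case are my choices; the real implementation may differ in detail.

```python
import math

def target_for_ball_a(b_land, center=(0.0, 0.0), d=0.08):
    """Heuristic target for ball A: a point on the line through the
    workspace center and ball B's predicted landing point, at distance
    d from that landing point, on the center's side.

    b_land and center are xy coordinates in meters; d was varied
    between 0.06 and 0.10 in the experiments described above."""
    dx, dy = center[0] - b_land[0], center[1] - b_land[1]
    dist = math.hypot(dx, dy)
    if dist < 1e-9:
        # ball B lands exactly at the center; pick an arbitrary direction
        return (b_land[0] + d, b_land[1])
    return (b_land[0] + d * dx / dist, b_land[1] + d * dy / dist)
```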
It may seem like the balls are being sent to the center, but that's actually not the case. The desired velocity is the result of a trade-off between three different things: the desire to keep the balls close to the center of the workspace and to each other (to keep them in the workspace and to minimize the movement and velocity of the paddle), the desire to keep them away from each other (to minimize collisions), and the desire to keep them separated not only in space but also in time. Even though the paddle can hit both balls when they are close, it can't control both balls, so the ideal situation is when the balls have contacts at regular intervals. To achieve that, the system calculates the desired velocity in such a way that one ball is at the apex (the highest point of its trajectory) when the other one is at the contact.
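Under a pure ballistic model (no drag), a ball launched with vertical velocity vz reaches its apex at t = vz/g, so the timing constraint pins down the vertical component of the desired velocity directly. The sketch below (my naming; it also assumes the ball lands at the same height it was hit) combines that with the target point to get the full outgoing velocity.

```python
G = 9.81

def desired_outgoing_velocity(hit_xy, target_xy, t_other_contact):
    """Outgoing velocity after the hit (drag ignored) such that:
    - the ball reaches its apex exactly when the other ball has its
      contact (apex time of a ballistic flight is vz / g), and
    - it lands at target_xy (flight time is twice the apex time,
      assuming it lands at the height it was hit)."""
    vz = G * t_other_contact           # apex at t = vz / g
    t_flight = 2.0 * t_other_contact   # symmetric up/down flight
    vx = (target_xy[0] - hit_xy[0]) / t_flight
    vy = (target_xy[1] - hit_xy[1]) / t_flight
    return (vx, vy, vz)
```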
Another aspect that doesn't exist in the one-ball case is that the system needs to know which of the two balls to hit first. As it doesn't really have a "history" of ball movements and balls can be removed and added at any time, the system needs to be able to react quickly and adjust during each of the time steps. The logic to achieve that is simple: I take the position and velocity of each ball (to get velocity I need two positions and the time between the two snapshots) and then project (using the model of ball behavior) which of them will hit the z=0 plane first. The one that hits first is the one tracked by the model as the "main" ball. You can see the result of that in the fragment where the system juggles one ball with the second one in my hand. After hitting the first ball, the paddle moves closer to the ball in my hand, as it expects it to be the first one at the next contact (even though it's still in my hand).
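That "who lands first" projection reduces to solving the ballistic equation for each ball. A drag-free sketch (the real model also accounted for air resistance; names are mine):

```python
import math

G = 9.81

def time_to_ground(z, vz):
    """Positive root of z + vz*t - 0.5*g*t^2 = 0: time until the ball
    crosses the z = 0 plane (drag ignored for this simple projection)."""
    disc = vz * vz + 2.0 * G * z
    if disc < 0.0:
        return math.inf  # never reaches z = 0 (shouldn't happen for z > 0)
    return (vz + math.sqrt(disc)) / G

def main_ball(balls):
    """Pick the ball that will hit z = 0 first; balls is a list of
    (z, vz) pairs. Returns the index of the "main" ball."""
    return min(range(len(balls)), key=lambda i: time_to_ground(*balls[i]))
```

Running this every time step means the "main" ball can change at any moment, which is exactly what makes adding or removing balls mid-session work.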
Some of you might have noticed that I glossed over the fact that the cameras and the robot use different coordinate systems, and nothing in what I described connects the two. In other words, the cameras may report that the ball position is (700, 100, -10), but how does the robot know what that corresponds to in its own space? We considered two options: (1) placing markers on the paddle and doing all the calculations in camera coordinates, and (2) calibrating the system to merge the two coordinate systems. I decided on the latter option, which, in my opinion, made the system simpler and avoided placing markers on the paddle.
To come up with a transformation between two coordinate systems you need to measure some points in both systems and then calculate the translation and rotation between them. Four measurements are enough, so we added a simple procedure at the beginning of each session where I put one of the balls on the paddle and move the paddle to 4 different positions (you just need to make sure all four are not in the same plane). I then calculate the mapping (based on this description) and use it to translate coordinates between the two systems.
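With four non-coplanar point pairs the mapping q = A·p + t is exactly determined, so it can be recovered by solving a small linear system. Here is a generic affine fit in that spirit; this is my reconstruction for illustration, not necessarily the exact procedure used in the project.

```python
def _inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    a, b, c = m[0]; d, e, f = m[1]; g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[ (e * i - f * h), -(b * i - c * h),  (b * f - c * e)],
           [-(d * i - f * g),  (a * i - c * g), -(a * f - c * d)],
           [ (d * h - e * g), -(a * h - b * g),  (a * e - b * d)]]
    return [[adj[r][col] / det for col in range(3)] for r in range(3)]

def calibrate(cam_pts, rob_pts):
    """Recover the mapping q = A p + t from four point pairs
    (p in camera coordinates, q in robot coordinates), assuming the
    four points are not coplanar. Returns (A, t)."""
    p0, q0 = cam_pts[3], rob_pts[3]
    # columns of M/Q are the differences relative to the fourth point
    M = [[cam_pts[j][i] - p0[i] for j in range(3)] for i in range(3)]
    Q = [[rob_pts[j][i] - q0[i] for j in range(3)] for i in range(3)]
    Minv = _inv3(M)
    A = [[sum(Q[i][k] * Minv[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = [q0[i] - sum(A[i][j] * p0[j] for j in range(3)) for i in range(3)]
    return A, t
```

Once A and t are known, every camera-reported ball position can be mapped into robot coordinates before the control calculations.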
With all the individual parts of the system working, the main challenge was to run it fast enough to get smooth behavior. We estimated that a 10ms control loop should be enough, but the initial MATLAB implementation of the algorithm ran in 100ms (the target was 5ms, as there was overhead for getting/sending data, logging, and other things). We did some optimization by using vectorization instead of loops, but that only gave us a 3-fold improvement. Then we decided to compile the core algorithm to C using Real-Time Workshop in MATLAB, which made that part of the algorithm approximately 8 times faster (compared to executing the MATLAB code); rewriting it directly in C would make it another 4-5 times faster, based on our estimates. We ran the system on a Windows desktop with a 3GHz dual-core processor and, for the segment shown on video, the entire control loop takes 9ms on average for two-ball juggling.
The ability to calculate and adjust in real time is critical, because the actual executed trajectory is far from the optimal trajectory calculated by the algorithm. This is caused by multiple factors:
- delays in the system; for example, if it takes 7ms to calculate the controls, then by the time we send them to the robot they are already at least 7ms old (in reality a bit more, as time is also spent on getting and sending the data; on the order of 0.5-2ms)
- things we don't model; for example, friction in the robot
- things we model, but possibly incorrectly; for example, the model needs to know how the ball bounces on the paddle to calculate the desired velocity, so we calculated the coefficient of restitution based on the data we gathered on the robot, but it varied between sessions, so we used an average value
- orientation of the robot; if the robot is even slightly misleveled, it creates a permanent drift for the ball. While the robot can partially compensate for that, it still affects the trajectory
- orientation of the paddle in the workspace; the paddle doesn't stay exactly flat in the workspace as when it gets closer to the boundary of the workspace it becomes slightly tilted outwards, which we don't model
Given so many factors that can negatively affect the behavior of the robot and the actual trajectory, it may be surprising that this works at all. In fact, I was pleasantly surprised myself when we got it working, and I think it is largely a tribute to the real-time method we used, as it allows the system to absorb deficiencies of the model and external perturbations and still generate acceptable behavior. This property becomes even more critical as we move to more complex robots that can rarely be modeled well; it is important to have control algorithms that are robust to modeling errors and perturbations.
There were several programming hacks (as in "piece of work that produces exactly what is needed") that contributed to the overall behavior of the system.
1. The first one that I implemented, and probably the most important, was a check for the "reasonableness" of the solution. The cost returned by our algorithm can be used to assess how "good" the solution is overall: if the cost is too large, there is no good control to apply, so we only compensate for gravity. For example, when I hold a ball in my hand above the paddle, there is no reasonable trajectory that will hit that ball, as it's in my hand and not falling; the cost of the solution will be "bad", but the system will not do anything crazy.
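In code, this check is little more than a guard on top of the optimizer's output; the threshold, mass, and function name below are invented for the sketch (the real threshold was tuned by hand).

```python
MASS = 0.5        # illustrative paddle + end-effector mass, kg
G = 9.81
COST_LIMIT = 1e3  # illustrative "reasonableness" threshold

def safe_control(optimal_force, cost):
    """If the optimizer's cost says there is no reasonable solution,
    fall back to pure gravity compensation instead of executing it."""
    if cost > COST_LIMIT:
        return (0.0, 0.0, MASS * G)  # just hold the paddle against gravity
    return optimal_force
```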
2. Tracking multiple balls had its own challenges, mostly caused by the fact that the cameras only report marker positions but don't tell you which marker is which; relating these markers to the objects you're tracking is largely your responsibility. (Note that this is different from regular motion capture, where you have a skeleton with specific properties and the system knows which markers are attached to which parts of the skeleton; in our case we used unlabeled markers and got no help from the cameras.) I implemented simple logic that uses our model of ball movement to project where the system expects to see each ball at the next time step and then matches the camera data to those expectations. Knowing which ball is which is needed because we calculate velocity from positions and need each ball's previous position to do it properly.
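A bare-bones version of that matching is greedy nearest-neighbor assignment of detections to the model's predictions; this sketch (names mine) ignores complications like missing or spurious markers, which the real logic also had to handle.

```python
def match_markers(predicted, detected):
    """Greedily match detected (unlabeled) marker positions to the
    positions predicted by the ball model.
    Returns the detected points reordered to line up with `predicted`."""
    remaining = list(detected)
    matched = []
    for p in predicted:
        # pick the detection closest (squared distance) to this prediction
        best = min(remaining,
                   key=lambda m: sum((a - b) ** 2 for a, b in zip(p, m)))
        remaining.remove(best)
        matched.append(best)
    return matched
```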
3. Another aspect of this is that when we switch from one ball to two (or from two to one) the system doesn't really know which of the balls is new or which disappeared, so I decided to simply reset the history of previous positions/velocities in this case.
So there you have it. The description is probably longer than you expected and definitely longer than I intended to write, but I tried to cover all the aspects I was asked about with sufficient level of detail and keep it simple (and free of math) as much as possible. Those of you that are interested in reading more math or in reviewing results may look at the draft of the paper we submitted to the ICRA 2011 conference. Now that you know how it works, go back and watch it again to see if it gives you a different perspective on the behavior of the system.