Thursday, June 17, 2010

Γνώθι σαυτόν, or "Know Thyself"

According to some estimates, algorithms are now responsible for more than 70% of stock trades. Think about that. It's one thing to have computer programs automatically send a check to your landlord. It's another thing to have them decide which companies to buy and sell, which companies have value and which do not.

Ultimately, that's what all markets do: decide what things are worth. This is a valuable service and the role of markets everywhere. It's also, for example, what the BAR algorithm does when it ranks where you and I stand against other racers in our MABRA category.

Michael Lewis' book Moneyball, probably the most influential sports book of the decade, is the story of how statistics have changed sports. Oakland A's general manager Billy Beane used new statistical tools to value players and managed to put together winning teams despite a low budget. A recent article on soccer stats reads like a tome of econometrics.

The New York Times recently profiled several (seriously Asperger's-type) individuals who keep data on nearly every aspect of their own lives. One fellow gives himself a math test every day and varies several of his routines to find out what optimizes his scores. He finds that a bit of flaxseed is good; coffee, not good. He gets there by running his own data through analysis, isolating all relevant variables, and arriving at robust conclusions. In other words, he himself is the subject of his study.

If you've ever taken the time to look at drug studies, exercise studies, or the latest claims about sports drinks, you'll know they're run roughly like this:

(1) Collect a SAMPLE of subjects. This SAMPLE must be representative of the population as a whole, or the sub-population the study wishes to understand (expert cyclists, for example).

(2) Randomly divide the SAMPLE into two groups: the TEST group and the CONTROL group.

(3) Treat both groups exactly the same...except for one thing. This one thing is called the VARIABLE OF INTEREST. The VARIABLE OF INTEREST is the type of thing that can be either X or Y (for example, a cyclist in a test can either drink an energy drink or NOT drink one, or, likewise, can drink either an energy drink or water; in either case, the VARIABLE OF INTEREST splits the subjects into two clear conditions).

(4) Record any differences between the TEST and CONTROL groups. The question you want to answer is whether the VARIABLE OF INTEREST is correlated with any differences between them.

If there are differences between TEST and CONTROL, well, you assume your VARIABLE OF INTEREST has some kind of effect.
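For the programmers in the peloton, here's a minimal Python sketch of steps (1) through (4). It's an illustration under stated assumptions, not how any particular lab does it: `run_trial` and `measure` are hypothetical names, each subject is just a number (baseline watts), and the +5 W energy-drink effect in the example is made up.

```python
import random
import statistics


def run_trial(subjects, measure, seed=0):
    """Randomly split subjects into TEST and CONTROL groups and return the
    difference in mean outcome (TEST minus CONTROL).

    `subjects` is the SAMPLE (step 1); `measure(subject, treated)` is a
    hypothetical function returning one subject's outcome, where `treated`
    says whether that subject got the VARIABLE OF INTEREST.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)                      # step 2: random assignment
    half = len(shuffled) // 2
    test, control = shuffled[:half], shuffled[half:]

    # Step 3: everyone is measured the same way; only `treated` differs.
    test_scores = [measure(s, True) for s in test]
    control_scores = [measure(s, False) for s in control]

    # Step 4: record the difference between the groups.
    return statistics.mean(test_scores) - statistics.mean(control_scores)


# Hypothetical SAMPLE: 24 riders with different baseline power (watts),
# plus an assumed +5 W effect from drinking the energy drink.
rng = random.Random(1)
riders = [rng.gauss(250, 30) for _ in range(24)]
effect = run_trial(riders, lambda watts, drank: watts + (5 if drank else 0))
print(f"Estimated effect: {effect:.1f} W")
```

Because the riders' baselines differ, the printed estimate generally won't come out to exactly 5 W: which riders happen to land in which group matters.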

There are lots of problems with studies. First, people are different. Ideally, you'd want a single person who'd been cloned hundreds or thousands of times. Because people are different, you need lots of them to be able to conclude that, yes, this effect is really true for most people. Ideally, you'd get millions of participants. In practice, especially in exercise studies, you get a couple dozen.

And that's why most study results suck: people respond differently to the same VARIABLE OF INTEREST. This is especially true with a small SAMPLE. It's also true when you can't control all the other relevant variables (for example, what test subjects eat, or whether they happen to get sick).
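To see why a couple dozen subjects isn't much, here's a hypothetical simulation in Python. Every number is an assumption, none comes from a real study: riders differ in baseline power, they also respond differently to the treatment, and the true average effect is set to 5 W. Run the same two-group study many times and watch how much the estimated effect jumps around.

```python
import random
import statistics


def simulate_estimates(n_subjects, n_trials=300, true_effect=5.0,
                       baseline_sd=30.0, response_sd=10.0, seed=2):
    """Run many simulated TEST-vs-CONTROL studies; return each study's estimate.

    Hypothetical model: each rider's baseline power ~ N(250, baseline_sd),
    and each treated rider's personal response ~ N(true_effect, response_sd),
    since people respond differently to the same VARIABLE OF INTEREST.
    """
    rng = random.Random(seed)
    half = n_subjects // 2
    estimates = []
    for _ in range(n_trials):
        test = [rng.gauss(250, baseline_sd) + rng.gauss(true_effect, response_sd)
                for _ in range(half)]
        control = [rng.gauss(250, baseline_sd) for _ in range(half)]
        estimates.append(statistics.mean(test) - statistics.mean(control))
    return estimates


for n in (24, 960):
    est = simulate_estimates(n)
    print(f"{n:4d} riders: mean estimate {statistics.mean(est):5.1f} W, "
          f"spread (stdev) {statistics.stdev(est):4.1f} W")
```

Under this made-up model, the expected spread with 24 riders (12 per group) is sqrt((30^2 + 10^2)/12 + 30^2/12), about 12.6 W, so a single small study can easily show no effect at all, or one two or three times too big. Forty times as many riders shrinks the spread by a factor of sqrt(40), roughly 6.3.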

Take, for example, a recent study substantiating the virtues of chocolate milk as the perfect recovery elixir. Fine. But what about the lactose intolerant? I'm going to go out on a limb and suggest that the 30-odd people in the study were, most likely, not lactose intolerant.

This is all to say that the most effective kinds of studies, the kind that will make you a healthier, safer person and a faster cyclist, are studies you conduct on yourself using data you create. You are the entire SAMPLE. The VARIABLE OF INTEREST is hours per week, saddle position, sleep per day, Nutella per hour, and so forth.
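If you want to run one of these studies on yourself, here's a minimal Python sketch of a do-it-yourself TEST/CONTROL design. It assumes you can switch the variable day to day (say, two saddle heights, "A" and "B") and measure the same outcome each time (say, minutes up a local test climb). The helper names `plan_schedule` and `compare` are hypothetical, and randomizing the schedule is borrowed from step (2) so that weather, fatigue, and the like don't line up with one condition.

```python
import random
import statistics


def plan_schedule(days, conditions=("A", "B"), seed=42):
    """Return a random but balanced list saying which condition to use each day."""
    schedule = [conditions[i % len(conditions)] for i in range(days)]
    random.Random(seed).shuffle(schedule)   # randomize order, keep counts equal
    return schedule


def compare(log):
    """Average outcome per condition from a log of (condition, outcome) pairs."""
    by_condition = {}
    for condition, outcome in log:
        by_condition.setdefault(condition, []).append(outcome)
    return {c: statistics.mean(values) for c, values in by_condition.items()}


print(plan_schedule(10))

# Made-up log for illustration: (saddle height, minutes on the test climb).
log = [("A", 12.4), ("B", 12.1), ("B", 11.9), ("A", 12.6), ("A", 12.2), ("B", 12.0)]
print(compare(log))
```

With only a handful of days per condition, the small-SAMPLE problem applies to you just as it does to the labs, so keep logging.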

This has been Lance Armstrong's approach since his first Tour. Of course, such self-driven data analysis is likely also a necessary means of monitoring his blood boosting: only by obsessively controlling all relevant variables could Lance pass the doping tests AND win seven Tours. Either way, it's what's necessary for both speed and legality.

Cycling, more than any other sport except maybe weightlifting, is about pure numbers. And to succeed in cycling, or at least make the most of your potential, you have to be both the researcher and the subject of a thousand introspective studies.

Of course, this is all kind of dorky. It also probably makes you a jerk and warps your morality. And it takes the piss out of the sport. After all, much of the joy of cycling is its liberating power: just get on the bike and ride, right?

Go ahead. And enjoy your next race as you and your liberated ass get dropped out the back and mocked for it.
