Cavazos-Oto GPP Assessment

Background

This is a series of tests designed to measure an individual’s General Physical Preparedness, or GPP. By GPP, we refer to one’s overall fitness, specifically the ability to do things with one’s body, in every useful and measurable respect.

Our purpose is to establish a small number of standard tests, each of which is both meaningful in its own right (e.g. running speed is important) and indicative of one’s broad physical capabilities in a category of fitness (e.g. running speed means you’re well-conditioned), and the combination of which can produce a quantitative assessment of one’s broad, general physical fitness. Ideally, using these tests, an athlete should be able to see where his capacity ranks within various categories of physical ability, and compare that score against his scores in other categories, against his own past scores, or against other athletes’ scores for competitive purposes.

The categories used are based on a modified version of the ten general physical skills enumerated by Jim Cawley and Bruce Evans of Dynamax. We use six categories.

Strength

This is absolute, limit strength, the maximum output of your muscle fibers in a single effort. In Dynamax’s words: “The ability of a muscular unit, or combination of muscular units to apply force.” An example of an effort requiring Strength would be lifting a car.

Power

This is force applied quickly; it is strength over time (more power equals more strength, or less time, or both). In Dynamax’s words: “The ability of a muscular unit, or combination of muscular units, to apply maximum force in minimum time.” An example of an effort requiring Power would be throwing a discus.

Muscular endurance

This category does not come from Dynamax. It is closely related to Metabolic Endurance and Strength, but it can also be isolated. It is the ability to perform high repetitions of relatively low-resistance movements. An example of an effort requiring Muscular Endurance would be high-rep bicep curls.

Metabolic endurance

This category does not come from Dynamax. It is instead a combination of two of their categories, “Stamina” (whose original definition is very close to metabolic endurance) and “Cardiorespiratory endurance” (a component of metabolic endurance, especially in the aerobic pathway). It is essentially the ability of the body as a system to resist fatigue and maintain a consistent power output. An example of an effort requiring Metabolic Endurance would be running an 800-meter race.

Flexibility

This is how far the body can move in useful ways. In Dynamax’s words: “The ability to maximize the range of motion at a given joint.” An example of an effort requiring flexibility would be a straddle press to handstand.

Speed

This is a somewhat unusual category. Dynamax calls it: “The ability to minimize the time cycle of a repeated movement.” In other words, it is the rate at which you can do the same thing over and over. (This is contrasted with their Agility category, which is the rate at which you can transition between different movements.) While this is a coherent physical skill, it is difficult to quantify. We have focused on probably the most relevant application of “speed,” which is footspeed, particularly applied to running. An example of an effort requiring Speed would be hitting a boxer’s speed bag.

The remaining four Dynamax skills (Coordination, Agility, Balance, and Accuracy) have been omitted. This is primarily due to the difficulty involved in formulating meaningful tests for them, as well as the likelihood that, while important, they are generally task-specific skills whose capacity cannot be significantly measured or improved in a “general” manner.

The tests used here have been chosen for specificity to their domain. For instance, a good Flexibility test will demand a great deal of flexibility but very little Strength or Muscular Endurance; this way the result obtained will speak directly to your level of flexibility, regardless of what your strength is. Ideally, each domain score is fully isolated; it stands alone and has no influence on the others. This is impossible, as fitness is a mutually-relating enterprise; nevertheless it is the goal, and the domain-specificity of a movement was one of the three major criteria for test selection. The other two were functionality (is the test a meaningful, useful way to move, often applied in response to life’s physical demands? is it a good thing for someone to be good at this?) and carryover (does this test represent its entire domain fairly well? does a high score on this test mean that someone can do most tasks requiring this physical trait fairly well?).

Logistically, each test is intended for the average athlete to be able to test on his own, with basic equipment and perhaps one assistant.

Implementation

Scoring

Each individual test will be scored on a scale from 0 to 10. “0” represents the bare minimum of capacity for a normal human—below this score, we should be assessing whether the trainee can move around under his or her own power. “2” represents the low end of performance for an athletic individual, and marks the beginning of the region we are interested in. “10” represents a perfect score, roughly or precisely corresponding to the standing world record for a given test.

The scoring plot is piecewise linear: two straight segments defined by the above three points (0, 2, and 10), which differ in slope. Scores from “2” to “10” bracket the larger segment, representing the majority of scores; scores from “0” to “2” bracket the smaller segment, which is plotted on a steeper slope, thus “spreading” the segment we are more interested in (the upper one). The bottom region is considered technically normal, but non-athletic; trainees in this region are not our focus, and would be better served by a different set of tests, or by revisiting these tests after further training.
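As a rough illustration of the two-segment scale described above, here is a minimal Python sketch. The function name, the anchor values in the usage note, and the clamping of results to the 0 to 10 range are assumptions made for the example; they are not the equations or data used by the actual calculators.

```python
def score(performance, p0, p2, p10):
    """Map a raw test result onto the 0-10 scale using two linear segments.

    p0, p2 and p10 are the raw results assigned to rankings 0, 2 and 10.
    They may be increasing (loads, distances) or decreasing (times).
    """
    if not (p0 < p2 < p10 or p0 > p2 > p10):
        raise ValueError("anchors must be strictly increasing or strictly decreasing")

    # Flip the sign for "lower is better" tests so bigger is always better.
    sign = 1 if p10 > p0 else -1
    x, a0, a2, a10 = (sign * v for v in (performance, p0, p2, p10))

    if x <= a2:
        # Lower segment: rankings 0 to 2.
        s = 2 * (x - a0) / (a2 - a0)
    else:
        # Upper segment: rankings 2 to 10.
        s = 2 + 8 * (x - a2) / (a10 - a2)

    # Assumption: results outside the anchors are clamped to the scale.
    return max(0.0, min(10.0, s))
```

With purely hypothetical anchors of 100, 200, and 1000 pounds for rankings 0, 2, and 10, `score(600, 100, 200, 1000)` lands on the upper segment and `score(150, 100, 200, 1000)` on the lower one. A timed test works the same way with the anchors in decreasing order.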

It is of the utmost importance to understand just what is entailed by the use of a linear scale. Human adaptation follows a logarithmic curve; adding 50 pounds to an 800-pound deadlift may represent years more training than adding the same to a 200-pound deadlift—yet our scoring will award the same point increase to each. Furthermore, our scoring makes no accommodation for gender, bodyweight, age, training background, handicaps, or genetic limitations, all of which may be highly relevant to what your test results “mean” to you.

The reason for these peculiarities is that this suite of tests is designed to reflect the actual abilities of an individual to perform physical acts in the real world. While the difference in training adaptation is great between a small female with cerebral palsy pulling a 300-pound deadlift, and an enormous, steroid-enhanced male pulling 300 pounds, in the real world, the practical result is that they did the same job, and that is the variable this test is designed to measure. It is reasonable to admire the first performance as a more impressive personal feat, but if we merely need to move a heavy box, they stack up the same. The scoring spectrum we present is the total range of possible human performance; your score simply represents where you fall on this range. It is therefore important to understand that some people will inevitably have higher scores than others, most athletes will cluster within a small range of scores, and your score does not necessarily reflect your worth as a person, but merely what you are objectively capable of doing.

What, therefore, does your GPP score “mean”? Strictly speaking, it is nothing more or less than a set of rankings describing your performance relative to how human beings—not necessarily you—are capable of performing. We can assume that you do better than the worst person in the world, and worse than the best—but how much better, and how much worse? On its own, this is a somewhat arbitrary number that may not mean very much. It isn’t like a school test where you strive to score 100% and anything less is a flaw; it would be inconceivable for anybody to receive top scores in each one of these tested domains, and to approach a 10 in even one of them places you among the best in the world. Rather, the application of these numbers can only be relational. You can compare one number with another, and the result can tell you something.

A good analogy is that of a bathroom scale. If you stand on a scale, it will tell you something about your body—how much it weighs. Is that number high or low? Neither; it’s just a number. If you want to know how your weight compares with other people, you’ll need to find their numbers and make some comparisons, which is a completely separate endeavor. The number on the scale is just a metric you can use to measure weight, a piece of raw data about your body. Similarly, the rankings output by this system are just a metric you can use to measure fitness.

Interpreting these tests, and their resultant rankings, as anything more than this is a mistake. A score of 4 in the Strength domain is neither necessarily good nor bad; it does not mean that you are strong, weak, or average compared to other athletes. It is merely a piece of data.

It is understandable that most athletes will be more interested in the competitive, comparative kind of data that these scores do not provide. Developing such rankings, however, would require plotting a large number of real-world results and regressing an actual, empirically founded logarithmic formula for the relevant adaptation; this is possible, but it has not yet been done, certainly not for every one of the tests used here.

[In the future, it is our hope and intention to produce a web-based suite for these tests that not only allows easy calculation and combination of scores, but also enters them into a shared database of other users’ numbers. In this way, scores could be combined and analyzed (to whatever statistical strength their quantity allows), enabling one to note means, medians, modes, and peaks in the curve, and to compare oneself against cohorts of similar gender, age, bodyweight, or the like.]

Training approaches and the complexities of human individuality are not dealt with here. The use of these tests for such applications would be to measure their end results—if general fitness is what you are training for, then at any given moment, we can tell you how fit you are. How you got there, or how to get better, is up to you.

The only individual “adjustment” performed here is a general attempt, wherever relevant and possible, to include both bodyweight (moving yourself) and non-bodyweight (moving an external object) tests in each domain. Lighter athletes will tend to excel on the former and heavier athletes on the latter. This consideration is made because both types of work are functionally important in the real world.

An athlete’s score in a given domain is the average of their scores in all of its constituent tests, except for domains with “prioritized” tests, which do not require all constituent tests to be completed before yielding a domain score. An athlete’s overall GPP score is the average of all six domain scores. This method of overall scoring therefore “weights” each domain equally; if you believe that some of them deserve greater priority (e.g. a point of strength should be worth more than a point of flexibility), you can calculate an overall score using your own formula, or the domain scores may simply be used individually.
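The averaging described above can be sketched in a few lines of Python. The function names and the optional `weights` argument are assumptions for illustration; the equal-weight default mirrors the method in the text, and `domain_score` covers only the simple case, not domains with prioritized tests.

```python
from statistics import mean

DOMAINS = (
    "Strength", "Power", "Muscular Endurance",
    "Metabolic Endurance", "Flexibility", "Speed",
)

def domain_score(test_scores):
    """Average the 0-10 scores of all of a domain's constituent tests."""
    return mean(test_scores)

def overall_score(domain_scores, weights=None):
    """Average the six domain scores; every domain is weighted 1 by default.

    `weights` is a hypothetical extension: a dict mapping a domain name to
    its relative weight, for readers who want their own formula.
    """
    missing = [d for d in DOMAINS if d not in domain_scores]
    if missing:
        raise ValueError(f"no valid overall score; missing domains: {missing}")
    if weights is None:
        weights = {}
    total_weight = sum(weights.get(d, 1.0) for d in DOMAINS)
    weighted_sum = sum(domain_scores[d] * weights.get(d, 1.0) for d in DOMAINS)
    return weighted_sum / total_weight
```

For example, domain scores of 6, 5, 3, 6, 2, and 2 (in the order listed in `DOMAINS`) average to an overall score of 4. Passing `weights={"Strength": 2.0}` counts Strength twice and raises the result to 30/7, roughly 4.29.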

Rules

To provide a clear and accurate snapshot of an athlete’s GPP profile, all tests used to calculate an overall score must be completed within a 30-day period. Old measurements are allowed only if they fall within this window: if you are using numbers from 15 days ago, for instance, you have 15 days to complete the rest of the tests.
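A small Python sketch of the window rule follows. The function names are hypothetical, and counting the window as a span of at most 30 days between the oldest and newest test is an assumption about how the boundary is meant to be read.

```python
from datetime import date

def within_window(test_dates, window_days=30):
    """Return True if the oldest and newest tests are at most window_days apart."""
    return (max(test_dates) - min(test_dates)).days <= window_days

def days_remaining(oldest_test, today, window_days=30):
    """Days left to finish the remaining tests, counted from the oldest result."""
    return max(0, window_days - (today - oldest_test).days)
```

With an oldest result dated September 1 and today being September 16, `days_remaining(date(2009, 9, 1), date(2009, 9, 16))` gives 15, matching the example in the rule above.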

For an overall GPP score to be valid, it must be composed of scores from every domain. For a domain score to be valid, it must be composed of scores from all of its constituent tests, except in domains with prioritized tests.

Some domains (currently Metabolic Endurance and Speed) have prioritized tests. In these cases, the tests are ranked according to importance, and it is not necessary to complete all listed tests in order to record a score. However, they must still be performed in order of priority; for example, you cannot produce a Metabolic Endurance score until you test a 1600-meter Run, even if you already have a 5k Run. In these domains, your score is a “running average” that is continually adjusted for greater accuracy as more tests are completed.
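One way to read the “running average” rule is sketched below in Python. This is an interpretation, not the authors’ calculator: it assumes the running average is a plain mean over the tests completed so far in priority order, and that a test completed out of order is ignored until every higher-priority test has a score. The priority list in the usage note contains only the two tests named above and is not the full Metabolic Endurance list.

```python
def prioritized_domain_score(priority_order, scores):
    """Average the scores of tests completed in unbroken priority order.

    priority_order: test names, highest priority first.
    scores: dict mapping completed test names to their 0-10 scores.
    Returns None if the highest-priority test has not been completed.
    """
    counted = []
    for test in priority_order:
        if test not in scores:
            break  # stop at the first gap; lower-priority results do not count yet
        counted.append(scores[test])
    if not counted:
        return None
    return sum(counted) / len(counted)
```

Given the order `["1600-meter Run", "5k Run"]`, an athlete with only a 5k Run score gets no domain score, one with only a 1600-meter Run score of 4 gets 4, and one with scores of 4 and 5 gets 4.5.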

The Tests

Calculation

Each test is presented along with two tools:

A table for viewing estimated rankings
A JavaScript tool for calculating precise rankings

With the table of data given, you can examine the general range of scores. However, it is worth noting that the only “true” data in that table, in the sense of being empirically grounded, are the results for rankings 0 and 10, which are based on real-world records and similar figures. After those are set, the 2 ranking is determined, being assigned to roughly the point where “athletic” numbers begin. All other rankings are filled in along the straight lines between these points. That is, the result for a ranking of 6 is not based on anything in particular; it is simply halfway between the results for 2 and 10.
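To make the point concrete, the sketch below builds a whole-number ranking table from just the three anchor results. The function name and the anchor values in the usage note are hypothetical, and the sketch assumes straight-line interpolation between the anchors.

```python
def ranking_table(p0, p2, p10):
    """Return {ranking: raw result} for rankings 0 through 10.

    Only p0, p2 and p10 carry real information; every other row is
    filled in along the straight line between its neighbouring anchors.
    """
    table = {}
    for rank in range(11):
        if rank <= 2:
            table[rank] = p0 + (p2 - p0) * rank / 2
        else:
            table[rank] = p2 + (p10 - p2) * (rank - 2) / 8
    return table
```

With made-up anchors of 100, 200, and 1000, the row for ranking 6 comes out at 600, exactly halfway between the rows for 2 and 10.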

In any case, since the table given uses rounded figures, actual rankings should be determined using the JavaScript widget, which will reveal your exact position (down to several decimal places) along the ranking curve. For many of these tests, improvements in performance will result in relatively small improvements of rank (perhaps half a point), so a fairly high level of precision is useful.

Overall scores for a domain are calculated by the simple average (arithmetic mean) of your scores for each of that domain’s tests. You may calculate this on your own, but an averaging application is provided on the individual domain pages as well.

Finally, to calculate a total GPP score, all of the (already averaged) domain scores must themselves be averaged together. This is left as an exercise for the reader, with the reminder that no GPP score is valid until it includes a domain score from all six domains.

The data used to produce the ranking curves, and the equations regressed from them, are not given here, but are available from the authors by request.

Notes

Sources

The project of developing a quantitative, practical, broad suite of tests for general physical performance was largely influenced by several sources.

Jim Cawley and Bruce Evans from Dynamax obviously produced the foundational rubric upon which this material is based, and their “10 physical skills” have had a lasting impact on numerous athletes and trainers.

The original concept of GPP, the testing thereof, and the Dynamax material were introduced to the authors through CrossFit and the work of Greg Glassman.

Finally, the impetus and inspiration to actually apply these ideas in the quantitative manner seen here was the direct result of ideas posted by David Johnson to a web forum. Although, to the knowledge of the authors, he never pursued this concept, and his ideas differ from those here in a number of important ways, this project would likely not have occurred without his spark.

Steven Low and Brian Degennaro provided some input on a few of the specific testing protocols. Joey Dodds provided significant assistance with coding the JavaScript calculation widgets.

The Flexibility tests are an original creation of Brandon Oto. Some earlier work by Oto was also influential, namely concepts from the AGT template and unpublished material on adaptational curves.

Development

All material here except that which is cited otherwise is the original work of Joe Cavazos and Brandon Oto, most of it in early 2009. Cavazos suggested the original project, performed some of the testing, and organized most of the proceedings; Oto developed much of the methodology and theoretical underpinnings and coded this page. Underlying development and details were a collaborative effort.

This material was originally published September 1, 2009, in a complete but largely untested form. Changes or improvements may continue to be introduced in response to feedback. Suspected weaknesses at that time included:

Some Muscular Endurance tests may be overly taxing and may need to be adjusted in weight or duration.
The entire Speed domain may need to be altered or omitted altogether.
Coordination, Agility, Balance, and Accuracy domains may be introduced in the future if meaningful tests for them can be developed.
All three Flexibility tests may prove to be too complicated for easy and frequent use.
The Tabata Row may be impossible to perform on some models of Concept2 rowers.
Scoring standards for all tests, especially their top ends, may need refinement.
The 30-day window for testing may prove to be either too narrow for feasible testing, or too wide to capture a true snapshot of fitness.

Questions or feedback can be directed to Joe at [email protected] or to Brandon at [email protected].