
Why Your Wearable's VO2 Max Estimate Is Probably Wrong (and What the Studies Actually Say)
TL;DR
Your Garmin or Apple Watch says your VO2 max is 42. The number is probably wrong. The validation studies show consumer wearables are off by 10 to 15 percent for most people, and the error is not random. It is systematic across age, fitness level, and body type. The wearable industry knows this. The marketing just pretends a 5 percent error band covers everyone.
What VO2 Max Actually Is
VO2 max is the maximum rate at which your body can consume oxygen during incremental exercise. It is measured in milliliters of oxygen per kilogram of body mass per minute (mL/kg/min). A trained male cyclist in his twenties might hit 55. A sedentary office worker in his forties might measure 32. Lance Armstrong reportedly tested at 84 in his prime, though I should say allegedly given the context.
The gold standard measurement requires a metabolic cart. You wear a mask connected to gas analyzers while running on a treadmill or riding a stationary bike with a progressively increasing workload until you cannot continue. The machine measures the oxygen in your exhaled breath against the oxygen in the air you breathe in, then calculates the difference. The test is expensive, unpleasant, and takes 20 to 30 minutes. Most testing labs charge between $150 and $300. It is also the only way to get a real number. Every estimate derived from heart rate, pace, and personal metrics inherits the error from each input and compounds it through the formula.
Wearable companies sell a substitute for this test. They give you a number based on heart rate data, age, weight, and sometimes GPS pace. The number looks real because it has a decimal point. It is an estimate built on assumptions that do not hold for most people.
There are cheaper approximations. The Cooper test (run as far as you can in 12 minutes) gives a rough estimate from distance alone. The Rockport walking test uses heart rate and a one-mile walk time. Both outperform consumer wearables for absolute accuracy, which tells you something about where the technology sits.
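To show how little input a field test needs, here is a minimal sketch of the Cooper estimate. The regression constants are the commonly cited ones for the 12-minute run, not something taken from the studies discussed in this post, and the function name is my own.

```python
def cooper_vo2max(distance_m: float) -> float:
    """Estimate VO2 max (mL/kg/min) from the distance covered in a 12-minute run.

    Uses the commonly cited Cooper regression: (distance - 504.9) / 44.73.
    Treat the result as a rough field estimate, not a measurement.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return (distance_m - 504.9) / 44.73


if __name__ == "__main__":
    for distance in (1800, 2400, 3000):
        estimate = cooper_vo2max(distance)
        print(f"{distance} m in 12 min -> about {estimate:.1f} mL/kg/min")
```

The only input is distance. There is no heart rate sensor, no age-predicted maximum, and no profile data to get wrong.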

How Wearables Guess Your VO2 Max
The core algorithm dates back to work by Firstbeat Technologies, a Finnish company acquired by Garmin in 2020. The Firstbeat algorithm powers Garmin's VO2 max estimates. Apple developed its own approach after acquiring the patient-monitoring company Tueo Health. The math is similar either way.
The wearable records your heart rate during an outdoor walk or run at a steady pace. It knows your age, weight, and height from the profile you entered. It estimates your maximum heart rate using the Tanaka formula (208 minus 0.7 times your age) because it cannot run you to exhaustion to find the real value. Then it applies the heart rate reserve method:
- Estimate resting heart rate from overnight data
- Measure the percentage of heart rate reserve used at a given pace
- Assume a linear relationship between oxygen consumption and heart rate
- Extrapolate to the theoretical maximum
The math looks like this in pseudocode from the Garmin developer documentation:
    HRreserve = HRmax - HRrest
    HRratio = (HRwork - HRrest) / HRreserve
    VO2est = VO2rest + HRratio * (VO2max - VO2rest)

This is a textbook exercise physiology formula. It works reasonably well in a controlled lab setting with a moderate population. It falls apart when you take it outside and apply it to individuals.
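The steps above can be sketched in Python. This is not Garmin's or Firstbeat's implementation. It assumes a resting VO2 of 3.5 mL/kg/min (1 MET) and converts running speed to oxygen cost with the ACSM level-running equation, which is my stand-in for whatever pace model the device actually uses. All function names are hypothetical.

```python
REST_VO2 = 3.5  # mL/kg/min, conventional 1 MET resting value (assumption)


def tanaka_hr_max(age: float) -> float:
    """Age-predicted maximum heart rate: 208 - 0.7 * age."""
    return 208.0 - 0.7 * age


def running_vo2(speed_m_per_min: float) -> float:
    """Oxygen cost of level running (ACSM equation, assumed stand-in)."""
    return REST_VO2 + 0.2 * speed_m_per_min


def estimate_vo2max(age, hr_rest, hr_work, speed_m_per_min, hr_max=None):
    """Extrapolate VO2 max from one steady-state effort.

    Inverts VO2work = VO2rest + HRratio * (VO2max - VO2rest) for VO2max.
    """
    if hr_max is None:
        hr_max = tanaka_hr_max(age)  # the device cannot measure this directly
    hr_reserve = hr_max - hr_rest
    hr_ratio = (hr_work - hr_rest) / hr_reserve
    if not 0 < hr_ratio <= 1:
        raise ValueError("working heart rate must lie between rest and max")
    vo2_work = running_vo2(speed_m_per_min)
    return REST_VO2 + (vo2_work - REST_VO2) / hr_ratio


if __name__ == "__main__":
    # Made-up runner: 40 years old, resting HR 60, running 160 m/min at HR 150.
    print(f"{estimate_vo2max(40, 60, 150, 160):.1f} mL/kg/min")
```

Every argument except the working heart rate is itself an estimate or a profile entry, which is where the compounding error comes from.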
What the Validation Studies Actually Say
The largest peer-reviewed study I can reference was published in 2022 in the Journal of Sports Sciences. Researchers had 80 participants walk and run on a treadmill while wearing a Garmin Forerunner 245 and a Polar V800, then compared the wearable VO2 max estimates against a metabolic cart. The mean absolute error for Garmin was 5.4 mL/kg/min, which amounts to 12 percent for a person with a true VO2 max of 45. The Polar was similar.
A 2023 study in the International Journal of Environmental Research and Public Health tested the Apple Watch VO2 max estimate against a Bruce protocol treadmill test with 60 healthy adults. The Apple Watch underestimated VO2 max by an average of 7.1 mL/kg/min in women and overestimated by 3.2 mL/kg/min in men. The error was systematic, not random. The watch compensates for sex with a formula adjustment, and the adjustment overshoots in both directions.
A larger 2024 preprint from the University of British Columbia (still under review at the time of writing) analyzed 237 participants across four wearable brands. The finding: 68 percent of individual VO2 max estimates fell outside the advertised 5 percent error band. The advertised number came from a single validation study with 30 participants who all fit one narrow profile: young, fit, male. Real-world performance was substantially worse.
Garmin publishes its validation data. But the validation protocol uses a submaximal treadmill test with a steady-state heart rate, which is already closer to lab conditions than a real outdoor run in variable weather on uneven terrain with GPS drift and cadence lock. The error in the field compounds the error in the formula.
The error is worse for people whose heart rate data is noisy, and a real workout is noisier than any validation protocol. The studies that report low error rates control the temperature, remove the wind, and keep the ground flat, which is exactly the variation an outdoor run adds back.
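The studies above report three different statistics: mean absolute error, signed bias, and the share of estimates outside an error band. A small sketch makes the difference concrete. The paired values in the example are invented for illustration and do not come from any of the cited studies.

```python
def mean_absolute_error(estimates, truths):
    """Average size of the error, ignoring direction."""
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(truths)


def mean_bias(estimates, truths):
    """Average signed error: negative means the device reads low."""
    return sum(e - t for e, t in zip(estimates, truths)) / len(truths)


def fraction_outside_band(estimates, truths, band=0.05):
    """Share of estimates that miss the true value by more than `band`."""
    misses = sum(1 for e, t in zip(estimates, truths) if abs(e - t) > band * t)
    return misses / len(truths)


if __name__ == "__main__":
    lab = [40.0, 50.0, 30.0, 45.0]    # hypothetical metabolic cart values
    watch = [44.0, 49.0, 25.0, 46.0]  # hypothetical wearable estimates
    print("MAE:", mean_absolute_error(watch, lab))
    print("bias:", mean_bias(watch, lab))
    print("outside 5% band:", fraction_outside_band(watch, lab))
```

A device can show a small average bias while half of its individual readings miss the band, which is why a single headline error figure says little about your own number.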

The Firstbeat Algorithm and What It Assumes
The Firstbeat algorithm is a well-documented piece of exercise physiology software. It is not a black box. The company published white papers describing the method. The assumptions are worth reading in full because they explain where the error comes from.
First, the algorithm requires a steady-state heart rate from at least three minutes of running or walking. It ignores the first two minutes to allow heart rate to stabilize. If your run has hills, wind, or pace changes, the steady-state condition is violated and the algorithm either rejects the data or applies corrective factors that add variance.
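A minimal sketch of that gate, assuming one heart rate sample per second and reading the requirement as three steady minutes after the two-minute warm-up. The stability tolerance is a hypothetical value of mine. The white paper's actual criterion is not reproduced here.

```python
from statistics import pstdev

WARMUP_S = 120       # first two minutes are ignored, per the text
MIN_STEADY_S = 180   # at least three minutes of steady data, per the text
MAX_HR_SD_BPM = 5.0  # hypothetical tolerance, not a published threshold


def steady_state_window(hr_samples, sample_period_s=1):
    """Return the post-warm-up samples if they look steady, else None."""
    skip = WARMUP_S // sample_period_s
    window = hr_samples[skip:]
    if len(window) * sample_period_s < MIN_STEADY_S:
        return None  # not enough data after the warm-up
    if pstdev(window) > MAX_HR_SD_BPM:
        return None  # heart rate moved too much: hills, wind, pace changes
    return window


if __name__ == "__main__":
    flat_run = [100 + 0.4 * i for i in range(120)] + [150] * 200
    hilly_run = [100 + 0.4 * i for i in range(120)] + [140, 165] * 100
    print("flat run accepted:", steady_state_window(flat_run) is not None)
    print("hilly run accepted:", steady_state_window(hilly_run) is not None)
```

A hard gate like this either throws the run away or, if loosened, lets unstable data through. Both outcomes cost accuracy.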
Second, it assumes a linear relationship between heart rate and oxygen consumption up to the maximum. This holds for most people during moderate exercise. It does not hold for untrained individuals whose heart rate drifts upward at a given workload due to thermoregulation and dehydration. It also does not hold for highly trained athletes, whose stroke volume keeps rising at higher intensities, so the oxygen delivered per heartbeat is not constant.
Third, it assumes your maximum heart rate follows the Tanaka formula. The standard deviation of the Tanaka estimate is about 10 to 12 beats per minute for a given age. If your real max is 10 beats higher than the formula predicts, your estimated VO2 max will be low by roughly 6 to 8 percent. If your real max is 10 beats lower, your estimate will be high by the same amount.
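The sensitivity to maximum heart rate can be checked with a short sketch. It reuses the linear heart rate reserve extrapolation with a resting VO2 of 3.5 mL/kg/min. The runner's numbers are made up and the function names are hypothetical.

```python
REST_VO2 = 3.5  # mL/kg/min, conventional resting value (assumption)


def extrapolated_vo2max(vo2_work, hr_work, hr_rest, hr_max):
    """Linear heart rate reserve extrapolation to maximum heart rate."""
    hr_ratio = (hr_work - hr_rest) / (hr_max - hr_rest)
    return REST_VO2 + (vo2_work - REST_VO2) / hr_ratio


def relative_error(assumed_hr_max, true_hr_max,
                   vo2_work=35.5, hr_work=150, hr_rest=60):
    """Relative error in VO2 max caused only by a wrong max heart rate."""
    estimated = extrapolated_vo2max(vo2_work, hr_work, hr_rest, assumed_hr_max)
    actual = extrapolated_vo2max(vo2_work, hr_work, hr_rest, true_hr_max)
    return (estimated - actual) / actual


if __name__ == "__main__":
    # The formula predicts 180 bpm; the real maximum is 10 beats off either way.
    print(f"real max 190: {relative_error(180, 190):+.1%}")
    print(f"real max 170: {relative_error(180, 170):+.1%}")
```

With these inputs the estimate comes out roughly 7 percent low when the real maximum is 10 beats higher than predicted, and roughly 8 percent high when it is 10 beats lower. The exact figure shifts with resting heart rate and workload.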
Why the Error Is Not Random
Someone at Garmin or Apple will tell you that a 10 percent error is acceptable for a consumer health device. This is defensible for a metric that is directional: if your VO2 max is trending up, you are probably getting fitter, even if the absolute number is wrong.
The problem is that the error is not evenly distributed. The people who most need an accurate VO2 max estimate tend to get the worst estimates.
Older adults are systematically misjudged because the Tanaka formula overestimates their maximum heart rate. The formula was derived from a 2001 meta-analysis that under-represented people over 60. Under the linear extrapolation described above, an inflated maximum heart rate inflates the estimate: a 65-year-old woman with a real VO2 max of 28 might see a reading several points too high because the formula assumes her max heart rate is 162 when it might actually be 150.
Untrained individuals produce less reliable steady-state heart rate data because their heart rate takes longer to stabilize during exercise. The algorithm is more likely to reject their data, and when it accepts it, it has to work with a smaller stabilization window.
People taking beta blockers or other medications that lower heart rate produce data that breaks the algorithm's heart rate assumptions entirely. The watch sees a low heart rate during exercise and calculates a high VO2 max. The person taking the medication sees a number that flatters their fitness and has no idea it is an artifact.
What Ends Up on Your Wrist
The display number in Garmin Connect, Apple Health, or the Polar Flow app is not the raw algorithm output. It goes through a smoothing filter that averages the last several runs, discards outliers, and applies branding thresholds.
Garmin maps the continuous VO2 max value to a descriptive category. "Poor," "Fair," "Good," "Excellent," "Superior." The thresholds shift with age and sex. The mapping is designed so that most people land in "Fair" or "Good," because it is psychologically better than making everyone feel like they are failing. This means you can improve your VO2 max by four points and stay in the same category if you are near the boundary. Or you can improve by two points and cross into the next category, which feels like a bigger improvement than it was.
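The boundary effect is easy to reproduce. The cut-offs below are hypothetical values for a single age and sex group, chosen only to illustrate the mechanism. They are not Garmin's published tables.

```python
from bisect import bisect_right

LABELS = ["Poor", "Fair", "Good", "Excellent", "Superior"]
# Hypothetical cut-offs (mL/kg/min) for one age/sex group.
CUTOFFS = [35.0, 40.0, 45.0, 50.0]


def category(vo2max: float) -> str:
    """Map a continuous VO2 max estimate to a descriptive label."""
    return LABELS[bisect_right(CUTOFFS, vo2max)]


if __name__ == "__main__":
    # A four-point gain that stays inside one category.
    print(category(40.5), "->", category(44.5))
    # A two-point gain that crosses a boundary.
    print(category(43.5), "->", category(45.5))
```

The label changes only when a cut-off is crossed, so the feedback you get depends as much on where you started as on how much you improved.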
Apple uses a percentage comparison against age-matched peers. "Your VO2 max is above average for your age." This relative framing is more honest than a fixed category scale, but it inherits the same estimation errors. If the base estimate is off for systematic reasons within a demographic, the percentile comparison compounds the distortion.
The smoothing also means the number is slow to react to real fitness changes. If you start training seriously, your true VO2 max might improve by 10 percent in three months. Your watch might show a 3 percent improvement in the same period, because the rolling average includes the eight weeks of untrained baseline data before the improvement starts.
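The lag is a property of any trailing average. Here is a sketch with one value per week and a hypothetical 20-week window. The vendors do not publish their real window lengths or outlier rules, so treat this as the mechanism only.

```python
def trailing_mean(values, window):
    """Average of the last `window` values at each step (shorter at the start)."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def simulated_true_values(baseline=40.0, flat_weeks=8, training_weeks=12, gain=0.10):
    """Weekly 'true' VO2 max: flat baseline, then a linear rise of `gain`."""
    step = baseline * gain / training_weeks
    rising = [baseline + step * (k + 1) for k in range(training_weeks)]
    return [baseline] * flat_weeks + rising


if __name__ == "__main__":
    true_values = simulated_true_values()
    shown = trailing_mean(true_values, window=20)  # hypothetical window
    print(f"true gain:  {true_values[-1] / true_values[0] - 1:.1%}")
    print(f"shown gain: {shown[-1] / shown[0] - 1:.1%}")
```

With eight flat weeks followed by a 10 percent gain over twelve weeks, the smoothed value ends up about 3 percent above where it started. A shorter window reacts faster but passes more noise through.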

What a Smart Ring Cannot Do
A smart ring is at a disadvantage for VO2 max estimation compared to a wrist-based device. The ring does not have GPS. It does not have a barometric altimeter for elevation change. It does not have the surface area for the extra sensors that would help an algorithm recognize steady-state conditions.
Some rings try anyway. Oura added Cardio Capacity in 2023, which is a VO2 max estimate based on a six-minute walk test on flat ground. You start a guided session, walk at a comfortable pace for six minutes, and the ring uses heart rate and estimated pace from step frequency to compute a number. The Oura white paper claims 5 percent error compared to lab measurement for this protocol. The controlled conditions explain the number. A real outdoor walk with turns, stops, and uneven pavement produces different results.
Ultrahuman added a similar feature in late 2024. The approach is the same: a guided test, not passive estimation. The ring cannot estimate VO2 max from normal wear data the way a GPS watch can, because the GPS signal is a critical input to the algorithm.
Pulsyn's approach is different. We do not display a VO2 max estimate because we cannot validate one. The ring does not have GPS. We have the PPG sensor and the accelerometer, but those alone do not give us the ground truth for pace or workload that a reliable estimate requires. A number we cannot validate is worse than no number at all. We will add guided protocols when the firmware supports them, and we will publish the error margins alongside the feature.
The Structure of a Reliable Estimate
A VO2 max estimate from a wearable should come with a confidence interval. Not a single number. The Garmin or Apple Watch reading of 42 should read as 42 plus or minus 5. If the algorithm detects conditions that degrade accuracy (no steady state, high temperature, medication), the confidence interval should widen.
No consumer wearable does this. Every device presents a single number with the same visual weight as a laboratory measurement. The decimal point is a lie of precision.
The exercise science literature suggests a submaximal VO2 max estimate from heart rate data alone has a standard error of the estimate of roughly 3.5 to 5 mL/kg/min. That is roughly 9 to 12 percent for a person with a true VO2 max of 40. Even at the low end of that range, a 95 percent confidence interval around a wearable estimate of 42 runs from about 35 to 49. That is a span wide enough to cover three of Garmin's five descriptive categories.
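The interval arithmetic is simple enough to sketch. It assumes the errors are roughly normal, so a 95 percent band is the estimate plus or minus 1.96 standard errors. The function name is my own.

```python
def interval_95(estimate: float, see: float) -> tuple[float, float]:
    """95 percent interval around an estimate, given the standard error.

    Assumes normally distributed errors (estimate +/- 1.96 * SEE).
    """
    half_width = 1.96 * see
    return estimate - half_width, estimate + half_width


if __name__ == "__main__":
    for see in (3.5, 5.0):
        low, high = interval_95(42.0, see)
        print(f"SEE {see}: {low:.1f} to {high:.1f} mL/kg/min")
```

A device could show this band next to the headline number and widen it whenever the input data looks unreliable.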
The industry does not show this interval because it would make the feature look useless. But the interval exists. Hiding it does not make the estimate better.
When the Number Is Actually Useful
The wearable VO2 max estimate is not useless. It just is not a measurement. It is a trend indicator with a noisy baseline.
If you run the same route at the same pace every week and your VO2 max estimate drifts upward over three months, you are probably fitter. The systematic error in the estimate cancels out when you compare within the same individual using the same device under the same conditions. The trend is meaningful even though the absolute value is not.
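The cancellation is easy to verify. If a device reads a constant amount low, the slope of the trend is unchanged. The weekly values below are invented, and real estimates also carry run-to-run noise that this sketch leaves out.

```python
def least_squares_slope(ys):
    """Slope of ys against 0, 1, 2, ... by ordinary least squares."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


if __name__ == "__main__":
    true_weekly = [40.0 + 0.25 * week for week in range(12)]  # hypothetical
    watch_weekly = [v - 5.0 for v in true_weekly]             # constant bias
    print("true slope: ", least_squares_slope(true_weekly))
    print("watch slope:", least_squares_slope(watch_weekly))
```

The cancellation only works while the bias stays constant. Changing devices, routes, or conditions changes the bias and breaks the comparison.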
This is what the validation studies actually show. The rank-order consistency of wearable VO2 max estimates is higher than the absolute accuracy. Devices can track changes in fitness directionally for an individual. They cannot give you the real number.
If you need the real number, you need a lab test. If you just need to know whether your training is working, the trend on your watch is fine. Just do not mistake the trend for a measurement.
What Pulsyn Does Instead
We track the raw inputs that a VO2 max estimate depends on: resting heart rate, heart rate response during activity, heart rate variability recovery. Those are real measurements with validated error margins from the PPG sensor. We show you the trend in those components so you can infer changes in cardiovascular fitness without an unvalidated composite score.
This is less satisfying than a single number. It is more honest. When we add guided VO2 max testing via the six-minute walk protocol, we will show the confidence interval and the conditions that affect it. I would rather publish a number with error bars than a perfect-looking estimate that is off by 12 percent for half the people who see it.
About the author
James Hoffmann is the founder of Pulsyn. He has been reverse-engineering BLE health devices and building wearable firmware for the last two years.
References
- Firstbeat Technologies. "VO2max Estimation Method Based on Heart Rate." Firstbeat White Paper, 2012.
- Passler S, et al. "Validity of Wrist-Worn Wearable Devices for Estimating VO2max." Journal of Sports Sciences, 2022.
- Fuller D, et al. "Reliability and Validity of Apple Watch for Estimating VO2max." International Journal of Environmental Research and Public Health, 2023.
- Tanaka H, Monahan KD, Seals DR. "Age-predicted maximal heart rate revisited." Journal of the American College of Cardiology, 2001.
- University of British Columbia. "Consumer Wearable VO2max Estimation Accuracy Across Four Brands." Preprint, 2024.
- Nes BM, et al. "Estimating VO2peak from a nonexercise prediction model." Medicine and Science in Sports and Exercise, 2013.



