
Why Your Wearable's Calorie Count Is Wrong by 27 to 93 Percent
TL;DR
A 2017 Stanford study tested seven wrist-worn fitness trackers against clinical-grade instruments. Heart rate was accurate within 5 percent on six of the seven devices. Energy expenditure was off by 27 to 93 percent on every single one. The calorie burn number on your Oura, Fitbit, or Apple Watch is not a measurement. It is a guess, and the guesses are so bad that the most accurate device was still wrong by more than a quarter.
What the Stanford study actually measured
In May 2017, a team at Stanford University School of Medicine led by Euan Ashley, a professor of cardiovascular medicine and genetics, published a study they did not expect to be controversial. They recruited 60 volunteers, 31 women and 29 men, and had them wear seven consumer fitness trackers while walking or running on a treadmill and pedaling a stationary bike.
The devices were the Apple Watch, Basis Peak, Fitbit Surge, Microsoft Band, Mio Alpha 2, PulseOn, and Samsung Gear S2. The researchers compared each device's output against two clinical instruments: an electrocardiograph for heart rate and an indirect calorimeter for metabolic rate. The calorimeter measures oxygen and carbon dioxide in breath, which is the standard proxy for energy expenditure in exercise physiology.
The results were published in the Journal of Personalized Medicine. Heart rate accuracy was good. Six of the seven devices stayed within 5 percent of the ECG reading. Some variation existed based on skin tone and body mass index, but the overall picture was clear: if you want to know your heart rate during a run, a wristband will do.
Energy expenditure was a different story. Not one device measured it accurately. The most accurate device, the Apple Watch, was off by an average of 27 percent. The least accurate, the PulseOn, was off by an average of 93 percent. To put that in concrete terms: if the calorimeter said you burned 400 calories, a reading that is 93 percent off could be as low as 28 or as high as 772.
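The arithmetic behind that range can be sketched in a few lines. This is only an illustration: it assumes the percent error applies symmetrically around the calorimeter value, and the function name `reported_range` is hypothetical, not something from the study.

```python
from typing import Tuple

def reported_range(reference_kcal: float, error_pct: float) -> Tuple[float, float]:
    """Return the (low, high) readings consistent with a symmetric percent error."""
    delta = reference_kcal * error_pct / 100.0
    # A device cannot report negative calories, so clamp the low end at zero.
    return (max(0.0, reference_kcal - delta), reference_kcal + delta)

# Lowest and highest average errors from the study, against a 400 kcal reference.
print(reported_range(400, 27))  # (292.0, 508.0)
print(reported_range(400, 93))  # (28.0, 772.0)
```

Even at the 27 percent end, the plausible window is more than 200 calories wide.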
Ashley told Stanford Medicine News: "The magnitude of just how bad they were surprised me." Anna Shcherbina, the graduate student who shared lead authorship, said the error was so large that even the best-performing device was outside the 10 percent threshold they had set as acceptable for a lay user.
How the guess works
To understand why the error is so large, you need to understand what a fitness tracker is actually doing when it displays a calorie count.
The device has an accelerometer and a photoplethysmography (PPG) sensor. The accelerometer measures acceleration along three axes. The PPG sensor measures blood volume changes in the capillaries to estimate heart rate. From these two inputs, the device runs a proprietary algorithm that outputs a number labeled "calories burned."
That algorithm is the problem. It is not measuring energy expenditure directly. It is inferring it from proxy signals: how much you moved, how fast your heart beat, and a set of assumptions about your body composition, fitness level, and the metabolic cost of the activity you are doing.
Each manufacturer uses its own secret formula. Fitbit's algorithm is not public. Apple's is not public. Oura's is not public. The algorithm is trained on population averages, which means it works best for people who are average in height, weight, body fat percentage, resting metabolic rate, and cardiovascular fitness. If you are not average, and almost no one is, the error grows.
Shcherbina put it this way: "It's very hard to train an algorithm that would be accurate across a wide variety of people because energy expenditure is variable based on someone's fitness level, height and weight, etc." Heart rate is measured directly. Energy expenditure is measured indirectly through proxy calculations. The proxies are noisy, and the manufacturers do not tell you how noisy.
Why the ring makes it harder
If wristbands are off by 27 to 93 percent, smart rings are not likely to do better. They may do worse.
A smart ring has the same sensor types as a wristband: a 3-axis accelerometer and a PPG sensor. But the ring has less space for the battery, less space for the antenna, and less space for the LED array. The PPG signal on a finger is different from the PPG signal on a wrist. The capillary density is higher, but the motion artifacts are different. Finger motion during typing, cooking, or carrying groceries creates noise patterns that wrist algorithms are not trained to handle.
The accelerometer on a finger also sees a different mechanical environment. The wrist moves in a relatively predictable arc when you walk or run. The finger moves with hand tremor, grip force, and digit flexion. A ring cannot distinguish between a step and a hand gesture with the same confidence a wristband can. If the step count is wrong, the calorie estimate based on step count is also wrong.
Battery constraints make the problem worse. A continuous calorie algorithm needs frequent accelerometer and PPG sampling. Oura samples PPG in bursts to save power. Fitbit does the same. The gaps in the data are filled by interpolation, which is another layer of guessing. Pulsyn samples at 25 Hz during sleep and 100 Hz during active periods, but we do not run the data through a calorie model because we do not believe the output would be accurate enough to show you.
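To see why interpolation counts as guessing, here is a minimal sketch of linear gap-filling between two heart rate bursts. The function name, the five-minute gap, and the sample values are all hypothetical; no manufacturer's actual gap-filling method is public.

```python
def interpolate_hr(bursts, step_s):
    """Fill the time between (time_s, bpm) bursts with straight-line estimates."""
    filled = []
    for (t0, hr0), (t1, hr1) in zip(bursts, bursts[1:]):
        t = t0
        while t < t1:
            frac = (t - t0) / (t1 - t0)
            # Every value strictly between two bursts is invented, not measured.
            filled.append((t, hr0 + frac * (hr1 - hr0)))
            t += step_s
    filled.append(bursts[-1])
    return filled

# Two real readings five minutes apart, filled in at one-minute steps.
for t, bpm in interpolate_hr([(0, 60.0), (300, 90.0)], 60):
    print(t, round(bpm, 1))
```

If your heart rate spiked to 150 during a flight of stairs at the two-minute mark, the filled-in series would still show a smooth climb from 60 to 90, and any calorie model fed by it would never see the effort.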
Oura does not publish its calorie accuracy data. Neither does RingConn. Neither does Ultrahuman. The industry standard is to show the number and hope no one tests it against a calorimeter. That is a reasonable bet for the manufacturers: the Stanford study shows what happens when someone does run the test, and almost no one else does.
The algorithm under the hood: MET tables and HR-based formulas
Most calorie algorithms use one of two approaches, or a blend of both. The first is the MET table approach. MET stands for metabolic equivalent of task. One MET is defined as the energy cost of sitting quietly, roughly 1 kcal per kilogram per hour. Walking is 3 METs. Running at 6 mph is 10 METs. The device guesses your activity from the accelerometer pattern, looks up the MET value in a table, multiplies by your body weight and the duration, and outputs a calorie count.
The problem is that MET values are population averages derived from small studies in controlled labs. Your personal metabolic cost of running depends on your running economy, your body composition, your fatigue level, and whether you are running uphill or downhill. The device does not know any of this. It assumes you are average.
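The MET lookup described above fits in a few lines. This is a sketch, not any vendor's code: the table holds only the three values mentioned in the text, the activity keys are made up for illustration, and the formula uses the approximation of 1 MET as 1 kcal per kilogram per hour.

```python
# Illustrative MET values taken from the text; real tables list hundreds of activities.
MET_TABLE = {"sitting": 1.0, "walking": 3.0, "running_6mph": 10.0}

def met_calories(activity: str, weight_kg: float, minutes: float) -> float:
    """kcal = MET x body weight (kg) x duration (hours)."""
    met = MET_TABLE[activity]
    return met * weight_kg * (minutes / 60.0)

# Any 70 kg person running 30 minutes at 6 mph gets the same answer.
print(met_calories("running_6mph", 70, 30))  # 350.0
```

Notice what the function takes as input: an activity label, a weight, and a duration. Running economy, body composition, fatigue, and gradient have nowhere to go, which is exactly the limitation described above.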
The second approach is the heart rate-based formula. It uses the fact that heart rate correlates with oxygen consumption during steady-state aerobic exercise. The device applies a regression equation, often the Keytel or Fujiwara formula, that takes your age, sex, weight, and heart rate and estimates energy expenditure, either directly or by first estimating oxygen uptake (VO2) and converting that to kcal. This works better than MET tables for continuous cardio, but it breaks down for interval training, strength training, and any activity where heart rate does not track linearly with oxygen cost.
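As an illustration of the heart rate approach, here is a sketch of the Keytel regression in the form it is usually reproduced (the version without a measured VO2max, from Keytel et al., 2005). The coefficients are the commonly quoted ones and should be checked against the paper before being relied on. The function name is hypothetical, and nothing here is claimed to be what any particular wearable runs.

```python
KJ_PER_KCAL = 4.184

def keytel_kcal_per_min(hr_bpm: float, weight_kg: float, age_years: float, sex: str) -> float:
    """Estimate energy expenditure (kcal/min) from heart rate, weight, age, and sex."""
    if sex == "male":
        kj_per_min = -55.0969 + 0.6309 * hr_bpm + 0.1988 * weight_kg + 0.2017 * age_years
    elif sex == "female":
        kj_per_min = -20.4022 + 0.4472 * hr_bpm - 0.1263 * weight_kg + 0.074 * age_years
    else:
        raise ValueError("the regression is only defined for 'male' or 'female'")
    # The regression goes negative at low heart rates; clamp rather than report
    # negative calories. That is a sign it was fitted to exercise, not rest.
    return max(0.0, kj_per_min / KJ_PER_KCAL)

# A 30-year-old, 70 kg man at 150 bpm: roughly 14 kcal per minute.
print(round(keytel_kcal_per_min(150, 70, 30, "male"), 1))
```

The equation is linear in heart rate, so it treats 150 bpm during a steady run the same as 150 bpm between heavy squat sets, even though the oxygen cost of the two differs. That is the breakdown for interval and strength training noted above.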
Oura, Fitbit, and Apple do not disclose their methods, but all of them likely blend these approaches using machine learning models trained on labeled activity data. The labels typically come from user self-reports or from the same flawed reference methods, so the model learns to reproduce the errors of its training data. If the training data was generated by a MET table that overestimates cycling by 30 percent, the model will overestimate cycling by 30 percent.
The proprietary nature of these algorithms means they cannot be audited. You cannot test the model against a new dataset and see if it generalizes. You cannot check for bias by age, sex, or body type. The manufacturer can claim the algorithm is improved or enhanced in each generation, but without published validation data against indirect calorimetry, the claim is unverifiable.
Why the industry keeps showing the number anyway
There is a reason every major wearable includes calorie burn despite knowing it is inaccurate. The reason is that users demand it, and the number drives engagement.
Calorie burn is one of the most requested features in wearable surveys. It is easy to understand. It fits into the existing diet culture narrative. It gives users a sense of control. When a user sees "600 calories burned," they feel accomplished. That feeling increases app opens, increases retention, and increases the likelihood of subscription renewal.
Oura is about to launch an AI Health Coach as an additional paid feature on top of its subscription. The coach will presumably use your calorie burn, activity, and sleep data to generate recommendations. If the calorie input is off by 27 percent, the recommendation quality degrades, but the user does not know that. The user sees a personalized coach and pays for it. The business model depends on the illusion of precision.
This is not a conspiracy. It is a market incentive. The device that shows a calorie number gets more users than the device that says "we do not know." The device that shows a trend graph of calorie burn over time looks more sophisticated than the device that shows raw accelerometer data. The market rewards the fiction.
The psychology is specific. A user who sees 600 calories burned on Tuesday and 450 on Wednesday feels a sense of progression and control. The number validates the effort. Removing the number removes the dopamine hit, even if the number was wrong. This is why honest wearables have a harder time retaining users. The user does not feel rewarded because the device refuses to lie to them.
Pulsyn is choosing the other path. We are betting that a growing number of users will prefer honest uncertainty over false precision. It is a smaller market. It is a harder sell. But it is the only path that does not require lying to the user about what the device can do.

What the numbers mean for your daily decisions
The real damage is not the error itself. The real damage is what you do with the number.
People use calorie burn estimates to decide how much to eat, how hard to train, and whether they are making progress. Ashley noted that "people are basing life decisions on the data provided by these devices." If your watch says you burned 600 calories on a run and you eat back those 600 calories, but you actually burned 420, you are eating 180 calories more than you accounted for. If you run most days, that surplus adds up to roughly a pound of fat over a month, using the common rule of thumb of about 3,500 calories per pound. Over a year, it is enough to reverse the weight loss a user thinks they achieved.
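Here is that compounding worked out. The sketch assumes the common rule of thumb of about 3,500 kcal per pound of body fat, which is itself an approximation, and the function name and session counts are hypothetical.

```python
KCAL_PER_POUND_FAT = 3500  # rule-of-thumb approximation, not a precise constant

def surplus_pounds(reported_kcal: float, actual_kcal: float, sessions: int) -> float:
    """Pounds of fat implied by eating back the reported burn instead of the actual burn."""
    surplus_kcal = (reported_kcal - actual_kcal) * sessions
    return surplus_kcal / KCAL_PER_POUND_FAT

# The 600-reported, 420-actual run from the text.
print(round(surplus_pounds(600, 420, 20), 2))   # 20 runs in a month: about 1.03 lb
print(round(surplus_pounds(600, 420, 240), 1))  # 240 runs in a year: about 12.3 lb
```

The per-run error looks small. The damage comes from repeating it at every session.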
The reverse is also true. If a device undercounts by 40 percent, an athlete might over-restrict their diet and under-fuel their training. The error is not random in a way that averages out to zero. The algorithms systematically overcount for some activities and undercount for others. The direction of the error depends on the activity, the device, and the individual.
This is why the "calories in, calories out" model breaks when the "calories out" side is a fiction. The model is not wrong in theory. It is wrong in practice because the measurement tools are wrong.
What Pulsyn does differently
Pulsyn does not show a calorie burn estimate in the app, and that is intentional.
We could add one. The ring has an accelerometer, a PPG sensor, and the same data every other wearable uses. We could run a proprietary algorithm, label the output "calories burned," and join the industry in pretending the number is reliable. We chose not to.
The reason is simple: we do not know your calorie burn with the accuracy you would need to make a decision about your diet or training. Neither does Oura. Neither does Fitbit. The difference is that we are telling you we do not know, and they are showing you a number anyway.
What Pulsyn does measure, and measures well, is heart rate, heart rate variability, blood oxygen saturation, skin temperature, and sleep stages. The first four are physiological signals read from a sensor, not estimates of a derived quantity like energy expenditure. Heart rate is measured by the PPG sensor, a method validated against ECG in the literature. SpO2 is measured by comparing absorbance at red and infrared wavelengths. Skin temperature is measured by a thermistor. Sleep stages are the exception: they are classified from movement and heart rate variability patterns, so they are an inference rather than a direct reading.
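For readers curious what "comparing absorbance at red and infrared wavelengths" looks like, here is a textbook-style sketch of the ratio-of-ratios method. It is not Pulsyn's firmware. The linear calibration `110 - 25 * R` is a widely quoted approximation, real devices use empirically calibrated curves, and the function names and sample values are made up for illustration.

```python
def perfusion_index(samples):
    """Pulsatile (AC) part of the signal divided by its baseline (DC) part."""
    dc = sum(samples) / len(samples)
    ac = max(samples) - min(samples)
    return ac / dc

def ratio_of_ratios(red, infrared):
    """R compares how strongly each wavelength pulses with the heartbeat."""
    return perfusion_index(red) / perfusion_index(infrared)

def spo2_estimate(r):
    """Textbook linear approximation, clamped to a valid percentage."""
    return max(0.0, min(100.0, 110.0 - 25.0 * r))

# One heartbeat of made-up photodetector readings at each wavelength.
red = [0.99, 1.00, 1.01, 1.00]
infrared = [0.98, 1.00, 1.02, 1.00]
print(round(spo2_estimate(ratio_of_ratios(red, infrared)), 1))  # 97.5
```

Each input here is an optical reading, and the only modelling step is a calibration curve. A calorie estimate, by contrast, stacks activity classification, population averages, and body composition assumptions on top of the sensor data.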
These measurements are not perfect. They have their own error bars. But they are direct measurements of real physiological variables, not algorithmic guesses about a derived quantity. When we show you a number, we want to be able to tell you where it came from, what the sensor measured, and what the accuracy range is.
For activity, Pulsyn tracks active minutes and step count. The step count is explicitly labeled as an estimate, and we do not use it to derive a calorie count. We show you the raw data. The accelerometer trace. The heart rate trace. The minutes you spent moving. You can make your own decisions about what that means for your energy balance, because you have the context we do not: your diet, your training history, your resting metabolic rate, and your goals.
What I am still unsure about
I am not sure this is the right product decision long-term. Users ask for calorie burn. Fitness apps expect it. MyFitnessPal and Apple Health both have calorie fields, and Pulsyn's refusal to populate them might feel like a missing feature to some users.
I have thought about adding a range instead of a single number: "You probably burned between 300 and 500 calories." That would be more honest. But ranges are hard to design for, and most users do not want a range. They want a number. The number is the problem.
We may add a calorie estimate in a future version, but if we do, it will come with a visible accuracy range and a clear explanation of how it was calculated. That will look different from the Oura app, which shows a single calorie total with no error bar and no methodology. I do not know if that transparency will be appreciated or ignored. I suspect it will be ignored by most users and valued by a few. That is a tradeoff I am willing to make, but I am not sure it is the right one for a company that needs to sell rings.
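If we did ship a range, the display logic might look something like this sketch. Everything in it is hypothetical: the function name, the rounding to the nearest 50, and the idea of feeding it a published error percentage are design assumptions, not a committed feature.

```python
def calorie_range_label(estimate_kcal: float, error_pct: float, round_to: int = 50) -> str:
    """Turn a point estimate and an error percentage into a rounded range label."""
    delta = estimate_kcal * error_pct / 100.0
    # Round both ends to a coarse step so the label does not imply false precision.
    low = max(0, round((estimate_kcal - delta) / round_to) * round_to)
    high = round((estimate_kcal + delta) / round_to) * round_to
    return f"You probably burned between {low} and {high} calories."

print(calorie_range_label(400, 27))
# You probably burned between 300 and 500 calories.
```

With the best error figure from the Stanford study, a 400-calorie estimate turns into a 200-calorie-wide window. That is the honest version of the number, and it is also why the range is a hard sell.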
What you should do instead
If you are trying to manage your weight or your energy balance, do not trust the calorie number on your wearable. Use it as a rough relative metric at best. If Tuesday says 450 and Thursday says 380 for the same kind of workout, you probably worked harder on Tuesday. Comparisons across different activities are less reliable, because the error varies by activity. And do not treat 450 as a true 450.
For actual energy expenditure tracking, the options are limited and expensive. Doubly labeled water is the gold standard for total daily energy expenditure, but it costs hundreds of dollars per test and requires a lab. Indirect calorimetry via a metabolic cart is available at some gyms and sports science facilities, but it is not practical for daily use. A food scale and a body weight trend line are still the most reliable tools for energy balance, and they cost less than a fitness tracker.
There is also a conceptual distinction most users miss. Your wearable shows active calories, which is the energy expenditure above your resting metabolic rate during movement. But your total daily energy expenditure also includes your resting metabolic rate, the thermic effect of food, and non-exercise activity thermogenesis. The device guesses your resting metabolic rate from your age, sex, and weight, which introduces another layer of error. A 2020 study in the British Journal of Nutrition found that predictive equations for resting metabolic rate have errors of 10 to 20 percent in healthy adults. If the baseline is wrong, the total is wrong even if the active calories were somehow accurate.
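To put numbers on the baseline problem, here is a sketch using the Mifflin-St Jeor equation, one widely used predictive formula for resting metabolic rate. Whether any given wearable uses it is an assumption. It also takes height, which the simpler description above leaves out, and the function names are hypothetical.

```python
def mifflin_st_jeor_rmr(weight_kg: float, height_cm: float, age_years: float, sex: str) -> float:
    """Predicted resting metabolic rate in kcal per day."""
    base = 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_years
    if sex == "male":
        return base + 5.0
    if sex == "female":
        return base - 161.0
    raise ValueError("the equation is only defined for 'male' or 'female'")

def rmr_bounds(rmr_kcal: float, error_pct: float):
    """Low and high daily values if the prediction is off by error_pct either way."""
    delta = rmr_kcal * error_pct / 100.0
    return (rmr_kcal - delta, rmr_kcal + delta)

rmr = mifflin_st_jeor_rmr(70, 175, 30, "male")
print(rmr)                  # 1648.75
print(rmr_bounds(rmr, 10))  # about 1484 to 1814 kcal/day
print(rmr_bounds(rmr, 20))  # about 1319 to 1979 kcal/day
```

For this example person, a 10 to 20 percent error in the baseline is 165 to 330 calories a day before a single step is counted. That is the same order of magnitude as the active calories from a moderate workout.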
The honest truth is that consumer wearables are good at some things and bad at others. Heart rate is good. Sleep stage detection is decent. Calorie burn is bad. The industry does not want to advertise this, because calorie burn is one of the most requested features and one of the easiest to market. Pulsyn's job is to build the thing that is actually good, not the thing that sounds good in a feature list.
About the author
James Hoffmann is the founder of Pulsyn. He has been building health-tracking hardware since 2024 and believes most consumer wearables oversell their accuracy.
References
- Shcherbina A, Mattsson CM, Waggott D, et al. "Accuracy in Wrist-Worn, Sensor-Based Measurements of Heart Rate and Energy Expenditure in a Diverse Cohort." Journal of Personalized Medicine. 2017;7(2):3. doi:10.3390/jpm7020003
- Stanford Medicine News. "Fitness trackers accurately measure heart rate but not calories burned." May 24, 2017. https://med.stanford.edu/news/all-news/2017/05/fitness-trackers-accurately-measure-heart-rate-but-not-calories-burned.html
- Ashley EA, quoted in the Stanford Medicine News release above (May 24, 2017): "People are basing life decisions on the data provided by these devices."



