89% of Smart Ring Studies Use Algorithms Nobody Can See: The Transparency Crisis in Wearable Health

TL;DR

A systematic review published this month in MDPI's Biomimetics journal looked at every smart ring clinical study it could find. 89% of them used algorithms that are proprietary and unpublished. Nobody outside the company can inspect them, reproduce them, or verify that they do what they claim. If you have been treating your Oura readiness score or your Whoop strain number as data, you have been treating a black-box output as data. Those are different things.

A few weeks ago I was reading through a paper on smart ring accuracy for sleep measurement. The study was solid. 30 participants. Polysomnography-grade reference devices. Good statistical methods. But buried in the methods section was a phrase I have seen before: "the Oura Ring's sleep staging algorithm is proprietary and was not modified." Which means the authors could tell you the ring agreed with the PSG on certain metrics, but they could not tell you why it agreed. They could not inspect the rules. They could not adjust them. They could not run the same data through a different version and compare. They put a black box on people's fingers, recorded what came out, and called it validation.

That paper is not the exception. According to a new systematic review in MDPI's Biomimetics journal, it is the rule.

A transparent glass cube on a table, representing the ideal of algorithmic clarity and openness

Photo by Vadim Bogulov on Unsplash

The number

The review, titled "Smart Ring in Clinical Medicine: A Systematic Review," screened 695 records and included 37 studies for analysis. It covered sleep, heart rate, HRV, SpO2, step count, and energy expenditure across multiple devices: Oura, RingConn, Ultrahuman, Amazfit, and others.

The finding that matters most is this: 89% of the studies used algorithms that are proprietary. The exact language from the paper: "The proprietary nature of algorithms used in 89% of studies represents a fundamental barrier to scientific reproducibility and clinical validation."

This is not a small problem. It is not a niche concern for academic purists. It means that for nearly nine out of ten published findings on smart ring accuracy, nobody can independently verify the method that produced the result. The only way to "replicate" the study is to buy the same device, collect new data, and hope the algorithm has not changed in the meantime. And because these algorithms update over the air without announcement, even that is not real replication.

How medical devices handle this differently

Compare this to how an FDA-cleared pulse oximeter works. The manufacturer publishes the measurement principle. You can look at the calibration curve. You can see the wavelength of the LEDs. You can inspect the signal processing chain. If you want to validate it, you can build the same measurement chain and compare results, because you know what is in the chain.

Smart rings do not work this way. The PPG sensor in your ring captures raw light absorption data, just like a medical pulse oximeter. But what happens between that raw photodiode reading and the number you see in the app is a pipeline of filters, feature extractors, and classification models that the manufacturer considers intellectual property. Some of these models are machine learning. Some are heuristic rules. Some are probably a mix. The point is you cannot know, because the companies will not tell you.

This matters clinically. The review notes that "[this] algorithmic opacity contrasts sharply with traditional medical devices where measurement principles are transparent and standardized." When a patient brings their Oura sleep report to a doctor, the doctor has no way to evaluate how that report was produced. A medical ECG printout includes the lead configuration, the calibration signal, and the paper speed. A smart ring report includes a score and a pretty chart.

The population problem

There is a second finding in this review that compounds the transparency problem. Only 35% of the studies reported the race or ethnicity of their participants. The vast majority of participants were young, healthy, educated people from high-income countries. This is the same homogeneity problem that has plagued medical research for decades, just in a new form factor.

Here is why it matters for smart rings. PPG sensors work by shining light through your skin and measuring how much is absorbed by blood flow. The amount of melanin in your skin affects how much light is absorbed before it even reaches the blood vessels. There are well-documented disparities in PPG accuracy across skin tones. If the algorithm was validated mostly on light-skinned participants, it may perform differently on everyone else. And because the algorithm is closed, you cannot even check whether this is the case. The company could fix it in a future update, or not, and you would never know which.

Same problem, different angle. The review puts it bluntly: "Given known variations in PPG signal quality across skin tones and the documented disparities in wearable device accuracy among different ethnic groups, this homogeneity severely limits the applicability of findings to diverse clinical populations who might benefit most from remote monitoring."

Stack of academic research papers, representing the body of validation studies with limited diversity

Photo by Arisa Chattasa on Unsplash

The update problem

There is a third thing the review touches on that I want to pull out separately, because it is the one that bothers me most as someone building a wearable.

Algorithms change. Over-the-air firmware updates can and do alter the way a ring calculates sleep stages, heart rate, or recovery scores. A study published in 2023 validated the Oura Ring Gen 3 against PSG. Oura has since released multiple firmware updates. The algorithm that produced those validated results no longer exists. But the study is still cited as evidence that the device is accurate.

This is not an Oura-specific problem. Every connected wearable can push algorithm updates silently. The user wakes up one day and their sleep score is calculated differently, with no changelog, no explanation, and no way to opt out. The Scientific Reports paper on smart ring monitoring for insomnia I read recently flagged this too: "As algorithm updates may alter data outputs over time, future research should address the implications of using closed-source consumer devices in clinical research."

The problem is structural. The incentives point the wrong way. Companies want to improve their algorithms, which is good. But they also want to keep them secret, which makes independent validation a moving target. You cannot freeze a version of the algorithm for a longitudinal study unless the company gives you a way to do that, and most do not.

What this means for you

If you wear a smart ring, none of this means the data is useless. The evidence is clear that rings can track trends reasonably well. HRV trends over weeks correlate with recovery states. Sleep duration trends track fairly well against actigraphy. The problems start when you treat the specific numbers as truth.

The difference matters. Your Oura readiness score of 72 does not mean 72 out of 100 on some universal, validated scale. It means the Oura algorithm, running on your specific data, at the current firmware version, produced a 72. Tomorrow's firmware update might produce a 68 for the same physiological state. And nobody outside Oura can tell you which is more correct, because the scoring formula is secret.

Same for Whoop strain. Same for Ultrahuman's sleep score. Same for every "readiness" or "recovery" metric from a closed-source device. The trending direction is useful. The absolute number is not.
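If you want to act on that, one rough approach is to look at the slope of a metric over a window instead of any single reading. The sketch below fits a least-squares line to a week of values. The function name and the HRV numbers are invented for illustration, and this is not how any vendor computes trends.

```python
from statistics import mean

def trend_slope(values):
    """Least-squares slope of values against day index (units per day)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Hypothetical nightly HRV readings (ms) over one week, illustrative only.
hrv_week = [52, 54, 53, 56, 58, 57, 60]
slope = trend_slope(hrv_week)
direction = "rising" if slope > 0 else "falling" if slope < 0 else "flat"
print(f"{slope:.2f} ms/day, {direction}")

# Adding a constant offset to every reading leaves the slope unchanged.
shifted_slope = trend_slope([v + 10 for v in hrv_week])
```

A constant offset in the absolute numbers does not move the slope, which is part of why direction is more robust than level. That only holds if the scoring method stays the same across the window. An algorithm change in the middle of the week would distort the trend too.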

I have written about this before in different contexts: why Pulsyn publishes its sleep score formula, how our resting heart rate calculation works, and what goes into our stress score. The reason we publish these is not altruism. It is that if you are asking people to trust their health to your sensor, you should let them inspect the math. Otherwise you are asking for faith, not trust.

A fingerprint icon on a dark digital surface, symbolizing the unique and personal nature of health data and biometric identity

Photo by George Prentzas on Unsplash

What would fix this

The systematic review suggests three things. I agree with all of them.

Multi-vendor validation studies using standardized protocols. Right now, most validation studies test one device against a reference, using the manufacturer's own algorithm. Independent labs should test multiple devices on the same subjects, using the same protocols, and publish the results side by side. This would let consumers see how Oura, RingConn, and Ultrahuman perform on the same biometrics in the same conditions.

Algorithm transparency through open publication or independent audit. Companies do not have to open-source their entire codebase. But they should publish enough detail about their signal processing and classification methods that an independent researcher can understand what is being computed. Some of the most respected consumer health devices, like continuous glucose monitors, publish their MARD (Mean Absolute Relative Difference) calculations and calibration methods. Smart ring companies could do the same.
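For readers who have not met MARD before: it is the average of |device - reference| / reference across paired readings, expressed as a percentage. Here is a minimal sketch of that calculation. The paired readings are made up, and real validation protocols add rules about pairing and exclusions that this ignores.

```python
def mard(device, reference):
    """Mean Absolute Relative Difference between paired readings, in percent."""
    if len(device) != len(reference) or not device:
        raise ValueError("need two equal-length, non-empty sequences")
    # Each reference value must be non-zero, since it is the denominator.
    rel = [abs(d - r) / r for d, r in zip(device, reference)]
    return 100 * sum(rel) / len(rel)

# Hypothetical paired readings (device vs. reference), illustrative only.
device_readings = [110, 95, 150, 180]
reference_readings = [100, 100, 150, 200]
result = mard(device_readings, reference_readings)
print(f"MARD: {result:.2f}%")
```

The value of a published metric like this is that anyone with paired data can recompute it and check the claim.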

Demographic diversity in validation populations. This one is straightforward but somehow still rare. Recruit study participants across the full range of skin tones, ages, BMIs, and health statuses. Report the demographic data. Disaggregate the accuracy results. If the algorithm works better on some groups than others, say so. That is not admitting weakness. It is providing information that users need to interpret their own data.
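To make "disaggregate the accuracy results" concrete, here is a small sketch that computes mean absolute error per group alongside the pooled figure. The group labels and heart rate values are invented and say nothing about any real device.

```python
from collections import defaultdict

def mae_by_group(records):
    """Mean absolute error per group from (group, device, reference) tuples."""
    errors = defaultdict(list)
    for group, device, reference in records:
        errors[group].append(abs(device - reference))
    return {g: sum(e) / len(e) for g, e in errors.items()}

# Hypothetical heart rate readings (bpm); groups and values are invented.
records = [
    ("group_a", 61, 60),
    ("group_a", 72, 71),
    ("group_b", 65, 60),
    ("group_b", 74, 71),
]
overall = sum(abs(d - r) for _, d, r in records) / len(records)
per_group = mae_by_group(records)
print("pooled MAE:", overall)
print("per-group MAE:", per_group)
```

In this toy data the pooled error is 2.5 bpm, which hides the gap between 1.0 bpm for one group and 4.0 bpm for the other. A study that reports only the pooled number cannot show that kind of difference.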

I will add a fourth: versioned algorithm releases with changelogs. If you update your sleep staging model from v2.3 to v3.0, you should tell users what changed and provide a way to see how their historical data looks under the new model. Some companies already do this for major updates. None do it consistently.
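Here is a rough sketch of what that could look like in practice: each release carries a version string and a changelog entry, every score records the version that produced it, and historical data can be re-scored under any release. The scoring rules are deliberately toy formulas I made up for this example. They are not Pulsyn's or anyone else's actual model, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class AlgorithmRelease:
    version: str
    changelog: str
    score: Callable[[float], int]  # toy model: hours slept -> sleep score

# Toy releases, invented for illustration only.
RELEASES: Dict[str, AlgorithmRelease] = {
    "2.3": AlgorithmRelease(
        "2.3", "Linear in hours slept, full score at 8 hours.",
        lambda hours: round(min(hours / 8, 1) * 100),
    ),
    "3.0": AlgorithmRelease(
        "3.0", "Full score now requires 9 hours.",
        lambda hours: round(min(hours / 9, 1) * 100),
    ),
}

def rescore(history: List[Tuple[str, float]], version: str) -> List[dict]:
    """Score every (date, hours) entry under one named release."""
    release = RELEASES[version]
    return [
        {"date": date, "score": release.score(hours), "algorithm_version": version}
        for date, hours in history
    ]

# Hypothetical history of (date, hours slept).
history = [("2025-01-01", 6.0), ("2025-01-02", 8.0)]
old_scores = rescore(history, "2.3")
new_scores = rescore(history, "3.0")
for old, new in zip(old_scores, new_scores):
    print(old["date"], old["score"], "->", new["score"])
```

Because each score carries its algorithm version, a researcher can tell which release produced a number, and a user can see how the same nights look before and after an update.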

The honest part

I am building a smart ring company. I want Pulsyn to be better on transparency than the incumbents. But I would be lying if I said it was easy or that I am sure we will get it right.

The tension is real. Publishing your algorithm means competitors can study it. It means users can find edge cases where it fails and post screenshots. It means you have to stand by specific claims instead of hiding behind "proprietary." When you update the model, you have to explain why, which opens the door to criticism. There are good business reasons not to do any of this.

I think those reasons are weaker than they seem. The companies that win in health hardware are going to be the ones users trust with their most sensitive data. You cannot buy that trust with marketing. You earn it by showing your work. But I have not shipped a product at scale yet, so take that opinion with the skepticism it deserves.

What I am certain of is that the current situation is bad for everyone. Users cannot evaluate their own data. Researchers cannot build on published results. Clinicians cannot recommend devices with confidence. And the industry as a whole cannot improve as fast as it should, because the best ideas are locked inside competitive black boxes.

A magnifying glass over lines of code on a screen, representing the need to scrutinize software algorithms

Photo by Alexander Sinn on Unsplash

About the author

James Hoffmann is the founder of Pulsyn, building a privacy-first smart ring. He has been reading wearable validation studies so you do not have to.

References

Smart Ring in Clinical Medicine: A Systematic Review. Biomimetics, 2026. MDPI. https://www.mdpi.com/2313-7673/10/12/819
Adjunctive smart ring monitoring during digital cognitive behavioral therapy for insomnia. Scientific Reports, 2025. Nature. https://www.nature.com/articles/s41598-025-24312-0
Performance of wearable finger ring trackers for diagnostic sleep measurement. Scientific Reports, 2025. Nature. https://www.nature.com/articles/s41598-025-93774-z
The Use of Smart Rings in Health Monitoring: A Meta-Analysis. Applied Sciences, 2024. MDPI. https://www.mdpi.com/2076-3417/14/23/10778
How Pulsyn Calculates Your Sleep Score. Pulsyn Blog. https://pulsyn.tech/blog/sleep-score-calculation
What Goes Into Pulsyn's Stress Score. Pulsyn Blog. https://pulsyn.tech/blog/what-goes-into-pulsyns-stress-score
How Pulsyn Calculates Resting Heart Rate. Pulsyn Blog. https://pulsyn.tech/blog/how-pulsyn-calculates-resting-heart-rate-and-why-your-current-number-is-probably-wrong