How On-Device AI Actually Works: Why Your Health Data Never Leaves Your Phone

TL;DR

Every major wearable company now claims to care about privacy. Oura says its new AI runs on "privacy-friendly" servers it owns. The servers are still in a data center. Your heart rate variability data still travels over TLS to a machine Oura controls. On-device AI means the model runs on your phone, not theirs. The difference is architectural, not semantic, and it is the reason Pulsyn does not need a subscription.

The cloud AI illusion

Oura filed confidentially for an IPO in May 2026. A week later, the company announced it is building its own AI model with "privacy-friendly AI architectures" hosted on Oura-owned servers. The press release used the word "privacy" four times. It never used the word "local."

This matters because the architecture determines who can see your data. When your ring sends heart rate, temperature, and accelerometer readings to a cloud server, even one Oura owns, the data exists in three places: the ring, the phone, and the server. Each additional location is one more target for a breach, one more custodian for a subpoena, and one more system employees can access. Oura's privacy policy already acknowledges that "certain service providers" may process data. Adding an owned server does not remove that line. It adds a new building.

Whoop operates the same way. Your biometric stream uploads continuously to AWS. Fitbit sends to Google Cloud. Apple Watch processes some metrics on-device, but HealthKit backups sync to iCloud by default. In every case, the business model depends on cloud compute. The subscription fee does not just pay for app features. It pays for the GPU hours required to run inference on millions of users' data streams.

The privacy marketing is technically true in the narrowest sense. The data is encrypted in transit. The server is SOC 2 compliant. The company promises not to sell it. But encryption in transit ends at the provider's server: once the data arrives, the provider holds the keys and can read it. SOC 2 does not prevent a subpoena. And a promise not to sell data does not prevent a breach or a policy change after an acquisition.

Pulsyn's architecture removes the server from the path entirely. Not because we are more ethical. Because the server is not necessary.

Rows of dark server racks in a data center, the exact infrastructure Oura and Whoop use to process your biometrics, and the exact infrastructure Pulsyn bypasses entirely

What on-device AI actually means

Here is the flow when you wake up and check your sleep score on a cloud-dependent wearable:

1. The ring collected raw sensor data overnight (PPG samples, accelerometer ticks, temperature readings).
2. The phone uploaded that data to a cloud API while you were sleeping.
3. A server ran a machine learning model against your data.
4. The server returned a sleep score, stage breakdown, and readiness metric.
5. The phone displayed the result.

Steps 2 and 3 are where the privacy risk lives. They are also where the latency, cost, and dependency live.

Here is the Pulsyn flow:

1. The ring collected raw sensor data overnight.
2. The phone received the data over Bluetooth Low Energy.
3. An AI model running locally on the phone processed the raw data.
4. The phone displayed the result.
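To make the four local steps concrete, here is a minimal Python sketch of a local-only pipeline. Everything in it is illustrative: the Epoch fields, the threshold rules standing in for the trained model, and the table layout are hypothetical, and the standard-library sqlite3 module stands in for SQLCipher (plain sqlite3 does not encrypt anything). What the sketch does show is structural: no function in the path opens a network connection.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Epoch:
    """One 30-second window of sensor features received over BLE (hypothetical format)."""
    start_s: int           # seconds since recording started
    motion: float          # mean accelerometer magnitude, normalized 0..1
    hr_variability: float  # PPG-derived variability proxy, normalized 0..1

def classify_epoch(epoch: Epoch) -> str:
    """Toy stand-in for the on-device model: deterministic rules, no I/O."""
    if epoch.motion > 0.5:
        return "wake"
    if epoch.hr_variability > 0.6:
        return "light"
    return "deep"

def analyze_night(epochs: list[Epoch], db: sqlite3.Connection) -> dict[str, int]:
    """Infer locally, store locally, and return a stage summary for display."""
    db.execute("CREATE TABLE IF NOT EXISTS stages (start_s INTEGER, stage TEXT)")
    counts: dict[str, int] = {}
    for epoch in epochs:
        stage = classify_epoch(epoch)
        db.execute("INSERT INTO stages VALUES (?, ?)", (epoch.start_s, stage))
        counts[stage] = counts.get(stage, 0) + 1
    db.commit()
    return counts

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")  # on a phone: an encrypted database file in app storage
    night = [Epoch(0, 0.8, 0.7), Epoch(30, 0.1, 0.7), Epoch(60, 0.1, 0.2)]
    print(analyze_night(night, db))
```

Swapping classify_epoch for a real quantized model changes the accuracy, not the data flow.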

The server never sees the raw data. It never sees the processed data. It does not exist in the chain. The AI model is a file on your phone's storage, loaded into memory, executed by the phone's CPU or NPU, and the output stays in a local SQLCipher database encrypted with a key derived from your PIN.
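The exact derivation Pulsyn uses is not specified here. One conventional approach, sketched below as an assumption, is PBKDF2-HMAC-SHA256 from Python's standard library with a random per-device salt; the 600,000-iteration default follows the OWASP Password Storage Cheat Sheet's guidance for that algorithm. A short numeric PIN carries very little entropy on its own, so this illustrates only the derivation step, not a complete key-management design.

```python
import hashlib
import os

# Assumed settings: OWASP's recommended PBKDF2-HMAC-SHA256 iteration count and a 256-bit key.
PBKDF2_ITERATIONS = 600_000
KEY_BYTES = 32  # SQLCipher's default cipher is AES-256, so it takes a 32-byte raw key

def derive_db_key(pin: str, salt: bytes, iterations: int = PBKDF2_ITERATIONS) -> bytes:
    """Derive a database key from the user's PIN and a per-device random salt."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, iterations, dklen=KEY_BYTES)

if __name__ == "__main__":
    salt = os.urandom(16)  # generated once per device and stored next to the database
    key = derive_db_key("482913", salt)
    # SQLCipher accepts a raw key as: PRAGMA key = "x'<64 hex digits>'";
    print(len(key), "byte key,", len(key.hex()), "hex digits")
```

The key is recomputed from the PIN at unlock time and never needs to be written to disk, which is what keeps the database unreadable without the user.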

The model file itself is the product of training on public datasets and, in our case, fine-tuning on anonymized aggregate patterns. But the weights file that sits on your phone is just a binary blob. It does not phone home. It does not check a license server. It runs inference the same way a calculator runs arithmetic: locally, deterministically, with no external dependency.

This is not edge computing. Edge computing pushes inference to a server physically closer to you, often owned by a CDN or cloud provider. On-device computing pushes inference to the device in your pocket. The distinction matters because edge servers still receive your data. They just receive it faster.

An abstract overhead view of a smartphone and processor, the hardware that runs Pulsyn's AI model locally without cloud dependency

The technical constraints that make this hard

Running a neural network on a phone is not trivial. A modern large language model from a major cloud provider requires hundreds of gigabytes of memory and runs on clusters of GPUs. A phone has 6 to 12 GB of RAM and a battery that must last all day. You cannot run a 70B-parameter model on a phone. You can run a 1B-parameter model. The difference in capability is real, but the difference in applicability is smaller than it sounds.

Health inference is narrow. A sleep stage classifier does not need to know who won the 1924 Olympics. It needs to know how PPG amplitude variance, accelerometer magnitude, and skin temperature correlate with polysomnography labels. That is a bounded problem. A 1B-parameter model, properly trained and quantized, can outperform a general-purpose cloud model on that specific task because it is not wasting parameters on general world knowledge.

The constraints we work within are concrete.

Memory is the first wall. A quantized 1B-parameter model uses roughly 500 MB to 1 GB of RAM at load time. That is significant on a phone with 6 GB total, shared with the OS, other apps, and camera buffers. We use quantization down to 4-bit weights where possible and unload the model from memory immediately after inference completes. The model lives on disk, loads for the analysis window, and exits.
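The load-for-the-window pattern maps naturally onto a context manager. This is a hypothetical sketch: LocalModel and its placeholder infer method are illustrative names, not Pulsyn's code. In Python, dropping the reference only lets the runtime reclaim the buffer; a native inference runtime on the phone would free the memory explicitly when its session closes.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator, Optional

class LocalModel:
    """Hypothetical wrapper that keeps model weights in RAM only while loaded."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.weights: Optional[bytes] = None

    def load(self) -> None:
        with open(self.path, "rb") as f:
            self.weights = f.read()

    def unload(self) -> None:
        # Drop the only reference so the runtime can reclaim the buffer.
        self.weights = None

    def infer(self, features: list[float]) -> float:
        if self.weights is None:
            raise RuntimeError("model is not loaded")
        # Placeholder math; a real model would run its layers over the features.
        return sum(features) / len(features)

@contextmanager
def loaded(model: LocalModel) -> Iterator[LocalModel]:
    """Load weights for one analysis window and always unload afterwards."""
    model.load()
    try:
        yield model
    finally:
        model.unload()

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * 1024)  # stand-in for a weights file on disk
    model = LocalModel(tmp.name)
    with loaded(model) as m:
        print(m.infer([0.2, 0.4, 0.6]))
    print("weights still in memory:", model.weights is not None)
    os.remove(tmp.name)
```

The finally clause matters: even if inference raises partway through, the weights are released instead of lingering in memory.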

Quantization deserves a brief explanation. A standard neural network weight is stored as a 32-bit floating-point number. That is four bytes per parameter. A 1B-parameter model at full precision needs 4 GB just for the weights, before activations, KV caches, or overhead. Quantization maps those 32-bit floats to 4-bit integers, reducing the weight storage to roughly 500 MB. The precision loss is measurable on general knowledge tasks. It is negligible on narrow health classification tasks where the input distribution is bounded and the output classes are discrete.
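The storage arithmetic and the core idea of 4-bit quantization fit in a short sketch. It uses symmetric per-tensor quantization with a single shared scale factor, one simple scheme among several; production toolchains typically quantize per channel or per block and store extra scale metadata, so real model files land somewhat above the 500 MB floor.

```python
def weight_bytes(params: int, bits: int) -> int:
    """Storage for the weights alone, ignoring activations, KV cache, and scale metadata."""
    return params * bits // 8

def quantize_4bit(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric quantization: map each float to an integer in [-7, 7] with one shared scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 7
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats; each value is off by at most scale / 2."""
    return [v * scale for v in q]

def pack_nibbles(q: list[int]) -> bytes:
    """Store two signed 4-bit values per byte, offset by 8 so each fits in 0..15."""
    if len(q) % 2:
        q = q + [0]
    return bytes(((a + 8) << 4) | (b + 8) for a, b in zip(q[0::2], q[1::2]))

if __name__ == "__main__":
    print(weight_bytes(1_000_000_000, 32) / 1e9, "GB at 32-bit")
    print(weight_bytes(1_000_000_000, 4) / 1e9, "GB at 4-bit")
    w = [0.12, -0.53, 0.91, -0.07]
    q, s = quantize_4bit(w)
    print(q, len(pack_nibbles(q)), "bytes for", len(w), "weights")
```

The rounding error per weight is bounded by half the scale, which is why a bounded input distribution (a small max_abs) keeps the loss small.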

Battery is the second wall. Neural inference on a phone's NPU or GPU is power-hungry. A full-night sleep analysis might require 30 to 60 seconds of sustained compute. We batch the work. Instead of streaming raw PPG samples to the model in real time, we collect the full night's data, compress it into a structured representation, and run inference once in the morning when the phone is likely charging.
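The batching step amounts to reducing a night of raw samples to a compact table before the model runs. A minimal sketch, assuming 30-second epochs (the conventional unit in polysomnography scoring) and a hypothetical two-feature summary per epoch; the actual structured representation Pulsyn uses is not described here.

```python
from statistics import mean, pvariance

EPOCH_S = 30  # conventional sleep-scoring epoch length, in seconds

def to_epochs(samples: list[tuple[float, float, float]]) -> list[dict[str, float]]:
    """Compress (timestamp_s, ppg_amplitude, accel_magnitude) samples into per-epoch features."""
    buckets: dict[int, list[tuple[float, float]]] = {}
    for t, ppg, accel in samples:
        buckets.setdefault(int(t // EPOCH_S), []).append((ppg, accel))
    epochs = []
    for idx in sorted(buckets):
        ppg_vals = [p for p, _ in buckets[idx]]
        accel_vals = [a for _, a in buckets[idx]]
        epochs.append({
            "start_s": idx * EPOCH_S,
            "ppg_amp_var": pvariance(ppg_vals),  # PPG amplitude variance within the epoch
            "accel_mean": mean(accel_vals),      # average movement within the epoch
        })
    return epochs
```

At an assumed 25 Hz sampling rate, an eight-hour night is 720,000 samples per channel; at 30-second epochs the model sees 960 rows, which is what keeps the morning inference burst short.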

Thermal is the third wall. Sustained NPU usage warms the phone. We throttle inference if the battery temperature exceeds 40 degrees Celsius. In practice, this rarely triggers because the analysis window is short and the batching strategy keeps compute bursts brief.
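The thermal rule and the run-in-the-morning-on-the-charger strategy reduce to a small scheduling check. The 40-degree Celsius threshold comes from the text; the DeviceState fields and the 30 percent battery floor are assumptions, and on a real phone the readings would come from the platform's battery APIs rather than a dataclass.

```python
from dataclasses import dataclass

MAX_BATTERY_TEMP_C = 40.0  # throttle threshold stated in the text

@dataclass
class DeviceState:
    battery_temp_c: float
    is_charging: bool
    battery_pct: int

def should_run_analysis(state: DeviceState, min_battery_pct: int = 30) -> bool:
    """Hypothetical gate for the once-a-morning batch job."""
    if state.battery_temp_c > MAX_BATTERY_TEMP_C:
        return False  # too hot: defer and check again later
    # Prefer running on the charger; otherwise require some battery headroom (assumed floor).
    return state.is_charging or state.battery_pct >= min_battery_pct
```

Because the job is deferred rather than cancelled, a hot or drained phone delays the sleep score instead of losing it.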

Storage is the fourth wall. The model file itself is 300 to 800 MB depending on architecture and quantization. We ship it as an optional download, not bundled with the app install. Users who want on-device AI can download the model over Wi-Fi. Users who prefer the optional cloud premium tier can skip the download and use remote inference instead.

Compatibility is the fifth wall. Not every phone has a dedicated NPU. We fall back to CPU inference with reduced thread counts on older devices. This is slower, but it works. A 2020-era mid-range phone can still run the model; it just takes two minutes instead of thirty seconds.
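The fallback is a capability check made when the model loads. In this sketch the has_npu flag, the single host thread for the NPU path, and the half-the-cores policy are all assumptions; the real decision depends on which accelerators the inference runtime can actually use on a given chipset.

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass
class InferencePlan:
    backend: str  # "npu" or "cpu"
    threads: int  # host CPU threads devoted to inference

def plan_inference(has_npu: bool, cpu_count: Optional[int] = None) -> InferencePlan:
    """Prefer the NPU; otherwise fall back to CPU with a reduced thread count."""
    if has_npu:
        # The accelerator does the heavy lifting; one host thread feeds it (assumed).
        return InferencePlan(backend="npu", threads=1)
    cores = cpu_count or os.cpu_count() or 1
    # Leave roughly half the cores to the OS and foreground apps (assumed policy).
    return InferencePlan(backend="cpu", threads=max(1, cores // 2))
```

Fewer threads make CPU inference slower but keep the phone responsive, which matches the two-minutes-instead-of-thirty-seconds trade-off described above.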

These constraints shape the product. They force discipline. A cloud AI company can throw a 70B-parameter model at every problem because someone else's data center pays the power bill. We have to fit the solution into a pocket-sized computer with a finite battery. That constraint produces better engineering.

Why it changes the user experience completely

The architectural difference between cloud and on-device AI is invisible until it matters. Then it matters completely.

Airplane mode is the obvious case. A cloud-dependent wearable stops working the moment you enable airplane mode or lose signal. Your sleep data from a transatlantic flight sits on the ring until landing, then uploads, then waits for server processing, then returns a score hours later. With on-device AI, you check your score while taxiing. The model is on the phone. The data is on the phone. The analysis is on the phone.

An abstract digital padlock against a dark background, representing the privacy guarantee that comes from never sending health data to a server in the first place

Corporate and hospital networks are a less obvious case. Many enterprise Wi-Fi networks block connections to consumer health APIs. A cloud-dependent wearable silently fails in these environments, queueing data until it finds an open network. Pulsyn works inside a hospital with no external connectivity because the inference path never leaves the device.

International roaming is the expensive case. Cloud inference requires data roaming. A user traveling in Japan without a roaming plan cannot get sleep analysis from a cloud-dependent ring without paying per-megabyte charges. Pulsyn's analysis uses zero bytes of mobile data.

Latency is the subtle case. Cloud inference introduces round-trip latency. For a simple sleep score, this is minor. For real-time features like stress detection or guided breathing feedback, 200 to 500 ms of API latency is perceptible and degrades the experience. Local inference for the same features runs in 50 to 200 ms.

Subpoena resistance is the uncomfortable case. If a law enforcement agency subpoenas Oura or Whoop for a user's health data, the company has the data and must either comply or challenge the request. Pulsyn does not have the data. We cannot produce what we do not possess. Our database contains order history and support tickets. It does not contain heart rate measurements. The legal architecture is simpler because the technical architecture is simpler.

I am not a lawyer, and this is not legal advice. But the technical reality is that removing the server from the data path removes the server from the subpoena path.

The business model shift

Cloud AI is expensive. A single inference request on a GPU-backed cloud instance costs between $0.001 and $0.01 depending on model size and provider. That sounds small until you multiply by millions of users and thousands of requests per user per year.

Here is the math. Oura has roughly 2.5 million users. Assume each user generates one sleep analysis per day. At $0.005 per inference, that is $12,500 per day, or $4.56 million per year, just for sleep scores. Add readiness scores, stress scores, and any real-time features, and the compute bill likely exceeds $10 million annually. Oura's reported $11 billion IPO valuation is built on a business that pays AWS or its own data center for every sleep score it generates.
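The same arithmetic as a small function, so the assumptions can be varied. Every input is the estimate from the paragraph above, not a figure Oura has published.

```python
def annual_inference_cost(users: int, analyses_per_day: float, cost_per_inference: float,
                          days: int = 365) -> tuple[float, float]:
    """Return (daily, annual) cloud inference spend in dollars."""
    daily = users * analyses_per_day * cost_per_inference
    return daily, daily * days

if __name__ == "__main__":
    # 2.5 million users, one sleep analysis each per day, $0.005 per inference.
    daily, annual = annual_inference_cost(2_500_000, 1, 0.005)
    print(f"${daily:,.0f} per day, ${annual / 1e6:.2f} million per year")
```

Across the full $0.001 to $0.01 per-request range quoted above, the same formula gives roughly $0.9 million to $9.1 million per year for sleep scores alone.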

The subscription model is not a choice. It is a requirement. If Oura stopped charging $5.99 per month, the inference bill would eat the hardware margin. The hardware margin on a Gen 3 ring is already thin. Remove the recurring revenue and the unit economics collapse.

Pulsyn's on-device architecture inverts this. The compute happens on the user's phone, which the user already owns and powers. We do not pay for GPU hours. We do not pay for bandwidth. We do not pay for data center real estate. The marginal cost of an additional user approaches zero after the app download.

This is why Pulsyn can charge $160 once and never require a subscription. The $160 covers the hardware, the firmware, the app development, and the model training. It does not need to cover perpetual cloud compute because there is none.

The optional premium tier is for users who want deeper analysis or cloud-based features like API access and long-term backup. The base product is complete without it because it runs on hardware the user already owns.

What Pulsyn will not do

There are features we could build that we will not build because they violate the local-first architecture.

We will not offer "cloud sync" as the default. Sync implies a server has a copy. We offer encrypted local backup to a user-controlled destination if desired, but the canonical data store is the phone.

We will not build "social features" that compare your sleep score to friends by uploading both data streams to a central server. If users want to share data, they can export an encrypted file and send it directly.

We will not train models on user data without explicit opt-in, and even then, the training data would be anonymized and aggregated, not raw biometric streams. The model on your phone was trained before you bought the ring. It does not learn from your data unless you explicitly enable on-device personalization, and even then, the learning happens on the phone, not on our servers.

We will not claim "military-grade encryption" as a privacy solution. Encryption protects data at rest and in transit. It does not protect data from the party that holds the decryption keys. The only way to ensure we cannot read your health data is to ensure we never receive it.

What I think and what I do not know

On-device AI is the correct architecture for consumer health wearables. I am less certain about how the model landscape will evolve over the next three years.

Phone NPUs are getting faster. Apple's Neural Engine and Qualcomm's Hexagon DSP improve every generation. But cloud models are also getting larger and more capable. There may come a point where a 3B-parameter on-device model cannot match a 70B-parameter cloud model on complex diagnostic tasks, and users will notice the gap.

Our answer to that is the optional premium tier. Users who want deeper analysis can enable cloud AI. The base product remains local. But I do not know whether that two-tier model will feel fair to users in 2028 or whether the gap will widen enough that the local model feels like a toy.

I also do not know whether Apple or Google will restrict on-device model execution in future OS updates. Both companies have incentives to push developers toward their own cloud AI services. If iOS or Android starts treating locally loaded model files as "untrusted executables" and requiring notarization or App Store review, the architecture becomes harder to maintain.

These are real risks. I am building anyway because the alternative, cloud-first health tracking, has already produced breaches, subpoenas, and business models that require harvesting biometric data to survive. On-device AI is not perfect. It is just better.

About the author

James Hoffmann is the founder of Pulsyn. He has been building on-device health AI systems since 2024 and thinks the best way to keep health data private is to never collect it in the first place.

References

Oura Health Oy, "Oura Files Confidentially for IPO," press release, May 21, 2026.
Oura Health Oy, "Oura Developing Privacy-Friendly AI Model with webAI," TrendingTopics.eu coverage, May 2026.
OWASP Foundation, "Password Storage Cheat Sheet," 2023. https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html
Apple Inc., "Apple Neural Engine," Apple Platform Security documentation, 2025.
Qualcomm Technologies, "Hexagon DSP Architecture," technical overview, 2024.
