[Figure: TidalSpace technology stack. Layered architecture diagram with GPU servers, voice models, and device connectivity.]

What Powers TidalSpace: Models, Latency & Tradeoffs

Published May 26, 2026 · 10 min read · By the TidalSpace team

TidalSpace technology is the multi-layer AI infrastructure behind the TidalSpace companion app and Tidal Seal hardware — a stack of large language models, neural voice synthesizers, real-time inference servers, and a device communication layer over BLE 5.3. This article is our transparent breakdown of what runs where, how fast it is, and where we make deliberate tradeoffs between quality, speed, and cost.

Why we wrote this. Most AI companion apps are black boxes. We believe users deserve to understand what is happening with their data, their conversations, and the systems that mediate their relationships. This is our attempt at radical transparency — including the parts we are still working on.

The architecture at a glance

TidalSpace has four distinct processing layers, each with its own latency budget, infrastructure, and failure isolation:

| Layer | What it does | Where it runs | Typical latency |
| --- | --- | --- | --- |
| 1. Client | UI rendering, input capture, wake word detection | On-device (phone or Tidal Seal) | 10–50ms |
| 2. ASR | Voice-to-text transcription | Cloud GPU (streaming) | 200–400ms |
| 3. LLM | Conversation reasoning, personality, memory retrieval | Cloud GPU (A100/H100 cluster) | 300–700ms |
| 4. TTS | Text-to-speech synthesis with character voice | Cloud GPU (dedicated TTS nodes) | 150–300ms |

For text-only conversations, only layers 1 and 3 are active. For voice calls, all four fire in sequence with streaming between them to minimize total wait time. The Tidal Seal hardware adds a BLE relay (under 50ms) and on-device wake word detection (under 200ms).
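To make the budget concrete, the sketch below adds up the typical latency ranges from the table for a text-only turn and for a voice call. The dictionary and function names are illustrative. The sums assume each layer finishes completely before the next one starts, so for voice they are a no-streaming upper bound rather than what a user would actually perceive.

```python
# Typical latency ranges in milliseconds, copied from the layer table above.
LAYERS = {
    "client": (10, 50),
    "asr": (200, 400),
    "llm": (300, 700),
    "tts": (150, 300),
}


def sequential_budget(layers):
    """Sum best-case and worst-case latency for layers run strictly back to back."""
    low = sum(LAYERS[name][0] for name in layers)
    high = sum(LAYERS[name][1] for name in layers)
    return low, high


text_only = sequential_budget(["client", "llm"])
voice_call = sequential_budget(["client", "asr", "llm", "tts"])

print(text_only)   # (310, 750)
print(voice_call)  # (660, 1450)
```

The gap between the sequential voice total and a conversational response time is what streaming between layers is meant to close.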

The LLM layer: model selection and fine-tuning

TidalSpace does not rely on a single model. We use a multi-model routing system that selects the best model for each interaction based on complexity, context length, and required response speed.

  1. Lightweight model (8B parameters) — Used for quick responses, greetings, and short exchanges. Optimized for sub-400ms inference. Fine-tuned on conversation data for personality consistency.
  2. Medium model (30–40B parameters) — The default for most conversations. Balances quality and speed at 500–700ms inference time. Fine-tuned for emotional nuance, long-term memory integration, and character depth.
  3. Heavy model (70B+ parameters) — Engaged for complex reasoning, multi-turn storytelling, and deep emotional conversations. Takes 700–1200ms but produces noticeably richer output. Used selectively to manage GPU cost.

The routing decision happens in under 10ms and is based on the current conversation context, the character's personality intensity setting, and the user's historical engagement pattern. Users on the free tier see the lightweight model more often; Pro subscribers get the medium model as default with the heavy model available for peak moments.
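A routing rule of this kind can be sketched in a few lines. Everything below is a hypothetical illustration, not the production router: the `Turn` fields, the two thresholds, and the tier names are assumptions, and the historical engagement signal is left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the article does not state the real values.
LONG_CONTEXT_TOKENS = 4000
HIGH_INTENSITY = 0.8


@dataclass
class Turn:
    text: str
    context_tokens: int  # size of the current conversation context
    intensity: float     # character personality intensity, 0.0 to 1.0
    tier: str            # "free" or "pro"


def route(turn: Turn) -> str:
    """Pick 'light', 'medium', or 'heavy' for one turn."""
    complex_turn = (
        turn.context_tokens > LONG_CONTEXT_TOKENS
        or turn.intensity > HIGH_INTENSITY
    )
    if turn.tier == "pro":
        # Pro: medium by default, heavy for demanding moments.
        return "heavy" if complex_turn else "medium"
    # Free: light by default, stepping up only for demanding turns (assumed).
    return "medium" if complex_turn else "light"


print(route(Turn("hi there", 200, 0.3, "free")))
print(route(Turn("tell me a story", 6000, 0.5, "pro")))
```

Because the rule is only a couple of comparisons on values already in memory, a decision like this fits comfortably inside a budget of a few milliseconds.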

The single biggest engineering decision we made was separating the character memory layer from the model. This means we can upgrade the underlying LLM without your companion forgetting who you are. The tradeoff is increased inference complexity, since every response requires a memory retrieval step before generation, but the user experience benefit is worth it.

Memory: the persistence layer

Long-term memory in TidalSpace is not stored inside the LLM's weights. It is a separate structured database that the LLM reads from at inference time. This architecture has three components:

  1. Episodic memory — A chronological log of significant conversation events. "You mentioned your dog's name is Max on March 12." Stored as structured key-value pairs with timestamps and emotional valence tags.
  2. Semantic memory — Distilled facts about the user. "You prefer morning conversations. You work in software. You dislike small talk." These are extracted from episodic memory by a background process that runs after each session.
  3. Character profile — The personality definition, backstory, and behavioral parameters that you set when creating or customizing your AI character. This is user-owned and editable at any time.
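As a rough picture of what the first two record types might look like, here is a minimal sketch. The class and field names are assumptions chosen for illustration, and the year on the example date is assumed because the example above only gives "March 12".

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import List


@dataclass
class EpisodicMemory:
    day: date      # when the event happened
    key: str       # structured key, e.g. "dog_name"
    value: str     # structured value, e.g. "Max"
    valence: str   # emotional valence tag, e.g. "positive"


@dataclass
class SemanticFact:
    statement: str          # distilled fact about the user
    source_keys: List[str]  # episodic keys the fact was derived from


# Year is an assumption; the article's example only says "March 12".
episode = EpisodicMemory(date(2026, 3, 12), "dog_name", "Max", "positive")
fact = SemanticFact("Has a dog named Max", ["dog_name"])

record = asdict(episode)  # plain dict, ready for a key-value store
print(record["key"], "=", record["value"])
```

Keeping a link from each semantic fact back to its episodic sources makes it possible to correct or delete a distilled fact when the underlying event is edited.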

At inference time, the system retrieves the top-K most relevant memories (typically 15–30 entries) and injects them into the LLM's context window alongside the recent conversation history. This is why TidalSpace characters can reference things you said months ago: the memory is actively consulted on every response, rather than left to whatever the model happens to recall from its training data.
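The retrieve-then-inject step can be sketched as follows. This is a toy under stated assumptions: relevance is scored with simple word overlap (Jaccard similarity) as a stand-in for whatever relevance model is actually used, and the function names and prompt layout are hypothetical.

```python
import heapq


def overlap_score(query: str, memory: str) -> float:
    """Jaccard overlap of lowercase word sets; a stand-in relevance score."""
    q = set(query.lower().split())
    m = set(memory.lower().split())
    union = q | m
    return len(q & m) / len(union) if union else 0.0


def retrieve_top_k(query, memories, k=20):
    """Return the k memories that score highest against the query."""
    return heapq.nlargest(k, memories, key=lambda mem: overlap_score(query, mem))


def build_context(query, memories, recent_turns, k=20):
    """Assemble retrieved memories, recent history, and the new message."""
    selected = retrieve_top_k(query, memories, k)
    lines = ["Relevant memories:"] + [f"- {m}" for m in selected]
    lines += ["Recent conversation:"] + list(recent_turns)
    lines.append(f"User: {query}")
    return "\n".join(lines)


memories = [
    "dog name is Max",
    "works in software",
    "prefers morning conversations",
]
print(build_context("what is my dog name", memories, ["AI: Good morning!"], k=1))
```

Capping the number of injected entries keeps the prompt size roughly constant no matter how large the memory store grows, which is the scaling argument for retrieval over stuffing everything into the context window.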

For more on how memory works in practice, see our article on AI companion long-term memory. If you want to understand how this infrastructure enables real-time voice AI companions, see our dedicated voice overview.

Voice synthesis: the TTS pipeline

TidalSpace's voice synthesis uses a two-stage neural TTS system:

  1. Acoustic model — Converts text to a mel-spectrogram, capturing prosody, emphasis, and emotional tone. This model is conditioned on the character's voice profile and the emotional context of the response.
  2. Vocoder — Converts the mel-spectrogram to raw audio waveforms. We use a neural vocoder trained on high-quality speech data, producing 24kHz audio with natural breath patterns and micro-pauses.
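For a sense of scale, the sketch below works out how much data the two stages handle for a given length of speech. The 24kHz sample rate comes from the description above. The hop length of 256 samples per mel frame is an assumption picked for illustration and is not stated anywhere in this article.

```python
import math

SAMPLE_RATE_HZ = 24_000  # output sample rate, from the text above
HOP_LENGTH = 256         # assumed samples per mel-spectrogram frame


def vocoder_workload(seconds: float):
    """Return (mel frames from the acoustic model, samples from the vocoder)."""
    samples = round(seconds * SAMPLE_RATE_HZ)
    frames = math.ceil(samples / HOP_LENGTH)
    return frames, samples


print(vocoder_workload(2.0))  # (188, 48000)
```

Under these assumptions, a two-second sentence means 188 spectrogram frames in and 48,000 audio samples out of the vocoder.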

Each character in TidalSpace can have a unique voice. Users on the Pro tier can fine-tune voice parameters including pitch range, speaking rate, and expressiveness. The voice is generated per-response — there is no pre-recorded audio. This means every sentence is fresh, but it also means voice quality depends on the TTS model's ability to handle the specific text being generated.

Latency optimization for voice

The key to responsive voice calling is streaming. TidalSpace does not wait for the full LLM response before starting TTS.

This streaming pipeline reduces perceived latency by 40–60% compared to a batched approach where the full response is generated before any audio plays.
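One common way to implement this kind of hand-off is sentence-level chunking: tokens are buffered as they arrive from the LLM, and each sentence is sent to TTS the moment its closing punctuation shows up. The sketch below illustrates that general technique and is not necessarily how TidalSpace does it. `fake_llm` and `fake_tts` are placeholders invented for the example.

```python
import re
from typing import Iterable, Iterator

SENTENCE_END = re.compile(r"[.!?]\s*$")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each sentence as soon as its closing punctuation arrives."""
    buffer = ""
    for token in tokens:
        buffer += token
        if SENTENCE_END.search(buffer):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush trailing text that never received closing punctuation.
        yield buffer.strip()


def fake_llm() -> Iterator[str]:
    """Placeholder token stream standing in for a streaming LLM."""
    yield from ["Hi", " there", ".", " How", " was", " your", " day", "?"]


def fake_tts(sentence: str) -> bytes:
    """Placeholder for synthesis; returns bytes instead of real audio."""
    return sentence.encode("utf-8")


chunks = [fake_tts(s) for s in sentences_from_tokens(fake_llm())]
print(chunks)  # [b'Hi there.', b'How was your day?']
```

With this shape, audio for the first sentence can start playing while the LLM is still generating the second, which is where the reduction in perceived latency comes from.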

The Tidal Seal hardware connection

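Tidal Seal connects to the TidalSpace infrastructure through a lightweight relay over BLE 5.3. As noted earlier, the device adds a BLE relay (under 50ms) and handles wake word detection on-device (under 200ms).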

Tradeoffs we have made (honestly)

Engineering is about tradeoffs. Here are the ones we have chosen and why:

| Tradeoff | What we chose | Why |
| --- | --- | --- |
| On-device vs cloud inference | Cloud | Current mobile hardware cannot run 30B+ parameter models at acceptable speed. On-device models produce noticeably worse conversation quality. |
| Model size vs latency | Multi-tier routing | We could use the 70B model for everything, but it would be 2–3x slower and 5x more expensive. Routing lets us use the right model for each moment. |
| Memory depth vs context window | Retrieval-based memory | Stuffing all memories into the context window is expensive and degrades with scale. Retrieval is more complex but scales better. |
| Voice quality vs speed | Streaming with slight quality trade-off | We could produce higher-quality audio in batch mode, but the latency would be unacceptable for conversation. Streaming means the first syllable reaches you faster, even if the overall audio is marginally less polished. |
| Cost vs accessibility | Freemium with tiered model access | Running GPU inference is expensive. We offer a free tier with the lightweight model so anyone can try TidalSpace, but heavier models require a Pro subscription to keep the service sustainable. |

Infrastructure and reliability

TidalSpace runs on a multi-region GPU cluster with automatic failover.
