Portable Avatar Semantics for Flutter Mobile: VRM 1.0 + Filament as a Shared Presentation Runtime
A technical article that evaluates how Realbits should map VRM avatar semantics to Filament-backed Flutter mobile rendering, with emphasis on portability, runtime determinism, and provider-level reuse.
Abstract
The main engineering decision in Realbits mobile character presentation is no longer whether to display a VRM avatar, but how to make that display path portable across apps without letting app-specific behavior drift from the asset and runtime contract. On mobile, the cost of visual ambition is paid per frame, so the architecture is constrained by two coupled realities: VRM carries rich semantic intent, while Flutter imposes a host rendering boundary between the Dart UI and native graphics stacks. This article proposes a practical layout: keep avatar semantics in a shared domain contract, map them to a native runtime core, and keep Flutter responsible only for orchestration and platform view lifecycle. The objective is a single provider-side representation that supports cross-app character reuse, deterministic parity checks, and incremental feature growth.
Recent avatar research indicates that real-time performance on mobile is now practical for high-quality assets, but only when model and renderer responsibilities are matched tightly to target devices. The cited arXiv papers on mesh-embedded and mobile-focused avatar pipelines report speed/quality trade-offs that favor on-device delivery when the runtime pipeline is deliberately pruned and staged [S1] [S2] [S3]. In that context, Realbits should optimize first for contract stability and update determinism before adding visual flourishes, because the Flutter-native boundary and native renderer behavior become bottlenecks as model complexity grows.
Realbits Context
Realbits already positions its portfolio around reusable model assets across multiple apps. That structure rewards an avatar stack that is explicit about four boundaries: catalog identity, runtime semantics, platform rendering, and presentation policy. In this model, VRM files become portable content and character metadata, while each app consumes the same runtime interface.
VRM is an intentionally constrained choice because it already encodes humanoid conventions and runtime hints. It is built on glTF 2.0, and a .vrm file is effectively a specialized container around that standard core, which improves interoperability and asset reuse [S8]. When we treat VRM as a contract rather than an opaque binary blob, we can evaluate cross-app compatibility by schema invariants instead of subjective visual checks.
Flutter is the shared app UI layer, but Flutter cannot directly own every native render path with equivalent performance. The repository approach already reflects this by splitting runtime code into dedicated packages and a plugin-like runtime facade. The architectural implication is straightforward: make the runtime facade declarative and small, and keep the expensive frame loop in native code where Filament and JVM/native graphics tools can run without being constantly mediated by Dart.
For Realbits, this means the real optimization target is not just frame rate; it is repeatable semantics across apps and test environments. A character that speaks, blinks, and tracks gaze in one app must do the same in another without per-app re-calibration.
Related Work
VRM 1.0 inherits from a broader open 3D format stack rather than inventing a private binary format. Its dependence on glTF 2.0 gives Realbits an advantage: skinning, mesh attributes, morphing primitives, and material organization are already specified and broadly implemented [S7]. The vrm.dev entry further emphasizes that .vrm maps to .glb-like packing semantics and can generally round-trip through glTF tooling pathways, even if VRM-specific semantics are not always preserved [S8].
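To make the ".glb-like packing" point concrete, the sketch below reads the glTF JSON document out of a binary container using only the header and chunk layout defined by the glTF 2.0 specification [S7]. The function names `read_gltf_json` and `build_glb` are illustrative and not part of any Realbits package; `build_glb` exists only so the reader can be exercised in memory.

```python
import json
import struct

GLB_MAGIC = 0x46546C67        # ASCII "glTF" read as a little-endian uint32
CHUNK_TYPE_JSON = 0x4E4F534A  # ASCII "JSON" read as a little-endian uint32


def read_gltf_json(data: bytes) -> dict:
    """Return the glTF JSON document stored in the first chunk of a GLB/VRM blob."""
    if len(data) < 20:
        raise ValueError("too short to be a GLB container")
    # 12-byte header: magic, container version, total length.
    magic, version, total_length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("missing glTF magic; not a GLB container")
    if version != 2:
        raise ValueError(f"unsupported GLB container version: {version}")
    if total_length > len(data):
        raise ValueError("declared length exceeds available bytes")
    # First chunk header: chunk length, chunk type. It must be the JSON chunk.
    chunk_length, chunk_type = struct.unpack_from("<II", data, 12)
    if chunk_type != CHUNK_TYPE_JSON:
        raise ValueError("first chunk is not JSON")
    end = 20 + chunk_length
    if end > total_length:
        raise ValueError("JSON chunk overruns the container")
    return json.loads(data[20:end].decode("utf-8"))


def build_glb(document: dict) -> bytes:
    """Illustrative helper: pack a JSON-only GLB for in-memory testing."""
    payload = json.dumps(document).encode("utf-8")
    payload += b" " * (-len(payload) % 4)  # pad the chunk to 4-byte alignment
    header = struct.pack("<III", GLB_MAGIC, 2, 12 + 8 + len(payload))
    return header + struct.pack("<II", len(payload), CHUNK_TYPE_JSON) + payload
```

Because the VRM-specific data lives under `extensions` in this JSON document, a validator can inspect avatar semantics without touching the binary geometry chunk at all.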
Where this matters to Realbits is execution behavior. VRM 1.0 explicitly enumerates components such as humanoid mapping, look-at, first-person view handling, and expressions, and it is accompanied by spring bone and constraint extensions [S9]. This is enough to define a runtime API surface that is not about raw rendering commands, but about semantic operations that occur every frame in a specified order. In practical terms, this lets the provider define deterministic unit-level behavior independent of app shell differences.
From a rendering-engine perspective, Filament remains a strong fit because it is explicitly built as a real-time physically based renderer with mobile as a first-class target [S5], with support across Android, iOS, desktop, and web. That broad backend footprint is valuable even in a primarily Android strategy, because it lowers long-term migration risk if other targets are added later.
Academic progress in mobile avatar rendering shows high FPS is now possible with representation-aware optimizations, but the papers also indicate that visual systems become fragile when pipeline semantics are undefined or implicit [S1] [S2] [S3]. The useful lesson for Realbits is not to imitate any one avatar method; the lesson is to make the runtime contract explicit and test it at the semantic level.
Architecture Analysis
1) Data contract layer: VRM semantics as source-of-truth
A robust architecture should parse VRM assets into a normalized internal model early and treat rendering libraries as interchangeable sinks. VRM 1.0 requires a humanoid bone mapping; first-person settings, expressions, look-at, spring bones, and constraints are optional [S9]. If required data is missing or malformed, the asset should fail validation at load time, before the avatar is ever shown. This avoids frame-time surprises during Flutter UI interactions.
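A minimal sketch of that early normalization step follows. It assumes the glTF JSON has already been decoded into a Python dict, and it reads bone mappings from `extensions.VRMC_vrm.humanoid.humanBones`. The `AvatarModel` type, the function names, and the exact `REQUIRED_BONES` list are assumptions made for illustration; the required-bone set in particular should be confirmed against the humanoid section of the specification [S9] before it is relied on.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed required-bone set for VRM 1.0; verify against the spec before use.
REQUIRED_BONES = (
    "hips", "spine", "head",
    "leftUpperArm", "leftLowerArm", "leftHand",
    "rightUpperArm", "rightLowerArm", "rightHand",
    "leftUpperLeg", "leftLowerLeg", "leftFoot",
    "rightUpperLeg", "rightLowerLeg", "rightFoot",
)


@dataclass(frozen=True)
class AvatarModel:
    """Hypothetical engine-neutral model produced before any renderer binding."""
    spec_version: str
    bones: Dict[str, int]               # bone name -> glTF node index
    expression_presets: Tuple[str, ...]
    has_look_at: bool
    has_first_person: bool


class AvatarValidationError(ValueError):
    """Raised when an asset cannot be normalized into an AvatarModel."""


def _bone_nodes(vrm: dict) -> Dict[str, int]:
    human_bones = vrm.get("humanoid", {}).get("humanBones", {})
    return {
        name: entry["node"]
        for name, entry in human_bones.items()
        if isinstance(entry, dict) and isinstance(entry.get("node"), int)
    }


def validation_errors(gltf: dict) -> List[str]:
    """Collect every problem instead of stopping at the first one."""
    vrm = gltf.get("extensions", {}).get("VRMC_vrm")
    if not isinstance(vrm, dict):
        return ["VRMC_vrm extension is missing"]
    errors = []
    bones = _bone_nodes(vrm)
    missing = [name for name in REQUIRED_BONES if name not in bones]
    if missing:
        errors.append("missing required humanoid bones: " + ", ".join(missing))
    node_count = len(gltf.get("nodes", []))
    dangling = sorted(n for n, i in bones.items() if not 0 <= i < node_count)
    if dangling:
        errors.append("bones reference nonexistent nodes: " + ", ".join(dangling))
    return errors


def normalize(gltf: dict) -> AvatarModel:
    """Fail at load time, not at frame time."""
    errors = validation_errors(gltf)
    if errors:
        raise AvatarValidationError("; ".join(errors))
    vrm = gltf["extensions"]["VRMC_vrm"]
    presets = vrm.get("expressions", {}).get("preset", {})
    return AvatarModel(
        spec_version=str(vrm.get("specVersion", "")),
        bones=_bone_nodes(vrm),
        expression_presets=tuple(sorted(presets)),
        has_look_at="lookAt" in vrm,
        has_first_person="firstPerson" in vrm,
    )
```

Returning a list of errors rather than raising on the first one is a deliberate choice here: asset authors usually want the full set of problems from a single validation run.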
Expressions are especially sensitive because they are more than simple blendshape weights. The spec binds expressions to morph targets, material colors, and texture transforms, so an expression update can drive both mesh shape and surface response [S10]. If these semantics are executed through inconsistent code paths, users will observe avatar divergence across apps even when the assets are identical. Therefore, the runtime should expose an ordered update pipeline (resolve look-at, apply expressions, then constraints, then spring bones), as explicitly ordered in VRM guidance [S9].
This architecture supports both determinism and observability: if each stage receives and emits normalized state, parity captures become comparable across platforms.
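The fixed stage order described above can be sketched as a small driver that refuses to run unless every stage is registered and that records the state emitted by each stage. The stage names, the `run_frame` function, and the `make_marker` toy stages are illustrative assumptions; real stages would evaluate look-at, expressions, constraints, and spring bones rather than append their own names.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]
Stage = Callable[[State], State]

# Order taken from the article text: look-at, expressions, constraints, springs.
STAGE_ORDER = ("look_at", "expressions", "constraints", "spring_bones")


def run_frame(
    state: State,
    stages: Dict[str, Stage],
    trace: Optional[List[Tuple[str, State]]] = None,
) -> State:
    """Run every stage exactly once, in STAGE_ORDER, regardless of dict order."""
    missing = [name for name in STAGE_ORDER if name not in stages]
    if missing:
        raise KeyError("unregistered stages: " + ", ".join(missing))
    for name in STAGE_ORDER:
        # Each stage receives a shallow copy, so it cannot rebind the caller's keys.
        state = stages[name](dict(state))
        if trace is not None:
            trace.append((name, dict(state)))  # snapshot for parity captures
    return state


def make_marker(name: str) -> Stage:
    """Toy stage that only records that it ran; stands in for real evaluation."""
    def stage(state: State) -> State:
        state["applied"] = tuple(state.get("applied", ())) + (name,)
        return state
    return stage
```

Because the order lives in one constant rather than in each adapter, a platform that registers its stages in a different order still executes them identically, and the optional trace gives parity tooling a per-stage view of the state.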
2) Rendering layer: Filament as backend host
Filament gives the project a stable runtime foundation with explicit support for Android surface rendering via SurfaceTexture, TextureView, or SurfaceView through native swapchain creation and engine control [S5]. It also defines a dedicated Android story around Java/Kotlin APIs plus JNI bridging and native libs [S6], which aligns cleanly with Kotlin-based Android wrappers and existing plugin packaging.
That alignment matters because Flutter platform views carry overhead. With Android hybrid composition, the Flutter UI is composed on the platform thread, where it competes with plugin messages and other platform work [S4]. In addition, SurfaceView and SurfaceTexture do not invalidate themselves when their content changes, so a platform view that contains them must be invalidated manually after drawing [S4]. In a VRM viewer that continuously animates bones and materials, these details are correctness-critical: a missed invalidation can look like a frozen avatar, while thread contention can reduce interactive smoothness.
The practical design is to keep Flutter at orchestration speed and Filament at render speed. Flutter should own surface lifecycle, state inputs, and app routing; the native side should own all time-critical update loops, animation interpolation, and material refresh scheduling. This reduces expensive cross-boundary chatter and makes performance behavior more predictable.
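One way to picture this split is a coalescing mailbox: the controller side posts intent changes whenever the UI produces them, and the render loop drains the mailbox once per frame. The sketch below is written in Python purely for illustration; in the architecture described here the loop would live in native code next to Filament. `LatestIntentMailbox` and `RenderLoop` are hypothetical names.

```python
import threading
from typing import Dict, Optional


class LatestIntentMailbox:
    """Coalesces controller updates so the render loop reads at most one per frame."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: Optional[Dict[str, object]] = None

    def post(self, intent: Dict[str, object]) -> None:
        # Called from the controller side at UI rate; newer values win per key.
        with self._lock:
            merged = dict(self._pending or {})
            merged.update(intent)
            self._pending = merged

    def take(self) -> Optional[Dict[str, object]]:
        # Called from the render loop once per frame; empties the mailbox.
        with self._lock:
            pending, self._pending = self._pending, None
            return pending


class RenderLoop:
    """Stand-in for the native frame loop; it never calls back into the controller."""

    def __init__(self, mailbox: LatestIntentMailbox) -> None:
        self.mailbox = mailbox
        self.current: Dict[str, object] = {}
        self.frames_rendered = 0

    def tick(self) -> Dict[str, object]:
        update = self.mailbox.take()
        if update is not None:
            self.current.update(update)
        self.frames_rendered += 1
        return dict(self.current)  # the state this frame would be drawn with
```

With this shape, the number of boundary crossings depends on how often the controller's intent changes, not on the frame rate, and frames with no new input simply reuse the last known state.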
3) Platform boundary and package layout
The current packaging direction should continue isolating a shared runtime package from platform-specific implementations. In that layout, all apps call the same facade, while Android, and eventually other targets, provide concrete adapters. This is the key to cross-app portability and to avoiding a duplicated runtime per app.
The boundary should pass explicit normalized frames, not raw engine commands. Each frame should include:
- normalized face/eye intent,
- expression payload,
- camera/look-at mode,
- material override hints,
- and constraint states.
The native engine applies these through a fixed update order. Any adapter mismatch (e.g., Android using one update order, macOS another) is surfaced as a compatibility defect during parity checks.
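The frame payload listed above could be modeled as an immutable record with a canonical serialization, so that two adapters given the same inputs produce byte-identical messages. The field names, the four-decimal rounding, and the `encode_frame` function are assumptions made for this sketch; only the `"bone"` and `"expression"` look-at modes are taken from the VRM look-at types.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Tuple

LOOK_AT_MODES = ("bone", "expression")


@dataclass(frozen=True)
class AvatarFrame:
    """Hypothetical normalized frame passed across the platform boundary."""
    gaze_yaw_deg: float = 0.0
    gaze_pitch_deg: float = 0.0
    expressions: Dict[str, float] = field(default_factory=dict)
    look_at_mode: str = "bone"
    material_overrides: Dict[str, Tuple[float, float, float, float]] = field(
        default_factory=dict
    )
    constraints_enabled: bool = True


def _rounded(value, digits: int = 4):
    """Round floats recursively so tiny numeric noise does not change the encoding."""
    if isinstance(value, float):
        return round(value, digits)
    if isinstance(value, dict):
        return {key: _rounded(item, digits) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [_rounded(item, digits) for item in value]
    return value


def encode_frame(frame: AvatarFrame) -> str:
    """Validate a frame and serialize it with sorted keys and no whitespace."""
    if frame.look_at_mode not in LOOK_AT_MODES:
        raise ValueError(f"unknown look-at mode: {frame.look_at_mode!r}")
    for name, weight in frame.expressions.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"expression {name!r} weight out of range: {weight}")
    return json.dumps(_rounded(asdict(frame)), sort_keys=True, separators=(",", ":"))
```

Sorting keys and rounding floats are what make the encoding usable as a parity artifact: insertion order and last-bit float differences no longer show up as spurious diffs.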
4) Determinism and parity validation
For a provider platform, visual quality without reproducibility is technically incomplete. Realbits can improve confidence by validating every incoming VRM asset through both the runtime path and a reference-renderer capture contract. The reference capture should include:
- fixed camera presets,
- deterministic runtime overrides,
- stable scene metadata,
- image + JSON snapshots for semantic diffs.
A normalized snapshot contract turns testing from subjective screenshot checks into measurable thresholds. Once this exists, regressions in springs, expression order, or material adaptation become objective and actionable. Even with different downstream renderers, the contract enables comparison against expected behavior.
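As a rough sketch of what "measurable thresholds" might mean for the JSON half of a snapshot, the function below walks two JSON-like structures and reports every value that differs by more than a tolerance. The `diff_snapshots` name, the snapshot field layout, and the default tolerance of 1e-3 are assumptions for illustration, not an existing Realbits format.

```python
from typing import List


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def diff_snapshots(expected, actual, tolerance: float = 1e-3, path: str = "") -> List[str]:
    """Return human-readable differences between two JSON-like snapshots."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        issues: List[str] = []
        for key in sorted(set(expected) | set(actual)):
            child = f"{path}.{key}" if path else str(key)
            if key not in actual:
                issues.append(f"{child}: missing in actual")
            elif key not in expected:
                issues.append(f"{child}: unexpected in actual")
            else:
                issues.extend(diff_snapshots(expected[key], actual[key], tolerance, child))
        return issues
    if isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            return [f"{path}: length {len(expected)} != {len(actual)}"]
        issues = []
        for index, (left, right) in enumerate(zip(expected, actual)):
            issues.extend(diff_snapshots(left, right, tolerance, f"{path}[{index}]"))
        return issues
    if _is_number(expected) and _is_number(actual):
        if abs(expected - actual) > tolerance:
            return [f"{path}: {expected} != {actual} (tolerance {tolerance})"]
        return []
    if expected != actual:
        return [f"{path}: {expected!r} != {actual!r}"]
    return []
```

A CI job could treat a non-empty result as a failure and print the list, which points directly at the bone, expression, or material entry that regressed.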
Limitations
The first limitation is platform asymmetry. Flutter PlatformViews still impose overhead, and while Android is workable, complex animated native views can induce frame pressure in the shared raster pipeline [S4]. A single app can survive this, but a portfolio of heavy visual surfaces needs explicit budgeting.
Second, complete VRM coverage is large, and full parity is not free. VRM includes look-at, spring bones, constraints, first-person mesh rules, MToon material behavior, and extension combinations that vary across authoring tools. Some extension paths may be accepted as approximations (for example, a documented fallback when a feature is unsupported), while others must fail strictly for safety. The architecture should make these decisions explicit in versioned capability profiles.
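A capability profile of this kind might be as simple as a table from extension name to policy, evaluated against the `extensionsUsed` and `extensionsRequired` arrays that glTF defines. The profile contents below are hypothetical (for example, marking `VRMC_materials_mtoon` as approximated is an arbitrary choice for the sketch), and `evaluate` is an illustrative name.

```python
from enum import Enum
from typing import Dict, List


class Policy(Enum):
    SUPPORTED = "supported"
    APPROXIMATE = "approximate"  # load, but with a documented fallback
    REJECT = "reject"            # strict failure


# Hypothetical profile; a real one would be versioned alongside the runtime.
PROFILE_V1: Dict[str, Policy] = {
    "VRMC_vrm": Policy.SUPPORTED,
    "VRMC_springBone": Policy.SUPPORTED,
    "VRMC_node_constraint": Policy.SUPPORTED,
    "VRMC_materials_mtoon": Policy.APPROXIMATE,
}


def evaluate(gltf: dict, profile: Dict[str, Policy]) -> Dict[str, object]:
    """Decide whether an asset is loadable under a capability profile."""
    required = set(gltf.get("extensionsRequired", []))
    used = set(gltf.get("extensionsUsed", [])) | required
    decisions: Dict[str, Policy] = {}
    for name in used:
        policy = profile.get(name)
        if policy is None:
            # Unknown extensions: fatal only if the asset declares them required.
            policy = Policy.REJECT if name in required else Policy.APPROXIMATE
        decisions[name] = policy
    rejected: List[str] = sorted(n for n, p in decisions.items() if p is Policy.REJECT)
    approximated: List[str] = sorted(
        n for n, p in decisions.items() if p is Policy.APPROXIMATE
    )
    return {"loadable": not rejected, "rejected": rejected, "approximated": approximated}
```

Keeping the approximated list in the result matters as much as the pass/fail bit, since it tells parity tooling which visual differences are expected rather than regressions.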
Third, Filament integration can be highly performant, but ABI and packaging complexity is non-trivial because Android builds often require architecture-specific artifacts and strict toolchain alignment [S6]. This is usually acceptable for Android-first shipping, but it increases release complexity as app variants grow.
Fourth, schema evolution risk is continuous. As assets evolve across VRM revisions, older files can remain valid yet be interpreted differently unless adapters pin how each extension version is handled. Without strict schema migration and schema-gated feature negotiation, runtime behavior can drift even when assets still render.
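Schema-gated negotiation can be sketched as a lookup from the asset's declared `specVersion` to a pinned adapter, with an explicit refusal for anything unpinned. The adapter identifier `"vrm10"`, the `negotiate` function, and the treatment of a top-level `VRM` extension as a legacy VRM 0.x marker are assumptions for this sketch.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from declared spec version to a pinned adapter id.
PINNED_ADAPTERS: Dict[str, str] = {"1.0": "vrm10"}


def negotiate(
    gltf: dict, pinned: Dict[str, str] = PINNED_ADAPTERS
) -> Tuple[Optional[str], str]:
    """Return (adapter id or None, reason). Never guess at an unpinned version."""
    extensions = gltf.get("extensions", {})
    if "VRMC_vrm" in extensions:
        version = str(extensions["VRMC_vrm"].get("specVersion", ""))
        adapter = pinned.get(version)
        if adapter is None:
            return None, f"unpinned VRMC_vrm specVersion {version!r}"
        return adapter, "ok"
    if "VRM" in extensions:
        # Assumed marker for VRM 0.x assets, which would need migration first.
        return None, "legacy VRM 0.x asset; migrate before loading"
    return None, "no VRM extension present"
```

The important property is that an unknown version yields a refusal with a reason rather than a best-effort load, so behavior changes only when the pinned table is changed deliberately.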
Implications for This Repository
For Realbits, the technical stack should evolve as a provider-owned portability layer, not as a feature-specific Android viewer.
- Freeze a schema-first runtime contract first. Normalize VRM metadata into a small neutral model before engine binding. Include explicit support flags and hard errors for unsupported extension combinations.
- Keep Flutter as controller only. Use platform views to host the native surface, but do not let Flutter own per-frame semantic evaluation [S4].
- Move heavy animation and expression scheduling fully into native code. Use Filament as the stable execution host, with JVM/native surface control and glTF-driven asset handling in the native layer [S5].
- Add deterministic parity outputs. Every runtime path should emit standardized JSON metrics aligned with fixed camera presets and runtime overrides so regressions become diffable. This gives a cheap CI gate for release trains.
- Treat extension behavior as versioned capabilities. VRM 1.0 execution ordering and expression binding behavior should be enforced consistently for all rendering outputs [S9] [S10].
- Plan for staged expansion. Android remains the short-term focus, but the contract-first architecture should make future iOS or desktop ports an adapter rewrite, not a data-model rewrite [S6].
This design minimizes the risk of creating app-specific avatar behavior and maximizes the value of Realbits' asset and character-card strategy. In short, VRM should carry character identity and animation semantics; Filament should carry frame-true rendering; Flutter should carry app-level interaction and lifecycle. That partition gives the provider the strongest path to reusability and measurable quality gates.
References
- S1: https://arxiv.org/abs/2403.05087
- S2: https://arxiv.org/abs/2510.13587
- S3: https://arxiv.org/abs/2604.18583
- S4: https://docs.flutter.dev/platform-integration/android/platform-views
- S5: https://google.github.io/filament/dup/intro.html
- S6: https://google.github.io/filament/dup/building.html
- S7: https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html
- S8: https://vrm.dev/vrm/gltf/format/
- S9: https://github.com/vrm-c/vrm-specification/blob/master/specification/VRMC_vrm-1.0/README.md
- S10: https://github.com/vrm-c/vrm-specification/blob/master/specification/VRMC_vrm-1.0/expressions.md
Source Ledger
- [S1] SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting (arxiv): https://arxiv.org/abs/2403.05087 - Shows modern trade-offs between avatar quality and real-time performance, useful as a benchmark context for mobile avatar pipelines.
- [S2] HRM^2Avatar: High-Fidelity Real-Time Mobile Avatars from Monocular Phone Scans (arxiv): https://arxiv.org/abs/2510.13587 - Provides evidence that mobile-first avatar systems target both fidelity and device-side efficiency.
- [S3] MUA: Mobile Ultra-detailed Animatable Avatars (arxiv): https://arxiv.org/abs/2604.18583 - Motivates a design posture that balances ultra-detailed avatar features against mobile real-time constraints.
- [S4] Flutter Android Platform Views (official-doc): https://docs.flutter.dev/platform-integration/android/platform-views - Defines Flutter's native view embedding contract, including performance characteristics and invalidation requirements.
- [S5] Filament Introduction (official-doc): https://google.github.io/filament/dup/intro.html - Defines Filament's rendering model, supported Android rendering surfaces, and mobile-oriented API assumptions.
- [S6] Filament Build and Android Tooling (official-doc): https://google.github.io/filament/dup/building.html - Captures the Java/Kotlin/JNI/native split and ABI-specific build expectations that affect mobile distribution.
- [S7] glTF 2.0 Specification (standards): https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html - Defines core schema primitives required for skinning, morph targets, materials, and geometry handling.
- [S8] VRM Format (official-doc): https://vrm.dev/vrm/gltf/format/ - States VRM's dependency on glTF 2.0 and the .vrm/.glb compatibility model.
- [S9] VRMC_vrm-1.0 README (web): https://github.com/vrm-c/vrm-specification/blob/master/specification/VRMC_vrm-1.0/README.md - Enumerates VRM 1.0 extension components and their runtime execution ordering.
- [S10] VRMC_vrm-1.0 Expressions (web): https://github.com/vrm-c/vrm-specification/blob/master/specification/VRMC_vrm-1.0/expressions.md - Defines expression-to-morph target and material binding behavior needed for deterministic expression runtime.