May 16, 2026 · 9 sources · 2 arXiv

Filament-Backed VRM Avatars for Mobile Character Presentation

A technical analysis of how Realbits can treat VRM avatars as portable presentation assets while keeping Filament, Flutter, and runtime semantics separated enough for mobile parity and future provider reuse.

vrm, filament, flutter, avatars

Abstract

Realbits' VRM work is best understood as a presentation runtime for provider-managed characters, not as a generic 3D viewer. VRM sits on glTF 2.0, so the base asset still benefits from a portable scene, mesh, material, skinning, and morph target container [S4]. VRM then adds avatar-specific semantics: humanoid mapping, metadata, expressions, look-at, first-person data, spring bones, node constraints, and MToon material conventions [S3][S9]. Filament is the pragmatic mobile renderer because it already provides Android artifacts, glTF loading through gltfio, Java/JNI integration, and a material toolchain suitable for custom runtime work [S5][S6]. The architectural question is therefore where Realbits should draw boundaries: Flutter owns product workflow, Filament owns frame production, and a VRM runtime layer owns avatar meaning. That split matters because Realbits wants reusable character cards and themed apps. A character needs a consistent body, face, and gaze across app surfaces, while persona prompts, voices, ownership, and catalog metadata remain provider data rather than hidden mesh state.

Realbits Context

The repository already contains an Android-first VRM and Filament implementation plan for the situational English coach app, plus a shared realbits_vrm_runtime package intended to hide direct dependency on app-specific gallery runtime packages. The local docs describe a Flutter shell, an Android PlatformView wrapper, a Filament ModelViewer baseline, a VRM 1.0 parser, runtime controls for expressions and look-at, node constraints, spring bones, first-person handling, MToon material support, outline rendering, and parity capture against a three-vrm browser harness.

That is an unusually concrete foundation for Realbits' provider pivot. The provider layer already treats model packs, character cards, and themed apps as reusable distribution units. VRM adds the visual half of that portability. A card can carry persona, greeting, tags, voice, creator address, and ownership linkage; a VRM asset can carry the standardized avatar body and the renderer-facing semantics needed to make the same character visually coherent across apps.

This article focuses on the mobile rendering side because it is where architectural pressure is highest. Web can rely on mature three-vrm workflows for inspection and parity. Android must run inside Flutter, use native rendering, keep interaction responsive, and match enough VRM semantics to avoid surprising creators. That combination makes Filament-backed VRM a product architecture concern, not only a graphics feature.

Related Work

The standards stack starts with glTF. Khronos defines glTF as a runtime 3D asset delivery format, but its specification deliberately leaves many runtime choices to clients. For example, it stores animations but does not prescribe how an application schedules, loops, or combines them [S4]. VRM fills a different gap. It says what a humanoid avatar means: how bones are identified, how facial expressions and gaze are addressed, what first-person data is available, and how spring-like secondary motion can be represented without a full game physics contract [S3][S9].

Filament occupies the rendering layer. It is not a VRM engine, but it supplies the mobile renderer, glTF loader, Android API surface, material compiler, and physically based material model that Realbits can extend [S5][S6]. The key benefit is specialization. Realbits does not need to fork a renderer to parse VRM metadata or resolve expression overrides. It can keep those rules above Filament and feed Filament updated transforms, morph weights, material parameters, and camera state.

three-vrm is useful as a behavioral reference because it integrates VRM through Three.js's GLTFLoader plugin mechanism and exposes a complete browser-side load path for VRM models [S8]. Realbits' local parity harness uses that ecosystem in the right way: not as production Android code, but as an independent renderer against which Android can compare camera presets, runtime config JSON, metrics, and PNG output.

Recent avatar research points in a more photorealistic direction. SplattingAvatar embeds Gaussian splats on a triangle mesh and reports real-time rendering, including mobile-oriented performance claims [S1]. 3DGS-Avatar uses deformable 3D Gaussian splatting to create animatable clothed avatars from monocular video with much faster training and rendering than earlier neural approaches [S2]. These papers are relevant, but they do not replace VRM for Realbits today. They optimize reconstruction and photorealism; Realbits needs interoperable provider assets, creator tooling, predictable mobile delivery, and deterministic runtime behavior.

Architecture Analysis

The first boundary is the asset boundary. VRM should not be treated as the character card. It is the character's renderable avatar package. In Realbits terms, the card remains the provider object that can bind a persona prompt, greeting, voice, tags, creator identity, ownership state, and app presentation defaults. The VRM file supplies the standardized humanoid and material payload. This distinction prevents a common failure mode: burying business and runtime policy inside artist-authored assets. VRM metadata can inform display and licensing flows, but the app should still resolve provider identity and entitlement through Realbits' catalog and ownership systems.
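The card and asset split described above can be sketched as two small record types. This is a minimal illustration, not Realbits' actual schema: the class names, field names, and the `resolve_avatar` helper are all hypothetical, and entitlement is reduced to a set of card IDs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class AvatarAsset:
    """Renderable payload only: where the VRM lives and which spec it targets."""
    asset_id: str
    vrm_uri: str
    vrm_spec_version: str  # for example "1.0"


@dataclass(frozen=True)
class CharacterCard:
    """Provider object: persona and ownership live here, not in the mesh."""
    card_id: str
    persona_prompt: str
    greeting: str
    voice_id: str
    tags: Tuple[str, ...]
    creator_address: str
    avatar_asset_id: Optional[str] = None  # a reference, never an embedded file


def resolve_avatar(
    card: CharacterCard,
    assets: Dict[str, AvatarAsset],
    entitled_card_ids: Set[str],
) -> Optional[AvatarAsset]:
    """Return the avatar only when the catalog (not the VRM file) grants access."""
    if card.card_id not in entitled_card_ids:
        return None
    if card.avatar_asset_id is None:
        return None
    return assets.get(card.avatar_asset_id)
```

Because the card only references the asset by ID, the same VRM file can back several cards, and entitlement checks never need to open the file.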

The second boundary is between glTF loading and VRM evaluation. glTF gives Filament a scene graph, meshes, textures, skins, morph targets, and materials [S4][S5]. VRM evaluation decides how to drive them. Expression weights are not just raw morph values; an expression can carry a preset name, override rules, material color binds, and texture transform binds that animate UVs. Look-at may be expression-driven or bone-driven. First-person can require hiding or splitting meshes. Spring bones need per-frame simulation and collider handling. Node constraints need deterministic transform evaluation. These are runtime semantics, so they belong in a VRM runtime layer rather than in Flutter widgets or ad hoc Android view code.
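To make the point about override rules concrete, here is a simplified sketch of expression resolution. It assumes the VRM 1.0 preset names and the `overrideBlink`, `overrideLookAt`, and `overrideMouth` fields with values `none`, `block`, and `blend`. The function names are hypothetical, and the logic is an approximation for illustration rather than the normative algorithm or Realbits' implementation.

```python
# Preset groups that other expressions may suppress (VRM 1.0 preset names).
BLINK_GROUP = {"blink", "blinkLeft", "blinkRight"}
LOOK_AT_GROUP = {"lookUp", "lookDown", "lookLeft", "lookRight"}
MOUTH_GROUP = {"aa", "ih", "ou", "ee", "oh"}


def _clamp01(value):
    return max(0.0, min(1.0, value))


def _override_amount(mode, weight):
    """How strongly one active expression suppresses a group."""
    if mode == "block":
        return 1.0 if weight > 0.0 else 0.0
    if mode == "blend":
        return weight
    return 0.0  # "none" or field missing


def resolve_expressions(weights, definitions):
    """Scale blink, look-at, and mouth weights by the overrides of active expressions.

    weights: expression name -> requested weight
    definitions: expression name -> dict with optional override* fields
    """
    suppression = {"overrideBlink": 0.0, "overrideLookAt": 0.0, "overrideMouth": 0.0}
    for name, raw in weights.items():
        weight = _clamp01(raw)
        definition = definitions.get(name, {})
        for key in suppression:
            suppression[key] += _override_amount(definition.get(key, "none"), weight)

    scale = {key: 1.0 - _clamp01(total) for key, total in suppression.items()}

    resolved = {}
    for name, raw in weights.items():
        weight = _clamp01(raw)
        if name in BLINK_GROUP:
            weight *= scale["overrideBlink"]
        elif name in LOOK_AT_GROUP:
            weight *= scale["overrideLookAt"]
        elif name in MOUTH_GROUP:
            weight *= scale["overrideMouth"]
        resolved[name] = weight
    return resolved
```

The resolved weights, not the requested ones, are what a runtime layer would translate into morph target weights and material parameters before handing them to the renderer.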

The third boundary is the Flutter boundary. Flutter Platform Views allow native Android views to be embedded in Flutter UI, but the official docs describe composition tradeoffs: platform view fidelity can come at a cost to Flutter-side performance, frame rate, and supported transformations [S7]. For Realbits, that means the VRM viewer should be a contained native surface with a narrow typed control API. Flutter should send avatar selection, camera preset, expression weights, view mode, and capture commands. It should not micromanage scene nodes every frame. The more per-frame state crosses the platform channel, the more the app risks turning a graphics problem into a UI synchronization problem.
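One way to keep the control surface narrow is to validate a small command record and coalesce updates so that at most one message crosses the channel per flush. The sketch below models that idea in Python for brevity; the real boundary would be a Dart-to-Android platform channel. `ViewerCommand`, `CommandCoalescer`, and the allowed view modes are hypothetical names, not an existing Realbits or Flutter API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional

ALLOWED_VIEW_MODES = ("third_person", "first_person")  # assumed set of modes


@dataclass(frozen=True)
class ViewerCommand:
    """The only state the UI layer is allowed to send to the native viewer."""
    avatar_id: str
    camera_preset: str
    view_mode: str = "third_person"
    expression_weights: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self):
        if self.view_mode not in ALLOWED_VIEW_MODES:
            raise ValueError(f"unknown view mode: {self.view_mode}")
        for name, weight in self.expression_weights.items():
            if not 0.0 <= weight <= 1.0:
                raise ValueError(f"weight out of range for {name}: {weight}")


def encode_command(command: ViewerCommand) -> str:
    """Serialize to a stable JSON payload for the platform channel."""
    return json.dumps(asdict(command), sort_keys=True)


class CommandCoalescer:
    """Keep only the latest command so bursts of UI updates send one message."""

    def __init__(self) -> None:
        self._pending: Optional[ViewerCommand] = None

    def submit(self, command: ViewerCommand) -> None:
        self._pending = command  # newer state replaces older state

    def flush(self) -> Optional[str]:
        if self._pending is None:
            return None
        payload = encode_command(self._pending)
        self._pending = None
        return payload
```

Per-frame work such as spring bones and look-at smoothing stays on the native side; the UI only declares the target state.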

The fourth boundary is material fidelity. VRM's MToon conventions are core to anime-style avatars, but Filament's material system is designed around its own material models and compiled material definitions [S6]. Realbits' current path, which approximates MToon using custom Filament materials and an outline pass, is a reasonable engineering compromise. It should be judged by creator-visible parity, not by philosophical purity. The right tests are whether common assets render with stable facial features, hair silhouettes, transparency behavior, rim and matcap cues, and acceptable outline width across representative phones.

The fifth boundary is conformance. Pixel parity with three-vrm will not be exact because Three.js and Filament differ in renderer design, tone mapping, material pipelines, and platform constraints. Parity bundles are still valuable. A preset camera, background color, runtime config JSON, numeric metrics, and PNG diff give engineers a repeatable regression signal, which manual screenshot inspection does not. The comparison target should be framed as behavioral compatibility: does the same config produce the same pose, gaze direction, expression, first-person visibility, and approximate material intent?
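A tolerant image comparison of the kind described here might look like the following sketch. It treats decoded images as nested lists of RGB tuples to stay dependency-free; the metric choices, function names, and threshold parameters are illustrative assumptions rather than the metrics the Realbits harness actually records.

```python
def _check_same_shape(img_a, img_b):
    if len(img_a) != len(img_b) or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("images must have identical dimensions")


def mean_abs_diff(img_a, img_b):
    """Mean per-channel absolute difference, normalized to the range 0..1."""
    _check_same_shape(img_a, img_b)
    total = 0
    count = 0
    for row_a, row_b in zip(img_a, img_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            for ca, cb in zip(pixel_a, pixel_b):
                total += abs(ca - cb)
                count += 1
    return total / (count * 255) if count else 0.0


def fraction_changed(img_a, img_b, channel_tolerance=8):
    """Fraction of pixels where any channel differs by more than the tolerance."""
    _check_same_shape(img_a, img_b)
    changed = 0
    pixels = 0
    for row_a, row_b in zip(img_a, img_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            pixels += 1
            if any(abs(ca - cb) > channel_tolerance for ca, cb in zip(pixel_a, pixel_b)):
                changed += 1
    return changed / pixels if pixels else 0.0


def passes_parity(img_a, img_b, max_mean, max_changed):
    """Pass when both metrics stay within their thresholds."""
    return (
        mean_abs_diff(img_a, img_b) <= max_mean
        and fraction_changed(img_a, img_b) <= max_changed
    )
```

Two metrics are used because they fail differently: a global tone mapping shift raises the mean slightly everywhere, while a missing outline or hidden mesh changes a small fraction of pixels by a large amount.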

Limitations

The Android-first scope is rational, but it creates product risk if Realbits promises cross-app character portability before iOS and macOS behavior are equally characterized. A provider asset that works only on one app surface is not yet provider inventory; it is an app-specific feature with better packaging.

Platform Views also deserve caution. Flutter can host native Android views, but composition choices can reduce Flutter frame rate or limit transformations [S7]. A VRM view embedded in a chat screen should therefore avoid being overlaid with constantly animated Flutter chrome. The native view should do the expensive visual work; Flutter should keep surrounding controls stable and sparse.

The research frontier is not aligned with this implementation yet. Gaussian and neural avatars are advancing quickly [S1][S2], but they usually depend on different asset representations, training pipelines, and renderer assumptions than glTF and VRM. Realbits should watch that work for future premium avatar generation or compression ideas, while keeping the shipping runtime standards-based until the provider pipeline can validate, distribute, and render neural assets predictably.

MToon support is another limitation. Some toon effects are authoring conventions as much as renderer rules. Filament can implement custom materials, but small differences in texture transform handling, transparency sorting, normals, outlines, and color management may become creator-visible. Realbits needs an explicit unsupported-case policy, not only more code.

Implications for This Repository

The current direction should continue with a shared native VRM runtime core. Expression resolution, look-at solving, node constraints, spring-bone stepping, first-person mesh classification, and runtime override parsing are exactly the kinds of behavior that should not live separately in each Flutter app. They are provider infrastructure. If the flagship chat hub, English coach, and future vertical apps all depend on the same runtime package, Realbits gets reusable QA and creator trust.

The provider catalog should also version rendering capability. A character listing can expose whether the asset requires VRM 1.0, MToon, first-person mesh splitting, spring bones, node constraints, or unsupported extensions. That allows apps to reject, downgrade, or warn before download. It also lets the web studio run validation before publishing.
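The reject, downgrade, or warn decision can be expressed as a small negotiation function. In this sketch the capability strings, the split between required and optional capabilities, and the `negotiate` function are assumptions made for illustration; the catalog's real fields may differ.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class Decision(Enum):
    ACCEPT = "accept"
    DOWNGRADE = "downgrade"  # render, but warn that some features are missing
    REJECT = "reject"        # do not download the asset


def negotiate(
    required: Iterable[str],
    optional: Iterable[str],
    supported: Iterable[str],
) -> Tuple[Decision, List[str]]:
    """Compare an asset's declared capabilities against what the app supports.

    Returns the decision and the sorted list of capabilities that caused it.
    """
    supported_set = set(supported)
    missing_required = sorted(set(required) - supported_set)
    if missing_required:
        return Decision.REJECT, missing_required
    missing_optional = sorted(set(optional) - supported_set)
    if missing_optional:
        return Decision.DOWNGRADE, missing_optional
    return Decision.ACCEPT, []
```

For example, an app that supports `vrm-1.0` and `mtoon` but not `spring-bones` would downgrade an asset listing `spring-bones` as optional and reject one listing it as required, in both cases before any download starts.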

Parity tooling should become part of release gating. A small fixture set should cover front, three-quarter, profile, close face, first-person, expression blends, spring-bone motion, transparency, MToon outlines, and texture transforms. Each fixture should produce Android output and a three-vrm reference bundle [S8]. The metric does not need to demand identical pixels; it needs stable thresholds and human-reviewable diffs.
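A release gate over such a fixture set can be a few lines of code. The fixture names below follow the list above, but the threshold values are placeholders, and both `FIXTURE_THRESHOLDS` and `gate_release` are hypothetical; real limits would have to be calibrated against reviewed diffs.

```python
# Maximum allowed diff score per fixture (placeholder values, not calibrated).
FIXTURE_THRESHOLDS = {
    "front": 0.02,
    "three_quarter": 0.02,
    "profile": 0.02,
    "close_face": 0.03,
    "first_person": 0.03,
    "expression_blend": 0.03,
    "spring_bone_motion": 0.05,
    "transparency": 0.05,
    "mtoon_outline": 0.05,
    "texture_transform": 0.03,
}


def gate_release(measured, thresholds):
    """Return human-readable failures; an empty list means the gate passes.

    measured: fixture name -> diff score from the Android vs reference comparison
    """
    failures = []
    for name, limit in sorted(thresholds.items()):
        if name not in measured:
            failures.append(f"{name}: missing result")
        elif measured[name] > limit:
            failures.append(f"{name}: {measured[name]:.3f} exceeds {limit:.3f}")
    return failures
```

Treating a missing fixture result as a failure matters as much as the thresholds: a gate that silently skips fixtures that did not run gives a false sense of coverage.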

Finally, Realbits should keep persona and rendering separate. VRM makes the character present; the card and provider runtime make the character behave. That separation is what allows a single asset to travel across themed apps without turning every app into a separate content silo.

References

Source Ledger