🐾 PETBLIP WALL

Phase 1 – Functional AI Deployment Summary

Status: Live + GPU Accelerated + Session-Aware

🧠 Core Brain

AI Server (ai1)

  1. Intel i7 system
  2. 64GB RAM
  3. RTX 3090 Ti (24GB VRAM)
  4. CUDA 12.2 active
  5. Ollama running locally
  6. Model: qwen2.5:32b
  7. ~20GB VRAM utilized during inference
  8. Fully GPU accelerated

Result:

Retail-grade 32B local inference with strong multi-turn reasoning.
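The wall reaches this model over Ollama's HTTP API. A minimal sketch of building that request — the `/api/generate` endpoint and its `model`/`prompt`/`stream` fields follow Ollama's documented API, while the host address is a placeholder for this deployment:

```javascript
// Sketch of the wall -> Ollama request body. Endpoint and field names
// are from Ollama's HTTP API; AI_HOST is a hypothetical internal address.
const AI_HOST = "http://10.0.1.10:11434"; // placeholder VLAN address

function buildGenerateRequest(question) {
  return {
    url: `${AI_HOST}/api/generate`,
    body: {
      model: "qwen2.5:32b", // the model named in this deployment
      prompt: question,
      stream: false,        // single JSON reply instead of chunked stream
    },
  };
}

const req = buildGenerateRequest("What food is best for a senior cat?");
```

The same request shape works for any pulled model; only the `model` string changes.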

🔐 Network Architecture

  1. AI server isolated on internal VLAN (10.0.1.x)
  2. Wall server (ser2) communicates over LAN only
  3. AI machine NOT exposed directly to public internet
  4. Token-based Authorization header required
  5. Static IP fiber available for DNS when needed

Result:

Secure internal AI architecture with controlled access.

🖥 PetBlip Wall Server (ser2)

  1. Node.js (v20)
  2. Express
  3. Socket.io (real-time WebSocket communication)
  4. MySQL2 (async MariaDB logging)
  5. Wall interface served on port 3000

Live Capabilities:

  1. Customer submits question
  2. Wall sends request to AI server
  3. AI response returned in real-time
  4. Response displayed immediately
  5. Interaction logged to database
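The five steps above can be sketched as one async handler. The `askAI` and `logInteraction` helpers are hypothetical stand-ins for the real AI call and MySQL2 write, and the `socket` object mirrors the Socket.io `emit` surface:

```javascript
// Sketch of the wall's question pipeline; helper names are illustrative.
async function handleQuestion(socket, question, deps) {
  const { askAI, logInteraction } = deps;
  const response = await askAI(question);              // steps 2-3: forward to AI server
  socket.emit("answer", response);                     // step 4: display immediately
  await logInteraction(socket.id, question, response); // step 5: persist to MariaDB
  return response;
}
```

Emitting before the database write keeps perceived latency tied to inference, not logging.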

🗂 Database Layer (MariaDB 11.4)

Database: blip_analytics

Table: wall_conversations

Stored fields:

  1. id
  2. session_id
  3. question
  4. response
  5. source
  6. created_at

Result:

Every store interaction captured for later insight.
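The logging write behind that capture can be sketched as a parameterized insert. Table and column names match the schema above; the pool itself (host, credentials) is not shown, and `mysql2`'s promise API provides the `pool.execute(sql, params)` call used in the usage comment:

```javascript
// Sketch of the wall_conversations insert; assumes id and created_at
// are filled by the database (auto-increment / default timestamp).
const INSERT_SQL =
  "INSERT INTO wall_conversations (session_id, question, response, source) " +
  "VALUES (?, ?, ?, ?)";

function buildLogParams(sessionId, question, response, source = "wall") {
  return [sessionId, question, response, source];
}

// usage with a mysql2 promise pool (not created here):
// await pool.execute(INSERT_SQL, buildLogParams(socket.id, q, answer));
```

Parameterized values keep free-text customer questions from ever touching the SQL string itself.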

🧠 Session Memory (Phase 1 Complete)

  1. Memory tied to socket.id
  2. Last 3 customer messages injected into prompt
  3. No cross-customer contamination
  4. No identity tracking
  5. Stateless after refresh

Result:

Multi-turn contextual continuity without privacy complexity.
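The memory rules above reduce to a few lines: a `Map` keyed by `socket.id`, capped at the last 3 customer messages, and cleared on disconnect so nothing survives a refresh. A minimal sketch:

```javascript
// Sketch of session memory keyed by socket.id. MAX_TURNS is the
// 3-message cap named above; function names are illustrative.
const MAX_TURNS = 3;
const sessions = new Map(); // socket.id -> recent customer messages

function remember(socketId, message) {
  const history = sessions.get(socketId) || [];
  history.push(message);
  sessions.set(socketId, history.slice(-MAX_TURNS)); // keep newest 3 only
}

function recall(socketId) {
  return sessions.get(socketId) || [];
}

function forget(socketId) {
  sessions.delete(socketId); // call from the 'disconnect' handler
}
```

Because the key is the transient socket id, a page refresh produces a new id and an empty history — no cross-customer contamination, no identity tracking.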

🎭 Blip Persona Layer

Blip operates as:

  1. In-store AI assistant
  2. Skilled tradesman tone
  3. Practical guidance
  4. Short structured responses
  5. Vet suggestion when appropriate
  6. Subtle in-store product mention
  7. Ends with follow-up question

With the 32B model, the persona stays consistent across multi-turn context.
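One way the persona rules above can be injected is as a system preamble ahead of the session memory and the new question. The preamble wording here is illustrative, not the production prompt:

```javascript
// Sketch of persona + memory prompt assembly; the rule text is an
// illustrative paraphrase of the persona traits listed above.
const BLIP_PERSONA = [
  "You are Blip, the in-store AI assistant.",
  "Speak in a practical, skilled-tradesman tone.",
  "Keep answers short and structured.",
  "Suggest a vet visit when a question sounds medical.",
  "Mention a relevant in-store product subtly, at most once.",
  "End with a brief follow-up question.",
].join("\n");

function buildPrompt(recentMessages, question) {
  const memory = recentMessages.map((m) => `Customer: ${m}`).join("\n");
  return `${BLIP_PERSONA}\n\n${memory}\n\nCustomer: ${question}\nBlip:`;
}
```

Keeping the persona in a single constant makes it easy to refine over time without touching the request pipeline.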

⚡ Performance

  1. 32B model loads fully into VRAM (~20GB usage)
  2. Responses are fast (GPU confirmed active)
  3. Wall latency acceptable for retail interaction
  4. No CPU fallback detected

🧱 What Is Working Right Now

You have:

  1. Functional in-store AI
  2. GPU-powered reasoning
  3. Secure local inference
  4. Session-aware multi-turn conversation
  5. Persistent logging
  6. VLAN-isolated architecture
  7. Real-time wall interaction
  8. Production-capable hardware stack

This is no longer a prototype.

🚀 What This System Is Now Capable Of

Without adding anything new:

  1. Handle real customer Q&A
  2. Maintain conversational continuity
  3. Collect store question data
  4. Refine persona over time
  5. Expand to other LocalAd properties
  6. Be called from forum, FixItUs, VolusiaMarket
  7. Serve as central AI brain for ecosystem

🔒 What You Intentionally Did NOT Add

  1. No customer identity tracking
  2. No RAG complexity
  3. No vector DB
  4. No external cloud dependency
  5. No direct public AI exposure
  6. No over-engineering

Safe.

Contained.

Focused.

📍 Current State Classification

Infrastructure Tier: Advanced Local AI Deployment

Retail Integration Tier: Early Production

Hardware Tier: High-end Prosumer AI Node

Security Tier: Properly Isolated LAN Deployment