
Fyra — AI Portfolio Assistant
Custom knowledge base assistant that qualifies visitors, recommends projects, and captures leads in real-time.
AI & Full-Stack Engineer
View Details →Developing custom RAG pipelines, fine-tuned LLM agents, and real-time computer vision models. I bridge the gap between AI research and production software.
3+ Years
Building systems
Remote + Onsite (Guwahati Area)
Prompt/Model Tuning → Vector Cache Sync → Production API Deploy
Optimization Summary
Summary: Generic AI chatbots hallucinate and suffer from high API costs. I build structured RAG pipelines and local classifiers to keep logic strict.
“AI chatbot hallucinates details or invents services I don't offer.”
I implement strict semantic search routing using vector embeddings (via Supabase Vector). The system forces the LLM to reply only using retrieved context, falling back to a structured brief capture form if the query is out-of-bounds.
“API token costs are skyrocketing and response times are too slow.”
I write lightweight token buckets and rate limiters. For simple patterns, I build local classifiers, compile local models, or optimize prompt payloads to reduce token overhead by up to 40%.
“Need AI computer vision, but camera streams are high-latency.”
I build localized frame processing loops using OpenCV and MediaPipe. Joint positions are processed at 30FPS directly on the host machine, streaming only processed coordinate offsets to connected hardware.
“Running AI models on device requires too much RAM and power.”
I quantize neural networks (using TensorFlow Lite) into 8-bit integers, reducing flash footprints. We run models locally on edge processors (ESP32-S3 or Jetson Nano) with sub-100ms processing cycles.
Summary: I build specialized, low-latency AI integrations that map to your private documents or hardware telemetry. I don't build generic wrapper bots.
Companies seeking to embed custom intelligence (like my Fyra lead qualifying assistant) directly into their Next.js apps to increase user retention and capture briefs.
Assam tea gardens or local brands wanting visual automated sortings, or customer journey portals showing immersive cultural histories and recommendations.
Medical teams requiring secure, local document parsing tools that answer patient inquiries using verified journals without leaking records to public LLMs.
Creators seeking custom OpenCV gesture models (like hand tracking loops) mapped directly to mechanical actuators over WebSocket lines.
Summary: Fully deployed, validated, and open-source models ready to integrate with your existing codebase.
Fully configured Supabase Postgres tables with pgvector extensions, semantic similarity functions, and index scaling configurations.
Clean, parameterized system prompts and prompt engineering scripts structured to prevent prompt injection and keep models aligned.
Quantized .tflite files optimized for microcontrollers, alongside hardware integration scripts (C++).
API route endpoint codes implementing SSE (Server-Sent Events) or WebSockets to stream conversational responses character-by-character.
Python / C++ scripts utilizing OpenCV and MediaPipe pipelines to track hands, detect faces, or classify target products.
Summary: I bridge raw web components and edge hardware using practical AI modules.
Bespoke assistant nodes. I design chat interfaces (like Fyra AI) that leverage LLM streaming, identify user intent, extract fields (timeline, budget), and switch viewports automatically.
Camera-driven logic. I compile lightweight tracking codes processing webcam coordinates at 30FPS to translate joint movements into physical commands with minimal jitter.
Edge AI. I train wake-word models and local audio classifiers using Edge Impulse, exporting lightweight libraries that compile directly inside ESP32-S3 boards.
Summary: I build AI systems using modern vector databases and optimized inference runtimes.
Summary: This document outlines Jishnu Mahanta's architectural approach to implementing robust Retrieval-Augmented Generation (RAG) vector searches, quantizing model runtimes for microcontrollers, and constructing real-time tracking pipelines.
In corporate environments, generic AI chatbots are liability hazards due to hallucinations. Solving this requires scoping queries strictly using a Postgres vector database extension (pgvector) to verify factual boundaries before invoking the Large Language Model.
Cosine Similarity Scoping
When querying vector databases, setting a static similarity threshold (e.g. similarity > 0.7) is often insufficient. Different documents generate varying density ranges. Implementing a dynamic ranking algorithm that combines full-text search (BM25) with vector similarity (HNSW index) provides far better results and filters out irrelevant prompts before they reach the API.
Below is the SQL implementation I execute to build semantic search indexing and custom matching inside Supabase:
-- Create database schema to store technical document embeddings
CREATE EXTENSION IF NOT EXISTS pgvector;
CREATE TABLE IF NOT EXISTS document_chunks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
content TEXT NOT NULL,
metadata JSONB,
embedding vector(1536) -- Match standard OpenAI / Gemini embedding dimensions
);
-- Build Hierarchical Navigable Small World (HNSW) index for sub-millisecond retrieval
CREATE INDEX ON document_chunks USING hnsw (embedding vector_cosine_ops);
-- Create database function to perform cosine similarity search
CREATE OR REPLACE FUNCTION match_documents (
query_embedding vector(1536),
match_threshold float,
match_count int
)
RETURNS TABLE (
id uuid,
content text,
metadata jsonb,
similarity float
)
LANGUAGE plpgsql AS $$
BEGIN
RETURN QUERY
SELECT
dc.id,
dc.content,
dc.metadata,
1 - (dc.embedding <=> query_embedding) AS similarity
FROM document_chunks dc
WHERE 1 - (dc.embedding <=> query_embedding) > match_threshold
ORDER BY dc.embedding <=> query_embedding
LIMIT match_count;
END;
$$;
System Prompt Sandboxing
Always wrap retrieved context fragments inside system boundaries, instructing the model to decline answering if the data is not present in the injected tags. This eliminates typical hallucination routes.
Running deep learning models locally on edge microcontrollers (like wake-word detection or gesture classification) requires converting 32-bit floats into 8-bit integers using Post-Training Quantization (PTQ).
Unquantized Models on ESP32
A common mistake is attempting to compile standard Keras or PyTorch models directly to microcontrollers. High parameter sizes will trigger immediate memory allocation failures (out of memory) during load. Quantization is mandatory to squeeze weights into the ESP32-S3's 512KB SRAM.
tf.lite.TFLiteConverter to scale parameters..tflite model into a C++ hex byte array to compile directly with the firmware source.import tensorflow as tf
# Load baseline float model
converter = tf.lite.TFLiteConverter.from_saved_model('model_dir')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Define representative dataset for integer range mapping
def representative_dataset():
for data in sample_data_generator():
yield [data.astype(np.float32)]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
# Convert and save
quantized_model = converter.convert()
with open('wake_word_model.tflite', 'wb') as f:
f.write(quantized_model)
SmartPot Edge Classifier
During the SmartPot companion development, the unquantized wake-word model was 1.2MB, failing to fit inside flash. After applying 8-bit integer quantization, the model footprint dropped to 142KB, running inference in 42ms on the ESP32-S3 core while retaining 96.4% classification accuracy.
To translate webcam frame inputs into robotic movements (like my Smart Robotic Arm control loop), joint offsets are processed locally on the client host using OpenCV and Google's MediaPipe:
Webcam Frame (30FPS)
|
v
MediaPipe Landmarker (Joints extraction)
|
v
Vector Mapping (Mouth/Hand angles)
|
v
Jitter Smoothing (Moving average)
|
v
WebSocket Client (Send servo angles)
|
v
PCA9685 PWM Driver (Move joints)
By smoothing coordinates on the host side using a moving average filter, we prevent servo jitters and ensure fluid robotic joints rotation with sub-50ms latency.
Summary: I develop AI features through iterative evaluation, ensuring model accuracy is verified against real edge queries.
Phase 1 of 9
We outline what the AI system must do. I analyze if the requirements can be met with structured prompting, a RAG vector backend, or if it requires a specialized local vision or voice model.
Dynamic projects fetched from the portfolio database demonstrating execution.

Custom knowledge base assistant that qualifies visitors, recommends projects, and captures leads in real-time.
AI & Full-Stack Engineer
View Details →
Voice-controlled ESP32 plant assistant with edge AI and sensor intelligence.
Firmware & AI Systems Engineer
View Details →
Gesture-controlled multi-axis arm with computer vision.
Firmware & System Engineer
View Details →Summary: Agencies often drag out projects by attempting custom model training. I integrate proven foundation APIs with lightweight local edge ML, cutting delivery times in half.
| Integration Factor | My AI Engineering | Typical Agency / Wrapper Boilerplate |
|---|---|---|
| Hallucination Protection | ✓ Vector similarity checks and strict context boundaries prevent models from lying. | ❌ Simple chat templates that easily drift and hallucinate off-topic. |
| Edge AI / Local Offline Inference | ✓ Quantized model integration directly on physical hardware (ESP32-S3). | ❌ Lacks hardware understanding; relies entirely on cloud API calls. |
| Security & Abuse Safeguards | ✓ Zod schema checks, rate limiters, and system instruction guardrails. | ⚠️ Vulnerable to prompt injection; easily abused to rack up bills. |
| Interface Interactivity | ✓ Context-aware inputs and interactive elements (like sliding forms). | ❌ Basic chat windows that only display static Markdown outputs. |
Summary: In Guwahati, I provide local AI diagnostics, on-site vector search integration, real-time computer vision testing, and face-to-face workflow consulting, ensuring low-latency results.
For clients across other major Indian tech hubs (Bengaluru, Hyderabad, Pune, Chennai, Mumbai, Delhi NCR) and global locations (US, Canada, UK, Australia, Germany, Singapore), I provide remote collaboration via GitHub, staging API deployments, and sandbox vector database integrations.
Summary: Read my engineering notes on prompt boundaries and edge ML.
How to quantize post-training weights to 8-bit integers and optimize sensor polling tasks in FreeRTOS.
A deep dive into setting cosine similarity boundaries in PostgreSQL to enforce strict context scoping.
Code walkthrough showing OpenCV landmark capture, WebSocket pipelines, and PCA9685 jitter reduction.
Structured query answers targeting specific informational searches.
Ready to integrate custom document search, RAG agents, or real-time computer vision into your app? Let's scope the architecture.