Skip to content

AI & Intelligent Automation Solutions

Developing custom RAG pipelines, fine-tuned LLM agents, and real-time computer vision models. I bridge the gap between AI research and production software.

Custom RAG (Retrieval-Augmented Generation) Architectures
Computer Vision & Gesture Recognition (MediaPipe, OpenCV)
Edge AI & Quantized Model Flashing (TFLite)
Automated Lead Capture & Conversational Agents
Local Speech-to-Text & Wake Word Detection
API Integrations with Gemini, OpenAI, & Claude
Core Track Record

3+ Years

Building systems

Collaboration Mode

Remote + Onsite (Guwahati Area)

Prompt/Model Tuning → Vector Cache Sync → Production API Deploy

Platforms Used
Google Gemini APIOpenCVMediaPipeTensorFlow Lite

Optimization Summary

Summary: Generic AI chatbots hallucinate and suffer from high API costs. I build structured RAG pipelines and local classifiers to keep logic strict.

Common Obstacle

AI chatbot hallucinates details or invents services I don't offer.

Engineering Resolution

I implement strict semantic search routing using vector embeddings (via Supabase Vector). The system forces the LLM to reply only using retrieved context, falling back to a structured brief capture form if the query is out-of-bounds.

Common Obstacle

API token costs are skyrocketing and response times are too slow.

Engineering Resolution

I write lightweight token buckets and rate limiters. For simple patterns, I build local classifiers, compile local models, or optimize prompt payloads to reduce token overhead by up to 40%.

Common Obstacle

Need AI computer vision, but camera streams are high-latency.

Engineering Resolution

I build localized frame processing loops using OpenCV and MediaPipe. Joint positions are processed at 30FPS directly on the host machine, streaming only processed coordinate offsets to connected hardware.

Common Obstacle

Running AI models on device requires too much RAM and power.

Engineering Resolution

I quantize neural networks (using TensorFlow Lite) into 8-bit integers, reducing flash footprints. We run models locally on edge processors (ESP32-S3 or Jetson Nano) with sub-100ms processing cycles.

Is This Service Right For You?

Summary: I build specialized, low-latency AI integrations that map to your private documents or hardware telemetry. I don't build generic wrapper bots.

Startups & SaaS Builders

Companies seeking to embed custom intelligence (like my Fyra lead qualifying assistant) directly into their Next.js apps to increase user retention and capture briefs.

Retailers & Tea Estates

Assam tea gardens or local brands wanting visual automated sortings, or customer journey portals showing immersive cultural histories and recommendations.

Healthcare & Biotech Clinics

Medical teams requiring secure, local document parsing tools that answer patient inquiries using verified journals without leaking records to public LLMs.

Hobbyists & Inventors

Creators seeking custom OpenCV gesture models (like hand tracking loops) mapped directly to mechanical actuators over WebSocket lines.

Typical AI Deliverables

Summary: Fully deployed, validated, and open-source models ready to integrate with your existing codebase.

Database

Vector Search Schemas

Fully configured Supabase Postgres tables with pgvector extensions, semantic similarity functions, and index scaling configurations.

Software

Custom Prompt Templates & Handlers

Clean, parameterized system prompts and prompt engineering scripts structured to prevent prompt injection and keep models aligned.

Machine Learning

Quantized Edge AI Model Files

Quantized .tflite files optimized for microcontrollers, alongside hardware integration scripts (C++).

Software

Next.js Streaming Handlers

API route endpoint codes implementing SSE (Server-Sent Events) or WebSockets to stream conversational responses character-by-character.

Software

Computer Vision Scripts

Python / C++ scripts utilizing OpenCV and MediaPipe pipelines to track hands, detect faces, or classify target products.

Specialized Building Areas

Summary: I bridge raw web components and edge hardware using practical AI modules.

Intelligent Conversational Agents

Bespoke assistant nodes. I design chat interfaces (like Fyra AI) that leverage LLM streaming, identify user intent, extract fields (timeline, budget), and switch viewports automatically.

Local Computer Vision pipelines

Camera-driven logic. I compile lightweight tracking codes processing webcam coordinates at 30FPS to translate joint movements into physical commands with minimal jitter.

On-Chip Audio & Neural Classifiers

Edge AI. I train wake-word models and local audio classifiers using Edge Impulse, exporting lightweight libraries that compile directly inside ESP32-S3 boards.

Technical Stack & Platform Coverage

Summary: I build AI systems using modern vector databases and optimized inference runtimes.

Large Language Models & APIs

Google Gemini API (1.5 Flash/Pro)OpenAI API (GPT-4o)Claude API (Sonnet 3.5)Ollama (Local Llama/Mistral)

Vector Databases & Frameworks

pgvector (Postgres / Supabase)LangChain (Python/TS)Edge Impulse (Edge ML)TensorFlow Lite Converter

Computer Vision & Tracking

OpenCV (C++ / Python)MediaPipe (Hand & Pose Tracking)NumPy / Scikit-LearnMatplotlib

Deployment & Clients

Next.js App Router API RoutesSupabase Edge FunctionsPlatformIO (ESP32 C++)FastAPI (Python)
Related Technologies:Edge AIComputer VisionGemini APISupabaseTypeScriptTensorFlow Lite
Deep Technical Documentation

Engineering Notes & Tradeoffs

Detailed Technical Deep Dive: AI & Intelligent Automation

Summary: This document outlines Jishnu Mahanta's architectural approach to implementing robust Retrieval-Augmented Generation (RAG) vector searches, quantizing model runtimes for microcontrollers, and constructing real-time tracking pipelines.

1. Robust Retrieval-Augmented Generation (RAG) Vector Search

In corporate environments, generic AI chatbots are liability hazards due to hallucinations. Solving this requires scoping queries strictly using a Postgres vector database extension (pgvector) to verify factual boundaries before invoking the Large Language Model.

Cosine Similarity Scoping

When querying vector databases, setting a static similarity threshold (e.g. similarity > 0.7) is often insufficient. Different documents generate varying density ranges. Implementing a dynamic ranking algorithm that combines full-text search (BM25) with vector similarity (HNSW index) provides far better results and filters out irrelevant prompts before they reach the API.

Below is the SQL implementation I execute to build semantic search indexing and custom matching inside Supabase:

-- Create database schema to store technical document embeddings
CREATE EXTENSION IF NOT EXISTS pgvector;

CREATE TABLE IF NOT EXISTS document_chunks (
  id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content      TEXT NOT NULL,
  metadata     JSONB,
  embedding    vector(1536) -- Match standard OpenAI / Gemini embedding dimensions
);

-- Build Hierarchical Navigable Small World (HNSW) index for sub-millisecond retrieval
CREATE INDEX ON document_chunks USING hnsw (embedding vector_cosine_ops);

-- Create database function to perform cosine similarity search
CREATE OR REPLACE FUNCTION match_documents (
  query_embedding vector(1536),
  match_threshold float,
  match_count int
)
RETURNS TABLE (
  id uuid,
  content text,
  metadata jsonb,
  similarity float
)
LANGUAGE plpgsql AS $$
BEGIN
  RETURN QUERY
  SELECT
    dc.id,
    dc.content,
    dc.metadata,
    1 - (dc.embedding <=> query_embedding) AS similarity
  FROM document_chunks dc
  WHERE 1 - (dc.embedding <=> query_embedding) > match_threshold
  ORDER BY dc.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

System Prompt Sandboxing

Always wrap retrieved context fragments inside system boundaries, instructing the model to decline answering if the data is not present in the injected tags. This eliminates typical hallucination routes.


2. Edge AI Model Quantization for Microcontrollers

Running deep learning models locally on edge microcontrollers (like wake-word detection or gesture classification) requires converting 32-bit floats into 8-bit integers using Post-Training Quantization (PTQ).

Unquantized Models on ESP32

A common mistake is attempting to compile standard Keras or PyTorch models directly to microcontrollers. High parameter sizes will trigger immediate memory allocation failures (out of memory) during load. Quantization is mandatory to squeeze weights into the ESP32-S3's 512KB SRAM.

The Post-Training Quantization Pipeline in Python:

  • Quantization Configuration: Use tf.lite.TFLiteConverter to scale parameters.
  • Representative Dataset: Feed a sample dataset to map float parameters to integer ranges.
  • Flashing Header Export: Convert the resulting .tflite model into a C++ hex byte array to compile directly with the firmware source.
import tensorflow as tf

# Load baseline float model
converter = tf.lite.TFLiteConverter.from_saved_model('model_dir')
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Define representative dataset for integer range mapping
def representative_dataset():
    for data in sample_data_generator():
        yield [data.astype(np.float32)]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# Convert and save
quantized_model = converter.convert()
with open('wake_word_model.tflite', 'wb') as f:
    f.write(quantized_model)

SmartPot Edge Classifier

During the SmartPot companion development, the unquantized wake-word model was 1.2MB, failing to fit inside flash. After applying 8-bit integer quantization, the model footprint dropped to 142KB, running inference in 42ms on the ESP32-S3 core while retaining 96.4% classification accuracy.


3. Real-Time Gesture Tracking Pipelines

To translate webcam frame inputs into robotic movements (like my Smart Robotic Arm control loop), joint offsets are processed locally on the client host using OpenCV and Google's MediaPipe:

  Webcam Frame (30FPS)
          |
          v
  MediaPipe Landmarker (Joints extraction)
          |
          v
  Vector Mapping (Mouth/Hand angles)
          |
          v
  Jitter Smoothing (Moving average)
          |
          v
  WebSocket Client (Send servo angles)
          |
          v
  PCA9685 PWM Driver (Move joints)

By smoothing coordinates on the host side using a moving average filter, we prevent servo jitters and ensure fluid robotic joints rotation with sub-50ms latency.

My AI Automation Process

Summary: I develop AI features through iterative evaluation, ensuring model accuracy is verified against real edge queries.

01. Feasibility Study

Defining Accuracy Boundaries

Phase 1 of 9

We outline what the AI system must do. I analyze if the requirements can be met with structured prompting, a RAG vector backend, or if it requires a specialized local vision or voice model.

Featured Project Deliverables

Dynamic projects fetched from the portfolio database demonstrating execution.

Fyra AI Portfolio Assistant Interface - Clean Full Mockup
AI
Custom knowledge basePrompt engineering

Fyra — AI Portfolio Assistant

Custom knowledge base assistant that qualifies visitors, recommends projects, and captures leads in real-time.

Next.jsTypeScriptGemini APISupabase

AI & Full-Stack Engineer

View Details →
SmartPot plant sensor
IoT
Edge AIVoice Control

SmartPot — AI Plant Companion

Voice-controlled ESP32 plant assistant with edge AI and sensor intelligence.

ESP32FreeRTOSTensorFlow LiteMQTT

Firmware & AI Systems Engineer

View Details →
Robotic arm
IoT
RoboticsComputer Vision

Smart Robotic Arm

Gesture-controlled multi-axis arm with computer vision.

ESP32OpenCVMediaPipeWebSockets

Firmware & System Engineer

View Details →

Why Hire a Specialized AI Systems Engineer?

Summary: Agencies often drag out projects by attempting custom model training. I integrate proven foundation APIs with lightweight local edge ML, cutting delivery times in half.

Integration FactorMy AI EngineeringTypical Agency / Wrapper Boilerplate
Hallucination Protection✓ Vector similarity checks and strict context boundaries prevent models from lying.❌ Simple chat templates that easily drift and hallucinate off-topic.
Edge AI / Local Offline Inference✓ Quantized model integration directly on physical hardware (ESP32-S3).❌ Lacks hardware understanding; relies entirely on cloud API calls.
Security & Abuse Safeguards✓ Zod schema checks, rate limiters, and system instruction guardrails.⚠️ Vulnerable to prompt injection; easily abused to rack up bills.
Interface Interactivity✓ Context-aware inputs and interactive elements (like sliding forms).❌ Basic chat windows that only display static Markdown outputs.

Coverage Area & Physical Location

Summary: Based in Guwahati, I deploy custom AI integrations, LLM workflows, and computer vision models statewide across Assam. Virtual consulting and remote server setup are backed by on-site systems integration in Jorhat, Dibrugarh, Silchar, Nagaon, and Tezpur.

On-site Delivery Areas

Guwahati
Jorhat
Dibrugarh
Silchar
Nagaon
Tezpur
Tinsukia
Sivasagar
Golaghat
Barpeta
North Lakhimpur
Bongaigaon
Dhubri
Kokrajhar
Hailakandi
Karimganj

Remote Collaboration

For clients across other major Indian tech hubs (Bengaluru, Hyderabad, Pune, Chennai, Mumbai, Delhi NCR) and global locations (US, Canada, UK, Australia, Germany, Singapore), I provide remote collaboration via GitHub, staging API deployments, and sandbox vector database integrations.

Base: Guwahati, Assam, IndiaGSC Verified

Deep Technical Guides & AI Resources

Summary: Read my engineering notes on prompt boundaries and edge ML.

9 min read

Quantizing TensorFlow Lite Models for ESP32-S3 Deployments

How to quantize post-training weights to 8-bit integers and optimize sensor polling tasks in FreeRTOS.

7 min read

Preventing LLM Hallucinations with pgvector and Semantic Thresholding

A deep dive into setting cosine similarity boundaries in PostgreSQL to enforce strict context scoping.

12 min read

Real-Time Joint Tracking: Interfacing MediaPipe with Servo Drivers

Code walkthrough showing OpenCV landmark capture, WebSocket pipelines, and PCA9685 jitter reduction.

Frequently Asked Questions

Structured query answers targeting specific informational searches.

RAG is a technique where an LLM is paired with a searchable database of embeddings. When a user asks a question, the system searches the database for relevant fragments first, then feeds those fragments to the LLM as context. This guarantees factual answers.
For low latency and cost-effectiveness, Gemini 1.5 Flash is excellent. For complex data structuring, Claude 3.5 Sonnet is outstanding. I recommend the best API based on your budget.
A functional conversational assistant with custom vector search can be assembled and running in a staging sandbox in 2 to 3 weeks.
Yes. By optimizing pipelines, utilizing lightweight frameworks like MediaPipe, and compiling OpenCV with hardware acceleration, we achieve 30FPS tracking on Raspberry Pi 4/5 or Jetson Nano.

Let's Automate Your Systems with AI

Ready to integrate custom document search, RAG agents, or real-time computer vision into your app? Let's scope the architecture.

FyraAsk anything