Available for Projects

Hit Kalariya

Voice AI Architects

Voice AI Platforms · Autonomous Agents · Enterprise RAG · Distributed Systems

100% Job Success
8 Projects
$1000+ Earned
8 AI Systems Built

Not just a single
AI engineer.

Most agencies or solo developers build basic API wrappers. We are a specialized team of highly experienced AI engineers, full-stack engineers, and distributed systems architects who build, optimize, and deploy high-performance, production-grade AI systems.

Led by Hit Kalariya, our team has hands-on experience delivering dedicated solutions for every dimension of modern artificial intelligence. We build custom multi-agent environments, design enterprise-grade search infrastructures, compile optimized edge vision pipelines, and deploy serverless model hosting at scale.

🎙️
Voice AI
Custom low-latency voice pipelines with proper barge-in detection and multilingual switching.
🤖
AI Agents
Autonomous multi-agent systems, complex tool-use execution, and reasoning loops.
🔍
Enterprise RAG
Schema-aware hybrid retrieval, graph-based navigation, and high-precision search.
👁️
Computer Vision
Real-time object detection, segmentation, and perimeter tracking on edge/Jetson hardware.
Hosting & Inference
High-throughput serving and low-latency deployments using vLLM, SGLang, and TensorRT.

What We Build for Clients

01

Production RAG Systems

Hybrid, recursive, graph, and agentic retrieval pipelines with HyDE, cross-encoder reranking, and chain-of-table reasoning.

LlamaIndex · Pinecone · pgvector · BM25
02

High-Throughput LLM Inference

Scalable serving using vLLM, SGLang, TensorRT-LLM, and Triton with paged attention, FlashAttention, and speculative decoding.

vLLM · SGLang · TensorRT-LLM · Triton
03

Edge AI Deployment

ONNX, TensorRT, OpenVINO, Core ML, and TFLite with INT8/FP16 quantization, optimized for Jetson, Raspberry Pi, Android, and iOS.

ONNX Runtime · TensorRT · OpenVINO · Core ML
04

AI Agents & Multi-Agent Systems

LangGraph, LangChain, LlamaIndex, CrewAI, AutoGen, and MCP architectures. ReAct agents, tool-use, memory orchestration.

LangGraph · CrewAI · AutoGen · MCP
05

Computer Vision Pipelines

YOLOv8/11, SAM2, GroundingDINO, DETR, Mask R-CNN pipelines with real-time GPU inference and DeepStream multi-stream processing.

YOLOv8 · SAM2 · Mask R-CNN · DeepStream
06

Voice AI Calling Platforms

Custom media pipelines with barge-in/interruption detection, dynamic language switching, self-hosted LLMs, and low latency measured in milliseconds.

SIP/Trunking · Custom VAD · Self-Hosted LLMs · Indic Switching
07

Text-to-SQL & Multilingual NLP

Complex enterprise schemas, multilingual querying in Hindi, Tamil, Gujarati, and Telugu with IndicBERT and auto-correction loops.

GPT-4o · IndicBERT · Reranking · HyDE
08

AI Infrastructure & MLOps

Docker, Kubernetes, Ray, Celery, Redis, Kafka. Observability with LangSmith, MLflow, W&B. CI/CD and autoscaling deployments.

Kubernetes · Kafka · MLflow · W&B

8 Systems. Real Complexity.

01
Generative AI EdTech
MEMORY: 4-TIER_RAG

Teaching Assistant

Multi-Agent Autonomous Tutoring Platform

"Two AI models. One visible. One invisible, silently orchestrating everything. Adaptive student guidance in real time."

02
Voice AI Production
LATENCY: < 150ms // CONCURRENT: 1000+

Voice AI Calling Agent

Enterprise voice agent platform with custom pipeline

"Custom low-latency pipeline with proper interruption/barge-in, language switching, self-hosted LLMs, and scale telephony."

03
Computer Vision Safety
INFERENCE: 1.5s // JETSON

Forest Surveillance

Wildlife & Perimeter Defense with Computer Vision

"Multi-model computer vision ensemble processing live drone streams, detecting and geolocating perimeter threats."

04
Enterprise AI RAG
WAREHOUSES: 150+ // 6_LANG

Enterprise RAG

Natural Language Data Access Across 150+ Warehouses

"A query in Tamil answers from 3,000+ tables across 150 warehouses in under three seconds. Schema-aware RAG."

05
Edge AI MLOps
SIZE: -65% // ONNX_INT8

Edge PaliGemma

Real-Time Vision-Language Inference on Edge Hardware

"Google's PaliGemma VLM made production-ready on Jetson Orin Nano, Raspberry Pi, and mobile devices."

06
Generative AI Speech
ASR_LATENCY: < 200ms

Video Translation

Invisible Dubbing with Lip Sync AI

"Translated speech synchronizing lip movements in real time across live video streams, preserving speaker timbre."

07
Artificial Intelligence SaaS
SRS_LATENCY: < 60s

TaskPilot Labs

AI-Powered Project Management Platform

"A two-sentence brief. A fully structured SRS, feature breakdown, and Kanban board — in under 60 seconds."

08
FinTech Distributed Systems
CAPACITY: 10B+ TX/mo

SAMPARK

Digital Payment Infrastructure Simulation

"How does UPI process 10 billion transactions without failing? SAMPARK makes that UPI-class architecture visible."


Two self-hosted LLMs running simultaneously. The visible speech-to-speech tutor interacts live with students. The hidden text-to-text orchestrator silently injects context, memory, difficulty signals, and guidance into the tutor's prompts in real time.

Four-Tier Long-Term Memory — Academic, Contextual, Personal, and Preference memory layers. Every session opens with a personalized recap and dynamic study plan.
Dynamic Knowledge Graph — Per-student graph that continuously updates. Auto-advances on mastery, silently reroutes to prerequisites on struggle.
Behavioral Monitoring — Screen attention analysis, camera signals, focus redirection for sustained engagement.
AI Models: Dual self-hosted LLMs (Speech-to-Speech + Text-to-Text)
Agents: LangGraph · ReAct · Real-time prompt injection pipeline
Memory: MongoDB · Four-tier RAG · Semantic chunking
Voice: Whisper ASR · Neural TTS · WebRTC real-time audio
Search: pgvector / Qdrant · Cosine similarity memory recall
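
The "auto-advance on mastery, reroute to prerequisites on struggle" behavior of the per-student knowledge graph can be illustrated with a small sketch. This is not the production implementation: the class names, the two thresholds, and the moving-average update are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

MASTERY_THRESHOLD = 0.8   # assumed cutoff for "mastered"
STRUGGLE_THRESHOLD = 0.4  # assumed cutoff for "struggling"


@dataclass
class TopicNode:
    name: str
    prerequisites: list[str] = field(default_factory=list)
    next_topics: list[str] = field(default_factory=list)
    mastery: float = 0.0  # running score in [0, 1]


class StudentGraph:
    """Hypothetical per-student graph; names and thresholds are illustrative."""

    def __init__(self, nodes: list[TopicNode]):
        self.nodes = {n.name: n for n in nodes}

    def record_score(self, topic: str, score: float, alpha: float = 0.5) -> None:
        # Exponential moving average so recent answers weigh more.
        node = self.nodes[topic]
        node.mastery = (1 - alpha) * node.mastery + alpha * score

    def next_topic(self, current: str) -> str:
        node = self.nodes[current]
        if node.mastery >= MASTERY_THRESHOLD:
            # Advance to the first follow-up topic that is not yet mastered.
            for name in node.next_topics:
                if self.nodes[name].mastery < MASTERY_THRESHOLD:
                    return name
        elif node.mastery < STRUGGLE_THRESHOLD and node.prerequisites:
            # Reroute to the weakest prerequisite.
            return min(node.prerequisites, key=lambda p: self.nodes[p].mastery)
        return current  # otherwise keep practicing the same topic
```

In a setup like the one described above, the orchestrator model would call `next_topic` after each exchange and inject the result into the tutor's prompt.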

A production-grade Voice AI calling agent platform. Rather than wrapping LiveKit or Pipecat defaults, we engineered a custom end-to-end media pipeline that delivers sub-150 ms audio-in to audio-out latency. It handles barge-in detection, real-time dynamic language switching, and self-hosted LLM inference for 1000+ concurrent calls.

Custom Media Pipeline — Engineered audio processing nodes to bypass standard heavy wrappers, achieving sub-150ms round-trip latency.
Intelligent Barge-in & Interruption — High-fidelity voice activity detection (VAD) coupled with prompt-cancellation mechanics for immediate interruption.
Dynamic Multilingual Routing — Built custom classification loops that switch languages on-the-fly depending on user speech, without restarting the session.
AI Models: Self-hosted Llama-3-Instruct (LLM) · Whisper (ASR)
Pipeline: Custom Audio IO · Python Media Nodes · Telephony (SIP/Trunking)
Scale: 1000+ Concurrent Calls · High-throughput Redis queueing
Latency: Audio-in to audio-out in under 150 ms
Deployment: Dockerized services on production-grade Kubernetes clusters
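
To make the barge-in idea concrete, here is a minimal sketch of VAD-gated interruption. It assumes a simple energy-based VAD over 16-bit PCM frames; the actual platform's VAD is not described at this level, so `BargeInDetector`, its threshold, and the frame counts are hypothetical.

```python
import asyncio
import math
from typing import Sequence


def frame_rms(frame: Sequence[int]) -> float:
    """Root-mean-square energy of one frame of 16-bit PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class BargeInDetector:
    """Energy-based VAD sketch; threshold and frame count are assumptions."""

    def __init__(self, rms_threshold: float = 500.0, min_speech_frames: int = 3):
        self.rms_threshold = rms_threshold
        self.min_speech_frames = min_speech_frames
        self._speech_run = 0  # consecutive frames classified as speech

    def push(self, frame: Sequence[int], agent_speaking: bool) -> bool:
        """Return True once the caller has talked over the agent long enough."""
        if frame_rms(frame) >= self.rms_threshold:
            self._speech_run += 1
        else:
            self._speech_run = 0
        return agent_speaking and self._speech_run >= self.min_speech_frames


async def speak_until_interrupted(tts_chunks, mic_frames, detector):
    """Play TTS chunks in step with mic frames, stopping when the detector fires."""
    played = []
    for chunk, frame in zip(tts_chunks, mic_frames):
        if detector.push(frame, agent_speaking=True):
            break  # a real pipeline would also cancel the in-flight LLM request here
        played.append(chunk)
        await asyncio.sleep(0)  # yield to other calls sharing the event loop
    return played
```

Requiring several consecutive speech frames trades a few tens of milliseconds of reaction time for fewer false interruptions from coughs or line noise.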

Multi-model computer vision ensemble processing satellite imagery and live drone footage, detecting and geolocating threats before encounters occur — specifically designed for Kerala's forest perimeters.

Multi-Model Ensemble — SAM for scene segmentation, YOLOv8 for real-time detection, Mask R-CNN for pixel-level instance separation. NVIDIA DeepStream pipeline.
Geospatial Intelligence — Live GPS + haversine distance estimation. 500m geofenced safety zones trigger simultaneous voice alerts and Firebase push notifications.
Inference Latency: 1.5 seconds on high-resolution aerial imagery after TensorRT FP16 optimization.
Vision: SAM · YOLOv8 · Mask R-CNN · Ensemble inference
Video: NVIDIA DeepStream SDK · Multi-stream GPU
Geo: GPS · Haversine · Polygon geofencing · GeoJSON
Speed: TensorRT FP16 · CUDA · 1.5s inference latency
Alerts: Firebase Cloud Messaging · Voice alerts · WebSocket
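
The haversine distance check behind the 500 m safety zones can be sketched as follows. This version treats a zone as a circle around a center point, which is a simplification; the polygon geofencing listed above would need a point-in-polygon test instead. The function names and the mean Earth radius constant are illustrative choices.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_safety_zone(detection, zone_center, radius_m: float = 500.0) -> bool:
    """True if a detected object lies within radius_m of the zone center."""
    return haversine_m(*detection, *zone_center) <= radius_m
```

A detection that returns True here would be the trigger for the voice alert and push notification path.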

Multi-stage advanced RAG pipeline with HyDE, recursive retrieval, schema-aware chunking, graph-based navigation, hybrid search + cross-encoder reranking, and chain-of-table reasoning — deployed across real enterprise supply chain infrastructure.

3,000+ SQL Tables · 150+ Warehouses — Complex joins, nested aggregations, multi-warehouse cross-queries at sub-3-second response time.
6+ Indian Languages — Hindi, Tamil, Gujarati, Telugu, Marathi, Bengali with IndicBERT and language detection pre-processing.
HyDE + Chain-of-Table Reasoning — Synthetic hypothetical answers drive schema retrieval. Multi-table join strategies planned before SQL generation.
Retrieval: LlamaIndex · Pinecone · pgvector · BM25 hybrid
Reranking: Cross-encoder (BGE) · Cohere Rerank API
LLM: GPT-4o / Claude 3.5 Sonnet · Structured output
SQL: Query-plan analysis · Dry-run execution · Auto-correction
NLP: IndicBERT · Translation pre-processing · Lang detection
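
One common way to merge BM25 and vector results before reranking is reciprocal rank fusion (RRF). The sketch below uses RRF as an assumption for illustration; the fusion method actually used in this pipeline is not stated above, and the table names in the usage note are made up.

```python
from collections import defaultdict
from typing import Iterable, Sequence


def reciprocal_rank_fusion(rankings: Iterable[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of IDs into one list, best first.

    Each list contributes 1 / (k + rank) per item; k = 60 is the value
    commonly used in the RRF literature.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a BM25 ranking `["orders", "inventory", "shipments"]` with a vector ranking `["inventory", "returns", "orders"]` promotes the tables both retrievers agree on, and the fused list would then go to the cross-encoder reranker.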

A complete, end-to-end four-stage optimization pipeline that takes Google's PaliGemma vision-language model and makes it production-ready on Jetson Orin Nano, Raspberry Pi, Android, and iOS — without meaningful accuracy loss.

Stage 1 — PyTorch → ONNX: Operator-level graph optimization, node fusion, MLIR cross-platform operator lowering.
Stage 2 — INT8/FP16 Quantization: Calibration datasets validate that accuracy loss stays under 2% relative to the baseline.
Stage 3 — TensorRT Engine Compilation: Ampere-architecture kernel fusion, layer optimization, memory layout tuning. OpenVINO for Intel targets.
Results: −65% model size · <2% accuracy loss · Real-time on every target platform.
Size: −65% from baseline after quantization + pruning
Accuracy: <2% loss, validated on calibration datasets
Targets: Jetson Orin Nano · Raspberry Pi · Android · iOS
Tools: TensorRT · ONNX Runtime · MLIR · OpenVINO · Core ML
Quant: INT8 (CPU) · FP16 (GPU) · Mixed precision
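
The arithmetic behind INT8 quantization can be shown in a few lines. This is a sketch of symmetric per-tensor quantization in plain Python, assuming a single scale per tensor; production toolchains such as TensorRT or ONNX Runtime use calibrated, often per-channel, schemes, so treat the helper names and the scheme itself as illustrative.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: floats -> int8 values plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    # Round to the nearest integer step and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map int8 values back to approximate floats."""
    return [v * scale for v in q]


def max_abs_error(weights: list[float]) -> float:
    """Worst-case round-trip error; bounded by scale / 2 for in-range values."""
    q, scale = quantize_int8(weights)
    return max(abs(w - r) for w, r in zip(weights, dequantize(q, scale)))
```

Each INT8 weight occupies one byte instead of four for FP32, and the per-weight rounding error is at most half a quantization step. Calibration data is what tells you whether that error is tolerable for a given layer.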

Fine-tuned LatentSync diffusion-based lip sync model on 4,000 custom samples. Translates, synthesizes, and synchronizes lip movements in real time across live video streams — outperforming Wav2Lip, GAN-Wav2Lip, and Wav2LipHD baselines.

Real-Time ASR — Whisper large-v3 with <200ms latency, speaker diarization across 99 languages.
Neural TTS + Voice Cloning — XTTS-v2 synthesizes translated speech in the original speaker's voice — preserving timbre, prosody, and emotional cadence.
200+ Language Pairs — NLLB-200 and SeamlessM4T with contextual coherence — preserving meaning and register, not just literal words.
Superior to all baselines — Optimized for live meeting latency on a 4,000-sample multilingual meeting-specific domain dataset.
Lip Sync: Fine-tuned LatentSync · 4,000-sample custom dataset
ASR: Whisper large-v3 · Speaker diarization · 99 languages
Translation: NLLB-200 · SeamlessM4T · 200+ language pairs
Voice: XTTS-v2 · Speaker-conditioned TTS · Prosody
Compositing: Face detection · Landmark tracking · Real-time blend

Most software teams hemorrhage time before a single line of code is written. TaskPilot Labs eliminates this bottleneck completely — replacing days of planning with seconds of intelligent automation that understands intent, not just instructions.

Intelligent Requirement Analysis — Reasons about intent, identifies ambiguities, surfaces edge cases, converts rough briefs into specs.
Automated SRS Generation — Complete Software Requirements Specifications in 30 seconds, work that used to take a senior engineer a full day.
Smart Kanban & Auto-Assignment — Tasks generated directly from analyzed requirements, pre-loaded with context, priority, and assignees.
AI Research Assistant (RAG-Powered) — Architecture Q&A, technology recommendations grounded in real-time internet knowledge.
Frontend: Next.js 14 · React · TypeScript · Tailwind CSS
AI Engine: Google Gemini 1.5 Pro · LangChain · Prompt Chaining
RAG: Pinecone Vector DB · Embedding Models · HyDE Retrieval
Agents: LangGraph · ReAct Agents · Tool-Use Orchestration
Infra: Cloud-native · Microservices · CI/CD · JWT · OAuth 2.0

A full-fidelity simulation platform that replicates the multi-party architecture of modern payment ecosystems — from transaction routing and event sourcing to settlement workflows and real-time monitoring.

Apache Kafka Event Streaming — Per-partition message ordering, consumer group management, and exactly-once processing semantics via idempotent producers and transactions.
CQRS & Event Sourcing — Immutable event log for full audit trails and time-travel debugging under concurrent transaction load.
Full Observability Stack — Prometheus + Grafana: real-time p99 latency, transaction throughput, error rates — NPCI-class monitoring.
Kubernetes-Native — Docker Compose for dev, HPA for production-scale concurrent transaction simulation.
Runtime: TypeScript · Bun (3× faster than Node.js)
Streaming: Apache Kafka · Kafka Streams · Consumer Groups
Pattern: CQRS · Event Sourcing · Saga Pattern
DB: PostgreSQL · Prisma ORM · Redis Distributed Locks
Monitoring: Prometheus · Grafana · Jaeger Distributed Tracing
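
The CQRS and event sourcing pattern can be sketched in a few dozen lines. SAMPARK itself runs on TypeScript and Kafka; the Python below is only an in-memory illustration, and `EventStore`, `project_balances`, and `transfer` are hypothetical names rather than parts of the real codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    seq: int      # position in the log
    kind: str     # "credited" or "debited"
    account: str
    amount: int   # minor units (e.g. paise) to avoid float rounding


class EventStore:
    """Append-only in-memory log; a stand-in for a Kafka topic or database table."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, kind: str, account: str, amount: int) -> Event:
        event = Event(len(self._events) + 1, kind, account, amount)
        self._events.append(event)
        return event

    def read(self, up_to_seq: Optional[int] = None) -> list[Event]:
        """Return the log, optionally truncated for time-travel queries."""
        return [e for e in self._events if up_to_seq is None or e.seq <= up_to_seq]


def project_balances(events: list[Event]) -> dict[str, int]:
    """Query side: fold events into a per-account balance read model."""
    balances: dict[str, int] = {}
    for e in events:
        delta = e.amount if e.kind == "credited" else -e.amount
        balances[e.account] = balances.get(e.account, 0) + delta
    return balances


def transfer(store: EventStore, src: str, dst: str, amount: int) -> None:
    """Command side: validate against current state, then append events."""
    if project_balances(store.read()).get(src, 0) < amount:
        raise ValueError("insufficient funds")
    store.append("debited", src, amount)
    store.append("credited", dst, amount)
```

Because state is never updated in place, replaying `store.read(up_to_seq=n)` through `project_balances` reconstructs balances as of any point in the log, which is the basis for the audit trail and time-travel debugging mentioned above.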

The Full Arsenal

Python · PyTorch · TensorFlow · HuggingFace · Transformers · LangChain · LangGraph · LlamaIndex · OpenAI · Anthropic · Gemini · FastAPI · vLLM · SGLang · TensorRT-LLM · Triton · ONNX Runtime · TensorRT · OpenVINO · TFLite · CUDA · OpenCV · MediaPipe · Whisper · YOLO · SAM · GroundingDINO · FAISS · Pinecone · Qdrant · MongoDB · PostgreSQL · Redis · Kafka · Docker · Kubernetes · GCP · Vertex AI · AWS · Azure · WebSockets · CrewAI · AutoGen · DeepSpeed · FlashAttention

What Clients Say

★★★★★
"Hit is a solid full-stack engineer who's genuinely easy to work with. He understands both the big picture and the small details, which makes collaboration smooth and efficient."
Vandan Chopra
ipop & AI Tutor Projects · Nov–Dec 2025
Reliable · Collaborative · Clear Communicator
★★★★★
"Hit demonstrated strong expertise in AI/ML, especially in computer vision and agent-based systems. He understood the requirements quickly, delivered innovative solutions, and maintained excellent communication throughout."
Jenish Patel
Computer Vision & Smart AI Agent Solutions · Aug–Nov 2025
Solution Oriented · Committed to Quality · Reliable
★★★★★
"Hit did a good job building out pipeline and testing our MVP. Will work with again."
Henry Chen
Voice AI Agents — MVP Beta Testing · Jun 2026
Voice AI · Pipeline Builder · Reliable
100% Job Success Score
5.0 ★ Average Rating
<24h Avg Response Time
Verified ID & GitHub Linked

Ready to build something
that actually works?

I'll tell you honestly what's possible, what's overhyped, and exactly how I'd build it.
Send me a message on Upwork — I respond within 24 hours.

📍 Surat, Gujarat, India
🕐 IST (UTC+5:30) · Available for remote globally
💬 English: Fluent