Vishva Ram | Generative AI Engineer

LLM Optimization * Hybrid RAG * Agentic Systems

Professional Profile

I am a Generative AI Engineer working where high-performance inference meets complex RAG orchestration. With 2 years of experience, I focus on turning research-grade models into production-ready systems using quantization (AutoRound), vLLM serving optimization, and stateful agentic workflows.

Current Focus: Investigating compatibility between AutoRound W4A16 quantization and vLLM for Vision-Language Models.
Core Philosophy: "Optimization is not just about speed; it's about making advanced AI architecturally sustainable."

Core Expertise

LLM Optimization & Inference

Quantization: W4A16 (4-bit weight, 16-bit activation) quantization using AutoRound.
High-Throughput Serving: Production deployment of LLMs/VLMs via vLLM.
GPU Efficiency: Memory-efficient serving and compute optimization for NVIDIA RTX 3090/4070/5090 and AWS G5 instances.
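
To make the W4A16 idea concrete, here is a minimal sketch of group-wise 4-bit weight quantization in plain Python. It uses simple round-to-nearest, which is a stand-in and not AutoRound's own rounding optimization; the function names and the sample weights are hypothetical. Activations are left untouched, which is what the "A16" part refers to.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float, int]:
    """Asymmetric round-to-nearest quantization of one weight group.

    Illustrative only: AutoRound tunes the rounding instead of using plain round().
    """
    qmax = (1 << bits) - 1  # 15 for 4-bit codes
    # Force the range to include zero so the zero point always fits in [0, qmax].
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    codes = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero_point) * scale for c in codes]


if __name__ == "__main__":
    group = [-0.8, -0.1, 0.0, 0.35, 0.7]  # hypothetical weight group
    codes, scale, zp = quantize_group(group)
    restored = dequantize_group(codes, scale, zp)
    print(codes, scale, zp)
    print(restored)
```

Each weight is stored as a 4-bit code plus a shared scale and zero point per group, so the reconstruction error per weight stays within about half a quantization step.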

RAG & Search Engineering

Hybrid Retrieval: Orchestrating BM25 (Elasticsearch) and Dense Vector Search.
Advanced RAG: Implementation of LightRAG and GraphRAG architectures for complex document intelligence.
Infrastructure: Building scalable backends with PostgreSQL, Neo4j, and FastAPI.
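
One common way to merge a BM25 ranking with a dense-vector ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF as the fusion step, which this profile does not confirm; the document IDs and the two ranked lists are hypothetical stand-ins for results returned by Elasticsearch and a vector index.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in; scores are summed.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_hits = ["d1", "d2", "d3"]   # hypothetical keyword results
    dense_hits = ["d3", "d1", "d4"]  # hypothetical vector results
    print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
```

RRF only needs ranks, not raw scores, so BM25 scores and cosine similarities never have to be normalized onto a shared scale.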

Agentic Workflows

Stateful Agents: Designing multi-step reasoning pipelines using LangGraph.
Tool Integration: Autonomous agent systems with persistent memory and complex tool-calling capabilities.
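
The stateful, multi-step pattern can be sketched without any framework: nodes read and update a shared state, and each node returns the name of the next node. This is plain Python that mirrors the idea only; it is not the LangGraph API, and the node names, the glossary tool, and the state fields are all hypothetical.

```python
from typing import Callable, Dict, List, TypedDict

END = "__end__"  # sentinel node name that stops the run


class AgentState(TypedDict):
    question: str
    notes: List[str]
    answer: str


# Hypothetical tool: a dictionary lookup standing in for a real search call.
GLOSSARY = {"w4a16": "4-bit weights with 16-bit activations"}


def glossary_tool(term: str) -> str:
    return GLOSSARY.get(term.lower(), "no entry found")


def plan(state: AgentState) -> str:
    state["notes"].append(f"plan: look up '{state['question']}'")
    return "lookup"


def lookup(state: AgentState) -> str:
    state["notes"].append(glossary_tool(state["question"]))
    return "respond"


def respond(state: AgentState) -> str:
    state["answer"] = f"{state['question']}: {state['notes'][-1]}"
    return END


NODES: Dict[str, Callable[[AgentState], str]] = {
    "plan": plan,
    "lookup": lookup,
    "respond": respond,
}


def run_graph(state: AgentState, nodes: Dict[str, Callable[[AgentState], str]],
              entry: str, max_steps: int = 10) -> AgentState:
    """Run nodes until one returns END, with a step limit against infinite loops."""
    current, steps = entry, 0
    while current != END:
        if steps >= max_steps:
            raise RuntimeError("step limit reached before END")
        current = nodes[current](state)
        steps += 1
    return state


if __name__ == "__main__":
    start: AgentState = {"question": "W4A16", "notes": [], "answer": ""}
    print(run_graph(start, NODES, "plan")["answer"])
```

Keeping all intermediate results in one state object is what lets later steps, or a resumed run, see what earlier steps did.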

Contributions

LightRAG Contributor

Active contributor to the HKUDS/LightRAG ecosystem, focusing on enterprise-grade integrations:

Gemini Integration: Implemented the Google Gemini demo for the core framework.
Storage Backends: Developed the PostgreSQL-backed LightRAG implementation.
Enterprise Features: Added workspace isolation demos for multi-tenant knowledge management.
PRs: #2538 | #2556 | #2615

AutoRound + vLLM Compatibility

Technical investigation into the export pipelines for quantized Vision-Language Models:

Analyzing AutoRound W4A16 behavior with Qwen3-VL-8B.
Debugging AWQ export compatibility for seamless vLLM integration.
Issue: intel/auto-round #1377

Technical Stack

Category	Tools & Technologies
AI Frameworks	LangChain, LangGraph, LightRAG, PyTorch
Inference/Quant	vLLM, AutoRound
Data/Search	Elasticsearch, PostgreSQL, Neo4j, Redis
Infrastructure	Docker, AWS (ECS, ECR, G5), RunPod
Languages	Python (Advanced), SQL

Connect

LinkedIn: vishva-r
Instagram: @justt_vishva
Portfolio: GitHub Repositories

Building the infrastructure that makes AI smarter, faster, and more accessible.

Pinned

Structured-Output-Examples-for-LLMs

This repository demonstrates structured data extraction using various language models and frameworks. It includes examples of generating JSON outputs for name and age extraction from text prompts. ...

Python 8

Data-Prep-for-LLM-fine-tuning

This repository helps prepare datasets for fine-tuning Large Language Models (LLMs). It includes tools for cleaning, formatting, and augmenting data to improve model performance. Designed for resea...

Jupyter Notebook 1

Blog-Writing-Agentic-RAG-CrewAI

An automated blog writing system that leverages CrewAI to create high-quality, well-researched blog posts. The project implements a multi-agent workflow for researching topics, generating content, ...

Python

Unsloth-FineTuning

Fine-tuning Qwen 2.5 3B on Reserve Bank of India (RBI) regulations using Unsloth for efficient training. Achieved 57.6% accuracy (8.2x improvement over base model).

Jupyter Notebook

Vishva R vishvaRam
