Vishva R vishvaRam

Passionate and driven AI Engineer with 1.8 years of hands-on experience in building and deploying cutting-edge Generative AI solutions.

vishvaRam/README.md

Vishva Ram | Generative AI Engineer

LLM Optimization | Hybrid RAG | Agentic Systems


Professional Profile

I am a Generative AI Engineer specializing in the intersection of High-Performance Inference and Complex RAG Orchestration. With 2 years of experience, I focus on transitioning research-grade models into production-ready systems using quantization (AutoRound), vLLM optimization, and stateful agentic workflows.

  • Current Focus: Investigating compatibility between AutoRound W4A16 quantization and vLLM for Vision-Language Models.
  • Core Philosophy: "Optimization is not just about speed; it's about making advanced AI architecturally sustainable."

Core Expertise

LLM Optimization & Inference

  • Quantization: Advanced W4A16/INT4 quantization using AutoRound.
  • High-Throughput Serving: Production deployment of LLMs/VLMs via vLLM.
  • GPU Efficiency: Memory-efficient serving and compute optimization for NVIDIA RTX 3090/4070/5090 and AWS G5 instances.
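To make the W4A16 idea concrete (4-bit weights, 16-bit activations), here is a minimal sketch of group-wise weight quantization. It uses plain round-to-nearest with a per-group scale and zero point, which is an assumption made for illustration: AutoRound tunes the rounding rather than rounding naively, and this is not its API. The function names and the `group_size` default are hypothetical.

```python
from typing import List, Tuple

Group = Tuple[List[int], float, int]  # (4-bit codes, scale, zero point)


def quantize_group(weights: List[float], bits: int = 4) -> Group:
    """Asymmetric round-to-nearest quantization of one weight group."""
    qmax = (1 << bits) - 1  # 15 for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    # Fall back to 1.0 for a constant group, where the range would be zero.
    scale = (w_max - w_min) / qmax or 1.0
    zero_point = max(0, min(qmax, round(-w_min / scale)))
    codes = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero_point) * scale for c in codes]


def quantize_weights(weights: List[float], group_size: int = 4, bits: int = 4) -> List[Group]:
    """Quantize a flat weight list group by group, each with its own scale."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


# Illustrative values only, not taken from any real model.
weights = [-0.8, -0.1, 0.0, 0.35, 0.7, 0.02, -0.03, 0.01]
packed = quantize_weights(weights, group_size=4)
restored = [w for codes, scale, zp in packed for w in dequantize_group(codes, scale, zp)]
```

Each group keeps its own scale, so a single large weight only degrades the precision of its neighbours in the same group rather than the whole tensor.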

RAG & Search Engineering

  • Hybrid Retrieval: Orchestrating BM25 (Elasticsearch) and Dense Vector Search.
  • Advanced RAG: Implementation of LightRAG and GraphRAG architectures for complex document intelligence.
  • Infrastructure: Building scalable backends with PostgreSQL, Neo4j, and FastAPI.
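One common way to combine a BM25 ranking with a dense-vector ranking is Reciprocal Rank Fusion (RRF). The profile does not say which fusion method is used, so RRF is an assumption here, and the document IDs and the `k = 60` constant are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked highly by several retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25) search and a dense vector search.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

RRF works on ranks alone, so it avoids having to normalize BM25 scores and cosine similarities onto a shared scale.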

Agentic Workflows

  • Stateful Agents: Designing multi-step reasoning pipelines using LangGraph.
  • Tool Integration: Autonomous agent systems with persistent memory and complex tool-calling capabilities.
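The pattern behind a stateful, tool-calling agent can be sketched in plain Python: nodes read a shared state, return updates, and name the next node to run. This is a conceptual sketch only and does not use LangGraph's API; the node names, the `calculator` tool, and the rule-based `plan` step (standing in for an LLM call) are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]


def calculator(expression: str) -> str:
    """Hypothetical tool: sums the integers in an 'a+b+c' string."""
    return str(sum(int(part) for part in expression.split("+")))


TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}


def plan(state: State) -> State:
    """Stand-in for an LLM call: pick a tool or move on to answering."""
    if "+" in state["question"] and "tool_result" not in state:
        return {"next": "act", "tool": "calculator", "tool_input": state["question"]}
    return {"next": "answer"}


def act(state: State) -> State:
    """Run the chosen tool and append the call to the agent's memory."""
    result = TOOLS[state["tool"]](state["tool_input"])
    entry = f"{state['tool']}({state['tool_input']}) -> {result}"
    return {"next": "plan", "tool_result": result, "memory": state["memory"] + [entry]}


def answer(state: State) -> State:
    """Produce the final answer and stop the loop."""
    return {"next": None, "answer": state.get("tool_result", "I do not know.")}


NODES: Dict[str, Callable[[State], State]] = {"plan": plan, "act": act, "answer": answer}


def run(question: str, memory: Optional[List[str]] = None, max_steps: int = 10) -> State:
    """Step through the node graph until a node sets 'next' to None."""
    state: State = {"question": question, "memory": list(memory or []), "next": "plan"}
    for _ in range(max_steps):
        if state["next"] is None:
            break
        state.update(NODES[state["next"]](state))
    return state


final_state = run("2+3")
```

Passing the returned `memory` list into a later `run` call is one simple way to carry context across turns; the `max_steps` cap guards against a planner that never terminates.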

Contributions

LightRAG Contributor

Active contributor to the HKUDS/LightRAG ecosystem, focusing on enterprise-grade integrations:

  • Gemini Integration: Implemented the Google Gemini demo for the core framework.
  • Storage Backends: Developed the PostgreSQL-backed LightRAG implementation.
  • Enterprise Features: Added workspace isolation demos for multi-tenant knowledge management.
  • PRs: #2538 | #2556 | #2615

AutoRound + vLLM Compatibility

Technical investigation into the export pipelines for quantized Vision-Language Models:

  • Analyzing AutoRound W4A16 behavior with Qwen3-VL-8B.
  • Debugging AWQ export compatibility for seamless vLLM integration.
  • Issue: intel/auto-round #1377

Technical Stack

| Category | Tools & Technologies |
| --- | --- |
| AI Frameworks | LangChain, LangGraph, LightRAG, PyTorch |
| Inference/Quant | vLLM, AutoRound |
| Data/Search | Elasticsearch, PostgreSQL, Neo4j, Redis |
| Infrastructure | Docker, AWS (ECS, ECR, G5), RunPod |
| Languages | Python (Advanced), SQL |

Building the infrastructure that makes AI smarter, faster, and more accessible.

Pinned

  1. Structured-Output-Examples-for-LLMs (Public)

    This repository demonstrates structured data extraction using various language models and frameworks. It includes examples of generating JSON outputs for name and age extraction from text prompts. ...

    Python 8

  2. Data-Prep-for-LLM-fine-tuning (Public)

    This repository helps prepare datasets for fine-tuning Large Language Models (LLMs). It includes tools for cleaning, formatting, and augmenting data to improve model performance. Designed for resea...

    Jupyter Notebook 1

  3. Blog-Writing-Agentic-RAG-CrewAI (Public)

    An automated blog writing system that leverages CrewAI to create high-quality, well-researched blog posts. The project implements a multi-agent workflow for researching topics, generating content, ...

    Python

  4. Unsloth-FineTuning (Public)

    Fine-tuning Qwen 2.5 3B on Reserve Bank of India (RBI) regulations using Unsloth for efficient training. Achieved 57.6% accuracy (8.2x improvement over base model).

    Jupyter Notebook
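As an illustration of the name-and-age extraction described for Structured-Output-Examples-for-LLMs, the sketch below validates a model's JSON reply against a fixed schema. It is not code from that repository: the `Person` type, the `parse_person` helper, and the sample reply are assumptions, and the model call itself is left out.

```python
import json
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


def parse_person(raw_reply: str) -> Person:
    """Parse a JSON reply and check it has a valid name and age."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    name, age = data.get("name"), data.get("age")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("'name' must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(age, bool) or not isinstance(age, int) or age < 0:
        raise ValueError("'age' must be a non-negative integer")
    return Person(name=name.strip(), age=age)


# Hypothetical model reply for a prompt such as "Asha is 31 years old."
person = parse_person('{"name": "Asha", "age": 31}')
```

Validating the reply before use means a malformed or partial generation fails loudly at the boundary instead of propagating bad fields downstream.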