Muhtasim-Zawad/Project_CiteKit

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 123 Commits
backend		backend
frontend		frontend
README.md		README.md

Repository files navigation

CiteKit - AI-Powered Research Paper Discovery & Citation Management

An intelligent research assistant that automatically discovers, ranks, and manages scientific papers through advanced AI agents and citation mapping.

Overview
Key Features
System Architecture
Agent Pipeline
Citation Mapping
Project Structure
Getting Started
Configuration
API Documentation
Troubleshooting

Overview

CiteKit is an intelligent research assistant built with AI agents and LangGraph that helps researchers discover relevant scientific papers, manage citations, and maintain research context across multiple queries.

What Makes CiteKit Different?

AI-Powered Query Expansion: Automatically expands queries to discover more relevant papers
Multi-Agent Pipeline: Coordinated agents (Librarian, Critic, Cross-Referencer) work together for intelligent paper discovery
Persistent Memory: Thread-based conversation management maintains research context
Citation Network Mapping: Automatically maps paper citations and references
Metrics Integration: Includes citation metrics from Dimensions, OpenAlex, and Semantic Scholar
Modular Architecture: Easy to extend with new agents and features

Key Features

Feature	Description
AI Agents	LLM-powered agents for intelligent paper discovery
Smart Search	Multi-query expansion with context awareness
Citation Metrics	Real-time citation data and impact metrics
Research Memory	Thread-based conversation history with auto-summaries
Citation Mapping	Graph-based citation network visualization
Secure Auth	JWT-based authentication with Supabase
Real-time Updates	Live paper streaming and instant results

System Architecture

Agent Pipeline

The Agent Pipeline is the intelligent core of CiteKit. It orchestrates multiple specialized AI agents working together to discover and rank relevant research papers.

1. Query Expansion Agent

Purpose: Expand user queries to discover more relevant papers while maintaining research context.

How It Works:

Takes the user's natural language query
Considers previous research context (thread summary)
Generates 4-6 semantically related alternative queries
Each query expands on the original topic with methods, datasets, and domain-specific terms

Key Features:

Context-Aware: Uses previous queries to refine search strategy
LLM-Powered: Uses Groq Llama-3.3-70b for intelligent expansion
Deduplication: Removes duplicate queries automatically
Graceful Fallback: Defaults to original query if expansion fails

Example:

Input Query: "machine learning for drug discovery" Previous Context: "I've been researching AI applications in healthcare" Expanded Queries: 1. Machine learning applications in computational drug discovery workflows 2. Deep learning models for molecular property prediction and screening 3. AI-driven approaches to therapeutic target identification 4. Neural networks for ADMET prediction in drug development 5. Ensemble learning methods in virtual drug screening

2. Librarian Agent (Search & Rank)

Purpose: Discover and retrieve relevant papers from academic databases, then rank them by quality.

How It Works:

Phase 1: Search (Per Expanded Query)

Extracts key academic search terms from each query using LLM
Searches OpenAlex with extracted terms (~7 papers per query)
Collects results from all queries

Phase 2: Deduplicate & Rank

Normalizes DOI identifiers across results
Counts DOI frequency across all queries
- Papers appearing in multiple queries rank higher (stronger relevance signal)
- Frequency counter provides evidence of topical relevance
Selects top 25 candidates for enrichment

Phase 3: Enrich (Parallel Processing)

Fetches full metadata from CrossRef and Semantic Scholar
Retrieves author metrics (h-index, publication count)
Gets citation metrics from Dimensions API
Downloads full text URLs from CORE API
Uses local cache for previously fetched papers

Key Features:

Multi-Source Search: Queries all expanded variations for comprehensive coverage
Intelligent Deduplication: Frequency-based ranking prioritizes multi-query results
Parallel Enrichment: Concurrently fetches metadata from 5+ APIs
Smart Caching: Stores papers in Supabase to avoid redundant API calls

Output: 25 ranked and enriched papers ready for critique

3. Critic Agent (Evaluation & Scoring)

Purpose: Evaluate papers for relevance to the original query and assign quality scores.

How It Works:

Receives 25 candidate papers with full metadata
Evaluates each paper's relevance to the user's original query
Assigns a relevance score (0-100) with reasoning
Combines scores with appearance frequency for final ranking

Scoring Formula:

Final Score = (Critic Score x 0.7) + (Normalized Frequency x 0.3)

Critic Score (70%): LLM-based relevance evaluation
Frequency Score (30%): How many expanded queries matched this paper

Key Features:

LLM Evaluation: Uses Groq for nuanced relevance assessment
Explainability: Provides reasoning for each score
Hybrid Ranking: Combines relevance with discovery frequency
Top-N Filtering: Returns top 5 papers to frontend

Output: 5 highly relevant, scored papers with reasoning

4. Cross-Reference Agent (Metadata Enrichment)

Purpose: Retrieve detailed metadata and full-text URLs for discovered papers.

How It Works:

CORE API (Full Text & Download URLs)

Searches CORE's open access repository
Returns full text and download URLs when available
Enables direct paper access

OpenAlex API (Author Metrics)

Fetches author information and publication metrics
Retrieves h-index and i10-index for authors
Builds author reputation profile

Dimensions API (Citation Metrics)

Total citation count
Recent citations (last 2 years)
Relative Citation Ratio (RCR) - field-adjusted impact
Field Citation Ratio (FCR) - comparison to similar-age papers

Semantic Scholar (Paper Details)

Alternative paper metadata
Citation network data
Reference information

Key Features:

Parallel API Calls: Concurrent requests to multiple sources
Error Resilience: Continues if one API fails
Smart Caching: Uses Supabase to avoid redundant API calls
Comprehensive Metrics: Collects data from 4+ sources

Output: Fully enriched papers with metadata, metrics, and download URLs

Citation Mapping

CiteKit automatically maps and visualizes paper citation networks to help you understand research relationships.

How Citation Mapping Works

1. Citation Discovery

When a paper is selected, CiteKit fetches its citation network
Retrieves References (papers cited BY the selected paper)
Retrieves Citations (papers that cite the selected paper)
Builds a bidirectional citation graph

2. Data Sources

Source	Data Type	Coverage
Semantic Scholar	Citation relationships, citation count	High coverage of CS papers
CrossRef	Reference metadata, DOI links	Comprehensive for published papers
Dimensions	Citation metrics, impact data	Full interdisciplinary coverage
OpenAlex	Author metrics, institution data	Open access to all research

3. Graph Structure

Selected Paper | +- References (Cited BY) | +- [Paper A] - cited by selected paper | +- [Paper B] - foundational work | +- [Paper C] - recent related work | +- Citations (Cited BY Others) +- [Paper X] - cites selected paper +- [Paper Y] - builds on selected paper +- [Paper Z] - refutes/reviews selected paper

4. Metrics Displayed

Metric	Meaning	Source
Citation Count	Total times paper cited	Dimensions
Recent Citations	Citations in last 2 years	Dimensions
RCR (Relative Citation Ratio)	Field-adjusted impact (1.0 = field average)	Dimensions
FCR (Field Citation Ratio)	Impact vs. similar-age papers	Dimensions
H-Index	Author's productivity & impact metric	OpenAlex

5. Interactive Visualization

Frontend graph visualization includes:

Node Colors: Paper relevance or impact level
Edge Thickness: Citation strength (count)
Node Size: Citation count
Zoom/Pan: Explore large networks
Hover Info: Paper details on mouse-over
Search: Find papers in graph

6. Storage & Caching

Supabase Database: +-- reference | +-- (DOI, title, authors, abstract) +-- reference_metrics | +-- (citation counts, RCR, FCR) +-- research_network +-- (citation edges between papers)

Project Structure

Getting Started

Prerequisites

Python 3.10+ (Backend)
Node.js 18+ (Frontend)
Supabase Account (Database)
API Keys: Groq, OpenAlex, Semantic Scholar, Dimensions, CORE

Quick Start (5 minutes)

1. Backend Setup

# Clone repository git clone https://github.com/Muhtasim-Zawad/Project_CiteKit.git cd Project_CiteKit # Backend setup cd backend python -m venv .venv # Activate virtual environment source .venv/bin/activate # Linux/Mac # or .venv\Scripts\activate # Windows # Install dependencies pip install -r requirements.txt

2. Configure Backend Environment

Create backend/.env:

# Supabase Configuration SUPABASE_URL=https://your-project-id.supabase.co SUPABASE_KEY=your_supabase_anon_key SUPABASE_JWT_SECRET=your_supabase_jwt_secret # AI & LLM Configuration GROQ_API_KEY=your_groq_api_key # API Keys for Data Sources CORE_API_KEY=your_core_api_key # Optional OPENALES_API_KEY=your_openales_key # Optional # Server Configuration APP_URL=http://localhost:8000 FRONTEND_URL=http://localhost:5173

3. Start Backend Server

uvicorn app.main:app --reload --port 8000

Backend running: http://localhost:8000 API Docs: http://localhost:8000/docs

4. Frontend Setup

cd ../frontend npm install

5. Configure Frontend Environment

Create frontend/.env.local:

VITE_API_URL=http://localhost:8000

6. Start Frontend Server

npm run dev

Frontend running: http://localhost:5173

Configuration

Environment Variables Reference

Backend Variables

Variable	Description	Example
`SUPABASE_URL`	Supabase project URL	`https://xxx.supabase.co`
`SUPABASE_KEY`	Supabase anonymous key	`eyJhbGc...`
`SUPABASE_JWT_SECRET`	JWT secret for token verification	`super-secret-key`
`GROQ_API_KEY`	Groq LLM API key	`gsk_...`
`CORE_API_KEY`	CORE full-text search API	`https://api.core.ac.uk`
`APP_URL`	Backend base URL	`http://localhost:8000`
`FRONTEND_URL`	Frontend URL for CORS	`http://localhost:5173`

Frontend Variables

Variable	Required	Description	Example
`VITE_API_URL`		Backend API endpoint	`http://localhost:8000`

Getting API Keys

Supabase:

Go to supabase.com
Create/select project
Settings - API - Copy credentials

Groq:

Visit console.groq.com
Sign up/login
Create API key in account settings

Semantic Scholar:

Visit semanticscholar.org/api
No key needed (free API)

Dimensions:

Visit dimensions.ai/metricssignup
Register for free non-commercial access
No key needed

CORE (Optional):

Visit core.ac.uk/discover
Request API access
Get API key from dashboard

API Documentation

Chat Endpoints

Create Chat (Trigger Agent Pipeline)

POST /chat/

Request:

{ "thread_id": "uuid-string", "query": "machine learning for drug discovery" }

Response:

{ "id": 123, "thread_id": "uuid-string", "query": "machine learning for drug discovery", "search_terms": "machine learning | deep learning | drug discovery", "created_at": "2026-03-11T12:00:00Z", "results": [ { "doi": "10.1234/example", "title": "Paper Title", "author": "Author Name", "abstract": "...", "year": 2023, "score": 92, "critic_reasoning": "Highly relevant to ML applications in drug discovery", "dimensions_metrics": { "times_cited": 45, "recent_citations": 12, "relative_citation_ratio": 1.8 } } ] }

Get Thread Chats

GET /chat/{thread_id}

Returns full chat history with all results for a thread.

Thread Endpoints

Create Thread

POST /threads/

Request:

{ "project_id": "uuid-string", "title": "Drug Discovery Research" }

List Project Threads

GET /threads/project/{project_id}?page=1

Returns paginated threads (3 per page).

Get Thread Details

GET /threads/{thread_id}

Testing the System

1. Test Chat & Agent Pipeline

# Terminal 1: Start backend cd backend uvicorn app.main:app --reload # Terminal 2: Start frontend cd frontend npm run dev # Terminal 3: Test via API curl -X POST http://localhost:8000/chat/ \ -H "Authorization: Bearer YOUR_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "thread_id": "your-thread-uuid", "query": "transformer models in NLP" }'

2. View Real-time Agent Output

Open http://localhost:8000/docs (Swagger UI) to see:

Agent pipeline execution
Search term extraction
Paper ranking scores
Error handling

3. Test Citation Mapping

Open frontend at http://localhost:5173
Create a project
Search for a paper
Click on a paper to view citation network
Explore references and citations

Troubleshooting

Backend Issues

Issue	Solution
"Invalid JWT" error	Verify `SUPABASE_JWT_SECRET` matches Supabase settings
"Groq API error"	Check `GROQ_API_KEY` is valid and has quota
"Supabase connection failed"	Verify `SUPABASE_URL` and `SUPABASE_KEY` are correct
"CORS error from frontend"	Ensure `FRONTEND_URL` matches frontend origin in env
Thread not found	Verify `project_id` exists and user owns it

Frontend Issues

Issue	Solution
"Cannot GET /api"	Check `VITE_API_URL` points to correct backend
Blank chat history	Verify thread exists and user has permission
Papers not loading	Check network tab for API errors, verify backend health

Agent Pipeline Issues

Issue	Solution
No results returned	Query may be too specific; try more general terms
Same papers repeated	Normal behavior; indicates strong topical relevance
Slow response (>30s)	APIs timing out; check internet connection
"Agent pipeline failed"	Check backend logs for specific error

Contributing

We welcome contributions! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open a Pull Request

Areas for Contribution

New AI agents (Secretary, Summarizer, etc.)
Additional data sources (arXiv, IEEE Xplore)
UI/UX improvements
Documentation enhancements
Unit & integration tests

Roadmap

arXiv API integration
IEEE Xplore integration
Paper full-text search with chunking
Collaborative research projects
Paper annotation & highlighting
Export to BibTeX/Zotero
Research paper PDF mining
Multi-language support
Mobile app (React Native)

License

This project is licensed under the MIT License - see LICENSE file for details.

Team & Contributors

Role	Contributors
Frontend Lead	Muhtasim-Zawad
AI/Agent Lead	ironicalcrow
Backend Lead	MahadiAhmed0

Support & Contact

Email: [project-email]
Discord: [discord-link]
Issues: GitHub Issues
Docs: GitHub Wiki

Citation

If you use CiteKit in your research, please cite:

@software{citekit2026, title={CiteKit: AI-Powered Research Paper Discovery and Citation Management}, author={Zawad, Muhtasim and Khan, Arham Ibrahim and Ahmed, Mahadi}, year={2026}, url={https://github.com/Muhtasim-Zawad/Project_CiteKit} }

Acknowledgments

LangGraph - Agent orchestration
FastAPI - Backend framework
Supabase - Database & auth
Groq - LLM inference
OpenAlex - Academic metadata
Semantic Scholar - Paper discovery

Made with by CiteKit Team

About

No description, website, or topics provided.

Resources

Readme

Activity

Folders and files

Latest commit

History

Repository files navigation

CiteKit - AI-Powered Research Paper Discovery & Citation Management

Table of Contents

Overview

What Makes CiteKit Different?

Key Features

System Architecture

Agent Pipeline

1. Query Expansion Agent

2. Librarian Agent (Search & Rank)

3. Critic Agent (Evaluation & Scoring)

4. Cross-Reference Agent (Metadata Enrichment)

Citation Mapping

How Citation Mapping Works

1. Citation Discovery

2. Data Sources

3. Graph Structure

4. Metrics Displayed

5. Interactive Visualization

6. Storage & Caching

Project Structure

Getting Started

Prerequisites

Quick Start (5 minutes)

1. Backend Setup

2. Configure Backend Environment

3. Start Backend Server

4. Frontend Setup

5. Configure Frontend Environment

6. Start Frontend Server

Configuration

Environment Variables Reference

Backend Variables

Frontend Variables

Getting API Keys

API Documentation

Chat Endpoints

Create Chat (Trigger Agent Pipeline)

Get Thread Chats

Thread Endpoints

Create Thread

List Project Threads

Get Thread Details

Testing the System

1. Test Chat & Agent Pipeline

2. View Real-time Agent Output

3. Test Citation Mapping

Troubleshooting

Backend Issues

Frontend Issues

Agent Pipeline Issues

Contributing

Areas for Contribution

Roadmap

License

Team & Contributors

Support & Contact

Citation

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages