OpenDCAI

Define the future of Data-centric AI together

Welcome

We are dedicated to advancing research and open-source tools in Data-Centric Artificial Intelligence (DCAI).

Our goal is to develop effective and efficient DCAI systems and algorithms that support and enhance the performance of AI models and applications.

Pinned

  1. DataFlow Public

    Easy data preparation with the latest LLM-based operators and pipelines.

    Python · 3k stars · 217 forks

  2. MyScaleDB Public

    Forked from OriginHubAI/MyScaleDB

    AI database for unified, scalable SQL + vector data management, search, and analytics.

    C++ · 39 stars · 1 fork

  3. DataFlex Public

    DataFlex is a data-centric training framework that enhances model performance by either selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

    Python · 113 stars · 10 forks

  4. Paper2Any Public

    Turn paper/text/topic into editable research figures, technical route diagrams, and presentation slides.

    Python · 2k stars · 138 forks

  5. AgentFlow Public

    The first unified agent data synthesis framework for custom agentic tasks, with an all-in-one environment.

    Python · 48 stars · 2 forks

Repositories

Showing 10 of 30 repositories
  • DataFlex Public

    DataFlex is a data-centric training framework that enhances model performance by either selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

    Python · 113 stars · 10 forks · 0 issues · 0 pull requests · Updated Mar 16, 2026
  • One-Eval Public
    Python · 2 stars · Apache-2.0 · 1 fork · 0 issues · 0 pull requests · Updated Mar 16, 2026
  • DataFlow Public

    Easy data preparation with the latest LLM-based operators and pipelines.

    Python · 3,021 stars · Apache-2.0 · 217 forks · 8 issues · 3 pull requests · Updated Mar 15, 2026
  • Mycel Public

    Human-Agent collaboration platform.

    Python · 24 stars · MIT · 1 fork · 0 issues · 3 pull requests · Updated Mar 15, 2026
  • Open-NotebookLM Public

    An open-source implementation of NotebookLM.

    Python · 42 stars · Apache-2.0 · 6 forks · 2 issues · 1 pull request · Updated Mar 14, 2026
  • Text2VectorSQL Public

    Official implementation of Text2VectorSQL: Towards a Unified Interface for Vector Search and SQL Queries

    Python · 52 stars · 8 forks · 2 issues · 0 pull requests · Updated Mar 13, 2026
  • DataFlow-WebUI Public
    Python · 18 stars · 13 forks · 0 issues · 0 pull requests · Updated Mar 12, 2026
  • DataFlow-Agent Public

    Agent for DataFlow: Automatic Data Workflow Design

    Python · 57 stars · Apache-2.0 · 12 forks · 1 issue · 0 pull requests · Updated Mar 12, 2026
  • Paper2Any Public

    Turn paper/text/topic into editable research figures, technical route diagrams, and presentation slides.

    Python · 1,950 stars · Apache-2.0 · 138 forks · 7 issues · 2 pull requests · Updated Mar 10, 2026
  • Flash-MinerU Public

    Ray-based accelerator for the MinerU VLM inference pipeline. A lightweight, low-intrusion PDF-to-Markdown processing solution for multi-GPU and domestic (Chinese) compute environments.

    Python · 34 stars · AGPL-3.0 · 3 forks · 2 issues · 0 pull requests · Updated Mar 10, 2026