
Lensgrid — A faster Datasette for ML datasets.

A read-side dataset explorer optimized for the way ML researchers actually browse data: faceted filters, fast random samples, and column-stat views in one shot.

The problem

ML teams spend a frustrating amount of time on "let me just check what's in this dataset", and the answer is usually a Jupyter notebook and 20 minutes of pandas. Datasette is wonderful, but it is optimized for SQLite + civic data and is slow on the multi-million-row Parquet files ML teams actually work with. Streamlit / Gradio dashboards are too custom: every team rebuilds the same column-stats + sampling UI. Roboflow + Label Studio focus on the labeling step, not the exploration step. There's an open lane for a fast, opinionated read-side tool.

Our approach

Lensgrid is a desktop app (Tauri-based, runs locally) that opens any Parquet / CSV / Arrow file and gives you instant column stats (cardinality, top-k, null rate, distribution), faceted filters that compose, and fast random sampling. No SQL knowledge is required, but there is a SQL escape hatch for power users. It is built around Apache DataFusion, with the goal that a 50M-row Parquet file opens in under 2 seconds and filters run in under 500 ms.
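A minimal sketch of three read-side operations named above, for readers who want the mechanics: per-column stats (cardinality, null rate, top-k), faceted filters that compose by AND, and random sampling. This is stdlib-only Python over a list of dicts, not Lensgrid's code (the text says Lensgrid is built on Apache DataFusion), and every name in it (`ColumnStats`, `column_stats`, `facet`, `compose`, `reservoir_sample`) is hypothetical. Sampling uses reservoir sampling (Algorithm R), one standard way to draw a uniform sample in a single pass; that Lensgrid samples this way is an assumption. A numeric histogram for the "distribution" view is omitted for brevity.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, Iterable

# Hypothetical row model for this sketch: one dict per row; a missing key or None counts as null.
Row = dict[str, Any]
Predicate = Callable[[Row], bool]


@dataclass
class ColumnStats:
    cardinality: int               # number of distinct non-null values
    null_rate: float               # fraction of rows whose value is None or missing
    top_k: list[tuple[Any, int]]   # most frequent non-null values, with counts


def column_stats(rows: list[Row], column: str, k: int = 3) -> ColumnStats:
    """Cardinality, null rate and top-k values for one column (hypothetical helper)."""
    values = [row.get(column) for row in rows]
    counts = Counter(v for v in values if v is not None)
    nulls = len(values) - sum(counts.values())
    return ColumnStats(
        cardinality=len(counts),
        null_rate=nulls / len(values) if values else 0.0,
        top_k=counts.most_common(k),
    )


def facet(column: str, allowed: set[Any]) -> Predicate:
    """One facet: keep rows whose `column` value is in `allowed`."""
    return lambda row: row.get(column) in allowed


def compose(*facets: Predicate) -> Predicate:
    """Active facets combine with AND: a row must pass every one."""
    return lambda row: all(f(row) for f in facets)


def reservoir_sample(rows: Iterable[Row], n: int, seed: int = 0) -> list[Row]:
    """Uniform random sample of up to n rows in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: list[Row] = []
    for i, row in enumerate(rows):
        if i < n:
            sample.append(row)            # fill the reservoir first
        else:
            j = rng.randint(0, i)         # uniform over [0, i], both ends inclusive
            if j < n:
                sample[j] = row           # row i is kept with probability n / (i + 1)
    return sample


# Usage on a toy dataset.
rows = [
    {"split": "train", "label": "cat", "width": 640},
    {"split": "train", "label": "dog", "width": None},
    {"split": "val", "label": "cat", "width": 512},
    {"split": "train", "label": "cat", "width": 640},
]
print(column_stats(rows, "label"))
# ColumnStats(cardinality=2, null_rate=0.0, top_k=[('cat', 3), ('dog', 1)])
print(column_stats(rows, "width").null_rate)  # 0.25

train_cats = compose(facet("split", {"train"}), facet("label", {"cat"}))
matching = [row for row in rows if train_cats(row)]
print(len(matching))  # 2
print(reservoir_sample(matching, n=1, seed=42))
```

Modeling each facet as a predicate is what makes them compose: toggling a facet just adds or drops one function from the `compose` call. A columnar engine such as DataFusion would typically push such filters down into the Parquet scan rather than evaluating them row by row as this sketch does.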

Where we are

Pre-product. I (Alice, a former ML eng at <undisclosed>) have been running this kind of workflow against my own datasets for years and want to ship the version I always wanted. 28 ML engineers in my network said they'd pay for it.

Where we're going

The default first thing an ML engineer does when they get a new dataset: open it in Lensgrid. Eventually: integration with W&B + Hugging Face datasets so the same UI works across local files and remote registries.

Tell me about an ML dataset you wish was easier to explore

TEAM

Alice Builder, Founder

@alice_qa