A read-side dataset explorer optimized for the way ML researchers actually browse data: faceted filters, fast random samples, and column-stat views in one shot.
The problem
ML teams spend a frustrating amount of time on "let me just check what's in this dataset," and the answer is usually a Jupyter notebook and 20 minutes of pandas. Datasette is wonderful but optimized for SQLite and civic data; it's slow on the multi-million-row Parquet files ML teams work with. Streamlit and Gradio dashboards are too custom: every team rebuilds the same column-stats and sampling UI. Roboflow and Label Studio focus on the labeling step, not the exploration step. That leaves an open lane for a fast, opinionated read-side tool.
Our approach
Lensgrid is a desktop app (Tauri-based, runs locally) that opens any Parquet, CSV, or Arrow file and gives you instant column stats (cardinality, top-k values, null rate, distribution), faceted filters that compose, and fast random sampling. No SQL knowledge is required, but there's a SQL escape hatch for power users. It's built around Apache DataFusion, with the goal of opening a 50M-row Parquet file in under 2 seconds and running filters in under 500 ms.
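To make "column stats over composable facets" concrete, here is a toy sketch in plain Python. It is not Lensgrid's implementation (which sits on DataFusion); `apply_facets`, `column_stats`, and `ColumnStats` are hypothetical names, and the facet semantics (OR within a column, AND across columns) are an assumption borrowed from typical faceted search. Counting here is exact; at tens of millions of rows an approximate sketch such as HyperLogLog for cardinality may be preferable.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class ColumnStats:
    count: int                      # rows seen, including nulls
    null_rate: float                # fraction of rows whose value is None
    cardinality: int                # distinct non-null values
    top_k: list[tuple[Any, int]]    # most frequent non-null values with counts


def apply_facets(rows: list[dict], facets: dict[str, set]) -> list[dict]:
    """Keep rows that match every facet (hypothetical semantics).

    Within one column the allowed values are ORed; across columns the
    facets are ANDed, so adding a facet on another column can only
    narrow the result.
    """
    return [
        row for row in rows
        if all(row.get(col) in allowed for col, allowed in facets.items())
    ]


def column_stats(values: Iterable[Any], k: int = 3) -> ColumnStats:
    """Compute count, null rate, exact cardinality, and top-k in one pass."""
    counts: Counter = Counter()
    total = nulls = 0
    for value in values:
        total += 1
        if value is None:
            nulls += 1
        else:
            counts[value] += 1
    return ColumnStats(
        count=total,
        null_rate=nulls / total if total else 0.0,
        cardinality=len(counts),
        top_k=counts.most_common(k),
    )


# Hypothetical example rows standing in for a loaded Parquet file.
rows = [
    {"split": "train", "label": "cat"},
    {"split": "train", "label": "dog"},
    {"split": "val", "label": "cat"},
    {"split": "train", "label": None},
    {"split": "train", "label": "cat"},
]

# Select the train split, then summarize the label column of that subset.
train = apply_facets(rows, {"split": {"train"}})
stats = column_stats((row["label"] for row in train), k=2)
print(stats)
```

Because `column_stats` accepts any iterable of values, the same one-pass routine can be rerun on whatever subset the current facets select, which mirrors the idea of stats and filters living in one view.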
Where we are
Pre-product. I (Alice, a former ML engineer at <undisclosed>) have been running this workflow on my own datasets for years and want to ship the version I always wanted. 28 ML engineers in my network have said they'd pay for it.
Where we're going
The default first thing an ML engineer does when they get a new dataset: open it in Lensgrid. Eventually: integration with W&B + Hugging Face datasets so the same UI works across local files and remote registries.
Team
Alice Builder, Founder
@alice_qa