Chuting Xu

About

I work at the intersection of high-dimensional statistics and reproducible data engineering. Most of my projects turn messy economic or genomic data into clean, documented R/Python pipelines that produce interpretable models and audit-ready reports.

What I focus on

  • Structured sparsity, robustness, and fairness for models that have to stand up to shift, noise, and review.
  • End-to-end pipelines: Snakemake, Docker, Conda, Git/GitHub Actions; data cards with provenance & checksums.
  • Readable science: Quarto/LaTeX reports, clear assumptions, and diagnostics you can actually act on.

A few representative projects

  • Gene Expression from Chromatin (GM12878) — two-step ON/OFF → regression pipeline from ENCODE signals.
  • USG Analytics (Gold ETF) — time-series → supervised ML with tidy configs and simple ensembling.
See details on the Projects page.

Notes & exercises

I keep compact derivations and small simulations worked from reference texts (e.g., Rigollet’s High-Dimensional Statistics). Browse Notes & Exercises.

How I like to work

  • Start with a clear question and a minimal, testable model.
  • Make every step reproducible (env file, one-command run, seeded CV).
  • Prefer interpretable structure; use heavier models only when they truly add signal.

Collaborate

I’m open to collaboration on statistical modeling with interesting datasets—especially clinical/omics problems where fairness and robustness matter.
Email: cx252@cornell.edu · CV: View CV · GitHub: ChuTingX · LinkedIn: chuting-xu

When I’m not debugging pipelines, I enjoy Go (the board game) and detective novels.