About

I work at the intersection of high-dimensional statistics and reproducible data engineering. Most of my projects turn messy economic or genomic data into clean, documented R/Python pipelines that produce interpretable models and audit-ready reports.

What I focus on

Structured sparsity, robustness, and fairness for models that have to stand up to shift, noise, and review.
End-to-end pipelines: Snakemake, Docker, Conda, Git/GitHub Actions; data cards with provenance & checksums.
Readable science: Quarto/LaTeX reports, clear assumptions, and diagnostics you can actually act on.
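
As a small illustration of the "provenance & checksums" item above, here is a minimal sketch of how a file checksum for a data card could be computed. The helper name `sha256_of_file` and the chunk size are assumptions made for this example, not taken from any actual pipeline; only Python's standard `hashlib` and `pathlib` modules are used.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks.

    Hypothetical helper for illustration; chunked reading keeps memory
    use flat even for large data files.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter(callable, sentinel) keeps reading until an empty bytes object
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recording this digest next to the file's source and download date lets a later run confirm that the input data has not changed.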

A few representative projects

Gene Expression from Chromatin (GM12878) — two-step ON/OFF → regression pipeline from ENCODE signals.
USG Analytics (Gold ETF) — time-series → supervised ML with tidy configs and simple ensembling.
See details on the Projects page.

Notes & exercises

I keep compact derivations and small simulations (e.g., from Rigollet’s High-Dimensional Statistics). Browse Notes & Exercises.

How I like to work

Start with a clear question and a minimal, testable model.
Make every step reproducible (env file, one-command run, seeded CV).
Prefer interpretable structure; use heavier models only when they truly add signal.
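
To make the "seeded CV" point concrete, here is a minimal sketch of a cross-validation split that is fully determined by its seed. The function name `seeded_kfold` and its defaults are hypothetical and chosen only for this example; a real pipeline might instead rely on a library splitter with a fixed random state.

```python
import random


def seeded_kfold(n_samples, n_folds=5, seed=0):
    """Return a list of (train_idx, test_idx) pairs.

    Hypothetical helper for illustration: the same (n_samples, n_folds,
    seed) always yields the same folds.
    """
    rng = random.Random(seed)  # local RNG, leaves the global random state alone
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Striding over the shuffled indices gives folds of near-equal size
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        splits.append((train_idx, test_idx))
    return splits
```

Calling `seeded_kfold(100, n_folds=5, seed=42)` twice returns identical splits, so a reported CV score can be traced back to the exact rows used in each fold.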

Collaborate

I’m open to collaboration on statistical modeling with interesting datasets—especially clinical/omics problems where fairness and robustness matter.
Email: cx252@cornell.edu · CV: View CV · GitHub: ChuTingX · LinkedIn: chuting-xu

When I’m not debugging pipelines, I enjoy Go (the board game) and detective novels.