About
I work at the intersection of high-dimensional statistics and reproducible data engineering. Most of my projects turn messy economic or genomic data into clean, documented R/Python pipelines that produce interpretable models and audit-ready reports.
What I focus on
- Structured sparsity, robustness, and fairness for models that have to stand up to distribution shift, noisy data, and external review.
- End-to-end pipelines: Snakemake, Docker, Conda, Git/GitHub Actions; data cards with provenance & checksums.
- Readable science: Quarto/LaTeX reports, clear assumptions, and diagnostics you can actually act on.
A few representative projects
- Gene Expression from Chromatin (GM12878): a two-step pipeline (ON/OFF classification, then regression) built on ENCODE signals.
- USG Analytics (Gold ETF): time-series data recast as a supervised ML problem, with tidy configs and simple ensembling.
See details on the Projects page.
Notes & exercises
I keep compact derivations and small simulations, for example from working through Rigollet’s High-Dimensional Statistics. Browse Notes & Exercises.
How I like to work
- Start with a clear question and a minimal, testable model.
- Make every step reproducible (env file, one-command run, seeded CV).
- Prefer interpretable structure; use heavier models only when they truly add signal.
Collaborate
I’m open to collaboration on statistical modeling with interesting datasets—especially clinical/omics problems where fairness and robustness matter.
Email: cx252@cornell.edu · CV: View CV · GitHub: ChuTingX · LinkedIn: chuting-xu
When I’m not debugging pipelines, I enjoy Go (the board game) and detective novels.