Shafayet Khan Shafee

Statistician, Autodidact, and Bibliophile.

I’m a data scientist and statistician who enjoys using statistical thinking to shape real product and business decisions. My interests and expertise lie at the intersection of applied statistics, machine learning, and open-source development: turning rigorous analytical ideas into practical tools, software, and workflows.

What I Do

I translate complex data problems into clear, actionable insights. My work spans machine learning, causal inference, and experimentation, and I spend a lot of time designing experiments and developing scalable data workflows.

Lately, I’ve been digging deeper into methods at the intersection of causal inference and hierarchical modeling, as well as modern causal ML approaches like metalearners, double machine learning, and causal forests. I’m also very enthusiastic about OSS development: I have developed a few R packages, Python packages, and Quarto extensions, and I’m always looking for ways to improve the tools I rely on. R is my first love, but I also work comfortably with Python and SQL, and I’m currently exploring Rust and Julia to broaden my toolkit, mostly because they’re fast!

Work Experience

  • Data Scientist I @ Pathao Ltd.
    Jan, 2025 - Present
    Dhaka, Bangladesh

  • Graduate Data Scientist @ Pathao Ltd.
    July, 2023 - Dec, 2024
    Dhaka, Bangladesh

Technical Skills

Building on a strong foundation in statistical methodology, ranging from probability theory, statistical inference, and hypothesis testing to survival analysis, multilevel modeling, Bayesian methods, and causal inference, I work across the broader landscape of modern data science and software development. I build end-to-end data transformation pipelines using tools like dbt and Bash scripting, manage CI/CD workflows with GitHub Actions, and develop automated reporting dashboards in Google Sheets using Google Apps Script and BigQuery, as well as dynamic dashboards in Looker Studio.

My experience includes developing and productionizing machine-learning models and pipelines with Kedro, MLflow, and containerized deployments using Docker. I support polyglot data-science workflows with deep expertise in R complemented by effective integration of Python when needed.

Open Source Contributions

I contribute to open-source projects across R, Python, and the Quarto ecosystem. My work includes developing R packages such as MOR and skmisc, as well as Python packages like skmiscpy for causal effect estimation and kedrogen, a CLI tool for scaffolding reproducible Kedro project structures. I have also developed a range of Quarto extensions (e.g., downloadthis, line-highlight, and interactive-sql), using Lua, HTML/CSS, and JavaScript to support more efficient and user-friendly scientific publishing workflows.

Research & Publications

I have one peer-reviewed publication in PLoS ONE (a Q1 journal), where I examined the causal impact of maternal continuum of care on a child’s minimum acceptable diet in Bangladesh.

Education

  • M.Sc. in Applied Statistics, 2023
    Grade: 3.97 out of 4.00
    ISRT, University of Dhaka, Bangladesh

  • B.Sc. in Applied Statistics, 2021
    Grade: 3.96 out of 4.00
    ISRT, University of Dhaka, Bangladesh

Interests

  • Causal Inference
  • Experimentation
  • Hierarchical Modeling
  • Bayesian Inference
  • Predictive Modeling
  • Time Series Forecasting
  • Reproducible Research
  • Open Source Contributions
  • R & Python Package Development
  • Data Visualisation & Storytelling

Built with Quarto.

Copyright © 2025 Shafayet Khan Shafee

License: CC BY 4.0