Bio


Hello! My name is Ethan, and I am pursuing a Master of Science (M.S.) in Data Science at Stanford University (with coursework in Statistics, Computer Science, and Computational and Mathematical Engineering). Before Stanford, I graduated summa cum laude with a Bachelor of Science (B.S.) in Statistics from the University of California, Los Angeles (UCLA). I am always eager to contribute to research and gain more experience through data science internships. My technical prowess, determined work ethic (I completed my four-year undergraduate degree at UCLA in three years), and effective communication skills make me a valuable addition to any team.

I am currently a Data Science Intern at Apple, working on data processing, visualization, and machine learning (ML) modeling. I am also a research assistant in the Luby Lab at Stanford, where I process, standardize, and visualize data on brick kiln production in South Asia. Last year, I interned with Bridg as a Data Science Intern, working on data querying, data transformations, natural language processing (NLP), and machine learning using Python and SQL, with integrations in Snowflake (and Snowpark, Snowflake's Python API), on terabytes of data (over 100 billion observations). My projects improved insights from product descriptions and standardized features across multiple sources.

My experiences have prepared me to work in virtually any domain. I am always willing to discuss potential work opportunities, or to share my path with prospective undergraduate or graduate students and data science enthusiasts, via LinkedIn or email.

Honors & Awards


  • Summa Cum Laude, University of California, Los Angeles (UCLA) (June 2022)
  • Best Visualization (Honorable Mention), UCLA DataFest (April 2021)
  • Employee of the Quarter, Lifetime Activities (October 2020)
  • National AP Scholar, College Board (July 2019)
  • U.S. Presidential Scholars Program Candidate, U.S. Department of Education (February 2019)

Education & Certifications


  • Master of Science, Stanford University, Statistics: Data Science (2024)
  • Bachelor of Science, University of California, Los Angeles (UCLA), Statistics (2022)
  • High School Diploma, Amador Valley High School (2019)

Service, Volunteer and Community Work


  • Alumni Mentor, University of California, Los Angeles (September 2022 - Present)

    • Serve as an alumni mentor for approximately 30 current UCLA students
    • Provide advice on life/classes at UCLA, career/internship searches, and grad school applications

    Location

    Los Angeles

Work Experience


  • Data Science and Visualization Intern, Apple (June 2023)

    • Data cleaning, processing, and transformation for statistical modeling and visualizations
    • Provide product and business insights for use by various teams
    • Member of the DataViz team (within Hardware Engineering)

    Location

    Cupertino, California, USA

  • Graduate Research Assistant, Stanford University (January 2023 - Present)

    • Assist with randomized controlled trials, using R, on energy efficiency improvements and information on worker incentives for brick kiln owners across South Asia
    • Standardize, process, and QA survey data from various questionnaires
    • Perform statistical analysis and produce various data visualizations
    • Luby Lab (under Dr. Stephen Luby)

    Location

    Stanford, CA

  • Data Science Intern, Bridg / Cardlytics (June 2022 - November 2022)

    • Constructed machine learning models with AWS SageMaker (in Python) and Snowflake (SQL) to analyze terabytes of data (over 100 billion observations)
    • Developed natural language processing (NLP) algorithms to standardize and categorize product descriptions for enhanced business analytics
    • Deployed machine learning models to identify potential store location openings and closings
    • Assisted with product recommendation and churn identification via machine learning models in SageMaker and Snowpark (Snowflake Python API)

    Location

    San Francisco

  • Data Analyst Intern, SCAN Health Plan (November 2021 - June 2022)

    • Generated insights about SCAN membership experience challenges by completing end-to-end projects analyzing “Voice of the Consumer” (VOC) data using Python, R, SQL, and Tableau
    • Analyzed member call data (over 1 million calls) to create FAQ sections using natural language processing (NLP) and unsupervised clustering in Python for benefit categories and disenrollment groups
    • Queried disenrollment data with SQL to create visualizations in Tableau and R (ggplot), then analyzed which demographics had high disenrollment rates via chi-square and post hoc analyses
    • Used Python and Tableau to analyze the relationship between disenrollments and frequent member inquiries and grievances from disenrollees during OEP in recent years
    • Presented findings, insights, and learnings to the leadership team and in all-teams meetings

    Location

    Los Angeles

  • Data Science Consultant, UCLA Library Data Science Center (November 2021 - June 2022)

    • Consulted with researchers to curate, transform, analyze, and visualize their data using Python, R, Tableau, and SPSS
    • Implemented and refined machine learning models, including K-Nearest Neighbors (KNN), decision trees, and neural networks
    • Performed statistical tests, including chi-square analyses and Peacock tests (a multidimensional Kolmogorov-Smirnov test)
    • Reverse geocoded latitude and longitude coordinates into addresses using Google’s API
    • Used regular expressions to parse text for analysis and extracted highlighted text from Word documents with Python
    • Extracted data from .doc, .xls, and .xlsx files for cleaning and parsing with R
    • Conducted and interpreted linear and logistic regression in SPSS
    • Member of the DataSquad, a team within the UCLA Library Data Science Center

    Location

    Los Angeles

  • Student Researcher, UCLA Department of Statistics (January 2022 - June 2022)

    • Reviewed and clarified homework instructions for upper-division statistics courses in statistical programming and simulation in R
    • Provided students with additional guidance through the redesign of instruction and template files
    • Research conducted within the UCLA Department of Statistics (under Dr. Miles Chen)

    Location

    Los Angeles

  • Student Assistant, UCLA Luskin School of Public Affairs (June 2021 - March 2022)

    • Scraped firm-level web data with Selenium WebDriver in R and analyzed the political connections of various enterprises
    • Imputed missing data for over 30 million firms
    • Used the Google Translate API to translate job titles and board positions from 5+ languages into English
    • Research conducted under Dr. Darin Christensen (UCLA) and with Dr. Jonah Rexer (Princeton)

    Location

    Los Angeles

  • InStep Intern (Data Analytics and AI), Infosys (June 2021 - August 2021)

    • Defined and developed machine learning models in Python to analyze and detect patterns of play and their success and failure percentages for Roland Garros (the French Open) and the Australian Open to improve user experience for media, players, and coaches
    • Supported the creation of AI commentary through natural language generation in Python, providing engaging point-by-point descriptions of Roland Garros and Australian Open matches to improve the user interfaces for tennis fans

    Location

    San Francisco

  • Student Researcher, UCLA Department of Statistics (January 2020 - June 2020)

    • Developed automated, interactive lecture notes for undergraduate statistics students
    • Implemented the learnr package (interactive R code, automated review questions in R) under Dr. Michael Tsiang

    Location

    Los Angeles