Master's student at

Cornell Tech

Dual MS in Applied Information Science, Information Systems - Urban Tech
Currently focused on
not breaking this website
Top languages

    Python

    PostgreSQL

    JavaScript

Main tools
  • Pandas
  • Scikit-learn
  • Plotly (D3)
About Me
Most recently a Data Science Intern at Square (Block, Inc.), I'm currently pursuing an MS in Applied Information Science at Cornell. I graduated from Princeton in 2019 with a BSE in Chemical & Biological Engineering, and was a Senior Data Scientist at CKM Analytix prior to Cornell. This website serves as both a portfolio and a sandbox where I write about data science in the context of urban technologies and sustainability.
Projects

Predicting Service Level Agreement (SLA) Violations for NYC 311

Classifying the likelihood that government agencies meet their response time commitments for NYC 311 complaints. 311 is a 'hotline' that users can dial to submit service requests for government agencies to address.

  • Status: live
  • Python
  • Plotly
  • Scikit-learn
  • XGBoost
  • SQL (SoQL)

Fine-Tuning a Vision Transformer for Detection and Classification of AI-Generated Images

Identifying whether an image is authentic (created by a human) or generated by one of a series of text-to-image AI generators (i.e., Stable Diffusion, Midjourney, and DALL-E) by fine-tuning a pre-trained vision transformer (Swin-Tiny) to tackle a multiclass classification problem.

  • Status: live
  • Python
  • PyTorch
  • Hugging Face

Event Planning App

Web application to create and track details of upcoming events and vacations.

  • Status: live
  • JS
  • Express
  • React
  • Tailwind

Serverless Scraping of Solar Irradiance Data

The NREL Measurement and Instrumentation Data Center (MIDC) tracks granular solar and meteorological trends down to the minute at a station in Golden, CO. This automated crawler scrapes the data into a Postgres instance by leveraging AWS resources.

  • Python
  • PostgreSQL
  • Scrapy
  • Serverless
  • AWS Lambda
  • AWS RDS

DFFNN Binary Classifier

Implementation of a generic deep feedforward neural network (DFFNN), or multi-layer perceptron (MLP), to tackle binary classification problems. Collated from a series of assignments from Andrew Ng's (deeplearning.ai) Deep Learning Specialization.

  • Status: live
  • Python
  • Neural Networks

K-Means Clustering of Solar Irradiance Profiles (Undergraduate Thesis)

Modelling the performance of near-UV organic solar cells under meteorologically representative spectral irradiances. Data was queried from the NSRDB (National Solar Radiation Database) API maintained by NREL.

  • Status: live
  • Python
  • k-means clustering
  • Scikit-learn
  • Pandas
Work history

Square | Block, Inc.

Data Science Intern, Square Appointments
Summer 2023

Urban Tech Hub | Jacobs Technion-Cornell Institute

Urban Tech Hub Researcher
September 2022 - May 2023

CKM Analytix

Data Scientist | Senior Data Scientist
September 2019 - June 2022

Environmental Defense Fund

Midwest Clean Energy Big Data Intern
Summer 2018

Ruhr University Bochum

Research Internship Program (REACH)
Summer 2017

Princeton Organic and Polymer Electronics Laboratory

Princeton Environmental Institute (PEI) Internship
Summer 2016
© 2024 Kim Sha