- 🎓 Recent graduate with a Master’s from Columbia University and a Bachelor’s in Mathematics with Co-ops from the University of Waterloo
- 💼 Industry experience as a Data Engineer across multiple roles, with hands-on work building data pipelines, implementing real-time streaming, and integrating cloud services
- Programming Languages: Python, Java, SQL, Bash
- Cloud & Infrastructure: AWS, GCP, Kafka, Airflow, Docker
- Data Technologies: Spark, Hadoop, PostgreSQL, MongoDB, Redshift, BigQuery, Snowflake, dbt
🕸️ Wikipedia Profile Pipeline
- Built a Kafka-based data pipeline that crawls Wikipedia pages, extracts person-related information using LLMs, and stores the enriched data in MongoDB and Elasticsearch for search and analytics.
🤖 Custom GPT NL2SQL
- Built a Streamlit app that turns natural language into SQL using GPT and executes it on your database.
- AWS Certified Cloud Practitioner
- 🧗 Rock climbing enthusiast
- 🎾 Enjoy playing tennis on weekends
📫 Let’s connect!
Feel free to reach out on LinkedIn or email me at yuhan.xie@hotmail.com. Thanks for stopping by!