Ivan Lai
Machine Learning, Data Science & Visualization
Professional Projects
[Python, PyTorch, NumPy, Pandas, GeoPandas, Scikit-learn, XGBoost, FBProphet, NLTK, spaCy, Gensim, NetworkX]
​
-
Machine Learning Engineering (NLP / Computer Vision):
-
Built an image annotation pipeline with a CLIP model as the backbone and Faster/Mask R-CNN as the object detector/segmenter;
-
Created a duplicate image detection Dash app with imagehash and t-SNE;
-
Developed NLP-based models and a pipeline to match and group similar products offered by different merchants;
-
Wrote a bespoke Named-Entity Recognition library to identify product features.
-
Time-series forecasting using a wide range of ML techniques, including Lasso regression, SVM, random forests, and gradient boosting methods, combined with feature engineering.
Personal Projects
[PyTorch, TensorFlow, Transformers, OpenCV, Pillow, Scikit-image, Albumentations]
​
Conditional text generation from a given title and keywords by fine-tuning GPT-2: Medium article, Colab notebook.
​
Twitter sentiment analysis on US Election 2020:
-
Article on Medium;
-
Corresponding Colab notebook on fine-tuning RoBERTa on TPUs.
​
Proof of concept and feasibility study for a prospective client:
-
Performed classification and sentiment analysis by fine-tuning RoBERTa on Google TPUs.
​
Data Visualization
[Matplotlib, Seaborn, Dash, Plotly, Kepler.gl, Yellowbrick]
​
-
Plotly web app showing average house prices in England and Wales by postcode sector: https://ukhouseprice.project-ds.net/
-
A Kaggle notebook: Taxi Trips as Air Distance – Animation.
​
Algo-Trading
[MATLAB, Julia, Java]
​
-
Developed a Java-based proprietary algo-trading system to trade on Interactive Brokers, with bespoke models developed and back-tested in Python and Julia.
-
High-frequency algo trading in spot FX on EBS and Reuters with an average daily volume of EUR 6 billion, using proprietary models developed in MATLAB.
​
​
Kaggle
​
Competition contributor:
-
Top 10% in Quora Question Pairs NLP competition;
-
Top 14% in Data Science Bowl 2017 on lung cancer detection from medical images;
-
Top 12% in Amazon from Space satellite data image classification competition.
​
​
​