Data is beautiful and there's so much to learn.
Hi, my name is Casper and I have a passion for data & programming and a love for mathematics. Python is my language of choice for almost everything, though I do not fear treading into murkier waters to achieve my goals. With a work-hard, get-stuff-done attitude, I like to throw myself into challenges to learn and evolve. Also, I can type pretty fast.
Currently experimenting with a self-hosted dynamic Hadoop cluster to learn more about using Dask at scale on the Reddit comments data (>1 TB of pure JSON) link.
Professional (only relevant)
Full-time Data Scientist / Consultant @ Codebeez November 2019 - Current
As a consultant contractor for Codebeez, I help other companies develop their data science Python programs in short-term, high-stakes projects. For my first client, I landed a Data Scientist role at CoNet, where I continue to lead the development of the data science solutions they build for their clients. These solutions consist first and foremost of predicting sensor values from historical data, but also include optimization work with heuristics and predictive maintenance with survival models.
Full-time Data Scientist / Data Engineer / Consultant @ KPN ICT Consulting May 2017 - November 2019
In my previous job as Data Engineer / Scientist at KPN ICT Consulting, I developed and delivered multiple best practices for an end-to-end data science solution, which have been adopted by the Data & Analytics team of consultants. Examples include introducing Apache Airflow for our scheduling tasks, introducing Dask for scaling our Python analysis operations, and writing our preprocessing code as scikit-learn Transformers to keep it clean. I led multiple projects as data engineer and scientist in which we implemented end-to-end machine learning solutions, from intakes on which datasets are available all the way to generating insights from the forecasts, mostly in the domains of predictive maintenance and customer churn. Here I leveraged a range of models, including Markov chains, XGBoost survival models, Random Forest classifiers and time series models. A recent favorite library is SHAP, which I use in combination with similar techniques to extract insights from machine learning models. While I am an enthusiast for the technical parts, I do understand the business and its needs, and I love to translate these into concrete solutions. Creating a business case around a data science question is a daunting challenge, but one I love to think about.
Nominated Star of the Year 2018: Best Technician
Data Scientist / Data Engineer @ Ymere May 2019 - Current (Project for KPNIC)
Retrieve, connect and streamline multiple data sources in a data lake in order to feed the reporting dashboards and the machine learning pipeline. In addition, I support the validation of the machine learning model and the selection and design of the optimization algorithm.
Data Scientist / Data Engineer @ Dutch Government October 2017 - Current (Project for KPNIC)
Technical lead in the data engineering team. Responsible for delivering well-prepared tables that colleagues use for further insights, and for making sure a pipeline is available to refresh these tables. When not busy preparing tables and keeping the pipeline alive, I support the data science team by setting up systems (Dask) to scale their operation and by testing new models.
Data Scientist / Data Engineer @ KPN February 2019 - May 2019 (Project for KPNIC & Graduation)
Combined several data sources with network information, weather information and event data to create a model that provides an accurate forecast of which node will generate an alarm and what the root cause of this alarm is. Used an XGBoost survival model in combination with Markov chains to predict the relative hazard per node. pdf link
Data Scientist / Data Engineer @ Consolidated October 2018 - February 2019 (Project for KPNIC)
Implemented and improved the PoC for production use. Using Docker, Airflow and AWS EC2 instances, I led the deployment of the model in a production-ready environment. In addition, I improved the accuracy of the prediction model.
Data Scientist / Data Engineer @ KPN Internedservices June 2017 - June 2018 (Project for KPNIC)
Worked in a team of 5 to deliver a production-ready data lake solution that links a large number of diverse data sources to provide insights and forecasts on a daily basis. Responsible for the design and implementation of the data lake, as well as the forecasts. Forecasted customer churn and customer NPS. Built exclusively in Python.
Lead Software engineer / Data Scientist @ KPN ICT Consulting October 2017 - June 2018 (Project for KPNIC)
Pitched the Timelord project and led a team of 5 on a one-day-a-week basis. Provided guidance and design input for the team members and contributed most of the core and utilities. Also spent a considerable amount of time researching multivariate time series models.
Data Scientist / Data Engineer @ Ymere June 2017 - November 2017 (Project for KPNIC)
Advised on improving the forecast accuracy by refining the features and performing dimensionality reduction. The goal was to predict which specific roof will leak in a given month, as a proof of concept. This PoC will be vastly expanded in functionality and brought into production, with me in the role of advising data engineer.
Data Scientist / Data Engineer @ Telfort June 2017 - October 2017 (Project for KPNIC)
Refined the model design to improve the accuracy of forecasting customer satisfaction after customer support calls. Delivered actionable insights by fitting an appropriate model and extracting valuable features from complicated log files. Expanded the existing pipeline and made it production-ready to run on an hourly basis.
Full-time Data Engineer / Scientist @ CashRocket / SporeBI March 2016 - May 2017
Responsible for the architecture of the ETL that pulls the bookkeeping records of hundreds of SMEs through the Exact Online API for constantly up-to-date insights. These insights consisted of numerous KPIs, but most notably of several time series models in an ensemble configuration (combining RNNs and classical time series models) and clustering methods (ranging from K-means to Dynamic Time Warping). Built entirely in Python and R, using libraries such as forecast (R), TensorFlow (Python) and scikit-learn (Python).
Part-time Team Leader @ Albert Heijn 2010-2011
The go-to person in a team of 10 to 20 members. Responsible for making sure everyone did their job properly and for general troubleshooting.
|Applied Mathematics||The Hague University of Applied Sciences||2014 - 2019|
|Mechanical Engineering||The Hague University of Applied Sciences||2012 - 2013|
9th Machine Learning NL Meetup hosted by KPN ICT Consulting link
At KPN ICT Consulting we're often faced with the challenge of scaling deployments of Python analytics code. In this talk I will explain what Dask is, how we use it and, more importantly, how you can use it to scale your existing Python analytics code.
|Languages||Operating Systems & sorts||Other Languages (fluent)|
|Python, Bash, R, HCL (Terraform), Ansible, SQL, VBA, .NET, C||Windows, Linux (Ubuntu, CentOS), Hadoop+||Dutch, English|
Personal projects (P) & Quick hacks (Q) & Study projects (S) & Open Source (O)
- (S) Predicting the relative risk of failure of a node in a telecom network
- (O) Dask DataFrame dtype inference consolidation and rewrite (In Progress)
- (P) Terraform, Terraform-Inventory, Ansible adventures for on-demand clusters and environments (write-up upcoming)
- (O) OpenStreetMap & Nominatim as a single Docker container for offline use (write-up upcoming)
- (P) Race data video analysis pipeline (In Progress) link
- (O) Timelord, a framework for forecasting probability time series of binary variables. link
- (Q) A brief analysis of a Reddit AMA link
- (P) True 3D sound in Python: link
- (P) Tor & Polipo proxies for Scrapy: link
- (Q) Toying around with a Genetic Algorithm: link
- (P) Self-written crawler framework: link
- (Q) Reading back-numbers of cyclists: link
- (S) An initial TSP solution: link (solo study project)
- (Q) A YouTube channel downloader: link
- (Q) A simple WhatsApp chat analysis: link
- (P) Titanic & Whale recognition Kaggle challenges link
- (P) Automatic number plate recognition (ANPR) in Python link
- Auto racing YouTube channel
- Reading Wikipedia
Contact me on: САЅPЕR [at] needsmore.xyz