David Koleczek

davidkoleczek at gmail dot com | LinkedIn


I am currently working as a Data Scientist at MassMutual in the Data Science Development Program. I am also a graduate student in Computer Science at UMass Amherst's College of Information and Computer Sciences. I graduated with a Bachelor’s in CS, also from UMass, in May 2020.

I did a lot of work in the energy forecasting domain during my 2+ years at ISO New England. It was an awesome experience to learn how complicated a process it is to make sure the lights stay on 24/7/365, and how profoundly demand forecasts affect grid operations. On top of that, I grew to love the freedom I was given to experiment with many different parts of data science, which culminated in techniques that I hope will not only keep our forecasts state of the art but could also be extended to many other problems in data science.

I enjoy reading and keeping up with how others tackle problems using DS, ML, etc. This inspired me to make ML Twitter Feed as a way to help me aggregate related content from Twitter. Since making it, it’s turned into a platform for me to try out all the cool things I have learned at school: creating a software system with many components, frontend web dev, DBA, designing and implementing algorithms, search, visualization, and honestly the list could be endless.

Currently I’m working on adding search to ML Twitter Feed. I’ve also recently learned I really like writing articles on things I’ve worked on and have a few more ideas I want to explore!


Experience

MassMutual (2020-Present)

Data Scientist - Data Science Development Program

ISO New England (2017-2020)

Data Science Intern in the IT Day-Ahead Support team where I worked on short-term energy demand forecasting.

In the fall of 2017 I started my first internship at ISO New England, New England's non-profit electric grid operator. ISO-NE is responsible for operating the bulk electric grid for the six New England states, designing and running wholesale energy markets, and planning to ensure electricity needs will be met over the next ten years. The team I worked in, IT-EMS Day-Ahead Support, is responsible for the development and maintenance of software that the operators running the power grid use every minute. This includes software that lets participants in the energy markets make bids, solves for the most economical way to schedule power generation, and forecasts electricity demand as well as wind and solar generation.

My work was mostly dedicated to short-term energy demand forecasting. The power system operators need as accurate a picture as possible of the amount of generation that needs to be scheduled for the upcoming week. The forecast for the immediate next day is the most critical, as it is used as input to our energy markets to make sure there is enough generation purchased and available to meet upcoming electric demand. An accurate forecast is critical both for ensuring grid reliability and for keeping costs down for the people of New England.

I developed a new machine learning system to forecast energy demand for the next seven days. Currently the solution uses LightGBM as the underlying ML framework, but it also employs several other tricks to squeeze out as much accuracy as possible. A few examples are correcting errors temporally (similar to a moving average) and upweighting certain critical instances. This project also included the software engineering work of creating a system that integrates with our existing databases and is completely reliable day to day.
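To give a feel for what correcting errors temporally can look like, here is a minimal sketch in Python. It is an illustration under my own simplifying assumptions, not the production method: the function name `moving_average_correction`, the `window` parameter, and the choice to shift each forecast by the plain mean of recent residuals are all hypothetical.

```python
from collections import deque


def moving_average_correction(forecasts, actuals, window=3):
    """Shift each forecast by the mean of the most recent residuals.

    Hypothetical illustration: a residual is (actual - raw forecast), and
    only residuals from earlier time steps are used for each correction.
    """
    recent = deque(maxlen=window)  # most recent residuals, oldest dropped first
    corrected = []
    for forecast, actual in zip(forecasts, actuals):
        # Average recent error; no correction until a residual is available.
        bias = sum(recent) / len(recent) if recent else 0.0
        corrected.append(forecast + bias)
        # The actual value only becomes known after the forecast was issued.
        recent.append(actual - forecast)
    return corrected


raw = [100, 100, 100, 100]
observed = [110, 110, 110, 110]
corrected = moving_average_correction(raw, observed, window=2)
```

In this toy example the raw forecast is consistently 10 units too low, so after the first step the correction moves every later forecast up to the observed level.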

I created a web app using RShiny to make it as easy as possible to view our data and forecasts, quickly analyse their relative performance, and provide operational metrics. It highlights how much of our error comes from solar PV forecasts and includes interactive demos of how several of the techniques used in the model impact the final output.

I also worked on several experimental projects to help streamline or improve our processes. One was trying to quantify the uncertainty in our forecasts. Just as with any other ML system, the load forecast will never be 100% accurate. However, could we derive bounds or intervals that we can expect the forecast to fall within most of the time, or at least classify a day as likely to have high error? To answer this question, I worked on a project that broke down where errors come from into two components. The first is the error that comes from the difference between the forecasted weather and what the weather actually ends up being. The second is the error inherent to the model itself. The motivation is that instances/days with very hot temperatures are likely to have higher errors than days with mild temperatures. So to get an estimate of model error, the general idea was to condition the dataset on the new sample and look at the errors of the resulting historical samples. With these error estimates combined, we have the desired metric, which could be used to compare instances to each other and see whether power system operations should be more wary of the upcoming load forecast.
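As a rough illustration of the model-error half of that idea, the sketch below conditions a set of historical samples on a new day's temperature and reads an empirical quantile off their past errors. Everything specific here is an assumption made for the example: the function name `conditional_error_bound`, conditioning on temperature alone, the `tolerance` window, and the 90% quantile. The weather-error component is not modeled.

```python
def conditional_error_bound(history, new_temp, tolerance=5.0, quantile=0.9):
    """Estimate an error bound from historical days similar to the new one.

    history: list of (temperature, absolute_error) pairs (hypothetical data).
    Returns the empirical quantile of errors among days whose temperature is
    within `tolerance` of `new_temp`, or None if no such days exist.
    """
    similar = sorted(
        err for temp, err in history if abs(temp - new_temp) <= tolerance
    )
    if not similar:
        return None
    # Index of the requested quantile, clamped to the last element.
    index = min(len(similar) - 1, int(quantile * len(similar)))
    return similar[index]


# Made-up numbers: hot days carry larger errors than mild days.
history = [(95, 900), (92, 700), (93, 800), (70, 200), (68, 150), (72, 250)]
hot_bound = conditional_error_bound(history, new_temp=94)
mild_bound = conditional_error_bound(history, new_temp=70)
```

With this made-up history, the bound for a hot day comes out larger than the bound for a mild day, which is the kind of comparison between instances described above.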


Projects

ML Twitter Feed mlfeed.tech

ML Twitter Feed is the place to get automatically curated tweets from the biggest names in the machine learning community on Twitter. I created ML Twitter Feed because I got tired of how much junk I had to sift through just following 40-50 people who are well-known names in the ML community. At its core, MLTF fetches tweets from a home timeline on Twitter and uses relevance labels to train a neural net. A Flask server then provides a RESTful API that a React frontend uses to display the relevant tweets at mlfeed.tech. The relevant tweets are also retweeted by @dave_co_dev, give it a follow!

Viking Patrol

A final project for CS 590G. This is a tower-defense game I created with Robert Jewell and Nicholas Sichalov. You can play it by downloading the Windows executable from this drive link (recommended) or play in your browser with WebGL (which has some issues).


Articles

Search Engines and Information Retrieval: Applications for Twitter
  August 30, 2020

How I implemented a search engine for mlfeed.tech

Adaptive Weighting
  June 10, 2020

Ensembling Forecasts AdaBoost-Style and More

Moving Average Correction
  May 21, 2020

A Method to Account for Temporal Errors in Forecasting with Machine Learning


What's on my Bookshelf?

Shorter, but well-worthy reads

Books

Stack of Papers