David Koleczek

davidkoleczek at gmail dot com | LinkedIn


I am an Applied Scientist II at Microsoft working on AI projects for Microsoft Loop! I have a Master's in Computer Science with a concentration in Data Science from the University of Massachusetts Amherst.

My passion is to always be building things and learning. One of my recent projects, not-again-ai, is an open-source Python package collecting functionality that comes up again and again when developing AI projects. Check it out on GitHub!

News
  • I open-sourced evaluate-ai, a tool for easily developing custom evaluations for language models.
  • June 10, 2024: I started working on AI projects in Microsoft Loop within our Office product group.
  • January 28, 2023: Our Ludum Dare 52 game placed in the top 20 for both innovation and fun out of over 1,000 submissions.

Experience

Microsoft (2022-Present)

Applied Scientist II - Microsoft Loop (June 2024 - Present)

Applied Scientist II - Microsoft AI Development Acceleration Program (July 2022 - June 2024)

MassMutual (2020-2022)

Data Scientist - Security & Fraud, Data Science Development Program (Feb - Jul 2022)

Data Scientist - Investments & Finance, Data Science Development Program (Jun 2020 - Feb 2022)

More Details

I worked on a custom Barra-style factor risk model for MM Funds. My primary contribution was implementing active risk decomposition. For those unfamiliar with institutional investing, factors are quantifiable characteristics that can explain differences in returns; one can think of them as playing a role similar to machine learning features. A factor model computes each asset's exposure to each factor and combines those exposures with the covariances between factors to estimate the covariance between assets. Portfolio risk can then be decomposed according to these individual exposures and asset covariances. Business users use these insights to ensure funds are aligned with their stated objectives and to provide oversight of portfolio manager decisions.
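The math behind this kind of decomposition can be sketched in a few lines. The numbers below are made up for illustration, and this is the generic textbook factor-model formulation, not the actual production model:

```python
import numpy as np

# Hypothetical example: 4 assets, 2 factors.
B = np.array([[1.2, 0.3],    # each row: one asset's exposures to the factors
              [0.8, -0.1],
              [0.5, 0.9],
              [1.0, 0.4]])
F = np.array([[0.04, 0.01],  # factor covariance matrix
              [0.01, 0.09]])
D = np.diag([0.02, 0.03, 0.01, 0.05])  # asset-specific (idiosyncratic) variances

# Asset covariance implied by the factor model: Sigma = B F B^T + D
Sigma = B @ F @ B.T + D

# Active risk: variance of the portfolio's deviation from its benchmark
w_port = np.array([0.30, 0.20, 0.25, 0.25])
w_bench = np.array([0.25, 0.25, 0.25, 0.25])
w_active = w_port - w_bench

active_var = w_active @ Sigma @ w_active
factor_var = w_active @ (B @ F @ B.T) @ w_active  # risk explained by factors
specific_var = w_active @ D @ w_active            # idiosyncratic remainder
```

By construction the factor and specific components sum exactly to the total active variance, which is what lets business users see how much of a fund's deviation from its benchmark comes from each factor bet.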

Concurrently, I led a cross-functional effort to migrate the risk system from RShiny to a robust Django-based framework. Our goals for this successful project were to facilitate the development of advanced functionality that makes the most of model results, promote collaboration with software engineers, and let team members focus on what they do best.

My first project involved working with Investment Management, the team responsible for making decisions about the hundreds of billions of dollars we have under management. A typical insurance company invests in the corporate bonds of many public companies. However, not all of these companies are able to make good on their principal or coupon payments over a bond's term. Our goal was to predict which of over a thousand companies were at the highest risk of default, so appropriate action could be taken. My role in this project was two-fold. The first part was to refresh the model with new default events and ensure that it correctly identified all companies that had defaulted in the recent past. The second was to improve the methods that take the ranked list the model outputs and pick the subset deemed most at risk. I created an algorithm that considers leading economic indicators, such as credit spreads, to adjust the companies on the list based on the state of the economy.

ISO New England (2017-2020)

Data Science Intern - IT Day-Ahead Support

More Details

In the fall of 2017 I started my first internship at ISO New England, New England's non-profit electric grid operator. ISO-NE is responsible for operating the bulk electric grid for the six New England states, designing and running the wholesale energy markets, and planning to ensure electricity needs will be met over the next ten years. The team I worked in, IT-EMS Day-Ahead Support, is responsible for developing and maintaining the software that power grid operators use every minute. This includes software that allows participants in the energy markets to submit bids, solves for the most economical way to schedule power generation, and forecasts electricity demand, wind generation, and solar generation.

My work was mostly dedicated to short-term energy demand forecasting. Power system operators need as accurate a picture as possible of the amount of generation that needs to be scheduled for the upcoming week. The forecast for the immediate next day is the most critical, as it is used as input to our energy markets to ensure enough generation is purchased and available to meet upcoming electric demand. An accurate forecast is critical both for grid reliability and for keeping costs down for the people of New England.

I developed a new machine learning system to forecast energy demand for the next seven days. The solution uses LightGBM as the underlying ML framework, but also employs several other tricks to squeeze out as much accuracy as possible. A few examples are correcting errors temporally (similar to a moving average) and upweighting certain critical instances. The project also included the software engineering work of creating a system that integrates with our existing databases and runs reliably day to day.
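The temporal error correction can be illustrated with a minimal sketch: shift the newest forecast by the average signed error of the last few days, so a recent systematic bias gets cancelled. The data and window size below are invented for illustration; the production system is more involved:

```python
import numpy as np

def correct_forecast(new_forecast, past_forecasts, past_actuals, window=7):
    """Shift the new forecast by the mean signed error (actual - forecast)
    over the most recent `window` days, removing recent systematic bias."""
    errors = np.asarray(past_actuals) - np.asarray(past_forecasts)
    return new_forecast + errors[-window:].mean()

# Toy data (MW): the model has recently under-forecasted by roughly 100 MW
past_forecasts = [14900, 15100, 15000, 14800, 15200, 15050, 14950]
past_actuals   = [15000, 15210, 15090, 14910, 15290, 15160, 15060]

corrected = correct_forecast(15000, past_forecasts, past_actuals)
```

The correction nudges the 15,000 MW raw forecast upward by the recent average miss, which is where the similarity to a moving average comes from.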

I also worked on several experimental projects to streamline or improve existing processes. One was quantifying the uncertainty in our forecasts. As with any other ML system, the load forecast will never be 100% accurate. However, could we derive bounds or intervals that the forecast can be expected to fall within most of the time, or at least classify a day as likely to have high error? To answer this question, I worked on a project that broke errors down into two components. The first is the error that comes from the difference between the forecasted weather and the weather that actually occurs. The second is the error inherent to the model itself. The motivation is that days with very hot temperatures are likely to have higher errors than days with mild temperatures. To estimate model error, the general idea was to condition the historical dataset on the new sample and look at the errors of similar historical samples. Combining these two error estimates gives the desired metric, which can be used to compare instances to each other and indicate whether power system operations should be more wary of an upcoming load forecast.
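The model-error component can be sketched as a nearest-neighbor conditioning on temperature: look up the historical days most similar to the new day and average their errors. The synthetic data, error model, and neighbor count below are all hypothetical, chosen only to illustrate the idea:

```python
import numpy as np

def conditional_error_estimate(new_temp, hist_temps, hist_abs_errors, k=50):
    """Estimate expected forecast error for a new day by averaging the
    absolute errors of the k most similar historical days by temperature."""
    hist_temps = np.asarray(hist_temps)
    hist_abs_errors = np.asarray(hist_abs_errors)
    nearest = np.argsort(np.abs(hist_temps - new_temp))[:k]
    return hist_abs_errors[nearest].mean()

# Synthetic history: hotter days tend to have larger forecast errors
rng = np.random.default_rng(0)
temps = rng.uniform(0, 100, size=500)              # degrees F
errors = 50 + 4 * temps + rng.normal(0, 20, 500)   # absolute error in MW

hot_day_err = conditional_error_estimate(95, temps, errors)
mild_day_err = conditional_error_estimate(60, temps, errors)
```

Under this toy error model the estimate for a 95-degree day comes out well above the estimate for a mild day, which is exactly the kind of signal that would warn operators to be more wary of an upcoming forecast.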


Publications

UMass PCL at SemEval-2022 Task 4: Pre-trained Language Model Ensembles for Detecting Patronizing and Condescending Language | Arxiv
David Koleczek, Alex Scarlatos, Siddha Karakare, Preshma Linet Pereira
The 16th International Workshop on Semantic Evaluation (SemEval-2022)

Abstract

Patronizing and condescending language (PCL) is everywhere, but rarely is the focus on its use by media towards vulnerable communities. Accurately detecting PCL of this form is a difficult task due to limited labeled data and how subtle it can be. In this paper, we describe our system for detecting such language which was submitted to SemEval 2022 Task 4: Patronizing and Condescending Language Detection. Our approach uses an ensemble of pre-trained language models, data augmentation, and optimizing the threshold for detection. Experimental results on the evaluation dataset released by the competition hosts show that our work is reliably able to detect PCL, achieving an F1 score of 55.47% on the binary classification task and a macro F1 score of 36.25% on the fine-grained, multi-label detection task.

On Optimizing Interventions in Shared Autonomy | Arxiv | Code
Weihao Tan*, David Koleczek*, Siddhant Pradhan*, Nicholas Perello, Vivek Chettiar, Nan Ma, Aaslesha Rajaram, Vishal Rohra, Soundar Srinivasan, H M Sajjad Hossain^, Yash Chandak^. *Equal contribution, ^Equal advising
Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2022)

Abstract

Shared autonomy refers to approaches for enabling an autonomous agent to collaborate with a human with the aim of improving human performance. However, besides improving performance, it may often also be beneficial that the agent concurrently accounts for preserving the user’s experience or satisfaction of collaboration. In order to address this additional goal, we examine approaches for improving the user experience by constraining the number of interventions by the autonomous agent. We propose two model-free reinforcement learning methods that can account for both hard and soft constraints on the number of interventions. We show that not only does our method outperform the existing baseline, but also eliminates the need to manually tune a black-box hyperparameter for controlling the level of assistance. We also provide an in-depth analysis of intervention scenarios in order to further illuminate system understanding.

Intervention Aware Shared Autonomy | PDF
Weihao Tan*, David Koleczek*, Siddhant Pradhan*, Nicholas Perello, Vivek Chettiar, Nan Ma, Aaslesha Rajaram, Vishal Rohra, Soundar Srinivasan, H M Sajjad Hossain^, Yash Chandak^. *Equal contribution, ^Equal advising
HumanAI workshop @ Thirty-eighth International Conference on Machine Learning (ICML 2021)

Abstract

Shared autonomy refers to approaches for enabling an autonomous agent to collaborate with a human with the aim of improving human performance. However, besides improving performance, it may often be beneficial that the agent concurrently accounts for preserving the user’s experience or satisfaction of collaboration. In order to address this additional goal, we examine approaches for improving the user experience by constraining the number of interventions by the autonomous agent. We propose two model-free reinforcement learning methods that can account for both hard and soft constraints on the number of interventions. We show that not only does our method outperform the existing baseline, but also eliminates the need to manually tune an arbitrary hyperparameter for controlling the level of assistance. We also provide an in-depth analysis of intervention scenarios in order to further illuminate system understanding.


Projects

evaluate-ai
  (June 2024 - Present)
| Github

evaluate-ai is a tool for easily developing custom evaluations for language models.

not-again-ai
  (April 2022 - Present)
| Github

not-again-ai is a collection of various functionalities that come up over and over again when developing AI projects. It is designed to be simple and minimize dependencies first and foremost.

mlfeed.tech | @mlfeedtech
  (May 2019 - March 2023)

An NLP-powered web application to automatically curate tweets from the machine learning community on Twitter.

More Details

I created mlfeed to solve the problem of wanting the latest ML news without sorting through other noise. Currently the Twitter account has over 300 followers and over 500 average monthly profile visits. It is composed of four primary components:

  • A data ingestion pipeline fetches raw tweet data from a Twitter account that follows other machine learning accounts. It also includes a web scraper to gather relevant information from URLs embedded within tweets.
  • Ingested tweets are scored using a fine-tuned RoBERTa model trained with PyTorch, bootstrapped by unsupervised labeling techniques using Snorkel. The model is fine-tuned on a couple thousand manually generated labels.
  • Tweet relevance scores are fed into an adaptive queuing system to ensure the best tweets are retweeted on a regular schedule.
  • Alongside retweeting tweets directly on Twitter @mlfeedtech, I developed a web UI using React to display the twitter feed at mlfeed.tech.
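The queuing step can be sketched as a priority queue ordered by relevance score, where the scheduler repeatedly pops the best-scoring tweet at each retweet slot. This is a generic illustration of the idea, not the actual system (the names and scores are made up):

```python
import heapq

class TweetQueue:
    """Priority queue of scored tweets; pop_best returns the highest-scoring one."""

    def __init__(self):
        self._heap = []

    def push(self, tweet_id, score):
        # heapq is a min-heap, so negate the score to get max-heap behavior
        heapq.heappush(self._heap, (-score, tweet_id))

    def pop_best(self):
        neg_score, tweet_id = heapq.heappop(self._heap)
        return tweet_id, -neg_score

q = TweetQueue()
q.push("tweet_a", 0.91)
q.push("tweet_b", 0.42)
q.push("tweet_c", 0.77)
best = q.pop_best()  # highest-scoring tweet first
```

An adaptive variant could re-score or decay queued tweets over time so the schedule stays fresh, but the core data structure is the same.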

Game Development

Soul Food
  (April 2024)
| Ludum Dare, itch.io

Soul Food is a game where you summon monsters to satiate your hungry customers! The game was made with a team of collaborators for Ludum Dare 55.

Societies Stranding
  (October 2023)
| Ludum Dare, itch.io

Societies Stranding is a physics-based game where you pilot a spaceship in an open world, trying to deliver people to safe planets! The game was made in collaboration with Matthew Clinton for Ludum Dare 54.

Courier Crusaders
  (May 2023)
| Ludum Dare, itch.io

Courier Crusaders is an RPG management game I made with Matthew Clinton for Ludum Dare 53.

Barn Busters
  (January 2023)
| Ludum Dare, itch.io

Barn Busters is a physics-based tower defense game inspired by Fall Guys that I made with Matthew Clinton for Ludum Dare 52. We placed in the top 20 for both innovation and fun, and in the top 10% overall (out of over 1,000 submissions)!

Big Block Mode
  (October 2022)
| Ludum Dare, itch.io

Big Block Mode is another take on the classic tetromino puzzler I made with Matthew Clinton for Ludum Dare 51. We placed 67th for Innovation and in the top 20% overall!

Cabbage Crashers
  (April 2022)
| Ludum Dare, itch.io

Cabbage Crashers is a cabbage farming simulation game I made with Matthew Clinton for Ludum Dare 50.

Viking Patrol
  (May 2020)

Viking Patrol is a tower-defense game. It was a final project for the class CS 590G that I created with Robert Jewell. You can play it by downloading the Windows executable from this drive link (recommended) or play in your browser with WebGL.


Articles

What's Happening in AI!? Hierarchical Topic Analysis for Artificial Intelligence Tweets
  December 7, 2021

A project I worked on for CS682 Neural Networks on topic modeling with AI-related tweets. The plan is to integrate this with mlfeed.tech.

Search Engines and Information Retrieval: Applications for Twitter
  August 30, 2020

How I implemented a search engine for mlfeed.tech

Adaptive Weighting
  June 10, 2020

Ensembling Forecasts AdaBoost-Style and More

Moving Average Correction
  May 21, 2020

A Method to Account for Temporal Errors in Forecasting with Machine Learning