Time series (TS) forecasting is notoriously finicky. That is, until now.

In 2019, Amazon’s research team published a deep learning method called DeepAR that exhibits a ~15% accuracy boost relative to state-of-the-art TS forecasting models. It’s robust out of the box and can learn from many related time series, so if you have lots of choppy data, DeepAR could be an effective solution.

From an implementation perspective, DeepAR is more computationally complex than other TS methods. It also requires more data than traditional TS forecasting methods such as ARIMA or Facebook’s Prophet.

**That being said, if you have lots of complex data and…**

On my journey towards writing 52 posts about data science, I realized I don’t know how to read a paper. These are my notes…

After writing my 5th post, I realized that I was spending too much time reading papers. I’m not super proud to admit it, but I spent ~8 hours just understanding the method. So, over the weekend I did some research on how to read a scientific paper.

The most common piece of advice is that you shouldn’t read papers linearly, i.e. from start to finish. Instead, you should start high-level and just read the abstract, introduction, and…

When running A/B tests, we often try to improve long-term metrics. However, to properly measure a long-term metric’s impact requires long experiment durations.

To tackle this problem, researchers at LinkedIn published a 2019 paper that outlines a method for replacing long-term metrics with predictions using short-term covariates.

The prediction method is left to the user; however, the paper outlines requirements that ensure statistical validity. Relative to other methods covered in this series, surrogate metrics are quite labor intensive because you must develop a robust forecasting model. …
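To make the idea concrete, here’s a minimal sketch of the surrogate-metric workflow. The data, covariate names, and simple linear model are all hypothetical stand-ins — the paper leaves the choice of prediction method to the user:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical experiments: short-term covariates (e.g. week-1
# clicks, sessions, visits) alongside the observed long-term metric
# (e.g. 6-month retention) we'd rather not wait for.
X_hist = rng.normal(size=(500, 3))
y_hist = X_hist @ np.array([0.5, 0.3, 0.2]) + rng.normal(scale=0.1, size=500)

# Fit the surrogate model on historical data (ordinary least squares here).
X1 = np.column_stack([np.ones(len(X_hist)), X_hist])
beta, *_ = np.linalg.lstsq(X1, y_hist, rcond=None)

# In a new, short experiment we only observe short-term covariates; the
# surrogate metric is the model's prediction of the long-term metric.
X_new = rng.normal(size=(100, 3))
surrogate = np.column_stack([np.ones(len(X_new)), X_new]) @ beta
```

In practice the model would be validated against the paper’s statistical requirements before its predictions replace the long-term metric in an A/B readout.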

Statistics is based on assumptions. If those assumptions become invalid, all conclusions based on those assumptions likewise become invalid.

“All models are wrong but some are useful” — George E.P. Box

An assumption-lean approach proposed by statisticians at the University of Pennsylvania outlines how to develop more defendable conclusions from our data. The method describes a new language for interpreting model coefficients, as well as confidence interval calculations that remain valid even when the model is misspecified.

While there is nuance in the method, it’s very computationally efficient and straightforward to implement.

**In practice, guaranteeing a correctly specified model is very difficult…**
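One widely used assumption-lean tool in this spirit is the sandwich (heteroskedasticity-consistent) variance estimator, which stays valid when the model’s error assumptions fail. A numpy sketch on simulated heteroskedastic data (the data and the specific estimator choice are illustrative, not necessarily the paper’s exact formulation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data where the linear model's assumptions fail:
# noise variance grows with |x| (heteroskedasticity).
n = 2000
x = rng.uniform(-1, 1, size=n)
y = 2.0 * x + rng.normal(scale=0.5 + np.abs(x), size=n)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# Classical (model-trusting) covariance: sigma^2 * (X'X)^-1
XtX_inv = np.linalg.inv(X.T @ X)
classical_cov = resid.var(ddof=2) * XtX_inv

# Sandwich covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1
meat = X.T @ (X * resid[:, None] ** 2)
sandwich_cov = XtX_inv @ meat @ XtX_inv

classical_se = np.sqrt(np.diag(classical_cov))
sandwich_se = np.sqrt(np.diag(sandwich_cov))
```

Here the sandwich standard error for the slope comes out larger than the classical one, because the classical formula understates uncertainty when its homoskedasticity assumption is wrong.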

Generalized Linear Models (GLMs) are one of the most widely used inferential modeling techniques. Their simplicity makes them easy to interpret, so when communicating causal inference to stakeholders they’re a very effective tool.

Elastic net regularization, a widely used regularization method, is a logical pairing with GLMs — it removes unimportant and highly correlated features, which can hurt both accuracy and inference. These two methods are a useful part of any data science toolkit.

Prior to March of 2021, the combination of GLMs and elastic net regularization was fairly complex. However, researchers at Stanford released a paper that leverages *cyclic…*
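The workhorse behind glmnet-style fitting is cyclic coordinate descent with soft-thresholding. A minimal numpy sketch for the plain linear-regression case (the paper handles GLMs generally; the data here is made up):

```python
import numpy as np

def elastic_net_cd(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Cyclic coordinate descent for elastic-net linear regression.

    Minimizes (1/2n)||y - Xb||^2 + alpha*(l1_ratio*||b||_1
                                          + (1 - l1_ratio)/2*||b||^2).
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):                        # cycle through coordinates
            r = y - X @ beta + X[:, j] * beta[j]  # partial residual
            rho = X[:, j] @ r / n
            denom = (X[:, j] ** 2).mean() + alpha * (1 - l1_ratio)
            # soft-thresholding handles the L1 part of the penalty
            beta[j] = np.sign(rho) * max(abs(rho) - alpha * l1_ratio, 0) / denom
    return beta

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, 0.0, 0.0])   # only the first feature matters
beta = elastic_net_cd(X, y, alpha=0.05, l1_ratio=0.9)
```

The L1 part of the penalty zeroes out the irrelevant coefficients exactly — the feature-removal behavior described above — while the L2 part stabilizes correlated features.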

“Data preparation accounts for about 80% of the work of data scientists.” — Forbes

NLP modeling projects are no different — often the most time-consuming step is wrangling data and then developing features from the cleaned data. There are many tools that facilitate this process, but it’s still laborious.

To aid in the feature engineering step, researchers at the University of Central Florida published a 2021 paper that leverages genetic algorithms to remove unimportant tokenized text. Genetic algorithms (GAs) are evolution-inspired optimization methods that perform well on complex data, so they lend themselves naturally to NLP data. …
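A toy sketch of the GA machinery applied to token selection. Everything here is hypothetical: the fitness function stands in for what would really be a model’s validation score on the kept tokens, and the population sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)

# 20 candidate tokens; a binary mask says keep (True) or drop (False).
# Toy fitness: agreement with a "true" set of useful tokens (in practice,
# you'd retrain/score a model on the kept tokens instead).
n_tokens, pop_size, n_gens = 20, 30, 40
true_mask = rng.random(n_tokens) < 0.5

def fitness(mask):
    return (mask == true_mask).sum()

pop = rng.random((pop_size, n_tokens)) < 0.5
for _ in range(n_gens):
    scores = np.array([fitness(m) for m in pop])
    # selection: keep the fittest half as parents
    parents = pop[np.argsort(scores)[-pop_size // 2:]]
    # crossover: splice random parent pairs at a random cut point
    kids = []
    for _ in range(pop_size - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_tokens)
        kids.append(np.concatenate([a[:cut], b[cut:]]))
    pop = np.vstack([parents, kids])
    # mutation: flip each bit with small probability
    pop ^= rng.random(pop.shape) < 0.02

best = pop[np.argmax([fitness(m) for m in pop])]
```

After a few dozen generations, the best mask converges toward the useful-token set — the same selection/crossover/mutation loop the paper applies to tokenized text.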

Recently, Bayesian A/B testing has gotten lots of publicity because its methods are easy to understand and allow useful calculations, such as the probability that a treatment is better than the control. Bayesian inference also performs much better on small sample sizes; according to a 2019 Medium post, Bayesian A/B testing can reduce required sample size by 75%.

While these methods are more computationally expensive than traditional frequentist approaches, they are computed offline, which reduces performance requirements. The main challenge is choosing effective distributions to support inference.
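As an example of “choosing effective distributions,” conversion rates pair naturally with a Beta prior. A minimal sketch of computing P(treatment > control) by Monte Carlo — the conversion counts are made up:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical observed conversions: control 200/1000, treatment 230/1000.
# With a Beta(1, 1) prior, each rate's posterior is Beta(1 + conversions,
# 1 + non-conversions). Draw from both posteriors and compare.
ctrl = rng.beta(1 + 200, 1 + 800, size=100_000)
trt = rng.beta(1 + 230, 1 + 770, size=100_000)

# Probability the treatment's conversion rate beats the control's.
p_trt_better = (trt > ctrl).mean()
```

This is the kind of directly interpretable quantity (“there’s a ~95% chance treatment beats control”) that frequentist p-values don’t give you.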

Anyone with an experimentation pipeline and access to a computer can leverage Bayesian…

Invariant Risk Minimization (IRM) is an exciting new learning paradigm that helps predictive models generalize beyond the training data. It was developed by researchers at Facebook and outlined in a 2020 paper. The method can be added to virtually any modeling framework, however it’s best suited for black-box models that leverage lots of data, e.g. neural networks and their many flavors.

Without further ado, let’s dive in.

At a high level, IRM is a learning paradigm that attempts to learn causal relationships instead of correlational ones. By developing *training environments*, structured samples of data, we can maximize accuracy while also…
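To make “environments” concrete, here is a toy numpy sketch of the IRMv1-style penalty for a linear predictor with squared loss. The two environments are fabricated so that one feature is causal and the other is spuriously correlated (its sign flips between environments):

```python
import numpy as np

def irm_penalty(envs, beta):
    """IRMv1-style penalty for a linear predictor f(x) = x @ beta.

    For each environment, take the gradient of the squared-error risk with
    respect to a scalar dummy multiplier w (evaluated at w = 1) and sum the
    squared gradients. A small penalty means the same predictor is
    simultaneously optimal across environments.
    """
    penalty = 0.0
    for X, y in envs:
        f = X @ beta
        grad_w = 2.0 * np.mean((f - y) * f)  # d/dw mean((w*f - y)^2) at w=1
        penalty += grad_w ** 2
    return penalty

rng = np.random.default_rng(5)
x0 = rng.normal(size=200)
env1 = (np.column_stack([x0, x0]), x0)        # spurious feature agrees with y
x0b = rng.normal(size=200)
env2 = (np.column_stack([x0b, -x0b]), x0b)    # spurious feature flips sign

causal = np.array([1.0, 0.0])    # uses only the stable, causal feature
spurious = np.array([0.0, 1.0])  # uses the environment-dependent feature
```

The causal weight vector has near-zero penalty in both environments, while the spurious one is heavily penalized — the mechanism IRM uses to steer models toward relationships that hold everywhere.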

In January of 2021, researchers at MIT and Harvard published a paper that outlines a theoretical framework for optimal analysis and design of switchback experiments. Switchback experiments, also known as time split experiments, employ sequential reshuffling of control/treatments to remove bias inherent to certain data. These methods are popular in 2-sided marketplaces, such as Uber and Lyft, because they allow for robust experimentation on data with finite resources (drivers, riders, etc.).

The algorithm outlined in the paper leverages our knowledge about the carryover effect, the time it takes for our finite resource to replenish, to minimize the variance of our…
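A minimal sketch of the switchback setup — randomizing whole time blocks and discarding observations right after each switch, where carryover contaminates the data. The block length and burn-in rule are illustrative; the paper’s contribution is choosing these optimally:

```python
import numpy as np

rng = np.random.default_rng(6)

# A day split into 30-minute blocks; each block is assigned wholly to
# treatment (1) or control (0), so carryover is confined to block edges.
n_blocks = 48
schedule = rng.integers(0, 2, size=n_blocks)

# Indices of blocks that immediately follow a treatment switch.
switches = np.flatnonzero(np.diff(schedule) != 0) + 1

# Burn-in: drop the block right after each switch before estimating the
# treatment effect, since the finite resource hasn't replenished yet.
keep = np.ones(n_blocks, dtype=bool)
keep[switches] = False
```

The variance-minimization result in the paper tells you how long blocks should be relative to the carryover duration, rather than guessing as done here.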

Airbnb (2018) developed a method to account for a bias called winner’s curse. When estimating the impact of implemented features we often use the A/B testing lift for that feature. However, winner’s curse makes our treatment lifts, on average, overestimate the true impact of a feature. The adjustment outlined below allows us to account for this bias and develop a more robust estimate of feature impact. See figure 1 for Airbnb’s example.

From an implementation perspective, the bias adjustment simply involves subtracting a term from the feature’s observed lift. It’s computationally efficient and simple to implement. …
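As an illustrative stand-in for the paper’s correction (not its exact formula), here is the one-sided truncated-normal version of the idea: if a lift estimate is only reported when it clears a significance threshold, its conditional mean overstates the true lift by an inverse-Mills-ratio term, and a plug-in estimate of that term can be subtracted. All numbers below are simulated:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

true_lift, sigma = 0.5, 1.0
c = norm.ppf(0.95) * sigma  # one-sided significance threshold

def bias(delta, sigma, c):
    # E[X | X > c] - delta for X ~ N(delta, sigma^2): the winner's-curse
    # inflation, via the inverse Mills ratio.
    a = (c - delta) / sigma
    return sigma * norm.pdf(a) / norm.cdf(-a)

# Simulate many experiments; only "winners" (significant lifts) ship.
lifts = rng.normal(true_lift, sigma, size=100_000)
winners = lifts[lifts > c]

raw = winners.mean()  # overstates the true lift
adjusted = (winners - bias(winners, sigma, c)).mean()  # plug-in correction
```

The raw average of shipped lifts sits well above the true lift, and subtracting the plug-in bias term pulls the estimate meaningfully closer, which is the behavior the Airbnb adjustment exploits.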

I’m a Data Scientist writing 52 posts that bring “academic” research to DS industry. https://www.linkedin.com/in/michael-berk-48783a146/