An autoregressive recurrent neural net developed at Amazon

Time series (TS) forecasting is notoriously finicky. That is, until now.

Figure 1: DeepAR trained output based on this tutorial. Image by author.

In 2019, Amazon’s research team developed a deep learning method called DeepAR that exhibits a ~15% accuracy boost relative to state-of-the-art TS forecasting models. It’s robust out-of-the-box and can learn from many different time series, so if you have lots of choppy data, DeepAR could be an effective solution.

From an implementation perspective, DeepAR is more computationally complex than other TS methods. It also requires more data than traditional TS forecasting methods such as ARIMA or Facebook’s Prophet.
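To give a sense of what fitting DeepAR looks like in practice, here’s a minimal sketch using Amazon’s GluonTS library. The data, frequency, horizon, and `series_list` placeholder are all illustrative, and the exact class and argument names can differ between GluonTS versions:

```python
import numpy as np
from gluonts.dataset.common import ListDataset
from gluonts.torch import DeepAREstimator

# series_list: one 1-D array per time series (placeholder data here)
series_list = [np.random.rand(200) for _ in range(10)]
train_ds = ListDataset(
    [{"target": y, "start": "2021-01-01"} for y in series_list], freq="D"
)

# Train a probabilistic DeepAR model and forecast 14 steps ahead
estimator = DeepAREstimator(freq="D", prediction_length=14, trainer_kwargs={"max_epochs": 5})
predictor = estimator.train(train_ds)
forecasts = list(predictor.predict(train_ds))
print(forecasts[0].mean)  # point forecast; quantiles are also available for intervals
```

Because DeepAR learns one model across all the series, adding more (even choppy) series to `train_ds` is how it earns its accuracy.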

That being said, if you have lots of complex data and…


A few tips and tricks for learning data science concepts.

On my journey towards writing 52 posts about data science, I realized I don’t know how to read a paper. These are my notes…


What does the internet think?

After writing my 5th post, I realized that I was spending too much time reading papers. I’m not super proud to admit it, but I spent ~8 hours just understanding the method. So, over the weekend I did some research on how to read a scientific paper.

The most common piece of advice is that you shouldn’t read papers linearly, i.e. from start to finish. Instead, you should start high-level and just read the abstract, introduction, and…


A method developed by LinkedIn to approximate long-term metrics

When running A/B tests, we often try to improve long-term metrics. However, properly measuring a long-term metric’s impact requires long experiment durations.


To tackle this problem, researchers at LinkedIn published a 2019 paper that outlines a method for replacing long-term metrics with predictions using short-term covariates.

The prediction method is left to the user; however, the paper outlines requirements that ensure statistical validity. Relative to other methods covered in this series, surrogate metrics are quite labor intensive because you must develop a robust forecasting model. …
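To make the workflow concrete, here’s a toy sketch: fit a model that predicts the long-term metric from short-term covariates on historical data, then score a new experiment with it. The model choice, covariates, and numbers below are placeholders, not the paper’s specification:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Historical data: short-term covariates (e.g. week-1 clicks and visits)
# paired with the long-term outcome we actually care about (e.g. 90-day retention).
X_hist = rng.random((1000, 3))
y_long = X_hist @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.1, 1000)
surrogate = LinearRegression().fit(X_hist, y_long)

# In a new A/B test we only wait long enough to observe the short-term covariates.
X_control = rng.random((500, 3))
X_treatment = rng.random((500, 3)) + 0.02            # hypothetical small improvement
lift = surrogate.predict(X_treatment).mean() - surrogate.predict(X_control).mean()
print(f"estimated long-term lift via the surrogate: {lift:.4f}")
```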


Here’s how to make them useful…


Statistics is based on assumptions. If those assumptions become invalid, all conclusions based on those assumptions likewise become invalid.

“All models are wrong but some are useful” — George E.P. Box

An assumption-lean approach proposed by statisticians at the University of Pennsylvania outlines how to develop more defensible conclusions from our data. The method describes a new language for interpreting model coefficients, as well as confidence interval calculations that remain valid under this assumption-lean framework.

While there is nuance in the method, it’s very computationally efficient and straightforward to implement.
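As one concrete example of an assumption-lean confidence interval, here’s a sketch using sandwich (heteroskedasticity-robust) standard errors in statsmodels; the simulated data and the HC1 choice are my illustration, not the paper’s exact recipe:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y = 2.0 * x + rng.normal(0, 1 + 0.3 * x, 500)   # noise grows with x, so the usual OLS assumptions fail
X = sm.add_constant(x)

classical = sm.OLS(y, X).fit()                  # classical CIs assume the model is correctly specified
robust = sm.OLS(y, X).fit(cov_type="HC1")       # sandwich ("assumption-lean") standard errors
print(classical.conf_int())
print(robust.conf_int())                        # wider, but honest when the model is only an approximation
```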

Technical TLDR

  • In practice, guaranteeing a correctly specified model is very difficult…


Thoughts and Theory

A new algorithm developed by Stanford researchers and its application in R.

Generalized Linear Models (GLMs) are one of the most widely used inferential modeling techniques. Their simplicity makes them easy to interpret, so they’re a very effective tool when communicating causal inference to stakeholders.

Elastic net regularization, a widely used regularization method, is a logical pairing with GLMs — it removes unimportant and highly correlated features, which can hurt both accuracy and inference. These two methods are a useful part of any data science toolkit.
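As a quick illustration of the pairing, here’s a sketch using scikit-learn’s elastic-net logistic regression (a GLM) on synthetic data; the article itself focuses on the glmnet implementation, so treat this as a stand-in:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 20))
X[:, 1] = X[:, 0] + rng.normal(0, 0.01, 1000)   # a highly correlated, redundant feature
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 1, 1000) > 0).astype(int)

# Logistic regression (a GLM) with an elastic net penalty:
# l1_ratio blends lasso (1.0) and ridge (0.0) behavior.
model = LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=0.1, max_iter=5000)
model.fit(X, y)
print(np.round(model.coef_, 2))                 # unimportant coefficients are shrunk to (or near) zero
```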


Prior to March of 2021, the combination of GLMs and elastic net regularization was fairly complex. However, researchers at Stanford released a paper that leverages cyclic…


Why GA’s are effective for preprocessing NLP data

Figure 1: genetic algorithm training a red square to avoid blue rectangles. Image by author.

“Data preparation accounts for about 80% of the work of data scientists.” — Forbes

NLP modeling projects are no different — often the most time-consuming step is wrangling data and then developing features from the cleaned data. There are many tools that facilitate this process, but it’s still laborious.

To aid in the feature engineering step, researchers at the University of Central Florida published a 2021 paper that leverages genetic algorithms to remove unimportant tokenized text. Genetic algorithms (GAs) are evolution-inspired optimizations that perform well on complex data, so they naturally lend themselves well to NLP data. …
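To show the flavor of the approach, here’s a toy genetic algorithm that evolves a binary keep/drop mask over a vocabulary. The fitness function below is a stand-in; in the paper’s setting it would be tied to downstream model performance:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = [f"token_{i}" for i in range(50)]            # hypothetical tokenized vocabulary
useful = rng.random(50) < 0.3                        # stand-in for "tokens that actually help"

def fitness(mask):
    # Placeholder objective: reward keeping useful tokens, penalize keeping noise.
    return (mask & useful).sum() - 0.5 * (mask & ~useful).sum()

pop = rng.random((40, 50)) < 0.5                     # population of binary keep/drop masks
for _ in range(100):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-20:]]          # selection: keep the fittest half
    pairs = rng.integers(0, len(parents), (20, 2))   # crossover: splice two parents at a random cut
    cuts = rng.integers(1, 49, 20)
    children = np.array([np.concatenate([parents[a][:c], parents[b][c:]])
                         for (a, b), c in zip(pairs, cuts)])
    children ^= rng.random(children.shape) < 0.02    # mutation: flip a small fraction of bits
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("kept tokens:", [t for t, keep in zip(vocab, best) if keep][:10])
```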


What is Bayesian A/B testing and when should you use it?

The A/B testing dilemma. Image by author.

Recently, Bayesian A/B testing has gotten lots of publicity because its methods are easy to understand and allow useful calculations, such as the probability that a treatment is better than the control. Bayesian inference also performs much better on small sample sizes; according to a 2019 Medium post, Bayesian A/B testing can reduce the required sample size by 75%.

While these methods are more computationally expensive than traditional frequentist approaches, they are computed offline, which reduces performance requirements. The main challenge is choosing effective distributions to support inference.
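For conversion-style metrics, one common and simple choice is a Beta-Binomial model. The counts below are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical observed conversions: 120/2000 in control, 140/2000 in treatment.
control_post   = rng.beta(1 + 120, 1 + 2000 - 120, 100_000)   # Beta(1, 1) prior + data
treatment_post = rng.beta(1 + 140, 1 + 2000 - 140, 100_000)

prob_treatment_better = (treatment_post > control_post).mean()
print(f"P(treatment > control) ~ {prob_treatment_better:.3f}")
```

A few hundred thousand posterior draws like this run in well under a second offline, which is why the extra computation rarely matters in practice.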

Anyone with an experimentation pipeline and access to a computer can leverage Bayesian…


A new technique developed by Facebook’s AI research team.

Invariant Risk Minimization (IRM) is an exciting new learning paradigm that helps predictive models generalize beyond the training data. It was developed by researchers at Facebook and outlined in a 2020 paper. The method can be added to virtually any modeling framework; however, it’s best suited for black-box models that leverage lots of data, i.e. neural networks and their many flavors.

Without further ado, let’s dive in.

0. Technical TLDR

At a high level, IRM is a learning paradigm that attempts to learn causal relationships instead of correlational ones. By developing training environments, structured samples of data, we can maximize accuracy while also…
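As a rough sketch of the kind of penalty IRM adds to the loss, here’s an IRMv1-style gradient penalty in PyTorch. This is my simplified paraphrase, not the paper’s exact training recipe:

```python
import torch
import torch.nn.functional as F

def irm_penalty(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Gradient of this environment's risk w.r.t. a fixed "dummy" scale on the logits.
    # The penalty is small when the same classifier is (near-)optimal in this environment.
    scale = torch.tensor(1.0, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * scale, y)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return grad.pow(2).sum()

# Training objective (sketch): per-environment ERM loss plus the weighted IRM penalty.
# total_loss = sum(F.binary_cross_entropy_with_logits(model(x_e), y_e)
#                  + lam * irm_penalty(model(x_e), y_e)
#                  for x_e, y_e in environments)
```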


Thoughts and Theory

An algorithm that determines the most effective randomization points for a switchback experiment.

In January of 2021, researchers at MIT and Harvard published a paper that outlines a theoretical framework for the optimal design and analysis of switchback experiments. Switchback experiments, also known as time-split experiments, employ sequential reshuffling of control and treatment assignments to remove bias inherent to certain data. These methods are popular in two-sided marketplaces, such as Uber and Lyft, because they allow for robust experimentation on data with finite resources (drivers, riders, etc.).

The algorithm outlined in the paper leverages our knowledge about the carryover effect, the time it takes for our finite resource to replenish, to minimize the variance of our…
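For contrast, here’s what a naive, non-optimized switchback schedule looks like in code; the paper’s contribution is choosing the randomization points far more cleverly than the fixed, evenly spaced ones below:

```python
import numpy as np

rng = np.random.default_rng(1)

# Naive schedule: re-randomize treatment/control every `interval` hours for a week.
# The paper's algorithm instead picks the randomization points to minimize variance
# given the carryover effect; this sketch is purely for illustration.
hours, interval = 7 * 24, 4
assignments = np.repeat(rng.integers(0, 2, hours // interval), interval)  # 0=control, 1=treatment
print(assignments[:24])  # first day's hourly assignments
```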


Airbnb’s method for estimating experimentation impact

Airbnb (2018) developed a method to account for a bias called the winner’s curse. When estimating the impact of implemented features, we often use the A/B testing lift for that feature. However, the winner’s curse makes our treatment lifts, on average, overestimate the true impact of a feature. The adjustment outlined below allows us to account for this bias and develop a more robust estimate of feature impact. See figure 1 for Airbnb’s example.

Figure 1: Test of the accuracy of Airbnb’s bias adjustment, run on 7 experiments. Note these numbers are adapted from the paper.

From an implementation perspective, the bias adjustment simply involves subtracting a term from the feature’s observed lift. It’s computationally efficient and simple to implement. …
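To see the bias itself in action, here’s a small simulation of the winner’s curse (this demonstrates the problem, not Airbnb’s specific correction term):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Simulate many experiments that all share the same small true lift, measured with noise.
true_lift, n_experiments, se = 0.01, 10_000, 0.02
observed = true_lift + rng.normal(0, se, n_experiments)

# Only statistically significant "winners" get launched.
winners = observed[observed / se > stats.norm.ppf(0.975)]
print(f"true lift: {true_lift:.3f}")
print(f"mean observed lift among launched winners: {winners.mean():.3f}")  # noticeably inflated
```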

Michael Berk

I’m a Data Scientist writing 52 posts that bring “academic” research to the DS industry. https://www.linkedin.com/in/michael-berk-48783a146/
