Michael Berk

1.8K Followers

Published in Towards Data Science · Nov 8, 2022

HyperOpt Demystified

How to automate model tuning with HyperOpt. Do you love tuning models? If your answer is “yes”, this post is not for you. In this post we will cover the extremely popular automated hyperparameter tuning algorithm called Tree-structured Parzen Estimators (TPE). TPE is supported by the open-source package HyperOpt. By…

Machine Learning

13 min read

Published in Dev Genius · Sep 8, 2022

How to Automate Your Data Infrastructure with Code

What is Terraform and why you should use it. Have you ever used the AWS console? If so, you’ve probably noticed how tedious it can be to manage services. There must be a better way… Well, there is. After the major cloud providers launched in the mid-2000s, there was a need to manage vast data infrastructures…

Terraform

10 min read

Published in Towards Data Science · Aug 16, 2022

Demystifying the Parquet File Format

The default file format for any data science workflow. Have you ever used pd.read_csv() in pandas? Well, that command could have run ~50x faster if you had used Parquet instead of CSV. In this post we will discuss Apache Parquet, an extremely efficient and well-supported file format. The post is geared towards data practitioners (ML, DE, DS) so we’ll…

Data Science

8 min read

Published in Towards Data Science · May 10, 2022

PySpark Data Skew in 5 Minutes

Exactly what you need, and no more. There are lots of overly complex posts about data skew, a deceptively simple topic. In this post, we will cover the necessary basics in 5 minutes. The primary source for this post was Spark: The Definitive Guide, and here’s the code. Let’s dive in… What is Data Skew? In Spark, data are split into chunks of…

Pyspark

5 min read
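
Data skew means some partitions end up with far more rows than others, so a few tasks do most of the work. The following is a minimal, standard-library sketch of that idea, not the post's code. It uses key modulo partition count as a simplified, hypothetical stand-in for Spark's hash partitioner, applied to a made-up dataset with one “hot” key:

```python
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical partition count

def partition_for(key: int) -> int:
    # Simplified stand-in for hash partitioning: rows with the same key
    # always land in the same partition.
    return key % NUM_PARTITIONS

# Hypothetical dataset: key 7 is a "hot" key that accounts for 90 of 100 rows.
keys = [7] * 90 + list(range(10))

rows_per_partition = Counter(partition_for(k) for k in keys)
print(sorted(rows_per_partition.items()))  # [(0, 3), (1, 3), (2, 2), (3, 92)]
```

The partition holding key 7 receives 92 of the 100 rows, so the task that processes it becomes the bottleneck while the other tasks sit mostly idle.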

Published in Towards Data Science · May 6, 2022

SQL to PySpark

A quick guide for moving from SQL to PySpark. If you know SQL but need to work in PySpark, this post is for you! Spark is rapidly becoming one of the most widely adopted frameworks for big data processing. But why work in native PySpark instead of SQL? Well, you don’t have to. PySpark allows you to create a…

Sql

4 min read

Published in Towards Data Science · Feb 23, 2022

How does linear regression really work?

The math and intuition behind ordinary least squares (OLS). Do you know how linear regression measures effects while “holding everything else constant”? Or how it minimizes the sum of squared errors? In this post we’ll discuss how, and much more. We will leverage both matrix algebra and Python to understand what’s going on. Where possible, we will go…

Linear Regression

12 min read
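
The “minimizes the sum of squared errors” idea has a compact matrix form. The derivation below is the standard one and is not taken from the post. It assumes a design matrix X that includes a column of ones for the intercept, an outcome vector y, and an invertible XᵀX:

```latex
\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2,
\qquad
\nabla_{\beta} \lVert y - X\beta \rVert_2^2 = -2X^\top (y - X\beta) = 0
\;\Longrightarrow\;
X^\top X \hat{\beta} = X^\top y
\;\Longrightarrow\;
\hat{\beta} = (X^\top X)^{-1} X^\top y
```

Every entry of β̂ is estimated jointly with all the others. That joint estimation is the sense in which a single coefficient measures an effect “holding everything else constant.”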

Published in Towards Data Science · Feb 9, 2022

5 Advanced Tips on Python Objects

Python is an object-oriented programming language, but it can behave strangely. If you come from other OOP languages, this post may benefit you. In chapter 8 of Fluent Python, Luciano Ramalho discusses how Python objects work under the hood. Here we will define the fundamental concept behind variable storage in Python and explore some relevant notes. Without further ado, let’s dive in. 1. Python Variables are not Boxes

Python

5 min read
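
Here is a minimal illustration of the “variables are not boxes” idea using only built-in lists. It is an illustrative sketch, not code from the post or from Fluent Python:

```python
# Python variables are labels (references) bound to objects, not boxes holding values.
a = [1, 2, 3]
b = a              # b is a second label on the SAME list object; nothing is copied
b.append(4)

print(a)           # [1, 2, 3, 4]: the change made through b is visible through a
print(a is b)      # True: both names refer to one object (same identity)

c = list(a)        # an explicit (shallow) copy creates a NEW list object
print(c == a)      # True: equal values
print(c is a)      # False: different objects
```

The `==` operator compares values, while `is` compares object identity. That distinction only makes sense once variables are understood as references.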

Published in Towards Data Science · Feb 2, 2022

Don’t Use a T-Test for A/B Testing

How to use multiple linear regression to determine the average treatment effect (ATE) and statistical significance. Have you ever wanted to speed up an A/B test? Well, here’s arguably the highest-ROI solution for variance reduction in an A/B testing setting. Frequentist methods are commonly used to analyze experiments. However, compared to Bayesian or sequential regimes, frequentist A/B tests often require large sample sizes, which slows…

Ab Test

7 min read
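
This is a minimal sketch of the starting point for the regression approach. It uses a hypothetical toy dataset and only the standard library. With a single treatment indicator and an intercept, the OLS coefficient on the indicator equals the difference in group means, which is the same quantity a two-sample t-test compares. The variance reduction the post describes comes from adding pre-experiment covariates to the regression, which this sketch does not show:

```python
from statistics import mean

def ols_slope(x, y):
    """Slope of a simple OLS fit y = a + b*x (closed form: cov(x, y) / var(x))."""
    x_bar, y_bar = mean(x), mean(y)
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den

# Hypothetical toy experiment: 1 = treatment, 0 = control.
treatment = [0, 0, 0, 1, 1, 1]
outcome = [10.0, 12.0, 11.0, 14.0, 15.0, 13.0]

# Coefficient on the treatment indicator from OLS with an intercept.
ate_regression = ols_slope(treatment, outcome)

# Plain difference in group means (what a two-sample t-test compares).
treated = [y for t, y in zip(treatment, outcome) if t == 1]
control = [y for t, y in zip(treatment, outcome) if t == 0]
ate_diff_means = mean(treated) - mean(control)

print(ate_regression, ate_diff_means)  # 3.0 3.0
```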

Published in Towards Data Science · Jan 24, 2022

5 Advanced Tips on Python Decorators

Do you want to write concise, readable, and efficient code? Well, Python decorators may help you on your journey. In chapter 7 of Fluent Python, Luciano Ramalho discusses decorators and closures. They are not very common in basic DS work; however, as you start building production models and writing async code, they become an invaluable tool. Without further ado, let’s dive in. 1. What’s a decorator?

Python

5 min read
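
A decorator is a callable that takes a function and returns a replacement function, and the `@` syntax applies it when the decorated function is defined. The following is a minimal sketch using only the standard library. The `count_calls` decorator is a hypothetical example, not from the post:

```python
import functools

def count_calls(func):
    """Decorator: wrap func so each call increments a counter stored on the wrapper."""
    @functools.wraps(func)  # copy func's __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1  # wrapper is a closure: it reaches func from the enclosing scope
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls            # equivalent to: add = count_calls(add)
def add(a, b):
    """Return a + b."""
    return a + b

add(1, 2)
add(3, 4)
print(add.calls)     # 2
print(add.__name__)  # add
```

Without `functools.wraps`, `add.__name__` would report `wrapper`, which makes debugging and introspection harder.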

Published in Towards Data Science · Jan 19, 2022

How to Find Weaknesses in Your Machine Learning Models

A possible implementation of IBM’s FreaAI. Any time you simplify data using a summary statistic, you lose information, and model accuracy is no different. When you reduce your model’s fit to a single summary statistic, you lose the ability to determine where performance is lowest or highest, and why. In this post we discuss the code behind IBM’s FreaAI, an…

Data Science

8 min read
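
FreaAI itself is not reproduced here. The sketch below makes a much simpler point: a single summary statistic can hide where a model fails. It breaks accuracy down by the values of one feature, using hypothetical data and function names and only the standard library:

```python
from collections import defaultdict

def accuracy_by_slice(rows, slice_key):
    """Group (features, y_true, y_pred) rows by one feature and compute accuracy per group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for features, y_true, y_pred in rows:
        group = features[slice_key]
        totals[group] += 1
        hits[group] += int(y_true == y_pred)
    return {group: hits[group] / totals[group] for group in totals}

# Hypothetical predictions: the overall accuracy hides a weak slice.
rows = [
    ({"region": "north"}, 1, 1),
    ({"region": "north"}, 0, 0),
    ({"region": "north"}, 1, 1),
    ({"region": "north"}, 0, 0),
    ({"region": "south"}, 1, 0),
    ({"region": "south"}, 0, 0),
]

overall = sum(y == p for _, y, p in rows) / len(rows)
per_slice = accuracy_by_slice(rows, "region")
weakest = min(per_slice, key=per_slice.get)

print(round(overall, 3))  # 0.833
print(per_slice)          # {'north': 1.0, 'south': 0.5}
print(weakest)            # south
```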

Michael Berk

I’m a data scientist writing 52 posts that bring academic research to the DS industry. https://www.linkedin.com/in/michael-berk-48783a146/

Following
  • Leihua Ye, PhD

  • Aliaksei Mikhailiuk

  • Ms Aerin

  • Michael Simmons (blockbuster.thoughtleader.school)

  • Rachel Berk
