2.5 Years of MLflow Knowledge in 8 Tips
My learnings from Databricks customer engagements.
At Databricks, I help large organizations deploy and scale machine learning pipelines. Here are the 8 most important MLflow tips/tricks I’ve learned in the field.
0 — Where is MLflow Strong/Weak?
Answer: MLflow is valuable for teams doing iterative development, but there’s typically a learning curve.
MLflow is an open-source MLOps tool that streamlines the ML development lifecycle. It focuses on the full lifecycle for machine learning projects, ensuring that each phase is manageable, traceable, and reproducible. Check out this tutorial if you need a refresher.
Strengths
- Team Collaboration: MLflow simplifies model versioning, organization, and deployment, making it easier for many people to collaborate on many models.
- Ease of Deployment: MLflow organizes the relevant package and file dependencies needed for model serving. You can serve models with standardized APIs via PyFunc or load them in their original format and leverage the model’s native APIs.
- Iterative Model Development: MLflow’s tracking capabilities are designed to support iterative model development. With options for autologging or explicit tracking calls, you can organize and compare development experiments.
- Databricks Integration: Many of Databricks' machine learning offerings are centered around MLflow. Databricks also hosts edge versions (free for Databricks users) that are highly performant and scalable.
Weaknesses
- Complexity: MLflow’s powerful capabilities come with a learning curve that can be challenging for newcomers. This article aims to ease that path.
- Documentation: MLflow’s documentation can be hard to navigate. For an easier approach, see tip 2.
In summary, if you’re a technical team doing iterative model development, you should be using an MLOps framework, and MLflow is a good one.
1 — What is the MLflow vocabulary?
Here are essential MLflow terms relevant across MLOps frameworks.
- Artifact: A file or object related to a training run, such as datasets, model files, or metrics outputs.
- Run: A grouping of logged artifacts. Historically, runs have corresponded to training runs but with GenAI, the definition has expanded.
- Experiment: A grouping of runs.
- Model (in the model registry): A grouping of related model versions, identified by a unique name.
- Model Version (in the model registry): A specific iteration of a model, often associated with a particular run or training cycle.
- Model Signature: The schema that specifies the model’s expected input and output formats, including any additional inference parameters.
These definitions have additional complexities that are beyond the scope of this article, but if you have questions, see tip 2.
2 — How should you navigate the MLflow docs?
Answer: use the RunLLM chat in the MLflow docs.
MLflow provides rich documentation, but due to the volume of information, it can be challenging to find what you need.
A new third-party tool called RunLLM has recently been integrated into the documentation, providing a much more efficient way to access code snippets and find answers to your questions.
3 — How should you log a model?
Answer: log_model() with the 4 parameters listed below.
A common mistake for new MLflow users is improper model logging. While log_model() can seem straightforward, it offers a lot of power when configured correctly.
To start, make sure to use the log_model() function for your specific model flavor. For example, use mlflow.sklearn.log_model() for scikit-learn models. If MLflow doesn't support your model type natively, you can log a custom PyFunc model with mlflow.pyfunc.log_model().
The 4 key parameters to set:
- model: The model you want to log.
- artifact_path: The relative path where model artifacts will be saved.
- registered_model_name: The name to use in the model registry.
- input_example: A sample input for inference.
With these four parameters, you get the following benefits out of the box:
- A serialized model artifact, accessible through the tracking server or model registry.
- A requirements.txt file listing inferred dependencies.
- A Conda environment specification with additional dependencies.
- Model input/output examples in the MLflow UI.
- A model signature to validate inference inputs.
- Additional metadata if your model flavor supports autologging.
Here’s a quick snippet for reference.
import mlflow
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

with mlflow.start_run():
    # Create the model
    X, y = load_iris(return_X_y=True)
    model = RandomForestClassifier().fit(X, y)

    # Log the model
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="local_path_to_model",
        registered_model_name="my_awesome_model",
        input_example=X,
    )
Now, if you need additional customization, feel free to mess around with both the log_model() and mlflow.register_model() parameters, but the above implementation will handle many use cases.
4 — What is the difference between the model registry and tracking server?
Answer: The tracking server holds the raw data, while the model registry holds metadata about your logged artifacts.
In open source MLflow, the tracking server and model registry are user-managed backends that store your model development data. Here are the key differences:
- Tracking Server: Stores actual data from your experiments, such as serialized model files, metrics, and artifacts. It uses object storage solutions (e.g., S3, ADLS, GCS) to manage these files.
- Model Registry: Acts as a lightweight metadata layer, organizing and versioning models by storing references (pointers) to artifacts in the tracking server. It typically uses a relational database (e.g., MySQL, PostgreSQL) to manage this metadata.
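For illustration, here is a minimal sketch of how the two backends are pointed at separately in open source MLflow; the server URIs below are placeholders, not recommendations.
import mlflow

# Point the client at a self-hosted tracking server (runs, metrics, artifacts)
mlflow.set_tracking_uri("http://my-tracking-server:5000")  # placeholder URI

# Point the client at the model registry backend (model/version metadata)
# If not set explicitly, it defaults to the tracking URI
mlflow.set_registry_uri("http://my-tracking-server:5000")  # placeholder URI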
5 — What is a URI?
Answer: build the URI from the run_id and relative artifact path, or from a model registry reference.
A URI in MLflow is a unique identifier used to locate a specific model or artifact. Since model URIs are the entry point for loading a model, they are used very frequently and are therefore important to understand. Here are the best ways to load your model back into memory (using scikit-learn for demonstration purposes).
############# Via Runs #############
# Use this if you log and load the model in the same Python session
# (`model` is a fitted estimator, e.g., from the previous snippet)
artifact_path = "my_cool_model"

# Option 1: URI taken from the model info returned by log_model()
with mlflow.start_run() as run:
    # Log the model
    model_info = mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path=artifact_path,
    )
model_uri = model_info.model_uri

# Option 2: URI recreated from a run object
with mlflow.start_run() as run:
    # Log the model
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path=artifact_path,
    )
model_uri = f"runs:/{run.info.run_id}/{artifact_path}"

############# Via Model Registry #############
# Use this if you DON'T log and load the model in the same Python session

# Option 1: URI that points to the model registry via a model version
model_name = "my_cool_model"
model_version = 3
model_uri = f"models:/{model_name}/{model_version}"

# Option 2: URI that points to the model registry via a model alias
model_name = "my_cool_model"
model_alias = "prod"
model_uri = f"models:/{model_name}@{model_alias}"

# Load the model back into memory from any of the URIs above
loaded_model = mlflow.sklearn.load_model(model_uri)
Note that you can also leverage the MLflowClient or fluent APIs to interact with these models further.
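For example, here is a small sketch of a couple of MlflowClient calls; the model name, version, and alias below are illustrative.
from mlflow import MlflowClient

client = MlflowClient()

# Inspect a specific registered model version (illustrative name/version)
mv = client.get_model_version(name="my_cool_model", version="3")
print(mv.run_id, mv.source)

# Point the "prod" alias at this version so "models:/my_cool_model@prod" resolves to it
client.set_registered_model_alias(name="my_cool_model", alias="prod", version="3")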
6 — How should you make predictions on a Spark DataFrame?
Answer: Spark User Defined Functions (UDFs).
MLflow Spark UDFs leverage Pandas UDFs to parallelize your model’s inference.
import mlflow
from pyspark.sql.functions import struct
# Step 1: Train and log your model
# (your model training and logging code here)
# Step 2: Create a Spark DataFrame `df` for inference
# (your DataFrame creation code here)
# Step 3: Perform inference using the logged model
model_uri = "/path/to/logged/model"
custom_predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri)
df.withColumn("prediction", custom_predict_udf(struct("name", "age"))).show()
Here’s why you should use mlflow.pyfunc.spark_udf…
- Parallel Inference: Spark UDFs use Spark’s distributed computing capabilities to run inference in parallel.
- Optimized Performance: Pandas UDFs are the most performant type of Python UDF in Spark.
- Automatic Dependency Management: Dependencies are loaded automatically in the Spark worker context, which simplifies setup.
7 — Use MLflow Tracing for GenAI Agents
Answer: use MLflow trace autologging to track granular agent execution.
If you are building GenAI agents and don’t know about tracing, this will change your life.
Tracking the execution of complex agentic frameworks is essential. Due to their asynchronous and nondeterministic execution, understanding what happened at each step is invaluable.
MLflow offers a high-level tracing API that enables detailed tracking within agent-based frameworks. For many MLflow-supported model flavors, such as LangChain and LlamaIndex, autologging is available — simply call mlflow.{flavor}.autolog() to automatically capture full trace logs for all operations in that flavor.
For packages without native autologging support, you can implement custom tracing using the @mlflow.trace decorator on any Python callable.
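Here’s a minimal sketch of both approaches; the flavor shown and the function names are illustrative, and the agent logic is stubbed out.
import mlflow

# Autologging for a supported flavor (LangChain shown as an illustrative example)
mlflow.langchain.autolog()

# Custom tracing for packages without native autologging support
@mlflow.trace
def retrieve_documents(query: str) -> list:
    # (your retrieval logic here)
    return ["doc one", "doc two"]

@mlflow.trace
def answer_question(query: str) -> str:
    docs = retrieve_documents(query)
    # (your LLM call here)
    return f"Answer based on {len(docs)} documents"

answer_question("What does MLflow Tracing capture?")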
In Figure 4 below, we can see an example output for a RAG agent. Notice the clear breakdown of execution sequence, duration, and inputs/outputs for the major agentic components.
This granular execution data helps identify bugs, optimize performance, and improve response quality.
Honorable Mentions
- The Model from Code feature allows you to define your model directly in Python. This approach bypasses many serialization challenges, especially with Pydantic-based GenAI packages, making model deployment smoother and more flexible.
- Custom PyFunc models are incredibly versatile — you can turn nearly anything into a model! While this flexibility can get real janky real fast, custom PyFunc models simply involve a basic class implementation that exposes a ton of power and versatility (see the sketch after this list).
- Model version aliases are an excellent way to decouple your model serving endpoint from the specific model version being served. By assigning a mutable identifier to a model version, you can seamlessly update the served model without changing the endpoint configuration. This flexibility makes it easy to manage and update models in production.
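As a quick illustration of the custom PyFunc point above, here is a minimal sketch; the wrapper class and artifact path are hypothetical, and the only requirement is subclassing mlflow.pyfunc.PythonModel and implementing predict.
import mlflow
import pandas as pd

# Hypothetical wrapper: any object with predict-like behavior can be exposed this way
class AddOneModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # Trivial "model": add one to every numeric column
        return model_input + 1

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="add_one_model",  # illustrative artifact path
        python_model=AddOneModel(),
        input_example=pd.DataFrame({"x": [1, 2, 3]}),
    )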
Summary
In summary, MLflow is a powerful MLOps tool for managing the entire machine learning lifecycle, but its full potential is best unlocked through a few specific strategies. By understanding MLflow’s key features, teams can streamline iterative model development, simplify collaboration, and efficiently manage productionization.
Happy coding!