Sparkitecture

MLflow

MLflow is an open-source platform, created by Databricks, for managing the machine learning lifecycle. It supports packaging code as reproducible projects, tracking parameters and metrics across runs, and versioning models.


Last updated 4 years ago


Install MLflow using pip:

pip install mlflow

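MLflow can be used in any Spark environment. On Databricks, a managed tracking server and experiment UI are built into the workspace. Elsewhere, you choose where runs are stored (for example with mlflow.set_tracking_uri) and can browse them locally by running the mlflow ui command.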

Track metrics and parameters

import mlflow

# Log Parameters and Metrics from your normal MLlib run
with mlflow.start_run():
    # Log a parameter (key-value pair)
    mlflow.log_param("alpha", 0.1)

    # Log a metric; metrics can be updated throughout the run
    mlflow.log_metric("AUC", 0.871827)
    mlflow.log_metric("F1", 0.726153)
    mlflow.log_metric("Precision", 0.213873)

MLflow GitHub:

https://github.com/mlflow/mlflow/