MLflow
MLflow is an open-source platform designed to manage the machine learning development lifecycle.
MLflow is an important part of machine learning with Azure Databricks, as it integrates key operational processes with the Azure Databricks interface.
MLflow makes it easy for data scientists to train models and make them available without writing a great deal of code.
MLflow allows data scientists to train models, register those models, deploy the models to a web server, and manage model updates.
MLflow also operates on workloads outside of Azure Databricks.
There are four components to MLflow:
· MLflow Tracking
o MLflow Tracking allows data scientists to work with experiments.
o MLflow Tracking is built around runs, that is, executions of code for a data science task.
o Each run contains several key attributes, including:
- Parameters: key-value pairs that represent inputs. Use parameters to track hyperparameters, that is, inputs that affect the machine learning process.
- Metrics: key-value pairs that represent how the model is performing. These can include evaluation measures such as Root Mean Square Error, and metrics can be updated throughout the course of a run. This allows a data scientist, for example, to track Root Mean Square Error for each epoch of a neural network.
- Artifacts: output files. Artifacts may be stored in any format and can include models, images, log files, data files, or anything else that might be important for model analysis and understanding (see the sketch below).
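Here is a minimal sketch of a tracking run that brings these three attributes together, logging a hyperparameter, a per-epoch metric, and an artifact; the parameter name and values are placeholders rather than output from a real training job:
Python
import mlflow

with mlflow.start_run():
    # Parameters: inputs such as hyperparameters.
    mlflow.log_param("learning_rate", 0.01)

    # Metrics: the step argument lets you record a value per epoch.
    for epoch in range(3):
        rmse = 1.0 / (epoch + 1)  # placeholder; real code would evaluate the model
        mlflow.log_metric("rmse", rmse, step=epoch)

    # Artifacts: any output file worth keeping for later analysis.
    with open("notes.txt", "w") as f:
        f.write("Anything useful for model analysis.\n")
    mlflow.log_artifact("notes.txt")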
· MLflow Projects
o An MLflow Project is a way of packaging code that allows for consistent deployment and the ability to reproduce results.
o MLflow supports several environments for projects, including Conda, Docker, and running directly on the host system.
o Each project includes at least one entry point, which is a file (either .py or .sh) that is intended to act as the starting point for project use.
o Projects also specify details about the environment.
o This includes the specific packages (and versions of packages) used in developing the project, as new versions of packages may include breaking changes. A sketch of running a project appears below.
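As an illustration, a project can be run from Python with mlflow.run, which invokes an entry point with parameters. The repository URL, entry point name, and parameter below are hypothetical, assumed only for this sketch:
Python
import mlflow

# Run a (hypothetical) project; MLflow reads the MLproject file in the
# repository to find the entry point and set up the environment.
submitted = mlflow.run(
    uri="https://github.com/example/mlflow-example",  # hypothetical repository
    entry_point="main",
    parameters={"alpha": 0.5},  # hypothetical entry-point parameter
)
print(submitted.get_status())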
· MLflow Models
o MLflow offers a standardized format for packaging models for distribution.
o This standardized model format allows MLflow to work with models generated from several popular libraries, including scikit-learn, Keras, MLlib, ONNX, and more.
o MLflow allows models to have a particular flavor, which is a descriptor of the tool or library that generated the model.
o Each model has a signature, which describes the expected inputs and outputs for the model.
o A model in MLflow is a directory containing an arbitrary set of files along with an MLmodel file in the root of the directory. A logging sketch appears below.
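To make the format concrete, here is a minimal sketch that trains a toy scikit-learn model, infers its signature, and logs it in MLflow's standard format; the data is synthetic and purely illustrative:
Python
import mlflow
import mlflow.sklearn
import numpy as np
from mlflow.models.signature import infer_signature
from sklearn.linear_model import LinearRegression

# Toy training data, purely for illustration.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
model = LinearRegression().fit(X, y)

with mlflow.start_run():
    # The signature records the model's expected inputs and outputs.
    signature = infer_signature(X, model.predict(X))
    # This writes a directory of model files plus an MLmodel file at its root.
    mlflow.sklearn.log_model(model, "model", signature=signature)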
· MLflow Model Registry
o The MLflow Model Registry gives data scientists a central place to register and manage models.
o Each registered model may have multiple versions, which allow a data scientist to keep track of model changes over time.
o Each model version may be in one stage, such as Staging, Production, or Archived.
o Data scientists and administrators may transition a model version from one stage to the next, as in the sketch below.
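Here is a minimal sketch of that workflow, registering a model from a completed run and then moving the new version into Staging; the model name and "<run_id>" are placeholders for values from your own tracking runs:
Python
import mlflow
from mlflow.tracking import MlflowClient

# Register the model artifact from a completed run ("<run_id>" is a placeholder).
result = mlflow.register_model("runs:/<run_id>/model", "my-model")

# Transition the newly created version into the Staging stage.
client = MlflowClient()
client.transition_model_version_stage(
    name="my-model",
    version=result.version,
    stage="Staging",
)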
MLflow experiments let data scientists group related training runs into a collection called an experiment.
This is useful for comparing changes over time or comparing the relative performance of models with different hyperparameter values.
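For example, a recent MLflow version can pull an experiment's runs into a pandas DataFrame for side-by-side comparison; the experiment name and columns below assume runs that logged an "input1" parameter and an "rmse" metric:
Python
import mlflow

# Fetch all runs from a (hypothetical) experiment as a pandas DataFrame.
runs = mlflow.search_runs(experiment_names=["my-experiment"])

# Logged values appear as columns like "params.input1" and "metrics.rmse".
print(runs[["run_id", "params.input1", "metrics.rmse"]].sort_values("metrics.rmse"))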
Creating an experiment in Azure Databricks happens automatically when you start a run.
Here is an example of starting a run in MLflow, logging two parameters, and logging one metric:
Python
with mlflow.start_run():
    mlflow.log_param("input1", input1)
    mlflow.log_param("input2", input2)
    # Perform operations here like model training.
    mlflow.log_metric("rmse", rmse)
In this case, the experiment’s name will be the name of the notebook.
If you prefer a different name, you can set the MLFLOW_EXPERIMENT_NAME environment variable to change the name of your experiment, as in the sketch below.
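Here is a minimal sketch of both approaches, assuming a Databricks-style workspace path for the experiment name; the path itself is illustrative:
Python
import os
import mlflow

# Option 1: set the environment variable before any run starts.
os.environ["MLFLOW_EXPERIMENT_NAME"] = "/Users/someone@example.com/my-experiment"

# Option 2: set the experiment explicitly in code (it is created if it does not exist).
mlflow.set_experiment("/Users/someone@example.com/my-experiment")

with mlflow.start_run():
    mlflow.log_param("input1", 42)  # placeholder value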