Using a Cost Functional to Optimize Hyperparameters Using Cross Validation

** Updated:**

This blog post discusses the importance of cost functions in mathematical optimization and how it applies to machine learning problems. The author argues that formulating the optimization of the performance metrics of a machine learning classifier in terms of a cost function is better than optimizing a single metric because it provides a more comprehensive and flexible framework for optimization, can help to address the trade-off between model complexity and performance, and can lead to better performance and generalization. An example of this formulation is provided for a binary classifier.

## What is a cost function?

In mathematical optimization, a cost function is a mathematical function that represents the cost or objective to be minimized or maximized in a given optimization problem. The cost function defines the relationship between the input variables and the output values of the problem. In optimization problems, the goal is to find the input values that minimize or maximize the cost function, subject to certain constraints. The cost function plays a crucial role in defining the optimization problem and in guiding the search for the optimal solution. The choice of the cost function depends on the nature of the problem and the desired optimization criteria. In many real-world problems, the cost function may be a complex, nonlinear function that requires advanced mathematical tools and techniques to be analyzed and optimized.

## Advantages of Using a Cost Functional to Optimize Hyperparameters Using Cross Validation

Formulating the optimization of the performance metrics of a machine learning classifier in terms of a cost function is better than optimizing a single metric for several reasons.

Firstly, machine learning problems often involve multiple metrics that need to be optimized simultaneously, such as accuracy, precision, recall, F1-score, and others. However, optimizing a single metric in isolation may not necessarily lead to the best overall performance of the classifier. For example, optimizing only for accuracy may lead to a model that performs poorly on a specific class or in a specific context.

Secondly, a cost function can provide a more comprehensive and flexible framework for optimization. By defining a cost function that combines multiple metrics and incorporates domain-specific constraints and preferences, we can optimize the model for a specific task and context in a more principled and efficient way.

Thirdly, a cost function can also help to address the trade-off between model complexity and performance. By penalizing complex models that are prone to overfitting, we can ensure that the model is not only accurate but also robust and generalizable.

Overall, formulating the optimization of the performance metrics of a machine learning classifier in terms of a cost function provides a more principled, flexible, and effective approach to model optimization that can lead to better performance and generalization.

## Example

An implementation of this formulation is shown below for a binary classifier.

Import the required libraries.

1
2
3
4
5
6
7
8
9

import sklearn.metrics as sm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from collections.abc import Iterable, Callable
import pandas as pd
import numpy as np
from abc import ABC, abstractmethod

Define an abstract cost function.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46

class CostFunction(ABC):
"""Abstract class for cost functions"""
def __init__(self, metrics: Iterable[str], M: 'np.ndarray[float]') -> None:
"""_summary_
Args:
metrics (Iterable[str]): Iterable of strings of the form (metric_name).
M (np.ndarray[float]): Positive definite matrix of size len(metrics).
Raises:
ValueError: _description_
Returns:
_type_: _description_
"""
self.metrics = metrics
self.M = M or np.identity(len(metrics)) # type: ignore
self._check_positive_definite(self.M)
@abstractmethod
def functional(self, y_true: 'np.ndarray[float]', y_pred: 'np.ndarray[float]') -> float:
"""_summary_
Args:
y_true (np.ndarray[float]): Array-like of true labels of length N.
y_pred (np.ndarray[float]): Array-like of predicted labels of length N.
"""
pass
@staticmethod
def _to_array(y: Iterable[float]) -> 'np.ndarray[float]':
return np.fromiter(y, float)
@staticmethod
def _check_positive_definite(M: 'np.ndarray[float]') -> None:
if not np.all(np.linalg.eigvals(M) > 0):
raise ValueError(f'Matrix {M} is not positive definite')
def make_scorer(self) -> Callable:
return sm.make_scorer(self.functional, greater_is_better=False)
def __call__(self, y_true: Iterable[float], y_pred: Iterable[float]) -> float:
y_pred_array = self._to_array(y_pred)
y_true_array = self._to_array(y_true)
return self.functional(y_true_array, y_pred_array)

Define a specific cost function for a classifier. The default performance metrics to optimize for are accuracy, f1, precision, recall, log loss and rocauc.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62

class ClassificationCostFunction(CostFunction):
def __init__(self, metrics: Iterable[str], M: 'np.ndarray[float]' = None, metric_class_opt_val_map: dict[str, tuple[str, float]]=None, proba_threshold: float = 0.5):
"""Defines cost functional for optimization of multiple metrics.
Since this is defined as a loss function, cross validation returns the negative of the score [1].
Args:
metrics (Iterable[str]): Iterable of strings of the form (metric_name).
M (np.ndarray[float]): Positive definite matrix of size len(metrics).
metric_class_map (dict[str, str], optional): Dictionary mapping metric to class or probability of the form {'metric': 'class' or 'proba'}. Defaults to {}.
proba_threshold (float, optional): Probability threshold used to convert probabilities into classes. Defaults to 0.5.
References:
[1] https://github.com/scikit-learn/scikit-learn/issues/2439
Example:
>>> y_true = [0, 0, 0, 1, 1]
>>> y_pred = [0.46, 0.6, 0.29, 0.25, 0.012]
>>> threshold = 0.5
>>> metrics = ["f1_score", "roc_auc_score"]
>>> cf = ClassificationCostFunction(metrics)
>>> np.isclose(cf(y_true, y_pred), 1.41, rtol=1e-01, atol=1e-01)
True
>>> X, y = make_classification()
>>> model = LogisticRegression()
>>> model.fit(X, y)
>>> y_proba = model.predict_proba(X)[:, 1]
>>> cost = cf(y, y_proba)
>>> f1 = getattr(sm, "f1_score")
>>> roc_auc = getattr(sm, "roc_auc_score")
>>> y_pred = np.where(y_proba > 0.5, 1, 0)
>>> scorer_output = np.sqrt((f1(y, y_pred) - 1.0)**2 + (roc_auc(y, y_proba) - 1.0)**2)
>>> np.isclose(cost, scorer_output)
True
"""
super().__init__(metrics, M)
self.proba_threshold = proba_threshold
self.metric_class_opt_val_map = metric_class_opt_val_map or {
"accuracy_score": ("class", 1),
"f1_score": ("class", 1),
"log_loss": ("class", 0),
"precision_score": ("class", 1),
"recall_score": ("class", 1),
"roc_auc_score": ("proba", 1),
}
def _to_class(self, array: 'np.ndarray[float]', metric: str) -> 'np.ndarray[float]':
# sourcery skip: inline-immediately-returned-variable
output = np.where(array > self.proba_threshold, 1, 0) if self.metric_class_opt_val_map[metric][0] == "class" else array
return output
def functional(self, y_true: 'np.ndarray[float]', y_pred: 'np.ndarray[float]') -> float:
self._check_positive_definite(self.M)
opt_values = np.array([self.metric_class_opt_val_map[metric][1] for metric in self.metrics])
metric_values = np.array([getattr(sm, metric)(y_true, self._to_class(y_pred, metric)) for metric in self.metrics])
return np.sqrt(np.dot(np.dot(metric_values - opt_values, self.M), metric_values - opt_values))

Run the code in a grid search strategy.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

metrics = [
"accuracy_score",
"f1_score",
"log_loss",
"precision_score",
"recall_score",
"roc_auc_score"
]
param_grid = {"C": [0.5, 1]}
scorer = ClassificationCostFunction(metrics, proba_threshold=0.5)
cv = GridSearchCV(LogisticRegression(), param_grid, scoring=scorer.make_scorer())
X, y = make_classification()
cv.fit(X, y)
pd.DataFrame.from_dict(cv.cv_results_)

mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_C | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

0 | 0.009353 | 0.003661 | 0.008929 | 0.002612 | 0.5 | {‘C’: 0.5} | -1.732076 | -6.922296 | -1.732076 | -3.464335 | -3.461615 | -3.462480 | 1.895201 | 1 |

1 | 0.006416 | 0.000833 | 0.006427 | 0.000340 | 1 | {‘C’: 1} | -1.732076 | -8.654072 | -1.732076 | -3.464335 | -3.461615 | -3.808835 | 2.543282 | 2 |

## Conclusion

In conclusion, cost functions play a critical role in mathematical optimization problems and are essential in guiding the search for the optimal solution. In machine learning problems, where multiple performance metrics need to be optimized simultaneously, using a cost function provides a more principled and efficient way to optimize the model for a specific task and context. Furthermore, it helps to address the trade-off between model complexity and performance, ensuring that the model is not only accurate but also robust and generalizable. By using a cost function to optimize performance metrics, machine learning practitioners can achieve better performance and generalization on their models, making it a valuable tool for model optimization.

## Comments