add loss_functions by Ahmad10Raza · Pull Request #8858 · TheAlgorithms/Python · GitHub

Ahmad10Raza · 2023-07-08T21:29:59Z

Describe your change:

Adding Loss Functions

Description:
This contribution aims to enhance the existing Python codebase by adding commonly used loss functions. These functions are essential in various machine learning and optimization tasks for evaluating the performance of models and algorithms.

The implemented loss functions include:

Mean Squared Error (MSE)
Mean Absolute Error (MAE)
Binary Cross Entropy (BCE)
Categorical Cross Entropy (CCE)
Huber Loss

Each function follows the PEP 8 style guidelines and includes proper documentation, including function descriptions, input parameters, return values, and explanations of their usage.

Contributing these loss functions will provide the community with a more comprehensive set of tools for evaluating model performance and optimizing algorithms in machine learning and related fields.

Implementation Details

The loss functions are implemented as separate Python functions and adhere to the following guidelines:

The functions take appropriate input parameters such as true values and predicted values.
They perform the necessary calculations to compute the respective loss values.
Error handling is incorporated to ensure that input arrays have the same length.
The functions return the computed loss values as floats.
Additionally, type hints are included in the function signatures to improve code readability and maintainability.

Usage Example

To showcase the functionality of the added loss functions, a main function has been included. It demonstrates the usage of each loss function with sample input arrays. The results are printed to the console, allowing users to observe the calculated loss values.

Testing

To ensure the correctness of the implemented loss functions, doctests have been included for each function. These tests verify the expected output against known inputs and edge cases. Running doctest.testmod() within the if __name__ == "__main__" block validates the functions and provides immediate feedback on their correctness.

Next Steps

This contribution can be expanded further by adding additional loss functions or enhancing the existing ones with additional features. Additionally, comprehensive unit tests can be created to further validate the correctness and edge cases of the functions.

Add an algorithm?
Fix a bug or typo in an existing algorithm?
Documentation change?

Checklist:

Ahmad10Raza · 2023-07-08T21:32:20Z

added loss function

rohan472000

Use ruff . --fix to get rid of the ruff issues, and also tick the items in the checklist.

rohan472000 · 2023-07-15T11:37:19Z

+    import doctest
+
+    main()
+    doctest.testmod()


You have not included doctests in any of the functions above. Instead of only listing the parameter and return types in the docstrings, you should add doctests.
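As a rough illustration of what this comment asks for, here is a minimal sketch of a docstring that carries doctests. It uses plain lists rather than the PR's final types, and the sample inputs are chosen only for illustration:

```python
def mean_squared_error(y_true: list[float], y_pred: list[float]) -> float:
    """
    Return the mean of the squared differences between two sequences.

    >>> mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
    0.0
    >>> mean_squared_error([0.0, 0.0], [1.0, 3.0])
    5.0
    """
    # Generator expression: no intermediate list is built.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    import doctest

    # Runs every ">>>" example found in this module's docstrings.
    doctest.testmod()
```

With the examples living in the docstring, doctest.testmod() has something to check, which is not the case for the code as submitted.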

rohan472000 · 2023-07-15T11:37:52Z

+
+
+def main():
+    # Example usage of the loss functions


No need for main; include doctests in every function.

tianyizheng02 · 2023-07-25T01:58:56Z

+import math
+
+
+def mean_squared_error(y_true, y_pred):


Add type hints, and let's make them np.ndarrays since using numpy arrays is standard for ML

tianyizheng02 · 2023-07-25T02:01:05Z

+    """
+    Calculates the mean squared error (MSE)
+    between the true and predicted values.
+
+    Args:
+        y_true (array-like): Array of true values.
+        y_pred (array-like): Array of predicted values.
+
+    Returns:
+        float: Mean squared error.
+    """


Could you provide a more detailed explanation of MSE? Remember that this repo is meant for educational purposes, so providing a formula, explaining why this specific loss function is useful, etc would be helpful
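One possible shape for the requested explanation, offered as a sketch rather than the PR author's wording: the docstring could state the definition

$$\mathrm{MSE} = \frac{1}{N} \sum_{i = 1}^{N} (\mathbf{y}_i - \hat{\mathbf{y}}_i)^2$$

where $N$ is the number of samples and $\mathbf{y}_i$, $\hat{\mathbf{y}}_i$ are the $i$-th true and predicted values. Squaring makes every term non-negative and penalizes large residuals more heavily than small ones, which is why MSE is sensitive to outliers. It is also differentiable everywhere, which makes it convenient for gradient-based optimization.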

tianyizheng02 · 2023-07-25T02:02:42Z

+    Returns:
+        float: Mean squared error.
+    """
+    assert len(y_true) == len(y_pred), "Input arrays must have the same length."


Suggested change

-    assert len(y_true) == len(y_pred), "Input arrays must have the same length."
+    if len(y_true) != len(y_pred):
+        raise ValueError("Input arrays must have the same length")

I think raising an Error is more helpful than an assert statement here because you get to specify the type of error
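A small sketch of the difference in practice. The helper name check_same_length is hypothetical and not part of the PR:

```python
def check_same_length(y_true: list[float], y_pred: list[float]) -> None:
    """Hypothetical helper: raise ValueError when the inputs differ in length."""
    if len(y_true) != len(y_pred):
        raise ValueError("Input arrays must have the same length")


# Callers can catch the specific exception type and react to it.
try:
    check_same_length([1.0, 2.0], [1.0])
except ValueError as err:
    message = str(err)
    print(f"Rejected input: {message}")
```

A further reason to prefer an explicit raise: assert statements are removed when Python runs with the -O flag, so a check written as an assert can silently disappear.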

tianyizheng02 · 2023-07-25T02:03:15Z

+    Returns:
+        float: Mean squared error.
+    """
+    assert len(y_true) == len(y_pred), "Input arrays must have the same length."


You should also check to make sure that the arrays are 1-dimensional
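Assuming the inputs are numpy arrays, as requested elsewhere in this review, the dimensionality check could look roughly like this. validate_inputs is a hypothetical helper name:

```python
import numpy as np


def validate_inputs(y_true: np.ndarray, y_pred: np.ndarray) -> None:
    """Hypothetical helper: require two 1-D arrays of equal length."""
    # ndim is the number of axes; a flat vector has ndim == 1.
    if y_true.ndim != 1 or y_pred.ndim != 1:
        raise ValueError("Input arrays must be 1-dimensional")
    # For 1-D arrays, equal shape means equal length.
    if y_true.shape != y_pred.shape:
        raise ValueError("Input arrays must have the same length")
```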

tianyizheng02 · 2023-07-25T02:15:09Z

+    squared_errors = [(true - pred) ** 2 for true, pred in zip(y_true, y_pred)]
+    mse = sum(squared_errors) / len(y_true)


Let's use numpy arrays since those are standard when it comes to doing ML with Python

Instead of calculating the SSE using the formula

$$\mathrm{SSE} = \sum_{i = 1}^{N} (\mathbf{y}_i - \hat{\mathbf{y}}_i)^2$$

consider expressing it in terms of matrix/vector operations:

$$\mathrm{SSE} = (\mathbf{y} - \hat{\mathbf{y}})^{\top} (\mathbf{y} - \hat{\mathbf{y}}) = \| \mathbf{y} - \hat{\mathbf{y}} \|^2$$

This is probably more efficient than manually squaring and summing the residuals since numpy typically performs these sorts of operations more efficiently than vanilla Python. Plus, numpy has linear algebra functions for these operations that can make the implementation much simpler.

If you are going to compute it manually in vanilla Python, use a generator instead of a list because generators don't need to load all the elements into memory at once:

    sse = sum((true - pred) ** 2 for true, pred in zip(y_true, y_pred))
    mse = sse / len(y_true)
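The matrix/vector form suggested in this comment might be sketched as follows, assuming 1-D numpy arrays of equal length. This is one way to write it, not necessarily what the maintainers had in mind:

```python
import numpy as np


def mean_squared_error(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """MSE computed as the inner product of the residual vector with itself."""
    residuals = y_true - y_pred
    # For 1-D arrays, r @ r is the inner product, i.e. the squared L2 norm (SSE).
    sse = residuals @ residuals
    return float(sse / residuals.size)


# Residuals are [0, 0, -2], so SSE = 4 and MSE = 4 / 3.
print(mean_squared_error(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0])))
```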

tianyizheng02 · 2023-07-25T02:25:05Z

+def mean_absolute_error(y_true, y_pred):
+    """
+    Calculates the mean absolute error (MAE)
+    between the true and predicted values.
+
+    Args:
+        y_true (array-like): Array of true values.
+        y_pred (array-like): Array of predicted values.
+
+    Returns:
+        float: Mean absolute error.
+    """
+    assert len(y_true) == len(y_pred), "Input arrays must have the same length."
+    absolute_errors = [abs(true - pred) for true, pred in zip(y_true, y_pred)]
+    mae = sum(absolute_errors) / len(y_true)
+    return mae


Same comments as above:

Add type hints

Explain the loss function

Raise Error instead of asserting

Use numpy

Use numpy matrix/vector operations (optional but probably more efficient)
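Taken together, these points could lead to something like the sketch below for MAE. The docstring wording and the doctest inputs are illustrative assumptions:

```python
import numpy as np


def mean_absolute_error(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """
    Mean absolute error: MAE = (1 / N) * sum(|y_i - y_hat_i|).

    Each residual contributes linearly, so MAE is less sensitive to
    outliers than MSE.

    >>> mean_absolute_error(np.array([1.0, 2.0, 3.0]), np.array([2.0, 2.0, 5.0]))
    1.0
    """
    if y_true.shape != y_pred.shape:
        raise ValueError("Input arrays must have the same shape")
    # Vectorized: elementwise absolute residuals, then their mean.
    return float(np.mean(np.abs(y_true - y_pred)))
```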

tianyizheng02 · 2023-07-25T02:31:10Z

+def binary_cross_entropy(y_true, y_pred):
+    """
+    Calculates the binary cross entropy (BCE)
+    between the true and predicted values.
+
+    Args:
+        y_true (array-like): Array of true values.
+        y_pred (array-like): Array of predicted values.
+
+    Returns:
+        float: Binary cross entropy.
+    """
+    assert len(y_true) == len(y_pred), "Input arrays must have the same length."
+    bce = 0
+    for true, pred in zip(y_true, y_pred):
+        bce += -true * math.log(pred) - (1 - true) * math.log(1 - pred)
+    bce /= len(y_true)
+    return bce


Same comments as above, but also avoid computing with for-loops if you can use comprehensions and built-in functions:

    # Could also do this with numpy
    bce = sum(
        -true * math.log(pred) - (1 - true) * math.log(1 - pred)
        for true, pred in zip(y_true, y_pred)
    )
    bce /= len(y_true)
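For comparison, a numpy version might look like the sketch below. The epsilon parameter and the clipping step are assumptions added here, not something the PR or the review specifies. They guard against taking log(0) when a prediction is exactly 0 or 1:

```python
import numpy as np


def binary_cross_entropy(
    y_true: np.ndarray, y_pred: np.ndarray, epsilon: float = 1e-15
) -> float:
    """BCE = -(1 / N) * sum(y * log(p) + (1 - y) * log(1 - p))."""
    # Assumption: clip predictions away from 0 and 1 so that log() stays finite.
    # The PR code has no such guard; epsilon is a hypothetical parameter.
    clipped = np.clip(y_pred, epsilon, 1 - epsilon)
    losses = -(y_true * np.log(clipped) + (1 - y_true) * np.log(1 - clipped))
    return float(np.mean(losses))
```

With y_true = [1, 0] and y_pred = [0.5, 0.5], both terms equal log(2), so the mean is log(2) as well.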

tianyizheng02 · 2023-07-25T02:31:27Z

+def categorical_cross_entropy(y_true, y_pred):
+    """
+    Calculates the categorical cross entropy (CCE)
+    between the true and predicted values.
+
+    Args:
+        y_true (array-like): Array of true values.
+        y_pred (array-like): Array of predicted values.
+
+    Returns:
+        float: Categorical cross entropy.
+    """
+    assert len(y_true) == len(y_pred), "Input arrays must have the same length."
+    cce = 0
+    for true, pred in zip(y_true, y_pred):
+        for t, p in zip(true, pred):
+            cce += -t * math.log(p)
+    cce /= len(y_true)
+    return cce
+


Same comments as above

tianyizheng02 · 2023-07-25T12:57:36Z

+        for t, p in zip(true, pred):
+            cce += -t * math.log(p)


Are y_true and y_pred meant to be 1-D arrays? If so, then why the inner for-loop?
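The PR does not state the intended shapes. If the nested loop means that each sample is a one-hot label vector paired with a vector of class probabilities, then the inputs are 2-D, and a sketch under that assumption could be:

```python
import numpy as np


def categorical_cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """
    Assumed shapes (not confirmed by the PR):
    y_true: one-hot labels, shape (n_samples, n_classes)
    y_pred: predicted class probabilities, same shape
    """
    if y_true.ndim != 2 or y_true.shape != y_pred.shape:
        raise ValueError("Inputs must be 2-D arrays of the same shape")
    # Sum over classes (axis=1) gives one loss per sample, then average.
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(np.mean(per_sample))
```

Summing over axis=1 replaces the inner loop, and the mean over samples replaces the outer one.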

tianyizheng02 · 2023-07-25T13:06:55Z

+def huber_loss(y_true, y_pred, delta=1.0):
+    """
+    Calculates the Huber loss between the true and predicted values.
+
+    Args:
+        y_true (array-like): Array of true values.
+        y_pred (array-like): Array of predicted values.
+        delta (float): Threshold value for Huber loss.
+
+    Returns:
+        float: Huber loss.
+    """
+    assert len(y_true) == len(y_pred), "Input arrays must have the same length."
+    huber_loss = 0
+    for true, pred in zip(y_true, y_pred):
+        error = true - pred
+        if abs(error) <= delta:
+            huber_loss += 0.5 * error**2
+        else:
+            huber_loss += delta * (abs(error) - 0.5 * delta)
+    huber_loss /= len(y_true)
+    return huber_loss


Same comments as above, but FYI assignment expressions from Python 3.8 allow you to do this with comprehensions too:

    # Could probably also do this with numpy
    # This could also be cleaner with a helper function/lambda
    huber_loss = sum(
        0.5 * abs_error**2
        if (abs_error := abs(true - pred)) <= delta
        else delta * (abs_error - 0.5 * delta)
        for true, pred in zip(y_true, y_pred)
    )
    huber_loss /= len(y_true)
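The numpy route mentioned in the comment could be sketched with np.where, which selects between the quadratic and linear branches elementwise. This assumes 1-D arrays of equal length and is only one way to write it:

```python
import numpy as np


def huber_loss(y_true: np.ndarray, y_pred: np.ndarray, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |error| <= delta, linear beyond it."""
    abs_error = np.abs(y_true - y_pred)
    quadratic = 0.5 * abs_error**2
    linear = delta * (abs_error - 0.5 * delta)
    # Pick the branch per element, then average over all samples.
    return float(np.mean(np.where(abs_error <= delta, quadratic, linear)))
```

For errors of 0.5 and 3 with delta = 1, the per-sample losses are 0.125 and 2.5, so the mean is 1.3125.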

tianyizheng02 · 2023-09-27T16:06:05Z

Closing due to lack of response. We maintainers are currently trying to clear out old PRs in preparation for Hacktoberfest, and we don't have the time to extensively fix this PR ourselves.

add loss_functions

5e2de43

algorithms-keeper bot added the "tests are failing" label (Do not merge until tests pass) on Jul 8, 2023

rohan472000 suggested changes Jul 15, 2023

tianyizheng02 requested changes Jul 25, 2023

tianyizheng02 closed this Sep 27, 2023
