implement-batch-norm-layer #217
Conversation
Hey @awni, I've been thinking about how we should structure the batch normalization module. Do you think it's a good idea to have one class that covers all batch normalization types (like `BatchNorm1d` and `BatchNorm2d`), or separate classes for each?
From an implementation standpoint, I think having a single class makes sense. @gboduljak has an implementation in #216.
I fully agree with the idea of a unified BatchNorm. It's a clean and versatile approach for both implementation and the API.
@awni Here is an updated version of BN that is general:

```python
from typing import Tuple

import mlx.core as mx
from mlx.nn.layers.base import Module


class BatchNorm(Module):
    def __init__(
        self,
        num_features: int,
        num_dims: int,
        eps: float = 1e-5,
        momentum: float = 0.1,
        affine: bool = True,
        track_running_stats: bool = True,
    ):
        super().__init__()
        dims_dict = {
            2: ((1, num_features), (0,)),
            3: ((1, num_features, 1), (0, 2)),
            4: ((1, num_features, 1, 1), (0, 2, 3)),
        }
        if num_dims not in dims_dict:
            raise ValueError(f"expected num_dims to be 2, 3, or 4 (got {num_dims})")
        shape, self.reduction_axes = dims_dict[num_dims]
        self.num_features = num_features
        self.eps = eps
        self.momentum = momentum
        self.affine = affine
        self.track_running_stats = track_running_stats
        if self.affine:
            self.weight = mx.ones(shape)
            self.bias = mx.zeros(shape)
        if self.track_running_stats:
            self.running_mean = mx.zeros(shape)
            self.running_var = mx.ones(shape)

    def _extra_repr(self):
        return (
            f"{self.num_features}, eps={self.eps}, momentum={self.momentum}, "
            f"affine={'weight' in self}, track_running_stats={self.track_running_stats}"
        )

    def _calc_stats(self, x: mx.array) -> Tuple[mx.array, mx.array]:
        """
        Calculate the mean and variance of the input tensor.

        Args:
            x (mx.array): Input tensor.

        Returns:
            tuple: Tuple containing mean and variance.
        """
        means = mx.mean(x, axis=self.reduction_axes, keepdims=True)
        var = mx.var(x, axis=self.reduction_axes, keepdims=True)
        if self.track_running_stats and self.training:
            self.running_mean = (
                1 - self.momentum
            ) * self.running_mean + self.momentum * means
            self.running_var = (
                1 - self.momentum
            ) * self.running_var + self.momentum * var
        return means, var

    def __call__(self, x: mx.array):
        """
        Forward pass of BatchNorm.

        Args:
            x (mx.array): Input tensor.

        Returns:
            mx.array: Output tensor.
        """
        if self.training or not self.track_running_stats:
            means, var = self._calc_stats(x)
        else:
            means, var = self.running_mean, self.running_var
        x = (x - means) * mx.rsqrt(var + self.eps)
        return (self.weight * x + self.bias) if "weight" in self else x
```

It can be used as follows:

```python
batch_size = 4
num_features = 32
num_iters = 5

input = mx.random.normal((batch_size, num_features))
bn = BatchNorm(num_features=num_features, num_dims=2)
output = bn(input)
```
We can remove the num_dims parameter by updating the implementation like so:

```python
class BatchNorm(Module):
    def __init__(
        self,
        num_features: int,
        eps: float = 1e-5,
        momentum: float = 0.1,
        affine: bool = True,
        track_running_stats: bool = True,
    ):
        super().__init__()
        self.num_features = num_features
        self.eps = eps
        self.momentum = momentum
        self.affine = affine
        self.track_running_stats = track_running_stats
        if self.affine:
            self.weight = mx.ones((num_features,))
            self.bias = mx.zeros((num_features,))
        if self.track_running_stats:
            self.running_mean = mx.zeros((num_features,))
            self.running_var = mx.ones((num_features,))

    def _extra_repr(self):
        return (
            f"{self.num_features}, eps={self.eps}, momentum={self.momentum}, "
            f"affine={'weight' in self}, track_running_stats={self.track_running_stats}"
        )

    def _check_and_expand_dims(self, x: mx.array):
        """
        Check that the input is a 2D, 3D, or 4D tensor and expand the weight,
        bias, running mean, and running variance accordingly.

        Args:
            x (mx.array): Input tensor.
        """
        num_dims = len(x.shape)
        dims_dict = {
            2: ((1, self.num_features), (0,)),
            3: ((1, self.num_features, 1), (0, 2)),
            4: ((1, self.num_features, 1, 1), (0, 2, 3)),
        }
        if num_dims not in dims_dict:
            raise ValueError(f"expected num_dims to be 2, 3, or 4 (got {num_dims})")
        shape, self.reduction_axes = dims_dict[num_dims]
        if self.affine and self.weight.ndim != num_dims:
            self.weight = mx.expand_dims(self.weight, self.reduction_axes)
            self.bias = mx.expand_dims(self.bias, self.reduction_axes)
        if self.track_running_stats and self.running_mean.ndim != num_dims:
            self.running_mean = mx.expand_dims(self.running_mean, self.reduction_axes)
            self.running_var = mx.expand_dims(self.running_var, self.reduction_axes)

    def _calc_stats(self, x: mx.array) -> Tuple[mx.array, mx.array]:
        """
        Calculate the mean and variance of the input tensor.

        Args:
            x (mx.array): Input tensor.

        Returns:
            tuple: Tuple containing mean and variance.
        """
        means = mx.mean(x, axis=self.reduction_axes, keepdims=True)
        var = mx.var(x, axis=self.reduction_axes, keepdims=True)
        if self.track_running_stats and self.training:
            self.running_mean = (
                1 - self.momentum
            ) * self.running_mean + self.momentum * means
            self.running_var = (
                1 - self.momentum
            ) * self.running_var + self.momentum * var
        return means, var

    def __call__(self, x: mx.array):
        """
        Forward pass of BatchNorm.

        Args:
            x (mx.array): Input tensor.

        Returns:
            mx.array: Output tensor.
        """
        self._check_and_expand_dims(x)
        if self.training or not self.track_running_stats:
            means, var = self._calc_stats(x)
        else:
            means, var = self.running_mean, self.running_var
        x = (x - means) * mx.rsqrt(var + self.eps)
        return (self.weight * x + self.bias) if "weight" in self else x
```
@m0saan your final suggestion looks great. We can then implement some validations, e.g. an axis cannot be both a reduction axis and a feature axis.
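For illustration only, a check along those lines might look like the following (the `feature_axis`/`reduction_axes` parameters are hypothetical and not part of this PR):

```python
# Hypothetical validation for a BatchNorm that takes explicit axes:
# the feature (channel) axis must not also be reduced over.
def validate_axes(feature_axis: int, reduction_axes: tuple):
    if feature_axis in reduction_axes:
        raise ValueError(
            f"axis {feature_axis} cannot be both a feature axis and a reduction axis"
        )

validate_axes(1, (0, 2, 3))    # OK for an (N, C, H, W) layout
# validate_axes(1, (0, 1, 2))  # would raise ValueError
```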
In this discussion post, I included some ideas to test batch norm layers. I think it is important to verify we match PyTorch and/or Jax implementations. Maybe you can use these tests. I can also add them. Your tests look good as well, but we may want to test whether we are doing moving stats tracking correctly. It would also be beneficial to test that BatchNorm is behaving correctly in train/eval mode.
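As a rough sketch of the train/eval behaviour being suggested here (it assumes the num_dims-free `BatchNorm` class defined earlier in this thread and that `Module.train()`/`Module.eval()` toggle `self.training`):

```python
import mlx.core as mx

bn = BatchNorm(num_features=4)
x = mx.random.normal((8, 4))

bn.train()
_ = bn(x)                         # training mode: batch stats are used, running stats updated
mean_after_train = bn.running_mean

bn.eval()
_ = bn(x)                         # eval mode: stored running stats are used
assert bn.running_mean is mean_after_train  # running stats must not change in eval mode
```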
Hey @gboduljak, thanks a lot for your input! I really value your ideas on making the BatchNorm class more flexible. We would like to maintain the simplicity and user-friendliness of the framework. While your proposed changes provide additional options, they may also introduce complexity that might be unnecessary for many use cases. Maybe @awni has some thoughts on this too, including your point on the performance impact of using `expand_dims`?
I will incorporate these ideas into the testing of BN. If you have additional tests to add, feel free to include them, and we can collaborate to ensure comprehensive testing.
Could someone explain to me what the axes are in

```python
dims_dict = {
    2: ((1, self.num_features), (0,)),
    3: ((1, self.num_features, 1), (0, 2)),
    4: ((1, self.num_features, 1, 1), (0, 2, 3)),
}
```

Besides, if we generalise to N-dimensional BatchNorm, would dims_dict become something like

```python
dims_dict = {
    N: ((1, self.num_features, *(1 for _ in range(2, N))), tuple(i for i in range(N) if i != 1)),
}
```
Depending on the BatchNorm you want to implement (e.g. 1D, 2D), you want to normalize over different axes.
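For example, with the channels-second layout used in the snippets above, an (N, C, L) input is normalized per channel by reducing over the batch and length axes (a small illustration, not code from this PR):

```python
import mlx.core as mx

x = mx.random.normal((8, 16, 32))               # (N, C, L): batch, channels, length
mean = mx.mean(x, axis=(0, 2), keepdims=True)   # per-channel mean, shape (1, 16, 1)
var = mx.var(x, axis=(0, 2), keepdims=True)     # per-channel variance, shape (1, 16, 1)
x_hat = (x - mean) * mx.rsqrt(var + 1e-5)       # normalized over batch and length, per channel
```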
I just came up with a new idea to avoid repeatedly calling `_check_and_expand_dims` on every forward pass. @m0saan what do you think?
Hello @gboduljak, I apologize for the delayed response.
@awni can you please review?
Sorry for the delay in reviewing this, we were busy getting v0.0.6 out yesterday w/ quantization etc. I will get on this asap.
Based on trying this layer in MIMM, using mimm/scripts/train.py to train on ImageNet.
This is really nice! I have one request that I think will make it perfect, then we can merge it.
The input tensor shape is specified as (N, C) or (N, C, L), representing the batch size (N), the number of features or channels (C), and optionally, the sequence length (L). The output tensor maintains the same shape as the input, adhering to (N, C) or (N, C, L).
For four-dimensional tensors, the shape is denoted as (N, C, H, W), where N signifies the batch size, C represents the number of channels, H corresponds to the height, and W denotes the width.
This looks great!
I think one change can make it more consistent and a lot simpler:
For our convolutions (and in general) we follow the convention that the channels are last, so inputs to convolutions are NLC or NHWC. We should change two things:
- Batch norm should also follow that convention
- Since it is following that convention it should easily broadcast with the inputs and you can remove the whole check_and_expand_dims machinery and just let broadcasting manage it (it's super cheap to expand dims at runtime so from a perf perspective it should be trivial!)
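To illustrate the broadcasting point (a sketch with assumed shapes, not the merged implementation): with channels-last inputs, parameters of shape (C,) broadcast against the input directly:

```python
import mlx.core as mx

x = mx.random.normal((8, 32, 32, 16))    # NHWC input: channels last
weight = mx.ones((16,))                  # per-channel scale, shape (C,)
bias = mx.zeros((16,))                   # per-channel shift, shape (C,)

axes = tuple(range(x.ndim - 1))          # reduce over every axis except channels: (0, 1, 2)
mean = mx.mean(x, axis=axes, keepdims=True)
var = mx.var(x, axis=axes, keepdims=True)

# (C,) broadcasts against (N, H, W, C) with no explicit expand_dims needed.
y = weight * (x - mean) * mx.rsqrt(var + 1e-5) + bias
```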
Done, I've updated the batch norm implementation and tests to handle inputs of shape NLC and NHWC!
Co-authored-by: Robert McCraith <mccraithrobert@gmail.com>
Update BatchNorm to support NLC and NHWC input formats

In our convolution operations, we follow the convention that the channels are the last dimension. This commit updates the BatchNorm implementation to support inputs where the channels are the last dimension (NLC or NHWC). This involves changing the dimensions we normalize over and the dimensions we expand our parameters over.

Co-authored-by: Robert McCraith <mccraithrobert@gmail.com>

(force-pushed from 3a235af to a1c06b7)
🚀 This looks awesome, thanks for adding it!
torch.var uses a bias correction by default whereas MLX and NumPy do not; that is why you see the slight difference. I think it is the right call for now to use the uncorrected variance in our BN, as I believe PyTorch also uses the uncorrected variance in its normalization layers.
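To make the difference concrete (a small sketch; it assumes `mx.var` accepts a NumPy-style `ddof` argument):

```python
import mlx.core as mx

x = mx.random.normal((10,))
n = x.size

biased = mx.var(x)              # default in MLX/NumPy: divide by n (ddof=0)
unbiased = mx.var(x, ddof=1)    # default in torch.var: divide by n - 1
# The two differ by the Bessel correction factor n / (n - 1):
print(biased * n / (n - 1), unbiased)
```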
PS @gboduljak, @dc-dc-dc, @robertmccraith thanks for the extra reviews / discussion! PS @robertmccraith I'm following mimm eagerly, keep us posted on how it's going and what else you need to get it fully operational!
Thanks @awni for your inputs!
- Add batch normalization layer --------- Co-authored-by: Robert McCraith <mccraithrobert@gmail.com> Co-authored-by: Awni Hannun <awni@apple.com>
Proposed changes
Description
This pull request introduces an implementation of Batch Normalization, following the specifications outlined in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
Changes Made
- Implemented a `BatchNorm1d` class that extends the `Module` class.
- Added configuration parameters: `eps` (numerical stability constant), `momentum` (for running mean and variance updates), and `affine` (whether to include learnable affine parameters).
- Tested the `BatchNorm1d` module with and without learnable parameters.

Usage
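The usage snippet from the original description is not preserved in this capture; below is a minimal sketch based on the parameters listed above, assuming the layer is exposed as `mlx.nn.BatchNorm` as settled on later in this thread:

```python
import mlx.core as mx
import mlx.nn as nn

x = mx.random.normal((4, 16))     # (N, C) input: batch of 4, 16 features
bn = nn.BatchNorm(num_features=16, eps=1e-5, momentum=0.1, affine=True)
y = bn(x)                         # output has the same shape as the input
```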
Notes
Please review and provide feedback.
Checklist
Put an `x` in the boxes that apply.
- I have run `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes