medsegpy.modeling.layers

medsegpy.modeling.layers.attention

Attention Layers

The following layers implement an attention gating module based on the paper “Attention U-Net: Learning Where to Look For the Pancreas” (Oktay et al.). The code below is based on a PyTorch implementation of this technique by the paper’s authors:

https://github.com/ozan-oktay/Attention-Gated-Networks/tree/a96edb72622274f6705097d70cfaa7f2bf818a5a

Each layer has 2D and 3D versions. Only the 2D versions have been tested so far.
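
As a rough illustration of the additive attention gate described in the referenced paper, the sketch below computes one attention coefficient for a single spatial position in plain Python. The function name, weight layout, and example values are hypothetical and are not taken from medsegpy; the actual layers operate on whole feature maps with learned convolutions.

```python
import math

def attention_coefficient(x, g, w_x, w_g, psi, b_g=0.0, b_psi=0.0):
    """Additive attention for one position (hypothetical helper, not medsegpy API).

    x: skip-connection feature vector, g: gating vector,
    w_x / w_g: weight matrices with one row per intermediate channel,
    psi: weights that collapse the intermediate vector to a scalar.
    """
    hidden = []
    for row_x, row_g in zip(w_x, w_g):
        pre = sum(w * v for w, v in zip(row_x, x))
        pre += sum(w * v for w, v in zip(row_g, g)) + b_g
        hidden.append(max(pre, 0.0))  # ReLU on the summed projections
    q = sum(p * h for p, h in zip(psi, hidden)) + b_psi
    return 1.0 / (1.0 + math.exp(-q))  # sigmoid keeps alpha in (0, 1)

# Made-up example values for a 2-channel input and 2 intermediate channels.
x = [1.0, 2.0]
g = [0.5, -1.0]
w_x = [[0.2, 0.1], [-0.3, 0.4]]
w_g = [[0.5, 0.0], [0.1, 0.2]]
psi = [1.0, -1.0]

alpha = attention_coefficient(x, g, w_x, w_g, psi)
gated = [alpha * v for v in x]  # the gate rescales the skip features
```

The coefficient multiplies the skip-connection features, so positions the gating signal considers irrelevant are attenuated before they reach the decoder.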

class medsegpy.modeling.layers.attention.CreateGatingSignal2D(out_channels: int, kernel_size: Union[int, typing.Sequence[int]] = 1, kernel_initializer: Union[str, typing.Dict] = 'he_normal', activation: str = 'relu', add_batchnorm: bool = True, **kwargs)[source]
class medsegpy.modeling.layers.attention.CreateGatingSignal3D(out_channels: int, kernel_size: Union[int, typing.Sequence[int]] = 1, kernel_initializer: Union[str, typing.Dict] = 'he_normal', activation: str = 'relu', add_batchnorm: bool = True, **kwargs)[source]
class medsegpy.modeling.layers.attention.GridAttentionModule2D(in_channels: int, intermediate_channels: int, sub_sample_factor: Union[int, typing.Sequence[int]] = 2, kernel_initializer: Union[str, typing.Dict] = 'he_normal', **kwargs)[source]
class medsegpy.modeling.layers.attention.GridAttentionModule3D(in_channels: int, intermediate_channels: int, sub_sample_factor: Union[int, typing.Sequence[int]] = 2, kernel_initializer: Union[str, typing.Dict] = 'he_normal', **kwargs)[source]
class medsegpy.modeling.layers.attention.MultiAttentionModule2D(in_channels: int, intermediate_channels: int, sub_sample_factor: Union[int, typing.Sequence[int]] = 2, kernel_initializer: Union[str, typing.Dict] = 'he_normal', activation: str = 'relu', **kwargs)[source]
class medsegpy.modeling.layers.attention.MultiAttentionModule3D(in_channels: int, intermediate_channels: int, sub_sample_factor: Union[int, typing.Sequence[int]] = 2, kernel_initializer: Union[str, typing.Dict] = 'he_normal', activation: str = 'relu', **kwargs)[source]
class medsegpy.modeling.layers.attention.DeepSupervision2D(out_channels: int, scale_factor: Union[int, typing.Sequence[int]], kernel_initializer: Union[str, typing.Dict] = 'he_normal', **kwargs)[source]
class medsegpy.modeling.layers.attention.DeepSupervision3D(out_channels: int, scale_factor: Union[int, typing.Sequence[int]], kernel_initializer: Union[str, typing.Dict] = 'he_normal', **kwargs)[source]

medsegpy.modeling.layers.pooling

Pooling layers.

MaxPoolingMask2D, MaxPoolingWithArgmax2D, and MaxUnpooling2D are based on code from the open-source repo below. We thank the authors for making the code publicly available.

https://github.com/ykamikawa/tf-keras-SegNet/tree/648ee1aa6870e8280a5f24ee193caa585adde9cd

class medsegpy.modeling.layers.pooling.MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid', data_format=None, **kwargs)[source]

Max pooling operation for 2D spatial data.

Downsamples the input representation by taking the maximum value over the window defined by pool_size for each dimension along the features axis. The window is shifted by strides in each dimension. With the “valid” padding option, the output has a shape (number of rows or columns) of: output_shape = floor((input_shape - pool_size) / strides) + 1

With the “same” padding option, the output shape is: output_shape = ceil(input_shape / strides)
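
The output-size arithmetic for one spatial axis can be sketched in plain Python as follows. The helper name pooled_length is hypothetical; it uses the floor form for "valid" padding and the ceiling form for "same" padding, which agree with the shapes printed in the examples in this section.

```python
import math

def pooled_length(input_length, pool_size, stride, padding="valid"):
    """Output length along one spatial axis (illustrative helper)."""
    padding = padding.lower()  # the padding option is case-insensitive
    if padding == "valid":
        # Only windows that fit entirely inside the input are used.
        return (input_length - pool_size) // stride + 1
    if padding == "same":
        # The input is padded so that every stride position yields an output.
        return math.ceil(input_length / stride)
    raise ValueError(f"unknown padding: {padding!r}")

valid_3 = pooled_length(3, 2, 1, "valid")  # 3x3 input, 2x2 window, stride 1
same_3 = pooled_length(3, 2, 1, "same")
valid_4 = pooled_length(4, 2, 2, "valid")  # 4x4 input, default stride = pool_size
```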

For example, for strides=(1, 1) and padding="valid":

>>> x = tf.constant([[1., 2., 3.],
...                  [4., 5., 6.],
...                  [7., 8., 9.]])
>>> x = tf.reshape(x, [1, 3, 3, 1])
>>> max_pool_2d = tf.keras.layers.MaxPooling2D(pool_size=(2, 2),
...    strides=(1, 1), padding='valid')
>>> max_pool_2d(x)
<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
  array([[[[5.],
           [6.]],
          [[8.],
           [9.]]]], dtype=float32)>

For example, for strides=(2, 2) and padding="valid":

>>> x = tf.constant([[1., 2., 3., 4.],
...                  [5., 6., 7., 8.],
...                  [9., 10., 11., 12.]])
>>> x = tf.reshape(x, [1, 3, 4, 1])
>>> max_pool_2d = tf.keras.layers.MaxPooling2D(pool_size=(2, 2),
...    strides=(2, 2), padding='valid')
>>> max_pool_2d(x)
<tf.Tensor: shape=(1, 1, 2, 1), dtype=float32, numpy=
  array([[[[6.],
           [8.]]]], dtype=float32)>

Usage Example:

>>> input_image = tf.constant([[[[1.], [1.], [2.], [4.]],
...                            [[2.], [2.], [3.], [2.]],
...                            [[4.], [1.], [1.], [1.]],
...                            [[2.], [2.], [1.], [4.]]]])
>>> output = tf.constant([[[[1], [0]],
...                       [[0], [1]]]])
>>> model = tf.keras.models.Sequential()
>>> model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2),
...    input_shape=(4,4,1)))
>>> model.compile('adam', 'mean_squared_error')
>>> model.predict(input_image, steps=1)
array([[[[2.],
         [4.]],
        [[4.],
         [4.]]]], dtype=float32)

For example, for strides=(1, 1) and padding="same":

>>> x = tf.constant([[1., 2., 3.],
...                  [4., 5., 6.],
...                  [7., 8., 9.]])
>>> x = tf.reshape(x, [1, 3, 3, 1])
>>> max_pool_2d = tf.keras.layers.MaxPooling2D(pool_size=(2, 2),
...    strides=(1, 1), padding='same')
>>> max_pool_2d(x)
<tf.Tensor: shape=(1, 3, 3, 1), dtype=float32, numpy=
  array([[[[5.],
           [6.],
           [6.]],
          [[8.],
           [9.],
           [9.]],
          [[8.],
           [9.],
           [9.]]]], dtype=float32)>
Parameters:
  • pool_size – integer or tuple of 2 integers, window size over which to take the maximum. (2, 2) will take the max value over a 2x2 pooling window. If only one integer is specified, the same window length will be used for both dimensions.
  • strides – Integer, tuple of 2 integers, or None. Strides values. Specifies how far the pooling window moves for each pooling step. If None, it will default to pool_size.
  • padding – One of “valid” or “same” (case-insensitive). “valid” adds no zero padding. “same” adds padding such that if the stride is 1, the output shape is the same as input shape.
  • data_format – A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be “channels_last”.
Input shape:
  • If data_format='channels_last': 4D tensor with shape (batch_size, rows, cols, channels).
  • If data_format='channels_first': 4D tensor with shape (batch_size, channels, rows, cols).
Output shape:
  • If data_format='channels_last': 4D tensor with shape (batch_size, pooled_rows, pooled_cols, channels).
  • If data_format='channels_first': 4D tensor with shape (batch_size, channels, pooled_rows, pooled_cols).
Returns: A tensor of rank 4 representing the maximum pooled values. See above for output shape.
class medsegpy.modeling.layers.pooling.UpSampling2D(size=(2, 2), data_format=None, interpolation='nearest', **kwargs)[source]

Upsampling layer for 2D inputs.

Repeats the rows and columns of the data by size[0] and size[1] respectively.

Examples:

>>> input_shape = (2, 2, 1, 3)
>>> x = np.arange(np.prod(input_shape)).reshape(input_shape)
>>> print(x)
[[[[ 0  1  2]]
  [[ 3  4  5]]]
 [[[ 6  7  8]]
  [[ 9 10 11]]]]
>>> y = tf.keras.layers.UpSampling2D(size=(1, 2))(x)
>>> print(y)
tf.Tensor(
  [[[[ 0  1  2]
     [ 0  1  2]]
    [[ 3  4  5]
     [ 3  4  5]]]
   [[[ 6  7  8]
     [ 6  7  8]]
    [[ 9 10 11]
     [ 9 10 11]]]], shape=(2, 2, 2, 3), dtype=int64)
Parameters:
  • size – Int, or tuple of 2 integers. The upsampling factors for rows and columns.
  • data_format – A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch_size, height, width, channels) while channels_first corresponds to inputs with shape (batch_size, channels, height, width). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be “channels_last”.
  • interpolation – A string, one of nearest or bilinear.
Input shape:

4D tensor with shape:
  • If data_format is “channels_last”: (batch_size, rows, cols, channels)
  • If data_format is “channels_first”: (batch_size, channels, rows, cols)
Output shape:

4D tensor with shape:
  • If data_format is “channels_last”: (batch_size, upsampled_rows, upsampled_cols, channels)
  • If data_format is “channels_first”: (batch_size, channels, upsampled_rows, upsampled_cols)
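
To make the row and column repetition concrete, here is a minimal pure-Python sketch of nearest-neighbour upsampling on a single 2D grid. The helper upsample_nearest is hypothetical and ignores the batch and channel axes that the actual layer handles.

```python
def upsample_nearest(grid, size=(2, 2)):
    """Repeat rows size[0] times and columns size[1] times (illustrative)."""
    row_factor, col_factor = size
    out = []
    for row in grid:
        # Repeat every value along the column axis first.
        wide = [value for value in row for _ in range(col_factor)]
        # Then repeat the widened row along the row axis.
        out.extend(list(wide) for _ in range(row_factor))
    return out

small = [[1, 2],
         [3, 4]]
big = upsample_nearest(small, size=(1, 2))  # rows unchanged, columns doubled
```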

medsegpy.modeling.layers.upsampling

Deeplabv3+ model for Keras. This model is based on the TensorFlow DeepLab repo (https://github.com/tensorflow/models/tree/master/research/deeplab). On Pascal VOC, the original model reaches 84.56% mIOU.

Now this model is only available for the TensorFlow backend, due to its reliance on SeparableConvolution layers, but Theano will add this layer soon.

MobileNetv2 backbone is based on this repo: https://github.com/JonathanCMitchell/mobilenet_v2_keras

Reference: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (https://arxiv.org/pdf/1802.02611.pdf)
class medsegpy.modeling.layers.upsampling.BilinearUpsampling(upsampling=(2, 2), output_size=None, data_format=None, **kwargs)[source]

A simple bilinear upsampling layer. Works only with TensorFlow.

Parameters:
  • upsampling – tuple of 2 numbers > 0. The upsampling ratio for height and width.
  • output_size – used instead of the upsampling argument if passed.
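
The precedence between the two arguments can be sketched as below. The helper resolve_output_size is hypothetical and only mirrors the documented rule that output_size, when passed, is used instead of upsampling; it is not the layer's actual code.

```python
def resolve_output_size(height, width, upsampling=(2, 2), output_size=None):
    """Pick the target (height, width); output_size wins when given."""
    if output_size is not None:
        return (output_size[0], output_size[1])
    # Otherwise scale each spatial dimension by its upsampling ratio.
    return (int(height * upsampling[0]), int(width * upsampling[1]))

scaled = resolve_output_size(32, 48)                       # ratio-based
fixed = resolve_output_size(32, 48, output_size=(65, 65))  # explicit size
```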

medsegpy.modeling.meta_arch

medsegpy.modeling.meta_arch.build

medsegpy.modeling.meta_arch.build.build_model(cfg, input_tensor=None) → medsegpy.modeling.model.Model[source]

Build the whole model architecture, defined by cfg.MODEL_NAME. Note that it does not load any weights from cfg.

medsegpy.modeling.meta_arch.deeplabv3

DeeplabV3+ implementation.

This model is based on the TF and Keras repos below:

https://github.com/tensorflow/models/tree/master/research/deeplab
https://github.com/bonlime/keras-deeplab-v3-plus

class medsegpy.modeling.meta_arch.deeplabv3.DeeplabV3Plus(cfg: medsegpy.config.DeeplabV3Config)[source]
sep_conv_bn(x, filters, prefix, stride=1, kernel_size=3, rate=1, depth_activation=False, epsilon=0.001)[source]

SepConv with BN between depthwise & pointwise.

Optionally adds an activation after BN. Implements right “same” padding for even kernel sizes.

Parameters:
  • x – input tensor
  • filters – num of filters in pointwise convolution
  • prefix – prefix before name
  • stride – stride at depthwise conv
  • kernel_size – kernel size for depthwise convolution
  • rate – atrous rate for depthwise convolution
  • depth_activation – flag to use activation between depthwise & pointwise convs
  • epsilon – epsilon to use in BN layer
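
One way to compute explicit “same”-style padding for a strided, dilated depthwise convolution is sketched below. It follows the arithmetic used in the keras-deeplab-v3-plus repo referenced above; whether medsegpy applies exactly the same steps is an assumption, and the helper name explicit_same_padding is hypothetical.

```python
def explicit_same_padding(kernel_size, rate=1):
    """Return (pad_begin, pad_end) for one axis of a dilated convolution."""
    # A dilated kernel covers more input positions than its nominal size.
    effective = kernel_size + (kernel_size - 1) * (rate - 1)
    pad_total = effective - 1
    pad_begin = pad_total // 2
    pad_end = pad_total - pad_begin  # for even kernels the extra pixel goes last
    return pad_begin, pad_end

pads_3x3 = explicit_same_padding(3)
pads_3x3_rate2 = explicit_same_padding(3, rate=2)
pads_4x4 = explicit_same_padding(4)
```

With this padding applied up front, the convolution itself can run with “valid” padding, so the result no longer depends on how the backend splits padding for a given input size.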
deeplabv3(input_tensor=None, input_shape=(512, 512, 3), classes=21, backbone='mobilenetv2', OS=16, alpha=1.0, dilation_divisor=1, dil_rate_input=None, dropout_rate=0.1) → medsegpy.modeling.model.Model[source]

Instantiates the Deeplabv3+ architecture.

Optionally loads weights pre-trained on PASCAL VOC. This model is available for TensorFlow only, and can only be used with inputs following the TensorFlow data format (height, width, channels).

Parameters:
  • weights – one of ‘pascal_voc’ (pre-trained on PASCAL VOC) or None (random initialization).
  • input_tensor – optional Keras tensor (i.e. output of layers.Input()) to use as image input for the model.
  • input_shape – shape of the input image, in HxWxC format. The PASCAL VOC model was trained on (512, 512, 3) images.
  • classes – number of desired classes. If classes != 21, the last layer is initialized randomly.
  • backbone – backbone to use. One of {‘xception’, ‘mobilenetv2’}.
  • OS – determines the input_shape/feature_extractor_output ratio. One of {8, 16}. Used only for the xception backbone.
  • alpha – controls the width of the MobileNetV2 network. This is known as the width multiplier in the MobileNetV2 paper. Used only for the mobilenetv2 backbone.
    • If alpha < 1.0, proportionally decreases the number of filters in each layer.
    • If alpha > 1.0, proportionally increases the number of filters in each layer.
    • If alpha = 1, the default number of filters from the paper is used at each layer.
Returns: A Keras model instance.
Raises:
  • RuntimeError – if attempting to run this model with a backend that does not support separable convolutions.
  • ValueError – in case of an invalid argument for weights or backbone.
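
The effect of the alpha width multiplier on a layer's filter count can be sketched as follows. The rounding to a multiple of 8 mirrors the helper used in the Keras MobileNetV2 reference implementation; whether medsegpy rounds in the same way is an assumption, and both function names here are illustrative.

```python
def make_divisible(value, divisor=8, min_value=None):
    """Round value to a nearby multiple of divisor without dropping over 10%."""
    if min_value is None:
        min_value = divisor
    rounded = max(min_value, int(value + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * value:
        rounded += divisor  # avoid shrinking the layer by more than 10%
    return rounded

def scaled_filters(base_filters, alpha):
    """Filter count after applying the width multiplier alpha."""
    return make_divisible(base_filters * alpha)

narrow = scaled_filters(32, 0.5)   # alpha < 1.0 shrinks the layer
default = scaled_filters(32, 1.0)  # alpha = 1 keeps the paper's count
wide = scaled_filters(32, 1.4)     # alpha > 1.0 widens the layer
```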

relu6(x)[source]
build_model(input_tensor=None) → medsegpy.modeling.model.Model[source]

medsegpy.modeling.meta_arch.unet

medsegpy.modeling.meta_arch.unet.build_encoder_block(x: tensorflow.python.framework.ops.Tensor, num_filters: Union[int, typing.Sequence[int]], kernel_size: Union[int, typing.Sequence[int]] = 3, num_conv: int = 2, activation: str = 'relu', kernel_initializer: Union[str, typing.Dict] = 'he_normal', dropout: float = 0.0)[source]

Builds simple FCN encoder block.

The structure is shown below, where blocks in [] are repeated:

[Conv -> Activation] -> BN -> Dropout.

Parameters:
  • x (tf.Tensor) – Input tensor.
  • num_filters (int or Sequence[int]) – Number of filters to use for each conv layer. If a sequence, will override number of conv layers specified by num_conv.
  • kernel_size – Kernel size accepted by Keras convolution layers.
  • num_conv (int, optional) – Number of convolutional blocks (conv + activation) to use.
  • activation (str, optional) – Activation type.
  • kernel_initializer – Kernel initializer accepted by keras.layers.Conv(…).
  • dropout (float, optional) – Dropout rate.
Returns:

tf.Tensor – Encoder block output.
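
The block layout and the way a num_filters sequence overrides num_conv can be sketched without TensorFlow by listing the layers in order. The helper encoder_block_plan is hypothetical and only produces descriptive strings. It assumes BN and Dropout are always appended, as in the structure given above, which may differ from the actual implementation when dropout is 0.

```python
def encoder_block_plan(num_filters, num_conv=2, activation="relu", dropout=0.0):
    """List the layers of one encoder block in order (hypothetical helper)."""
    if isinstance(num_filters, int):
        filters = [num_filters] * num_conv
    else:
        filters = list(num_filters)  # a sequence overrides num_conv
    plan = []
    for f in filters:
        # The bracketed [Conv -> Activation] pair is repeated per conv layer.
        plan.append(f"Conv({f})")
        plan.append(f"Activation({activation})")
    plan.append("BN")
    plan.append(f"Dropout({dropout})")
    return plan

plan_a = encoder_block_plan(32)
plan_b = encoder_block_plan([16, 32, 64], num_conv=2, dropout=0.1)
```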