A revamped implementation of SCAE that aims to improve upon the limitations of the original Stacked Capsule AutoEncoder:
- Support for flipping the template, using a straight-through estimator (STE) so that the scale never has to cross zero
- Using softplus instead of sigmoid for scale parameters (allows for zooming > 1x)
- The Berkeley implementation used colored templates and never used the colors extracted from the capsule features. I follow the paper instead, and additionally support categorical colors
- Allow various modes of operation (colored and monochrome templates, see below)
- Support variable-sized images by masking both the encoder and the decoder
- Support for RGB/monochrome and categorical-palette images
- Support one-to-many template-to-capsule binding, allowing multiple instantiations of the same part template per image
  - In this mode, each capsule (soft-)selects a template using a Gumbel-softmax, so that a hard selection can be made after training.
- Potential bug fix: the Berkeley implementation adds the sigmoided alpha templates to the log probability of capsule presence, which mixes a probability with a log-probability and makes no sense. I have corrected this in my implementation.
- Instead of a single static background template (with a fixed color), which may not suffice for more complex images, I allow the network to have both affine and similarity capsules. The intention is that similarity capsules will naturally gravitate toward learning background-like templates.
- Allow complex inter-capsule interactions by using a transformer trunk right after the conv trunk, eliminating the MLPs
- When the alpha channel is absent, infer it from the color channel and a temperature
- Bias the templates to be more interpretable (See below)
- Consider replacing the noise scale with the more principled Gumbel-sigmoid
- A dummy dataset to test different modes of operation
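The flipping and softplus bullets above can be sketched as follows. This is a minimal illustration, not the implementation: it assumes each capsule has a separate hypothetical flip logit `raw_flip` that is binarised with an STE, and a scale magnitude taken from a softplus of `raw_scale`. All function names are hypothetical, and because NumPy has no autodiff, the STE's forward and backward passes are written out explicitly.

```python
import numpy as np


def softplus(x):
    # log(1 + exp(x)), computed stably; unbounded above, so scales > 1 are reachable.
    return np.logaddexp(0.0, x)


def sigmoid(x):
    # Bounded in (0, 1): a sigmoid-parameterised scale can never exceed 1x.
    return 1.0 / (1.0 + np.exp(-x))


def ste_flip_forward(raw_flip):
    # Forward pass: hard flip factor in {-1, +1}; 0 maps to +1, so it is never 0.
    return np.where(raw_flip >= 0.0, 1.0, -1.0)


def ste_flip_backward(raw_flip, grad_out):
    # Backward pass (straight-through): treat the hard sign as the identity
    # inside [-1, 1] and block the gradient outside it.
    return grad_out * (np.abs(raw_flip) <= 1.0)


def template_scale(raw_scale, raw_flip):
    # Positive magnitude times a discrete sign: the template flips by switching
    # the sign, not by shrinking the scale through zero.
    return ste_flip_forward(raw_flip) * softplus(raw_scale)


scales = template_scale(np.array([2.0, 2.0]), np.array([0.3, -0.3]))
grads = ste_flip_backward(np.array([0.5, 3.0]), np.array([1.0, 1.0]))
```

In an autodiff framework such as PyTorch, the same STE is commonly written as `soft + (hard - soft).detach()` with `soft = raw_flip.clamp(-1, 1)`: the forward value is `hard`, while the gradient flows through the clamped `soft` term.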
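The one-to-many binding mode and the Gumbel-sigmoid idea from the list above rely on two standard relaxations, sketched below under stated assumptions. The shapes (4 capsules, 3 shared 8x8 templates), the temperature `tau=0.5`, and all names are hypothetical. `gumbel_softmax` soft-selects a template per capsule, and `gumbel_sigmoid` is the binary Concrete relaxation that could replace an ad hoc noise scale on presence logits.

```python
import numpy as np


def gumbel_softmax(logits, tau, rng, hard=False):
    # Gumbel(0, 1) noise via -log(-log(U)); U is kept away from 0 to avoid log(0).
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    y = (logits - np.log(-np.log(u))) / tau
    y = y - y.max(axis=-1, keepdims=True)  # numerical stability before exp
    probs = np.exp(y) / np.exp(y).sum(axis=-1, keepdims=True)
    if hard:
        # One-hot of the sampled template, e.g. for hard selection after training.
        return np.eye(logits.shape[-1])[probs.argmax(axis=-1)]
    return probs


def gumbel_sigmoid(logits, tau, rng):
    # Binary Concrete relaxation: the difference of two Gumbel samples is
    # Logistic(0, 1) noise, sampled here as log(U) - log(1 - U).
    u = rng.uniform(low=1e-10, high=1.0 - 1e-10, size=logits.shape)
    noise = np.log(u) - np.log1p(-u)
    return 1.0 / (1.0 + np.exp(-(logits + noise) / tau))


rng = np.random.default_rng(0)
num_capsules, num_templates = 4, 3
selection_logits = rng.normal(size=(num_capsules, num_templates))
templates = rng.normal(size=(num_templates, 8, 8))

soft = gumbel_softmax(selection_logits, tau=0.5, rng=rng)
hard = gumbel_softmax(selection_logits, tau=0.5, rng=rng, hard=True)
# Each capsule receives a (soft) mixture of the shared templates.
per_capsule_templates = np.einsum("ct,thw->chw", soft, templates)
presence = gumbel_sigmoid(rng.normal(size=num_capsules), tau=0.5, rng=rng)
```

Lowering `tau` pushes both relaxations toward discrete samples. PyTorch provides `torch.nn.functional.gumbel_softmax` with a `hard` flag that applies the straight-through trick automatically.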

Monochrome Templates: a template encodes only brightness, or has no color channel at all (in the case of categorical colors).

RGB/Monochrome:
- Brightness Channel: This encodes the brightness at each template pixel location.
  - It is multiplied with the capsule color, so the capsule effectively modulates the brightness of the whole template; brightness lies in [-1, 1] (mapped to [0, 1] after a sigmoid).
  - Initialised orthogonally to break symmetry.
- Alpha Channel: This encodes the presence of the template at each template pixel location.
  - It is combined with the capsule presence logits to get the transformed template presence logits.
  - Initialised to zero.
  - Deduced from the color channel and a temperature if no alpha channel is present.
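The source does not spell out how alpha is deduced from the color channel and a temperature, so the following is only one plausible rule, labelled as an assumption: average the (C, H, W) template over its channels, divide by a temperature, and read the result as alpha logits. It assumes template values in [-1, 1], and `infer_alpha_logits` is a hypothetical name.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def infer_alpha_logits(template, temperature=0.1):
    """Hypothetical rule: alpha logits for a template without an alpha channel.

    template: array of shape (C, H, W) with values assumed to lie in [-1, 1]
    (C = 1 for monochrome, 3 for RGB). Returns alpha logits of shape (H, W).
    """
    # Collapse the colour channels to a single intensity map.
    intensity = template.mean(axis=0)
    # Lower temperature -> sharper, more binary opacity.
    return intensity / temperature


template = np.full((1, 3, 3), -1.0)  # dark everywhere ...
template[0, 1, 1] = 1.0  # ... except a bright centre pixel
alpha_sharp = sigmoid(infer_alpha_logits(template, temperature=0.1))
alpha_soft = sigmoid(infer_alpha_logits(template, temperature=1.0))
```

With this rule, bright pixels become opaque and dark pixels transparent, and the temperature controls how close to binary the opacity is.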

Categorical Colors:
- No Brightness Channel: brightness cannot meaningfully be multiplied with or added to a capsule's predicted categorical color.
- Alpha Channel: This encodes the presence of the template at each template pixel location.
  - It is combined with the capsule presence logits to get the transformed template presence logits.
  - Initialised to orthogonal values to break symmetry.
  - It is not combined with the capsule color.
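Several channels above are "initialised orthogonally to break symmetry". A minimal sketch of one standard way to do this, assuming each single-channel template is flattened to a vector of length H*W and that K <= H*W: draw a Gaussian matrix, orthonormalise it with a QR decomposition, and reshape. `orthogonal_templates` is a hypothetical name.

```python
import numpy as np


def orthogonal_templates(num_templates, height, width, rng):
    """Initialise K templates whose flattened pixel vectors are orthonormal."""
    n = height * width
    if num_templates > n:
        raise ValueError("cannot have more orthogonal templates than pixels")
    a = rng.normal(size=(n, num_templates))
    q, r = np.linalg.qr(a)  # q: (n, K) with orthonormal columns
    # Sign correction so the result is uniformly distributed over orthogonal matrices.
    q = q * np.sign(np.diag(r))
    return q.T.reshape(num_templates, height, width)


rng = np.random.default_rng(0)
templates = orthogonal_templates(num_templates=8, height=5, width=5, rng=rng)
flat = templates.reshape(8, -1)
```

Distinct, mutually orthogonal starting templates give each capsule a different gradient signal from the first step. In PyTorch, `torch.nn.init.orthogonal_` applied to a `(K, H, W)` tensor gives the same row-orthonormality, since it flattens the trailing dimensions.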
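
Colored Templates: a template encodes a color (RGB/monochrome values or categorical colors) at each pixel, in addition to an optional alpha channel.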

RGB/Monochrome:
- RGB/Monochrome Channel: This encodes the color at each template pixel location.
  - It is combined with the capsule color to get the transformed template color.
  - Initialised to orthogonal values to break symmetry.
- Alpha Channel: This encodes the presence of the template at each template pixel location.
  - It is combined with the capsule presence logits to get the transformed template presence logits.
  - Initialised to zero.
  - Deduced from the color channel and a temperature if no alpha channel is present.

Categorical Colors:
- Categorical Colors Channel: This encodes the categorical color at each template pixel location.
  - It is combined additively with the capsule color logits to get the transformed template color logits.
  - No need to normalise, as a softmax is applied afterwards.
  - Initialised to orthogonal values to break symmetry.
- Alpha Channel: This encodes the presence of the template at each template pixel location.
  - It is combined with the capsule presence logits to get the transformed template presence logits.
  - Initialised to zero.
  - It is not combined with the capsule color.
  - Deduced from the color channel and a temperature if no alpha channel is present.
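The additive combination of categorical color logits, and why no normalisation is needed, can be illustrated with a small sketch. It assumes a palette of 4 colors, one 6x6 template, and one capsule. `template_color_logits` and `capsule_color_logits` are hypothetical names.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # stable: shift by the max
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


num_colors = 4
rng = np.random.default_rng(0)
template_color_logits = rng.normal(size=(6, 6, num_colors))  # per-pixel palette logits (H, W, P)
capsule_color_logits = rng.normal(size=(num_colors,))  # one capsule's palette logits (P,)

# Additive combination in logit space, broadcast over all pixels.
pixel_color_probs = softmax(template_color_logits + capsule_color_logits)

# Softmax is invariant to adding a constant to all logits, so neither term
# needs to be normalised before the combination.
shifted_probs = softmax(template_color_logits + 3.0 + capsule_color_logits)
```

Adding logits multiplies the two unnormalised categorical distributions (softmax(a + b) is proportional to exp(a) * exp(b)), so a pixel color is likely only where the template and the capsule agree.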

NOTE: Presence logits parameterise a binary signal encoding whether a capsule is present or not. They are not to be confused with the alpha channel, which encodes the presence of each template pixel.
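To make the distinction concrete, here is one principled way to combine the two signals. It is an assumption, not necessarily the corrected implementation: treat capsule presence and pixel opacity as independent probabilities, add their log-sigmoids (a product in probability space), then normalise over capsules at each pixel. This keeps both terms on the same log scale, unlike adding a sigmoided alpha directly to a log-probability as criticised in the bug-fix bullet. All names are hypothetical.

```python
import numpy as np


def log_sigmoid(x):
    # log(sigmoid(x)) computed stably as -softplus(-x).
    return -np.logaddexp(0.0, -x)


def pixel_mixing_log_weights(presence_logits, alpha_logits):
    """Hypothetical combination of capsule presence and template alpha.

    presence_logits: (K,) one logit per capsule.
    alpha_logits: (K, H, W) alpha logits of each capsule's transformed template.
    Returns (K, H, W) log mixing weights, normalised over capsules at every pixel.
    """
    # log p(capsule present) + log p(template pixel opaque): a product of two
    # probabilities, with both terms on the same (log) scale.
    log_w = log_sigmoid(presence_logits)[:, None, None] + log_sigmoid(alpha_logits)
    # Log-softmax over the capsule axis so every pixel's weights sum to 1.
    return log_w - np.logaddexp.reduce(log_w, axis=0, keepdims=True)


presence_logits = np.array([4.0, -4.0])  # capsule 0 present, capsule 1 mostly absent
alpha_logits = np.zeros((2, 3, 3))  # both templates half-opaque everywhere
weights = np.exp(pixel_mixing_log_weights(presence_logits, alpha_logits))
```

Normalising over capsules assumes every pixel is explained by some capsule, for example a background-like similarity capsule; that too is an assumption of this sketch.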
In practice, we want the templates to be more interpretable and have the following properties:
- the pattern is centered on the template center
- the pattern is (mostly) symmetric (e.g. it learns a square rather than a parallelogram)
- the pattern is a single connected component (no holes)
To encourage this, the following biases and regularisers are added:
- Gaussian alpha initialisation (bias center, symmetry, connectedness)
- center of mass regulariser (bias center)
- compactness regulariser (bias single blob, no holes)
- neighbor agreement regulariser (bias connectedness)
- No symmetry loss for now; whether one is needed depends on the first experiments
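The listed biases are named but not defined in the source, so the definitions below are hypothetical sketches for a single (H, W) alpha map on a grid normalised to [-1, 1]. The Gaussian initialisation is the logit of a centered bump. The center-of-mass loss is the squared offset of the alpha-weighted centroid from the template center. Compactness is the alpha-weighted spatial variance, which penalises spread-out patterns but does not strictly forbid holes. Neighbor agreement is the mean squared difference between adjacent alphas.

```python
import numpy as np


def grid(h, w):
    # Pixel coordinates normalised to [-1, 1]; the template centre is (0, 0).
    ys, xs = np.linspace(-1.0, 1.0, h), np.linspace(-1.0, 1.0, w)
    return np.meshgrid(ys, xs, indexing="ij")


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gaussian_alpha_logits(h, w, sigma=0.4, eps=1e-4):
    # Initialise alpha as the logit of an isotropic Gaussian bump at the centre.
    yy, xx = grid(h, w)
    p = np.clip(np.exp(-(yy**2 + xx**2) / (2.0 * sigma**2)), eps, 1.0 - eps)
    return np.log(p) - np.log1p(-p)


def _moments(alpha_logits):
    # Alpha-weighted centre of mass of a single (H, W) template.
    a = sigmoid(alpha_logits)
    yy, xx = grid(*a.shape)
    m = a.sum()
    return a, yy, xx, m, (a * yy).sum() / m, (a * xx).sum() / m


def center_of_mass_loss(alpha_logits):
    # Squared distance of the centre of mass from the template centre.
    _, _, _, _, cy, cx = _moments(alpha_logits)
    return cy**2 + cx**2


def compactness_loss(alpha_logits):
    # Alpha-weighted spatial variance around the centre of mass: small for one tight blob.
    a, yy, xx, m, cy, cx = _moments(alpha_logits)
    return (a * ((yy - cy) ** 2 + (xx - cx) ** 2)).sum() / m


def neighbor_agreement_loss(alpha_logits):
    # Mean squared difference between vertically / horizontally adjacent alphas.
    a = sigmoid(alpha_logits)
    return (np.diff(a, axis=0) ** 2).mean() + (np.diff(a, axis=1) ** 2).mean()


init = gaussian_alpha_logits(9, 9)
shifted = np.roll(init, 3, axis=1)  # same blob, moved right (with wrap-around)
flat = np.zeros((9, 9))  # alpha = 0.5 everywhere: maximally spread out
checker = 5.0 * (-1.0) ** np.add.outer(np.arange(9), np.arange(9))  # fragmented pattern
```

In training, these terms would be summed over templates with per-term weights (not specified in the source) and added to the reconstruction loss.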