S3 Estimator and Image Classification#71
Conversation
| **/.DS_Store | ||
| venv/ | ||
| *.rec | ||
| *~ No newline at end of file |
There was a problem hiding this comment.
What is this symbol?
There was a problem hiding this comment.
It was added from a sync pull I made on the original repo. Not mine.
| intended to be instantiated directly. This is difference from the base class | ||
| because this class handles S3 data""" | ||
|
|
||
| """Base class for Amazon first-party Estimator implementations. This class isn't intended |
| """Initialize an AmazonAlgorithmEstimatorBase. | ||
|
|
||
| Args: | ||
| algortihm (str): Use one of the supported algorithms |
There was a problem hiding this comment.
Where is the typo? I don't see.
|
|
||
| isint = istype(int) | ||
| isbool = istype(bool) | ||
| isstr = istype(str) |
There was a problem hiding this comment.
These validations are gone. The pull request itself is stating there are conflicts, can you please update your branch to fix the conflicts?
Refer to this PR: 54b3830#diff-0bb270a0ed6827421dc5669020eb6427 to check what the changes to the hyperparameters are.
There was a problem hiding this comment.
I need some of these validations. @orchidmajumder, what is a solution to this?
orchidmajumder
left a comment
There was a problem hiding this comment.
Mostly looked into syntactic issues. Need comments on semantics from Owen/Marcio.
ragavvenkatesan
left a comment
There was a problem hiding this comment.
@iquintero Conflicts removed.
| """Initialize an AmazonAlgorithmEstimatorBase. | ||
|
|
||
| Args: | ||
| algortihm (str): Use one of the supported algorithms |
There was a problem hiding this comment.
Where is the typo? I don't see.
|
|
||
| isint = istype(int) | ||
| isbool = istype(bool) | ||
| isstr = istype(str) |
There was a problem hiding this comment.
I need some of these validations. @orchidmajumder, what is a solution to this?
iquintero
left a comment
There was a problem hiding this comment.
The unit tests failed :( It looks like there are some import errors and possibly flake8 errors too.
More importantly, please look into the changes to the hyperparameter class as the type validations are no longer required and instead you should declare the data type. I have left an example below.
| import numpy as np | ||
| from scipy.sparse import issparse | ||
|
|
||
| import json |
There was a problem hiding this comment.
Please maintain the import order:
1.- python built in libraries
2.- 3rd party imports
3.- local library imports (from sagemaker...)
There was a problem hiding this comment.
this still needs to be fixed. import json should go before the numpy import.
Also, please maintain the alphabetical order of the imports when you change it.
import io
import json
import struct
....
| intended to be instantiated directly. This is difference from the base class | ||
| because this class handles S3 data""" | ||
|
|
||
| mini_batch_size = hp('mini_batch_size', (validation.isint, validation.gt(0))) |
There was a problem hiding this comment.
please look at #54 we intentionally removed these type validations: isint() isbool() etc. In favor of declaring a specific type for the hp.
So this should be
hp('mini_batch_size', validation.gt(0), data_type=int)
This applies to every hp declaration in this PR.
There was a problem hiding this comment.
There are a lot of cases where the alignment is not consistent. I didn't want to list every single one but please go through it and review them.
Some examples are in the docstrings where each line is aligned differently. In some cases you have:
param_name (str)
some others are:
param_name ...(long space..) (str)
The original comments have not been addressed yet either. Everything is minor changes but they do add up. Once you change that this should be ready to merge.
| """Initialize an AmazonAlgorithmEstimatorBase. | ||
|
|
||
| Args: | ||
| algortihm (str): Use one of the supported algorithms |
| import numpy as np | ||
| from scipy.sparse import issparse | ||
|
|
||
| import json |
There was a problem hiding this comment.
this still needs to be fixed. import json should go before the numpy import.
Also, please maintain the alphabetical order of the imports when you change it.
import io
import json
import struct
....
|
|
||
|
|
||
| class record_deserializer(object): | ||
| class file_to_image_serializer(object): |
There was a problem hiding this comment.
Keep one naming convention. FileToImageSerializer
There was a problem hiding this comment.
I am using this because the other methods are also in this convention.. Refer numpy_to_recod_serializer. ..
| return payload | ||
|
|
||
|
|
||
| class record_deserializer(object): |
There was a problem hiding this comment.
Same here.
RecordDeserializer
There was a problem hiding this comment.
Again, I am maintaining this because of the other methods... refer
response_deseiralizer.
| stream.close() | ||
|
|
||
|
|
||
| class response_deserializer(object): |
| @@ -0,0 +1,65 @@ | |||
| # Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. | |||
There was a problem hiding this comment.
2017-2018
might as well :P
| assert pca.num_components == 55 | ||
|
|
||
|
|
||
| def test_s3_init(sagemaker_session): |
There was a problem hiding this comment.
I think all the tests you've added here should go on their own file. This way we have a better organized test suite. This existing file is meant to test the AmazonEstimator. you should create a file to test your new estimators. The content of the tests is fine just split it into its own file.
There was a problem hiding this comment.
The test I wrote tests the AmazonS3Estimator, which is on the same module as AmazonEstimator, which is why I think that this belongs on this file. I have a serparate test for image classification tests.
| mini_batch_size (int or None): The size of each mini-batch to use when training. If None, a | ||
| default value will be used. | ||
| """ | ||
| default_mini_batch_size = 32 |
There was a problem hiding this comment.
why dont you make 32 the default value for mini_batch_size in the method signature?
def fit(self, s3set, mini_batch_size=32, distribution='ShardedByS3Key', **kwargs):
then you don't even have to do this whole thing. and you can just set it as
self.mini_batch_size = mini_batch_size
There was a problem hiding this comment.
Two reasosn why: 1. Its a protocol used in the other alogrithms. 2. We want to make this a must supply parameter for user. If I assume a default and it fails because of memory error, it becomes a customer error, which is wrong.
| The implementation of :meth:`~sagemaker.predictor.RealTimePredictor.predict` in this | ||
| `RealTimePredictor` requires a `x-image` as input. | ||
|
|
||
| ``predict()`` returns """ |
There was a problem hiding this comment.
I think this sentence is also incomplete. predict() returns <>... ?
| checkpoint_frequency = hp('checkpoint_frequency', (ge(1),), | ||
| 'checkpoint_frequency should be an integer greater-than 1', int) | ||
| num_layers = hp('num_layers', (isin(18, 34, 50, 101, 152, 200, 20, 32, 44, 56, 110),), | ||
| 'num_layers should be in the set [18, 34, 50, 101, 152, 200, 20, 32, 44, 56, 110]', int) |
There was a problem hiding this comment.
just a suggestion but maybe putting these in ascending order would make it easier for users to reason about? Or is there a reason why they are seemingly in a random order?
There was a problem hiding this comment.
Again, I don't want to because this is a logic that is used in the docs also. There is a reason why it is in this order and users familiar with this algorithm will find this ordering comfortable.
|
closing due to inactivity. feel free to reopen (or maybe create a new PR, given all the merge conflicts) if work on this resumes. |
[LDA] Fix minor typos
This PR sets up the SDK for S3 algorithms to come in and this brings in image classification.