Fix invalid escape sequences #219
In Python 3.12, invalid escape sequences outside of raw strings were moved from a DeprecationWarning to a SyntaxWarning, and they are slated to become a SyntaxError in a future release, as noted in previous PRs such as #157. As such, I've added a linter to track down the remaining cases and fixed every issue it reported. The original list was as follows:

```
nemo_text_processing/fst_alignment/alignment.py:102:70: W605 invalid escape sequence '\:'
nemo_text_processing/hybrid/model_utils.py:60:26: W605 invalid escape sequence '\s'
nemo_text_processing/hybrid/model_utils.py:60:31: W605 invalid escape sequence '\s'
nemo_text_processing/hybrid/model_utils.py:64:36: W605 invalid escape sequence '\s'
nemo_text_processing/hybrid/model_utils.py:64:41: W605 invalid escape sequence '\s'
nemo_text_processing/hybrid/model_utils.py:68:33: W605 invalid escape sequence '\s'
nemo_text_processing/hybrid/model_utils.py:68:38: W605 invalid escape sequence '\s'
nemo_text_processing/hybrid/model_utils.py:79:46: W605 invalid escape sequence '\s'
nemo_text_processing/hybrid/model_utils.py:79:51: W605 invalid escape sequence '\s'
nemo_text_processing/hybrid/model_utils.py:81:30: W605 invalid escape sequence '\s'
nemo_text_processing/hybrid/model_utils.py:81:35: W605 invalid escape sequence '\s'
nemo_text_processing/hybrid/model_utils.py:118:46: W605 invalid escape sequence '\s'
nemo_text_processing/hybrid/model_utils.py:118:51: W605 invalid escape sequence '\s'
nemo_text_processing/text_normalization/en/taggers/punctuation.py:46:15: W605 invalid escape sequence '\['
nemo_text_processing/text_normalization/en/taggers/punctuation.py:46:21: W605 invalid escape sequence '\]'
nemo_text_processing/text_normalization/en/verbalizers/post_processing.py:112:37: W605 invalid escape sequence '\['
```

Signed-off-by: Kevin James <[email protected]>
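For the `'\s'` findings, the usual remedy is a raw string, so the backslash reaches the `re` module untouched. The snippet below is a minimal sketch of that idea; the pattern and input are hypothetical and are not the expressions used in `model_utils.py`.

```python
import re

# Hypothetical pattern for illustration only.
# Raw string: Python leaves the backslash alone, so the regex engine sees \s.
raw_pattern = r"\s+"

# Equivalent non-raw spelling: the backslash is doubled to escape it.
escaped_pattern = "\\s+"

# Both spellings produce the same three-character string.
assert raw_pattern == escaped_pattern

# Collapse any run of whitespace into a single space.
collapsed = re.sub(raw_pattern, " ", "tab\tand   spaces")
print(collapsed)
```

Either spelling silences W605; the raw string tends to be easier to read when a pattern contains several backslashes.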
tbartley94
left a comment
LGTM
So it looks like the issue is that we weren't calling the unicode escape for the re package?
WHITE_SPACE = "\u23B5"
ITN_MODE = "itn"
TN_MODE = "tn"
tn_itn_symbols = list(string.ascii_letters + string.digits) + list("$\:+-=")
list("$\:+-=") implicitly converts to a list with each element being a character. Since we needed to properly escape the backslash, though, I thought it might be clearer to make that list explicit, since otherwise it'd be a bit confusing -- list("$\\:+-=") and list("$\:+-=") do identical things, but the former relies on folks having deeper knowledge of escape characters.
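As a quick check of the equivalence described here, the sketch below compares the escaped string, a raw string, and an explicit list. The variable names are illustrative and do not come from `alignment.py`.

```python
# Escaped backslash inside a normal string literal.
symbols_escaped = list("$\\:+-=")

# Raw string: the backslash is kept literally, no escaping needed.
symbols_raw = list(r"$\:+-=")

# Explicit list: every character is visible on its own.
symbols_explicit = ["$", "\\", ":", "+", "-", "="]

# All three spellings yield the same six single-character elements.
assert symbols_escaped == symbols_raw == symbols_explicit
print(symbols_explicit)
```

The explicit form is longer, but it makes it obvious that the backslash is one of the symbols rather than part of an escape.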
It isn't about unicode escapes. Rather, backslashes in non-raw strings that do not correspond to a valid escape sequence are not a supported Python feature. From the linked docs above:
What does this PR do ?
This PR fixes invalid escape sequences that Python 3.12 reports as syntax warnings.
In Python 3.12, invalid escape sequences outside of raw strings were moved from a DeprecationWarning to a SyntaxWarning, and they are slated to become a SyntaxError in a future release, as noted in previous PRs such as #157. As such, I've added a linter to track down the remaining cases and fixed every issue it reported. The original list of findings is given in the commit message above.
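To reproduce this class of problem without the linter, one option is to compile a snippet with warnings promoted to errors. This is a minimal sketch, not the check added by this PR; the helper name `has_invalid_escape` is hypothetical.

```python
import warnings


def has_invalid_escape(source: str) -> bool:
    """Return True if compiling `source` reports an invalid escape sequence.

    Hypothetical helper for illustration; other syntax errors return False.
    """
    with warnings.catch_warnings():
        # Promote warnings to exceptions for the duration of this check.
        warnings.simplefilter("error")
        try:
            compile(source, "<snippet>", "exec")
        except (SyntaxError, Warning) as exc:
            return "invalid escape sequence" in str(exc)
    return False


# The source text below contains a single backslash before the colon.
print(has_invalid_escape('x = "$\\:+-="'))
# A raw string literal with the same contents compiles cleanly.
print(has_invalid_escape('x = r"$\\:+-="'))
```

Running the project's test suite with `python -W error` should surface the same failures, assuming the affected modules are imported during the run.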
Before your PR is "Ready for review"
Pre checks:
- `git commit -s` to sign.
- `pytest` or (if your machine does not have GPU) `pytest --cpu` from the root folder (given you marked your test cases accordingly `@pytest.mark.run_only_on('CPU')`).
- `bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...`

PR Type: