DE TN Fixes #177
Merged
Signed-off-by: Simon Zuberek <[email protected]>
…git strings Signed-off-by: Simon Zuberek <[email protected]>
…00.00 Uhr or 0.00 Uhr Signed-off-by: Simon Zuberek <[email protected]>
…ng with a domain name would be tagged as part of that domain name Signed-off-by: Simon Zuberek <[email protected]>
for more information, see https://pre-commit.ci
Collaborator
@zoobereq please update the grammars path in Jenkins to the re-built CI grammars: https://github.com/NVIDIA/NeMo-text-processing/blob/main/Jenkinsfile#L15
bonham79
suggested changes
Jun 3, 2024
w w w punkt a m a z o n punkt com punkt de .~www.amazon.com.de.
h t t p s doppelpunkt slash slash w w w punkt a b c punkt com slash a b fragezeichen gleichheitszeichen drei bindestrich slash a b s slash eins~https://www.abc.com/ab?=3-/abs/1
at z u c k~@zuck
at z o o b e r e q~@zoobereq
Don't use your name as an example. It leads to potential doxing.
Contributor
Author
Good point - thank you. Fixed.
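The rows under review above pair an input with its expected output, separated by a tilde. As a minimal sketch of how such rows could be read, assuming that `input~expected` layout holds for the whole test file: the function name `parse_test_cases` and the sample handle are hypothetical and are not taken from the repository.

```python
from typing import List, Tuple


def parse_test_cases(lines: List[str]) -> List[Tuple[str, str]]:
    """Split `input~expected` rows into (input, expected) pairs.

    Hypothetical helper for illustration; not part of NeMo-text-processing.
    """
    cases = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank rows
        # Split on the first tilde only, so a "~" inside the expected
        # side (for example in a URL) is preserved.
        source, sep, expected = line.partition("~")
        if not sep:
            raise ValueError(f"missing '~' separator: {line!r}")
        cases.append((source, expected))
    return cases


rows = [
    "at b e i s p i e l~@beispiel",  # hypothetical handle, not from the PR
    "",
]
print(parse_test_cases(rows))  # → [('at b e i s p i e l', '@beispiel')]
```

Using a neutral placeholder handle in rows like these also follows the reviewer's request to avoid real names in examples.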
tbartley94
previously approved these changes
Jun 4, 2024
Member
tbartley94
left a comment
Remove references to yourself and competitors and LGTM
Member
@zoobereq LGTM, will approve once Evelina is happy
BuyuanCui
pushed a commit
that referenced
this pull request
Jul 12, 2024
* Adds support for social media tags (e.g. @zoobereq)
* Adds test cases for social media tags
* Fixes pathing for Sparrowhawk
* Fixes the issue of the DE normalizer not accepting comma-separated digit strings
* Fixes the issue where the normalizer didn't accept time formatted as 00.00 Uhr or 0.00 Uhr
* Fixes the issue where the sentence-final period in sentences ending with a domain name would be tagged as part of that domain name
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Removes unused imports
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Fixes the formatting
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Fixes #166 for DE
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Updates grammar paths
* Minor Fixes
* Fixes test cases
---------
Signed-off-by: Simon Zuberek <[email protected]>
Co-authored-by: Simon Zuberek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <[email protected]>
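Two of the fixes in the commit message above concern surface patterns: times written as 00.00 Uhr or 0.00 Uhr, and a sentence-final period directly after a domain name. The sketch below illustrates both patterns with plain regular expressions and string handling. This is a stand-in for illustration only; the repository's grammars are not regex-based, and the names `DOTTED_TIME`, `parse_dotted_time` and `split_final_period` are hypothetical.

```python
import re
from typing import Optional, Tuple

# One or two hour digits, a period, two minute digits, then "Uhr".
# The lookbehind stops the match from starting inside a longer digit run.
DOTTED_TIME = re.compile(r"(?<!\d)(\d{1,2})\.(\d{2}) Uhr\b")


def parse_dotted_time(text: str) -> Optional[Tuple[int, int]]:
    """Return (hour, minute) for the first dotted time in text, else None."""
    match = DOTTED_TIME.search(text)
    if match is None:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    if hour > 23 or minute > 59:
        return None  # digits matched, but not a valid clock time
    return hour, minute


def split_final_period(token: str) -> Tuple[str, str]:
    """Separate one trailing period from a token such as a domain name."""
    if token.endswith("."):
        return token[:-1], "."
    return token, ""


print(parse_dotted_time("um 00.00 Uhr"))    # → (0, 0)
print(parse_dotted_time("um 0.00 Uhr"))     # → (0, 0)
print(split_final_period("www.abc.com."))   # → ('www.abc.com', '.')
```

Treating the trailing period as its own token is one simple way to keep punctuation out of the domain name; whether the merged grammars take the same approach is not stated in this conversation.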
BuyuanCui
pushed a commit
that referenced
this pull request
Aug 20, 2024
BuyuanCui
pushed a commit
that referenced
this pull request
Sep 19, 2024
BuyuanCui
pushed a commit
that referenced
this pull request
Sep 26, 2024
BuyuanCui
pushed a commit
that referenced
this pull request
Oct 16, 2024
ngachchi
pushed a commit
to ngachchi/NeMo-text-processing
that referenced
this pull request
Jun 23, 2025
Commit message identical to the first referencing commit's, except that the issue reference reads "Fixes NVIDIA#166 for DE" and the final trailer is Signed-off-by: Namrata Gachchi <[email protected]>
FredHaa
pushed a commit
to FredHaa/NeMo-text-processing
that referenced
this pull request
Aug 15, 2025
What does this PR do?
This PR implements DE TN fixes for the following issues:
* … @zoobereq and @zoobereq.net)
* … 2.30 and 02.30)
This PR does not address the following:
Before your PR is "Ready for review"
Pre checks:
* … git commit -s to sign.
* … pytest or (if your machine does not have GPU) pytest --cpu from the root folder (given you marked your test cases accordingly @pytest.mark.run_only_on('CPU')).
* … bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
* … pytest and Sparrowhawk here.
* … __init__.py for every folder and subfolder, including data folder which has .TSV files?
* … Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
* … Copyright 2015 and onwards Google, Inc.. See an example here.
* … (try import: ... except: ...) if not already done.
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.