Skip to content

pre-processing of text #890 issue#990

Open
ashishgit7 wants to merge 1 commit into
aimacode:masterfrom
ashishgit7:patch-1
Open

pre-processing of text #890 issue#990
ashishgit7 wants to merge 1 commit into
aimacode:masterfrom
ashishgit7:patch-1

Conversation

@ashishgit7
Copy link
Copy Markdown
Contributor

No description provided.

@ashishgit7 ashishgit7 changed the title pre-processing of text #928 issue pre-processing of text #890 issue Dec 15, 2018
@ashishgit7
Copy link
Copy Markdown
Contributor Author

#890 added pre-processing of text

Copy link
Copy Markdown
Contributor

@ad71 ad71 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some things in your PR which are not exactly in line with what aima-python aims to do

Comment thread nlp_apps.ipynb
"wordseq = words(federalist)\n",
"wordseq = wordseq[114:-3098]"
"wordseqs = words(federalist)\n",
"wordseqs = wordseqs[114:-3098]"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it necessary to change the name of the variable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't change the name if variable actually I have created a new variable

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't wordseq already present in the repository?
Anyway, its fine if it makes things simpler.

Comment thread nlp_apps.ipynb
"outputs": [],
"source": [
"#removing stopwords\n",
"from nltk.corpus import stopwords\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to try to minimize the use of third-party libraries. The point of the nlp module is to have basic implementations of standard functions used in the domain. Importing from nltk is the opposite of what we want to do.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright , I'll try to create new function in place of nltk library in my next contribution to minimize third-party library

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment thread nlp_apps.ipynb
"source": [
"#stemming and lemmatization\n",
"from nltk.stem.wordnet import WordNetLemmatizer\n",
"lmtzr = WordNetLemmatizer()\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we shouldn't use lemmatizers from third parties. Instead, we could have a lemmatizer within the repository, however basic it may be. The point of this repository is to be able to explain the underlying concepts of these algorithms, not directly import from other modules.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ad71 can we use this file for lemmatization.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sagar-sehgal We can, but make sure you read their license first. We might have to cite/acknowledge them. If the license allows, I think we can save a copy of the file in aima-data and carry on from there.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I'll try to do that. Thank You!

Comment thread nlp_apps.ipynb
],
"source": [
"' '.join(wordseq[:100])"
"' '.join(wordseqs[:100])"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was fine already

Comment thread nlp_apps.ipynb
"outputs": [],
"source": [
"wordseq = [w for w in wordseq if w != 'publius']"
"wordseqs = [w for w in wordseqs if w != 'publius']"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And so was this.

Comment thread nlp_apps.ipynb
]
},
"execution_count": 6,
"execution_count": 41,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A slightly picky complaint, but you can rerun a notebook to serialize the execution counts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants