{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Wikipedia API for Python.ipynb",
"provenance": [],
"authorship_tag": "ABX9TyOqBVPlO7t/H/K54tPNQXu0",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Ko5a1q2y1cOl",
"colab_type": "text"
},
"source": [
"# Wikipedia API for Python\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rCQjrE2n1drg",
"colab_type": "text"
},
"source": [
"## In this tutorial let us understand the usage of Wikipedia API.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Krw-FZ7F10j7",
"colab_type": "text"
},
"source": [
"# Introduction\n",
"\n",
"Wikipedia, the worldâs largest and free encyclopedia. It is the land full of information. I mean who would have used Wikipedia in their entire life (If you havenât used it then most probably you are lying). The python library called `Wikipedia` allows us to easily access and parse the data from Wikipedia. In other words, you can also use this library as a little scraper where you can scrape only limited information from Wikipedia. We will see how can we do that today in this tutorial.\n",
"\n",
"\n",
"\n",
"\n",
"---\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HDAg8LhB1-Xh",
"colab_type": "text"
},
"source": [
"# Installation\n",
"\n",
"The first step of using the API is manually installing it. Because, this is an external API itâs not built-in, so just type the following command to install it.\n",
"\n",
"* If you are using a [jupyter notebook](https://colab.research.google.com/notebooks/intro.ipynb) then make sure you use the below command (with the â!â mark â the reason for this is it tell the jupyter notebook environment that a command is being typed (AKA **command mode**).\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "GciTaTsc2Kk_",
"colab_type": "code",
"colab": {}
},
"source": [
"!pip install wikipedia"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "A-4eBwuy2MAV",
"colab_type": "text"
},
"source": [
"* If you are using any IDE such as [Microsoft Visual Studio Code](https://code.visualstudio.com/), [PyCharm](https://www.jetbrains.com/pycharm/) and even [Sublime Text](https://www.sublimetext.com/3) then make sure in the terminal you enter the below command:\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "nx8uaHvZ2Wzb",
"colab_type": "code",
"colab": {}
},
"source": [
"pip install wikipedia"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "vgkki4_c2Zvb",
"colab_type": "text"
},
"source": [
"After you enter the above command, in either of the above two cases you will be then prompted by success message like the one shown below. This is an indication that the library is successfully installed.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Dn1zyNjJ2bQP",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 297
},
"outputId": "6c757c2b-d52e-4dce-8f7f-e52a3dc11fac"
},
"source": [
"!pip install wikipedia"
],
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"text": [
"Collecting wikipedia\n",
" Downloading https://files.pythonhosted.org/packages/67/35/25e68fbc99e672127cc6fbb14b8ec1ba3dfef035bf1e4c90f78f24a80b7d/wikipedia-1.4.0.tar.gz\n",
"Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.6/dist-packages (from wikipedia) (4.6.3)\n",
"Requirement already satisfied: requests<3.0.0,>=2.0.0 in /usr/local/lib/python3.6/dist-packages (from wikipedia) (2.21.0)\n",
"Requirement already satisfied: urllib3<1.25,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0,>=2.0.0->wikipedia) (1.24.3)\n",
"Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0,>=2.0.0->wikipedia) (2.8)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0,>=2.0.0->wikipedia) (2019.11.28)\n",
"Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0,>=2.0.0->wikipedia) (3.0.4)\n",
"Building wheels for collected packages: wikipedia\n",
" Building wheel for wikipedia (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for wikipedia: filename=wikipedia-1.4.0-cp36-none-any.whl size=11686 sha256=d0d5cc5f62e177020a96252ea5991ec3839cb9e4a302f3d217426ce0a2c406d5\n",
" Stored in directory: /root/.cache/pip/wheels/87/2a/18/4e471fd96d12114d16fe4a446d00c3b38fb9efcb744bd31f4a\n",
"Successfully built wikipedia\n",
"Installing collected packages: wikipedia\n",
"Successfully installed wikipedia-1.4.0\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "90b60Aep2em7",
"colab_type": "text"
},
"source": [
"\n",
"\n",
"---\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wqIc6wa-2gRi",
"colab_type": "text"
},
"source": [
"# Search and Suggestion\n",
"\n",
"Now let us see some of the built-in methods provided by the Wikipedia API. The first one is Search and Suggestion. Iâm pretty sure you guys might know the usage of these two methods because of its name.\n",
"\n",
"## Search\n",
"\n",
"The search method returns the search result for a query. Just like other search engines, Wikipedia has its own search engine, you can have a look at it below:\n",
"\n",
"[Wikipedia Search](https://en.wikipedia.org/w/index.php?search)\n",
"\n",
"Now let us see how to retrieve the search results of a query using python. I will use **âCoronavirusâ** as the topic in todayâs tutorial because as well all know itâs trending and spreading worldwide. The first thing before starting to use API you need to first import it.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "MEoZNFvo2tM4",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 55
},
"outputId": "24d33ec0-0cc3-4b6c-8f06-7b6c2b7c1426"
},
"source": [
"import wikipedia\n",
"print(wikipedia.search(\"Coronavirus\"))"
],
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"text": [
"['Coronavirus', '2019â20 coronavirus pandemic', 'Severe acute respiratory syndrome coronavirus 2', 'Middle East respiratory syndrome-related coronavirus', 'Coronavirus disease 2019', '2020 coronavirus pandemic in California', 'Misinformation related to the 2019â20 coronavirus pandemic', 'Socio-economic impact of the 2019â20 coronavirus pandemic', 'Severe acute respiratory syndrome-related coronavirus', 'Severe acute respiratory syndrome coronavirus']\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dAxRz5co209d",
"colab_type": "text"
},
"source": [
"The above are some of the most searched queries on Wikipedia if you donât believe me, go to the above link I have given and search for the topic and compare the results. And the search results change every hour probably.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xx9fUJqq220F",
"colab_type": "text"
},
"source": [
"There are some of the ways where you can filter the search results by using search parameters such as results and suggestion (I know donât worry about the spelling). The result returns the maximum number of results and the suggestion if True, return results and suggestion (if any) in a tuple.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "JIFba2k924Dj",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 55
},
"outputId": "151cbc24-d19a-4b22-c65a-2a875320c8fb"
},
"source": [
"print(wikipedia.search(\"Coronavirus\", results = 5, suggestion = True))"
],
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"text": [
"(['Coronavirus', '2019â20 coronavirus pandemic', 'Middle East respiratory syndrome-related coronavirus', 'Severe acute respiratory syndrome coronavirus 2', 'Severe acute respiratory syndrome-related coronavirus'], None)\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UAaUc8Ix26dQ",
"colab_type": "text"
},
"source": [
"## Suggestion\n",
"\n",
"Now the suggestion as the name suggests returns the suggested Wikipedia title for the query or none if it doesn't get any.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "5UbFt3hL29hh",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "11b5fdc1-1f90-453d-9e21-a51321ea497d"
},
"source": [
"print(wikipedia.suggest('Coronavir'))"
],
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"text": [
"coronavirus\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Wk_fReEO3Aiu",
"colab_type": "text"
},
"source": [
"\n",
"\n",
"---\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cnOV23eb3BCB",
"colab_type": "text"
},
"source": [
"# Summary\n",
"\n",
"To get the summary of an article use the **âsummaryâ** method as shown below:\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "4-WdvLw53HrO",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 72
},
"outputId": "3f61c67f-49ba-4334-8842-a17e48d63214"
},
"source": [
"print(wikipedia.summary(\"Coronavirus\"))"
],
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"text": [
"Coronaviruses are a group of related viruses that cause diseases in mammals and birds. In humans, coronaviruses cause respiratory tract infections that can be mild, such as some cases of the common cold (among other possible causes, predominantly rhinoviruses), and others that can be lethal, such as SARS, MERS, and COVID-19. Symptoms in other species vary: in chickens, they cause an upper respiratory tract disease, while in cows and pigs they cause diarrhea. There are yet to be vaccines or antiviral drugs to prevent or treat human coronavirus infections. \n",
"Coronaviruses constitute the subfamily Orthocoronavirinae, in the family Coronaviridae, order Nidovirales, and realm Riboviria. They are enveloped viruses with a positive-sense single-stranded RNA genome and a nucleocapsid of helical symmetry. The genome size of coronaviruses ranges from approximately 27 to 34 kilobases, the largest among known RNA viruses. The name coronavirus is derived from the Latin corona, meaning \"crown\" or \"halo\", which refers to the characteristic appearance reminiscent of a crown or a solar corona around the virions (virus particles) when viewed under two-dimensional transmission electron microscopy, due to the surface being covered in club-shaped protein spikes.\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2YqJfHWc3GrV",
"colab_type": "text"
},
"source": [
"But sometimes be careful, you might run into a `DisambiguationError`. Which means the same words with different meanings. For example, the word **âbassâ** can represent a fish or beats or many more. At that time the summary method throws an error as shown below.\n",
"\n",
"\n",
"\n",
"> **Hint**: Be specific in your approach\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "AYi54Xik3YPu",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "88b8b9f1-0123-4d52-f776-2e92497567b1"
},
"source": [
"print(wikipedia.summary(\"bass\"))"
],
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"text": [
"/usr/local/lib/python3.6/dist-packages/wikipedia/wikipedia.py:389: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system (\"lxml\"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.\n",
"\n",
"The code that caused this warning is on line 389 of the file /usr/local/lib/python3.6/dist-packages/wikipedia/wikipedia.py. To get rid of this warning, pass the additional argument 'features=\"lxml\"' to the BeautifulSoup constructor.\n",
"\n",
" lis = BeautifulSoup(html).find_all('li')\n"
],
"name": "stderr"
},
{
"output_type": "error",
"ename": "DisambiguationError",
"evalue": "ignored",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mDisambiguationError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m