Intellipaat

Web Scraping with Python – A Step-by-Step Tutorial

By Lithin Reddy | Last updated on November 22, 2024 | 86705 Views
Table of content

  • What Is Web Scraping in Python?
  • Why Web Scraping Using Python?
  • Is Web Scraping Python Legal?
  • Python Web Scraping Rules
  • How to Perform Web Scraping Using Python?
  • Web Scraping Python Workflow
  • Setting up Python Web Scraper
  • Demo: Web Scraping Wikipedia

Web scraping with Python has been around for a while, but it has become much more popular in the past decade. With Python, extracting data from a web page can be automated in a few lines of code. In this module, we discuss web scraping in Python from scratch and then walk step by step through our first Python web scraping project.

What Is Web Scraping in Python?

Web scraping is the process of automatically collecting data from the web. To fetch web data, all we need is the URL of the page that we want to scrape. The fetched data usually arrives in an unstructured form, typically raw HTML. To make use of the data or draw insights from it, we transform it into a structured form and then store it for further processing. This whole process is called web scraping.

Why Web Scraping Using Python?

Now that we are familiar with what web scraping in Python is, let us discuss why we would perform it and in which business scenarios it is useful. Data has become a commodity in the 21st century: data-driven technologies have risen significantly, and an abundance of data is generated from different sources every day. But how do we collect that data in order to make use of it?
Some of the industrial applications of web scraping are listed below.

Data Science

Learning Data Science requires large amounts of data. Web scraping with Python can help meet this requirement.

Market Research

Before launching a product or service, companies can study the market in advance with the help of web scraping.

Tracking Competitive Pricing

Web scraping with Python can help a business study its competitors' product or service pricing in order to stay ahead in the market.

Monitoring Brand Value

Web scraping can be used in order to build brand intelligence and monitor how customers feel about a product or a service.

Lead Generation

With the help of web scraping, businesses can grow their lead generation by gathering contact details of businesses or individuals.

Is Web Scraping Python Legal?

This is one of the most common questions about web scraping (also known as data scraping), and the answer cannot be summed up in one word. Not all web scraping is legal. Scraping publicly available data is generally considered acceptable, but it can still lead to legal issues, just as any tool or technique can be used for good as well as for bad. For example, scraping non-public data, which is not accessible to everyone on the web, can be unethical and can invite legal trouble, so it is best avoided. Let us take a look at some cases where web scrapers broke the rules and try to learn from them.
[Figure: legal cases that found web scraping to be on the wrong side of the law (image not reproduced)]
This is why, in order to perform ethical web scraping, web scrapers need to follow some rules. Let us discuss them before scraping the web.

Python Web Scraping Rules: 

Before we start scraping the web, there are some rules that we must follow to avoid legal issues. They are:

  • Check the Terms and Conditions of the website before scraping it. The Legal Use of Data section describes which data we may use. Usually, the data we scrape should not be used for commercial purposes.
  • Check the robots.txt file. Most websites publish their crawling rules in a plain-text file named robots.txt at the root of the site (for example, the twitter robots.txt page). We should inspect it to find the things that are allowed and, most importantly, the things that are not allowed.
  • Keep the pace low. If our bot or program requests data from the website too aggressively, it might be treated as spamming. Add a wait time between requests so that the program behaves more like a human visitor.
  • Use public content only.
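The robots.txt and pacing rules above can be sketched with Python's standard library. This is a minimal illustration, not a complete crawler: the robots.txt content, the example.com URLs, and the helper names `build_parser`, `allowed_urls`, and `crawl` are all hypothetical, and the rules are parsed from a string so that the example runs without network access.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, urls, user_agent="*"):
    """Keep only the URLs that the rules allow this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def crawl(urls, delay_seconds, fetch):
    """Call fetch(url) for each URL, pausing between requests."""
    for url in urls:
        fetch(url)
        time.sleep(delay_seconds)


parser = build_parser(SAMPLE_ROBOTS_TXT)
candidates = [
    "https://example.com/index.html",
    "https://example.com/private/data.html",
]
permitted = allowed_urls(parser, candidates)
delay = parser.crawl_delay("*") or 1  # fall back to 1 second if unspecified

print(permitted)
print(delay)
```

For a live site, `RobotFileParser.set_url()` followed by `read()` would download the real file instead of parsing a string, and `crawl(permitted, delay, fetch)` would be called with a function that actually downloads each page.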

How to Perform Web Scraping Using Python?

Now that we are familiar with what web scraping is and why web scraping is used, we are all set to dive right into the understanding of how to carry out a Python web scraping project. Let us take a look at the workflow of a Python web scraping project before moving ahead with the actual hands-on.

Web Scraping Python Workflow:

A Python web scraping project workflow is commonly divided into three steps: first, fetch the web pages that we want to retrieve data from; second, apply web scraping techniques to extract the data we need; and finally, store the data in a structured form.
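The three steps can be sketched as three small functions. To keep the sketch runnable offline, it uses only standard-library stand-ins (html.parser and csv) rather than the requests and BeautifulSoup packages used in the demo later on. The sample HTML, the example.com URL, and the names `ItemParser`, `fetch`, `parse`, and `store` are assumptions made for illustration.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content standing in for a downloaded web page.
SAMPLE_HTML = """
<html><body>
  <ul>
    <li class="item">Alpha</li>
    <li class="item">Beta</li>
  </ul>
</body></html>
"""


class ItemParser(HTMLParser):
    """Collect the text of every <li class="item"> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "item") in attrs:
            self._in_item = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item and data.strip():
            self.items.append(data.strip())


def fetch(url):
    """Step 1 (fetch): a real scraper would download `url` here."""
    return SAMPLE_HTML


def parse(html):
    """Step 2 (extract): pull the wanted values out of the HTML."""
    parser = ItemParser()
    parser.feed(html)
    return parser.items


def store(items):
    """Step 3 (store): write the values as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name"])
    writer.writerows([item] for item in items)
    return buffer.getvalue()


csv_text = store(parse(fetch("https://example.com/items")))
print(csv_text)
```

Keeping the three stages separate means each one can be swapped out independently, for example replacing `fetch` with a real HTTP request or `store` with an Excel writer.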

Setting up Python Web Scraper:

We will be using Python 3 and Jupyter Notebook throughout the hands-on. We will be importing two packages as well.

  • For performing HTTP requests: Import Python requests
  • For handling all of the HTML processing: Import BeautifulSoup from bs4

Demo: A Step-by-step Guide on Python Web Scraping a Wikipedia Page

In this demonstration, we will be walking through our first Python web scraping project. We will be scraping the Wikipedia page to fetch the List of Indian Billionaires published by Forbes in the year 2018. We can fetch the List of Billionaires even after it gets updated for the year 2019 with the help of the same Python web scraping program. Exciting, right? Let us move ahead and get our hands dirty.
Step 1: Fetch the web page and read the HTML as text with the help of the Python requests library

# Import the requests library to query a website
import requests
# Specify the URL we want to scrape
Link = "https://en.wikipedia.org/wiki/Forbes_list_of_Indian_billionaires"
# Fetch the page and read the response body as text
Link_text = requests.get(Link).text
print(Link_text)

Output: the raw HTML source of the page is printed.
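A bare `requests.get(Link).text` returns whatever the server sends back, including error pages. A slightly more defensive fetch might look like the sketch below. The function name `fetch_html`, the User-Agent string, and the injectable `get` parameter are assumptions for illustration; `FakeResponse` and `fake_get` are stand-ins so that the example runs without network access.

```python
def fetch_html(url, get=None, timeout=10):
    """Return the page body as text, or raise if the request failed."""
    if get is None:
        # Imported lazily so the function can be exercised offline.
        import requests
        get = requests.get
    # Hypothetical User-Agent string identifying our scraper.
    headers = {"User-Agent": "tutorial-scraper/0.1 (learning project)"}
    response = get(url, headers=headers, timeout=timeout)
    # Raises an exception for 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    return response.text


class FakeResponse:
    """Stand-in for a requests response, used only for this demo."""

    text = "<html><title>Demo</title></html>"

    def raise_for_status(self):
        pass


def fake_get(url, headers=None, timeout=None):
    """Stand-in for requests.get, used only for this demo."""
    return FakeResponse()


html = fetch_html("https://example.com", get=fake_get)
print(html)
```

In the notebook, calling `fetch_html(Link)` without the `get` argument would use `requests.get` with a timeout and a status check.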
Step 2: In order to extract useful information, convert Link_text (which is of string data type) into a BeautifulSoup object. Import the BeautifulSoup class from bs4

# Import BeautifulSoup to pull data out of HTML and XML files
from bs4 import BeautifulSoup
# Convert Link_text into a BeautifulSoup object.
# The 'lxml' parser needs the lxml package installed; 'html.parser' is the built-in alternative.
soup = BeautifulSoup(Link_text, 'lxml')
print(soup)

Output: the parsed HTML document is printed.
Step 3: With the help of the prettify() method, print the HTML with proper indentation

# Print the HTML with proper indentation
print(soup.prettify())

Output: the same HTML, printed with nested indentation.
Step 4: To fetch the web page title, use soup.title

# Take a look at the title of the web page
print(soup.title)

Output: the first title tag in the document is printed.

<title>Forbes list of Indian billionaires - Wikipedia</title>
Step 5: We want only the string part of the title, not the tags

# Only the string, not the tags
print(soup.title.string)

Output:

Forbes list of Indian billionaires - Wikipedia

Step 6: We can also explore <a></a> tags in the soup object

# First <a></a> tag
soup.a

Output: The first <a></a> tag can be seen here.

<a id="top"></a>

Step 7: Explore all <a></a> tags

# All the <a></a> tags
soup.find_all('a')

Output: a list of every <a> tag on the page.
Step 8: Again, just the way we fetched all <a> tags, we will fetch all table tags

# Fetch all the table tags
all_table = soup.find_all('table')
print(all_table)

Output: a list of every table tag on the page.
Step 9: Since our aim is to get the List of Billionaires from the wiki page, we need to find out the table's class name. Go to the web page, right-click the table, and choose Inspect (or press Ctrl+Shift+C in Chrome or Firefox and click the table).
So, our table class name is ‘wikitable sortable’. Let us move ahead and fetch the list.
Step 10: Now, fetch the first table tag with the class name ‘wikitable sortable’

# Fetch the first table tag with class name "wikitable sortable"
our_table = soup.find('table', class_='wikitable sortable')
print(our_table)

Output: the HTML of the matching table is printed.
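Passing a multi-word string to class_ matches the class attribute as a whole, so it may miss a table that carries additional classes. A CSS selector is one alternative that only requires the named classes to be present. The sketch below assumes a small hypothetical HTML fragment (including the extra `plainrowheaders` class) and uses the built-in 'html.parser' so that it does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the second table has an extra class.
SAMPLE_HTML = """
<table class="infobox"><tr><td>Ignore me</td></tr></table>
<table class="wikitable sortable plainrowheaders">
  <tr><th>Name</th></tr>
  <tr><td><a href="/wiki/A" title="Person A">Person A</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Select the first <table> that has both classes, in any order,
# regardless of any other classes it also carries.
table = soup.select_one("table.wikitable.sortable")
if table is None:
    raise SystemExit("No matching table found")

print(table.get("class"))
print(table.th.string)
```

Checking for None before continuing gives a clear message if the page layout changes, instead of an AttributeError in a later step.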
Step 11: We can see that the information that we want to retrieve from the table is inside <a> tags. So, find all the <a> tags in our_table.

# In the table that we fetched, find all the <a></a> tags
table_links = our_table.find_all('a')
print(table_links)

Output: a list of every <a> tag inside the table.
Step 12: In order to put the titles in a list, iterate over table_links and append each title attribute using the get() method

# Put the titles into a list
billionaires = []
for link in table_links:
    billionaires.append(link.get('title'))
print(billionaires)

Output: the list of title values is printed.
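Note that get('title') returns None for any <a> tag that has no title attribute, such as a footnote link, so the list can contain None entries. A minimal sketch of filtering them out is shown below; the HTML fragment and the names in it are hypothetical, and 'html.parser' is used so that the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical table fragment: the footnote link has no title attribute.
SAMPLE_TABLE = """
<table class="wikitable sortable">
  <tr>
    <td><a href="/wiki/A" title="Person A">Person A</a></td>
    <td><a href="#cite_note-1">[1]</a></td>
  </tr>
  <tr>
    <td><a href="/wiki/B" title="Person B">Person B</a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_TABLE, "html.parser")
links = soup.find("table").find_all("a")

# get() returns None when the attribute is missing.
all_titles = [link.get("title") for link in links]

# Keep only the links that actually carry a title.
titles = [title for title in all_titles if title is not None]

print(all_titles)
print(titles)
```

The same filter can be applied to the billionaires list before it is converted into a DataFrame.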
Step 13: Now that we have our required data in the form of a list, we will use the Python Pandas library to save the data in an Excel file. Before that, we have to convert the list into a DataFrame

# Convert the list into a DataFrame
import pandas as pd
df = pd.DataFrame(billionaires)
print(df)

Output: the DataFrame is printed, one title per row.

Step 14: Use the following method to write data into an Excel file.

# Save the data into an Excel file (requires the xlsxwriter package).
# The with block saves and closes the workbook; writer.save() was removed in pandas 2.0.
with pd.ExcelWriter('indian_billionaires.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='List')

Now our data has been saved into an Excel workbook with the name ‘indian_billionaires.xlsx’ and inside a sheet named ‘List’.
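Step 15: To make sure that the Excel workbook has been saved, read the file back using read_excel

# Check whether the file was written correctly (reading .xlsx files requires the openpyxl package)
df1 = pd.read_excel('indian_billionaires.xlsx')
df1

Output: the contents of the saved sheet are displayed as a DataFrame.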
Congratulations! We have successfully created our first web scraping program.
In this Python tutorial, we have discussed web scraping using Python from scratch. We have also covered some must-follow rules for web scraping with Python. The demonstration at the end of the tutorial was a quick walk-through of our first web scraping project.

About the Author

Lithin Reddy
Data Scientist | Technical Research Analyst - Analytics & Business Intelligence

Lithin Reddy is a Data Scientist and Technical Research Analyst with around 1.5 years of experience, specializing in Python, SQL, system design, and Power BI. He is known for building robust, well-structured solutions and for contributing clear, practical insights that address real-world development challenges.
