Text Analysis with Python: A Research-Oriented Guide


Book
August 2022
Bentham Science Publishers Ltd
ID: 5654206

Text Analysis with Python: A Research-Oriented Guide is a quick and comprehensive reference on text mining with Python. The main objective of the book is to equip the reader to apply machine learning and deep learning techniques to text data. The book is organized into eight chapters that present the topic in a structured, progressive way.

Key Features

Introduces the reader to Python programming and data processing
Introduces the reader to the preliminaries of natural language processing (NLP)
Covers data analysis and visualization using predefined Python libraries and datasets
Teaches how to write text mining programs in Python
Includes text classification and clustering techniques
Informs the reader about different types of neural networks for text analysis
Includes advanced analytical techniques such as fuzzy logic and deep learning
Explains concepts in a simplified and structured way that is ideal for learners
Includes references for further reading

Text Analysis with Python: A Research-Oriented Guide is an ideal guide for students in data science and computer science courses, and for researchers and analysts who want to work on artificial intelligence projects that require the application of text mining and NLP techniques.

Contents
Preface

Chapter 1
1.1. Introduction
1.2. Natural Language
1.2.1. From Linguistics to Natural Language Processing (NLP)
1.2.2. Natural Language Processing (NLP)
1.3. Text Analysis
1.3.1. Advantages
1.3.2. Methods & Techniques
1.3.3. Sentiment Analysis (SA)
1.3.4. Topic Modelling
1.3.5. Intent Identification
1.3.6. Keyword Extraction
1.3.7. Entity Recognition
1.3.8. Text Analysis Functionality
1.4. Text Summarization
1.4.1. Extraction
1.4.2. Abstractive Summarization
1.5. Text Mining and Workflow
1.5.1. Data Recovery
1.5.2. Data Extraction
1.5.3. Data Mining

Conclusion
References

Chapter 2 Introduction to Python
2.1. Introduction
2.2. Working Environments of Python

Google Colab
Features of Google Colaboratory (Colab)

2.3. Working with Anaconda

Steps to Anaconda Installation

2.4. Creating the First Project in Google Colab
2.5. Mathematical Operations
2.6. Python Libraries and Concepts

Libraries
A). Math and Cmath Libraries
B). SciPy Library
C). Scikit-learn Library
D). NumPy Library

2.7. Basic Concepts in Python

A). Arrays
B). Data Frames
C). Loops
for Loop
while Loop and the Else Branch
Program:
Conclusion
References

Chapter 3 Data Loading and Pre-Processing
3.1. Introduction
3.1. Importing Datasets
3.2. Data Reshaping
3.3. Pivot and Melt Functions
3.4. Stacking and Unstacking
3.5. Data Pre-Processing

Outliers
Missing Value Imputation
Handling of Missing Data
Mean Calculation
Deleting of Specific Row
Dummy Variables
One Hot Encoding

3.6. Data Visualization

- Matplotlib
- Ggplot Visualization
- Geoplot Visualization
- Regression Plots
Conclusion
References

Chapter 4 Text Mining

Introduction
The Steps Followed for Text Mining Are:
Why Should We Use Text Mining?
Benefits of Text Mining
Text Analysis in Real-Time
Text Mining Applications
Issues in Text Mining

4.1. Text Mining with Python

Program:
Program:
Program:
Gensim Library
Program:
Output:
Program
Output

4.2. Data Gathering

Reading a Text File
Steps for Reading a Text File in Python
Open() Function
Syntax
Reading Text File
Close()
Syntax: Close()
Reading a CSV File
Steps
Reading Text from a PDF File
Import PyPDF2
Program

4.3. Text Mining Pre-Processing Techniques

Program:
Output:
Program:
Output
Program:
Program:
Program:
Output
Program:
Output:
Program:
Program:

4.4. Feature Selection in Text Mining

Program
Output:

4.5. Text Summarization

Program
Program:

4.6. Text Extraction
4.6.1. Bag of Words

Program:
Limitations of Bag of Words

4.6.2. TF-IDF

Program
Output
Program:
Output:
Word2Vec
Program:
Output
Document Term Matrix
Program:
Output

4.7. Text Visualization

Program
Output
Program
Output:
Program:
Output
Program
Output
Program
Output
Program
Output
Conclusion
References

Chapter 5 Text Classification in Python
5.1. Introduction
5.2. Text Classification
5.3. Machine Learning-Based Text Classification

Step by Step Explanation

5.4. Applications of Text Mining
5.4.1. Email Spam Detection
5.4.2. Social Media Reviews
5.4.3. Google Translator
5.4.4. Text Labelling Based on Content
5.5. Classification Algorithms
5.5.1. Naïve Bayes (NB) Classifiers

Case Study: Text Classification with Naïve Bayes
Movie Review Classification Dataset

5.5.2. Decision Tree Classifiers

Case Study: Text Classification with Decision Tree Algorithms

5.5.3. Nearest Neighbour Classifier

How KNN Will Work in Text Classifications
Useful Information with KNN
Case Study: Text Classification with KNN

5.5.4. Support Vector Machines

From Texts to Vectors
Advantages
Case Study: Text Classification with KNN
Conclusions

Chapter Highlights

References

Chapter 6 Text Clustering in Python
6.1. Introduction
6.2. Clustering Process
6.2.1. Word Clustering
6.2.2. Document Clustering
6.2.3. Term Frequency-Inverse Document Frequency (TF-IDF)
6.3. Applications of Text Clustering in Real-Time

Identifying Fake News
Spam Filter
Marketing and Sales
Classifying Website Traffic
Identifying Fraudulent or Criminal Activity
Document Analysis

6.4. Clustering Algorithms with Code Implementation
6.4.1. K-Means Clustering

Advantages
Disadvantages of K-Means Clustering
K-Means Clustering in Scikit-learn

6.4.2. Hierarchical Clustering

Author

Mamta Mittal
Gopi Battineni
Bhimavarapu Usharani
Lalit Mohan Goyal
