Text Analysis with Python: A Research-Oriented Guide is a quick and comprehensive reference on text mining using Python code. The main objective of the book is to equip the reader with the knowledge to apply various machine learning and deep learning techniques to text data. The book is organized into eight chapters, which present the topic in a structured and progressive way.
Key Features
- Introduces the reader to Python programming and data processing
- Introduces the reader to the preliminaries of natural language processing (NLP)
- Covers data analysis and visualization using predefined Python libraries and datasets
- Teaches how to write text mining programs in Python
- Includes text classification and clustering techniques
- Informs the reader about different types of neural networks for text analysis
- Includes advanced analytical techniques such as fuzzy logic and deep learning
- Explains concepts in a simplified and structured way that is ideal for learners
- Includes references for further reading
Table of Contents
- Contents
- Preface
1.1. Introduction
1.2. Natural Language
1.2.1. From Linguistics to Natural Language Processing (NLP)
1.2.2. Natural Language Processing (NLP)
1.3. Text Analysis
1.3.1. Advantages
1.3.2. Methods & Techniques
1.3.3. Sentiment Analysis (SA)
1.3.4. Topic Modelling
1.3.5. Intent Identification
1.3.6. Keyword Extraction
1.3.7. Entity Recognition
1.3.8. Text Analysis Functionality
1.4. Text Summarization
1.4.1. Extraction
1.4.2. Abstractive Summarization
1.5. Text Mining and Workflow
1.5.1. Data Recovery
1.5.2. Data Extraction
1.5.3. Data Mining
- Conclusion
- References
2.1. Introduction
2.2. Working Environments of Python
- Google Colab
- Features of Google Colaboratory (Colab)
- Steps to Anaconda Installation
2.5. Mathematical Operations
2.6. Python Libraries and Concepts
- Libraries
- A). Math and Cmath Libraries
- B). SciPy Library
- C). Scikit-learn Library
- D). NumPy Library
- A). Arrays
- B). Data Frames
- C). Loops
- for Loop
- while Loop and the Else Branch
- Conclusion
- References
3.1. Introduction
3.1. Importing Datasets
3.2. Data Reshaping
3.3. Pivot and Melt Functions
3.4. Stacking and Unstacking
3.5. Data Pre-Processing
- Outliers
- Missing Value Imputation
- Handling of Missing Data
- Mean Calculation
- Deleting of Specific Row
- Dummy Variables
- One Hot Encoding
- Matplotlib
- Ggplot Visualization
- Geoplot Visualization
- Regression Plots
- Conclusion
- References
- Introduction
- The Steps Followed for Text Mining
- Why Should We Use Text Mining?
- Benefits of Text Mining
- Text Analysis in Real-Time
- Text Mining Applications
- Issues in Text Mining
- Gensim Library
- Reading a Text File
- Steps for Reading a Text File in Python
- open() Function
- Syntax
- Reading Text File
- close()
- Syntax: close()
- Reading a CSV File
- Steps
- Reading Text from a PDF File
- Import PyPDF2
4.6.1. Bag of Words
- Limitations of Bag of Words
- Word2Vec
- Document Term Matrix
- Conclusion
- References
5.1. Introduction
5.2. Text Classification
5.3. Machine Learning-Based Text Classification
- Step by Step Explanation
5.4.1. Email Spam Detection
5.4.2. Social Media Reviews
5.4.3. Google Translator
5.4.4. Text Labelling Based on Content
5.5. Classification Algorithms
5.5.1. Naïve Bayes (NB) Classifiers
- Case Study: Text Classification with Naïve Bayes
- Movie Review Classification Dataset
- Case Study: Text Classification with Decision Tree Algorithms
- How KNN Works in Text Classification
- Useful Information with KNN
- Case Study: Text Classification with KNN
- From Texts to Vectors
- Advantages
- Case Study: Text Classification with KNN
- Conclusions
- References
6.1. Introduction
6.2. Clustering Process
6.2.1. Word Clustering
6.2.2. Document Clustering
6.2.3. Term Frequency-Inverse Document Frequency (TF-IDF)
6.3. Applications of Text Clustering in Real-Time
- Identifying Fake News
- Spam Filter
- Marketing and Sales
- Classifying Website Traffic
- Identifying Fraudulent or Criminal Activity
- Document Analysis
6.4.1. K-Means Clustering
- Advantages
- Disadvantages of K-Means Clustering
- K-Means Clustering in Scikit-learn
Authors
- Mamta Mittal
- Gopi Battineni
- Bhimavarapu Usharani
- Lalit Mohan Goyal