
Text Mining: Text Mining Basics

What is Text Mining?

Text mining is the process of using automation to analyze collections of textual materials in order to capture key concepts and themes and uncover hidden relationships and trends.

This distilled, structured information can be used to address questions such as:

  • Which concepts occur together?
  • What else are they linked to?
  • What higher level categories can be made from extracted information?
  • What do the concepts or categories predict?
  • How do the concepts or categories predict behavior?

from About Text Mining, IBM Knowledge Center
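As an illustration of the first question, "Which concepts occur together?", here is a minimal sketch that counts how often pairs of terms appear in the same document. The sample texts, the tiny stopword list, and the function names terms and cooccurrence are assumptions made up for this example, not part of any particular text mining tool.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical toy corpus; a real project would load texts it is licensed to mine.
documents = [
    "Text mining uncovers hidden trends in library collections.",
    "Library collections support text mining research.",
    "Hidden trends emerge from research collections.",
]

# Assumed stopword list, kept deliberately tiny for illustration.
STOPWORDS = {"in", "from", "the", "a", "of"}


def terms(text):
    """Lowercase the text and return its set of non-stopword word tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def cooccurrence(docs):
    """Count how many documents contain each unordered pair of terms."""
    counts = Counter()
    for doc in docs:
        # Sorting gives every pair one canonical (alphabetical) ordering.
        for pair in combinations(sorted(terms(doc)), 2):
            counts[pair] += 1
    return counts


pair_counts = cooccurrence(documents)
print(pair_counts[("collections", "library")])
print(pair_counts.most_common(5))
```

Pairs with high counts are candidates for concepts that "occur together"; on a real corpus you would likely normalize these counts by how common each term is on its own.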

Quick Introduction to Text Mining

How does Text Mining Work? (Elsevier, 1:34)

Text Mine Responsibly

Unauthorized use of programming tools such as Python, Selenium, web crawlers, or bots to scrape database search results or journal content violates many of our licenses and can result in access being shut down for the entire university.

Please contact us so that we can help you work with our vendors and ensure that what you plan to do with potentially copyrighted texts, including publishing your results, complies with legal standards.

Getting Started with Text Mining

  1. What are your research question and your research goals?
  2. What texts will address your research needs?
  3. Identify and locate the text to be mined.
  4. Consider the format of the text. Is the text in machine-readable format? Is the text high quality or does it need to be cleaned up? 
  5. Acquire the text via an API, authorized bulk downloading, or platform-provided or approved tools.
  6. Mine the text and extract structured data. Apply text mining algorithms to the source text.
  7. Build concept and category models. Identify the key concepts and/or create categories. The number of concepts returned from the unstructured data is typically very large. Identify the best concepts and categories for scoring.
  8. Analyze the structured data. Employ data mining methods, such as clustering, classification, and predictive modeling, to discover relationships between the concepts or predict future patterns.

from About Text Mining, IBM Knowledge Center
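Steps 4 and 6 through 8 above can be sketched in a few lines of standard-library Python. This is only a rough illustration under stated assumptions: the three sample texts, the small stopword list, and the helper names clean, extract_terms, top_concepts, and cosine are all hypothetical, and cosine similarity over raw term counts stands in for the clustering and classification methods a real project would use.

```python
import math
import re
from collections import Counter

# Hypothetical sample texts standing in for a corpus acquired in step 5.
corpus = {
    "doc_a": "Whales   migrate across the ocean. Whales sing.",
    "doc_b": "The ocean is home to whales and fish.",
    "doc_c": "Parliament debated the budget; the budget passed.",
}

STOPWORDS = {"the", "is", "to", "and", "across"}  # assumed, deliberately tiny


def clean(text):
    """Step 4: normalize case and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def extract_terms(text):
    """Step 6: turn cleaned text into a bag of term counts."""
    words = re.findall(r"[a-z]+", clean(text))
    return Counter(w for w in words if w not in STOPWORDS)


def top_concepts(bags, n=3):
    """Step 7: keep the n most frequent terms across the whole corpus."""
    total = Counter()
    for bag in bags.values():
        total.update(bag)
    return [term for term, _ in total.most_common(n)]


def cosine(a, b):
    """Step 8: cosine similarity between two term-count bags."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


bags = {name: extract_terms(text) for name, text in corpus.items()}
print(top_concepts(bags))
print(round(cosine(bags["doc_a"], bags["doc_b"]), 2))
print(cosine(bags["doc_a"], bags["doc_c"]))
```

Documents that share vocabulary (doc_a and doc_b here) score closer to 1, while documents with no terms in common score 0. Similarity scores like these are one possible input to the clustering or classification mentioned in step 8.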


Thank you to Stacy Reardon, UC Berkeley Literatures and Digital Humanities Librarian, for permission to use Text Mining & Computational Text Analysis as a model for this guide.