
TDM Studio

A text and data mining solution for research, teaching and learning

Accessing your Jupyter notebook

 

To enter the development environment, click Open Jupyter Notebook.

Main Interface

 

 

When you first open the Jupyter notebook, you will see several folders. One of them, "start here", helps you get started and access your dataset.

The ProQuest TDM Studio Manual folder answers common questions. It includes information about importing and exporting, as well as Frequently Asked Questions.

When your team first signs up for TDM Studio, it will be provided with an onboarding session that walks through the interface and the notebook, familiarizes everyone with the environment, and answers any questions needed to get the project started.


From here, we will open the Covid Topic Modeling Example.

Topic Modeling Example

This topic modeling script is an example that uses matrix factorization to detect topics within a dataset of newspaper documents retrieved by searching for the terms COVID OR Coronavirus. As you can see in the upper right corner, this one is written in Python. Within the Jupyter notebook, you can write your scripts in either R or Python.
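The guide does not say which matrix factorization the TDM Studio script uses. As a minimal sketch, assuming non-negative matrix factorization (NMF), a common choice for topic modeling, the idea can be shown in plain Python: build a document-term count matrix V, then factor it into document-topic weights W and topic-word weights H using the Lee-Seung multiplicative updates. All function names and the four toy documents are hypothetical; in practice a library implementation such as scikit-learn's `NMF` would usually be preferred.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocab, V) where V[d][j] counts occurrences of vocab[j] in docs[d]."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[word] for word in vocab] for c in counts]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def reconstruction_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor V (docs x words) into non-negative W (docs x k) and H (k x words).

    Assumption: NMF stands in for whatever factorization the TDM Studio
    script actually uses. Lee-Seung multiplicative updates keep every
    entry non-negative; eps avoids division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


# Hypothetical toy corpus, not the TDM Studio dataset.
docs = [
    "virus cases outbreak italy",
    "virus spread outbreak cases",
    "economy debt crisis gdp",
    "gdp economy crisis debt",
]
vocab, v = document_term_matrix(docs)
w, h = nmf(v, k=2)
print(round(reconstruction_error(v, w, h), 3))
```

Each row of `h` scores every vocabulary word for one topic, and each row of `w` scores every document against each topic. The pure-Python loops are slow and are only meant to make the arithmetic visible.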

Topic modeling is just one example of text mining, but it provides methods to organize, understand, and summarize large collections of textual information.

It helps in:

  • Discovering hidden topical patterns that are present across the collection
  • Annotating documents according to these topics
  • Using these annotations to organize, search and summarize texts
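Annotating documents with topics can be as simple as labeling each document with its highest-weighted topic and then grouping documents by that label. The sketch below assumes a hypothetical `doc_topic` list holding one row of topic weights per document (for example, the document-topic matrix from a factorization); the function names and weights are illustrative and not part of TDM Studio.

```python
from collections import defaultdict


def dominant_topic(weights):
    """Index of the largest topic weight for a single document."""
    return max(range(len(weights)), key=lambda t: weights[t])


def annotate(doc_ids, doc_topic):
    """Label each document with its dominant topic and group documents by topic."""
    labels = {doc: dominant_topic(row) for doc, row in zip(doc_ids, doc_topic)}
    by_topic = defaultdict(list)
    for doc, topic in labels.items():
        by_topic[topic].append(doc)
    return labels, dict(by_topic)


# Hypothetical weights for three documents and two topics.
doc_topic = [[0.9, 0.1], [0.7, 0.2], [0.05, 0.8]]
labels, by_topic = annotate(["doc_a", "doc_b", "doc_c"], doc_topic)
print(labels)
print(by_topic)
```

Grouping by topic is what makes it possible to search or summarize the collection one topic at a time.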

Topic modeling can be described as a method for finding a group of words (i.e., a topic) in a collection of documents that best represents the information in the collection.
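In a factorization-based model, each topic is a weighting over the vocabulary, and the group of words that best represents it is usually read off as its highest-weighted terms. A minimal sketch, assuming a hypothetical `topic_word` matrix with one row per topic and one column per vocabulary word (the weights below are made up):

```python
def top_words(topic_word, vocab, n=5):
    """Return the n highest-weighted vocabulary words for each topic row."""
    result = []
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)
        result.append([vocab[j] for j in ranked[:n]])
    return result


vocab = ["virus", "cases", "economy", "debt", "italy"]
topic_word = [
    [0.9, 0.8, 0.0, 0.1, 0.5],  # hypothetical outbreak-like topic
    [0.0, 0.1, 0.9, 0.7, 0.0],  # hypothetical economy-like topic
]
for i, words in enumerate(top_words(topic_word, vocab, n=3)):
    print(f"topic {i}: {', '.join(words)}")
```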

This topic modeling example produces a series of visualizations and scores the recurrence of the listed topics within documents over time. A red line is also plotted on the topic model to represent the number of new COVID cases in the UK over the same time period.
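Scoring topics over time amounts to aggregating each document's topic weight by publication period. The guide does not describe how the example does this; as a sketch, assuming each document has a publication date and a row of topic weights, the hypothetical `topic_trend` function below averages one topic's weight per ISO week. The sample dates and weights are made up.

```python
from collections import defaultdict
from datetime import date


def topic_trend(pub_dates, doc_topic, topic):
    """Average weight of one topic across the documents published in each ISO week."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, weights in zip(pub_dates, doc_topic):
        year, week, _ = day.isocalendar()
        totals[(year, week)] += weights[topic]
        counts[(year, week)] += 1
    # Sort by (year, week) so the series can be plotted in time order.
    return {key: totals[key] / counts[key] for key in sorted(totals)}


pub_dates = [date(2020, 3, 2), date(2020, 3, 4), date(2020, 3, 10)]
doc_topic = [[0.8, 0.2], [0.4, 0.6], [0.1, 0.9]]  # hypothetical weights
print(topic_trend(pub_dates, doc_topic, topic=0))
```

Averaging keeps weeks with many articles from dominating the curve; summing instead would track the raw volume of coverage. Which one fits depends on the research question.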

In this example, you can see the topic terms virus, cases, health, Italy, spread, and outbreak. By comparing the peaks of the blue and red graphs, we could tentatively interpret that the virus likely peaked in Italy several weeks before the UK. The blue graph represents the presence of these topics in the news, showing that the cases in Italy were being heavily discussed.
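Comparing two peaks can be made explicit by finding the period in which each series reaches its maximum and taking the difference. This is only a rough heuristic (it ignores noise, multiple peaks, and reporting delays) and does not establish a causal link. The series below are toy values keyed by week number, not real coverage or case data, and `peak`/`peak_lag` are hypothetical helpers.

```python
def peak(series):
    """Key (for example a week number) at which the series reaches its maximum."""
    return max(series, key=series.get)


def peak_lag(leading, lagging):
    """Periods between the peak of `leading` and the peak of `lagging`."""
    return peak(lagging) - peak(leading)


# Toy values keyed by week number; not real coverage or case data.
topic_coverage = {10: 0.2, 11: 0.5, 12: 0.3, 13: 0.2, 14: 0.1}
new_cases = {10: 5, 11: 20, 12: 60, 13: 90, 14: 140}
print(peak_lag(topic_coverage, new_cases))
```

A positive result means the first series peaked earlier; a negative result means it peaked later.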

Of course, this is just the beginning of the research, but it gives the researcher an idea of connected topics for further investigation.
 

If we scroll down further, we can also see the topics of economy, debt, crisis, and gdp being heavily reported. This coverage directly precedes the peak of cases in the UK (at least at the time this model was developed), suggesting that financial impacts were already significant and widely reported as COVID made its way across the globe.

These are simple topics, but again they can give the researcher a good idea of connections in the data that could lead to an interesting research topic or a deeper analysis.

From this point, you can export the tables and data behind these graphs, the visualizations themselves, the script, and any derivative data. The only things that cannot be exported are the full text and any consumptive information that would allow the researcher to reconstruct the full text.
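Exporting the data behind a chart means writing the derived scores, not the articles, to a file. A minimal sketch, assuming weekly topic scores are held in a dictionary keyed by `(year, week)`; `trend_to_csv`, `export_trend`, and the file name are hypothetical, and the output contains only aggregate numbers with no document text.

```python
import csv
import io


def trend_to_csv(trend):
    """Serialize a {(year, week): score} mapping as CSV text (derived data only)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["year", "week", "score"])
    for (year, week), score in sorted(trend.items()):
        writer.writerow([year, week, score])
    return buffer.getvalue()


def export_trend(path, trend):
    """Write the CSV text to a file on disk."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(trend_to_csv(trend))


trend = {(2020, 11): 0.1, (2020, 10): 0.6}  # hypothetical weekly scores
print(trend_to_csv(trend))
# export_trend("topic_trend.csv", trend)  # hypothetical output file name
```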