
Data Enrichment in Action

November 30, 2022

Article by Steven Goldstein, CEO of ResoluteAI

In October we published "Scientific data enrichment: We did it so you don't have to," in which we described our process for tagging our Foundation datasets with ontologies, taxonomies, and controlled vocabularies from the UMLS Metathesaurus. In this post, we illustrate how this works in practice for a researcher.

When searching through Publications[1] within Foundation, for example, you can use the Field pull-down in Advanced Search to see a list of the ontologies you can limit your search to.

Screenshot of available ontologies to filter Publications within Foundation.

An advanced search using the RxNorm taxonomy for ciprofloxacin reduces the number of results to 32,759. From there, we can narrow the results further using the filters in the left-hand navigation. For example, we can use either GO or MedDRA to find articles about drug resistance. Even though GO and MedDRA are different taxonomies, both turn up a very similar number of articles at the intersection of ciprofloxacin and resistance.

Example of publications filtered using the RxNorm taxonomy and the available articles that can now be filtered using GO or MedDRA taxonomies.
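To make the filtering step concrete, here is a minimal Python sketch under assumed conditions: each article carries a separate set of tags per vocabulary, and a filter keeps only the articles tagged with a given term from a given vocabulary. The `Article` class, the `filter_by_tag` helper, the toy corpus, and the tag labels are all hypothetical illustrations, not Foundation's internal data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """A publication with tags grouped by source vocabulary (hypothetical model)."""
    title: str
    year: int
    # vocabulary name -> set of tag labels assigned from that vocabulary
    tags: dict[str, set[str]] = field(default_factory=dict)

    def has_tag(self, vocabulary: str, tag: str) -> bool:
        return tag in self.tags.get(vocabulary, set())


def filter_by_tag(articles, vocabulary, tag):
    """Keep only articles tagged with `tag` from `vocabulary`."""
    return [a for a in articles if a.has_tag(vocabulary, tag)]


# Toy corpus; the tag labels are illustrative placeholders.
corpus = [
    Article("A", 2019, {"RxNorm": {"ciprofloxacin"},
                        "GO": {"response to antibiotic"},
                        "MedDRA": {"Drug resistance"}}),
    Article("B", 2020, {"RxNorm": {"ciprofloxacin"},
                        "MedDRA": {"Drug resistance"}}),
    Article("C", 2021, {"RxNorm": {"ciprofloxacin"},
                        "GO": {"response to antibiotic"}}),
    Article("D", 2021, {"RxNorm": {"levofloxacin"},
                        "MedDRA": {"Drug resistance"}}),
]

# Start from the RxNorm filter, then narrow with either GO or MedDRA.
cipro = filter_by_tag(corpus, "RxNorm", "ciprofloxacin")
via_go = filter_by_tag(cipro, "GO", "response to antibiotic")
via_meddra = filter_by_tag(cipro, "MedDRA", "Drug resistance")

print(len(cipro), len(via_go), len(via_meddra))  # prints: 3 2 2
# Articles reached through both vocabularies
print(sorted({a.title for a in via_go} & {a.title for a in via_meddra}))  # prints: ['A']
```

In this toy corpus, GO and MedDRA each narrow the three ciprofloxacin articles to two, but not the same two: only article A appears in both. Similar counts from two vocabularies do not guarantee identical result sets, which is one reason filtering with more than one vocabulary can be useful.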

 

We can then delve deeper using Analytics, where we can break down the articles about ciprofloxacin and drug resistance by Tag and by year.[2]

Heat map of publications by MedDRA tag and by publication date.

For each article selected for review, a new information button in the left-hand navigation displays a table showing how the article was tagged by each ontology, taxonomy, or controlled vocabulary. Here we show the MeSH tags for a given article:

Example of MeSH tags for an article within Foundation.

We can also create a network graph[3] that connects Tags from different taxonomies, in this case MedDRA and ICD10, to narrow our search results.

Network graph example connecting tags from two different taxonomies, in this case MedDRA and ICD10.
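One simple way to build a graph like this, assumed here purely for illustration, is tag co-occurrence: MedDRA tags and ICD10 codes become nodes, and each edge is weighted by the number of articles tagged with both. The helpers and sample data are hypothetical, not Foundation's API; the ICD10 codes N39.0 (urinary tract infection, site not specified) and J18.9 (pneumonia, unspecified organism) are used only as illustrative labels.

```python
from collections import Counter
from itertools import product

# Hypothetical per-article tags from two vocabularies (illustrative labels).
articles = [
    {"MedDRA": {"Drug resistance"}, "ICD10": {"N39.0"}},
    {"MedDRA": {"Drug resistance", "Pneumonia"}, "ICD10": {"J18.9"}},
    {"MedDRA": {"Drug resistance"}, "ICD10": {"N39.0", "J18.9"}},
]


def cooccurrence_edges(articles, left, right):
    """Weight each (left tag, right tag) edge by the number of articles carrying both."""
    edges = Counter()
    for tags in articles:
        # Every pairing of a left-vocabulary tag with a right-vocabulary tag
        for a, b in product(tags.get(left, set()), tags.get(right, set())):
            edges[(a, b)] += 1
    return edges


def neighbors(edges, node, min_weight=1):
    """Right-hand tags linked to the left-hand `node` by at least `min_weight` articles."""
    return sorted(b for (a, b), w in edges.items() if a == node and w >= min_weight)


edges = cooccurrence_edges(articles, "MedDRA", "ICD10")
print(edges[("Drug resistance", "N39.0")])  # prints: 2
print(neighbors(edges, "Drug resistance"))  # prints: ['J18.9', 'N39.0']
print(neighbors(edges, "Pneumonia", min_weight=2))  # prints: []
```

Raising `min_weight` prunes links supported by only a single article, which is one way to keep a dense graph readable while narrowing toward well-supported connections.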

The benefits of tagging content with multiple ontologies, taxonomies, and controlled vocabularies are many. The two key benefits are:

  1. Researchers who are familiar with a given taxonomy will find it easier and faster to search and filter their search results.
  2. Often, search results that have unexpected tags are the most valuable. These results can take researchers somewhere they did not expect to go, occasionally producing a “that’s funny”[4] moment.

 



[1] The Publications collection on ResoluteAI’s Foundation platform includes CrossRef, PubMed, arXiv, and IEEE.
 
 
 
 

 


Steven Goldstein

Steven Goldstein is the CEO of ResoluteAI. He has spent his career developing information retrieval tools for researchers in many industries. ResoluteAI is at the forefront of the application of AI and ML for commercial scientific enterprises.