Large-scale Content-based Image Retrieval System (CBIR) for Interactive Search through Virtual Solar Observatory - Montana NASA EPSCoR

Led by Dr. Rafal Angryk, who recently took a position at Georgia State University, and Dr. Petrus “Piet” C. Martens of MSU’s Solar Physics Group, the Content-Based Image Retrieval System (CBIRS) for Interactive Search through NASA’s Virtual Solar Observatory (VSO) is a unique twist on a concept everyone is familiar with: search engines. Search engines have come a long way in the last fifteen years, and the CBIRS is another step in the evolution of the concept. Search engines use data mining concepts similar to those the CBIRS will use. When a user opens a web browser and conducts a search based on a list of keywords, the search engine accesses a database, called an index, that contains a list of websites associated with a list of keywords. This is an extremely simplified explanation, but it is the basic idea, and the same principle applies to seeking images in solar research. The difference is that solar researchers are looking for images stored in a database rather than for web pages.
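The keyword index described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how any particular search engine or the VSO is built; the page addresses and keywords are made up, and the function names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_index(pages):
    """Map each keyword to the set of page addresses tagged with it."""
    index = defaultdict(set)
    for url, keywords in pages.items():
        for keyword in keywords:
            index[keyword.lower()].add(url)
    return index


def search(index, query):
    """Return the pages associated with every keyword in the query, sorted."""
    terms = [term.lower() for term in query.split()]
    if not terms:
        return []
    # Start from the pages matching the first term, then narrow down.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return sorted(results)


# Hypothetical pages and keywords, invented for this example.
PAGES = {
    "example.org/flares": ["solar", "flare", "x-ray"],
    "example.org/sunspots": ["solar", "sunspot"],
    "example.org/mri": ["medical", "mri"],
}
INDEX = build_index(PAGES)

print(search(INDEX, "solar flare"))
print(search(INDEX, "filament"))
```

The second query returns nothing because no page was ever tagged with that keyword, which is the same limitation a researcher faces when searching images by keyword.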

The CBIRS is an attempt to automate the experience of searching solar mission databases and to add a visually driven aspect to it. Retrieving data in solar research is not for the impatient. The difficulty lies in comparing a list of search terms supplied by the user (the researcher) against vast numbers of high-resolution images stored in massive databases. The images in the different repositories are stored with a list of keywords, similar to the way in which search engines index webpages.

One unique aspect of the CBIRS, and one of the exciting data mining challenges for Dr. Angryk, is how the user will perform a search. Instead of a list of search terms, the user will provide an example image. At the level of the pixel, events in solar images do not have crisp, highly contrasted edges, but more closely resemble fuzzy gradients (Fig 1). The CBIRS will be able to determine the edges of an event, such as a solar flare, and distinguish it from other events that might be in the vicinity. The CBIRS will be intelligent enough to ignore these extraneous markers on the image and search only for the events the researcher seeks. Since the CBIRS will run locally (that is, it is installed on the end user’s computer as opposed to being an application accessed via a website), it will learn from the user and become better able to “interpret” the researcher’s intention.
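To see why fuzzy gradients are harder to work with than crisp edges, consider a single row of pixel brightness values. The sketch below is a toy illustration with made-up numbers and hypothetical function names; it is not the method the CBIRS uses, which the project description does not specify. A simple detector that looks for large jumps between neighboring pixels finds the crisp edge but misses the gradual one, while a cutoff placed halfway between the darkest and brightest pixel still marks out a bright region in both cases.

```python
def gradient(profile):
    """Differences between neighboring pixels along one row of an image."""
    return [b - a for a, b in zip(profile, profile[1:])]


def sharp_edges(profile, threshold=100):
    """Positions where brightness jumps by at least `threshold` in one step."""
    return [i for i, g in enumerate(gradient(profile)) if abs(g) >= threshold]


def region_above(profile, fraction=0.5):
    """Positions brighter than a cutoff set between the darkest and brightest pixel."""
    low, high = min(profile), max(profile)
    cutoff = low + fraction * (high - low)
    return [i for i, value in enumerate(profile) if value > cutoff]


# Made-up brightness values (0 = black, 255 = white).
CRISP = [10, 10, 10, 200, 200, 200]   # abrupt edge
FUZZY = [10, 40, 80, 120, 160, 200]   # gradual transition

print(sharp_edges(CRISP), sharp_edges(FUZZY))
print(region_above(CRISP), region_above(FUZZY))
```

Where exactly the event "ends" in the fuzzy case depends on the chosen cutoff, which is part of what makes outlining solar events a genuine research problem.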

The end result is searches based on example images, with results that more closely match what the researcher wanted. Less time sifting through results equates to more time for actual research. The CBIRS project is one of 27 groups awarded funding by NASA for a three-year period starting in 2011 through NASA's Experimental Program to Stimulate Competitive Research (EPSCoR). The group intends to release the CBIRS as modularized open source software under the GNU General Public License (GPL). This license allows for the alteration and redistribution of the software. Those with the ability are legally granted permission to build on it and re-release it, provided it is re-released under the same license and remains free for public use (www.gnu.org).

Images are as crucial to solar research as they are unavoidable. The repositories are extremely large and not highly intuitive. CBIRS will use images from several solar missions. The first of these missions, the Transition Region and Coronal Explorer (TRACE), was launched in 1998 and was required to run for a term of at least one year, but lasted until it was shut down in 2010. During that twelve-year period, millions of images were taken that continue to be used in solar research (i). A second source of data for CBIRS is the Extreme ultraviolet Imaging Telescope aboard the Solar and Heliospheric Observatory (SoHO/EIT). SoHO is a collaborative mission between NASA and the European Space Agency (ESA). It was launched in 1995 and is still running (ii). The Hinode X-ray Telescope, a third data source for CBIRS through VSO, is part of a multinational collaborative mission that is also currently running (iii). These three missions alone represent enough data to keep solar researchers busy and provide a strong foundation on which to build CBIRS.

Currently, a search in NASA’s VSO is no simple task and certainly not for the mildly curious. The search mechanisms provided on VSO’s website limit the searcher’s ability to construct intuitive search parameters (i.e., you can’t simply enter a list of simple keywords and click a “search” button). For a database to be queried by text, it must have keywords associated with each entry, so when you perform a search, your keywords must match at least some of those stored keywords. This is how text-based searches of databases work. The difficulty when performing text-based searches on databases of images is that you are limited to the keywords associated with each individual image. If you want to study a specific feature of a solar image by finding other images with the same type of feature, the only results you will retrieve will be images tagged with the keywords you enter. If no stored image carries keywords for what you are looking for, you will receive no results. The CBIRS proposes not only to search through the massive repositories of solar images, but to search in a non-textual way, that is, to “look” at the stored images and compare their features to an example image that the searcher provides.
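A minimal sketch of searching by example image, under loud assumptions: the source does not say which image features the CBIRS compares, so a plain brightness histogram stands in for them here, and the tiny 2x2 "images", their names, and the function names are all invented for illustration. The point is only that the search compares pixel content directly and needs no keywords at all.

```python
def intensity_histogram(image, bins=4, max_value=255):
    """Share of pixels falling in each brightness band (image = list of rows)."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            band = min(pixel * bins // (max_value + 1), bins - 1)
            counts[band] += 1
            total += 1
    return [count / total for count in counts]


def distance(hist_a, hist_b):
    """Sum of absolute differences; 0 means identical brightness distributions."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


def query_by_example(example, repository, top_k=2):
    """Rank stored images by how closely their histogram matches the example's."""
    target = intensity_histogram(example)
    scored = [
        (distance(target, intensity_histogram(image)), name)
        for name, image in repository.items()
    ]
    return [name for _, name in sorted(scored)[:top_k]]


# Made-up grayscale images (0 = black, 255 = white), with no keywords attached.
REPOSITORY = {
    "dark_quiet": [[10, 20], [15, 5]],
    "bright_flare": [[250, 240], [30, 245]],
    "mid_gray": [[120, 130], [125, 110]],
}
EXAMPLE = [[255, 235], [20, 250]]

print(query_by_example(EXAMPLE, REPOSITORY))
```

A histogram ignores where in the image the bright pixels sit, so a real system would need richer features; the ranking step, however, would look much the same.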

There is potential for this software to benefit other fields that use images for research. “There’s quite a lot of emphasis put on analysis of bio or medical images,” Angryk says, “and what we find is that these images are very similar to solar images.” For example, Magnetic Resonance Imaging (MRI) produces images that are monochromatic (grayscale). Other image recognition software (such as Google Images) relies on color variations to retrieve results, which would not be very helpful when comparing MRI scans. The CBIRS could easily be adapted to such fields, opening doors for medical researchers or for doctors looking for diagnostic validation. Angryk speculates that the CBIRS could be used in different types of cancer research and diagnostics. Doctors could search a massive database of images (anonymously, to protect patient confidentiality) and see that a hundred other doctors reached the same (or different) diagnostic conclusions with images very similar to those of their patient, ultimately improving diagnoses and reducing the time and cost of reaching conclusions.

i For more info on TRACE: http://sunland.gsfc.nasa.gov/smex/trace/ and http://trace.lmsal.com/
ii SOHO’s website: http://sohowww.nascom.nasa.gov
iii Hinode’s info page: www.nasa.gov/mission_pages/hinode/
