discussion / AI for Conservation / 6 September 2018

Google unveils search engine for open data

Dataset Search enables users to find datasets stored across thousands of repositories on the Web, making these datasets universally accessible.

Google has unveiled a search engine to help researchers locate online data that is freely available for use. The company launched the service on 5 September, saying that it is aimed at “scientists, data journalists, data geeks, or anyone else”.

Dataset Search, now available alongside Google’s other specialized search engines, such as those for news and images — as well as Google Scholar and Google Books — locates files and databases on the basis of how their owners have classified them. It does not read the content of the files themselves in the way search engines do for web pages.

Experts say that it fills a gap and could contribute significantly to the success of the open-data movement, which aims to make data openly available for use and re-use.

Government agencies, scientific publishers, research institutions and even individual researchers maintain thousands of open-data repositories around the world, containing millions of data sets.

But researchers who want to know what types of data are available, or who hope to locate data they know already exist, often have to rely on word of mouth, says Natasha Noy, a computer scientist at Google AI in Mountain View, California.

This problem is especially serious for early-career researchers who are not already “plugged” into a network of professional connections, Noy says. It’s also a downside for those who do cross-disciplinary research — for example, an epidemiologist who needs access to climate data that could be relevant to the spread of a virus.

I saw this pop up on Twitter this morning - seems interesting. @ac0159, @benkt or anyone else working neck-deep in data - have you had a chance to look at it? Is it going to be useful? Curious to hear thoughts.

I did a quick trial out of curiosity and searched for 'biodiversity'. The results it brings up are ...diverse... but it seems to present useful high-level info on each dataset (e.g. licensing, description, the file types the data comes in). My search term may have been too broad, but could this be an answer to the need for a 'repository of repositories' we've been discussing?