Wednesday, October 21, 2009
Interview with Gil Elbaz, Factual
Story by Benjamin F. Kuo
This morning's interview is with Gil Elbaz, founder of Factual (www.factual.com), a new startup based here in Los Angeles focused on open data sharing and accuracy. Gil is one of the founders of Applied Semantics, the firm acquired by Google for its AdSense technology. We spoke with Gil about the purpose of Factual, and what it's trying to do.
What's your new startup all about, and why did you start the company?
Gil Elbaz: What we have been working on, and what we offer now, is a platform where anyone can share and mash open data. It's so much more than that, but that crystallizes the key thing. That data can be on any subject--we're a horizontal platform--and a few of the examples you see on our site are a list of restaurants, things in the health space, and other partnerships. We see this as a community built on a trusted repository of structured data, something which ultimately helps everyone make decisions. Publishers can come and snap valuable data into our website, to augment the end user's experience, and developers can use our data and our API to build more innovative applications, and to be more productive because of the significant availability of this trusted data.
It really came from seeing that--even this far along in the evolution of the Internet--there is still a lot of ambiguous data out there. There is a challenge around access to good, clean, and structured data in a good format, with clarity around where it came from, and whether it should be trusted. That makes the lives of developers difficult. The government has terrific sources, but there isn't a simple place where you can find that data. We have improvement tools, and you can either use our technology or leverage the community to improve the data and clean that data. Our philosophy is that data drives the best types of decisions, but if you have bad data, you have bad data driving your decision.
How does this differ from the kind of data Amazon has said it will make available through services like S3? Is this similar?
Gil Elbaz: Amazon has made public some data sets, which is great. They are making storage available to people or to institutions who have data they want to share publicly, for free, on S3. We're similar in that anyone can upload their data to share with the world. But, really, we are a platform that offers so much more--specifically around collaboration and deeper data technology. Unlike Amazon, once your data is up there, we allow anyone to explore that data, and not only access it, but publicly comment on it, expressing opinions on the validity of any data--within every single cell of data. Unless that data set is read only, you can put in a differing view, and cite sources and comments. One of our big areas of differentiation is the fact that we have all of this technology to improve data, whether that is looking for other, related data sets, merging that data, or using the technology to mine for factual information, and integrate that into the data set.
In your launch you mention that Demand Media and others are using the platform. How does that work?
Gil Elbaz: We have bloggers, publishers, and other partners testing with us. They're using our data in various formats. In the case of Demand Media, they looked at us to be a host for their data within the health vertical. They've created a physician's table, which displays data on oncologists, and are integrating that into the section of their site where they offer information on oncologists--things like which specialties they have and what insurance they accept. The reason we are excited to partner with Demand is they know something about how to crowd source structured information. The key to this is getting the word out to the largest community on a topic, to help maintain the most accurate database.
What's the story on how you ended up starting Factual from Google?
Gil Elbaz: I ended up at Google in 2003, through the acquisition of Applied Semantics, which we started working on in 1998. At Google, I had the amazing chance to continue working on AdSense, and broadening it, but also to build up what became Google's Santa Monica office. It was tremendous working with so many talented people. After about three and a half years, I got that startup itch again, wanting to do something impactful. I thought another startup made sense. It's probably no coincidence I wanted to work on something on the structured data front. It was something I was extremely interested in and passionate about, which is structuring knowledge to be as useful as possible. At Applied Semantics, the information was about words, so we could algorithmically determine the context of a web page or news article. That became the underpinning of several products. What I'm doing now at Factual is really taking that to the next level, to structure as much data on as many subjects as possible. With this philosophy of open data, I think that will allow us to touch the greatest number of developers.
Back when you first sold Applied Semantics to Google, did you ever think AdSense would become such a huge part of Google?
Gil Elbaz: I think it was always clear to us that AdSense had a huge amount of potential. Even then, the online advertising industry was very large, and it's continued to grow. We knew that if you could match ads based on context, there would be significant potential there. Certainly, the fact that Google had the team that could really leverage their amazing network, and execute on it, meant that it grew larger than we were anticipating.
Back to Factual, why would developers want to use your platform?
Gil Elbaz: They can come, explore our data, and they can even download the data. There are lots of services out there, but many don't actually let a developer download all the data they're seeing in one keystroke. On every table, there's a file menu which allows developers to download the data to a CSV file. We'll offer the data in many more formats over time, but CSV is the most ubiquitous. Our licenses are also fairly unrestrictive, the notion being that this way we will get the greatest distribution to as many developers as possible.
Would developers use your data live from your system, or download the data for their own use?
Gil Elbaz: It's a combination. In some cases, it's much easier to use our API and use that as a live database--both for accessing the available information and for feeding in the new information coming in through crowd sourcing or other curation mechanisms. In other cases, it will require the download of the full data set, because of the advanced way you might be indexing or processing the data, which you can't do through our API.
It looks like you've got a team here working on all of this?
Gil Elbaz: We've been building up our back end and algorithmic technology team here in L.A., and we've been here since last year. We also have an office in Shanghai.
Gil Elbaz: We've been working hard on this technology, and over the last couple of months we've become more external. We're going through partnership conversations, and starting to reach out to people who we think it might be terrific to engage with. For example, we recently added Esther Dyson to our advisory board, who is someone I know and have respect for. We want to be able to pick her brain on emerging trends, especially around open data and structured data.
If we can make data better and more accurate through our mechanisms, we'll be very happy. We'll see if we stay as under-the-radar as we were with Applied Semantics or not, but having people aware of us is not our goal--our goal is to make the largest impact possible.
Thanks!