Can you turn future news events into structured data?


This post is from Nieman Journalism Lab






David Smydra usually works at Google, but he’s spending a few weeks here at Lippmann House as a visiting Nieman Fellow, working on an independent project. He’d like your feedback on it.

You can read David’s summary of his idea for yourself, but in brief, he wants to find a way to let the public benefit from all the knowledge of future news events locked up inside newsrooms. Reporters know a lot about things that are going to happen or are likely to happen. Some of these are highly predictable: Federal employment data will be released on the first Friday of the month. (Except when it isn’t.) Some are less structured: An indictment is followed by an arraignment, preliminary hearings, and a trial. (Again, except when it isn’t.) Some are easily findable by a motivated member of the public; some are known only to, say, an experienced city hall reporter who understands the rhythms of the beat.

Most of this information stays locked inside newsrooms — maybe in a staff-wide tickler file, maybe in an unstructured Word doc on a reporter’s laptop, maybe only in her head. Could you create a standardized way to gather these future news events so that they would be (a) useful to the news organization, but also (b) perhaps publishable in some form to readers? That’s the project David is working on. He’s trying to come up with a standardized data format (FN-JSON, which is fun to say out loud) that could be used within or even across newsrooms. He’d love your thoughts.
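What would such a format actually contain? David’s post doesn’t publish a schema, so here is a purely hypothetical sketch of a single future-event record, with invented field names, just to make the idea concrete:

```python
# A purely hypothetical FN-JSON record. The field names are invented for
# illustration; David's actual draft schema may look quite different.
import json

future_event = {
    "title": "Federal employment data released",
    "expected_date": "2013-09-06",     # first Friday of the month
    "confidence": "scheduled",         # e.g. scheduled | likely | speculative
    "depends_on": None,                # e.g. an indictment preceding a trial
    "beat": "economy",
    "source": "Bureau of Labor Statistics calendar",
    "visibility": "newsroom-only",     # vs. publishable to readers
}

print(json.dumps(future_event, indent=2))
```

The interesting design questions live in fields like `confidence` and `depends_on`: a shared standard would need a vocabulary for how certain an event is, and for how one event triggers another.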

Google’s new search feature makes it easier to find seminal articles on big topics


This post is by Caroline O'Donovan from Nieman Journalism Lab






When big news breaks, readers clamor for updates — but they also yearn for context. For example, when word got out Monday afternoon that Jeff Bezos had spent $250 million to become the new owner of The Washington Post, there was suddenly a demand for all kinds of information. Who are the Grahams? How long have they owned the paper? What kind of leader has Bezos been at Amazon? What’s the status of other historic newspapers — have any others been purchased recently?

Some of this information would have been clear after a quick Google search, but piecing together a full portrait of the significance of what happened would likely have taken a combination of queries and resources — maybe a Wikipedia article, some breaking blog posts, a couple of company biographies.

Google wants to change that. Today, they announced a new search feature that aims to put in-depth and longform coverage of people, places, events and themes at your fingertips.

“We’ve done research that suggests this is a fairly common kind of information need, not a specific need but a broad interest,” says Google product manager Jake Hubert. For example, Hubert says that while most people who search for the term “YouTube” are looking for a video, or to be navigated to YouTube.com, some might be looking for, say, the essential essay on YouTube’s history, operations, business model and leadership. The new algorithm will help you do that.

It’s key to note that, unlike doing an image search and specifying a type (“face,” “clip art,” “animated”) or doing a news search and specifying a time period (“within the last month,” “archives”), the in-depth feature is not something you can choose. Instead, if you search a broad term — “happiness” and “Taylor Swift” are two examples — you might see an inset box with a few headlines from major publications pop up.

Hubert wouldn’t go into detail about how the algorithm works, so we don’t know what factors are incorporated into the final results — renown of publication, length of article, number of views, publishing date. (He was clear, however, that there is absolutely no element of human curation.)

“This content can come from anywhere on the web. You’ll find content from well-known publishers, but also from lesser known publications and blogs,” he said. “Sometimes, for local angles on an in-depth topic from your hometown or a particularly niche subject, a smaller, lesser known publication may actually have the best in-depth content.”

Of course, as the feature rolls out over the next few days, publishers will increasingly want to know how to get their own content promoted as seminal works on a topic. To that end, Google has provided some support via their help center. Structuring article data in a certain way — specifically, annotating HTML so that authors, headlines, body text, images, and dates are clearly demarcated — will help Google surface the kind of in-depth articles the algorithm is looking for. For example, Hubert was particularly proud that a search for “Gloria Steinem” not only turned up a couple of key interviews and profiles, but also her fundamental 1969 essay, “After Black Power, Women’s Liberation.” To find that kind of needle in the haystack of the internet, it can really help if the author is clearly marked.
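The general idea of “clearly demarcated” markup can be sketched. The snippet below renders article metadata into HTML annotated with the public schema.org Article vocabulary; treat it as an illustration of the pattern, not a copy of Google’s exact requirements for the feature:

```python
# A minimal sketch of schema.org-style article markup, rendered from Python.
# This uses the public schema.org Article property names as an illustration;
# it is not a reproduction of Google's requirements for in-depth articles.
from html import escape

def article_markup(headline, author, date_published, image_url, body):
    """Render article metadata as HTML with microdata annotations."""
    return f"""
<article itemscope itemtype="http://schema.org/Article">
  <h1 itemprop="headline">{escape(headline)}</h1>
  <span itemprop="author">{escape(author)}</span>
  <time itemprop="datePublished" datetime="{escape(date_published)}">{escape(date_published)}</time>
  <img itemprop="image" src="{escape(image_url)}" alt="">
  <div itemprop="articleBody">{escape(body)}</div>
</article>"""

print(article_markup(
    headline="After Black Power, Women's Liberation",
    author="Gloria Steinem",
    date_published="1969-04-04",   # illustrative date, not verified
    image_url="https://example.com/lead.jpg",
    body="...",
))
```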

One possible result of the new search might be that more eyes are turned toward content produced by journalists in newsrooms rather than the aggregators we have come to rely on when looking for background information — Wikipedia, IMDb, or WebMD. It also suggests that Google is aware of an information gap that others are trying to fill: the lack of a centralized hub for background and context on an issue. The motivation is similar to what, for example, Fast Company’s Co.Labs was aiming for when its team started experimenting with stub posts, which Chris Dannen described as a “way of telling the reader, ‘If you’re not up on the story, if you need context, go back and read the stub and it will make sense why this is timely, and why this is a person we’re talking to.’”

Google says they’ll keep experimenting with the feature depending on feedback, but they’re excited to take steps toward surfacing content that remains useful and interesting long after it’s published but can easily be lost in the fire hose of content that is the internet. There’s also a note of warning to publishers here: increasingly advanced search may be doing a better job of giving readers what they want than the publications are themselves.

“It’s a tricky thing. Unless you really know what publications have written this kind of content, you’re going to find yourself going to these publications and trying to find them. We really want to help users find it all from across the web,” says Hubert. “This makes it easier for users to discover this kind of content, that used to be not as easy to discover.”

Intercontinental collaboration: How 86 journalists in 46 countries can work on a single investigation


This post is by Caroline O'Donovan from Nieman Journalism Lab







On Thursday morning, the International Consortium of Investigative Journalists — a project of the Center for Public Integrity — will begin releasing detailed reports on the workings of offshore tax havens. A little over a year ago, 260 gigabytes of data were leaked to ICIJ executive director Gerard Ryle; they contained information about the finances of individuals in over 170 countries.

Ryle was a media executive in Australia at the time he received the data, says deputy director Marina Walker Guevara. “He came with the story under his arm.” Walker Guevara says the ICIJ was surprised Ryle wanted a job in their small office in Washington, but soon realized that it was only through their international scope and experience with cross-border reporting that the Offshore Project could be executed. The result is a major international collaboration that has to be one of the largest in journalism history.

“It was a huge step. As reporters and journalists, the first thing you think is not ‘Let me see how I can share this with the world.’ You think: ‘How can I scoop everyone else?’ The thinking here was different.” Walker Guevara says the ICIJ seriously considered keeping the team to a core five or six members, but ultimately decided to go with the “most risky” approach when they realized the enormous scope of the project: Journalists from around the world were given lists of names to identify and, if they found interesting connections, were given access to Interdata, the secure, searchable, online database built by the ICIJ.

Just as the rise of information technology has allowed new competition for the attention of audiences, it’s also enabled traditional news organizations to partner in what can sometimes seem like dizzyingly complex relationships. The ICIJ says this is the largest collaborative journalism project they have ever organized; the most comparable involved a team of 25 cross-border journalists.

In the end, the Offshore Project brings together 86 journalists from 46 countries into an ongoing reporting collaboration. German and Canadian news outlets (Süddeutsche Zeitung, Norddeutscher Rundfunk, and the CBC) will be among the first to report their findings this week, with The Washington Post beginning their report on April 7, just in time for Tax Day. Reporters from more than 30 other publications also contributed, including Le Monde, the BBC and The Guardian. (The ICIJ actually published some preliminary findings in conjunction with the U.K. publications as a teaser back in November.)

“The natural step wasn’t to sit in Washington and try to figure out who is this person and why this matters in Azerbaijan or Romania,” Walker Guevara said, “but to go to our members there — or a good reporter if we didn’t have a member — give them the names, invite them into the project, see if the name mattered, and involve them in the process.”

Defining names that matter was a learning experience for the leaders of the Offshore Project. Writes Duncan Campbell, an ICIJ founder and current data journalism manager:

ICIJ’s fundamental lesson from the Offshore Project data has been patience and perseverance. Many members started by feeding in lists of names of politicians, tycoons, suspected or convicted fraudsters and the like, hoping that bank accounts and scam plots would just pop out. It was a frustrating road to follow. The data was not like that.

The data was, in fact, very messy and unstructured. Between a bevy of spreadsheets, emails, PDFs without OCR, and pictures of passports, the ICIJ still hasn’t finished mining all the data from the raw files. Campbell details the complicated process of cleaning the data and sorting it into a searchable database. Using NUIX software licenses granted to the ICIJ for free, it took a British programmer two weeks to build a secure database that would allow all of the far-flung journalists not only to safely search and download the documents, but also to communicate with one another through an online forum.
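Campbell’s write-up has the details; the underlying pattern, though, is simple to state: extract whatever text you can from each file, then feed it into an index that maps terms back to documents. Here is a toy version of that pattern, with made-up file names. The ICIJ’s real system was built on commercial tools (NUIX, dtSearch) behind a secure server, so this is only a sketch of the concept:

```python
# A toy inverted index over extracted document text: the core data structure
# behind any searchable document database. Illustrative only; nothing here
# resembles the scale or security of the ICIJ's actual Interdata system.
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercased term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every term in the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

docs = {  # made-up document IDs and contents
    "passport-0412.txt": "nominee director agreement, British Virgin Islands",
    "email-1187.txt": "wire transfer to shell company; nominee director listed",
}
index = build_index(docs)
print(search(index, "nominee director"))  # both documents match
```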

“Once we went to these places and gathered these reporters, we needed to give them the tools to function as a team,” Walker Guevara said.

Even so, some were so overwhelmed by the amount of information available, and so unaccustomed to hunting for stories in a database, that the ICIJ ultimately hired a research manager to do searches for reporters and send them the documents via email. “We do have places like Pakistan where the reporters didn’t have much Internet access, so it was a hassle for him,” says Walker Guevara, adding that there were also security concerns. “We asked him to take precautions and all that, and he was nervous, so I understand.”

They also had to explain to each of the reporting teams that they weren’t simply on the lookout for politicians hiding money and people who had broken the law. “First, you try the name of your president. Then, your biggest politician, former presidents — everybody has to go through that,” Walker Guevara says. While a few headline names did eventually appear — Imelda Marcos, Robert Mugabe — she says some of the most surprising stories came from observing broader trends.

“Alongside many usual suspects, there were hundreds of thousands of regular people — doctors and dentists from the U.S.,” she says. “It made us understand a system that is a lot more used than what you think. It’s not just people breaking the law or politicians hiding money, but a lot of people who may feel insecure in their own countries. Or hiding money from their spouses. We’re actually writing some stories about divorce.”

In the 2 million records they accessed, ICIJ reporters began to get an understanding of the methods account holders use to avoid association with these accounts. Many use “nominee directors,” a practice Campbell says is similar to registering a car in the name of a stranger. But in their post about the Offshore Project, the ICIJ team acknowledges that most of the money being channeled through offshore accounts and shell companies is not being used for illegal transactions. Defenders of the offshore banks say they “allow companies and individuals to diversify their investments, forge commercial alliances across national borders, and do business in entrepreneur-friendly zones that eschew the heavy rules and red tape of the onshore world.”

Walker Guevara says that, while that can be true, the “parallel set of rules” that governs the offshore world so disproportionately favors the elite, wealthy few as to be unethical. “Regulations, bureaucracy, and red tape are bothersome,” she says, “but that’s how democracy works.”

Perhaps the most interesting question surrounding the Offshore Project, however, is how to get traditional shoe-leather journalists up to speed on an international story that involves intensive data crunching. Walker Guevara says it’s all about recognizing when the numbers cease to be interesting on their own and putting them in global context. Ultimately, while it’s rewarding to be able to trace dozens of shell companies to a man accused of stealing $5 billion from a Russian bank, someone has to be able to connect the dots.

“This is not a data story. It was based on a huge amount of data, but once you have the name and you look at your documents, you can’t just sit there and write a story,” says Walker Guevara. “That’s why we needed reporters on the ground. We needed people checking courthouse records. We needed people going and talking to experts in the field.”

All of the stories that result from the Offshore Project — some of which could take up to a year to be published — will live on a central project page at ICIJ.org. The team is also considering creating a web app that will allow users to explore some (though probably not all) of the data. As for the unique tools they built, Walker Guevara says most are easily replicable by anyone using NUIX or dtSearch software, but they won’t be open-sourced. Other lessons from the project, like the impracticality of PGP encryption and “other complex cryptographic systems popular with computer hackers” for a far-flung team of reporters, will endure.

“I think one of the most fascinating things about the project was that you couldn’t isolate yourself. It was a big temptation — the data was very addictive,” Walker Guevara says. “But the story worked because there was a whole other level of traditional reporting that was going and checking public records, going and seeing — going places.”

Photo by Aaron Shumaker used under a Creative Commons license.

Three lessons news sites can take from the launch of The Verge


This post is from Nieman Journalism Lab






Maybe it’s just the 30-something former rock critic in me, but I keep accidentally calling new gadget site The Verge The Verve instead. But whatever you call it, The Verge’s launch today is one of the most anticipated in the online news space in some time. The chance to build a new platform from the ground up, with talented editorial and tech teams attached, combined with the months of buildup at the placeholder site This Is My Next, meant a lot of people were waiting to see what they’d come up with.

And it is impressive: bold, chunky, and structured, all at the same time. The gadget/tech space has no shortage of competitors, and building a new brand against some established incumbents takes a bold move. Which of The Verge’s moves might you want to steal for your own news site? Here are three.

Don’t be afraid of big, bold visuals

Engadget, the tech site from which most of The Verge’s core staff came, has long committed itself to having big, 600-pixel-wide-or-so art on each of its posts, be they short or long. But The Verge takes that a step further. Just look at the home page — big, beautiful images with lovely CSS-driven tinting in the top promo area, then more photos attached to nearly every linked story. Because every story has all the visual fixings, they can ebb and flow as the story moves down the front page. (The movement is still blog-style reverse-chronological.)

The story pages expand the photo well even more and feature page-width headline slots with a nice slab serif, Adelle Web. (Slab serifs are all the rage these days.)

The Verge’s short, aggregation-y posts get a bigger design treatment than most news sites’ feature stories do. (They also carry over Engadget’s highly annoying habit of burying the credit links for what they aggregate in a tiny box at post bottom.) But if you really want to see the power of big visuals, look at one of the site’s feature stories, like its review of the iPhone 4S or this takeout on survivalism — photos over 1,000 pixels wide, bold headlines and decks, structured story organization, embedded galleries, columns that don’t all stick to a common template, page-width graphics. And check out the gallery and video pages, both of which stretch out Abe Lincoln-style to fill the browser window. In all, it’s the kind of bold look that you’re unlikely to see on most daily news sites; its design DNA lies much more in magazine layout.

That bold look comes with some tradeoffs, of course. While the front-page content is still generally newest-up-top, it’s not quite as obvious what’s new if it’s your second time checking the site in a day. And the front page has far less information density than a typical news site; on my monitor, the first screenful of The New York Times homepage (to go to the opposite extreme) includes links to 32 stories, videos, or slideshows, while The Verge’s has only eight. But that’s okay — while prolific, The Verge produces a lot less content than the Times, and I suspect the appealing graphical look will encourage scrolling more than a denser site would. And each story on The Verge homepage gets a bigger sales push — between a headline, an image, a deck, and an excerpt — than all but a few newspaper stories do on their front pages.

I suspect we’re going to see more of this big, bold, tablet-ish design approach finding its way back into more traditional news sites in the next year or so; you can already see movement in that direction comparing the Times’ (redesigned) opinion front to its (almost unchanged since 2006) homepage. In a world where an increasing proportion of traffic comes from social media and search — that is, from some place other than the front door — it makes sense that the burden of a site’s homepage to link to everything might be lightened.

Layer your reporting on top of structured data

It’s telling that the first item in the top navigation on The Verge is “Products.” Not “Articles” or “Latest News” — “Products.” Just about every significant product in the gadget universe — from cell phones to TVs to laptops — gets its own page in the underlying Verge taxonomy. Here are all the cameras, and here are all the gaming systems, for instance, and here are some sample product pages. (Intriguingly, you can search through them by using filters including “Rumored,” “Announced,” “Available,” “Canceled,” and “Discontinued.” Did you know there were 129 different tablets out there?)

The product pages feature basic information, full tech specs, related products, and in some cases “What You Need To Know” sections. These will be good for SEO and pageviews, and they’ll likely be useful to readers; stories about a product link prominently to their related product pages. (I’m actually a little surprised the product pages don’t take the logical next step and slap “Buy Now” links next to everything, with affiliate links to the appropriate vendors.)

Topic pages are nothing new, of course, but few news sites make this sort of commitment to being a reference point outside the boundaries of the traditional news story. A newspaper may not care much about the Nokia Lumia 800, but they could build their own semantic structured web of data around city council members, professional athletes, local restaurants, businesses, neighborhoods…whatever matters to readers. Most news organizations will have something that completes the SAT analogy The Verge : gadgets :: Your News Site : _________.
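To make that concrete, here is a small sketch of what such a structured layer might look like, with invented field names (The Verge’s actual schema isn’t public):

```python
# A minimal sketch of a structured product taxonomy with faceted filters,
# in the spirit of The Verge's Rumored/Announced/Available/Canceled/
# Discontinued facets. Field names are invented; the real schema isn't public.
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    RUMORED = "Rumored"
    ANNOUNCED = "Announced"
    AVAILABLE = "Available"
    CANCELED = "Canceled"
    DISCONTINUED = "Discontinued"

@dataclass
class Product:
    name: str
    category: str                # e.g. "tablet", "camera", "gaming system"
    status: Status
    specs: dict = field(default_factory=dict)  # tech specs keyed by name

def filter_products(products, category=None, status=None):
    """Return products matching the given facets, like a topic-page filter."""
    return [p for p in products
            if (category is None or p.category == category)
            and (status is None or p.status == status)]

catalog = [
    Product("Nokia Lumia 800", "cell phone", Status.AVAILABLE,
            {"display": "3.7 in AMOLED"}),
    Product("Example Tablet X", "tablet", Status.RUMORED),
]
print([p.name for p in filter_products(catalog, "tablet", Status.RUMORED)])
```

Swap `Product` for `CouncilMember`, `Restaurant`, or `Neighborhood` and the same shape becomes the newsroom equivalent.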

Build a place for community

Engadget has a famously active community — so much so that it had to turn off comments entirely for a stretch in 2010 when things were getting out of hand. (“What is normally a charged — but fun — environment for our users and editors has become mean, ugly, pointless, and frankly threatening in some situations…and that’s just not acceptable. Some of you out there in the world of anonymous grandstanding have gotten the impression that you run the place, but that’s simply not the case.”)

The Verge appears to be doubling down on community, though, adding topic-specific forums to the voluminous comment threads on specific entries. Forum posts get big bold presentation too. The same Josh Topolsky who wrote that rip of Engadget’s commenters above writes today that the new site is designed to let “reader voices be heard in a meaningful way…we think it’s totally cool and normal to be psyched about a product or brand and want to talk about it.” He also promises that active commenters and posters will get “special sneak previews of our newest features.”

Will it work out and generate positive, useful discussions (or at least enough pageviews to satisfy the ad sales team)? We’ll see. But it’s good to see some attention to reader forums, a form of reader interaction a number of news sites have walked away from in recent years.

What’s most promising about The Verge, though, is not any one specific element — it’s the fact that they’re giving a lot of thought to the form of their content, at a time when the basics of the blog format have congealed into a kind of design conventional wisdom. Here’s hoping other sites join that process of thinking about their tools and their structures along with their daily content.

Eric Schmidt: Google wants to get so smart it can answer your questions without having to link you elsewhere


This post is from Nieman Journalism Lab






Last night, Google executive chairman Eric Schmidt spoke at this year’s iteration of the D: All Things Digital conference. And while coverage of the talk focused on subjects like Google’s frenemies Apple and Facebook, Schmidt said something about search that I think is of interest to news organizations and other publishers.

The Wall Street Journal’s Walt Mossberg asked Schmidt about perceptions that Google’s search results are decreasing in quality, and whether there was an opening for a new search competitor to move into the space with a new innovation. Schmidt said that Google is constantly making improvements to its search algorithms, and then said this (it’s at 6:28 of the video above):

But the other thing that we’re doing that’s more strategic is we’re trying to move from answers that are link-based to answers that are algorithmically based, where we can actually compute the right answer. And we now have enough artificial intelligence technology and enough scale and so forth that we can, for example, give you — literally compute the right answer.

The video above is edited down from the full interview, so you can’t see what Schmidt said next, but according to Engadget’s liveblog, he next said something along the lines of “This is exactly what drove the acquisition of ITA,” the flight-data company that Google bought last year. That purchase allowed Google to get into the flight search business, so a search for “flights from Boston to Chicago” can now give you direct information at the top of the search results page on flights and schedules — information that Google plans to expand to direct price comparisons of the sort you’d see on Orbitz or Kayak.

The video on Google’s page about the acquisition notes that Google purchased ITA to get beyond “the traditional 10 blue links” of a Google search page and start providing the information directly.

That’s great — unless you’re behind one of those 10 blue links and you’ve been counting on Google sending you search traffic when someone searches for “flights from Boston to Chicago.”

The kind of shift Schmidt is talking about — from “link-based” to “algorithmically based” — could have a big impact on publishers in the business of providing answers to searchers’ questions. And not just the Demand Medias of the world, which are attached at the neck to search — traditional publishers too.

There are already some questions Google feels confident enough about to answer directly, without sending the searcher off to another site. Try:

What’s the weather like in Cambridge today?

How is Apple’s stock price doing?

What time is it in Zanzibar?

What was the score in last night’s Mavs-Heat game? (Sadly — go Mavs!)

How many euros would $100 buy me?

How many people live in Canada?

What’s 73 times 14 minus 12?

In each case, Google gives you a direct answer before it presents you with links. Note that these sorts of questions deal in defined data sets — they’re numbers, mostly, or tied to a known set of geographic locations.
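One way to see what separates these from harder questions: each one is a pattern match against a structured dataset, with ordinary link results as the fallback. A toy sketch of that routing, with hypothetical data and patterns (Google’s implementation is, of course, nothing this simple):

```python
# A toy "direct answer" router: match a query against known structured
# datasets, and fall back to ranked links otherwise. The data and patterns
# are hypothetical stand-ins, not anything resembling Google's systems.
import re

POPULATION = {"canada": 34_000_000}    # defined data set: census figures
UTC_OFFSET_HOURS = {"zanzibar": 3}     # defined data set: time zones

def direct_answer(query):
    q = query.lower()
    m = re.search(r"how many people live in (\w+)", q)
    if m and m.group(1) in POPULATION:
        return f"About {POPULATION[m.group(1)]:,} people"
    m = re.search(r"what time is it in (\w+)", q)
    if m and m.group(1) in UTC_OFFSET_HOURS:
        return f"UTC+{UTC_OFFSET_HOURS[m.group(1)]}"
    return None  # outside the defined sets: fall back to the 10 blue links

for q in ["How many people live in Canada?",
          "What time is it in Zanzibar?",
          "population of boston"]:
    print(q, "->", direct_answer(q) or "[ranked links]")
```

Note that the last query falls through, which is exactly the case the next paragraph describes.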

When it gets a query outside of those defined sets, it sometimes tries to use the artificial intelligence Schmidt is talking about. So try a search for population of boston instead of asking about Canada and this is what you get:

Google’s trying to figure it out, based on how its AI has analyzed the data it’s spidered from around the web. (And it’s not a bad guess; the 2010 census said 617,594 people lived in Boston proper, with 7.6 million in the metropolitan area. Note that Google feels good enough about its guess to highlight mentions of “600,000” in the traditional search results.)

For now, Google’s ability to answer questions directly is bound by the sorts of things its algorithms can know. But they’ll get smarter — and Schmidt’s comments make clear it’s a strategic goal of the company to ensure they get smarter. So imagine a point in the near future where Google can give direct answers to questions like:

What time is Modern Family on?

Who are the Republicans running for president?

What red blouses are on sale at Macy’s?

Who’s the backup left tackle for the New Orleans Saints?

Those all seem achievable enough — that’s all structured data. But each one of those already starts to disrupt things news organizations try to provide, either through content or advertising.

And imagine, further down the line, that Google’s AI improves to the point where it can answer questions like these:

Did Dallas city council approve that zoning change last night?

Was the stimulus package too small to be effective?

What’s going to replace the Space Shuttle program?

Which is Terrence Malick’s best movie?

Did Osama bin Laden really use his wife as a human shield?

Is the new My Morning Jacket album any good?

Some of those are complex enough that Google probably wouldn’t be able to give a single definitive answer, the way it can with a database of census data. But it’s not hard to imagine it could provide a Metacritic-like look at the summary critical opinion of the My Morning Jacket record, or an analysis of customer reviews of Malick’s DVDs at Amazon. It could dip into the growing sea of public data about government activity to tell you what happened at city council (and maybe figure out which parts of the agenda were important, based on news stories, community bloggers, and social media traffic). It could gather up articles from high-trust news and government sources on NASA and algorithmically combine them into just as much info as the searcher wants. It’s a shift in the focus of Google’s judgment: websites go from competitors to be ranked against each other to data sources to be diced and analyzed to figure out an answer.

These things aren’t right around the corner — they quickly get to be really complicated AI problems. But they all point to the fact that Google is working hard to reduce the number of times searchers need to leave google.com to get answers to their questions. For all the times that Google has said it’s not in the content business, it’s not hard to imagine a future where its mission to “organize the world’s information” goes way beyond spidering and linking and into algorithmically processing for answers instead of PageRank.

That — much more than news organizations’ previous complaints about Google — could put real pressure on the business models of news websites. It challenges ideas of how to navigate the link economy and what ideas like search engine optimization, fair use, and aggregation mean. And it sure looked like Schmidt pointed the way last night.
