Little-known data-collection system could troll news, blogs, even e-mails.
Will it go too far?
The US government is developing a massive computer system that can
collect huge amounts of data and, by linking far-flung information from blogs
and e-mail to government records and intelligence reports, search for patterns
of terrorist activity.
The system - parts of which are operational, parts of which are still
under development - is already credited with helping to foil some plots. It
is the federal government's latest attempt to use broad data-collection and
powerful analysis in the fight against terrorism. But by delving deeply into
the digital minutiae of American life, the program is also raising concerns
that the government is intruding too deeply into citizens' privacy.
CONCERN: GOP Rep. Curt Weldon and Democratic Sen. Russell Feingold want details on federal data-mining.
"We don't realize that, as we live our lives and make little choices, like
buying groceries, buying on Amazon, Googling, we're leaving traces everywhere,"
says Lee Tien, a staff attorney with the Electronic Frontier Foundation. "We
have an attitude that no one will connect all those dots. But these programs are
about connecting those dots - analyzing and aggregating them - in a way that we
haven't thought about. It's one of the underlying fundamental issues we have yet
to come to grips with."
The core of this effort is a little-known system called Analysis, Dissemination,
Visualization, Insight, and Semantic Enhancement (ADVISE). Only a few public
documents mention it. ADVISE is a research and development program within the
Department of Homeland Security (DHS), part of its three-year-old "Threat
and Vulnerability, Testing and Assessment" portfolio. The TVTA received
nearly $50 million in federal funding this year.
DHS officials are circumspect when talking about ADVISE. "I've heard of
it," says Peter Sand, director of privacy technology. "I don't know
the actual status right now. But if it's a system that's been discussed, then
it's something we're involved in at some level."
Data-mining is a key technology
A major part of ADVISE involves data-mining - or "dataveillance,"
as some call it. It means sifting through data to look for patterns. If a supermarket
finds that customers who buy cider also tend to buy fresh-baked bread, it might
group the two together. To prevent fraud, credit-card issuers use data-mining
to look for patterns of suspicious activity.
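The supermarket example can be made concrete with a small calculation. The sketch below is illustrative only: the baskets are invented, and "lift" is one common measure of association, not a method attributed to ADVISE or any retailer. A lift above 1 means two items appear together more often than chance alone would predict.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"cider", "bread", "cheese"},
    {"cider", "bread"},
    {"milk", "eggs"},
    {"cider", "bread", "milk"},
    {"bread", "eggs"},
    {"milk", "cheese"},
]

def pair_lift(baskets):
    """Return {(a, b): lift} for every pair of items bought together."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    lifts = {}
    for (a, b), together in pair_counts.items():
        # Observed co-occurrence rate divided by the rate expected
        # if the two items were bought independently.
        expected = (item_counts[a] / n) * (item_counts[b] / n)
        lifts[(a, b)] = (together / n) / expected
    return lifts

lifts = pair_lift(baskets)
best = max(lifts, key=lifts.get)
print(best, round(lifts[best], 2))
```

In this toy data the strongest pairing is bread with cider, the kind of pattern a store might act on by shelving the two together.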
What sets ADVISE apart is its scope. It would collect a vast array of corporate
and public online information - from financial records to CNN news stories -
and cross-reference it against US intelligence and law-enforcement records.
The system would then store it as "entities" - linked data about people,
places, things, organizations, and events, according to a report summarizing
a 2004 DHS conference in Alexandria, Va. The storage requirements alone are
huge - enough to retain information about 1 quadrillion entities, the report
estimated. If each entity were a penny, they would collectively form a cube
a half-mile high - roughly double the height of the Empire State Building.
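The penny comparison can be checked with rough arithmetic. The sketch below assumes each coin occupies a square-sided box (coins stacked in a grid, air gaps included); that packing assumption is ours, not the report's. The coin dimensions are US Mint specifications, and the building height used is the Empire State Building's roof height without its antenna.

```python
# US one-cent coin dimensions (US Mint specifications).
PENNY_DIAMETER_MM = 19.05
PENNY_THICKNESS_MM = 1.52
ENTITIES = 10**15  # 1 quadrillion, the figure from the workshop report

# Assumption: coins are stacked in a square grid, so each one occupies a
# box of diameter x diameter x thickness.
box_volume_mm3 = PENNY_DIAMETER_MM ** 2 * PENNY_THICKNESS_MM
total_volume_m3 = ENTITIES * box_volume_mm3 / 1e9  # 1 m^3 = 1e9 mm^3

cube_side_m = total_volume_m3 ** (1 / 3)
cube_side_miles = cube_side_m / 1609.344  # meters per statute mile

EMPIRE_STATE_ROOF_M = 381  # roof height, excluding the antenna
ratio = cube_side_m / EMPIRE_STATE_ROOF_M

print(f"cube side: {cube_side_m:.0f} m ({cube_side_miles:.2f} miles)")
print(f"about {ratio:.1f}x the Empire State Building's roof height")
```

Under these assumptions the cube comes out at about 820 meters on a side, or roughly half a mile, which is consistent with the comparison above.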
But ADVISE and related DHS technologies aim to do much more, according to Joseph
Kielman, manager of the TVTA portfolio. The key is not merely to identify terrorists,
or sift for key words, but to identify critical patterns in data that illumine
their motives and intentions, he wrote in a presentation at a November conference
in Richland, Wash.
For example: Is a burst of Internet traffic between a few people the plotting
of terrorists, or just bloggers arguing? ADVISE algorithms would try to determine
that before flagging the data pattern for a human analyst's review.
At least a few pieces of ADVISE are already operational. Consider Starlight,
which along with other "visualization" software tools can give human
analysts a graphical view of data. Viewing data in this way could reveal patterns
not obvious in text or number form. Understanding the relationships among people,
organizations, places, and things - using social-behavior analysis and other
techniques - is essential to going beyond mere data-mining to comprehensive
"knowledge discovery in databases," Dr. Kielman wrote in his November
report. He declined to be interviewed for this article.
One data program has foiled terrorists
Starlight has already helped foil some terror plots, says Jim Thomas, one of
its developers and director of the government's new National Visualization and Analytics
Center in Richland, Wash. He can't elaborate because the cases are classified,
he adds. But "there's no question that the technology we've invented here
at the lab has been used to protect our freedoms - and that's pretty cool."
As envisioned, ADVISE and its analytical tools would be used by other agencies
to look for terrorists. "All federal, state, local and private-sector security
entities will be able to share and collaborate in real time with distributed
data warehouses that will provide full support for analysis and action"
for the ADVISE system, says the 2004 workshop report.
A program in the shadows
Yet the scope of ADVISE - its stage of development, cost, and most other details
- is so obscure that critics say it poses a major privacy challenge.
"We just don't know enough about this technology, how it works, or what
it is used for," says Marcia Hofmann of the Electronic Privacy Information
Center in Washington. "It matters to a lot of people that these programs
and software exist. We don't really know to what extent the government is mining
personal data."
Even congressmen with direct oversight of DHS, who favor data mining, say they
don't know enough about the program.
"I am not fully briefed on ADVISE," wrote Rep. Curt Weldon (R) of
Pennsylvania, vice chairman of the House Homeland Security Committee, in an
e-mail. "I'll get briefed this week."
Privacy concerns have torpedoed federal data-mining efforts in the past. In
2002, news reports revealed that the Defense Department was working on Total
Information Awareness, a project aimed at collecting and sifting vast amounts
of personal and government data for clues to terrorism. An uproar caused Congress
to cancel the TIA program a year later.
Echoes of a past controversial plan
ADVISE "looks very much like TIA," Mr. Tien of the Electronic Frontier
Foundation writes in an e-mail. "There's the same emphasis on broad collection
and pattern analysis."
But Mr. Sand, the DHS official, emphasizes that privacy protection would be
built in. "Before a system leaves the department there's been a privacy
review.... That's our focus."
Some computer scientists support the concepts behind ADVISE.
"This sort of technology does protect against a real threat," says
Jeffrey Ullman, professor emeritus of computer science at Stanford University.
"If a computer suspects me of being a terrorist, but just says maybe an
analyst should look at it ... well, that's no big deal. This is the type of
thing we need to be willing to do, to give up a certain amount of privacy."
Others are less sure.
"It isn't a bad idea, but you have to do it in a way that demonstrates
its utility - and with provable privacy protection," says Latanya Sweeney,
founder of the Data Privacy Laboratory at Carnegie Mellon University. But since
speaking on privacy at the 2004 DHS workshop, she now doubts the department
is building privacy into ADVISE. "At this point, ADVISE has no funding
for privacy technology."
She cites a recent request for proposal by the Office of Naval Research on
behalf of DHS. Although it doesn't mention ADVISE by name, the proposal outlines
data-technology research that meshes closely with technology cited in ADVISE
documents. Neither the proposal - nor any other she has seen - provides any funding for
provable privacy technology, she adds.
Some antiterror efforts die - others just change names
November 2002 - The New York Times identifies a counterterrorism
program called Total Information Awareness.
September 2003 - After terminating TIA on privacy grounds,
Congress shuts down its successor, Terrorism Information Awareness, for
the same reasons.
Department of Homeland Security
February 2003 - The department's Transportation Security
Administration (TSA) announces it's replacing its 1990s-era Computer-Assisted
Passenger Prescreening System (CAPPS I) with a successor, CAPPS II.
July 2004 - TSA cancels CAPPS II because of privacy concerns.
August 2004 - TSA says it will begin testing a similar
system - Secure Flight - with built-in privacy features.
July 2005 - Government auditors charge that Secure Flight
is violating privacy laws by holding information on 43,000 people not suspected
of terrorism.
Some in Congress push for more oversight of federal data-mining
Amid the furor over electronic eavesdropping by the National Security Agency,
Congress may be poised to expand its scrutiny of government efforts to "mine"
public data for hints of terrorist activity.
"One element of the NSA's domestic spying program that has gotten too
little attention is the government's reportedly widespread use of data-mining
technology to analyze the communications of ordinary Americans," said
Sen. Russell Feingold (D) of Wisconsin in a Jan. 23 statement.
Senator Feingold is among a handful of congressmen who have in the past sponsored
legislation - unsuccessfully - to require federal agencies to report on data-mining
programs and how they maintain privacy.
Without oversight and accountability, critics say, even well-intentioned
counterterrorism programs could experience mission creep, having their purview
expanded to include non-terrorists - or even political opponents or groups.
"The development of this type of data-mining technology has serious implications
for the future of personal privacy," says Steven Aftergood of the Federation
of American Scientists.
Even congressional supporters of the effort want more information about data-mining.
"There has to be more and better congressional oversight," says
Rep. Curt Weldon (R) of Pennsylvania, vice chairman of the House committee
overseeing the Department of Homeland Security. "But there can't be oversight
till Congress understands what data-mining is. There needs to be a broad look
at this because they [intelligence agencies] are obviously seeing the value."
Data-mining - the systematic, often automated gleaning of insights from databases
- is seen "increasingly as a useful tool" to help detect terrorist
threats, the Government Accountability Office reported in 2004. Of the nearly
200 federal data-mining efforts the GAO counted, at least 14 were acknowledged
to focus on counterterrorism.
While privacy laws do place some restriction on government use of private
data - such as medical records - they don't prevent intelligence agencies
from buying information from commercial data collectors. Congress has done
little so far to regulate the practice or even require basic notification
from agencies, privacy experts say.
Indeed, even data that look anonymous aren't necessarily so. For example:
With name and Social Security number stripped from their files, 87 percent
of Americans can be identified simply by knowing their date of birth, gender,
and five-digit Zip code, according to research by Latanya Sweeney, a data-privacy
researcher at Carnegie Mellon University.
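The idea behind that finding can be sketched in a few lines: count how many records are the only ones with their particular combination of birth date, gender, and Zip code. The records below are invented, and the function is a hypothetical illustration of the uniqueness test, not Dr. Sweeney's actual method or data.

```python
from collections import Counter

# Hypothetical "anonymized" records: names and Social Security numbers
# are already removed. Each tuple is (date_of_birth, gender, zip_code);
# every value is invented for illustration.
records = [
    ("1961-03-14", "F", "02138"),
    ("1961-03-14", "F", "02138"),  # shares all three fields with the row above
    ("1975-11-02", "M", "15213"),
    ("1988-07-30", "F", "15213"),
    ("1988-07-30", "M", "15213"),
    ("1990-01-09", "M", "99352"),
]

def unique_fraction(records):
    """Share of records whose field combination appears exactly once."""
    counts = Counter(records)
    unique = sum(1 for record in records if counts[record] == 1)
    return unique / len(records)

share = unique_fraction(records)
print(f"{share:.0%} of records are unique on birth date, gender, and Zip code")
```

Any record that is unique on those three fields could be matched to a named person by anyone holding a second list, such as a voter roll, that carries the same fields alongside names.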
In a separate 2004 report to Congress, the GAO cited eight issues that need
to be addressed to provide adequate privacy barriers amid federal data-mining.
Top among them was establishing oversight boards for such programs.