About this post

ABOUT: This entry was posted January 2, 2007 at 8:49 p.m. It is 590 words long, which, in case you're curious, translates to about 16 inches.

SUMMARY: Using XML and microformats to tag political candidates' speeches is a good idea.

TAGS: XML


Tagging the news

Posted Tuesday, January 2nd, 2007 at 8:49 p.m.

Note: This article was originally posted in July 2006.

As I've been working on this newsroom Django job with Brian this summer, I've found myself thinking a lot about Derek's "The Fix" essays -- particularly the parts about harvesting the oodles of information that slip through newsrooms each day. On that front, I've got an idea to chip in: tagging news coverage from a reporting perspective. Take politics, for example:

Part of our job for this Django project is to integrate heavy-duty data collection into the daily newsroom routine. This is a good idea for any number of reasons, not the least of which is that machines have better memories than people do. Think of a reporter working the campaign trail: between the speeches, press conferences, PR spin, daily deadlines and projects, a beat reporter can hardly be expected to remember everything a political candidate has said and done. When a candidate uses scrambled logic, contradicts himself or cites false information, reporters, through no fault of their own, often don't notice. People simply aren't built to index that much information.

Machines, however, are. The markup I'm envisioning is similar to Adrian's quote, fact and date tags (which haven't received nearly enough attention, by the way). Ideally, political reporters would mark up their stories -- or even better, their notes -- to classify all the statements and quotes made by the candidates they cover. A computer-assisted reporting (CAR) specialist could then write a parser to walk through the data and pluck out patterns reporters wouldn't otherwise see.

A few tags come to mind:

<dq>: How useful would it be to call up all direct quotes Candidate X has given during his campaign? Better still, give them attributes: with something like <dq date="07/04/2006">, reporters could view a candidate's statements over time. Throw in a category attribute -- <dq category="agriculture"> -- and a reporter could see how Candidate X's statements about agriculture changed based on significant events, or how statements he made yesterday conflict with statements from six months ago.

<statement>: Same concept as above, but with indirect statements.

<rel>: Relationship tags could be used whenever candidates reference each other, prominent figures, or themselves. Want to see the whole back-and-forth between Candidate X and Candidate Y? A few button-presses, and there you have it.

<geo>: A reporter could geocode all public statements -- like Flickr GeoTags -- possibly to see how the candidate treated issues differently in, say, low-income areas as opposed to high-income ones.

Finally, stealing an idea directly from Adrian, <fact>: By tagging all of a candidate's direct assertions of fact, reporters could easily call up the information upon which the candidate has based his public arguments.
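
To make the parser idea concrete, here is a minimal sketch in Python of what a first pass over <dq> tags might look like. It is only an illustration under several assumptions: the <notes> wrapper, the speaker attribute, the category attribute name, the quotes themselves and the function name quotes_by_category are all made up for this example, and the dates are read as month/day/year. The notes are also assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

# Hypothetical marked-up notes. The <notes> wrapper, the speaker attribute
# and the sample quotes are invented for illustration only.
NOTES = """
<notes>
  <p>At the county fair, <dq speaker="Candidate X" date="07/04/2006"
  category="agriculture">We will protect farm subsidies.</dq></p>
  <p>Months earlier, <dq speaker="Candidate X" date="01/15/2006"
  category="agriculture">Subsidies need to be cut.</dq></p>
  <p><dq speaker="Candidate X" date="03/02/2006"
  category="taxes">No new taxes.</dq></p>
</notes>
"""


def quotes_by_category(xml_text, speaker):
    """Group one speaker's direct quotes by category, oldest first."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for dq in root.iter("dq"):
        if dq.get("speaker") != speaker:
            continue
        # Assumes dates are written MM/DD/YYYY.
        said_on = datetime.strptime(dq.get("date"), "%m/%d/%Y").date()
        # Collapse any line breaks or extra spaces inside the quote.
        text = " ".join("".join(dq.itertext()).split())
        grouped[dq.get("category")].append((said_on, text))
    for quotes in grouped.values():
        quotes.sort()  # tuples sort by date first
    return dict(grouped)


timeline = quotes_by_category(NOTES, "Candidate X")
for said_on, text in timeline["agriculture"]:
    print(said_on.isoformat(), text)
```

Run against the sample notes, this lists Candidate X's agriculture quotes in date order, which is the kind of side-by-side view that would make a reversal easy to spot. The same loop could be pointed at <statement>, <rel>, <geo> or <fact> tags.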

Taken together, these tags allow reporters to serve as a check on a candidate's logic. As campaigning becomes more subtle and sophisticated, anything reporters can do to index a candidate's arguments will allow fewer empty claims and pieces of bogus logic to slip through unchallenged. To return to Derek's essays, tags would help keep track of information that otherwise would have been shoveled into stories and forgotten. It's the concept of tagging, rather than this specific implementation, that intrigues me.

The obvious drawback is that someone has to spend time doing markup. A set of frontend tools, similar to HTML markup widgets like TinyMCE, might help speed up the process, but the subjective nature of this classification prohibits extensive automation. Will it work for the daily routine? Probably not, or at least not yet. But it could make for one hell of a project.
