Networks, science, and shoe-leather reporting – The Brown Institute sent six students from Columbia’s Journalism School to this year’s edition of NICAR, an annual conference on computer-assisted reporting. This year’s conference was the biggest to date, with over 1,000 attendees. Read their individual reflections below, on everything from mapping and analyzing a network to retrieving that precious data to how conventional reporting enhances data work.
There is a word we like to use when referring to programming and data analysis in newsrooms: “revolution.” We associate it with a rapid, violent and complete transformation. It’s something new and exciting, and we think it is going to change things for the better.
That’s how we often perceive data journalism or computer-assisted reporting, and it’s increasingly how the industry perceives it. Around a thousand journalists registered for this year’s NICAR conference. That’s more than ever before.
Yet the range of what we think of as data stories can be quite narrow. One recurring theme of NICAR 14 was encouraging data journalists to think outside the box.
Steve Stirling of the New Jersey Star-Ledger and Ian Livingston of the Washington Post gave a talk about using weather data, because you can tell weather stories about things other than natural disasters or how warm it will be tomorrow. And there is a wealth of weather data available, from the National Weather Service’s Application Programming Interface (API) to your local weather station.
One such story was presented by Jodi Upton and MaryJo Webster during their talk ‘50 ideas in 50 minutes.’ Twincities.com looked into why the Minnesota Twins were having a hard time hitting home runs since moving into their new home. By studying the weather data for the area, they realized the wind flow in the new stadium was disadvantageous to left-handed hitters. It turns out that the Minnesota Twins have a disproportionate number of left-handed hitters on their team. Sometimes the best stories can be found in places you don’t expect.
Data is a tool. A mighty powerful one, but a tool nonetheless. The limitations of data reporting were a recurring theme at NICAR 14. Strange, perhaps, given that it was a data journalism conference, but also logical.
The ICIJ Offshore Leaks Database – a trove of over 2 million leaked files on offshore operations over the course of 30 years – is a case in point. Presenting the files, independent journalist Mar Cabra, a graduate of the Columbia J-School, emphasized how important regular beat reporting was in understanding the files.
King Juan Carlos I of Spain wouldn’t show up in searches, she said. Most prime ministers or presidents wouldn’t either. A bodyguard or one of their attorneys might, though. In this case, persistent reporting strengthened the data.
MaryJo Webster and Ben Goessling’s story for Twincities.com on the Minnesota Twins’ left-handed hitters, discussed above, presented a conclusion that couldn’t have been drawn without the help of local meteorologist Paul Huttner. Here, it was the data that strengthened the reporters’ interaction with the expert.
Since when did reporters become researchers and scientists?
As journalists use more data and become more dependent on it, it can sometimes feel like we’re playing pretend “Dr.” Our methods for collecting data are different (we primarily code and scrape what’s already out there; they create data), but it’s nice to know that reporters and researchers face the same challenges when it comes to analyzing data and finding stories in it.
On Saturday morning, four fellows from the American Association for the Advancement of Science talked about how they deal with data and how they address the most important data questions: Is the data good? What is good? How do you come up with a standard? How do you know when you’ve got enough?
The answer to the first three was simple: box plots. A box plot summarizes a data set by its minimum, lower quartile, median, upper quartile and maximum. It’s a convenient and effective graphical format to use when looking for expected or normal behavior across numerous sets of data.
Carolyn Lauzon, who has a Ph.D. in biophysics, calls box plots “old reliable.” “Instead of always chasing after a gold standard, look at the community of a data set,” she said.
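For readers who want to try the idea without drawing anything, the numbers behind a box plot can be computed in a few lines. What follows is a minimal sketch in Python, not something shown at the session: the function names are hypothetical, the sample figures are made up, and the 1.5 × interquartile-range cutoff for flagging unusual values is a common convention assumed here.

```python
import statistics


def five_number_summary(values):
    """Return the five numbers a box plot draws: min, Q1, median, Q3, max."""
    ordered = sorted(values)
    # "inclusive" treats the data as the whole population of interest.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {"min": ordered[0], "q1": q1, "median": median,
            "q3": q3, "max": ordered[-1]}


def unusual_values(values, k=1.5):
    """Flag values beyond k * IQR from the box (assumed convention, k=1.5)."""
    summary = five_number_summary(values)
    iqr = summary["q3"] - summary["q1"]
    low = summary["q1"] - k * iqr
    high = summary["q3"] + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up numbers, purely for illustration.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(five_number_summary(sample))
print(unusual_values(sample))
```

Running the same summary over many data sets side by side is one way to see what “normal” looks like across the community of data, and which sets deserve a second look.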
The answer to the last question – knowing when you have enough data – is trickier.
Beth Duckles, a social scientist at Bucknell University, deals mainly with qualitative data. She knows she’s done gathering data when she hears the same thing three or more times and when she can guess what someone will say. At the same time, “you’re never done,” she said. “Data’s always changing.”
Duckles added that we should analyze data as we go. Identify normal behaviors. Question areas where there are differences. “Think like an archaeologist,” she said. And most importantly, Duckles said, embrace the mess. “If it’s not messy, you’re doing it wrong.”
Good to know that even the guys and gals in white coats with Ph.D.s are in the same boat as us.
– @LisaHopeKing
While data applies to almost every aspect of our lives, obtaining databases often retains an air of mystery. Journalists often don’t know how to ask for data, and agencies might not know how to package it. At NICAR 14, presenter Joanna Lin, of the Center for Investigative Reporting, said that to get around this, journalists should call the agency and talk to the person or department in charge of compiling the data. Often, she said, they aren’t sure what you want out of the data, when really you want the database itself (which is a record). Or they’ll think to send a PDF when they themselves store the data electronically and are legally required to send it in the format in which they store it.
It also helps to know your rights when requesting data. Agencies may tell you some information is private under the law. You can leave it at that, or you can argue your case. The Reporters Committee for Freedom of the Press has a great rundown of your Freedom of Information Act (FOIA) rights.
Finally, Adam Goldstein, of the Student Press Law Center, said sometimes the best way to get a response from an agency is to make not fulfilling your FOIA request more of a hassle than fulfilling it. In other words, pester the agency until they would rather find your info than have to deal with you anymore. If they say they don’t have the info, the new story is “Agency doesn’t keep info on such and such.”
When it comes to education data, we are obsessed with standardized test scores. They represent the perfect test case for everything that is right and wrong with a data-driven society.
Measuring what students are learning is a wonderful thing. But standardized test scores don’t exactly do that. They are sets of questions, designed by psychometricians, that between 40 and 60 percent of all people should be able to answer correctly. This leads teachers to teach to the bottom third of the class so that all students have the knowledge that between 40 and 60 percent of people have.
The test designers admit that when people study for the test, the results become invalid. And yet children are studying for the tests in almost every school in America.
The test manifests the reality, rather than reality being reflected in the test scores. When these test scores drive education policy, as they have in Bloomberg’s New York, reality gets warped.
Fortunately, NICAR’s education session was all about going beyond test scores. School systems maintain a wealth of publicly available data that can actually tell us about the world we live in. This seminar covered everything from textbooks to school lunch schedules.
Meredith Broussard started stackedup.org as a project dedicated to textbooks in Philadelphia. By comparing the money budgeted for books to the actual price of the books that were needed across the district, Broussard showed that there isn’t enough money to buy all the books that are necessary for children to learn the mandated curriculum.
Parents can also look up whether there is a deficit or surplus of books at their child’s local school.
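The comparison at the heart of that project is simple arithmetic, and it can be sketched in a few lines of Python. This is only an illustration of the idea, not Broussard’s actual method: the function name, the prices, the counts and the budget are all hypothetical.

```python
def textbook_shortfall(budget, needs):
    """Return how far the cost of required books exceeds the budget.

    `needs` is a list of (price_per_book, copies_needed) pairs.
    A positive result is a deficit; a negative one is a surplus.
    """
    total_cost = sum(price * copies for price, copies in needs)
    return total_cost - budget


# Hypothetical school: two required titles and a $7,000 book budget.
needs = [(60, 100), (45, 80)]   # (price in dollars, copies needed)
print(textbook_shortfall(7000, needs))
```

The same subtraction, run school by school, is what would let a parent see a deficit or a surplus for a single building.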
Coulter Jones of WNYC argued that schools should be covered more like communities than grade factories. This allows us to get beyond the current political war in education, he said.
He advocated pulling everything from building inspections to the crime data of surrounding areas to area business licenses to begin to get an idea of school environments.
Education data, particularly test scores, provide a striking example of the damage we can do with data sets when we use them irresponsibly or don’t really understand what they mean. It’s refreshing to see people in the NICAR community use education data in ways that don’t just drive the current policy debate.
In real life, everybody is said to be related by six degrees of separation, meaning any two people in the world are connected through a short chain of acquaintances, six links at most. Online, there are three degrees of separation between us all. In any journalism story, there are varying degrees of separation between main and secondary players. And connecting these players are links of power, money, and information.
A general theme of NICAR was network analysis: how we can use it in journalism and why we need it. Orgnet, a company specializing in network analysis, came to present its software, called InFlow. They promoted their product, but also made the case that journalists should use social network analysis in general.
The links between individuals have direction, and they vary in length and speed. Understanding those links and the importance of the people connected to them can be the difference between a clear investigation and one filled with endless murky connections, floating seemingly without point or end goal. “Only dumb bad guys do direct quid pro quo,” said Orgnet CEO Valdis Krebs as he tried to drive the point of network analysis home.
He explained that networks help us determine where there is indirect influence. This year, I learned that activities don’t have to be illegal in order to subvert the intentions of our democratic institutions, and therefore, be worthy of investigation. Network analysis can help us understand when those seemingly innocuous stories are actually composed of more damaging material.
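To make the idea of indirect influence concrete, here is a minimal sketch in Python of one basic network-analysis step: finding the shortest chain of directed links between two people. It is not how Orgnet’s software works, which the session did not detail. The breadth-first search, the function name and the four made-up players are all assumptions for illustration.

```python
from collections import deque


def shortest_chain(links, start, goal):
    """Return the shortest directed chain from start to goal, or None.

    `links` maps each person to the people they have a link to
    (money, power or information flowing in that direction).
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        person = path[-1]
        if person == goal:
            return path
        for neighbor in links.get(person, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical players: no direct donor-to-official link exists.
links = {
    "Donor": ["Lobbyist"],
    "Lobbyist": ["Aide"],
    "Aide": ["Official"],
}
print(shortest_chain(links, "Donor", "Official"))
print(shortest_chain(links, "Official", "Donor"))
```

Because the links have direction, the chain runs from the donor to the official but not back again, which is the kind of distinction a simple list of names would hide.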
– @aglorios