Letters

Large Data Sets Are Powerful

Published Online: https://doi.org/10.1176/appi.ps.54.5.746

In Reply: We are in complete agreement with the comments of Drs. Pandiani and Banks and with many of those of Dr. Segal. They have described the other side of the same coin. Theirs is the more commonly viewed side, the one that inspires numerous and increasing efforts to make use of existing large data sets, and the one that is presumably familiar to most readers of the journal. We were asked by the editor of Psychiatric Services to illuminate the dark side precisely because it is less well known.

Our contention is that large data sets "can be" dangerous, not that they are inherently dangerous. Our goal was not to stigmatize research using large data sets but rather to remind the scientific community of the frequently overlooked limitations of large data sets and of the seductive ways that they can lead investigators astray. A parallel editorial in the March 2003 issue of Scientific American suggests that the same concerns are pertinent in other areas of science (1). As the editors of Scientific American point out, the dangers of information overload, poor data quality, and capitalization on chance abound.

Often where there is opportunity there is liability. The use of large data sets presents many opportunities for the advancement of knowledge that is relevant for practice and policy, but it also requires careful attention to data quality and the disciplined application of statistical and inferential methods. The warnings in our editorial address the latter issues, which, judging from the journal's experience with manuscripts submitted for publication and from our own experience as peer reviewers, appear to be less salient to some users of large data sets. Besides sharing the optimism of Drs. Pandiani and Banks about the potential of large data sets, we also wish that articles sent for review showed their admirable attention to quality.

Reference

1. Total information overload. Scientific American 288(3):12, 2003