Data sharing provides many potential benefits, although the amount of actual data reused is unknown. Here we track the reuse of data from three data repositories (NCBI's Gene Expression Omnibus, PANGAEA, and TreeBASE) by searching for dataset accession number or unique identifier in Google Scholar and using ISI Web of Science to find articles that cited the data collection article. We found that data reuse and data attribution patterns vary across repositories. Data reuse appears to correlate with the number of citations to the data collection article. This preliminary investigation has demonstrated the feasibility of this method for tracking data reuse.
Piwowar HA, Carlson JD, Vision TJ. 2011. Beginning to track 1000 datasets from public repositories into the published literature. Proceedings of the American Society for Information Science and Technology 48(1): 1-4.