Creative Smart Things


python

Data Science

Data Science 101: Don’t be Duped By Duplicate Data

Having followed the instructions in the previous article in this series, we have downloaded Project Gutenberg’s entire collection of English texts and now have a local repository of approximately 80,000 text files. But something doesn’t smell right here. How can we have this many files when Project Gutenberg only claims to contain Read more…

By laurence.molloy, 1 year ago