Creative Smart Things

  • Creative Smart Blog
  • STEAM Projects

Creative Smart Blog

Data Science

Data Science 101: Don’t be Duped By Duplicate Data

Having followed the instructions in the previous article in this series, we have downloaded Project Gutenberg’s entire collection of English texts and now have a local repository of approximately 80,000 text files. But something doesn’t smell right here. How can we have Read more…

By laurence.molloy, 11 months ago
Data Science

Data Science 101: Creating a repository of English language texts

This series of blog articles is a diary of my thoughts and experiences as I attempt to create a data-science-ready repository of written English texts and use it to answer some interesting questions about language usage. In this Read more…

By laurence.molloy, 11 months ago
Archives
  • March 2020
  • February 2020
Categories
  • Data Science
  • Tutorial