A dataset is a collection of data, usually stored as a data matrix or in a database format. Every data scientist needs an appropriate dataset to build a machine learning project. As a machine learning enthusiast myself, I believe that data is the soul of a machine learning project, so it is important to choose the right dataset for the task. Here are some of the best websites, including several personal favorites, that I often use to download datasets.
Top 10 Dataset Portals in 2018
Kaggle is my personal favorite and one of the best-maintained websites, with an enormous amount of data available. Besides providing data, it is famous for its online data science and machine learning competitions and for a cloud-based workbench for data scientists and researchers. Kaggle has grown into a huge hub for data science across many subtopics. It is also the largest online community of data scientists in the world. With easy-to-use search options and a wide range of choices, it definitely stands out as the best in the league.
A plethora of movie-related datasets is available from the movie information giant IMDb (Internet Movie Database). The movie data here is strong in both quantity and quality. It is well suited to movie recommendation projects, for example a recommender based on the reviews a user has previously given.
The UCI Machine Learning Repository is provided courtesy of the University of California, Irvine, and was created in 1987. It provides corrected data for projects in machine learning related fields. It is a huge collection of databases, domain theories, and data generators. Widely popular among students, professionals, professors, and researchers all over the world, the repository has been cited over 1,000 times.
The US government provides free access to many of its online catalogs and datasets for research and development purposes. This is one of the best sources, offering a huge amount of data in one place. With over 18,000 “.csv” datasets and many other databases, the site has gained huge popularity. It is useful for data scientists as well as researchers in the field of machine learning.
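Since so many of these portals serve plain “.csv” files, a first look at a download can be sketched in a few lines of Python. This is only a minimal sketch using the standard library: the sample rows, the column names, and the `summarize_csv` helper are all made up for illustration and do not come from any of the sites listed here.

```python
import csv
import io

# Hypothetical stand-in for a downloaded file. With a real download, pass
# open("your_file.csv", newline="", encoding="utf-8") to summarize_csv instead.
SAMPLE_CSV = """state,year,population
A,2011,1000
B,2011,
C,2011,2500
"""


def summarize_csv(handle):
    """Return (column names, row count, missing-value count per column)."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # Treat absent or blank cells as missing values.
            if value is None or value.strip() == "":
                missing[name] += 1
    return columns, rows, missing


columns, rows, missing = summarize_csv(io.StringIO(SAMPLE_CSV))
print(columns)
print(rows)
print(missing)
```

Counting blank cells per column is a cheap way to judge whether a dataset is complete enough for a project before investing more time in it.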
One of the coolest and most interactive websites is the dataset platform maintained by the Government of India. The data is useful for data analysis as well as for deep learning and machine learning problems. It is known for authentic, high-quality data. Most of the data relates to the census and statistics of the Indian subcontinent.
Yelp is one of the biggest review sites in the US. Its dataset contains over 6 million reviews for hundreds of thousands of restaurants and businesses. It can be used for sentiment analysis and data mining as well as for recommender systems. It holds a store of reviews, users, tips, check-in data, and business data from which to draw insights.
Reddit's open data makes user-created comment data available to everyone. It provides access to various open datasets on Reddit. The community at reddit.com/r/datasets/ shares datasets on topics such as visualization and machine learning.
Created by YouTube, this is the best place to get a video dataset. It consists of over 8 million video IDs and labels. Since YouTube is a leading source of video-based entertainment, there is an abundance of video data here. These datasets are well suited to video-related projects in machine learning as well as deep learning.
KEEL is an open source dataset repository from which we can download any of the listed datasets. KEEL datasets are used by many machine learning researchers working on topics such as semi-supervised classification, unsupervised learning, regression, and time series.
The European Union Open Data website is perfect for downloading datasets related to countries in the EU. With around 13,000 datasets in the repository, it is one of the least known but also one of the best destinations for data scientists.
A machine learning project cannot be built without good-quality data. These websites provide free data to download for any personal or professional project, and also for research. There are many options to choose from on the web; here I’ve listed some of the best for creating a machine learning project.