

Good data set for Pre-processing


I am enrolled in an undergraduate course in Data Mining, and I have an assignment to code a data mining pre-processor. I am free to choose both the programming language and the data set. I was wondering if anybody could suggest a good data set to use. I have been going through the UCI Repository and have found many candidate data sets there, but as a beginner I am not sure which one would be a good choice. The preprocessor should deal with the following:
Data cleaning
  Missing values
  Errors
  Outliers
  Normalization
  De-duplication
Data reduction
  Sampling techniques
  Dimensionality reduction
What properties should I consider when choosing the data set? Is there a specific data set you would suggest?
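As a rough illustration of the data-cleaning steps listed above, here is a minimal Python sketch (standard library only) that removes exact duplicates, fills missing values with the column median, flags outliers with Tukey's IQR fences, and min-max normalizes a numeric column. The records, column layout, and function names are hypothetical, and median imputation and the 1.5 * IQR rule are just common choices, not the only ones.

```python
import statistics

# Hypothetical toy records (id, age, income); None marks a missing value.
# Real UCI files often use "?" instead, so convert that to None when loading.
ROWS = [
    (1, 25.0, 45000.0),
    (2, None, 52000.0),
    (3, 31.0, None),
    (3, 31.0, None),      # exact duplicate of the previous record
    (4, 29.0, 48000.0),
    (5, 27.0, 950000.0),  # suspicious income, likely an error or outlier
]


def deduplicate(rows):
    """Drop exact duplicate records, keeping the first occurrence."""
    seen, out = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def impute_median(column):
    """Replace None with the median of the observed values (robust to outliers)."""
    med = statistics.median([v for v in column if v is not None])
    return [med if v is None else v for v in column]


def iqr_outliers(column, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(column) if v < low or v > high]


def min_max(column):
    """Linearly rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def preprocess(rows):
    rows = deduplicate(rows)
    ages = impute_median([r[1] for r in rows])
    incomes = impute_median([r[2] for r in rows])
    return {
        "n_rows": len(rows),
        "ages_scaled": min_max(ages),
        "income_outliers": iqr_outliers(incomes),
    }


if __name__ == "__main__":
    print(preprocess(ROWS))
```

The order of the steps is a design choice: de-duplicating first keeps repeated rows from biasing the imputed values, and the median is used rather than the mean so the extreme income does not distort the filled-in value.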
You have answered your own question: choose data sets that have the properties you listed. The UCI repository categorizes its data sets, so you can pick any suitable one and start experimenting with it.
To start with, if I were you, I would proceed step by step: get a feel for what each of these problems looks like and how it affects classifier performance. I would also choose some of the popular data sets, since they are used as benchmarks in most research papers. Many of the items you listed are separate machine learning problems in their own right, with a lot of ongoing research.
I would start with something like this:
Missing values: Iris, Voting, Heart Disease (note that Iris itself has no missing values, so you would need to delete some values artificially)
Duplicates: the 921,810-song data set (not from UCI, I think)
Normalization: any data set with continuous-valued features that have different ranges
Sampling techniques: Pima
Dimensionality reduction: Swiss Roll
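For the sampling part, one basic technique worth implementing is stratified random sampling: it draws the same fraction from each class, so a labelled data set such as Pima keeps its class proportions in the sample, whereas plain random sampling can drift away from them. This is a minimal sketch; the toy data and the `stratified_sample` helper are hypothetical, not taken from any library.

```python
import random
from collections import defaultdict

# Hypothetical imbalanced toy data: (row_id, class_label) with 70 negatives and 30 positives.
TOY = [(i, 0) for i in range(70)] + [(i, 1) for i in range(70, 100)]


def stratified_sample(rows, label_of, fraction, seed=0):
    """Sample `fraction` of each class separately (without replacement) to keep class proportions."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    sample = []
    for label in sorted(by_class):
        group = by_class[label]
        k = max(1, round(fraction * len(group)))  # keep at least one row per class
        sample.extend(rng.sample(group, k))
    return sample


if __name__ == "__main__":
    s = stratified_sample(TOY, label_of=lambda r: r[1], fraction=0.2)
    print(len(s), sum(1 for _, y in s if y == 1))  # 20 6
```

A 20% stratified sample of the toy data always contains 14 negatives and 6 positives, matching the 70/30 split of the full set.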
Another good way to find data sets is to look at the key publications for each technique. For example, for dimensionality reduction, look at the papers on PCA, Isomap, etc.; for sampling, see the SMOTE paper. Check what data they use in their experiments and proceed accordingly.
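Before reading the SMOTE paper itself, it may help to see its core idea in a few lines of Python: a synthetic minority-class point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. This is a simplified sketch under my own assumptions (numeric features only, Euclidean distance, base samples chosen at random rather than generating a fixed number of synthetic points per sample as the original algorithm does); the `smote` function and the toy points are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: create n_new synthetic points, each on the segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest neighbours of x among the other minority samples (Euclidean distance)
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic


if __name__ == "__main__":
    # Hypothetical 2-D minority-class points.
    minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
    for point in smote(minority, n_new=4, k=2):
        print(point)
```

Apply oversampling like this only to the training split, so synthetic points do not leak into the data you evaluate on.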
