CS 457 Data Mining
Computers are collecting more and more information about us daily. How can we collect, organize, and distill the deluge of data around us to answer important questions? In this class, we will use Excel functions, experiment with the iNZight statistical package built on top of the R programming language, and do some R coding to demonstrate big data techniques. Topics include Bayesian classifiers, k-nearest neighbors classifiers, multiple regression, network problems using Gephi and R, and forecasting using Holt-Winters smoothing.
Students will develop analytical and conceptual thinking skills for Big Data projects. Students will practice logical, yet creative, approaches to problem-solving. Students will refine their spreadsheet modeling skills. Students will improve their ability to learn new software tools for analysis.
We begin with Excel pivot tables and progress to linear and nonlinear single and multiple regression using Excel and R. Other activities include exploratory data analysis using iNZight, Excel, or R, and classifying tweets after training on pre-classified tweets. Students visualize their own Facebook or Twitter neighborhood using graphical tools such as Gephi, then develop and present to their peers a persuasive argument about an interesting application of Big Data.
Projects include a random forest text-authorship exercise and a Facebook or Twitter neighborhood visualization built with Gephi and web scraping.
- Prerequisites: MATH 143 Quantitative Methods, CS 212 Computer Programming for CS

