
CS 457 Data Mining

Computers collect more and more information about us every day. How can we collect, organize, and distill this deluge of data to answer important questions? In this class, we will use Excel functions, experiment with the iNZight statistical package (built on top of the R programming language), and write some R code to demonstrate big data techniques. Topics include Bayesian classifiers, k-nearest neighbors classifiers, multiple regression, network problems using Gephi and R, and forecasting using Holt-Winters smoothing.

Students will develop analytical and conceptual thinking skills for Big Data projects. Students will practice logical, yet creative, approaches to problem-solving. Students will refine their spreadsheet modeling skills. Students will improve their ability to learn new software tools for analysis.

We begin with Excel pivot tables, then progress to linear and non-linear single and multiple regressions using Excel and R. Students perform exploratory data analysis using iNZight, Excel, or R; classify tweets using a model trained on pre-classified tweets; and visualize their own Facebook or Twitter neighborhood with graphical tools such as Gephi. Students also develop and present to their peers a persuasive argument about an interesting application of Big Data.

Hands-on exercises include a random forest text authorship exercise and a Facebook or Twitter neighborhood visualization using Gephi and scraping.
