Data Science for Business (Semester B, 2014-14)


Instructor: Kaiquan Xu

Office: Room 1719, Anzhong Building, Gulou Campus

Email: xukaiquan [at] nju.edu.cn

Lectures: 2:00-4:00 PM, Friday, Yi B407, Xianlin Campus

Office Hours: By Appointment

Other: Please see below



Please download the dataset (Survey Dataset) and use what you have learned to analyze it. Please write a report based on your analysis (including the code). Please send it by 25th April. Thanks.
Data Description: A survey of visitors to an amusement park. The dataset comprises a few objective measures: whether the respondent visited on a weekend (the variable weekend in the data frame), the number of children brought (num.child), and the distance traveled to the park (distance). There are also subjective measures of satisfaction: expressed overall satisfaction (overall) and satisfaction with the rides, games, waiting time, and cleanliness (rides, games, wait, and clean, respectively).
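As a starting point for the analysis, here is a minimal Python sketch that uses only the standard library. It assumes the data is a CSV file with the column names listed above and that weekend is coded as yes/no. Both are assumptions, so please check them against the actual file. The four rows in SAMPLE_CSV are made up for illustration and are not taken from the Survey Dataset. The helper names (load_rows, mean_by_group, pearson) are likewise only illustrative.

```python
import csv
import io
from statistics import mean

# Made-up rows for illustration only; NOT taken from the course dataset.
# Column names follow the data description; yes/no coding is an assumption.
SAMPLE_CSV = """weekend,num.child,distance,rides,games,wait,clean,overall
yes,0,10.0,80,70,60,85,50
yes,2,25.0,90,80,75,90,70
no,1,40.0,85,75,70,88,60
no,3,15.0,95,85,80,92,80
"""

NUMERIC = ["num.child", "distance", "rides", "games", "wait", "clean", "overall"]


def load_rows(handle):
    """Read survey rows from a file-like object, converting numeric columns to float."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"weekend": record["weekend"]}
        for name in NUMERIC:
            row[name] = float(record[name])
        rows.append(row)
    return rows


def mean_by_group(rows, group, value):
    """Average of `value` within each level of the `group` column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group], []).append(row[value])
    return {level: mean(values) for level, values in groups.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Replace io.StringIO(SAMPLE_CSV) with open("your_file.csv", newline="") for real data.
rows = load_rows(io.StringIO(SAMPLE_CSV))

# Objective measure vs. overall satisfaction.
overall_by_weekend = mean_by_group(rows, "weekend", "overall")

# How strongly each subjective measure moves with overall satisfaction.
overall = [r["overall"] for r in rows]
correlations = {
    name: pearson([r[name] for r in rows], overall)
    for name in ["rides", "games", "wait", "clean"]
}

print(overall_by_weekend)
print(correlations)
```

The same two steps (group means and pairwise correlations) can be done in R with aggregate() and cor(). A report would normally add plots and a regression of overall on the other measures.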

Please install R and RStudio on your laptop!

Please install Python 3.x (3.6 or 3.7) and PyCharm on your laptop!


In the era of big data, more and more organizations use data to unlock new business value: making better-informed decisions, discovering hidden insights, and automating business processes. This has led to an emerging field called data science. This course covers the basic concepts of data science, along with basic algorithms and software tools for analyzing data (data preprocessing, data representation, model development, and result evaluation). The course mainly covers three topics:
1) Data preprocessing & exploratory data analysis: preprocessing tools/programming, data visualization, etc.
2) Modeling: statistical modeling and machine learning
3) Applications: real-world topics and case studies

1) Basic statistics and linear algebra (undergraduate level)
2) Basic experience with programming and databases



What is Data Science?

To gain insights into data through computation, statistics, and visualization


A Data Scientist Is ...

“A data scientist is someone who knows more statistics than a computer scientist and more computer science than a statistician.”

- Josh Blumenstock

“Data Scientist = statistician + programmer + coach + storyteller + artist”

- Shlomo Argamon

“The sexy job in the next 10 years will be statisticians (data scientists)”

- Hal Varian, chief economist at Google

By 2018, the United States alone could face a shortage of up to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.

- McKinsey Global Institute