Introduction to Data Science in R
April 2 - 3, Bangalore
Wait no more! This workshop is designed to help aspiring data scientists pick up the skills and tools required to succeed. After completing it, you will have a better understanding of what the field is all about, hands-on experience with some real-life datasets, and the ability to solve data science problems end to end. Most importantly, you get to start doing “the sexiest job of the 21st century” (HBR).
Intro - “I think, therefore I am”
- What is data science?
- What type of questions can be answered?
- Frame/Acquire/Refine/Explore/Model/Insight framework
Frame - “The framing of a problem is often far more essential than its solution”
- How to frame a data science problem?
- Learn the hypothesis-driven approach
- How do you start - question driven, dataset driven or both?
Acquire - “Data is the new oil”
- Sources of data - downloaded from an internal system, obtained from a client or other third party, extracted from a web-based API, scraped from a website or PDFs, or gathered manually and recorded
- Acquire data from a csv file or a database
- Acquire data from a third-party client (e.g. Twitter)
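As a rough illustration of the CSV case above, here is a minimal base R sketch. The file contents, column names and the `sales` variable are made up for the example and are not workshop material; a temporary file is written first only so the snippet runs on its own.

```r
# Hypothetical data: write a tiny CSV to a temporary file so the
# example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("city,year,sales",
             "Bangalore,2015,120",
             "Delhi,2015,95",
             "Bangalore,2016,150"),
           csv_path)

# read.csv() ships with base R; stringsAsFactors = FALSE keeps text
# columns as character vectors instead of factors.
sales <- read.csv(csv_path, stringsAsFactors = FALSE)

str(sales)   # inspect column types
head(sales)  # look at the first rows
```

Reading from a database follows the same idea (connect, query, get a data frame back), but it depends on the database driver in use, so it is not sketched here.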
Explore - “I don’t know what I don’t know”
- Why do visual exploration?
- Understand Data Structure & Types
- Grammar of Graphics and Basics of visualisation
- Explore single variable graphs - (Quantitative, Categorical)
- Explore dual variable graphs - (Q & Q, Q & C, C & C)
- Explore multi-dimensional variable graphs
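The single-variable and dual-variable graphs listed above could look roughly like the sketch below. It assumes the built-in `mtcars` dataset and uses base R graphics rather than a Grammar of Graphics library, purely to avoid extra package dependencies; the choice of columns is illustrative.

```r
# mtcars ships with base R (datasets package); no extra packages assumed.
cars <- mtcars
cars$cyl <- factor(cars$cyl)  # treat cylinder count as categorical

# Single variable
hist(cars$mpg, main = "Quantitative: mpg", xlab = "Miles per gallon")
barplot(table(cars$cyl), main = "Categorical: cyl", xlab = "Cylinders")

# Two variables
plot(cars$wt, cars$mpg, main = "Q & Q: weight vs mpg",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")   # scatter plot
boxplot(mpg ~ cyl, data = cars, main = "Q & C: mpg by cyl")   # box plot per category
mosaicplot(table(cars$cyl, cars$gear), main = "C & C: cyl vs gear")
```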
Refine - “Data is messy”
- Concept of Tidy Data - Why is it important?
- Missing e.g. Check for missing or incomplete data
- Quality e.g. Check for duplicates, accuracy, unusual data
- Parse e.g. extract year from date
- Merge e.g. first name and surname into full name
- Convert e.g. free text to coded value
- Derive e.g. gender from title
- Calculate e.g. percentages, proportion
- Remove e.g. remove redundant data
- Aggregate e.g. rollup by year, cluster by area
- Filter e.g. exclude based on location
- Sample e.g. extract a representative sample of the data
- Summary e.g. show summary stats like mean
- Basic statistics: variance, standard deviation, covariance, correlation
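A few of the refine steps above can be sketched in base R as follows. The `people` data frame and all of its values are hypothetical, invented only to show one possible way of doing each step.

```r
# Hypothetical, deliberately messy records (one duplicate row, one NA).
people <- data.frame(
  first   = c("Arun", "Bina", "Arun", "Divya", "Eshan"),
  surname = c("Rao", "Shah", "Rao", "Nair", "Das"),
  joined  = as.Date(c("2014-03-01", "2015-07-15", "2014-03-01",
                      "2015-01-20", "2016-11-05")),
  spend   = c(100, NA, 100, 250, 50),
  visits  = c(4, 2, 4, 9, 1),
  stringsAsFactors = FALSE
)

colSums(is.na(people))                                     # Missing: NAs per column
people <- people[!duplicated(people), ]                    # Quality: drop exact duplicates
people$year <- as.integer(format(people$joined, "%Y"))     # Parse: year from date
people$full_name <- paste(people$first, people$surname)    # Merge: first name + surname
people$share <- people$spend / sum(people$spend, na.rm = TRUE)  # Calculate: proportion
recent <- people[people$year >= 2015, ]                    # Filter: keep 2015 onwards

# Aggregate: roll up spend by year (the formula interface drops rows
# with NA spend by default).
by_year <- aggregate(spend ~ year, data = people, FUN = sum)

# Summary and basic statistics
summary(people$spend)
var(people$spend, na.rm = TRUE)
sd(people$spend, na.rm = TRUE)
cov(people$spend, people$visits, use = "complete.obs")
cor(people$spend, people$visits, use = "complete.obs")
```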
Model - “All models are wrong, some of them are useful”
- Introduction to Machine Learning
- The power and limits of models
- Tradeoff between Prediction Accuracy and Model Interpretability
- Assessing Model Accuracy
- For Regression problems
- For classification problems - Precision, Recall, AUC/ROC, F-Score, Misclassification rate
- For Regression problems
- Bias-Variance tradeoff
- Linear Regression
- Logistic Regression
- L1, L2 regularised Linear & Logistic Regression
- Classification model
- Decision Trees
- Visualizing decision trees
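To make the accuracy measures in the Model outline concrete, here is a small base R sketch. The `actual` and `predicted` label vectors are hypothetical, and the regression part assumes the built-in `mtcars` dataset; neither is taken from the workshop itself.

```r
# Classification: hypothetical labels, 1 = positive class, 0 = negative class.
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0)

table(Predicted = predicted, Actual = actual)  # confusion matrix

tp <- sum(predicted == 1 & actual == 1)  # true positives
fp <- sum(predicted == 1 & actual == 0)  # false positives
fn <- sum(predicted == 0 & actual == 1)  # false negatives

precision     <- tp / (tp + fp)
recall        <- tp / (tp + fn)
f_score       <- 2 * precision * recall / (precision + recall)
misclass_rate <- mean(predicted != actual)

# Regression: root mean squared error of a simple linear model.
# Note: this is measured on the training data, so it is optimistic.
fit  <- lm(mpg ~ wt, data = mtcars)
rmse <- sqrt(mean(residuals(fit)^2))
```

With these labels, precision, recall and F-score all come out at 0.75 and the misclassification rate at 0.2. In practice the measures would be computed on held-out data rather than on the data used to fit the model.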
Insight - “The goal is to turn data into insight”
- Why do we need to communicate insight?
- Types of communication - Exploration vs. Explanation
- Explanation: Telling a story with data
- Exploration: Building an interface for people to find stories
Amit Kapoor is interested in learning and teaching the craft of telling visual stories with data. He uses storytelling and data visualization as tools for improving communication, persuasion and leadership. He conducts workshops and trainings for corporates, non-profits, colleges, and individuals at narrativeVIZ Consulting. He also teaches storytelling with data as guest faculty in executive courses at IIM Bangalore and IIM Ahmedabad.
His background is in strategy consulting, using data-driven stories to drive change across organizations and businesses. He has more than 12 years of management consulting experience, first with AT Kearney in India, then with Booz & Company in Europe and more recently with startups in Bangalore. He did his B.Tech in Mechanical Engineering at IIT Delhi and his PGDM (MBA) at IIM Ahmedabad.
Nischal HP - Co-Founder, Unnati Data Labs
Nischal is a Data Engineer who enables Data Scientists to work in peace. He makes sure that they get the data they need, in the way they need it. Previously, during his tenure at Redmart, he built from scratch various e-commerce systems such as catalog management, recommendation engines and other systems that amass a lot of data.
At SAP Labs, Nischal built various data crawlers and intention mining systems, and laid down the initial work on an end-to-end text mining/analysis pipeline. The majority of his work, however, was centered around building a system that gamified technical indicators in a product for the Fintech domain.
He has conducted workshops in the field of deep learning across the world. He is a strong believer in open source and loves to architect big, fast and reliable systems. In his free time, he enjoys psychedelic trance and travelling to remote places.
Raghotham S - Co-Founder, Unnati Data Labs
Raghotham is a Data Scientist who can work across the complete stack. Previously, at Touchpoints Inc., he single-handedly built a data analytics platform for a fitness wearable company. At Redmart, he worked on the CRM system and built a sentiment analyzer for Redmart’s social media. Prior to Redmart and Touchpoints, Raghotham worked at SAP Labs, where he was a core part of what is currently SAP’s framework for building web and mobile products. He was a part of multiple SAP-wide events, helping to spread the knowledge both internally and to customers.
Having found a deep love for data science and neural networks, and a passion for teaching, Raghotham has conducted workshops across the world. Apart from getting his hands dirty with data, he loves travelling, Pink Floyd and masala dosas.