Data analysis - Aggregate functions

Preparation:

Data Literacy Fundamentals - Kapitel 5

Aggregationarrow-up-right

MySQL Maxarrow-up-right

MySQL Averagearrow-up-right

MySQL Countarrow-up-right

Peer instruction

SELECT employee_name, job, salary
FROM employees
WHERE job = 'CLERK' OR job = 'SALESMAN'
  AND salary > 1300;

A) All clerks and salesmen with salary above 1300.

B) All clerks (any salary) plus salesmen with salary above 1300.

C) Only salesmen with salary above 1300.

D) Invalid query.

SELECT employee_name, department_name
FROM employees e
JOIN departments d

A) A list of employees with their department names.

B) A list of departments and some of their employees

C) Nothing (nulls)

D) All combinations of employees and departments

A) Employees who report to someone with the job title MANAGER.

B) All managers and their employees.

C) All employees who are managers.

D) Invalid query.

Exercise 1: Individual - Write the query

  • Q1: what is the average speed of all pokemon?

    • A: 68.9338

  • Q2: How many pokemon are in the dataset?

    • A: 151

  • Q3: What is the maximum primary type?

    • A: Water

  • Q4: What is the minimum primary type?

    • A: Bug

  • Q5: Why is water the maximum and bug the minimum? Make an informed guess

    • A: ?

  • Q6: What is the max speed of all pokemon?

    • A: 140

  • Q7: What is the average speed of all pokemon with 'Ground' as primary type?

    • A: 58.1250

  • Q8: Display the average speed of each primary type of pokémon

Exercise 2: In pairs - Write the query

Q1: How many employees are in each department?

A:

  • ACCOUNTING 3

  • RESEARCH 5

  • SALES 6

Q2: What is the average salary in each department?

A:

  • ACCOUNTING > 2916.67

  • RESEARCH > 2175.00

  • SALES > 1566.67

Q3: Which department has the highest average salary?

A: ACCOUNTING (2916.67)

Q4: For each manager, show their name and the total salary of their team.

A:

  • KING > 8275

  • BLAKE > 6550

  • JONES > 6000

  • CLARK > 1300

  • SCOTT > 1100

  • FORD > 800

Q5: List all departments along with the maximum salary in each one.

A:

  • ACCOUNTING > 5000

  • RESEARCH > 3000

  • SALES > 2850

  • OPERATIONS > NULL

Project: Taxi Analysis

Datasetarrow-up-right

Descriptionarrow-up-right

  • Example from class: What is the busiest time of the day?

    Remember: Hours can be interpreted as a nominal scale - as a category with the categories 0-24

    To extract the hour from a DATETIME data type use:

  • Which payment type has the highest average fare amount ?

    • A: type 1 > 13.96

  • At what time of day (by the hour) are the longest trips on average (by distance)?

    • Remember: Hours can be interpreted as a nominal scale - as a category with the categories 0-24

    • A: pickup_hour 5 > 4.723

  • What are the average tip amount for each passenger count?

    • Interpret: Does more passengers mean more tips?

  • What is the distribution of amount of trips by passenger count?

    • A: 4 > 46, 6 > 91, 3 > 113

  • Which pickup location generates the most trips?

    • You should get a weird result.

    • Interpret: What could this answer signify?

    • How can we make the analysis better?

(Advanced - optional)

  • Do longer trip time mean more passengers?

  • Which dropoff location (DOLocationID) has the highest average tips?

  • Which hour of the day generates the highest total revenue (sum of total_amount)

How to import .csv to MySQL

Download the dataset: Datasetarrow-up-right

Create Database for the data

image-20230906154346980

Right click > Table Data Import Wizard

image-20230906154416037

Find File

image-20230906154443900

Rename table (otherwise the name is very long)

image-20230906154503723

Change tpep_pickup_datetime & tpep_dropoff_datetime to datetime datatype > next > next

image-20230908093104096
image-20230906154650078

???

image-20230906154705577

Profit

Last updated