SQL Project

 Sure! Let's create an end-to-end SQL project for a data analyst. We'll work on building a sales analysis dashboard for a retail company. The project will include data cleaning, transformation, analysis, and reporting using SQL. 


### Project Overview

You will create a database for a retail company, perform ETL (Extract, Transform, Load) operations, and generate reports to analyze sales performance.


### Project Steps


#### Step 1: Setting Up the Environment


1. **Install SQL Database:**

   - Install a SQL database system like MySQL, PostgreSQL, or SQL Server on your local machine or use a cloud-based solution like Amazon RDS.


2. **Create a Database:**

   ```sql

   CREATE DATABASE retail_db;

   USE retail_db;

   ```


#### Step 2: Creating Tables and Importing Data


1. **Create Tables:**

   - Create tables for customers, products, sales, and stores.

   ```sql

   CREATE TABLE customers (

       customer_id INT PRIMARY KEY,

       first_name VARCHAR(50),

       last_name VARCHAR(50),

       email VARCHAR(100),

       phone VARCHAR(15),

       address VARCHAR(200),

       city VARCHAR(50),

       state VARCHAR(50),

       zip_code VARCHAR(10)

   );


   CREATE TABLE products (

       product_id INT PRIMARY KEY,

       product_name VARCHAR(100),

       category VARCHAR(50),

       price DECIMAL(10, 2)

   );


   CREATE TABLE stores (

       store_id INT PRIMARY KEY,

       store_name VARCHAR(100),

       city VARCHAR(50),

       state VARCHAR(50)

   );


   CREATE TABLE sales (

       sale_id INT PRIMARY KEY,

       sale_date DATE,

       customer_id INT,

       product_id INT,

       store_id INT,

       quantity INT,

       total_amount DECIMAL(10, 2),

       FOREIGN KEY (customer_id) REFERENCES customers(customer_id),

       FOREIGN KEY (product_id) REFERENCES products(product_id),

       FOREIGN KEY (store_id) REFERENCES stores(store_id)

   );

   ```


2. **Import Data:**

   - Import CSV files into the tables using a tool like MySQL Workbench, pgAdmin, or using SQL commands.


   ```sql

   LOAD DATA INFILE 'path/to/customers.csv'

   INTO TABLE customers

   FIELDS TERMINATED BY ','

   ENCLOSED BY '"'

   LINES TERMINATED BY '\n'

   IGNORE 1 ROWS;


   LOAD DATA INFILE 'path/to/products.csv'

   INTO TABLE products

   FIELDS TERMINATED BY ','

   ENCLOSED BY '"'

   LINES TERMINATED BY '\n'

   IGNORE 1 ROWS;


   LOAD DATA INFILE 'path/to/stores.csv'

   INTO TABLE stores

   FIELDS TERMINATED BY ','

   ENCLOSED BY '"'

   LINES TERMINATED BY '\n'

   IGNORE 1 ROWS;


   LOAD DATA INFILE 'path/to/sales.csv'

   INTO TABLE sales

   FIELDS TERMINATED BY ','

   ENCLOSED BY '"'

   LINES TERMINATED BY '\n'

   IGNORE 1 ROWS;

   ```


#### Step 3: Data Cleaning and Transformation


1. **Data Cleaning:**

   - Check for and remove duplicates.

   ```sql

   DELETE FROM customers

   WHERE customer_id IN (SELECT customer_id FROM (

       SELECT customer_id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY customer_id) AS row_num

       FROM customers

   ) t WHERE t.row_num > 1);

   ```


   - Handle missing values (NULLs).

   ```sql

   UPDATE sales SET quantity = 1 WHERE quantity IS NULL;

   ```


2. **Data Transformation:**

   - Calculate total sales for each product.

   ```sql

   UPDATE sales

   SET total_amount = quantity * (

       SELECT price FROM products WHERE products.product_id = sales.product_id

   );

   ```


#### Step 4: Data Analysis and Reporting


1. **Sales by Product:**

   - Query to find total sales and quantity sold for each product.

   ```sql

   SELECT p.product_name, SUM(s.total_amount) AS total_sales, SUM(s.quantity) AS total_quantity

   FROM sales s

   JOIN products p ON s.product_id = p.product_id

   GROUP BY p.product_name

   ORDER BY total_sales DESC;

   ```


2. **Sales by Store:**

   - Query to find total sales for each store.

   ```sql

   SELECT st.store_name, SUM(s.total_amount) AS total_sales

   FROM sales s

   JOIN stores st ON s.store_id = st.store_id

   GROUP BY st.store_name

   ORDER BY total_sales DESC;

   ```


3. **Monthly Sales Trend:**

   - Query to find monthly sales trend.

   ```sql

   SELECT DATE_FORMAT(s.sale_date, '%Y-%m') AS month, SUM(s.total_amount) AS total_sales

   FROM sales s

   GROUP BY month

   ORDER BY month;

   ```


4. **Top Customers:**

   - Query to find top 10 customers by total spending.

   ```sql

   SELECT c.first_name, c.last_name, SUM(s.total_amount) AS total_spent

   FROM sales s

   JOIN customers c ON s.customer_id = c.customer_id

   GROUP BY c.customer_id

   ORDER BY total_spent DESC

   LIMIT 10;

   ```


#### Step 5: Creating Views for Reporting


1. **Create Views:**

   - Create SQL views to simplify reporting.

   ```sql

   CREATE VIEW view_sales_by_product AS

   SELECT p.product_name, SUM(s.total_amount) AS total_sales, SUM(s.quantity) AS total_quantity

   FROM sales s

   JOIN products p ON s.product_id = p.product_id

   GROUP BY p.product_name;


   CREATE VIEW view_sales_by_store AS

   SELECT st.store_name, SUM(s.total_amount) AS total_sales

   FROM sales s

   JOIN stores st ON s.store_id = st.store_id

   GROUP BY st.store_name;

   ```


2. **Use Views in Reports:**

   - Query the views to generate reports.

   ```sql

   SELECT * FROM view_sales_by_product ORDER BY total_sales DESC;

   SELECT * FROM view_sales_by_store ORDER BY total_sales DESC;

   ```


#### Step 6: Advanced Analysis


1. **Sales Forecasting:**

   - Use SQL to prepare data for forecasting models.

   ```sql

   SELECT DATE_FORMAT(sale_date, '%Y-%m') AS month, SUM(total_amount) AS total_sales

   INTO temp_monthly_sales

   FROM sales

   GROUP BY month

   ORDER BY month;

   ```


2. **Customer Segmentation:**

   - Segment customers based on their total spending.

   ```sql

   SELECT customer_id, 

          CASE 

              WHEN total_spent > 1000 THEN 'High Value'

              WHEN total_spent BETWEEN 500 AND 1000 THEN 'Medium Value'

              ELSE 'Low Value'

          END AS customer_segment

   INTO customer_segments

   FROM (

       SELECT customer_id, SUM(total_amount) AS total_spent

       FROM sales

       GROUP BY customer_id

   ) t;

   ```


3. **Product Performance:**

   - Analyze product performance by category.

   ```sql

   SELECT category, SUM(total_amount) AS total_sales

   FROM sales s

   JOIN products p ON s.product_id = p.product_id

   GROUP BY category

   ORDER BY total_sales DESC;

   ```


#### Step 7: Visualization and Dashboard (Optional)


1. **Connect to a Visualization Tool:**

   - Use a BI tool like Power BI, Tableau, or even Excel to connect to your SQL database.


2. **Create Dashboards:**

   - Visualize the sales analysis reports created in SQL.

   - Example visuals: Total Sales by Product, Sales Trend by Month, Top Customers, Sales by Store.


3. **Publish and Share:**

   - Publish the dashboards and share them with stakeholders.


### Summary


This project guides you through the process of creating a database, performing ETL operations, analyzing data, and generating reports for a retail company's sales data. By following these steps, you'll gain hands-on experience with SQL and develop valuable skills in data analysis and reporting.


-----------------------------------------------------------------------------------------------------------------------


I understand that for a comprehensive project, you will need data files to work with. Below are sample data entries for the `customers`, `products`, `stores`, and `sales` tables. You can use these entries to create CSV files, which you can then import into your SQL database.


### Sample Data


#### 1. customers.csv

```csv

customer_id,first_name,last_name,email,phone,address,city,state,zip_code

1,John,Doe,john.doe@example.com,555-1234,123 Elm St,Anytown,CA,90210

2,Jane,Smith,jane.smith@example.com,555-5678,456 Oak St,Othertown,TX,73301

3,Alice,Jones,alice.jones@example.com,555-8765,789 Pine St,Anycity,FL,33101

4,Bob,Brown,bob.brown@example.com,555-1357,321 Maple St,Somecity,NY,10001

5,Eve,Davis,eve.davis@example.com,555-2468,654 Birch St,Elsewhere,IL,60007

```


#### 2. products.csv

```csv

product_id,product_name,category,price

1,Laptop,Electronics,999.99

2,Smartphone,Electronics,499.99

3,Tablet,Electronics,299.99

4,Headphones,Accessories,199.99

5,Charger,Accessories,49.99

```


#### 3. stores.csv

```csv

store_id,store_name,city,state

1,Main Street Store,Anytown,CA

2,Market Street Store,Othertown,TX

3,Downtown Store,Anycity,FL

4,City Center Store,Somecity,NY

5,Suburb Store,Elsewhere,IL

```


#### 4. sales.csv

```csv

sale_id,sale_date,customer_id,product_id,store_id,quantity,total_amount

1,2023-01-15,1,1,1,1,999.99

2,2023-01-20,2,2,2,2,999.98

3,2023-01-25,3,3,3,1,299.99

4,2023-01-30,4,4,4,1,199.99

5,2023-02-05,5,5,5,3,149.97

6,2023-02-10,1,1,1,1,999.99

7,2023-02-15,2,2,2,1,499.99

8,2023-02-20,3,3,3,2,599.98

9,2023-02-25,4,4,4,2,399.98

10,2023-02-28,5,5,5,2,99.98

```


### Creating CSV Files


1. Open a text editor and copy the content for each table into separate files.

2. Save each file with a `.csv` extension (e.g., `customers.csv`, `products.csv`, `stores.csv`, `sales.csv`).


### Loading Data into SQL


#### Using MySQL:

```sql

LOAD DATA INFILE 'path/to/customers.csv'

INTO TABLE customers

FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n'

IGNORE 1 ROWS;


LOAD DATA INFILE 'path/to/products.csv'

INTO TABLE products

FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n'

IGNORE 1 ROWS;


LOAD DATA INFILE 'path/to/stores.csv'

INTO TABLE stores

FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n'

IGNORE 1 ROWS;


LOAD DATA INFILE 'path/to/sales.csv'

INTO TABLE sales

FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n'

IGNORE 1 ROWS;

```


#### Using PostgreSQL:

```sql

COPY customers FROM '/path/to/customers.csv' DELIMITER ',' CSV HEADER;

COPY products FROM '/path/to/products.csv' DELIMITER ',' CSV HEADER;

COPY stores FROM '/path/to/stores.csv' DELIMITER ',' CSV HEADER;

COPY sales FROM '/path/to/sales.csv' DELIMITER ',' CSV HEADER;

```


### Notes:

- Replace `/path/to/` with the actual path to where you saved your CSV files.

- Make sure your SQL server has the necessary permissions to access the files.


Now you have a dataset to work with for your SQL project. Follow the project steps provided earlier to perform ETL operations, analyze the data, and generate reports.

Comments