Need two separate files, one for each doc.

Please include the field Sales.SalesOrderDetail.UnitPrice, in addition to all the other fields specified, when you work on the ProductSalesInfo package.

Dr. Alam, CS 538 ETL Assignment

Title: Extraction, Transformation, Loading (ETL) with SQL Server Integration Services (SSIS)

CS 538 Business Intelligence and Data Mining, Fall 2023

Instructions:

1. Create a new SSIS project.

2. Create a new database called ETL_Data in SQL Server.

3. Create an SSIS package for each of the following tasks:

· PersonBio – This package will export data from the AdventureWorks source tables and columns listed below into the ETL_Data database:

Source Tables: Person.Person, Person.EmailAddress, Person.StateProvince, Person.PersonPhone, Person.BusinessEntityAddress, and Person.Address.

Source fields: FirstName, LastName, AddressLine1, AddressLine2, City, StateProvince.Name, EmailAddress, and PhoneNumber.

The new table will contain every employee’s information from the tables above, regardless of whether they have a phone number or email address.

Name the new table PersonBio in your ETL_Data database.

Name the Source Assistant component PersonBioSource and the Destination Assistant component PersonBioDestination.

Name the package PersonBio.
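The requirement to keep every person "regardless of whether they have a phone number or email address" is what an outer join provides. The following is a minimal sketch using Python's built-in sqlite3 module, assuming tiny, made-up stand-ins for Person.Person, Person.EmailAddress, and Person.PersonPhone (SQLite has no schemas, so the names are flattened, and the address tables are omitted for brevity). It is an illustration of LEFT JOIN behavior, not the required PersonBio query.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for three AdventureWorks tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (BusinessEntityID INTEGER PRIMARY KEY,
                         FirstName TEXT, LastName TEXT);
    CREATE TABLE EmailAddress (BusinessEntityID INTEGER, EmailAddress TEXT);
    CREATE TABLE PersonPhone (BusinessEntityID INTEGER, PhoneNumber TEXT);

    INSERT INTO Person VALUES (1, 'Ana', 'Ortiz'), (2, 'Ben', 'Kim'), (3, 'Cal', 'Diaz');
    INSERT INTO EmailAddress VALUES (1, 'ana@example.com');
    INSERT INTO PersonPhone VALUES (1, '555-0100'), (2, '555-0101');
""")

# LEFT JOINs keep every person; a missing email or phone comes back as NULL (None).
# INNER JOINs here would keep only Ana, the one person with both an email and a phone.
rows = conn.execute("""
    SELECT p.FirstName, p.LastName, e.EmailAddress, ph.PhoneNumber
    FROM Person AS p
    LEFT JOIN EmailAddress AS e ON e.BusinessEntityID = p.BusinessEntityID
    LEFT JOIN PersonPhone AS ph ON ph.BusinessEntityID = p.BusinessEntityID
    ORDER BY p.BusinessEntityID
""").fetchall()

for row in rows:
    print(row)
```

Ben and Cal stay in the result even though they lack an email address (and, for Cal, a phone number); only the missing columns are NULL.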

· SplitByStateName – This package will split the data from the PersonBio table into different tables within the ETL_Data database based on the first letter of the StateProvince name.

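You will create five new destination tables (StatesWith A, B, C, Null, and Others): one each for states whose names begin with A, B, or C, one for rows with a NULL state name, and one for all other states.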

You may need to place the condition for states with a NULL name first, before specifying the other conditions.
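The routing rule above can be sketched as a plain Python function that mimics a Conditional Split, where conditions are tested in order, a row goes to the first output whose condition is true, and unmatched rows fall through to the default output. The table names StatesWithA through StatesWithOthers are hypothetical spellings of the five tables listed above. In the SSIS editor the conditions would be expressions along the lines of ISNULL([StateName]) and SUBSTRING([StateName], 1, 1) == "A", where the column name StateName is also an assumption.

```python
from typing import Optional


def route_state(state_name: Optional[str]) -> str:
    """Return the destination table for one PersonBio row.

    Conditions are checked in order and the first match wins, like the
    outputs of a Conditional Split. Table names are hypothetical.
    """
    # NULL check first: reading the first letter of None would raise a TypeError.
    if state_name is None:
        return "StatesWithNull"
    if state_name.startswith("A"):
        return "StatesWithA"
    if state_name.startswith("B"):
        return "StatesWithB"
    if state_name.startswith("C"):
        return "StatesWithC"
    # Plays the role of the Conditional Split default output.
    return "StatesWithOthers"


for name in [None, "Alabama", "British Columbia", "California", "Washington"]:
    print(name, "->", route_state(name))
```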

This is a sample of the output table that will be generated by the ETL process.

[Images: sample output tables]

· ProductSalesInfo – This package will calculate the sales amount and sales quarter for each product.

The data source for this package will be a query from the following tables: Production.Product, Production.ProductSubcategory, Production.ProductCategory, Sales.SalesOrderHeader, and Sales.SalesOrderDetail.

The query should show the following fields: Production.Product.Name, Production.ProductCategory.Name AS [CategoryName], Production.Product.ListPrice, Sales.SalesOrderHeader.OrderDate (only the orders after 2004), Sales.SalesOrderDetail.OrderQty.

Create two derived columns (using a Derived Column transformation) that will be loaded into the destination table. Name the first derived column SalesAmount, and calculate it by multiplying ListPrice by OrderQty.

Name the second derived column SalesQtr. Extract its value from the OrderDate field using a month function, wrapped in an “IF” condition that checks the month and assigns the quarter (in an SSIS expression, IF logic is written with the conditional operator condition ? true_value : false_value). Test the conditions in this order: if the month is greater than 9, SalesQtr is 4th Qtr; otherwise, if the month is greater than 6, SalesQtr is 3rd Qtr; otherwise, if the month is greater than 3, SalesQtr is 2nd Qtr; for all other months, SalesQtr is 1st Qtr.
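As a cross-check for the two derived columns, here is a minimal Python sketch of the same logic. It only models the calculations; inside SSIS they would be Derived Column expressions, and the one shown in the comment is an approximation rather than a tested package expression. Using Decimal for prices is an assumption made to avoid floating-point rounding.

```python
from datetime import date
from decimal import Decimal


def sales_amount(list_price: Decimal, order_qty: int) -> Decimal:
    """SalesAmount derived column: ListPrice multiplied by OrderQty."""
    return list_price * order_qty


def sales_qtr(order_date: date) -> str:
    """SalesQtr derived column.

    Thresholds are checked from the highest down, so each month matches
    exactly one branch. In SSIS this is roughly:
      MONTH(OrderDate) > 9 ? "4th Qtr" : (MONTH(OrderDate) > 6 ? "3rd Qtr"
        : (MONTH(OrderDate) > 3 ? "2nd Qtr" : "1st Qtr"))
    """
    month = order_date.month
    if month > 9:
        return "4th Qtr"
    if month > 6:
        return "3rd Qtr"
    if month > 3:
        return "2nd Qtr"
    return "1st Qtr"


print(sales_amount(Decimal("3578.27"), 2))
print(sales_qtr(date(2005, 10, 1)), sales_qtr(date(2005, 3, 31)))
```

Checking "greater than 9" before "greater than 6" matters: in the reverse order, October would satisfy "greater than 6" first and be labeled 3rd Qtr.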

Name the output table ProductSalesInfo.

This is a sample of the output table that will be generated by the ETL process.

[Image: sample output table]

· SalesAggregate – This package will aggregate the data from the ProductSalesInfo table to show the total quantity and total sales amount for each product.

The data source for this package will be the ProductSalesInfo table.

For the source query, select the ProductSalesInfo columns that came from these fields: Production.Product.Name, Production.ProductCategory.Name AS CategoryName, Sales.SalesOrderDetail.OrderQty, Sales.SalesOrderDetail.UnitPrice.

Create a derived column called SalesAmount by multiplying UnitPrice by OrderQty. After the Derived Column transformation, add an Aggregate transformation to the package, and configure it so that the total quantity and the total sales amount are shown for each product name.
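The derive-then-aggregate step can be sketched in Python as a group-by on product name with two sums, which is what an Aggregate transformation configured with Group by on Name and Sum on the other columns produces. This reads Price and Qty as the UnitPrice and OrderQty fields selected above; the output names TotalQty and TotalSalesAmount and the sample rows are made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal


def aggregate_sales(rows):
    """Group rows by product Name and sum OrderQty and SalesAmount.

    Each input row is a dict with Name, OrderQty, and UnitPrice keys.
    Output column names are hypothetical.
    """
    totals = defaultdict(lambda: {"TotalQty": 0, "TotalSalesAmount": Decimal("0")})
    for row in rows:
        # Derived column first: SalesAmount = UnitPrice * OrderQty.
        amount = row["UnitPrice"] * row["OrderQty"]
        bucket = totals[row["Name"]]
        bucket["TotalQty"] += row["OrderQty"]
        bucket["TotalSalesAmount"] += amount
    return dict(totals)


sample = [
    {"Name": "Widget", "OrderQty": 2, "UnitPrice": Decimal("10.00")},
    {"Name": "Widget", "OrderQty": 1, "UnitPrice": Decimal("12.50")},
    {"Name": "Gadget", "OrderQty": 3, "UnitPrice": Decimal("5.00")},
]
result = aggregate_sales(sample)
print(result)
```

If CategoryName should also appear in the output, it can be added as a second group-by column; since each product belongs to at most one subcategory and therefore one category, this should not split any product's totals.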

Name the output table SalesAggregate.

Use a Multicast transformation to export the data both to a flat file and to a SQL Server table. Name the flat file SalesAggregate.txt and the SQL Server table SalesAggregate.
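The Multicast idea, sending an identical copy of every row to each attached output, can be sketched in Python with two sinks. An in-memory buffer stands in for SalesAggregate.txt and an in-memory SQLite table stands in for the SQL Server SalesAggregate table; both substitutions, the column names, and the sample rows are assumptions made to keep the example self-contained.

```python
import csv
import io
import sqlite3


def multicast(rows, outputs):
    """Send an identical copy of every row to each output callable."""
    for row in rows:
        for write in outputs:
            write(row)


# Output 1: flat file (an in-memory buffer stands in for SalesAggregate.txt).
flat_file = io.StringIO()
writer = csv.writer(flat_file)
writer.writerow(["Name", "TotalQty", "TotalSalesAmount"])  # header row

# Output 2: database table (SQLite stands in for SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesAggregate (Name TEXT, TotalQty INTEGER, TotalSalesAmount REAL)"
)


def to_table(row):
    conn.execute("INSERT INTO SalesAggregate VALUES (?, ?, ?)", row)


rows = [("Widget", 3, 32.5), ("Gadget", 3, 15.0)]
multicast(rows, [writer.writerow, to_table])
conn.commit()

print(flat_file.getvalue())
print(conn.execute("SELECT * FROM SalesAggregate").fetchall())
```

Unlike a Conditional Split, a Multicast does not choose between outputs: every output receives every row.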

This is a sample of the output table that will be generated by the ETL process.

[Image: sample output table]

4. Save all of the packages in the same project.

5. In each package, truncate the destination tables before the data is loaded (for example, with an Execute SQL Task at the start of the control flow).

6. Give meaningful names to the data flow tasks, source components, destination components, and any other tasks or transformations.

7. Zip the project folder and name the zipped folder with your last and first name.

8. Upload the zipped folder to Canvas.
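Step 5 is what makes a package safe to rerun. The sketch below simulates two runs of a hypothetical package against an in-memory SQLite table. SQL Server would use TRUNCATE TABLE; SQLite has no TRUNCATE statement, so DELETE FROM without a WHERE clause stands in for it here. The table layout and rows are made up.

```python
import sqlite3


def run_package(conn, rows):
    """One hypothetical package run: empty the destination, then load it."""
    conn.execute("DELETE FROM PersonBio")  # stands in for TRUNCATE TABLE PersonBio
    conn.executemany("INSERT INTO PersonBio VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PersonBio (FirstName TEXT, LastName TEXT)")
source = [("Ana", "Ortiz"), ("Ben", "Kim")]

run_package(conn, source)
run_package(conn, source)  # rerun; without the truncate step this would leave 4 rows

count = conn.execute("SELECT COUNT(*) FROM PersonBio").fetchone()[0]
print(count)  # 2
```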

Grading:

Your assignment will be graded on the following criteria:

· Correctness of the data in the destination tables.

· Use of meaningful names for data flow tasks, source components, destination components, and any other tasks or transformations.

· Use of TRUNCATE TABLE commands in every package to empty the destination tables before any other task runs, preventing duplicate rows.

· Creation of the ETL_Data database in SQL Server.


Fall 2023 Dr. Alam, CS 538 Data Mart Project

Title: Design and Implement a Data Mart Part 1: Create a Data Model for a Data Mart using Dimensional Modeling Principles

Prerequisite:

Before beginning this assignment, ensure that you've thoroughly read and understood Chapters 9 (pages 197 to 235) and 10 (pages 237 to 251) on Dimensional Modeling from the course textbook.

Project Context:

You are stepping into the shoes of a Junior BI developer involved in a data mart project. As part of the requirements gathering phase, you have a discussion with Jim Riner, the Sales Manager. Jim identifies a crucial need for deeper sales data analysis that encompasses the following dimensions:

1. Products

2. Customers

3. Dates (Seasonality)

4. Orders

5. Sales Territory

Specific Dimension Requirements:

1. Product Dimension:

· Analyze sales based on categories, subcategories, product names, colors, and models.

· This will help in identifying top-selling items in various categories and attributes.

2. Customer Dimension:

· Explore sales data to determine which customers purchase which items, pinpoint top customers, and analyze sales by the customer's zip, territory, country, and city.

· This information can aid in tailoring promotional offers and understanding buying patterns of valued customers.

3. Date (Seasonality) Dimension:

· Analyze which products have high sales during specific seasons, days, weeks, or years.

· This dimension should include the following attributes: Date Surrogate Key, Date Value, Month, Year, IsHoliday, and Holiday Name.

4. Order Dimension:

· Analyze sales by Order ID, Order Detail ID, and Customer ID.

5. Sales Territory Dimension:

· The analysis should cover territory name, territory group, and country or region code.

· The objective is to determine the profitability of specific geographic locations, products sold there, and revenue comparison between regions.

Assignment Task:

Based on these requirements and your understanding of the star schema from Chapters 9 and 10, your task is to:

1. Design an entity relationship diagram (ERD) for a star schema that connects the central fact table to the required dimension tables.

2. Refer to Figures 9.10 and 9.18 (for the date dimension) in the textbook as guidance.
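To make the star shape concrete, here is a minimal sketch of one possible schema, created in an in-memory SQLite database from Python. The table names are borrowed from Part 2; the dimension attributes follow the requirements above, while the surrogate key names, data types, the CustomerName attribute, and the FactSales measures (OrderQty, UnitPrice, SalesAmount) are assumptions. It is not a substitute for your own ERD or for the textbook figures.

```python
import sqlite3

# One possible star schema: a central fact table whose foreign keys
# point at five dimension tables. Keys, types, and measures are assumptions.
DDL = """
CREATE TABLE DimProduct (
    ProductKey INTEGER PRIMARY KEY,
    ProductName TEXT, Category TEXT, Subcategory TEXT, Color TEXT, Model TEXT
);
CREATE TABLE DimCustomer (
    CustomerKey INTEGER PRIMARY KEY,
    CustomerName TEXT, ZipCode TEXT, City TEXT, Territory TEXT, Country TEXT
);
CREATE TABLE DimDate (
    DateKey INTEGER PRIMARY KEY,
    DateValue TEXT, Month INTEGER, Year INTEGER,
    IsHoliday INTEGER, HolidayName TEXT
);
CREATE TABLE DimOrder (
    OrderKey INTEGER PRIMARY KEY,
    OrderID INTEGER, OrderDetailID INTEGER, CustomerID INTEGER
);
CREATE TABLE DimSalesTerritory (
    SalesTerritoryKey INTEGER PRIMARY KEY,
    TerritoryName TEXT, TerritoryGroup TEXT, CountryRegionCode TEXT
);
CREATE TABLE FactSales (
    ProductKey INTEGER REFERENCES DimProduct (ProductKey),
    CustomerKey INTEGER REFERENCES DimCustomer (CustomerKey),
    DateKey INTEGER REFERENCES DimDate (DateKey),
    OrderKey INTEGER REFERENCES DimOrder (OrderKey),
    SalesTerritoryKey INTEGER REFERENCES DimSalesTerritory (SalesTerritoryKey),
    OrderQty INTEGER, UnitPrice REAL, SalesAmount REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# Each foreign key on FactSales is one point of the star.
fk_targets = sorted(row[2] for row in conn.execute("PRAGMA foreign_key_list(FactSales)"))
print(fk_targets)
```

The design choice the sketch highlights is that dimensions reference nothing: all relationships run from the single fact table outward, which is what distinguishes a star schema from a snowflake.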

Submit your ERD diagram by 10/02/2023.

Good luck, and ensure your design captures the depth and granularity necessary for effective sales data analysis.

Title: Design and Implement a Data Mart Part 2: Create a Data Model for a Data Mart using Dimensional Modeling Principles

In this part of the project, you will implement the data mart designed in your ERD. You will perform Extraction, Transformation, Loading (ETL) tasks to load the required data into the fact table and the dimension tables, and you will create SSIS packages for all the necessary ETL tasks. All the source data is available in the AdventureWorks transactional database.

Instructions:

1. Before starting this project, rename the AdventureWorks database in your SQL Server to AdventureWorks2016CTP3.

2. Create a new database called AdventureWorksDM in your SQL Server. This database will host the fact and dimension tables.

3. Create the SSIS packages that will create the dimension and fact tables.

4. Here is the list of the tables that serve as the dimensions:

a. DimProduct

b. DimCustomer

c. DimDate

d. DimOrder

e. DimSalesTerritory

5. Here is the fact table you will create:

a. FactSales

Important Notes:

· Create a surrogate key for the DimOrderDate table. The surrogate key value is the date value with the slashes removed (MMDDYYYY). For example, the surrogate key for 11/18/2017 is 11182017.

· The DimOrderDate table should contain dates from 1980 until 2050. The Month and Year columns will contain the month and year of each date.

· IsHoliday and HolidayName can apply to a single date (such as the Fourth of July) or to a range of dates (such as Christmas, from December 1 through December 25). Decide which dates are holidays based on US national holidays.
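The three notes above can be sketched together as a Python generator of date-dimension rows. Assumptions: "until 2050" is read as through December 31, 2050; the holiday list is a short, hypothetical sample (fixed-date holidays only, with the Christmas range taken from the note); and the dictionary keys are illustrative column names. Storing the key as an integer drops the leading zero for January through September (01/05/2017 becomes 1052017), which keeps keys unique but means they do not sort in date order: 1/1/2050 gets a smaller key than 12/31/1980. Sort on the date value, not the key.

```python
from datetime import date, timedelta

# Hypothetical holiday list: (name, (start_month, start_day), (end_month, end_day)).
# A single-day holiday has equal start and end. Floating holidays such as
# Thanksgiving, and ranges that cross New Year, would need extra logic.
HOLIDAYS = [
    ("New Year's Day", (1, 1), (1, 1)),
    ("Independence Day", (7, 4), (7, 4)),
    ("Christmas", (12, 1), (12, 25)),
]


def date_key(d: date) -> int:
    """Surrogate key: the date as MM/DD/YYYY with the slashes removed."""
    return int(d.strftime("%m%d%Y"))


def holiday_name(d: date):
    """Return the name of the holiday covering d, or None."""
    for name, start, end in HOLIDAYS:
        if start <= (d.month, d.day) <= end:
            return name
    return None


def dim_date_rows(first_year=1980, last_year=2050):
    """Yield one row per calendar day, Jan 1 of first_year through Dec 31 of last_year."""
    d = date(first_year, 1, 1)
    end = date(last_year, 12, 31)
    while d <= end:
        name = holiday_name(d)
        yield {
            "DateKey": date_key(d),
            "DateValue": d,
            "Month": d.month,
            "Year": d.year,
            "IsHoliday": name is not None,
            "HolidayName": name,
        }
        d += timedelta(days=1)


rows = list(dim_date_rows())
print(len(rows))  # 25933
print(date_key(date(2017, 11, 18)))  # 11182017
```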
