Module 0: Foundations of Applied AI Engineering
Lesson 0.2: Designing Technical Architecture
The "Why": A model without metrics is an opinion.
Key Concepts & Tools
- Statistics:
  - Mean, median, variance
  - Distributions
  - Hypothesis testing
  - Correlation vs. causation
- Python Libraries:
  - pandas.DataFrame.describe()
  - matplotlib/seaborn for plotting
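As a quick illustration of the statistics listed above, here is a minimal sketch using pandas. The toy DataFrame and its column names (`age`, `income`) are invented for this example and are not part of the course dataset; the single large income value is there only to show how an outlier pulls the mean away from the median.

```python
import pandas as pd

# Hypothetical toy data: one extreme "income" value acts as an outlier.
df = pd.DataFrame({
    "age": [23, 35, 41, 29, 52],
    "income": [30_000, 42_000, 45_000, 38_000, 400_000],
})

# describe() reports count, mean, std, min, quartiles, and max per numeric column.
summary = df.describe()

mean_income = df["income"].mean()      # sensitive to the outlier
median_income = df["income"].median()  # robust to the outlier
var_age = df["age"].var()              # sample variance (ddof=1 by default)

# Pearson correlation measures linear association only; it says nothing about causation.
corr = df["age"].corr(df["income"])

print(summary)
print(f"mean income = {mean_income:.0f}, median income = {median_income:.0f}")
print(f"age variance = {var_age:.1f}, age/income correlation = {corr:.2f}")
```

A large gap between mean and median, as in the `income` column here, is often the first hint that a distribution is skewed and worth plotting.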
Hands-On Exercise (GitHub Commit)
Task: Create a script eda.py or a notebook 00-exploratory-data-analysis.ipynb.
- Load your dataset into a pandas DataFrame.
- Compute descriptive statistics using .describe().
- Generate and save visualizations:
  - Histograms for key numerical features.
  - Box plots for categorical vs. numerical data.
  - A correlation heatmap.
- Add a markdown summary of your initial findings.
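One possible shape for eda.py is sketched below. It is a starting point under stated assumptions, not a reference solution: the function name `run_eda`, the `category_col` parameter, the `figures` output directory, and the toy data in the `__main__` block are all hypothetical, and the heatmap is drawn with plain matplotlib rather than seaborn to keep dependencies small (`seaborn.heatmap` would be a natural substitute).

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd


def run_eda(df: pd.DataFrame, category_col: str, out_dir: str = "figures") -> pd.DataFrame:
    """Save histograms, box plots, and a correlation heatmap; return describe()."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    numeric = df.select_dtypes(include="number")

    # Histograms for every numerical feature.
    numeric.hist(bins=20, figsize=(8, 6))
    plt.tight_layout()
    plt.savefig(out / "histograms.png")
    plt.close("all")

    # Box plots: each numerical feature grouped by the categorical column.
    for col in numeric.columns:
        df.boxplot(column=col, by=category_col)
        plt.savefig(out / f"box_{col}.png")
        plt.close("all")

    # Correlation heatmap of the numerical features (Pearson by default).
    corr = numeric.corr()
    fig, ax = plt.subplots()
    im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax)
    fig.tight_layout()
    fig.savefig(out / "correlation_heatmap.png")
    plt.close(fig)

    return df.describe()


if __name__ == "__main__":
    # Placeholder data; in the exercise, replace this with pd.read_csv(...) on your dataset.
    toy = pd.DataFrame({
        "group": ["a", "b", "a", "b", "a", "b"],
        "height": [1.6, 1.8, 1.7, 1.9, 1.5, 1.75],
        "weight": [55.0, 80.0, 62.0, 90.0, 50.0, 77.0],
    })
    print(run_eda(toy, category_col="group"))
```

Returning the `describe()` table from the function makes it easy to paste the numbers into the markdown summary alongside the saved figures.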
Instructor Notes & Pitfalls
Pitfall: Students will be tempted to skip deep EDA to get to the "fun" modeling part.
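Guidance: Emphasize that EDA is the most critical step: it is the main defense against the "garbage-in, garbage-out" problem, because data issues found here are far cheaper to fix than after a model has been trained on them. Ask them: "What's the most surprising thing you found in the data? How might that affect the model you build?"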
Our Git branching flow follows this pattern:
graph TD
A[main] --> B{Create feature branch}
B --> C[git checkout -b feature/new-login]
C --> D[Make commits...]
D --> E{Open Pull Request}
E --> F[Code Review & Approval]
F --> G[Merge to main]
G --> A