Commit ece180e

πŸ΄β€β˜ οΈ Add Black/Flake8 linting & GitHub Actions CI/CD - code clean as a well-swabbed deck, yarr
1 parent a29a5f7 commit ece180e

10 files changed: +755 additions, -205 deletions

.flake8

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+[flake8]
+max-line-length = 88
+exclude =
+    .git,
+    __pycache__,
+    .venv,
+    .eggs,
+    *.egg,
+    build,
+    dist,
+    .pytest_cache
+
+ignore =
+    E203,
+    W503,
+    E501
+
+# Show source code for each error
+show-source = True
+show-pep8 = True
+statistics = True
+count = True
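Black's documentation recommends disabling E203 (whitespace before `:`) and W503 (line break before a binary operator) when Flake8 runs alongside it, because Black's own output can trigger them. The module below is purely illustrative (made-up names) and is laid out the way Black formats a slice with expression bounds and an arithmetic expression too long for 88 columns. Two details of this config are worth knowing. First, Flake8's `ignore` replaces its default ignore list (which already includes W503 and W504) instead of extending it, so W504 and the other default-ignored codes become active again; `extend-ignore` is the option that adds to the defaults. Second, because E501 is ignored, `max-line-length = 88` no longer produces line-length errors in plain Flake8, which leaves line length to Black.

```python
"""Illustrative Black-style code (hypothetical names) showing the two layouts
that pycodestyle reports as E203 and W503 and that this .flake8 ignores."""
from typing import List


def middle_window(values: List[int], start: int, offset: int) -> List[int]:
    # E203 pattern: Black puts spaces around ":" when slice bounds are expressions.
    return values[start + offset : start + offset + 3]


def weighted_total(
    weighted_language_count: int,
    weighted_database_count: int,
    weighted_platform_count: int,
    weighted_webframe_count: int,
) -> int:
    # W503 pattern: when an expression does not fit in 88 columns, Black breaks
    # the line *before* each binary operator.
    return (
        weighted_language_count
        + weighted_database_count
        + weighted_platform_count
        + weighted_webframe_count
    )
```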

.github/workflows/ci.yml

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# Yarr! GitHub Actions workflow to keep our Python treasure chest clean and tested!
+name: CI/CD Pipeline
+
+on:
+  push:
+    branches: [ main, develop ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  test-and-lint:
+    runs-on: ubuntu-latest
+
+    strategy:
+      matrix:
+        python-version: [3.8, 3.9, '3.10', '3.11', '3.12']
+
+    steps:
+    - uses: actions/checkout@v4
+
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v4
+      with:
+        python-version: ${{ matrix.python-version }}
+
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install -r requirements.txt
+
+    - name: Run Black formatter check
+      run: |
+        black --check --diff app/ tests/
+
+    - name: Run Flake8 linter
+      run: |
+        flake8 app/ tests/
+
+    - name: Run tests with pytest
+      run: |
+        pytest tests/ -v --tb=short
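The workflow's three quality gates can be reproduced locally before pushing. The following is a hypothetical helper (nothing like it is part of this commit) that runs the same commands in the same order and stops at the first failure, mirroring how a failing step ends the job in GitHub Actions. It assumes Black, Flake8 and pytest are installed in the active environment; since the workflow installs only `requirements.txt`, those tools presumably need to be listed there for CI to find them. The `runner` parameter exists only so the sequencing can be exercised without the real tools.

```python
"""Hypothetical local mirror of the CI workflow's lint and test steps."""
import subprocess
from typing import Callable, List, Sequence

# Same commands, in the same order, as the Black, Flake8 and pytest steps.
CHECKS: List[List[str]] = [
    ["black", "--check", "--diff", "app/", "tests/"],
    ["flake8", "app/", "tests/"],
    ["pytest", "tests/", "-v", "--tb=short"],
]


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Run each command in turn; return the first non-zero exit code, else 0."""
    for cmd in checks:
        print("$", " ".join(cmd), flush=True)
        result = runner(list(cmd))
        if result.returncode != 0:
            # Stop here, as a failing step ends the job in GitHub Actions.
            return result.returncode
    return 0
```

Calling `sys.exit(run_checks())` from a small script gives the same pass/fail exit status a CI run would report.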

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -56,6 +56,12 @@ data/*.csv
 data/*.json
 data/*.xlsx
 
+# Large data files - too big for GitHub's hold
+data/kaggle_so_2023_data/
+data/kaggle_so_2023_data.zip
+*.csv
+*.zip
+
 # Test coverage
 .coverage
 htmlcov/
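GitHub blocks pushes that contain a file larger than 100 MiB, which is the limit the new comment alludes to. The helper below is hypothetical (not part of this commit): it walks a directory tree and reports files over that limit so oversized data can be caught before `git add`. Note that patterns without a slash, such as the new `*.csv` and `*.zip`, match at every directory level, so zipped datasets in `data/` are ignored as well unless force-added with `git add -f`.

```python
"""List files larger than GitHub's per-file limit (hypothetical helper)."""
import os
from pathlib import Path
from typing import List, Tuple

# GitHub blocks pushes that contain a file larger than 100 MiB.
GITHUB_FILE_LIMIT = 100 * 1024 * 1024
SKIP_DIRS = {".git", ".venv"}  # assumed: metadata and virtualenvs are never committed


def oversized_files(
    root: str, limit: int = GITHUB_FILE_LIMIT
) -> List[Tuple[str, int]]:
    """Return sorted (path relative to root, size in bytes) for files over limit."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = Path(dirpath, name)
            if not path.is_file():  # skips e.g. broken symlinks
                continue
            size = path.stat().st_size
            if size > limit:
                hits.append((str(path.relative_to(root)), size))
    return sorted(hits)
```

Running `oversized_files(".")` from the repository root lists anything GitHub would reject.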

README.md

Lines changed: 62 additions & 20 deletions
@@ -56,27 +56,69 @@ This application is designed specifically for **data analysts** who need:
 
 ## 🏴‍☠️ Setup Instructions
 
-### 1. Data Setup
-
-The Stack Overflow 2023 survey data is provided as a compressed zip file to keep the repository size manageable:
-
-**Option A: Automatic Extraction (Recommended)**
-- The application will automatically extract `data/kaggle_so_2023_data.zip` when first run
-- No manual action needed - just start the server!
-
-**Option B: Manual Extraction**
-```bash
-# Navigate to the data directory
-cd data
-
-# Extract the zip file
-unzip kaggle_so_2023_data.zip
-
-# This creates the kaggle_so_2023/ directory with:
-# - survey_results_public.csv (151MB - main survey responses)
-# - survey_results_schema.csv (data schema and column descriptions)
-# - Additional documentation files
+### 1. Data Setup (Smart Zip Management System)
+
+The application features an **intelligent data management system** designed for data analysts who work with multiple datasets:
+
+#### 🤖 Automatic Data Source Detection
+- **Zero Configuration**: Drop any survey data zip file into `data/` folder
+- **Auto-Extraction**: Zip files are automatically extracted on application startup
+- **Smart Detection**: CSV files are automatically discovered and configured
+- **Technology Analysis**: Columns with semicolon-separated tech lists are auto-detected
+
+#### 📦 Current Data Sources
+- **Stack Overflow 2023**: `kaggle_so_2023.zip` (20MB compressed → 151MB extracted)
+  - Contains `survey_results_public.csv` with 89,000+ developer responses
+  - Includes `survey_results_schema.csv` with column definitions
+  - Pre-configured with 8 technology analysis categories
+
+#### ➕ Adding New Data Sources (Open-Ended Design)
+Perfect for data analysts working with multiple survey datasets:
+
+1. **Prepare Your Data**:
+   ```
+   your_survey_data/
+   ├── main_survey_responses.csv # Main data (any CSV name works)
+   ├── schema_definitions.csv # Optional (detected by "schema" in name)
+   └── documentation.txt # Additional files (ignored)
+   ```
+
+2. **Create Zip Archive**:
+   ```bash
+   zip -r your_survey_2024.zip your_survey_data/
+   ```
+
+3. **Deploy to Application**:
+   ```bash
+   cp your_survey_2024.zip /path/to/project/data/
+   # Application auto-detects and configures on next startup
+   ```
+
+4. **Automatic Configuration**:
+   - Main data file detected (largest CSV or one with "survey"/"results" in name)
+   - Schema file detected (contains "schema" in filename)
+   - Technology columns identified (contain "language", "database", "platform", etc.)
+   - New data source registered and available in dashboard
+
+#### 🔍 Data Format Requirements
+- **Primary Format**: CSV files with semicolon-separated technology lists
+- **Column Detection**: Automatic detection of technology-related columns
+- **Schema Support**: Optional schema files for column descriptions
+- **Size Limit**: Zip files should be under GitHub's 100MB limit
+
+#### 📊 Example Multi-Source Setup
 ```
+data/
+├── kaggle_so_2023.zip # Stack Overflow 2023
+├── kaggle_so_2023/ # Auto-extracted
+├── github_dev_survey_2024.zip # Your GitHub survey
+├── github_dev_survey_2024/ # Auto-extracted
+├── company_internal_survey.zip # Internal survey
+├── company_internal_survey/ # Auto-extracted
+└── .gitignore # Excludes CSV files, includes zips
+```
+
+Each data source becomes automatically available in the dashboard with detected technology categories!
 
 **Data Contents:**
 - `survey_results_public.csv` - Main survey responses (151MB)
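The README describes the detection rules, but the code that implements them is not part of this excerpt. The sketch below is one hypothetical way to implement those rules with the standard library. The function names, the rule that a "survey"/"results" name match takes priority over file size, the per-archive target folder `data/<zip name>/` (taken from the example tree), and the 200-row sample used to spot `;`-separated columns are all assumptions rather than the application's actual behavior.

```python
"""Hypothetical sketch of the zip-based data-source detection described above.

Not the application's code: names, priorities and sampling are assumptions.
"""
import csv
import zipfile
from itertools import islice
from pathlib import Path
from typing import Dict, List, Optional

# Keywords named in the README; its "etc." is left open here.
TECH_KEYWORDS = ("language", "database", "platform")
SAMPLE_ROWS = 200  # assumed number of rows inspected for ";"-separated values


def extract_new_zips(data_dir: Path) -> List[Path]:
    """Extract each data/<name>.zip into data/<name>/ unless that folder exists."""
    extracted = []
    for archive in sorted(data_dir.glob("*.zip")):
        target = data_dir / archive.stem
        if not target.exists():
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(target)
            extracted.append(target)
    return extracted


def find_csvs(folder: Path) -> Dict[str, Optional[Path]]:
    """Pick the main data CSV and an optional schema CSV anywhere under folder."""
    csvs = sorted(p for p in folder.rglob("*.csv") if p.is_file())
    schemas = [p for p in csvs if "schema" in p.name.lower()]
    data = [p for p in csvs if p not in schemas]
    named = [
        p for p in data if any(k in p.name.lower() for k in ("survey", "results"))
    ]
    # Assumed priority: name matches first, otherwise any CSV; largest file wins.
    pool = named or data
    main = max(pool, key=lambda p: p.stat().st_size) if pool else None
    return {"main": main, "schema": schemas[0] if schemas else None}


def tech_columns(csv_path: Path) -> List[str]:
    """Columns whose name has a tech keyword or whose sampled values contain ';'."""
    with csv_path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        rows = list(islice(reader, SAMPLE_ROWS))
        columns = reader.fieldnames or []
    found = []
    for col in columns:
        by_name = any(k in col.lower() for k in TECH_KEYWORDS)
        by_value = any(";" in (row.get(col) or "") for row in rows)
        if by_name or by_value:
            found.append(col)
    return found


def split_techs(cell: str) -> List[str]:
    """Turn one "Python;SQL; Rust" cell into ["Python", "SQL", "Rust"]."""
    return [t.strip() for t in cell.split(";") if t.strip()]
```

Reading only a sample of rows keeps detection cheap even for the 151MB Stack Overflow file; a full pass is only needed when the dashboard actually counts technologies.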
