A Django REST Framework API for validating spreadsheet data against XLSForm specifications.
The API checks that:
- All column headers in the spreadsheet match question names in the XLSForm
- All values in the spreadsheet match the expected types and constraints defined in the XLSForm
- Required questions have values
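As a rough illustration of the first check, the sketch below compares the header row of a CSV file with a list of question names. It uses only the standard library rather than the project's own code, and the function name `unknown_headers` is hypothetical.

```python
import csv
import io


def unknown_headers(csv_text, question_names):
    """Return the spreadsheet headers that are not XLSForm question names."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader, [])  # first row holds the column headers
    known = set(question_names)
    return [header for header in headers if header not in known]


if __name__ == "__main__":
    sample = "name,age,colour\nAnn,3,red\n"
    print(unknown_headers(sample, ["name", "age"]))
```

An empty list means every column header corresponds to a question in the form.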
Validates a spreadsheet against an XLSForm specification.
- Content-Type: multipart/form-data
- Body:
  - xlsform_file: The XLSForm file (Excel format)
  - spreadsheet_file: The spreadsheet file to validate (Excel or CSV format)
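A minimal sketch of building such a request with only the Python standard library is shown below. The endpoint path in `VALIDATE_URL` is an assumption, since this README does not state it, and the helper names and uploaded file names are hypothetical; only the two field names come from the description above.

```python
import io
import urllib.request
import uuid

# Assumed URL: the host is Django's default development server address,
# but the path is a placeholder. Replace it with the real endpoint.
VALIDATE_URL = "http://localhost:8000/api/validate/"


def encode_multipart(files):
    """Encode {field: (filename, bytes)} as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    for field, (filename, content) in files.items():
        body.write(f"--{boundary}\r\n".encode())
        body.write(
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"\r\n'.encode()
        )
        body.write(b"Content-Type: application/octet-stream\r\n\r\n")
        body.write(content)
        body.write(b"\r\n")
    body.write(f"--{boundary}--\r\n".encode())
    return body.getvalue(), f"multipart/form-data; boundary={boundary}"


def build_request(xlsform_bytes, spreadsheet_bytes, url=VALIDATE_URL):
    """Build a POST request carrying both files under the documented field names."""
    body, content_type = encode_multipart(
        {
            "xlsform_file": ("form.xlsx", xlsform_bytes),
            "spreadsheet_file": ("data.csv", spreadsheet_bytes),
        }
    )
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

Passing the result to `urllib.request.urlopen` sends the request. A client library such as requests would shorten this considerably, at the cost of an extra dependency.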
For valid spreadsheets:
{
  "result": "valid"
}
For invalid spreadsheets:
{
  "result": "invalid",
  "errors": [
    {
      "line": 1,
      "column": 3,
      "error_type": "type_mismatch",
      "error_explanation": "Value 'text' is not a valid integer for question 'age'",
      "question_name": "age"
    },
    ...
  ]
}
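One way a client might turn these responses into readable messages is sketched below. The function name `summarize` is hypothetical; the JSON keys are the ones shown in the examples above.

```python
import json


def summarize(response_text):
    """Return human-readable lines for a validation response body."""
    payload = json.loads(response_text)
    if payload.get("result") == "valid":
        return ["Spreadsheet is valid."]
    lines = []
    for err in payload.get("errors", []):
        lines.append(
            f"line {err['line']}, column {err['column']} "
            f"[{err['error_type']}] {err['question_name']}: "
            f"{err['error_explanation']}"
        )
    return lines


if __name__ == "__main__":
    body = json.dumps(
        {
            "result": "invalid",
            "errors": [
                {
                    "line": 1,
                    "column": 3,
                    "error_type": "type_mismatch",
                    "error_explanation": "Value 'text' is not a valid integer for question 'age'",
                    "question_name": "age",
                }
            ],
        }
    )
    for line in summarize(body):
        print(line)
```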
Error types:
- type_mismatch: The value does not match the expected type
- error_constraint_unsatisfied: The value does not satisfy the constraint
- error_value_required: A required value is missing
- Clone the repository:
git clone https://github.com/madewulf/spreadsheet-xlsform-validator.git
cd spreadsheet-xlsform-validator
- Install dependencies:
For standalone web application:
pip install -r requirements.txt
For core functionality only (when using as reusable Django app):
pip install -r requirements-core.txt
For deployment to production:
pip install -r requirements-core.txt -r requirements-deploy.txt
- Run migrations:
python manage.py migrate
- Start the development server:
python manage.py runserver
Run the tests:
python manage.py test
XLSForm is a form standard created to help simplify the authoring of forms in Excel. For more information, see XLSForm.org.
An XLSForm consists of two main sheets:
- survey: Contains the questions and their properties
- choices: Contains the choices for select questions
The API validates that:
- All column headers in the spreadsheet match question names in the survey sheet
- Values for integer questions are valid integers
- Values for select_one questions are valid choices from the choices sheet
- Values for decimal questions are valid decimals
- Required questions have values
- Values satisfy any constraints defined in the XLSForm
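The type and required-value checks in this list can be sketched in plain Python as follows. This is an illustration, not the validator's actual implementation: the `question` dict shape and the function name `check_value` are hypothetical, reporting an invalid choice as `type_mismatch` is an assumption, and constraint checks are left out because they need an XPath evaluator.

```python
from decimal import Decimal, InvalidOperation


def check_value(value, question, choices=None):
    """Return an error type string, or None if the value passes.

    `question` is a hypothetical dict such as
    {"type": "integer", "required": True}.
    """
    text = "" if value is None else str(value).strip()
    if text == "":
        # Empty cells only fail when the question is required.
        return "error_value_required" if question.get("required") else None
    qtype = question["type"]
    if qtype == "integer":
        try:
            int(text)
        except ValueError:
            return "type_mismatch"
    elif qtype == "decimal":
        try:
            Decimal(text)
        except InvalidOperation:
            return "type_mismatch"
    elif qtype == "select_one":
        # Assumption: an unknown choice is reported as a type mismatch.
        if text not in (choices or ()):
            return "type_mismatch"
    return None


if __name__ == "__main__":
    print(check_value("text", {"type": "integer"}))
    print(check_value("", {"type": "integer", "required": True}))
    print(check_value("red", {"type": "select_one"}, ["red", "green"]))
```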
This validator implements core XLSForm functionality but does not yet support all features of the XLSForm specification. The following features are not implemented or are only partially supported:
Feature | Status | Description |
---|---|---|
relevant column | ❌ Not implemented | Conditional logic to show/hide questions based on other responses |
calculation column | ❌ Not implemented | Automatic calculation of values based on other question responses |
Advanced constraint expressions | Partial | Complex XPath expressions may not be fully supported |
Repeat groups | ❌ Not implemented | Repeating sections of questions |
Advanced question types | Partial | Some specialized question types may not be validated |
This validator uses the following libraries:
- pyxform: For parsing XLSForm files and converting them to internal format
- elementpath (XPath1Parser): For validating XPath constraint expressions
- pandas & openpyxl: For processing Excel and CSV files
You can test this validator online at: https://data-validator.bluesquare.org/
This project can be used as a reusable Django app in your own Django projects. To install:
pip install django-xlsform-validator
Or install from source:
git clone https://github.com/madewulf/spreadsheet-xlsform-validator.git
cd spreadsheet-xlsform-validator
pip install -e .
Then add to your Django project:
# settings.py
INSTALLED_APPS = [
    # ...
    'django_xlsform_validator',
]
# urls.py
from django.urls import include, path

urlpatterns = [
    # ...
    path('validator/', include('django_xlsform_validator.urls', namespace='django_xlsform_validator')),
]
For more details on configuration options and usage, see the Reusable App Documentation.
- Install AWS CLI:
pip install awscli
- Configure AWS credentials:
aws configure
- Install EB CLI:
pip install awsebcli
- Initialize Elastic Beanstalk application:
eb init --platform python-3.9 --region us-east-1
- Create environment:
eb create production --database.engine postgres --database.username ebroot
- Set environment variables:
eb setenv DEBUG=False SECRET_KEY=your-secret-key-here ALLOWED_HOSTS=.elasticbeanstalk.com
eb setenv DB_ENGINE=django.db.backends.postgresql DB_NAME=ebdb DB_USER=ebroot DB_PASSWORD=your-db-password DB_HOST=your-rds-endpoint DB_PORT=5432
- Deploy application:
eb deploy
- Open application:
eb open
Set these environment variables in the Elastic Beanstalk console:
- DEBUG: Set to False for production
- SECRET_KEY: Generate a secure secret key for Django
- ALLOWED_HOSTS: Set to your domain (e.g., .elasticbeanstalk.com)
- DB_ENGINE: django.db.backends.postgresql (recommended for production)
- DB_NAME: Database name (default: ebdb)
- DB_USER: Database username
- DB_PASSWORD: Database password
- DB_HOST: RDS endpoint
- DB_PORT: Database port (default: 5432)
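For orientation, a settings module could read these variables roughly as sketched below. This is an assumption about how such a module might look, not a copy of the project's settings.py: the function name `settings_from_env` is hypothetical, and the fallback values for DB_ENGINE, DB_NAME, and DB_PORT simply mirror the values listed above.

```python
import os


def settings_from_env(env=os.environ):
    """Build the Django settings that depend on the variables listed above."""
    hosts = env.get("ALLOWED_HOSTS", "")
    return {
        # Any value other than "true" (case-insensitive) disables debug mode.
        "DEBUG": env.get("DEBUG", "False").strip().lower() == "true",
        # No fallback on purpose: a missing secret key should fail loudly.
        "SECRET_KEY": env["SECRET_KEY"],
        # Comma-separated list, e.g. ".elasticbeanstalk.com,example.org"
        "ALLOWED_HOSTS": [h.strip() for h in hosts.split(",") if h.strip()],
        "DATABASES": {
            "default": {
                "ENGINE": env.get("DB_ENGINE", "django.db.backends.postgresql"),
                "NAME": env.get("DB_NAME", "ebdb"),
                "USER": env.get("DB_USER", ""),
                "PASSWORD": env.get("DB_PASSWORD", ""),
                "HOST": env.get("DB_HOST", ""),
                "PORT": env.get("DB_PORT", "5432"),
            }
        },
    }
```

Taking the environment as a parameter keeps the function easy to exercise with a plain dict in tests.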
- The application uses PostgreSQL in production (recommended over SQLite)
- Static files are served using WhiteNoise
- Database migrations run automatically on deployment
- A default admin user is created (username: admin, password: changeme123). Change this password immediately.