
Class Balance Analyzer

Analyze and fix dataset imbalance

Upload or paste your class labels to analyze dataset balance. Get class distribution, imbalance severity, and calculated class weights for PyTorch, TensorFlow, and Scikit-learn.

CSV Format: The first row should be a header row, e.g. "label,value,target"; any column can serve as the label column.

Understanding Class Imbalance

  • Imbalance Ratio: The ratio of the largest class to the smallest class. Ratio > 2 usually indicates imbalance.
  • Minority Class %: The percentage of the smallest class. Below 10% is typically problematic.
  • Severity: LOW (ratio < 2), MEDIUM (ratio 2-10), HIGH (ratio > 10).
  • Why it matters: Models trained on imbalanced data often ignore the minority class entirely.
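
The metrics above can be sketched in a few lines of Python. The helper name diagnose_imbalance is illustrative, not part of any library; its thresholds simply mirror the severity bands listed here.

```python
from collections import Counter

def diagnose_imbalance(labels):
    """Summarize class counts, imbalance ratio, minority %, and severity.

    Severity bands follow this page: LOW (< 2), MEDIUM (2-10), HIGH (> 10).
    """
    counts = Counter(labels)
    largest = max(counts.values())
    smallest = min(counts.values())
    ratio = largest / smallest
    minority_pct = 100.0 * smallest / len(labels)
    if ratio < 2:
        severity = "LOW"
    elif ratio <= 10:
        severity = "MEDIUM"
    else:
        severity = "HIGH"
    return {"counts": dict(counts), "ratio": ratio,
            "minority_pct": minority_pct, "severity": severity}
```

For example, 95 samples of one class and 5 of another gives a ratio of 19 and a minority share of 5%, i.e. HIGH severity.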

Solutions for Class Imbalance

EASY Class Weights

Assign higher weights to minority classes during training. Fastest solution, works for most cases. Built into most frameworks.

👍 Pro: Simple, no data changes
👎 Con: May not work for extreme imbalance
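
To see how class weights act on training, here is a rough illustration of a weighted binary cross-entropy (weighted_log_loss is a hypothetical name; real frameworks apply the same per-class scaling internally, e.g. the weight argument of PyTorch's CrossEntropyLoss or the class_weight argument of Keras's fit):

```python
import math

def weighted_log_loss(y_true, p_pred, class_weight):
    """Binary cross-entropy with each sample's loss scaled by its
    class weight: misclassifying a minority sample costs more, so the
    model cannot cheaply ignore the minority class."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        w = class_weight[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

With weights {0: 1.0, 1: 10.0}, a confidently wrong prediction on a class-1 sample incurs ten times the loss of an equally wrong prediction on a class-0 sample.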

MEDIUM Oversampling

Duplicate minority class samples or use SMOTE to generate synthetic samples. Increases training data.

👍 Pro: Effective, preserves info
👎 Con: Risk of overfitting
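
A minimal sketch of duplication-based oversampling (random_oversample is an illustrative helper, not a library function; libraries such as imbalanced-learn provide production implementations, including SMOTE):

```python
import random

def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples at random until every class
    matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(v) for v in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        X_out.extend(samples + extra)
        y_out.extend([label] * target)
    return X_out, y_out
```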

MEDIUM Undersampling

Remove majority class samples to balance classes. Reduces the amount of training data but speeds up training.

👍 Pro: Faster training
👎 Con: Loses information
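
The mirror image of the oversampling sketch above (random_undersample is likewise an illustrative name, not a library function):

```python
import random

def random_undersample(X, y, seed=0):
    """Randomly drop majority-class samples until every class matches
    the size of the smallest class. Discarded samples carry information
    the model never sees -- the 'loses information' con above."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = min(len(v) for v in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        X_out.extend(rng.sample(samples, target))
        y_out.extend([label] * target)
    return X_out, y_out
```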

HARD SMOTE / Advanced Techniques

Synthetic Minority Over-sampling Technique: creates synthetic samples by interpolating between existing minority samples and their nearest neighbors.

👍 Pro: Best for severe imbalance
👎 Con: Complex, requires tuning
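
The core interpolation idea behind SMOTE can be sketched without the library machinery (smote_like is an illustrative name; it assumes purely numeric features and at least two minority samples -- for real use, see imbalanced-learn's SMOTE):

```python
import math
import random

def smote_like(samples, n_new, k=2, seed=0):
    """Generate synthetic minority samples by interpolating between a
    random minority sample and one of its k nearest minority
    neighbours. `samples` is a list of numeric feature vectors drawn
    from the minority class only."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Each synthetic point lies on the line segment between two real minority samples, so it stays inside the region the minority class already occupies.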

EASY Threshold Adjustment

Move the decision threshold away from the default 0.5. Trades precision against recall.

👍 Pro: No retraining needed
👎 Con: Only for binary classification
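
A sketch of threshold adjustment on predicted positive-class probabilities (predict_with_threshold is a hypothetical helper name):

```python
def predict_with_threshold(probs, threshold=0.5):
    """Convert positive-class probabilities into 0/1 labels with a
    custom decision threshold. Lowering it below 0.5 catches more of
    the minority (positive) class at the cost of more false positives,
    with no retraining."""
    return [1 if p >= threshold else 0 for p in probs]
```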

EASY Different Metrics

Use F1 score or precision-recall AUC instead of accuracy. Better suited to imbalanced data.

👍 Pro: Free, immediate
👎 Con: Measures the problem, doesn't fix it
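
For reference, F1 computed directly from confusion-matrix counts -- a sketch of why it behaves better than accuracy on imbalanced data:

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall. Unlike accuracy, it
    ignores true negatives, so a model that always predicts the
    majority class scores zero instead of, say, 95%."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A classifier that never predicts the positive class has tp = 0 and therefore F1 = 0, however high its accuracy on a 95/5 split.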

Class Weights Explained

For a binary classification problem with 95 negatives and 5 positives:

  • Balanced formula: weight = total_samples / (num_classes × class_count)
  • Example: Negative weight = 100 / (2 × 95) = 0.53, Positive weight = 100 / (2 × 5) = 10
  • Interpretation: With weights of 10 vs 0.53, each positive sample carries roughly 19× the loss contribution of each negative sample during training
  • Alternative: Some prefer weight = 1 / class_count, others use log scaling
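
The worked example above can be checked in a couple of lines (balanced_weights is an illustrative helper implementing the balanced formula; scikit-learn's class_weight='balanced' option documents the same heuristic):

```python
def balanced_weights(counts):
    """weight = total_samples / (num_classes * class_count)
    for each class, given a {class: count} mapping."""
    total = sum(counts.values())
    k = len(counts)
    return {c: total / (k * n) for c, n in counts.items()}

weights = balanced_weights({"negative": 95, "positive": 5})
# negative: 100 / (2 * 95) = 0.53, positive: 100 / (2 * 5) = 10
```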
