aryadomain committed
Commit 533920b · verified · 1 Parent(s): ef8f3ad

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.

Files changed (50)
  1. Reward_sana_idealized/README.md +41 -0
  2. Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_1/evaluation_results.txt +4 -0
  3. Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_1/log.log +203 -0
  4. Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_2/evaluation_results.txt +4 -0
  5. Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_2/log.log +258 -0
  6. Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_2/lr_curve.png +0 -0
  7. Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_3/evaluation_results.txt +4 -0
  8. Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_3/log.log +218 -0
  9. Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_3/lr_curve.png +0 -0
  10. Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_4/evaluation_results.txt +4 -0
  11. Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_4/log.log +218 -0
  12. Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_4/lr_curve.png +0 -0
  13. Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_4/rewards_curve.png +0 -0
  14. Reward_sana_idealized/__pycache__/eval.cpython-311.pyc +0 -0
  15. Reward_sana_idealized/__pycache__/gradient_ascent_utils.cpython-311.pyc +0 -0
  16. Reward_sana_idealized/blip/__init__.py +1 -0
  17. Reward_sana_idealized/blip/__pycache__/__init__.cpython-311.pyc +0 -0
  18. Reward_sana_idealized/blip/__pycache__/blip.cpython-311.pyc +0 -0
  19. Reward_sana_idealized/blip/__pycache__/blip_pretrain.cpython-311.pyc +0 -0
  20. Reward_sana_idealized/blip/__pycache__/med.cpython-311.pyc +0 -0
  21. Reward_sana_idealized/blip/blip.py +70 -0
  22. Reward_sana_idealized/blip/blip_pretrain.py +43 -0
  23. Reward_sana_idealized/config_analysis_tuning.ipynb +218 -0
  24. Reward_sana_idealized/eval.py +1447 -0
  25. Reward_sana_idealized/examples.sh +162 -0
  26. Reward_sana_idealized/grad_ascent_configs.py +67 -0
  27. Reward_sana_idealized/gradient_ascent_utils.py +391 -0
  28. Reward_sana_idealized/hpsv2_score.py +110 -0
  29. Reward_sana_idealized/imagereward_score.py +221 -0
  30. Reward_sana_idealized/lr_scheduler.py +233 -0
  31. Reward_sana_idealized/models/__pycache__/__init__.cpython-311.pyc +0 -0
  32. Reward_sana_idealized/open_clip/__pycache__/coca_model.cpython-311.pyc +0 -0
  33. Reward_sana_idealized/open_clip/__pycache__/factory.cpython-311.pyc +0 -0
  34. Reward_sana_idealized/open_clip/__pycache__/model.cpython-311.pyc +0 -0
  35. Reward_sana_idealized/open_clip/__pycache__/modified_resnet.cpython-311.pyc +0 -0
  36. Reward_sana_idealized/open_clip/__pycache__/pretrained.cpython-311.pyc +0 -0
  37. Reward_sana_idealized/open_clip/__pycache__/push_to_hf_hub.cpython-311.pyc +0 -0
  38. Reward_sana_idealized/open_clip/__pycache__/timm_model.cpython-311.pyc +0 -0
  39. Reward_sana_idealized/open_clip/__pycache__/tokenizer.cpython-311.pyc +0 -0
  40. Reward_sana_idealized/open_clip/__pycache__/transformer.cpython-311.pyc +0 -0
  41. Reward_sana_idealized/open_clip/model_configs/convnext_xlarge.json +19 -0
  42. Reward_sana_idealized/pick_score.py +141 -0
  43. Reward_sana_idealized/test.ipynb +47 -0
  44. Reward_sana_idealized/tune_hyperparams.py +514 -0
  45. Reward_sana_idealized/tune_parallel.sh +253 -0
  46. Reward_sdxl_idealized/models/__pycache__/__init__.cpython-310.pyc +0 -0
  47. Reward_sdxl_idealized/models/__pycache__/__init__.cpython-313.pyc +0 -0
  48. Reward_sdxl_idealized/models/__pycache__/__init__.cpython-39.pyc +0 -0
  49. Reward_sdxl_idealized/models/__pycache__/reward_model.cpython-39.pyc +0 -0
  50. Reward_sdxl_idealized/models/__pycache__/reward_model_sdxl.cpython-310.pyc +0 -0
Reward_sana_idealized/README.md ADDED
@@ -0,0 +1,41 @@
+ # Reward SANA Idealized
+
+ This folder is a SANA-only reward-guided inference package.
+
+ ## What is inside
+
+ - `models/reward_model.py`
+   - Local SANA reward wrapper (no trainer import from other directories).
+   - Loads base SANA diffusers modules and local reward checkpoint weights.
+ - `pipelines/sana_reward_pipeline.py`
+   - SANA pipeline with per-step reward tracking.
+ - `pipelines/sana_gradient_ascent_pipeline.py`
+   - SANA pipeline with gradient-ascent latent updates.
+ - `eval.py`
+   - End-to-end evaluation script.
+ - `examples.sh`
+   - Cluster entrypoint for prefetch and evaluation.
+
+ ## Default checkpoint
+
+ `examples.sh` defaults to:
+
+ `/g/data/rr81/LPO/lrm/lrm_sana/logs/v8/reward_model/step_sana_sana_600m_512_variable-t_lr1e-5_step-8000_filter2_time951/checkpoint-gstep76000`
+
+ Override with:
+
+ ```bash
+ LRM_MODEL_PATH=/path/to/checkpoint-dir-or-model.safetensors
+ ```
+
+ ## Run (10-sample smoke test)
+
+ ```bash
+ cd /g/data/rr81/LPO/Reward_sana_idealized
+ OFFLINE_MODE=1 MAX_SAMPLES=10 MODE=gradient_ascent MODEL_PROFILE=sana_600m_512 ./examples.sh
+ ```
+
+ ## Notes
+
+ - Uses existing Python env: `/g/data/rr81/aev/bin/python`.
+ - GPU nodes should run with offline HF cache.
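
The override and the smoke test in this README combine into a single invocation. A minimal sketch, assuming `examples.sh` reads `LRM_MODEL_PATH` from the environment as described above; the checkpoint path is a placeholder:

```bash
# Hypothetical combined run: point the evaluation at a specific reward
# checkpoint, then run the offline 10-sample gradient-ascent smoke test.
cd /g/data/rr81/LPO/Reward_sana_idealized
LRM_MODEL_PATH=/path/to/checkpoint-dir-or-model.safetensors \
OFFLINE_MODE=1 MAX_SAMPLES=10 MODE=gradient_ascent MODEL_PROFILE=sana_600m_512 \
./examples.sh
```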
Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_1/evaluation_results.txt ADDED
@@ -0,0 +1,4 @@
+ mode: baseline
+ metrics: ['clip', 'aesthetic', 'pickscore', 'hpsv2', 'hpsv21', 'imagereward']
+ config: {'num_samples': 500, 'num_steps': 20, 'cfg_scale': 4.5, 'grad_range': [0, 700], 'grad_steps': 5, 'grad_step_size': 0.1}
+ baseline: {'avg_reward': np.float64(0.6854833755493164), 'clip_score': np.float64(26.60960610508919), 'aesthetic_score': np.float64(5.930574191093445), 'pickscore': np.float64(21.89574451446533), 'hpsv2_score': np.float16(0.2805), 'hpsv21_score': np.float16(0.292), 'imagereward_score': np.float64(1.001599932681769)}
Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_1/log.log ADDED
@@ -0,0 +1,203 @@
+ ======================================================================
+ FID EVALUATION: BASELINE vs GRADIENT ASCENT
+ ======================================================================
+
+ Logging to: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_1/log.log
+
+ Device: cuda:0
+ Dataset: PICKAPIC
+ Data directory: ./data
+ Base model: Efficient-Large-Model/Sana_600M_512px_diffusers
+ Model variant: sana_600m_512
+ LRM model: /g/data/rr81/LPO/lrm/lrm_sana/logs/v8/reward_model/step_sana_sana_600m_512_variable-t_lr1e-5_step-8000_filter2_time951/checkpoint-gstep76000
+ HF cache dir: /scratch/rr81/ma5430/.cache/huggingface/hub
+ HF offline mode: True
+ Inference steps: 20
+ CFG scale: 4.5
+ Batch size: 1
+ Max samples: All
+ Output directory: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_1
+ Save images: False
+ Evaluation mode: baseline
+ Metrics to evaluate: CLIP, AESTHETIC, PICKSCORE, HPSV2, HPSV21, IMAGEREWARD
+ Gradient ascent config: one_step_rectification_config
+
+ ======================================================================
+ 1. LOADING VALIDATION DATA
+ ======================================================================
+ Loading Pick-a-Pic validation prompts...
+ Loading cached Pick-a-Pic split 'validation_unique' from 1 parquet shards
+ cache=/scratch/rr81/ma5430/.cache/huggingface/hub/datasets--pickapic-anonymous--pickapic_v1
+ Loaded 500 Pick-a-Pic validation samples
+
+ ======================================================================
+ 2. LOADING REWARD MODEL
+ ======================================================================
+ Loading SANA base reward backbone from Efficient-Large-Model/Sana_600M_512px_diffusers...
+ Loading SANA reward checkpoint from /g/data/rr81/LPO/lrm/lrm_sana/logs/v8/reward_model/step_sana_sana_600m_512_variable-t_lr1e-5_step-8000_filter2_time951/checkpoint-gstep76000/model.safetensors...
+ ✓ Loaded checkpoint keys: 1214
+ ✓ Missing keys: 0 | Unexpected keys: 0
+ ✓ SANA LRM Reward Model initialized successfully!
+ ✓ Reward model loaded
+
+ ======================================================================
+ 3. LOADING PIPELINE
+ ======================================================================
+ ✓ Loaded SANA base model: Efficient-Large-Model/Sana_600M_512px_diffusers
+ ✓ Reward model attached to SANA pipeline
+ ✓ Pipeline loaded
+ GPU memory before scorer load: 126.00 GB free / 140.06 GB total
+ Scorer device: cuda:0
+
+ ======================================================================
+ 3.5. LOADING CLIP AND AESTHETIC SCORERS
+ ======================================================================
+ ✓ CLIP scorer loaded
+ ✓ Aesthetic scorer loaded
+ ✓ PickScore scorer loaded
+ ✓ HPSv2 scorer loaded
+ ✓ HPSv2.1 scorer loaded
+ load checkpoint from /scratch/rr81/ma5430/.cache/huggingface/hub/models--THUDM--ImageReward/snapshots/5736be03b2652728fb87788c9797b0570450ab72/ImageReward.pt
+ checkpoint loaded
+ ✓ ImageReward scorer loaded
+
+ ======================================================================
+ 4. CONFIGURING GRADIENT ASCENT
+ ======================================================================
+ Loading gradient ascent config: one_step_rectification_config
+ Config loaded: {'grad_timestep_range': (200, 800), 'num_grad_steps': 1, 'grad_step_size': 1.0, 'grad_scale': 1.0, 'lr_scheduler_type': 'constant', 'use_momentum': False, 'use_nesterov': False, 'use_iso_projection': False}
+ Gradient timestep range: (200, 800)
+ Gradient steps: 1
+ Gradient step size (initial LR): 1.0
+ LR Scheduler: constant
+ ✓ Gradient ascent enabled for timesteps (200, 800)
+
+ ======================================================================
+ 5. EVALUATING BASELINE
+ ======================================================================
+
+ Generating images with baseline mode...
+
+ [baseline] Batch 10/500 | Samples: 10/500 | Reward (t=136.0): 0.9946 | Reward (Avg): 0.7891 | CLIP: 25.9197 | Aesthetic: 6.1225 | PickScore: 21.9931 | HPSv2: 0.2854 | HPSv2.1: 0.3103 | ImageReward: 1.2588
+
+ [baseline] Batch 20/500 | Samples: 20/500 | Reward (t=136.0): 0.1036 | Reward (Avg): 0.7397 | CLIP: 26.1207 | Aesthetic: 6.0740 | PickScore: 22.1701 | HPSv2: 0.2849 | HPSv2.1: 0.3103 | ImageReward: 1.1364
+
+ [baseline] Batch 30/500 | Samples: 30/500 | Reward (t=136.0): 0.9990 | Reward (Avg): 0.7536 | CLIP: 26.2919 | Aesthetic: 6.0006 | PickScore: 22.3621 | HPSv2: 0.2859 | HPSv2.1: 0.3066 | ImageReward: 1.1024
+
+ [baseline] Batch 40/500 | Samples: 40/500 | Reward (t=136.0): 0.4502 | Reward (Avg): 0.7308 | CLIP: 26.6222 | Aesthetic: 6.0754 | PickScore: 22.3074 | HPSv2: 0.2844 | HPSv2.1: 0.3040 | ImageReward: 0.9845
+
+ [baseline] Batch 50/500 | Samples: 50/500 | Reward (t=136.0): 0.2013 | Reward (Avg): 0.7104 | CLIP: 26.4461 | Aesthetic: 6.0134 | PickScore: 22.1421 | HPSv2: 0.2832 | HPSv2.1: 0.3013 | ImageReward: 1.0774
+
+ [baseline] Batch 60/500 | Samples: 60/500 | Reward (t=136.0): 0.8906 | Reward (Avg): 0.7145 | CLIP: 26.3397 | Aesthetic: 6.0189 | PickScore: 22.1258 | HPSv2: 0.2837 | HPSv2.1: 0.3013 | ImageReward: 1.0667
+
+ [baseline] Batch 70/500 | Samples: 70/500 | Reward (t=136.0): 0.9961 | Reward (Avg): 0.6977 | CLIP: 26.5825 | Aesthetic: 5.9899 | PickScore: 22.1627 | HPSv2: 0.2839 | HPSv2.1: 0.3013 | ImageReward: 1.0167
+
+ [baseline] Batch 80/500 | Samples: 80/500 | Reward (t=136.0): 0.9014 | Reward (Avg): 0.6814 | CLIP: 26.4912 | Aesthetic: 5.9676 | PickScore: 22.0616 | HPSv2: 0.2832 | HPSv2.1: 0.2974 | ImageReward: 0.9867
+
+ [baseline] Batch 90/500 | Samples: 90/500 | Reward (t=136.0): 0.9990 | Reward (Avg): 0.6827 | CLIP: 26.6833 | Aesthetic: 5.9604 | PickScore: 22.0380 | HPSv2: 0.2830 | HPSv2.1: 0.2976 | ImageReward: 1.0014
+
+ [baseline] Batch 100/500 | Samples: 100/500 | Reward (t=136.0): 0.0362 | Reward (Avg): 0.6914 | CLIP: 26.9976 | Aesthetic: 5.9972 | PickScore: 21.9799 | HPSv2: 0.2822 | HPSv2.1: 0.2957 | ImageReward: 1.0027
+
+ [baseline] Batch 110/500 | Samples: 110/500 | Reward (t=136.0): 0.5342 | Reward (Avg): 0.6817 | CLIP: 27.1451 | Aesthetic: 5.9820 | PickScore: 21.9669 | HPSv2: 0.2825 | HPSv2.1: 0.2959 | ImageReward: 1.0009
+
+ [baseline] Batch 120/500 | Samples: 120/500 | Reward (t=136.0): 0.0418 | Reward (Avg): 0.6660 | CLIP: 27.0372 | Aesthetic: 5.9733 | PickScore: 21.9549 | HPSv2: 0.2825 | HPSv2.1: 0.2964 | ImageReward: 1.0380
+
+ [baseline] Batch 130/500 | Samples: 130/500 | Reward (t=136.0): 0.9771 | Reward (Avg): 0.6797 | CLIP: 27.0679 | Aesthetic: 5.9902 | PickScore: 22.0210 | HPSv2: 0.2830 | HPSv2.1: 0.2974 | ImageReward: 1.0360
+
+ [baseline] Batch 140/500 | Samples: 140/500 | Reward (t=136.0): 0.9722 | Reward (Avg): 0.6796 | CLIP: 27.2356 | Aesthetic: 5.9616 | PickScore: 22.0095 | HPSv2: 0.2830 | HPSv2.1: 0.2961 | ImageReward: 1.0353
+
+ [baseline] Batch 150/500 | Samples: 150/500 | Reward (t=136.0): 0.9639 | Reward (Avg): 0.6779 | CLIP: 27.0927 | Aesthetic: 5.9419 | PickScore: 21.9896 | HPSv2: 0.2825 | HPSv2.1: 0.2952 | ImageReward: 1.0313
+
+ [baseline] Batch 160/500 | Samples: 160/500 | Reward (t=136.0): 0.8735 | Reward (Avg): 0.6787 | CLIP: 27.1935 | Aesthetic: 5.9386 | PickScore: 22.0361 | HPSv2: 0.2827 | HPSv2.1: 0.2954 | ImageReward: 1.0422
+
+ [baseline] Batch 170/500 | Samples: 170/500 | Reward (t=136.0): 0.8418 | Reward (Avg): 0.6797 | CLIP: 26.9886 | Aesthetic: 5.9230 | PickScore: 21.9763 | HPSv2: 0.2820 | HPSv2.1: 0.2939 | ImageReward: 1.0346
+
+ [baseline] Batch 180/500 | Samples: 180/500 | Reward (t=136.0): 0.3572 | Reward (Avg): 0.6742 | CLIP: 27.0903 | Aesthetic: 5.9289 | PickScore: 21.9842 | HPSv2: 0.2825 | HPSv2.1: 0.2947 | ImageReward: 1.0590
+
+ [baseline] Batch 190/500 | Samples: 190/500 | Reward (t=136.0): 0.3916 | Reward (Avg): 0.6795 | CLIP: 27.0595 | Aesthetic: 5.9303 | PickScore: 21.9667 | HPSv2: 0.2817 | HPSv2.1: 0.2937 | ImageReward: 1.0438
+
+ [baseline] Batch 200/500 | Samples: 200/500 | Reward (t=136.0): 0.3701 | Reward (Avg): 0.6735 | CLIP: 27.0636 | Aesthetic: 5.9264 | PickScore: 21.9664 | HPSv2: 0.2820 | HPSv2.1: 0.2944 | ImageReward: 1.0526
+
+ [baseline] Batch 210/500 | Samples: 210/500 | Reward (t=136.0): 0.7476 | Reward (Avg): 0.6796 | CLIP: 27.0620 | Aesthetic: 5.9352 | PickScore: 21.9646 | HPSv2: 0.2820 | HPSv2.1: 0.2949 | ImageReward: 1.0607
+
+ [baseline] Batch 220/500 | Samples: 220/500 | Reward (t=136.0): 0.9932 | Reward (Avg): 0.6813 | CLIP: 27.1314 | Aesthetic: 5.9390 | PickScore: 21.9501 | HPSv2: 0.2820 | HPSv2.1: 0.2947 | ImageReward: 1.0481
+
+ [baseline] Batch 230/500 | Samples: 230/500 | Reward (t=136.0): 0.4731 | Reward (Avg): 0.6819 | CLIP: 27.1906 | Aesthetic: 5.9441 | PickScore: 21.9595 | HPSv2: 0.2820 | HPSv2.1: 0.2944 | ImageReward: 1.0323
+
+ [baseline] Batch 240/500 | Samples: 240/500 | Reward (t=136.0): 0.2905 | Reward (Avg): 0.6844 | CLIP: 27.0801 | Aesthetic: 5.9538 | PickScore: 21.9540 | HPSv2: 0.2815 | HPSv2.1: 0.2937 | ImageReward: 1.0183
+
+ [baseline] Batch 250/500 | Samples: 250/500 | Reward (t=136.0): 0.9868 | Reward (Avg): 0.6836 | CLIP: 27.0973 | Aesthetic: 5.9652 | PickScore: 21.9579 | HPSv2: 0.2817 | HPSv2.1: 0.2939 | ImageReward: 1.0174
+
+ [baseline] Batch 260/500 | Samples: 260/500 | Reward (t=136.0): 0.6987 | Reward (Avg): 0.6825 | CLIP: 27.0369 | Aesthetic: 5.9730 | PickScore: 21.9534 | HPSv2: 0.2817 | HPSv2.1: 0.2939 | ImageReward: 1.0270
+
+ [baseline] Batch 270/500 | Samples: 270/500 | Reward (t=136.0): 1.0000 | Reward (Avg): 0.6849 | CLIP: 27.0198 | Aesthetic: 5.9743 | PickScore: 21.9475 | HPSv2: 0.2820 | HPSv2.1: 0.2939 | ImageReward: 1.0323
+
+ [baseline] Batch 280/500 | Samples: 280/500 | Reward (t=136.0): 0.9316 | Reward (Avg): 0.6886 | CLIP: 27.0667 | Aesthetic: 5.9771 | PickScore: 21.9809 | HPSv2: 0.2822 | HPSv2.1: 0.2949 | ImageReward: 1.0492
+
+ [baseline] Batch 290/500 | Samples: 290/500 | Reward (t=136.0): 0.8652 | Reward (Avg): 0.6863 | CLIP: 26.9701 | Aesthetic: 5.9660 | PickScore: 21.9452 | HPSv2: 0.2820 | HPSv2.1: 0.2939 | ImageReward: 1.0336
+
+ [baseline] Batch 300/500 | Samples: 300/500 | Reward (t=136.0): 0.9995 | Reward (Avg): 0.6888 | CLIP: 26.9522 | Aesthetic: 5.9680 | PickScore: 21.9559 | HPSv2: 0.2822 | HPSv2.1: 0.2947 | ImageReward: 1.0375
+
+ [baseline] Batch 310/500 | Samples: 310/500 | Reward (t=136.0): 0.9971 | Reward (Avg): 0.6894 | CLIP: 27.0262 | Aesthetic: 5.9725 | PickScore: 21.9700 | HPSv2: 0.2825 | HPSv2.1: 0.2949 | ImageReward: 1.0583
+
+ [baseline] Batch 320/500 | Samples: 320/500 | Reward (t=136.0): 0.8667 | Reward (Avg): 0.6890 | CLIP: 27.0493 | Aesthetic: 5.9674 | PickScore: 21.9830 | HPSv2: 0.2825 | HPSv2.1: 0.2947 | ImageReward: 1.0534
+
+ [baseline] Batch 330/500 | Samples: 330/500 | Reward (t=136.0): 0.9683 | Reward (Avg): 0.6922 | CLIP: 27.0304 | Aesthetic: 5.9733 | PickScore: 21.9784 | HPSv2: 0.2822 | HPSv2.1: 0.2947 | ImageReward: 1.0603
+
+ [baseline] Batch 340/500 | Samples: 340/500 | Reward (t=136.0): 0.8975 | Reward (Avg): 0.6945 | CLIP: 27.0049 | Aesthetic: 5.9703 | PickScore: 21.9790 | HPSv2: 0.2822 | HPSv2.1: 0.2949 | ImageReward: 1.0708
+
+ [baseline] Batch 350/500 | Samples: 350/500 | Reward (t=136.0): 0.0694 | Reward (Avg): 0.6900 | CLIP: 27.0073 | Aesthetic: 5.9688 | PickScore: 21.9897 | HPSv2: 0.2822 | HPSv2.1: 0.2949 | ImageReward: 1.0626
+
+ [baseline] Batch 360/500 | Samples: 360/500 | Reward (t=136.0): 0.9307 | Reward (Avg): 0.6921 | CLIP: 27.0431 | Aesthetic: 5.9667 | PickScore: 21.9860 | HPSv2: 0.2825 | HPSv2.1: 0.2952 | ImageReward: 1.0531
+
+ [baseline] Batch 370/500 | Samples: 370/500 | Reward (t=136.0): 0.9175 | Reward (Avg): 0.6917 | CLIP: 26.9788 | Aesthetic: 5.9615 | PickScore: 21.9723 | HPSv2: 0.2822 | HPSv2.1: 0.2949 | ImageReward: 1.0486
+
+ [baseline] Batch 380/500 | Samples: 380/500 | Reward (t=136.0): 0.3616 | Reward (Avg): 0.6916 | CLIP: 27.0571 | Aesthetic: 5.9690 | PickScore: 21.9754 | HPSv2: 0.2822 | HPSv2.1: 0.2952 | ImageReward: 1.0540
+
+ [baseline] Batch 390/500 | Samples: 390/500 | Reward (t=136.0): 0.9912 | Reward (Avg): 0.6914 | CLIP: 26.9386 | Aesthetic: 5.9658 | PickScore: 21.9601 | HPSv2: 0.2820 | HPSv2.1: 0.2944 | ImageReward: 1.0402
+
+ [baseline] Batch 400/500 | Samples: 400/500 | Reward (t=136.0): 0.0252 | Reward (Avg): 0.6910 | CLIP: 26.8978 | Aesthetic: 5.9574 | PickScore: 21.9578 | HPSv2: 0.2820 | HPSv2.1: 0.2942 | ImageReward: 1.0424
+
+ [baseline] Batch 410/500 | Samples: 410/500 | Reward (t=136.0): 0.2007 | Reward (Avg): 0.6909 | CLIP: 26.8640 | Aesthetic: 5.9528 | PickScore: 21.9488 | HPSv2: 0.2815 | HPSv2.1: 0.2937 | ImageReward: 1.0236
+
+ [baseline] Batch 420/500 | Samples: 420/500 | Reward (t=136.0): 0.9917 | Reward (Avg): 0.6889 | CLIP: 26.8175 | Aesthetic: 5.9515 | PickScore: 21.9370 | HPSv2: 0.2815 | HPSv2.1: 0.2935 | ImageReward: 1.0200
+
+ [baseline] Batch 430/500 | Samples: 430/500 | Reward (t=136.0): 0.2085 | Reward (Avg): 0.6878 | CLIP: 26.8390 | Aesthetic: 5.9464 | PickScore: 21.9401 | HPSv2: 0.2815 | HPSv2.1: 0.2935 | ImageReward: 1.0237
+
+ [baseline] Batch 440/500 | Samples: 440/500 | Reward (t=136.0): 0.7144 | Reward (Avg): 0.6867 | CLIP: 26.7663 | Aesthetic: 5.9488 | PickScore: 21.9390 | HPSv2: 0.2815 | HPSv2.1: 0.2935 | ImageReward: 1.0194
+
+ [baseline] Batch 450/500 | Samples: 450/500 | Reward (t=136.0): 0.8086 | Reward (Avg): 0.6877 | CLIP: 26.7831 | Aesthetic: 5.9422 | PickScore: 21.9401 | HPSv2: 0.2812 | HPSv2.1: 0.2932 | ImageReward: 1.0113
+
+ [baseline] Batch 460/500 | Samples: 460/500 | Reward (t=136.0): 0.6558 | Reward (Avg): 0.6879 | CLIP: 26.7559 | Aesthetic: 5.9414 | PickScore: 21.9357 | HPSv2: 0.2810 | HPSv2.1: 0.2927 | ImageReward: 1.0081
+
+ [baseline] Batch 470/500 | Samples: 470/500 | Reward (t=136.0): 0.6001 | Reward (Avg): 0.6873 | CLIP: 26.6761 | Aesthetic: 5.9320 | PickScore: 21.9162 | HPSv2: 0.2808 | HPSv2.1: 0.2920 | ImageReward: 0.9957
+
+ [baseline] Batch 480/500 | Samples: 480/500 | Reward (t=136.0): 0.7988 | Reward (Avg): 0.6846 | CLIP: 26.6003 | Aesthetic: 5.9285 | PickScore: 21.9059 | HPSv2: 0.2805 | HPSv2.1: 0.2920 | ImageReward: 0.9969
+
+ [baseline] Batch 490/500 | Samples: 490/500 | Reward (t=136.0): 0.8433 | Reward (Avg): 0.6850 | CLIP: 26.6023 | Aesthetic: 5.9283 | PickScore: 21.8971 | HPSv2: 0.2805 | HPSv2.1: 0.2915 | ImageReward: 0.9980
+
+ [baseline] Batch 500/500 | Samples: 500/500 | Reward (t=136.0): 0.9902 | Reward (Avg): 0.6855 | CLIP: 26.6096 | Aesthetic: 5.9306 | PickScore: 21.8957 | HPSv2: 0.2805 | HPSv2.1: 0.2920 | ImageReward: 1.0016
+ ✓ Baseline Avg Reward: 0.6855
+ ✓ Baseline Avg CLIP Score: 26.6096
+ ✓ Baseline Avg Aesthetic Score: 5.9306
+ ✓ Baseline Avg PickScore: 21.8957
+ ✓ Baseline Avg HPSv2 Score: 0.2805
+ ✓ Baseline Avg HPSv2.1 Score: 0.2920
+ ✓ Baseline Avg ImageReward: 1.0016
+
+ ======================================================================
+ FINAL RESULTS
+ ======================================================================
+
+ Baseline:
+ Avg Reward: 0.6855
+ Avg CLIP Score: 26.6096
+ Avg Aesthetic: 5.9306
+ Avg PickScore: 21.8957
+ Avg HPSv2: 0.2805
+ Avg HPSv2.1: 0.2920
+ Avg ImageReward: 1.0016
+
+ ✓ Results saved to: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_1/evaluation_results.txt
+
+ ======================================================================
Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_2/evaluation_results.txt ADDED
@@ -0,0 +1,4 @@
+ mode: gradient_ascent
+ metrics: ['clip', 'aesthetic', 'pickscore', 'hpsv2', 'hpsv21', 'imagereward']
+ config: {'num_samples': 500, 'num_steps': 20, 'cfg_scale': 4.5, 'grad_range': [0, 700], 'grad_steps': 5, 'grad_step_size': 0.1}
+ gradient_ascent: {'avg_reward': np.float64(0.910064338684082), 'clip_score': np.float64(26.665978475391864), 'aesthetic_score': np.float64(5.9369088306427), 'pickscore': np.float64(21.87682523727417), 'hpsv2_score': np.float16(0.28), 'hpsv21_score': np.float16(0.2903), 'imagereward_score': np.float64(0.9915356585062655), 'stats': {'num_applications': 10, 'total_reward_improvement': 0.00537109375, 'avg_reward_improvement': 0.000537109375, 'avg_grad_norm': 0.016338474582880735, 'max_grad_norm': 0.022640923038125038, 'detailed_stats': [{'timestep': 785, 'initial_reward': 0.986328125, 'final_reward': 0.9873046875, 'reward_improvement': 0.0009765625, 'grad_norms': [0.022640923038125038], 'reward_history': [0.986328125, 0.986328125], 'lr_history': [1.0], 'latent_change': 0.9999995827674866}, {'timestep': 749, 'initial_reward': 0.98779296875, 'final_reward': 0.98828125, 'reward_improvement': 0.00048828125, 'grad_norms': [0.021537061780691147], 'reward_history': [0.98779296875, 0.98779296875], 'lr_history': [1.0], 'latent_change': 0.9999995827674866}, {'timestep': 710, 'initial_reward': 0.98828125, 'final_reward': 0.98876953125, 'reward_improvement': 0.00048828125, 'grad_norms': [0.018791837617754936], 'reward_history': [0.98828125, 0.98828125], 'lr_history': [1.0], 'latent_change': 0.9999994039535522}, {'timestep': 666, 'initial_reward': 0.98876953125, 'final_reward': 0.990234375, 'reward_improvement': 0.00146484375, 'grad_norms': [0.020797280594706535], 'reward_history': [0.98876953125, 0.98876953125], 'lr_history': [1.0], 'latent_change': 0.9999995231628418}, {'timestep': 617, 'initial_reward': 0.98974609375, 'final_reward': 0.98974609375, 'reward_improvement': 0.0, 'grad_norms': [0.021422632038593292], 'reward_history': [0.98974609375, 0.98974609375], 'lr_history': [1.0], 'latent_change': 0.9999995231628418}, {'timestep': 562, 'initial_reward': 0.98974609375, 'final_reward': 0.990234375, 'reward_improvement': 0.00048828125, 'grad_norms': [0.014982366934418678], 'reward_history': [0.98974609375, 0.98974609375], 'lr_history': [1.0], 'latent_change': 0.9999992251396179}, {'timestep': 499, 'initial_reward': 0.990234375, 'final_reward': 0.99072265625, 'reward_improvement': 0.00048828125, 'grad_norms': [0.012734953314065933], 'reward_history': [0.990234375, 0.990234375], 'lr_history': [1.0], 'latent_change': 0.9999991655349731}, {'timestep': 428, 'initial_reward': 0.99072265625, 'final_reward': 0.9912109375, 'reward_improvement': 0.00048828125, 'grad_norms': [0.011904634535312653], 'reward_history': [0.99072265625, 0.99072265625], 'lr_history': [1.0], 'latent_change': 0.9999989867210388}, {'timestep': 345, 'initial_reward': 0.99169921875, 'final_reward': 0.99169921875, 'reward_improvement': 0.0, 'grad_norms': [0.009561818093061447], 'reward_history': [0.99169921875, 0.99169921875], 'lr_history': [1.0], 'latent_change': 0.9999989867210388}, {'timestep': 249, 'initial_reward': 0.99169921875, 'final_reward': 0.9921875, 'reward_improvement': 0.00048828125, 'grad_norms': [0.009011237882077694], 'reward_history': [0.99169921875, 0.99169921875], 'lr_history': [1.0], 'latent_change': 0.9999988675117493}]}}
Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_2/log.log ADDED
@@ -0,0 +1,258 @@
+ ======================================================================
+ FID EVALUATION: BASELINE vs GRADIENT ASCENT
+ ======================================================================
+
+ Logging to: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_2/log.log
+
+ Device: cuda:0
+ Dataset: PICKAPIC
+ Data directory: ./data
+ Base model: Efficient-Large-Model/Sana_600M_512px_diffusers
+ Model variant: sana_600m_512
+ LRM model: /g/data/rr81/LPO/lrm/lrm_sana/logs/v8/reward_model/step_sana_sana_600m_512_variable-t_lr1e-5_step-8000_filter2_time951/checkpoint-gstep76000
+ HF cache dir: /scratch/rr81/ma5430/.cache/huggingface/hub
+ HF offline mode: True
+ Inference steps: 20
+ CFG scale: 4.5
+ Batch size: 1
+ Max samples: All
+ Output directory: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_2
+ Save images: False
+ Evaluation mode: gradient_ascent
+ Metrics to evaluate: CLIP, AESTHETIC, PICKSCORE, HPSV2, HPSV21, IMAGEREWARD
+ Gradient ascent config: one_step_rectification_config
+
+ ======================================================================
+ 1. LOADING VALIDATION DATA
+ ======================================================================
+ Loading Pick-a-Pic validation prompts...
+ Loading cached Pick-a-Pic split 'validation_unique' from 1 parquet shards
+ cache=/scratch/rr81/ma5430/.cache/huggingface/hub/datasets--pickapic-anonymous--pickapic_v1
+ Loaded 500 Pick-a-Pic validation samples
+
+ ======================================================================
+ 2. LOADING REWARD MODEL
+ ======================================================================
+ Loading SANA base reward backbone from Efficient-Large-Model/Sana_600M_512px_diffusers...
+ Loading SANA reward checkpoint from /g/data/rr81/LPO/lrm/lrm_sana/logs/v8/reward_model/step_sana_sana_600m_512_variable-t_lr1e-5_step-8000_filter2_time951/checkpoint-gstep76000/model.safetensors...
+ ✓ Loaded checkpoint keys: 1214
+ ✓ Missing keys: 0 | Unexpected keys: 0
+ ✓ SANA LRM Reward Model initialized successfully!
+ ✓ Reward model loaded
+
+ ======================================================================
+ 3. LOADING PIPELINE
+ ======================================================================
+ ✓ Loaded SANA base model: Efficient-Large-Model/Sana_600M_512px_diffusers
+ ✓ Reward model attached to SANA pipeline
+ ✓ Pipeline loaded
+ GPU memory before scorer load: 93.09 GB free / 140.06 GB total
+ Scorer device: cuda:0
+
+ ======================================================================
+ 3.5. LOADING CLIP AND AESTHETIC SCORERS
+ ======================================================================
+ ✓ CLIP scorer loaded
+ ✓ Aesthetic scorer loaded
+ ✓ PickScore scorer loaded
+ ✓ HPSv2 scorer loaded
+ ✓ HPSv2.1 scorer loaded
+ load checkpoint from /scratch/rr81/ma5430/.cache/huggingface/hub/models--THUDM--ImageReward/snapshots/5736be03b2652728fb87788c9797b0570450ab72/ImageReward.pt
+ checkpoint loaded
+ ✓ ImageReward scorer loaded
+
+ ======================================================================
+ 4. CONFIGURING GRADIENT ASCENT
+ ======================================================================
+ Loading gradient ascent config: one_step_rectification_config
+ Config loaded: {'grad_timestep_range': (200, 800), 'num_grad_steps': 1, 'grad_step_size': 1.0, 'grad_scale': 1.0, 'lr_scheduler_type': 'constant', 'use_momentum': False, 'use_nesterov': False, 'use_iso_projection': False}
+ Gradient timestep range: (200, 800)
+ Gradient steps: 1
+ Gradient step size (initial LR): 1.0
+ LR Scheduler: constant
+ ✓ Gradient ascent enabled for timesteps (200, 800)
+
+ ======================================================================
+ 6. EVALUATING GRADIENT ASCENT
+ ======================================================================
+
+ Generating images with gradient_ascent mode...
+
+ [gradient_ascent] Batch 10/500 | Samples: 10/500 | Reward (t=136.0): 0.9961 | Reward (Avg): 0.9693 | CLIP: 26.4394 | Aesthetic: 6.0884 | PickScore: 21.9325 | HPSv2: 0.2861 | HPSv2.1: 0.3071 | ImageReward: 1.2097
+
+ [gradient_ascent] Batch 20/500 | Samples: 20/500 | Reward (t=136.0): 0.4675 | Reward (Avg): 0.9257 | CLIP: 26.3258 | Aesthetic: 6.0671 | PickScore: 22.1625 | HPSv2: 0.2861 | HPSv2.1: 0.3086 | ImageReward: 1.1082
+
+ [gradient_ascent] Batch 30/500 | Samples: 30/500 | Reward (t=136.0): 0.9980 | Reward (Avg): 0.9233 | CLIP: 26.4155 | Aesthetic: 5.9805 | PickScore: 22.3315 | HPSv2: 0.2861 | HPSv2.1: 0.3057 | ImageReward: 1.0953
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+
+ [gradient_ascent] Batch 40/500 | Samples: 40/500 | Reward (t=136.0): 0.9722 | Reward (Avg): 0.9284 | CLIP: 26.8101 | Aesthetic: 6.0611 | PickScore: 22.3091 | HPSv2: 0.2849 | HPSv2.1: 0.3037 | ImageReward: 0.9902
+
+ [gradient_ascent] Batch 50/500 | Samples: 50/500 | Reward (t=136.0): 0.6968 | Reward (Avg): 0.9302 | CLIP: 26.6150 | Aesthetic: 5.9952 | PickScore: 22.1516 | HPSv2: 0.2832 | HPSv2.1: 0.3005 | ImageReward: 1.0679
+
+ [gradient_ascent] Batch 60/500 | Samples: 60/500 | Reward (t=136.0): 0.9868 | Reward (Avg): 0.9303 | CLIP: 26.4753 | Aesthetic: 6.0003 | PickScore: 22.1196 | HPSv2: 0.2837 | HPSv2.1: 0.3000 | ImageReward: 1.0506
+
+ [gradient_ascent] Batch 70/500 | Samples: 70/500 | Reward (t=136.0): 0.9966 | Reward (Avg): 0.9293 | CLIP: 26.7216 | Aesthetic: 5.9899 | PickScore: 22.1551 | HPSv2: 0.2842 | HPSv2.1: 0.3005 | ImageReward: 0.9863
+
+ [gradient_ascent] Batch 80/500 | Samples: 80/500 | Reward (t=136.0): 0.9819 | Reward (Avg): 0.9147 | CLIP: 26.6154 | Aesthetic: 5.9622 | PickScore: 22.0316 | HPSv2: 0.2830 | HPSv2.1: 0.2959 | ImageReward: 0.9452
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+
+ [gradient_ascent] Batch 90/500 | Samples: 90/500 | Reward (t=136.0): 0.9990 | Reward (Avg): 0.9137 | CLIP: 26.7968 | Aesthetic: 5.9589 | PickScore: 22.0089 | HPSv2: 0.2825 | HPSv2.1: 0.2959 | ImageReward: 0.9702
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+
+ [gradient_ascent] Batch 100/500 | Samples: 100/500 | Reward (t=136.0): 0.7866 | Reward (Avg): 0.9190 | CLIP: 27.0602 | Aesthetic: 6.0023 | PickScore: 21.9608 | HPSv2: 0.2817 | HPSv2.1: 0.2942 | ImageReward: 0.9681
+
+ [gradient_ascent] Batch 110/500 | Samples: 110/500 | Reward (t=136.0): 0.9692 | Reward (Avg): 0.9170 | CLIP: 27.2418 | Aesthetic: 5.9877 | PickScore: 21.9353 | HPSv2: 0.2820 | HPSv2.1: 0.2939 | ImageReward: 0.9614
+
+ [gradient_ascent] Batch 120/500 | Samples: 120/500 | Reward (t=136.0): 0.8379 | Reward (Avg): 0.9146 | CLIP: 27.1626 | Aesthetic: 5.9762 | PickScore: 21.9197 | HPSv2: 0.2820 | HPSv2.1: 0.2942 | ImageReward: 0.9982
+
+ [gradient_ascent] Batch 130/500 | Samples: 130/500 | Reward (t=136.0): 0.9858 | Reward (Avg): 0.9196 | CLIP: 27.2047 | Aesthetic: 5.9938 | PickScore: 21.9912 | HPSv2: 0.2825 | HPSv2.1: 0.2952 | ImageReward: 0.9948
+
+ [gradient_ascent] Batch 140/500 | Samples: 140/500 | Reward (t=136.0): 0.9912 | Reward (Avg): 0.9157 | CLIP: 27.3311 | Aesthetic: 5.9660 | PickScore: 21.9761 | HPSv2: 0.2822 | HPSv2.1: 0.2944 | ImageReward: 0.9935
+
+ [gradient_ascent] Batch 150/500 | Samples: 150/500 | Reward (t=136.0): 0.9932 | Reward (Avg): 0.9114 | CLIP: 27.1722 | Aesthetic: 5.9463 | PickScore: 21.9579 | HPSv2: 0.2817 | HPSv2.1: 0.2930 | ImageReward: 0.9898
+
+ [gradient_ascent] Batch 160/500 | Samples: 160/500 | Reward (t=136.0): 0.9814 | Reward (Avg): 0.9088 | CLIP: 27.2961 | Aesthetic: 5.9422 | PickScore: 21.9990 | HPSv2: 0.2820 | HPSv2.1: 0.2932 | ImageReward: 1.0014
+
+ [gradient_ascent] Batch 170/500 | Samples: 170/500 | Reward (t=136.0): 0.9531 | Reward (Avg): 0.9071 | CLIP: 27.0807 | Aesthetic: 5.9318 | PickScore: 21.9435 | HPSv2: 0.2815 | HPSv2.1: 0.2917 | ImageReward: 0.9945
+
+ [gradient_ascent] Batch 180/500 | Samples: 180/500 | Reward (t=136.0): 0.8799 | Reward (Avg): 0.9030 | CLIP: 27.1614 | Aesthetic: 5.9334 | PickScore: 21.9554 | HPSv2: 0.2817 | HPSv2.1: 0.2927 | ImageReward: 1.0219
+
+ [gradient_ascent] Batch 190/500 | Samples: 190/500 | Reward (t=136.0): 0.8799 | Reward (Avg): 0.9062 | CLIP: 27.1369 | Aesthetic: 5.9368 | PickScore: 21.9370 | HPSv2: 0.2812 | HPSv2.1: 0.2920 | ImageReward: 1.0060
+
+ [gradient_ascent] Batch 200/500 | Samples: 200/500 | Reward (t=136.0): 0.9653 | Reward (Avg): 0.9037 | CLIP: 27.1264 | Aesthetic: 5.9329 | PickScore: 21.9405 | HPSv2: 0.2815 | HPSv2.1: 0.2925 | ImageReward: 1.0175
+
+ [gradient_ascent] Batch 210/500 | Samples: 210/500 | Reward (t=136.0): 0.9897 | Reward (Avg): 0.9063 | CLIP: 27.1528 | Aesthetic: 5.9437 | PickScore: 21.9455 | HPSv2: 0.2815 | HPSv2.1: 0.2932 | ImageReward: 1.0271
+
+ [gradient_ascent] Batch 220/500 | Samples: 220/500 | Reward (t=136.0): 0.9961 | Reward (Avg): 0.9069 | CLIP: 27.2116 | Aesthetic: 5.9480 | PickScore: 21.9343 | HPSv2: 0.2815 | HPSv2.1: 0.2932 | ImageReward: 1.0189
+
+ [gradient_ascent] Batch 230/500 | Samples: 230/500 | Reward (t=136.0): 0.9409 | Reward (Avg): 0.9084 | CLIP: 27.2825 | Aesthetic: 5.9535 | PickScore: 21.9417 | HPSv2: 0.2815 | HPSv2.1: 0.2927 | ImageReward: 1.0007
+
+ [gradient_ascent] Batch 240/500 | Samples: 240/500 | Reward (t=136.0): 0.8716 | Reward (Avg): 0.9101 | CLIP: 27.1896 | Aesthetic: 5.9654 | PickScore: 21.9351 | HPSv2: 0.2810 | HPSv2.1: 0.2920 | ImageReward: 0.9896
+
+ [gradient_ascent] Batch 250/500 | Samples: 250/500 | Reward (t=136.0): 0.9966 | Reward (Avg): 0.9120 | CLIP: 27.2003 | Aesthetic: 5.9738 | PickScore: 21.9403 | HPSv2: 0.2812 | HPSv2.1: 0.2922 | ImageReward: 0.9901
+
+ [gradient_ascent] Batch 260/500 | Samples: 260/500 | Reward (t=136.0): 0.8730 | Reward (Avg): 0.9100 | CLIP: 27.1515 | Aesthetic: 5.9809 | PickScore: 21.9393 | HPSv2: 0.2812 | HPSv2.1: 0.2922 | ImageReward: 1.0035
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+
+ [gradient_ascent] Batch 270/500 | Samples: 270/500 | Reward (t=136.0): 1.0000 | Reward (Avg): 0.9094 | CLIP: 27.1001 | Aesthetic: 5.9807 | PickScore: 21.9312 | HPSv2: 0.2812 | HPSv2.1: 0.2922 | ImageReward: 1.0090
+
+ [gradient_ascent] Batch 280/500 | Samples: 280/500 | Reward (t=136.0): 0.9683 | Reward (Avg): 0.9115 | CLIP: 27.1456 | Aesthetic: 5.9814 | PickScore: 21.9610 | HPSv2: 0.2817 | HPSv2.1: 0.2932 | ImageReward: 1.0251
+
+ [gradient_ascent] Batch 290/500 | Samples: 290/500 | Reward (t=136.0): 0.9956 | Reward (Avg): 0.9111 | CLIP: 27.0581 | Aesthetic: 5.9709 | PickScore: 21.9298 | HPSv2: 0.2812 | HPSv2.1: 0.2925 | ImageReward: 1.0123
+ ⚠️ WARNING: Gradient exists but max value is 0.0
+
+ [gradient_ascent] Batch 300/500 | Samples: 300/500 | Reward (t=136.0): 0.9995 | Reward (Avg): 0.9110 | CLIP: 27.0435 | Aesthetic: 5.9736 | PickScore: 21.9395 | HPSv2: 0.2817 | HPSv2.1: 0.2930 | ImageReward: 1.0170
+
+ [gradient_ascent] Batch 310/500 | Samples: 310/500 | Reward (t=136.0): 0.9980 | Reward (Avg): 0.9117 | CLIP: 27.0912 | Aesthetic: 5.9763 | PickScore: 21.9508 | HPSv2: 0.2817 | HPSv2.1: 0.2932 | ImageReward: 1.0371
+
+ [gradient_ascent] Batch 320/500 | Samples: 320/500 | Reward (t=136.0): 0.9775 | Reward (Avg): 0.9116 | CLIP: 27.1126 | Aesthetic: 5.9707 | PickScore: 21.9669 | HPSv2: 0.2817 | HPSv2.1: 0.2930 | ImageReward: 1.0342
+
+ [gradient_ascent] Batch 330/500 | Samples: 330/500 | Reward (t=136.0): 0.9941 | Reward (Avg): 0.9121 | CLIP: 27.1159 | Aesthetic: 5.9762 | PickScore: 21.9632 | HPSv2: 0.2817 | HPSv2.1: 0.2932 | ImageReward: 1.0420
+
+ [gradient_ascent] Batch 340/500 | Samples: 340/500 | Reward (t=136.0): 0.9785 | Reward (Avg): 0.9129 | CLIP: 27.1023 | Aesthetic: 5.9739 | PickScore: 21.9643 | HPSv2: 0.2817 | HPSv2.1: 0.2935 | ImageReward: 1.0503
+
+ [gradient_ascent] Batch 350/500 | Samples: 350/500 | Reward (t=136.0): 0.5396 | Reward (Avg): 0.9110 | CLIP: 27.1031 | Aesthetic: 5.9720 | PickScore: 21.9747 | HPSv2: 0.2817 | HPSv2.1: 0.2935 | ImageReward: 1.0437
+
+ [gradient_ascent] Batch 360/500 | Samples: 360/500 | Reward (t=136.0): 0.9805 | Reward (Avg): 0.9119 | CLIP: 27.1421 | Aesthetic: 5.9696 | PickScore: 21.9725 | HPSv2: 0.2817 | HPSv2.1: 0.2937 | ImageReward: 1.0372
+
+ [gradient_ascent] Batch 370/500 | Samples: 370/500 | Reward (t=136.0): 0.9692 | Reward (Avg): 0.9117 | CLIP: 27.0698 | Aesthetic: 5.9642 | PickScore: 21.9583 | HPSv2: 0.2817 | HPSv2.1: 0.2935 | ImageReward: 1.0346
+
+ [gradient_ascent] Batch 380/500 | Samples: 380/500 | Reward (t=136.0): 0.7930 | Reward (Avg): 0.9110 | CLIP: 27.1452 | Aesthetic: 5.9731 | PickScore: 21.9609 | HPSv2: 0.2817 | HPSv2.1: 0.2937 | ImageReward: 1.0401
+
+ [gradient_ascent] Batch 390/500 | Samples: 390/500 | Reward (t=136.0): 0.9932 | Reward (Avg): 0.9114 | CLIP: 27.0228 | Aesthetic: 5.9694 | PickScore: 21.9430 | HPSv2: 0.2812 | HPSv2.1: 0.2930 | ImageReward: 1.0260
+
+ [gradient_ascent] Batch 400/500 | Samples: 400/500 | Reward (t=136.0): 0.4897 | Reward (Avg): 0.9117 | CLIP: 26.9789 | Aesthetic: 5.9603 | PickScore: 21.9398 | HPSv2: 0.2812 | HPSv2.1: 0.2927 | ImageReward: 1.0275
+
+ [gradient_ascent] Batch 410/500 | Samples: 410/500 | Reward (t=136.0): 0.8018 | Reward (Avg): 0.9123 | CLIP: 26.9305 | Aesthetic: 5.9569 | PickScore: 21.9279 | HPSv2: 0.2810 | HPSv2.1: 0.2922 | ImageReward: 1.0088
+
+ [gradient_ascent] Batch 420/500 | Samples: 420/500 | Reward (t=136.0): 0.9966 | Reward (Avg): 0.9113 | CLIP: 26.8775 | Aesthetic: 5.9550 | PickScore: 21.9134 | HPSv2: 0.2808 | HPSv2.1: 0.2920 | ImageReward: 1.0062
+
+ [gradient_ascent] Batch 430/500 | Samples: 430/500 | Reward (t=136.0): 0.7876 | Reward (Avg): 0.9102 | CLIP: 26.8827 | Aesthetic: 5.9510 | PickScore: 21.9192 | HPSv2: 0.2808 | HPSv2.1: 0.2920 | ImageReward: 1.0116
+
+ [gradient_ascent] Batch 440/500 | Samples: 440/500 | Reward (t=136.0): 0.9561 | Reward (Avg): 0.9096 | CLIP: 26.8088 | Aesthetic: 5.9552 | PickScore: 21.9176 | HPSv2: 0.2808 | HPSv2.1: 0.2920 | ImageReward: 1.0110
+
+ [gradient_ascent] Batch 450/500 | Samples: 450/500 | Reward (t=136.0): 0.9443 | Reward (Avg): 0.9086 | CLIP: 26.8014 | Aesthetic: 5.9476 | PickScore: 21.9168 | HPSv2: 0.2805 | HPSv2.1: 0.2915 | ImageReward: 1.0002
+
+ [gradient_ascent] Batch 460/500 | Samples: 460/500 | Reward (t=136.0): 0.9888 | Reward (Avg): 0.9095 | CLIP: 26.7852 | Aesthetic: 5.9451 | PickScore: 21.9099 | HPSv2: 0.2805 | HPSv2.1: 0.2913 | ImageReward: 0.9956
+
+ [gradient_ascent] Batch 470/500 | Samples: 470/500 | Reward (t=136.0): 0.9800 | Reward (Avg): 0.9107 | CLIP: 26.7099 | Aesthetic: 5.9361 | PickScore: 21.8909 | HPSv2: 0.2800 | HPSv2.1: 0.2905 | ImageReward: 0.9830
+
+ [gradient_ascent] Batch 480/500 | Samples: 480/500 | Reward (t=136.0): 0.9692 | Reward (Avg): 0.9096 | CLIP: 26.6490 | Aesthetic: 5.9346 | PickScore: 21.8834 | HPSv2: 0.2800 | HPSv2.1: 0.2903 | ImageReward: 0.9868
+
+ [gradient_ascent] Batch 490/500 | Samples: 490/500 | Reward (t=136.0): 0.9629 | Reward (Avg): 0.9101 | CLIP: 26.6586 | Aesthetic: 5.9349 | PickScore: 21.8766 | HPSv2: 0.2800 | HPSv2.1: 0.2900 | ImageReward: 0.9869
+
+ [gradient_ascent] Batch 500/500 | Samples: 500/500 | Reward (t=136.0): 0.9927 | Reward (Avg): 0.9101 | CLIP: 26.6660 | Aesthetic: 5.9369 | PickScore: 21.8768 | HPSv2: 0.2800 | HPSv2.1: 0.2903 | ImageReward: 0.9915
+ ✓ Gradient Ascent Avg Reward: 0.9101
+ ✓ Gradient Ascent Avg CLIP Score: 26.6660
+ ✓ Gradient Ascent Avg Aesthetic Score: 5.9369
+ ✓ Gradient Ascent Avg PickScore: 21.8768
+ ✓ Gradient Ascent Avg HPSv2 Score: 0.2800
+ ✓ Gradient Ascent Avg HPSv2.1 Score: 0.2903
+ ✓ Gradient Ascent Avg ImageReward: 0.9915
+
+ Gradient Ascent Statistics:
+ Applications: 10
+ Total reward improvement: +0.0054
+ Avg reward improvement: +0.0005
+
+ ✓ Saved LR curve plot to: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_2/lr_curve.png
+ Total gradient steps: 10
+ LR range: 1.000000 → 1.000000
+
+ ✓ Saved Rewards curve plot to: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_2/rewards_curve.png
+ Total gradient steps: 20
+ Reward range: 0.9971 → 0.9990
+ Total improvement: +0.0020
+
+ ======================================================================
+ FINAL RESULTS
+ ======================================================================
+
+ Gradient Ascent:
+ Avg Reward: 0.9101
+ Avg CLIP Score: 26.6660
+ Avg Aesthetic: 5.9369
+ Avg PickScore: 21.8768
+ Avg HPSv2: 0.2800
+ Avg HPSv2.1: 0.2903
+ Avg ImageReward: 0.9915
+
+ ✓ Results saved to: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_2/evaluation_results.txt
+
+ ======================================================================
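
Each run writes a four-line `evaluation_results.txt` in the format shown above (mode, metrics, config, aggregate scores). A minimal sketch for comparing runs, assuming the `RESULTS/...` layout from the "Results saved to" lines in the logs:

```bash
# Sketch: print the mode (line 1) and the aggregate-score dict (line 4)
# of every run saved under this config directory.
for run in RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_*; do
  echo "== ${run##*/} =="
  sed -n '1p;4p' "$run/evaluation_results.txt"
done
```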
Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_2/lr_curve.png ADDED
Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_3/evaluation_results.txt ADDED
@@ -0,0 +1,4 @@
+ mode: gradient_ascent
+ metrics: ['clip', 'aesthetic', 'pickscore', 'hpsv2', 'hpsv21', 'imagereward']
+ config: {'num_samples': 500, 'num_steps': 20, 'cfg_scale': 4.5, 'grad_range': [0, 700], 'grad_steps': 5, 'grad_step_size': 0.1}
+ gradient_ascent: {'avg_reward': np.float64(0.92294921875), 'clip_score': np.float64(26.616577001571656), 'aesthetic_score': np.float64(5.955972215652466), 'pickscore': np.float64(21.881759201049803), 'hpsv2_score': np.float16(0.2798), 'hpsv21_score': np.float16(0.2905), 'imagereward_score': np.float64(0.9889077876545489), 'stats': {'num_applications': 11, 'total_reward_improvement': 0.078125, 'avg_reward_improvement': 0.007102272727272727, 'avg_grad_norm': 0.21211126311258835, 'max_grad_norm': 0.277950644493103, 'detailed_stats': [{'timestep': 785, 'initial_reward': 0.9140625, 'final_reward': 0.93359375, 'reward_improvement': 0.01953125, 'grad_norms': [0.26926133036613464], 'reward_history': [0.9140625, 0.9140625], 'lr_history': [1.0], 'latent_change': 1.0}, {'timestep': 749, 'initial_reward': 0.9375, 'final_reward': 0.94921875, 'reward_improvement': 0.01171875, 'grad_norms': [0.277950644493103], 'reward_history': [0.9375, 0.9375], 'lr_history': [1.0], 'latent_change': 0.9999999403953552}, {'timestep': 710, 'initial_reward': 0.94921875, 'final_reward': 0.9609375, 'reward_improvement': 0.01171875, 'grad_norms': [0.27435681223869324], 'reward_history': [0.94921875, 0.94921875], 'lr_history': [1.0], 'latent_change': 1.0000001192092896}, {'timestep': 666, 'initial_reward': 0.9609375, 'final_reward': 0.96875, 'reward_improvement': 0.0078125, 'grad_norms': [0.24942916631698608], 'reward_history': [0.9609375, 0.9609375], 'lr_history': [1.0], 'latent_change': 0.9999998807907104}, {'timestep': 617, 'initial_reward': 0.96875, 'final_reward': 0.97265625, 'reward_improvement': 0.00390625, 'grad_norms': [0.21902038156986237], 'reward_history': [0.96875, 0.96875], 'lr_history': [1.0], 'latent_change': 0.9999999403953552}, {'timestep': 562, 'initial_reward': 0.96875, 'final_reward': 0.9765625, 'reward_improvement': 0.0078125, 'grad_norms': [0.19418643414974213], 'reward_history': [0.96875, 0.96875], 'lr_history': [1.0], 'latent_change': 1.0}, {'timestep': 499, 'initial_reward': 0.97265625, 'final_reward': 0.9765625, 'reward_improvement': 0.00390625, 'grad_norms': [0.18358778953552246], 'reward_history': [0.97265625, 0.97265625], 'lr_history': [1.0], 'latent_change': 0.9999998807907104}, {'timestep': 428, 'initial_reward': 0.9765625, 'final_reward': 0.98046875, 'reward_improvement': 0.00390625, 'grad_norms': [0.1757386326789856], 'reward_history': [0.9765625, 0.9765625], 'lr_history': [1.0], 'latent_change': 0.9999999403953552}, {'timestep': 345, 'initial_reward': 0.9765625, 'final_reward': 0.98046875, 'reward_improvement': 0.00390625, 'grad_norms': [0.16876810789108276], 'reward_history': [0.9765625, 0.9765625], 'lr_history': [1.0], 'latent_change': 0.9999998807907104}, {'timestep': 249, 'initial_reward': 0.98046875, 'final_reward': 0.98046875, 'reward_improvement': 0.0, 'grad_norms': [0.16214041411876678], 'reward_history': [0.98046875, 0.98046875], 'lr_history': [1.0], 'latent_change': 0.9999998807907104}, {'timestep': 136, 'initial_reward': 0.9765625, 'final_reward': 0.98046875, 'reward_improvement': 0.00390625, 'grad_norms': [0.1587841808795929], 'reward_history': [0.9765625, 0.9765625], 'lr_history': [1.0], 'latent_change': 0.9999998807907104}]}}
Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_3/log.log ADDED
@@ -0,0 +1,218 @@
+ ======================================================================
+ FID EVALUATION: BASELINE vs GRADIENT ASCENT
+ ======================================================================
+
+ Logging to: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_3/log.log
+
+ Device: cuda:0
+ Dataset: PICKAPIC
+ Data directory: ./data
+ Base model: Efficient-Large-Model/Sana_600M_512px_diffusers
+ Model variant: sana_600m_512
+ LRM model: /g/data/rr81/LPO/lrm/lrm_sana/logs/v8/reward_model/step_sana_sana_600m_512_variable-t_lr1e-5_step-8000_filter2_time951/checkpoint-gstep33000
+ HF cache dir: /scratch/rr81/ma5430/.cache/huggingface/hub
+ HF offline mode: True
+ Inference steps: 20
+ CFG scale: 4.5
+ Batch size: 1
+ Max samples: All
+ Generation dtype: bf16
+ Output directory: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_3
+ Save images: False
+ Evaluation mode: gradient_ascent
+ Metrics to evaluate: CLIP, AESTHETIC, PICKSCORE, HPSV2, HPSV21, IMAGEREWARD
+ Gradient ascent config: one_step_rectification_config
+
+ ======================================================================
+ 1. LOADING VALIDATION DATA
+ ======================================================================
+ Loading Pick-a-Pic validation prompts...
+ Loading cached Pick-a-Pic split 'validation_unique' from 1 parquet shards
+ cache=/scratch/rr81/ma5430/.cache/huggingface/hub/datasets--pickapic-anonymous--pickapic_v1
+ Loaded 500 Pick-a-Pic validation samples
+
+ ======================================================================
+ 2. LOADING REWARD MODEL
+ ======================================================================
+ Loading SANA base reward backbone from Efficient-Large-Model/Sana_600M_512px_diffusers...
+ Loading SANA reward checkpoint from /g/data/rr81/LPO/lrm/lrm_sana/logs/v8/reward_model/step_sana_sana_600m_512_variable-t_lr1e-5_step-8000_filter2_time951/checkpoint-gstep33000/model.safetensors...
+ ✓ Loaded checkpoint keys: 1214
+ ✓ Missing keys: 0 | Unexpected keys: 0
+ ✓ SANA LRM Reward Model initialized successfully!
+ ✓ Reward model loaded
+
+ ======================================================================
+ 3. LOADING PIPELINE
+ ======================================================================
+ ✓ Loaded SANA base model: Efficient-Large-Model/Sana_600M_512px_diffusers
+ ✓ Reward model attached to SANA pipeline
+ ✓ Pipeline loaded
+ GPU memory before scorer load: 125.86 GB free / 140.06 GB total
+ Scorer device: cuda:0
+
+ ======================================================================
+ 3.5. LOADING CLIP AND AESTHETIC SCORERS
+ ======================================================================
+ ✓ CLIP scorer loaded
+ ✓ Aesthetic scorer loaded
+ ✓ PickScore scorer loaded
+ ✓ HPSv2 scorer loaded
+ ✓ HPSv2.1 scorer loaded
+ load checkpoint from /scratch/rr81/ma5430/.cache/huggingface/hub/models--THUDM--ImageReward/snapshots/5736be03b2652728fb87788c9797b0570450ab72/ImageReward.pt
+ checkpoint loaded
+ ✓ ImageReward scorer loaded
+
+ ======================================================================
+ 4. CONFIGURING GRADIENT ASCENT
+ ======================================================================
+ Loading gradient ascent config: one_step_rectification_config
+ Config loaded: {'grad_timestep_range': (100, 800), 'num_grad_steps': 1, 'grad_step_size': 1.0, 'grad_scale': 1.0, 'lr_scheduler_type': 'constant', 'use_momentum': False, 'use_nesterov': False, 'use_iso_projection': False}
+ Gradient timestep range: (100, 800)
+ Gradient steps: 1
+ Gradient step size (initial LR): 1.0
+ LR Scheduler: constant
+ ✓ Gradient ascent enabled for timesteps (100, 800)
+
+ ======================================================================
+ 6. EVALUATING GRADIENT ASCENT
+ ======================================================================
+
+ Generating images with gradient_ascent mode...
+
+ [gradient_ascent] Batch 10/500 | Samples: 10/500 | Reward (t=136.0): 1.0000 | Reward (Avg): 0.9625 | CLIP: 26.4663 | Aesthetic: 6.1413 | PickScore: 22.1120 | HPSv2: 0.2876 | HPSv2.1: 0.3157 | ImageReward: 1.2690
+
+ [gradient_ascent] Batch 20/500 | Samples: 20/500 | Reward (t=136.0): 0.7266 | Reward (Avg): 0.9354 | CLIP: 26.3375 | Aesthetic: 6.1225 | PickScore: 22.1887 | HPSv2: 0.2869 | HPSv2.1: 0.3118 | ImageReward: 1.0984
+
+ [gradient_ascent] Batch 30/500 | Samples: 30/500 | Reward (t=136.0): 1.0000 | Reward (Avg): 0.9434 | CLIP: 26.4766 | Aesthetic: 6.0533 | PickScore: 22.3414 | HPSv2: 0.2864 | HPSv2.1: 0.3079 | ImageReward: 1.0883
+
+ [gradient_ascent] Batch 40/500 | Samples: 40/500 | Reward (t=136.0): 0.9609 | Reward (Avg): 0.9415 | CLIP: 26.6829 | Aesthetic: 6.0924 | PickScore: 22.2873 | HPSv2: 0.2844 | HPSv2.1: 0.3047 | ImageReward: 0.9498
+
+ [gradient_ascent] Batch 50/500 | Samples: 50/500 | Reward (t=136.0): 0.8711 | Reward (Avg): 0.9413 | CLIP: 26.3142 | Aesthetic: 6.0252 | PickScore: 22.1119 | HPSv2: 0.2830 | HPSv2.1: 0.3010 | ImageReward: 1.0441
+
+ [gradient_ascent] Batch 60/500 | Samples: 60/500 | Reward (t=136.0): 0.9961 | Reward (Avg): 0.9441 | CLIP: 26.1909 | Aesthetic: 6.0327 | PickScore: 22.1030 | HPSv2: 0.2837 | HPSv2.1: 0.3013 | ImageReward: 1.0347
+
+ [gradient_ascent] Batch 70/500 | Samples: 70/500 | Reward (t=136.0): 1.0000 | Reward (Avg): 0.9421 | CLIP: 26.4568 | Aesthetic: 6.0299 | PickScore: 22.1226 | HPSv2: 0.2839 | HPSv2.1: 0.3013 | ImageReward: 1.0243
+
+ [gradient_ascent] Batch 80/500 | Samples: 80/500 | Reward (t=136.0): 0.9922 | Reward (Avg): 0.9356 | CLIP: 26.2856 | Aesthetic: 6.0066 | PickScore: 22.0058 | HPSv2: 0.2830 | HPSv2.1: 0.2964 | ImageReward: 0.9735
+
+ [gradient_ascent] Batch 90/500 | Samples: 90/500 | Reward (t=136.0): 0.9961 | Reward (Avg): 0.9348 | CLIP: 26.5472 | Aesthetic: 5.9998 | PickScore: 21.9813 | HPSv2: 0.2825 | HPSv2.1: 0.2964 | ImageReward: 0.9939
+
+ [gradient_ascent] Batch 100/500 | Samples: 100/500 | Reward (t=136.0): 0.9844 | Reward (Avg): 0.9393 | CLIP: 26.8104 | Aesthetic: 6.0321 | PickScore: 21.9362 | HPSv2: 0.2817 | HPSv2.1: 0.2947 | ImageReward: 0.9892
+
+ [gradient_ascent] Batch 110/500 | Samples: 110/500 | Reward (t=136.0): 0.9844 | Reward (Avg): 0.9385 | CLIP: 26.9949 | Aesthetic: 6.0235 | PickScore: 21.9147 | HPSv2: 0.2817 | HPSv2.1: 0.2942 | ImageReward: 0.9906
+
+ [gradient_ascent] Batch 120/500 | Samples: 120/500 | Reward (t=136.0): 0.7188 | Reward (Avg): 0.9285 | CLIP: 26.8847 | Aesthetic: 6.0040 | PickScore: 21.8963 | HPSv2: 0.2815 | HPSv2.1: 0.2942 | ImageReward: 1.0242
+
+ [gradient_ascent] Batch 130/500 | Samples: 130/500 | Reward (t=136.0): 0.9883 | Reward (Avg): 0.9322 | CLIP: 26.9108 | Aesthetic: 6.0127 | PickScore: 21.9706 | HPSv2: 0.2822 | HPSv2.1: 0.2949 | ImageReward: 1.0272
+
+ [gradient_ascent] Batch 140/500 | Samples: 140/500 | Reward (t=136.0): 0.9727 | Reward (Avg): 0.9334 | CLIP: 27.1204 | Aesthetic: 5.9798 | PickScore: 21.9565 | HPSv2: 0.2820 | HPSv2.1: 0.2939 | ImageReward: 1.0218
+
+ [gradient_ascent] Batch 150/500 | Samples: 150/500 | Reward (t=136.0): 0.9961 | Reward (Avg): 0.9266 | CLIP: 27.0076 | Aesthetic: 5.9507 | PickScore: 21.9421 | HPSv2: 0.2815 | HPSv2.1: 0.2927 | ImageReward: 1.0071
+
+ [gradient_ascent] Batch 160/500 | Samples: 160/500 | Reward (t=136.0): 0.9375 | Reward (Avg): 0.9261 | CLIP: 27.1794 | Aesthetic: 5.9462 | PickScore: 21.9868 | HPSv2: 0.2815 | HPSv2.1: 0.2927 | ImageReward: 1.0162
+
+ [gradient_ascent] Batch 170/500 | Samples: 170/500 | Reward (t=136.0): 0.8789 | Reward (Avg): 0.9254 | CLIP: 27.0033 | Aesthetic: 5.9378 | PickScore: 21.9386 | HPSv2: 0.2810 | HPSv2.1: 0.2915 | ImageReward: 1.0123
+
+ [gradient_ascent] Batch 180/500 | Samples: 180/500 | Reward (t=136.0): 0.9570 | Reward (Avg): 0.9212 | CLIP: 27.1150 | Aesthetic: 5.9410 | PickScore: 21.9510 | HPSv2: 0.2815 | HPSv2.1: 0.2922 | ImageReward: 1.0373
+
+ [gradient_ascent] Batch 190/500 | Samples: 190/500 | Reward (t=136.0): 0.7188 | Reward (Avg): 0.9218 | CLIP: 27.0857 | Aesthetic: 5.9454 | PickScore: 21.9378 | HPSv2: 0.2810 | HPSv2.1: 0.2915 | ImageReward: 1.0217
+
+ [gradient_ascent] Batch 200/500 | Samples: 200/500 | Reward (t=136.0): 0.9609 | Reward (Avg): 0.9220 | CLIP: 27.0541 | Aesthetic: 5.9446 | PickScore: 21.9444 | HPSv2: 0.2812 | HPSv2.1: 0.2922 | ImageReward: 1.0335
+
+ [gradient_ascent] Batch 210/500 | Samples: 210/500 | Reward (t=136.0): 0.9922 | Reward (Avg): 0.9241 | CLIP: 27.0499 | Aesthetic: 5.9568 | PickScore: 21.9480 | HPSv2: 0.2812 | HPSv2.1: 0.2930 | ImageReward: 1.0426
+
+ [gradient_ascent] Batch 220/500 | Samples: 220/500 | Reward (t=136.0): 0.9922 | Reward (Avg): 0.9244 | CLIP: 27.1198 | Aesthetic: 5.9586 | PickScore: 21.9354 | HPSv2: 0.2812 | HPSv2.1: 0.2932 | ImageReward: 1.0333
+
+ [gradient_ascent] Batch 230/500 | Samples: 230/500 | Reward (t=136.0): 0.8281 | Reward (Avg): 0.9250 | CLIP: 27.1823 | Aesthetic: 5.9633 | PickScore: 21.9462 | HPSv2: 0.2812 | HPSv2.1: 0.2927 | ImageReward: 1.0163
+
+ [gradient_ascent] Batch 240/500 | Samples: 240/500 | Reward (t=136.0): 0.9453 | Reward (Avg): 0.9262 | CLIP: 27.0908 | Aesthetic: 5.9750 | PickScore: 21.9335 | HPSv2: 0.2808 | HPSv2.1: 0.2920 | ImageReward: 1.0059
+
+ [gradient_ascent] Batch 250/500 | Samples: 250/500 | Reward (t=136.0): 0.9961 | Reward (Avg): 0.9277 | CLIP: 27.0939 | Aesthetic: 5.9814 | PickScore: 21.9393 | HPSv2: 0.2810 | HPSv2.1: 0.2922 | ImageReward: 1.0046
+
+ [gradient_ascent] Batch 260/500 | Samples: 260/500 | Reward (t=136.0): 0.9648 | Reward (Avg): 0.9273 | CLIP: 27.0651 | Aesthetic: 5.9886 | PickScore: 21.9368 | HPSv2: 0.2810 | HPSv2.1: 0.2922 | ImageReward: 1.0170
+
+ [gradient_ascent] Batch 270/500 | Samples: 270/500 | Reward (t=136.0): 1.0000 | Reward (Avg): 0.9268 | CLIP: 26.9957 | Aesthetic: 5.9890 | PickScore: 21.9255 | HPSv2: 0.2810 | HPSv2.1: 0.2922 | ImageReward: 1.0180
135
+
136
+ [gradient_ascent] Batch 280/500 | Samples: 280/500 | Reward (t=136.0): 0.9688 | Reward (Avg): 0.9280 | CLIP: 27.0482 | Aesthetic: 5.9911 | PickScore: 21.9605 | HPSv2: 0.2812 | HPSv2.1: 0.2930 | ImageReward: 1.0319
137
+
138
+ [gradient_ascent] Batch 290/500 | Samples: 290/500 | Reward (t=136.0): 0.9688 | Reward (Avg): 0.9285 | CLIP: 26.9579 | Aesthetic: 5.9804 | PickScore: 21.9239 | HPSv2: 0.2810 | HPSv2.1: 0.2922 | ImageReward: 1.0166
139
+
140
+ [gradient_ascent] Batch 300/500 | Samples: 300/500 | Reward (t=136.0): 1.0000 | Reward (Avg): 0.9295 | CLIP: 26.9421 | Aesthetic: 5.9830 | PickScore: 21.9336 | HPSv2: 0.2812 | HPSv2.1: 0.2927 | ImageReward: 1.0217
141
+
142
+ [gradient_ascent] Batch 310/500 | Samples: 310/500 | Reward (t=136.0): 0.9961 | Reward (Avg): 0.9297 | CLIP: 27.0328 | Aesthetic: 5.9872 | PickScore: 21.9477 | HPSv2: 0.2815 | HPSv2.1: 0.2932 | ImageReward: 1.0419
143
+
144
+ [gradient_ascent] Batch 320/500 | Samples: 320/500 | Reward (t=136.0): 0.9492 | Reward (Avg): 0.9282 | CLIP: 27.0486 | Aesthetic: 5.9830 | PickScore: 21.9641 | HPSv2: 0.2812 | HPSv2.1: 0.2930 | ImageReward: 1.0378
145
+
146
+ [gradient_ascent] Batch 330/500 | Samples: 330/500 | Reward (t=136.0): 0.9961 | Reward (Avg): 0.9291 | CLIP: 27.0375 | Aesthetic: 5.9877 | PickScore: 21.9561 | HPSv2: 0.2812 | HPSv2.1: 0.2930 | ImageReward: 1.0415
147
+
148
+ [gradient_ascent] Batch 340/500 | Samples: 340/500 | Reward (t=136.0): 0.9688 | Reward (Avg): 0.9295 | CLIP: 27.0379 | Aesthetic: 5.9848 | PickScore: 21.9602 | HPSv2: 0.2815 | HPSv2.1: 0.2935 | ImageReward: 1.0553
149
+
150
+ [gradient_ascent] Batch 350/500 | Samples: 350/500 | Reward (t=136.0): 0.7539 | Reward (Avg): 0.9292 | CLIP: 27.0467 | Aesthetic: 5.9841 | PickScore: 21.9707 | HPSv2: 0.2812 | HPSv2.1: 0.2935 | ImageReward: 1.0468
151
+
152
+ [gradient_ascent] Batch 360/500 | Samples: 360/500 | Reward (t=136.0): 0.9609 | Reward (Avg): 0.9277 | CLIP: 27.0976 | Aesthetic: 5.9827 | PickScore: 21.9736 | HPSv2: 0.2815 | HPSv2.1: 0.2939 | ImageReward: 1.0415
153
+
154
+ [gradient_ascent] Batch 370/500 | Samples: 370/500 | Reward (t=136.0): 0.9453 | Reward (Avg): 0.9267 | CLIP: 27.0320 | Aesthetic: 5.9800 | PickScore: 21.9612 | HPSv2: 0.2815 | HPSv2.1: 0.2935 | ImageReward: 1.0372
155
+
156
+ [gradient_ascent] Batch 380/500 | Samples: 380/500 | Reward (t=136.0): 0.7891 | Reward (Avg): 0.9272 | CLIP: 27.1124 | Aesthetic: 5.9873 | PickScore: 21.9642 | HPSv2: 0.2815 | HPSv2.1: 0.2939 | ImageReward: 1.0436
157
+
158
+ [gradient_ascent] Batch 390/500 | Samples: 390/500 | Reward (t=136.0): 0.9922 | Reward (Avg): 0.9274 | CLIP: 26.9883 | Aesthetic: 5.9833 | PickScore: 21.9461 | HPSv2: 0.2810 | HPSv2.1: 0.2930 | ImageReward: 1.0313
159
+
160
+ [gradient_ascent] Batch 400/500 | Samples: 400/500 | Reward (t=136.0): 0.8711 | Reward (Avg): 0.9275 | CLIP: 26.9427 | Aesthetic: 5.9776 | PickScore: 21.9448 | HPSv2: 0.2810 | HPSv2.1: 0.2930 | ImageReward: 1.0291
161
+
162
+ [gradient_ascent] Batch 410/500 | Samples: 410/500 | Reward (t=136.0): 0.3535 | Reward (Avg): 0.9265 | CLIP: 26.8892 | Aesthetic: 5.9751 | PickScore: 21.9345 | HPSv2: 0.2808 | HPSv2.1: 0.2925 | ImageReward: 1.0099
163
+
164
+ [gradient_ascent] Batch 420/500 | Samples: 420/500 | Reward (t=136.0): 0.9844 | Reward (Avg): 0.9249 | CLIP: 26.8305 | Aesthetic: 5.9748 | PickScore: 21.9225 | HPSv2: 0.2805 | HPSv2.1: 0.2922 | ImageReward: 1.0087
165
+
166
+ [gradient_ascent] Batch 430/500 | Samples: 430/500 | Reward (t=136.0): 0.9609 | Reward (Avg): 0.9245 | CLIP: 26.8481 | Aesthetic: 5.9711 | PickScore: 21.9269 | HPSv2: 0.2805 | HPSv2.1: 0.2922 | ImageReward: 1.0131
167
+
168
+ [gradient_ascent] Batch 440/500 | Samples: 440/500 | Reward (t=136.0): 0.8984 | Reward (Avg): 0.9250 | CLIP: 26.7605 | Aesthetic: 5.9752 | PickScore: 21.9258 | HPSv2: 0.2805 | HPSv2.1: 0.2922 | ImageReward: 1.0092
169
+
170
+ [gradient_ascent] Batch 450/500 | Samples: 450/500 | Reward (t=136.0): 0.9531 | Reward (Avg): 0.9243 | CLIP: 26.7746 | Aesthetic: 5.9678 | PickScore: 21.9262 | HPSv2: 0.2803 | HPSv2.1: 0.2917 | ImageReward: 1.0003
171
+
172
+ [gradient_ascent] Batch 460/500 | Samples: 460/500 | Reward (t=136.0): 0.9805 | Reward (Avg): 0.9246 | CLIP: 26.7613 | Aesthetic: 5.9665 | PickScore: 21.9229 | HPSv2: 0.2803 | HPSv2.1: 0.2915 | ImageReward: 0.9966
173
+
174
+ [gradient_ascent] Batch 470/500 | Samples: 470/500 | Reward (t=136.0): 0.9766 | Reward (Avg): 0.9251 | CLIP: 26.6828 | Aesthetic: 5.9578 | PickScore: 21.9037 | HPSv2: 0.2800 | HPSv2.1: 0.2908 | ImageReward: 0.9823
175
+
176
+ [gradient_ascent] Batch 480/500 | Samples: 480/500 | Reward (t=136.0): 0.9688 | Reward (Avg): 0.9234 | CLIP: 26.6169 | Aesthetic: 5.9549 | PickScore: 21.8937 | HPSv2: 0.2798 | HPSv2.1: 0.2905 | ImageReward: 0.9846
177
+
178
+ [gradient_ascent] Batch 490/500 | Samples: 490/500 | Reward (t=136.0): 0.9688 | Reward (Avg): 0.9227 | CLIP: 26.6094 | Aesthetic: 5.9548 | PickScore: 21.8840 | HPSv2: 0.2798 | HPSv2.1: 0.2903 | ImageReward: 0.9847
179
+
180
+ [gradient_ascent] Batch 500/500 | Samples: 500/500 | Reward (t=136.0): 0.9805 | Reward (Avg): 0.9229 | CLIP: 26.6166 | Aesthetic: 5.9560 | PickScore: 21.8818 | HPSv2: 0.2798 | HPSv2.1: 0.2905 | ImageReward: 0.9889
181
+ ✓ Gradient Ascent Avg Reward: 0.9229
182
+ ✓ Gradient Ascent Avg CLIP Score: 26.6166
183
+ ✓ Gradient Ascent Avg Aesthetic Score: 5.9560
184
+ ✓ Gradient Ascent Avg PickScore: 21.8818
185
+ ✓ Gradient Ascent Avg HPSv2 Score: 0.2798
186
+ ✓ Gradient Ascent Avg HPSv2.1 Score: 0.2905
187
+ ✓ Gradient Ascent Avg ImageReward: 0.9889
188
+
189
+ Gradient Ascent Statistics:
190
+ Applications: 11
191
+ Total reward improvement: +0.0781
192
+ Avg reward improvement: +0.0071
193
+
194
+ ✓ Saved LR curve plot to: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_3/lr_curve.png
195
+ Total gradient steps: 11
196
+ LR range: 1.000000 → 1.000000
197
+
198
+ ✓ Saved Rewards curve plot to: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_3/rewards_curve.png
199
+ Total gradient steps: 22
200
+ Reward range: 0.9922 → 1.0000
201
+ Total improvement: +0.0078
202
+
203
+ ======================================================================
204
+ FINAL RESULTS
205
+ ======================================================================
206
+
207
+ Gradient Ascent:
208
+ Avg Reward: 0.9229
209
+ Avg CLIP Score: 26.6166
210
+ Avg Aesthetic: 5.9560
211
+ Avg PickScore: 21.8818
212
+ Avg HPSv2: 0.2798
213
+ Avg HPSv2.1: 0.2905
214
+ Avg ImageReward: 0.9889
215
+
216
+ ✓ Results saved to: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_3/evaluation_results.txt
217
+
218
+ ======================================================================
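The run above applies one_step_rectification_config: a single gradient step of size 1.0 (constant LR) is taken on the latent at each sampled timestep that falls inside grad_timestep_range=(100, 800), which is 11 of the 20 inference steps here. A minimal sketch of that update, assuming a differentiable reward_model(latents, t, prompt_embeds) returning one scalar per sample; the actual implementation lives in pipelines/sana_gradient_ascent_pipeline.py and gradient_ascent_utils.py, which are not part of this diff:

    import torch

    def one_step_rectification(latents, t, prompt_embeds, reward_model,
                               lr=1.0, grad_scale=1.0):
        # Differentiate the reward w.r.t. the current latents only.
        z = latents.detach().requires_grad_(True)
        reward = reward_model(z, t, prompt_embeds).sum()
        grad = torch.autograd.grad(reward, z)[0]
        # One ascent step; lr stays fixed since lr_scheduler_type='constant'.
        return (z + lr * grad_scale * grad).detach()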
Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_3/lr_curve.png ADDED
Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_4/evaluation_results.txt ADDED
@@ -0,0 +1,4 @@
1
+ mode: gradient_ascent
2
+ metrics: ['clip', 'aesthetic', 'pickscore', 'hpsv2', 'hpsv21', 'imagereward']
3
+ config: {'num_samples': 500, 'num_steps': 20, 'cfg_scale': 4.5, 'grad_range': [0, 700], 'grad_steps': 5, 'grad_step_size': 0.1}
4
+ gradient_ascent: {'avg_reward': np.float64(0.86123828125), 'clip_score': np.float64(26.688936717987062), 'aesthetic_score': np.float64(5.964459408760071), 'pickscore': np.float64(21.888245372772218), 'hpsv2_score': np.float16(0.2798), 'hpsv21_score': np.float16(0.2896), 'imagereward_score': np.float64(0.9574673203025014), 'stats': {'num_applications': 11, 'total_reward_improvement': 0.3046875, 'avg_reward_improvement': 0.027698863636363636, 'avg_grad_norm': 0.20624772933396426, 'max_grad_norm': 0.24269965291023254, 'detailed_stats': [{'timestep': 785, 'initial_reward': 0.65234375, 'final_reward': 0.703125, 'reward_improvement': 0.05078125, 'grad_norms': [0.237853541970253], 'reward_history': [0.65234375, 0.65234375], 'lr_history': [1.0], 'latent_change': 0.9999999403953552}, {'timestep': 749, 'initial_reward': 0.71484375, 'final_reward': 0.76171875, 'reward_improvement': 0.046875, 'grad_norms': [0.24269965291023254], 'reward_history': [0.71484375, 0.71484375], 'lr_history': [1.0], 'latent_change': 0.9999999403953552}, {'timestep': 710, 'initial_reward': 0.76171875, 'final_reward': 0.80078125, 'reward_improvement': 0.0390625, 'grad_norms': [0.23808617889881134], 'reward_history': [0.76171875, 0.76171875], 'lr_history': [1.0], 'latent_change': 1.0}, {'timestep': 666, 'initial_reward': 0.80078125, 'final_reward': 0.83203125, 'reward_improvement': 0.03125, 'grad_norms': [0.2161739617586136], 'reward_history': [0.80078125, 0.80078125], 'lr_history': [1.0], 'latent_change': 0.9999998807907104}, {'timestep': 617, 'initial_reward': 0.828125, 'final_reward': 0.85546875, 'reward_improvement': 0.02734375, 'grad_norms': [0.20163142681121826], 'reward_history': [0.828125, 0.828125], 'lr_history': [1.0], 'latent_change': 1.0}, {'timestep': 562, 'initial_reward': 0.8515625, 'final_reward': 0.87109375, 'reward_improvement': 0.01953125, 'grad_norms': [0.1897481232881546], 'reward_history': [0.8515625, 0.8515625], 'lr_history': [1.0], 'latent_change': 0.9999999403953552}, {'timestep': 499, 'initial_reward': 0.8671875, 'final_reward': 0.88671875, 'reward_improvement': 0.01953125, 'grad_norms': [0.181509330868721], 'reward_history': [0.8671875, 0.8671875], 'lr_history': [1.0], 'latent_change': 0.9999998807907104}, {'timestep': 428, 'initial_reward': 0.875, 'final_reward': 0.89453125, 'reward_improvement': 0.01953125, 'grad_norms': [0.18168200552463531], 'reward_history': [0.875, 0.875], 'lr_history': [1.0], 'latent_change': 0.9999998807907104}, {'timestep': 345, 'initial_reward': 0.88671875, 'final_reward': 0.90234375, 'reward_improvement': 0.015625, 'grad_norms': [0.185561865568161], 'reward_history': [0.88671875, 0.88671875], 'lr_history': [1.0], 'latent_change': 0.9999999403953552}, {'timestep': 249, 'initial_reward': 0.890625, 'final_reward': 0.90625, 'reward_improvement': 0.015625, 'grad_norms': [0.19334888458251953], 'reward_history': [0.890625, 0.890625], 'lr_history': [1.0], 'latent_change': 0.9999998807907104}, {'timestep': 136, 'initial_reward': 0.890625, 'final_reward': 0.91015625, 'reward_improvement': 0.01953125, 'grad_norms': [0.20043005049228668], 'reward_history': [0.890625, 0.890625], 'lr_history': [1.0], 'latent_change': 0.9999999403953552}]}}
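Note that evaluation_results.txt stores raw Python reprs (np.float64(...), np.float16(...)), so it is not valid JSON. A small loader sketch for the four-line "key: value" format above; eval() is tolerable here only because the file is produced by this repo's own eval script:

    import numpy as np

    def load_results(path="evaluation_results.txt"):
        results = {}
        with open(path) as f:
            for line in f:
                key, sep, value = line.partition(": ")
                if not sep:
                    continue
                try:
                    # Evaluates the dict/list literals and np.floatXX reprs.
                    results[key.strip()] = eval(value, {"np": np})
                except Exception:
                    results[key.strip()] = value.strip()  # e.g. 'gradient_ascent'
        return results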
Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_4/log.log ADDED
@@ -0,0 +1,218 @@
1
+ ======================================================================
2
+ FID EVALUATION: BASELINE vs GRADIENT ASCENT
3
+ ======================================================================
4
+
5
+ Logging to: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_4/log.log
6
+
7
+ Device: cuda:0
8
+ Dataset: PICKAPIC
9
+ Data directory: ./data
10
+ Base model: Efficient-Large-Model/Sana_600M_512px_diffusers
11
+ Model variant: sana_600m_512
12
+ LRM model: /g/data/rr81/LPO/lrm/lrm_sana/logs/v7/reward_model/step_sana_sana_600m_512_variable-t_lr1e-5_step-8000_filter2_time951/checkpoint-gstep32000
13
+ HF cache dir: /scratch/rr81/ma5430/.cache/huggingface/hub
14
+ HF offline mode: True
15
+ Inference steps: 20
16
+ CFG scale: 4.5
17
+ Batch size: 1
18
+ Max samples: All
19
+ Generation dtype: bf16
20
+ Output directory: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_4
21
+ Save images: False
22
+ Evaluation mode: gradient_ascent
23
+ Metrics to evaluate: CLIP, AESTHETIC, PICKSCORE, HPSV2, HPSV21, IMAGEREWARD
24
+ Gradient ascent config: one_step_rectification_config
25
+
26
+ ======================================================================
27
+ 1. LOADING VALIDATION DATA
28
+ ======================================================================
29
+ Loading Pick-a-Pic validation prompts...
30
+ Loading cached Pick-a-Pic split 'validation_unique' from 1 parquet shards
31
+ cache=/scratch/rr81/ma5430/.cache/huggingface/hub/datasets--pickapic-anonymous--pickapic_v1
32
+ Loaded 500 Pick-a-Pic validation samples
33
+
34
+ ======================================================================
35
+ 2. LOADING REWARD MODEL
36
+ ======================================================================
37
+ Loading SANA base reward backbone from Efficient-Large-Model/Sana_600M_512px_diffusers...
38
+ Loading SANA reward checkpoint from /g/data/rr81/LPO/lrm/lrm_sana/logs/v7/reward_model/step_sana_sana_600m_512_variable-t_lr1e-5_step-8000_filter2_time951/checkpoint-gstep32000/model.safetensors...
39
+ ✓ Loaded checkpoint keys: 1214
40
+ ✓ Missing keys: 0 | Unexpected keys: 0
41
+ ✓ SANA LRM Reward Model initialized successfully!
42
+ ✓ Reward model loaded
43
+
44
+ ======================================================================
45
+ 3. LOADING PIPELINE
46
+ ======================================================================
47
+ ✓ Loaded SANA base model: Efficient-Large-Model/Sana_600M_512px_diffusers
48
+ ✓ Reward model attached to SANA pipeline
49
+ ✓ Pipeline loaded
50
+ GPU memory before scorer load: 85.91 GB free / 140.06 GB total
51
+ Scorer device: cuda:0
52
+
53
+ ======================================================================
54
+ 3.5. LOADING CLIP AND AESTHETIC SCORERS
55
+ ======================================================================
56
+ ✓ CLIP scorer loaded
57
+ ✓ Aesthetic scorer loaded
58
+ ✓ PickScore scorer loaded
59
+ ✓ HPSv2 scorer loaded
60
+ ✓ HPSv2.1 scorer loaded
61
+ load checkpoint from /scratch/rr81/ma5430/.cache/huggingface/hub/models--THUDM--ImageReward/snapshots/5736be03b2652728fb87788c9797b0570450ab72/ImageReward.pt
62
+ checkpoint loaded
63
+ ✓ ImageReward scorer loaded
64
+
65
+ ======================================================================
66
+ 4. CONFIGURING GRADIENT ASCENT
67
+ ======================================================================
68
+ Loading gradient ascent config: one_step_rectification_config
69
+ Config loaded: {'grad_timestep_range': (100, 800), 'num_grad_steps': 1, 'grad_step_size': 1.0, 'grad_scale': 1.0, 'lr_scheduler_type': 'constant', 'use_momentum': False, 'use_nesterov': False, 'use_iso_projection': False}
70
+ Gradient timestep range: (100, 800)
71
+ Gradient steps: 1
72
+ Gradient step size (initial LR): 1.0
73
+ LR Scheduler: constant
74
+ ✓ Gradient ascent enabled for timesteps (100, 800)
75
+
76
+ ======================================================================
77
+ 6. EVALUATING GRADIENT ASCENT
78
+ ======================================================================
79
+
80
+ Generating images with gradient_ascent mode...
81
+
82
+ [gradient_ascent] Batch 10/500 | Samples: 10/500 | Reward (t=136.0): 0.9883 | Reward (Avg): 0.9230 | CLIP: 27.2524 | Aesthetic: 6.1769 | PickScore: 22.0294 | HPSv2: 0.2856 | HPSv2.1: 0.3088 | ImageReward: 1.2917
83
+
84
+ [gradient_ascent] Batch 20/500 | Samples: 20/500 | Reward (t=136.0): 0.8672 | Reward (Avg): 0.9189 | CLIP: 26.8319 | Aesthetic: 6.1308 | PickScore: 22.1532 | HPSv2: 0.2854 | HPSv2.1: 0.3091 | ImageReward: 1.1375
85
+
86
+ [gradient_ascent] Batch 30/500 | Samples: 30/500 | Reward (t=136.0): 0.9922 | Reward (Avg): 0.9047 | CLIP: 26.7728 | Aesthetic: 6.0477 | PickScore: 22.3591 | HPSv2: 0.2854 | HPSv2.1: 0.3052 | ImageReward: 1.1096
87
+
88
+ [gradient_ascent] Batch 40/500 | Samples: 40/500 | Reward (t=136.0): 0.8672 | Reward (Avg): 0.9026 | CLIP: 26.8790 | Aesthetic: 6.1062 | PickScore: 22.3062 | HPSv2: 0.2839 | HPSv2.1: 0.3013 | ImageReward: 0.9804
89
+
90
+ [gradient_ascent] Batch 50/500 | Samples: 50/500 | Reward (t=136.0): 0.6523 | Reward (Avg): 0.8916 | CLIP: 26.6755 | Aesthetic: 6.0488 | PickScore: 22.1421 | HPSv2: 0.2827 | HPSv2.1: 0.2991 | ImageReward: 1.0652
91
+
92
+ [gradient_ascent] Batch 60/500 | Samples: 60/500 | Reward (t=136.0): 0.9648 | Reward (Avg): 0.8891 | CLIP: 26.5230 | Aesthetic: 6.0566 | PickScore: 22.1336 | HPSv2: 0.2832 | HPSv2.1: 0.2991 | ImageReward: 1.0473
93
+
94
+ [gradient_ascent] Batch 70/500 | Samples: 70/500 | Reward (t=136.0): 0.9805 | Reward (Avg): 0.8851 | CLIP: 26.6211 | Aesthetic: 6.0387 | PickScore: 22.1567 | HPSv2: 0.2834 | HPSv2.1: 0.2996 | ImageReward: 1.0103
95
+
96
+ [gradient_ascent] Batch 80/500 | Samples: 80/500 | Reward (t=136.0): 0.9141 | Reward (Avg): 0.8869 | CLIP: 26.5639 | Aesthetic: 6.0182 | PickScore: 22.0309 | HPSv2: 0.2825 | HPSv2.1: 0.2947 | ImageReward: 0.9646
97
+
98
+ [gradient_ascent] Batch 90/500 | Samples: 90/500 | Reward (t=136.0): 0.6836 | Reward (Avg): 0.8837 | CLIP: 26.7986 | Aesthetic: 6.0132 | PickScore: 22.0176 | HPSv2: 0.2825 | HPSv2.1: 0.2952 | ImageReward: 0.9975
99
+
100
+ [gradient_ascent] Batch 100/500 | Samples: 100/500 | Reward (t=136.0): 0.5234 | Reward (Avg): 0.8824 | CLIP: 27.0476 | Aesthetic: 6.0459 | PickScore: 21.9703 | HPSv2: 0.2815 | HPSv2.1: 0.2932 | ImageReward: 0.9856
101
+
102
+ [gradient_ascent] Batch 110/500 | Samples: 110/500 | Reward (t=136.0): 0.8086 | Reward (Avg): 0.8786 | CLIP: 27.2317 | Aesthetic: 6.0274 | PickScore: 21.9682 | HPSv2: 0.2820 | HPSv2.1: 0.2935 | ImageReward: 0.9922
103
+
104
+ [gradient_ascent] Batch 120/500 | Samples: 120/500 | Reward (t=136.0): 0.5312 | Reward (Avg): 0.8745 | CLIP: 27.0959 | Aesthetic: 6.0128 | PickScore: 21.9585 | HPSv2: 0.2817 | HPSv2.1: 0.2935 | ImageReward: 1.0282
105
+
106
+ [gradient_ascent] Batch 130/500 | Samples: 130/500 | Reward (t=136.0): 0.8984 | Reward (Avg): 0.8748 | CLIP: 27.1302 | Aesthetic: 6.0250 | PickScore: 22.0331 | HPSv2: 0.2825 | HPSv2.1: 0.2944 | ImageReward: 1.0238
107
+
108
+ [gradient_ascent] Batch 140/500 | Samples: 140/500 | Reward (t=136.0): 0.8438 | Reward (Avg): 0.8711 | CLIP: 27.3238 | Aesthetic: 5.9973 | PickScore: 22.0136 | HPSv2: 0.2822 | HPSv2.1: 0.2935 | ImageReward: 1.0193
109
+
110
+ [gradient_ascent] Batch 150/500 | Samples: 150/500 | Reward (t=136.0): 0.9844 | Reward (Avg): 0.8711 | CLIP: 27.2450 | Aesthetic: 5.9757 | PickScore: 21.9954 | HPSv2: 0.2817 | HPSv2.1: 0.2922 | ImageReward: 1.0077
111
+
112
+ [gradient_ascent] Batch 160/500 | Samples: 160/500 | Reward (t=136.0): 0.8828 | Reward (Avg): 0.8731 | CLIP: 27.3797 | Aesthetic: 5.9691 | PickScore: 22.0329 | HPSv2: 0.2817 | HPSv2.1: 0.2922 | ImageReward: 1.0169
113
+
114
+ [gradient_ascent] Batch 170/500 | Samples: 170/500 | Reward (t=136.0): 0.8477 | Reward (Avg): 0.8723 | CLIP: 27.1880 | Aesthetic: 5.9651 | PickScore: 21.9791 | HPSv2: 0.2812 | HPSv2.1: 0.2910 | ImageReward: 1.0030
115
+
116
+ [gradient_ascent] Batch 180/500 | Samples: 180/500 | Reward (t=136.0): 0.8906 | Reward (Avg): 0.8697 | CLIP: 27.2816 | Aesthetic: 5.9708 | PickScore: 21.9840 | HPSv2: 0.2815 | HPSv2.1: 0.2917 | ImageReward: 1.0276
117
+
118
+ [gradient_ascent] Batch 190/500 | Samples: 190/500 | Reward (t=136.0): 0.8672 | Reward (Avg): 0.8692 | CLIP: 27.2303 | Aesthetic: 5.9694 | PickScore: 21.9639 | HPSv2: 0.2810 | HPSv2.1: 0.2910 | ImageReward: 1.0080
119
+
120
+ [gradient_ascent] Batch 200/500 | Samples: 200/500 | Reward (t=136.0): 0.8281 | Reward (Avg): 0.8684 | CLIP: 27.2174 | Aesthetic: 5.9641 | PickScore: 21.9716 | HPSv2: 0.2815 | HPSv2.1: 0.2917 | ImageReward: 1.0208
121
+
122
+ [gradient_ascent] Batch 210/500 | Samples: 210/500 | Reward (t=136.0): 0.8789 | Reward (Avg): 0.8703 | CLIP: 27.1992 | Aesthetic: 5.9754 | PickScore: 21.9724 | HPSv2: 0.2812 | HPSv2.1: 0.2925 | ImageReward: 1.0306
123
+
124
+ [gradient_ascent] Batch 220/500 | Samples: 220/500 | Reward (t=136.0): 0.9375 | Reward (Avg): 0.8702 | CLIP: 27.2434 | Aesthetic: 5.9779 | PickScore: 21.9538 | HPSv2: 0.2815 | HPSv2.1: 0.2922 | ImageReward: 1.0132
125
+
126
+ [gradient_ascent] Batch 230/500 | Samples: 230/500 | Reward (t=136.0): 0.6250 | Reward (Avg): 0.8715 | CLIP: 27.2932 | Aesthetic: 5.9839 | PickScore: 21.9659 | HPSv2: 0.2812 | HPSv2.1: 0.2920 | ImageReward: 0.9930
127
+
128
+ [gradient_ascent] Batch 240/500 | Samples: 240/500 | Reward (t=136.0): 0.8594 | Reward (Avg): 0.8721 | CLIP: 27.1944 | Aesthetic: 5.9938 | PickScore: 21.9538 | HPSv2: 0.2808 | HPSv2.1: 0.2913 | ImageReward: 0.9821
129
+
130
+ [gradient_ascent] Batch 250/500 | Samples: 250/500 | Reward (t=136.0): 0.8047 | Reward (Avg): 0.8723 | CLIP: 27.1940 | Aesthetic: 5.9992 | PickScore: 21.9601 | HPSv2: 0.2810 | HPSv2.1: 0.2913 | ImageReward: 0.9797
131
+
132
+ [gradient_ascent] Batch 260/500 | Samples: 260/500 | Reward (t=136.0): 0.9297 | Reward (Avg): 0.8711 | CLIP: 27.1640 | Aesthetic: 6.0070 | PickScore: 21.9547 | HPSv2: 0.2810 | HPSv2.1: 0.2915 | ImageReward: 0.9918
133
+
134
+ [gradient_ascent] Batch 270/500 | Samples: 270/500 | Reward (t=136.0): 0.9961 | Reward (Avg): 0.8714 | CLIP: 27.1096 | Aesthetic: 6.0077 | PickScore: 21.9464 | HPSv2: 0.2810 | HPSv2.1: 0.2913 | ImageReward: 0.9900
135
+
136
+ [gradient_ascent] Batch 280/500 | Samples: 280/500 | Reward (t=136.0): 0.9258 | Reward (Avg): 0.8715 | CLIP: 27.1485 | Aesthetic: 6.0107 | PickScore: 21.9772 | HPSv2: 0.2812 | HPSv2.1: 0.2920 | ImageReward: 1.0051
137
+
138
+ [gradient_ascent] Batch 290/500 | Samples: 290/500 | Reward (t=136.0): 0.9062 | Reward (Avg): 0.8722 | CLIP: 27.0509 | Aesthetic: 5.9991 | PickScore: 21.9401 | HPSv2: 0.2810 | HPSv2.1: 0.2913 | ImageReward: 0.9888
139
+
140
+ [gradient_ascent] Batch 300/500 | Samples: 300/500 | Reward (t=136.0): 0.9844 | Reward (Avg): 0.8718 | CLIP: 27.0241 | Aesthetic: 6.0017 | PickScore: 21.9507 | HPSv2: 0.2812 | HPSv2.1: 0.2920 | ImageReward: 0.9941
141
+
142
+ [gradient_ascent] Batch 310/500 | Samples: 310/500 | Reward (t=136.0): 0.9375 | Reward (Avg): 0.8709 | CLIP: 27.0985 | Aesthetic: 6.0055 | PickScore: 21.9663 | HPSv2: 0.2815 | HPSv2.1: 0.2925 | ImageReward: 1.0157
143
+
144
+ [gradient_ascent] Batch 320/500 | Samples: 320/500 | Reward (t=136.0): 0.9492 | Reward (Avg): 0.8703 | CLIP: 27.1293 | Aesthetic: 6.0013 | PickScore: 21.9831 | HPSv2: 0.2815 | HPSv2.1: 0.2922 | ImageReward: 1.0116
145
+
146
+ [gradient_ascent] Batch 330/500 | Samples: 330/500 | Reward (t=136.0): 0.9609 | Reward (Avg): 0.8704 | CLIP: 27.1128 | Aesthetic: 6.0068 | PickScore: 21.9762 | HPSv2: 0.2815 | HPSv2.1: 0.2925 | ImageReward: 1.0155
147
+
148
+ [gradient_ascent] Batch 340/500 | Samples: 340/500 | Reward (t=136.0): 0.9648 | Reward (Avg): 0.8709 | CLIP: 27.1100 | Aesthetic: 6.0010 | PickScore: 21.9779 | HPSv2: 0.2812 | HPSv2.1: 0.2927 | ImageReward: 1.0264
149
+
150
+ [gradient_ascent] Batch 350/500 | Samples: 350/500 | Reward (t=136.0): 0.7969 | Reward (Avg): 0.8694 | CLIP: 27.1092 | Aesthetic: 5.9996 | PickScore: 21.9887 | HPSv2: 0.2812 | HPSv2.1: 0.2927 | ImageReward: 1.0177
151
+
152
+ [gradient_ascent] Batch 360/500 | Samples: 360/500 | Reward (t=136.0): 0.7461 | Reward (Avg): 0.8684 | CLIP: 27.1495 | Aesthetic: 5.9991 | PickScore: 21.9878 | HPSv2: 0.2815 | HPSv2.1: 0.2930 | ImageReward: 1.0115
153
+
154
+ [gradient_ascent] Batch 370/500 | Samples: 370/500 | Reward (t=136.0): 0.9336 | Reward (Avg): 0.8669 | CLIP: 27.0887 | Aesthetic: 5.9964 | PickScore: 21.9767 | HPSv2: 0.2815 | HPSv2.1: 0.2925 | ImageReward: 1.0073
155
+
156
+ [gradient_ascent] Batch 380/500 | Samples: 380/500 | Reward (t=136.0): 0.6836 | Reward (Avg): 0.8673 | CLIP: 27.1722 | Aesthetic: 6.0026 | PickScore: 21.9784 | HPSv2: 0.2812 | HPSv2.1: 0.2927 | ImageReward: 1.0130
157
+
158
+ [gradient_ascent] Batch 390/500 | Samples: 390/500 | Reward (t=136.0): 0.9688 | Reward (Avg): 0.8669 | CLIP: 27.0503 | Aesthetic: 5.9985 | PickScore: 21.9617 | HPSv2: 0.2810 | HPSv2.1: 0.2920 | ImageReward: 1.0016
159
+
160
+ [gradient_ascent] Batch 400/500 | Samples: 400/500 | Reward (t=136.0): 0.6484 | Reward (Avg): 0.8664 | CLIP: 27.0083 | Aesthetic: 5.9924 | PickScore: 21.9558 | HPSv2: 0.2810 | HPSv2.1: 0.2920 | ImageReward: 0.9993
161
+
162
+ [gradient_ascent] Batch 410/500 | Samples: 410/500 | Reward (t=136.0): 0.4082 | Reward (Avg): 0.8651 | CLIP: 26.9581 | Aesthetic: 5.9885 | PickScore: 21.9445 | HPSv2: 0.2808 | HPSv2.1: 0.2913 | ImageReward: 0.9801
163
+
164
+ [gradient_ascent] Batch 420/500 | Samples: 420/500 | Reward (t=136.0): 0.6602 | Reward (Avg): 0.8630 | CLIP: 26.9090 | Aesthetic: 5.9864 | PickScore: 21.9316 | HPSv2: 0.2805 | HPSv2.1: 0.2910 | ImageReward: 0.9757
165
+
166
+ [gradient_ascent] Batch 430/500 | Samples: 430/500 | Reward (t=136.0): 0.8164 | Reward (Avg): 0.8628 | CLIP: 26.9218 | Aesthetic: 5.9813 | PickScore: 21.9366 | HPSv2: 0.2805 | HPSv2.1: 0.2910 | ImageReward: 0.9807
167
+
168
+ [gradient_ascent] Batch 440/500 | Samples: 440/500 | Reward (t=136.0): 0.8516 | Reward (Avg): 0.8632 | CLIP: 26.8428 | Aesthetic: 5.9854 | PickScore: 21.9359 | HPSv2: 0.2805 | HPSv2.1: 0.2910 | ImageReward: 0.9783
169
+
170
+ [gradient_ascent] Batch 450/500 | Samples: 450/500 | Reward (t=136.0): 0.7812 | Reward (Avg): 0.8623 | CLIP: 26.8449 | Aesthetic: 5.9793 | PickScore: 21.9350 | HPSv2: 0.2803 | HPSv2.1: 0.2908 | ImageReward: 0.9707
171
+
172
+ [gradient_ascent] Batch 460/500 | Samples: 460/500 | Reward (t=136.0): 0.8086 | Reward (Avg): 0.8627 | CLIP: 26.8162 | Aesthetic: 5.9783 | PickScore: 21.9268 | HPSv2: 0.2803 | HPSv2.1: 0.2903 | ImageReward: 0.9679
173
+
174
+ [gradient_ascent] Batch 470/500 | Samples: 470/500 | Reward (t=136.0): 0.6836 | Reward (Avg): 0.8619 | CLIP: 26.7554 | Aesthetic: 5.9671 | PickScore: 21.9094 | HPSv2: 0.2798 | HPSv2.1: 0.2898 | ImageReward: 0.9538
175
+
176
+ [gradient_ascent] Batch 480/500 | Samples: 480/500 | Reward (t=136.0): 0.9141 | Reward (Avg): 0.8607 | CLIP: 26.6819 | Aesthetic: 5.9631 | PickScore: 21.8992 | HPSv2: 0.2798 | HPSv2.1: 0.2896 | ImageReward: 0.9561
177
+
178
+ [gradient_ascent] Batch 490/500 | Samples: 490/500 | Reward (t=136.0): 0.8906 | Reward (Avg): 0.8607 | CLIP: 26.6874 | Aesthetic: 5.9629 | PickScore: 21.8907 | HPSv2: 0.2795 | HPSv2.1: 0.2893 | ImageReward: 0.9554
179
+
180
+ [gradient_ascent] Batch 500/500 | Samples: 500/500 | Reward (t=136.0): 0.9102 | Reward (Avg): 0.8612 | CLIP: 26.6889 | Aesthetic: 5.9645 | PickScore: 21.8882 | HPSv2: 0.2798 | HPSv2.1: 0.2896 | ImageReward: 0.9575
181
+ ✓ Gradient Ascent Avg Reward: 0.8612
182
+ ✓ Gradient Ascent Avg CLIP Score: 26.6889
183
+ ✓ Gradient Ascent Avg Aesthetic Score: 5.9645
184
+ ✓ Gradient Ascent Avg PickScore: 21.8882
185
+ ✓ Gradient Ascent Avg HPSv2 Score: 0.2798
186
+ ✓ Gradient Ascent Avg HPSv2.1 Score: 0.2896
187
+ ✓ Gradient Ascent Avg ImageReward: 0.9575
188
+
189
+ Gradient Ascent Statistics:
190
+ Applications: 11
191
+ Total reward improvement: +0.3047
192
+ Avg reward improvement: +0.0277
193
+
194
+ ✓ Saved LR curve plot to: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_4/lr_curve.png
195
+ Total gradient steps: 11
196
+ LR range: 1.000000 → 1.000000
197
+
198
+ ✓ Saved Rewards curve plot to: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_4/rewards_curve.png
199
+ Total gradient steps: 22
200
+ Reward range: 0.9727 → 0.9961
201
+ Total improvement: +0.0234
202
+
203
+ ======================================================================
204
+ FINAL RESULTS
205
+ ======================================================================
206
+
207
+ Gradient Ascent:
208
+ Avg Reward: 0.8612
209
+ Avg CLIP Score: 26.6889
210
+ Avg Aesthetic: 5.9645
211
+ Avg PickScore: 21.8882
212
+ Avg HPSv2: 0.2798
213
+ Avg HPSv2.1: 0.2896
214
+ Avg ImageReward: 0.9575
215
+
216
+ ✓ Results saved to: RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_4/evaluation_results.txt
217
+
218
+ ======================================================================
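As a sanity check, the summary above is just total/count: 0.3047 / 11 ≈ 0.0277, matching avg_reward_improvement in evaluation_results.txt. From the detailed_stats recorded there, a per-timestep improvement plot in the spirit of rewards_curve.png can be reproduced with a short sketch (assuming stats is the parsed 'gradient_ascent' entry from the loader sketched earlier):

    import matplotlib
    matplotlib.use("Agg")  # headless, as in eval.py
    import matplotlib.pyplot as plt

    def plot_reward_improvement(stats, out_path="rewards_curve.png"):
        detailed = stats["stats"]["detailed_stats"]
        timesteps = [d["timestep"] for d in detailed]
        deltas = [d["reward_improvement"] for d in detailed]
        plt.figure(figsize=(6, 4))
        plt.plot(timesteps, deltas, marker="o")
        plt.gca().invert_xaxis()  # sampling runs from high t to low t
        plt.xlabel("timestep")
        plt.ylabel("reward improvement")
        plt.tight_layout()
        plt.savefig(out_path)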
Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_4/lr_curve.png ADDED
Reward_sana_idealized/RESULTS/pickapic/one_step_rectification_config_sana_600m_512/run_4/rewards_curve.png ADDED
Reward_sana_idealized/__pycache__/eval.cpython-311.pyc ADDED
Binary file (75.8 kB).
 
Reward_sana_idealized/__pycache__/gradient_ascent_utils.cpython-311.pyc ADDED
Binary file (17.1 kB).
 
Reward_sana_idealized/blip/__init__.py ADDED
@@ -0,0 +1 @@
1
+ from .blip_pretrain import *
Reward_sana_idealized/blip/__pycache__/__init__.cpython-311.pyc ADDED
Binary file (202 Bytes).
 
Reward_sana_idealized/blip/__pycache__/blip.cpython-311.pyc ADDED
Binary file (4.03 kB).
 
Reward_sana_idealized/blip/__pycache__/blip_pretrain.cpython-311.pyc ADDED
Binary file (2.35 kB).
 
Reward_sana_idealized/blip/__pycache__/med.cpython-311.pyc ADDED
Binary file (46.9 kB).
 
Reward_sana_idealized/blip/blip.py ADDED
@@ -0,0 +1,70 @@
1
+ '''
2
+ * Adapted from BLIP (https://github.com/salesforce/BLIP)
3
+ '''
4
+
5
+ import warnings
6
+ warnings.filterwarnings("ignore")
7
+
8
+ import torch
9
+ import os
10
+ from urllib.parse import urlparse
11
+ from timm.models.hub import download_cached_file
12
+ from transformers import BertTokenizer
13
+ from .vit import VisionTransformer, interpolate_pos_embed
14
+
15
+
16
+ def init_tokenizer():
17
+ tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
18
+ tokenizer.add_special_tokens({'bos_token':'[DEC]'})
19
+ tokenizer.add_special_tokens({'additional_special_tokens':['[ENC]']})
20
+ tokenizer.enc_token_id = tokenizer.additional_special_tokens_ids[0]
21
+ return tokenizer
22
+
23
+
24
+ def create_vit(vit, image_size, use_grad_checkpointing=False, ckpt_layer=0, drop_path_rate=0):
25
+
26
+ assert vit in ['base', 'large'], "vit parameter must be base or large"
27
+ if vit=='base':
28
+ vision_width = 768
29
+ visual_encoder = VisionTransformer(img_size=image_size, patch_size=16, embed_dim=vision_width, depth=12,
30
+ num_heads=12, use_grad_checkpointing=use_grad_checkpointing, ckpt_layer=ckpt_layer,
31
+ drop_path_rate=0 or drop_path_rate
32
+ )
33
+ elif vit=='large':
34
+ vision_width = 1024
35
+ visual_encoder = VisionTransformer(img_size=image_size, patch_size=16, embed_dim=vision_width, depth=24,
36
+ num_heads=16, use_grad_checkpointing=use_grad_checkpointing, ckpt_layer=ckpt_layer,
37
+ drop_path_rate=0.1 or drop_path_rate
38
+ )
39
+ return visual_encoder, vision_width
40
+
41
+
42
+ def is_url(url_or_filename):
43
+ parsed = urlparse(url_or_filename)
44
+ return parsed.scheme in ("http", "https")
45
+
46
+ def load_checkpoint(model,url_or_filename):
47
+ if is_url(url_or_filename):
48
+ cached_file = download_cached_file(url_or_filename, check_hash=False, progress=True)
49
+ checkpoint = torch.load(cached_file, map_location='cpu')
50
+ elif os.path.isfile(url_or_filename):
51
+ checkpoint = torch.load(url_or_filename, map_location='cpu')
52
+ else:
53
+ raise RuntimeError('checkpoint url or path is invalid')
54
+
55
+ state_dict = checkpoint['model']
56
+
57
+ state_dict['visual_encoder.pos_embed'] = interpolate_pos_embed(state_dict['visual_encoder.pos_embed'],model.visual_encoder)
58
+ if 'visual_encoder_m.pos_embed' in model.state_dict().keys():
59
+ state_dict['visual_encoder_m.pos_embed'] = interpolate_pos_embed(state_dict['visual_encoder_m.pos_embed'],
60
+ model.visual_encoder_m)
61
+ for key in model.state_dict().keys():
62
+ if key in state_dict.keys():
63
+ if state_dict[key].shape!=model.state_dict()[key].shape:
64
+ print(key, ": ", state_dict[key].shape, ', ', model.state_dict()[key].shape)
65
+ del state_dict[key]
66
+
67
+ msg = model.load_state_dict(state_dict,strict=False)
68
+ print('load checkpoint from %s'%url_or_filename)
69
+ return model,msg
70
+
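Two notes on the adapted BLIP code above. First, in create_vit the expression "0 or drop_path_rate" evaluates to drop_path_rate (0 is falsy), while "0.1 or drop_path_rate" always evaluates to 0.1, so the argument is silently ignored for the 'large' variant; this quirk is inherited verbatim from upstream BLIP. Second, load_checkpoint interpolates position embeddings and drops shape-mismatched keys before a non-strict load. A usage sketch (the checkpoint path is hypothetical):

    encoder, width = create_vit("base", image_size=224)  # width == 768 for 'base'
    tokenizer = init_tokenizer()  # bert-base-uncased plus [DEC]/[ENC] special tokens
    # For a full BLIP-style module exposing .visual_encoder:
    # model, msg = load_checkpoint(model, "/path/to/model_base.pth")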
Reward_sana_idealized/blip/blip_pretrain.py ADDED
@@ -0,0 +1,43 @@
1
+ '''
2
+ * Adapted from BLIP (https://github.com/salesforce/BLIP)
3
+ '''
4
+
5
+ import transformers
6
+ transformers.logging.set_verbosity_error()
7
+
8
+ from torch import nn
9
+ import os
10
+ from .med import BertConfig, BertModel
11
+ from .blip import create_vit, init_tokenizer
12
+
13
+ class BLIP_Pretrain(nn.Module):
14
+ def __init__(self,
15
+ med_config = "med_config.json",
16
+ image_size = 224,
17
+ vit = 'base',
18
+ vit_grad_ckpt = False,
19
+ vit_ckpt_layer = 0,
20
+ embed_dim = 256,
21
+ queue_size = 57600,
22
+ momentum = 0.995,
23
+ ):
24
+ """
25
+ Args:
26
+ med_config (str): path for the mixture of encoder-decoder model's configuration file
27
+ image_size (int): input image size
28
+ vit (str): model size of vision transformer
29
+ """
30
+ super().__init__()
31
+
32
+ self.visual_encoder, vision_width = create_vit(vit,image_size, vit_grad_ckpt, vit_ckpt_layer, 0)
33
+
34
+ self.tokenizer = init_tokenizer()
35
+ encoder_config = BertConfig.from_json_file(med_config)
36
+ encoder_config.encoder_width = vision_width
37
+ self.text_encoder = BertModel(config=encoder_config, add_pooling_layer=False)
38
+
39
+ text_width = self.text_encoder.config.hidden_size
40
+
41
+ self.vision_proj = nn.Linear(vision_width, embed_dim)
42
+ self.text_proj = nn.Linear(text_width, embed_dim)
43
+
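BLIP_Pretrain is trimmed here to the pieces ImageReward needs: the ViT visual encoder, the BERT text encoder, and two projection heads into a shared 256-dim embedding space. A minimal instantiation sketch (assumes med_config.json is resolvable from the working directory):

    blip = BLIP_Pretrain(med_config="med_config.json", image_size=224, vit="large")
    # vision_proj maps 1024 -> 256 (ViT-large width); text_proj maps the BERT
    # hidden size (768 with the default config) -> 256.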
Reward_sana_idealized/config_analysis_tuning.ipynb ADDED
@@ -0,0 +1,218 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": null,
6
+ "id": "a24d02a2",
7
+ "metadata": {},
8
+ "outputs": [],
9
+ "source": [
10
+ "import json\n",
11
+ "import pandas as pd\n",
12
+ "import numpy as np\n",
13
+ "from pathlib import Path\n",
14
+ "from datetime import datetime\n",
15
+ "import warnings\n",
16
+ "warnings.filterwarnings('ignore')\n",
17
+ "\n",
18
+ "# ============================================================================\n",
19
+ "# SECTION 1: Load and Parse Results from GPU Tuning Runs\n",
20
+ "# ============================================================================\n",
21
+ "print(\"=\" * 80)\n",
22
+ "print(\"LOADING TUNING RESULTS FROM GPU RUNS\")\n",
23
+ "print(\"=\" * 80)\n",
24
+ "\n",
25
+ "results_dir = Path(\"RESULTS_TURNING/run_2\")\n",
26
+ "all_experiments = []\n",
27
+ "baseline_metrics = None\n",
28
+ "\n",
29
+ "# Collect results from all GPU runs\n",
30
+ "for gpu_id in range(8):\n",
31
+ " gpu_dir = results_dir / f\"gpu_{gpu_id}\"\n",
32
+ " results_file = gpu_dir / \"tuning_results.json\"\n",
33
+ " \n",
34
+ " if results_file.exists():\n",
35
+ " with open(results_file, 'r') as f:\n",
36
+ " data = json.load(f)\n",
37
+ " \n",
38
+ " # Extract baseline (same across all GPUs)\n",
39
+ " if baseline_metrics is None and \"baseline\" in data:\n",
40
+ " baseline_metrics = data[\"baseline\"][\"metrics\"]\n",
41
+ " print(f\"\\n📊 Baseline Metrics (cfg_scale=5.0):\")\n",
42
+ " for metric, value in baseline_metrics.items():\n",
43
+ " print(f\" {metric:15s}: {value:.6f}\")\n",
44
+ " \n",
45
+ " # Collect all experiments\n",
46
+ " if \"experiments\" in data:\n",
47
+ " all_experiments.extend(data[\"experiments\"])\n",
48
+ " print(f\"✓ GPU {gpu_id}: {len(data['experiments'])} results loaded\")\n",
49
+ "\n",
50
+ "print(f\"\\n✓ Total experiments loaded: {len(all_experiments)}\")\n",
51
+ "\n",
52
+ "# ============================================================================\n",
53
+ "# SECTION 2: Filter Top Configs with Improvements Across All Metrics\n",
54
+ "# ============================================================================\n",
55
+ "print(\"\\n\" + \"=\" * 80)\n",
56
+ "print(\"FILTERING CONFIGURATIONS WITH IMPROVEMENTS IN ALL METRICS\")\n",
57
+ "print(\"=\" * 80)\n",
58
+ "\n",
59
+ "# Define improvement metrics to track (using ImageReward instead of Reward)\n",
60
+ "improvement_metrics = [\n",
61
+ " \"aesthetic_improvement\", \n",
62
+ " \"imagereward_improvement\", \n",
63
+ " \"clip_improvement\", \n",
64
+ " \"pickscore_improvement\", \n",
65
+ " \"hpsv2_improvement\"\n",
66
+ " ]\n",
67
+ "\n",
68
+ "# Filter experiments with improvements in ALL metrics\n",
69
+ "top_configs = []\n",
70
+ "\n",
71
+ "for exp in all_experiments:\n",
72
+ " if \"improvements\" not in exp or \"config\" not in exp or \"metrics\" not in exp:\n",
73
+ " continue\n",
74
+ " \n",
75
+ " improvements = exp[\"improvements\"]\n",
76
+ " config = exp[\"config\"]\n",
77
+ " metrics = exp[\"metrics\"]\n",
78
+ " \n",
79
+ " # Check if ALL improvements are positive (>0)\n",
80
+ " all_positive = all(improvements.get(metric, -1) > 0 for metric in improvement_metrics)\n",
81
+ " \n",
82
+ " if all_positive:\n",
83
+ " # Calculate aggregate improvement score\n",
84
+ " avg_improvement = np.mean([improvements.get(metric, 0) for metric in improvement_metrics])\n",
85
+ " \n",
86
+ " top_configs.append({\n",
87
+ " \"config\": config,\n",
88
+ " \"metrics\": metrics,\n",
89
+ " \"improvements\": improvements,\n",
90
+ " \"avg_improvement\": avg_improvement\n",
91
+ " })\n",
92
+ "\n",
93
+ "print(f\"✓ Found {len(top_configs)} configurations with improvements in ALL metrics\")\n",
94
+ "\n",
95
+ "# Sort by average improvement\n",
96
+ "top_configs.sort(key=lambda x: x[\"avg_improvement\"], reverse=True)\n",
97
+ "\n",
98
+ "# Get top 10\n",
99
+ "top_10 = top_configs[:10]\n",
100
+ "print(f\"✓ Extracted top 10 best performing configurations\")\n",
101
+ "\n",
102
+ "# ============================================================================\n",
103
+ "# SECTION 3: Create Comprehensive Results Table\n",
104
+ "# ============================================================================\n",
105
+ "print(\"\\n\" + \"=\" * 80)\n",
106
+ "print(\"CREATING COMPREHENSIVE RESULTS TABLE\")\n",
107
+ "print(\"=\" * 80)\n",
108
+ "\n",
109
+ "# Build detailed table data\n",
110
+ "table_data = []\n",
111
+ "\n",
112
+ "for rank, result in enumerate(top_10, 1):\n",
113
+ " cfg = result[\"config\"]\n",
114
+ " metrics = result[\"metrics\"]\n",
115
+ " improvements = result[\"improvements\"]\n",
116
+ " \n",
117
+ " row = {\n",
118
+ " \"Rank\": rank,\n",
119
+ " \"CFG Scale\": cfg.get(\"cfg_scale\", \"N/A\"),\n",
120
+ " \"Grad Config\": cfg.get(\"grad_config\", \"N/A\"),\n",
121
+ " \"Steps\": cfg.get(\"num_grad_steps\", \"N/A\"),\n",
122
+ " \"LR\": cfg.get(\"grad_step_size\", \"N/A\"),\n",
123
+ " \"Momentum\": cfg.get(\"momentum\", \"N/A\"),\n",
124
+ " \"ImageReward\": f\"{metrics.get('imagereward', 0):.6f}\",\n",
125
+ " \"ImageReward ↑\": f\"{improvements.get('imagereward_improvement', 0):+.2f}%\",\n",
126
+ " \"CLIP\": f\"{metrics.get('clip', 0):.4f}\",\n",
127
+ " \"CLIP ↑\": f\"{improvements.get('clip_improvement', 0):+.2f}%\",\n",
128
+ " \"Aesthetic\": f\"{metrics.get('aesthetic', 0):.4f}\",\n",
129
+ " \"Aesthetic ↑\": f\"{improvements.get('aesthetic_improvement', 0):+.2f}%\",\n",
130
+ " \"PickScore\": f\"{metrics.get('pickscore', 0):.4f}\",\n",
131
+ " \"PickScore ↑\": f\"{improvements.get('pickscore_improvement', 0):+.2f}%\",\n",
132
+ " \"HPSv2\": f\"{metrics.get('hpsv2', 0):.4f}\",\n",
133
+ " \"HPSv2 ↑\": f\"{improvements.get('hpsv2_improvement', 0):+.2f}%\",\n",
134
+ " \"Avg Improvement\": f\"{result['avg_improvement']:+.2f}%\",\n",
135
+ " }\n",
136
+ " \n",
137
+ " table_data.append(row)\n",
138
+ "\n",
139
+ "df_top_10 = pd.DataFrame(table_data)\n",
140
+ "\n",
141
+ "print(\"\\n📋 TOP 10 CONFIGURATIONS WITH IMPROVEMENTS IN ALL METRICS:\")\n",
142
+ "print(\"=\" * 180)\n",
143
+ "print(df_top_10.to_string(index=False))\n",
144
+ "print(\"=\" * 180)\n",
145
+ "\n",
146
+ "# ============================================================================\n",
147
+ "# SECTION 4: Visualize and Summary Statistics\n",
148
+ "# ============================================================================\n",
149
+ "print(\"\\n\" + \"=\" * 80)\n",
150
+ "print(\"SUMMARY STATISTICS\")\n",
151
+ "print(\"=\" * 80)\n",
152
+ "\n",
153
+ "# Extract numeric improvement values for analysis\n",
154
+ "improvement_summary = []\n",
155
+ "for result in top_10:\n",
156
+ " improvements = result[\"improvements\"]\n",
157
+ " for metric in [\"imagereward_improvement\", \"clip_improvement\", \"aesthetic_improvement\", \n",
158
+ " \"pickscore_improvement\", \"hpsv2_improvement\"]:\n",
159
+ " metric_name = metric.replace(\"_improvement\", \"\").upper()\n",
160
+ " improvement_summary.append({\n",
161
+ " \"Metric\": metric_name,\n",
162
+ " \"Improvement %\": improvements.get(metric, 0)\n",
163
+ " })\n",
164
+ "\n",
165
+ "df_summary = pd.DataFrame(improvement_summary)\n",
166
+ "\n",
167
+ "print(\"\\n📊 Average Improvements by Metric (Top 10):\")\n",
168
+ "metric_stats = df_summary.groupby(\"Metric\")[\"Improvement %\"].agg([\"mean\", \"std\", \"min\", \"max\"])\n",
169
+ "print(metric_stats.round(2))\n",
170
+ "\n",
171
+ "print(\"\\n📈 Best Configuration Details:\")\n",
172
+ "best = top_10[0]\n",
173
+ "best_cfg = best[\"config\"]\n",
174
+ "best_metrics = best[\"metrics\"]\n",
175
+ "best_improvements = best[\"improvements\"]\n",
176
+ "\n",
177
+ "print(f\"\\n✓ RANK #1 - Best Performing Configuration:\")\n",
178
+ "print(f\" Configuration:\")\n",
179
+ "print(f\" • CFG Scale: {best_cfg.get('cfg_scale')}\")\n",
180
+ "print(f\" • Gradient Config: {best_cfg.get('grad_config')}\")\n",
181
+ "print(f\" • Gradient Steps: {best_cfg.get('num_grad_steps')}\")\n",
182
+ "print(f\" • Step Size: {best_cfg.get('grad_step_size')}\")\n",
183
+ "print(f\" • Momentum: {best_cfg.get('momentum')}\")\n",
184
+ "print(f\"\\n Metrics:\")\n",
185
+ "for metric in [\"imagereward\", \"clip\", \"aesthetic\", \"pickscore\", \"hpsv2\"]:\n",
186
+ " baseline_val = baseline_metrics.get(metric, 0)\n",
187
+ " current_val = best_metrics.get(metric, 0)\n",
188
+ " improvement = best_improvements.get(f\"{metric}_improvement\", 0)\n",
189
+ " print(f\" • {metric:12s}: {current_val:8.6f} (baseline: {baseline_val:8.6f}) ↑ {improvement:+6.2f}%\")\n",
190
+ "\n",
191
+ "print(\"\\n\" + \"=\" * 80)\n",
192
+ "print(\"✓ ANALYSIS COMPLETE - TOP 10 CONFIGURATIONS IDENTIFIED\")\n",
193
+ "print(\"=\" * 80)"
194
+ ]
195
+ }
196
+ ],
197
+ "metadata": {
198
+ "kernelspec": {
199
+ "display_name": "Python 3",
200
+ "language": "python",
201
+ "name": "python3"
202
+ },
203
+ "language_info": {
204
+ "codemirror_mode": {
205
+ "name": "ipython",
206
+ "version": 3
207
+ },
208
+ "file_extension": ".py",
209
+ "mimetype": "text/x-python",
210
+ "name": "python",
211
+ "nbconvert_exporter": "python",
212
+ "pygments_lexer": "ipython3",
213
+ "version": "3.10.18"
214
+ }
215
+ },
216
+ "nbformat": 4,
217
+ "nbformat_minor": 5
218
+ }
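One loose end in the notebook: datetime is imported but never used. A natural follow-up cell would stamp an export of the top-10 table (a sketch reusing names defined above):

    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    df_top_10.to_csv(results_dir / f"top10_configs_{stamp}.csv", index=False)
    print(f"✓ Saved top-10 table to top10_configs_{stamp}.csv")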
Reward_sana_idealized/eval.py ADDED
@@ -0,0 +1,1447 @@
1
+ """
2
+ Evaluation script for comparing baseline and gradient ascent pipelines using multiple metrics.
3
+
4
+ This script evaluates both pipelines on COCO or Pick-a-Pic validation sets and computes
5
+ various preference and quality metrics.
6
+ """
7
+ import warnings
8
+ warnings.filterwarnings("ignore")
9
+ import torch
10
+ import torch.nn as nn
11
+ import json
12
+ import os
13
+ import sys
14
+ import logging
15
+ from glob import glob
16
+ from pathlib import Path
17
+ from PIL import Image
18
+ from diffusers import SanaPipeline
19
+ from models import LRMRewardModel
20
+ from pipelines.sana_gradient_ascent_pipeline import SanaGradientAscentPipeline
21
+ from torchmetrics.image.fid import FrechetInceptionDistance
22
+ from torchmetrics.multimodal import CLIPScore
23
+ from transformers import CLIPModel, CLIPProcessor
24
+ from tqdm import tqdm
25
+ import numpy as np
26
+ import argparse
27
+ from datasets import load_dataset
28
+ from grad_ascent_configs import get_config, list_configs
29
+ import matplotlib.pyplot as plt
30
+ import matplotlib
31
+ matplotlib.use('Agg') # Use non-interactive backend
32
+
33
+ from huggingface_hub import hf_hub_download
34
+
35
+ import random
36
+
37
+
38
+ SANA_PROFILE_TO_MODEL_ID = {
39
+ "sana_600m_512": "Efficient-Large-Model/Sana_600M_512px_diffusers",
40
+ "sana_1600m_512": "Efficient-Large-Model/Sana_1600M_512px_diffusers",
41
+ "sana_sprint_0_6b_1024": "Efficient-Large-Model/Sana_Sprint_0.6B_1024px_diffusers",
42
+ "sana_sprint_1_6b_1024": "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers",
43
+ }
44
+
45
+
46
+ def configure_hf_runtime(hf_cache_dir=None, force_offline=False):
47
+ """Set Hugging Face cache/offline environment for cluster-safe execution."""
48
+ cache_dir = hf_cache_dir or os.getenv("HF_HUB_CACHE") or os.getenv("HUGGINGFACE_HUB_CACHE")
49
+ if cache_dir:
50
+ os.environ["HF_HUB_CACHE"] = cache_dir
51
+ os.environ["HUGGINGFACE_HUB_CACHE"] = cache_dir
52
+ os.environ["HF_HOME"] = os.path.dirname(cache_dir)
53
+
54
+ env_offline = os.getenv("HF_HUB_OFFLINE", "0").strip().lower() in {"1", "true", "yes", "on"}
55
+ offline_enabled = bool(force_offline or env_offline)
56
+ if offline_enabled:
57
+ os.environ["HF_DATASETS_OFFLINE"] = "1"
58
+ os.environ["HF_METRICS_OFFLINE"] = "1"
59
+ os.environ["HF_MODULES_OFFLINE"] = "1"
60
+ os.environ["TRANSFORMERS_OFFLINE"] = "1"
61
+ os.environ["DIFFUSERS_OFFLINE"] = "1"
62
+ os.environ["HF_HUB_OFFLINE"] = "1"
63
+
64
+ return cache_dir, offline_enabled
65
+
66
+
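+ # Usage sketch (paths are site-specific; values mirror the run logs above):
+ #   cache_dir, offline = configure_hf_runtime(
+ #       "/scratch/rr81/ma5430/.cache/huggingface/hub", force_offline=True)
+ #   exports HF_HUB_CACHE/HF_HOME and switches every HF library offline.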
67
+ def resolve_default_lrm_model():
68
+ """Prefer the local SANA reward checkpoint when available."""
69
+ project_root = Path(__file__).resolve().parents[1]
70
+ default_ckpt_dir = (
71
+ project_root
72
+ / "lrm"
73
+ / "lrm_sana"
74
+ / "logs"
75
+ / "v8"
76
+ / "reward_model"
77
+ / "step_sana_sana_600m_512_variable-t_lr1e-5_step-8000_filter2_time951"
78
+ / "checkpoint-gstep76000"
79
+ )
80
+ if default_ckpt_dir.exists():
81
+ return str(default_ckpt_dir)
82
+ return ""
83
+
84
+
85
+ def load_pickapic_prompts(max_samples=None, cache_dir=None, offline=False):
86
+ """Load Pick-a-Pic prompts with robust offline fallback to cached parquet shards."""
87
+ split = "validation_unique"
88
+
89
+ if not offline:
90
+ try:
91
+ ds = load_dataset("pickapic-anonymous/pickapic_v1", split=split, streaming=True)
92
+ prompts = []
93
+ for i, sample in enumerate(ds):
94
+ prompts.append(sample["caption"])
95
+ if max_samples and i + 1 >= max_samples:
96
+ break
97
+ return prompts
98
+ except Exception as e:
99
+ print(f"Warning: online streaming load failed ({e}). Trying cached offline parquet shards.")
100
+
101
+ cache_candidates = []
102
+ for p in [
103
+ cache_dir,
104
+ os.getenv("HF_HUB_CACHE"),
105
+ os.getenv("HUGGINGFACE_HUB_CACHE"),
106
+ (os.path.join(os.getenv("HF_HOME"), "hub") if os.getenv("HF_HOME") else None),
107
+ os.path.expanduser("~/.cache/huggingface/hub"),
108
+ "/scratch/rr81/ma5430/.cache/huggingface/hub",
109
+ ]:
110
+ if p and p not in cache_candidates:
111
+ cache_candidates.append(p)
112
+
113
+ for cache_root in cache_candidates:
114
+ repo_cache = os.path.join(cache_root, "datasets--pickapic-anonymous--pickapic_v1")
115
+ if not os.path.isdir(repo_cache):
116
+ continue
117
+
118
+ snapshot_dir = None
119
+ ref_main = os.path.join(repo_cache, "refs", "main")
120
+ if os.path.isfile(ref_main):
121
+ revision = open(ref_main, "r", encoding="utf-8").read().strip()
122
+ candidate = os.path.join(repo_cache, "snapshots", revision)
123
+ if os.path.isdir(candidate):
124
+ snapshot_dir = candidate
125
+
126
+ if snapshot_dir is None:
127
+ snapshots = sorted(glob(os.path.join(repo_cache, "snapshots", "*")))
128
+ if snapshots:
129
+ snapshot_dir = snapshots[-1]
130
+
131
+ if snapshot_dir is None:
132
+ continue
133
+
134
+ data_dir = os.path.join(snapshot_dir, "data")
135
+ if not os.path.isdir(data_dir):
136
+ continue
137
+
138
+ selected_split = split
139
+ parquet_files = sorted(glob(os.path.join(data_dir, f"{selected_split}-*.parquet")))
140
+ if not parquet_files:
141
+ for alt_split in ("test_unique", "test"):
142
+ alt_files = sorted(glob(os.path.join(data_dir, f"{alt_split}-*.parquet")))
143
+ if alt_files:
144
+ selected_split = alt_split
145
+ parquet_files = alt_files
146
+ print(f"Offline cache missing split '{split}', falling back to '{selected_split}'.")
147
+ break
148
+
149
+ if not parquet_files:
150
+ continue
151
+
152
+ print(
153
+ f"Loading cached Pick-a-Pic split '{selected_split}' from {len(parquet_files)} parquet shards\n"
154
+ f"cache={repo_cache}"
155
+ )
156
+ ds = load_dataset("parquet", data_files=parquet_files, split="train")
157
+ prompts = ds["caption"]
158
+ if max_samples:
159
+ prompts = prompts[:max_samples]
160
+ return list(prompts)
161
+
162
+ raise RuntimeError(
163
+ "Could not load pickapic prompts in offline mode. "
164
+ "Set --hf_cache_dir to a cache that contains datasets--pickapic-anonymous--pickapic_v1."
165
+ )
166
+
167
+
168
+ def resolve_scorer_device(requested_device, generation_device, min_free_gb_for_gpu=14.0):
169
+ """Choose where metric scorers should run to avoid GPU OOM/cudnn init failures."""
170
+ if requested_device == "cpu":
171
+ return "cpu"
172
+
173
+ if not torch.cuda.is_available() or not str(generation_device).startswith("cuda"):
174
+ return "cpu"
175
+
176
+ if requested_device == "cuda":
177
+ return generation_device
178
+
179
+ # Auto mode: only keep scorers on GPU if enough headroom remains after loading generation models.
180
+ try:
181
+ free_bytes, total_bytes = torch.cuda.mem_get_info(torch.device(generation_device))
182
+ free_gb = free_bytes / (1024 ** 3)
183
+ total_gb = total_bytes / (1024 ** 3)
184
+ print(f"GPU memory before scorer load: {free_gb:.2f} GB free / {total_gb:.2f} GB total")
185
+ if free_gb >= min_free_gb_for_gpu:
186
+ return generation_device
187
+ print(
188
+ f"⚠ Low free VRAM ({free_gb:.2f} GB). Running scorers on CPU to keep diffusion stable. "
189
+ f"Use --scorer_device cuda to force GPU scorers."
190
+ )
191
+ return "cpu"
192
+ except Exception as e:
193
+ print(f"Warning: could not inspect CUDA free memory ({e}). Falling back to CPU scorers.")
194
+ return "cpu"
195
+
196
+
197
+ def configure_cudnn_safely(device):
198
+ """Disable cuDNN when the current GPU or runtime cannot initialize it safely."""
199
+ if not torch.cuda.is_available() or not str(device).startswith("cuda"):
200
+ return
201
+
202
+ try:
203
+ major, minor = torch.cuda.get_device_capability(torch.device(device))
204
+ if (major, minor) < (7, 5):
205
+ print(
206
+ f"⚠ Detected compute capability sm_{major}{minor} (< 75). "
207
+ "Disabling cuDNN to prevent runtime initialization failures."
208
+ )
209
+ torch.backends.cudnn.enabled = False
210
+ return
211
+
212
+ # Force a cuDNN init probe early so failures are handled once at startup.
213
+ _ = torch.backends.cudnn.version()
214
+ except Exception as e:
215
+ print(f"⚠ cuDNN init probe failed ({e}). Disabling cuDNN for this run.")
216
+ torch.backends.cudnn.enabled = False
217
+
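+ # Example: an sm_70 card (e.g. V100) trips the capability check above and
+ # cuDNN is disabled up front; on sm_80+ (A100/H100) only the init probe runs.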
218
+
219
+ def resolve_generation_dtype(requested_dtype, device):
220
+ """Pick a safe generation dtype for the current device."""
221
+ req = str(requested_dtype).strip().lower()
222
+
223
+ if req == "fp32":
224
+ return torch.float32
225
+
226
+ if not str(device).startswith("cuda"):
227
+ if req in {"fp16", "bf16", "auto"}:
228
+ print("⚠ Non-CUDA device detected. Falling back to fp32.")
229
+ return torch.float32
230
+
231
+ if req == "fp16":
232
+ return torch.float16
233
+
234
+ if req == "bf16":
235
+ if torch.cuda.is_bf16_supported():
236
+ return torch.bfloat16
237
+ print("⚠ bf16 requested but not supported on this GPU. Falling back to fp16.")
238
+ return torch.float16
239
+
240
+ # auto: prefer bf16 on supported GPUs to avoid fp16 underflow in tiny gradients.
241
+ if torch.cuda.is_bf16_supported():
242
+ return torch.bfloat16
243
+ return torch.float16
244
+
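+ # Resolution sketch (assuming a bf16-capable GPU such as an A100):
+ #   resolve_generation_dtype("auto", "cuda:0") -> torch.bfloat16
+ #   resolve_generation_dtype("fp16", "cuda:0") -> torch.float16
+ #   resolve_generation_dtype("bf16", "cpu")    -> torch.float32 (downgraded, with a warning)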
245
+
246
+ def dtype_to_name(dtype: torch.dtype) -> str:
247
+ if dtype == torch.float16:
248
+ return "fp16"
249
+ if dtype == torch.bfloat16:
250
+ return "bf16"
251
+ return "fp32"
252
+
253
+
254
+ def seed_everything(seed: int):
255
+ """Locks down all random number generators for absolute reproducibility."""
256
+ # 1. Python & Numpy
257
+ random.seed(seed)
258
+ np.random.seed(seed)
259
+
260
+ # 2. PyTorch Base
261
+ torch.manual_seed(seed)
262
+ if torch.cuda.is_available():
263
+ torch.cuda.manual_seed(seed)
264
+ torch.cuda.manual_seed_all(seed) # For multi-GPU
265
+
266
+ # 3. cuDNN Determinism (Crucial for consistent gradients)
267
+ torch.backends.cudnn.deterministic = True
268
+ torch.backends.cudnn.benchmark = False
269
+
270
+ # 4. Optional: Force deterministic algorithms for PyTorch 2.0+
271
+ # Uncomment if variance persists, but it may slow down generation slightly
272
+ # torch.use_deterministic_algorithms(True)
273
+
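+ # Note: seed_everything() pins cuDNN to deterministic kernels; if
+ # configure_cudnn_safely() later disables cuDNN entirely, PyTorch simply
+ # falls back to its native (still seeded) implementations.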
274
+
275
+ class MLP(nn.Module):
276
+ """MLP for aesthetic scoring."""
277
+ def __init__(self):
278
+ super().__init__()
279
+ self.layers = nn.Sequential(
280
+ nn.Linear(768, 1024),
281
+ nn.Dropout(0.2),
282
+ nn.Linear(1024, 128),
283
+ nn.Dropout(0.2),
284
+ nn.Linear(128, 64),
285
+ nn.Dropout(0.1),
286
+ nn.Linear(64, 16),
287
+ nn.Linear(16, 1),
288
+ )
289
+
290
+ @torch.no_grad()
291
+ def forward(self, embed):
292
+ return self.layers(embed)
293
+
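+ # Shape note: this head mirrors the LAION aesthetic-predictor layout and maps
+ # L2-normalised CLIP ViT-L/14 image embeddings of shape [B, 768] to [B, 1] scores.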
294
+
295
+ class AestheticScorer(torch.nn.Module):
296
+ """Aesthetic scorer using CLIP and MLP."""
297
+ def __init__(self, dtype, device, clip_name_or_path="openai/clip-vit-large-patch14",
298
+ aesthetic_path="./sac+logos+ava1-l14-linearMSE.pth"):
299
+ super().__init__()
300
+ self.clip = CLIPModel.from_pretrained(clip_name_or_path)
301
+ self.processor = CLIPProcessor.from_pretrained(clip_name_or_path)
302
+ self.mlp = MLP()
303
+
304
+ # Load aesthetic weights
305
+ if os.path.exists(aesthetic_path):
306
+ state_dict = torch.load(aesthetic_path, map_location='cpu')
307
+ self.mlp.load_state_dict(state_dict)
308
+ else:
309
+ print(f"Warning: Aesthetic weights not found at {aesthetic_path}")
310
+
311
+ self.dtype = dtype
+ # Cast the whole module to the scorer dtype so the CLIP weights match the
+ # dtype the inputs are cast to in __call__ (avoids fp32/bf16 matmul mismatches).
+ self.to(device=device, dtype=dtype)
+ self.eval()
314
+
315
+ @torch.no_grad()
316
+ def __call__(self, images):
317
+ device = next(self.parameters()).device
318
+ inputs = self.processor(images=images, return_tensors="pt")
319
+ inputs = {k: v.to(self.dtype).to(device) for k, v in inputs.items()}
320
+ embed = self.clip.get_image_features(**inputs)
321
+ # normalize embedding
322
+ embed = embed / torch.linalg.vector_norm(embed, dim=-1, keepdim=True)
323
+ return self.mlp(embed).squeeze(1)
324
+
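+ # Minimal usage sketch (assumes the aesthetic weight file above is present):
+ #   scorer = AestheticScorer(dtype=torch.float32, device="cpu")
+ #   scores = scorer([Image.open("sample.png").convert("RGB")])  # tensor of shape [1]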
325
+
326
+ class TeeLogger:
327
+ """Logger that writes to both console and file."""
328
+ def __init__(self, log_file):
329
+ self.terminal = sys.stdout
330
+ self.log = open(log_file, 'w', encoding='utf-8')  # logs contain non-ASCII markers (✓, ⚠)
331
+
332
+ def write(self, message):
333
+ self.terminal.write(message)
334
+ self.log.write(message)
335
+ self.log.flush()
336
+
337
+ def flush(self):
338
+ self.terminal.flush()
339
+ self.log.flush()
340
+
341
+ def close(self):
342
+ self.log.close()
343
+
344
+
345
+ def setup_logging(output_dir):
346
+ """Setup logging to both console and file."""
347
+ output_path = Path(output_dir)
348
+ output_path.mkdir(parents=True, exist_ok=True)
349
+ log_file = output_path / "log.log"
350
+
351
+ # Redirect stdout to both console and file
352
+ tee = TeeLogger(log_file)
353
+ sys.stdout = tee
354
+
355
+ return tee, log_file
356
+
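+ # Usage sketch: after setup_logging(), print() output goes to both the console
+ # and <output_dir>/log.log:
+ #   tee, log_path = setup_logging("eval_outputs/run_1")  # hypothetical dir
+ #   print("hello")  # shown on screen and appended to run_1/log.log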
357
+
358
+ def load_validation_data(data_dir, max_samples=None, dataset_type="coco", hf_cache_dir=None, offline=False):
359
+ """Load validation prompts and image paths.
360
+
361
+ Args:
362
+ data_dir: Path to data directory
363
+ max_samples: Maximum number of samples to load
364
+ dataset_type: Type of dataset ("coco" or "pickapic")
+ hf_cache_dir: Optional HF cache directory (used for the pickapic path)
+ offline: If True, resolve pickapic prompts from the local cache only
365
+
366
+ Returns:
367
+ prompts: List of text prompts
368
+ image_paths: List of image paths (None for pickapic streaming dataset)
369
+ """
370
+ if dataset_type == "coco":
371
+ data_dir = Path(data_dir)
372
+ val_json = data_dir / "coco" / "caption_val.json"
373
+
374
+ if not val_json.exists():
375
+ raise FileNotFoundError(f"Validation JSON not found: {val_json}")
376
+
377
+ with open(val_json, 'r') as f:
378
+ data = json.load(f)
379
+
380
+ # Validate that image folder exists
381
+ val_img_dir = data_dir / "coco" / "images" / "val"
382
+ if not val_img_dir.exists():
383
+ raise FileNotFoundError(f"Validation image directory not found: {val_img_dir}")
384
+
385
+ # Parse data
386
+ prompts = []
387
+ image_paths = []
388
+ for img_path, caption in data.items():
389
+ full_path = data_dir / "coco" / img_path
390
+ if full_path.exists():
391
+ prompts.append(caption)
392
+ image_paths.append(str(full_path))
393
+ else:
394
+ print(f"Warning: Image not found: {full_path}")
395
+
396
+ if max_samples:
397
+ prompts = prompts[:max_samples]
398
+ image_paths = image_paths[:max_samples]
399
+
400
+ print(f"Loaded {len(prompts)} COCO validation samples")
401
+ return prompts, image_paths
402
+
403
+ elif dataset_type == "pickapic":
404
+ print("Loading Pick-a-Pic validation prompts...")
405
+ prompts = load_pickapic_prompts(max_samples=max_samples, cache_dir=hf_cache_dir, offline=offline)
406
+
407
+ print(f"Loaded {len(prompts)} Pick-a-Pic validation samples")
408
+ return prompts, None # No reference images for Pick-a-Pic
409
+
410
+ else:
411
+ raise ValueError(f"Unknown dataset type: {dataset_type}. Choose 'coco' or 'pickapic'.")
412
+
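+ # Return-shape sketch:
+ #   prompts, paths = load_validation_data("./data", 100, "coco")      # two lists, len <= 100
+ #   prompts, paths = load_validation_data("./data", 100, "pickapic")  # paths is None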
413
+
414
+ def generate_and_evaluate(
415
+ pipeline,
416
+ prompts,
417
+ image_paths,
418
+ device,
419
+ dtype,
420
+ num_inference_steps=20,
421
+ guidance_scale=7.5,
422
+ seed=42,
423
+ batch_size=1,
424
+ apply_gradient_ascent=False,
425
+ mode_name="baseline",
426
+ log_interval=10,
427
+ output_dir=None,
428
+ save_images=False,
429
+ clip_scorer=None,
430
+ aesthetic_scorer=None,
431
+ pick_scorer=None,
432
+ hpsv2_scorer=None,
433
+ hpsv21_scorer=None,
434
+ imagereward_scorer=None,
435
+ compute_fid=True,
436
+ capture_trajectory=False
437
+ ):
438
+ """Generate images for the given prompts, score them with the enabled metrics, and optionally accumulate a per-batch FID metric."""
439
+ pipeline.to(device)
440
+
441
+ print(f"\nGenerating images with {mode_name} mode...")
442
+
443
+
444
+ all_rewards = []
445
+ all_clip_scores = []
446
+ all_aesthetic_scores = []
447
+ all_pick_scores = []
448
+ all_hpsv2_scores = []
449
+ all_hpsv21_scores = []
450
+ all_imagereward_scores = []
451
+ lr_history_first_image = None # Store LR history for first image
452
+ trajectory_first_image = []
453
+ num_batches = (len(prompts) + batch_size - 1) // batch_size
454
+
455
+ # Create output directory if saving images
456
+ if save_images and output_dir:
457
+ mode_output_dir = Path(output_dir) / mode_name
458
+ mode_output_dir.mkdir(parents=True, exist_ok=True)
459
+
460
+ # Disable internal progress bars
461
+ pipeline.set_progress_bar_config(disable=True)
462
+
463
+ for idx, i in enumerate(tqdm(range(0, len(prompts), batch_size), desc=f"Generating {mode_name}")):
464
+ batch_prompts = prompts[i:i+batch_size]
465
+ batch_real_paths = image_paths[i:i+batch_size] if image_paths is not None else None
466
+ batch_num = idx + 1
467
+
468
+ # Initialize FID metric if needed
469
+ fid_metric = None
470
+ real_images_tensor = None
471
+
472
+ if compute_fid and batch_real_paths is not None:
473
+ fid_metric = FrechetInceptionDistance().to(device)
474
+
475
+ # Load and update FID with real images for this batch
476
+ real_images = []
477
+ for path in batch_real_paths:
478
+ img = Image.open(path).convert("RGB")
479
+ img = img.resize((512, 512))  # uniform size for FID (the Inception backbone resizes to 299x299 itself)
480
+ img_array = np.array(img)
481
+ real_images.append(img_array)
482
+
483
+ # Convert to tensor [B, H, W, C] -> [B, C, H, W]
484
+ real_images_tensor = torch.from_numpy(np.stack(real_images)).permute(0, 3, 1, 2).float()
485
+ real_images_tensor = real_images_tensor.to(device)
486
+
487
+ # Generate images
488
+ generator = torch.Generator(device=device).manual_seed(seed + i)
489
+
490
+ # Only capture trajectory for the very first batch to save RAM
491
+ def trajectory_callback(step, timestep, latents):
492
+ if idx == 0 and capture_trajectory:
493
+ # Detach and move to CPU immediately to prevent VRAM OOM
494
+ trajectory_first_image.append(latents.detach().cpu().clone())
495
+
496
+ with torch.no_grad():
497
+ result = pipeline(
498
+ prompt=batch_prompts,
499
+ num_inference_steps=num_inference_steps,
500
+ guidance_scale=guidance_scale,
501
+ generator=generator,
502
+ track_rewards=True,
503
+ print_rewards=False,
504
+ apply_gradient_ascent=apply_gradient_ascent,
505
+ verbose_grad=False,
506
+ callback=trajectory_callback if capture_trajectory else None,
507
+ callback_steps=1
508
+ )
509
+
510
+ # Process generated images
511
+ images = result.images
512
+
513
+ # Update FID metric if computing it
514
+ if compute_fid and fid_metric is not None:
515
+ image_tensors = []
516
+
517
+ for img in images:
518
+ img_resized = img.resize((512, 512))  # uniform size for FID (the Inception backbone resizes to 299x299 itself)
519
+ img_array = np.array(img_resized)
520
+ image_tensors.append(img_array)
521
+
522
+ # Convert to tensor and update FID
523
+ images_tensor = torch.from_numpy(np.stack(image_tensors)).permute(0, 3, 1, 2).float()
524
+ images_tensor = images_tensor.to(device)
525
+
526
+ # FID expects uint8 inputs (normalize=False) and needs at least two samples
+ # per distribution to estimate covariance, so duplicate when batch_size == 1.
+ real_images_tensor = real_images_tensor.to(dtype=torch.uint8)
+ images_tensor = images_tensor.to(dtype=torch.uint8)
+ if batch_size == 1:
+ real_images_tensor = torch.cat([real_images_tensor, real_images_tensor], dim=0)
+ images_tensor = torch.cat([images_tensor, images_tensor], dim=0)
+ fid_metric.update(real_images_tensor, real=True)
+ fid_metric.update(images_tensor, real=False)
531
+
532
+ # Track rewards - get the final timestep reward (t=0)
533
+ current_batch_final_reward = None
534
+ current_batch_final_timestep = None
535
+ if hasattr(pipeline, 'reward_history') and pipeline.reward_history:
536
+ # Use the reward from the last denoising step (t=0 or closest to 0)
538
+
539
+ # Get the last entry which corresponds to the final timestep of the last image in batch
540
+ final_entry = pipeline.reward_history[-1]
541
+ current_batch_final_reward = final_entry['reward_score']
542
+ current_batch_final_timestep = final_entry['timestep']
543
+ all_rewards.append(current_batch_final_reward)
544
+
545
+ # Capture LR history from first image if gradient ascent is enabled
546
+ if apply_gradient_ascent and idx == 0 and lr_history_first_image is None:
547
+ if hasattr(pipeline, 'grad_guidance') and pipeline.grad_guidance:
548
+ grad_stats = pipeline.grad_guidance.get_statistics()
549
+ if grad_stats and 'detailed_stats' in grad_stats:
550
+ # Extract LR history from the gradient ascent statistics
551
+ lr_history_first_image = {
552
+ 'prompt': batch_prompts[0],
553
+ 'timesteps': [],
554
+ 'learning_rates': [], # All LR values from all gradient steps
555
+ 'rewards': []
556
+ }
557
+ for stat in grad_stats['detailed_stats']:
558
+ lr_history_first_image['timesteps'].append(stat['timestep'])
559
+ if 'lr_history' in stat:
560
+ # Extend with all LR values from this timestep's gradient steps
561
+ lr_history_first_image['learning_rates'].extend(stat['lr_history'])
562
+ # Collect all rewards from reward_history for each gradient step
563
+ if 'reward_history' in stat:
564
+ lr_history_first_image['rewards'].extend(stat['reward_history'])
565
+
566
+ # Compute CLIP score
567
+ if clip_scorer is not None:
568
+ clip_device = next(clip_scorer.parameters()).device
569
+ # Convert PIL images to tensor format for CLIP score [C, H, W] in range [0, 1]
570
+ for img, prompt in zip(images, batch_prompts):
571
+ img_array = np.array(img).astype(np.float32)
572
+ img_tensor = torch.from_numpy(img_array).permute(2, 0, 1).unsqueeze(0).to(clip_device)
573
+ clip_score = clip_scorer(img_tensor, [prompt]).item()
574
+ all_clip_scores.append(clip_score)
575
+
576
+ # Compute aesthetic score
577
+ if aesthetic_scorer is not None:
578
+ aesthetic_scores = aesthetic_scorer(images)
+ if isinstance(aesthetic_scores, torch.Tensor):
+ aesthetic_scores = aesthetic_scores.detach().cpu().numpy()
+ # np.atleast_1d handles both scalar (0-d) and batched outputs uniformly.
+ all_aesthetic_scores.extend(np.atleast_1d(aesthetic_scores).tolist())
584
+
585
+ # Compute PickScore
586
+ if pick_scorer is not None:
587
+ for img, prompt in zip(images, batch_prompts):
588
+ pick_score = pick_scorer(prompt, [img])[0]
589
+ all_pick_scores.append(pick_score)
590
+
591
+ # Compute HPSv2 score
592
+ if hpsv2_scorer is not None:
593
+ for img, prompt in zip(images, batch_prompts):
594
+ hpsv2_score = hpsv2_scorer.score(img, prompt)[0]
595
+ all_hpsv2_scores.append(hpsv2_score)
596
+
597
+ # Compute HPSv2.1 score
598
+ if hpsv21_scorer is not None:
599
+ for img, prompt in zip(images, batch_prompts):
600
+ hpsv21_score = hpsv21_scorer.score(img, prompt)[0]
601
+ all_hpsv21_scores.append(hpsv21_score)
602
+
603
+ # Compute ImageReward score
604
+ if imagereward_scorer is not None:
605
+ for img, prompt in zip(images, batch_prompts):
606
+ imagereward_score = imagereward_scorer.score(prompt, img)
607
+ all_imagereward_scores.append(imagereward_score)
608
+
609
+ # Save generated images if requested
610
+ if save_images and output_dir:
611
+ for img_idx, img in enumerate(images):
612
+ global_idx = i + img_idx
613
+ img_path = mode_output_dir / f"sample_{global_idx:05d}.png"
614
+ img.save(img_path)
615
+
616
+ # Log intermediate FID and metrics every log_interval batches
617
+ if batch_num % log_interval == 0 or batch_num == num_batches:
618
+ num_samples_processed = min(i + batch_size, len(prompts))
619
+ log_msg = f"\n[{mode_name}] Batch {batch_num}/{num_batches} | Samples: {num_samples_processed}/{len(prompts)}"
620
+
621
+ # Add FID if computing
622
+ if compute_fid and fid_metric is not None:
623
+ try:
624
+ current_fid = fid_metric.compute().item()
625
+ log_msg += f" | FID: {current_fid:.4f}"
626
+ except Exception:
+ # compute() fails until FID has enough samples in both distributions.
+ log_msg += " | FID: Computing..."
628
+
629
+ # Add reward - show both final timestep reward and average
630
+ if all_rewards:
631
+ avg_reward = np.mean(all_rewards)
632
+ if current_batch_final_reward is not None:
633
+ log_msg += f" | Reward (t={current_batch_final_timestep}): {current_batch_final_reward:.4f}"
634
+ log_msg += f" | Reward (Avg): {avg_reward:.4f}"
635
+ else:
636
+ log_msg += f" | Reward (Avg): {avg_reward:.4f}"
637
+
638
+ # Add CLIP if computing
639
+ if clip_scorer is not None and all_clip_scores:
640
+ log_msg += f" | CLIP: {np.mean(all_clip_scores):.4f}"
641
+
642
+ # Add aesthetic if computing
643
+ if aesthetic_scorer is not None and all_aesthetic_scores:
644
+ log_msg += f" | Aesthetic: {np.mean(all_aesthetic_scores):.4f}"
645
+
646
+ # Add PickScore
647
+ if pick_scorer is not None and all_pick_scores:
648
+ log_msg += f" | PickScore: {np.mean(all_pick_scores):.4f}"
649
+
650
+ # Add HPSv2
651
+ if hpsv2_scorer is not None and all_hpsv2_scores:
652
+ log_msg += f" | HPSv2: {np.mean(all_hpsv2_scores):.4f}"
653
+
654
+ # Add HPSv2.1
655
+ if hpsv21_scorer is not None and all_hpsv21_scores:
656
+ log_msg += f" | HPSv2.1: {np.mean(all_hpsv21_scores):.4f}"
657
+
658
+ # Add ImageReward
659
+ if imagereward_scorer is not None and all_imagereward_scores:
660
+ log_msg += f" | ImageReward: {np.mean(all_imagereward_scores):.4f}"
661
+
662
+ print(log_msg)
663
+
664
+ # Re-enable progress bars
665
+ pipeline.set_progress_bar_config(disable=False)
666
+
667
+ avg_reward = np.mean(all_rewards) if all_rewards else 0.0
668
+ avg_clip_score = np.mean(all_clip_scores) if all_clip_scores else 0.0
669
+ avg_aesthetic_score = np.mean(all_aesthetic_scores) if all_aesthetic_scores else 0.0
670
+ avg_pick_score = np.mean(all_pick_scores) if all_pick_scores else 0.0
671
+ avg_hpsv2_score = np.mean(all_hpsv2_scores) if all_hpsv2_scores else 0.0
672
+ avg_hpsv21_score = np.mean(all_hpsv21_scores) if all_hpsv21_scores else 0.0
673
+ avg_imagereward_score = np.mean(all_imagereward_scores) if all_imagereward_scores else 0.0
674
+
675
+ return avg_reward, fid_metric, avg_clip_score, avg_aesthetic_score, avg_pick_score, avg_hpsv2_score, avg_hpsv21_score, avg_imagereward_score, lr_history_first_image, trajectory_first_image
676
+
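+ # Positional return order (callers in main() unpack all ten values):
+ #   (avg_reward, fid_metric, clip, aesthetic, pickscore, hpsv2, hpsv21,
+ #    imagereward, lr_history_first_image, trajectory_first_image)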
677
+
678
+ def auto_increment_path(base_path):
679
+ """
680
+ Create an auto-incrementing run folder inside base_path.
681
+ Returns: base_path/run_1, base_path/run_2, etc.
682
+ """
683
+ base_path = Path(base_path)
684
+ base_path.mkdir(parents=True, exist_ok=True) # Ensure base directory exists
685
+
686
+ i = 1
687
+ while True:
688
+ new_path = base_path / f"run_{i}"
689
+ if not new_path.exists():
690
+ return new_path
691
+ i += 1
692
+
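+ # e.g. auto_increment_path("eval_outputs") -> eval_outputs/run_1 on a fresh
+ # tree, eval_outputs/run_2 once run_1 exists, and so on.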
693
+
694
+ def main():
695
+ parser = argparse.ArgumentParser(description="Evaluate baseline and gradient ascent pipelines")
696
+ parser.add_argument("--data_dir", type=str, default="./data", help="Path to data directory")
697
+ parser.add_argument("--dataset_type", type=str, default="coco", choices=["coco", "pickapic"],
698
+ help="Dataset to use for evaluation: coco or pickapic (default: coco)")
699
+ parser.add_argument("--base_model", type=str, default=None, help="Override SANA base model repo id")
700
+ parser.add_argument(
701
+ "--model_variant",
702
+ type=str,
703
+ default="sana_600m_512",
704
+ choices=list(SANA_PROFILE_TO_MODEL_ID.keys()),
705
+ help="SANA model profile to use",
706
+ )
707
+ parser.add_argument("--lrm_model", type=str, default=None, help="SANA reward checkpoint path (directory or model.safetensors).")
708
+ parser.add_argument("--hf_cache_dir", type=str, default="/scratch/rr81/ma5430/.cache/huggingface/hub", help="Shared HF cache directory")
709
+ parser.add_argument("--offline", action="store_true", help="Force fully offline mode (recommended on GPU nodes)")
710
+ parser.add_argument("--num_steps", type=int, default=50, help="Number of inference steps")
711
+ parser.add_argument("--cfg_scale", type=float, default=4.5, help="Classifier-free guidance scale")
712
+ parser.add_argument("--dtype", type=str, default="bf16", choices=["auto", "bf16", "fp16", "fp32"],
713
+ help="Generation/reward dtype. bf16 is recommended for tiny gradient stability.")
714
+ parser.add_argument("--seed", type=int, default=42, help="Random seed")
715
+ parser.add_argument("--max_samples", type=int, default=None, help="Max samples to evaluate (None for all)")
716
+ parser.add_argument("--batch_size", type=int, default=1, help="Batch size for generation (use 1 for reward model compatibility)")
717
+ parser.add_argument("--fid_batch_size", type=int, default=32, help="Batch size for FID computation (currently unused; FID updates follow --batch_size)")
718
+ parser.add_argument("--log_interval", type=int, default=10, help="Log FID and metrics every N batches")
719
+ parser.add_argument("--output_dir", type=str, default="eval_outputs", help="Directory to save generated images and results")
720
+ parser.add_argument("--save_images", action="store_true", help="Save all generated images to output directory")
721
+ parser.add_argument("--mode", type=str, default="both", choices=["baseline", "gradient_ascent", "both"],
722
+ help="Which evaluation to run: baseline, gradient_ascent, or both (default: both)")
723
+
724
+ # Metrics selection
725
+ parser.add_argument("--metrics", type=str, nargs="+", default=["clip", "aesthetic"],
726
+ choices=["fid", "clip", "aesthetic", "pickscore", "hpsv2", "hpsv21", "imagereward"],
727
+ help="Which metrics to evaluate (default: clip aesthetic)")
728
+ parser.add_argument("--scorer_device", type=str, default="auto", choices=["auto", "cpu", "cuda"],
729
+ help="Device for metric scorers. auto keeps scorers on GPU only when enough VRAM is free.")
730
+
731
+ # Gradient ascent config
732
+ parser.add_argument("--grad_config", type=str, default=None,
733
+ help=f"Gradient ascent config preset (available: {', '.join(list_configs())}). "
734
+ "If provided, overrides individual grad_* arguments.")
735
+ parser.add_argument("--grad_range_start", type=int, default=0, help="Gradient timestep range start")
736
+ parser.add_argument("--grad_range_end", type=int, default=700, help="Gradient timestep range end")
737
+ parser.add_argument("--grad_steps", type=int, default=5, help="Number of gradient steps per timestep (use 5 for better reward improvement)")
738
+ parser.add_argument("--grad_step_size", type=float, default=0.1, help="Gradient step size (initial LR)")
739
+
740
+ # Config overrides (these override values from grad_config if specified)
741
+ parser.add_argument("--override_momentum", type=float, default=None, help="Override momentum value from grad_config")
742
+ parser.add_argument("--override_num_grad_steps", type=int, default=None, help="Override num_grad_steps from grad_config")
743
+ parser.add_argument("--override_grad_step_size", type=float, default=None, help="Override grad_step_size from grad_config")
744
+
745
+ # Cuda
746
+ parser.add_argument("--cuda", type=int, default=0, help="Use CUDA device id")
747
+
748
+ args = parser.parse_args()
749
+
750
+ hf_cache_dir, offline_enabled = configure_hf_runtime(args.hf_cache_dir, force_offline=args.offline)
751
+ if args.base_model is None:
752
+ args.base_model = SANA_PROFILE_TO_MODEL_ID[args.model_variant]
753
+ if args.lrm_model is None:
754
+ args.lrm_model = resolve_default_lrm_model()
755
+ if not args.lrm_model:
756
+ raise ValueError(
757
+ "No SANA reward checkpoint found. Provide --lrm_model pointing to checkpoint dir or model.safetensors"
758
+ )
759
+
760
+ seed_everything(args.seed)
761
+
762
+ # Configuration
763
+ device = f"cuda:{args.cuda}" if torch.cuda.is_available() else "cpu"
764
+ dtype = resolve_generation_dtype(args.dtype, device)
765
+ os.environ["ACCELERATE_MIXED_PRECISION"] = dtype_to_name(dtype)
766
+ configure_cudnn_safely(device)
767
+
768
+ # Create auto-incremented output directory
769
+ args.output_dir = auto_increment_path(args.output_dir)
770
+
771
+ # Setup logging to file
772
+ tee_logger, log_file = setup_logging(args.output_dir)
773
+
774
+ print("="*70)
775
+ print("EVALUATION: BASELINE vs GRADIENT ASCENT")
776
+ print("="*70)
777
+ print(f"\nLogging to: {log_file}")
778
+ print(f"\nDevice: {device}")
779
+ print(f"Dataset: {args.dataset_type.upper()}")
780
+ print(f"Data directory: {args.data_dir}")
781
+ print(f"Base model: {args.base_model}")
782
+ print(f"Model variant: {args.model_variant}")
783
+ print(f"LRM model: {args.lrm_model}")
784
+ print(f"HF cache dir: {hf_cache_dir or 'default'}")
785
+ print(f"HF offline mode: {offline_enabled}")
786
+ print(f"Inference steps: {args.num_steps}")
787
+ print(f"CFG scale: {args.cfg_scale}")
788
+ print(f"Batch size: {args.batch_size}")
789
+ print(f"Max samples: {args.max_samples or 'All'}")
790
+ print(f"Generation dtype: {dtype_to_name(dtype)}")
791
+ print(f"Output directory: {args.output_dir}")
792
+ print(f"Save images: {args.save_images}")
793
+ print(f"Evaluation mode: {args.mode}")
794
+ print(f"Metrics to evaluate: {', '.join(args.metrics).upper()}")
795
+ if args.grad_config:
796
+ print(f"Gradient ascent config: {args.grad_config}")
797
+
798
+ # Load validation data
799
+ print("\n" + "="*70)
800
+ print("1. LOADING VALIDATION DATA")
801
+ print("="*70)
802
+ prompts, image_paths = load_validation_data(
803
+ args.data_dir,
804
+ args.max_samples,
805
+ args.dataset_type,
806
+ hf_cache_dir=hf_cache_dir,
807
+ offline=offline_enabled,
808
+ )
809
+
810
+ # Automatically disable FID if no reference images available (e.g., Pick-a-Pic dataset)
811
+ can_compute_fid = image_paths is not None
812
+ if not can_compute_fid and "fid" in args.metrics:
813
+ print("\n⚠ Warning: FID metric requested but no reference images available. FID will be skipped.")
814
+ args.metrics = [m for m in args.metrics if m != "fid"]
815
+
816
+ # Load reward model
817
+ print("\n" + "="*70)
818
+ print("2. LOADING REWARD MODEL")
819
+ print("="*70)
820
+ reward_model = LRMRewardModel(
821
+ pretrained_model_name_or_path=args.base_model,
822
+ lrm_model_path=args.lrm_model,
823
+ model_profile=args.model_variant,
824
+ guidance_scale=args.cfg_scale,
825
+ device=device
826
+ )
827
+ if dtype == torch.float16:
828
+ reward_model = reward_model.half()
829
+ elif dtype == torch.bfloat16:
830
+ reward_model = reward_model.to(dtype=torch.bfloat16)
831
+ else:
832
+ reward_model = reward_model.to(dtype=torch.float32)
833
+ reward_model.eval()
834
+ print("✓ Reward model loaded")
835
+
836
+ # Load pipeline
837
+ print("\n" + "="*70)
838
+ print("3. LOADING PIPELINE")
839
+ print("="*70)
840
+
841
+ pretrained_kwargs = {"local_files_only": offline_enabled}
842
+ if hf_cache_dir:
843
+ pretrained_kwargs["cache_dir"] = hf_cache_dir
844
+
845
+ base_pipeline = SanaPipeline.from_pretrained(
846
+ args.base_model,
847
+ torch_dtype=dtype,
848
+ **pretrained_kwargs,
849
+ )
850
+ print(f"✓ Loaded SANA base model: {args.base_model}")
851
+
852
+ pipeline = SanaGradientAscentPipeline(**base_pipeline.components)
853
+ pipeline = pipeline.to(device)
854
+ pipeline.set_reward_model(reward_model)
855
+ print("✓ Pipeline loaded")
856
+
857
+ scorer_device = resolve_scorer_device(args.scorer_device, device)
858
+ scorer_dtype = dtype if str(scorer_device).startswith("cuda") else torch.float32
859
+ print(f"Scorer device: {scorer_device}")
860
+
861
+ if torch.cuda.is_available():
862
+ torch.cuda.empty_cache()
863
+
864
+ # Load CLIP scorer
865
+ print("\n" + "="*70)
866
+ print("3.5. LOADING CLIP AND AESTHETIC SCORERS")
867
+ print("="*70)
868
+
869
+ # Only load scorers for requested metrics
870
+ clip_scorer = None
871
+ aesthetic_scorer = None
872
+ pick_scorer = None
873
+ hpsv2_scorer = None
874
+ hpsv21_scorer = None
875
+ imagereward_scorer = None
876
+
877
+ if "clip" in args.metrics:
878
+ try:
879
+ clip_scorer = CLIPScore(model_name_or_path="openai/clip-vit-large-patch14").to(scorer_device)
880
+ print("✓ CLIP scorer loaded")
881
+ except Exception as e:
882
+ print(f"Warning: Could not load CLIP scorer: {e}")
883
+ clip_scorer = None
884
+ else:
885
+ print("⊘ CLIP scorer skipped (not in selected metrics)")
886
+
887
+ if "aesthetic" in args.metrics:
888
+ try:
889
+ aesthetic_scorer = AestheticScorer(dtype=scorer_dtype, device=scorer_device)
890
+ print("✓ Aesthetic scorer loaded")
891
+ except Exception as e:
892
+ print(f"Warning: Could not load Aesthetic scorer: {e}")
893
+ aesthetic_scorer = None
894
+ else:
895
+ print("⊘ Aesthetic scorer skipped (not in selected metrics)")
896
+
897
+ if "pickscore" in args.metrics:
898
+ try:
899
+ from pick_score import PickScorer
900
+ pick_scorer = PickScorer(
901
+ processor_name_or_path="laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
902
+ model_pretrained_name_or_path="yuvalkirstain/PickScore_v1",
903
+ device=scorer_device
904
+ )
905
+ print("✓ PickScore scorer loaded")
906
+ except Exception as e:
907
+ print(f"Warning: Could not load PickScore scorer: {e}")
908
+ pick_scorer = None
909
+ else:
910
+ print("⊘ PickScore scorer skipped (not in selected metrics)")
911
+
912
+ if "hpsv2" in args.metrics:
913
+ try:
914
+ from hpsv2_score import HPSv2Scorer
915
+ hf_dl_kwargs = {"local_files_only": offline_enabled}
916
+ if hf_cache_dir:
917
+ hf_dl_kwargs["cache_dir"] = hf_cache_dir
918
+ hpsv2_scorer = HPSv2Scorer(
919
+ clip_pretrained_name_or_path=hf_hub_download(
920
+ repo_id="laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
921
+ filename="open_clip_pytorch_model.bin",
922
+ **hf_dl_kwargs,
923
+ ),
924
+ model_pretrained_name_or_path=hf_hub_download(
925
+ repo_id="xswu/HPSv2",
926
+ filename="HPS_v2_compressed.pt",
927
+ **hf_dl_kwargs,
928
+ ),
929
+ device=scorer_device
930
+ )
931
+ print("✓ HPSv2 scorer loaded")
932
+ except Exception as e:
933
+ print(f"Warning: Could not load HPSv2 scorer: {e}")
934
+ hpsv2_scorer = None
935
+ else:
936
+ print("⊘ HPSv2 scorer skipped (not in selected metrics)")
937
+
938
+ if "hpsv21" in args.metrics:
939
+ try:
940
+ from hpsv2_score import HPSv2Scorer
941
+ hf_dl_kwargs = {"local_files_only": offline_enabled}
942
+ if hf_cache_dir:
943
+ hf_dl_kwargs["cache_dir"] = hf_cache_dir
944
+ hpsv21_scorer = HPSv2Scorer(
945
+ clip_pretrained_name_or_path=hf_hub_download(
946
+ repo_id="laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
947
+ filename="open_clip_pytorch_model.bin",
948
+ **hf_dl_kwargs,
949
+ ),
950
+ model_pretrained_name_or_path=hf_hub_download(
951
+ repo_id="xswu/HPSv2",
952
+ filename="HPS_v2.1_compressed.pt",
953
+ **hf_dl_kwargs,
954
+ ),
955
+ device=scorer_device
956
+ )
957
+ print("✓ HPSv2.1 scorer loaded")
958
+ except Exception as e:
959
+ print(f"Warning: Could not load HPSv2.1 scorer: {e}")
960
+ hpsv21_scorer = None
961
+ else:
962
+ print("⊘ HPSv2.1 scorer skipped (not in selected metrics)")
963
+
964
+ if "imagereward" in args.metrics:
965
+ try:
966
+ from imagereward_score import load_imagereward
967
+ hf_dl_kwargs = {"local_files_only": offline_enabled}
968
+ if hf_cache_dir:
969
+ hf_dl_kwargs["cache_dir"] = hf_cache_dir
970
+ imagereward_scorer = load_imagereward(
971
+ model_path=hf_hub_download(repo_id="THUDM/ImageReward", filename="ImageReward.pt", **hf_dl_kwargs),
972
+ med_config=hf_hub_download(repo_id="THUDM/ImageReward", filename="med_config.json", **hf_dl_kwargs),
973
+ device=scorer_device
974
+ )
975
+ print("✓ ImageReward scorer loaded")
976
+ except Exception as e:
977
+ print(f"Warning: Could not load ImageReward scorer: {e}")
978
+ imagereward_scorer = None
979
+ else:
980
+ print("⊘ ImageReward scorer skipped (not in selected metrics)")
981
+
982
+ # Configure gradient ascent
983
+ print("\n" + "="*70)
984
+ print("4. CONFIGURING GRADIENT ASCENT")
985
+ print("="*70)
986
+
987
+ # Use config preset if provided, otherwise use individual args
988
+ if args.grad_config:
989
+ print(f"Loading gradient ascent config: {args.grad_config}")
990
+ grad_config = get_config(args.grad_config)
991
+ print(f"Config loaded: {grad_config}")
992
+
993
+ # Apply overrides if specified
994
+ if args.override_momentum is not None:
995
+ grad_config['momentum'] = args.override_momentum
996
+ print(f" Overriding momentum: {args.override_momentum}")
997
+ if args.override_num_grad_steps is not None:
998
+ grad_config['num_grad_steps'] = args.override_num_grad_steps
999
+ print(f" Overriding num_grad_steps: {args.override_num_grad_steps}")
1000
+ if args.override_grad_step_size is not None:
1001
+ grad_config['grad_step_size'] = args.override_grad_step_size
1002
+ print(f" Overriding grad_step_size: {args.override_grad_step_size}")
1003
+ else:
1004
+ grad_config = {
1005
+ "grad_timestep_range": (args.grad_range_start, args.grad_range_end),
1006
+ "num_grad_steps": args.grad_steps,
1007
+ "grad_step_size": args.grad_step_size,
1008
+ }
1009
+ print("Using manual gradient ascent configuration")
1010
+
1011
+ print(f"Gradient timestep range: {grad_config.get('grad_timestep_range', (args.grad_range_start, args.grad_range_end))}")
1012
+ print(f"Gradient steps: {grad_config.get('num_grad_steps', args.grad_steps)}")
1013
+ print(f"Gradient step size (initial LR): {grad_config.get('grad_step_size', args.grad_step_size)}")
1014
+ if grad_config.get('lr_scheduler_type'):
1015
+ print(f"LR Scheduler: {grad_config['lr_scheduler_type']}")
1016
+ if grad_config.get('use_momentum'):
1017
+ print(f"Momentum: {grad_config.get('momentum', 0.9)} (Nesterov: {grad_config.get('use_nesterov', False)})")
1018
+
1019
+ pipeline.enable_gradient_ascent(**grad_config)
1020
+
1021
+ # Initialize result variables
1022
+ fid_score_baseline = None
1023
+ avg_reward_baseline = None
1024
+ clip_score_baseline = None
1025
+ aesthetic_score_baseline = None
1026
+ pick_score_baseline = None
1027
+ hpsv2_score_baseline = None
1028
+ hpsv21_score_baseline = None
1029
+ imagereward_score_baseline = None
1030
+ fid_score_grad = None
1031
+ avg_reward_grad = None
1032
+ clip_score_grad = None
1033
+ aesthetic_score_grad = None
1034
+ pick_score_grad = None
1035
+ hpsv2_score_grad = None
1036
+ hpsv21_score_grad = None
1037
+ imagereward_score_grad = None
1038
+ grad_stats = None
1039
+
1040
+ # ========== BASELINE EVALUATION ==========
1041
+ if args.mode in ["baseline", "both"]:
1042
+ print("\n" + "="*70)
1043
+ print("5. EVALUATING BASELINE")
1044
+ print("="*70)
1045
+
1046
+ # Generate and evaluate baseline
1047
+ avg_reward_baseline, fid_baseline, clip_score_baseline, aesthetic_score_baseline, pick_score_baseline, hpsv2_score_baseline, hpsv21_score_baseline, imagereward_score_baseline, _, baseline_trajectory = generate_and_evaluate(
1048
+ pipeline=pipeline,
1049
+ prompts=prompts,
1050
+ image_paths=image_paths,
1051
+ device=device,
1052
+ dtype=dtype,
1053
+ num_inference_steps=args.num_steps,
1054
+ guidance_scale=args.cfg_scale,
1055
+ seed=args.seed,
1056
+ batch_size=args.batch_size,
1057
+ apply_gradient_ascent=False,
1058
+ mode_name="baseline",
1059
+ log_interval=args.log_interval,
1060
+ output_dir=args.output_dir,
1061
+ save_images=args.save_images,
1062
+ clip_scorer=clip_scorer,
1063
+ aesthetic_scorer=aesthetic_scorer,
1064
+ pick_scorer=pick_scorer,
1065
+ hpsv2_scorer=hpsv2_scorer,
1066
+ hpsv21_scorer=hpsv21_scorer,
1067
+ imagereward_scorer=imagereward_scorer,
1068
+ compute_fid=("fid" in args.metrics and can_compute_fid),
1069
+ capture_trajectory=True
1070
+ )
1071
+
1072
+ # Compute FID for baseline if requested
1073
+ if "fid" in args.metrics and fid_baseline is not None:
1074
+ fid_score_baseline = fid_baseline.compute().item()
1075
+ print(f"\n✓ Baseline FID: {fid_score_baseline:.4f}")
1076
+ print(f"✓ Baseline Avg Reward: {avg_reward_baseline:.4f}")
1077
+ if "clip" in args.metrics:
1078
+ print(f"✓ Baseline Avg CLIP Score: {clip_score_baseline:.4f}")
1079
+ if "aesthetic" in args.metrics:
1080
+ print(f"✓ Baseline Avg Aesthetic Score: {aesthetic_score_baseline:.4f}")
1081
+ if "pickscore" in args.metrics and pick_score_baseline is not None:
1082
+ print(f"✓ Baseline Avg PickScore: {pick_score_baseline:.4f}")
1083
+ if "hpsv2" in args.metrics and hpsv2_score_baseline is not None:
1084
+ print(f"✓ Baseline Avg HPSv2 Score: {hpsv2_score_baseline:.4f}")
1085
+ if "hpsv21" in args.metrics and hpsv21_score_baseline is not None:
1086
+ print(f"✓ Baseline Avg HPSv2.1 Score: {hpsv21_score_baseline:.4f}")
1087
+ if "imagereward" in args.metrics and imagereward_score_baseline is not None:
1088
+ print(f"✓ Baseline Avg ImageReward: {imagereward_score_baseline:.4f}")
1089
+
1090
+ # ========== GRADIENT ASCENT EVALUATION ==========
1091
+ if args.mode in ["gradient_ascent", "both"]:
1092
+ print("\n" + "="*70)
1093
+ print("6. EVALUATING GRADIENT ASCENT")
1094
+ print("="*70)
1095
+
1096
+ # Generate and evaluate with gradient ascent
1097
+ avg_reward_grad, fid_grad, clip_score_grad, aesthetic_score_grad, pick_score_grad, hpsv2_score_grad, hpsv21_score_grad, imagereward_score_grad, lr_history, guided_trajectory = generate_and_evaluate(
1098
+ pipeline=pipeline,
1099
+ prompts=prompts,
1100
+ image_paths=image_paths,
1101
+ device=device,
1102
+ dtype=dtype,
1103
+ num_inference_steps=args.num_steps,
1104
+ guidance_scale=args.cfg_scale,
1105
+ seed=args.seed,
1106
+ batch_size=args.batch_size,
1107
+ apply_gradient_ascent=True,
1108
+ mode_name="gradient_ascent",
1109
+ log_interval=args.log_interval,
1110
+ output_dir=args.output_dir,
1111
+ save_images=args.save_images,
1112
+ clip_scorer=clip_scorer,
1113
+ aesthetic_scorer=aesthetic_scorer,
1114
+ pick_scorer=pick_scorer,
1115
+ hpsv2_scorer=hpsv2_scorer,
1116
+ hpsv21_scorer=hpsv21_scorer,
1117
+ imagereward_scorer=imagereward_scorer,
1118
+ compute_fid=("fid" in args.metrics and can_compute_fid),
1119
+ capture_trajectory=True
1120
+ )
1121
+
1122
+ # Compute FID for gradient ascent if requested
1123
+ if "fid" in args.metrics and fid_grad is not None:
1124
+ fid_score_grad = fid_grad.compute().item()
1125
+ print(f"\n✓ Gradient Ascent FID: {fid_score_grad:.4f}")
1126
+ print(f"✓ Gradient Ascent Avg Reward: {avg_reward_grad:.4f}")
1127
+ if "clip" in args.metrics:
1128
+ print(f"✓ Gradient Ascent Avg CLIP Score: {clip_score_grad:.4f}")
1129
+ if "aesthetic" in args.metrics:
1130
+ print(f"✓ Gradient Ascent Avg Aesthetic Score: {aesthetic_score_grad:.4f}")
1131
+ if "pickscore" in args.metrics and pick_score_grad is not None:
1132
+ print(f"✓ Gradient Ascent Avg PickScore: {pick_score_grad:.4f}")
1133
+ if "hpsv2" in args.metrics and hpsv2_score_grad is not None:
1134
+ print(f"✓ Gradient Ascent Avg HPSv2 Score: {hpsv2_score_grad:.4f}")
1135
+ if "hpsv21" in args.metrics and hpsv21_score_grad is not None:
1136
+ print(f"✓ Gradient Ascent Avg HPSv2.1 Score: {hpsv21_score_grad:.4f}")
1137
+ if "imagereward" in args.metrics and imagereward_score_grad is not None:
1138
+ print(f"✓ Gradient Ascent Avg ImageReward: {imagereward_score_grad:.4f}")
1139
+
1140
+ # Get gradient stats
1141
+ grad_stats = pipeline.grad_guidance.get_statistics()
1142
+ if grad_stats:
1143
+ print(f"\nGradient Ascent Statistics:")
1144
+ print(f" Applications: {grad_stats['num_applications']}")
1145
+ print(f" Total reward improvement: {grad_stats['total_reward_improvement']:+.4f}")
1146
+ print(f" Avg reward improvement: {grad_stats['avg_reward_improvement']:+.4f}")
1147
+
1148
+ # Plot LR curve if we captured it
1149
+ if lr_history is not None and lr_history['learning_rates']:
1150
+ plot_path = Path(args.output_dir) / "lr_curve.png"
1151
+
1152
+ # LR values are now continuous across all gradient steps
1153
+ lrs = lr_history['learning_rates']
1154
+ steps = list(range(len(lrs))) # Step indices (0 to total_steps-1)
1155
+
1156
+ plt.figure(figsize=(12, 6))
1157
+ plt.plot(steps, lrs, linewidth=2, color='blue', alpha=0.8)
1158
+
1159
+ # Mark the first step with a star
1160
+ plt.plot(steps[0], lrs[0], marker='*', markersize=20, color='gold',
1161
+ markeredgecolor='darkgoldenrod', markeredgewidth=2, zorder=5)
1162
+
1163
+ # Mark timestep boundaries
1164
+ num_timesteps = len(lr_history['timesteps'])
1165
+ num_grad_steps_per_timestep = len(lrs) // num_timesteps if num_timesteps > 0 else 0
1166
+ if num_grad_steps_per_timestep > 0:
1167
+ for i in range(num_timesteps + 1):
1168
+ step_idx = i * num_grad_steps_per_timestep
1169
+ if step_idx <= len(lrs):
1170
+ plt.axvline(x=step_idx, color='red', linestyle='--', alpha=0.3, linewidth=1)
1171
+ if i < num_timesteps:
1172
+ plt.text(step_idx, plt.ylim()[1] * 0.95, f't={lr_history["timesteps"][i]}',
1173
+ fontsize=8, color='red', alpha=0.7, ha='left')
1174
+
1175
+ plt.xlabel('Global Gradient Step', fontsize=12)
1176
+ plt.ylabel('Learning Rate', fontsize=12)
1177
+ plt.title(f'Learning Rate Evolution Across All Gradient Steps\nPrompt: "{lr_history["prompt"][:60]}..."',
1178
+ fontsize=12, fontweight='bold')
1179
+ plt.grid(True, alpha=0.3)
1180
+
1181
+ # Add info text (num_timesteps / num_grad_steps_per_timestep computed above)
1184
+ plt.text(0.02, 0.98,
1185
+ f'Total timesteps: {num_timesteps}\nGrad steps/timestep: {num_grad_steps_per_timestep}\nTotal grad steps: {len(lrs)}',
1186
+ transform=plt.gca().transAxes, fontsize=10, verticalalignment='top',
1187
+ bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
1188
+
1189
+ plt.tight_layout()
1190
+ plt.savefig(plot_path, dpi=150, bbox_inches='tight')
1191
+ plt.close()
1192
+ print(f"\n✓ Saved LR curve plot to: {plot_path}")
1193
+ print(f" Total gradient steps: {len(lrs)}")
1194
+ print(f" LR range: {min(lrs):.6f} → {max(lrs):.6f}")
1195
+
1196
+ # Plot Rewards curve if we captured it
1197
+ if lr_history is not None and lr_history['rewards']:
1198
+ plot_path = Path(args.output_dir) / "rewards_curve.png"
1199
+
1200
+ # Reward values are now continuous across all gradient steps
1201
+ rewards = lr_history['rewards']
1202
+ steps = list(range(len(rewards))) # Step indices (0 to total_steps-1)
1203
+
1204
+ plt.figure(figsize=(12, 6))
1205
+ plt.plot(steps, rewards, linewidth=2, color='green', alpha=0.8)
1206
+
1207
+ # Mark the first step with a star
1208
+ plt.plot(steps[0], rewards[0], marker='*', markersize=20, color='gold',
1209
+ markeredgecolor='darkgoldenrod', markeredgewidth=2, zorder=5)
1210
+
1211
+ # Mark timestep boundaries
1212
+ num_timesteps = len(lr_history['timesteps'])
1213
+ # rewards has one extra value at the start (initial) compared to gradient steps
1214
+ num_grad_steps_per_timestep = (len(rewards) - num_timesteps) // num_timesteps if num_timesteps > 0 else 0
1215
+ if num_grad_steps_per_timestep > 0:
1216
+ for i in range(num_timesteps + 1):
1217
+ step_idx = i * (num_grad_steps_per_timestep + 1) # +1 because reward_history includes initial
1218
+ if step_idx <= len(rewards):
1219
+ plt.axvline(x=step_idx, color='red', linestyle='--', alpha=0.3, linewidth=1)
1220
+ if i < num_timesteps:
1221
+ plt.text(step_idx, plt.ylim()[1] * 0.95, f't={lr_history["timesteps"][i]}',
1222
+ fontsize=8, color='red', alpha=0.7, ha='left')
1223
+
1224
+ plt.xlabel('Global Gradient Step', fontsize=12)
1225
+ plt.ylabel('Reward Score', fontsize=12)
1226
+ plt.title(f'Reward Evolution Across All Gradient Steps\nPrompt: "{lr_history["prompt"][:60]}..."',
1227
+ fontsize=12, fontweight='bold')
1228
+ plt.grid(True, alpha=0.3)
1229
+
1230
+ # Add info text (num_timesteps computed above)
1232
+ reward_improvement = rewards[-1] - rewards[0] if len(rewards) > 1 else 0
1233
+ plt.text(0.02, 0.98,
1234
+ f'Total timesteps: {num_timesteps}\nReward samples: {len(rewards)}\n'
1235
+ f'Initial reward: {rewards[0]:.4f}\nFinal reward: {rewards[-1]:.4f}\n'
1236
+ f'Improvement: {reward_improvement:+.4f}',
1237
+ transform=plt.gca().transAxes, fontsize=10, verticalalignment='top',
1238
+ bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.5))
1239
+
1240
+ plt.tight_layout()
1241
+ plt.savefig(plot_path, dpi=150, bbox_inches='tight')
1242
+ plt.close()
1243
+ print(f"\n✓ Saved Rewards curve plot to: {plot_path}")
1244
+ print(f" Total reward samples (incl. one initial value per timestep): {len(rewards)}")
1245
+ print(f" Reward range: {min(rewards):.4f} → {max(rewards):.4f}")
1246
+ print(f" Total improvement: {reward_improvement:+.4f}")
1247
+
1248
+ # ---> NEW: PLOT TRAJECTORY DIVERGENCE (MANIFOLD DRIFT) <---
1249
+ if args.mode == "both" and 'baseline_trajectory' in locals() and 'guided_trajectory' in locals():
1250
+ if len(baseline_trajectory) == len(guided_trajectory) and len(baseline_trajectory) > 0:
1251
+ print("\n" + "="*70)
1252
+ print("7. CALCULATING TRAJECTORY DIVERGENCE (THEOREM 1 & 2)")
1253
+ print("="*70)
1254
+
1255
+ drift_path = Path(args.output_dir) / "trajectory_drift.png"
1256
+
1257
+ l2_distances = []
1258
+ # Calculate L2 norm ||z_t_guided - z_t_base||_2 for each step
1259
+ for b_lat, g_lat in zip(baseline_trajectory, guided_trajectory):
1260
+ dist = torch.norm(g_lat.float() - b_lat.float(), p=2).item()
1261
+ l2_distances.append(dist)
1262
+
1263
+ steps = list(range(len(l2_distances)))
1264
+
1265
+ plt.figure(figsize=(10, 6))
1266
+ plt.plot(steps, l2_distances, linewidth=2.5, color='purple', marker='o', markersize=4)
1267
+
1268
+ plt.xlabel('Denoising Step', fontsize=12)
1269
+ plt.ylabel('L2 Distance: ||z_guided - z_base||_2', fontsize=12)
1270
+ plt.title('Latent Trajectory Divergence (Manifold Drift)', fontsize=14, fontweight='bold')
1271
+ plt.grid(True, alpha=0.3)
1272
+
1273
+ # Add interpretation text based on your theory
1274
+ max_drift = max(l2_distances)
1275
+ plt.text(0.02, 0.98,
1276
+ f'Max Drift: {max_drift:.4f}\n'
1277
+ f'Final Drift: {l2_distances[-1]:.4f}\n'
1278
+ f'(Matches bounded drift from Thm 1\n'
1279
+ f'or ODE stiffness collapse from Thm 2)',
1280
+ transform=plt.gca().transAxes, fontsize=10, verticalalignment='top',
1281
+ bbox=dict(boxstyle='round', facecolor='thistle', alpha=0.5))
1282
+
1283
+ plt.tight_layout()
1284
+ plt.savefig(drift_path, dpi=150, bbox_inches='tight')
1285
+ plt.close()
1286
+ print(f"✓ Saved Manifold Drift curve to: {drift_path}")
1287
+ print(f" Max L2 Distance from baseline: {max_drift:.4f}")
1288
+
1289
+ # ========== FINAL RESULTS ==========
1290
+ print("\n" + "="*70)
1291
+ print("FINAL RESULTS")
1292
+ print("="*70)
1293
+
1294
+ if avg_reward_baseline is not None:
1295
+ print(f"\nBaseline:")
1296
+ if fid_score_baseline is not None:
1297
+ print(f" FID Score: {fid_score_baseline:.4f}")
1298
+ print(f" Avg Reward: {avg_reward_baseline:.4f}")
1299
+ if "clip" in args.metrics and clip_score_baseline is not None:
1300
+ print(f" Avg CLIP Score: {clip_score_baseline:.4f}")
1301
+ if "aesthetic" in args.metrics and aesthetic_score_baseline is not None:
1302
+ print(f" Avg Aesthetic: {aesthetic_score_baseline:.4f}")
1303
+ if "pickscore" in args.metrics and pick_score_baseline is not None:
1304
+ print(f" Avg PickScore: {pick_score_baseline:.4f}")
1305
+ if "hpsv2" in args.metrics and hpsv2_score_baseline is not None:
1306
+ print(f" Avg HPSv2: {hpsv2_score_baseline:.4f}")
1307
+ if "hpsv21" in args.metrics and hpsv21_score_baseline is not None:
1308
+ print(f" Avg HPSv2.1: {hpsv21_score_baseline:.4f}")
1309
+ if "imagereward" in args.metrics and imagereward_score_baseline is not None:
1310
+ print(f" Avg ImageReward: {imagereward_score_baseline:.4f}")
1311
+
1312
+ if avg_reward_grad is not None:
1313
+ print(f"\nGradient Ascent:")
1314
+ if fid_score_grad is not None:
1315
+ print(f" FID Score: {fid_score_grad:.4f}")
1316
+ print(f" Avg Reward: {avg_reward_grad:.4f}")
1317
+ if "clip" in args.metrics and clip_score_grad is not None:
1318
+ print(f" Avg CLIP Score: {clip_score_grad:.4f}")
1319
+ if "aesthetic" in args.metrics and aesthetic_score_grad is not None:
1320
+ print(f" Avg Aesthetic: {aesthetic_score_grad:.4f}")
1321
+ if "pickscore" in args.metrics and pick_score_grad is not None:
1322
+ print(f" Avg PickScore: {pick_score_grad:.4f}")
1323
+ if "hpsv2" in args.metrics and hpsv2_score_grad is not None:
1324
+ print(f" Avg HPSv2: {hpsv2_score_grad:.4f}")
1325
+ if "hpsv21" in args.metrics and hpsv21_score_grad is not None:
1326
+ print(f" Avg HPSv2.1: {hpsv21_score_grad:.4f}")
1327
+ if "imagereward" in args.metrics and imagereward_score_grad is not None:
1328
+ print(f" Avg ImageReward: {imagereward_score_grad:.4f}")
1329
+
1330
+ if avg_reward_baseline is not None and avg_reward_grad is not None:
1331
+ print(f"\nComparison:")
1332
+ if fid_score_baseline is not None and fid_score_grad is not None:
1333
+ fid_diff = fid_score_grad - fid_score_baseline
1334
+ print(f" FID Change: {fid_diff:+.4f} ({'worse' if fid_diff > 0 else 'better'}, lower is better)")
1335
+ reward_diff = avg_reward_grad - avg_reward_baseline
1336
+ print(f" Reward Change: {reward_diff:+.4f} ({'better' if reward_diff > 0 else 'worse'}, higher is better)")
1337
+ if "clip" in args.metrics and clip_score_baseline is not None and clip_score_grad is not None:
1338
+ clip_diff = clip_score_grad - clip_score_baseline
1339
+ print(f" CLIP Change: {clip_diff:+.4f} ({'better' if clip_diff > 0 else 'worse'}, higher is better)")
1340
+ if "aesthetic" in args.metrics and aesthetic_score_baseline is not None and aesthetic_score_grad is not None:
1341
+ aesthetic_diff = aesthetic_score_grad - aesthetic_score_baseline
1342
+ print(f" Aesthetic Change: {aesthetic_diff:+.4f} ({'better' if aesthetic_diff > 0 else 'worse'}, higher is better)")
1343
+ if "pickscore" in args.metrics and pick_score_baseline is not None and pick_score_grad is not None:
1344
+ pick_diff = pick_score_grad - pick_score_baseline
1345
+ print(f" PickScore Change: {pick_diff:+.4f} ({'better' if pick_diff > 0 else 'worse'}, higher is better)")
1346
+ if "hpsv2" in args.metrics and hpsv2_score_baseline is not None and hpsv2_score_grad is not None:
1347
+ hpsv2_diff = hpsv2_score_grad - hpsv2_score_baseline
1348
+ print(f" HPSv2 Change: {hpsv2_diff:+.4f} ({'better' if hpsv2_diff > 0 else 'worse'}, higher is better)")
1349
+ if "hpsv21" in args.metrics and hpsv21_score_baseline is not None and hpsv21_score_grad is not None:
1350
+ hpsv21_diff = hpsv21_score_grad - hpsv21_score_baseline
1351
+ print(f" HPSv2.1 Change: {hpsv21_diff:+.4f} ({'better' if hpsv21_diff > 0 else 'worse'}, higher is better)")
1352
+ if "imagereward" in args.metrics and imagereward_score_baseline is not None and imagereward_score_grad is not None:
1353
+ imagereward_diff = imagereward_score_grad - imagereward_score_baseline
1354
+ print(f" ImageReward Chg: {imagereward_diff:+.4f} ({'better' if imagereward_diff > 0 else 'worse'}, higher is better)")
1355
+
1356
+ # Save results to file
1357
+ results = {
1358
+ "mode": args.mode,
1359
+ "metrics": args.metrics,
1360
+ "config": {
1361
+ "num_samples": len(prompts),
1362
+ "num_steps": args.num_steps,
1363
+ "cfg_scale": args.cfg_scale,
1364
+ "grad_range": [args.grad_range_start, args.grad_range_end],
1365
+ "grad_steps": args.grad_steps,
1366
+ "grad_step_size": args.grad_step_size
1367
+ }
1368
+ }
1369
+
1370
+ if avg_reward_baseline is not None:
1371
+ results["baseline"] = {"avg_reward": avg_reward_baseline}
1372
+ if fid_score_baseline is not None:
1373
+ results["baseline"]["fid"] = fid_score_baseline
1374
+ if "clip" in args.metrics and clip_score_baseline is not None:
1375
+ results["baseline"]["clip_score"] = clip_score_baseline
1376
+ if "aesthetic" in args.metrics and aesthetic_score_baseline is not None:
1377
+ results["baseline"]["aesthetic_score"] = aesthetic_score_baseline
1378
+ if "pickscore" in args.metrics and pick_score_baseline is not None:
1379
+ results["baseline"]["pickscore"] = pick_score_baseline
1380
+ if "hpsv2" in args.metrics and hpsv2_score_baseline is not None:
1381
+ results["baseline"]["hpsv2_score"] = hpsv2_score_baseline
1382
+ if "hpsv21" in args.metrics and hpsv21_score_baseline is not None:
1383
+ results["baseline"]["hpsv21_score"] = hpsv21_score_baseline
1384
+ if "imagereward" in args.metrics and imagereward_score_baseline is not None:
1385
+ results["baseline"]["imagereward_score"] = imagereward_score_baseline
1386
+
1387
+ if avg_reward_grad is not None:
1388
+ results["gradient_ascent"] = {"avg_reward": avg_reward_grad}
1389
+ if fid_score_grad is not None:
1390
+ results["gradient_ascent"]["fid"] = fid_score_grad
1391
+ if "clip" in args.metrics and clip_score_grad is not None:
1392
+ results["gradient_ascent"]["clip_score"] = clip_score_grad
1393
+ if "aesthetic" in args.metrics and aesthetic_score_grad is not None:
1394
+ results["gradient_ascent"]["aesthetic_score"] = aesthetic_score_grad
1395
+ if "pickscore" in args.metrics and pick_score_grad is not None:
1396
+ results["gradient_ascent"]["pickscore"] = pick_score_grad
1397
+ if "hpsv2" in args.metrics and hpsv2_score_grad is not None:
1398
+ results["gradient_ascent"]["hpsv2_score"] = hpsv2_score_grad
1399
+ if "hpsv21" in args.metrics and hpsv21_score_grad is not None:
1400
+ results["gradient_ascent"]["hpsv21_score"] = hpsv21_score_grad
1401
+ if "imagereward" in args.metrics and imagereward_score_grad is not None:
1402
+ results["gradient_ascent"]["imagereward_score"] = imagereward_score_grad
1403
+ if grad_stats:
1404
+ results["gradient_ascent"]["stats"] = grad_stats
1405
+
1406
+ if avg_reward_baseline is not None and avg_reward_grad is not None:
1407
+ results["comparison"] = {
1408
+ "reward_difference": avg_reward_grad - avg_reward_baseline
1409
+ }
1410
+ if fid_score_baseline is not None and fid_score_grad is not None:
1411
+ results["comparison"]["fid_difference"] = fid_score_grad - fid_score_baseline
1412
+ if "clip" in args.metrics and clip_score_baseline is not None and clip_score_grad is not None:
1413
+ results["comparison"]["clip_difference"] = clip_score_grad - clip_score_baseline
1414
+ if "aesthetic" in args.metrics and aesthetic_score_baseline is not None and aesthetic_score_grad is not None:
1415
+ results["comparison"]["aesthetic_difference"] = aesthetic_score_grad - aesthetic_score_baseline
1416
+ if "pickscore" in args.metrics and pick_score_baseline is not None and pick_score_grad is not None:
1417
+ results["comparison"]["pickscore_difference"] = pick_score_grad - pick_score_baseline
1418
+ if "hpsv2" in args.metrics and hpsv2_score_baseline is not None and hpsv2_score_grad is not None:
1419
+ results["comparison"]["hpsv2_difference"] = hpsv2_score_grad - hpsv2_score_baseline
1420
+ if "hpsv21" in args.metrics and hpsv21_score_baseline is not None and hpsv21_score_grad is not None:
1421
+ results["comparison"]["hpsv21_difference"] = hpsv21_score_grad - hpsv21_score_baseline
1422
+ if "imagereward" in args.metrics and imagereward_score_baseline is not None and imagereward_score_grad is not None:
1423
+ results["comparison"]["imagereward_difference"] = imagereward_score_grad - imagereward_score_baseline
1424
+
1425
+ # Save results to output directory
1426
+ output_path = Path(args.output_dir)
1427
+ output_path.mkdir(parents=True, exist_ok=True)
1428
+ results_path = output_path / "evaluation_results.txt"
1429
+
1430
+ with open(results_path, "w") as f:
1431
+ for k, v in results.items():
1432
+ f.write(f"{k}: {v}\n")
1433
+
1434
+
1435
+ print(f"\n✓ Results saved to: {results_path}")
1436
+ if args.save_images:
1437
+ print(f"✓ Generated images saved under {output_path}/<mode_name>/ for each evaluated mode")
1438
+ print("\n" + "="*70)
1439
+
1440
+ # Close logger
1441
+ tee_logger.close()
1442
+ sys.stdout = tee_logger.terminal
1443
+
1444
+
1445
+ if __name__ == "__main__":
1446
+ main()
1447
+
Reward_sana_idealized/examples.sh ADDED
@@ -0,0 +1,162 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ # bash examples.sh
4
+ if [[ -n "${TERM:-}" ]]; then
5
+ clear
6
+ fi
7
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
8
+ cd "$SCRIPT_DIR"
9
+
10
+ # Shared HF cache used on this cluster.
11
+ HF_HUB_CACHE_DIR="${HF_HUB_CACHE_DIR:-/scratch/rr81/ma5430/.cache/huggingface/hub}"
12
+ export HF_HUB_CACHE="$HF_HUB_CACHE_DIR"
13
+ export HUGGINGFACE_HUB_CACHE="$HF_HUB_CACHE_DIR"
14
+ export HF_HOME="$(dirname "$HF_HUB_CACHE_DIR")"
15
+
16
+ # GPU nodes have no internet, while login nodes do.
17
+ # Auto default: offline on GPU nodes, online on login nodes.
18
+ DEFAULT_OFFLINE_MODE="1"
19
+ if ! (command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1); then
20
+ DEFAULT_OFFLINE_MODE="0"
21
+ fi
22
+ OFFLINE_MODE="${OFFLINE_MODE:-$DEFAULT_OFFLINE_MODE}"
23
+
24
+ if [[ "$OFFLINE_MODE" == "1" ]]; then
25
+ export HF_DATASETS_OFFLINE="1"
26
+ export HF_METRICS_OFFLINE="1"
27
+ export HF_MODULES_OFFLINE="1"
28
+ export TRANSFORMERS_OFFLINE="1"
29
+ export DIFFUSERS_OFFLINE="1"
30
+ export HF_HUB_OFFLINE="1"
31
+ else
32
+ export HF_DATASETS_OFFLINE="0"
33
+ export HF_METRICS_OFFLINE="0"
34
+ export HF_MODULES_OFFLINE="0"
35
+ export TRANSFORMERS_OFFLINE="0"
36
+ export DIFFUSERS_OFFLINE="0"
37
+ export HF_HUB_OFFLINE="0"
38
+ fi
39
+
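+ # Override examples: OFFLINE_MODE=0 bash examples.sh  # force online (login node)
+ #                    OFFLINE_MODE=1 bash examples.sh  # force fully offline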
40
+ # Existing environment requested by user.
41
+ PYTHON_BIN="${PYTHON_BIN:-/g/data/rr81/aev/bin/python}"
42
+ if [[ ! -x "$PYTHON_BIN" ]]; then
43
+ echo "[examples.sh] Missing Python executable: $PYTHON_BIN" >&2
44
+ exit 1
45
+ fi
46
+
47
+ DATASET_NAME="${DATASET_NAME:-pickapic}" # coco | pickapic
48
+ GRAD_CONFIG="${GRAD_CONFIG:-one_step_rectification_config}"
49
+ MODEL_PROFILE="${MODEL_PROFILE:-sana_600m_512}" # sana_600m_512 | sana_1600m_512 | sana_sprint_0_6b_1024 | sana_sprint_1_6b_1024
50
+ MODE="${MODE:-gradient_ascent}" # gradient_ascent | baseline | both
51
+ # Empty MAX_SAMPLES means evaluate all available samples.
52
+ MAX_SAMPLES="${MAX_SAMPLES:-}"
53
+ NUM_STEPS="${NUM_STEPS:-20}"
54
+ CFG_SCALE="${CFG_SCALE:-4.5}"
55
+ DTYPE="${DTYPE:-bf16}" # auto | bf16 | fp16 | fp32
56
+ METRICS="${METRICS:-clip aesthetic pickscore hpsv2 hpsv21 imagereward}"
57
+ PREFETCH_ONLY="${PREFETCH_ONLY:-0}"
58
+
59
+ # Override this path whenever you want to swap reward weights.
60
+ # LRM_MODEL_PATH="${LRM_MODEL_PATH:-/g/data/rr81/LPO/lrm/lrm_sana/logs/v8/reward_model/step_sana_sana_600m_512_variable-t_lr1e-5_step-8000_filter2_time951/checkpoint-gstep33000}"
61
+ LRM_MODEL_PATH="${LRM_MODEL_PATH:-/g/data/rr81/LPO/lrm/lrm_sana/logs/v7/reward_model/step_sana_sana_600m_512_variable-t_lr1e-5_step-8000_filter2_time951/checkpoint-gstep32000}"
62
+
63
+ if [[ -z "${GPU_ID:-}" ]]; then
64
+ if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
65
+ GPU_ID="$(nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | sort -k2 -n | head -n1 | cut -d',' -f1 | tr -d ' ')"
66
+ GPU_ID="${GPU_ID:-0}"
67
+ else
68
+ GPU_ID="0"
69
+ echo "[examples.sh] No visible NVIDIA GPU on this node. Defaulting GPU_ID=0."
70
+ echo "[examples.sh] eval.py will run on CPU if CUDA is unavailable."
71
+ fi
72
+ fi
73
+
74
+ echo "Using GPU ID: $GPU_ID"
75
+ echo "Using LRM weights: $LRM_MODEL_PATH"
76
+ echo "HF offline mode: $OFFLINE_MODE"
77
+ echo "Generation dtype: $DTYPE"
78
+
79
+ if [[ "$PREFETCH_ONLY" == "1" ]]; then
80
+ echo "[examples.sh] PREFETCH_ONLY=1 -> downloading required model files to shared cache and exiting."
81
+ export MODEL_PROFILE
82
+ export METRICS
83
+ "$PYTHON_BIN" - <<'PY'
84
+ import os
85
+ from huggingface_hub import hf_hub_download, snapshot_download
86
+
87
+ cache_dir = os.environ["HF_HUB_CACHE"]
88
+ model_profile = os.environ.get("MODEL_PROFILE", "sana_600m_512")
89
+ metrics = set(os.environ.get("METRICS", "clip aesthetic").split())
90
+
91
+ profile_to_repo = {
92
+ "sana_600m_512": "Efficient-Large-Model/Sana_600M_512px_diffusers",
93
+ "sana_1600m_512": "Efficient-Large-Model/Sana_1600M_512px_diffusers",
94
+ "sana_sprint_0_6b_1024": "Efficient-Large-Model/Sana_Sprint_0.6B_1024px_diffusers",
95
+ "sana_sprint_1_6b_1024": "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers",
96
+ }
97
+
98
+ def snap(repo_id):
99
+ print(f"[prefetch] snapshot_download: {repo_id}")
100
+ snapshot_download(repo_id=repo_id, cache_dir=cache_dir, local_files_only=False)
101
+
102
+ def one(repo_id, filename):
103
+ print(f"[prefetch] hf_hub_download: {repo_id}/{filename}")
104
+ hf_hub_download(repo_id=repo_id, filename=filename, cache_dir=cache_dir, local_files_only=False)
105
+
106
+ if model_profile not in profile_to_repo:
107
+ raise ValueError(f"Unknown MODEL_PROFILE={model_profile}")
108
+
109
+ # Base SANA model used for generation + reward backbone
110
+ snap(profile_to_repo[model_profile])
111
+
112
+ # Required for CLIP-based metrics and LRM text projection init fallback
113
+ if "clip" in metrics or "aesthetic" in metrics:
114
+ snap("openai/clip-vit-large-patch14")
115
+
116
+ if "pickscore" in metrics:
117
+ snap("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
118
+ snap("yuvalkirstain/PickScore_v1")
119
+
120
+ if "hpsv2" in metrics or "hpsv21" in metrics:
121
+ one("laion/CLIP-ViT-H-14-laion2B-s32B-b79K", "open_clip_pytorch_model.bin")
122
+ if "hpsv2" in metrics:
123
+ one("xswu/HPSv2", "HPS_v2_compressed.pt")
124
+ if "hpsv21" in metrics:
125
+ one("xswu/HPSv2", "HPS_v2.1_compressed.pt")
126
+
127
+ if "imagereward" in metrics:
128
+ one("THUDM/ImageReward", "ImageReward.pt")
129
+ one("THUDM/ImageReward", "med_config.json")
130
+
131
+ print("[prefetch] done")
132
+ PY
133
+ exit 0
134
+ fi
135
+
136
+ read -r -a METRICS_ARR <<< "$METRICS"
137
+
138
+ CMD=(
139
+ "$PYTHON_BIN" eval.py
140
+ --model_variant "$MODEL_PROFILE"
141
+ --dataset_type "$DATASET_NAME"
142
+ --lrm_model "$LRM_MODEL_PATH"
143
+ --grad_config "$GRAD_CONFIG"
144
+ --metrics "${METRICS_ARR[@]}"
145
+ --num_steps "$NUM_STEPS"
146
+ --cfg_scale "$CFG_SCALE"
147
+ --dtype "$DTYPE"
148
+ --hf_cache_dir "$HF_HUB_CACHE_DIR"
149
+ --output_dir "RESULTS/$DATASET_NAME/${GRAD_CONFIG}_${MODEL_PROFILE}"
150
+ --cuda "$GPU_ID"
151
+ --mode "$MODE"
152
+ )
153
+
154
+ if [[ -n "$MAX_SAMPLES" ]]; then
155
+ CMD+=(--max_samples "$MAX_SAMPLES")
156
+ fi
157
+
158
+ if [[ "$OFFLINE_MODE" == "1" ]]; then
159
+ CMD+=(--offline)
160
+ fi
161
+
162
+ "${CMD[@]}"
Reward_sana_idealized/grad_ascent_configs.py ADDED
@@ -0,0 +1,67 @@
1
+ """
2
+ Configuration presets for gradient ascent optimization.
3
+
4
+ Provides pre-configured settings for various optimization strategies
5
+ including learning rate scheduling and momentum configurations.
6
+ """
7
+
8
+ from typing import Dict, Any
9
+
10
+
11
+ ONE_STEP_RECTIFICATION_CONFIG = {
12
+ "grad_timestep_range": (100, 800), # Match SDXL one-step rectification window
13
+ "num_grad_steps": 1,
14
+ "grad_step_size": 1.0,
15
+ "grad_scale": 1.0,
16
+ "lr_scheduler_type": "constant",
17
+ "use_momentum": False,
18
+ "use_nesterov": False,
19
+ "use_iso_projection": False
20
+ }
21
+
22
+ # ============================================================================
23
+ # Config Dictionary (for easy access)
24
+ # ============================================================================
25
+
26
+ CONFIGS = {
27
+ "one_step_rectification_config": ONE_STEP_RECTIFICATION_CONFIG,
28
+ }
29
+
30
+
31
+ def get_config(config_name: str) -> Dict[str, Any]:
32
+ """
33
+ Get a gradient ascent configuration by name.
34
+
35
+ Args:
36
+ config_name: Name of the configuration
37
+
38
+ Returns:
39
+ Configuration dictionary
40
+
41
+ Raises:
42
+ ValueError: If config_name is not found
43
+
44
+ Example:
45
+ config = get_config("cosine_nesterov")
46
+ pipeline.enable_gradient_ascent(**config)
47
+ """
48
+ if config_name not in CONFIGS:
49
+ available = ", ".join(sorted(CONFIGS.keys()))
50
+ raise ValueError(f"Unknown config: {config_name}. Available: {available}")
51
+
52
+ return CONFIGS[config_name].copy()
53
+
54
+
55
+ def list_configs() -> list:
56
+ """List all available configuration names."""
57
+ return sorted(CONFIGS.keys())
58
+
59
+
60
+ def print_config(config_name: str):
61
+ """Print a configuration in a readable format."""
62
+ config = get_config(config_name)
63
+ print(f"\nConfiguration: {config_name}")
64
+ print("=" * 60)
65
+ for key, value in config.items():
66
+ print(f" {key}: {value}")
67
+ print("=" * 60)
Reward_sana_idealized/gradient_ascent_utils.py ADDED
@@ -0,0 +1,391 @@
1
+ """
2
+ Gradient Ascent utilities for reward-guided diffusion generation.
3
+
4
+ This module implements gradient ascent on the LRM reward score to guide
5
+ the diffusion process toward higher preference scores.
6
+ """
7
+
8
+ import torch
9
+ import torch.nn.functional as F
10
+ from typing import Optional, Tuple, List, Literal
11
+ from tqdm import tqdm
12
+ from lr_scheduler import create_lr_scheduler, LRScheduler
13
+
14
+
15
+ class RewardGuidedDiffusion:
16
+ """
17
+ Implements reward-guided generation using gradient ascent.
18
+
19
+ During denoising, at specified timesteps, we:
20
+ 1. Compute the reward score for current latents
21
+ 2. Calculate gradients of reward w.r.t. latents
22
+ 3. Update latents in the direction that increases reward
23
+
24
+ This guides generation toward higher preference scores.
25
+ """
26
+
27
+ def __init__(
28
+ self,
29
+ reward_model,
30
+ grad_scale: float = 1.0,
31
+ grad_timestep_range: Optional[Tuple[int, int]] = None,
32
+ num_grad_steps: int = 5,
33
+ grad_step_size: float = 0.1,
34
+ gradient_checkpoint: bool = False,
35
+ # LR Scheduling
36
+ lr_scheduler_type: Literal["constant", "linear", "cosine", "exponential", "step"] = "constant",
37
+ lr_scheduler_kwargs: Optional[dict] = None,
38
+ # Momentum
39
+ use_momentum: bool = False,
40
+ momentum: float = 0.9,
41
+ use_nesterov: bool = False,
42
+ use_iso_projection: bool = False
43
+ ):
44
+ """
45
+ Initialize reward-guided diffusion.
46
+
47
+ Args:
48
+ reward_model: LRM reward model for computing preference scores
49
+ grad_scale: Scale factor for gradient updates (default: 1.0)
50
+ grad_timestep_range: Tuple of (min_t, max_t) for gradient ascent.
51
+ If None, applies to all timesteps.
52
+ num_grad_steps: Number of gradient ascent steps per timestep
53
+ grad_step_size: Step size for each gradient update (initial LR)
54
+ gradient_checkpoint: Whether to use gradient checkpointing
55
+ lr_scheduler_type: Type of LR scheduler ("constant", "linear", "cosine", "exponential", "step")
56
+ lr_scheduler_kwargs: Additional kwargs for LR scheduler (e.g., end_lr, min_lr, warmup_steps)
57
+ use_momentum: Whether to use momentum in gradient updates
58
+ momentum: Momentum coefficient (typically 0.9)
59
+ use_nesterov: Whether to use Nesterov momentum
60
+ use_iso_projection: Whether to use Iso Projection
61
+ """
62
+ self.reward_model = reward_model
63
+ self.grad_scale = grad_scale
64
+ self.grad_timestep_range = grad_timestep_range
65
+ self.num_grad_steps = num_grad_steps
66
+ self.grad_step_size = grad_step_size
67
+ self.gradient_checkpoint = gradient_checkpoint
68
+
69
+ # LR Scheduler
70
+ self.lr_scheduler_type = lr_scheduler_type
71
+ self.lr_scheduler_kwargs = lr_scheduler_kwargs or {}
72
+ self.lr_scheduler: Optional[LRScheduler] = None
73
+ self.global_lr_scheduler: Optional[LRScheduler] = None # Scheduler across denoising timesteps
74
+
75
+ # Momentum
76
+ self.use_momentum = use_momentum
77
+ self.momentum = momentum
78
+ self.use_nesterov = use_nesterov
79
+ self.velocity = None # Will be initialized per optimization
80
+
81
+ self.use_iso_projection = use_iso_projection
82
+
83
+ # Statistics
84
+ self.grad_stats = []
85
+ self.timestep_counter = 0 # Track which timestep we're on
86
+
87
+ def should_apply_gradient(self, timestep: int) -> bool:
88
+ """Check if gradient ascent should be applied at this timestep."""
89
+
90
+ if self.grad_timestep_range is None:
91
+ return False
92
+
93
+ min_t, max_t = self.grad_timestep_range
94
+ return min_t <= timestep <= max_t
95
+
96
+ @torch.enable_grad()
97
+ def compute_reward_gradient(
98
+ self,
99
+ latents: torch.Tensor,
100
+ prompt,
101
+ timestep: int,
102
+ ) -> Tuple[torch.Tensor, float]:
103
+ """
104
+ Compute gradient of reward score w.r.t. latents in FP32 to prevent underflow.
105
+ """
106
+ # 1. Cast to FP32 and ensure we are detached from previous iterations
107
+ latents_fp32 = latents.detach().to(torch.float32).clone()
108
+ latents_fp32.requires_grad_(True)
109
+
110
+ # 2. Compute reward score
111
+ # Note: Even if the model internally uses fp16/bf16, autograd will
112
+ # safely accumulate the gradient in fp32 for our leaf node.
113
+ reward_score = self.reward_model.get_reward_score(
114
+ latents_fp32,
115
+ prompt,
116
+ timestep,
117
+ enable_grad=True,
118
+ return_logits=True,
119
+ )
120
+
121
+ reward_score_mean = reward_score.mean()
122
+ if not torch.isfinite(reward_score_mean):
123
+ return torch.zeros_like(latents), 0.0
124
+
125
+ # 3. Extract gradient
126
+ # CRITICAL: retain_graph=True prevents the graph from dying across multiple
127
+ # gradient steps if your reward model relies on cached text embeddings.
128
+ grad = torch.autograd.grad(
129
+ outputs=reward_score_mean,
130
+ inputs=latents_fp32,
131
+ create_graph=False,
132
+ retain_graph=True, # Keeps the graph alive for the next step!
133
+ allow_unused=True,
134
+ )[0]
135
+
136
+ # 4. Handle None gradients and cast back to the pipeline's original dtype
137
+ if grad is None:
138
+ grad = torch.zeros_like(latents)
139
+ else:
140
+ grad = torch.nan_to_num(grad, nan=0.0, posinf=0.0, neginf=0.0)
141
+ grad = grad.to(latents.dtype)
142
+
143
+ return grad, reward_score_mean.item()
144
+
145
+ def apply_gradient_ascent(
146
+ self,
147
+ latents: torch.Tensor,
148
+ prompt,
149
+ timestep: int,
150
+ base_noise: Optional[torch.Tensor] = None, # Required for Iso-Marginal projection
151
+ verbose: bool = True,
152
+ total_denoising_steps: Optional[int] = None,
153
+ ) -> Tuple[torch.Tensor, dict]:
154
+
155
+ # 1. UPCAST TO FP32 AND SETUP OPTIMIZER (Targeting Latents)
156
+ original_latents = latents.detach().clone().to(torch.float32)
157
+ current_latents = torch.nn.Parameter(original_latents.clone())
158
+
159
+ # Initial reward tracking
160
+ with torch.no_grad():
161
+ initial_reward = self.reward_model.get_reward_score(
162
+ latents,
163
+ prompt,
164
+ timestep
165
+ )
166
+ initial_reward_val = initial_reward.item() if initial_reward.numel() == 1 else initial_reward.mean().item()
167
+
168
+ # Initialize tracking lists
169
+ grad_norms = []
170
+ reward_history = [initial_reward_val]
171
+ lr_history = []
172
+
173
+ # 2. FORWARD PASS (model precision follows eval dtype; latents stay fp32 here)
174
+ reward = self.reward_model.get_reward_score(
175
+ current_latents.to(latents.dtype),
176
+ prompt,
177
+ timestep,
178
+ enable_grad=True,
179
+ return_logits=True,
180
+ )
181
+
182
+ reward_mean = reward.mean()
183
+ if not torch.isfinite(reward_mean):
184
+ if verbose:
185
+ print("?? WARNING: Non-finite reward encountered; skipping gradient step.")
186
+ rectified_latents = original_latents.clone()
187
+ final_latents = rectified_latents.detach().to(latents.dtype)
188
+ stats = {
189
+ 'timestep': timestep,
190
+ 'initial_reward': initial_reward_val,
191
+ 'final_reward': initial_reward_val,
192
+ 'reward_improvement': 0.0,
193
+ 'grad_norms': [0.0],
194
+ 'reward_history': reward_history,
195
+ 'lr_history': [0.0],
196
+ 'latent_change': 0.0,
197
+ }
198
+ self.grad_stats.append(stats)
199
+ return final_latents, stats
200
+
201
+ loss = -reward_mean
202
+ loss.backward()
203
+
204
+ # Extract latent gradient
205
+ raw_grad = current_latents.grad
206
+ if raw_grad is not None:
207
+ raw_grad = torch.nan_to_num(raw_grad, nan=0.0, posinf=0.0, neginf=0.0)
208
+ reward_history.append(torch.sigmoid(reward_mean).item())
209
+
210
+ # 3. ISO-MARGINAL PROJECTION WITH ASYMMETRIC INCLUSION
211
+ if raw_grad is not None and base_noise is not None and self.use_iso_projection:
212
+ gamma = 1e-8
213
+ B = raw_grad.shape[0]
214
+
215
+ grad_flat = raw_grad.view(B, -1)
216
+ noise_flat = base_noise.view(B, -1).to(torch.float32)
217
+
218
+ # Compute projection scalar for raw_grad (which is -∇R)
219
+ dot_product = (grad_flat * noise_flat).sum(dim=1, keepdim=True)
220
+ noise_norm_sq = (noise_flat * noise_flat).sum(dim=1, keepdim=True)
221
+
222
+ proj_scalar = dot_product / (noise_norm_sq + gamma)
223
+ proj_scalar = proj_scalar.view(B, 1, 1, 1)
224
+
225
+ # 1. Decompose
226
+ grad_parallel = proj_scalar * base_noise.to(torch.float32)
227
+ grad_perp = raw_grad - grad_parallel
228
+
229
+ # 2. Asymmetric Inclusion
230
+ # proj_scalar > 0 means the applied step (+?R) points toward -epsilon (Denoising. GOOD.)
231
+ # proj_scalar < 0 means the applied step (+?R) points toward +epsilon (Noising. BAD.)
232
+ safe_proj_scalar = torch.clamp(proj_scalar, min=0.0)
233
+
234
+ beta = 1.0 # Retention factor for the safe parallel gradient
235
+ safe_grad_parallel = beta * (safe_proj_scalar * base_noise.to(torch.float32))
236
+
237
+ # 3. Recombine
238
+ grad_perp = grad_perp + safe_grad_parallel
239
+ else:
240
+ grad_perp = raw_grad
241
+ if base_noise is None and self.use_iso_projection:
242
+ print("?? WARNING: base_noise missing. Skipping Iso-Marginal projection.")
243
+
244
+ # 4. KINETIC RECTIFICATION (Applied to the projected latent gradient)
245
+ if grad_perp is not None:
246
+ grad_norm = grad_perp.float().norm().item()
247
+ max_abs_grad = grad_perp.float().abs().max().item()
248
+
249
+ recovered_with_fallback = False
250
+ if grad_norm <= 0 or max_abs_grad <= 0:
251
+ fallback_grad, _ = self.compute_reward_gradient(
252
+ original_latents,
253
+ prompt,
254
+ timestep,
255
+ )
256
+ fallback_grad = torch.nan_to_num(fallback_grad, nan=0.0, posinf=0.0, neginf=0.0)
257
+ fallback_grad = fallback_grad.to(dtype=original_latents.dtype)
258
+ fallback_norm = fallback_grad.float().norm().item()
259
+ fallback_max_abs = fallback_grad.float().abs().max().item()
260
+
261
+ if fallback_norm > 0 and fallback_max_abs > 0:
262
+ grad_perp = fallback_grad
263
+ grad_norm = fallback_norm
264
+ max_abs_grad = fallback_max_abs
265
+ recovered_with_fallback = True
266
+
267
+ if grad_norm > 0 and max_abs_grad > 0:
268
+ kinetic_direction = grad_perp / (grad_norm + 1e-8)
269
+
270
+ # kinetic_direction has unit L2 norm, so alpha is exactly the L2 magnitude of the applied change.
271
+ alpha = self.grad_step_size
272
+
273
+ with torch.no_grad():
274
+ rectified_latents = original_latents - (alpha * kinetic_direction)
275
+ if recovered_with_fallback:
276
+ print(
277
+ "✓ Recovered collapsed gradient using fp32 fallback "
278
+ f"(norm={grad_norm:.3e}, max_abs={max_abs_grad:.3e})"
279
+ )
280
+ else:
281
+ print(
282
+ "?? WARNING: Gradient tensor exists but magnitude collapsed to zero "
283
+ f"(norm={grad_norm:.3e}, max_abs={max_abs_grad:.3e}, dtype={grad_perp.dtype})"
284
+ )
285
+ rectified_latents = original_latents.clone()
286
+ alpha = 0.0
287
+ max_grad = grad_norm
288
+ else:
289
+ print("?? FATAL: PyTorch completely dropped the latent gradient!")
290
+ rectified_latents = original_latents.clone()
291
+ max_grad = 0.0
292
+ alpha = 0.0
293
+
294
+ if verbose:
295
+ print(f" Grad step | LR: {alpha:.6f} | Reward: {reward.mean().item():.4f} | Max Grad: {max_grad:.4f}")
296
+
297
+ # 5. DOWNCAST AND RETURN
298
+ final_latents = rectified_latents.detach().to(latents.dtype)
299
+
300
+ with torch.no_grad():
301
+ final_reward = self.reward_model.get_reward_score(
302
+ final_latents, prompt, timestep
303
+ )
304
+ final_reward_val = final_reward.item() if final_reward.numel() == 1 else final_reward.mean().item()
305
+
306
+ stats = {
307
+ 'timestep': timestep,
308
+ 'initial_reward': initial_reward_val,
309
+ 'final_reward': final_reward_val,
310
+ 'reward_improvement': final_reward_val - initial_reward_val,
311
+ 'grad_norms': [max_grad],
312
+ 'reward_history': reward_history,
313
+ 'lr_history': [alpha], # Kept for plotting logic
314
+ 'latent_change': (final_latents - original_latents.to(latents.dtype)).norm().item(),
315
+ }
316
+
317
+ self.grad_stats.append(stats)
318
+
319
+ return final_latents, stats
320
+
321
+ def get_statistics(self) -> dict:
322
+ """Get aggregated statistics across all gradient ascent applications."""
323
+ if not self.grad_stats:
324
+ return {}
325
+
326
+ total_improvement = sum(s['reward_improvement'] for s in self.grad_stats)
327
+ avg_improvement = total_improvement / len(self.grad_stats)
328
+
329
+ all_grad_norms = [n for s in self.grad_stats for n in s['grad_norms']]
330
+
331
+ return {
332
+ 'num_applications': len(self.grad_stats),
333
+ 'total_reward_improvement': total_improvement,
334
+ 'avg_reward_improvement': avg_improvement,
335
+ 'avg_grad_norm': sum(all_grad_norms) / len(all_grad_norms) if all_grad_norms else 0,
336
+ 'max_grad_norm': max(all_grad_norms) if all_grad_norms else 0,
337
+ 'detailed_stats': self.grad_stats,
338
+ }
339
+
340
+ def reset_statistics(self):
341
+ """Reset statistics and global scheduler."""
342
+ self.grad_stats = []
343
+ self.global_lr_scheduler = None
344
+ self.timestep_counter = 0
345
+
346
+
347
+ def create_reward_guided_generator(
348
+ reward_model,
349
+ grad_timestep_range: Tuple[int, int] = (500, 700),
350
+ grad_scale: float = 1.0,
351
+ num_grad_steps: int = 5,
352
+ grad_step_size: float = 0.1,
353
+ lr_scheduler_type: str = "constant",
354
+ lr_scheduler_kwargs: Optional[dict] = None,
355
+ use_momentum: bool = False,
356
+ momentum: float = 0.9,
357
+ use_nesterov: bool = False,
358
+ use_iso_projection: bool = False
359
+ ) -> RewardGuidedDiffusion:
360
+ """
361
+ Convenience function to create a reward-guided diffusion generator.
362
+
363
+ Args:
364
+ reward_model: LRM reward model
365
+ grad_timestep_range: Tuple of (min_t, max_t) for applying gradients
366
+ grad_scale: Scale factor for gradient magnitude
367
+ num_grad_steps: Number of gradient ascent iterations per timestep
368
+ grad_step_size: Step size for each gradient update (initial LR)
369
+ lr_scheduler_type: Type of LR scheduler
370
+ lr_scheduler_kwargs: Additional kwargs for LR scheduler
371
+ use_momentum: Whether to use momentum
372
+ momentum: Momentum coefficient
373
+ use_nesterov: Whether to use Nesterov momentum
374
+ use_iso_projection: Whether to use Iso Projection
375
+
376
+ Returns:
377
+ RewardGuidedDiffusion instance
378
+ """
379
+ return RewardGuidedDiffusion(
380
+ reward_model=reward_model,
381
+ grad_scale=grad_scale,
382
+ grad_timestep_range=grad_timestep_range,
383
+ num_grad_steps=num_grad_steps,
384
+ grad_step_size=grad_step_size,
385
+ lr_scheduler_type=lr_scheduler_type,
386
+ lr_scheduler_kwargs=lr_scheduler_kwargs,
387
+ use_momentum=use_momentum,
388
+ momentum=momentum,
389
+ use_nesterov=use_nesterov,
390
+ use_iso_projection=use_iso_projection,  # pass the flag through instead of hardcoding False
391
+ )
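The iso-marginal projection above is easiest to sanity-check in isolation: split the gradient into its component along the base noise and the orthogonal remainder, then keep the parallel part only when its sign is safe. A toy-tensor sketch of that decomposition (shapes and values are illustrative only):

    import torch

    B = 2
    g = torch.randn(B, 4, 8, 8)    # stand-in for raw_grad (i.e. -∇R)
    eps = torch.randn(B, 4, 8, 8)  # stand-in for base_noise

    g_flat, eps_flat = g.view(B, -1), eps.view(B, -1)
    proj = (g_flat * eps_flat).sum(1, keepdim=True) / (eps_flat.pow(2).sum(1, keepdim=True) + 1e-8)
    proj = proj.view(B, 1, 1, 1)

    g_par = proj * eps                                  # component along the noise
    g_perp = g - g_par                                  # orthogonal remainder
    g_safe = g_perp + torch.clamp(proj, min=0.0) * eps  # asymmetric inclusion

    # orthogonality check: g_perp is perpendicular to eps per sample, up to float error
    print((g_perp.view(B, -1) * eps_flat).sum(1))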
Reward_sana_idealized/hpsv2_score.py ADDED
@@ -0,0 +1,110 @@
1
+ """
2
+ Adapted from https://github.com/tgxs002/HPSv2. Originally Apache License, Version 2.0, January 2004.
3
+ """
4
+
5
+ import torch
6
+ from open_clip import create_model_and_transforms, get_tokenizer
7
+ from PIL import Image
8
+
9
+
10
+ class HPSv2Scorer():
11
+ def __init__(self, clip_pretrained_name_or_path, model_pretrained_name_or_path, device='cuda'):
12
+ self.model, _, self.preprocess_val = create_model_and_transforms(
13
+ 'ViT-H-14',
14
+ # 'laion2B-s32B-b79K',
15
+ clip_pretrained_name_or_path,
16
+ precision='amp',
17
+ device=device,
18
+ jit=False,
19
+ force_quick_gelu=False,
20
+ force_custom_text=False,
21
+ force_patch_dropout=False,
22
+ force_image_size=None,
23
+ pretrained_image=False,
24
+ image_mean=None,
25
+ image_std=None,
26
+ light_augmentation=True,
27
+ aug_cfg={},
28
+ output_dict=True,
29
+ with_score_predictor=False,
30
+ with_region_predictor=False
31
+ )
32
+ self.device = device
33
+ checkpoint = torch.load(model_pretrained_name_or_path, map_location=device)
34
+ self.model.load_state_dict(checkpoint['state_dict'])
35
+ self.tokenizer = get_tokenizer('ViT-H-14')
36
+ self.model = self.model.to(device)
37
+
38
+
39
+ def score(self, img_path, prompt):
40
+
41
+ if isinstance(img_path, list):
42
+ result = []
43
+ for one_img_path in img_path:
44
+ # Load your image and prompt
45
+ with torch.no_grad():
46
+ # Process the image
47
+ if isinstance(one_img_path, str):
48
+ image = self.preprocess_val(Image.open(one_img_path)).unsqueeze(0).to(device=self.device, non_blocking=True)
49
+ elif isinstance(one_img_path, Image.Image):
50
+ image = self.preprocess_val(one_img_path).unsqueeze(0).to(device=self.device, non_blocking=True)
51
+ else:
52
+ raise TypeError('The type of parameter img_path is illegal.')
53
+ # Process the prompt
54
+ text = self.tokenizer([prompt]).to(device=self.device, non_blocking=True)
55
+ # Calculate the HPS
56
+ with torch.cuda.amp.autocast():
57
+ outputs = self.model(image, text)
58
+ image_features, text_features = outputs["image_features"], outputs["text_features"]
59
+ logits_per_image = image_features @ text_features.T
60
+
61
+ hps_score = torch.diagonal(logits_per_image).cpu().numpy()
62
+ result.append(hps_score[0])
63
+ return result
64
+ elif isinstance(img_path, str):
65
+ # Load your image and prompt
66
+ with torch.no_grad():
67
+ # Process the image
68
+ image = self.preprocess_val(Image.open(img_path)).unsqueeze(0).to(device=self.device, non_blocking=True)
69
+ # Process the prompt
70
+ text = self.tokenizer([prompt]).to(device=self.device, non_blocking=True)
71
+ # Calculate the HPS
72
+ with torch.cuda.amp.autocast():
73
+ outputs = self.model(image, text)
74
+ image_features, text_features = outputs["image_features"], outputs["text_features"]
75
+ logits_per_image = image_features @ text_features.T
76
+
77
+ hps_score = torch.diagonal(logits_per_image).cpu().numpy()
78
+ return [hps_score[0]]
79
+ elif isinstance(img_path, Image.Image):
80
+ # Load your image and prompt
81
+ with torch.no_grad():
82
+ # Process the image
83
+ image = self.preprocess_val(img_path).unsqueeze(0).to(device=self.device, non_blocking=True)
84
+ # Process the prompt
85
+ text = self.tokenizer([prompt]).to(device=self.device, non_blocking=True)
86
+ # Calculate the HPS
87
+ with torch.cuda.amp.autocast():
88
+ outputs = self.model(image, text)
89
+ image_features, text_features = outputs["image_features"], outputs["text_features"]
90
+ logits_per_image = image_features @ text_features.T
91
+
92
+ hps_score = torch.diagonal(logits_per_image).cpu().numpy()
93
+ return [hps_score[0]]
94
+ else:
95
+ raise TypeError('The type of parameter img_path is illegal.')
96
+
97
+
98
+ if __name__ == "__main__":
99
+ from huggingface_hub import hf_hub_download
100
+
101
+ clip_model_path = hf_hub_download(repo_id="laion/CLIP-ViT-H-14-laion2B-s32B-b79K", filename="open_clip_pytorch_model.bin")
102
+ hps_model_path = hf_hub_download(repo_id="xswu/HPSv2", filename="HPS_v2_compressed.pt")
103
+
104
+ hpsv2_scorer = HPSv2Scorer(clip_pretrained_name_or_path=clip_model_path,
105
+ model_pretrained_name_or_path=hps_model_path)
106
+ score = hpsv2_scorer.score(img_path=['./image0.png', './image1.png'],
107
+ prompt='photorealistic image of a lone painter standing in a gallery, watching an exhibition of paintings made entirely with AI. In the foreground of the image a robot looks proudly at his art')
108
+
109
+ print(score)
110
+
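Since each branch above scores exactly one image against one prompt, the HPS value reduces to the cosine similarity of the CLIP features (assuming, as open_clip's output dict provides, features that are already L2-normalized). A minimal sketch of that last step on dummy feature tensors:

    import torch
    import torch.nn.functional as F

    image_features = F.normalize(torch.randn(1, 1024), dim=-1)
    text_features = F.normalize(torch.randn(1, 1024), dim=-1)

    logits_per_image = image_features @ text_features.T
    hps_score = torch.diagonal(logits_per_image)  # shape (1,): the HPS value
    print(hps_score.item())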
Reward_sana_idealized/imagereward_score.py ADDED
@@ -0,0 +1,221 @@
1
+ """
2
+ Adapted from https://github.com/THUDM/ImageReward. Originally Apache License, Version 2.0, January 2004.
3
+ """
4
+
5
+ import os
6
+ import torch
7
+ import torch.nn as nn
8
+ from io import BytesIO
9
+ from PIL import Image
10
+ from blip.blip_pretrain import BLIP_Pretrain
11
+ from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize
12
+ from typing import Any, Union, List
13
+
14
+ try:
15
+ from torchvision.transforms import InterpolationMode
16
+ BICUBIC = InterpolationMode.BICUBIC
17
+ except ImportError:
18
+ BICUBIC = Image.BICUBIC
19
+
20
+
21
+ def open_image(image):
22
+ if isinstance(image, bytes):
23
+ image = Image.open(BytesIO(image))
24
+ elif isinstance(image, str):
25
+ image = Image.open(image)
26
+ image = image.convert("RGB")
27
+ return image
28
+
29
+
30
+ def _convert_image_to_rgb(image):
31
+ return image.convert("RGB")
32
+
33
+
34
+ def _transform(n_px):
35
+ return Compose([
36
+ Resize(n_px, interpolation=BICUBIC),
37
+ CenterCrop(n_px),
38
+ _convert_image_to_rgb,
39
+ ToTensor(),
40
+ Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
41
+ ])
42
+
43
+
44
+ class MLP(nn.Module):
45
+ def __init__(self, input_size):
46
+ super().__init__()
47
+ self.input_size = input_size
48
+
49
+ self.layers = nn.Sequential(
50
+ nn.Linear(self.input_size, 1024),
51
+ #nn.ReLU(),
52
+ nn.Dropout(0.2),
53
+ nn.Linear(1024, 128),
54
+ #nn.ReLU(),
55
+ nn.Dropout(0.2),
56
+ nn.Linear(128, 64),
57
+ #nn.ReLU(),
58
+ nn.Dropout(0.1),
59
+ nn.Linear(64, 16),
60
+ #nn.ReLU(),
61
+ nn.Linear(16, 1)
62
+ )
63
+
64
+ # initial MLP param
65
+ for name, param in self.layers.named_parameters():
66
+ if 'weight' in name:
67
+ nn.init.normal_(param, mean=0.0, std=1.0/(self.input_size+1))
68
+ if 'bias' in name:
69
+ nn.init.constant_(param, val=0)
70
+
71
+ def forward(self, input):
72
+ return self.layers(input)
73
+
74
+
75
+ class ImageReward(nn.Module):
76
+ def __init__(self, med_config, device='cpu'):
77
+ super().__init__()
78
+ self.device = device
79
+
80
+ self.blip = BLIP_Pretrain(image_size=224, vit='large', med_config=med_config)
81
+ self.preprocess = _transform(224)
82
+ self.mlp = MLP(768)
83
+
84
+ self.mean = 0.16717362830052426
85
+ self.std = 1.0333394966054072
86
+
87
+
88
+ def score_gard(self, prompt_ids, prompt_attention_mask, image):
89
+
90
+ image_embeds = self.blip.visual_encoder(image)
91
+ # text encode cross attention with image
92
+ image_atts = torch.ones(image_embeds.size()[:-1],dtype=torch.long).to(self.device)
93
+ text_output = self.blip.text_encoder(prompt_ids,
94
+ attention_mask = prompt_attention_mask,
95
+ encoder_hidden_states = image_embeds,
96
+ encoder_attention_mask = image_atts,
97
+ return_dict = True,
98
+ )
99
+
100
+ txt_features = text_output.last_hidden_state[:,0,:] # (feature_dim)
101
+ rewards = self.mlp(txt_features)
102
+ rewards = (rewards - self.mean) / self.std
103
+
104
+ return rewards
105
+
106
+
107
+ def score(self, prompt, image):
108
+
109
+ if (type(image).__name__=='list'):
110
+ _, rewards = self.inference_rank(prompt, image)
111
+ return rewards
112
+
113
+ # text encode
114
+ text_input = self.blip.tokenizer(prompt, padding='max_length', truncation=True, max_length=35, return_tensors="pt").to(self.device)
115
+
116
+ # image encode
117
+ if isinstance(image, Image.Image):
118
+ pil_image = image
119
+ elif isinstance(image, str):
120
+ if os.path.isfile(image):
121
+ pil_image = Image.open(image)
122
+ else:
123
+ raise TypeError('This image parameter type is not supported yet. Please pass a PIL.Image or a file path str.')
124
+
125
+ image = self.preprocess(pil_image).unsqueeze(0).to(self.device)
126
+ image_embeds = self.blip.visual_encoder(image)
127
+
128
+ # text encode cross attention with image
129
+ image_atts = torch.ones(image_embeds.size()[:-1],dtype=torch.long).to(self.device)
130
+ text_output = self.blip.text_encoder(text_input.input_ids,
131
+ attention_mask = text_input.attention_mask,
132
+ encoder_hidden_states = image_embeds,
133
+ encoder_attention_mask = image_atts,
134
+ return_dict = True,
135
+ )
136
+
137
+ txt_features = text_output.last_hidden_state[:,0,:].float() # (feature_dim)
138
+ rewards = self.mlp(txt_features)
139
+ rewards = (rewards - self.mean) / self.std
140
+
141
+ return rewards.detach().cpu().numpy().item()
142
+
143
+
144
+ def inference_rank(self, prompt, generations_list):
145
+
146
+ text_input = self.blip.tokenizer(prompt, padding='max_length', truncation=True, max_length=35, return_tensors="pt").to(self.device)
147
+
148
+ txt_set = []
149
+ for generation in generations_list:
150
+ # image encode
151
+ if isinstance(generation, Image.Image):
152
+ pil_image = generation
153
+ elif isinstance(generation, str):
154
+ if os.path.isfile(generation):
155
+ pil_image = Image.open(generation)
156
+ else:
157
+ raise TypeError('This image parameter type is not supported yet. Please pass a PIL.Image or a file path str.')
158
+ image = self.preprocess(pil_image).unsqueeze(0).to(self.device)
159
+ image_embeds = self.blip.visual_encoder(image)
160
+
161
+ # text encode cross attention with image
162
+ image_atts = torch.ones(image_embeds.size()[:-1],dtype=torch.long).to(self.device)
163
+ text_output = self.blip.text_encoder(text_input.input_ids,
164
+ attention_mask = text_input.attention_mask,
165
+ encoder_hidden_states = image_embeds,
166
+ encoder_attention_mask = image_atts,
167
+ return_dict = True,
168
+ )
169
+ txt_set.append(text_output.last_hidden_state[:,0,:])
170
+
171
+ txt_features = torch.cat(txt_set, 0).float() # [image_num, feature_dim]
172
+ rewards = self.mlp(txt_features) # [image_num, 1]
173
+ rewards = (rewards - self.mean) / self.std
174
+ rewards = torch.squeeze(rewards)
175
+ _, rank = torch.sort(rewards, dim=0, descending=True)
176
+ _, indices = torch.sort(rank, dim=0)
177
+ indices = indices + 1
178
+
179
+ return indices.detach().cpu().numpy().tolist(), rewards.detach().cpu().numpy().tolist()
180
+
181
+
182
+ def load_imagereward(model_path: str, med_config: str = None, device: Union[str, torch.device] = "cuda" if torch.cuda.is_available() else "cpu"):
183
+ """Load a ImageReward model
184
+
185
+ Parameters
186
+ ----------
187
+ model_path : str
188
+ Path to a model checkpoint containing the state_dict
189
+
190
+ device : Union[str, torch.device]
191
+ The device to put the loaded model
192
+
193
+
194
+ Returns
195
+ -------
196
+ model : torch.nn.Module
197
+ The ImageReward model
198
+ """
199
+ print('load checkpoint from %s'%model_path)
200
+ state_dict = torch.load(model_path, map_location='cpu')
201
+
202
+ model = ImageReward(device=device, med_config=med_config).to(device)
203
+ msg = model.load_state_dict(state_dict, strict=False)
204
+ print("checkpoint loaded")
205
+ model.eval()
206
+
207
+ return model
208
+
209
+
210
+ if __name__ == '__main__':
211
+ from huggingface_hub import hf_hub_download
212
+
213
+ model_path = hf_hub_download(repo_id="THUDM/ImageReward", filename="ImageReward.pt")
214
+ config_path = hf_hub_download(repo_id="THUDM/ImageReward", filename="med_config.json")
215
+
216
+ image0 = open_image('./image0.png')
217
+ image1 = open_image('./image1.png')
218
+ prompt = "photorealistic image of a lone painter standing in a gallery, watching an exhibition of paintings made entirely with AI. In the foreground of the image a robot looks proudly at his art"
219
+ model = load_imagereward(model_path=model_path, med_config=config_path, device='cuda')
220
+
221
+ print(model.score(prompt, [image0, image1]))
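The ranking logic in inference_rank uses the double-sort idiom, which is worth seeing on its own: the first sort orders the rewards descending, and sorting the resulting permutation recovers each item's 1-based rank. A standalone sketch with toy values:

    import torch

    rewards = torch.tensor([0.2, 1.5, -0.3])
    _, rank = torch.sort(rewards, dim=0, descending=True)  # item indices, best first
    _, indices = torch.sort(rank, dim=0)                   # each item's position in that order
    indices = indices + 1                                  # 1-based ranks
    print(indices.tolist())  # [2, 1, 3]: the second image scores highest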
Reward_sana_idealized/lr_scheduler.py ADDED
@@ -0,0 +1,233 @@
1
+ """
2
+ Learning rate schedulers for gradient ascent optimization.
3
+
4
+ Provides various LR scheduling strategies for reward-guided gradient ascent,
5
+ including cosine annealing, linear decay, and custom schedules.
6
+ """
7
+
8
+ import math
9
+ from typing import Optional, Literal
10
+
11
+
12
+ class LRScheduler:
13
+ """Base class for learning rate schedulers."""
14
+
15
+ def __init__(self, initial_lr: float, num_steps: int):
16
+ """
17
+ Initialize LR scheduler.
18
+
19
+ Args:
20
+ initial_lr: Initial learning rate
21
+ num_steps: Total number of optimization steps
22
+ """
23
+ self.initial_lr = initial_lr
24
+ self.num_steps = num_steps
25
+ self.current_step = 0
26
+
27
+ def get_lr(self) -> float:
28
+ """Get current learning rate."""
29
+ raise NotImplementedError
30
+
31
+ def step(self):
32
+ """Update scheduler state after a step."""
33
+ self.current_step += 1
34
+
35
+ def reset(self):
36
+ """Reset scheduler state."""
37
+ self.current_step = 0
38
+
39
+
40
+ class ConstantLR(LRScheduler):
41
+ """Constant learning rate (no scheduling)."""
42
+
43
+ def get_lr(self) -> float:
44
+ return self.initial_lr
45
+
46
+
47
+ class LinearLR(LRScheduler):
48
+ """Linear learning rate decay."""
49
+
50
+ def __init__(
51
+ self,
52
+ initial_lr: float,
53
+ num_steps: int,
54
+ end_lr: float = 0.0,
55
+ start_step: int = 0,
56
+ ):
57
+ """
58
+ Initialize linear LR scheduler.
59
+
60
+ Args:
61
+ initial_lr: Starting learning rate
62
+ num_steps: Total number of steps
63
+ end_lr: Ending learning rate (default: 0.0)
64
+ start_step: Step to begin decay (default: 0)
65
+ """
66
+ super().__init__(initial_lr, num_steps)
67
+ self.end_lr = end_lr
68
+ self.start_step = start_step
69
+
70
+ def get_lr(self) -> float:
71
+ if self.current_step < self.start_step:
72
+ return self.initial_lr
73
+
74
+ progress = (self.current_step - self.start_step) / (self.num_steps - self.start_step)
75
+ progress = min(1.0, progress)
76
+
77
+ return self.initial_lr + (self.end_lr - self.initial_lr) * progress
78
+
79
+
80
+ class CosineLR(LRScheduler):
81
+ """Cosine annealing learning rate schedule."""
82
+
83
+ def __init__(
84
+ self,
85
+ initial_lr: float,
86
+ num_steps: int,
87
+ min_lr: float = 0.0,
88
+ warmup_steps: int = 0,
89
+ ):
90
+ """
91
+ Initialize cosine LR scheduler.
92
+
93
+ Args:
94
+ initial_lr: Maximum learning rate
95
+ num_steps: Total number of steps
96
+ min_lr: Minimum learning rate (default: 0.0)
97
+ warmup_steps: Number of linear warmup steps (default: 0)
98
+ """
99
+ super().__init__(initial_lr, num_steps)
100
+ self.min_lr = min_lr
101
+ self.warmup_steps = warmup_steps
102
+
103
+ def get_lr(self) -> float:
104
+ if self.current_step < self.warmup_steps:
105
+ # Linear warmup
106
+ return self.initial_lr * (self.current_step / self.warmup_steps)
107
+
108
+ # Cosine annealing
109
+ progress = (self.current_step - self.warmup_steps) / (self.num_steps - self.warmup_steps)
110
+ progress = min(1.0, progress)
111
+
112
+ cosine_decay = 0.5 * (1 + math.cos(math.pi * progress))
113
+ return self.min_lr + (self.initial_lr - self.min_lr) * cosine_decay
114
+
115
+
116
+ class ExponentialLR(LRScheduler):
117
+ """Exponential learning rate decay."""
118
+
119
+ def __init__(
120
+ self,
121
+ initial_lr: float,
122
+ num_steps: int,
123
+ gamma: float = 0.95,
124
+ ):
125
+ """
126
+ Initialize exponential LR scheduler.
127
+
128
+ Args:
129
+ initial_lr: Starting learning rate
130
+ num_steps: Total number of steps
131
+ gamma: Multiplicative decay factor per step
132
+ """
133
+ super().__init__(initial_lr, num_steps)
134
+ self.gamma = gamma
135
+
136
+ def get_lr(self) -> float:
137
+ return self.initial_lr * (self.gamma ** self.current_step)
138
+
139
+
140
+ class StepLR(LRScheduler):
141
+ """Step-wise learning rate decay."""
142
+
143
+ def __init__(
144
+ self,
145
+ initial_lr: float,
146
+ num_steps: int,
147
+ step_size: int,
148
+ gamma: float = 0.1,
149
+ ):
150
+ """
151
+ Initialize step LR scheduler.
152
+
153
+ Args:
154
+ initial_lr: Starting learning rate
155
+ num_steps: Total number of steps
156
+ step_size: Number of steps between each decay
157
+ gamma: Multiplicative decay factor
158
+ """
159
+ super().__init__(initial_lr, num_steps)
160
+ self.step_size = step_size
161
+ self.gamma = gamma
162
+
163
+ def get_lr(self) -> float:
164
+ num_decays = self.current_step // self.step_size
165
+ return self.initial_lr * (self.gamma ** num_decays)
166
+
167
+
168
+ def create_lr_scheduler(
169
+ scheduler_type: Literal["constant", "linear", "cosine", "exponential", "step"],
170
+ initial_lr: float,
171
+ num_steps: int,
172
+ **kwargs
173
+ ) -> LRScheduler:
174
+ """
175
+ Factory function to create learning rate schedulers.
176
+
177
+ Args:
178
+ scheduler_type: Type of scheduler ("constant", "linear", "cosine", "exponential", "step")
179
+ initial_lr: Initial learning rate
180
+ num_steps: Total number of optimization steps
181
+ **kwargs: Additional scheduler-specific arguments
182
+ For linear: end_lr, start_step
183
+ For cosine: min_lr, warmup_steps
184
+ For exponential: gamma
185
+ For step: step_size, gamma
186
+
187
+ Returns:
188
+ LRScheduler instance
189
+
190
+ Examples:
191
+ # Constant LR
192
+ scheduler = create_lr_scheduler("constant", initial_lr=0.1, num_steps=100)
193
+
194
+ # Linear decay
195
+ scheduler = create_lr_scheduler("linear", initial_lr=0.1, num_steps=100, end_lr=0.01)
196
+
197
+ # Cosine annealing with warmup
198
+ scheduler = create_lr_scheduler("cosine", initial_lr=0.1, num_steps=100,
199
+ min_lr=0.001, warmup_steps=10)
200
+ """
201
+ if scheduler_type == "constant":
202
+ return ConstantLR(initial_lr, num_steps)
203
+
204
+ elif scheduler_type == "linear":
205
+ return LinearLR(
206
+ initial_lr, num_steps,
207
+ end_lr=kwargs.get("end_lr", 0.0),
208
+ start_step=kwargs.get("start_step", 0),
209
+ )
210
+
211
+ elif scheduler_type == "cosine":
212
+ return CosineLR(
213
+ initial_lr, num_steps,
214
+ min_lr=kwargs.get("min_lr", 0.0),
215
+ warmup_steps=kwargs.get("warmup_steps", 0),
216
+ )
217
+
218
+ elif scheduler_type == "exponential":
219
+ return ExponentialLR(
220
+ initial_lr, num_steps,
221
+ gamma=kwargs.get("gamma", 0.95),
222
+ )
223
+
224
+ elif scheduler_type == "step":
225
+ return StepLR(
226
+ initial_lr, num_steps,
227
+ step_size=kwargs.get("step_size", 10),
228
+ gamma=kwargs.get("gamma", 0.1),
229
+ )
230
+
231
+ else:
232
+ raise ValueError(f"Unknown scheduler type: {scheduler_type}. "
233
+ f"Choose from: constant, linear, cosine, exponential, step")
Reward_sana_idealized/models/__pycache__/__init__.cpython-311.pyc ADDED
Binary file (285 Bytes). View file
 
Reward_sana_idealized/open_clip/__pycache__/coca_model.cpython-311.pyc ADDED
Binary file (18.4 kB). View file
 
Reward_sana_idealized/open_clip/__pycache__/factory.cpython-311.pyc ADDED
Binary file (19.3 kB). View file
 
Reward_sana_idealized/open_clip/__pycache__/model.cpython-311.pyc ADDED
Binary file (25.1 kB). View file
 
Reward_sana_idealized/open_clip/__pycache__/modified_resnet.cpython-311.pyc ADDED
Binary file (13.2 kB). View file
 
Reward_sana_idealized/open_clip/__pycache__/pretrained.cpython-311.pyc ADDED
Binary file (18.5 kB). View file
 
Reward_sana_idealized/open_clip/__pycache__/push_to_hf_hub.cpython-311.pyc ADDED
Binary file (9.29 kB). View file
 
Reward_sana_idealized/open_clip/__pycache__/timm_model.cpython-311.pyc ADDED
Binary file (6.73 kB). View file
 
Reward_sana_idealized/open_clip/__pycache__/tokenizer.cpython-311.pyc ADDED
Binary file (16 kB). View file
 
Reward_sana_idealized/open_clip/__pycache__/transformer.cpython-311.pyc ADDED
Binary file (42.6 kB). View file
 
Reward_sana_idealized/open_clip/model_configs/convnext_xlarge.json ADDED
@@ -0,0 +1,19 @@
1
+ {
2
+ "embed_dim": 1024,
3
+ "vision_cfg": {
4
+ "timm_model_name": "convnext_xlarge",
5
+ "timm_model_pretrained": false,
6
+ "timm_pool": "",
7
+ "timm_proj": "linear",
8
+ "timm_drop": 0.0,
9
+ "timm_drop_path": 0.1,
10
+ "image_size": 256
11
+ },
12
+ "text_cfg": {
13
+ "context_length": 77,
14
+ "vocab_size": 49408,
15
+ "width": 1024,
16
+ "heads": 16,
17
+ "layers": 20
18
+ }
19
+ }
Reward_sana_idealized/pick_score.py ADDED
@@ -0,0 +1,141 @@
1
+ """
2
+ Adapted from https://github.com/yuvalkirstain/PickScore. Originally MIT License, Copyright (c) 2021.
3
+ """
4
+
5
+
6
+ from io import BytesIO
+
+ import torch
+ from PIL import Image
+ from transformers import AutoProcessor, AutoModel
17
+
18
+
19
+ def open_image(image):
20
+ if isinstance(image, bytes):
21
+ image = Image.open(BytesIO(image))
22
+ elif isinstance(image, str):
23
+ image = Image.open(image)
24
+ image = image.convert("RGB")
25
+ return image
26
+
27
+
28
+
29
+ class PickScorer(torch.nn.Module):
30
+ def __init__(self, processor_name_or_path, model_pretrained_name_or_path, device='cuda'):
31
+ super().__init__()
32
+ self.processor = AutoProcessor.from_pretrained(processor_name_or_path)
33
+ self.model = AutoModel.from_pretrained(model_pretrained_name_or_path).to(device)
34
+ self.device = device
35
+ self.eval()
36
+
37
+ @torch.no_grad()
38
+ def __call__(self, prompt, images):
39
+ # preprocess
40
+ image_inputs = self.processor(
41
+ images=images,
42
+ padding=True,
43
+ truncation=True,
44
+ max_length=77,
45
+ return_tensors="pt",
46
+ ).to(self.device)
47
+
48
+ text_inputs = self.processor(
49
+ text=prompt,
50
+ padding=True,
51
+ truncation=True,
52
+ max_length=77,
53
+ return_tensors="pt",
54
+ ).to(self.device)
55
+
56
+ with torch.no_grad():
57
+ # embed
58
+ image_embs = self.model.get_image_features(**image_inputs)
59
+ image_embs = image_embs / torch.norm(image_embs, dim=-1, keepdim=True)
60
+
61
+ text_embs = self.model.get_text_features(**text_inputs)
62
+ text_embs = text_embs / torch.norm(text_embs, dim=-1, keepdim=True)
63
+
64
+ # score
65
+ scores = self.model.logit_scale.exp() * (text_embs @ image_embs.T)[0]
66
+
67
+ # get probabilities if you have multiple images to choose from
68
+ if len(scores) == 1:
69
+ probs = scores
70
+ else:
71
+ probs = torch.softmax(scores, dim=-1)
72
+
73
+ return probs.cpu().tolist()
74
+
75
+
76
+     def score(self, img_path, prompt):
+         # NOTE: fixed from a copy of HPSv2Scorer.score, which referenced
+         # self.preprocess_val / self.tokenizer that PickScorer does not have.
+         # PickScore preprocessing goes through the HF AutoProcessor, so we
+         # normalise the input to PIL images and delegate to __call__,
+         # scoring one image at a time to keep the raw (non-softmaxed) scores.
+         if isinstance(img_path, list):
+             images = [open_image(p) if isinstance(p, (str, bytes)) else p for p in img_path]
+         elif isinstance(img_path, (str, Image.Image)):
+             images = [open_image(img_path) if isinstance(img_path, str) else img_path]
+         else:
+             raise TypeError('The type of parameter img_path is illegal.')
+         if not all(isinstance(image, Image.Image) for image in images):
+             raise TypeError('The type of parameter img_path is illegal.')
+         return [self(prompt, [image])[0] for image in images]
129
+
130
+
131
+ if __name__ == "__main__":
132
+ pickscorer = PickScorer(processor_name_or_path="laion/CLIP-ViT-H-14-laion2B-s32B-b79K", model_pretrained_name_or_path="yuvalkirstain/PickScore_v1")
133
+
134
+ image0 = open_image('./image0.png')
135
+ image1 = open_image('./image1.png')
136
+ prompt = "photorealistic image of a lone painter standing in a gallery, watching an exhibition of paintings made entirely with AI. In the foreground of the image a robot looks proudly at his art"
137
+
138
+ probs = pickscorer(prompt, [image0])
139
+ probs1 = pickscorer(prompt, [image1])
140
+ print(probs)
141
+ print(probs1)
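Note that __call__ only applies the softmax when given more than one image, so the __main__ block above yields two independent raw scores. Passing both images in one call gives head-to-head preference probabilities; the final step is just a softmax over the logit-scaled similarities, sketched here on made-up scores:

    import torch

    scores = torch.tensor([21.3, 19.8])    # logit-scaled similarities for two images
    probs = torch.softmax(scores, dim=-1)  # preference probabilities, sum to 1
    print(probs.tolist())                  # ~[0.82, 0.18]: image 0 preferred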
Reward_sana_idealized/test.ipynb ADDED
@@ -0,0 +1,47 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "id": "d8044219",
7
+ "metadata": {},
8
+ "outputs": [
9
+ {
10
+ "data": {
11
+ "text/plain": [
12
+ "False"
13
+ ]
14
+ },
15
+ "execution_count": 1,
16
+ "metadata": {},
17
+ "output_type": "execute_result"
18
+ }
19
+ ],
20
+ "source": [
21
+ "import torch\n",
22
+ "torch.cuda.is_available()"
23
+ ]
24
+ }
25
+ ],
26
+ "metadata": {
27
+ "kernelspec": {
28
+ "display_name": "Python 3",
29
+ "language": "python",
30
+ "name": "python3"
31
+ },
32
+ "language_info": {
33
+ "codemirror_mode": {
34
+ "name": "ipython",
35
+ "version": 3
36
+ },
37
+ "file_extension": ".py",
38
+ "mimetype": "text/x-python",
39
+ "name": "python",
40
+ "nbconvert_exporter": "python",
41
+ "pygments_lexer": "ipython3",
42
+ "version": "3.10.18"
43
+ }
44
+ },
45
+ "nbformat": 4,
46
+ "nbformat_minor": 5
47
+ }
Reward_sana_idealized/tune_hyperparams.py ADDED
@@ -0,0 +1,514 @@
1
+ """
2
+ Hyperparameter tuning script for gradient ascent optimization.
3
+
4
+ This script performs a systematic search over hyperparameter combinations
5
+ to find the optimal configuration for maximum evaluation scores.
6
+ """
7
+
8
+ import subprocess
9
+ import json
10
+ import argparse
11
+ from pathlib import Path
12
+ from datetime import datetime
13
+ import itertools
14
+ import numpy as np
15
+ from typing import Dict, List, Any
16
+ import re
17
+
18
+
19
+ class HyperparameterTuner:
20
+ """Hyperparameter tuner for gradient ascent."""
21
+
22
+ def __init__(
23
+ self,
24
+ output_dir: str = "tuning_results",
25
+ max_samples: int = 30,
26
+ num_steps: int = 20,
27
+ dataset_type: str = "pickapic",
28
+ model_variant: str = "lpo",
29
+ cuda_id: int = 0,
30
+ metrics: List[str] = None
31
+ ):
32
+ self.output_dir = Path(output_dir)
33
+ self.output_dir.mkdir(parents=True, exist_ok=True)
34
+
35
+ self.max_samples = max_samples
36
+ self.num_steps = num_steps
37
+ self.dataset_type = dataset_type
38
+ self.model_variant = model_variant
39
+ self.cuda_id = cuda_id
40
+ self.metrics = metrics or ["clip", "aesthetic", "pickscore", "hpsv2", "imagereward"]
41
+
42
+ # Store results
43
+ self.results = []
44
+ self.baseline_results = None
45
+
46
+ def define_search_space(self) -> List[Dict[str, Any]]:
47
+ """Define the hyperparameter search space - FULL GRID SEARCH.
48
+
49
+ Tests all combinations of parameters including momentum overrides for configs that support it.
50
+ """
51
+
52
+ # Define all parameter values
53
+ cfg_scales = [3.0, 5.0, 7.5]
54
+
55
+ # All available gradient configs from grad_ascent_configs.py
56
+ grad_configs = [
57
+ # "constant",
58
+ # "linear",
59
+ "cosine_nesterov",
60
+ # "low_to_high_nesterov",
61
+ # "high_to_low_nesterov",
62
+ "low_to_high_momentum",
63
+ "high_to_low_momentum",
64
+ ]
65
+
66
+ num_grad_steps_list = [1, 2] # 5, 7, 10
67
+ grad_step_sizes = [0.001, 0.005, 0.01, 0.05]
68
+ momentums = [0.5, 0.8, 0.9]
69
+
70
+ # Generate ALL combinations using itertools.product
71
+ configs = []
72
+ for cfg, grad_cfg, num_steps, step_size, momentum in itertools.product(
73
+ cfg_scales, grad_configs, num_grad_steps_list, grad_step_sizes, momentums
74
+ ):
75
+ configs.append({
76
+ "cfg_scale": cfg,
77
+ "grad_config": grad_cfg,
78
+ "num_grad_steps": num_steps,
79
+ "grad_step_size": step_size,
80
+ "momentum": momentum,
81
+ })
82
+
83
+ print(f"\nGenerated {len(configs)} total configurations")
84
+ print(f" cfg_scales: {len(cfg_scales)}")
85
+ print(f" grad_configs: {len(grad_configs)}")
86
+ print(f" num_grad_steps: {len(num_grad_steps_list)}")
87
+ print(f" grad_step_sizes: {len(grad_step_sizes)}")
88
+ print(f" momentums: {len(momentums)}")
89
+ print(f" Total: {len(cfg_scales)} × {len(grad_configs)} × {len(num_grad_steps_list)} × {len(grad_step_sizes)} × {len(momentums)} = {len(configs)}")
90
+
91
+ return configs
92
+
93
+ def run_baseline(self) -> Dict[str, float]:
94
+ """Run baseline evaluation once."""
95
+ print("\n" + "="*80)
96
+ print("RUNNING BASELINE EVALUATION")
97
+ print("="*80)
98
+
99
+ # Use median cfg_scale for baseline
100
+ cfg_scale = 5.0
101
+
102
+ output_dir = self.output_dir / "baseline"
103
+
104
+ cmd = [
105
+ "python", "eval.py",
106
+ "--model_variant", self.model_variant,
107
+ "--dataset_type", self.dataset_type,
108
+ "--max_samples", str(self.max_samples),
109
+ "--num_steps", str(self.num_steps),
110
+ "--cfg_scale", str(cfg_scale),
111
+ "--output_dir", str(output_dir),
112
+ "--cuda", str(self.cuda_id),
113
+ "--mode", "baseline",
114
+ "--metrics", *self.metrics,
115
+ ]
116
+
117
+ print(f"Command: {' '.join(cmd)}")
118
+
119
+ try:
120
+ result = subprocess.run(cmd, capture_output=True, text=True, check=True)
121
+
122
+ # Parse results from output
123
+ metrics = self._parse_metrics(result.stdout, "baseline")
124
+
125
+ print(f"\nBaseline Results:")
126
+ for metric, value in metrics.items():
127
+ print(f" {metric}: {value:.4f}")
128
+
129
+ self.baseline_results = {
130
+ "cfg_scale": cfg_scale,
131
+ "metrics": metrics,
132
+ }
133
+
134
+ return metrics
135
+
136
+ except subprocess.CalledProcessError as e:
137
+ print(f"Error running baseline: {e}")
138
+ print(f"Stdout: {e.stdout}")
139
+ print(f"Stderr: {e.stderr}")
140
+ return {}
141
+
+     def run_experiment(self, config: Dict[str, Any]) -> Dict[str, Any]:
+         """Run a single experiment with the given hyperparameters."""
+
+         # Create an output directory for this config
+         config_name = f"cfg{config['cfg_scale']}_" \
+                       f"{config['grad_config']}_" \
+                       f"steps{config['num_grad_steps']}_" \
+                       f"lr{config['grad_step_size']}_" \
+                       f"mom{config['momentum']}"
+
+         output_dir = self.output_dir / config_name
+
+         # Build the command
+         cmd = [
+             "python", "eval.py",
+             "--model_variant", self.model_variant,
+             "--dataset_type", self.dataset_type,
+             "--grad_config", config["grad_config"],
+             "--max_samples", str(self.max_samples),
+             "--num_steps", str(self.num_steps),
+             "--cfg_scale", str(config["cfg_scale"]),
+             "--output_dir", str(output_dir),
+             "--cuda", str(self.cuda_id),
+             "--mode", "gradient_ascent",
+             "--metrics", *self.metrics,
+             # Override config parameters
+             "--override_num_grad_steps", str(config["num_grad_steps"]),
+             "--override_grad_step_size", str(config["grad_step_size"]),
+             "--override_momentum", str(config["momentum"]),
+         ]
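+         # The --override_* flags let the named grad_config preset be reused
+         # while the sweep varies its step count, step size, and momentum.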
+
+         print(f"\nRunning experiment: {config_name}")
+         print(f"Config: {config}")
+
+         try:
+             result = subprocess.run(cmd, capture_output=True, text=True, check=True)
+
+             # Parse metrics from the output
+             metrics = self._parse_metrics(result.stdout, "gradient_ascent")
+
+             # Compute improvement over baseline
+             improvements = {}
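+             # Improvements are relative percent changes vs. the baseline;
+             # metrics with a zero baseline are skipped, and for FID (where
+             # lower is better) a positive "improvement" actually means worse.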
+             if self.baseline_results:
+                 baseline_metrics = self.baseline_results["metrics"]
+                 for metric, value in metrics.items():
+                     if metric in baseline_metrics:
+                         baseline_val = baseline_metrics[metric]
+                         if baseline_val != 0:
+                             improvement = ((value - baseline_val) / abs(baseline_val)) * 100
+                             improvements[f"{metric}_improvement"] = improvement
+
+             result_dict = {
+                 "config": config,
+                 "metrics": metrics,
+                 "improvements": improvements,
+                 "output_dir": str(output_dir),
+                 "timestamp": datetime.now().isoformat(),
+             }
+
+             print("Results:")
+             for metric, value in metrics.items():
+                 print(f"  {metric}: {value:.4f}")
+             if improvements:
+                 print("Improvements over baseline:")
+                 for metric, value in improvements.items():
+                     print(f"  {metric}: {value:+.2f}%")
+
+             return result_dict
+
+         except subprocess.CalledProcessError as e:
+             print(f"Error running experiment: {e}")
+             print(f"Stderr: {e.stderr}")
+             return {
+                 "config": config,
+                 "error": str(e),
+                 "timestamp": datetime.now().isoformat(),
+             }
+
+     def _parse_metrics(self, output: str, mode: str) -> Dict[str, float]:
+         """Parse metrics from eval.py output."""
+         metrics = {}
+
+         # Look for the summary section
+         lines = output.split('\n')
+
+         # Patterns to match metric lines like "  Reward: 0.1234"
+         metric_patterns = {
+             "reward": r"Reward:\s+([-+]?\d*\.?\d+)",
+             "clip": r"CLIP Score:\s+([-+]?\d*\.?\d+)",
+             "aesthetic": r"Aesthetic Score:\s+([-+]?\d*\.?\d+)",
+             "pickscore": r"PickScore:\s+([-+]?\d*\.?\d+)",
+             "hpsv2": r"HPSv2 Score:\s+([-+]?\d*\.?\d+)",
+             "hpsv21": r"HPSv2\.1 Score:\s+([-+]?\d*\.?\d+)",
+             "imagereward": r"ImageReward:\s+([-+]?\d*\.?\d+)",
+             "fid": r"FID:\s+([-+]?\d*\.?\d+)",
+         }
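+         # These patterns must track eval.py's printed summary verbatim; if
+         # several lines match, the last occurrence wins. The `mode` argument
+         # is currently unused.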
+
+         for line in lines:
+             for metric_name, pattern in metric_patterns.items():
+                 match = re.search(pattern, line)
+                 if match:
+                     metrics[metric_name] = float(match.group(1))
+
+         return metrics
+
+     def compute_aggregate_score(self, metrics: Dict[str, float]) -> float:
+         """
+         Compute an aggregate score for ranking configurations.
+
+         Uses a weighted combination of metrics; higher is better for all
+         of them except FID, where lower is better.
+         """
+         weights = {
+             "reward": 1.0,
+             "clip": 0.8,
+             "aesthetic": 0.8,
+             "pickscore": 1.0,
+             "hpsv2": 1.0,
+             "hpsv21": 1.0,
+             "imagereward": 1.0,
+             "fid": -0.5,  # negative weight (lower FID is better)
+         }
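+         # Caveat: metrics are combined on their raw scales, so a metric with
+         # larger magnitudes (e.g. the aesthetic score) dominates the weighted
+         # mean; normalizing each metric first may give a fairer ranking.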
+
+         score = 0.0
+         total_weight = 0.0
+
+         for metric, value in metrics.items():
+             if metric in weights:
+                 score += weights[metric] * value
+                 total_weight += abs(weights[metric])
+
+         # Normalize by total weight
+         if total_weight > 0:
+             score /= total_weight
+
+         return score
+
+     def run_search(
+         self,
+         search_type: str = "grid",
+         start_idx: int = 0,
+         end_idx: int = None
+     ) -> List[Dict[str, Any]]:
+         """
+         Run the hyperparameter search.
+
+         Args:
+             search_type: Type of search ("grid" or "random")
+             start_idx: Starting index for experiments (for GPU distribution)
+             end_idx: Ending index for experiments (for GPU distribution)
+         """
+         all_configs = self.define_search_space()
+
+         print("\n" + "="*80)
+         print("HYPERPARAMETER SEARCH CONFIGURATION")
+         print("="*80)
+         print(f"Dataset: {self.dataset_type}")
+         print(f"Model: {self.model_variant}")
+         print(f"Samples: {self.max_samples}")
+         print(f"Inference steps: {self.num_steps}")
+         print(f"Metrics: {', '.join(self.metrics)}")
+
+         # Select a subset of configs if indices are provided
+         if search_type == "grid":
+             configs = all_configs
+         elif search_type == "random":
+             # Random sample from all configs
+             n_samples = min(50, len(all_configs))
+             indices = np.random.choice(len(all_configs), n_samples, replace=False)
+             configs = [all_configs[i] for i in indices]
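+             # NOTE: this sampling is unseeded, so parallel workers slicing by
+             # start_idx/end_idx would each draw a different subset; seed
+             # np.random (or use grid search) when distributing work by index.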
+         else:
+             raise ValueError(f"Unknown search type: {search_type}")
+
+         # Apply index slicing for GPU distribution
+         if end_idx is None:
+             end_idx = len(configs)
+         configs = configs[start_idx:end_idx]
+
+         print(f"\nTotal configurations: {len(all_configs)}")
+         print(f"Assigned to this worker: {len(configs)} (indices {start_idx} to {end_idx})")
+
+         # Run the baseline first
+         if self.baseline_results is None:
+             self.run_baseline()
+
+         # Run the experiments
+         print("\n" + "="*80)
+         print("RUNNING EXPERIMENTS")
+         print("="*80)
+
+         for i, config in enumerate(configs, 1):
+             print(f"\n{'='*80}")
+             print(f"Experiment {i}/{len(configs)}")
+             print(f"{'='*80}")
+
+             result = self.run_experiment(config)
+             self.results.append(result)
+
+             # Save intermediate results after every experiment
+             self._save_results()
+
+         return self.results
+
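+     # The two generators below are not called by run_search above (which
+     # builds its grid in define_search_space); they are generic helpers for
+     # custom search spaces.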
+     def _generate_grid_configs(self, search_space: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
+         """Generate all combinations for grid search."""
+         keys = list(search_space.keys())
+         values = list(search_space.values())
+
+         configs = []
+         for combination in itertools.product(*values):
+             config = dict(zip(keys, combination))
+             configs.append(config)
+
+         return configs
+
+     def _generate_random_configs(
+         self,
+         search_space: Dict[str, List[Any]],
+         n_samples: int = 20
+     ) -> List[Dict[str, Any]]:
+         """Generate random configurations for random search."""
+         configs = []
+
+         for _ in range(n_samples):
+             config = {}
+             for param, values in search_space.items():
+                 # Index with randint so values keep their native Python types
+                 # (np.random.choice returns NumPy scalars, which json.dump
+                 # cannot serialize).
+                 config[param] = values[np.random.randint(len(values))]
+             configs.append(config)
+
+         return configs
+
+     def _save_results(self):
+         """Save results to a JSON file."""
+         results_file = self.output_dir / "tuning_results.json"
+
+         data = {
+             "baseline": self.baseline_results,
+             "experiments": self.results,
+             "timestamp": datetime.now().isoformat(),
+             "config": {
+                 "max_samples": self.max_samples,
+                 "num_steps": self.num_steps,
+                 "dataset_type": self.dataset_type,
+                 "model_variant": self.model_variant,
+             }
+         }
+
+         with open(results_file, 'w') as f:
+             json.dump(data, f, indent=2)
+
+         print(f"\nResults saved to: {results_file}")
+
+     def analyze_results(self) -> Dict[str, Any]:
+         """Analyze results and find the best configuration."""
+         if not self.results:
+             print("No results to analyze!")
+             return {}
+
+         print("\n" + "="*80)
+         print("ANALYSIS: FINDING BEST CONFIGURATION")
+         print("="*80)
+
+         # Filter out failed experiments
+         successful_results = [r for r in self.results if "metrics" in r]
+
+         if not successful_results:
+             print("No successful experiments!")
+             return {}
+
+         # Compute aggregate scores
+         for result in successful_results:
+             metrics = result["metrics"]
+             result["aggregate_score"] = self.compute_aggregate_score(metrics)
+
+         # Sort by aggregate score, best first
+         successful_results.sort(key=lambda x: x["aggregate_score"], reverse=True)
+
+         # Print the top 5 configurations
+         print("\nTop 5 Configurations:")
+         print("="*80)
+
+         for i, result in enumerate(successful_results[:5], 1):
+             print(f"\n#{i} - Aggregate Score: {result['aggregate_score']:.4f}")
+             print(f"Config: {result['config']}")
+             print("Metrics:")
+             for metric, value in result['metrics'].items():
+                 print(f"  {metric}: {value:.4f}")
+             if result.get('improvements'):
+                 print("Improvements over baseline:")
+                 for metric, value in result['improvements'].items():
+                     print(f"  {metric}: {value:+.2f}%")
+
+         # Save the best config
+         best_result = successful_results[0]
+         best_config_file = self.output_dir / "best_config.json"
+
+         with open(best_config_file, 'w') as f:
+             json.dump({
+                 "config": best_result["config"],
+                 "metrics": best_result["metrics"],
+                 "aggregate_score": best_result["aggregate_score"],
+                 "improvements": best_result.get("improvements", {}),
+             }, f, indent=2)
+
+         print(f"\n✓ Best configuration saved to: {best_config_file}")
+
+         return best_result
+
+
+ def main():
+     parser = argparse.ArgumentParser(description="Hyperparameter tuning for gradient ascent")
+     parser.add_argument("--output_dir", type=str, default="tuning_results",
+                         help="Directory to save tuning results")
+     parser.add_argument("--max_samples", type=int, default=30,
+                         help="Number of samples to use for tuning")
+     parser.add_argument("--num_steps", type=int, default=20,
+                         help="Number of inference steps (fixed)")
+     parser.add_argument("--dataset_type", type=str, default="pickapic",
+                         choices=["coco", "pickapic"],
+                         help="Dataset to use")
+     parser.add_argument("--model_variant", type=str, default="lpo",
+                         choices=["origin", "spo", "diffusion_dpo", "lpo"],
+                         help="Model variant to use")
+     parser.add_argument("--cuda", type=int, default=0,
+                         help="CUDA device ID")
+     parser.add_argument("--search_type", type=str, default="grid",
+                         choices=["grid", "random"],
+                         help="Type of hyperparameter search")
+     parser.add_argument("--metrics", type=str, nargs="+",
+                         default=["clip", "aesthetic", "pickscore", "hpsv2", "imagereward"],
+                         help="Metrics to evaluate")
+     parser.add_argument("--start_idx", type=int, default=0,
+                         help="Starting index for experiments (for GPU distribution)")
+     parser.add_argument("--end_idx", type=int, default=None,
+                         help="Ending index for experiments (for GPU distribution)")
+
+     args = parser.parse_args()
+
+     # Create the tuner
+     tuner = HyperparameterTuner(
+         output_dir=args.output_dir,
+         max_samples=args.max_samples,
+         num_steps=args.num_steps,
+         dataset_type=args.dataset_type,
+         model_variant=args.model_variant,
+         cuda_id=args.cuda,
+         metrics=args.metrics,
+     )
+
+     # Run the search
+     results = tuner.run_search(
+         search_type=args.search_type,
+         start_idx=args.start_idx,
+         end_idx=args.end_idx
+     )
+
+     # Analyze the results
+     best_result = tuner.analyze_results()
+
+     print("\n" + "="*80)
+     print("TUNING COMPLETE!")
+     print("="*80)
+     print(f"Total experiments: {len(results)}")
+     print(f"Results directory: {args.output_dir}")
+
+     if best_result:
+         print("\nBest configuration:")
+         print(json.dumps(best_result["config"], indent=2))
+         print(f"\nAggregate score: {best_result['aggregate_score']:.4f}")
+
+
+ if __name__ == "__main__":
+     main()
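+
+ # Usage sketch (single worker; flags as defined above), e.g.:
+ #   python tune_hyperparams.py --dataset_type pickapic --model_variant lpo \
+ #       --max_samples 30 --num_steps 20 --cuda 0 --search_type grid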
Reward_sana_idealized/tune_parallel.sh ADDED
@@ -0,0 +1,253 @@
+ #!/bin/bash
+
+ # Parallel hyperparameter tuning across 8 GPUs
+ # This script distributes experiments evenly across all available GPUs
+
+ clear
+
+ # Activate conda environment
+ source ~/miniconda3/etc/profile.d/conda.sh
+ conda activate /home/ec2-user/aev
+
+ # Configuration
+ DATASET_TYPE="pickapic"            # "coco" or "pickapic"
+ MODEL_VARIANT="lpo"                # "origin", "spo", "diffusion_dpo", or "lpo"
+ MAX_SAMPLES=500                    # Number of samples for tuning
+ NUM_STEPS=50                       # Fixed inference steps
+ SEARCH_TYPE="grid"                 # "grid" or "random"
+ OUTPUT_DIR="RESULTS_TURNING/run_2"
+ NUM_GPUS=8                         # Number of GPUs to use
+
+ echo "=============================================="
+ echo " PARALLEL HYPERPARAMETER TUNING"
+ echo "=============================================="
+ echo ""
+ echo "Configuration:"
+ echo "  Dataset: $DATASET_TYPE"
+ echo "  Model: $MODEL_VARIANT"
+ echo "  Samples: $MAX_SAMPLES"
+ echo "  Inference Steps: $NUM_STEPS"
+ echo "  Search Type: $SEARCH_TYPE"
+ echo "  GPUs: $NUM_GPUS"
+ echo "  Output: $OUTPUT_DIR"
+ echo ""
+
+ # First, calculate the total number of experiments
+ echo "Calculating total experiments..."
+ TOTAL_CONFIGS=$(python -c "
+ from tune_hyperparams import HyperparameterTuner
+ import sys
+ tuner = HyperparameterTuner()
+ configs = tuner.define_search_space()
+ sys.stderr.write(f'Generated {len(configs)} configurations\n')
+ print(len(configs))
+ " 2>&1 | tail -1)
+
+ echo "Total configurations: $TOTAL_CONFIGS"
+ echo ""
+
+ # Calculate experiments per GPU
+ CONFIGS_PER_GPU=$((TOTAL_CONFIGS / NUM_GPUS))
+ REMAINDER=$((TOTAL_CONFIGS % NUM_GPUS))
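+ # Example: 90 configs on 8 GPUs gives CONFIGS_PER_GPU=11 and REMAINDER=2,
+ # so GPUs 0-1 run 12 configs each and GPUs 2-7 run 11 each.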
+
+ echo "Distributing work:"
+ echo "  Base configs per GPU: $CONFIGS_PER_GPU"
+ echo "  Extra configs for first GPUs: $REMAINDER"
+ echo ""
+
+ # Create output directory
+ mkdir -p "$OUTPUT_DIR"
+
+ # Array to store background process IDs
+ PIDS=()
+
+ # Launch parallel processes on each GPU
+ for GPU_ID in $(seq 0 $((NUM_GPUS - 1))); do
+     # Calculate start and end indices for this GPU
+     START_IDX=$((GPU_ID * CONFIGS_PER_GPU))
+
+     # Give the extra configs to the first GPUs
+     if [ $GPU_ID -lt $REMAINDER ]; then
+         START_IDX=$((START_IDX + GPU_ID))
+         END_IDX=$((START_IDX + CONFIGS_PER_GPU + 1))
+     else
+         START_IDX=$((START_IDX + REMAINDER))
+         END_IDX=$((START_IDX + CONFIGS_PER_GPU))
+     fi
+
+     # Create a GPU-specific output directory
+     GPU_OUTPUT_DIR="${OUTPUT_DIR}/gpu_${GPU_ID}"
+     mkdir -p "$GPU_OUTPUT_DIR"
+
+     echo "GPU $GPU_ID: configs $START_IDX to $END_IDX"
+
+     # Launch the tuning process in the background
+     nohup python tune_hyperparams.py \
+         --output_dir "$GPU_OUTPUT_DIR" \
+         --max_samples $MAX_SAMPLES \
+         --num_steps $NUM_STEPS \
+         --dataset_type "$DATASET_TYPE" \
+         --model_variant "$MODEL_VARIANT" \
+         --cuda $GPU_ID \
+         --search_type "$SEARCH_TYPE" \
+         --start_idx $START_IDX \
+         --end_idx $END_IDX \
+         --metrics clip aesthetic pickscore hpsv2 imagereward \
+         > "${GPU_OUTPUT_DIR}/tuning.log" 2>&1 &
+
+     # Store the PID
+     PIDS+=($!)
+
+     echo "  Launched with PID: ${PIDS[$GPU_ID]}"
+
+     # Small delay to avoid race conditions
+     sleep 2
+ done
+
+ echo ""
+ echo "=============================================="
+ echo " ALL PROCESSES LAUNCHED"
+ echo "=============================================="
+ echo ""
+ echo "Background processes running:"
+ for GPU_ID in $(seq 0 $((NUM_GPUS - 1))); do
+     echo "  GPU $GPU_ID: PID ${PIDS[$GPU_ID]} -> ${OUTPUT_DIR}/gpu_${GPU_ID}/tuning.log"
+ done
+ echo ""
+ echo "To monitor progress:"
+ echo "  tail -f ${OUTPUT_DIR}/gpu_0/tuning.log"
+ echo "  tail -f ${OUTPUT_DIR}/gpu_1/tuning.log"
+ echo "  ... etc"
+ echo ""
+ echo "To check all GPU processes:"
+ echo "  ps aux | grep tune_hyperparams.py"
+ echo ""
+ echo "To monitor GPU usage:"
+ echo "  watch -n 1 nvidia-smi"
+ echo ""
+ echo "To kill all processes:"
+ echo "  kill ${PIDS[@]}"
+ echo ""
+ echo "Waiting for all processes to complete..."
+ echo "(Press Ctrl+C to stop waiting; the processes will continue in the background)"
+ echo ""
+
+ # Wait for all background processes
+ for PID in "${PIDS[@]}"; do
+     wait $PID
+ done
+
+ echo ""
+ echo "=============================================="
+ echo " ALL TUNING PROCESSES COMPLETE"
+ echo "=============================================="
+ echo ""
+
+ # Merge results from all GPUs
+ echo "Merging results from all GPUs..."
+
+ # Re-activate conda environment for the merge script
+ source ~/miniconda3/etc/profile.d/conda.sh
+ conda activate /home/ec2-user/aev
+
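+ # The heredoc below is quoted ('EOF'), so shell variables do not expand
+ # inside it; OUTPUT_DIR and NUM_GPUS are passed through the environment
+ # so the merge reads from the same per-run directory used above.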
+ OUTPUT_DIR="$OUTPUT_DIR" NUM_GPUS="$NUM_GPUS" python - <<'EOF'
+ import json
+ import os
+ import sys
+ from pathlib import Path
+
+ output_dir = Path(os.environ["OUTPUT_DIR"])  # the per-run results directory
+ num_gpus = int(os.environ["NUM_GPUS"])
+ all_results = []
+ baseline_result = None
+
+ # Collect results from each GPU
+ for gpu_id in range(num_gpus):
+     gpu_dir = output_dir / f"gpu_{gpu_id}"
+     results_file = gpu_dir / "tuning_results.json"
+
+     if results_file.exists():
+         with open(results_file, 'r') as f:
+             data = json.load(f)
+
+         # Get the baseline (it should be identical from every worker)
+         if baseline_result is None and "baseline" in data:
+             baseline_result = data["baseline"]
+
+         # Collect experiments
+         if "experiments" in data:
+             all_results.extend(data["experiments"])
+
+         print(f"GPU {gpu_id}: {len(data.get('experiments', []))} results")
+
+ # Merge all results
+ merged_data = {
+     "baseline": baseline_result,
+     "experiments": all_results,
+     "num_gpus": num_gpus,
+     "total_experiments": len(all_results)
+ }
+
+ # Save merged results
+ merged_file = output_dir / "merged_results.json"
+ with open(merged_file, 'w') as f:
+     json.dump(merged_data, f, indent=2)
+
+ print(f"\nMerged {len(all_results)} total results")
+ print(f"Saved to: {merged_file}")
+
+ # Find the best configuration
+ successful = [r for r in all_results if "metrics" in r]
+ if successful:
+     # Aggregate scoring, kept in sync with
+     # HyperparameterTuner.compute_aggregate_score: normalize by the total
+     # weight of only the metrics that are actually present.
+     def compute_score(metrics):
+         weights = {
+             "reward": 1.0, "clip": 0.8, "aesthetic": 0.8,
+             "pickscore": 1.0, "hpsv2": 1.0, "hpsv21": 1.0,
+             "imagereward": 1.0, "fid": -0.5
+         }
+         score = 0.0
+         total = 0.0
+         for k, v in metrics.items():
+             if k in weights:
+                 score += weights[k] * v
+                 total += abs(weights[k])
+         return score / total if total > 0 else 0.0
+
+     for r in successful:
+         r["aggregate_score"] = compute_score(r["metrics"])
+
+     successful.sort(key=lambda x: x["aggregate_score"], reverse=True)
+
+     best = successful[0]
+     best_file = output_dir / "best_config.json"
+     with open(best_file, 'w') as f:
+         json.dump({
+             "config": best["config"],
+             "metrics": best["metrics"],
+             "aggregate_score": best["aggregate_score"],
+             "improvements": best.get("improvements", {})
+         }, f, indent=2)
+
+     print(f"\n{'='*60}")
+     print("BEST CONFIGURATION:")
+     print(f"{'='*60}")
+     print(json.dumps(best["config"], indent=2))
+     print(f"\nAggregate Score: {best['aggregate_score']:.4f}")
+     print(f"Saved to: {best_file}")
+ else:
+     print("\nNo successful experiments found!")
+     sys.exit(1)
+ EOF
+
+ if [ $? -eq 0 ]; then
+     echo ""
+     echo "=============================================="
+     echo " TUNING COMPLETE!"
+     echo "=============================================="
+     echo ""
+     echo "Results:"
+     echo "  Merged results: ${OUTPUT_DIR}/merged_results.json"
+     echo "  Best config: ${OUTPUT_DIR}/best_config.json"
+     echo ""
+     echo "View the best configuration:"
+     echo "  cat ${OUTPUT_DIR}/best_config.json"
+     echo ""
+ else
+     echo ""
+     echo "ERROR: Failed to merge results"
+     exit 1
+ fi
Reward_sdxl_idealized/models/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (242 Bytes). View file
 
Reward_sdxl_idealized/models/__pycache__/__init__.cpython-313.pyc ADDED
Binary file (280 Bytes). View file
 
Reward_sdxl_idealized/models/__pycache__/__init__.cpython-39.pyc ADDED
Binary file (240 Bytes). View file
 
Reward_sdxl_idealized/models/__pycache__/reward_model.cpython-39.pyc ADDED
Binary file (9.1 kB). View file
 
Reward_sdxl_idealized/models/__pycache__/reward_model_sdxl.cpython-310.pyc ADDED
Binary file (9.96 kB). View file