- 11 Posts
- 5 Comments
Joined 3 years ago
Cake day: June 15th, 2023
nsa@kbin.social to
Machine Learning@kbin.social•Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing
1 · 3 years ago
That’s appreciated!
nsa@kbin.social (OP) to
Machine Learning@kbin.social•Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training • English
1 · 3 years ago
Research into efficient optimization techniques seems pretty important given the scale of LLMs these days. Nice to see a second-order approach that achieves reasonable wall-clock improvements.
nsa@kbin.social to
Machine Learning@kbin.social•Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing • English
2 · 3 years ago
If there isn’t any discussion on Reddit (and there is none in this case), I don’t see a reason to link to Reddit; you can just link to the project page. That said, if you think there is important discussion happening there that helps with understanding the paper, then use a teddit link instead, like:
https://teddit.net/r/MachineLearning/comments/14pq5mq/r_hardwiring_vit_patch_selectivity_into_cnns/
nsa@kbin.social to
Machine Learning@kbin.social•Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing • English
0 · 3 years ago
Please don’t post links to Reddit.


Averaging model weights seems to help across textual domains as well; see "Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models" and "Scaling Expert Language Models with Unsupervised Domain Discovery". I wonder if the two types of averaging (across hyperparameters and across domains) can be combined to produce even better models.
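
To make the idea of combining the two kinds of averaging a bit more concrete, here is a rough Python sketch. It is only an illustration under assumptions of mine, not the method from either paper: models are represented as plain dicts mapping parameter names to flat lists of floats, all models share the same architecture, and `average_weights` is a hypothetical helper name. The two-stage order (first across hyperparameter runs within a domain, then across domains) is just one possible way to combine them.

```python
from typing import Dict, List, Optional, Sequence

# Assumption: a "model" is a dict of parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def average_weights(
    models: Sequence[Weights],
    coeffs: Optional[Sequence[float]] = None,
) -> Weights:
    """Return the (optionally weighted) parameter-wise average of several models."""
    if not models:
        raise ValueError("need at least one model")
    if coeffs is None:
        coeffs = [1.0] * len(models)  # uniform average by default
    if len(coeffs) != len(models):
        raise ValueError("one coefficient per model is required")

    # Normalize so the coefficients sum to 1.
    total = sum(coeffs)
    norm = [c / total for c in coeffs]

    names = models[0].keys()
    for m in models:
        if m.keys() != names:
            raise ValueError("all models must have the same parameters")

    averaged: Weights = {}
    for name in names:
        size = len(models[0][name])
        averaged[name] = [
            sum(c * m[name][i] for c, m in zip(norm, models))
            for i in range(size)
        ]
    return averaged


# Stage 1: average runs with different hyperparameters within each domain.
domain_a = average_weights([{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}])
domain_b = average_weights([{"w": [0.0, 0.0]}, {"w": [4.0, 8.0]}])

# Stage 2: average the per-domain results (uniform here; the weights
# could instead reflect how relevant each domain is to the target data).
combined = average_weights([domain_a, domain_b])
```

Whether this two-stage scheme actually yields better models is exactly the open question; the sketch only shows that the two averages compose mechanically.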