Skip to content

参考文献

基礎(必読)

  • Vaswani et al., "Attention Is All You Need" (NeurIPS 2017)
  • Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition" (CVPR 2019)
  • Loshchilov et al., "nGPT: Normalized Transformer with Representation Learning on the Hypersphere" (arXiv:2410.01131, 2024)
  • Su et al., "RoFormer: Enhanced Transformer with Rotary Position Embedding" (arXiv 2021; Neurocomputing 2024) - RoPEの原論文
  • Nagata et al., "Variance Matters: Detecting Semantic Differences without Corpus/Word Alignment" (EMNLP 2023)
  • Yamagiwa et al., "Revisiting Cosine Similarity via Normalized ICA-transformed Embeddings" (arXiv 2024)

離散と連続の界面

  • Jang et al., "Categorical Reparameterization with Gumbel-Softmax" (ICLR 2017)
  • Maddison et al., "The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables" (ICLR 2017)
  • Bengio et al., "Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation" (arXiv 2013) - STEの原論文

MoE・スパース性

  • Shazeer et al., "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" (ICLR 2017) - MoEの基礎
  • Fedus et al., "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity" (JMLR 2022)
  • Jiang et al., "Mixtral of Experts" (arXiv 2024)
  • Dai et al., "DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models" (arXiv 2024)

双曲幾何学

  • Nickel & Kiela, "Poincaré Embeddings for Learning Hierarchical Representations" (NeurIPS 2017)
  • Mathieu et al., "Continuous Hierarchical Representations with Poincaré Variational Auto-Encoders" (NeurIPS 2019)
  • Yang et al., "Hyperbolic Fine-Tuning for Large Language Models (HypLoRA)" (arXiv:2410.04010, 2024)
    • ※AQuAで最大13.0%向上。発展版のHoRA(適応的曲率)では17.30%向上との報告あり
  • Sinha et al., "Learning Structured Representations with Hyperbolic Embeddings" (NeurIPS 2024)
  • He et al., "Hyperbolic Deep Learning for Foundation Models: A Survey" (arXiv:2507.17787, 2025)

拡散モデル

  • Ho et al., "Denoising Diffusion Probabilistic Models" (NeurIPS 2020)
  • Song et al., "Score-Based Generative Modeling through Stochastic Differential Equations" (ICLR 2021)

情報幾何

  • Amari, "Information Geometry and Its Applications" (Springer 2016)
  • Martens & Grosse, "Optimizing Neural Networks with Kronecker-factored Approximate Curvature" (ICML 2015)
  • Liu et al., "Reconstructing Deep Neural Networks: Unleashing the Optimization Potential of Natural Gradient Descent" (NeurIPS 2024)
  • 参考(査読未通過): Hwang, "FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information" (arXiv:2405.12807)
    • ※ICLR 2025で取り下げ。Adamと自然勾配の関係についての興味深い視点を提供するが、確立された理論ではない点に注意

Model Collapse

  • Shumailov et al., "The Curse of Recursion: Training on Generated Data Makes Models Forget" (arXiv 2023)

古典

  • 甘利俊一『情報幾何学の新展開』
  • Bishop "Pattern Recognition and Machine Learning" (PRML)
  • Goodfellow et al., "Deep Learning" (2016)

発展

  • Carlsson, "Topology and Data" (Bulletin of the AMS, 2009) - TDA入門
  • Lee, "Introduction to Riemannian Manifolds" (2018) - 数学的基礎
  • Bronstein et al., "Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges" (2021)

最新動向(2024-2025年)

  • Chlenski et al., "Mixed-Curvature Decision Trees and Random Forests" (ICML 2025)
  • Fein-Ashley et al., "Hyperbolic Vision Transformers (HVT)" (2024)
  • Grover et al., "Spectro-Riemannian Graph Neural Networks" (ICLR 2025)

オンライン資源

  • 3Blue1Brown: "Neural Networks" シリーズ(幾何学的直感)
  • Distill.pub: インタラクティブな可視化記事
  • The Annotated Transformer: コード付き解説
  • Awesome Hyperbolic Representation Learning (GitHub): 双曲表現学習の論文リスト
  • MoE-Infinity (GitHub): MoE実装のリファレンス