ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models (2023)
I found a way to speed up CPU-based LLM inference using an HNSW index on the output embeddings
I made a Three Body Problem Simulator to explore the emergence of complexity from simple physical systems.
I built a Three Body Problem Simulator to explore the emergence of complexity from simple physical systems.
Accelerate GPT Output Embedding computations with a Vector Index