Nice!
Thoughts I'd put on the table:
(1) The US does not have a corner on smart. We can look at AI as a kind of arms race, but human ingenuity is on full display, and China is one of several countries I can think of that have found end runs around constraints thought to hobble them. Israel is another place one must admire.
(2) The MoE and low-precision techniques, along with others that show how scrappy the DeepSeek folks are as they work in the trenches AND share their findings, are cool. With them, the DeepSeek folks have uncovered headroom that can be used to train more thoroughly on the same resources. DeepSeek demonstrated that traditional training methods (you know, the techniques used over the last 18 months ;-)) left available resources on the table. So those with more have the potential to do still more.
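To make the MoE point concrete, here's a toy Python sketch of the routing idea. All names and sizes here are mine and purely illustrative, not DeepSeek's actual configuration; the point is that each token wakes up only a few experts, so the compute you spend scales with how many experts you activate, not how many you have.

```python
import numpy as np

# Toy mixture-of-experts routing: only the top-k experts run per token,
# so compute scales with k, not with the total expert count.
# Sizes are illustrative, not any real model's configuration.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" is just a small random linear layer for this sketch.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1  # gating weights

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                         # softmax over experts
    chosen = np.argsort(probs)[-top_k:]          # indices of the top-k experts
    gate = probs[chosen] / probs[chosen].sum()   # renormalized gate weights
    # Only k of the n experts do any work for this token.
    return sum(w * (x @ experts[i]) for i, w in zip(chosen, gate))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (16,) -- same output shape, ~k/n of the dense FLOPs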
(3) Did you see the part where DeepSeek used a technique to end-run around having read everything?
"DeepSeek took another approach. Its librarian hasn’t read all the books but is trained to hunt out the right book for the answer after it is asked a question."
For me, this points to the potential to learn more continuously: if an LLM doesn't know something, it goes and looks it up, answers, and learns in situ. Continuous training and release might be around the corner, given the extra resources that could be applied.
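That librarian metaphor maps onto retrieval: look the answer up at question time rather than memorizing everything up front. Here's a minimal sketch, assuming a toy keyword-overlap scorer (a real system would use vector embeddings and hand the retrieved passage to the model):

```python
# Toy retrieval-augmented lookup: the "librarian" doesn't memorize the books,
# it fetches the most relevant one when a question arrives. The keyword-overlap
# scoring here is a stand-in for real embedding search; it just shows the shape.

books = {
    "moe": "Mixture-of-experts activates only a few experts per token.",
    "precision": "Low-precision arithmetic cuts memory use and speeds up training.",
    "rl": "Reinforcement learning can teach models to reason step by step.",
}

def retrieve(question: str) -> str:
    """Return the passage sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(books.values(),
               key=lambda text: len(q_words & set(text.lower().split())))

question = "How does mixture-of-experts save compute per token?"
context = retrieve(question)
# An LLM would now answer using `context` instead of relying on what it
# memorized at training time -- and the fetched passage could feed later
# fine-tuning, the "learn in situ" idea above.
print(context)
```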
DeepSeek's learnings point to a discovery of wasted capacity: we can do more with less, so let's also do more with more.
Pretty amazing stuff -- we live in exciting times!
And the potential to use these models more economically... well, that will usher in a race to imagine new apps that go beyond Copilots, to be sure.