DeepSeek
We believe that valuation for AI companies is much simpler than one might think: any valuation, no matter how high, is valid only as long as someone else is willing to find a reason to justify a higher one. Models that help with valuation in the AI space tend to extrapolate sales and profitability from parameters that don’t really exist yet or are so speculative as to mean little. Some parameters are calculable, such as the cost of power or the cost of GPU hardware today, but trying to estimate revenue based on the number of paying users and the contracted price for AI compute time 5 or 10 years out is like trying to herd cats. It’s not going to go the way you think it is.
One variable in such long-term valuation models is the cost of computing time and the time it takes to train the increasingly large models currently being developed. In 2017 the AlphaGo Zero model, the leading model at the time, cost $600,000 to train. That model, for reference, had ~20m parameters and two ‘heads’ (think of a tree with two main branches): one predicted the probability of playing each possible move, and the other estimated the likelihood of winning the game from a given position. While this is a simple model compared to those available today, it was able to defeat the earlier AlphaGo versions that had beaten the world’s top Go players, using reinforcement learning (the ‘Good Dog’ training approach) without any human instruction in its training data. The model initially made random moves and examined the result of each one, improving with every iteration, without any pre-training.
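For readers who prefer to see structure rather than read about it, here is a minimal sketch of that two-headed design in PyTorch. It illustrates only the shared-trunk / policy-head / value-head idea; the layer sizes are our own simplified assumptions, not DeepMind’s implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadedNet(nn.Module):
    """Simplified sketch of a two-headed (policy + value) network, AlphaGo Zero style.

    A shared 'trunk' processes the board, then splits into:
      - a policy head: a probability for each of the 19*19 points (+1 for passing)
      - a value head: a single number estimating the chance of winning from here
    """
    def __init__(self, board_size=19, in_planes=17, channels=64):
        super().__init__()
        self.trunk = nn.Sequential(                                      # shared feature extractor
            nn.Conv2d(in_planes, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        flat = channels * board_size * board_size
        self.policy_head = nn.Linear(flat, board_size * board_size + 1)  # move logits
        self.value_head = nn.Sequential(nn.Linear(flat, 64), nn.ReLU(),
                                        nn.Linear(64, 1), nn.Tanh())     # win estimate in [-1, 1]

    def forward(self, board):
        x = self.trunk(board).flatten(1)
        return F.softmax(self.policy_head(x), dim=-1), self.value_head(x)

# One random "board position" in, a move distribution and a win estimate out.
policy, value = TwoHeadedNet()(torch.randn(1, 17, 19, 19))
print(policy.shape, value.item())
```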
GPT-4, a pre-trained transformer model with ~1.75 trillion[1] parameters, cost roughly $40m to train in 2022, and a 2024 training-cost study estimated that training costs for such models have been growing at 2.4x per year since 2016 (“If the trend of growing development costs continues, the largest training runs will cost more than a billion dollars by 2027, meaning that only the most well-funded organizations will be able to finance frontier AI models.”[2]). There are two ways to look at those costs. The first is the hardware acquisition cost, of which ~44% goes to computing chips, primarily GPUs (graphics processing units), used here to process data rather than graphics; ~29% goes to server hardware, ~17% to interconnects, and ~10% to power systems. The second is the cost amortized over the life of the hardware, which includes R&D staff at between 47% and 65% of the total and typically runs between 0.5x and 1x of the acquisition cost.
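As a quick sanity check on that trajectory, the short script below compounds the ~$40m GPT-4 figure at the study’s ~2.4x yearly growth rate. It is back-of-the-envelope arithmetic on the numbers quoted above, not the study’s own cost model.

```python
# Back-of-the-envelope projection of frontier-model training costs using the
# figures quoted above: ~$40m for GPT-4 in 2022 and ~2.4x growth per year.

base_year, base_cost = 2022, 40e6   # GPT-4 training cost estimate cited in the text
growth = 2.4                        # estimated yearly growth factor from Cottier et al.

for year in range(2023, 2028):
    cost = base_cost * growth ** (year - base_year)
    print(f"{year}: ~${cost / 1e9:,.2f}B")

# 2027 comes out around $40m * 2.4^5 ≈ $3.2B, with the $1B mark crossed before
# 2027 -- consistent with the study's warning quoted above.
```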
All in, as models get larger, training gets more expensive, and with many AI companies still experimenting with fee structures, model training costs are a critical part of the profitability equation. Based on the above, they will keep climbing, making profitability even more difficult to predict. That doesn’t seem to have stopped AI funding or valuation increases, but that is where DeepSeek V3 creates a unique situation.
The DeepSeek model is still a transformer model, similar to most of the current large models, but it was developed with the idea of reducing the massive amount of training time required for a model of its size (671 billion parameters), without compromising results. Here’s how it works:
- Training data is tokenized. For example, a simple sentence might be broken down into individual words, punctuation, and spaces, or into letter groups such as ‘sh’, ‘er’, or ‘ing’, depending on the algorithm. The finer the tokens, the more data there is to process, so tradeoffs are made between detail and cost (a toy example of that tradeoff appears after this list).
- The tokens are passed to a gating network, which decides which of the expert networks is best suited to process each particular token. The gating network acts as a route director, choosing the expert(s) that have done a good job with similar tokens previously. While one might think of the ‘expert networks’ as doctors, lawyers, or engineers with specialized skills, each of the 257 experts in the DeepSeek model can change its specialty. This is called dynamic specialization: the experts are not initially trained for specific tasks, but the gating network notices that, for example, Expert 17 seems to be the best at handling tokens that represent ‘ing’, and assigns ‘ing’ tokens to that expert more often (a stripped-down sketch of this routing appears after this list).
- The data that the experts pass to the next level is extremely complex, multi-dimensional information about the token, how it fits into the sequence, and many other factors. While the numbers vary considerably for each token, the data being passed between an expert network and its ‘attention heads’ can run as high as 65,000 data points (note: this is a very rough estimate).
- The expert networks each have 128 ‘attention heads’, each of which looks for a particular relationship within that mass of multi-dimensional data the expert networks pass to them. These could be structural (grammatical), semantic, or other dependencies. DeepSeek has found a way to compress the data being transferred from the experts to the attention heads, which reduces the computational demand on the attention heads. With 257 expert networks, each with 128 attention heads, and the large amount of data contained in each transfer, compute time is the big cost driver for training.
- DeepSeek has found a process (actually two processes) to compress the multi-dimensional data that each expert network passes to its attention heads. Typically, compression would hinder the attention heads’ ability to capture the subtle nuances contained in the data, but DeepSeek seems to have found compression techniques that do not blunt the attention heads’ sensitivity to those subtleties (a rough sketch of the general compression idea appears below, after the tokenization and routing examples).
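To make the tokenization bullet concrete, here is a toy example. The word-level split is real Python; the subword split is hand-written purely for illustration, since real tokenizers (BPE and its relatives) learn their subword vocabulary from data.

```python
# Toy illustration of the tokenization tradeoff: the same sentence split into
# coarse word-level tokens vs. a hand-made "subword" split. Finer tokens carry
# more detail but mean more pieces for the model to process.

sentence = "The training costs keep climbing"

word_tokens = sentence.split()  # coarse: 5 whole-word tokens
subword_tokens = ["The", " train", "ing", " cost", "s", " keep", " climb", "ing"]  # finer: 8 tokens

print(len(word_tokens), word_tokens)
print(len(subword_tokens), subword_tokens)
```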
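The gating step can also be sketched in a few lines. The version below is a generic top-k mixture-of-experts gate with toy sizes (8 experts, 2 picked per token, 16-dimensional tokens) that are our own placeholder numbers; DeepSeek V3’s real layers use 257 experts, far wider vectors, and more sophisticated routing and load-balancing logic.

```python
import torch
import torch.nn.functional as F

# A stripped-down "route director": the gate scores every expert for each token
# and only the top-scoring experts actually run on that token.

num_experts, top_k, dim = 8, 2, 16                        # toy sizes for illustration
gate = torch.nn.Linear(dim, num_experts)                  # learns which expert suits which token
experts = [torch.nn.Linear(dim, dim) for _ in range(num_experts)]

def moe_layer(token):
    scores = F.softmax(gate(token), dim=-1)               # how well each expert fits this token
    weights, chosen = scores.topk(top_k)                  # keep only the best few experts
    out = sum(w * experts[i](token) for w, i in zip(weights, chosen))
    return out / weights.sum()                            # blend the chosen experts' outputs

token = torch.randn(dim)
print(moe_layer(token).shape)   # the experts that were not chosen never run for this token
```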
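Finally, the compression described in the last two bullets can be pictured as a low-rank squeeze: push a wide per-token vector through a much narrower ‘latent’ and expand it back for the heads to use. The sketch below shows only that general idea, with made-up dimensions; it is not DeepSeek’s actual implementation.

```python
import torch

# Generic low-rank compression sketch: store/pass a narrow latent instead of the
# full wide vector, then expand it back for the attention heads. Dimensions here
# are toy assumptions, not DeepSeek's.

wide_dim, latent_dim, num_heads, head_dim = 4096, 512, 128, 32

compress = torch.nn.Linear(wide_dim, latent_dim, bias=False)             # down-projection
expand = torch.nn.Linear(latent_dim, num_heads * head_dim, bias=False)   # back up for the heads

token_state = torch.randn(wide_dim)                   # the large multi-dimensional payload (toy size)
latent = compress(token_state)                        # what actually gets stored / passed around
per_head = expand(latent).view(num_heads, head_dim)   # each of the 128 heads gets its slice

print(token_state.numel(), "->", latent.numel(), "values kept per token")
print(per_head.shape)                                 # the heads still get something to attend over
```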
[1] estimated
[2] Cottier, Ben, et al. “The Rising Costs of Training Frontier AI Models.” arXiv, arxiv.org/. Accessed 31 May 2024.
We note that the DeepSeek model benchmarks shown below are impressive, but some of that improvement might come from the fact that the DeepSeek V3 training data was more oriented toward mathematics and code. We also always remind investors that it is easy to cherry-pick benchmarks that present the best aspects of a model. That said, not every developer requires the most sophisticated general model for their project, so even if DeepSeek did cherry-pick benchmarks (we are not saying they did), a free model of this size and quality is a gift to developers, and the lower training costs are a gift to those who have to pay for processing time or hardware. It’s not the end of the AI era, but it might affect valuations and long-term expectations if DeepSeek’s compression methodology proves as successful in the wild as the benchmarks suggest. The fact that this step forward in AI came from a Chinese company will likely cause ulcers and migraines across the US political spectrum, and it could prompt even more stringent clampdowns on shipments of GPUs and HBM to China, despite the fact that those restrictions don’t seem to be having much of an effect.