Supply Chain Market Research - SCMR LLC

Essay Test

3/17/2025


Last Friday we conducted a test.  The test, unlike those usually used to rank AI models, has a practical bent, reflecting the way users actually interact with LLMs rather than how close to PhD level an AI can perform in math, coding, science, or any of the other hundreds of benchmarks that AI model designers use to show how their particular model stands out.  As we have mentioned in the past, given the vast number (over 100) of benchmarks, model developers can pick and choose the benchmarks that show off their model’s capabilities best, yet few benchmarks have any connection to the everyday use that is the bailiwick of a casual or even a regular user.  Considering that we spend 75% of our day doing research, we are steeped in the difference between what a benchmark might show and what AI reality actually is, so we put together 10 questions that test our regular crop of AIs by asking things that might not be part of a typical AI benchmark.
Some of our questions are generic, more to compare how each AI sees and answers the question; some are quixotic, to see whether the AI can grasp unusual concepts; some spur creativity; and some are unusually odd, to see if the AI understands what is being asked for as much as whether it can produce the answer.  We posed these questions to the following AI LLMs:
  • Gemini – Google (GOOG)
  • Claude – Anthropic (pvt)
  • Meta AI – Meta (FB)
  • Co-Pilot  - Microsoft (MSFT)
  • ChatGPT – OpenAI (pvt)
  • Deepseek - Hangzhou DeepSeek (pvt)
  • Perplexity – Perplexity AI (pvt)
  • GROK – xAI (pvt)
While this was less a contest looking for a winner and more a chance to see how AIs respond to the day-to-day questioning they get regularly, we did score the results.  We admit that much of the scoring is subjective.  For example, Question 9, which asked each AI to create a 12-line poem about a grandfather clock in the style of Edgar Allan Poe, would have to be judged subjectively, based on the reader’s likes and dislikes.  That said, we did not feel qualified to judge the creativity of our AIs, so we fed the replies back to each AI and asked which of the 8 poems it thought came closest to EAP’s style.  Surprisingly, 6 of the 8 AIs agreed on which poem mimicked EAP’s style most closely.  Some questions are based on calculations that have a specific 2-part answer; if the AI got both parts correct it received 10 points (the maximum for each question), getting one part correct received 5 points, and an answer within 10% of the correct answer received an extra point. 
While the final score is usually the focus of such a test, we are less interested in the final score than the nuance for each question and some of the overall statistics.
Figure 1 - AI Essay Test Score Results - Source: SCMR LLC
Figure 2 - AI Essay Test - Word Count - Source: SCMR LLC
Questions & Answers
Here we note the questions (10) and the point behind each, along with a brief summary of the answers given by each AI.  The answers here are not complete but summaries of key points.  Full data is available on request.  Again, the objective was to see how flexible the AIs are to both standard and non-standard questions, some of which might need the AI to reason a bit before answering.
1.What does ‘She’ sell and where does she sell them?
The simple answer comes from a tongue twister that has been mentioned in literature since the early 1800s.  It references Mary Anning, at the time the daughter of a cabinetmaker in Dorset, England.  Mary eventually became a paleontologist but made some of her most interesting discoveries as a teenager.  The tongue twister came about after her father died in 1810 and she was forced to collect and sell seashell ‘trinkets’ to survive.  The question was a test to see if the AI had knowledge that included a Charles Dickens piece on Mary Anning from 1865, and whether the AI would understand the reference if it did not have access to either the Dickens piece or biographical information about Mary Anning.
 
              Gemini – Assumed the question referenced a generic reseller of clothing and various goods.
              Claude – Correctly identified the tongue twister.  The only one that did.
              Meta – No answer – Needed more information about ‘she’.
              Co-Pilot - No answer – Needed more information about whether ‘she’ was a brand or person.
              ChatGPT - No answer – Needed more information about whether ‘she’ was a brand or person.
              Deepseek - No answer – Needed more information about whether ‘she’ was a brand or person.
              Perplexity – Perplexity thought the question referred to a business (“She Sells Sea Shells”) on Sanibel Island, Florida that sells jewelry, books, craft supplies, clothing and handmade ornaments.  Perplexity also gave the street address and indicated that the store was open 365 days a year, and also suggested that the question might relate to “Sea Shell Wholesale”, a venture that focused on sales training and product distribution.
GROK – Grok made the assumption that ‘She’ referred to SHEIN (pvt), a large Chinese clothing retailer, and detailed SHEIN’s product line.
 
2.What is the origin of the domestic house cat?
This question was simply a test of how the AI would respond to a question that could contain ‘standard’ information, such as might appear in an encyclopedia or online in Wikipedia.
Gemini – Details (one sentence) on each of four topics – Ancestor, Location, Timeline, and Key Factors.
              Claude – Exactly the same answer as Gemini – word for word
              Meta – Simplistic (3 sentences) focused on the human shift from nomadic life to agriculture and how grain storage attracted rodents and then wild cats
              Co-Pilot – Similar to Meta’s answer but added Egyptian influence.
              ChatGPT – Similar to above, added a bit on feline DNA.
              Deepseek – Similar to above with slightly more detail.
              Perplexity – Similar to above but with emphasis on timeline detail.
GROK – Similar to above.
 
3.How many 115” (diagonal) rectangles with a 9:16 aspect ratio can fit into a Gen 8.6 glass substrate and what percentage of the substrate will remain unused?
 
This question requires some research and specific math calculations, but it also requires the AI to try to fit the rectangle onto the substrate in different ways.  If that step is incorrect, the resulting calculations will be incorrect.  This type of question should be broken down into smaller problems in order to find the correct answer.  As a Gen 8.6 substrate is sometimes represented as either 2250mm x 2500mm or 2250mm x 2600mm, the correct answer is one panel, with either 35.2% or 37.7% of the substrate remaining unused.
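As a rough illustration of the brute-force check involved, the sketch below (our own, deliberately simplified: axis-aligned placement only, no exclusion edges) converts the 115-inch diagonal into panel dimensions, tries both 90-degree orientations on the 2250mm x 2600mm representation of a Gen 8.6 sheet, and reports the unused share:

```python
import math

def rect_from_diagonal(diag_in, ar_w=9, ar_h=16):
    """Convert a diagonal (inches) and an aspect ratio into width/height in mm."""
    diag_mm = diag_in * 25.4
    scale = diag_mm / math.hypot(ar_w, ar_h)
    return ar_w * scale, ar_h * scale

def panels_per_sheet(sheet_w, sheet_h, panel_w, panel_h):
    """Simple axis-aligned grid packing, trying both panel orientations."""
    best = 0
    for w, h in ((panel_w, panel_h), (panel_h, panel_w)):
        best = max(best, int(sheet_w // w) * int(sheet_h // h))
    return best

panel_w, panel_h = rect_from_diagonal(115)        # ~1432mm x ~2546mm
sheet_w, sheet_h = 2250, 2600                     # one common Gen 8.6 representation
n = panels_per_sheet(sheet_w, sheet_h, panel_w, panel_h)
unused = 100 * (1 - n * panel_w * panel_h / (sheet_w * sheet_h))
print(n, f"{unused:.1f}% unused")                 # -> 1 panel, ~37.7% unused
```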
 
Gemini – Broke the question into 4 primary steps. Answer: 1 panel with 39.32% unused.
              Claude – Worked through much of the problem using JavaScript but gave an incorrect answer of 2 panels with 26.4% unused.
              Meta – Assumed there was only one way to fit the rectangle in the space but came up with answer of 1 panel and 33.86% unused.
              Co-Pilot – Correct answer of 1 panel and 38.82% unused
              ChatGPT – Correct on 1 panel but unused share of 47.2%
              Deepseek – Correct with 1 panel and within parameter on 33.81% unused
              Perplexity – Correct on 1 panel and correct on 37.7% for the unused portion.  Right on target
GROK – Knew to try rotating the rectangle. Correct on 1 panel and correct on 37.7% for the unused portion.  Right on target
 
4.What global location (city, town, region) has the best year-round weather, inclusive of temperature, humidity, precipitation, and dangerous storms or other climatic events?
This question requires the AI to search for data that fits the parameters given and then determine which of those locations would be considered the best.  We would have hoped that the AIs would have combed through weather data and drawn definitive conclusions, but we believe they relied on training data that made travel suggestions rather than evaluating real-time data.
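The kind of data-driven ranking we had hoped for might look something like the toy scoring function below; every weight, threshold, and climate figure is our own illustrative placeholder, not anything the AIs reported using:

```python
# Toy "best year-round weather" score; all weights/thresholds and the climate
# figures below are illustrative assumptions, not verified climate normals.
def climate_score(avg_temp_c, seasonal_swing_c, avg_humidity_pct,
                  rainy_days, severe_storm_days):
    score = 100.0
    score -= abs(avg_temp_c - 21) * 2.0          # distance from a comfortable ~21C
    score -= max(0, seasonal_swing_c - 8) * 1.5  # penalize large seasonal swings
    score -= abs(avg_humidity_pct - 50) * 0.5    # penalize very dry or very humid air
    score -= max(0, rainy_days - 60) * 0.2       # some rain is fine, constant rain is not
    score -= severe_storm_days * 3.0             # hurricanes, typhoons, etc.
    return score

candidates = {                    # (avg temp C, swing C, humidity %, rainy days, storm days)
    "San Diego, CA":         (17.5, 8, 63, 40, 0),
    "Canary Islands, Spain": (21.0, 6, 60, 30, 0),
    "Medellin, Colombia":    (22.0, 2, 68, 170, 0),
}
for city in sorted(candidates, key=lambda c: climate_score(*candidates[c]), reverse=True):
    print(f"{city}: {climate_score(*candidates[city]):.1f}")
```

The AIs’ actual picks were: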
Gemini – Canary Islands, Spain; Medellin, Colombia; San Diego, CA; Swakopmund, Namibia
              Claude – Medellin, Colombia; San Diego, CA; Kunming, China; Loja, Ecuador; Canary Islands, Spain; Malaga, Spain; Perth, Australia
              Meta – Honolulu, HI; Santa Barbara, CA; Costa Rica; Mediterranean coast; Southern CA
              Co-Pilot -  Santa Barbara, CA; Canary Islands, Spain; San Diego, CA; Madeira, Portugal; Nairobi, Kenya
              ChatGPT - Santa Barbara, CA; Medellin, Colombia; San Diego, CA; Kunming, China; Azores, Portugal; Canary Islands, Spain
              Deepseek - San Diego, CA; Canary Islands, Spain; Medellin, Colombia; Cape Town, South Africa; Sydney, Australia; Madeira, Portugal
              Perplexity – Dakhla, Morocco; San Diego, CA; Honolulu, HI; Santa Barbara, CA; Guatemala City, Guatemala
GROK – San Diego, CA; Lisbon, Portugal; Medellin, Colombia; Canary Islands, Spain, but emphasized San Diego
5.If all good people go to heaven and you can see them when you die, will heaven eventually run out of space?
This question was intended to see how the AI might approach a spiritual or philosophical question, particularly one that might be considered a bit outlandish or comical.  All of the AIs took the question seriously, fitting it into a theological framework, and none saw it as humorous or unusual.
 
Gemini – Pointed to ‘relevant considerations’ of “the Nature of Heaven”, “The Concept of Infinity”, “Varying Theological Views”, and “Beyond Physical Limitations” with the conclusion: In summary, from many theological standpoints, the concept of heaven "running out of space" is not a concern, as heaven is often understood to exist outside of or beyond our physical understanding of space.
Claude – Focused on the idea that heaven is not conceived as a physical place in theological tradition and is therefore a spiritual realm not bound by physical constraints, a state of being or consciousness rather than a location, and a dimension that transcends our understanding of space and time, noting that the idea of "seeing" loved ones wouldn't necessarily require physical proximity as we understand it.
              Meta – Similar to above, focused on the idea that many spiritual and religious traditions view heaven as a non-physical, eternal realm that transcends our conventional understanding of space and time.  Instead, heaven might be seen as a boundless, infinite realm that can accommodate all souls or spirits.
              Co-Pilot – Same basic spiritual and religious traditional thinking about the non-traditional aspects of heaven, unconstrained by space and time.
              ChatGPT – Offers an alternative to the idea that heaven is boundless and timeless, where heaven might operate on a different kind of physics where the number of ‘rooms’ is infinite.
              Deepseek – The concept of heaven running out of space does not align with traditional theological views.  Deepseek says to consider heaven as a non-physical realm, heaven having infinite capacity, heaven being timeless, and heaven as a symbolic interpretation, a state of being or union with the divine, rather than a literal place.
              Perplexity – The concept of heaven as a physical place disagrees with most religious traditions: Christianity describes heaven as a place of infinite joy and communion with God, not bound by the laws of physics or space, while Islam teaches that paradise (Jannah) is vast and expansive, created by Allah to accommodate all who are worthy.  Similarly, in Christian philosophy God is described as omnipotent and infinite, and in Hinduism and Buddhism the afterlife or spiritual realms are described as boundless.  Perplexity also notes that souls are often thought to be immaterial and not bound by physical constraints like size or volume, meaning they wouldn’t take up "space" the way physical bodies do.
GROK – GROK states:  “Christian interpretations—where heaven is a spiritual realm for the souls of the virtuous, it’s often imagined as infinite or beyond physical limitations. In that case, space wouldn’t be an issue because it’s not a physical place with finite square
 
6.Based on the following data, what would your prediction for total sales for the 2025 year be?

This question has many possible answers and was designed more to see what processes the AIs might use to come up with an answer.  It was graded on the explanation of the process used, the potential for the answer to be correct, and the way in which the final answer was calculated.  Each AI had its own ‘thoughts’ about what methodology to use and whether to use more than one to achieve the correct answer.  Given that we will have to wait 10 months to see the correct answer, we valued the methodology and the rationality behind the final answer.  (A rough sketch of a few of these approaches, on made-up data, follows the list of responses below.)
Gemini – Summed each year’s sales.  Plotted a chart of annual sales (not shown).  Analyzed the data to conclude there was no clear linear trend, with a significant sales jump in 2021 and relatively lower figures in the most recent years.  Methods: Simple Average (three years) 223.78; Simple Average (all years) 277.61; Moving Average (never actually performed the moving average calculation); Exponential smoothing (used Python and the Statsmodels library) 229.09; Consideration of the 2021 spike (never referred back to 2021).  Ultimate range: 223 - 230
              Claude – Python calculations primarily based on averages.  Monthly average method 221.91; Using growth rate method 220.86; Using Linear regression 180.61; Averaging all three methods 207.8
              Meta – Showed no work or methodology.   Predicted total sales for 2025 of $36.05B.
              Co-Pilot – Used sum of sales for 2025 (2 months) and estimated sales for remaining months based on 5-year averages.  Final: $281.95
              ChatGPT – Used a proportional method based on the first two months of 2025. Answer 39,315
              Deepseek – Based on Average monthly sales – Answer: 223.00.
              Perplexity – Simply based on the first two months of 2025 and assuming the rest of the year follows that average. Answer: 223.00
GROK – Similar to Gemini: average of recent (2022-2024) years, year-over-year growth (2023-2024), linear trend (all years), and consideration of seasonality (weighted average).  Based on all methods (the linear trend was negative, so excluded), GROK chose the weighted average. Answer: 213.40.
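For readers who want to see the mechanics, here is a minimal sketch of three of the approaches named above (recent-year average, growth rate, and linear trend), using made-up annual sales figures rather than the data we actually supplied; Gemini’s exponential-smoothing pass could be reproduced with the Statsmodels library it cited:

```python
import numpy as np

# Hypothetical annual sales; the actual series we supplied is not reproduced here.
years = np.array([2019, 2020, 2021, 2022, 2023, 2024])
sales = np.array([250.0, 260.0, 340.0, 230.0, 220.0, 225.0])

# 1. Simple average of the most recent three years
avg3 = sales[-3:].mean()

# 2. Latest year scaled by the most recent year-over-year growth rate
growth = sales[-1] / sales[-2]
yoy = sales[-1] * growth

# 3. Linear trend fit across all years, extrapolated to 2025
slope, intercept = np.polyfit(years, sales, 1)
trend = slope * 2025 + intercept

print(f"3-yr average: {avg3:.1f}  growth-rate: {yoy:.1f}  linear trend: {trend:.1f}")
```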
7.What is the most important part of LLM training and why?
This question was intended to see how much the AIs might reveal about themselves, particularly their training, the specifics of which are typically off limits to users.  It seems that all have been well-trained (sorry!) to focus on ‘high-quality training data’ as the key to LLM training and ultimate model success, although there were some apparent differences in secondary subjects.
Gemini – In order – Data Quality & Quantity, Model Architecture, Training Techniques,  Tokenization
Claude – High quality, diverse training data
              Meta – High quality training data, architecture & design, Optimization & Hyperparameters, Computational resources
              Co-Pilot – Knowledge base, Diversity, Data labeling & Pre-processing, Ethical considerations, Iterative feedback
              ChatGPT – Garbage In, Garbage Out (High quality data), Diversity & Representativeness, Filtering & Cleaning, Tokenization & Preprocessing, Fine tuning & reinforcement learning
              Deepseek – Data quality & diversity,
              Perplexity – Quality of training data, training efficiency, model versatility, bias mitigation, handling of complex language features.  Emphasis on data curation and preprocessing.
GROK – Quality & diversity of training data, Shape knowledge & reasoning, generalizations, Bias, Contextual understanding.
 
8.If mares eat oats and does eat oats and little lambs eat ivy, what would a kid eat?
Similar to Question 1, this question tests the AI’s ability to understand subtlety and references, as opposed to absolutes.
Gemini – Noted that a ‘kid’ would eat milk, grass, hay & plants – Missed the fact that this is a children’s song.
              Claude – Stated a kid would eat ivy and referenced the ‘Mairzy Doats’ rhyme.
              Meta – “A kid will eat ivy” and also referenced the ‘Mairzy Doats’ rhyme.  Also noted that ivy can be toxic to goats.
              Co-Pilot – “A kid would eat ivy too” noting the reference and the fact that it sounds ‘delightfully nonsensical’.
              ChatGPT – Similar to Co-Pilot but added “It’s a fun linguistic trick!”
              Deepseek – While acknowledging the rhyme, Deepseek also provided a literal answer about what a child should eat.
              Perplexity – Acknowledged the rhyme and added that it suggests that a kid (lamb) would eat anything if he likes it.
GROK – Recognized the rhyme and  the context
9.Create a 12-line poem about a grandfather clock in the style of Edgar Allan Poe
This question allowed the AIs free rein, with only the author, length, and topic as parameters.  Given that our background does not encompass extensive English language studies, we fed these poems back to each AI, asking them to choose which has the style closest to that of Edgar Allan Poe.  The GROK poem got 6 of 8 picks, while Claude got one and Gemini got one.  The actual poems are at the bottom of the note.
 
10.Which of these models is the best? GEMINI, CLAUDE, META, CO-PILOT, CHATGPT, DEEPSEEK, Perplexity, or GROK?
We have often asked AIs questions about other AIs, sometimes getting no answer and other times getting very specific details that are not always correct.  The AIs that have strict training data cutoffs have difficulty here, as they do not have access to more current data, but it seems to be an area that gets updated more often than most, likely to maintain a competitive edge.  Meta was particularly short on details about other models, and Deepseek was the only model that pointed out a number of shortcomings.  Here’s the summary of what was said about each model:
Gemini – Multi-modal, integrated into Google ecosystem, advanced creative reasoning, very good at creative tasks, well rounded, transparency, sometimes inconsistent, real-time
       Claude – Can handle large amounts of text, ethics and safety, careful content creation, strong conversation, coding, strategic planning, customer service tasks
Meta – Open-Source development, Integrated into WhatsApp and Instagram, less refined than others, not as strong as GPT-4, community driven development, large parameter base
Co-Pilot – Integrated into Microsoft products, useful to developers, ChatGPT inside Microsoft products, focus on coding
ChatGPT – Strong for general purpose work, emotional intelligence (4.5 version), lots of plug-ins, expensive to scale
Deepseek – High performance & cost effective, security concerns, content filters, lacks features, advanced reasoning
Perplexity – High accuracy and citations, sourced answers, search, weak for deep conversation and creative tasks
GROK – Integrated with X platform, informal ‘edgy’ style, coding, lacks native image recognition.
 
All in, AIs will not save the world, nor are they a panacea that will relieve us of the burden of having to do things that take work.  AIs are limited in what they can do but are promoted, intentionally or not, as the final step in the evolution of humans from cave men always looking for their next meal to unburdened dilettantes who can idle away the hours contemplating butterfly wings or a blade of grass.  We still have to work to make things happen and create the things that are unique, beautiful, and generate income, but AIs can help, as long as folks understand what they can and cannot do.  Perhaps this test and others like it will lead to a better understanding of what AIs are capable of, without over- or under-promising.  It’s hard to keep from promoting your product when you have spent billions developing it, but the hard reality of what it can and cannot do remains the same regardless of the promotion.
AI Poetry (Question 9):

Within the hall, a shadowed, somber keep,
Where time's dark pendulum doth softly weep,
A titan stands, of ebony and dread,
Whose hollow chime tolls for the silent dead.
Each measured tick, a sigh of ages flown,
A mournful dirge, upon a wind-wrung groan.
The gilded face, a moon in midnight's shroud,
Reflects the specters, gathering in the cloud.
Its heavy hands, like claws of ancient night,
Drag moments down to everlasting blight.
And with each stroke, a whisper, cold and deep,
"Remember, mortal, where thy secrets sleep."

The Eternal Timekeeper
In shadows dark where silence reigns supreme,
The ancient clock its vigil keeps alone;
With pendulum that swings as in a dream,
And midnight chimes in somber monotone.

Its weathered face, like spectres from the tomb,
Counts endless hours with hands of tarnished brass;
While phantoms dance across the dusty room,
As mortal moments mercilessly pass.

Each doleful stroke reverberates with dread,
A knell that marks our journey to the grave;
Its ceaseless rhythm echoes in my head—
Time's cruel master, and we but its slave.

In twilight halls, a sentinel stands,
The grandfather clock, with solemn hands.
Its face a moon, with numbers pale,
Marking the hours, with mournful tale.
Its chimes, a dirge, that echoes through,
A melancholy sound, that whispers "Flee!"
Yet, we are drawn, to its steady beat,
A morbid fascination, that our souls repeat.
In darkness, it looms, a specter tall,
A reminder of time's relentless call.
Its tick, a countdown, to our final sleep,
A haunting presence, that our hearts do keep.

Beneath the shadows' wailing cry,
Stood the clock with hollowed eye.
Its pendulum swayed, a mournful knell,
Whispering secrets it dared not tell.

Each tick, a dirge for hours lost,
Each tock, a reminder of time's cruel cost.
Its mahogany face, solemn and drear,
Spoke of ages wrapped in fear.

Oh, grandfather of the ceaseless night,
Your chime foretells no morning light.
An eternal guardian, you softly moan,
Marking hearts left cold, alone.

Upon the hearth where shadows creep,
There stands a clock so dark, so deep.
Its mournful chime, a hollow knell,
A whispered curse, a tolling spell.

Its pendulum, like fate, swings low,
A measured dirge of time’s cruel flow.
The midnight hour, a ghostly sigh,
As if the past still lingers nigh.
Its face is worn, its hands are slow,
Yet secrets in its echoes grow.
Oh, dread device of time’s cruel art,
Thy chime strikes terror in my heart!

In the shadowed hall, it stands alone,
A sentinel of time, its presence known.
With pendulum's swing, it whispers low,
Of moments lost, and futures slow.
Its chimes, like ghosts, in midnight air,
Echo tales of despair and care.
Each toll a knell, a mournful sound,
Of hours passed, to the cold ground.
Beneath its face, the hands do creep,
Through endless nights, where shadows sleep.
A timeless warden, it guards the years,
A silent witness to joy and tears.
Yet in its heart, a secret lies,
Of fleeting life, and swift goodbyes.

In twilight halls, where shadows play,
A sentinel of time stands tall today,
The grandfather clock, with solemn face,
Guards secrets of the past in a silent space.
Its chimes echo through the midnight air,
A mournful sound that whispers despair,
The hours tick by with a funeral pace,
As darkness gathers in a solemn place.
In moonlit nights, its shadow looms near,
A specter watching all who come and fear,
The clock's heartbeat marks the passage of time,
A reminder of mortality's dark rhyme.
 
In shadows deep, where silence dwells profound,
The grandfather clock with mournful chime doth stand,
Its pendulum swings o’er time’s unyielding ground,
A sentinel grim of fate’s relentless hand.
Each tick resounds, a dirge through hollow halls,
Each tock a knell for moments lost to gloom,
Its face, a skull, with eyeless stare enthralls,
Presiding o’er the heart’s eternal doom.
The midnight hour creeps forth in sable shroud,
Its brazen voice a raven’s dread refrain,
And trapped within its gears, the soul is cowed--
Forever bound to time’s unceasing chain.

DeepSeek

1/27/2025


The definition of panic is “sudden uncontrollable fear or anxiety, often causing wildly unthinking behavior,” but that does little to shine light on what is causing the panic or the circumstances leading up to it.  Today’s ‘panic’ was caused by a Hangzhou, China AI research lab, less than 2 years old, that was spun off from a high-profile quant hedge fund.  Its most recent model, DeepSeek (pvt) V3, has been able to outperform many of the most popular models and is open source, giving ‘pay-for’ models a new competitor that can be used to develop AI applications without paying a monthly or yearly fee.  By itself, this should simply be added to the list of worries that AI model developers already consider; there are a number of existing AI models that are open source and they have not put OpenAI (pvt), Google (GOOG), Anthropic (pvt), or Meta (FB) out of business.  It is inevitable that as soon as new models are released, another one comes along that performs a bit better.  But that is not why panic has set in today.
We believe that valuation for AI companies is much simpler than one might think, as any valuation, no matter how high, is valid only as long as someone else is willing to find a reason to justify a higher valuation.  Models that help with valuation in the AI space tend to extrapolate sales and profitability based on parameters that don’t really exist yet or are so speculative as to mean little.  There are some parameters that are calculable, such as the cost of power or the cost of GPU hardware today, but trying to estimate revenue based on the number of paying users and the contracted price for AI CPU time 5 or 10 years out is like trying to herd cats.  It’s not going to go the way you think it is.
One variable in such long-term valuation models is the cost of computing time and the time it takes to train the increasingly large models that are currently being developed.  In May of 2017, the AlphaGo Zero model, the leading model at the time, cost $600,000 to train.  That model, for reference, had ~20m parameters and two ‘heads’ (think of a tree with two main branches): one predicted the probability of playing each possible move, and the other estimated the likelihood of winning the game from a given position.  While this is a simple model compared to those available today, it was able to beat the world’s champion Go player based on reinforcement learning (the ‘good dog’ training approach) without any human instruction in its training data.  The model initially made random moves and examined the result of each move, improving its ability each time, without any pre-training.
In 2022, GPT-4, a pre-trained transformer model with ~1.75 trillion[1] parameters, cost ~$40m to train, and a 2024 training cost study estimated that the training cost for such models has been growing at 2.4x per year since 2016 (“If the trend of growing development costs continues, the largest training runs will cost more than a billion dollars by 2027, meaning that only the most well-funded organizations will be able to finance frontier AI models.”[2]).  There are two aspects to those costs.  The first is the hardware acquisition cost, of which ~44% is for computing chips, primarily GPUs (graphics processing units), which here are used to process data rather than graphics, ~29% is for server hardware, ~17% for interconnects, and ~10% for power systems.  The second is the amortized cost over the life of the hardware, which includes between 47% and 65% for R&D staff and runs between 0.5x and 1x of the acquisition costs.
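The 2.4x-per-year figure compounds quickly; a back-of-the-envelope check (our own arithmetic, using the ~$40m GPT-4 figure cited above as the base) shows why the study lands on billion-dollar training runs by 2027:

```python
# Compound the reported 2.4x/year growth in frontier training cost.
# Base year and cost are the GPT-4 figures cited above; the rest is extrapolation.
cost_m = 40.0          # ~$40m, 2022
growth = 2.4
for year in range(2023, 2028):
    cost_m *= growth
    print(year, f"${cost_m:,.0f}m")
# The projection passes $1,000m between 2026 and 2027, consistent with the quoted claim.
```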
All in, as models get larger, training gets more expensive, and with many AI companies still experimenting with fee structures, model training costs are a critical part of the profitability equation and, based on the above, will keep climbing, making profitability more difficult to predict.  That doesn’t seem to have stopped AI funding or valuation increases, but that is where DeepSeek V3 creates a unique situation.
The DeepSeek model is still a transformer model, similar to most of the current large models, but it was developed with the idea of reducing the massive amount of training time required for a model of its size (671 billion parameters), without compromising results.  Here’s how it works:
  • Training data is tokenized.  For example, a simple sentence might be broken down into individual words, punctuation, spaces, etc., or into letter groups such as ‘sh’, ‘er’, or ‘ing’, depending on the algorithm.  But the finer the tokens, the more data is processed, so tradeoffs are made between detail and cost.
  • The tokens are passed to a gating network, which decides which of the expert networks will be best suited to process that particular token.  The gating network, acting as a route director, chooses the expert(s) that have done a good job with similar tokens previously.  While one might think of the ‘expert networks’ as doctors, lawyers, or engineers with specialized skills, each of the 257 experts in the DeepSeek model can change its specialty.  This is called dynamic specialization: while the experts are not initially trained for specific tasks, the gating network notices that, for example, Expert 17 seems to be the best at handling tokens that represent ‘ing’, and assigns ‘ing’ tokens to that expert more often.
Here is where DeepSeek differs…
  • The data that the experts pass to the next level is extremely complex, multi-dimensional information about the token, how it fits into the sequence, and many other factors.  While the numbers vary considerably for each token, the data being passed between an expert network and its ‘Attention Heads’ can be as high as 65,000 data points (note: this is a very rough estimate).
  • The expert networks each have 128 ‘Attention Heads’, each of which looks for a particular relationship within that mass of multi-dimensional data that the expert networks pass to them.  These could be structural (grammatical), semantic, or other dependencies, but DeepSeek has found a way to compress the data being transferred from the experts to the attention heads, which reduces the computational demand on the Attention Heads.  With 257 expert networks, each with 128 Attention Heads, and the large amount of data contained in each transfer, compute time is the big cost driver for training.
  • DeepSeek has found a process (actually two processes) that compresses the multi-dimensional data each expert network passes to its Attention Heads.  Typically, compression would hinder the Attention Heads’ ability to capture the subtle nuances contained in the data, but DeepSeek seems to have been able to use compression techniques that do not affect the Attention Heads’ sensitivity to those subtleties (a toy illustration of the routing-plus-compression idea follows below).
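To make the idea concrete, here is a heavily simplified sketch in plain NumPy.  It is our own illustration of a gated mixture-of-experts layer with a low-rank hand-off, not DeepSeek’s code, and the sizes (8 experts, a 64-dimensional latent) are toy values rather than the 257 experts and 128 attention heads described above:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k, d_latent = 512, 8, 2, 64   # toy sizes, not DeepSeek's

tokens   = rng.normal(size=(16, d_model))              # 16 token embeddings
gate_w   = rng.normal(size=(d_model, n_experts))       # gating network weights
experts  = rng.normal(size=(n_experts, d_model, d_model)) * 0.02
compress = rng.normal(size=(d_model, d_latent)) * 0.05  # down-projection (the "compression")
expand   = rng.normal(size=(d_latent, d_model)) * 0.05  # up-projection before attention

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

gate_scores = softmax(tokens @ gate_w)                  # how well each expert "fits" each token
out = np.zeros_like(tokens)
for i, tok in enumerate(tokens):
    top = np.argsort(gate_scores[i])[-top_k:]           # route each token to its top-k experts only
    for e in top:
        out[i] += gate_scores[i, e] * (tok @ experts[e])

latent = out @ compress            # compressed hand-off: d_model -> d_latent per token
to_attention = latent @ expand     # what the attention stage would consume downstream
print(tokens.shape, latent.shape, to_attention.shape)   # (16, 512) (16, 64) (16, 512)
```

The point of the sketch is only the shape of the hand-off: if the latent dimension is far smaller than the full per-token representation, the data moved to (and worked on by) the attention stage shrinks accordingly, which is where the training-cost savings described above would come from.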


[1] estimated

[2] Cottier, Ben, et al. “The Rising Costs of Training Frontier AI Models.” arXiv, arxiv.org. Accessed 31 May 2024.
 
Looking back at the cost of training large models that we mentioned above, one would think that a model the size of DeepSeek (671 billion parameters and 14.8 trillion training tokens) would take a massive amount of GPU time and cost $20m to $30m, yet the cost to train DeepSeek was just a bit over $5.5m, based on 2.789 million hours of H800 time at $2.00 per hour, closer to the cost of much smaller models and outside of the expected range.  This means that someone has found a way to reduce the cost of training a large model, potentially making it easier for model developers to produce competitive models.  To make matters worse, in the case of DeepSeek, the model is open source, which allows anyone to use it for application development.  This undercuts the concept of fee-based models that expect to charge more for each increasingly large model and justify those fees by the increasing cost of training.  Of course, the fact that such an advanced model is free makes the long-term fee-structure models that encourage high valuations less valid.
We note that the DeepSeek model benchmarks shown below are impressive, but some of that improvement might come from the fact that the DeepSeek V3 training data was more oriented toward mathematics and code.  Also, we always remind investors that it is easy to cherry-pick benchmarks that present the best aspects of a model.  That said, not every developer requires the most sophisticated general model for their project, so even if DeepSeek did cherry-pick benchmarks (we are not saying they did), a free model of this size and quality is a gift to developers, and the lower training costs are a gift to those that have to pay for processing time or hardware.  It’s not the end of the AI era, but it might affect valuations and long-term expectations if DeepSeek’s compression methodology proves to be as successful in the wild as the benchmarks make it seem.  The fact that this step forward in AI came from a Chinese company will likely cause ulcers and migraines across the US political spectrum and could lead to even more stringent clampdowns on the importation of GPUs and HBM into China, despite the fact that those restrictions don’t seem to be having much of an effect.
Figure 1 - DeepSeek V3 Benchmarks - Source: DeepSeek

Meta Mistake

1/24/2025


Meeting Moohan?

1/24/2025


Yesterday we spent some time on how Samsung (005930.KS) has integrated AI into its just-announced flagship Galaxy S series smartphone line.  Aside from a lean toward on-device AI processing, similar to Apple (AAPL) Intelligence, there seems to be a concerted effort to move the physical tasks users typically perform over to the AI, particularly through voice control.  By this we mean asking the AI to perform a task (“Summarize the transcript of the meeting I had with the marketing staff yesterday, write a cover letter, let me check it, and then send the cover letter and meeting summary to my Tier 3 e-mail list.  Send me a confirmation when it’s done”).  In terms of workflow, without the AI and its tight integration with Samsung applications, this would have entailed:
  • Opening the app that recorded the meeting.
  • Opening an app that could create a transcript of the meeting and creating and saving the transcript.
  • Opening a word processor and loading the transcript.
  • Editing the transcript into a summary and saving.
  • Opening an e-mail app.
  • Creating a cover letter.
  • Defining the send list on the cover letter.
  • Attaching the summary to the cover letter and sending
  • Closing all apps.
In theory, if all of the applications were either Samsung applications or 3rd party applications that were able to access the AI API, the new workflow would be:
  • Tell the AI what to do.
  • Check the summary and cover letter and edit it if necessary.
  • Do other stuff…
The AI is able to break the task down into its component parts and creates ‘agents’ to perform each part.  Because the AI is so closely integrated into the Samsung applications, the agents are able to open the necessary apps and perform their functions, with the AI directing the process.  The user does not have to open or close any apps or do any other work beyond review, unless the summary or cover letter needs editing.  Note that we start the description of the ‘new’ workflow with ‘in theory’, as it is difficult to determine the actual level of AI/application integration until the phones are available for testing under Samsung’s One UI 7 user interface.  We expect the level of integration might not be quite at this level yet, but simplifying workflow, even a little, is what makes consumers think about upgrading.
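As an illustration of the orchestration pattern being described (and only that; the function names and app hooks below are hypothetical stand-ins, not Samsung’s One UI 7 API), the decomposition might look like:

```python
# Illustrative agent-orchestration sketch; every function here is a hypothetical stand-in.
from dataclasses import dataclass
from typing import Callable

@dataclass
class AgentTask:
    name: str
    run: Callable[[dict], dict]   # each agent reads the shared context and returns updates

def transcribe(ctx):  return {"transcript": f"(transcript of {ctx['recording']})"}
def summarize(ctx):   return {"summary": f"(summary of {ctx['transcript']})"}
def draft_cover(ctx): return {"cover_letter": f"(cover letter for {ctx['summary']})"}
def send_email(ctx):  return {"sent_to": ctx["recipients"]}

pipeline = [AgentTask("transcribe meeting", transcribe),
            AgentTask("summarize transcript", summarize),
            AgentTask("draft cover letter", draft_cover),
            AgentTask("send e-mail", send_email)]

context = {"recording": "marketing meeting", "recipients": "Tier 3 e-mail list"}
for task in pipeline:
    context.update(task.run(context))   # the directing AI runs each agent in turn
    print("done:", task.name)
print("delivered to:", context["sent_to"])
```

The user-facing difference is exactly the one described above: the apps are still doing the work, but the orchestration layer, not the user, opens and closes them.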
The reason we continue to focus on Samsung’s level of AI integration is that Samsung’s upcoming XR headset (aka ‘Project Moohan’, 프로젝트 무한, which translates to ‘Project Infinity’) seems to follow a similar path toward deep AI/application integration, particularly through its use of Android XR, Google’s (GOOG) new XR OS that will be used for the first time in the upcoming Samsung Moohan release.  While the details of Android XR are still sparse, the overall objective is to create a platform based on open standards for XR devices and smart glasses (AR) that uses existing Android frameworks, development tools, and existing Android elements (buttons, menus, fields, etc.), making it able to access all existing Android applications in a spatial environment. 
In a practical sense, it would allow existing 2D Android applications to be opened as 2D windows in a 3D environment, and with modification, can become 3D (Google is redesigning YouTube, Google TV, Google Photos, Google Maps and other major 2D applications for a 3D setting).  Android XR will also support a variety of input modes, including voice and gestures through head, eye, and hand tracking, and will support some form of spatial audio.
One feature of the Samsung XR headset that we believe will be well received is the visual integration with the AI.  Siri can hear what you say and can respond, but while it can ‘see’ what the user sees, it doesn’t have the capability to analyze that information on the fly and use it.  Meta’s headsets can hear what the user hears and perform a level of analysis for context, but that function is primarily for parsing voice commands.  Typically the Meta system does not access the camera information unless requested by the user and then takes a snapshot.  It is able to perform limited scene analysis (object recognition, lighting, depth, etc.) to allow for virtual object placement, but works specifically on the snapshot and only ‘sees’ what is in the real world, excluding virtual world objects. 
If the recent demo versions of the Samsung XR headset are carried through to the final product, the headset will hear and see both real-world and virtual objects, and analyze that information on a continuous basis.  This allows the user to say, “What kind of dog is that?” to the AI at any time and have the AI respond based on a continuous analysis of the user’s visual focus.  The user can also ‘Circle to search’ an object within view with a gesture, as the AI recognizes virtual objects (the circle) as well as the real-world data.  According to Google, the embedded AI in the Samsung headset also has a rolling ~10-minute memory that enables it to remember key details of the user’s visuals, which means you can also ask “What kind of dog was that in the window of the store we passed 5 minutes ago?” without having to go back to the store’s location.
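Google has not published how that rolling memory is implemented, but as a purely illustrative sketch, a time-windowed buffer of scene annotations would behave roughly like this:

```python
# Illustrative sketch of a ~10-minute rolling visual memory; not Google's or Samsung's code.
from collections import deque

WINDOW_SECONDS = 10 * 60

class VisualMemory:
    """Keep only scene annotations observed within the last WINDOW_SECONDS."""
    def __init__(self):
        self.events = deque()            # (timestamp_seconds, annotation)

    def observe(self, annotation, now):
        self.events.append((now, annotation))
        self._evict(now)

    def _evict(self, now):
        while self.events and now - self.events[0][0] > WINDOW_SECONDS:
            self.events.popleft()        # anything older than the window is forgotten

    def recall(self, keyword, now):
        self._evict(now)
        return [a for t, a in self.events if keyword.lower() in a.lower()]

mem = VisualMemory()
mem.observe("storefront window: beagle on a red cushion", now=0)
mem.observe("street crossing, two cyclists", now=300)
print(mem.recall("beagle", now=310))   # within the 10-minute window -> found
print(mem.recall("beagle", now=700))   # more than 10 minutes later -> evicted, nothing returned
```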
We know there will be limitations to all of the features we expect on both the Samsung Galaxy S series smartphones and the Samsung XR headset, but, as we noted yesterday, Samsung seems to understand that both the AI functionality and the user’s AI experience depend on how tightly the AI is integrated into the device OS and the applications themselves.  That desire has led Samsung to work closely with Google, which allows users to use familiar Android apps along with those specifically designed or remodeled for the spatial environment.  Hopefully they price it right at the outset, learning from the poor Vision Pro results, but we will have to wait a few more weeks to find out.

What’s It Worth?

1/23/2025


Back in 2017, a paper on the use of AI in video generation written by two computer science professors spurred two European entrepreneurs to form Synthesia (pvt), a company that began operation with an AI dubbing tool that used computer vision to create natural mouth movements that could be adapted to a variety of languages.  While the AI dubbing tool generated sales, the company shifted gears in 2020 toward video creation.  The video tools they have developed are based on the concept that a 16-year-old should be able to create a Hollywood-level video using only a device in his or her bedroom.  Considering that a short (1-2 min.) video can cost anywhere from $1,000 to $10,000 to produce, depending on resources and quality, that was a tall order.
The company used the same computer vision system to create ~159 avatars that can be placed in templates, with the ability to easily change features such as style, skin color, gender, age, etc., all on a smartphone or laptop.  Given the previous focus on voice, the avatars can take the user’s script and ‘perform’ it with accurate, realistic mouth movements and facial expressions, and a synthetic voice.  But it does not stop there.  If the user is willing to record 5 to 10 minutes of audio, using a text script provided by the company, the AI can clone the user’s voice and transfer it to the avatar in over 140 languages.   The system can generate a video from a pdf, a PowerPoint presentation, or even a website, all on a laptop or phone, and the most recent version gives the user the ability to edit the video, match a screenshot, zoom, or transcribe voiceovers, so they seem to have realized their goal.
Synthesia says they have over 200,000 users, ranging from individuals to companies like Dupont (DD), Heineken (HEIA.NL), and Reuters (TRI), most of whom use the application to create inexpensive training videos.  Heineken used the service to create videos that trained employees in the basic concepts of continuous improvement, replacing PowerPoint presentations that were uninspiring.  Given that Heineken has over 90,000 employees in 170 countries, the cost of producing even simple videos in each language was enormous.  Using the Synthesia application, it took only a few minutes to duplicate the videos into any language. 
Figure 1 - Synthesia Avatar Examples – Source: Synthesia
The original concept of low-cost, easy-to-use video creation comes with three price tiers: a free (trial) tier that allows 3 minutes of video to be created but does not allow it to be downloaded; the Starter tier, at $18/month, which allows for 120 minutes of video creation and does allow downloading; and the Creator/Business tier, at $64/month, which allows for 360 minutes of video and all the bells and whistles.  That said, should one search the internet for Synthesia, a number of competitors show up that provide a similar service for a dollar or two more or less each month. 
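For scale, the per-minute economics implied by those tiers versus the traditional production costs cited earlier work out roughly as follows (our arithmetic, using only the figures above):

```python
# Rough per-minute comparison using the figures cited in this note.
starter_per_min = 18 / 120          # $18/month for 120 minutes  -> $0.15/min
creator_per_min = 64 / 360          # $64/month for 360 minutes  -> ~$0.18/min
traditional_low, traditional_high = 1000 / 2, 10000 / 1   # $1,000-$10,000 for a 1-2 minute video

print(f"Synthesia Starter:  ${starter_per_min:.2f}/min")
print(f"Synthesia Creator:  ${creator_per_min:.2f}/min")
print(f"Traditional video:  ${traditional_low:,.0f} - ${traditional_high:,.0f}/min")
```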
The competition has not stopped Synthesia from raising money, as can be seen in the table below, and with Nvidia (NVDA) and Kleiner Perkins on the investor list, the company has VC credibility.  While some say the valuations placed on AI application companies remain low (in comparison to the AI hardware space), Synthesia’s valuation at its Series C round was $1b and at its Series D round was $2.1b, making it the most valuable media AI company in the UK.  While we do not know the details of Synthesia’s financials for 2024, we know that it generated £8.6m in sales in 2022 and £25.7m in 2023, and posted losses of £4.5m and £23.5m in those years.  What’s it really worth?

Integration at Samsung

1/22/2025


Samsung (005930.KS) announced its flagship smartphone line for 2025, the Galaxy S25 series, today.  While the event detailed a number of hardware improvements, the real focus was (not surprisingly) on AI.  While Samsung continues to upgrade hardware to maintain a competitive stance against Chinese smartphone brands, we believe the company understands that the incremental hardware improvements made each year are not enough to stimulate consumers to upgrade unless their phone begins to age out.  Foldables represent a new mobile category, but due to their high price, volumes are relatively low, so it is essential that major brands maintain a flagship line to offset mid-range, lower-margin lines.
There are times when hardware improvements can be a driver for consumers, as were OLED displays when they were new to the mobile world, and multiple cameras back a few years ago. But at this juncture, even OLED displays cannot be much larger and there is no new display technology on the near-term horizon that is appreciably different from what is available today.  Higher resolution cameras will always be possible as semiconductor technology improves, as will chipsets, CPUs, and GPUs, but other than foldables, mobile phone hardware will improve slowly and slowly does not excite consumers.
Smartphone software is in a similar position.  Smartphone applications have changed little over the past few years and do almost nothing to convince consumers to upgrade their mobile devices, but AI for mobile devices is developing quickly and represents a platform where smartphone brands can compete and attract attention.  AI does need processing power, but Qualcomm (QCOM), Mediatek (2454.TT), Apple (AAPL), Google (GOOG), and Samsung continue to adapt their chipsets to the needs of AI, and while AI is, in the truest sense, a hardware-based system, on a mobile device it appears to consumers as software and requires relatively few hardware or mechanical changes or design restrictions.  However, AI represents change, and change in the CE market is something every marketing department looks to in order to sell more devices.
Samsung seems to understand that there are two AIs.  One answers your questions and interacts directly with users, while the other works silently in the background.  But they also seem to understand that the two should be working together, and if there was anything to be taken away from the Samsung S25 event, it was that Samsung is interested in merging those two AI processes.  This not only improves the user’s experience with the phone but lessens the need for breakthrough hardware or software application improvements to attract consumers.  By leveraging AI to allow applications on the phone to work together, the applications seem to be improved, even if they are not, and the ability of the AI to control or direct applications without the user having to open an application for each task is an improvement worth buying a new phone for.
Samsung’s multi-modal AI allows the user to speak to the AI directly (voice or even audio) and gives the AI the ability to create ‘agents’ that perform tasks the user would typically have to do by pausing what they are doing to open a separate application.  Here’s an example.  The user is listening to a conference call which happens to be in Korean.  The AI translates the call in real time but also compiles a transcript of the call.  When the call is finished, the user tells the AI to summarize the call and reviews the AI summary.  The user tells the AI to change the 3rd paragraph to be more concise and reviews the change.  The user then tells the AI to write a cover letter describing the circumstances of the call and to send the cover letter and the transcript summary to those on the ‘Level 2 Client’ list. 
Rather than having to open a number of applications to complete each part of the full task, the user either read or listened to the AI during each step and opened no applications.  The AI interfaced with the necessary applications and completed each task.  The system also gives the user the option of allowing the AI to collect information from other applications and devices in the Samsung ecosystem that can help it build a detailed profile of the user in order to make better or more personalized suggestions.
This can only happen when the AI is integrated into applications and Samsung has the advantage of having a recognizable enough brand that users are willing to use Samsung applications on its phones, along with a variety of externally developed applications.  Samsung offers external developers a number of tools to give them access to Samsung’s One UI 7, the user interface that sits on top of Android and provides the hooks to the AI, but there is nothing better than having that interface and the applications themselves developed in-house.  Only Samsung, Apple, and Google have the ability to tie their hardware and AI to such large application bases, with both Apple and Samsung concentrating on processing AI on device whenever possible.
As noted, Samsung said all the right things about its AI at the Galaxy S25 event, but everyday use can be much different from well-produced event videos.  The Galaxy S25 family is now on pre-order and will be available in stores on February 7, at which point we should be able to get a better idea as to whether the phones live up to their marketing pitch.  The good news is that if they do, there is no premium being charged for the AI capabilities, as the phones are priced the same as last year’s models, a plus for consumers.  Below we show only the differences between the hardware in the Galaxy S25 and last year’s Galaxy S24.
⁎ The 4th main camera (ultrawide) in the S25 Ultra is 50 MP f/1.9, while the 4th main camera in the S24 is 12MP f/2.2.  The other three cameras are the same.
​

The 6th Sense?

1/21/2025


As we have noted previously, we find comparisons between humans and AI meaningless, as human intelligence is based on sensory input and AI is not.  Yes, AI systems can sort through vast fields of data far faster than our eyes and brain allow us to, but that same AI system cannot tell whether those numbers mean something other than what it has been told to look for.  It might be able to improve its ability to sort through those numbers, but it has no context other than what it was taught.  Humans have limitations, but because they have context, they are able to feel emotion in a musical piece that an AI would not.  They are able to see the emotion in Edvard Munch’s “The Scream” without knowing the color value of every pixel, and humans know not to sit next to someone on the subway whose nose is running.
AIs need senses if they are ever going to challenge human intelligence and creativity, and all the tokenization of literature, images, videos, and Instagram posts cannot help.  Yes, they might be able to answer a set of test questions better than a high-school student or even a college professor, but AI art is not art; it is copying without emotion.  That said, what would happen if AI systems were given senses?  What if they could ‘see’ the outside world in real time?  What if they could hear the sound of a real laughing child and at the same time see what made the child laugh?  Could they begin to develop an understanding of context?  It’s a difficult question to answer and would likely require all of the human senses working together to truly allow the AI to understand context, but we are certainly not there yet, and finding ways to grant AI systems the senses of touch and smell is challenging at best.
There are plans to give AI systems a better sense of sight by collecting data from moving vehicles.  The data becomes part of a real-world model that strives to identify ‘how things work’, which can then be used to train other models.  However, for a model to be ‘real-world’, it has to be huge.  Humans take in massive amounts of sensory ‘noise’ that has only a minute influence on decisions but is essential in understanding how the world works.  Much of that ‘noise’ is incomplete, ambiguous, or distracting, but it is part of the context we need to handle the uncertainty that our complex environment brings.  Of course, efficiency is also important, and humans have the sometimes dubious ability to filter out noise while retaining the essence of a situation, something AI systems would have to be programmed to do, and with all the noise being fed to a real-world model, the storage and processing needed would be astronomical.
Ethics is a hard concept to explain, and algorithms that try to encode ethics are prone to bias.  Humans are also prone to bias, but they are typically taught lessons in ethics by others around them.  Whether they respond with their own interpretation of those ‘lessons’ or just mimic what they see is a human issue.  AI systems form biases based only on their training data and their algorithms, so while a real-world model might tell the AI that driving a vehicle into a concrete wall triggers a rule of physics, it doesn’t tell it that it should feel regret for doing so when it has borrowed that vehicle from its parents.  Humans also continue to learn, at least most do, so AI real-world models must be ever expanding to be effective, and that requires more power, more processing speed, and more storage.
So the idea of a general real-world model has a number of missing parts.  That said, real-world ‘mini-models’ are a bit more feasible.  Rather than trying to model the unbelievable complexity of the real world, building smaller models that contain sensory data relevant to a particular application is, at least, more realistic.  We can use visual (camera) data to control stoplights, but those systems react poorly to anomalies, and that is where additional sensory data is needed.  Someone crossing against the light might be looking at the street to avoid traffic, but at the same time can hear (no earbuds) the faint sound of a speeding car that has yet to hit their peripheral vision, and that information, as insignificant as it might be to someone walking on the sidewalk, becomes very important to the person crossing against traffic. 
Real-world models that try to mimic real-world situations must have sensory information and the ability to filter that information in a ‘human’ way, so the development of real-world models without more complete sensory information will not produce the human-like ability to react to the endless number of potential scenarios that every second of our lives provides.  Networks could help AIs gather data, but until the AI is able to feel the stroke of camel hair on canvas, smell the paint, and see the yellow of a sunflower, it cannot understand context in the truest sense, something humans begin to learn before their first birthday.  We expect mini real-world models will be effective for lots of applications, but without sensory input, real-world context is a dream.

Not So Smart

1/17/2025


”AI will be the most transformative technology since electricity.” (Eric Schmidt); "AI will not replace jobs, but it will change the nature of work." (Kai-Fu Lee); "AI will not replace humans, but those who use AI will replace those who don't." (various); "AI will be the most powerful technology ever created, and it will fundamentally alter the way we live, work, and interact." (Andrew Ng).  These are quotes about how AI will change the world from some very smart and successful people, although a group that is heavily ‘invested’ in AI technology, giving them a bit of bias.  We certainly don’t denigrate the fact that AI has been able to help understand the complexities of the human genetic code, improve weather forecasting, help develop new materials, and comb through vast amounts of data to find patterns we humans might have missed.  But while the AI community might want consumers to believe that AI is the Mighty Mouse (“Here I come to save the day…”)[1] of the 21st century, it’s not that easy.
In order for AI to fulfill all the hopes and dreams of its supporters, it not only has to be fast (it is), but it has to be able to work 24/7 (it can), be able to learn from its mistakes (sometimes), and be correct 99.9% of the time (it’s not).  But the business end of AI does not have the patience to wait until AI is able to meet those specifications and has ushered us into the world of AI as a tool for getting a leg up on the competition.  CE companies are among the most aggressive in promoting AI, and the hype continues to escalate, but the reality, at least for the general public, is a bit less enthusiastic, despite initially high expectations.  In a 2024 survey, 23% of businesses indicated that AI had underperformed their expectations, 59% said it met their expectations, and 18% said it exceeded them,[2] with only 37% stating that they believe their business is fully prepared to implement its AI strategy (86% said it will take 3 years), a little less enthusiastic than the hype might indicate.
From a business standpoint, the potential issues that rank the highest are data privacy, the potential for cyber-security problems, and regulatory issues, while consumers seem to be a bit more wary, with only 27% saying they would trust AI to execute financial transactions and 25% saying they would trust AI accuracy when it comes to medical diagnosis or treatment recommendations.  To be fair, consumers (55%) do trust AI to perform simple tasks, such as collating product information before making a purchase, and 50% would trust product recommendations, but that drops to 44% concerning the use of AI support in written communications[3].  Why is there a lack of trust in AI at the consumer level?  There is certainly a generational issue that has to be taken into consideration, and an existential fear (‘end of the world’) from a small group, but there seems to be a big difference between the attitude toward AI among business leaders and consumers, and a recent YouGov survey[4] points to why.
US citizens were asked a number of questions about their feelings toward AI in three specific situations: making ethical decisions, making unbiased decisions, and providing accurate information.  Here are the results:


[1] Fair use, https://en.wikipedia.org/w/index.php?curid=76753763
 

[2] https://www.riverbed.com/riverbed-wp-content/uploads/2024/11/global-ai-digital-experience-survey.pdf

[3] https://www.statista.com/statistics/1475638/consumer-trust-in-ai-activities-globally/
 

[4] https://today.yougov.com/technology/articles/51368-do-americans-think-ai-will-have-positive-or-negative-impact-society-artificial-intelligence-poll
Picture
Figure 2 – YouGov AI Survey – Ethical Decisions – Source: SCMR LLC, YouGov
Picture
Figure 3 – YouGov AI Survey – Unbiased Decisions – Source: SCMR LLC, YouGov
Picture
Figure 4 – YouGov AI Survey – Accurate Information – Source: SCMR LLC, YouGov
It is not surprising that many Americans do not trust AI to make ethical decisions for them, but over 50% of the US population also does not think AI systems make unbiased decisions, and we expect that is without the more detailed understanding of AI that might lead to even greater distrust.  That said, we were surprised that 49% of Americans believe AIs are providing accurate information, against 39% who disagree.  We believe that the push to include AI in almost every CE product as a selling point, this early in the development of AI systems that interface with users, will do little to convince users that the information they receive from AI systems is accurate, and may well reduce that level of comfort.
LLMs and AI chatbots have become so important from a marketing standpoint that few in the CE space can resist using them, even if the underlying technology is not fully developed.  Even Apple (AAPL), which tends to be among the last major CE brands to adopt new technology, was pushed into providing ‘Apple Intelligence’, a branded product that was obviously not fully developed or tested.  While Apple has long used AI for facial and object recognition, to help Siri understand user questions, and to suggest words as you type, there was no official name for Apple’s AI features until iOS 18.1, when ‘Apple Intelligence’ became the broad title for Apple’s AI.  The two main AI functions that appeared in iOS 18.1 were notification summaries and the use of AI to better understand context in Apple’s ‘focus mode’.  iOS 18.2 added AI to improve recognition in photo selection, gave Siri a better understanding of questions to improve its suggestions, and allowed users to use natural language when creating ‘shortcuts’, essentially a sequence of actions that automates a task, while also enhancing the system’s ability to suggest actions as the shortcut is being formulated.
None of these functions are unusual, particularly the notification summaries, which are similar to the Google (GOOG) search summaries found in Chrome, but there was a hitch.  It turns out that Apple’s AI was producing summaries of news stories that were inaccurate, with the problem becoming most obvious when the system suggested that the man accused of murdering UnitedHealthcare’s CEO had shot himself, prompting complaints from the BBC.  Apple has now released a beta of iOS 18.3 that disables the news and entertainment summaries and allows users to remove summary functions on an application-by-application basis.  It also renders all AI summaries in italics so users can identify whether a notification comes directly from a news source or is an Apple Intelligence generated summary.
While this is an embarrassment for Apple, it makes two points.  First, AI systems are ‘best match’ systems.  They match queries against what their training data looked like and try to choose the letter or word that is most similar to what they have seen in that training data.  This is a bit of an oversimplification, as during training the AI builds up far more nuanced detail than a simple letter- or word-matching system (think “What would be the best match here, based on the letters, words, and sentences that have come before, including those in the previous sentence or sentences?”), but even with massive training datasets, AIs don’t ‘understand’ more esoteric notions, such as implications or the effect of a conclusion, so they make mistakes, especially when dealing with narrow topics.
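To make the ‘best match’ idea concrete, here is a minimal sketch, not Apple’s or any other vendor’s actual implementation: the ‘model’ is reduced to a hand-built table of context-to-next-word scores (the phrases and probabilities are invented for illustration), and generation simply picks the highest-scoring continuation, with no check on whether the resulting sentence is true.

```python
# Toy sketch of 'best match' next-word selection.  The contexts and scores below are
# invented for illustration; a real LLM learns billions of such associations from its
# training data, but the selection step -- pick the most likely continuation -- is similar.
toy_scores = {
    ("suspect", "reportedly", "shot"): {"himself": 0.48, "back": 0.32, "video": 0.20},
    ("the", "forecast", "calls"): {"for": 0.91, "it": 0.06, "out": 0.03},
}

def best_match_next(context):
    """Return the highest-scoring next word for the last three words of the context."""
    key = tuple(w.lower() for w in context[-3:])
    candidates = toy_scores.get(key)
    if candidates is None:
        return None  # nothing similar in the 'training data'; a real model would still guess
    return max(candidates, key=candidates.get)

# A fluent, statistically likely continuation is not the same thing as a true one.
print(best_match_next(["the", "suspect", "reportedly", "shot"]))  # -> 'himself'
```

The point of the sketch is that the most statistically likely continuation can read perfectly well and still be factually wrong, which is essentially what happened with the notification summaries.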
Mistakes, sometimes known as hallucinations, can be answers that are factually incorrect or unusual reactions to questions.  In some cases the AI will invent information to fill a data gap, or even create a fictionalized source to justify an incorrect answer.  In other cases the AI system will slant information toward a particular end, or sound confident that the information is correct right up until it is questioned.  More subtle (and more dangerous) hallucinations appear in answers that sound correct on the surface but are false, making them hard to detect unless one has specialized knowledge of the topic.  While there are many reasons why AI systems hallucinate, AIs struggle to understand the real world, physical laws, and the implications surrounding factual information.  Without this knowledge of how things work in the real world, an AI will sometimes mold a response to its own level of understanding, coming up with an answer that might be close to correct but is missing a key point.  (Think of a story about a forest written without knowing about gravity: “Some trees in the forest are able to float their leaves through the air to other trees…”  Could it be true?  Possibly, unless there is gravity.)
Second, it erodes confidence in AI and can shift consumer sentiment from ‘world changing’ to ‘maybe correct’, and that is hard to recover from.  Consumers are forgiving folks, and while they get hot under the collar when they are shown that they are being ignored, lied to, or overcharged, brands know enough to lay low for a while and then jump back on whatever bandwagon is current at the time; still, ‘fooled once, fooled twice’ can take a while to dissipate.  AI will get better, especially non-user-facing AI, but if consumers begin to feel that they cannot trust AI’s answers, the industry will have to rely on the enthusiasm of the corporate world to support it, and given the cost of training and running large models, we expect they will need all the paying users they can find.  Don’t overpromise.
0 Comments

Body Snatchers

1/15/2025

0 Comments

 

Body Snatchers
​

AI systems are always searching for data, and with more models being trained on similar or identical base datasets, the search for ‘fresh data’ is of great importance to model builders.  As we have previously noted, the quality of training data has considerable consequence in keeping a model from becoming ‘uncreative’ and losing its ability to generalize when it sees new or previously unseen data.  As we have also noted, model builders harvest data from the internet by sending out bots to scrape websites that seem to have ‘fresh’ data that would keep the training set from becoming stale.
This is not a problem for large sites, but it can become a serious problem for small ones, as the bots are inherently impatient, some making tens of thousands of server requests to download a site’s content as quickly as possible.  This can overload a server that was never designed for such high-volume traffic and crash the site.  Further, the bots spread their requests across a large number of IP addresses, which keeps them under the radar of security systems that look for high-volume requests from a single IP address.  In theory such bots are not supposed to crawl sites that sit behind a paywall and are not supposed to collect any data that would allow personal identities to be tracked.  A simple file on the website called robots.txt tells crawlers, based on the user-agent name each one declares, which parts of the site they may and may not access.  That said, the file must be correctly configured, even if there are warnings about scraping elsewhere on the site, or the bots will scrape everything.
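For reference, this is roughly what the polite-crawler side of that contract looks like.  The sketch below uses Python’s standard urllib.robotparser; the site URL and the ‘FriendlyResearchBot’ user-agent are placeholders, not a real crawler or customer site.

```python
# Sketch of the check a well-behaved crawler should make before fetching a page.
# The URLs and the user-agent name are placeholders for illustration.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()  # download and parse the site's robots.txt

page = "https://www.example.com/catalog/page-1.html"
if rp.can_fetch("FriendlyResearchBot", page):
    print("robots.txt permits this fetch")
else:
    print("robots.txt forbids this fetch -- a polite bot stops here")

# On the server side, a site that wants to refuse OpenAI's crawler can add:
#   User-agent: GPTBot
#   Disallow: /
# but that only restrains crawlers that choose to honor the file.
```

The last comment is the crux of the problem: robots.txt is a convention, not an enforcement mechanism.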
Here's the example (real):
A small company with only seven employees has spent 10 years building a database of 3D image files scanned from human models (with their permission).  These are 3D files and images of hands, feet, and other body parts, all the way up to full-body scans, and can include a variety of facial expressions and movements.  The catalog runs to over 65,000 pages of content, each with at least three images per page, and the company sells these images to 3D artists, game developers, or anyone who needs imagery with real human features.
Picture
Figure 4 - Sample page - Source: Triplegangers
Unfortunately, a recent visit to the site by OpenAI’s (pvt) GPTBot crawler generated tens of thousands of server requests in an attempt to download the entire content of the site.  As the site requires payment to download its content, the bot should not have been able to make those requests, but it did, and it crashed the site, which also had a robots.txt file and a clear Code of Conduct and Terms of Use that strictly forbid scraping.  With the bot using a different IP address for each request, the security software saw the traffic as coming from multiple users, and the only way to figure out how to block those and other crawlers is to spend days working through each server request to confirm its legitimacy.  In the meantime, the site was down, the rights of the human models have potentially been violated, and the site will receive a huge bill from Amazon (AMZN) for the massive server surge the bot caused.  To make it worse, the company has still not found a way to get OpenAI to delete the material, other than sending an official request.
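Even that forensic work is mostly brute force.  As a rough sketch of what ‘working through the server requests’ can look like (the log file name and the combined-log layout are assumptions about a typical setup, not details from the actual incident), counting requests per user-agent in one-minute buckets is often enough to make a burst of this kind stand out:

```python
# Rough sketch: count requests per user-agent per minute from a web server access log.
# The file name and the combined-log format are assumptions, not details of the real case.
import re
from collections import Counter

LOG_LINE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?:GET|POST|HEAD) [^"]*" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"')

buckets = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        m = LOG_LINE.search(line)
        if not m:
            continue
        minute = m.group("ts")[:17]          # e.g. '15/Jan/2025:09:41' -> one-minute bucket
        buckets[(minute, m.group("ua"))] += 1

# Hundreds of requests per minute under one user-agent (or one with no name at all)
# is the kind of pattern a small site would want to investigate and block.
for (minute, ua), count in buckets.most_common(10):
    print(f"{count:6d}  {minute}  {ua[:60]}")
```

Blocking is still a separate fight, since the IP addresses rotate, but at least the scale of the problem becomes visible.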
As it turns out, most small sites don’t even know they have been scraped, as some bots are more subtle in making content requests to the server.  If they don’t cause a server overload, the only way to know that proprietary data was scraped would be to manually search through pages of server logs, something small sites just don’t have the time to do.  So while there are ‘good’ bots that observe the rules and keep themselves under control, there are ‘bad’ bots that just hammer away at sites and can cause the damage described above.  It is almost impossible to guard against the wide variety of crawlers being developed almost daily and their very aggressive need for ‘fresh data’, so small sites remain at risk from this AI menace.  This was a real case of bodysnatching…
Picture
Figure 5 - Bodysnatchers - Source: https://mymacabreroadtrip.com/
0 Comments

Will AI Cause the End of Social Media?

1/14/2025

0 Comments

 

Will AI Cause the End of Social Media?
​

We have written a number of times about deepfakes, those seemingly real images created by AI systems.  They are, at the least, annoying, but they also erode what confidence in social media’s ‘validity’ currently exists.  If you cannot believe what you see, then why bother to look, especially as AI systems keep improving their ability to create ever more convincing images?  There will always be those who don’t care whether an image is real, as long as it provides a few moments of entertainment, and perhaps generates some attention for those who circulate it, but while deepfakes are a problem, and a difficult one to solve, there is a bigger one.
In order for models to increase their accuracy, they need more examples.  This could mean more examples of text from famous authors, more annotated images of gazebos, dogs, ships, and flagpoles, or more examples of even more specific data, such as court cases or company financial information.  Current large language models (LLMs) are trained on text and code datasets that contain trillions of words, while the constantly expanding sum total of human text is loosely estimated in the quadrillions, so even a massive training dataset would represent less than a tenth of a percent of the corpus of human text.  It would seem that running out of training data should be a concern only for the far future, but that is not the case.
Models are now able to scrape information from the internet, which is eventually added to their training data when they are fine-tuned or updated.  The next iteration of the model is trained recursively, using the previous model’s expanded dataset, so model V.2 generates output based on model V.1’s original training data plus what it found on the internet.  Model V.3 uses model V.2’s expanded dataset, including what it finds on the internet, to derive its own output, with subsequent models continuing the process.  This means that while model V.1 was originally trained on ‘reality’, adding data from the internet, which we might loosely call ‘mostly realistic’, taints that model’s output slightly, say from ‘realistic’ to ‘almost completely realistic’.  Model V.2’s input is now ‘almost completely realistic’ but its output is ‘mostly realistic’, and with that as input for the next iteration, model V.3’s output is only ‘somewhat realistic’.
Of course these are illustrations of the concept, but they do represent the degenerative process that can occur when models are trained on ‘polluted’ data, particularly data created by other models.  The result is model collapse, which can happen relatively quickly as the model progressively loses information about the distribution of the original data.  Google (GOOG) and other model builders have noted the risk and have tried to limit internet articles and data to more trustworthy sources, although that is a subjective classification, but as the scale of LLMs continues to increase, the need for more training data will inevitably lead to the inclusion of data generated by other models, and some of that data will come without provenance.
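The degenerative loop can be illustrated with a toy experiment; this is a sketch of the concept only, not a measurement of any real model.  A ‘model’ here is just a Gaussian fit: each generation is fit only on samples drawn from the previous generation’s fit, and because each synthetic sample is small, information about the original distribution, especially its tails, tends to leak away with every pass.

```python
# Toy illustration of recursive training on model-generated data ("model collapse").
# Each generation's 'model' is a Gaussian fit to samples drawn from the previous fit;
# the original real data is never seen again after generation 0.
import random
import statistics

random.seed(0)
real_data = [random.gauss(0.0, 1.0) for _ in range(5000)]   # 'reality': mean 0, std 1

mu, sigma = statistics.mean(real_data), statistics.stdev(real_data)
print(f"generation  0: mean={mu:+.3f}  std={sigma:.3f}   (fit to real data)")

for generation in range(1, 21):
    # The next model only ever sees synthetic data produced by the previous one.
    synthetic = [random.gauss(mu, sigma) for _ in range(50)]   # small sample = lost detail
    mu, sigma = statistics.mean(synthetic), statistics.stdev(synthetic)
    print(f"generation {generation:2d}: mean={mu:+.3f}  std={sigma:.3f}")
```

Run enough generations and the fitted spread typically drifts away from the original and collapses toward a narrow sliver of it, a statistical version of the grade-school telephone game mentioned below.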
There is the possibility that the AI community will coordinate efforts to certify the data being used for training, or the data being scraped from the internet, and will share that information.  But at this point in the AI cycle, model builders cannot even agree on which data needs to be licensed and which does not, so it would seem that adding internet data will only hasten the degradation of LLM effectiveness.
How does this affect social media?  It doesn’t.  Social media has a low common denominator.  The point of social media is not to inform and educate, it is to entertain and communicate, so there will always be a large community of social media users who don’t care whether what they see is accurate or even real, as long as it captures attention for some finite period of time and possibly generates status for the poster, regardless of its accuracy.  Case in point: the ‘Child under the Rubble’ photo we showed recently, or the image of the Pentagon on fire that circulated months ago, both of which were deepfakes.
In fact, we believe it is easier to spot deepfakes than it is to spot inaccuracies or incorrect information from textual LLMs, as specific subject knowledge is required to check each LLM output statement.  It is a scary thought that while the industry predicts model accuracy will continue to improve until models far surpass human response accuracy, there is the potential that models will slowly (or rapidly) begin to lose sight of the hard data on which they were originally trained as it becomes polluted with less accurate and less realistic data; sort of like the old grade-school game of telephone.  If that is the case, social media will continue, but the value of LLMs will diminish.
 
Picture
Figure 5 - The Telephone Game - Source: Norman Rockwell
0 Comments