
Making 1.5 Bits Work for Large Language Models

April 23, 2025
“To infinity and beyond” using only three states: −1, 0, 1.

What you’ll learn:

What is BitNet 1.58?
Why is [−1, 0, 1] significant to large language models?
What do 1-bit LLMs mean for machine learning and the IoT?

Large language models (LLMs) are just one type of artificial intelligence/machine learning (AI/ML), but they, along with chatbots, have changed the way people use computers. Like most artificial neural networks (ANNs), LLMs use arrays of weights in the matrix operations performed by the multiple layers within the AI model. The weights can be integers or floating-point numbers. They usually aren’t stored at the same precision as the one used for calculations, although they can be.

The better the precision, the better the results, in theory. This tends to be true for almost any numerical calculation. Issues like overflow and underflow must also be considered. Various numerical representations, such as variable-length integers and ratios, are useful in certain applications.

In general, higher precision requires more storage space and takes longer to move and compute with, so there’s a tradeoff between accuracy and power/performance. Optimizations in AI/ML applications like DeepSeek were accomplished by tailoring precision and accuracy to the demands of the application while staying within the limits of the hardware.

What’s driving this diversity in design is that the tradeoff between weight precision and the output of the models isn’t linear. Significantly shrinking the precision can result in minimal degradation of the model’s results overall. Optimizing the model architecture can improve overall performance even with a reduction in weight precision.

What is BitNet 1.58?

BitNet 1.58 is just one of the models designed to minimize the size of the weights used. As noted, higher-precision floating point has advantages, but at a cost in space and performance that in turn affects system power requirements. Most floating-point hardware implementations have traditionally followed the IEEE 754 standard. However, AI has brought emerging de facto formats like bfloat16, FP8, and FP16.

Integer weights have also shrunk, with the 8-bit integer, INT8 (−128 to 127), becoming the benchmark standard for AI/ML hardware acceleration. Of course, the smallest integer is a single bit, but that’s just 0 or 1. It takes about 1.58 bits (log2 3) to encode the three values −1, 0, and 1, which is where BitNet 1.58 and other ternary models come into play.
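To see what reducing weights to those three values might look like, here’s a minimal C sketch of one common ternary quantization approach: scale each weight by the array’s mean absolute value, round, and clamp to [−1, 1]. The function name quantize_ternary and the scaling rule are illustrative assumptions, not the exact recipe any particular model uses.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: map full-precision weights to {-1, 0, +1} by
 * scaling each weight by the mean absolute value of the array, rounding,
 * and clamping. Real models may use a different scaling rule. */
void quantize_ternary(const float *w, int8_t *q, size_t n)
{
    float mean_abs = 0.0f;
    for (size_t i = 0; i < n; i++)
        mean_abs += fabsf(w[i]);
    mean_abs = (n > 0) ? mean_abs / (float)n : 1.0f;
    if (mean_abs == 0.0f)
        mean_abs = 1.0f;                    /* all-zero input: avoid divide-by-zero */

    for (size_t i = 0; i < n; i++) {
        long t = lroundf(w[i] / mean_abs);  /* round to nearest integer */
        if (t > 1)  t = 1;                  /* clamp to the three states */
        if (t < -1) t = -1;
        q[i] = (int8_t)t;
    }
}
```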

>>Check out these TechXchanges for similar articles and videos:

TinyML: AI and machine learning, including tinyML, can run on microcontrollers and small SoCs.

Generative AI: Generative artificial intelligence (AI) like chatbots is changing the way many use AI.

Keep in mind that the encoding relates more to storing and moving data than to the calculation involved. For example, adding a 1-bit value to a 64-bit register actually adds two 64-bit values together, where one of them is the 1-bit value widened with 63 leading zero bits.
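A minimal C illustration of that point, assuming the narrow weight has already been unpacked into an int8_t: the cast widens it to the register width before the actual 64-bit add takes place.

```c
#include <stdint.h>

/* The stored weight may occupy only a bit or two, but the ALU still works
 * on full-width registers: the narrow value is widened before the add. */
int64_t accumulate(int64_t acc, int8_t small_weight)
{
    /* The cast sign-extends the 8-bit value to 64 bits, so the hardware
     * really adds two 64-bit operands. */
    return acc + (int64_t)small_weight;
}
```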

Storing a series of these sub-2-bit values is simply a matter of letting several values share bits. For example, 3 bits can encode eight states, whereas our [−1, 0, 1] value range requires only three states, so multiple ternary values can be squeezed into a byte. Check out the BitNet website if you need to see the details.
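As a rough sketch of how such packing can work, the C functions below (hypothetical helpers, not BitNet’s actual storage format) store five ternary values per byte as a base-3 number, which works because 3^5 = 243 fits within the 256 states of a byte:

```c
#include <stdint.h>

/* Illustrative packing: five three-state values have 3^5 = 243 combinations,
 * which fits in one byte. Each weight is stored as weight + 1 (0, 1, or 2),
 * a digit in a base-3 number. Real weight formats may pack differently. */
uint8_t pack5(const int8_t w[5])            /* each w[i] is -1, 0, or 1 */
{
    uint8_t packed = 0;
    for (int i = 4; i >= 0; i--)
        packed = (uint8_t)(packed * 3 + (uint8_t)(w[i] + 1));
    return packed;
}

void unpack5(uint8_t packed, int8_t w[5])
{
    for (int i = 0; i < 5; i++) {
        w[i] = (int8_t)(packed % 3) - 1;    /* recover -1, 0, or 1 */
        packed /= 3;
    }
}
```

That averages 1.6 bits per weight, close to the theoretical minimum of about 1.58 bits.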

Why is [−1,0,1] Significant to Large Language Models?

Storage is a big factor in LLMs simply because of their size. Gigabytes of data are a killer for smaller microcontrollers because they simply don’t have that capacity, even if the model is given the majority of the storage in the hardware. Reducing the size of the weights shrinks the memory footprint, which in turn reduces the power required for storage. As a rough illustration, a 7-billion-weight model stored at 16 bits per weight needs about 14 GB, while the same weights packed at roughly 1.6 bits each fit in under 1.5 GB.

Of course, storage is just part of the issue. Moving that data around takes power and instruction cycles. Applications that move lots of data often benefit from compression algorithms even with the overhead of compression and decompression. This is similar to what’s being done here with smaller weights, although the size reduction doesn’t use compression. Still, the encode/decode process for packed weights is significantly simpler than compression and is easily implemented on most CPUs.

The big gain in efficiency and power actually comes from the calculations involved, which go from an expensive arithmetic operation to a much simpler one. Consider multiplying by −1, 0, or 1: the other operand is either negated, zeroed out, or left unchanged, so no actual multiplication is needed.
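Here’s a minimal C sketch of that idea, assuming int8 activations and unpacked ternary weights; the dot product at the heart of a matrix operation then needs only additions, subtractions, and skips:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of a dot product between int8 activations and ternary weights.
 * Because each weight is -1, 0, or +1, there are no multiplications:
 * the activation is added, subtracted, or skipped. */
int32_t ternary_dot(const int8_t *x, const int8_t *w, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (w[i] > 0)
            acc += x[i];      /* weight +1: keep the value   */
        else if (w[i] < 0)
            acc -= x[i];      /* weight -1: flip the sign    */
        /* weight 0: contributes nothing, so it is skipped   */
    }
    return acc;
}
```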

While things are a little more complex in an actual LLM, the optimization idea is the same. The operations become so simple that AI hardware acceleration isn’t required to make the models usable on conventional CPUs.

What Do 1-Bit LLMs Mean for Machine Learning and IoT?

The significant reduction in storage requirements, and the similar reduction in computational requirements, means that suitably compact LLMs can work on microcontrollers, including those designed for the Internet of Things (IoT). Sensor data can be processed at the edge. This AI edge computing can allow for local control, or it can reduce the amount of data sent to the cloud and the frequency of that communication.

Keep in mind that LLMs are still large, so very small microcontrollers may lack the space or computational performance to handle many models. However, even these smaller platforms may be sufficient for this type of reduced LLM or even other ANN models that might have similar computational and storage reductions.

As with most software solutions, designers and programmers need to consider what fits within the limits of the available resources. Many models won’t fit these constraints, but more solutions can now work on a given set of hardware. Likewise, AI hardware acceleration may make other applications practical. The amazing thing these days is the amount of research going into improving the software enough to make it practical on existing hardware.

About the Author

William G. Wong | Senior Content Director - Electronic Design and Microwaves & RF

I am Editor of Electronic Design, focusing on embedded, software, and systems. As Senior Content Director, I also manage Microwaves & RF and work with a great team of editors to provide engineers, programmers, developers, and technical managers with interesting and useful articles and videos on a regular basis. Check out our free newsletters to see the latest content.

You can send press releases for new products for possible coverage on the website. I am also interested in receiving contributed articles for publishing on our website. Use our template and send it to me along with a signed release form.

Check out my blog, AltEmbedded, on Electronic Design, as well as my latest articles on this site.


I earned a Bachelor of Electrical Engineering at the Georgia Institute of Technology and a Master’s in Computer Science from Rutgers University. I still do a bit of programming using everything from C and C++ to Rust and Ada/SPARK. I do a bit of PHP programming for Drupal websites and have posted a few Drupal modules.

I still get hands-on with software and electronic hardware. Some of this can be found in our Kit Close-Up video series. You can also see me in many of our TechXchange Talk videos. I am interested in a range of projects, from robotics to artificial intelligence.
