TekSoru

May 22, 2009

Character Confusion

Filed under: Uncategorized — Tags: — admin @ 9:14 am

In the beginning…

…there was ASCII, and it was good.

Computer technology accelerated quickly in the United States, and accordingly so did certain standards. Foremost was the decision to codify the basic unit of data in a byte (1). A byte was large enough to hold all characters in the English language as well as all digits and common punctuation, and still have room left over. In the end, the American Standard Code for Information Interchange (ASCII) was devised to standardize how computers would store and communicate a, b, c, 1, 2, 3…

But anything as useful as a computer could not remain the province of one country or language, so software systems evolved to support people around the world. The big problem was… well… big characters.

English has an amazingly compact alphabet – just 26 characters. Double that to allow for capital and lower case, and toss in digits 0 through 9, and you get a whopping 62 possible combinations before including punctuation. Since a byte can hold 256 different values and ASCII defines only 128 codes, a one-byte-per-character system worked just fine for Americans, using only half of the space available in a single byte.
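The arithmetic here is easy to check. The following is a minimal sketch in Python (the variable names are illustrative only) that counts the English letters and digits and compares the total with what one byte can hold:

```python
import string

# Upper-case letters, lower-case letters, and the ten digits.
alphanumerics = string.ascii_uppercase + string.ascii_lowercase + string.digits
print(len(alphanumerics))  # 62

# A byte has 8 bits, so it can take 2**8 distinct values.
byte_values = 2 ** 8
print(byte_values)  # 256

# ASCII defines 7-bit codes, which is half of a byte's range.
ascii_codes = 2 ** 7
print(ascii_codes / byte_values)  # 0.5
```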

But it didn’t work for the Japanese, Chinese, and a number of cultures around the globe. Depending on the source, the ideographic Chinese language can have upwards of 80,000 distinct characters. Using basic binary math, we see that one byte per character is nowhere near enough: even two bytes offer only 65,536 values, so Chinese computers would need as many as three bytes per character. Add other languages and regional variations, and you had a mess. So different computer manufacturers, standards organizations, and government agencies went forward to solve this problem.
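The "basic binary math" can be sketched in a few lines of Python. The helper name `bytes_needed` is hypothetical, and the sketch assumes a fixed-width scheme in which every character gets its own numeric code:

```python
def bytes_needed(distinct_characters: int) -> int:
    """Smallest whole number of bytes that gives each character its own code."""
    # Codes run from 0 to distinct_characters - 1; bit_length() is the
    # number of bits needed to write the largest code in binary.
    bits = max(1, (distinct_characters - 1).bit_length())
    # Round the bit count up to whole bytes.
    return (bits + 7) // 8

print(bytes_needed(128))     # 1  (ASCII)
print(bytes_needed(65_536))  # 2  (the most two bytes can hold)
print(bytes_needed(80_000))  # 3  (the Chinese estimate above)
```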

Digital Tower of Babble

The nice thing about standards is that you have so many to choose from!

In the rush to support all possible character sets, several different systems for codifying characters came into existence. This of course meant that if you created software on one operating system, it likely would not run on another. This made exporting software an absurd business since basic functions – like sorting strings of characters – would have to change from system to system and language to language.

But time and market dynamics have helped reduce this hodge-podge of character sets to a manageable few, with some obvious choices. Here we document those that really matter.

ASCII

As noted, ASCII is the primordial character set. It serves all English speaking countries, and with common extensions in the extra storage provided in one byte of data, even local variations (such as the British Pound symbol – £ – or common European characters – ö) can be accommodated. By using the spare eighth bit, ASCII was extended to include characters for other languages such as Cyrillic, Arabic, Greek, and Hebrew. If your product will never be sold outside of the US and Western Europe, then ASCII may be sufficient. Just remember, never is a long, long time.
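As an illustration, the sketch below uses two common eight-bit extensions that the article does not name: ISO 8859-1 (Western European) and ISO 8859-5 (Cyrillic), both available as Python codecs. It shows that the extended characters fit in one byte, and that the same byte value means different things under different extensions:

```python
# Extended characters occupy the upper half of the byte (values 128-255).
pound = "£".encode("latin-1")
o_umlaut = "ö".encode("latin-1")
print(pound, o_umlaut)  # b'\xa3' b'\xf6'

# Plain ASCII characters keep their original one-byte codes.
print("A".encode("latin-1") == "A".encode("ascii"))  # True

# The same byte decodes to a different character in the Cyrillic extension.
same_byte_cyrillic = o_umlaut.decode("iso8859_5")
print(same_byte_cyrillic == "ö")  # False
```

This ambiguity is one reason eight-bit extensions work only while text stays within a single region.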

Double byte – a nice concept, but…

Doubling the size of ASCII encoding – from one byte to two – would offer 65,536 possible combinations, as opposed to a mere 256. Though this is not enough to hold all possible character sets of all languages, it would hold enough to make common communications possible (2).

But there was a problem, namely money. Not long ago, computer memory and storage were expensive. Computer programmers constantly searched for ways to economize on storage. This led to a number of half steps toward a universal encoding scheme. Most notable was the multibyte system.

Multi-byte

Programmers, being the slick people they are, devised a complicated way of using as little space as possible for storing characters, yet allowing for language representation from compact English to the full range of Chinese.

However, for the sake of compactness, multi-byte added complexity. A language like Chinese might represent a character in one, two or three bytes depending on its position in a character table. Needless to say, this complicated even simple tasks like scanning text for specific elements, or sorting strings, or even displaying text on screen.

This is not to say that multi-byte systems are rare. The UTF-8 standard is common in systems that were born in the age of ASCII (UNIX being an obvious example). Multilingual web sites are often encoded in UTF-8, which provides both flexibility for supporting many languages as well as compactness in transmitting data across potentially slow internet connections.
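A short sketch of the variable-width behavior, using Python's built-in UTF-8 codec. The sample characters are chosen only for illustration:

```python
samples = ["A", "ö", "中"]

# Each character takes a different number of bytes in UTF-8.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
print(byte_lengths)  # {'A': 1, 'ö': 2, '中': 3}

# Character count and byte count no longer agree.
text = "".join(samples)
encoded = text.encode("utf-8")
print(len(text), len(encoded))  # 3 6

# The third character sits at character index 2 but byte offset 3,
# which is why scanning raw bytes for text elements gets complicated.
print(text.index("中"), encoded.index("中".encode("utf-8")))  # 2 3
```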

The ideal solution would be one where all characters from all languages could be stored in identically sized units (i.e., the same number of bytes regardless of the language in use). Once again, time and market pressures addressed the problem.

Unicode – a double byte standard

As computer storage became cheaper (as everything associated with computers does over time), a more direct way of encoding was needed. Having a uniform character size simplified systems software and application programming, and saved a few grey hairs.

Back in the dark ages of computing – around 1986 – some bright fellows at Xerox started to map the relationships between identical Japanese and Chinese characters to quickly build a font for extended Chinese characters. This humble start led other developers and companies (notably Apple) to drive toward a new standard called Unicode.

As originally designed, Unicode fixes all characters at two bytes and carries the fundamental characters of most languages. (Later versions of the standard grew beyond 65,536 characters, so two-byte encodings such as UTF-16 represent the rarer characters with a pair of two-byte units.) This means one character encoding scheme can store and present text in any language. For example, Unicode contains characters for Latin, Arabic, Cyrillic, (Uni)Han, Hebrew, and more. More to the point, these characters maintain a specific place in character tables, with the original ASCII characters in their original positions (talk about backwards compatibility!).
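Both points can be checked with a small Python sketch. It assumes the UTF-16 encoding (big-endian, without a byte order mark) as the two-byte form, and the sample characters are illustrative:

```python
samples = ["A", "ö", "Я", "中"]

# Each of these common characters occupies exactly two bytes in UTF-16.
utf16_lengths = [len(ch.encode("utf-16-be")) for ch in samples]
print(utf16_lengths)  # [2, 2, 2, 2]

# The first 128 Unicode positions are the original ASCII codes.
ascii_kept = all(
    ord(bytes([code]).decode("ascii")) == code for code in range(128)
)
print(ascii_kept)  # True
print(ord("A"))    # 65, the same value ASCII assigns to 'A'
```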

Unicode has become the de facto standard for most software development. Using Unicode allows a company to develop new applications for English speaking countries first, and rapidly localize for other regions without fundamental coding changes. Unicode is so popular that it is supported by all modern operating systems, including all commercial versions of UNIX, Linux, Windows, and MacOS, as well as by the World Wide Web.

Take a byte

Though the history of character encoding on computers is haggard, the future is clear. Standards have shaken out and your options for new product development are few. It is best to design for internationalization from the start because the effort is little more than if you devolved back to ASCII.

(1) Actually, the most fundamental data element is a ‘bit’, but bits do not by themselves transfer anything that is truly ‘information’.

(2) Unicode, which we discussed earlier, does just this. By eliminating obscure and infrequently used characters, Unicode compacts all industrialized languages into two bytes.

Ian Henderson oversees Rubric and the creation of a better localization experience.

Ian combines a deep knowledge of globalization issues with an equally deep knowledge of technology and distributed team management. This combination of skills has been the foundation of Rubric, and has helped the company achieve an unprecedented 98% customer retention rate and the highest satisfaction ratings in the industry. Ian’s opinion is often reported throughout the localization industry and has appeared in Multilingual Computing & Technology and Software Business.

Prior to Rubric, Ian worked in a variety of management and engineering positions at Siemens (Germany), Expert Software and Phoenix Software (New Zealand) and Berlitz (England). Ian co-founded Rubric in 1994 and has been with the company since its inception.

http://www.Rubric.com

Heat Rises – Re-Using the Heat Generated by Servers

With the green movement getting more and more attention every day, data centers are starting to get more and more flak for the amount of power they use, and companies are looking for ways to start “greening up.” Most of the initiatives that data facilities are taking involve recycling the power within the facility, using energy-efficient servers and cooling the server room more effectively. These efforts are great and lead to less wasted energy and a more efficient system.

But with all the attention focused on saving, little is being paid to making better use of the existing heat generated by the servers.  As it stands, this heat gets wasted.  It’s removed from the facility by whatever cooling system the data center uses.  Most commonly this is in the form of air conditioning or water cooling.

Missed Opportunity

But what if there was another way – what if the heat generated by the servers could be put to better use? There is. The solution lies in taking the heat generated by the servers and putting it toward something useful, rather than simply discarding it into the atmosphere.

Energy is wasted all the time these days.  Companies spend thousands of dollars to remove the heat from their server room, while simultaneously spending more money to heat other parts of the building.  This wasted heat is a missed opportunity in energy efficiency.

There are a few companies that have seen this opportunity and have taken the initiative to put the heat generated by servers to good use.  There is no limit to the potential that this has when harnessed and used productively.

Heat Recirculation

One example of a company using the existing heat from the servers is Quebecor, a Canadian media company. Its data center is located in Winnipeg, where temperatures frequently reach 35 degrees below zero Celsius in the winter. The company found that even during the coldest months of the year, it was spending thousands of dollars cooling its data center, despite the harshly cold temperatures outside. At the same time, it was spending just as much money heating other parts of the building.

It just didn’t make sense. Fortunately, the regional director saw this as an opportunity to reuse the heat generated by the servers, which saves energy all around and reduces costs at the same time. So that’s exactly what he did. Instead of merely cooling the server room, he implemented a system whereby the excess heat was harnessed and pumped into other parts of the building to provide heat.

The heat generated from the servers is used to heat the offices upstairs which host the Winnipeg Sun.  Additional heat is allocated to the warehouse across the street which is often chilly due to the door opening and closing all day.  Whatever heat is left over is put back outside.

Heating a Swimming Pool

Another company in Switzerland had a similar idea of putting heat generated by servers to good use. This company is located about 25 feet underground and 500 feet away from a swimming pool. Instead of wasting the heat generated by the servers, the company decided to pump the heat into the nearby swimming pool in order to provide it with the necessary heat. Using custom-made overhead cooling units, they are able to use the hot exhaust air from 200 servers to heat up the water and then filter it into the pool. This creates a practical use for the excess heat while simultaneously eliminating the need for the pool to use oil as its energy source.

Going Green(house)

The latest example of a data center putting its excess heat to good use is at the University of Notre Dame. Rising energy costs are forcing the local greenhouse to shut down. Currently it costs around $1,000 a day to heat the greenhouse, which politicians have decided is too much to ask taxpayers to pay.

That’s where Notre Dame comes in.  They’ve developed a new type of data center that’s housed in a standard shipping container and placed right next to the greenhouse.  The excess heat from the data center will be pumped into the greenhouse to heat it.  This will reduce the costs of running the greenhouse enough so that it won’t have to shut down.

As an added bonus, the data center will be able to use the humidity generated by the greenhouse in the fall, winter and spring.  This reciprocal relationship benefits not only the greenhouse, but the servers as well.

Future of Heat

Reusing heat from data centers has many applications and lots of potential as an energy-saving initiative. Most companies aren’t thinking along these lines, but they should. Creating more efficient servers and cooling systems, reducing the amount of heat generated, and reusing the heat that is generated are all integral components of an energy-efficient data center. Companies have many options to choose from, and reusing excess heat is yet another way they can help reduce the amount of waste produced in their facility.

Saleh Tousi is the CEO of SmarttNet, a Vancouver IT company offering comprehensive business Internet services including Canada Colocation since
