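Converting a hex string back to text takes two steps: turn each pair of hex digits into a byte, then decode those bytes with a character encoding. Most garbled output comes from getting the second step wrong, so the methods below take the charset as an explicit parameter.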
Method 1: Encoding-Aware Conversion
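```java
import java.nio.charset.Charset;

public static String hexToString(String hex, Charset charset) {
    // Two hex digits encode one byte, so the input length must be even
    if (hex.length() % 2 != 0) {
        throw new IllegalArgumentException("Hex string must have an even length: " + hex);
    }
    byte[] bytes = new byte[hex.length() / 2];
    for (int i = 0; i < hex.length(); i += 2) {
        bytes[i / 2] = (byte) Integer.parseInt(hex.substring(i, i + 2), 16);
    }
    // Decode the collected bytes with the caller-specified encoding
    return new String(bytes, charset);
}
```
This method accepts a charset parameter. It converts the hex pairs to a byte array first, then decodes that array with the specified encoding (UTF-8, ISO-8859-1, UTF-16BE, and so on). Casting each parsed value straight to char would only work for single-byte text, because multi-byte characters span several hex pairs.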
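On Java 17 and later, the standard library's java.util.HexFormat can handle the hex-to-byte step, and it rejects odd-length input and non-hex characters on its own. The following is a minimal sketch that assumes Java 17+. The class name HexFormatDecode is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

// Hypothetical helper class; requires Java 17+ for java.util.HexFormat
final class HexFormatDecode {
    private static final HexFormat HEX = HexFormat.of();

    static String hexToString(String hex, Charset charset) {
        // parseHex reads two case-insensitive hex digits per byte and throws
        // IllegalArgumentException on odd length or non-hex characters
        byte[] bytes = HEX.parseHex(hex);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        System.out.println(hexToString("4a617661", StandardCharsets.UTF_8));
        System.out.println(hexToString("004a006100760061", StandardCharsets.UTF_16BE));
    }
}
```

Both calls in main decode to "Java". The manual loop remains useful on older Java versions.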
Method 2: BigInteger with Custom Encoding
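```java
import java.math.BigInteger;
import java.nio.charset.Charset;

public static String hexToText(String hex, Charset charset) {
    byte[] raw = new BigInteger(hex, 16).toByteArray();
    byte[] bytes = new byte[hex.length() / 2];
    int n = Math.min(raw.length, bytes.length);
    // Right-align: restores dropped leading zeros and discards an extra 0x00 sign byte
    System.arraycopy(raw, raw.length - n, bytes, bytes.length - n, n);
    return new String(bytes, charset);
}
```
This version is also modified to accept a charset parameter. BigInteger.toByteArray() returns a minimal two's-complement array, which causes two problems. First, it drops leading 0x00 bytes. Second, it prepends a 0x00 sign byte when the first byte is 0x80 or higher. Copying the result right-aligned into an array of hex.length() / 2 bytes corrects both. The correction matters for multi-byte encodings such as UTF-16, where input like "004a..." begins with a zero byte. The input must contain an even number of hex digits.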
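BigInteger.toByteArray() returns a minimal two's-complement encoding, so it does not always return one byte per hex pair. The small demo below prints the raw array for two inputs: one with a leading zero byte, and one whose first byte has its high bit set. The class name BigIntegerBytesDemo is hypothetical.

```java
import java.math.BigInteger;
import java.util.Arrays;

final class BigIntegerBytesDemo {
    // Returns the raw two's-complement bytes BigInteger produces for a hex string
    static byte[] rawBytes(String hex) {
        return new BigInteger(hex, 16).toByteArray();
    }

    public static void main(String[] args) {
        // Leading 0x00 is dropped: 4 bytes of hex in, 3 bytes out
        System.out.println(Arrays.toString(rawBytes("004a0061"))); // [74, 0, 97]
        // 0xC0 has its high bit set, so a 0x00 sign byte is prepended: 2 in, 3 out
        System.out.println(Arrays.toString(rawBytes("c0ff")));     // [0, -64, -1]
    }
}
```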
Critical Implementation Details
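- Encoding specification: Always pass the charset explicitly (e.g., StandardCharsets.UTF_8). The constructors without a charset use the platform default, which differs between systems (it is UTF-8 by default since Java 18).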
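- Byte order marks: StandardCharsets.UTF_16 consumes a leading BOM when decoding and assumes big-endian if none is present. UTF-16BE, UTF-16LE, and UTF-8 pass a BOM through as the character U+FEFF, so strip it manually with those charsets, and check UTF-32 input the same way.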
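- Error handling: new String(bytes, charset) never throws on bad input. Instead, it replaces malformed sequences with U+FFFD. Catch NumberFormatException from hex parsing and UnsupportedCharsetException from Charset.forName(). When malformed text must be rejected, decode with a CharsetDecoder set to CodingErrorAction.REPORT. UnsupportedEncodingException is thrown only by the overloads that take the charset name as a String.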
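- Binary safety: For raw byte data, use ISO-8859-1. It maps every byte value from 0x00 to 0xFF to the char with the same code, so bytes round-trip through a String without loss.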
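The error-handling and BOM points above can be combined into a strict decoder. The sketch below assumes two policies. First, invalid input should raise CharacterCodingException instead of producing U+FFFD. Second, a leading U+FEFF should be dropped, although some applications may prefer to keep or reject it. It also assumes Java 17+ for HexFormat. The class name StrictHexDecode is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.HexFormat;

// Hypothetical helper; requires Java 17+ for java.util.HexFormat
final class StrictHexDecode {
    static String decodeStrict(String hex, Charset charset) throws CharacterCodingException {
        byte[] bytes = HexFormat.of().parseHex(hex); // IllegalArgumentException on bad hex
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // Throws CharacterCodingException instead of substituting U+FFFD
        String text = decoder.decode(ByteBuffer.wrap(bytes)).toString();
        // Assumed policy: drop a BOM that survived decoding as U+FEFF
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }
}
```

For example, decodeStrict("efbbbf4a617661", StandardCharsets.UTF_8) should return "Java" with the UTF-8 BOM removed. decodeStrict("c0ff", StandardCharsets.UTF_8) should throw, because 0xC0 never begins a valid UTF-8 sequence.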
Encoding Usage Examples
```java
// UTF-8 for multilingual text
hexToString("4a617661", StandardCharsets.UTF_8);            // "Java"
// ISO-8859-1 for byte preservation
hexToText("c0ff", StandardCharsets.ISO_8859_1);              // "Àÿ" (U+00C0, U+00FF), one char per byte
// UTF-16BE for wide-character encoding
hexToString("004a006100760061", StandardCharsets.UTF_16BE);  // "Java"
```
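Test with values that differ between encodings. For example, £ (U+00A3) is "a3" in ISO-8859-1, "c2a3" in UTF-8, and "00a3" in UTF-16BE. Performance tip: Resolve a Charset once, for example by storing the result of Charset.forName() in a static final field, instead of looking it up by name on every conversion. The StandardCharsets constants are already shared instances.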
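Test vectors like these can be generated instead of typed by hand. Encode the character with each charset and print the resulting bytes as hex. The following is a sketch that assumes Java 17+ for java.util.HexFormat. The class name PoundSignVectors is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

final class PoundSignVectors {
    // Encodes text with the given charset and returns the bytes as lowercase hex
    static String toHex(String text, Charset charset) {
        return HexFormat.of().formatHex(text.getBytes(charset));
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // £
        System.out.println(toHex(pound, StandardCharsets.ISO_8859_1)); // a3
        System.out.println(toHex(pound, StandardCharsets.UTF_8));      // c2a3
        System.out.println(toHex(pound, StandardCharsets.UTF_16BE));   // 00a3
        System.out.println(toHex(pound, StandardCharsets.UTF_16));     // feff00a3 (encoder writes a BOM)
    }
}
```

Feeding each output back through the matching decoder should return "£", which makes a quick round-trip check.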
Common Java Character Encodings Reference
Encoding | Java Constant | Typical Use | Key Characteristics |
---|---|---|---|
UTF-8 | StandardCharsets.UTF_8 | Web applications, multilingual text | Variable-width (1-4 bytes), backward compatible with ASCII |
UTF-16 | StandardCharsets.UTF_16 | Java internal strings, legacy systems | Variable-width (2 or 4 bytes); decoder honors a BOM and assumes big-endian without one, encoder writes a big-endian BOM |
ISO-8859-1 | StandardCharsets.ISO_8859_1 | Binary data preservation | 8-bit encoding, covers Western European languages |
US-ASCII | StandardCharsets.US_ASCII | Basic English text | 7-bit encoding (0x00 to 0x7F); no accented or non-Latin characters |
UTF-16BE | StandardCharsets.UTF_16BE | Network protocols | Big-endian byte order without BOM |
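Windows-1252 | Charset.forName("windows-1252") (alias "Cp1252") | Western Windows systems | Same as ISO-8859-1 except 0x80 to 0x9F, which hold printable symbols such as € and curly quotes instead of control codes |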
GBK | Charset.forName("GBK") | Simplified Chinese text | Variable-width: 1 byte for ASCII, 2 bytes for Chinese characters; superset of GB2312 |
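Note: Prefer StandardCharsets constants over string names wherever one exists. They are guaranteed to be available on every Java platform, need no lookup, and avoid the checked UnsupportedEncodingException of the String-name overloads. For other encodings, such as Windows code pages or GBK, pass the canonical name (or a supported alias) to Charset.forName(). That method throws UnsupportedCharsetException if the running JVM does not provide the charset.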
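Charsets outside StandardCharsets can be looked up once and stored in constants, which also follows the tip about reusing Charset objects. The sketch below assumes one way a caller might handle a JVM that lacks a charset: it checks Charset.isSupported and falls back to ISO-8859-1. That fallback policy is an assumption and may not suit every application; failing fast is often better. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LegacyCharsets {
    // Resolved once at class load; "windows-1252" is the canonical name, "Cp1252" an alias
    static final Charset WINDOWS_1252 = lookup("windows-1252");
    static final Charset GBK = lookup("GBK");

    // Hypothetical fallback policy: use ISO-8859-1 when the JVM lacks the charset
    static Charset lookup(String name) {
        return Charset.isSupported(name) ? Charset.forName(name) : StandardCharsets.ISO_8859_1;
    }
}
```

With Windows-1252 available, new String(new byte[] {(byte) 0x80}, LegacyCharsets.WINDOWS_1252) yields "€". The same byte decoded as ISO-8859-1 yields the control character U+0080.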