Where can I find a mapping of Identity-H encoded characters to ASCII or Unicode characters?
It is not always possible to extract text from a PDF, especially when the /ToUnicode map is missing, as pointed out by mkl. If you cannot copy and paste the correct text from Acrobat, you have very little chance of extracting it yourself, and it is very unlikely that any other tool can extract it correctly. If you manually create an encoding table, you could use it to remap the extracted characters to their correct values, but this will most likely only work for this one document. Often this is done on purpose: I have seen documents that randomly remap characters differently for each font in the document. It is used as a form of obfuscation, and the only real way to extract text from these PDFs is to resort to OCR. Ther

Categories : Pdf

Invisible characters - ASCII
How a character is represented is up to the renderer, but the server may also strip out certain characters before sending the document. You can also have untitled youtube videos like https://www.youtube.com/watch?v=dmBvw8uPbrA by using some kind of UTF character. The code block below should contain that character... ‌‌ I don't know how to identify what it is though :(
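One way to identify an invisible character is to print the code point and Unicode name of every character in the string. A minimal Python sketch, assuming the text has already been read into a string; `describe` is a hypothetical helper name, and U+200C is used only as a sample input:

```python
import unicodedata

def describe(text):
    """Return a (code point, Unicode name) pair for every character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# Sample input: an invisible zero-width character between two letters.
for code, name in describe("a\u200cb"):
    print(code, name)
```

Characters without a Unicode name (most control characters, for example) show up as `<unnamed>`, but their code point is still printed.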

Categories : PHP

How to get Vim to highlight non-ascii characters?
Using a range in a [] character class in your search, you ought to be able to exclude the ASCII hexadecimal character range, thereby highlighting (assuming you have hlsearch enabled) all other characters lying outside the ASCII range: /[^\x00-\x7F] This will do a negative match (via [^]) for characters between ASCII 0x00 and ASCII 0x7F (0-127), and appears to work in my simple test. For extended ASCII, of course, extend the range up to \xFF instead of \x7F using /[^\x00-\xFF]. You may also express it in decimal via \d: /[^\d0-\d127] If you need something more specific, like exclusion of non-printable characters, you will need to add those ranges into the character class [].

Categories : Regex

Why are non-ASCII characters not equal?
You can use String#force_encoding to force a string into a specified encoding:

    2.0.0-p195 :001 > "\xB1".encoding
     => #<Encoding:UTF-8>
    2.0.0-p195 :002 > eight_bit = "\xB1".force_encoding(Encoding::ASCII_8BIT)
     => "\xB1"
    2.0.0-p195 :003 > eight_bit.encoding
     => #<Encoding:ASCII-8BIT>
    2.0.0-p195 :004 > eight_bit == "\xB1"
     => false
    2.0.0-p195 :005 > eight_bit.force_encoding(Encoding::UTF_8) == "\xB1"
     => true
    2.0.0-p195 :006 > eight_bit.force_encoding("\xB1".encoding) == "\xB1"
     => true

Note that the default encoding for Ruby 2.0.0 is UTF-8.

Categories : Ruby

how to convert from decimal to ASCII characters
You are very close: all you need is a cast instead of a call of Integer.toString - private static String unmangle(String word) { String newTemp = word.substring(word.indexOf('%')+1); char hex = (char)hexToInt(newTemp); word=word.replace("%", ""); word=word.replace("+", " "); return word = word.replace(newTemp, "")+ hex; }

Categories : Java

How to read ASCII control characters from INI
I think the problem is in how you write to the INI file. Write it with this code:

    // Writing to the INI file
    string pFileName = "test.ini";
    string W_Text = "[" + pSection + "]" + Environment.NewLine + pKey + "=" + pDefValue;
    System.IO.File.WriteAllText(pFileName, W_Text, Encoding.UTF8);

You can then read it back with this code:

    // Reading from the INI file
    pFileName = "test.ini";
    IniDocument W_Ini = new IniDocument(pFileName, IniFileType.WindowsStyle);
    IConfigSource W_Source = new IniConfigSource(W_Ini);
    return W_Source.Configs[pSection].Get(pKey);

Categories : C#

Characters outside of ASCII are not displayed properly
Specify UTF-8 encoding in your HTML file. Here are some ways. Check if your JavaScript file is really UTF-8-encoded (see also this question).

Categories : Javascript

ASCII characters in ElasticSearch with Tire
The error seems to be coming from Elasticsearch, which trips on the invalid JSON received. In general, Tire handles accented characters in searches just fine: # encoding: UTF-8 require 'tire' s = Tire.search do query { string 'Žluťoučký' } end p s.results You should enable the Tire logging with: Tire.configure { logger STDERR, level: "debug" } or with the Rails logger, to find the offending JSON, debug it, and possibly post more information here.

Categories : Ruby On Rails

T-SQL: Validate nvarchar ASCII characters
You can create a validation function CREATE FUNCTION [dbo].[NVarChar_Validate] ( @@Value [nvarchar](max), @@Min [int], @@Max [int] ) RETURNS [bit] AS BEGIN DECLARE @Index [int] SET @Index = 1 WHILE @Index <= LEN(@@Value) AND UNICODE(SUBSTRING(@@Value, @Index, 1)) BETWEEN @@Min AND @@Max SET @Index = @Index + 1 RETURN CASE WHEN @Index > LEN(@@Value) THEN 1 ELSE 0 END END GO And then add a check constraint to the table, for example: ALTER TABLE [dbo].[TableToControl] ADD CONSTRAINT [CK_NVarChar_Validate] CHECK ( [dbo].[NVarChar_Validate]([FieldToControl], 33, 127) = 1 ) As a result all fields will be guaranteed to have chars from 33 to 127 only.

Categories : Sql Server

How to test for non ASCII characters in a file name
This should probably do the trick:

    foreach (char c in s)
    {
        if (c >= 128)
        {
            // Response.Write has no format-string overload, so format first
            Response.Write(string.Format("Non-ASCII char detected: {0}", c));
        }
    }

I believe that Encoding.ASCII.GetBytes converts to ASCII first, so you should never see non-ASCII characters when you call that.

Categories : C#

How can you display Non-Ascii characters in Rails?
I guess you are not interested in this anymore, but this is the solution I found, in case anyone else needs it:

    value = "allá"
    value.force_encoding('UTF-8').html_safe

With that, the controller no longer raises an exception when trying to render the view :)

Categories : Ruby On Rails

Python3 : unescaping non ascii characters
You seem to misunderstand encodings. To be protected against common errors, we usually encode a string when it leaves our application and decode it when it comes in. First, let's look at the documentation for unicode_escape, which states: "Produce[s] a string that is suitable as Unicode literal in Python source code." Here is what you would get from the network or a file that claims its contents are Unicode-escaped:

    b'\\u20ac\\n'

Now, you have to decode this to use it in your app:

    >>> s = b'\\u20ac\\n'.decode('unicode_escape')
    >>> s
    '€\n'

and if you wanted to write it back to, say, a Python source file:

    with open('/tmp/foo', 'wb') as fh:  # binary mode
        fh.write(b'print("' + s.encode('unicode_escape') + b'")')
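The decode-on-the-way-in, encode-on-the-way-out round trip can be checked in a few lines. This is only a sketch, assuming the incoming bytes really are Unicode-escaped ASCII; the variable names are illustrative:

```python
# Bytes as they might arrive: a literal backslash, 'u', '2', '0', 'a', 'c',
# then a literal backslash and 'n' (eight bytes in total, all ASCII).
raw = b"\\u20ac\\n"

# Decode at the boundary: escape sequences become real characters.
text = raw.decode("unicode_escape")

# Encode again when the data leaves the application.
back = text.encode("unicode_escape")

print(repr(text))
print(back == raw)
```

After decoding, `text` holds two characters (the euro sign and a newline), and encoding it again reproduces the original eight bytes.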

Categories : Python

Removing non-ascii characters in a csv file
If you really want to strip it, try: import unicodedata unicodedata.normalize('NFKD', title).encode('ascii','ignore') * WARNING THIS WILL MODIFY YOUR DATA * It attempts to find a close match - i.e. ć -> c Perhaps a better answer is to use unicodecsv instead. ----- EDIT ----- Okay, if you don't care that the data is represented at all, try the following: # If row references a unicode string b.create_from_csv_row(row.encode('ascii', 'ignore')) If row is a collection, not a unicode string, you will need to iterate over the collection to the string level to re-serialize it.
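As a Python 3 sketch of the normalize-then-drop approach: `to_ascii` is a hypothetical helper name, and decoding the result back to `str` is an addition here so that the function returns text rather than bytes.

```python
import unicodedata

def to_ascii(text):
    """Lossy conversion: decompose accented letters, then drop anything non-ASCII."""
    # NFKD splits a character such as 'ć' into 'c' plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # 'ignore' silently discards every code point that ASCII cannot represent.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("ćafé"))
```

Characters with no ASCII-compatible decomposition (CJK text, for instance) disappear entirely, which is why the warning about modifying your data applies.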

Categories : Python

Get properties of a file whose name contains special (non-ASCII) characters
As you are on Windows, you could try the ntpath module instead of os.path: from ntpath import getmtime As I don't have Windows, I can't test it. Every OS has a different path convention, so Python provides a specific module for each of the most common operating systems. (Note that on Windows os.path is already ntpath, so this may not change the behavior.)

Categories : Python

ANTLR3 does not match extended ASCII characters
ANTLR 3.x through 4.0 can match any UTF-16 code unit except U+FFFF. ANTLR 4.1 will be able to match U+FFFF as well. To match characters in the range U+10000 to U+10FFFF, you'll need to explicitly encode them as UTF-16 surrogate pairs in your grammar.

Categories : Ant

C Programming - ascii for windows "unknown" characters
By console do you mean cmd.exe? It doesn't handle Unicode well, but you can get it to display "ANSI" characters by changing the display font to Lucida Console and changing the code page from "OEM" to "ANSI." By the choice of characters you seem to be Western European, so try giving this command before running your application: chcp 1252 If you want to try your luck with UTF-8 output use chcp 65001 instead.

Categories : C

Regex for a (twitter-like) hashtag that allows non-ASCII characters
#([^#]+)[\s,;]* Explanation: This regular expression will search for a # followed by one or more non-# characters, followed by 0 or more whitespace characters, commas or semicolons.

    var input = "#hasta #mañana #babהַ";
    var matches = input.match(/#([^#]+)[\s,;]*/g);

Result: ["#hasta ", "#mañana ", "#babהַ"] EDIT - Replaced \b for word boundary

Categories : Javascript

Is there a variable in python that holds all ASCII characters?
You can make one.

    ASCII = ''.join(chr(x) for x in range(128))

If you need to check for membership, there are other ways to do it:

    if c in ASCII:
        # c is an ASCII character

    if c <= '\x7f':
        # c is an ASCII character

If you want to check that an entire string is ASCII:

    def is_ascii(s):
        """Returns True if a string is ASCII, False otherwise."""
        try:
            s.encode('ASCII')
            return True
        except UnicodeEncodeError:
            return False
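On Python 3.7 and later, the whole-string check is also available as the built-in `str.isascii()`, and the standard `string` module ships a few ready-made ASCII subsets. A short sketch; the constant name `LETTERS_AND_DIGITS` is illustrative:

```python
import string

def is_ascii(s):
    """True if every character in s is below U+0080 (needs Python 3.7+)."""
    return s.isascii()

# The string module has constants for common ASCII subsets,
# though not one covering all 128 characters.
LETTERS_AND_DIGITS = string.ascii_letters + string.digits

print(is_ascii("hello"), is_ascii("mañana"))
```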

Categories : Python

How can I use $.ajax to POST accented characters (eg. > ASCII 127)?
I believe that your example works fine. You've already told the browser to treat this page as UTF-8 in the <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> tag, meaning it will submit the form data as UTF-8. Percent-encoding is irrelevant here as you're sending the character as part of the http body (as multipart/form-data). The characters you see in Chrome's developer tools are a result of the dev tools treating your single-character é as two single-byte characters, rather than as a two-byte UTF-8 character. Your server should properly read the two bytes as a single character, not as two characters like chrome dev tools does. (I just tested your example using a Node.js server, and the server properly interpreted the form body as UTF-8, returning "é", as inpu

Categories : Jquery

Is there a way to construct a RegEx to exclude ASCII characters above or below a certain value?
AFAIK you can only do ranges on sets of characters (a-z, A-Z, 0-9), not ranges of ASCII representations. I believe the way to go about this would be to transform the entire string to ASCII, split or match on (&#[0-9]{2,3};), then test each. Theoretically, you could just write a regex that explicitly matches each and every one of the 256 valid possibilities. I don't think you'd want to, though, which raises the question: why regex in the first place?
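For what it's worth, many regex engines do accept escaped code points as the endpoints of a character-class range, so a check by character value can be written directly. A minimal Python sketch, assuming printable ASCII (0x20 to 0x7E) is the allowed range; the bounds and the function name are illustrative, not taken from the question:

```python
import re

# Zero or more characters whose code points lie between 0x20 and 0x7E.
ALLOWED = re.compile(r"[\x20-\x7e]*")

def only_allowed(text):
    """True if every character of text falls inside the allowed range."""
    return ALLOWED.fullmatch(text) is not None

print(only_allowed("Hello, world!"))
print(only_allowed("café"))
```

Changing the two hexadecimal endpoints is enough to exclude characters above or below any other value.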

Categories : C#

How to convert ASCII characters to string in Android?
€ is not an ASCII character. Instead, you are probably using Windows-1252 encoding, or similar, on your source files. There the € is mapped to character 128, which in Unicode is a control character. Edit: as it is apparent that the text is loaded from the internet and cannot be modified, the following code could work:

    InputStreamReader reader = new InputStreamReader(
            yourInputStream, "windows-1252"); // or whatever seems to be the correct encoding
    StringBuilder builder = new StringBuilder();
    int c;
    while ((c = reader.read()) != -1) {
        builder.append((char) c); // read() returns an int, so cast it back to a char
    }
    reader.close();
    String string = builder.toString();
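The difference between the two interpretations of byte 128 is easy to see in a short Python sketch (Python calls the Windows-1252 codec `cp1252`; the variable names are illustrative):

```python
import unicodedata

data = b"\x80"  # the single byte 128

as_cp1252 = data.decode("cp1252")    # Windows-1252 maps 0x80 to the euro sign
as_latin1 = data.decode("latin-1")   # ISO 8859-1 maps 0x80 to U+0080

print(as_cp1252)
print(unicodedata.category(as_latin1))  # "Cc" is the control-character category
```

So the same byte is a euro sign or an invisible control character depending only on which encoding the reader is told to use.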

Categories : Java

Is CharacterEncodingFilter in spring filter Non Ascii characters?
CharacterEncodingFilter sets the character encoding on the request and response; it does not filter any input or output. It is called a filter because it implements the http://docs.oracle.com/javaee/5/api/javax/servlet/Filter.html interface.

Categories : Spring

How to output extended ascii characters using Oracle utl_file
You haven't said what your database character set is, and thus whether it's legitimate to have 'extended ascii' (probably 8859-1, with chr(235) in this case) in a string, or if this is just a demo. Either way, I think, your problem is trying to implicitly convert a non-unicode string. ë is code point EB, which is also UTF-8 C3 AB. You're getting separate characters à (code point C3) and « (code point AB). So it can't do a direct translation from chr(235), which is 0x00EB, to U+00EB. It seems to be going via the UTF-8 C3 AB as two separate characters. I'm not going to try to understand exactly why... You can either use the convert function: l_file := utl_file.fopen('OUT', 'a.txt', 'w'); utl_file.put_line(l_file, convert('Rosëttenville', 'WE8ISO8859P1', 'UTF8')); ... or, as use of
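The two-characters-instead-of-one effect described above can be reproduced outside the database. A small Python sketch, assuming the file is written as UTF-8 and then read back as ISO 8859-1 (Latin-1); the variable names are illustrative:

```python
text = "Ros\u00ebttenville"           # ë is U+00EB

utf8_bytes = text.encode("utf-8")      # ë becomes the two bytes C3 AB
misread = utf8_bytes.decode("latin-1") # each byte is read as its own character

print(utf8_bytes)
print(misread)
```

Each of the two UTF-8 bytes lands on a separate Latin-1 character, which is exactly the Ã followed by « pattern.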

Categories : Oracle

Converting lower & upper case ASCII characters
The simplest way is probably to use the % modulus operator: int letter_add = ((input.at(i) - 'a' + cmd_int) % 26) + 'a'; You'll need a symmetrical line for capital letters (or just make the 'a' a variable too).
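The same modulus idea, with the base letter turned into a variable so that one function covers both cases, might look like this in Python. It is a sketch only; `shift_letter` and `shift_text` are hypothetical names, and non-letters are passed through unchanged as an assumption:

```python
def shift_letter(ch, offset):
    """Rotate an ASCII letter by offset positions, wrapping within its own case."""
    if "a" <= ch <= "z":
        base = ord("a")
    elif "A" <= ch <= "Z":
        base = ord("A")
    else:
        return ch  # leave digits, spaces and punctuation untouched
    return chr((ord(ch) - base + offset) % 26 + base)

def shift_text(text, offset):
    return "".join(shift_letter(ch, offset) for ch in text)

print(shift_text("xyz", 3))
```

Python's `%` always returns a non-negative result for a positive modulus, so negative offsets wrap correctly too; in C++ a negative left operand would need extra care.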

Categories : C++

JQuery mobile and ASCII characters in input text
I had a similar problem some time back. I don't think you can rely on simulating keypress (which is what I think you're trying to do). keypress will not happen if it's not trusted, and specifying which key to simulate might not work on some browsers. But here's how that's achieved:

    $("button").click(function () {
        var e = jQuery.Event("keydown");
        e.which = 50;
        $("textarea").trigger(e);
    });

But this doesn't work on Chrome, so I suggest you try injecting the value through code. I set up a fiddle for you for this situation. Here's the markup:

    <textarea></textarea>
    <button class="special" data-char="~">Insert tilde</button>

and the JS:

    $(".special").click(function () {
        var char = $(this).data("char");
        $("textarea").val(function () { return

Categories : PHP

regular expression to match basic ascii characters
Seems to be working fine. For a string containing Unicode:

    "sdfs \u2022" // "sdfs •"

For matching only the ASCII part of the string:

    "sdfs \u2022".match("[\u0000-\u007F]*") // "sdfs "

But if you need to check that the string is composed of only ASCII:

    "sdfs \u2022".match("^[\u0000-\u007F]*$") // null

For a string not containing Unicode:

    "sdfs ".match("^[\u0000-\u007F]*$") // "sdfs "

Categories : Javascript

SQL Full Text Indexing, ASCII control characters
I think I figured out the issue. On investigating the full text crawl log I found the database size was reached (it is express edition). After doing some clean up all the rows are being returned properly. The link that helped me in troubleshooting: http://technet.microsoft.com/en-us/library/ms142495(v=sql.105).aspx

Categories : Sql Server

Converting byte array containing ASCII characters to a String
Use new String(myByteArray, "UTF-8"); String class provides a constructor for this. Side note:The second argument here is the CharSet(byte encoding) which should be handled carefully. More here.

Categories : Java

How to match spaces and ascii characters in .htaccess file
First of all, I wouldn't use spaces or special characters in my URLs. Maybe try trimming them so you have thelma-louise or thelma+louise, but that's my personal experience. Second, you rewrite to your ID with the name of the movie. Can't you do it like this: movie/([0-9a-zA-Z-]+)-([0-9]+) movie.php?id=$2 so it looks something like movie/thelma-louise-101? Lots of movies have the same name, and this way you know that the IDs are at least integers. Don't forget to check in PHP, of course.

Categories : PHP

Python: Replace non ascii characters in a list of strings
    >>> mylist = ["apple", "samsung", "toshiba", "Don’t know", "Can’t recall"]
    >>> [item.replace('\xe2\x80\x99', "'") for item in mylist]
    ['apple', 'samsung', 'toshiba', "Don't know", "Can't recall"]

If all the items are already unicode:

    >>> mylist = [u"apple", u"samsung", u"toshiba", u"Don’t know", u"Can’t recall"]
    >>> [item.replace(u'’', u"'") for item in mylist]
    [u'apple', u'samsung', u'toshiba', u"Don't know", u"Can't recall"]
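The sessions above are Python 2. In Python 3 every `str` is already Unicode, so the byte-sequence form is not needed; a minimal sketch, with a hypothetical function name:

```python
def straighten_apostrophes(items):
    """Replace the right single quotation mark (U+2019) with an ASCII apostrophe."""
    return [item.replace("\u2019", "'") for item in items]

mylist = ["apple", "samsung", "Don\u2019t know", "Can\u2019t recall"]
print(straighten_apostrophes(mylist))
```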

Categories : Python

Send string with non-ascii characters out serial port c#
Can't be certain, but I'll bet your device is expecting 1-byte chars, but the C# char is 2 bytes. Try converting your string into a byte array with Encoding.ASCII.GetBytes(). You'll probably also need to return the byte[] array instead of a string, since you'll end up converting it back to 2 byte chars. using System.Text; // ... public static byte[] EmbedDataInString(string Cmd, byte Data) { byte[] ConvertedToByteArray = new byte[Cmd.Length + 2]; System.Buffer.BlockCopy(Encoding.ASCII.GetBytes(Cmd), 0, ConvertedToByteArray, 0, ConvertedToByteArray.Length - 2); ConvertedToByteArray[ConvertedToByteArray.Length - 2] = Data; /*Add on null terminator*/ ConvertedToByteArray[ConvertedToByteArray.Length - 1] = (byte)0x00; return ConvertedToByteArray; } If your dev

Categories : C#

Mysql replace all special unicode characters with their ascii counterpart
An sql fiddle example is at http://www.sqlfiddle.com/#!2/c1d90/1/0 the query to select is select * from test where maintext rlike '[^x00-x7F]' Hope this helps

Categories : Mysql

Converting WAV file audio input into plain ASCII characters
ASCII characters like Az09 are only a portion of the ASCII Table. WAV files like any other file is stored and accessed in bytes. 1 byte has 256 different values. Therefore one can't simply convert bytes into Az09 since there are not enough Az09 characters. You'll have to find a library which opens WAV files and creates the wave format for you. In relation to the wave's intensity and length, a chain of Az or Az09 characters can be produced. I believe you're trying to convert the wave to a series of notes. That's possible too, using the same approach.

Categories : C++

Iterating through const unsigned char * containing "Extended ASCII" characters
Rather than using a TEXT column, I would recommend using a BLOB column where the data contains an array of integers of whatever size you want to use (perhaps 16-bit unsigned). You can use sqlite3_column_bytes() to determine the size of the column value, allowing for variable-length columns to be used. This will avoid the complexity you are currently facing.

Categories : IOS

pass no-ascii characters to the tr() internationalization method of PyQt4 with python3
You need to use trUtf8(). tr() is for ASCII strings. (its signature is const char * sourceText, ...) QDialog.trUtf8(QObject(), 'abcγδε')

Categories : Python

Why Xcode C compiler does not display properly some of the ASCII table characters?
The ASCII character 3 is a control character called "end of text." It does not stand for a heart symbol. You may think that it does because the PC console generated a heart-suit symbol when a program tried to print that character. There's no reason why a modern system should follow the same convention, although the console emulator of Windows cmd.exe might still do it. If you want to output a heart-suit symbol in a modern environment, you should use Unicode, for example: printf("%s", "\u2665");

Categories : C

Emacs lisp: Translate characters to standard ASCII transcription
There is no built-in capability that I know of. I wrote a package, unidecode, specifically for your task. It uses the same approach as the Python library of the same name. To install it, just add the MELPA repository to your repository list: (add-to-list 'package-archives '("melpa" . "http://melpa.milkbox.net/packages/") t) Then run M-x package-install RET unidecode. unidecode has 2 functions: unidecode-unidecode, which turns Unicode into ASCII, and unidecode-sanitize, which discards non-alphanumeric characters and transforms spaces into hyphens. ELISP> (unidecode-unidecode "¡Hola!, Grüß Gott, Hyvää päivää, Tere õhtust, Bonġu Cześć!, Dobrý den, Здравствуйте!, Γειά σας, გამარჯობა") "!Hola!, Gruss Gott, Hyvaa paivaa, Tere ohtust, Bongu Czesc!, Dobry den,

Categories : Emacs

Perl beginner: How can I find/replace ascii characters in a file?
Use the \x## notation:

    perl -pi~ -e 's/\x00/*/g' test.txt

To replace each "special" character with its code in brackets, use the /e option:

    perl -pi~ -e 's/([\x0-\x09\x11-\x1f])/"[" . ord($1) . "]"/eg' test.txt

Categories : Perl

Replace non-ASCII characters with SGML entity codes with Emacs
I searched high and low but it seems Emacs (or at least version 24.3.1) doesn't have such a function. Nor can I find it somewhere. Based on a similar (but different) function I did find, I implemented it myself: (require 'cl) (defun html-nonascii-to-entities (string) "Replace any non-ascii characters with HTML (actually SGML) entity codes." (mapconcat #'(lambda (char) (case char (t (if (and (<= 8 char) (<= char 126)) (char-to-string char) (format "&#%02d;" char))))) string "")) (defun html-nonascii-to-entities-region (region-begin region-end) "Replace any non-ascii characters with HTML (actually SGML) entity codes." (interactive "r") (save-excursion (let ((escaped (html-non

Categories : HTML

Printing Extended-Ascii Characters In Python 3 In Both Windows and Linux
There must be a better way, but how about something like this:

    dic = {
        '\\': b'\xe2\x95\x9a',
        '-': b'\xe2\x95\x90',
        '/': b'\xe2\x95\x9d',
        '|': b'\xe2\x95\x91',
        '+': b'\xe2\x95\x94',
        '%': b'\xe2\x95\x97',
    }

    def decode(x):
        return (''.join(dic.get(i, i.encode('utf-8')).decode('utf-8') for i in x))

    print(decode('+------------------------------------%'))
    print(decode('| Hello World! |'))
    print(decode('\\------------------------------------/'))

Windows:

    C:\Temp>python temp.py
    ╔════════════════════════════════════╗
    ║ Hello World! ║
    ╚════════════════════════════════════╝

Linux: $ python3 temp.py
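One possibly simpler variant is to map the placeholder characters straight to the box-drawing code points with `str.translate`, skipping the bytes round trip. This is only a sketch: `BOX` and `boxify` are hypothetical names, and the code points are the ones that the UTF-8 byte sequences above decode to.

```python
# Translation table from ASCII placeholders to box-drawing characters.
BOX = str.maketrans({
    "\\": "\u255a",  # bottom-left corner
    "-": "\u2550",   # horizontal line
    "/": "\u255d",   # bottom-right corner
    "|": "\u2551",   # vertical line
    "+": "\u2554",   # top-left corner
    "%": "\u2557",   # top-right corner
})

def boxify(line):
    """Replace every placeholder character in line with its box-drawing glyph."""
    return line.translate(BOX)

print(boxify("+--------------%"))
print(boxify("| Hello World! |"))
print(boxify("\\--------------/"))
```

Whether the glyphs display correctly still depends on the terminal's encoding and font, as it does with the original approach.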

Categories : Python



© Copyright 2017 w3hello.com Publishing Limited. All rights reserved.