I have a small problem with reading GIF files. To determine whether a specific file is a GIF, I open it in binary mode (fopen) and read the first few bytes (fread), which must contain the characters "GIF" for the file to be a valid GIF. This is exactly what I get with an "English" filename. However, if that same file is renamed to a "Chinese" name, the very same first bytes are completely different! What I mean is the following:
Reading the GIF file with the English filename yields the following for the first 3 bytes:
71 73 70 (the ASCII codes for the characters G, I, F; 0x47 0x49 0x46 in hex)
Reading the very same GIF file with a Chinese filename yields the following for the first 3 bytes:
228 99 24 (nothing meaningful)
I can't find a connection between the two, or even a reason for these values to look like that, because when I open both files with vi I can see GIF at the top, but not when I use fread. The only way around this I can think of is to copy the Chinese-named file somewhere else under an English name (that would give me the GIF characters), but that's way too expensive... Any ideas?
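For reference, the check described above could be sketched roughly like this. It is only an illustration, not the poster's actual code: the function names has_gif_signature and file_is_gif are made up for this example. It reads into unsigned char so that values above 127 are not reported as negative numbers, and it checks the result of fopen, since fopen returns NULL on failure and nothing meaningful can be read after a failed open.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf (len bytes long) starts with
   the ASCII characters "GIF", otherwise 0. */
int has_gif_signature(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= 3 && memcmp(buf, "GIF", 3) == 0;
}

/* Hypothetical helper: returns 1 for a GIF, 0 for anything else,
   and -1 if the file could not be opened at all. */
int file_is_gif(const char *path)
{
    unsigned char header[3];
    size_t got;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL) {
        return -1; /* open failed; there is nothing valid to read */
    }
    got = fread(header, 1, sizeof header, fp); /* number of bytes actually read */
    fclose(fp);
    return has_gif_signature(header, got);
}
```

Calling file_is_gif(aPath) and treating -1 separately from 0 makes it possible to tell "the file could not be opened" apart from "the file opened but is not a GIF".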
I think it has to do with the character encoding. Maybe files with Chinese filenames are automatically treated as UTF-8 or some other Chinese encoding.
How do you use fread? Do you read bytes or characters?
I read bytes. Do you mean the OS itself is responsible for matching a file with a Chinese filename to its appropriate encoding? I renamed an English-named GIF file to a Chinese name, and that definitely didn't change the data in the file itself, yet I get the "GIF" characters when I fread the English-named file but not the Chinese-named one. And how come vi, an editor under Linux, displays "GIF" when used to open both files?
Code:
    FILE* fd = fopen(aPath, "rb");
    char* tBuffer;
    tBuffer = (char*)malloc(4);
    // copy the bytes into the buffer.
    fread(tBuffer, 1, 4, fd);
I haven't programmed in C for many years. But can't you use a byte array for reading the data? You read chars! Maybe the compiler (or OS) is "intelligent" enough to apply UTF (or some other encoding) automatically when reading chars from a file.
There isn't even a real definition of how wide a char is in C. It may be 8 bits, but it could also be 32 bits, depending on the compiler and OS.
I think the only guarantee is byte <= word <= int <= long, and char???
So here are some ideas:
- use byte instead of char
- use malloc(4*sizeof(byte))
- print the values of the read bytes
- print the value of sizeof(char) // maybe it is not 1
If all that does not help, you could post more complete code.
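The checks in the list above could be tried with something like the following. It is a rough sketch under a few assumptions, not a diagnosis: the function names read_leading_bytes and dump_leading_bytes are made up for this example, and since C has no built-in byte type, unsigned char stands in for it. Note that in standard C sizeof(char) is 1 by definition and the number of bits in a char is given by CHAR_BIT from <limits.h>, so printing both shows what the platform actually uses.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: reads up to max bytes from path into out.
   Returns the number of bytes read, or 0 if the file could not be opened. */
size_t read_leading_bytes(const char *path, unsigned char *out, size_t max)
{
    size_t got;
    FILE *fp;

    assert(out != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL) {
        perror(path); /* report why the open failed */
        return 0;
    }
    got = fread(out, 1, max, fp);
    fclose(fp);
    return got;
}

/* Hypothetical helper: prints the size of char and the first bytes of
   the file in decimal and hex. */
void dump_leading_bytes(const char *path)
{
    unsigned char buf[4];
    size_t i;
    size_t got = read_leading_bytes(path, buf, sizeof buf);

    printf("sizeof(char) = %u, CHAR_BIT = %d\n", (unsigned)sizeof(char), CHAR_BIT);
    for (i = 0; i < got; i++) {
        printf("byte %u: %u (0x%02X)\n", (unsigned)i, (unsigned)buf[i], (unsigned)buf[i]);
    }
}
```

Running dump_leading_bytes on both the English-named and the Chinese-named file and comparing the output (including any error printed by perror) should show whether the two reads really differ, or whether one of the opens failed.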
Indeed! I took it for granted that C's char is 8 bits, and that was exactly the problem; it isn't. I tried reading actual bytes instead, and now it works. I always thought char was 8 bits... Thanks for the tip!