Auto Simplified-Traditional Chinese Conversion in Search #8398

Closed
opened 2025-12-29 20:36:13 +01:00 by adam · 5 comments

Originally created by @frnksmdlkedjnfr on GitHub (Aug 2, 2023).

NetBox version

c6abd184c2b3 (v3.5.7)

Feature type

Data model extension

Proposed functionality

Treat variants (alternative forms and simplifications) of Chinese characters as the same character when searching.

Use case

This might be a niche request, but it's a real pain for people like me working in a multinational company. We use Traditional Chinese and Simplified Chinese at the same time, because we have employees from China, Hong Kong, and Taiwan. Though all three of these regions use Chinese, each has its own standard for the writing system, much like the orthographic differences between American English and British English.

Examples:
CN / HK / TW
卫 / 衞 / 衛
里 / 裏 / 裡

Technically, all three characters in each row mean the same thing. Each is simply the standard form in its own region. Native speakers can generally read the different variants with no problem, but searching is a serious problem, because the variants have entirely different Unicode code points.
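To make the code-point issue concrete, here is a minimal sketch using only Python's standard library. It prints the code point of each example character and checks whether Unicode NFKC normalization would merge the variants. The `rows` dictionary and the helper names are made up for illustration.

```python
import unicodedata

# The two example rows from above, in CN / HK / TW order.
# The dictionary keys are only labels for this sketch.
rows = {
    "wei": ["卫", "衞", "衛"],
    "li": ["里", "裏", "裡"],
}

def code_points(chars):
    """Return the code point of each character in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in chars]

def nfkc_forms(chars):
    """Return the set of distinct strings left after NFKC normalization."""
    return {unicodedata.normalize("NFKC", c) for c in chars}

for label, chars in rows.items():
    # If normalization merged the variants, the set would shrink to one entry.
    print(label, code_points(chars), "distinct after NFKC:", len(nfkc_forms(chars)))
```

Each row keeps three distinct forms after normalization, so a plain string comparison, even a Unicode-normalized one, sees them as unrelated characters.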

So is it possible to regard these variants as the same character when searching? Supporting materials are easily available on the Internet ([1](https://zh.wikipedia.org/zh-hant/%E7%B0%A1%E7%B9%81%E8%BD%89%E6%8F%9B%E4%B8%80%E5%B0%8D%E5%A4%9A%E5%88%97%E8%A1%A8), [2](https://zh.wikisource.org/zh-hant/%E7%AC%AC%E4%B8%80%E6%89%B9%E5%BC%82%E4%BD%93%E5%AD%97%E6%95%B4%E7%90%86%E8%A1%A8)). I know it's very challenging for a non-native speaker to grasp the concept; just think of it as trying to find 'authorisation' by searching for 'authorization'. Or can you suggest some ways to modify the code myself?

Database changes

None.

External dependencies

None.

adam added the type: feature and topic: Internationalization labels 2025-12-29 20:36:13 +01:00
adam closed this issue 2025-12-29 20:36:14 +01:00

@jsenecal commented on GitHub (Aug 2, 2023):

Honestly, while I do understand the challenge here, I see little value in implementing such functionality (and supporting it) due to its nature. I'm not even sure there is an easy way to support this.

However! You can potentially use custom fields to support this, as they are searchable.
A text field, say "Alternate Typography" could contain the various variants and be searched for.

Let me know if that makes sense @frnksmdlkedjnfr


@frnksmdlkedjnfr commented on GitHub (Aug 3, 2023):

Thanks for the reply. The problem with custom fields is that I already have many custom fields for my assets, and they contain variant characters already.


@jsenecal commented on GitHub (Aug 3, 2023):

> and they contain variant characters already.

It's not working for you? Perhaps there is a different issue here...


@frnksmdlkedjnfr commented on GitHub (Aug 10, 2023):

> > and they contain variant characters already.
>
> It's not working for you? Perhaps there is a different issue here...

It's not that it's not working. The problem is that these variant characters are everywhere. They're in the names, in the descriptions, in the comments, and in custom fields I've already created.

Just imagine there's a circuit named 'British Gaol - American Gaol', and the description goes 'The network centre circuit transmitting prisoners to be transferred by aeroplanes'. A circuit created by a British person obviously.

Then an American user is trying to search this circuit with 'British Jail - American Jail', and he fails because of the different orthography. Then he tries to get it by searching 'network center' and fails again, because 'center' is spelt as 'centre' in the UK.

It's not feasible to create a custom all-information-in-one field that enumerates every possible spelling. One like 'Orthography Proof Field': 'British Gaol - American Gaol|British Jail - American Jail: The network centre circuit transmitting prisoners to be transferred by aeroplanes|The network center circuit transmitting prisoners to be transferred by airplanes'.

And even worse, in Chinese a single character can technically have up to 4 variants. I cannot fathom the number of combinations.

One way that might tackle the problem is, when a character that has variants is in the search input, instead of just returning the results for this one input, return the results of all possible combinations of variant characters.

Example:

  1. Input: 卫生部专线 (Simplified Chinese, meaning dedicated line for the health dept.)

  2. All possible variant combinations: 卫[衞, 衛]生部专[專]线[綫,線]

  3. Enumeration (18 in total):

[卫生部专线, 衞生部专线, 衛生部专线,
卫生部专線, 衞生部专線, 衛生部专線,
卫生部专綫, 衞生部专綫, 衛生部专綫,
卫生部專线, 衞生部專线, 衛生部專线,
卫生部專線, 衞生部專線, 衛生部專線,
卫生部專綫, 衞生部專綫, 衛生部專綫]

  4. Output:
    (search results for all 18 combinations)
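The enumeration step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `GROUPS` is a hand-written table covering just the three characters in the example, and `expand_variants` is a hypothetical helper, not anything that exists in NetBox. A real table would have to be built from material such as the variant lists linked in the original post.

```python
from itertools import product

# Hypothetical variant groups, limited to the characters in the example.
GROUPS = [
    ["卫", "衞", "衛"],
    ["专", "專"],
    ["线", "綫", "線"],
]

# Map every character to the full group it belongs to,
# so lookup works no matter which variant the user typed.
VARIANT_GROUP = {ch: group for group in GROUPS for ch in group}

def expand_variants(query):
    """Return every spelling of `query` obtained by swapping variant characters."""
    # Characters without known variants contribute a single choice: themselves.
    choices = [VARIANT_GROUP.get(ch, [ch]) for ch in query]
    return ["".join(combo) for combo in product(*choices)]

terms = expand_variants("卫生部专线")
print(len(terms))  # 3 * 1 * 1 * 2 * 3 = 18
```

The resulting terms would then be OR-ed together in the search. The count is the product of the group sizes, so it grows quickly for long queries with many variant characters.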

Just as I was writing this, I thought of another solution that might be a lot easier: search with Pinyin (basically the romanisation of Chinese characters). No matter how many variants a character has, they are all pronounced the same (with a few exceptions like '校', which can be pronounced as 'xiào' and 'jiào').

So if the pronunciations of these variant characters are all the same, why not just treat Chinese characters as Pinyin in the search process?

In the example I gave above, all 18 combinations can be broadly transcribed (without the tone markers) as 'WeiShengBuZhuanXian'. So when the user tries to search any of the 18 combinations, they could all be first transcribed into 'WeiShengBuZhuanXian', and then we try to match that transliteration with the assets' data.
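As a rough sketch of that idea, the snippet below transcribes a string into a toneless Pinyin key. The `PINYIN` table is hand-written for this one example and `pinyin_key` is a hypothetical helper. A real implementation would need a complete character-to-Pinyin table, for instance from a third-party package such as pypinyin, which is an assumption and not something NetBox ships.

```python
# Hypothetical, hand-written table covering only the example characters.
PINYIN = {
    "卫": "wei", "衞": "wei", "衛": "wei",
    "生": "sheng",
    "部": "bu",
    "专": "zhuan", "專": "zhuan",
    "线": "xian", "綫": "xian", "線": "xian",
}

def pinyin_key(text):
    """Transcribe known characters to capitalized, toneless Pinyin syllables."""
    parts = []
    for ch in text:
        syllable = PINYIN.get(ch)
        # Characters missing from the table are passed through unchanged.
        parts.append(syllable.capitalize() if syllable else ch)
    return "".join(parts)

print(pinyin_key("卫生部专线"))
print(pinyin_key("衛生部專線"))
```

Both calls produce the same key, which is the point of the approach. The cost is that unrelated characters with the same pronunciation (for example 线 and 现, both 'xian') would also collapse to one key, so this would return false positives.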

But this entails another problem: all the assets have to be 'compressed' together into a huge chunk of text, and then transcribed into Pinyin. Otherwise nothing would match the user input.

So I guess ultimately we need to implement an auto-transliteration function when the user searches with Chinese and assets with Chinese are created. I know it's of little value to you developers, but would you provide some guidance as to how to code this myself?
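For anyone experimenting with this, one possible shape is to normalize both sides with the same function: once when a value is stored or indexed, and once when a query comes in. The sketch below uses a different technique from Pinyin, namely folding each variant to one canonical character, which avoids homophone collisions. Everything here is an assumption for illustration: the `FOLD` table covers only the example characters, and `fold` and `search` are hypothetical names with no connection to NetBox's actual search code.

```python
# Hypothetical folding table: each variant maps to one canonical form
# (the Simplified character is used here, but any consistent choice works).
FOLD = str.maketrans({
    "衞": "卫", "衛": "卫",
    "專": "专",
    "綫": "线", "線": "线",
})

def fold(text):
    """Replace variant characters with their canonical form and ignore case."""
    return text.translate(FOLD).casefold()

def search(records, query):
    """Return the records whose folded text contains the folded query."""
    needle = fold(query)
    # A real system would store the folded value at save time
    # instead of recomputing it for every search.
    return [record for record in records if needle in fold(record)]

records = ["衛生部專線 A", "卫生部专线 B", "財務部專線"]
print(search(records, "衞生部专綫"))
```

Here the mixed-variant query matches the first two records and skips the third. Unlike the enumeration approach, the work per query stays constant no matter how many variant characters it contains.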


@jeremystretch commented on GitHub (Aug 10, 2023):

I'm sorry but this doesn't seem like something we'll be able to commit resources to working on, especially given how few developers actively contribute to NetBox. If you'd like to experiment with an implementation yourself and come up with a workable solution we'll be happy to take a look. Short of that this isn't something we'll be able to support.

Reference: starred/netbox#8398