Scripted text processing

On my last project at work, my tasks were often poorly defined. I had to build web services that accepted complicated data structures, but I had only one example file to go by. The file was updated from time to time, but it never arrived as valid XML: some people can hardly be convinced that you can’t just write strings instead of numbers, or put a “-” anywhere in the file, and expect it to be valid. So I had to process every file by hand. And by “hand”, I of course mean regexes or Python, just as any geek would when faced with an arduous task.

Geeks vs Non-geeks doing repetitive tasks

I should perhaps note that I do have an exceptionally low ‘boredom’ threshold, so I started scripting right away.

Since the mistakes were always the same, I wanted to load the input file into Notepad++, run a script and be done with it. In other words, I wanted a tool, ideally Notepad++ itself, that would let me write a script that takes the currently opened file, processes it and spits it back out. Convinced that someone must have needed this before me, I asked on SuperUser, without success.

After this particular project was finished, I found that, of course, there actually is a plugin that does exactly what I needed. It is, shockingly, called Python Script, and it provides a neat way to add new scripts and execute them directly from the menu.

Python Script menu

For a quick example, suppose you want to shuffle the letters inside words, like in this famous piece of text:

It dseno’t mtaetr in waht oerdr the ltteres in a wrod are, the olny iproamtnt tihng is taht the frsit and lsat ltteer be in the rghit pclae.

To keep things simple, we assume the words are separated by single spaces, and we won’t cover the corner cases.

Click Python Script | New Script and give the script a name. A new file is created and saved in the plugin’s script folder, and it is immediately available from the Scripts submenu.
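As a rough idea of what could go into that file, here is a minimal sketch of the shuffling logic in plain Python. The function names shuffle_word and shuffle_text are made up for this illustration, and the code assumes exactly the simplification above: words separated by single spaces, no special handling of punctuation.

```python
import random


def shuffle_word(word, rng=random):
    """Shuffle the inner letters of a word, keeping the first and last in place."""
    if len(word) <= 3:
        # At most one inner letter, so there is nothing to shuffle.
        return word
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]


def shuffle_text(text, rng=random):
    """Shuffle every word of a text whose words are separated by single spaces."""
    return " ".join(shuffle_word(word, rng) for word in text.split(" "))
```

Inside Notepad++, the last step would be to read the current document, run it through shuffle_text and write the result back. Assuming the plugin exposes the open document through its editor object, that would be something along the lines of editor.setText(shuffle_text(editor.getText())); check the plugin’s documentation for the exact calls.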

So now, when we start with something like

Fuzzy sheep are great companion animals

we can get to

Fzuzy sehep are geart comianopn amianls

in just one click. This simplified version would break on other whitespace characters, punctuation and more, but it proves the point.

Another great extension is PyNPP, which can run the currently opened Python script directly, or in interactive mode, which is great for quick writing and debugging. Both of these plugins are available directly from the NPP Plugin Manager.

So, all in all, I now have a great toolset, which I’ll hopefully never need again. :)
