Pipes! Fatbergs! Chemistry!

Hey all,

One of my free-time interests is pipes. When I was a kid, maybe 5, my father was an engineer working on the country’s nuclear power plant. His work involved a lot of printing, so whereas other 5-year-olds were colouring pictures of animals, I was colouring cooling system diagrams. I blame this for my fascination with pipes. The odds of someone else sharing this interest are probably exactly zero, but you never know, right?

So, let’s start with something epic. How can sewage be epic you ask? Well, like this:

…this tank is capable of containing 13 gigalitres of sewage (13,000,000,000 l). It fills up when it’s raining (and when there is snow melt), so that’s why the camera is wet. Here’s a construction phase picture of the pipes that lead into it:

Feel sorry for the people who live in that town tho. That much sewage can’t smell nice no matter how well executed the containment is.

The truth is, as counter-intuitive as that may sound, sewage and water don’t mix. From the microbiological perspective, water makes it difficult for oxygen to reach the organic compounds, so their degradation slows, or the sewage becomes infested with microbes that do not use oxygen, producing a foul smell. Thus, as much as I admire the Americans who built this fancy gigalitre storage tank, it’s fundamentally a bad idea. I do understand that they have no other way to really solve the issue though. It’s not like people are going to give up flush toilets to help them do it.

 

One of the easier-to-find videos about the sewers is this one pushed by the BBC:

What you have here is people who have no idea what they are talking about, sounding authoritative, because that’s just the way the BBC rolls. That stuff is not fat, it’s calcium soap.

Some searching online will lead you to several research articles. There are actually only a few (unsurprisingly, not many scientists are interested in what goes on in the sewers). What happens is that when sewage travels too far along the sewage network, it begins to decompose en route to the treatment plant. The microbes growing in it start to produce sulphuric acid, which eats away at the concrete pipes that the sewage is flowing through. This causes calcium and similar minerals to leach from the concrete. When these come into contact with fresh water and fats from buildings further down the network, they produce this calcium soap, which unlike normal soap is not water soluble and deposits in the pipes.

But, you may say, isn’t the BBC still correct in saying that this is because people are dumping oils into the sewers? Well, not really. Fat in the sewers is not just dumped oil, it’s also basically anything you use soap or dish soap for. Research shows that normal domestic outputs like sinks and showers from a single skyscraper contribute enough fat to create a problem. The grease interceptors that they have been promoting as a result of this fatberg problem (which are a good idea by the way, the EU requires them to be installed in basically every parking lot) actually lengthen the amount of time that the fats spend in the water and increase the amount of problematic compounds in the sewage.

In other words, carrying fats is the basic and unavoidable function of sewers, and the BBC is fooling you into believing it’s your fault that the sewers are incorrectly designed. But oh, their glorious engineers from the 60s! Infallible!

 

Didn’t really want to turn this into a rant, but hey it’s my blog! Deal with it!

Just kidding. Hope you learned something. Anything you may be wondering about, feel free to drop a comment.

LP,
Jure

Imapsync on Ubuntu 18.04

Hi,

Imapsync, as you may know, is a tool for copying / transferring / backing up email accounts between two IMAP servers.

I found it interesting that while there are some blogs out there that claim to contain instructions on how to set this up, none of them are actually correct. The original author of imapsync has since removed all traces of any pre-compiled binaries in an effort to focus people on his paid Imapsync service. The only way you can use Imapsync for free is if you compile it yourself. You will need a Linux machine for this.

If you don’t want the hassle of all of this and just want to transfer or backup your mail, go spend 60 €. The author deserves it. If not however, follow the instructions below:

First, make sure you have the tools to be able to follow these instructions:

sudo apt install git make cpanminus

Now open a terminal and download the Imapsync package:

git clone https://github.com/imapsync/imapsync.git

This creates a folder called “imapsync”. Go into it and run the install script:

cd imapsync
sudo make install

This will run a long process, which will at the end tell you exactly what you need to do. In my case it looks like this:

Here is a cpanm command to install missing Perl modules:
cpanm App::cpanminus Authen::NTLM Crypt::OpenSSL::RSA Data::Uniqid Dist::CheckConflicts IO::Tee JSON::WebToken JSON::WebToken::Crypt::RSA Mail::IMAPClient Module::ScanDeps PAR::Packer Parse::RecDescent Readonly Regexp::Common Sys::MemInfo Test::Mock::Guard Test::MockObject Test::Pod Test::Requires Test::Deep Test::Warn Unicode::String

Run the command you are given (do not copy mine), with sudo.

Some of them will most probably fail to install. This is because they depend on system libraries that must be installed with apt. You will most likely need:

sudo apt install libssl-dev libpar-packer-perl

If anything else fails to install, google it. Perl is extremely widespread and instructions are very easy to come by.

To see what else is missing you can re-run this at any time:

sudo make install

Repeat these steps until this no longer yields any errors. At this point your installation is ready and you can start using it.

Before you use it, please be sure to at least glance at the very useful “FAQ” documentation! Most significantly, if you are copying from or to GMail, I highly recommend using the --gmail1 or --gmail2 switch as appropriate (1 when GMail is the source, 2 when it is the destination) and as documented there. This will take care of all of GMail’s quirks, many of which you otherwise need to consider for a successful sync.
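As a rough illustration of what a first run might look like, here is a minimal Python sketch that only assembles the command line and prints it. The helper `build_imapsync_command` is hypothetical (it is not part of Imapsync), and the host names, user names and password file paths are made-up placeholders. The switches used (--host1, --user1, --passfile1, their “2” counterparts, --gmail1/--gmail2 and --dry) are regular Imapsync options, but do check them against the documentation of the version you built.

```python
import shlex
from typing import List, Optional


def build_imapsync_command(
    host1: str, user1: str, passfile1: str,
    host2: str, user2: str, passfile2: str,
    gmail_side: Optional[int] = None,
    dry_run: bool = True,
) -> List[str]:
    """Assemble an imapsync argument list (hypothetical helper, not part of imapsync)."""
    if gmail_side not in (None, 1, 2):
        raise ValueError("gmail_side must be None, 1 (source) or 2 (destination)")
    cmd = [
        "imapsync",
        "--host1", host1, "--user1", user1, "--passfile1", passfile1,
        "--host2", host2, "--user2", user2, "--passfile2", passfile2,
    ]
    if gmail_side is not None:
        # --gmail1 applies the GMail presets to the source, --gmail2 to the destination
        cmd.append(f"--gmail{gmail_side}")
    if dry_run:
        # --dry only reports what would be copied, nothing is transferred
        cmd.append("--dry")
    return cmd


# Placeholder accounts: GMail as the source, some other IMAP server as the destination.
command = build_imapsync_command(
    "imap.gmail.com", "me@gmail.com", "/home/me/.gmail_pass",
    "imap.example.org", "me@example.org", "/home/me/.example_pass",
    gmail_side=1,
)
print(shlex.join(command))
```

Keeping the dry run as the default means the first attempt cannot touch either mailbox. Once the printed command looks right, build it again with dry_run=False and run that.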

 

My experience with the other installation instructions was that it could result in a slightly broken install that caused Imapsync to use a lot of CPU (100%) and work at the rather minimal speed of about 0.01 messages per second. The other procedures also trash up the system with various libraries that are going to be set as manually installed and will not be automatically removed should you ever choose to uninstall some of this stuff.

In any event, if you have a few gigabytes of emails it will take a few hours, so either run it on a server in the background or leave it alone to work. Imapsync will write a log even if you don’t tell it to; it’ll be placed in the “LOG_imapsync” folder. You can interrupt the process at any time and if you run the command again later it will be able to resume.

 

My blogs usually come with explanations on the hows and whys. In this case I figured people would abhor having to read through that stuff, so I’m putting it at the end.

IMAP is heavily server centric. If your email client detects a connection to an IMAP server that does not contain the messages in your inbox, it will delete all your local copies immediately. This makes any client-side backups of IMAP mail rather unreliable. The only real way to back up your email is to copy it to another IMAP server.

Short of various questionable solutions such as setting up two email accounts and copying emails between them (which is time-consuming and can fail completely), the only real option is to use something like Imapsync. This will preserve the email unique IDs and ensure that the copying process does not create duplicates. It will also synchronise other information such as which emails have been read and stuff like that.

I think that’s all there is to it. I will update this blog if I remember anything else that’s important.

Good luck.

LP,
Jure

Artificial Intelligence

Hi,

Artificial Intelligence has been one of those promising things that people have been talking about for some time.

I always somehow screw up the serious tone of my English tech blogs by dwelling too deep on some AI-related topic. The truth is, AI has been an interest of mine for a long time (those of you that are good with a search engine will be able to find my contributions to comp.ai.philosophy from about 20+ years ago) and I still have some ideas about it that I wish I had time to put into code.

There is a lot of mysticism online related to artificial intelligence. For a lot of people it’s little more than a science fiction level fascination and you can immediately tell this is the case based on their persistent and senseless recycling of Asimov’s laws, which originate from 1940s science fiction and are not applicable and never will be applicable to any real-world software program. I am not one of those people. I view AI from the perspective of a software developer and there is no place in my understanding of AI for overly vague abstractions that are made up and have no translation into real-life machine code.

It is probably no secret that I work in IT. Few lines of work make it as obvious that there are routine tasks that machines are good at, and creative tasks that machines suck at and need a fallible organic operator to get done. Such tasks are a core element of maintenance of any large scale system in use today. The goal of creating AI is automating the latter in a way that machines can understand the problems and solve them creatively.

Today’s AI is not quite there yet. I mean, today’s AI is far more than anything dreamed up in the 60s. Big companies like Google and Facebook figured out long ago that humans are not very good at understanding their role in social groups, when the number of people exceeds about 100 members. Google’s AI is a hivemind superintelligence that connects people using automation and… figures out what news items and Youtube videos you’ll be interested in. I have no doubt that some of you who use their services realised by now that it seems to figure out what you really needed in a day or so. Somehow.

But that AI is still not quite the understanding problem solver I was looking for, for my IT jobs. That being said, I do believe that we have the technology to make such an AI a reality. Presently, the problem is mainly that nobody has yet figured out how to make money from constructing a real AI. I don’t know if there is a business model to support the creation of a true AI, other than aiming to make a company with the explicit intention of inflating enough buzzwords to end up being bought out by Google and earning a fortune that way. Jokes aside, I think if there was commercial interest in creating a true AI, we’d have one by now.

My reasoning is primarily that.. I think I know how to make one. And since Randall manages to implement things I’ve thought about into comics so well, I think hundreds of other people just like us must be thinking about the same solutions as well. This means there probably already is a critical mass of developers out there who could make an AI if they put their minds to it.

The most promising piece of code that I have seen thus far was a trivial chatterbot created by a friend several years ago, who had intended to make it capable of learning a language. He allowed the bot to create a database of words in which words were no longer mere words, but abstractions. If you put the idea that the database has to make sense to you as the developer out of your head, you would have been able to see that the bot demonstrated some kind of rudimentary “true understanding” of the meaning behind the words. I think this concept, expanded to relate to observations and reenactments more functional than mere words, could be an AI capable of understanding a problem, which could then potentially be wired to try to solve said problem, based on observed solutions. And potentially it could be made capable of linking up simpler solutions into more complex ones, in other words: It could be made to solve problems independently.

The point that this does not seem impossible is… fascinating. I have been toying with the idea of simply sacrificing some of my free time to work on such an AI, on and off for the last few years. I want to work on it but… I’m not enough of a hardware robots guy to be capable of putting such an AI into a practical environment it could learn from as humans do (by connecting words to things we see and experience). I also lack certain skills in maths, which would help me make things like efficient poly-dimensional searches, which are required for efficient processing of learned abstractions from multiple aggregated inputs.

What got me writing this post was the idea I had last, that I might after all be able to create an AI that can creatively solve networking problems. Networking as it so happens is pretty native to computers and it would not be outside my skill set to set it up like that. I’d still face problems like how to set up an environment in such a way that problem solving would actually be an advantage over some more brute force approach. But at the very least it’s something where solutions could be taught and then demonstrated. There is potential.

I’m still not really sure why I write these blogs, given the likelihood that nobody will ever read them. Still, if other people are thinking as I am on this, there is a nonzero chance that someone will find some advantage or encouragement in reading what my thoughts on the subject were, I suppose.

Good luck. 🙂

LP,
Jure

Key lime pie Internet “mystery”

Hi,

So the time has come around again where the Internet has reminded me of the Key Lime Pie Internet Mystery. You’ve likely found it on Reddit.

It relates to a phenomenon of SPAM comments on the Internet, on random websites, seemingly about key lime pie (pictured), with sentences eventually devolving into pornographic proportions of nonsense. The thing is that while a spambot may be to blame, it’s difficult to explain why they would be advertising pie of all things, and why they would keep this up for over a decade.

I’ve investigated this some time ago and found it to be an encryption scheme, probably used for some deep web style illegal internet based exchange of messages. I believe a Stargate episode long ago implied that the CIA uses this sort of thing as well on occasion. Who knows what the reality is, but it’s true that things are often best hidden in plain sight, as demonstrated by the number of people and amount of time wasted on this mystery, yielding little or nothing public.

What frustrates me is that despite some people knowing the nature of this stuff, it keeps being perpetuated as a “mystery”, because of course something instantly ceases to be interesting as soon as you have explained it. So people just don’t get to know the truth. I think even my reply on Reddit got taken down. Well, I can try and fix it again on here and hopefully someone finds this page in their research.

 

So, what is the key to understanding the nonsensical messages regarding key lime pie? It is a message-hiding technique called steganography.

You may have noticed that the comments regarding key lime pie do not look like code. They look like at least semi-sensible sentences. This is the key component of steganography, like the wiki says: “Steganography is the practice of concealing a file, message, image, or video within another file, message, image, or video”. In other words, the hidden messages are hidden within these sentences. They are encoded by the means of… the order of words, the length of particular words, the punctuation and similar. Things that a human reader might easily glance over while reading a message.

After all, we all know what SPAM looks like. We know bots tend to include semi-sensible text into their messages to defeat automated anti-SPAM protection. These types of messages are normal… right? Nobody would suspect them to hold meaning. As said however, the meaning is not in the text itself, it’s in the things we assume are random.

One of the downsides of steganography, like every form of encryption, is that besides the message itself, it also requires an encryption key of some kind. Let’s take an easier example, an ancient Greek scytale:

The use of the encryption device isn’t difficult to figure out from just looking at it. You take a stick, wrap a strip of paper around it and then write your message horizontally across the strip. When you unwrap your paper strip, the letters will appear to be gibberish until you wrap them around the same kind of stick again. In this case, the strip of paper is the encrypted message and the width of the stick is the key. Without a stick of the same width, the message does not come together again.
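The stick-and-strip idea can be sketched in a few lines of Python. This is only a model under stated assumptions: the “width of the stick” is represented by a number, `circumference`, meaning how many letters fit around the stick, and the underscore used to pad the last turn is my own choice. The function names are made up for this example.

```python
def scytale_encrypt(message: str, circumference: int, pad: str = "_") -> str:
    """Write the message in rows along the stick, then read the unwrapped strip."""
    cols = -(-len(message) // circumference)  # ceiling division: letters per row
    padded = message.ljust(circumference * cols, pad)
    rows = [padded[r * cols:(r + 1) * cols] for r in range(circumference)]
    # Unwrapping the strip visits one letter from every row before moving along the stick.
    return "".join(row[c] for c in range(cols) for row in rows)


def scytale_decrypt(strip: str, circumference: int, pad: str = "_") -> str:
    """Wrap the strip around a stick of the given circumference and read the rows."""
    cols = len(strip) // circumference
    turns = [strip[c * circumference:(c + 1) * circumference] for c in range(cols)]
    text = "".join(turn[r] for r in range(circumference) for turn in turns)
    return text.rstrip(pad)


strip = scytale_encrypt("HELLOWORLD", 3)
print(strip)                      # the unwrapped strip
print(scytale_decrypt(strip, 3))  # same stick: the message comes back
print(scytale_decrypt(strip, 4))  # wrong stick: still gibberish
```

The sketch assumes the strip length is a multiple of the circumference (which the padding guarantees on the encrypting side) and that the message itself does not end in the padding character.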

Giving every commander the same kind of wooden stick might have worked well in ancient times, before the invention of the ruler and standardized measuring units. But these days, it would take very little time to figure out and decode. So modern encryption schemes use a key that keeps changing.

 

Back to the key lime pie example, the SPAM comments posted in various internet sites are clearly the encrypted messages, but what is the key? Where would a group of people want to post a key that keeps changing over time, that would be anonymously accessible to everyone, but arouse no suspicion? The answer is Facebook.

Of course, because in this case the encryption scheme is steganography, the key is the original text which is modulated to generate the encrypted messages. In layman’s terms, to encrypt a message, you could take an original message which is used as a key, split it down to segments of different length, then make each segment represent a letter of your message. Put the gibberish back together in the order of your characters in your message and you’ve successfully encrypted your hidden message.

Let’s try doing that together. Let’s take the latest Facebook post by our friend Jake Carson, and paste it into a spreadsheet. Then, let’s take each line and assign a character to it. I’m going with the English alphabet plus a space, and I’ve skipped the lines that are only dots. I’ve trimmed off the remaining lines. We end up with a key table like this.

Now let’s encode our message with it. I’m going with “hello world”:

..And We Hate To Sound Like A Broken Record But Here Is A Key Lime Pie For Our Buddy “Maurice White”, Founder Of The Great Group “Earth, Wind and Fire”!..Rest In Peace Dude.…Can’t Get Enough Of That Key Lime Pie, Key Lime Pie, Key Lime Pie. Can’t Get Enough Of That Key Lime Pie Or I’ll Just Cry Until I Die, I Don’t Know Why I Just Love My Key Lime Pies!….are so wild about him and his Famous Cheese Burgers and Key Lime Pies, are so wild about him and his Famous Cheese Burgers and Key Lime Pies,His Drop Dead Gorgeous Wife “Miss Anita” together in they’re Historic Key..Miss Anita And ’Chef ‘Captain Kutchie Pelaez’s Key West-Kutcharitaville Key Lime Pie Factory And Cafe’, “Where Eating Is A Pleasure And Cooking Is An Art”….. Hell, “Chef Kutchie Pelaez” Has More Talent In His Toe-Nail Clippings Than All The Others Have In Their Entire Bodies!..Figure!!!!!!!….His Drop Dead Gorgeous Wife “Miss Anita” together in they’re Historic Key Your Time in They’re Little “Key West Island” near the Biltmore Estate are so wild about him and his Famous Cheese Burgers and Key Lime Pies, …Kobe Bryant May Be Retiring From Basket Ball But Captain Kutchie’s Is Still His Pie Of Choice!…

Does this read as something familiar? The reason why some of these texts are all capitals whereas some of them aren’t is mainly because if you check the Facebook post, the segments added in the later posts do not have all capital letters.

But now that you have the encrypted message, if you go back to the key table and find the individual segments, you can reconstruct the original message “hello world”.
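The spreadsheet exercise can also be sketched in Python. To be clear about the assumptions: the cover lines below are placeholders I generated, not the real Facebook post, the function names are made up, and this is only the simple alphabet-plus-space example from above, not a claim about how the real messages are produced.

```python
import string
from typing import Dict, List

ALPHABET = string.ascii_lowercase + " "  # 26 letters plus a space, as in the example


def build_key_table(cover_lines: List[str]) -> Dict[str, str]:
    """Assign one character to each usable cover line, in order."""
    usable: List[str] = []
    for line in cover_lines:
        # Skip lines that are only dots, and repeated lines (they would be ambiguous).
        if line.strip(". …") and line not in usable:
            usable.append(line)
    if len(usable) < len(ALPHABET):
        raise ValueError("cover text has too few usable lines for the alphabet")
    return dict(zip(ALPHABET, usable))


def encode(message: str, table: Dict[str, str]) -> str:
    """Replace every character of the message with its cover segment."""
    return "".join(table[ch] for ch in message.lower())


def decode(text: str, table: Dict[str, str]) -> str:
    """Find the segments in the text again and map them back to characters."""
    reverse = {segment: ch for ch, segment in table.items()}
    segments = sorted(reverse, key=len, reverse=True)  # try the longest match first
    out, pos = [], 0
    while pos < len(text):
        for segment in segments:
            if text.startswith(segment, pos):
                out.append(reverse[segment])
                pos += len(segment)
                break
        else:
            raise ValueError(f"no key segment matches at position {pos}")
    return "".join(out)


# Placeholder cover text: two dot-only lines followed by 27 made-up "pie" lines.
COVER_LINES = ["....", "…"] + [f"Key lime pie fact number {i:02d}. " for i in range(27)]

table = build_key_table(COVER_LINES)
hidden = encode("hello world", table)
print(hidden)
print(decode(hidden, table))
```

The greedy matching in `decode` only works when no combination of segments can be mistaken for another, which holds for the fixed-width placeholder lines here but would need more care with real text.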

But this is just an example. The actual encryption scheme probably doesn’t use the alphabet; they probably encrypt their messages with something else first and then use the result to determine which segments of the text to use. The original text is more than 200 individual lines and likely the lines with just the dots mean something too. But for me at this point the mystery is solved. If you really want to know what the messages say, you’re going to have to fiddle with it some more after this point. Just remember to use the latest key posted on Facebook at the same time as the message you are decrypting. 🙂

As for why the text is about key lime pies in particular? Well, it needed to be something mundane that would not arouse suspicion or identify the author. Likely the programmer of the encryption tool just googled “key” and eventually arrived at an ad for key lime pie, which they copied. When they realized over time that they needed more lines for a more complex key, they just padded it with nonsense from a porn site, about a woman with the same name as Captain Kutchie’s wife, for the lulz. They likely authored some lines of nonsense themselves to help combine the two.

By the way, would be glad to answer any questions you might have, just post them in the comments.

Also, enjoy your pie. 🙂

LP,
Jure

Heterogeneous systems advantages

Hi,

As you know (or maybe you don’t? Who knows how you got here) I made an operating system kernel from scratch a while ago. Now the date of the page is right, the thing was put together in 2008, but then wasn’t touched again after 2013. The reason is that I realized that in order to achieve frame rates higher than 16-ish FPS (which is what VideoBIOS enables), I would have to implement every graphics driver ever, for any card that ever existed, which did not seem like a worthwhile use of my time. 😛

I was explaining this to a coworker not long ago and this made the subject fresh in my mind. I remembered the event driven architecture I had planned for this kernel… The idea was to let the kernel allocate requested system resources directly (such as clock intervals, screen regions, I/O messages, and so forth) and allow for an unbroken chain of events from the hardware directly into the individual apps, to entirely avoid polling of any kind, which is commonly used in modern operating systems (and is inherently inefficient).
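To make the idea a bit more concrete, here is a minimal Python sketch of such an event chain. It is not code from the kernel: the class `EventKernel`, its methods and the resource names are all made up for illustration. Apps ask for a resource by registering a handler, and whenever the “hardware” raises an event, it is pushed straight to the handlers instead of apps asking in a loop whether anything happened.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventKernel:
    """Toy dispatcher: resources are allocated by registering a handler for them."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def allocate(self, resource: str, handler: Handler) -> None:
        """An app requests a resource (e.g. 'keyboard', 'clock:100ms')."""
        self._handlers[resource].append(handler)

    def raise_event(self, resource: str, payload: dict) -> int:
        """Called from the hardware side; delivers the event to every subscriber."""
        handlers = self._handlers.get(resource, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


kernel = EventKernel()
typed: List[str] = []
ticks: List[int] = []

# Two "apps" allocate the resources they care about.
kernel.allocate("keyboard", lambda event: typed.append(event["key"]))
kernel.allocate("clock:100ms", lambda event: ticks.append(event["tick"]))

# The "hardware" raises events; nobody polls for them.
kernel.raise_event("keyboard", {"key": "a"})
kernel.raise_event("clock:100ms", {"tick": 1})
kernel.raise_event("mouse", {"x": 3, "y": 4})  # no subscriber, nothing happens

print(typed, ticks)
```

An app that never registered for a resource costs nothing here, which is the contrast with a polling loop that has to keep checking every device whether or not anything changed.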

 

This morning I suppose my mind was still processing this idea and I had a dream about multiprocessing. It came with visuals of hardware architecture and all. I think I saw an FM2 socket, so it seems my subconscious is fairly well-versed in technology. 😛

Jokes aside, I was always fascinated by heterogeneous multiprocessing. That is… processing using multiple CPUs, where the CPUs are different and do different things. It’s an idea AMD grappled with quite a bit while trying to marry GPU architecture (which is mainly parallel) with CPU architecture (which is heavily serial), starting out with the Geode cores and eventually arriving at the modern Ryzen APUs. It was also an idea visited by the Cell line of processors, although I suspect that move was based more on cost saving than performance hunting.

The natural assumption is, that since the different cores are good at different things, in tandem they should be better at a random workload than a homogeneous CPU, which does one thing well and others worse. In reality this is not the case, as much of the performance of modern machines comes from carefully timing the delays between the individual pieces of hardware and the difference between a slow and a fast program is mainly in how well the program catches the rhythm of the underlying hardware. A heterogeneous high performance system is always going to be fairly uncommon and therefore no programs are optimized for it, yielding poor overall performance.

However, there are use cases, such as the management controller on a server. It is usually a not very powerful secondary CPU that runs an entirely separate system, which watches for failures in the main system and acts to correct them. The main advantage is that it is independent and therefore still able to act in case the main system becomes inoperable.

The role of a main CPU on a motherboard is of course fairly obviously separate from the other micro-controllers on a typical PC motherboard, even in cases where the CPU is on a standardized interconnect:

The idea suggested in my dream was that you could run one of the CPU cores (or a second CPU on the motherboard) with an entirely separate program from the main OS. In reality, this would work if the CPU worked with its own independent memory and didn’t have any conflict or need to share other hardware with the other CPU. Which raises the question of the use case. 😛

As described before, while this would be pretty cool, it would obviously not offer a performance advantage, as modern computers are designed to have all CPUs share the same OS and have very similar roles, sometimes even shared cache, which my idea is in great conflict with. There is also no advantage to having a supercharged management controller as management tasks are just not that elaborate.

When a computer boots, it uses a single CPU. Soon after, however, the second CPU (or third, etc.) is given something to do by pointing it to a memory location and letting it do its thing. At that point in the process, the programmer has to be aware that the second CPU will be executing simultaneously and independently of the main CPU boot thread and therefore the problems involved in multi-threaded processing become relevant. You are basically booting multiple computers at the same time and sharing every running program between them.
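The kind of problem this refers to can be shown with a small Python sketch, with the loud caveat that Python threads are only standing in for CPUs here and none of this is real boot code. The names `SharedState`, `cpu_entry` and `boot` are made up. Each “CPU” is pointed at the same entry function and updates the same piece of memory, so the update has to be protected by a lock.

```python
import threading


class SharedState:
    """Memory that every 'CPU' can see and modify."""

    def __init__(self) -> None:
        self.work_done = 0
        self.lock = threading.Lock()


def cpu_entry(state: SharedState, steps: int) -> None:
    """The code a CPU starts executing once it is pointed at it."""
    for _ in range(steps):
        # Read-modify-write on shared memory: only one CPU at a time may do this.
        with state.lock:
            state.work_done += 1


def boot(cpus: int = 4, steps: int = 10_000) -> int:
    state = SharedState()
    # The boot CPU wakes the others by handing them an entry point...
    others = [
        threading.Thread(target=cpu_entry, args=(state, steps))
        for _ in range(cpus - 1)
    ]
    for cpu in others:
        cpu.start()
    # ...and then keeps working itself, simultaneously with them.
    cpu_entry(state, steps)
    for cpu in others:
        cpu.join()
    return state.work_done


print(boot())
```

With the lock in place the total always comes out as cpus × steps. Without it, the interleaving of the individual read and write steps would decide the result.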

The performance of the system at this point depends on how well the operating system orchestrates all the components and their workloads, so that on the one hand it acts reliably, and on the other, the timings are not off and every part has something to do without waiting its turn. This is why most supercomputers are designed for a specific type of task, despite having petaflops of processing power to work with.

 

I think as far as my idea goes, trying to figure out how to make it work with hardware is ultimately pointless (at least as long as I don’t have a chip fab at my disposal). When I first realized I would have to code drivers, my solution was that I wouldn’t use PC hardware, but rather something like a tablet, where all units would share the same hardware. I think my event-driven concept has value, but if I implemented it today, I would probably go for something like Linux, instead of coding the whole thing myself. Yes, it wouldn’t be as efficient, but it would work and it would be a finite effort.

I think the advantage of such a system is not so much in the raw performance as in usability. One thing that my hardware-shared and event-driven system offered as an advantage over modern operating systems is that it enabled usability. One could stop thinking in terms of what a computer has attached, and rather think of the resources of an entire local network collectively (as events could always be transmitted over a network). And I think the useful value here is in simple usability. Use a system with processing power for its processing resources and use a portable system with a broad user interface (a tablet) for its user interface resources. Migrate resources between compatible devices seamlessly.

When I first thought of tablets for my kernel in 2008, mobile devices were not yet in widespread use. In the 10 years since, this has changed. Perhaps the above paragraph will also successfully predict what in technology will change over the next 10 years. Will the cloud become local? We will see.

LP,
Jure

First entry

Hello everyone,

So sometime in 2016 I decided to migrate my tech blog to somewhere self-hosted but I never got around to it. Typically I would code my site myself, however this time around I don’t feel like spending all the time and tinkering needed to set it up. I just want this over with, haha.

So, welcome. This blog will contain techy stuff that would be worth posting about in English. I do also have a tech blog in Slovenian, which is educational in nature, however sometimes I just get ideas or somesuch that I want to put online somewhere so people have a chance to see them, should a random Google search lead them this way. I’m sure the AI knows if you really wanted to know my idea or not. 😛 Aye I realize nobody will ever see or read this stuff, but… it’s worth a shot.

I might also occasionally spam this blog with various other thoughts that I sometimes get inspired with.

LP,
Jure

OpenWRT on TP-Link TL-MR3420

Hi,

This post was made on my old blog back in 2015, however I figured I’d migrate it since it was the only post to ever be looked up by other people using a search engine.

Hello,

So, the promised blog entry on setting up OpenWRT on a router with insufficient flash.

I had never used OpenWRT before and was curious to find how it would perform against Smoothwall, a Linux distribution aimed at being an easy to use, cheap replacement for professional routers. I had set up both routers on a virtual machine host in a fast network and compared performance. OpenWRT by far outperformed the dated Smoothwall, given the same hardware, and I was quite impressed.

However, setting up OpenWRT on actual hardware was something new for me and I still had to get my bearings. Having done all that right now, I thought I’d give any Googler who finds his way onto my blog, the advantage of not having to learn all this themselves. Now, I realize that these how-tos invariably become cooking recipes, which people copy & paste into their computers, boil for a while and come up with the magic potion — OpenWRT in a router, with no idea whatsoever as to how that was accomplished. Let’s try to avoid that and go over some basics first.

To understand OpenWRT in a router, we must first understand that OpenWRT is a Linux-based open-source software package and that a Router is an embedded device:

  • Linux-based open-source packages are basically bundles of programs, which were already created for another purpose beforehand. The challenge for the creators of these packages, is to mix these existing programs in the right combination to suit another purpose.
  • Embedded devices, on the other hand are basically computers, which have been simplified to the greatest possible extent, in order to serve a particular purpose. The challenge for the creators of these embedded devices, is to produce the same functionality in the fewest cheap components possible.

These two factors make Linux in a Router an odd couple, but fortunately for us, the guys at OpenWRT already did all the heavy-lifting involved in squeezing their Linux programs into as small a package as possible.

The key advantages of OpenWRT over stock firmware that is supplied by the manufacturer of the embedded device are that it:

  1. Is open source, meaning there are legal and legitimate ways to know what it actually does
  2. Is made up of smaller parts for different purposes, which you can pick and choose
  3. Actually has very good performance as a router and contains many advanced features you might want to have in a router

For embedded devices, unless you are plugging your wires directly into the manufacturer’s board diagnostic port, the only way to configure them is to have almost all of the router’s functions operating, so that you can connect to it over the network and run commands. This means that if anything goes wrong with any of the core programs, you end up with a router that you cannot connect to in order to repair the problem or upgrade again, meaning it’s bricked and you can go buy a new one. For this reason, OpenWRT comes as a series of pre-built packages for most routers that have all the configuration to work properly. All you have to do to use them is to download the right one to your computer and then “upgrade” your router with this file. Nothing should go wrong, because the package was designed specifically for your model and is known to work.

Having noted that OpenWRT does not work on just any router and also having noted that my suppliers do not import just any router either, my choice was the most generic Chinese product I could find on the market, which seemed to exist on OpenWRT’s pages — the TP-Link TL-MR3420.

As you can see on the picture (image courtesy of the OpenWRT wiki) this embedded device is indeed simplified, with barely anything other than the square CPU chip, set next to the rectangular Flash chip on the board. What I didn’t realize was that the Flash itself has been minimized to a meagre 4 MB of NOR flash. In this atomic age, where the smallest USB key on the market contains several orders of magnitude more flash, TP-Link has ingeniously chosen to limit themselves to 4 MB. Now this may not make much sense to me and any western reader of this blog, but you have to realize, all of these circuits are manufactured in China, where their mentality on the issue is slightly different, and in some twisted way, 4 MB of flash is exactly the thing you want to put on this particular circuit.

Unfortunately for us, OpenWRT struggles to fit into 4 MB of flash. You can get the basic router into this package, but not for instance the browser based configuration interface, which most people take for granted.

So how do we beat this? Some people on the OpenWRT forum suggest desoldering the 4 MB flash chip and attaching a bigger one. Unfortunately I’m not that kind of an electronics expert and, as I recall, even the more experienced people note that this is a great way to produce a non-functioning piece of electronics.

Instead, there is an interesting opportunity. The TP-Link TL-MR3420 also has a USB port. With some skill, some luck and some effort, it should be possible to extend its storage capacity with a USB key. Now, at the time I had never done that before either, and for most software projects I have seen, this kind of procedure would be quite complicated. Fortunately for us, OpenWRT was designed for these sorts of modifications and we don’t even have to be programmers to adjust it in this manner.

One thing to understand about OpenWRT before we proceed is that one of the steps taken to ensure the package is as small as possible is that the system in which the programs are stored within the package is compressed, like a ZIP file. The problem with storing programs and settings in this manner is that, in order to use every last bit of space, the files are stacked very closely together. This means that if a program were to make an addition, say try to save a configuration item, it would have to move all the other programs to make room for this bit of information.

To avoid this situation, OpenWRT uses a rather remarkable combination of file systems. The first, called SquashFS, contains all the programs like a big ZIP file and cannot be changed or have any additions made to it. The second, JFFS2, does allow additions but doesn’t compress as efficiently as the first one. They are combined in such a way that all the pre-installed programs are in SquashFS and all the changes are contained in JFFS2. Because the two systems are not equally efficient, however, the pre-installed programs take up less space than the ones you install afterwards.

There is a package on the OpenWRT website called the Image Builder. It is a tool that can be used on a 64-bit Linux computer to generate the same packages that are otherwise available for download, but with whatever changes you desire. This generator can add any OpenWRT program you choose into the pre-installed package, so the end result takes up less space than it would if you had installed the same programs after flashing your router.

There are multiple versions of OpenWRT, and if you are unfamiliar with open source programs, “stable” is what you are looking for. Open source programs tend to have versions that are tried and tested, called “stable”, and a version called “bleeding edge”, which contains the latest features the developers are working on. Although you may think that having the latest features is a good thing (and it may yet be if you want to help the developers with their development process), it is called “bleeding edge” for a reason: if you use it, get ready to bleed, or in other words, suffer from all the broken features which do not yet work properly. The stable version at the time I was writing this blog is “14.07 Barrier Breaker”.

My router, the TP-Link TL-MR3420, has an ar71xx CPU, so the image builder for it can be found at:

http://downloads.openwrt.org/barrier_breaker/14.07/ar71xx/generic/OpenWrt-ImageBuilder-ar71xx_generic-for-linux-x86_64.tar.bz2

You will also want to take a look at this resource to get you up to speed as to what you need to do with your computer in order to be able to run the image builder. I will assume you know how to come up with a 64-bit Linux installation on your computer. That is a setup I would warmly recommend if you ever intend to do any sort of software hacking, as it is practically a prerequisite.
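Fetching and unpacking the image builder could look roughly like the sketch below. I am assuming here that the archive unpacks into a folder named after the archive itself, and the helper names (archive_name, builder_dir, fetch_builder) are made up for illustration. Nothing is downloaded until you call fetch_builder yourself.

```shell
#!/bin/sh
# Address of the image builder for ar71xx routers, as given above.
URL="http://downloads.openwrt.org/barrier_breaker/14.07/ar71xx/generic/OpenWrt-ImageBuilder-ar71xx_generic-for-linux-x86_64.tar.bz2"

# File name of the archive, taken from the end of the address.
archive_name() { basename "$1"; }

# Folder the archive is ASSUMED to unpack into (archive name without .tar.bz2).
builder_dir() { basename "$1" .tar.bz2; }

# Download, unpack and enter the image builder folder.
# Defined only; call it by hand as: fetch_builder "$URL"
fetch_builder() {
    wget "$1" &&
    tar -xjf "$(archive_name "$1")" &&
    cd "$(builder_dir "$1")"
}
```

If the folder turns out to be named differently on your machine, just cd into whatever folder the tar command created.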

I’ve of course tried to run this image builder to create a package with the browser based configuration interface (called “luci”), but the result was too big. I did, however, manage to create a package with everything needed to utilize a USB key for additional storage. The line I needed to use was:

make image PROFILE=TLMR3420 PACKAGES="kmod-scsi-core kmod-usb-core kmod-usb2 kmod-usb-storage block-mount kmod-fs-ext4 fdisk"

…the package kmod-scsi-core is needed because of the way a USB key appears to Linux in this router; kmod-usb-core, kmod-usb2 and kmod-usb-storage deal with support for a USB key as such; block-mount is a package needed to attach additional file systems to your router; kmod-fs-ext4 is support for the file system we are going to use on our USB key; and fdisk is a tool we may have to use in order to find our USB key once it is connected.
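If the package list grows, it gets easier to manage when it is kept one package per line. Here is a minimal sketch of that idea, assuming you run it from inside the unpacked image builder folder. The helper name build_cmd is my own invention, and it only prints the command instead of running it.

```shell
#!/bin/sh
# Packages needed for USB storage support, one per line for easy editing.
PACKAGES_USB="
kmod-scsi-core
kmod-usb-core
kmod-usb2
kmod-usb-storage
block-mount
kmod-fs-ext4
fdisk
"

# Print the image builder command for a given profile (hypothetical helper).
build_cmd() {
    profile="$1"
    # Unquoted expansion plus echo collapses the line breaks into single spaces.
    pkgs=$(echo $PACKAGES_USB)
    printf 'make image PROFILE=%s PACKAGES="%s"\n' "$profile" "$pkgs"
}

build_cmd TLMR3420
```

Once the printed line looks right, it can be pasted into the terminal as it is.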

This procedure created a sufficiently small binary that I was able to flash onto the router through the “Firmware upgrade” interface. Go grab a cup of coffee and stay away from the computer until the upgrade process is complete, as panicking and interrupting the process will brick the router. Not forgetting to replug my ethernet cable so that my computer got the router’s new IP, and not forgetting that our new router software doesn’t have a web interface yet, I connected to the router using telnet:

telnet 192.168.1.1

At this point, it is useful to input some basic settings into our router. It doesn’t have a web interface yet so we won’t bother doing a lot, but should something happen to the USB key attached to the router, it will fall back to the state that we configure now. It is therefore worthwhile to configure a password (using the command “passwd”) and a wired subnet (using the commands “uci set network.lan.ipaddr=a.b.c.1”, replacing a.b.c with a subnet of your choosing, and “uci commit”). As soon as you set a password, the router will reconfigure itself to disable telnet access and enable SSH, which is a lot more secure but doesn’t work without a password. This procedure may take a few seconds, so do not panic if you cannot reconnect to your router straight away. To connect using SSH, use “ssh root@a.b.c.1”. Before continuing, it may be wise to reboot your router with the “reboot” command, to make sure it takes into account all the settings we have entered thus far.

The next step is to take a USB key you are willing to sacrifice for this procedure, connect it to your desktop and format it as EXT4. Needless to say, this can only be done on a Linux machine. When you are finished, connect the newly formatted USB key to the router. What we are going to do next is copy the existing files from the router to the USB key, make the router use the USB key and reboot it so that it starts up from that USB key, where there is additional free space.

However, there is something important to note at this point. Flash. I’ve mentioned it before. What is Flash? Why is it called Flashing, what does that mean? Flash is very simply a technology to permanently store data in a chip. The now popular SSD drives run on flash, but you may already be more familiar with flash memory as such in various phone and camera SD cards, USB keys, etc. Flashing, is simply the act of recording data onto these flash chips. This is actually some kind of convoluted procedure that an SSD or USB key makes completely seamless and automatic, but it actually involves erasing the chip and then writing new information into it. There are multiple technologies in use which are all called Flash, most significantly NOR flash — more expensive, but faster, typically used to store programs in embedded devices, and NAND flash — slower, but cheaper and usually much bigger, normally used to store data, typically used in USB keys.

The 4 MB chip in our router is NOR and the USB key I used is NAND. I should therefore want to use the 4 MB chip as much as I can and only resort to the USB key when I run out of space. I decided to re-use the same system that OpenWRT uses to keep its changes and its core programs separate: all the pre-installed programs stay on the fast NOR flash, and all my changes go on the slower, but bigger, USB key. So instead of copying all the router files to the USB key, I only copy the changes kept in the “overlay” folder. Keep in mind that if the USB key is not the first disk on the router, it will not be “sda1”; if necessary, use the “fdisk” command to find the right device.
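As an alternative to reading the fdisk output, the kernel’s own partition list in /proc/partitions can be filtered for names that look like USB disks. This is only a sketch, assuming awk is available on the router. The helper name list_sd_partitions is made up, and the optional argument is there purely so you can try it out on a saved copy of the file.

```shell
#!/bin/sh
# List partitions that look like USB/SCSI disks (sda1, sdb1, ...).
# Optional argument: a file in /proc/partitions format (defaults to the real one).
list_sd_partitions() {
    src="${1:-/proc/partitions}"
    # Columns are: major minor #blocks name.
    # Keep names such as sda1, skip whole disks (sda) and the built-in flash.
    awk '$4 ~ /^sd[a-z]+[0-9]+$/ { print "/dev/" $4 }' "$src"
}
```

Running list_sd_partitions on the router with a single key plugged in should print just one line, and that is the device to use in the commands that follow.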

mkdir /mnt/sda1

mount /dev/sda1 /mnt/sda1

This connects our USB key to the folder called /mnt/sda1, that we have created for this purpose.

tar -C /overlay -cvf - . | tar -C /mnt/sda1 -xf -

This is a convoluted way to copy files, using a command normally used to make backups, which makes sure that they are written exactly as they were read and not altered in any way in the process.
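If you want some reassurance that the copy really worked, the file listings of both folders can be compared afterwards. Below is a minimal sketch of that idea. The helper name copy_tree is my own, and it assumes find, grep and sort exist on the machine where you run it.

```shell
#!/bin/sh
# Copy a folder tree with the tar pipe, then compare the two file listings.
# Arguments: source folder, destination folder. Returns 0 only if they match.
copy_tree() {
    src="$1"
    dst="$2"
    tar -C "$src" -cf - . | tar -C "$dst" -xf - || return 1

    # List both trees. lost+found is ignored, because a freshly formatted
    # ext4 file system contains that folder by itself.
    src_list=$(cd "$src" && find . | grep -v '^\./lost+found' | sort)
    dst_list=$(cd "$dst" && find . | grep -v '^\./lost+found' | sort)
    [ "$src_list" = "$dst_list" ]
}
```

On the router this would be used as copy_tree /overlay /mnt/sda1, followed by echo $? to see whether the listings matched (0 means they did). It only compares names, not file contents.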

Add to the “/etc/config/fstab” file:

config mount

option target        /overlay

option device        /dev/sda1

option fstype        ext4

option options       rw,sync,noatime

option enabled       1

option enabled_fsck  0

This makes our USB key (at /dev/sda1) be used to provide the /overlay folder, with a few additional options that will make the flash in the USB key wear out less in the long run.
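If your key did not end up as sda1, the same section can be generated for another device instead of being typed by hand. This is just a sketch: the helper name overlay_stanza is made up, and /dev/sdb1 in the usage note is only an example.

```shell
#!/bin/sh
# Print the mount section for a given partition (defaults to /dev/sda1).
overlay_stanza() {
    dev="${1:-/dev/sda1}"
    cat <<EOF
config mount
    option target        /overlay
    option device        $dev
    option fstype        ext4
    option options       rw,sync,noatime
    option enabled       1
    option enabled_fsck  0
EOF
}
```

For example, overlay_stanza /dev/sdb1 >> /etc/config/fstab would append the section for a key that showed up as the second disk. Take a look at the file afterwards to make sure the section was not added twice.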

Finally, reboot the router with the “reboot” command and observe as the router reloads, fetching information from the USB key as necessary. Once you have completed this step, you can connect to the router and install additional software. If you use the “df -h” command, you will be able to see the amount of free space you have available. If the procedure has worked, there should be a few GB of free space (depending on the size of your USB key, mine is 8 GB), if not there will be around 640k of free space.
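Besides looking at the free space, you can also ask the kernel directly which device is providing the /overlay folder. The sketch below reads /proc/mounts for that. The helper name overlay_device is made up, it assumes awk is available on the router, and the optional argument is there only so it can be tried on a saved copy of the file.

```shell
#!/bin/sh
# Print the device that currently provides /overlay.
# Optional argument: a file in /proc/mounts format (defaults to the real one).
overlay_device() {
    mounts="${1:-/proc/mounts}"
    # Fields are: device, mount point, file system type, options, ...
    # The last match wins, because a later mount on the same folder
    # hides the earlier one.
    awk '$2 == "/overlay" { dev = $1 } END { if (dev != "") print dev }' "$mounts"
}
```

If this prints /dev/sda1, the key is in use. If it prints a name starting with /dev/mtdblock, the router is most likely still running from its internal flash.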

Assuming you have successfully secured enough free space from your USB key in the previous step, you may now proceed to install the browser based interface and finish configuring your router through a browser:

opkg update

This command retrieves a fresh list of downloadable programs for your router

opkg install luci

This command will automatically download and install the browser based interface “luci”

/etc/init.d/uhttpd start

This command will start up LUCI, if this works you can continue to the next step

/etc/init.d/uhttpd enable

This command enables LUCI startup at each boot. Don’t do this if it doesn’t start manually in the previous step, because that may result in a bricked router.
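The “start first, enable only if it works” rule can also be left to a small helper, so that the second step is skipped automatically. This is only a sketch under a few assumptions: that pidof exists on the router, that the web server process is called uhttpd, and that two seconds is enough for it to come up. The helper name start_then_enable is made up.

```shell
#!/bin/sh
# Start a service and enable it at boot only if its process is really running.
# Arguments: path of the init script, name of the process to look for.
start_then_enable() {
    script="$1"
    proc="$2"
    "$script" start || return 1
    sleep 2    # give the service a moment to come up
    if pidof "$proc" >/dev/null; then
        "$script" enable
    else
        echo "$proc is not running, leaving it disabled" >&2
        return 1
    fi
}
```

On the router this would be called as start_then_enable /etc/init.d/uhttpd uhttpd. It is still worth opening the router’s address in a browser afterwards, as a running process does not prove that the pages load.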

After this step, configure the router as you like. When the USB light stops flashing, you can reboot the router, even by unplugging the power if you so desire.

Try to avoid installing things you do not need. Although “opkg remove” exists, removing a program that was built into the SquashFS image does not free any space, so for those the only real way to undo things is to erase everything and start over. However, as you have several GB of free space and most programs only take a few KB (several orders of magnitude less), installing things you do need should not be a problem.

Best regards,
Jure