Monday, November 16, 2009

Getting rid of the "Security Tool" Virus from Windows XP or how Ubuntu saved the day

I recently had my first Windows virus experience: a friend's computer caught the "Security Tool" virus while downloading free mp3 songs from the Internet. This post is about how to remove the virus files, but more generally it's a case in point for removing any stubborn or harmful Windows files, or copying data out of a crippled system.

This software is a really nasty piece of code: it resembles a genuine anti-virus/anti-spyware tool, tempting users to click on it. It makes the system really sluggish and, even worse, it changes registry entries. I also found that the Master Boot Record had been corrupted (though this may have been because of multiple hard resets).

The virus files basically sit in a folder with the following path:

C:\Documents and Settings\All Users\Application Data\[random numbers]\

So this is the folder that needs to be removed.

There are programs available that will remove Security Tool, but this presumes that the system is responsive enough to let you download the anti-malware, and that it boots up fine in safe mode. These programs are an easier way forward if those two conditions are met.

However, the computer I worked on was beyond that point, getting the blue screen of death on every boot. It was more important to get all the useful data out to some external storage, since the data was not backed up.

So, Ubuntu to the rescue. The idea is to create a bootable CD of Ubuntu, which is one of the best Linux distributions, and then boot from the CD. (This requires your BIOS settings to prefer the CD over the hard disk as a boot device, which is normally the case on most computers. In case it's not, this can be changed. Here's how.)

Once Ubuntu has booted from the CD, the next step is to mount the Windows hard disk drive. This can be done using the terminal (here's how) for those more comfortable with it, or with the Disk Manager utility in System Administration (here's the Ubuntu manual).

Once mounted, copying data out to an external disk or deleting files will look familiar to most of us used to pretty GUIs.
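As a rough sketch of how this looks in the Ubuntu terminal (the device names /dev/sda1 and /dev/sdb1, the mount points, and the user folder below are all assumptions; check the output of 'sudo fdisk -l' to find the right partitions on your machine):

# Find the Windows (NTFS) partition and the external USB disk
sudo fdisk -l

# Mount them (assumed here to be /dev/sda1 and /dev/sdb1 respectively)
sudo mkdir -p /mnt/windows /mnt/usb
sudo mount /dev/sda1 /mnt/windows
sudo mount /dev/sdb1 /mnt/usb

# Copy the useful data out to the external disk
cp -r "/mnt/windows/Documents and Settings/SomeUser/My Documents" /mnt/usb/

# Remove the virus folder; substitute the actual [random numbers] directory you find there
rm -r "/mnt/windows/Documents and Settings/All Users/Application Data/<random numbers>"

If write access to the Windows partition fails, mounting explicitly with '-t ntfs-3g' usually helps, depending on the Ubuntu version.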

So why not the Windows Recovery Console? I tried it; it did not work well for me.

Does this work when Windows has an administrator password? I don't know for sure, but it should. I had a dual-boot system with an admin password on Windows, and that never stopped the Linux distro from reading/writing files there.

Lessons to learn? Stay safe on the Internet, and backup data frequently.

Thoughts? It makes me uneasy about how easy it may be to steal files from random unattended computers; I wonder what can be done about it (password-protect the hard disk?). Also, maybe Windows Vista or Windows 7 is safer, with stricter control on what gets installed.

Acknowledgments? A gifted collaborator, my liege lord, and my close friend, who gave me my learning platform.

Wednesday, November 11, 2009

On Confirming Sine Waves

I was recently asked what the best way is of confirming that a given measured waveform is a sine wave. It was a simple question, but it got me thinking... It's easy enough to eyeball a waveform and give a quick judgment, but when one really has to be sure, how does one go about it?

If the frequency of the expected sine waveform is known, one method would be to take the cross-correlation between the measured signal and the ideally expected one.
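In symbols (one common convention, with x(t) the measured signal and s(t) the ideal sine):

r_{xs}(\tau) = \int_{-\infty}^{\infty} x(t)\, s(t+\tau)\, dt

A large sinusoidal component in r_{xs}(\tau) at the expected frequency indicates a good match between the measurement and the ideal wave.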

But the most foolproof way is to take the Fourier transform of the waveform. This gives a lot more information than the correlation: it actually tells you which frequencies make up your wave. Matlab has an implementation of the fast Fourier transform (the fft function).

The Fourier transform of a sine wave of frequency f is a pair of Dirac delta functions. When plotted on a frequency vs. amplitude plot, they show up as two lines at f and -f, each with half the amplitude of the original wave.
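Written out (using the X(\nu) = \int x(t)\, e^{-j 2\pi \nu t}\, dt convention):

\mathcal{F}\{A \sin(2\pi f t)\}(\nu) = \frac{A}{2j} \left[ \delta(\nu - f) - \delta(\nu + f) \right]

so the magnitude spectrum is a pair of spikes of size A/2, one at +f and one at -f.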

The Fourier transform method, however, also has two caveats. The Fourier transform strictly expects an infinitely long waveform as input, and the measured signal will be anything but that (because of equipment limitations and practicality), so a windowing function is needed to attenuate the edges of the record. The other gotcha is that harmonics of a single frequency will sometimes show up in the spectrum, i.e. spectral lines at 1f, 2f, etc. if f was your expected frequency. In such cases, the ratio of the amplitude of the main spectral line at f to the amplitudes of the other harmonics serves as a good indicator of how clean the signal is.
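One common way to put a number on that last check is the total harmonic distortion, the ratio of the energy in the harmonics to that in the fundamental (here A_k is the amplitude of the spectral line at k times f):

\mathrm{THD} = \frac{\sqrt{A_2^2 + A_3^2 + A_4^2 + \cdots}}{A_1}

A pure sine wave gives a THD of zero; the larger the value, the less sine-like the signal.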

This post diverges from my other, more systems-like posts, but I just realised that this is hands-on stuff that probably doesn't get written up anywhere, and hence is worth noting down.

P.S. There's also a C implementation of the fast Fourier transform from MIT. Cheekily, it's called the Fastest Fourier Transform in the West (FFTW) :)

Saturday, October 24, 2009

Tutorial on Communication Principles

Being a software person in computer networks, I sometimes find the physical-layer details of communication (for example, OFDM) really hairy and not to be messed around with.

I found a good set of tutorials that explains, intuitively, a lot of the physical-layer concepts behind what needs to go on when we send messages between computers, here:

http://www.complextoreal.com/tutorial.htm


Take a look; the real-life analogies they use make it easy for the concepts to sink in and stay there.

Tuesday, October 6, 2009

On sending automated emails

For some time now, I have had a trivial but annoying problem: I have to send out periodic emails to admins to inform them that I'll be using certain resources. Writing the same email every time is painful, to say the least.

So here's a little trick to automate the sending task on Linux, provided that the sendmail program is installed (it mostly is). One way to test that is to send a fake email to yourself and check whether you receive it. On a terminal, type:

echo "did I get this?"| mail -s "test" yourName@yourDomain.com

If you do get the email message with subject "test" and content "did I get this?", then the first step towards automated emails is to type out the email content and save it into a file (which we call emailContent here). The rest is easy: the command to send it is the following:

mail -s "SubjectOfTheEmail" recipient1@someDomain.com recipient2@someDomain.com < emailContent

The line above can be added to a shell script, which can then be executed whenever necessary. What is even cooler is that email can now be sent from a program written in a scripting language (like Perl) by simply calling the script.
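For instance, a minimal wrapper script might look like this (the script name, file paths, and addresses are just placeholders):

#!/bin/bash
# sendReservationMail.sh - sends the canned email to the admins
mail -s "Reserving the cluster machines this week" admin1@someDomain.com admin2@someDomain.com < /home/me/emailContent

From Perl, or any other scripting language, sending the mail then amounts to a single call such as system("/home/me/sendReservationMail.sh").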

For the extremely lazy sorts, even the execution of the shell script can be automated using the cron job scheduler.
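For example, a crontab entry like the following (added with 'crontab -e'; the script path is again a placeholder) would send the email at 9 am every Monday:

# minute hour day-of-month month day-of-week  command
0 9 * * 1 /home/me/sendReservationMail.sh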

More information on mail and cron can be obtained from the man pages (type 'man mail' or 'man cron' on a terminal) or by googling for them.

Wednesday, November 26, 2008

1 TB quirks

I have started using one-terabyte disks for storing some of my data, and only lately have I started realizing the size they represent.

One of my machines started behaving funny (it would not let me create files) even though the disk was only 63% full. Further investigation by wiser heads than mine revealed that the ext3 filesystem I was using had run out of inodes, when the disk was only about half full...

There are few things that can be done now, given that there is substantial data on the disk and it cannot be reformatted anytime soon. Note to self: on ext3, 1-terabyte disks can run out of inodes way before they run out of disk space. Next time, the filesystem should either be created with more inodes, or the disk split into two 500 GB partitions to keep the filesystem metadata manageable.
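For the record, inode usage can be checked independently of block usage, and the inode count can only be chosen when the filesystem is created (the mount point and device below are placeholders, and mkfs is of course destructive, so only for a fresh or fully backed-up disk):

# Compare inode usage with block usage; 100% IUse% with free space left means the inodes ran out
df -i /bigdisk
df -h /bigdisk

# When creating a fresh ext3 filesystem, ask for more inodes by lowering the bytes-per-inode
# ratio (the default is typically 16384, i.e. one inode per 16 KB of space)
sudo mkfs.ext3 -i 8192 /dev/sdb1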

Thursday, October 23, 2008

Chill is back :)

Temperatures dip below 10 degrees for the first time this fall.
I like it :)

Thursday, September 4, 2008

Distributed 101: How to write large distributed programs

Lately, I have been involved in writing some huge distributed programs that run on hundreds of machines. It's a known fact that writing such programs is non-trivial, but just how non-trivial is something that is not understood until one actually attempts it. Here are some of the lessons that I learned (the hard way), which make life a bit easier, lest I forget again (not likely ;-), it's been a lot of pain).

1. Log everything: The first thing everyone knows about computers is that they act weird sometimes, and the more computers you have, the more likely it is that things go wrong. Logging what goes on is the one and only way you can actually find out what went wrong where, especially when dealing with a huge number of machines, since it is impossible to monitor all of them by hand at once.
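Even a tiny helper that stamps every message with the time and the host name makes logs from hundreds of machines much easier to piece together later; a bash-flavoured sketch (the script and file names are placeholders):

# Prepend a timestamp and the host name to every log message
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(hostname) $*" >> /var/tmp/myjob.log
}

log "starting chunk 42"
./process_chunk.sh 42; log "chunk 42 exited with status $?"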

2. Making code restartable and idempotent:
Closely tied to the point above, it makes life a lot easier if the code is restartable and idempotent. This becomes important when machines die off in the middle of a lengthy process; having the ability to simply restart processes makes managing the machines relatively painless. The log serves as a useful way of knowing exactly where a process was when it died; reading the log back is a useful feedback mechanism.
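A crude but effective shell pattern for this is to drop a marker file when a step finishes, so that a re-run skips work that has already been done (the script and file names are placeholders):

# Run the step only if it has not completed on a previous attempt
if [ ! -f /var/tmp/step1.done ]; then
    ./run_step1.sh >> /var/tmp/step1.log 2>&1 && touch /var/tmp/step1.done
fi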

3. Keeping synchronization to a minimum:
When multiple unreliable machines have to co-ordinate with each other, life becomes hard. That's when you understand that animal trainers have a tough job ;-). If at all possible, it is much better to divide a complex job such that each computer works on an independent chunk of the whole, rather than on parts that need to be passed around.

4. Taking Advantage of Memory:
Disks are slow - really, really slow - when compared to memory. While for small programs you can take it or leave it, for large programs the amount of disk you touch has a significant impact on performance. It is always a good idea to take advantage of the large amounts of memory available on most computers today.

So touching one big disk file instead of many small ones is better (because the operating system's buffer cache kicks in and reads ahead from disk into memory in anticipation), but best of all is to read the entire file into memory explicitly before processing it.

The tricky part is writing: if you process too much without committing to disk, you run the risk of losing all your computation if the computer fails. On the other hand, very frequent writing is bound to cost in terms of performance. This is a delicate balance to strike, but with the CPU speeds we have, my own rule of thumb is to favour repeating computation rather than writing frequently.

5. Reducing communication costs: Communication setup costs sometimes dominate the cost of the actual transfer. For example, setting up an ssh or scp session is quite expensive (~1 second to establish the connection, give or take). Considering this, it is better to transfer as much data as possible per connection you set up.

So secure-copying a tarred directory is better than transferring its contents one file at a time.
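Something along these lines (the host and directory names are placeholders):

# One connection, one big transfer
tar czf results.tar.gz results/
scp results.tar.gz someUser@someHost:/data/

# ...instead of paying the connection setup cost once per file:
# for f in results/*; do scp "$f" someUser@someHost:/data/results/; done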

6. Algorithms Matter: The lesson is that the easiest way to code a problem may not be the most efficient one, and efficiency counts for long-running jobs.

As a very vivid example, I came across some code that a friend had written that inserted entries into a database table. For those not in the know, a database table has a primary key (a number) to identify each row (just like the row numbers that MS Excel has).

However, the code to insert a row (or tuple, as a purist would put it) was written such that it read through all previous rows to find the last row number, and inserted the new row with that largest row number plus one. So each time the nth row needs to be inserted, all n-1 previous rows have to be read. One can quickly see how this would ruin the performance of inserts: with this algorithm, it took the code 2.5 hours to insert 10,000 entries into the table.

After changing the algorithm so that the last primary key was stored in a separate table, performance improved substantially. Just how substantially? The code inserted 10,000 entries in under 30 seconds!
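A sketch of the counter-table idea, using the sqlite3 command line purely as a stand-in for whatever database was actually involved (the table and column names are made up):

# One-time setup: the data table plus a one-row counter table
sqlite3 jobs.db "CREATE TABLE results (id INTEGER PRIMARY KEY, payload TEXT);
                 CREATE TABLE counter (last_id INTEGER);
                 INSERT INTO counter VALUES (0);"

# Each insert bumps the counter and reads it back, instead of scanning every existing row
sqlite3 jobs.db "BEGIN;
                 UPDATE counter SET last_id = last_id + 1;
                 INSERT INTO results (id, payload) SELECT last_id, 'some data' FROM counter;
                 COMMIT;"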

So it's much better to go for something slightly more painful if it's algorithmically better, especially considering that you'd probably deploy the code on many machines, and it's hard to change code again and again.