Archive for the ‘guide’ Category

Creating Indexes/Sorting on very large tables in Mysql

If your dataset is anywhere between 1 million to  180 million then you have come to the right blog.  Normally a procedure like this will require atleast 1 day or more depending on dataset. But I will teach you how to get results in as quick as 3 hrs. You will be able to execute queries on indexes fully utilizing them without going through the painful process of creating them on the the entire table. Recently at work I had to deal with a dataset of 180 million  and this entry gives details of the work flow that I adopted.  I will not effectively sort the complete table but will tell you how to achieve the same effect in less than 2-3 hrs for 180 million rows. I tried well known methods to achieve indexing. I tried to use Alter table statement after 2 hrs stopped it. I tried to dump the data in hope of inserting it in an indexed table but after 1 hr stopped that as well. Then I explored Mysql functionality a bit to discover 3 features which can really reduce time, those were Memory tables, Merge tables and Partitioning(not necessary). Fortunately sorting using merge sort is a recursive job, so in the process of sorting 180 million rows I also had to learn how to sort 2 million faster than normal index creation. So the first thing that you have to do is  find out the order of rows that you have.

If you have less than 1 million rows then you are better off using mysql’s inbuilt commands

Alter table <tblname> create index (<column name>);

If you have more than than then read on…..

The concepts I will be using are as follows:

Memory Tables:

These tables are hash maps that mysql stores in memory. These tables are temporary (in the sense they disappear after mysql server restart)  and have limited rows. However they are lighting fast on index creation or sorting.

Partitioning:

Somebody in Mysql finally realized that to be fast database has to split the data across servers. Finally in version 5 they have introduced partitioning. If you have the luxery of more than 1 server you can use this. I did not have this luxury so will do indexing without it.

Merge tables:

So these are 1 server counterpart of Partitioning. If you want to split your dataset into multiple tables and run single query on all of them you can use Merge tables. They have their own share of problems, but for general functionality of insertion, deletion, searching and dumping, they work great. The only 2 major limitations one that they only work if base tables are MyISAM and searching on them is multiples of logn(see below if interested)

Irrelevant Mathematical jargon(feel free to skip)

So if you search  table on indexed columns then search takes Log(n) time (n = number of tuples). So imagine you have 100*n size of table and you create 100 sub tables of n each. Now searching on one table will take Log(n) time and total time is 100*log(n)=log(n to the power 100). If it were a single table it would have taken Log(100*n) =Log(100)+log(n). So in a single table it scale logarithmically but merge table scales exponentially.

I will explain the steps that I took below, you can customize them to your dataset.

There are several ways of creating index however mine will require you to have quite a powerful machine and a reliable power supply. Atleast 2gb of ram is required and more is better here. If you have around 2 million rows then skip to step 4.

Step 0

This step is about reconnaissance. It will determine the numerical data in the next steps. You need to know the approximate amount of space your row takes. This will determine the size of your memory table. You can determine the size of memory table by trial and error also but a numerical estimate will help.  In the steps below I will assume initial table as client_data and final table as client_data_sorted. All other tables are temp and generated on the spot.

Step 1: Increasing allowed size of Memory table to fit data

The first thing you need to do is extend the allowed memory limit of mysql. To sort faster we will need memory tables. Do so by adding the lines marked as red below under  [mysqld] in my.cnf file. Typically located in /etc/my.cnf or /etc/mysql/my.cnf depending on your distribution.

my.cnf file

[mysqld]

datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
..
max_heap_table_size=4096M
tmp_table_size=4096M

Adding these lines will ensure that mysql can now take 4GB space for storing memory tables.Feel free to increase this number if you have higher memory but ensure that you dont encounter swapping.

Step 2: Dump data in memory table for indexing

Now you need to create a  memory table with same structure as your final table.

create table client_data_memory like client_data_sorted;
alter table client_data_memory engine=Memory;

The second line alters the engine to memory. The only thing to keep in mind is that memory engine keeps all the data in memory hence if your mysql server restarts or machine restarts then all the data is lost. Ensure that you never rely on them as a reliable storage.

Figure out  how many rows your memory table can contain. In a typical scenario this would be order of 10 million. This value will change depending on your system’s memory and values set in step 1.  You can test it out by inserting data from source table to memory table. After the limit the process will interrupt in the middle giving error like table is full.

Insert into client_data_memory select * from client_data limit <test limit value>;

Ex:  Insert into client_data_memory select * from client_data limit 0,10000000;    # 10 million rows from 0th row

Step 4: Storing data from Memory table to MyISAM table

If you have order of 10 million rows then the process stops for you here. Just use an alter table command to create index on memory table(3-4 minutes) then insert the memory table data into client_data_sorted table. You just saved yourself  hours of pain. If you have more than 10 million then  skip this step.

Alter table client_data_memory add index (<column name>);

(or sort the table)

Insert into client_data_sorted select * from client_data_memory;

If you can store 10million rows in your memory table and total tables are around 40 million then you are better off iteratively repeating the above steps. Merely sort 10 million then insert, then truncate memory table , then insert next 10 million , sort then insert. The process will take exponentially more time every time but still will finish far faster than normal.  If you have more than 40 million then read on. A world of pain awaits you….

Step 5: Create Merge table for accessing data

The above process will not work iteratively for you if you have more than 50 million rows as while inserting you have to merge too. Suppose your memory can store 10 million and you have 180 million rows then create 18 temporary tables of engine type MyISAM.

create table data_temp1 like client_data_sorted;
alter table data_temp1 engine=MyISAM;

Now use the above technique to insert data 10 million a piece into each of these tables. So data_temp0 will contain 0-10 million then data_temp1 will have 10 miliion to 20 million and so forth.

truncate table client_data_memory;insert into client_data_memory select * from client_data limit
0,15000000;insert into data_temp0 select * from client_data_memory;
truncate table client_data_memory;insert into client_data_memory select * from client_data limit
10000000,15000000;insert into data_temp1 select * from client_data_memory;

…….

This will take quite a long time, for me it took 10 min for every 10 million so 180 min in total. Meanwhile consider some of the more worthwhile alternatives like  hive, its too late for it now but its worth a look. This is also useful.

Now create a merge table. A merge table is created on top of several MyISAM tables of same structure. Its important to have same structure and MyISAM tables. You can use this as  a single table however all the searching is performed on all tables and hence is slower than single table.

create table client_data_merge like client_data_sorted;
alter table client_data_merge engine=merge union=

(data_temp0,data_temp1,data_temp2,data_temp3,data_temp4,data_temp5,data_temp6,data_temp7,data_temp8,data_temp

9,data_temp10,data_temp11,data_temp12) INSERT_METHOD=LAST;

Step 5: Final Sorted table(if required)

For most temporary uses this merge table will work. You can use this table normally. Insert data into the table etc etc. However if the limitations of merge tables are not acceptable you have to create the final table  from this merge table. You can use merge tables to dump the data in a sorted fashion according to indexes or simply use

Insert into <tblname> Select * from <merge table> order by <index1>,<index2>…..

This way you get to Insert much faster as the data is already sorted and mysql simply has to insert the data and update the indexes. If this method helped.. do let me know in the comments.

Advertisements

Download videos from any sharing site(Youtube, Dailymotion, Metacafe, vimeo etc)

There are a lot of ways to download content from a video sharing sites. Most folks rely on applications to download videos. Applications are really unsuitable for 3 reasons

  • Mostly they are specific to one or two sites
  • In some cases they are shareware
  • They have hardcoded techniques for url retrieval which outdate easily.

A slightly better technique for downloading is relying on sites to decode the url for you. This removes the latter 2 limitations of applications. For most of you who have come here Ripzor will suffice .  However the top limitation still remains if you are looking for a site not listed in ripzor.com. Also some times the url retrieval technique changes and sites do not update their scripts. I will give you the method to retrieve URLs in an easy manner using every day tools.

Tools required

  1. Firefox
  2. Firebug(or any other debugging tool)
  3. [Optional] Downloader ( like flashget or Downthemall)

Dont get scared from the word debugger. This technique is mostly a misuse of the word debugging 🙂 .  Now after you have installed the above “tools” . Lets start

Step 1: Select a video downloading site like youtube.com/dailymotion.com and pick a video. For further screenshots I have picked up this youtube video and this dailymotion video. The same techniques will work on any video site.

Step2: Now open video in Firefox and enable firebug debugging tool from Tools ->Firebug-> Open Firebug in a new window

Open Firebug in Firefox

Open Firebug in Firefox

Step3: Click on the youtube tab and click on Activate Firebug for the firefox tab button in the firebug window

Activate Firebug for Firefox tab

Activate Firebug for Firefox tab

Step 4: Select the Net tab among Console/HTML/ CSS/Script/DOM and Net on the top . Then select the Flash tab. This tab will give you all the web requests that a .swf file will make. Most sites use flash players for playing the video so this tab will work. If the following steps dont work then select the All tab for seeing all requests

Tabs to be used are net and flash

Tabs to be used are net and flash

Step5. Play the video you will get a screen like this in firebug. Now look for requests that look like video requests. For ex. those ending with .flv, .mp4 or in youtube’s case get_video. Right click on that link and select Copy Location. If there is no such link then you need to go back to step3 and select All tab instead of flash.

Copy link location

Copy link location

Step6. Use your downloader to download that video or paste that link in firefox itself to use it for downloading the video. In some cases you will have to close the video site tab to download this video as only one streaming might be allowed.

So now you can download any video from any video sharing site as long as the video is not encrypted with the player. Let me know what all sites you tried below.

Enable Multitouch(two-fingers) on Synaptics trackpad on Linux

This could be your Trackpad

This could be your Trackpad

Very few people know that every synaptics trackpad supports multitouch and as long as your OS supports it  you can always use multitouch to enhance your computers interface. Apple has patented several UI design elements ,hence has inhibited Windows and Linux to come out with similar UI elements like pinch and zoom, scroll etc.   

microsoft-windows-vista-logoIf you are using Windows xp, vista or Windows 7 then click here

signature_color_high_res_0 Install Synaptics driver  in ubuntu

apt-get install  xserver-xorg-input-synaptics 

Now  edit your xorg.conf file

sudo vi /etc/X11/xorg.conf

find the input section

Section “InputDevice” Identifier “Synaptics Touchpad”

Add these lines in it

Option “HorizEdgeScroll” “on”

Option “VertEdgeScroll” “on”
Option “VertTwoFingerScroll” “on
Option “HorizTwoFingerScroll” “on”
Option “CornerCoasting” “on”
Option “PalmDetect” “on”
Option “CircularScrolling” “on”
Option “CircScrollTrigger” “3”

 

Enable Multitouch(two-fingers) on Synaptics trackpad for Windows xp/Vista/7

This could be your Trackpad

This could be your Trackpad

Very few people know that every synaptics trackpad supports multitouch and as long as your OS supports it you can always use multitouch to enhance your computers interface. Apple has patented several UI design elements ,hence has inhibited Windows and Linux to come out with similar UI elements like pinch and zoom, scroll etc.

 
linux-online-incIf you are using Linux and want to setup multitouch then click here 

signature_color_high_res_0

You only need a patched driver to enable multitouch features in synaptics. Download the latest driver from here

Download links (for XP/Vista 32/64-bit):
http://www.sendspace.com/file/9n9c6i
http://mihd.net/42vtopk

This driver is made from an outdated driver of synaptics. So you may have to uninstall your existing driver

  1. Go to Add-Remove Software. 
  2. Find and uninstall synaptic driver
  3. Reboot

Extract the zip file(touchpad_synaptics_v10.2.4_vistax32x64_xpx32x64_modded_build2.zip) to a folder

Install selected driver.

This driver is not signed. Hence windows will issue a warning. Do not install this driver on your production machine. This is a development build and using it may crash your system

Now you can  enable these 3 features on your Windows Desktop using Synaptic Options. Go to bottom right in your Windows taskbar to select synaptics tray icon or search synaptic in control panel. Then select Synaptic touchpad in Device Settings. 

  1. Multitouch scrolling : You will be able to scroll using 2/3 fingers. The scrolling is not as smooth as mac (shucks!!) but works equivalent to page up and page down. Here is how to enable it in Synaptics Options. You can configure what each direction will cause.
    Two finger Gestures -> Top/Bottom/Left/Right
  2. Chiral Motion : This technique is very useful for precise scrolling. Whenever you wish to scroll just slide two fingers downwards(or upwards) and then you can move your fingers in clockwise or anticlockwise order to continue scrolling.This is a much more precise way of scrolling but you need to get used to it. Here is how to enable it in Synaptics Options
    Virtual Scrolling -> Enable Chiral motion scrolling
  3. Momentum : This grants your mouse momentum so if you drag your mouse and leave it. It will travel some more distance depending on the speed of your drag. This is quite cool for some time, but very difficult to handle and impractical. Still somthing cooool to show off :). . Here is how to enable it in Synaptics Options
       Pointer Motion -> Momentum -> Enable Momentum
  4.  

 

UPDATE: The driver has not functioned for some laptops. It is suited only for some laptops. If you have installed this driver, do let us know in the comments if this worked for you. Thanks!!!

Hyper-V: problems and solutions

Windows Server 2008

Hyper-V is the new hypervisor based virtualization solution from Microsoft. The product is targeted mainly for the data centers for server consolidation. I have been using it since the day it became RTM and am really impressed by its performance and stability. There are some minor issues which is acceptable as the product is still young. I have a Intel Core 2 6420, which has the Intel-VT extension required for Hyper-V and 4 GB of ram sufficient enough for running a single guest. Hyper-V is supported only on windows 2008 server 64 bit edition.

Installation Steps

These are the installation steps for installing Hyper-V

  1. Check if your processor supports either Intel-VT or AMD Pacifica extensions.
  2. Install Windows 2008 server 64 bit edition.
  3. Install all the updates.
  4. Add the role Hyper-V using server manager.
  5. Install updates for the Hyper-V role.

Now you are all set to install you guest operating system. Run the Hyper-V manager application to manage your guest operating systems. Create a new virtual machine and setup different properties like memory, number of processors etc.. If you are planning to install your guest operating system from network make sure that you add a legacy network adapter as the guest operating system wont be having drivers at install time for the normal network adapter. Once you are done installing the guest operating system install the integration services from the settings menu of that virtual machine. Installation of integration services is highly required for better performance and also getting remote desktop to work.

Issues

1) No audio on the virtual machine. How to listen to your favorite mp3’s ?

The first thing that you might notice is the lack of Audio device. Hyper-V is for running multiple servers on a single physical machine. Not many people play mp3’s on a production server, I suppose. So how do we get audio to work on your virtual machine ???

The solution is simple. Don’t use the hyper-v manager to connect to the guest operating system. Instead enable remote desktop on the guest machine and use mstsc to connect to the guest. If you are using windows 2003 as the guest you still find that the audio doesn’t work. There are couple of setting that you need to make to get audio to work. First thing is to change the windows audio service startup to automatic from disabled and start the service. This can be done by running services.msc from the run dialog. Next, you have to fire tscc.msc from the run dialog to change a couple of terminal services settings. In the connection tab, double click on the Rdp-tcp row to get the properties tab. Unselect the “Audio Mapping” option there and save the settings. Remember you have to logout from that session and start a new session for the settings to take effect. Of course simpler thing would be to reboot the guest. Connect again to the guest and enjoy your favorite mp3’s …..

2) Unreadable/fuzzy  text on LCD monitors if you are using remote desktop

So you think you solved your hyper-v problems by connecting using mstsc instead of the hyper-v manager,  then maybe you are using a CRT monitor or you have never seen cleartype in action. Yes, font smoothing doesn’t work with RDP if you have either windows XP or windows 2003 as the guest OS. And if you have a LCD monitor like me, you know how much pain in the *** it is to work on a LCD without font smoothing. So now you have to fix this cleartype issue.

Microsoft has finally issued a Hot fix for this feature at http://support.microsoft.com/kb/946633. Download the fix and install it. You have to reboot the machine for the fix to work. Also make sure the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows\TSEnableFontSmoothing is set to 1.

3) Strange cursor problem and fix

Also, when I used to connect to guest machine using the hyper-v manager instead of remote desktop client, I faced a weird problem with my mouse cursor. After using the guest machine for sometime may be a couple of hours the mouse pointer becomes fixed meaning that it doesn’t change depending on the thing beneath it. It sometimes becomes a wait cursor, sometimes a small dot and like that. This is quite frustrating while you try to re-size windows or frames as you cannot make whether you are at the proper point to re-size the window. Luckily after using remote desktop to connect to the host this problem also is solved.

Hope these solutions fix some of your hyper-v problems.

How to change/install Media Player Classic codec filter

If you are dissatisfied with the codec used by your Media Player classic to decode your videos you can easily switch them to other codecs.

Step 1:

Decide the codec you wanna use. There are considerable number of guides available on this on the net. I have found doom9 forum quite helpful and informative.

MPEG 4 part 2/ AVI codec review: Divx vs Xvid here

H.264/ MP4/ HD codec review: CoreAVC vs FFmpeg here

Step 2:

Download the codec. Though there are several sites that provide opensource codecs for free, there are few that provide closed source too.  I have found Final build site to be quite good and provides almost all codecs. If you dont have any codecs at all I do suggest you to install Klite codec Pack and have some basic codecs to start with.

Step 3:

If you have started using Media Player classic without any change then you are most probably using FFmpeg codecs. These are very good codecs and use libavcodec for most of the decoding. It decodes most formats however not all of them perfectly. Now that you have your new filter installed.

  1. Open the options window in media player classic(by pressing ‘o’ or through right click menu )
  2. Go to External filters option. Now in the right side of the window select Add Filter
  3. You will get an elaborate window with a plethora of options. Select your codec
    1. If your codec is not available there you will have to put a little more stress. Open the folder that your codec installed in.
    2. find a file with .ax extension in that folder and put it in filters folder inside Klite Codec Pack folder
    3. Now you should click browse in the bottom left of codec window and select the filter inside filters folder. You can browse to select ax file from the folder it was installed but needlessly adds a dependency.
  4. After selecting that filter go the right of the window and click Prefer among the 3 options(prefer, block and merit)

Eureka your filter is changed, you have a better video already!!!